BioFSharp


TargetP BioContainer

TargetP 1.1 is a widely used tool that predicts the subcellular localization of proteins by predicting N-terminal presequences. We can leverage the power of TargetP from F# by running it in a Docker container. To get academic access to the TargetP software, please contact the friendly people at DTU.

The image

After acquiring the software, you can create a Dockerfile that abides by BioContainers conventions at the package's root and run

docker build . -t nameOfYourContainer:yourTag

to get the image needed. Here is an example of a possible Dockerfile:

################## BASE IMAGE ######################

FROM biocontainers/biocontainers

################## METADATA ######################
LABEL base_image="biocontainers:v1.0.0_cv4"
LABEL version="3"
LABEL software="TargetP"
LABEL software.version="1.1.0"
LABEL about.summary="TargetP 1.1 predicts the subcellular location of eukaryotic proteins"
LABEL about.home="http://www.cbs.dtu.dk/services/TargetP/"
LABEL about.documentation="http://www.cbs.dtu.dk/services/TargetP/instructions.php"
LABEL about.license_file="http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?targetp"
LABEL extra.identifiers.biotools="TargetP"
LABEL about.tags="Sequence analysis"

#################### INSTALL ########################

ENV PATH="/usr/local/bin:${PATH}"
# TargetP perl script
ADD ./targetp_BioFSharp /usr/local/bin/targetp
# TargetP install
ADD ./targetp-1.1b.Linux.tar /opt/
# ChloroP install
ADD ./chlorop-1.1.Linux.tar /opt/
# SignalP install
ADD ./signalp-4.1f.Linux.tar.gz /opt/

Running targetp from F#

As always, we need to define the docker client endpoint, image, and container context to run:

open BioFSharp.BioContainers

///docker daemon endpoint on windows
let client = Docker.connect "npipe://./pipe/docker_engine"

///image to create containers from
let image  = Docker.DockerId.ImageId "nameOfYourContainer:yourTag"

///The container context we will use to execute targetP
let imageContext = 
    BioContainer.initBcContextWithMountAsync client image "path/to/your/directory"
    |> Async.RunSynchronously

To analyze a file with the container, we can use the runWithMountedFile function to work on a FASTA file in the mounted directory. The file can come either from an external source or from an upstream analysis pipeline that uses BioFSharp and writes it to disk with FastA.write.

Note: this function is available from version 2.0.0 onwards.

let results = 
    TargetP.runWithMountedFile
        imageContext 
        TargetP.TargetpParams.NonPlant //Note that you can run custom commands using either NonPlantCustom or PlantCustom
        @"path/to/your/file"
    |> Seq.item 0

//don't forget to get rid of the container after using it

imageContext
|> BioContainer.disposeAsync
|> Async.RunSynchronously
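
Because the container should be removed even if the analysis throws, the cleanup step can be tied to the work with try/finally. The following is a minimal sketch and not part of BioFSharp: withResource is a hypothetical helper, and the demonstration uses a plain string as a stand-in for the container context so that it runs without a Docker daemon.

```fsharp
/// Hypothetical helper (not a BioFSharp function): acquires a resource,
/// runs `work` on it, and always calls `release`, even if `work` throws.
let withResource (acquire: unit -> 'R) (release: 'R -> unit) (work: 'R -> 'T) : 'T =
    let resource = acquire ()
    try
        work resource
    finally
        release resource

// Stand-in demonstration: a log records the order of the three steps.
let events = System.Collections.Generic.List<string>()

let demoResult =
    withResource
        (fun () ->
            events.Add "acquired"
            "ctx")
        (fun _ -> events.Add "released")
        (fun ctx ->
            events.Add ("used " + ctx)
            42)

// Intended use with the calls shown above (assumes a running Docker daemon):
// let first =
//     withResource
//         (fun () ->
//             BioContainer.initBcContextWithMountAsync client image "path/to/your/directory"
//             |> Async.RunSynchronously)
//         (fun ctx -> ctx |> BioContainer.disposeAsync |> Async.RunSynchronously)
//         (fun ctx ->
//             TargetP.runWithMountedFile ctx TargetP.TargetpParams.NonPlant @"path/to/your/file"
//             |> Seq.item 0)
```

Since runWithMountedFile returns a seq, which may be evaluated lazily, it seems safest to force the result (for example with Seq.item 0 or Array.ofSeq) inside the work function, before the container is disposed.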

Here is an example result of the mounted-file run for the pathogenesis-related homeodomain protein (https://www.uniprot.org/uniprot/P48786.fasta):

val it : TargetP.TargetpItem = { 
    Name = "P48786;"
    Len = 1088
    Mtp = 0.054
    Ctp = nan
    SP = 0.068
    Other = 0.943
    Loc = "_"
    RC = 1
    TPlen = "" 
}
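
To make the record above easier to read, here is a small sketch that translates the Loc code and filters on the reliability class. It assumes the usual TargetP 1.1 output conventions (C, M, S, _ and * as location codes; RC from 1 to 5 with 1 being the most reliable). Prediction is a local stand-in whose field types are inferred from the printed output, and describeLoc and isConfident are hypothetical helpers, not BioFSharp functions.

```fsharp
/// Local stand-in mirroring the fields printed above (types are an assumption);
/// with BioFSharp you would work on TargetP.TargetpItem directly.
type Prediction =
    { Name: string; Len: int
      Mtp: float; Ctp: float; SP: float; Other: float
      Loc: string; RC: int; TPlen: string }

/// Hypothetical helper: human-readable meaning of the Loc column.
let describeLoc (loc: string) =
    match loc with
    | "C" -> "chloroplast"
    | "M" -> "mitochondrion"
    | "S" -> "secretory pathway"
    | "_" -> "any other location"
    | "*" -> "undecided"
    | unknown -> sprintf "unrecognized code '%s'" unknown

/// Hypothetical helper: keep predictions whose reliability class is at most maxRc.
let isConfident (maxRc: int) (p: Prediction) = p.RC <= maxRc

// The example result from above, typed in by hand.
let example =
    { Name = "P48786;"; Len = 1088
      Mtp = 0.054; Ctp = nan; SP = 0.068; Other = 0.943
      Loc = "_"; RC = 1; TPlen = "" }

printfn "%s -> %s (RC %d)" example.Name (describeLoc example.Loc) example.RC
```

Ctp is nan here because the NonPlant parameter set does not score chloroplast transit peptides.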

It may not always be convenient to analyze files on disk. To analyze a FastaItem in memory instead, we can write it to a memory stream and pass that stream to the runWithStream function.

Note: in versions before 2.0.0, this function is named run.

open System.IO
open BioFSharp.IO

let testSequence = "SEQUENCE" |> BioFSharp.BioArray.ofAminoAcidString

let testFasta = 
    FastA.createFastaItem "Header" testSequence

{ Header = "Header"
  Sequence = [|Ser; Glu; Gln; Sel; Glu; Asn; Cys; Glu|] }

let stream = new MemoryStream()

[testFasta]
|> FastA.writeToStream BioFSharp.BioItem.symbol stream

//reset position of the memory stream
stream.Position <- 0L
let streamRes = 
    TargetP.runWithStream imageContext TargetP.TargetpParams.NonPlant stream
    |> Array.ofSeq
val it : TargetP.TargetpItem = { 
    Name = "Header"
    Len = 8
    Mtp = 0.056
    Ctp = nan
    SP = 0.033
    Other = 0.975
    Loc = "_"
    RC = 1
    TPlen = "" 
}
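
The position reset in the stream example matters because a MemoryStream that has just been written to is positioned at its end, so a reader starting there sees no data. The sketch below illustrates this using only System.IO; the FASTA text is written by hand rather than with FastA.writeToStream, and readAll is a hypothetical helper.

```fsharp
open System.IO
open System.Text

// UTF-8 without a byte order mark, so the stream holds exactly the text we write.
let utf8NoBom = UTF8Encoding(false)

let demoStream = new MemoryStream()

// leaveOpen = true keeps demoStream usable after the writer is done.
let writer = new StreamWriter(demoStream, utf8NoBom, 1024, true)
writer.Write(">Header\nSEQUENCE\n")
writer.Flush()

/// Hypothetical helper: reads from the stream's current position to its end.
let readAll (s: Stream) =
    let reader = new StreamReader(s, utf8NoBom, false, 1024, true)
    reader.ReadToEnd()

// Directly after writing, the position is at the end: nothing is read.
let beforeReset = readAll demoStream

// After rewinding, the full content is available to the consumer.
demoStream.Position <- 0L
let afterReset = readAll demoStream

printfn "before reset: %d chars, after reset: %d chars" beforeReset.Length afterReset.Length
```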

What we are doing with it

Our workgroup uses this DSL to power our iMTS-L prediction service. For additional information about iMTS-L prediction, see the paper or take a look at our step-by-step recipe.
