BioFSharp


BioFSharp.BioContainers

BioFSharp.BioContainers is all about connecting BioFSharp and F# itself with the bioinformatics community and integrating common bioinformatic workflows. There are many established tools and software suites in bioinformatics, and most of the time there is no point in rewriting them for a specific programming language. Containerized applications make it possible to use all these tools while being OS and programming language agnostic, meaning you can use containerized Linux tools in your Windows-based pipeline, Python applications in containers followed by a containerized C++ application, and so on. Take a look at this graphic to get an overview of how BioFSharp.BioContainers works in principle:

BioContainers_Overview

BioFSharp.BioContainers lets you leverage containerized applications without leaving your F# environment. We build on the foundation of Docker.DotNet to programmatically access the REST API on top of the Docker daemon (1 + 2). The daemon then executes the given commands, for example creating containers or running computations in a container (3). We provide special functions for use with biocontainers, which are a standardized way to create containerized bioinformatic software (4). The results are then returned via the daemon and the REST API back to your F# Interactive session (5).

The project is in its early stages and many features provided by the REST API are not fully implemented yet, so if you would like to help out, this is a great place to do so!

Prerequisites

Windows

Linux

  • Not tested yet, but if there is a way to use a named pipe as on Windows, everything should work as it does on Windows.
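
As a starting point for Linux, the sketch below picks an endpoint depending on the operating system. It is untested in the same sense as the note above and rests on two assumptions: that the Docker daemon listens on its default Unix socket (/var/run/docker.sock), and that Docker.connect passes the URI on to Docker.DotNet unchanged. The helper dockerEndpoint is hypothetical and not part of BioFSharp.BioContainers.

```fsharp
open System.Runtime.InteropServices

/// Hypothetical helper (not part of BioFSharp.BioContainers):
/// picks the Docker engine endpoint URI for the current operating system.
let dockerEndpoint () =
    if RuntimeInformation.IsOSPlatform(OSPlatform.Windows) then
        // named pipe used by Docker for Windows
        "npipe://./pipe/docker_engine"
    else
        // assumed default Unix domain socket of the Docker daemon
        "unix:///var/run/docker.sock"

// Assumed usage, not verified on Linux:
// let client = Docker.connect (dockerEndpoint ())
```

If the daemon is configured to listen elsewhere, the returned string has to be adjusted accordingly.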

General Usage

Let's say we have Docker for Windows set up and have pulled an Ubuntu image (docker pull ubuntu). To connect to the REST API, first use the Docker.connect function to initialize a Docker client.

open BioFSharp.BioContainers

// npipe://./pipe/docker_engine is the named pipe of the Docker engine on Windows.
let client = Docker.connect "npipe://./pipe/docker_engine"

Some vanilla Docker commands are already implemented. Here is an example of listing information about all images from F# Interactive (equivalent to docker image ls -a in the Docker CLI):

client
|> Docker.Image.listImages
|> fun images ->
    printfn "Repository/Tags\tImageId\tCreated\t Size"
    images 
    |> Seq.iter (fun img -> printfn "%A\t%s\t%A\t %A" img.RepoTags img.ID img.Created img.Size)

Output:

    Repository/Tags         ImageId                                                                     Created                 Size
    seq ["ubuntu:latest"]   sha256:cf0f3ca922e08045795f67138b394c7287fbc0f4842ee39244a1a1aaca8c5e1c     10/18/2019 6:48:51 PM   64192156L

Creating containers and executing commands in them

To create a container from an existing image, initialize a BioContainer.BcContext. That way, a new container will not only be spawned, but also kept running to receive your commands.

///Create a representation of your image on the F# side
let ubuntuImage = Docker.DockerId.ImageId "ubuntu:latest"

///Create a container from the image and keep it running (the container context for all future commands)
let bcContext = 
    BioContainer.initBcContextAsync client ubuntuImage
    |> Async.RunSynchronously

To run a command in the container, use either BioContainer.execAsync, which runs the command and only prints stdout/stderr to F# Interactive, or BioContainer.execReturnAsync, which returns stdout/stderr as a string that you can bind to a name.

//just run the command and print the results
BioContainer.execAsync bcContext ["echo";"hello world"]
|> Async.RunSynchronously

///bind the results of the command to a value
let returnValue =
    BioContainer.execReturnAsync bcContext ["echo";"hello world"]
    |> Async.RunSynchronously

Don't forget that the container is kept running, so dispose of it when you no longer need it to prevent it from eating up resources.

bcContext
|> BioContainer.disposeAsync
|> Async.RunSynchronously

Using actual biocontainers

To leverage the advantages of F#, namely its type safety, passing commands as strings is not enough. The tools in BioContainers come with extensive documentation of input commands and allowed/parsed types. We propose a type-safe model of the commands, with a little bit of overhead to ensure correct paths (going from Windows paths to Linux paths in the containers). We already provide API wrappers for some tools, such as BLAST, TMHMM, or IntaRNA. If you want to create your own, please refer to the design guide.
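
To make the idea concrete, here is a small, self-contained sketch of what such a model can look like. It is not the library's implementation: the types DemoDbType and DemoMakeDbParams and the functions toContainerPath and makeCmd are hypothetical and heavily simplified, and the host directory C:\data and mount point /data are made-up values. Only the makeblastdb flags -in, -out and -dbtype are taken from the BLAST+ command line.

```fsharp
/// Hypothetical, simplified parameter model (real wrappers have more cases).
type DemoDbType =
    | Protein
    | Nucleotide

type DemoMakeDbParams =
    | Input of string
    | Output of string
    | DbType of DemoDbType

/// Hypothetical path translation: maps a Windows path below the mounted host
/// directory to the matching path below the container mount point.
let toContainerPath (hostDir: string) (containerDir: string) (path: string) =
    let normalize (p: string) = p.Replace('\\', '/').TrimEnd('/')
    let host, full = normalize hostDir, normalize path
    if full.StartsWith(host + "/", System.StringComparison.OrdinalIgnoreCase) then
        normalize containerDir + full.Substring(host.Length)
    else
        failwithf "%s is not inside the mounted directory %s" path hostDir

/// Turns one typed parameter into command line tokens.
let makeCmd (convertPath: string -> string) (p: DemoMakeDbParams) =
    match p with
    | Input path        -> [ "-in"; convertPath path ]
    | Output path       -> [ "-out"; convertPath path ]
    | DbType Protein    -> [ "-dbtype"; "prot" ]
    | DbType Nucleotide -> [ "-dbtype"; "nucl" ]

// Example: build the argument list for a database creation call
let convert = toContainerPath @"C:\data" "/data"
let demoArgs =
    [ DbType Protein; Input @"C:\data\proteins.fasta" ]
    |> List.collect (makeCmd convert)
// demoArgs = ["-dbtype"; "prot"; "-in"; "/data/proteins.fasta"]
```

The benefit of the union type is that a misspelled flag or a wrongly typed argument becomes a compile error instead of a failed run inside the container.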

Here is a short example for BLAST (in-depth information about the type modelling of this tool can be found here):

First, you have to get the BLAST BioContainer image, either by pulling it via the Docker CLI or by building it from the Dockerfile. A way to do this from F# is in the making.

The protein FASTA file used here can be found here.

open BioFSharp.BioContainers
open Blast

///this time, we set the container up using a mount, making sure that it can access data from the file system we want to use.
let ImageBlast = Docker.DockerId.ImageId "blast:2.2.31--pl526h3066fca_3"

let blastContext = 
    BioContainer.initBcContextWithMountAsync 
        client 
        ImageBlast 
        "absolute/path/to/mounted/directory"
    |> Async.RunSynchronously

///parameters for search DB creation
let makeDBParams =
    [
        MakeBlastDbParams.DbType    Protein
        MakeBlastDbParams.Input     "absolute/path/to/your/subject"
        MakeBlastDbParams.Output    "absolute/path/to/your/outputfile"
    ]

///parameters for the blastp search
let runParams =
    [
        BlastP.BlastPParameters.CommonOptions [
            Query               "absolute/path/to/your/query"
            SearchDB            "absolute/path/to/your/subject"
            Output              "absolute/path/to/your/outputfile"
        ]
        BlastP.BlastPParameters.SpecificOptions [
            BlastP.BlastPParams.WordSize    11
        ]
    ]

Finally, we can run the search DB generation and the subsequent search in the container like this:

makeDBParams
|> Blast.runMakeBlastDB blastContext

runParams
|> Blast.BlastP.runBlastP blastContext

Don't forget that the container is kept running, so dispose of it when you no longer need it to prevent it from eating up resources.

blastContext
|> BioContainer.disposeAsync
|> Async.RunSynchronously
    