BioFSharp


BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.

BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive. It provides a variety of parsers for many biological file formats and a variety of algorithms suited for bioinformatic workflows.

The core data model implements, in ascending hierarchical order:

  • Chemical elements, and formulas, which are collections of elements
  • Amino acids, nucleotides and modifications, which all implement the common IBioItem interface
  • BioCollections (BioItem, BioList, BioSeq) as representations of biological sequences
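To make the layering concrete, here is a minimal, self-contained sketch of the same idea. It is a toy model only: every type and function name in it (Element, Formula, IItem, ToyAminoAcid, formulaMass, sequenceMass) is hypothetical and is not part of BioFSharp's API. The element masses are standard monoisotopic masses rounded to a few decimals.

```fsharp
// Hypothetical toy model of the three layers; not BioFSharp's actual types.

// Layer 1: a chemical element with its monoisotopic mass in Da.
type Element = { Symbol : string; MonoisotopicMass : float }

let carbon   = { Symbol = "C"; MonoisotopicMass = 12.0 }
let hydrogen = { Symbol = "H"; MonoisotopicMass = 1.00782503 }
let nitrogen = { Symbol = "N"; MonoisotopicMass = 14.0030740 }
let oxygen   = { Symbol = "O"; MonoisotopicMass = 15.9949146 }

// A formula is a collection of elements together with their counts.
type Formula = (Element * int) list

let formulaMass (formula : Formula) =
    formula
    |> List.sumBy (fun (element, count) -> element.MonoisotopicMass * float count)

// Layer 2: biological items share one interface and carry a formula.
type IItem =
    abstract Symbol : char
    abstract Formula : Formula

// Two amino acid residues (formulas given without the terminal H2O).
type ToyAminoAcid =
    | Gly
    | Ala
    interface IItem with
        member this.Symbol =
            match this with
            | Gly -> 'G'
            | Ala -> 'A'
        member this.Formula =
            match this with
            | Gly -> [ carbon, 2; hydrogen, 3; nitrogen, 1; oxygen, 1 ]
            | Ala -> [ carbon, 3; hydrogen, 5; nitrogen, 1; oxygen, 1 ]

// Layer 3: a biological sequence is a collection of such items.
let sequenceMass (items : seq<#IItem>) =
    items |> Seq.sumBy (fun item -> formulaMass (item :> IItem).Formula)

let dipeptideResidueMass = sequenceMass [ Gly; Ala ]
```

The point of the layering is that anything computed from a formula, such as a mass, becomes available for every item and every collection of items without extra code per item type.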

Data model



Installation

From Nuget.org:

You can get the stable versions of all BioFSharp packages from NuGet:

Install-Package BioFSharp
paket add BioFSharp

All associated packages can be found here

Prerelease packages from the nuget branch:

Unstable/Experimental packages only.

If you are using paket, add the following line to your paket.dependencies file:

git https://github.com/CSBiology/BioFSharp.git nuget Packages: /

You can then access the individual packages:

nuget BioFSharp
nuget BioFSharp.BioContainers
nuget BioFSharp.IO
nuget BioFSharp.Stats
nuget BioFSharp.ML
nuget BioFSharp.BioDB
nuget BioFSharp.Vis

Build the binaries yourself:

Windows:

  • Install the .NET Core SDK 3.0 or later
  • Go to the project folder
  • Run .\build.cmd

Linux (using Mono):

  • BioDB is excluded from this build.
  • Install the .NET Core SDK
  • Go to the project folder
  • Run ./build.sh -t monoBuildChainLocal

Linux (.NET Core only):

  • This builds only the projects targeting netstandard2.0 (Core, BioContainers, IO, Stats, ML).
  • Install the .NET Core SDK
  • Go to the project folder
  • Run ./build.sh -t dotnetBuildChainLocal


Example

The following example shows how easy it is to start working with sequences:

open BioFSharp

// Create a peptide sequence
let peptideSequence = "PEPTIDE" |> BioSeq.ofAminoAcidString
seq [Pro; Glu; Pro; Thr; ...]

// Create a nucleotide sequence
let nucleotideSequence = "ATGC" |> BioSeq.ofNucleotideString
seq [A; T; G; C]

BioFSharp comes equipped with a broad range of features and functions to map amino acids and nucleotides.

// Returns the corresponding nucleotide of the complementary strand
let antiG = Nucleotides.G |> Nucleotides.complement
C

// Returns the monoisotopic mass of arginine (minus H2O)
let arginineMass = AminoAcids.Arg |> AminoAcids.monoisoMass
156.101111
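Because a BioSeq is an ordinary F# sequence, these per-item functions can be applied to whole sequences with the standard Seq module. The sketch below uses only the BioFSharp functions shown above together with Seq.map, Seq.toList and Seq.sumBy from the F# core library. The #r line is an assumption about how the package is referenced from an F# script; adjust it to your setup.

```fsharp
// Assumption: the BioFSharp package is referenced from NuGet in an F# script.
#r "nuget: BioFSharp"
open BioFSharp

// Complement every base of a short strand. The order is kept,
// so this is the complement, not the reverse complement.
let complementStrand =
    "ATGC"
    |> BioSeq.ofNucleotideString
    |> Seq.map Nucleotides.complement
    |> Seq.toList

// Sum the monoisotopic masses of all residues in a peptide.
// monoisoMass is the mass minus H2O (see the arginine example above),
// so this sum does not include the water of the intact peptide.
let residueMassSum =
    "PEPTIDE"
    |> BioSeq.ofAminoAcidString
    |> Seq.sumBy AminoAcids.monoisoMass
```

Both values can be inspected directly in F# Interactive.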

The file readers and writers in BioFSharp make it easy to retrieve information from, and write to, biology-associated file formats such as FastA:

open BioFSharp.IO

let filepathFastaA = (__SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA")
// Reads the file into a sequence of FastaItems.
let fastaItems = 
    FastA.fromFile BioArray.ofAminoAcidString filepathFastaA

This will return a sequence of FastaItems, where you can directly start working with the individual sequences represented as a BioArray of amino acids.

let firstItem = fastaItems |> Seq.item 0
{ Header = "sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01"
  Sequence =
            [|Met; Ser; Val; Thr; Lys; Lys; Pro; Asp; Leu; Ser; Asp; Pro; Val;
              Leu; Lys; Ala; Lys; Leu; Ala; Lys; Gly; Met; Gly; His; Asn; Thr;
              Tyr; Gly; Glu; Pro; Ala; Trp; Pro; Asn; Asp; Leu; Leu; Tyr; Met;
              Phe; Pro; Val; Val; Ile; Leu; Gly; Thr; Phe; Ala; Cys; Val; Ile;
              Gly; Leu; Ser; Val; Leu; Asp; Pro; Ala; Ala; Met; Gly; Glu; Pro;
              Ala; Asn; Pro; Phe; Ala; Thr; Pro; Leu; Glu; Ile; Leu; Pro; Glu;
              Trp; Tyr; Phe; Tyr; Pro; Val; Phe; Gln; Ile; Leu; Arg; Val; Val;
              Pro; Asn; Lys; Leu; Leu; Gly; Val; Leu; Leu; ...|] }
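As a small follow-up, the sketch below walks over all entries and prints each header with the length of its sequence. It relies only on FastA.fromFile, BioArray.ofAminoAcidString and the Header and Sequence fields shown above. The #r lines are an assumption about how the packages are referenced from a script, and the file path is the one from the example, so the file check simply skips the read when that file is not present.

```fsharp
// Assumption: both packages are referenced from NuGet in an F# script.
#r "nuget: BioFSharp"
#r "nuget: BioFSharp.IO"
open BioFSharp
open BioFSharp.IO

// Same example file as above; it has to exist next to the script.
let filepathFastaA = __SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA"

if System.IO.File.Exists filepathFastaA then
    FastA.fromFile BioArray.ofAminoAcidString filepathFastaA
    // Print one line per entry: header, then number of residues.
    |> Seq.iter (fun item ->
        printfn "%s\t%d" item.Header (Seq.length item.Sequence))
else
    printfn "File not found: %s" filepathFastaA
```

Since the reader returns a lazy sequence, the file is only read when the sequence is enumerated, here by Seq.iter.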

For more detailed examples, continue to explore the BioFSharp documentation. In the near future we will start to provide a cookbook-like tutorial on the CSBlog.

Contributing and copyright

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.

The library is available under the OSI-approved MIT license. For more information see the License file in the GitHub repository.
