BioFSharp



BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.

BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive. It provides parsers for many biological file formats and a variety of algorithms suited to bioinformatics workflows.

The core data model implements, in ascending hierarchical order:

  • Chemical elements, and formulas, which are collections of elements
  • Amino acids, nucleotides and modifications, which all implement the common IBioItem interface
  • BioCollections (BioArray, BioList, BioSeq) as representations of biological sequences
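To make this hierarchy more concrete, here is a minimal, self-contained F# sketch of the same idea. Every type and value in it (Element, Formula, IBioItem, AminoAcid, monoisoMass, peptide) is a simplified stand-in written for illustration and is not BioFSharp's actual definition: elements carry a monoisotopic mass, a formula maps elements to counts, amino acid residues implement a shared interface, and a sequence is simply a collection of such items.

```fsharp
// Hypothetical, simplified model of the hierarchy; NOT BioFSharp's real types.

/// A chemical element with the monoisotopic mass of its most abundant isotope (Da)
type Element = { Symbol: string; MonoisotopicMass: float }

/// A formula: a collection of elements with their counts
type Formula = Map<Element, int>

/// Common interface shared by biological items (amino acids, nucleotides, ...)
type IBioItem =
    abstract Symbol: char
    abstract Formula: Formula

let carbon = { Symbol = "C"; MonoisotopicMass = 12.0 }
let hydrogen = { Symbol = "H"; MonoisotopicMass = 1.00782503207 }
let nitrogen = { Symbol = "N"; MonoisotopicMass = 14.0030740048 }
let oxygen = { Symbol = "O"; MonoisotopicMass = 15.99491461956 }

/// Monoisotopic mass of a formula: element masses weighted by their counts
let monoisoMass (formula: Formula) =
    formula
    |> Map.fold (fun acc element count -> acc + element.MonoisotopicMass * float count) 0.0

/// Two amino acid residues (free amino acid minus H2O) implementing the interface
type AminoAcid =
    | Gly
    | Ala
    interface IBioItem with
        member this.Symbol =
            match this with
            | Gly -> 'G'
            | Ala -> 'A'
        member this.Formula =
            match this with
            | Gly -> Map.ofList [ carbon, 2; hydrogen, 3; nitrogen, 1; oxygen, 1 ]
            | Ala -> Map.ofList [ carbon, 3; hydrogen, 5; nitrogen, 1; oxygen, 1 ]

/// A biological sequence is then simply a collection of items
let peptide = [ Gly; Ala; Gly ]

/// Sum of the residue masses of the peptide
let peptideResidueMass =
    peptide |> List.sumBy (fun aa -> monoisoMass (aa :> IBioItem).Formula)
```

Keeping the formula as the single source of truth means any mass can be derived from it, which is the design idea the list above describes.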

[Figure: Data model]

Installation

BioFSharp is currently on its way to the 1.0.0 release. Once this process is complete, we will provide a NuGet package on nuget.org. Until then, you can get BioFSharp running on your machine either by cloning the repository and building the binaries yourself, or by downloading the prerelease packages from our nuget branch.

Using prerelease packages from the nuget branch:

If you are using Paket, add the following line to your paket.dependencies file:

git https://github.com/CSBiology/BioFSharp.git nuget Packages: /

You can then reference the individual packages:

nuget BioFSharp
nuget BioFSharp.IO
nuget BioFSharp.Stats
nuget BioFSharp.BioDB
nuget BioFSharp.Vis

To build the binaries yourself:

Windows:

  • Install the .NET Core SDK
  • Install FAKE as a dotnet tool, either globally with dotnet tool install fake-cli -g or into a local folder with dotnet tool install fake-cli --tool-path yourtoolpath
  • Go to the project folder
  • Run the console command fake build

Linux (Ubuntu, using Mono):

  • Install the .NET Core SDK
  • Go to the project folder
  • Run the console command dotnet fake build --target Linux


Example

The following example shows how easy it is to start working with sequences:

open BioFSharp

// Create a peptide sequence
let peptideSequence = "PEPTIDE" |> BioSeq.ofAminoAcidString
// Output: seq [Pro; Glu; Pro; Thr; ...]

// Create a nucleotide sequence
let nucleotideSequence = "ATGC" |> BioSeq.ofNucleotideString
// Output: seq [A; T; G; C]
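Conceptually, a conversion like BioSeq.ofAminoAcidString maps each one-letter code to a strongly typed value. The self-contained sketch below illustrates that idea with a hypothetical AminoAcid union covering only the residues in "PEPTIDE" and hypothetical helpers ofOneLetterCode and ofAminoAcidString; the real BioFSharp types cover all amino acids, and how the library treats unrecognized characters is not shown here.

```fsharp
// Hypothetical stand-in: only the amino acids needed for "PEPTIDE"
type AminoAcid =
    | Asp
    | Glu
    | Ile
    | Pro
    | Thr

/// Map a one-letter code (case-insensitive) to its amino acid
let ofOneLetterCode (c: char) =
    match System.Char.ToUpperInvariant c with
    | 'D' -> Asp
    | 'E' -> Glu
    | 'I' -> Ile
    | 'P' -> Pro
    | 'T' -> Thr
    | other -> failwithf "Unsupported one-letter code: %c" other

/// Lazily convert a string into a typed sequence (similar in spirit to BioSeq.ofAminoAcidString)
let ofAminoAcidString (s: string) : seq<AminoAcid> =
    s |> Seq.map ofOneLetterCode

// Materialize the lazy sequence into a list
let peptide = ofAminoAcidString "PEPTIDE" |> List.ofSeq
```

Because the result is a lazy seq, an unsupported character in this sketch only raises an error once the sequence is enumerated, as List.ofSeq does above.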

BioFSharp comes with a broad range of functions for working with amino acids and nucleotides, such as mapping them to their counterparts or looking up their properties.

// Returns the corresponding nucleotide of the antiparallel strand
let antiG = Nucleotides.G |> Nucleotides.antiparallel
// Output: C
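The base pairing behind this function can be sketched without the library. The self-contained example below uses a hypothetical, DNA-only Nucleotide union: complement pairs each base with its Watson-Crick partner, and the hypothetical reverseComplement helper derives a whole antiparallel strand by reversing and complementing, because the partner strand runs in the opposite 5'→3' direction. Neither function is BioFSharp's Nucleotides module.

```fsharp
// Hypothetical DNA-only nucleotide type (no U, gaps or ambiguity codes)
type Nucleotide =
    | A
    | T
    | G
    | C

/// Watson-Crick partner of a single base
let complement nuc =
    match nuc with
    | A -> T
    | T -> A
    | G -> C
    | C -> G

/// Antiparallel partner strand, read 5'->3': reverse the strand, then complement each base
let reverseComplement (strand: Nucleotide list) =
    strand |> List.rev |> List.map complement

let antiG = complement G
let partnerStrand = reverseComplement [ A; T; G; C ]
```

For a single base, reversing has no effect, so the antiparallel partner of G is simply its complement C.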

// Returns the monoisotopic mass of arginine (minus H2O)
let arginineMass = AminoAcids.Arg |> AminoAcids.monoisoMass
// Output: 156.101111
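The value above is a residue mass: the mass of free arginine (C6H14N4O2) minus one water molecule, which is released when an amino acid is joined into a peptide chain. The following self-contained sketch reproduces that arithmetic from elemental monoisotopic masses; the mass table and the formulaMass helper are written for illustration and are not part of BioFSharp.

```fsharp
// Monoisotopic masses (Da) of the most abundant isotope of each element
let monoisotopicMass =
    dict [ 'C', 12.0; 'H', 1.00782503207; 'N', 14.0030740048; 'O', 15.99491461956 ]

/// Mass of an elemental composition given as (element, count) pairs (hypothetical helper)
let formulaMass (composition: (char * int) list) =
    composition
    |> List.sumBy (fun (element, count) -> monoisotopicMass.[element] * float count)

let freeArginine = [ 'C', 6; 'H', 14; 'N', 4; 'O', 2 ] // C6H14N4O2
let water = [ 'H', 2; 'O', 1 ]                         // H2O

// Residue mass = free amino acid minus H2O
let arginineResidueMass = formulaMass freeArginine - formulaMass water

printfn "%.6f" arginineResidueMass // prints 156.101111
```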

BioFSharp's file readers make it easy to retrieve information from, and write, biological file formats such as FastA:

open BioFSharp
open BioFSharp.IO

let filepathFastaA = (__SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA")
// Read the file into a sequence of FastaItems
let fastaItems = 
    FastA.fromFile BioArray.ofAminoAcidString filepathFastaA

This returns a sequence of FastaItems whose individual sequences are BioArrays of amino acids, so you can start working with them directly.

let firstItem = fastaItems |> Seq.item 0
// Output:
// {Header = "sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01";
//  Sequence =
//   [|Met; Ser; Val; Thr; Lys; Lys; Pro; Asp; Leu; Ser; Asp; Pro; Val; Leu; Lys;
//     Ala; Lys; Leu; Ala; Lys; Gly; Met; Gly; His; Asn; Thr; Tyr; Gly; Glu; Pro;
//     Ala; Trp; Pro; Asn; Asp; Leu; Leu; Tyr; Met; Phe; Pro; Val; Val; Ile; Leu;
//     Gly; Thr; Phe; Ala; Cys; Val; Ile; Gly; Leu; Ser; Val; Leu; Asp; Pro; Ala;
//     Ala; Met; Gly; Glu; Pro; Ala; Asn; Pro; Phe; Ala; Thr; Pro; Leu; Glu; Ile;
//     Leu; Pro; Glu; Trp; Tyr; Phe; Tyr; Pro; Val; Phe; Gln; Ile; Leu; Arg; Val;
//     Val; Pro; Asn; Lys; Leu; Leu; Gly; Val; Leu; Leu; ...|];}
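The FastA format itself is simple: a line starting with '>' holds the header, and the following lines hold the sequence. The self-contained sketch below parses and writes that format on in-memory lines. The FastaItem record here only mirrors the Header/Sequence shape shown above, parseFasta and toFastaLines are hypothetical helpers rather than BioFSharp.IO functions, and sequences are kept as plain strings instead of being passed through a converter such as BioArray.ofAminoAcidString.

```fsharp
/// A FASTA record with the sequence kept as a plain string (hypothetical type)
type FastaItem = { Header: string; Sequence: string }

/// Build a finished item from a header and its sequence lines (collected in reverse order)
let toItem (header: string, reversedLines: string list) =
    { Header = header; Sequence = String.concat "" (List.rev reversedLines) }

/// Parse FASTA lines: '>' opens a new record, other non-empty lines extend its sequence
let parseFasta (lines: seq<string>) : FastaItem list =
    let step (finished: FastaItem list, current: (string * string list) option) (raw: string) =
        let line = raw.Trim()
        if line.StartsWith(">", System.StringComparison.Ordinal) then
            // A new header closes the record being built (if any) and opens a new one
            let finished =
                match current with
                | Some record -> toItem record :: finished
                | None -> finished
            finished, Some (line.Substring(1), [])
        elif line = "" then
            finished, current
        else
            match current with
            | Some (header, parts) -> finished, Some (header, line :: parts)
            | None -> finished, current // sequence data before the first header is ignored
    let finished, current = Seq.fold step ([], None) lines
    let all =
        match current with
        | Some record -> toItem record :: finished
        | None -> finished
    List.rev all

/// Write items back to FASTA lines, wrapping each sequence at `width` characters
let toFastaLines (width: int) (items: FastaItem list) =
    if width <= 0 then invalidArg "width" "width must be positive"
    [ for item in items do
          yield ">" + item.Header
          let s = item.Sequence
          for i in 0 .. width .. s.Length - 1 do
              yield s.Substring(i, min width (s.Length - i)) ]

// Toy input (not real data): the first record is split over two sequence lines
let toy = parseFasta [ ">seq1 toy example"; "PEPT"; "IDE"; ""; ">seq2"; "ATGC" ]
toFastaLines 4 toy |> List.iter (printfn "%s")
```

Folding over the lines keeps the parser free of mutable state, and writing with a fixed line width makes the round trip parseFasta (toFastaLines w items) return the same items.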

For more detailed examples, continue exploring the BioFSharp documentation. In the near future, we will also start providing a cookbook-like tutorial in the CSBlog.

Contributing and copyright

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.

The library is available under a Public Domain license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information, see the License file in the GitHub repository.
