BioFSharp

BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.

BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive. It provides parsers for many biological file formats and a range of algorithms suited to bioinformatics workflows.

The core data model implements, in ascending hierarchical order:

Chemical elements, and formulas, which are collections of elements
Amino acids, nucleotides and modifications, which all implement the common IBioItem interface
BioCollections (BioArray, BioList, BioSeq) as representations of biological sequences

[Figure: BioFSharp data model]
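To make the lowest two levels of this hierarchy concrete, here is a minimal standalone F# sketch of the idea that a formula is a collection of elements and that its mass follows from theirs. The Element record, the Formula alias and the monoisoMass function are hypothetical names chosen for illustration, not BioFSharp's actual types; the element values are the standard monoisotopic masses of the most abundant isotopes. For the arginine residue formula C6H12N4O the sketch gives about 156.10111 Da, which agrees closely with the AminoAcids.monoisoMass value in the example further below.

```fsharp
// Standalone sketch: element -> formula -> mass.
// These types and functions are hypothetical and are NOT the BioFSharp API.

/// A chemical element with its monoisotopic mass (most abundant isotope), in Da.
type Element = { Symbol: string; MonoisoMass: float }

let hydrogen = { Symbol = "H"; MonoisoMass = 1.00782503207 }
let carbon   = { Symbol = "C"; MonoisoMass = 12.0 }
let nitrogen = { Symbol = "N"; MonoisoMass = 14.0030740048 }
let oxygen   = { Symbol = "O"; MonoisoMass = 15.99491461956 }

/// A formula is a collection of elements, each with a count.
type Formula = (Element * int) list

/// Monoisotopic mass of a formula: sum of (element mass * count).
let monoisoMass (formula: Formula) =
    formula |> List.sumBy (fun (element, count) -> element.MonoisoMass * float count)

// Arginine residue (free amino acid minus H2O): C6H12N4O
let argResidue : Formula = [ carbon, 6; hydrogen, 12; nitrogen, 4; oxygen, 1 ]
let water : Formula = [ hydrogen, 2; oxygen, 1 ]

printfn "Arg residue: %.5f" (monoisoMass argResidue)
printfn "H2O:         %.5f" (monoisoMass water)
```

Representing a formula as plain data keeps mass calculations generic: any object that can report its formula can also report its mass.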

Installation

For applications and libraries

You can find all available package versions on NuGet.

dotnet CLI

dotnet add package BioFSharp

paket CLI

paket add BioFSharp

Package Manager Console

Install-Package BioFSharp -Version 2.0.0

Or add the package reference directly to your .*proj file:

<PackageReference Include="BioFSharp" Version="2.0.0" />

For scripting and interactive notebooks

You can include the package via an inline package reference:

#r "nuget: BioFSharp"

Example

The following example shows how easy it is to start working with sequences:

Create a peptide sequence:

open BioFSharp

"PEPTIDE" |> BioArray.ofAminoAcidString

         1  PEPTIDE

Create a nucleotide sequence:

"ATGC" |> BioArray.ofNucleotideString

         1  ATGC

BioFSharp also provides a broad range of functions for working with individual amino acids and nucleotides, for example:

// Returns the corresponding nucleotide of the complementary strand
Nucleotides.G |> Nucleotides.complement

C
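Working with typed nucleotides instead of raw characters means that invalid letters are caught when a sequence is created, and that functions like complement can be written as exhaustive pattern matches. The standalone sketch below illustrates this idea with a hypothetical DNA-only Nuc type and hypothetical helpers (parseNuc, complement, ofNucString); BioFSharp's own Nucleotides.Nucleotide type is richer, and this is not its implementation.

```fsharp
// Standalone sketch; Nuc, parseNuc, complement and ofNucString are hypothetical,
// not BioFSharp's types or functions.
type Nuc = A | T | G | C

/// Parse a one-letter code; unknown characters are rejected instead of silently kept.
let parseNuc (c: char) =
    match System.Char.ToUpperInvariant c with
    | 'A' -> A
    | 'T' -> T
    | 'G' -> G
    | 'C' -> C
    | other -> failwithf "Unknown nucleotide code '%c'" other

/// Watson-Crick complement of a single DNA nucleotide.
let complement nuc =
    match nuc with
    | A -> T
    | T -> A
    | G -> C
    | C -> G

/// Convert a raw string into a typed nucleotide array.
let ofNucString (s: string) = s |> Seq.map parseNuc |> Array.ofSeq

let strand = ofNucString "ATGC"
let complementary = strand |> Array.map complement
printfn "%A" complementary   // [|T; A; C; G|]
```

Because complement matches every case of Nuc, the compiler warns if a new case is added to the type without being handled.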

// Returns the monoisotopic mass of Arginine (minus H2O)
AminoAcids.Arg |> AminoAcids.monoisoMass

156.10111102304
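The "(minus H2O)" refers to the residue mass: each peptide bond releases one water molecule, so a peptide's monoisotopic mass is the sum of its residue masses plus one H2O for the free N- and C-termini. The standalone sketch below illustrates that arithmetic for the PEPTIDE sequence from the first example. The residueMass table (standard monoisotopic residue masses, rounded to five decimals) and peptideMonoisoMass are hypothetical helpers, not BioFSharp functions.

```fsharp
// Standalone sketch; residueMass and peptideMonoisoMass are hypothetical, not the BioFSharp API.
// Monoisotopic residue masses in Da (amino acid minus H2O), rounded to 5 decimals.
let residueMass =
    dict [
        'P', 97.05276
        'E', 129.04259
        'T', 101.04768
        'I', 113.08406
        'D', 115.02694
        'R', 156.10111   // same residue mass as AminoAcids.Arg above
    ]

/// Monoisotopic mass of one water molecule (H2O), in Da.
let waterMass = 18.01056

/// Peptide mass = sum of residue masses + one H2O for the free termini.
let peptideMonoisoMass (sequence: string) =
    (sequence |> Seq.sumBy (fun aa -> residueMass.[aa])) + waterMass

printfn "PEPTIDE: %.4f Da" (peptideMonoisoMass "PEPTIDE")
```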

The file readers and writers in BioFSharp make it easy to retrieve information from, and write, biological file formats such as FastA:

open BioFSharp.IO

let filepathFastaA = (__SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA")

// Read the file into a sequence of FastaItems; BioArray.ofAminoAcidString converts each sequence
let fastaItems = FastA.fromFile BioArray.ofAminoAcidString filepathFastaA

This returns a sequence of FastaItems, so you can start working directly with the individual sequences, each represented as a BioArray of amino acids.

fastaItems |> Seq.item 0

{ Header = "sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01"
  Sequence =
   [|Met; Ser; Val; Thr; Lys; Lys; Pro; Asp; Leu; Ser; Asp; Pro; Val; Leu; Lys;
     Ala; Lys; Leu; Ala; Lys; Gly; Met; Gly; His; Asn; Thr; Tyr; Gly; Glu; Pro;
     Ala; Trp; Pro; Asn; Asp; Leu; Leu; Tyr; ...

Header

sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01

Sequence

         1  MSVTKKPDLS DPVLKAKLAK GMGHNTYGEP AWPNDLLYMF PVVILGTFAC VIGLSVLDPA
        61  AMGEPANPFA TPLEILPEWY FYPVFQILRV VPNKLLGVLL MAAVPAGLIT VPFIESINKF
       121  QNPYRRPIAT ILFLLGTLVA VWLGIGSTFP IDISLTLGLF *
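Under the hood, a FastA reader groups lines into records: a line starting with '>' is a header, and the lines that follow are concatenated into that record's sequence. The standalone sketch below illustrates this on in-memory lines with a hypothetical FastaRecord type and parseFasta function. It is not BioFSharp's FastA.fromFile, which reads from a file path and additionally converts each sequence with the supplied converter (here BioArray.ofAminoAcidString). The second record in the example input is made up for illustration.

```fsharp
// Standalone sketch of FASTA parsing. FastaRecord and parseFasta are hypothetical
// names; this is not the BioFSharp.IO.FastA implementation.
type FastaRecord = { Header: string; Sequence: string }

/// A line starting with '>' opens a new record; the following non-empty lines
/// are concatenated into that record's sequence.
let parseFasta (lines: seq<string>) : FastaRecord list =
    // Accumulator: (header, sequence chunks in reverse order), newest record first
    let step (acc: (string * string list) list) (rawLine: string) =
        let line = rawLine.Trim()
        if line.StartsWith(">") then
            (line.Substring(1), []) :: acc
        elif line = "" then
            acc
        else
            match acc with
            | (header, chunks) :: rest -> (header, line :: chunks) :: rest
            | [] -> failwith "Sequence data found before the first header line"
    lines
    |> Seq.fold step []
    |> List.rev
    |> List.map (fun (header, chunks) ->
        { Header = header; Sequence = chunks |> List.rev |> String.concat "" })

let exampleLines =
    [ ">sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01"
      "MSVTKKPDLSDPVLKAKLAK"
      "GMGHNTYGEP"
      ">example_2 hypothetical second record"
      "ACDEF" ]

for record in parseFasta exampleLines do
    printfn "%s -> %s" record.Header record.Sequence
```

Folding into a list that is reversed at the end keeps each step O(1); a streaming reader for large files would yield records lazily instead.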

For more detailed examples, continue exploring the BioFSharp documentation. In the near future, we will start providing a cookbook-like tutorial in the CSBlog.

Contributing and copyright

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you are adding a new public API, please also consider adding samples that can be turned into documentation.

The library is available under the OSI-approved MIT license. For more information see the License file in the GitHub repository.