BioFSharp
BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.
BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive. It provides parsers for many biological file formats and a variety of algorithms suited to bioinformatic workflows.
The core data model implements, in ascending hierarchical order:
- Chemical elements, and formulas, which are collections of elements
- Amino acids, nucleotides and modifications, which all implement the common IBioItem interface
- BioCollections (BioArray, BioList, BioSeq) as representations of biological sequences
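The layering above can be pictured with a handful of F# types. The following is a minimal sketch, assuming that a formula is a plain map from elements to atom counts and that a sequence is a plain array of items; `Element`, `Formula`, `AminoAcid`, `addFormula`, `sequenceFormula` and the `Symbol`/`Formula` members of `IBioItem` are hypothetical simplifications, not BioFSharp's actual definitions.

```fsharp
// Hypothetical, simplified model of the layers above; NOT BioFSharp's actual types.

// Layer 1: chemical elements, and formulas as element -> atom count maps
type Element =
    | Carbon
    | Hydrogen
    | Nitrogen
    | Oxygen

type Formula = Map<Element, int>

/// Adds two formulas by summing atom counts element by element.
let addFormula (a: Formula) (b: Formula) : Formula =
    b
    |> Map.fold (fun acc element count ->
        let current = acc |> Map.tryFind element |> Option.defaultValue 0
        acc |> Map.add element (current + count)) a

// Layer 2: biological items share a common interface (members are assumptions)
type IBioItem =
    abstract Symbol : char
    abstract Formula : Formula

/// Two example amino acids with their residue formulas (free amino acid minus H2O)
type AminoAcid =
    | Gly
    | Ala
    interface IBioItem with
        member this.Symbol =
            match this with
            | Gly -> 'G'
            | Ala -> 'A'
        member this.Formula =
            match this with
            | Gly -> Map [ (Carbon, 2); (Hydrogen, 3); (Nitrogen, 1); (Oxygen, 1) ]
            | Ala -> Map [ (Carbon, 3); (Hydrogen, 5); (Nitrogen, 1); (Oxygen, 1) ]

// Layer 3: a sequence is a collection of IBioItems, here a plain array
let sequenceFormula<'T when 'T :> IBioItem> (items: 'T[]) : Formula =
    items
    |> Array.fold (fun acc item ->
        let itemFormula = (item :> IBioItem).Formula
        addFormula acc itemFormula) Map.empty

let glyAla = [| Gly; Ala |]
let total = sequenceFormula glyAla // C5H8N2O2
```

Because every item exposes its formula through the shared interface, a function such as `sequenceFormula` works for any item type without knowing whether it holds amino acids or nucleotides.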
Installation
For applications and libraries
You can find all available package versions on NuGet.
- dotnet CLI
  dotnet add package BioFSharp
- paket CLI
  paket add BioFSharp
- package manager
  Install-Package BioFSharp -Version 2.0.0
Or add the package reference directly to your .*proj file:
<PackageReference Include="BioFSharp" Version="2.0.0" />
For scripting and interactive notebooks
You can include the package via an inline package reference:
#r "nuget: BioFSharp"
Example
The following example shows how easy it is to start working with sequences:
Create a peptide sequence:
open BioFSharp
"PEPTIDE" |> BioArray.ofAminoAcidString
1 PEPTIDE
Create a nucleotide sequence:
"ATGC" |> BioArray.ofNucleotideString
1 ATGC
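Converting a one-letter-code string amounts to a character-by-character lookup into a union type. A minimal sketch, assuming a four-letter DNA alphabet and a `Result` error on unknown characters; `tryParseNucleotide` and `ofNucleotideStringChecked` are hypothetical names, and how `BioArray.ofNucleotideString` itself handles invalid characters is not covered here.

```fsharp
// Hypothetical sketch; NOT BioFSharp's BioArray.ofNucleotideString.
type Nucleotide =
    | A
    | T
    | G
    | C

/// Maps a one-letter code (case-insensitive) to a nucleotide, if it is known.
let tryParseNucleotide (c: char) : Nucleotide option =
    match System.Char.ToUpperInvariant(c) with
    | 'A' -> Some A
    | 'T' -> Some T
    | 'G' -> Some G
    | 'C' -> Some C
    | _ -> None

/// Converts a raw string into a typed array, or reports the first unknown character.
let ofNucleotideStringChecked (s: string) : Result<Nucleotide[], string> =
    let parsed = s.ToCharArray() |> Array.map (fun c -> (c, tryParseNucleotide c))
    match parsed |> Array.tryFind (fun (_, n) -> Option.isNone n) with
    | Some (bad, _) -> Error (sprintf "unknown nucleotide code '%c'" bad)
    | None -> Ok (parsed |> Array.choose snd)

let atgc = ofNucleotideStringChecked "ATGC" // Ok [|A; T; G; C|]
```

Returning a `Result` makes the invalid-input case explicit in the type, so callers have to decide what to do with it.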
BioFSharp comes equipped with a broad range of functions for working with amino acids and nucleotides, for example:
// Returns the corresponding nucleotide of the complementary strand
Nucleotides.G |> Nucleotides.complement
C
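The complement follows directly from Watson-Crick base pairing (A with T, G with C). A minimal sketch of that rule on a hypothetical four-case `Nucleotide` type, not BioFSharp's `Nucleotides` module; the `reverseComplement` helper is an illustrative addition showing how the single-base rule extends to a whole strand.

```fsharp
// Hypothetical sketch; NOT BioFSharp's Nucleotides module.
type Nucleotide =
    | A
    | T
    | G
    | C

/// Returns the base that pairs with n on the complementary DNA strand.
let complement (n: Nucleotide) : Nucleotide =
    match n with
    | A -> T
    | T -> A
    | G -> C
    | C -> G

/// Reverse complement of a strand: complement each base, then reverse the order.
let reverseComplement (strand: Nucleotide[]) : Nucleotide[] =
    strand |> Array.map complement |> Array.rev

let pairedWithG = complement G // C
```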
// Returns the monoisotopic mass of Arginine (minus H2O)
AminoAcids.Arg |> AminoAcids.monoisoMass
156.10111102304
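The residue mass can be recomputed from the chemical formula, which is the idea behind building amino acids on top of elements. The sketch below assumes the arginine residue formula C6H12N4O (free arginine, C6H14N4O2, minus H2O) and standard monoisotopic element masses; `monoisotopicMass`, `formulaMass` and `argResidue` are hypothetical names, not part of BioFSharp's `AminoAcids` module. The result closely matches the value shown above.

```fsharp
// Hypothetical sketch; NOT BioFSharp's AminoAcids.monoisoMass.

// Monoisotopic masses (Da) of the most abundant isotope of each element
let monoisotopicMass =
    Map [ ("C", 12.0)
          ("H", 1.00782503207)
          ("N", 14.0030740048)
          ("O", 15.99491461956) ]

/// Mass of a formula given as (element symbol, atom count) pairs.
let formulaMass (formula: (string * int) list) : float =
    formula |> List.sumBy (fun (element, count) -> float count * monoisotopicMass.[element])

// Arginine residue: free arginine C6H14N4O2 minus H2O = C6H12N4O
let argResidue = [ ("C", 6); ("H", 12); ("N", 4); ("O", 1) ]
let argMass = formulaMass argResidue // approx. 156.1011
```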
The file readers and writers in BioFSharp make it easy to retrieve information from, and write, biological file formats such as FastA:
open BioFSharp.IO
let filepathFastaA = (__SOURCE_DIRECTORY__ + "/data/Chlamy_Cp.fastA")
// Reads the file into a sequence of FastaItems.
let fastaItems = FastA.fromFile BioArray.ofAminoAcidString filepathFastaA
This returns a sequence of FastaItems, in which you can directly start working with the individual sequences, each represented as a BioArray of amino acids.
fastaItems |> Seq.item 0
Header sp|P19528| cytochrome b6/f complex subunit 4 GN=petD PE=petD.p01
Sequence
1 MSVTKKPDLS DPVLKAKLAK GMGHNTYGEP AWPNDLLYMF PVVILGTFAC VIGLSVLDPA
61 AMGEPANPFA TPLEILPEWY FYPVFQILRV VPNKLLGVLL MAAVPAGLIT VPFIESINKF
121 QNPYRRPIAT ILFLLGTLVA VWLGIGSTFP IDISLTLGLF *
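FastA is a line-based format: a line starting with `>` holds the header, and the lines that follow hold the sequence. A minimal sketch of that grouping logic, assuming in-memory text lines rather than a file path and mirroring the converter parameter of `FastA.fromFile` with a `seq<char> -> 'T` function; `FastaRecord` and `parseFasta` are hypothetical names, and this is not how BioFSharp implements its reader.

```fsharp
// Hypothetical sketch of FastA parsing; NOT BioFSharp's FastA module.

type FastaRecord<'T> = { Header: string; Sequence: 'T }

/// Groups lines into records: a line starting with '>' opens a new record,
/// and the following non-empty lines form its sequence, which is then converted.
let parseFasta (converter: seq<char> -> 'T) (lines: seq<string>) : FastaRecord<'T> list =
    // Accumulates (header, sequence lines) pairs, newest record first
    let folder (acc: (string * string list) list) (line: string) =
        let line = line.Trim()
        if line = "" then acc
        elif line.StartsWith(">") then (line.Substring(1), []) :: acc
        else
            match acc with
            | (header, chunks) :: rest -> (header, line :: chunks) :: rest
            | [] -> acc // sequence data before the first header is ignored
    lines
    |> Seq.fold folder []
    |> List.rev
    |> List.map (fun (header, chunks) ->
        let raw = chunks |> List.rev |> String.concat ""
        { Header = header; Sequence = converter (raw :> seq<char>) })

let example =
    [ ">seq1 example header"
      "MSVTKKPDLS"
      "DPVLKAKLAK"
      ">seq2 example header"
      "MAAVPAGLIT" ]

// Array.ofSeq plays the role of the converter: seq<char> -> char[]
let items = parseFasta Array.ofSeq example
```

Collecting the sequence lines per record and concatenating them once avoids repeated string concatenation, and the converter keeps the parser independent of the sequence alphabet. Lines read with `System.IO.File.ReadLines` could be passed in place of the in-memory list.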
For more detailed examples, continue exploring the BioFSharp documentation. In the near future, we plan to provide a cookbook-like tutorial in the CSBlog.
Contributing and copyright
The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation.
The library is available under the OSI-approved MIT license. For more information see the License file in the GitHub repository.