BioCollections

Summary: This example shows how to use collections of biological items in BioFSharp

Analogous to the build-in collections BioFSharp provides BioSeq, BioList and BioArray for individual collection specific optimized operations. The easiest way to create them are the ofBioItemString -functions

open BioFSharp

let s1 = "PEPTIDE" |> BioSeq.ofAminoAcidString 
let s2 = "PEPTIDE" |> BioList.ofAminoAcidString 
let s3 = "TAGCAT"  |> BioArray.ofNucleotideString 

//Peptide represented as a Bioseq
"PEPTIDE" |> BioSeq.ofAminoAcidString

Warning: Output, it-value and value references require --eval

//Peptide represented as a BioList
"PEPTIDE"|> BioList.ofAminoAcidString

Warning: Output, it-value and value references require --eval

//Nucleotide sequence represented as a BioArray
"TAGCAT" |> BioArray.ofNucleotideString

Warning: Output, it-value and value references require --eval

Nucleotides

Nucleotides1

Figure 1: Selection of covered nucleotide operations (A) Biological principle. (B) Workflow with BioSeq. (C) Other covered functionalities.

Let's imagine you have a given gene sequence and want to find out what the according protein might look like.

let myGene = BioArray.ofNucleotideString "ATGGCTAGATCGATCGATCGGCTAACGTAA"

Warning: Output, it-value and value references require --eval

Yikes! Unfortunately we got the 5'-3' coding strand. For proper transcription we should get the complementary strand first:

let myProperGene = BioArray.complement myGene

Warning: Output, it-value and value references require --eval

Now let's transcribe and translate it:

let myTranslatedGene = 
    myProperGene
    |> BioArray.transcribeTemplateStrand
    |> BioArray.translate 0

Warning: Output, it-value and value references require --eval

Of course, if your input sequence originates from the coding strand, you can directly transcribe it to mRNA since the only difference between the coding strand and the mRNA is the replacement of 'T' by 'U' (Figure 1B)

let myTranslatedGeneFromCodingStrand = 
    myGene
    |> BioArray.transcribeCodingStrand
    |> BioArray.translate 0

Warning: Output, it-value and value references require --eval

Other Nucleotide conversion operations are also covered:

let mySmallGene = BioSeq.ofNucleotideString  "ATGTTCCGAT"

Warning: Output, it-value and value references require --eval

let smallGeneRev = BioSeq.reverse mySmallGene

Warning: Output, it-value and value references require --eval

let smallGeneComp    = BioSeq.complement mySmallGene

Warning: Output, it-value and value references require --eval

let smallGeneRevComp = BioSeq.reverseComplement mySmallGene

Warning: Output, it-value and value references require --eval

AminoAcids

Basics

Some functions which might be needed regularly are defined to work with nucleotides and amino acids:

let myPeptide = "PEPTIDE" |> BioSeq.ofAminoAcidString

Warning: Output, it-value and value references require --eval

let myPeptideFormula = BioSeq.toFormula myPeptide |> Formula.toString

Warning: Output, it-value and value references require --eval

let myPeptideMass = BioSeq.toAverageMass myPeptide

Warning: Output, it-value and value references require --eval

Digestion

BioFSharp also comes equipped with a set of tools aimed at cutting apart amino acid sequences. To demonstrate the usage, we'll throw some trypsin at the small RuBisCO subunit of Arabidopos thaliana:
In the first step, we define our input sequence and the protease we want to use.

let RBCS = 
    """MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWP
    PIGKKKFETLSYLPDLTDSELAKEVDYLIRNKWIPCVEFELEHGFVYREHGNSPGYYDGR
    YWTMWKLPLFGCTDSAQVLKEVEECKKEYPNAFIRIIGFDNTRQVQCISFIAYKPPSFT""" 
    |> BioArray.ofAminoAcidString

Warning: Output, it-value and value references require --eval

let trypsin = Digestion.Table.getProteaseBy "Trypsin"

With these two things done, digesting the protein is a piece of cake. For doing this, just use the digest function.

let digestedRBCS = Digestion.BioArray.digest trypsin 0 RBCS 

digestedRBCS
|> Seq.head

Warning: Output, it-value and value references require --eval

(*
In reality, proteases don't always completely cut the protein down. Instead, some sites stay intact and should be considered for in silico analysis. 
This can easily be done with the `concernMissCleavages` function. It takes the minimum and maximum amount of misscleavages you want to have and also the digested protein. As a result you get all possible combinations arising from this information.
*)

let digestedRBCS' = Digestion.BioArray.concernMissCleavages 0 2 digestedRBCS

digestedRBCS
|> Seq.item 1

Warning: Output, it-value and value references require --eval

namespace BioFSharp

val s1: BioSeq.BioSeq<AminoAcids.AminoAcid>

Multiple items
module BioSeq from BioFSharp.BioCollectionsExtensions

--------------------
module BioSeq from BioFSharp
<summary> This module contains the BioSeq type and its according functions. The BioSeq type is a sequence of objects using the IBioItem interface </summary>

val ofAminoAcidString: s: #(char seq) -> BioSeq.BioSeq<AminoAcids.AminoAcid>
<summary> Generates AminoAcid sequence of one-letter-code raw string </summary>

val s2: BioList.BioList<AminoAcids.AminoAcid>

Multiple items
module BioList from BioFSharp.BioCollectionsExtensions

--------------------
module BioList from BioFSharp
<summary> This module contains the BioList type and its according functions. The BioList type is a List of objects using the IBioItem interface </summary>

val ofAminoAcidString: s: #(char seq) -> BioList.BioList<AminoAcids.AminoAcid>
<summary> Generates amino acid sequence of one-letter-code raw string </summary>

val s3: BioArray.BioArray<Nucleotides.Nucleotide>

Multiple items
module BioArray from BioFSharp.BioCollectionsExtensions

--------------------
module BioArray from BioFSharp
<summary> This module contains the BioArray type and its according functions. The BioArray type is an array of objects using the IBioItem interface </summary>

val ofNucleotideString: s: #(char seq) -> BioArray.BioArray<Nucleotides.Nucleotide>
<summary> Generates nucleotide sequence of one-letter-code raw string </summary>

val myGene: BioArray.BioArray<Nucleotides.Nucleotide>

val myProperGene: BioArray.BioArray<Nucleotides.Nucleotide>

val complement: nucs: BioArray.BioArray<Nucleotides.Nucleotide> -> BioArray.BioArray<Nucleotides.Nucleotide>
<summary> Create the complement DNA or cDNA (from RNA) strand. For example, the sequence "ATGC" is converted to "TACG" </summary>

val myTranslatedGene: BioArray.BioArray<AminoAcids.AminoAcid>

val transcribeTemplateStrand: nucs: BioArray.BioArray<Nucleotides.Nucleotide> -> BioArray.BioArray<Nucleotides.Nucleotide>
<summary> Transcribe a given DNA template strand (3'-----5') </summary>

val translate: nucleotideOffset: int -> rnaSeq: BioArray.BioArray<Nucleotides.Nucleotide> -> BioArray.BioArray<AminoAcids.AminoAcid>
<summary> translates nucleotide sequence to aminoacid sequence </summary>

val myTranslatedGeneFromCodingStrand: BioArray.BioArray<AminoAcids.AminoAcid>

val transcribeCodingStrand: nucs: BioArray.BioArray<Nucleotides.Nucleotide> -> BioArray.BioArray<Nucleotides.Nucleotide>
<summary> Transcribe a given DNA coding strand (5'-----3') </summary>

val mySmallGene: BioSeq.BioSeq<Nucleotides.Nucleotide>

val ofNucleotideString: s: #(char seq) -> BioSeq.BioSeq<Nucleotides.Nucleotide>
<summary> Generates nucleotide sequence of one-letter-code raw string </summary>

val smallGeneRev: BioSeq.BioSeq<Nucleotides.Nucleotide>

val reverse: nucs: Nucleotides.Nucleotide seq -> BioSeq.BioSeq<Nucleotides.Nucleotide>
<summary> Create the reverse DNA or RNA strand. For example, the sequence "ATGC" is converted to "CGTA" </summary>

val smallGeneComp: BioSeq.BioSeq<Nucleotides.Nucleotide>

val complement: nucs: Nucleotides.Nucleotide seq -> BioSeq.BioSeq<Nucleotides.Nucleotide>
<summary> Create the complement DNA or cDNA (from RNA) strand. For example, the sequence "ATGC" is converted to "TACG" </summary>

val smallGeneRevComp: BioSeq.BioSeq<Nucleotides.Nucleotide>

val reverseComplement: nucs: Nucleotides.Nucleotide seq -> BioSeq.BioSeq<Nucleotides.Nucleotide>
<summary> Create the reverse complement strand meaning antiparallel DNA strand or the cDNA (from RNA) respectivly. For example, the sequence "ATGC" is converted to "GCAT". "Antiparallel" combines the two functions "Complement" and "Inverse". </summary>

val myPeptide: BioSeq.BioSeq<AminoAcids.AminoAcid>

val myPeptideFormula: string

val toFormula: bs: #IBioItem seq -> Formula.Formula
<summary> Returns formula </summary>

module Formula from BioFSharp
<summary> Contains functionality for working with molecules as a formula of their elements and formulas of biologically relevant molecules </summary>

val toString: f: Formula.Formula -> string
<summary> Returns Formula as string </summary>

val myPeptideMass: float

val toAverageMass: bs: #IBioItem seq -> float
<summary> Returns average mass of the given sequence </summary>

val RBCS: BioArray.BioArray<AminoAcids.AminoAcid>

val ofAminoAcidString: s: #(char seq) -> BioArray.BioArray<AminoAcids.AminoAcid>
<summary> Generates amino acid sequence of one-letter-code raw string </summary>

val trypsin: Digestion.Protease

module Digestion from BioFSharp
<summary> Contains types and functions needed to digest amino acid sequences with proteases </summary>

module Table from BioFSharp.Digestion
<summary> Contains frequently needed proteases </summary>

val getProteaseBy: name: string -> Digestion.Protease

val digestedRBCS: Digestion.DigestedPeptide<int> array

module BioArray from BioFSharp.Digestion

val digest: protease: Digestion.Protease -> proteinID: 'a -> aas: AminoAcids.AminoAcid array -> Digestion.DigestedPeptide<'a> array
<summary> Takes Proteinsequence as input and returns Array of resulting DigestedPeptides </summary>

module Seq from Microsoft.FSharp.Collections

val head: source: 'T seq -> 'T

val digestedRBCS': Digestion.DigestedPeptide<int> array

val concernMissCleavages: minMissCleavages: int -> maxMisscleavages: int -> digestedPeptides: Digestion.DigestedPeptide<'a> array -> Digestion.DigestedPeptide<'a> array (requires equality)
<summary> Returns Sequence of DigestedPeptides including those resulting of one or more Misscleavage events. </summary>

val item: index: int -> source: 'T seq -> 'T