BioFSharp


Introduction

After you have downloaded and set up BioFSharp (as described here), it is time to see it in action. Since we are programmers, let's say 'Hello World'!

This brief tutorial focuses on introducing you to the BioFSharp workflow. We will build a protein from a raw string and look at some of its properties. Detailed information about the functions used can be found either in the specific tutorials in the sidebar or in the API reference.

First, we reference and open BioFSharp.

//Your path may differ
#r"BioFSharp.dll"

open BioFSharp

The Hello World Protein

Time to say 'Hello World' the bioinformatician way. This is our raw string:

///string form of our hello world protein
let rawString = "HELLOWORLD!"

BioFSharp comes with various data structures for biological objects such as amino acids. As you most likely know, you can abbreviate a sequence of amino acids with a three-letter or one-letter code for each amino acid. We can convert a character to an amino acid by using the charToOptionAminoAcid function from the OptionConverter module in BioItemsConverter. It returns an option type: Some amino acid if the character codes for one, or None if it does not.

open BioFSharp.BioItemsConverter

///valid character 
let valid = OptionConverter.charToOptionAminoAcid 'A'
Some Ala
///invalid character
let invalid = OptionConverter.charToOptionAminoAcid '?'
 None
As a result, 'A' is recognized as Some Ala, while '?' yields None.
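
Because the converter returns an option, calling code has to handle both cases explicitly. The following is a minimal sketch of doing so with pattern matching, assuming BioFSharp is referenced as shown earlier. The helper describeChar is hypothetical and introduced here only for illustration; it is not part of BioFSharp.

```fsharp
//Your path may differ
#r "BioFSharp.dll"

open BioFSharp
open BioFSharp.BioItemsConverter

///hypothetical helper (not part of BioFSharp): describes how a character was parsed
let describeChar (c : char) =
    match OptionConverter.charToOptionAminoAcid c with
    | Some aminoAcid -> sprintf "'%c' codes for %A" c aminoAcid
    | None           -> sprintf "'%c' does not code for an amino acid" c

///takes the Some branch
let description1 = describeChar 'A'
///takes the None branch
let description2 = describeChar '?'
```

Matching on both cases keeps invalid input visible at the call site instead of letting it fail later in the workflow.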

To parse our entire string, we can use any of the BioCollections' ofAminoAcidString functions, which use this converter internally. For more information about BioSeq, BioList and BioArray, go here.

///Protein represented as a BioSeq
let parsedProtein1 = rawString |> BioSeq.ofAminoAcidString 
seq [His; Glu; Leu; Leu; ...]
///Protein represented as a BioList
let parsedProtein2 = rawString |> BioList.ofAminoAcidString 
[His; Glu; Leu; Leu; Pyl; Trp; Pyl; Arg; Leu; Asp]
///Protein represented as a BioArray
let parsedProtein3 = rawString |> BioArray.ofAminoAcidString 
[|His; Glu; Leu; Leu; Pyl; Trp; Pyl; Arg; Leu; Asp|]

This yields our Hello World protein. Note that the '!' from the raw string is not contained in the sequence, as it does not code for an amino acid.
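
One plausible way to picture why the '!' disappears is a converter that returns an option combined with Seq.choose, which keeps only the Some results. The sketch below is self-contained and uses toy stand-ins: ToyAminoAcid, toyCharToOption and toyOfAminoAcidString are hypothetical names, and the use of Seq.choose is an assumption about the mechanism, not a statement about how BioFSharp is implemented.

```fsharp
///toy stand-in for an amino acid type (hypothetical, not BioFSharp's AminoAcid)
type ToyAminoAcid =
    | His | Glu | Leu | Pyl | Trp | Arg | Asp

///toy converter: Some residue for a known one-letter code, None otherwise
let toyCharToOption (c : char) : ToyAminoAcid option =
    match c with
    | 'H' -> Some His
    | 'E' -> Some Glu
    | 'L' -> Some Leu
    | 'O' -> Some Pyl
    | 'W' -> Some Trp
    | 'R' -> Some Arg
    | 'D' -> Some Asp
    | _   -> None

///keeps every character that converts to Some residue and drops the rest
let toyOfAminoAcidString (s : string) : ToyAminoAcid list =
    s |> Seq.choose toyCharToOption |> List.ofSeq

///the trailing '!' maps to None and is therefore dropped
let toyProtein = toyOfAminoAcidString "HELLOWORLD!"
```

Here toyProtein holds ten residues for the eleven input characters, mirroring the BioList result shown above.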

Pretty Printing

Especially when working with longer or multiple sequences, it can be beneficial to override the default printing of biological data structures. BioFSharp comes with a pretty printer for all BioCollections:

let largerSequence = 
    """MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWP
    PIGKKKFETLSYLPDLTDSELAKEVDYLIRNKWIPCVEFELEHGFVYREHGNSPGYYDGR
    YWTMWKLPLFGCTDSAQVLKEVEECKKEYPNAFIRIIGFDNTRQVQCISFIAYKPPSFT
    MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWP
    PIGKKKFETLSYLPDLTDSELAKEVDYLIRNKWIPCVEFELEHGFVYREHGNSPGYYDGR
    YWTMWKLPLFGCTDSAQVLKEVEECKKEYPNAFIRIIGFDNTRQVQCISFIAYKPPSFT
    MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWP
    PIGKKKFETLSYLPDLTDSELAKEVDYLIRNKWIPCVEFELEHGFVYREHGNSPGYYDGR
    YWTMWKLPLFGCTDSAQVLKEVEECKKEYPNAFIRIIGFDNTRQVQCISFIAYKPPSFT""" 
    |> BioArray.ofAminoAcidString
[|Met; Ala; Ser; Ser; Met; Leu; Ser; Ser; Ala; Thr; Met; Val; Ala; Ser; Pro; Ala;
  Gln; Ala; Thr; Met; Val; Ala; Pro; Phe; Asn; Gly; Leu; Lys; Ser; Ser; Ala; Ala;
  Phe; Pro; Ala; Thr; Arg; Lys; Ala; Asn; Asn; Asp; Ile; Thr; Ser; Ile; Thr; Ser;
  Asn; Gly; Gly; Arg; Val; Asn; Cys; Met; Gln; Val; Trp; Pro; Pro; Ile; Gly; Lys;
  Lys; Lys; Phe; Glu; Thr; Leu; Ser; Tyr; Leu; Pro; Asp; Leu; Thr; Asp; Ser; Glu;
  Leu; Ala; Lys; Glu; Val; Asp; Tyr; Leu; Ile; Arg; Asn; Lys; Trp; Ile; Pro; Cys;
  Val; Glu; Phe; Glu; ...|]

As you can see, the standard printing shows only the first 100 amino acids of this sequence. Now let's take a look at the output when we use the pretty printer:

open BioFSharp.IO.FSIPrinters
fsi.AddPrinter(prettyPrintBioCollection)
|val it : BioArray.BioArray =
|  
|         1  MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN NDITSITSNG GRVNCMQVWP
|        61  PIGKKKFETL SYLPDLTDSE LAKEVDYLIR NKWIPCVEFE LEHGFVYREH GNSPGYYDGR
|       121  YWTMWKLPLF GCTDSAQVLK EVEECKKEYP NAFIRIIGFD NTRQVQCISF IAYKPPSFTM
|       181  ASSMLSSATM VASPAQATMV APFNGLKSSA AFPATRKANN DITSITSNGG RVNCMQVWPP
|       241  IGKKKFETLS YLPDLTDSEL AKEVDYLIRN KWIPCVEFEL EHGFVYREHG NSPGYYDGRY
|       301  WTMWKLPLFG CTDSAQVLKE VEECKKEYPN AFIRIIGFDN TRQVQCISFI AYKPPSFTMA
|       361  SSMLSSATMV ASPAQATMVA PFNGLKSSAA FPATRKANND ITSITSNGGR VNCMQVWPPI
|       421  GKKKFETLSY LPDLTDSELA KEVDYLIRNK WIPCVEFELE HGFVYREHGN SPGYYDGRYW
|       481  TMWKLPLFGC TDSAQVLKEV EECKKEYPNA FIRIIGFDNT RQVQCISFIA YKPPSFT
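
To make the layout above easier to read, here is a rough sketch of how such a format could be produced from a plain one-letter string: 60 residues per line, grouped into blocks of 10, with each line prefixed by the position of its first residue. The function toyPrettyPrint is hypothetical and operates on a string rather than on a BioCollection; it imitates the visible layout only and is not BioFSharp's printer.

```fsharp
///toy layout function (hypothetical, not BioFSharp's printer)
let toyPrettyPrint (oneLetterCodes : string) : string =
    oneLetterCodes
    //one output line per 60 residues
    |> Seq.chunkBySize 60
    |> Seq.mapi (fun lineIndex line ->
        //split the line into blocks of 10 residues separated by a space
        let blocks =
            line
            |> Array.chunkBySize 10
            |> Array.map (fun block -> System.String(block))
            |> String.concat " "
        //prefix with the 1-based position of the first residue in this line
        sprintf "%10d  %s" (lineIndex * 60 + 1) blocks)
    |> String.concat "\n"

///first 70 residues of the larger sequence from above
let preview =
    toyPrettyPrint "MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWPPIGKKKFETL"
```

The position prefix is what makes this layout useful for long sequences: a residue can be located by its line start plus its offset within the blocks.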