BioCollections
Resulting BioSeq containing our peptide:
Resulting BioList containing our peptide:
Resulting BioArray containing our oligonucleotide:
Nucleotides
Figure 1: Selection of covered nucleotide operations. (A) Biological principle. (B) Workflow with BioSeq. (C) Other covered functionalities.
Let's imagine you have a given gene sequence and want to find out what the corresponding protein might look like.
Yikes! Unfortunately we got the 5'-3' coding strand. For proper transcription we should get the complementary strand first:
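To illustrate the principle behind this step independently of the library, here is a minimal Python sketch of forming the complementary strand. It does not use BioFSharp; the function names and the example sequence are hypothetical and chosen only for this illustration.

```python
# Watson-Crick pairing for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, keeping the input order."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in 5'->3' direction."""
    return complement(strand)[::-1]


coding = "ATGGCTAGA"  # hypothetical coding strand, written 5'->3'
template_3_to_5 = complement(coding)          # "TACCGATCT", reads 3'->5'
template_5_to_3 = reverse_complement(coding)  # "TCTAGCCAT", reads 5'->3'
```

Whether the complement needs to be reversed as well depends on the reading direction the downstream transcription function expects, so it is worth checking that convention before chaining the two steps.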
Now let's transcribe and translate it:
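As a library-independent illustration of what transcription and translation do, the following Python sketch pairs a template strand with RNA bases and translates the result with the standard genetic code. It is not BioFSharp code; the function names and the input sequence are assumptions made for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe_template(template_3_to_5: str) -> str:
    """Pair each template base (read 3'->5') with its RNA partner, giving mRNA 5'->3'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3_to_5)


def translate(mrna: str) -> str:
    """Translate from position 0 in steps of three; '*' marks a stop codon."""
    codons = (mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


template = "TACCGATCTATT"             # hypothetical template strand, read 3'->5'
mrna = transcribe_template(template)  # "AUGGCUAGAUAA"
protein = translate(mrna)             # "MAR*"
```

The sketch translates through stop codons and reports them as `*` instead of terminating, which keeps the mapping between codon positions and output positions easy to follow.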
Of course, if your input sequence originates from the coding strand, you can directly transcribe it to mRNA, since the only difference between the coding strand and the mRNA is that 'T' is replaced by 'U' (Figure 1B).
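The shortcut for coding strands is small enough to show in full. This is a plain Python sketch of the principle, not a BioFSharp call; the function name and sequence are hypothetical.

```python
def transcribe_coding(coding_5_to_3: str) -> str:
    """The mRNA carries the coding strand's sequence with thymine (T) replaced by uracil (U)."""
    return coding_5_to_3.replace("T", "U")


coding = "ATGGCTAGATAA"           # hypothetical coding strand, written 5'->3'
mrna = transcribe_coding(coding)  # "AUGGCUAGAUAA"
```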
Other nucleotide conversion operations are also covered:
AminoAcids
Basics
Some functions which might be needed regularly are defined to work with nucleotides and amino acids:
Digestion
BioFSharp also comes equipped with a set of tools for cutting amino acid sequences apart. To demonstrate their usage, we'll throw some trypsin at the small RuBisCO subunit of Arabidopsis thaliana.
In the first step, we define our input sequence and the protease we want to use.
With these two things done, digesting the protein is a piece of cake. To do this, just use the `digest` function.
val digestedRBCS : Digestion.DigestedPeptide [] = [|{ProteinID = 0; MissCleavages = 0; MissCleavageStart = 1; MissCleavageEnd = 28; PepSequence = [Met; Ala; Ser; Ser; Met; Leu; Ser; Ser; Ala; Thr; Met; Val; Ala; Ser; Pro; Ala; Gln; Ala; Thr; Met; Val; Ala; Pro; Phe; Asn; Gly; Leu; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 29; MissCleavageEnd = 37; PepSequence = [Ser; Ser; Ala; Ala; Phe; Pro; Ala; Thr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 38; MissCleavageEnd = 38; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 39; MissCleavageEnd = 52; PepSequence = [Ala; Asn; Asn; Asp; Ile; Thr; Ser; Ile; Thr; Ser; Asn; Gly; Gly; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 53; MissCleavageEnd = 64; PepSequence = [Val; Asn; Cys; Met; Gln; Val; Trp; Pro; Pro; Ile; Gly; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 65; MissCleavageEnd = 65; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 66; MissCleavageEnd = 66; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 67; MissCleavageEnd = 83; PepSequence = [Phe; Glu; Thr; Leu; Ser; Tyr; Leu; Pro; Asp; Leu; Thr; Asp; Ser; Glu; Leu; Ala; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 84; MissCleavageEnd = 90; PepSequence = [Glu; Val; Asp; Tyr; Leu; Ile; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 91; MissCleavageEnd = 92; PepSequence = [Asn; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 93; MissCleavageEnd = 108; PepSequence = [Trp; Ile; Pro; Cys; Val; Glu; Phe; Glu; Leu; Glu; His; Gly; Phe; Val; Tyr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 109; MissCleavageEnd = 120; PepSequence = [Glu; His; Gly; Asn; Ser; Pro; Gly; Tyr; Tyr; Asp; Gly; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 121; MissCleavageEnd = 126; PepSequence = [Tyr; Trp; Thr; Met; Trp; Lys];}; {ProteinID = 0; 
MissCleavages = 0; MissCleavageStart = 127; MissCleavageEnd = 140; PepSequence = [Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 141; MissCleavageEnd = 146; PepSequence = [Glu; Val; Glu; Glu; Cys; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 147; MissCleavageEnd = 147; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 148; MissCleavageEnd = 155; PepSequence = [Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 156; MissCleavageEnd = 163; PepSequence = [Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 164; MissCleavageEnd = 180; PepSequence = [Gln; Val; Gln; Cys; Ile; Ser; Phe; Ile; Ala; Tyr; Lys; Pro; Pro; Ser; Phe; Thr; Gly];}|]
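The cleavage pattern in the output above is consistent with the commonly used trypsin rule: cut C-terminal to lysine (K) or arginine (R), except when the next residue is proline (P). The following Python sketch applies that rule to the last 33 residues of the digested sequence, written in one-letter code. It illustrates the rule only; it is not BioFSharp's implementation, and the function name is hypothetical.

```python
def digest_trypsin(protein: str) -> list[tuple[int, int, str]]:
    """Cleave C-terminal to K or R unless the next residue is P.

    Returns (start, end, peptide) tuples with 1-based, inclusive positions
    counted within the given input string.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        cuts_here = residue in "KR" and not is_last and protein[i + 1] != "P"
        if cuts_here or is_last:
            peptides.append((start + 1, i + 1, protein[start:i + 1]))
            start = i + 1
    return peptides


# C-terminal fragment taken from the three last peptides in the output above.
fragment = "EYPNAFIRIIGFDNTRQVQCISFIAYKPPSFTG"
peptides = digest_trypsin(fragment)
# [(1, 8, "EYPNAFIR"), (9, 16, "IIGFDNTR"), (17, 33, "QVQCISFIAYKPPSFTG")]
```

Note that the lysine in `...AYKPP...` is followed by proline, so the last peptide stays in one piece, just as in the library output.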
In reality, proteases don't always cut the protein completely. Instead, some cleavage sites stay intact, and the resulting longer peptides should be considered in in silico analysis. This can easily be done with the `concernMissCleavages` function. It takes the minimum and maximum number of missed cleavages you want to allow, along with the digested protein, and returns all combinations of adjacent peptides that arise from these settings.
val digestedRBCS' : Digestion.DigestedPeptide [] = [|{ProteinID = 0; MissCleavages = 0; MissCleavageStart = 164; MissCleavageEnd = 180; PepSequence = [Gln; Val; Gln; Cys; Ile; Ser; Phe; Ile; Ala; Tyr; Lys; Pro; Pro; Ser; Phe; Thr; Gly];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 156; MissCleavageEnd = 163; PepSequence = [Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 148; MissCleavageEnd = 155; PepSequence = [Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 147; MissCleavageEnd = 147; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 141; MissCleavageEnd = 146; PepSequence = [Glu; Val; Glu; Glu; Cys; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 127; MissCleavageEnd = 140; PepSequence = [Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 121; MissCleavageEnd = 126; PepSequence = [Tyr; Trp; Thr; Met; Trp; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 109; MissCleavageEnd = 120; PepSequence = [Glu; His; Gly; Asn; Ser; Pro; Gly; Tyr; Tyr; Asp; Gly; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 93; MissCleavageEnd = 108; PepSequence = [Trp; Ile; Pro; Cys; Val; Glu; Phe; Glu; Leu; Glu; His; Gly; Phe; Val; Tyr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 91; MissCleavageEnd = 92; PepSequence = [Asn; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 84; MissCleavageEnd = 90; PepSequence = [Glu; Val; Asp; Tyr; Leu; Ile; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 67; MissCleavageEnd = 83; PepSequence = [Phe; Glu; Thr; Leu; Ser; Tyr; Leu; Pro; Asp; Leu; Thr; Asp; Ser; Glu; Leu; Ala; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 66; MissCleavageEnd = 66; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 65; 
MissCleavageEnd = 65; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 53; MissCleavageEnd = 64; PepSequence = [Val; Asn; Cys; Met; Gln; Val; Trp; Pro; Pro; Ile; Gly; Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 39; MissCleavageEnd = 52; PepSequence = [Ala; Asn; Asn; Asp; Ile; Thr; Ser; Ile; Thr; Ser; Asn; Gly; Gly; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 38; MissCleavageEnd = 38; PepSequence = [Lys];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 29; MissCleavageEnd = 37; PepSequence = [Ser; Ser; Ala; Ala; Phe; Pro; Ala; Thr; Arg];}; {ProteinID = 0; MissCleavages = 0; MissCleavageStart = 1; MissCleavageEnd = 28; PepSequence = [Met; Ala; Ser; Ser; Met; Leu; Ser; Ser; Ala; Thr; Met; Val; Ala; Ser; Pro; Ala; Gln; Ala; Thr; Met; Val; Ala; Pro; Phe; Asn; Gly; Leu; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 156; MissCleavageEnd = 180; PepSequence = [Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg; Gln; Val; Gln; Cys; Ile; Ser; Phe; Ile; Ala; Tyr; Lys; Pro; Pro; Ser; Phe; Thr; Gly];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 148; MissCleavageEnd = 163; PepSequence = [Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg; Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 147; MissCleavageEnd = 155; PepSequence = [Lys; Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 141; MissCleavageEnd = 147; PepSequence = [Glu; Val; Glu; Glu; Cys; Lys; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 127; MissCleavageEnd = 146; PepSequence = [Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys; Glu; Val; Glu; Glu; Cys; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 121; MissCleavageEnd = 140; PepSequence = [Tyr; Trp; Thr; Met; Trp; Lys; Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys];}; {ProteinID = 0; MissCleavages = 1; 
MissCleavageStart = 109; MissCleavageEnd = 126; PepSequence = [Glu; His; Gly; Asn; Ser; Pro; Gly; Tyr; Tyr; Asp; Gly; Arg; Tyr; Trp; Thr; Met; Trp; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 93; MissCleavageEnd = 120; PepSequence = [Trp; Ile; Pro; Cys; Val; Glu; Phe; Glu; Leu; Glu; His; Gly; Phe; Val; Tyr; Arg; Glu; His; Gly; Asn; Ser; Pro; Gly; Tyr; Tyr; Asp; Gly; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 91; MissCleavageEnd = 108; PepSequence = [Asn; Lys; Trp; Ile; Pro; Cys; Val; Glu; Phe; Glu; Leu; Glu; His; Gly; Phe; Val; Tyr; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 84; MissCleavageEnd = 92; PepSequence = [Glu; Val; Asp; Tyr; Leu; Ile; Arg; Asn; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 67; MissCleavageEnd = 90; PepSequence = [Phe; Glu; Thr; Leu; Ser; Tyr; Leu; Pro; Asp; Leu; Thr; Asp; Ser; Glu; Leu; Ala; Lys; Glu; Val; Asp; Tyr; Leu; Ile; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 66; MissCleavageEnd = 83; PepSequence = [Lys; Phe; Glu; Thr; Leu; Ser; Tyr; Leu; Pro; Asp; Leu; Thr; Asp; Ser; Glu; Leu; Ala; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 65; MissCleavageEnd = 66; PepSequence = [Lys; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 53; MissCleavageEnd = 65; PepSequence = [Val; Asn; Cys; Met; Gln; Val; Trp; Pro; Pro; Ile; Gly; Lys; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 39; MissCleavageEnd = 64; PepSequence = [Ala; Asn; Asn; Asp; Ile; Thr; Ser; Ile; Thr; Ser; Asn; Gly; Gly; Arg; Val; Asn; Cys; Met; Gln; Val; Trp; Pro; Pro; Ile; Gly; Lys];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 38; MissCleavageEnd = 52; PepSequence = [Lys; Ala; Asn; Asn; Asp; Ile; Thr; Ser; Ile; Thr; Ser; Asn; Gly; Gly; Arg];}; {ProteinID = 0; MissCleavages = 1; MissCleavageStart = 29; MissCleavageEnd = 38; PepSequence = [Ser; Ser; Ala; Ala; Phe; Pro; Ala; Thr; Arg; Lys];}; {ProteinID = 0; MissCleavages = 1; 
MissCleavageStart = 1; MissCleavageEnd = 37; PepSequence = [Met; Ala; Ser; Ser; Met; Leu; Ser; Ser; Ala; Thr; Met; Val; Ala; Ser; Pro; Ala; Gln; Ala; Thr; Met; Val; Ala; Pro; Phe; Asn; Gly; Leu; Lys; Ser; Ser; Ala; Ala; Phe; Pro; Ala; Thr; Arg];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 148; MissCleavageEnd = 180; PepSequence = [Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg; Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg; Gln; Val; Gln; Cys; Ile; Ser; Phe; Ile; Ala; Tyr; Lys; Pro; Pro; Ser; Phe; Thr; Gly];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 147; MissCleavageEnd = 163; PepSequence = [Lys; Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg; Ile; Ile; Gly; Phe; Asp; Asn; Thr; Arg];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 141; MissCleavageEnd = 155; PepSequence = [Glu; Val; Glu; Glu; Cys; Lys; Lys; Glu; Tyr; Pro; Asn; Ala; Phe; Ile; Arg];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 127; MissCleavageEnd = 147; PepSequence = [Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys; Glu; Val; Glu; Glu; Cys; Lys; Lys];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 121; MissCleavageEnd = 146; PepSequence = [Tyr; Trp; Thr; Met; Trp; Lys; Leu; Pro; Leu; Phe; Gly; Cys; Thr; Asp; Ser; Ala; Gln; Val; Leu; Lys; Glu; Val; Glu; Glu; Cys; Lys];}; {ProteinID = 0; MissCleavages = 2; MissCleavageStart = 109; MissCleavageEnd = 140; PepSequence = [Glu; His; Gly; Asn; Ser; Pro; ...];}; ...|]
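The combinations listed above can be understood as joining runs of neighbouring peptides: a peptide with n missed cleavages is the concatenation of n + 1 adjacent fully cleaved peptides. The Python sketch below reproduces this for three peptides from the digest. It is an illustration under that assumption, not the library's implementation; the function name is hypothetical, and the order of the results is not meant to match the library's output.

```python
def with_missed_cleavages(peptides: list[str], min_missed: int, max_missed: int):
    """Join runs of (missed + 1) neighbouring peptides for each allowed count.

    Returns (missed, start, end, sequence) tuples with 1-based, inclusive
    positions counted from the first peptide in the list.
    """
    # Start position of every fully cleaved peptide.
    starts = [1]
    for peptide in peptides[:-1]:
        starts.append(starts[-1] + len(peptide))

    result = []
    for missed in range(min_missed, max_missed + 1):
        for first in range(len(peptides) - missed):
            joined = "".join(peptides[first:first + missed + 1])
            start = starts[first]
            result.append((missed, start, start + len(joined) - 1, joined))
    return result


peptides = ["EYPNAFIR", "IIGFDNTR", "QVQCISFIAYKPPSFTG"]
combos = with_missed_cleavages(peptides, 0, 2)
# 3 peptides with 0, 2 peptides with 1, and 1 peptide with 2 missed cleavages
```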