NB02b Digestion and mass calculation

Digestion and mass calculation
1. Accessing the protein sequences of Chlamydomonas reinhardtii
2. Amino acid distribution for C. reinhardtii
Calculating the molecular weight for peptides
1. In silico digestion of FASTA proteins with trypsin
2. Calculating peptide masses
3. Calculating peptide masses for charge 2
References
Questions

Digestion and mass calculation

The most widely applied method for protein digestion involves the use of enzymes. Many proteases are available for this purpose, each having their own characteristics in terms of specificity, efficiency and optimum digestion conditions. Trypsin is most widely applied in bottom-up proteomics and and has a very high degree of specificity, cleaving the peptide bonds C-terminal to the basic residues Lys and Arg, except when followed by Pro (Burkhart et al. 2012). In general, Lys and Arg are relatively abundant amino acids and are usually well distributed throughout a protein (Switzar et al. 2013). This leads to tryptic peptides with an average length of ∼14 amino acids that carry at least two positive charges, which is ideally suited for CID-MS analysis (Burkhart et al. 2012).

Using in silico analysis, we want to confirm that the general properties of trypsin digestion also apply for the proteome of Chlamydomonas reinhardtii. First, we load the proteome of Chlamydomonas in standard FASTA format. Amino acid composition of the proteome is simply counting each amino acid occurrence and can be visualized by a histogram:

#r "nuget: BioFSharp, 2.0.0-beta5"
#r "nuget: BioFSharp.IO, 2.0.0-beta5"
#r "nuget: Plotly.NET, 4.2.0"
#r "nuget: BIO-BTE-06-L-7_Aux, 0.0.10"

#if IPYNB
#r "nuget: Plotly.NET.Interactive, 4.2.0"
#endif // IPYNB

open Plotly.NET
open BioFSharp
open BIO_BTE_06_L_7_Aux.FS3_Aux
open System.IO

Accessing the protein sequences of Chlamydomonas reinhardtii

FASTA is a standardized text format, containing gene or protein sequence information. Such FASTAs can be donwloaded from UniProt for example.

To gain informations about the amino acid composition of C. reinhardtii, we need information about the proteome of Chlamydomonas, which is saved in the .fasta file we are accessing below.

// __SOURCE_DIRECTORY__ returns the directory in which the current notebook is located
let directory = __SOURCE_DIRECTORY__
let path = Path.Combine [|directory; "downloads/Chlamy_JGI5_5(Cp_Mp).fasta"|]
downloadFile path "Chlamy_JGI5_5(Cp_Mp).fasta" "bio-bte-06-l-7"
// with /../ we navigate a directory 
path

No value returned by any evaluator

Functions to read information from FASTA files exist in the BioFSharp library.

let sequences = 
    path
    |> IO.FastA.fromFile BioArray.ofAminoAcidString
    |> Seq.toArray
    
// Display the first element in the array collection
sequences |> Array.head

No value returned by any evaluator

Amino acid distribution for C. reinhardtii

To count the amino acid composition, we take the sequence of every protein and count the occurences of each amino acid:

let aminoAcidDistribution =
    sequences
    // only access Sequence from each array element.
    |> Array.collect (fun fastAItem -> fastAItem.Sequence)
    // count each occurence of all amino acids. 
    |> Array.countBy id
    
aminoAcidDistribution

let aaDistributionHis =
    aminoAcidDistribution
    |> Array.map (fun (name,count) -> string name, count)
    // sort by number of occurences
    |> Array.sortByDescending snd
    // create chart
    |> Chart.Column
    // style chart
    |> Chart.withYAxisStyle "Count"
    |> Chart.withSize (650.,600.)
    |> Chart.withTitle "Amino Acid composition of the <i>Chlamydomonas reinhardtii</i> proteome"

aaDistributionHis

No value returned by any evaluator

Calculating the molecular weight for peptides

The molecular weight M of a peptide may be estimated by calculating the equation for the molecular weight of a peptide:

$M = M_{N}+M_{C}\sum_{i=0}^{n}N_{i}M_{i}$

where N(i) are the number, and M(i) the average residue molecular weights, of the amino acids. M(N) + M(C) are added to the total in order to account for the termini: H at the N-terminus and OH at the C-terminus. (Remark: If the termini are modified, these additions are replaced by those of the modifiers.)

The distribution of all molecular weights for the peptides resulting from the previous proteome digest can be calculated and visualized using a histogram chart:

In silico digestion of FASTA proteins with trypsin

To gain information about the peptide sequences of each protein, we have to compute the digested sequence, A digest function with variable protease exists in BioFSharp.

let digestedProteins =
    // sequences is the fasta data
    sequences
    |> Array.mapi (fun i fastAItem ->
        // in silico digestion
        Digestion.BioArray.digest Digestion.Table.Trypsin i fastAItem.Sequence
        |> Digestion.BioArray.concernMissCleavages 0 1
    )
    |> Array.concat
    
digestedProteins |> Array.head

No value returned by any evaluator

Calculating peptide masses

We calculate the mass of each peptide by calculating the monoisotopic mass of each amino acid and adding the weight of an H(2)O to each peptide weight.

let chartDigestedProteins =
    digestedProteins
    |> Array.map (fun peptide ->
        // calculate mass for each peptide
        BioSeq.toMonoisotopicMassWith (BioItem.monoisoMass ModificationInfo.Table.H2O) peptide.PepSequence
        )
    |> Array.filter (fun x -> x < 3000.)
    // visualize distribution of all peptide masses < 3000 Da
    |> fun masses -> Chart.Histogram(data = masses, orientation = StyleParam.Orientation.Vertical, NBinsX = 100)
    |> Chart.withXAxisStyle (TitleText = "Mass [Da]", MinMax = (0., 3000.))
    |> Chart.withYAxisStyle "Count"

chartDigestedProteins

No value returned by any evaluator

Calculating peptide masses for charge 2

However, in mass spectrometry we are only able to detect ions. Therefore, the measurements report the mass-to-charge ratio. The abbreviation m/z (m = mass; z = charge) is used to denote the dimensionless quantity formed by dividing the molecular weight of an ion (M+nH(+)) by its charge number (n).

$M_{z}=\frac{(M+nH^{+})}{n}$

In the following, we will convert the uncharged peptide masses to the m/z ratio with charge two by applaying the Mass.toMZ function from the BioFSharp library and displax its distribution again. Note that m/z ratio with a charge of two represents the predominant charge species.

let digestedPeptideMasses =
    digestedProteins
    |> Array.map (fun peptide ->
        BioSeq.toMonoisotopicMassWith (BioItem.monoisoMass ModificationInfo.Table.H2O) peptide.PepSequence
    )

let chartDigestedPeptideMasses =
    digestedPeptideMasses
    |> Array.map (fun ucMass -> Mass.toMZ ucMass 2.)
    |> Array.filter (fun x -> x < 3000.)
    |> fun masses -> Chart.Histogram(data = masses, orientation = StyleParam.Orientation.Vertical, NBinsX=100)
    |> Chart.withXAxisStyle (TitleText = "m/z", MinMax = (0., 3000.))
    |> Chart.withYAxisStyle "Count"
    
chartDigestedPeptideMasses

No value returned by any evaluator

Questions

When trypsin is used for digestion in an MS experiment, it is often combined with another protease (e.g. Lys-C). Why can it be beneficial to combine trypsin?
A peptide with a charge of 2 has an m/z of 414. What is the m/z of the same peptide with a charge of 3? Visualize the m/z of the peptides from the FASTA with a charge of 3 like done above.
Peptides can occur at different charge states during an MS run. Do the different charge states of an peptide usually possess similar intensities?

References

Burkhart, J. M., Schumbrutzki, C., Wortelkamp, S., Sickmann, A. & Zahedi, R. P. Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. Journal of proteomics 75, 1454–1462; 10.1016/j.jprot.2011.11.016 (2012).
Switzar, L., Giera, M. & Niessen, W. M. A. Protein digestion: an overview of the available techniques and recent developments. J. Proteome Res. 12, 1067–1077; 10.1021/pr301201x (2013).

namespace System

namespace System.IO

val directory : string

val path : string

type Path = static member ChangeExtension : path: string * extension: string -> string static member Combine : path1: string * path2: string -> string + 3 overloads static member EndsInDirectorySeparator : path: ReadOnlySpan<char> -> bool + 1 overload static member GetDirectoryName : path: ReadOnlySpan<char> -> ReadOnlySpan<char> + 1 overload static member GetExtension : path: ReadOnlySpan<char> -> ReadOnlySpan<char> + 1 overload static member GetFileName : path: ReadOnlySpan<char> -> ReadOnlySpan<char> + 1 overload static member GetFileNameWithoutExtension : path: ReadOnlySpan<char> -> ReadOnlySpan<char> + 1 overload static member GetFullPath : path: string -> string + 1 overload static member GetInvalidFileNameChars : unit -> char [] static member GetInvalidPathChars : unit -> char [] ...
<summary>Performs operations on <see cref="T:System.String" /> instances that contain file or directory path information. These operations are performed in a cross-platform manner.</summary>

Path.Combine([<System.ParamArray>] paths: string []) : string
Path.Combine(path1: string, path2: string) : string
Path.Combine(path1: string, path2: string, path3: string) : string
Path.Combine(path1: string, path2: string, path3: string, path4: string) : string

val sequences : obj []

module Seq from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.seq`1" />.</summary>

val toArray : source:seq<'T> -> 'T []
<summary>Builds an array from the given collection.</summary>
<param name="source">The input sequence.</param>
<returns>The result array.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>

module Array from Microsoft.FSharp.Collections
<summary>Contains operations for working with arrays.</summary>
<remarks> See also <a href="https://docs.microsoft.com/dotnet/fsharp/language-reference/arrays">F# Language Guide - Arrays</a>. </remarks>

val head : array:'T [] -> 'T
<summary>Returns the first element of the array.</summary>
<param name="array">The input array.</param>
<returns>The first element of the array.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>
<exception cref="T:System.ArgumentException">Thrown when the input array is empty.</exception>

val aminoAcidDistribution : (obj * int) []

val collect : mapping:('T -> 'U []) -> array:'T [] -> 'U []
<summary>For each element of the array, applies the given function. Concatenates all the results and return the combined array.</summary>
<param name="mapping">The function to create sub-arrays from the input array elements.</param>
<param name="array">The input array.</param>
<returns>The concatenation of the sub-arrays.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>

val fastAItem : obj

val countBy : projection:('T -> 'Key) -> array:'T [] -> ('Key * int) [] (requires equality)
<summary>Applies a key-generating function to each element of an array and returns an array yielding unique keys and their number of occurrences in the original array.</summary>
<param name="projection">A function transforming each item of the input array into a key to be compared against the others.</param>
<param name="array">The input array.</param>
<returns>The result array.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>

val id : x:'T -> 'T
<summary>The identity function</summary>
<param name="x">The input value.</param>
<returns>The same value.</returns>

val aaDistributionHis : obj

val map : mapping:('T -> 'U) -> array:'T [] -> 'U []
<summary>Builds a new array whose elements are the results of applying the given function to each of the elements of the array.</summary>
<param name="mapping">The function to transform elements of the array.</param>
<param name="array">The input array.</param>
<returns>The array of transformed elements.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>

val name : obj

val count : int

Multiple items
val string : value:'T -> string
<summary>Converts the argument to a string using <c>ToString</c>.</summary>
<remarks>For standard integer and floating point values the and any type that implements <c>IFormattable</c><c>ToString</c> conversion uses <c>CultureInfo.InvariantCulture</c>. </remarks>
<param name="value">The input value.</param>
<returns>The converted string.</returns>

--------------------
type string = System.String
<summary>An abbreviation for the CLI type <see cref="T:System.String" />.</summary>
<category>Basic Types</category>

val sortByDescending : projection:('T -> 'Key) -> array:'T [] -> 'T [] (requires comparison)
<summary>Sorts the elements of an array, in descending order, using the given projection for the keys and returning a new array. Elements are compared using <see cref="M:Microsoft.FSharp.Core.Operators.compare" />.</summary>
<remarks>This is not a stable sort, i.e. the original order of equal elements is not necessarily preserved. For a stable sort, consider using <see cref="M:Microsoft.FSharp.Collections.SeqModule.Sort" />.</remarks>
<param name="projection">The function to transform array elements into the type that is compared.</param>
<param name="array">The input array.</param>
<returns>The sorted array.</returns>

val snd : tuple:('T1 * 'T2) -> 'T2
<summary>Return the second element of a tuple, <c>snd (a,b) = b</c>.</summary>
<param name="tuple">The input tuple.</param>
<returns>The second value.</returns>

val digestedProteins : obj []

val mapi : mapping:(int -> 'T -> 'U) -> array:'T [] -> 'U []
<summary>Builds a new array whose elements are the results of applying the given function to each of the elements of the array. The integer index passed to the function indicates the index of element being transformed.</summary>
<param name="mapping">The function to transform elements and their indices.</param>
<param name="array">The input array.</param>
<returns>The array of transformed elements.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>

val i : int

val concat : arrays:seq<'T []> -> 'T []
<summary>Builds a new array that contains the elements of each of the given sequence of arrays.</summary>
<param name="arrays">The input sequence of arrays.</param>
<returns>The concatenation of the sequence of input arrays.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>

val chartDigestedProteins : obj

val peptide : obj

val filter : predicate:('T -> bool) -> array:'T [] -> 'T []
<summary>Returns a new collection containing only the elements of the collection for which the given predicate returns "true".</summary>
<param name="predicate">The function to test the input elements.</param>
<param name="array">The input array.</param>
<returns>An array containing the elements for which the given predicate returns true.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>

val x : float

val masses : float []

val digestedPeptideMasses : obj []

val chartDigestedPeptideMasses : obj

val ucMass : obj