Sequence Properties
Summary: This example shows how to calculate properties of amino acid sequences in BioFSharp
General
BioFSharp comes equipped with a range of numerical values for important amino acid properties. To access them easily, you can use the initGetAminoProperty function as shown below. The result is a mapping function, which assigns a value to each compatible amino acid.
In this tutorial our aim is to find out the hydrophobicity of a peptide. We start by calling the aforementioned function.
open BioFSharp
open AminoProperties
// Mapping function: amino acid symbol -> hydrophobicity index
let getHydrophobicityIndex = initGetAminoProperty AminoProperty.HydrophobicityIndex
getHydrophobicityIndex AminoAcidSymbols.AminoAcidSymbol.Ala

// Same property, with the values normalized to the Z-norm scale
let getHydrophobicityIndexZ = initGetAminoPropertyZnorm AminoProperty.HydrophobicityIndex
getHydrophobicityIndexZ AminoAcidSymbols.AminoAcidSymbol.Ala
With this function you can easily estimate the hydrophobicity of our peptide by calling it on every element with a map. Usually, neighboring amino acids in a peptide influence each other. To account for this you can use the ofWindowedBioArray function. It additionally takes a window size and calculates the property value of every amino acid in the chain while taking the adjacent amino acids within this window into account.
let peptide =
    "REYAHMIGMEYDTVQK"
    |> BioArray.ofAminoAcidString
    |> Array.map AminoAcidSymbols.aminoAcidSymbol

// Per-residue values, ignoring neighbors
let peptidehydrophobicites = peptide |> Array.map getHydrophobicityIndex

// Windowed values with a window size of 3
let peptidehydrophobicites' = peptide |> AminoProperties.ofWindowedBioArray 3 getHydrophobicityIndex
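To make the idea of a windowed property concrete, here is a minimal sketch in plain F# that does not depend on BioFSharp. The helper name windowedAverage is hypothetical, and the way it handles the sequence ends (truncating the window) is an assumption. The library's ofWindowedBioArray may treat the edges differently.

```fsharp
// Hypothetical helper, not part of BioFSharp: averages each value with its
// n neighbors on either side. Windows are truncated at the sequence ends,
// which is an assumption about edge handling.
let windowedAverage (n: int) (values: float[]) : float[] =
    values
    |> Array.mapi (fun i _ ->
        let lo = max 0 (i - n)
        let hi = min (values.Length - 1) (i + n)
        Array.average values.[lo..hi])

// Toy values, chosen only for illustration
let smoothed = windowedAverage 1 [| 1.0; 2.0; 6.0; 4.0 |]
// smoothed = [| 1.5; 3.0; 4.0; 5.0 |]
```

Each output value is the mean of the current element and up to n elements on each side, so isolated extreme values are damped by their neighbors.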
In the last step you can simply sum or average the values to get a summary value of the hydrophobicity, depending on whether you want a length-dependent or length-independent value.
// Length-dependent summary values
Array.sum peptidehydrophobicites
Array.sum peptidehydrophobicites'

// Length-independent summary values
Array.average peptidehydrophobicites
Array.average peptidehydrophobicites'
Isoelectric Point
The isoelectric point (pI) of a protein is the pH at which it carries as many positive as negative charges, so its overall charge is zero. Knowing this value can be useful, for example, for isolating single proteins in a voltage gradient.
The implementation is based on: this document.
In principle, the distinct amino acids in the protein are counted.
By using the Henderson-Hasselbalch equation and the pKr values, the theoretical charge states of the amino acids at a specific pH can be calculated.
Multiplying those charge states by the counts of the associated amino acids and adding those products together then gives the overall charge of the protein. This is only done for the amino acids that can be charged (basic, acidic).
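Written out, the charge sum described above takes roughly the following form. This is a sketch based on the standard Henderson-Hasselbalch charge fractions. The symbols $n_i$ (count of residue type $i$) and $\mathrm{p}K_i$ (its pKr value) are introduced here for illustration only, and the library's implementation may differ in details.

```latex
Q(\mathrm{pH}) =
  \sum_{i \in \text{basic}} \frac{n_i}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_i}}
  \;-\;
  \sum_{j \in \text{acidic}} \frac{n_j}{1 + 10^{\,\mathrm{p}K_j - \mathrm{pH}}}
```

Each basic residue contributes a charge between 0 and +1 and each acidic residue a charge between 0 and -1, so $Q$ decreases monotonically as the pH rises.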
The isoelectric point is the pH value for which this function returns zero. It is found by bisection (also called Binary Search).
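The bisection search can be sketched in plain F# as follows. This is not the code of the IsoelectricPoint module. The function names are hypothetical, the side-chain pK values are illustrative textbook-style numbers rather than the values returned by the library, and only a few residue types are covered.

```fsharp
// Charge fraction of a basic (positive) or acidic (negative) group at a given pH,
// following the Henderson-Hasselbalch equation.
let positive (pK: float) (pH: float) = 1.0 / (1.0 + 10.0 ** (pH - pK))
let negative (pK: float) (pH: float) = -1.0 / (1.0 + 10.0 ** (pK - pH))

// Net charge of a one-letter sequence at a given pH.
// The pK values are illustrative assumptions, not BioFSharp's defaults.
let netCharge (sequence: string) (pH: float) : float =
    sequence
    |> Seq.sumBy (fun aa ->
        match aa with
        | 'K' -> positive 10.5 pH
        | 'R' -> positive 12.5 pH
        | 'H' -> positive 6.0 pH
        | 'D' -> negative 3.9 pH
        | 'E' -> negative 4.1 pH
        | _ -> 0.0)

// Bisection on the pH interval [0, 14]. Returns Some pH once the absolute
// net charge is below `accuracy`, or None if the step budget runs out.
let tryFindPI (accuracy: float) (sequence: string) : float option =
    let rec loop lo hi steps =
        let mid = (lo + hi) / 2.0
        let q = netCharge sequence mid
        if abs q < accuracy then Some mid
        elif steps = 0 then None
        elif q > 0.0 then loop mid hi (steps - 1) // still positive: pI lies at a higher pH
        else loop lo mid (steps - 1)              // negative: pI lies at a lower pH
    loop 0.0 14.0 100

let examplePI = tryFindPI 0.01 "KDEKH"
```

Bisection works here because the net charge only decreases as the pH rises, so the sign of the charge at the midpoint tells which half of the interval contains the pI.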
Disclaimer: Keep in mind that this algorithm ignores post-translational modifications and interactions of the amino acids with each other. Therefore it is only intended to be a rough approximation and should be used as such.
The function for finding the isoelectric point is found in the IsoelectricPoint module. Besides the peptide sequence in the form of an AminoAcidSymbol Seq, it takes
- a mapping function, which maps an AminoAcidSymbol to a float representing the pKr, and
- an accuracy value. The overall charge at the returned pI has to be at least as close to zero as this accuracy value.
//AA sequence
let myProteinForPI =
    "ATPIIEMNYPWTMNIKLSSDACMTNWWPNCMTLKIIA"
    |> Seq.map AminoAcidSymbols.aminoAcidSymbol

//accuracy in z (allowed deviation of the overall charge from zero)
let acc = 0.5

IsoelectricPoint.tryFind IsoelectricPoint.getpKr acc myProteinForPI