NB07a Working with Deedle

Deedle is an easy-to-use library for data and time series manipulation and for scientific programming. It supports structured data frames, ordered and unordered data, and time series.

The analysis of your data in the following notebooks will be mostly done in Deedle, so here are some explanations and examples to help you better understand the analysis notebooks.

We start by loading our usual NuGet packages and the Deedle package.

#r "nuget: Deedle, 2.5.0"
#r "nuget: BioFSharp, 2.0.0-preview.3"
#r "nuget: BioFSharp.IO, 2.0.0-preview.3"
#r "nuget: BioFSharp.Mz, 0.1.5-beta"
#r "nuget: BIO-BTE-06-L-7_Aux, 0.0.10"
#r "nuget: FSharp.Stats, 0.4.2"

#if IPYNB
#r "nuget: Plotly.NET, 4.2.0"
#r "nuget: Plotly.NET.Interactive, 4.2.0"
#endif // IPYNB

open Plotly.NET
open BioFSharp
open BioFSharp.Mz
open BIO_BTE_06_L_7_Aux.FS3_Aux
open BIO_BTE_06_L_7_Aux.Deedle_Aux
open System.IO
open Deedle
open FSharp.Stats

Deedle Basics

Familiarize yourself with Deedle! Create a Series yourself that you add to the Frame 'persons'.

let firstNames      = Series.ofValues ["Kevin"; "Lukas"; "Benedikt"; "Michael"]
let coffeesPerWeek  = Series.ofValues [15; 12; 10; 11] 
let lastNames       = Series.ofValues ["Schneider"; "Weil"; "Venn"; "Schroda"]
let group           = Series.ofValues ["CSB"; "CSB"; "CSB"; "MBS"] 
let persons = 
    Frame.ofColumns(List.zip ["fN"; "lN"; "g"] [firstNames; lastNames; group])
    |> Frame.addCol "cpw" coffeesPerWeek
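
To get a feeling for how such a Frame is laid out, the following minimal sketch builds a two-row Frame in the same way and prints its row and column keys. The names `demoNames`, `demoCups`, and `demoFrame` are made up for illustration; only the Deedle package is assumed to be available.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Series.ofValues only receives values; the keys are generated by Deedle.
let demoNames = Series.ofValues ["Ada"; "Grace"]
let demoCups  = Series.ofValues [3; 7]

// Build the frame column by column, as done for 'persons'.
let demoFrame =
    Frame.ofColumns ["name", demoNames]
    |> Frame.addCol "cups" demoCups

// Inspect the keys that identify rows and columns.
printfn "Row keys:    %A" (demoFrame.RowKeys |> Seq.toList)
printfn "Column keys: %A" (demoFrame.ColumnKeys |> Seq.toList)
```

Looking at the printed row keys of two independently built Frames is a useful first step whenever an operation that combines them behaves unexpectedly.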

Follow the above scheme and create another frame that is exactly the same, but represents different persons (the frame can be small, e.g. two persons). Use the function Frame.merge to combine your frame and 'persons'. Does it work? If not, why?
Hint: Think about how Frames are built. What could be the reason why exactly THOSE Frames won't merge?

Back to the Frame 'persons'! Below you see a sequence of Frame and Series manipulations.
Look at how the Frames and Series change from step to step. To display a Frame, use the functions formatAsTable and Chart.withSize to convert it into a Plotly table. For a Series, call .Print() on the object.

let coffeePerWeek' : Series<int,int> = Frame.getCol ("cpw") persons 
let groupedByG : Frame<string*int,_> = persons |> Frame.groupRowsBy "g"
let withOutG : Frame<string*int,_> = groupedByG |> Frame.sliceCols ["fN"; "lN"; "cpw"]
let coffeePerWeek'' : Series<string*int,int>= groupedByG |> Frame.getCol ("cpw")
let coffeePerWeekPerGroup = Series.applyLevel Pair.get1Of2 (Series.values >> Seq.sum) coffeePerWeek''
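
The last step above, Series.applyLevel, can be hard to picture. The following is a minimal sketch on a hand-written Series whose keys are (group, row index) tuples, the shape that grouping produces. The names `cups` and `cupsPerGroup` and the numbers are assumptions chosen for illustration.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Hypothetical series keyed by (group, original row index).
let cups : Series<string * int, int> =
    series [ ("CSB", 0), 15; ("CSB", 1), 12; ("MBS", 2), 11 ]

// Pair.get1Of2 selects the first key level (the group);
// all values that share a group are passed to the aggregator and summed.
let cupsPerGroup =
    cups |> Series.applyLevel Pair.get1Of2 (Series.values >> Seq.sum)

// One entry per group remains: CSB sums to 27, MBS to 11.
cupsPerGroup.Print()
```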

Now that you have gotten to know the Frame object, which is a collection of Series, we move on to a real dataset. As our dataset we take the FASTA file with Chlamy proteins, select the first 50 proteins, and digest them. The digested peptides are represented using a record type. Deedle Frames can be constructed directly from records with Frame.ofRecords. Alternatively, a character-separated file could be used as the source for a Frame.

let path = Path.Combine[|__SOURCE_DIRECTORY__;"downloads/Chlamy_JGI5_5(Cp_Mp).fasta"|]
downloadFile path "Chlamy_JGI5_5(Cp_Mp).fasta" "bio-bte-06-l-7"

let examplePeptides = 
    path
    |> IO.FastA.fromFile BioArray.ofAminoAcidString
    |> Seq.toArray
    |> Array.take 50
    |> Array.mapi (fun i fastAItem ->
        Digestion.BioArray.digest Digestion.Table.Trypsin i fastAItem.Sequence
        |> Digestion.BioArray.concernMissCleavages 0 0 
        |> Array.map (fun dp ->
            {|
                PeptideSequence = dp.PepSequence
                Protein = fastAItem.Header.Split ' ' |> Array.head
            |}
        )
    )
    |> Array.concat
    |> Array.filter (fun x -> x.PeptideSequence.Length > 5)

let peptidesFrame =
    examplePeptides
    |> Frame.ofRecords
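
Frame.ofRecords can also be tried out without downloading the FASTA file. The sketch below uses a hypothetical record type `DemoPeptide` with made-up entries; it is only meant to show that each record field becomes a column and each record becomes a row.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Hypothetical record type; each field becomes one column of the frame.
type DemoPeptide = { Sequence : string; Protein : string; Length : int }

// Made-up entries, not taken from the Chlamy FASTA file.
let demoPeptides =
    [ { Sequence = "AAAK";  Protein = "P1"; Length = 4 }
      { Sequence = "CCCCR"; Protein = "P2"; Length = 5 } ]

let demoPeptideFrame = Frame.ofRecords demoPeptides

// Column keys come from the field names, row keys are generated by Deedle.
printfn "Column keys: %A" (demoPeptideFrame.ColumnKeys |> Seq.toList)
printfn "Row keys:    %A" (demoPeptideFrame.RowKeys |> Seq.toList)
```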

As you can see, our columns are named after the fields of the record type, while our rows are indexed by numbers only. It is often helpful to use a more descriptive row key. In this case, we can use the peptide sequence for that.
Note: Row keys must be unique. By grouping with "PeptideSequence", we get the sequence tupled with the original index as the row key. The function Frame.reduceLevel then aggregates the rows based on the first part of the tuple, the peptide sequence, and ignores the second part, the index. The aggregator function given to Frame.reduceLevel is applied to each column separately. Here, the protein names of all rows that share a peptide sequence are concatenated with a comma.

let pfIndexedSequenceList : Frame<list<AminoAcids.AminoAcid>,string> =
    peptidesFrame
    |> Frame.groupRowsBy "PeptideSequence"
    |> Frame.dropCol "PeptideSequence"
    |> Frame.reduceLevel fst (fun a b -> a + "," + b)
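
The grouping and reduction can be followed more easily on a tiny Frame. The following sketch mirrors the pipeline above, with plain strings standing in for the amino acid lists; the names `demo` and `bySequence` and all values are assumptions for illustration.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Made-up frame in which the sequence "AAK" occurs in two proteins.
let demo =
    Frame.ofColumns [
        "Sequence", Series.ofValues ["AAK"; "AAK"; "CCR"]
        "Protein",  Series.ofValues ["P1"; "P2"; "P3"] ]

let bySequence : Frame<string, string> =
    demo
    // Row keys become (sequence, original index) tuples.
    |> Frame.groupRowsBy "Sequence"
    // The sequence now lives in the key, so the column is redundant.
    |> Frame.dropCol "Sequence"
    // Collapse rows with the same sequence by joining their protein names.
    |> Frame.reduceLevel fst (fun a b -> a + "," + b)

// The row "AAK" now holds "P1,P2", the row "CCR" holds "P3".
bySequence.Print()
```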

Our rows are now indexed by the peptide sequences. Each peptide sequence is still a list of amino acids. For better readability we can transform it to its string representation. To do so, we map over the row keys, much like mapping over an array, and call the function BioList.toString on each row key.

let pfIndexedStringSequence =
    pfIndexedSequenceList
    |> Frame.mapRowKeys (fun rc -> rc |> BioList.toString)
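
As a stand-alone illustration of Frame.mapRowKeys, the sketch below uses lists of characters as a stand-in for lists of amino acids and converts each key to a string. The names `charKeyed` and `stringKeyed` are hypothetical, and the string conversion replaces BioList.toString only for the purpose of this example.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Made-up frame whose row keys are character lists.
let charKeyed =
    Frame.ofColumns [ "Protein", series [ ['A'; 'K'], "P1"; ['C'; 'R'], "P2" ] ]

// Convert every row key to a string; the mapped keys must stay unique.
let stringKeyed =
    charKeyed
    |> Frame.mapRowKeys (fun chars -> chars |> List.map string |> String.concat "")

printfn "%A" (stringKeyed.RowKeys |> Seq.toList)  // ["AK"; "CR"]
```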

We now have a Frame containing information about our peptides. To add further information, we can go back to the peptide array we started with and calculate, for example, the monoisotopic mass of each peptide. Each mass is tupled with the peptide sequence as a string, so that the keys match the row keys of our peptide Frame. The resulting array of tuples can then be transformed into a Series.

let peptidesAndMasses =
    examplePeptides
    |> Array.distinctBy (fun x -> x.PeptideSequence)
    |> Array.map (fun peptide ->
        // calculate mass for each peptide
        peptide.PeptideSequence |> BioList.toString, BioSeq.toMonoisotopicMassWith (BioItem.monoisoMass ModificationInfo.Table.H2O) peptide.PeptideSequence
        )

let peptidesAndMassesSeries =
    peptidesAndMasses
    |> series

The columns in Frames consist of Series. Since we now have a series containing our monoisotopic masses, together with the peptide sequence, we can simply add it to our Frame and give the column a name.

let pfAddedMass =
    pfIndexedStringSequence
    |> Frame.addCol "Mass" peptidesAndMassesSeries
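
Adding a column works because Deedle aligns the new Series with the row keys of the Frame. The following sketch, with made-up keys and placeholder numbers instead of real masses, shows what happens when the keys only partly match; `demoFrame`, `demoMass`, and `withMass` are hypothetical names.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

let demoFrame =
    Frame.ofColumns [ "Protein", series [ "AAK", "P1"; "CCR", "P2" ] ]

// Placeholder values, not real monoisotopic masses.
let demoMass = series [ "AAK", 100.0; "DDK", 200.0 ]

// The frame keeps its own row keys: "CCR" gets a missing value in "Mass",
// and the entry for "DDK" is not added because the frame has no such row.
let withMass = demoFrame |> Frame.addCol "Mass" demoMass

withMass.Print()
```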

Alternatively, we can take a column from our Frame, apply a function to it, and create a new frame from the Series.

let pfChargedMass =
    pfAddedMass
    |> Frame.getCol "Mass"
    |> Series.mapValues (fun mass -> Mass.toMZ mass 2.)
    |> fun s -> ["Mass Charge 2", s]
    |> Frame.ofColumns
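
For orientation, the conversion from a neutral mass to a mass-to-charge ratio can be sketched in a few lines. This is a stand-in under the usual assumption that a peptide with charge z carries z additional protons; `protonMass` and `toMzSketch` are hypothetical names and not BioFSharp's implementation.

```fsharp
// Assumed proton mass in unified atomic mass units (approximate value).
let protonMass = 1.00727646688

// Hypothetical helper: m/z of a molecule with neutral mass `mass`
// that carries `z` protons.
let toMzSketch (mass : float) (z : float) =
    (mass + z * protonMass) / z

// A placeholder mass of 1000.0 at charge 2 gives (1000 + 2 * protonMass) / 2.
printfn "%f" (toMzSketch 1000.0 2.0)
```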

The new Frame has the same row keys as our previous Frame. The information from our new Frame can be joined with our old Frame by using Frame.join. Frame.join is similar to Frame.addCol, but can join whole Frames at once instead of single columns.

let joinedFrame =
    pfAddedMass
    |> Frame.join JoinKind.Left pfChargedMass
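
In the example above both Frames share the same row keys, so the kind of join makes no difference. The sketch below uses two made-up Frames with only partly overlapping keys to show how JoinKind changes the result; the names and numbers are placeholders for illustration.

```fsharp
#r "nuget: Deedle, 2.5.0"
open Deedle

// Two hypothetical frames with different column names and partly shared row keys.
let left  = Frame.ofColumns [ "Mass", series [ "AAK", 100.0; "CCR", 200.0 ] ]
let right = Frame.ofColumns [ "Mz",   series [ "AAK", 51.0;  "DDK", 76.0 ] ]

let leftJoin  = Frame.join JoinKind.Left  left right  // row keys of 'left': AAK, CCR
let innerJoin = Frame.join JoinKind.Inner left right  // shared row keys only: AAK
let outerJoin = Frame.join JoinKind.Outer left right  // all row keys: AAK, CCR, DDK

printfn "%d %d %d" leftJoin.RowCount innerJoin.RowCount outerJoin.RowCount
```

Note that Frame.join expects the two Frames to have distinct column keys, which is why the columns are named differently here.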