If your data is in a CSV file, the first question is whether the file already contains the columns needed for indexing. If it does, you can use readAndIndexFrame. If it does not, first read the frame with readFrame, add the columns needed for indexing, and then index with indexWithColumnValues.
To aggregate, you need to index with at least 2 columns, because several rows can then share the same Key value in one column, which is what the aggregation groups on. Each additional aggregation requires one additional indexing column. Keep in mind that you can use the functions of the GroupTransform module as often as you want, because they do not change the row keys. This example uses 2 aggregations because the data has biological and technical replicates.
open Deedle
open Drafo.Core
open NumericFilter.NumericFilter
open NumericAggregation.NumericAggregation

// Two numeric data columns plus three columns that will be used for indexing.
let frameForTutorial =
    frame [
        ("ConditionA") => series ["row1" => 2.; "row2" => 3.; "row3" => 1.; "row4" => 7.]
        ("ConditionB") => series ["row1" => 2.4; "row2" => 4.5; "row3" => 6.1; "row4" => 5.1]
    ]
    |> Frame.addCol "Gen" (series ["row1" => "A"; "row2" => "A"; "row3" => "A"; "row4" => "A"])
    |> Frame.addCol "technicalReplicate" (series ["row1" => "B"; "row2" => "B"; "row3" => "C"; "row4" => "C"])
    |> Frame.addCol "BioRep" (series ["row1" => "D"; "row2" => "E"; "row3" => "D"; "row4" => "E"])
// Replace the row keys with keys built from the three indexing columns.
let indexedTutorialFrame = indexWithColumnValues ["Gen"; "technicalReplicate"; "BioRep"] frameForTutorial
indexedTutorialFrame.Print()
ConditionA ConditionB Gen technicalReplicate BioRep
Gen: A
technicalReplicate: B
BioRep: D
-> 2 2.4 A B D
Gen: A
technicalReplicate: B
BioRep: E
-> 3 4.5 A B E
Gen: A
technicalReplicate: C
BioRep: D
-> 1 6.1 A C D
Gen: A
technicalReplicate: C
BioRep: E
-> 7 5.1 A C E
!!!!
Keep in mind that the functions working on the whole frame will raise an error if the frame contains strings as values when you use the NumericAggregation functions, or ints/floats when you use the StringAggregation functions.
So remove those values/columns first, or apply the single-column versions several times.
!!!!
let indexedFrameWithoutIndexingColumns =
    indexedTutorialFrame
    |> Frame.dropCol "Gen"
    |> Frame.dropCol "technicalReplicate"
    |> Frame.dropCol "BioRep"
indexedFrameWithoutIndexingColumns.Print()
ConditionA ConditionB
Gen: A
technicalReplicate: B
BioRep: D
-> 2 2.4
Gen: A
technicalReplicate: B
BioRep: E
-> 3 4.5
Gen: A
technicalReplicate: C
BioRep: D
-> 1 6.1
Gen: A
technicalReplicate: C
BioRep: E
-> 7 5.1
As you can see, each row key consists of several parts, and every combination of parts is unique.
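To make the role of the key parts concrete, here is a minimal sketch in plain F# that uses neither Drafo nor Deedle. The record type RowKey and the helper dropBioRep are hypothetical stand-ins for the library's Key type and are not part of its API. The sketch checks that the full three-part keys are unique, and that dropping the BioRep part leaves keys that repeat and can therefore be aggregated.

```fsharp
// Hypothetical stand-in for the library's Key type (assumption, not the Drafo API).
type RowKey = { Gen: string; TechnicalReplicate: string; BioRep: string }

// The four row keys of the tutorial frame.
let fullKeys =
    [ { Gen = "A"; TechnicalReplicate = "B"; BioRep = "D" }
      { Gen = "A"; TechnicalReplicate = "B"; BioRep = "E" }
      { Gen = "A"; TechnicalReplicate = "C"; BioRep = "D" }
      { Gen = "A"; TechnicalReplicate = "C"; BioRep = "E" } ]

// With all three parts, every key is distinct.
let allUnique = (List.distinct fullKeys).Length = fullKeys.Length

// Hypothetical helper: keep only the (Gen, technicalReplicate) parts.
let dropBioRep (k: RowKey) = (k.Gen, k.TechnicalReplicate)

// Count how many rows share each reduced key.
let rowsPerReducedKey = fullKeys |> List.map dropBioRep |> List.countBy id

printfn "all full keys unique: %b" allUnique
printfn "rows per reduced key: %A" rowsPerReducedKey
```

Each reduced key is shared by two rows, one per biological replicate, and these are the rows an aggregation over BioRep would combine.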
You can now create a filter with the functions of either the NumericFilter or the GroupFilter module.
When using the resulting series or seq of series, keep in mind that the functions in the NumericAggregation and StringAggregation modules need a seq of series for each column, so the functions that work on the whole frame need a seq of seq of series. Getting this wrong can result in an error (in particular, an empty filter is not suitable for the functions working on the whole frame).
Once you have a suitable seq of series or seq of seq of series, you can use the aggregation module of choice.
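The nesting described above can be sketched in plain F#. This is only an illustration of the shape: bool lists stand in for the Series<Key,bool> filters, and pairFilters is a hypothetical helper name, not a Drafo function. The outer sequence has one entry per data column, and each entry holds all filters for that column.

```fsharp
// Stand-ins for two filter results (assumption: plain bool lists replace
// Series<Key,bool>). Each outer list has one entry per data column.
let filterOne = [ [ true; true; false; true ]; [ true; false; true; true ] ]
let filterTwo = [ [ true; true; true; true ]; [ false; true; true; true ] ]

// Hypothetical helper: pair the filters of both results column by column,
// giving one inner sequence of filters per column.
let pairFilters (a: seq<'T>) (b: seq<'T>) : seq<seq<'T>> =
    Seq.map2 (fun x y -> seq [ x; y ]) a b

let perColumn = pairFilters filterOne filterTwo

// One entry per column, two filters in each entry.
printfn "columns: %d" (Seq.length perColumn)
perColumn |> Seq.iter (fun fs -> printfn "filters for this column: %d" (Seq.length fs))
```

Seq.map2 stops at the shorter input, so both filter results should cover the same columns in the same order.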
Back to our example: we use filters that are true for every value, aggregate with Mean, and pass the seq ["Gen";"technicalReplicate"] as the key parts to keep, so the rows are aggregated over BioRep.
let filterA = numericFilterServeralCol (IsBiggerThan 0.5) indexedFrameWithoutIndexingColumns
let filterB = numericFilterServeralCol (IsSmallerThan 100.) indexedFrameWithoutIndexingColumns
// Pair the filters column by column: entry i holds the i-th filter of both inputs.
let seqOfFilterMulty filterOne filterTwo =
    filterOne
    |> Seq.mapi (fun i x ->
        let colKeysForSeq seqIt b =
            seqIt
            |> Seq.item b
        let cIII = seq [x; colKeysForSeq filterTwo i]
        cIII)
let aggregatedFrameA = numAgAllCol Mean indexedFrameWithoutIndexingColumns ["Gen"; "technicalReplicate"] (seqOfFilterMulty filterA filterB)
aggregatedFrameA.Print()
ConditionA ConditionB
Gen: A
technicalReplicate: B
-> 2.5 3.45
Gen: A
technicalReplicate: C
-> 4 5.6
As you can see, the BioRep part of the keys was dropped, and rows whose remaining Keys were identical were aggregated.
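The aggregated values can be reproduced by hand. The following plain F# sketch is an illustration only and does not use the library: the rows are written as tuples instead of an indexed frame, grouped on the key parts that are kept, and averaged per condition.

```fsharp
// Values of the tutorial frame as plain tuples (assumption: tuples replace the
// indexed frame): (Gen, technicalReplicate, BioRep, ConditionA, ConditionB).
let rows =
    [ ("A", "B", "D", 2.0, 2.4)
      ("A", "B", "E", 3.0, 4.5)
      ("A", "C", "D", 1.0, 6.1)
      ("A", "C", "E", 7.0, 5.1) ]

// Group on the key parts that are kept, then average each condition.
let meansByReducedKey =
    rows
    |> List.groupBy (fun (gen, tech, _, _, _) -> (gen, tech))
    |> List.map (fun (key, group) ->
        let meanA = group |> List.averageBy (fun (_, _, _, a, _) -> a)
        let meanB = group |> List.averageBy (fun (_, _, _, _, b) -> b)
        (key, meanA, meanB))

for ((gen, tech), meanA, meanB) in meansByReducedKey do
    printfn "Gen: %s, technicalReplicate: %s -> %.2f %.2f" gen tech meanA meanB
```

The means agree with the printed frame: 2.5 and 3.45 for technical replicate B, and 4 and 5.6 for technical replicate C.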
Next we take the median over the technicalReplicate part. For that we need new filters, built from the aggregated frame, and adjusted input parameters.
let filterC = numericFilterServeralCol (IsBiggerThan 0.5) aggregatedFrameA
let filterD = numericFilterServeralCol (IsSmallerThan 100.) aggregatedFrameA
let aggregatedFrameB = numAgAllCol Median aggregatedFrameA ["Gen"] (seqOfFilterMulty filterC filterD)
aggregatedFrameB.Print()
ConditionA ConditionB
Gen: A
-> 3.25 4.525
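This result can also be checked by hand. The median helper in the plain F# sketch below is a hypothetical illustration, not the library's implementation. It is applied to the per-condition means of the two technical replicates.

```fsharp
// Median of a non-empty list: the middle value, or the mean of the two
// middle values when the number of values is even.
let median (values: float list) =
    let sorted = List.sort values
    let n = sorted.Length
    if n % 2 = 1 then sorted.[n / 2]
    else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0

// Means of technical replicates B and C from the first aggregation.
let conditionA = [ 2.5; 4.0 ]
let conditionB = [ 3.45; 5.6 ]

printfn "ConditionA median: %g" (median conditionA)
printfn "ConditionB median: %g" (median conditionB)
```

With only two values per group, the median is simply their mean, which is why ConditionA gives 3.25 and ConditionB gives 4.525.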