BioFSharp


FastQ format


This module parses FASTQ data in the original four-line entry format into the following record type:

/// FastqItem record contains header, sequence, qualityheader, qualitysequence of one entry
type FastqItem<'a,'b> = {
    Header          : string
    Sequence        : 'a
    QualityHeader   : string
    QualitySequence : 'b      
}

To use this parser you need to define two converter functions: one for the sequence and one for the quality sequence. The module provides an example of each, but you may need to write your own.

The sequence string can be converted to an array of optional amino acids using the converter function from our library module 'BioFSharp.BioItemsConverter.OptionConverter':

open FSharpAux // provides String.toCharArray

/// get characters as sequence units
let converterToAA (str: string) =
    str
    |> String.toCharArray
    |> Array.map BioFSharp.BioItemsConverter.OptionConverter.charToOptionAminoAcid

If your quality sequence uses the Sanger format, which encodes Phred quality scores from 0 to 93 as the ASCII characters 33 to 126 (!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~), you can use the following conversion function:

/// get Phred quality score
let qualityConvertFn (str: string) =
    str
    |> String.toCharArray
    // the character code minus the Sanger offset of 33 gives the Phred score
    |> Array.map (fun c -> int c - 33)
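As a small worked illustration of the Sanger encoding, the sketch below decodes a short, made-up quality string using only the F# core library and converts each score to an error probability via the standard Phred relation P = 10^(-Q/10). The names `phredOffset`, `decodeSanger`, and `errorProbability` are hypothetical helpers for this example and are not part of BioFSharp.

```fsharp
/// ASCII offset of the Sanger (Phred+33) encoding
let phredOffset = 33

/// Decode a Sanger-encoded quality string into Phred scores
let decodeSanger (quality: string) : int [] =
    quality.ToCharArray()
    |> Array.map (fun c -> int c - phredOffset)

/// Probability that a base call with Phred score q is wrong: 10^(-q/10)
let errorProbability (q: int) : float =
    10.0 ** (-(float q) / 10.0)

// Made-up quality string: '!' = 33, '5' = 53, 'I' = 73, '~' = 126
let scores = decodeSanger "!5I~"
// scores = [|0; 20; 40; 93|]

let probabilities = scores |> Array.map errorProbability
```

The lowest character '!' maps to a score of 0 and the highest character '~' to 93, which matches the range stated above.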

You can then use this module to read your FASTQ file:

open BioFSharp.IO // the FastQ module lives in this namespace

let yourFastqFile = __SOURCE_DIRECTORY__ + "/data/FastQtest.fastq"

let FastQSequence = 
    FastQ.fromFile converterToAA qualityConvertFn yourFastqFile
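The result is a sequence of `FastqItem` records, so the usual `Seq` functions apply to it. As a rough sketch of downstream use, the snippet below computes the mean Phred score per entry. To keep it runnable without BioFSharp, it redefines a local record with the same shape as the one shown above and uses two made-up entries instead of a parsed file; `meanQuality` and `exampleItems` are hypothetical names, not library functions.

```fsharp
// Local stand-in with the same shape as the FastqItem record shown above,
// so that this sketch runs without referencing BioFSharp.
type FastqItem<'a,'b> = {
    Header          : string
    Sequence        : 'a
    QualityHeader   : string
    QualitySequence : 'b
}

/// Mean Phred score of one entry (hypothetical helper).
/// Assumes a non-empty quality array; Array.averageBy throws on empty input.
let meanQuality (item: FastqItem<'a, int []>) : float =
    item.QualitySequence |> Array.averageBy float

// Two made-up entries standing in for the parsed file content
let exampleItems =
    seq {
        yield { Header = "read1"; Sequence = "ACGT"; QualityHeader = ""; QualitySequence = [|40; 40; 38; 30|] }
        yield { Header = "read2"; Sequence = "ACGT"; QualityHeader = ""; QualitySequence = [|2; 5; 10; 3|] }
    }

// Pair each header with its mean quality
let meanQualities =
    exampleItems
    |> Seq.map (fun item -> item.Header, meanQuality item)
    |> Seq.toList
// meanQualities = [("read1", 37.0); ("read2", 5.0)]
```

With the real parser, `exampleItems` would be replaced by the sequence returned from `FastQ.fromFile`, provided the quality converter yields an `int []` as `qualityConvertFn` does.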