Fastq parsing
Summary: This example shows how to parse and write FASTQ-formatted files with BioFSharp.
The FastQ module parses FASTQ data in the original four-line-per-entry layout into the following record type:
/// FastqItem record contains header, sequence, qualityheader, qualitysequence of one entry
type FastqItem<'a,'b> = {
    Header          : string
    Sequence        : 'a
    QualityHeader   : string
    QualitySequence : 'b
}
To use this parser you need to supply two converter functions: one for the sequence line and one for the quality line. Our module contains an example of each, but you may need to write your own.
If your quality sequences use the Sanger format, which encodes Phred quality scores from 0 to 93 as the ASCII characters 33 (`!`) to 126 (`~`), you can use our converter function:
/// Get Phred quality scores from a Sanger-encoded (ASCII offset 33) quality string
let qualityConvertFn (qualityString: string) =
    qualityString.ToCharArray()
    |> Array.map (fun c -> int c - 33)
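As a quick check of the arithmetic, the sketch below repeats the converter and applies it to a short, made-up quality string. The helper `qualityConvertWithOffset` is hypothetical and not part of BioFSharp; it only illustrates how a converter you write yourself could handle an encoding with a different ASCII offset, assuming for illustration an offset of 64 as used by some older Illumina pipelines.

```fsharp
/// Phred+33 (Sanger) converter, same logic as the function above.
let qualityConvertFn (qualityString: string) =
    qualityString.ToCharArray()
    |> Array.map (fun c -> int c - 33)

/// Hypothetical generalisation (not a BioFSharp function):
/// subtracts an arbitrary ASCII offset instead of the fixed 33.
let qualityConvertWithOffset (offset: int) (qualityString: string) =
    qualityString.ToCharArray()
    |> Array.map (fun c -> int c - offset)

// '!' is ASCII 33, 'I' is ASCII 73, '~' is ASCII 126
let sangerScores = qualityConvertFn "!I~"
printfn "%A" sangerScores    // [|0; 40; 93|]

// '@' is ASCII 64, 'h' is ASCII 104
let offset64Scores = qualityConvertWithOffset 64 "@h"
printfn "%A" offset64Scores  // [|0; 40|]
```

Because the quality converter is passed in as an ordinary function, either of these can be supplied as the second argument of the parser, as long as it matches the encoding of your file.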
You can then use this module to read your FASTQ file:
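open BioFSharp
open BioFSharp.IO

// Path to the example file, relative to this script
let yourFastqFile = (__SOURCE_DIRECTORY__ + "/data/FastQtest.fastq")

// First argument converts the sequence line, second converts the quality line
let FastQSequence =
    FastQ.fromFile BioArray.ofAminoAcidString qualityConvertFn yourFastqFile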
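To make the role of the two converters more concrete, here is a minimal, self-contained sketch of four-line parsing. It is not BioFSharp's implementation: `parseFastqLines` and the sample lines are hypothetical, the record is a local copy of the shape shown earlier, and the sketch assumes well-formed input with no blank lines and keeps headers verbatim.

```fsharp
/// Local copy of the record shape, so the sketch runs without BioFSharp.
type FastqItem<'a,'b> = {
    Header          : string
    Sequence        : 'a
    QualityHeader   : string
    QualitySequence : 'b
}

/// Hypothetical helper: groups lines into blocks of four and
/// applies the two converters to the sequence and quality lines.
let parseFastqLines (converter: string -> 'a) (qualityConverter: string -> 'b) (lines: seq<string>) =
    lines
    |> Seq.chunkBySize 4
    |> Seq.filter (fun block -> block.Length = 4)   // skip an incomplete trailing block
    |> Seq.map (fun block ->
        { Header          = block.[0]
          Sequence        = converter block.[1]
          QualityHeader   = block.[2]
          QualitySequence = qualityConverter block.[3] })

// Made-up input with two entries
let exampleLines =
    [ "@read1"; "ACGT"; "+"; "IIII"
      "@read2"; "GGCA"; "+"; "!!II" ]

// Phred+33 quality converter; sequences are kept as plain strings via id
let phred33 (q: string) = q.ToCharArray() |> Array.map (fun c -> int c - 33)

let items = parseFastqLines id phred33 exampleLines |> Seq.toList

items
|> List.iter (fun item -> printfn "%s %s %A" item.Header item.Sequence item.QualitySequence)
// @read1 ACGT [|40; 40; 40; 40|]
// @read2 GGCA [|0; 0; 40; 40|]
```

The type parameters `'a` and `'b` of the record are fixed by whatever the two converters return, which is why the sketch yields `FastqItem<string, int array>` values here.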