BioFSharp
BioFSharp aims to be a user-friendly functional library for bioinformatics written in F#. It contains the basic data structures for common biological objects like amino acids and nucleotides based on chemical formulas and chemical elements.
BioFSharp facilitates working with sequences in a strongly typed way and is designed to work well with F# Interactive. It provides a variety of parsers for many biological file formats and a variety of algorithms suited for bioinformatic workflows.
The core data model implements, in ascending hierarchical order:
- Chemical elements and formulas, which are collections of elements
- Amino acids, nucleotides and modifications, which all implement the common IBioItem interface
- BioCollections (BioArray, BioList, BioSeq) as representations of biological sequences
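To make the layering concrete, here is a deliberately simplified sketch in plain F# with no package references. The types `Element`, `Formula`, `IBioItemSketch` and `AminoAcidSketch` and the functions `formulaMass` and `sequenceMass` are hypothetical illustrations of the idea, not BioFSharp's actual definitions. The element masses are standard monoisotopic values.

```fsharp
// Hypothetical, simplified model of the three layers described above.
// Illustrative types only, NOT BioFSharp's own definitions.

/// Layer 1a: a chemical element with its monoisotopic mass (Da)
type Element = { Symbol: string; MonoMass: float }

/// Layer 1b: a formula is a collection of elements with their counts
type Formula = Map<Element, int>

let C = { Symbol = "C"; MonoMass = 12.0 }
let H = { Symbol = "H"; MonoMass = 1.00782503207 }
let N = { Symbol = "N"; MonoMass = 14.0030740048 }
let O = { Symbol = "O"; MonoMass = 15.99491461956 }

/// Sum of element masses weighted by their counts
let formulaMass (f: Formula) =
    f |> Map.fold (fun acc element count -> acc + element.MonoMass * float count) 0.0

/// Layer 2: a shared interface for amino acids, nucleotides, modifications, ...
type IBioItemSketch =
    abstract Symbol : char
    abstract Name : string
    abstract Formula : Formula

/// A toy amino acid type with two residues implementing the shared interface
type AminoAcidSketch =
    | Gly
    | Ala
    interface IBioItemSketch with
        member this.Symbol = match this with Gly -> 'G' | Ala -> 'A'
        member this.Name = match this with Gly -> "Glycine" | Ala -> "Alanine"
        // Residue formulas (free amino acid minus H2O)
        member this.Formula : Formula =
            match this with
            | Gly -> Map [ C, 2; H, 3; N, 1; O, 1 ]
            | Ala -> Map [ C, 3; H, 5; N, 1; O, 1 ]

/// Layer 3: a sequence is a collection of bio items; its mass follows from their formulas
let sequenceMass (items: seq<#IBioItemSketch>) =
    items |> Seq.sumBy (fun item -> formulaMass (item :> IBioItemSketch).Formula)

printfn "%.4f" (sequenceMass [ Gly; Ala ]) // prints 128.0586
```

The point of the layering is that anything implementing the shared interface can be summed, printed or compared by generic sequence code without knowing whether it is an amino acid or a nucleotide.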
Installation
From NuGet.org:
You can get the stable versions of all BioFSharp packages from NuGet, either via the Package Manager Console or via paket:
Install-Package BioFSharp
paket add BioFSharp
All associated packages can be found here
Prerelease packages from the nuget branch:
Unstable/Experimental packages only.
If you are using paket, add the following line to your paket.dependencies file:
git https://github.com/CSBiology/BioFSharp.git nuget Packages: /
You can then reference the individual packages:
nuget BioFSharp
nuget BioFSharp.BioContainers
nuget BioFSharp.IO
nuget BioFSharp.Stats
nuget BioFSharp.ML
nuget BioFSharp.BioDB
nuget BioFSharp.Vis
Build the binaries yourself:
Windows:
- Install .NET Core SDK 3.0+
- Go to the project folder
- Run .\build.cmd
Linux (using Mono):
- BioDB is excluded from this build.
- Install .NET Core SDK
- Go to the project folder
- Run ./build.sh -t monoBuildChainLocal
Linux (.NET Core only):
- This only builds projects targeting netstandard2.0 (Core, BioContainers, IO, Stats, ML).
- Install .NET Core SDK
- Go to the project folder
- Run ./build.sh -t dotnetBuildChainLocal
Example
The following example shows how easy it is to start working with sequences:
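A minimal sketch, assuming an F# script run with `dotnet fsi` (F# 5 or later, which supports `#r "nuget: ..."` package references) and the `BioSeq.ofAminoAcidString` and `BioArray.ofNucleotideString` parsers. Function names may differ between package versions, so check the API reference of your installed version; a specific version can be pinned with `#r "nuget: BioFSharp, <version>"`.

```fsharp
// Reference BioFSharp from NuGet (requires an F# 5+ `dotnet fsi` session)
#r "nuget: BioFSharp"
open BioFSharp

// Parse one-letter codes into a lazily evaluated sequence of amino acids
let peptide = "PEPTIDE" |> BioSeq.ofAminoAcidString

// Parse a nucleotide string into a BioArray (an array of nucleotides)
let dna = "ATGGCC" |> BioArray.ofNucleotideString

printfn "%A" peptide
printfn "%A" dna
```

Because each character is parsed into a typed amino acid or nucleotide, mixing up protein and DNA sequences becomes a compile-time error rather than a silent bug.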
BioFSharp comes equipped with a broad range of features and functions to map amino acids and nucleotides.
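As a short illustration, the sketch below assumes the `Nucleotides.complement` and `AminoAcids.monoisoMass` functions from the BioFSharp API reference; verify the names against your installed version.

```fsharp
#r "nuget: BioFSharp"
open BioFSharp

// Base on the anti-parallel strand: G pairs with C
let paired = Nucleotides.G |> Nucleotides.complement

// Monoisotopic mass of an arginine residue (free amino acid minus H2O), in Da
let argMass = AminoAcids.Arg |> AminoAcids.monoisoMass

printfn "%A %f" paired argMass
```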
The file readers and writers in BioFSharp make it easy to retrieve information from, and write, biology-associated file formats such as FastA:
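A self-contained sketch, assuming the `FastA.fromFile` reader from the BioFSharp.IO package, which takes a converter for the sequence characters and a file path. To keep the example runnable it first writes a tiny FastA file; the file name `example.fasta` and its contents are made up for illustration.

```fsharp
#r "nuget: BioFSharp"
#r "nuget: BioFSharp.IO"
open System.IO
open BioFSharp
open BioFSharp.IO

// Write a tiny two-entry FastA file (hypothetical example data)
let path = Path.Combine(Path.GetTempPath(), "example.fasta")
File.WriteAllLines(path, [| ">seq1 example peptide"; "PEPT"; "IDE"; ">seq2"; "ACDK" |])

// The converter turns each entry's characters into a BioArray of amino acids;
// fromFile reads lazily, so Seq.toList forces the whole file to be parsed
let fastaItems =
    FastA.fromFile BioArray.ofAminoAcidString path
    |> Seq.toList

fastaItems |> List.iter (fun item -> printfn "%s: %A" item.Header item.Sequence)
```

Passing the converter as an argument lets the same reader produce amino acid or nucleotide sequences, for example by using `BioArray.ofNucleotideString` instead.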
This will return a sequence of FastaItems, in which each sequence is represented as a BioArray of amino acids that you can start working with directly.
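For instance, a single entry can be taken from the lazily read sequence and its residues processed like any other array. This sketch assumes the `Header` and `Sequence` fields of a FastaItem and the `AminoAcids.monoisoMass` function; the file `example2.fasta` is hypothetical test data written by the script itself.

```fsharp
#r "nuget: BioFSharp"
#r "nuget: BioFSharp.IO"
open System.IO
open BioFSharp
open BioFSharp.IO

// Hypothetical single-entry FastA file
let path = Path.Combine(Path.GetTempPath(), "example2.fasta")
File.WriteAllLines(path, [| ">seq1 example peptide"; "PEPTIDE" |])

// Take the first entry of the lazily read file
let first = FastA.fromFile BioArray.ofAminoAcidString path |> Seq.head

// Header text and the parsed amino acid array
let header = first.Header
let residues = first.Sequence

// Sum of residue masses (no terminal water added)
let residueMass = residues |> Array.sumBy AminoAcids.monoisoMass

printfn "%s: %d residues, %.4f Da" header residues.Length residueMass
```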
For more detailed examples, continue to explore the BioFSharp documentation. In the near future we will start to provide a cookbook-like tutorial in the CSBlog.
Contributing and copyright
The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you are adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.
The library is available under the OSI-approved MIT license. For more information see the License file in the GitHub repository.