Interested? Contact muehlhaus@bio.uni-kl.de or ottj@rhrk.uni-kl.de
Sequencing by Hybridization (SBH) involves building a miniature DNA array (or DNA chip), that contains thousands of short DNA fragments (probes). Each of these probes gives information about the presence
of a known, but short, sequence in the unknown DNA sequence. All these pieces of information together should reveal the identity of the target DNA sequence.
Given a short probe (an 8- to 30-nucleotide single-stranded synthetic DNA fragment) and a single-stranded target DNA fragment, the target will hybridize
with the probe if the probe is a substring of the target's complement. When the probe and the target are mixed together, they form a
weak chemical bond and stick together. For example, a probe ACCGTGGA will hybridize to a target CCCTGGCACCTA since it is complementary to the substring TGGCACCT of the target.
Given an unknown DNA sequence, an array provides information about all strings of length l (l-mer) that the sequence contains, but does not provide information
about their positions in the sequence. For example, the 8-mer composition of CCCTGGCACCTA is {CCCTGGCA, CCTGGCAC, CTGGCACC, TGGCACCT,GGCACCTA}
The reduction of the SBH problem to an Eulerian Path problem is to construct a graph whose edges,
rather than vertices, correspond to those l-mers, and then to find a path in this graph visiting every edge exactly once. Paths visiting ALL EDGES correspond to sequence reconstructions.
- Blog post introducing the method, its applications, and limitations.
- Implement a function which creates a directed graph from a given set of l-mers
Implement Hierholzer's algorithm for finding Eulerian paths in directed graphs
- Create a F# script file (.fsx) and paste the following text at the top of your file:
#r "nuget: FSharpAux, 1.1.0"
#r "nuget: BioFSharp, 2.0.0-preview.1"
#r "nuget: FSharp.FGL, 0.0.2"
#r "nuget: Cyjs.NET, 0.0.4"
open FSharpAux
open BioFSharp
open FSharp.FGL
open FSharp.FGL.Directed
open Cyjs.NET
- Vertices correspond to (l-1)-mers
- Edges correspond to l-mers from the spectrum
- All functions should operate on either
BioArray
, BioList
or BioSeq
- Start with any Nucleotide array
let sampleSequence: array<Nucleotide> =
"ATGGCGTGCA"
|> BioArray.ofNucleotideString
- Compute the l-mers of the Nucleotide array (in this case 3-mers)
let lMers3: array<array<Nucleotide>> =
[|[|A;T;G|]; [|T;G;G|]; [|T;G;C|]; [|G;T;G|]; [|G;G;C|]; [|G;C;A|]; [|G;C;G|]; [|C;G;T|]|]
- Initialize a directed graph
let graph: Graph<int,array<Nucleotide>,array<Nucleotide>> = Graph.empty
- Create a list with vertices based on the l-mers and add them to the graph
let vertices: list<int*array<Nucleotide>> =
[(1, [|A; T|]); (2, [|T; G|]); (3, [|G; T|]); (4, [|G; G|]); (5, [|G; C|]);(6, [|C; G|]); (7, [|C; A|])]
let graphWithVertices: Graph<int,array<Nucleotide>,array<Nucleotide>> =
Vertices.addMany vertices graph
- Create a list with edges based in the vertices and l-mers and add them to the graph
let edges: list<int*int*array<Nucleotide>>=
[(1, 2, [|A; T; G|]); (2, 4, [|T; G; G|]); (2, 5, [|T; G; C|]); (3, 2, [|G; T; G|]);
(4, 5, [|G; G; C|]); (5, 7, [|G; C; A|]); (5, 6, [|G; C; G|]); (6, 3, [|C; G; T|])]
let graphWithEdges: Graph<int,array<Nucleotide>,array<Nucleotide>> =
Edges.addMany edges graphWithVertices
- You can visualize the graph using Cyjs.NET
let inline toCyJS (g : Graph<'Vertex,array<Nucleotide>,array<Nucleotide>>) =
let vertices =
g
|> Vertices.toVertexList
|> List.map (fun (id,name) ->
Elements.node (string id) [CyParam.label (name |> BioArray.toString)]
)
let edges =
g
|> Edges.toEdgeList
|> List.map (fun (v1,v2,weight) ->
Elements.edge (string v1 + string v2) (string v1) (string v2) [CyParam.weight (weight |> BioArray.toString)]
)
CyGraph.initEmpty ()
|> CyGraph.withElements vertices
|> CyGraph.withElements edges
|> CyGraph.withLayout (Layout.initBreadthfirst id)
|> CyGraph.withStyle "node" [CyParam.content =. CyParam.label]
|> CyGraph.withStyle "edge"
[
CyParam.Curve.style "bezier"
CyParam.opacity 0.666
CyParam.width <=. (CyParam.weight, 70, 100, 5, 5)
CyParam.Target.Arrow.shape "triangle"
CyParam.Line.color =. CyParam.color
CyParam.Target.Arrow.color =. CyParam.color
CyParam.Source.Arrow.color =. CyParam.color
]
graphWithEdges
|> toCyJS
No value returned by any evaluator
|
-
This graph is semibalanced (|indegree - outdegree| = 1)
. If a graph has an Eulerian path starting at vertex s and ending at vertex t, then all its vertices are balanced,
with the possible exception of s and t, which may be semibalanced.
- The Eulerian path problem can be reduced to the Eulerian cycle problem by adding an edge between two semibalanced vertices.
-
Choose any starting vertex, and follow a path along the unused edges from that vertex until you return to it. You will alway return to the starting vertex in a balanced Eulerian graph.
Since every vertex has indegreee = outdegree
, there is always an unused edge to leave the current vertex. The path found by doing this is a closed tour, starting and ending at the same vertex,
but not necessarily covering all vertices and edges.
- As long as there exists a vertex in the current closed tour that has unused edges, you can start finding a new closed tour and join it with the previously found tour.
- Since we assume the original graph is connected, repeating the previous step will cover all edges of the graph.
Implement the algorithm, so that it works on a FSharp.FGL.Graph
- Returns all outward edges of vertex 1
Graph.getContext 1 graphWithEdges
|> Vertices.outwardEdges
- Returns all inward edges of vertex 2
Graph.getContext 2 graphWithEdges
|> Vertices.inwardEdges
[(1, 2, [|A; T; G|]); (3, 2, [|G; T; G|])]
|
- Returns all inward edges for all vertices in the graph
Graph.mapContexts Vertices.inwardEdges graphWithEdges
map
[(1, []); (2, [(1, 2, [|A; T; G|]); (3, 2, [|G; T; G|])]);
(3, [(6, 3, [|C; G; T|])]); (4, [(2, 4, [|T; G; G|])]);
(5, [(2, 5, [|T; G; C|]); (4, 5, [|G; G; C|])]); (6, [(5, 6, [|G; C; G|])]);
(7, [(5, 7, [|G; C; A|])])]
|
- Returns all outward edges for all vertices in the graph
Graph.mapContexts Vertices.outwardEdges graphWithEdges
map
[(1, [(1, 2, [|A; T; G|])]); (2, [(2, 4, [|T; G; G|]); (2, 5, [|T; G; C|])]);
(3, [(3, 2, [|G; T; G|])]); (4, [(4, 5, [|G; G; C|])]);
(5, [(5, 6, [|G; C; G|]); (5, 7, [|G; C; A|])]); (6, [(6, 3, [|C; G; T|])]);
(7, [])]
|
- Returns the adjacency matrix representation of the graph
Graph.toAdjacencyMatrix graphWithEdges
- Implement a function that returns an array of l-mers for any given DNA-Sequence
- Generate an Eulerian Graph based on the returned set of l-mers
- Compare the reconstructed sequence to the original sequence
- Optional: Implement a function that performs the steps above for a large number of sequences with varying l-mer lengths
- Introduction about SBH, and how it is used today
- Describe the limits and weaknesses of the approach in your blog post
- Compare Hierholzer's algorithm to Fleury's algorithm
namespace FSharpAux
namespace BioFSharp
Multiple items
namespace FSharp
--------------------
namespace Microsoft.FSharp
namespace FSharp.FGL
namespace FSharp.FGL.Directed
namespace Cyjs
namespace Cyjs.NET
module Nucleotides
from BioFSharp
<summary>
Contains the Nucleotide type and its according functions.
</summary>
val sampleSequence : Nucleotide array
type 'T array = 'T []
<summary>Single dimensional, zero-based arrays, written <c>int[]</c>, <c>string[]</c> etc.</summary>
<remarks>Use the values in the <see cref="T:Microsoft.FSharp.Collections.ArrayModule" /> module to manipulate values
of this type, or the notation <c>arr.[x]</c> to get/set array
values.</remarks>
<category>Basic Types</category>
type Nucleotide =
| A
| T
| G
| C
| U
| I
| Gap
| Ter
| R
| Y
...
interface IBioItem
static member op_Explicit : value:#IBioItem -> byte + 1 overload
<summary>
Nucleotide Codes
</summary>
Multiple items
module BioArray
from BioFSharp.BioCollectionsExtensions
--------------------
module BioArray
from BioFSharp
<summary>
This module contains the BioArray type and its according functions. The BioArray type is an array of objects using the IBioItem interface
</summary>
val ofNucleotideString : s:#seq<char> -> BioArray.BioArray<Nucleotide>
<summary>
Generates nucleotide sequence of one-letter-code raw string
</summary>
val lMers3 : Nucleotide array array
union case Nucleotide.A: Nucleotide
<summary>
A : Adenine
</summary>
union case Nucleotide.T: Nucleotide
<summary>
T : Thymidine (only DNA)
</summary>
union case Nucleotide.G: Nucleotide
<summary>
G : Guanine
</summary>
union case Nucleotide.C: Nucleotide
<summary>
C : Cytosine
</summary>
val graph : Graph<int,Nucleotide array,Nucleotide array>
Multiple items
module Graph
from FSharp.FGL.Directed
--------------------
module Graph
from FSharp.FGL
<summary>
General functions for both directed and undirected graphs
</summary>
--------------------
module Graph
from FSharpAux
--------------------
type Graph<'Vertex,'Label,'Edge (requires comparison)> = Map<'Vertex,MContext<'Vertex,'Label,'Edge>>
<summary>
Map of Vertices as keys and MContexts as values
</summary>
Multiple items
val int : value:'T -> int (requires member op_Explicit)
<summary>Converts the argument to signed 32-bit integer. This is a direct conversion for all
primitive numeric types. For strings, the input is converted using <c>Int32.Parse()</c>
with InvariantCulture settings. Otherwise the operation requires an appropriate
static conversion method on the input type.</summary>
<param name="value">The input value.</param>
<returns>The converted int</returns>
--------------------
[<Struct>]
type int = int32
<summary>An abbreviation for the CLI type <see cref="T:System.Int32" />.</summary>
<category>Basic Types</category>
--------------------
type int<'Measure> =
int
<summary>The type of 32-bit signed integer numbers, annotated with a unit of measure. The unit
of measure is erased in compiled code and when values of this type
are analyzed using reflection. The type is representationally equivalent to
<see cref="T:System.Int32" />.</summary>
<category>Basic Types with Units of Measure</category>
val empty : Graph<'Vertex,'Label,'Edge> (requires comparison)
<summary>
Creates a new, empty graph.
</summary>
val vertices : (int * Nucleotide array) list
type 'T list = List<'T>
<summary>The type of immutable singly-linked lists. </summary>
<remarks>See the <see cref="T:Microsoft.FSharp.Collections.ListModule" /> module for further operations related to lists.
Use the constructors <c>[]</c> and <c>::</c> (infix) to create values of this type, or
the notation <c>[1; 2; 3]</c>. Use the values in the <c>List</c> module to manipulate
values of this type, or pattern match against the values directly.
See also <a href="https://docs.microsoft.com/dotnet/fsharp/language-reference/lists">F# Language Guide - Lists</a>.
</remarks>
val graphWithVertices : Graph<int,Nucleotide array,Nucleotide array>
Multiple items
module Vertices
from FSharp.FGL.Directed
<summary>
Functions for vertices of directed Graphs
</summary>
--------------------
module Vertices
from FSharp.FGL
<summary>
Functions for vertices of both directed and undirected graphs
</summary>
val addMany : vertices:LVertex<'Vertex,'Label> list -> g:Graph<'Vertex,'Label,'Edge> -> Graph<'Vertex,'Label,'Edge> (requires comparison)
<summary>
Adds a list of labeled vertices to the graph.
</summary>
val edges : (int * int * Nucleotide array) list
val graphWithEdges : Graph<int,Nucleotide array,Nucleotide array>
module Edges
from FSharp.FGL.Directed
<summary>
Functions for edges of directed Graphs
</summary>
val addMany : edges:LEdge<'Vertex,'Edge> list -> g:Graph<'Vertex,'Label,'Edge> -> Graph<'Vertex,'Label,'Edge> (requires comparison)
<summary>
Adds a list of labeled, directed edges to the graph.
</summary>
val toCyJS : g:Graph<'Vertex,Nucleotide array,Nucleotide array> -> CyGraph.CyGraph (requires comparison)
val g : Graph<'Vertex,Nucleotide array,Nucleotide array> (requires comparison)
val vertices : Elements.Node list
val toVertexList : g:Graph<'Vertex,'Label,'Edge> -> LVertex<'Vertex,'Label> list (requires comparison)
<summary>
Creates a list of all vertices and their labels.
</summary>
Multiple items
module List
from FSharpAux
--------------------
module List
from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.list`1" />.</summary>
<namespacedoc><summary>Operations for collections such as lists, arrays, sets, maps and sequences. See also
<a href="https://docs.microsoft.com/dotnet/fsharp/language-reference/fsharp-collection-types">F# Collection Types</a> in the F# Language Guide.
</summary></namespacedoc>
--------------------
type List<'T> =
| ( [] )
| ( :: ) of Head: 'T * Tail: 'T list
interface IReadOnlyList<'T>
interface IReadOnlyCollection<'T>
interface IEnumerable
interface IEnumerable<'T>
member GetReverseIndex : rank:int * offset:int -> int
member GetSlice : startIndex:int option * endIndex:int option -> 'T list
static member Cons : head:'T * tail:'T list -> 'T list
member Head : 'T
member IsEmpty : bool
member Item : index:int -> 'T with get
...
<summary>The type of immutable singly-linked lists.</summary>
<remarks>Use the constructors <c>[]</c> and <c>::</c> (infix) to create values of this type, or
the notation <c>[1;2;3]</c>. Use the values in the <c>List</c> module to manipulate
values of this type, or pattern match against the values directly.
</remarks>
<exclude />
val map : mapping:('T -> 'U) -> list:'T list -> 'U list
<summary>Builds a new collection whose elements are the results of applying the given function
to each of the elements of the collection.</summary>
<param name="mapping">The function to transform elements from the input list.</param>
<param name="list">The input list.</param>
<returns>The list of transformed elements.</returns>
val id : 'Vertex (requires comparison)
val name : Nucleotide array
Multiple items
module Elements
from Cyjs.NET
--------------------
module Elements
from BioFSharp
<summary>
Contains chemical elements represented as a mixture of their stable isotopes and functionality for building them
</summary>
val node : id:string -> dataAttributes:CyParam.CyStyleParam list -> Elements.Node
Multiple items
val string : value:'T -> string
<summary>Converts the argument to a string using <c>ToString</c>.</summary>
<remarks>For standard integer and floating point values the and any type that implements <c>IFormattable</c><c>ToString</c> conversion uses <c>CultureInfo.InvariantCulture</c>. </remarks>
<param name="value">The input value.</param>
<returns>The converted string.</returns>
--------------------
type string = System.String
<summary>An abbreviation for the CLI type <see cref="T:System.String" />.</summary>
<category>Basic Types</category>
module CyParam
from Cyjs.NET
val label : v:'a -> CyParam.CyStyleParam
val toString : bs:BioArray.BioArray<#IBioItem> -> string
<summary>
Returns string of one-letter-code
</summary>
val edges : Elements.Edge list
val toEdgeList : g:Graph<'Vertex,'Label,'Edge> -> LEdge<'Vertex,'Edge> list (requires comparison)
<summary>
Creates a list of all edges and their labels.
</summary>
val v1 : 'Vertex (requires comparison)
val v2 : 'Vertex (requires comparison)
val weight : Nucleotide array
val edge : id:string -> sourceId:string -> targetId:string -> dataAttributes:CyParam.CyStyleParam list -> Elements.Edge
val weight : v:'a -> CyParam.CyStyleParam
module CyGraph
from Cyjs.NET
val initEmpty : unit -> CytoscapeModel.Cytoscape
val withElements : elems:seq<CytoscapeModel.Element> -> cy:CyGraph.CyGraph -> CyGraph.CyGraph
val withLayout : ly:Layout -> cy:CyGraph.CyGraph -> CyGraph.CyGraph
Multiple items
module Layout
from Cyjs.NET
--------------------
type Layout =
inherit DynamicObj
new : name:string -> Layout
member name : string
<summary>
Layout type inherits from dynamic object
</summary>
--------------------
new : name:string -> Layout
val initBreadthfirst : applyOption:(Layout -> Layout) -> Layout
<summary>
initializes a layout of type "breadthfirst" applying the givin layout option function.
The "breadthfirst" layout puts nodes in a hierarchy, based on a breadthfirst traversal of the graph. It is best suited to trees
and forests in its default top-down mode, and it is best suited to DAGs in its circle mode.
</summary>
val id : x:'T -> 'T
<summary>The identity function</summary>
<param name="x">The input value.</param>
<returns>The same value.</returns>
val withStyle : selector:string -> cyStyles:seq<CyParam.CyStyleParam> -> cy:CyGraph.CyGraph -> CyGraph.CyGraph
val content : v:'a -> CyParam.CyStyleParam
module Curve
from Cyjs.NET.CyParam
val style : v:'a -> CyParam.CyStyleParam
val opacity : v:'a -> CyParam.CyStyleParam
val width : v:'a -> CyParam.CyStyleParam
module Target
from Cyjs.NET.CyParam
module Arrow
from Cyjs.NET.CyParam.Target
val shape : v:'a -> CyParam.CyStyleParam
module Line
from Cyjs.NET.CyParam
val color : v:'a -> CyParam.CyStyleParam
module Source
from Cyjs.NET.CyParam
module Arrow
from Cyjs.NET.CyParam.Source
val withSize : width:int * height:int -> cy:CyGraph.CyGraph -> CyGraph.CyGraph
module HTML
from Cyjs.NET
<summary>
HTML template for Cytoscape
</summary>
val toEmbeddedHTML : cy:CytoscapeModel.Cytoscape -> string
<summary>
Converts a CyGraph to it HTML representation and embeds it into a html page.
</summary>
val getContext : v:'Vertex -> g:Graph<'Vertex,'Label,'Edge> -> Context<'Vertex,'Label,'Edge> (requires comparison)
<summary>
Lookup a context in the graph. Raising KeyNotFoundException if no binding exists in the graph.
</summary>
val outwardEdges : Adj<'Vertex,'Edge> * 'Vertex * 'Label * Adj<'Vertex,'Edge> -> LEdge<'Vertex,'Edge> list (requires comparison)
<summary>
Returns all edges pointing away from the given vertex
</summary>
val inwardEdges : Adj<'Vertex,'Edge> * 'Vertex * 'Label * Adj<'Vertex,'Edge> -> LEdge<'Vertex,'Edge> list (requires comparison)
<summary>
Returns all edges pointing to the given vertex
</summary>
val mapContexts : mapping:(Context<'Vertex,'Label,'Edge> -> 'T) -> g:Graph<'Vertex,'Label,'Edge> -> Map<'Vertex,'T> (requires comparison)
<summary>
Maps contexts of the graph.
</summary>
val toAdjacencyMatrix : g:Graph<'Vertex,'Label,'Edge> -> 'Edge [] [] (requires comparison)
<summary>
Transforms a graph into an adjacency matrix of its edges.
</summary>