An ontology is a formal specification of the terms in a domain (taxonomy) and of the relations among them. It defines a common vocabulary for researchers. Domain experts can use ontologies to share and annotate information in their fields, to categorize observations, and to extract and aggregate information from large data sets in a semi-automated manner. The main components of an ontology are classes (the entities within the domain) and roles (the properties of the classes). To fill this abstract definition with something tangible, the following sections build an ontology for bioentities.
The class Protein represents all proteins. Cytochrome c oxidase (COX) is an instance of the Protein class. A class can have several subclasses that are more specific than the superclass, e.g. Membrane protein/Globular protein or Enzymatic/Structural. Consequently, every instance of the class Membrane protein is also an instance of the class Protein. Roles describe properties of the individual instances. A protein could, for example, have the roles (i) mass, (ii) sequence, (iii) localization, and (iv) full name. These are properties of all instances of the Protein class. The Membrane protein subclass may have additional properties that are specific to this kind of protein, e.g. the number of transmembrane domains. The value types of roles can be primitive types (string, int, float) or instances of another class. For example, an instance of the class Compartment could be the value of the role localization, and Compartment itself could have several roles and relationships.
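The class/subclass/role structure described above can be mirrored in plain F# types. This is a minimal sketch, assuming records and a discriminated union instead of an ontology library; the type names (Compartment, ProteinKind, Protein) and the helper isMembraneProtein are hypothetical, and the mass, sequence and domain count given for COX are placeholders, not real data.

```fsharp
// Hypothetical F# model of the Protein class and its roles (not a library API)

/// Compartment is itself a class; its instances can be used as role values
type Compartment =
    { Name: string }

/// Subclasses of Protein; MembraneProtein carries an additional, subclass-specific role
type ProteinKind =
    | MembraneProtein of transmembraneDomains: int
    | GlobularProtein

/// The Protein class: every instance carries the shared roles
type Protein =
    { FullName: string           // primitive type: string
      Mass: float                // primitive type: float
      Sequence: string           // primitive type: string
      Localization: Compartment  // value is an instance of another class
      Kind: ProteinKind }        // the subclass this instance belongs to

let mitochondrion = { Name = "Mitochondrion" }

// An instance of Protein that is also an instance of the subclass MembraneProtein.
// Mass, sequence and the domain count are placeholders, not real data.
let cox =
    { FullName = "Cytochrome c oxidase"
      Mass = 0.0
      Sequence = ""
      Localization = mitochondrion
      Kind = MembraneProtein 0 }

/// Checks subclass membership; all Protein roles stay available either way
let isMembraneProtein (p: Protein) =
    match p.Kind with
    | MembraneProtein _ -> true
    | GlobularProtein -> false

printfn "%s in %s, membrane protein: %b" cox.FullName cox.Localization.Name (isMembraneProtein cox)
```

Because the subclass is encoded as a field of the Protein record, a membrane protein can never lack the roles of its superclass, which mirrors the "every Membrane protein is a Protein" rule.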
Fig. 1: Some classes, instances and relations of the cell domain. The interdependence (inverse relation) of COX and Mitochondrion (blue arrows) is not mandatory.
The cardinality defines how many values a role can have. A single protein has exactly one mass, so mass is a single-cardinality role, whereas present-in is a multi-cardinality role of the Compartment class. Sometimes it can be necessary to set the cardinality of a subclass role to zero. If an instance can be assigned to more than one ‘class branch’ (e.g. structural proteins and small HSPs), an additional class can be defined that has several superclasses. Consequently, it inherits the roles from all its superclasses.
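Cardinality and multiple superclasses can be sketched with F# interfaces standing in for classes. This is an illustration under assumptions, not an ontology API: a single-valued member plays the part of a single-cardinality role, a list-valued member a multi-cardinality role, and a class implementing two interfaces inherits the roles of both branches. All names (IProtein, IStructuralProtein, ISmallHsp, BindingPartners, Oligomeric, StructuralSmallHsp) and values are hypothetical.

```fsharp
// Hypothetical sketch (plain F#, not an ontology library):
// cardinality via value vs. list members, several superclasses via interfaces.

/// Superclass Protein: Name and Mass have single cardinality (exactly one value)
type IProtein =
    abstract Name: string
    abstract Mass: float

/// First class branch with a hypothetical multi-cardinality role
type IStructuralProtein =
    inherit IProtein
    abstract BindingPartners: string list

/// Second class branch with a hypothetical single-cardinality role
type ISmallHsp =
    inherit IProtein
    abstract Oligomeric: bool

/// Class with two superclasses: it has to provide the roles of both branches
type StructuralSmallHsp(name: string, mass: float, partners: string list) =
    interface IProtein with
        member _.Name = name
        member _.Mass = mass
    interface IStructuralProtein with
        member _.BindingPartners = partners
    interface ISmallHsp with
        member _.Oligomeric = true

let example = StructuralSmallHsp("exampleHsp", 20.0, [ "partnerA"; "partnerB" ])

// The same instance can be used wherever either superclass is expected
let asStructural = example :> IStructuralProtein
let asHsp = example :> ISmallHsp

printfn "%s: %d binding partners, oligomeric: %b" asStructural.Name asStructural.BindingPartners.Length asHsp.Oligomeric
```

Interfaces are used here because an F# class can inherit from only one base class but can implement several interfaces.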
The Open Biological and Biomedical Ontologies (OBO) Foundry is a collection of ontologies relevant to the life sciences. In addition, OBO defines a file format that unifies the distribution of ontologies and is based on the principles of the Web Ontology Language (OWL).
The Gene Ontology (GO) provides a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enables the analysis of genomic data. Its current release in OBO format is available at http://current.geneontology.org/ontology/go.obo.
Example of a GO class in OBO format:
```
[Term]
id: GO:0000269
name: toxin export channel activity
namespace: molecular_function
def: "Enables the energy independent passage of toxins, sized less than 1000 Da, across a membrane towards the outside of the cell. The transmembrane portions of porins consist exclusively of beta-strands which form a beta-barrel. They are found in the outer membranes of Gram-negative bacteria, mitochondria, plastids and possibly acid-fast Gram-positive bacteria." [GOC:mtg_transport]
is_a: GO:0015288 ! porin activity
is_a: GO:0019534 ! toxin transmembrane transporter activity
```
A class is introduced by the [Term] keyword. Each role is separated from its value by a colon. A term can be referenced by its unique id, which points to this entry. The two is_a lines of ‘toxin export channel activity’ make clear that it is a subclass of the two superclasses porin activity and toxin transmembrane transporter activity. Consequently, a protein assigned to GO:0000269 is implicitly also assigned to its superclasses GO:0015288 and GO:0019534.
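To make the stanza structure concrete, here is a hypothetical helper (not the BioFSharp.IO parser) that splits the lines of the example above into tag/value pairs at the first ": " and reads the superclass ids from the is_a lines, dropping the "!" comments. The def line is omitted for brevity.

```fsharp
// Hypothetical helper, not the BioFSharp.IO parser: splits a [Term] stanza
// into (tag, value) pairs and extracts the superclass ids from is_a lines.

let stanza =
    [ "[Term]"
      "id: GO:0000269"
      "name: toxin export channel activity"
      "namespace: molecular_function"
      "is_a: GO:0015288 ! porin activity"
      "is_a: GO:0019534 ! toxin transmembrane transporter activity" ]

/// Splits "tag: value" at the first colon followed by a space
let parseTagValue (line: string) =
    let i = line.IndexOf(": ", System.StringComparison.Ordinal)
    if i < 0 then None
    else Some(line.Substring(0, i), line.Substring(i + 2))

let tagValues =
    stanza
    |> List.skip 1                // skip the [Term] keyword line
    |> List.choose parseTagValue

/// The unique id of the term
let termId =
    tagValues |> List.find (fun (tag, _) -> tag = "id") |> snd

/// Superclass ids: the text before the "!" comment of each is_a value
let superclasses =
    tagValues
    |> List.filter (fun (tag, _) -> tag = "is_a")
    |> List.map (fun (_, value) -> value.Split('!').[0].Trim())

printfn "%s is_a %A" termId superclasses
```

Splitting at ": " rather than at the first ":" keeps ids such as GO:0000269 intact.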
The goal of the following example is to visualize a subbranch of the Gene Ontology, using BioFSharp.IO for parsing and Cyjs.NET for visualization. Parsing is the process of analyzing a text according to a defined set of rules (here, the conventions of the OBO file format).
First, download the GO file and split its content into a string array with one entry per line:
```fsharp
#r "nuget: FSharp.Data"
#r "nuget: BioFSharp.IO, 2.0.0-preview.3"
#r "nuget: Cyjs.NET, 0.0.4"

open FSharp.Data   // required for data download
open BioFSharp.IO  // required for parsing
open Cyjs.NET      // required for visualization

/// Downloads a text file and splits it into an array of lines
let getRawString url =
    Http.RequestString url
    |> fun x -> x.Split '\n'

let goUrl = @"http://current.geneontology.org/ontology/go.obo"
let goRawData = getRawString goUrl
```
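Parse goRawData with BioFSharp.IO.Obo.parseOboTerms (first argument false) into a seq<OboTerm>, and convert it into an OboTerm [] after parsing:

```fsharp
let go =
    BioFSharp.IO.Obo.parseOboTerms false goRawData
    |> Seq.toArray
```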
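Before tag/value pairs can be read, a parser has to split the file into stanzas. The following hypothetical sketch (not how BioFSharp.IO is implemented) does this for a few toy lines with made-up X: ids: it collects the non-empty lines of every [Term] stanza and skips the file header as well as other stanza types such as [Typedef].

```fsharp
// Hypothetical sketch of a first parsing step: grouping raw OBO lines
// into [Term] stanzas. Toy data; not the BioFSharp.IO implementation.

let rawLines =
    [| "format-version: 1.2"
       ""
       "[Term]"
       "id: X:0000001"
       "name: first example term"
       ""
       "[Term]"
       "id: X:0000002"
       "name: second example term"
       "is_a: X:0000001 ! first example term"
       ""
       "[Typedef]"
       "id: part_of" |]

/// Returns the lines of every [Term] stanza, in file order
let groupTerms (lines: string[]) =
    let step (finished, current) (line: string) =
        let line = line.Trim()
        if line.StartsWith "[" then
            // a new stanza starts: close the one currently being collected
            let finished =
                match current with
                | Some acc -> List.rev acc :: finished
                | None -> finished
            // only [Term] stanzas are collected, e.g. [Typedef] is skipped
            finished, (if line = "[Term]" then Some [] else None)
        elif line = "" then
            finished, current
        else
            match current with
            | Some acc -> finished, Some(line :: acc)
            | None -> finished, None   // header or skipped stanza
    let finished, current = Array.fold step ([], None) lines
    let finished =
        match current with
        | Some acc -> List.rev acc :: finished
        | None -> finished
    List.rev finished

let terms = groupTerms rawLines
printfn "%A" terms
```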
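Select all terms that have GO:0071704 as their id or in one of their is_a entries. For each selected term, every is_a and relationship entry yields one (label, source, target) triple; call the filtered triple sequence goElements:

```fsharp
/// Returns (label, source, target) triples for the term with the given id
/// and for all terms that list this id in one of their is_a entries
let getElements (terms: Obo.OboTerm[]) id =
    terms
    |> Array.filter (fun x ->
        x.Id = id ||
        List.exists (fun (rel: string) -> rel.Contains id) x.IsA
    )
    |> Array.map (fun x ->
        let source = x.Id
        let label = x.Name
        // sometimes a relationship is stored as a dedicated relationship rather than as an is_a role
        let target = List.append x.IsA x.Relationships
        target
        |> List.map (fun t ->
            // keep only the first token of the entry as target
            let singleTarget = t.Split ' ' |> Seq.head
            label, source, singleTarget
        )
    )
    |> Seq.concat

let goElements = getElements go "GO:0071704"
goElements
```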
index | Item1 (label) | Item2 (source id) | Item3 (target id) |
---|---|---|---|
0 | carbohydrate metabolic process | GO:0005975 | GO:0044238 |
1 | carbohydrate metabolic process | GO:0005975 | GO:0071704 |
2 | cellular aldehyde metabolic process | GO:0006081 | GO:0044237 |
3 | cellular aldehyde metabolic process | GO:0006081 | GO:0071704 |
4 | organic acid metabolic process | GO:0006082 | GO:0044237 |
5 | organic acid metabolic process | GO:0006082 | GO:0044281 |
6 | organic acid metabolic process | GO:0006082 | GO:0071704 |
7 | lipid metabolic process | GO:0006629 | GO:0044238 |
8 | lipid metabolic process | GO:0006629 | GO:0071704 |
9 | glutathione metabolic process | GO:0006749 | GO:0071704 |
10 | flavonoid metabolic process | GO:0009812 | GO:0071704 |
11 | carbon fixation | GO:0015977 | GO:0071704 |
12 | ether metabolic process | GO:0018904 | GO:0044281 |
13 | ether metabolic process | GO:0018904 | GO:0071704 |
14 | dimethyl sulfoxide metabolic process | GO:0018907 | GO:0006790 |
15 | dimethyl sulfoxide metabolic process | GO:0018907 | GO:0071704 |
16 | nitroglycerin metabolic process | GO:0018937 | GO:0006805 |
17 | nitroglycerin metabolic process | GO:0018937 | GO:0006807 |
18 | nitroglycerin metabolic process | GO:0018937 | GO:0071704 |
19 | 2-nitropropane metabolic process | GO:0018938 | GO:0006805 |
... | ... | ... | ... |
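getElements returns only the term itself and its direct children, i.e. one level of the hierarchy. Collecting a complete subbranch would mean following the child-to-parent edges transitively. A minimal sketch of that idea, using a breadth-first search over hypothetical toy triples in the same (label, source, target) shape; it is not part of BioFSharp.

```fsharp
// Hypothetical sketch: collect all descendants of a root id from
// (label, source, target) triples by following child -> parent edges transitively.

let triples =
    [ "child A", "X:2", "X:1"
      "child B", "X:3", "X:1"
      "grandchild", "X:4", "X:2"
      "unrelated", "X:5", "X:9" ]

/// Breadth-first search over the reversed edges (parent -> children)
let descendants (edges: (string * string * string) list) (root: string) =
    // index: parent id -> ids of its direct children
    let children =
        edges
        |> List.groupBy (fun (_, _, target) -> target)
        |> List.map (fun (parent, group) -> parent, group |> List.map (fun (_, source, _) -> source))
        |> Map.ofList
    let rec visit (queue: string list) (seen: Set<string>) =
        match queue with
        | [] -> seen
        | current :: rest ->
            // children not seen before; the seen set also protects against cycles
            let next =
                Map.tryFind current children
                |> Option.defaultValue []
                |> List.filter (fun c -> not (seen.Contains c))
            visit (rest @ next) (Set.union seen (Set.ofList next))
    visit [ root ] Set.empty

let subbranch = descendants triples "X:1"
printfn "%A" subbranch
```

The triples returned by getElements have the same shape, so the same traversal could be applied to triples collected from all terms rather than from one level only.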
```fsharp
/// Creates one Cyjs.NET node for the source and one for the target of every triple
let getCytoVertices (input: seq<string*string*string>) =
    input
    |> Seq.collect (fun (l, s, t) ->
        let stylingSource = [CyParam.label (sprintf "%s %s" l s); CyParam.weight 24]
        let stylingTarget = [CyParam.label (sprintf "%s %s" l t); CyParam.weight 24]
        [| Elements.node s stylingSource; Elements.node t stylingTarget |]
    )
    |> Seq.distinct

/// Creates one numbered Cyjs.NET edge per distinct triple
let getCytoEdges (input: seq<string*string*string>) =
    input
    |> Seq.distinct
    |> Seq.mapi (fun i (l, s, t) ->
        let styling = [CyParam.weight 0.3]
        Elements.edge ("e" + string i) s t styling
    )

let goVertices = getCytoVertices goElements
let goEdges = getCytoEdges goElements
```
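In getCytoVertices, a target node is labeled with the label of the source term of its triple, so the same id can end up with several different labels. One possible alternative, sketched in plain F# without Cyjs.NET on hypothetical toy triples: keep one (id, label) pair per id, preferring a pair that carries a term name, and number the distinct (source, target) pairs as edges. Passing these pairs to Elements.node and Elements.edge as above is left out.

```fsharp
// Hypothetical sketch in plain F# (no Cyjs.NET): one vertex per id and
// one numbered edge per distinct (source, target) pair.

let triples =
    [ "term B", "X:2", "X:1"
      "term C", "X:3", "X:1"
      "term C", "X:3", "X:2" ]

/// Sources carry their own term label; targets are only known by id
let vertices =
    triples
    |> List.collect (fun (label, source, target) ->
        [ source, sprintf "%s %s" label source
          target, target ])
    // within each id, entries that carry a name sort before id-only entries
    |> List.sortBy (fun (vid, text) -> vid, (if text = vid then 1 else 0))
    |> List.distinctBy fst

/// Edge ids "e0", "e1", ... for the distinct (source, target) pairs
let edges =
    triples
    |> List.map (fun (_, source, target) -> source, target)
    |> List.distinct
    |> List.mapi (fun i (source, target) -> sprintf "e%d" i, source, target)

printfn "%A" vertices
printfn "%A" edges
```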
Visualize the subbranch GO:0071704:

```fsharp
/// Builds a styled Cyjs.NET graph with a breadth-first layout
let cytoGraph vertices edges =
    CyGraph.initEmpty ()
    |> CyGraph.withElements vertices
    |> CyGraph.withElements edges
    |> CyGraph.withStyle "node"
        [
            CyParam.shape "circle"
            CyParam.content =. CyParam.label
            CyParam.Border.color "#A00975"
        ]
    |> CyGraph.withStyle "edge"
        [
            CyParam.Line.color "#3D1244"
            CyParam.Curve.style "bezier"
            CyParam.Target.Arrow.shape "triangle"
        ]
    |> CyGraph.withLayout (Layout.initBreadthfirst id)

// Graph of the GO:0071704 subbranch (pipe into CyGraph.show to open it in the browser)
cytoGraph goVertices goEdges
|> CyGraph.withSize (1300, 1000)
```
Repeat these steps for the PSI-MS ontology (OBO format) and its subbranch MS:1000031:

```fsharp
let msRawData =
    getRawString @"https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo"

let ms =
    BioFSharp.IO.Obo.parseOboTerms false msRawData
    |> Seq.toArray

let msElements = getElements ms "MS:1000031"
let msVertices = getCytoVertices msElements
let msEdges = getCytoEdges msElements

cytoGraph msVertices msEdges
|> CyGraph.withSize (1300, 1000)
```
Add the subbranch MS:1000121 to the graph by concatenating both triple sequences with Seq.append seq1 seq2:

```fsharp
let msElements2 = getElements ms "MS:1000121"
let msVertices2 = getCytoVertices (Seq.append msElements msElements2)
let msEdges2 = getCytoEdges (Seq.append msElements msElements2)

cytoGraph msVertices2 msEdges2
|> CyGraph.withSize (1300, 1000)
```