bio_db


Access, use and manage big, biological datasets.

Bio_db gives access to pre-packed biological databases and simplifies the management and translation of biological data to Prolog friendly formats. There are currently 2 major types of data supported: maps and graphs. Maps define product mappings, translations and memberships, while graphs define interactions which can be visualised as weighted graphs.

Bio_db itself does not come with the datasets. You can either download all of them in Prolog form as a single pack, pack(bio_db_repo) (246 MB of compressed data, which can expand to 3.1 GB for a Prolog only access scenario that interrogates all tables), or let auto-downloading retrieve the dataset serving each data predicate as you query it. Auto-downloading works transparently to the user: a dataset is downloaded by simply calling its predicate. See the documentation for further details (a static HTML version is available).
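As an illustration of how maps and graphs can be combined, here is a minimal, self-contained sketch. It does not load bio_db: the facts are defined locally so the file can be consulted on its own. Only the pair 19295 / 'LMTK3' is taken from this README. The other map entries, the predicate toy_edge_symb/3 and the helper neighbour_of_hgnc/3 are hypothetical names and made-up data used purely for illustration.

```prolog
% Toy stand-ins for bio_db data; with the library loaded, facts like
% these would be served by bio_db instead of being defined here.

% A map: HGNC id -> HGNC symbol.
map_hgnc_hgnc_symb(19295, 'LMTK3').      % pair shown in this README
map_hgnc_hgnc_symb(900001, 'GENE_A').    % hypothetical entry
map_hgnc_hgnc_symb(900002, 'GENE_B').    % hypothetical entry

% A weighted graph: toy_edge_symb(From, To, Weight).
% Hypothetical predicate name and made-up weights.
toy_edge_symb('LMTK3', 'GENE_A', 700).
toy_edge_symb('GENE_A', 'GENE_B', 450).

% neighbour_of_hgnc(+Hgnc, -Partner, -Weight)
% Translate an HGNC id to its symbol through the map, then follow
% an edge in either direction.
neighbour_of_hgnc(Hgnc, Partner, Weight) :-
    map_hgnc_hgnc_symb(Hgnc, Symb),
    (   toy_edge_symb(Symb, Partner, Weight)
    ;   toy_edge_symb(Partner, Symb, Weight)
    ).
```

With this file consulted, the query ?- neighbour_of_hgnc(19295, P, W). has the single solution P = 'GENE_A', W = 700.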

Bio_db works on current versions of SWI and Yap.
It serves re-packaged, high quality bio-data in formats that are Prolog friendly: Prolog facts and database back ends. SQLite, Berkeley DB and RocksDB databases are supported. The method used to serve the data is transparent to the user and depends on a single set-up call. The user can access the data as Prolog facts irrespective of how the library serves the data.

install

SWI

bio_db can be installed from within SWI using its package manager.

?- pack_install(bio_db).

And then loaded with:

?- use_module( library(lib) ).
?- lib( bio_db ).

If you do not have bio_db_repo, individual datasets will be downloaded when you first try to query them:

    ?- map_hgnc_hgnc_symb( 19295, Symb ). 

% prolog DB:table hgnc:map_hgnc_hgnc_symb/2 is not installed, do you want to download it (Y/n) ? 
% Continuing with: yes
% Downloading dataset from server: http://stoics.org.uk/~nicos/sware/packs/bio_db_repo/data
% Delete the zip file: '/usr/local/users/nicos/local/git/lib/swipl/pack/bio_db_repo/data/hs/maps/hgnc/map_hgnc_hgnc_symb.pl.zip' (y/N) ? 
% Continuing with: yes

Symb = 'LMTK3'.

compositional interface

Version 2.0 depends on lib 2.0, which allows packs to be composed of a hierarchy of "cells".
Cell-composed modules depend on a core base, which is situated as usual in prolog/Pack.pl, and on a number of cell files located in cell/Cell.pl.
Cells can be hierarchically decomposed into subdirectories.

The new features in lib 2.0 were driven by the desire to include organism based bio_db cells.
Previously, bio_db served only human data. As of version 2.0, data for mouse is also included.
The new architecture makes it trivial to add data from other organisms (and databases within served organisms).


?- lib( & bio_db ).
loads only the skeleton module interface.

Either of the following two queries
?- use_module( library(bio_db) ).
?- lib( bio_db ).
provides access to all data predicates of the library.
[Only hot-swappable code is loaded in bio_db at this point.]


?- lib( & bio_db(mouse) ).
provides access to all mouse data.


?- lib( & bio_db(hs(hgnc)) ).
provides access to the HGNC datasets for human.

To see all available predicates along with their organism and primary cell:
?- bio_db_data_predicate( Pa, Pn, Org, Cell ).
Pa = map_ense_ensg_chrl,
Pn = 5,
Org = hs,
Cell = 'hs/ense.pl' ;
...

data

All the scripts for downloading the primary data and converting them to Prolog
facts are included in the sources (auxil/build_repo).
Assuming the upsh executable is in the path,
a single query can be used to prepare all datasets (only tested on Linux).

?- ['auxil/build_repo/std_repo'], std_repo.

statistics

?- [pack('bio_db/examples/tables_stats')].

?- bio_db_stats.
% edge_gont_is_a/2 has 77155 records.
% map_hgnc_ensg_hgnc/2 has 37647 records.
% edge_gont_regulates/2 has 3573 records.
% map_hgnc_ccds_hgnc/2 has 19019 records.
% map_mgim_mouse_mgim_genb/2 has 276279 records.
% map_hgnc_entz_symb/2 has 41496 records.
% map_mgim_mouse_mgim_symb/2 has 300103 records.
...
% map_hgnc_hgnc_name/2 has 46016 records.
% edge_gont_positively_regulates/2 has 3103 records.

% Total number of predicates: 67, and records: 38710918
% ...halting as all predicates have been retracted.
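The per-predicate lines in the report above are record counts. As a rough sketch of how such a count can be obtained for any loaded table, the following self-contained code counts the solutions of a most general goal. It is not the code of the tables_stats example script. toy_map/2, record_count/2 and report_count/1 are hypothetical names introduced here, and the sketch assumes SWI-Prolog's library(aggregate).

```prolog
:- use_module(library(aggregate)).

% Toy facts standing in for a loaded data table (made-up data).
toy_map(1, a).
toy_map(2, b).
toy_map(3, c).

% record_count(+Name/Arity, -Count)
% Build the most general goal for Name/Arity and count its solutions.
record_count(Name/Arity, Count) :-
    functor(Goal, Name, Arity),
    aggregate_all(count, Goal, Count).

% report_count(+Name/Arity)
% Print the count in a style similar to the report above.
report_count(Name/Arity) :-
    record_count(Name/Arity, Count),
    format("% ~w/~d has ~d records.~n", [Name, Arity, Count]).
```

For the toy table, ?- record_count(toy_map/2, N). gives N = 3.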

help

bio_db depends on a couple of other stoics.org.uk libraries, and it is actively developed and supported.
If you have any problems with installation or usage, we are happy to be contacted at: bio_db_contact, or you can post a message on the SWI forum.

Test on-line

Jan Wielemaker has kindly created a web-page giving full access to the library's datasets: swish access

materials

Html docs. Also available in distribution directory doc/html/

publications

Accessing biological data as Prolog facts
Nicos Angelopoulos and Jan Wielemaker
Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming (PPDP 2017)
Pages 29-38. Namur, Belgium. October 9-11, 2017
URL: ACM.org DOI:10.1145/3131851.3131857
slides: ppdp.pdf

A logical approach to working with biological databases
Nicos Angelopoulos and Georgios Giamas
Proceedings of the International Conference on Logic Programming
Accepted as a technical communication, Cork, September 2015.
[paper]

contact

We welcome comments on use cases, particularly on applications and publications that use this pack.
We also welcome bug reports and fixes.
For contact details see: contact

author

Nicos Angelopoulos

---
London,
November, 2018