bio_db_repo


Data package for bio_db

See bio_db for more information: bio_db

bio_db_repo lets you download all the datasets of bio_db in one go.
Once the Prolog fact files are downloaded, the other formats can be generated from them.
To use the DB interfaces (rocksdb, proSQLite and Berkeley DB), you need to have the corresponding libraries installed.

bio_db_repo versions encode the publication date: in 17.10.13, 17 is the year (2017), 10 is the month and 13 is the day.
There are normally two releases per year: one in early autumn (Sept/Oct) and one in early spring (March).
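
The versioning scheme can be unpacked mechanically. The following is a minimal sketch, assuming the version is given as a quoted atom and that all releases fall in the 2000s; version_date/2 is a hypothetical helper written for illustration and is not part of bio_db_repo.

```prolog
% version_date(+Version, -Date)
% Hypothetical helper (not part of bio_db_repo): convert a release
% version atom such as '17.10.13' to a date(Year, Month, Day) term.
% Assumption: the two-digit year always refers to 20YY.
version_date(Version, date(Year, Month, Day)) :-
    % Split the atom on dots; fails unless there are exactly 3 parts.
    atomic_list_concat([YA, MA, DA], '.', Version),
    atom_number(YA, YY),
    atom_number(MA, Month),
    atom_number(DA, Day),
    Year is 2000 + YY.

% ?- version_date('17.10.13', Date).
% Date = date(2017, 10, 13).
```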

The data releases for at least the past 2 years are kept on the server: packs/bio_db_repo
For instance, the datasets for release 24.4.5 (current at the time of writing) can be found at: bio_db_repo-24.4.5.tgz

install

SWI

bio_db_repo can be installed from within SWI using its package manager.


?- pack_install(bio_db_repo).

Be warned: this will download a 246 MB .tgz file and unpack it in your local filestore.
Individual bio_db table predicates will remain in compressed format until their first use.

If a table is accessed via the Prolog interface, the interpreter will also auto-create a .qlf file that speeds up subsequent loads into memory. If all tables are accessed, the .pl and .qlf files will expand to a total of 3.1 GB. For instance, the first access to a table that is still compressed looks like this:

?- use_module( library(bio_db) ).
?- map_hgnc_hgnc_symb( Hgnc, 'LMTK3' ).
% prolog DB:table hgnc:map_hgnc_hgnc_symb/2 is not installed, but the zipped prolog db exists. Shall it be created from this (Y/n) ?
% Continuing with: yes
Hgnc = 19295.

To test:

?- use_module( library(bio_db) ).
true.

If you haven't installed bio_db_repo, datasets will be downloaded on demand:

?- use_module( library(bio_db) ).
true.

?- map_hgnc_hgnc_symb( Hgnc, 'LMTK3' ).
% prolog DB:table hgnc:map_hgnc_hgnc_symb/2 is not installed, do you want to download (Y/n) ?
% Continuing with: yes
% Downloading dataset from server: http://stoics.org.uk/~nicos/sware/packs/bio_db_repo/data
% Delete the zip file: '/usr/local/users/nicos/local/git/lib/swipl-7.7.11/pack/bio_db_repo/data/maps/hgnc/map_hgnc_hgnc_symb.pl.zip' (y/N) ?
% Continuing with: yes
Hgnc = 19295.

?- bio_db_info( Iface, map_hgnc_hgnc_symb/2, Key, Value), write( Iface:Key:Value ), nl, fail.
prolog:source_url:ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
prolog:datetime:datetime(2018,3,30,15,25,32)
prolog:data_types:data_types(integer,atom)
prolog:unique_lengths:unique_lengths(45744,45744,45744)
prolog:relation_type:relation_type(1,1)
prolog:header:row(HGNC ID,Approved Symbol)
false.
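
The failure-driven query above prints each property as it is found. To work with the properties as data, they can be collected into a list instead. This is a minimal sketch, assuming bio_db is installed and that bio_db_info/4 enumerates properties on backtracking as in the query above; table_info/2 is a hypothetical helper name, not part of bio_db.

```prolog
:- use_module(library(bio_db)).

% table_info(+Pid, -Infos)
% Hypothetical helper (not part of bio_db): collect every
% Iface-Key-Value triple that bio_db_info/4 reports for table Pid.
% Infos is the empty list if no information is available.
table_info(Pid, Infos) :-
    findall(Iface-Key-Value,
            bio_db_info(Iface, Pid, Key, Value),
            Infos).

% ?- table_info(map_hgnc_hgnc_symb/2, Infos).
```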

materials

Docs, also available in distribution directory doc/html/
Sources for SWI packager (and individual data table files): bio_db_repo
bio_db page: bio_db

publications

Accessing biological data as Prolog facts
Nicos Angelopoulos and Jan Wielemaker
Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming (PPDP 2017)
Pages 29-38. Namur, Belgium. October 9-11, 2017
URL: ACM.org DOI:10.1145/3131851.3131857
slides: ppdp.pdf

A logical approach to working with biological databases
Nicos Angelopoulos and Georgios Giamas
Proceedings of the International Conference on Logic Programming
Accepted as a technical communication, Cork, September 2015.
[paper]

contact

We welcome comments on use cases, particularly applications and publications that use this pack.
We also welcome bug reports and fixes.
For contact details see: contact

author

Nicos Angelopoulos

---
London,
March, 2018

Last update: April 2018.