bio_db_repo lets you download all the datasets of bio_db in one go.
Once the Prolog fact files are downloaded, the rest of the formats can be generated
from those.
To use the database interfaces (RocksDB, proSQLite and Berkeley DB),
you need to have the corresponding libraries installed.
bio_db_repo versions encode the publication date: in 17.10.13, 17 is the year (2017), 10 is the month and 13 is the day.
There are normally two releases a year: one in early autumn (Sept/Oct) and one in early spring (March).
The data releases for at least the past 2 years are kept on the server:
packs/bio_db_repo
For instance the datasets for the 24.4.5 (current at time of writing) release can be found at:
bio_db_repo-24.4.5.tgz
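The version scheme described above can be decoded mechanically. The following is a minimal sketch, assuming SWI-Prolog; version_date/2 and release_archive/2 are hypothetical helpers and not part of bio_db. The sketch also assumes that all releases fall in the years 2000 to 2099, and that every archive follows the same naming pattern as the 24.4.5 example.

```prolog
% version_date(+Version, -Date)
% Decode a bio_db_repo version atom (Year.Month.Day) into a date/3 term.
% Hypothetical helper, not part of bio_db.
% Assumption: the two-digit year belongs to the 2000s.
version_date(Version, date(Year, Month, Day)) :-
    atomic_list_concat([YAtom, MAtom, DAtom], '.', Version),
    atom_number(YAtom, ShortYear),
    atom_number(MAtom, Month),
    atom_number(DAtom, Day),
    Year is 2000 + ShortYear.

% release_archive(+Version, -File)
% Build the archive file name for a release.
% Assumption: all archives are named like bio_db_repo-24.4.5.tgz.
release_archive(Version, File) :-
    atomic_list_concat(['bio_db_repo-', Version, '.tgz'], File).
```

With these definitions, the query version_date('17.10.13', D) binds D to date(2017,10,13), and release_archive('24.4.5', F) binds F to 'bio_db_repo-24.4.5.tgz'. Month and day are not zero-padded in version names, which is why the parts are converted with atom_number/2 rather than compared as text.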
bio_db_repo can be installed from within SWI-Prolog using its
package manager.
?- pack_install(bio_db_repo).
Be warned: this will download a 246 MB .tgz file and unpack it into your local file store.
Individual bio_db table predicates will remain in compressed format until their first use.
If accessed via the Prolog interface, the interpreter will also auto-create a .qlf file that
speeds up subsequent loads to memory. If all tables are accessed, the .pl and .qlf files
will expand to a total of 3.1 GB.
For instance:
?- use_module( library(bio_db) ).
?- map_hgnc_hgnc_symb( Hgnc, 'LMTK3' ).
% prolog DB:table hgnc:map_hgnc_hgnc_symb/2 is not installed, but the zipped prolog db exists. Shall it be created from this (Y/n) ?
% Continuing with: yes
Hgnc = 19295.
To test:
?- use_module( library(bio_db) ).
true.
If you haven't installed bio_db_repo, datasets will be downloaded on demand:
?- use_module( library(bio_db) ).
true.
?- map_hgnc_hgnc_symb( Hgnc, 'LMTK3' ).
% prolog DB:table hgnc:map_hgnc_hgnc_symb/2 is not installed, do you want to download (Y/n) ?
% Continuing with: yes
% Downloading dataset from server: http://stoics.org.uk/~nicos/sware/packs/bio_db_repo/data
% Delete the zip file: '/usr/local/users/nicos/local/git/lib/swipl-7.7.11/pack/bio_db_repo/data/maps/hgnc/map_hgnc_hgnc_symb.pl.zip' (y/N) ?
% Continuing with: yes
Hgnc = 19295.
?- bio_db_info( Iface, map_hgnc_hgnc_symb/2, Key, Value), write( Iface:Key:Value ), nl, fail.
prolog:source_url:ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
prolog:datetime:datetime(2018,3,30,15,25,32)
prolog:data_types:data_types(integer,atom)
prolog:unique_lengths:unique_lengths(45744,45744,45744)
prolog:relation_type:relation_type(1,1)
prolog:header:row(HGNC ID,Approved Symbol)
false.
Accessing biological data as Prolog facts
Nicos Angelopoulos and Jan Wielemaker
Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming (PPDP 2017)
Pages 29-38. Namur, Belgium. October 9-11, 2017
URL: ACM.org
DOI:10.1145/3131851.3131857
slides: ppdp.pdf
A logical approach to working with biological databases
Nicos Angelopoulos and Georgios Giamas
Proceedings of the International Conference on Logic Programming
Accepted as a technical communication, Cork, September 2015.
[paper]
Last update: April 2018.