A simple library for communicating with publication information servers: PubMed and Semantic Scholar.
It currently allows (a) searching on conjunctions and disjunctions, (b) fetching the details of a paper,
(c) finding the publications citing a paper, (d) finding the publications cited by a paper, (e) simple reporting of fetched information,
and (f) storing fetched information to local databases.
Since version 0.1 the library supports caching paper information in Prolog term or CSV data files,
and in ODBC-connected or SQLite databases. Also as of 0.1, pub_graph is debug/1 aware. To see information about
the progress of execution, use
?- debug(pub_graph).
The pack requires the curl executable to be on the PATH. It has only been tested on Linux.
It is being developed on SWI-Prolog 6.1.8 and should also work on Yap Prolog.
To install under SWI simply do
?- pack_install(pub_graph).
% and load with
?- use_module(library(pub_graph)).
Storing paper and citation information depends on db_facts, and SQLite connectivity depends on proSQLite (both are available as SWI packs and from http://stoics.org.uk/~nicos/sware/).
ncbi (https://www.ncbi.nlm.nih.gov/pubmed/) and semscholar (http://semanticscholar.org/) are the known IdTypes.
The predicate does not connect to the server; it only checks the shape of Id.
If Id is an integer, or an atom that can be converted to an integer, then IdType is instantiated to ncbi.
There are three term forms for semscholar.
The following two ids correspond to the same paper.
?-
pub_graph_id( 12075665, Type ).
Type = ncbi.
?-
pub_graph_id( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Type ).
Type = semscholar.
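As a rough illustration of the shape-only check described above, here is a minimal sketch. The name id_type_sketch/2 is hypothetical and this is not pub_graph's code; treating a 40-character hexadecimal atom as a semscholar id is an assumption based on the example id, and it covers only one of the three semscholar term forms mentioned above. No server is contacted.

```prolog
:- use_module(library(apply)).

% id_type_sketch(+Id, -IdType)
% Hypothetical re-creation of a shape check; NOT pub_graph's implementation.
% An integer, or an atom that converts to an integer, is an ncbi id.
id_type_sketch(Id, ncbi) :-
    integer(Id), !.
id_type_sketch(Id, ncbi) :-
    atom(Id),
    atom_number(Id, N),          % fails for atoms that are not numbers
    integer(N), !.
% Assumption: a 40-character hexadecimal atom is a semscholar id.
id_type_sketch(Id, semscholar) :-
    atom(Id),
    atom_length(Id, 40),
    atom_chars(Id, Chars),
    maplist(hex_char, Chars).

% True if C is a hexadecimal digit character.
hex_char(C) :-
    char_type(C, xdigit(_)).
```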
?-
pub_graph_version(V,D).
V = 1:2:0,
D = date(2023, 9, 20).
Search in pub_graph for the terms in the search term STerm.
In this, conjunction is marked by , (comma) and
disjunction by ; (semicolon). '-' pair terms (the examples below use the Key=Value form) are considered as
Key-Value and interpreted as Value[Key] in the query.
Lists are taken to be flat, conjoint search terms with no pair values in them; they
are also interpreted as OR operations (compare the QTrans value in the first example below).
Known keys are: journal, pdat, au and All Fields.
The predicate constructs a query that is posted via the HTTP API provided
by NCBI (http://www.ncbi.nlm.nih.gov/books/NBK25500/).
Options should be a term or list of terms from:
gap(Gap)  proximity allowance for the ncbi terms Title, Title/Abstract and Affiliation.
   The higher the number, the looser the match. The default allows for no intervening words, so only
   exact sub-matches will be returned (see the gap(1) example below).
   See: https://pubmed.ncbi.nlm.nih.gov/help/#proximity-searching
qtranslation(QTrans)  QTrans is the actual query run on the server.
Tmp  a variable that is bound to the file that was used to receive the results from pub_graph.
Keep == true
verbose(Verbose)  if Verbose == true, the predicate is verbose about its progress; for instance,
   the request query is printed on the current output stream.
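Options throughout this library can be given as a single term or as a list of terms. A minimal sketch of how such an argument can be normalised and queried, using option/3 from SWI-Prolog's library(option); the helper names are hypothetical, and the default of 0 for gap/1 is an assumption read off the description of the default above (no intervening words).

```prolog
:- use_module(library(option)).

% options_list_sketch(+Opts, -List)
% Hypothetical helper: accepts one option term or a list of them.
options_list_sketch(Opts, List) :-
    (   is_list(Opts)
    ->  List = Opts
    ;   List = [Opts]
    ).

% search_gap_sketch(+Opts, -Gap)
% Looks up gap(Gap); assumed default 0 (no intervening words).
search_gap_sketch(Opts, Gap) :-
    options_list_sketch(Opts, List),
    option(gap(Gap), List, 0).
```

With this convention, a bare `true` (as used in several examples below) is simply a one-element option list that carries no gap/1 term.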
?-
St = (journal=science,[breast,cancer],pdat=2008),
pub_graph_search( St, Ids, [verbose(true),qtranslation(QTrans)] ),
length( Ids, Len ), write( number_of:Len ), nl,
pub_graph_summary_display( Ids, _, display(all) ).
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=100&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
tmp_file(/tmp/swipl_3884_9)
number_of:6
----
1:19008416
Author=[Varambally S,Cao Q,Mani RS,Shankar S,Wang X,Ateeq B,Laxman B,Cao X,Jing X,Ramnarayanan K,Brenner JC,Yu J,Kim JH,Han B,Tan P,Kumar-Sinha C,Lonigro RJ,Palanisamy N,Maher CA,Chinnaiyan AM]
Title=Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer.
Source=Science
Pages=1695-9
PubDate=2008 Dec 12
Volume=322
Issue=5908
ISSN=0036-8075
PmcRefCount=352
PubType=Journal Article
FullJournalName=Science (New York, N.Y.)
----
2:18927361
Author=Couzin J
Title=Genetics. DNA test for breast cancer risk draws criticism.
Source=Science
...
...
...
6:18239125
Author=[Silva JM,Marran K,Parker JS,Silva J,Golding M,Schlabach MR,Elledge SJ,Hannon GJ,Chang K]
Title=Profiling essential genes in human mammary cells by multiplex RNAi screening.
Source=Science
Pages=617-20
PubDate=2008 Feb 1
Volume=319
Issue=5863
ISSN=0036-8075
PmcRefCount=132
PubType=Journal Article
FullJournalName=Science (New York, N.Y.)
----
St = (journal=science, [breast, cancer], pdat=2008),
Ids = ['19008416', '18927361', '18787170', '18487186', '18239126', '18239125'],
QTrans = ['("Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND 2008[pdat]'],
Len = 6.
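The translation rules described above can be sketched as a small recursive predicate. This is a hypothetical sketch, not pub_graph's implementation: term_query_sketch/2 is an invented name, URL-encoding (e.g. of spaces as %20) is left out, and rendering a disjunction as (A+OR+B) is an assumption, since no disjunctive example appears here. For the conjunctive search term above it reproduces the term= part of the URL printed in verbose mode.

```prolog
:- use_module(library(error)).

% term_query_sketch(+STerm, -Query)
% Hypothetical sketch of turning a search term into an esearch term= value.
term_query_sketch(STerm, _) :-
    var(STerm), !,
    instantiation_error(STerm).
term_query_sketch((A,B), Q) :- !,            % conjunction
    term_query_sketch(A, QA),
    term_query_sketch(B, QB),
    atomic_list_concat([QA, '+AND+', QB], Q).
term_query_sketch((A;B), Q) :- !,            % disjunction (assumed rendering)
    term_query_sketch(A, QA),
    term_query_sketch(B, QB),
    atomic_list_concat(['(', QA, '+OR+', QB, ')'], Q).
term_query_sketch(K=V, Q) :- !,              % Key=Value becomes Value[Key]
    atomic_list_concat([V, '[', K, ']'], Q).
term_query_sketch(K-V, Q) :- !,              % Key-Value becomes Value[Key]
    atomic_list_concat([V, '[', K, ']'], Q).
term_query_sketch(List, Q) :-                % flat list: words joined by +
    is_list(List), !,
    atomic_list_concat(List, '+', Q).
term_query_sketch(Atomic, Atomic) :-
    atomic(Atomic).
```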
?-
date(Date),
St = (author='Borst Piet'),
pub_graph_search( St, Ids, verbose(true) ),
length( Ids, Len ), write( number_of:Len ), nl.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=100&term=Borst%20Piet[author]
tmp_file(/tmp/swipl_18703_0)
number_of:83
Date = date(2018, 9, 22),
St = (author='Borst Piet'),
Ids = ['29894693', '29256493', '28821557', '27021571', '26774285', '26530471', '26515061', '25799992', '25662217'|...],
Len = 83.
?-
date(Date), pub_graph_search( prolog, Ids ),
length( Ids, Len ), write( number_of:Len ), nl.
number_of:100
Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 100.
?-
date(Date), pub_graph_search( prolog, Ids, retmax(200) ),
length( Ids, Len ), write( number_of:Len ), nl.
number_of:127
Date = date(2018, 9, 22),
Ids = ['30089663', '28647861', '28486579', '27684214', '27142769', '25509153', '24995073', '22586414', '22462194'|...],
Len = 127.
?-
St = ('breast','cancer','Publication Type'='Review'),
date(Date), pub_graph_search( St, Ids, reldate(30) ),
length( Ids, Len ).
Date = date(2018, 9, 22),
Ids = ['30240898', '30240537', '30240152', '30238542', '30238005', '30237735', '30236642', '30236594', '30234119'|...],
Len = 100.
?-
pub_graph_summary_display( 30243159, _, true ).
----
1:30243159
Author=[Wang K,Yee C,Tam S,Drost L,Chan S,Zaki P,Rico V,Ariello K,Dasios M,Lam H,DeAngelis C,Chow E]
Title=Prevalence of pain in patients with breast cancer post-treatment: A systematic review.
----
true.
Version 0:3 (pub_graph_version(1:2:0,_D)).
?-
date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20),
Ids = ['35379892'],
Len = 1.
?-
date(Date), pub_graph_search(title='Bayesian elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20),
Ids = [],
Len = 0.
?-
date(Date), pub_graph_search(title='Bayesian elucidate', Ids, gap(1)), length(Ids, Len), pub_graph_summary_display(Ids, _, true).
----
1:35379892
Author=[Angelopoulos N,Chatzipli A,Nangalia J,Maura F,Campbell PJ]
Title=Bayesian networks elucidate complex genomic landscapes in cancer.
----
Date = date(2023, 9, 20),
Ids = ['35379892'],
Len = 1.
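PubMed's proximity syntax, described on the help page linked in the options above, has the form "search terms"[field:~N]. A hypothetical helper that builds such a term is sketched below; proximity_term_sketch/4 is an invented name, and whether pub_graph generates exactly this string for gap(N) is an assumption.

```prolog
:- use_module(library(error)).

% proximity_term_sketch(+Field, +Words, +Gap, -Term)
% Hypothetical helper: builds "Words"[Field:~Gap]. In format/3, ~~ prints a literal ~.
proximity_term_sketch(Field, Words, Gap, Term) :-
    must_be(nonneg, Gap),
    format(atom(Term), '"~w"[~w:~~~w]', [Words, Field, Gap]).
```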
?-
date(D),
write('Appears in abstract: "explainable Artificial Intelligence models"'), nl,
pub_graph_search('Title/Abstract'='explainable Artificial Intelligence models', Ids, true),
pub_graph_summary_display(Ids).
1
...
10:32417928
Author=[Payrovnaziri SN,Chen Z,Rengifo-Moreno P,Miller T,Bian J,Chen JH,Liu X,He Z]
Title=Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review.
?-
date(D), pub_graph_search('Title/Abstract'='explainable Intelligence models', Ids, true).
D = date(2023, 9, 20),
Ids = [].
?-
date(D), pub_graph_search((tiab='explainable Intelligence models',affiliation=sanger), Ids, gap(1)).
D = date(2023, 9, 20),
Ids = ['35379892'].
Version 0:3 also added quote_value(Qv). Compare:
?-
date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, true), length(Ids,Len).
Date = date(2023, 9, 20),
Ids = ['35379892'],
Len = 1.
?-
date(Date), pub_graph_search(title='Bayesian networks elucidate', Ids, quote_value(false)), length(Ids,Len).
Date = date(2023, 9, 20),
Ids = ['35923659', '35379892', '32609725', '29055062', '27303742', '26362267'],
Len = 6.
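The difference between the two queries above can be illustrated with a hypothetical helper that either wraps the value in double quotes, so it is searched as a phrase, or passes it bare. The name field_term_sketch/4 and the exact strings it produces are assumptions, not pub_graph's code.

```prolog
% field_term_sketch(+Field, +Value, +QuoteValue, -Term)
% Hypothetical sketch: QuoteValue == true quotes Value as a phrase;
% QuoteValue == false passes it bare (which returned more hits above).
field_term_sketch(Field, Value, true, Term) :- !,
    format(atom(Term), '"~w"[~w]', [Value, Field]).
field_term_sketch(Field, Value, false, Term) :-
    format(atom(Term), '~w[~w]', [Value, Field]).
```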
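pub_graph_summary_display(Ids) is shorthand for pub_graph_summary_display( Ids, _Summary, [] ), and the two-argument form is shorthand for pub_graph_summary_display( Ids, Summary, [] ).
Opts is a term or list of terms; display(Disp) selects which fields are displayed for each of the Ids.
Disp values of var(Disp), '*' and 'all' list all available values.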
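A minimal sketch of the Disp selection described above, working on one paper's Key-Value summary list. display_fields_sketch/3 is a hypothetical name; keeping the selected fields in summary order, rather than in the order given in Disp, is based on the pub_graph_cited_by example below, where display(['Title','Author','PubDate']) prints Author before Title.

```prolog
:- use_module(library(apply)).

% display_fields_sketch(+Disp, +Pairs, -Shown)
% Hypothetical sketch: an unbound Disp, '*' or all keeps every pair;
% a list of keys keeps only those pairs, in their summary order.
display_fields_sketch(Disp, Pairs, Pairs) :-
    ( var(Disp) ; Disp == (*) ; Disp == all ), !.
display_fields_sketch(Keys, Pairs, Shown) :-
    is_list(Keys),
    include(key_in(Keys), Pairs, Shown).

% True if the key of the pair is one of Keys.
key_in(Keys, K-_) :-
    memberchk(K, Keys).
```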
?-
date(Date),
pub_graph_search((programming,'Prolog'), Ids),
length( Ids, Len),
Ids = [A,B,C|_], pub_graph_summary_display( [A,B,C] ).
----
1:28486579
Author=[Holmes IH,Mungall CJ]
Title=BioMake: a GNU make-compatible utility for declarative workflow management.
----
2:24995073
Author=[Melioli G,Spenser C,Reggiardo G,Passalacqua G,Compalati E,Rogkakou A,Riccio AM,Di Leo E,Nettis E,Canonica GW]
Title=Allergenius, an expert system for the interpretation of allergen microarray results.
----
3:22215819
Author=[Mørk S,Holmes I]
Title=Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
----
Date = date(2018, 9, 22),
Ids = ['28486579', '24995073', '22215819', '21980276', '15360781', '11809317', '9783213', '9293715', '9390313'|...],
Len = 43.
A = '28486579',
B = '24995073',
C = '22215819'.
?-
pub_graph_summary_display( 30235570, _, display(*) ).
----
1:30235570
Author=[Morgan CC,Huyck S,Jenkins M,Chen L,Bedding A,Coffey CS,Gaydos B,Wathen JK]
Title=Adaptive Design: Results of 2012 Survey on Perception and Use.
Source=Ther Innov Regul Sci
Pages=473-481
PubDate=2014 Jul
Volume=48
Issue=4
ISSN=2168-4790
PmcRefCount=0
PubType=Journal Article
FullJournalName=Therapeutic innovation & regulatory science
----
?-
pub_graph_cited_by( 20195494, These ),
pub_graph_summary_display( These, _, [display(['Title','Author','PubDate'])] ).
----
1:29975690
Author=[Tang K,Boudreau CG,Brown CM,Khadra A]
Title=Paxillin phosphorylation at serine 273 and its effects on Rac, Rho and adhesion dynamics.
PubDate=2018 Jul
----
2:29694862
Author=[McKenzie M,Ha SM,Rammohan A,Radhakrishnan R,Ramakrishnan N]
Title=Multivalent Binding of a Ligand-Coated Particle: Role of Shape, Size, and Ligand Heterogeneity.
PubDate=2018 Apr 24
----
3:29669897
Author=[Padmanabhan P,Goodhill GJ]
Title=Axon growth regulation by a bistable molecular switch.
PubDate=2018 Apr 25
...
...
26:20473365
Author=[Welf ES,Haugh JM]
Title=Stochastic Dynamics of Membrane Protrusion Mediated by the DOCK180/Rac Pathway in Migrating Cells.
PubDate=2010 Mar 1
----
These = [29975690, 29694862, 29669897, 28752950, 27939309, 27588610, 27276271, 25969948, 25904526|...].
?-
pub_graph_summary_display( 20195494, _Res, true ).
----
1:20195494
Author=[Cirit M,Krajcovic M,Choi CK,Welf ES,Horwitz AF,Haugh JM]
Title=Stochastic model of integrin-mediated signaling and adhesion dynamics at the leading edges of migrating cells.
----
true.
?-
pub_graph_summary_display( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, _, display(all) ).
----
1:cbd251a03b1a29a94f7348f4f5c2f830ab80a909
arxivId=[]
authors=[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
doi=10.1109/TITB.2002.1006298
title=Architecture of a mediator for a bioinformatics database federation
topics=[]
venue=IEEE Transactions on Information Technology in Biomedicine
year=2002
----
true.
Options is a term or list of terms from the following:
true  update the cache if you do an explicit retrieval.
?-
date(D), pub_graph_cited_by( 12075665, By ), length( By, Len ).
D = date(2018, 9, 22),
By = [25825659, 19497389, 19458771],
Len = 3.
?-
date(D), pub_graph_cited_by( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, By ), length( By, Len ).
D = date(2018, 9, 22),
By = ['2e1f686c2357cead711c8db034ff9aa2b7509621', '6f125881788967e1eec87e78b3d2db61d1a8d0ac'|...],
Len = 12.
Options is a term or list of terms from the following:
?-
date(D),
pub_graph_cites( 20195494, Ids ),
length( Ids, Len ), write( D:Len ), nl.
date(2018,9,22):38
D = date(2018, 9, 22),
Ids = ['19160484', '19118212', '18955554', '18800171', '18586481'|...],
Len = 38.
% PubMed does not have the references cited by the following paper:
?-
date(D),
pub_graph_cites( 12075665, Ids ),
length( Ids, Len ), write( D:Len ), nl.
false.
% whereas semanticscholar.org finds 17 (non-'') of the 21:
?-
date(D),
pub_graph_cites( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Ids ),
length( Ids, Len ), write( D:Len ), nl.
date(2018,9,22):17
D = date(2018, 9, 22),
Ids = ['6477792829dd059c7d318927858d307347c54c2e', '1448901572d1afd0019c86c42288108a94f1fb25'|...],
Len = 17.
?-
pub_graph_summary_display( 12075665, Results, true ).
----
1:12075665
Author=[Kemp GJ,Angelopoulos N,Gray PM]
Title=Architecture of a mediator for a bioinformatics database federation.
----
Results = [12075665-['Author'-['Kemp GJ', 'Angelopoulos N', 'Gray PM'], ... - ...|...]].
Can include the journal impact factor (IF) if jif/6 is provided.
Output rows contain #citing, [IF,] Date, Journal, Title and Author (with the Title linked to pubmed/$id).
Opts is a term or list of terms, including has(Val) and quite(Val).
?-
pub_graph_table
pub_graph_summary_info/3 fetches summary information for the id(s) IdS.
Options is a single term, or a list of the following terms:
true   be verbose.
false  use if you do not want the cache to be updated with newly downloaded information.
?-
date(Date),
Opts = names(['Author','PmcRefCount','Title']),
pub_graph_summary_info( 12075665, Results, Opts ),
write( date:Date ), nl,
member( R, Results ), write( R ), nl,
fail.
date:date(2018,9,22)
Author-[Kemp GJ,Angelopoulos N,Gray PM]
PmcRefCount-3
Title-Architecture of a mediator for a bioinformatics database federation.
false.
?-
pub_graph_summary_info(12075665,Res,[]),
member(R,Res), write( R ), nl,
fail.
Author-[Kemp GJ,Angelopoulos N,Gray PM]
Title-Architecture of a mediator for a bioinformatics database federation.
Source-IEEE Trans Inf Technol Biomed
Pages-116-22
PubDate-2002 Jun
Volume-6
Issue-2
ISSN-1089-7771
PmcRefCount-3
PubType-Journal Article
FullJournalName-IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society
?-
pub_graph_summary_info( cbd251a03b1a29a94f7348f4f5c2f830ab80a909, Res, true ),
member( R, Res ), write( R ), nl,
fail.
arxivId-[]
authors-[Graham J. L. Kemp,Nicos Angelopoulos,Peter M. D. Gray]
doi-10.1109/TITB.2002.1006298
title-Architecture of a mediator for a bioinformatics database federation
topics-[]
venue-IEEE Transactions on Information Technology in Biomedicine
year-2002
false.
?-
pub_graph_abstracts( 24939894, Abs ).
Abs = ['Lemur tyrosine kinase 3 (LMTK3) is associated with cell proliferation and',...].
Options is a single term, or a list of the following terms:
Type == false, or absent: caching is turned off.
true
Type is one of csv, prolog, sqlite and odbc. In the first three cases, Object should be a filename,
and for odbc it should be a DSN token. In the case of filenames, the default value for Object
is formed as <type>_<id1>{_<id2>}.<type_ext>.
<type_ext> is either set to Ext or, if this is missing, deduced from Type. It can be set to ''
if you want no extension added.
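The default file-name rule above can be sketched as follows. cache_file_sketch/4 is hypothetical, the atom default standing for a missing Ext is an invented convention, and the per-type extensions in type_ext_sketch/2 are assumptions (they are not listed here). odbc is left out because it takes a DSN rather than a filename.

```prolog
% cache_file_sketch(+Type, +IdParts, +Ext, -File)
% Hypothetical sketch of <type>_<id1>{_<id2>}.<type_ext>.
% Ext: an explicit extension, '' for none, or default (deduce from Type).
cache_file_sketch(Type, IdParts, Ext, File) :-
    atomic_list_concat([Type|IdParts], '_', Base),
    (   Ext == default
    ->  type_ext_sketch(Type, TypeExt)
    ;   TypeExt = Ext
    ),
    (   TypeExt == ''
    ->  File = Base                       % no extension added
    ;   atomic_list_concat([Base, '.', TypeExt], File)
    ).

% Assumed default extension for each file-based cache type.
type_ext_sketch(csv, csv).
type_ext_sketch(prolog, pl).
type_ext_sketch(sqlite, sqlite).
```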
Graph is compatible with the Prolog representation of unweighted graphs (as used by library(ugraphs)). That is, all vertices appear in a keysorted list as V-Ns pairs, where V is the vertex and Ns is the sorted list of all its neighbours. Ns is the empty list if V has no neighbours; here, that should only happen if one of the input Ids has no citing papers, or for the nodes at the edge of Depth.
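The V-Ns representation described above is the one built by vertices_edges_to_ugraph/3 from SWI-Prolog's library(ugraphs). A minimal sketch, assuming a single level of citations and using the pub_graph_cited_by result for 12075665 shown earlier as data; cited_by_ugraph_sketch/3 is a hypothetical name and is not how pub_graph_cited_by_graph/3 is implemented.

```prolog
:- use_module(library(lists)).
:- use_module(library(ugraphs)).

% cited_by_ugraph_sketch(+Id, +CitedBy, -Graph)
% Hypothetical sketch: one edge Id-C for every citing paper C. Citing papers
% become vertices with empty neighbour lists (the edge of Depth 1).
cited_by_ugraph_sketch(Id, CitedBy, Graph) :-
    findall(Id-C, member(C, CitedBy), Edges),
    vertices_edges_to_ugraph([Id], Edges, Graph).
```

For a paper with no citing papers the result is a single vertex with an empty neighbour list, matching the description above.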
?-
pub_graph_cited_by_graph( 12075665, G, cache(sqlite) ).
Options is a single term or list of the following:
file(File)  file to use for storage
single_file(Single)  boolean value, default is true. If false, separate (aggregating) files are created
   at each iteration
depth(D)  the overall depth limit
csv, prolog, odbc and sqlite files are recognised.
The former two are consulted into module pub_graph_cache, and Handle is therefore not used.
For odbc/sqlite files, lookups and database access are via the odbc and prosqlite libraries respectively.
Handle can be set to an alias of choice; otherwise an opaque atom is returned with which the db is accessed.
Which should be either cited_by or info.
Options is a term or list of terms from:
ext(Ext) extension to try on the file. Use the empty atom if you do not want the library to
use the default extension for the type of cache used.
Options are also passed to the underlying open operations for the chosen type. So, for instance,
you can provide the username and password for the odbc connection with user(U) and password(P).
Opts is a term or list of terms from: