Collects a number of biological data analytics tasks. The library provides tools that work on data served by bio_db, supporting downstream analyses of users' experimental data.
Installation and loading:
?- pack_install(bio_analytics).   % also installs pack(lib); other dependencies are installed at load time via lib
?- use_module(library(lib)).
?- lib(bio_analytics).
The library comes with one experimental dataset, data/silac/bt.csv, which is used by the example files in directory examples/.
Currently 2 organisms are supported: human and mouse.
Family can be a known family name (see below), or a gene ontology term given either as an atom of the form 'GO:XXXXXXX' or as an integer (e.g. 375 for 'GO:0000375').
If Family is a list of numbers, or of atoms that map to numbers, these are taken to be Entrez ids and are converted to gene symbols in Symbols.
If Family is a list of symbols, it is passed on to Symbols.
Listens to debug(gene_family).
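The input rules for Family can be sketched as a small classifier. This is a minimal sketch that relies only on the text and the examples that follow. family_input_kind/2 and entrez_like/1 are hypothetical helpers and are not part of bio_analytics. The real gene_family/3 may check its input differently. For example, it also rejects names that are not among the known families, and this sketch does not perform that check.

```prolog
:- use_module(library(apply)).   % maplist/2

% family_input_kind(+Family, -Kind)
% Hypothetical helper (not part of bio_analytics): classifies a Family
% argument of gene_family/3 following the rules described above.
family_input_kind(Family, Kind) :-
    (   integer(Family)                          % e.g. 375
    ->  Kind = go_term
    ;   atom(Family),
        sub_atom(Family, 0, _, _, 'GO:')         % e.g. 'GO:0000375'
    ->  Kind = go_term
    ;   is_list(Family), Family \== [],
        maplist(entrez_like, Family)             % e.g. [55626, 8542, 405]
    ->  Kind = entrez_ids
    ;   is_list(Family)                          % e.g. ['AMBRA1', 'ATG14']
    ->  Kind = symbols
    ;   atom(Family)                             % e.g. autophagy
    ->  Kind = named_family
    ).

% entrez_like(+X): X is an integer, or an atom that reads as an integer.
entrez_like(X) :-
    integer(X), !.
entrez_like(X) :-
    atom(X),
    atom_number(X, N),
    integer(N).
```

For example, ?- family_input_kind([55626, 8542, 405], K). binds K = entrez_ids, and ?- family_input_kind(375, K). binds K = go_term.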
Known families:
Located family as the bio_analytics gene collection: autophagy
?- gene_family( autophagy, human, Auto ), length( Auto, Len ).
Auto = ['AMBRA1', 'APOL1', 'ARNT', 'ARSA', 'ARSB', 'ATF4', 'ATF6', 'ATG10', 'ATG12'|...],
Len = 232.

?- debug( gene_family ).
?- gene_family( 375, hs, Symbs ), length( Symbs, Len ).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.

?- gene_family( 'GO:0000375', hs, Symbs ), length( Symbs, Len ).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.

?- gene_family( [55626, 8542, 405], hs, Auto ).
Converted input from Entrezes to Symbols.
Auto = ['AMBRA1', 'APOL1', 'ARNT'].

?- gene_family( unknown, hs, Auto ).
ERROR: Unhandled exception: gene_family(cannot_find_input_family_in_the_known_ones(unknown,[autophagy]))
Family datasets are in pack(bio_analytics/data/families).
Go can be either a GO: atom or an integer.
[debug] ?- go_org_symbol( mouse, 2, Symbs ), write(Symbs), nl, fail.
Akt3
Mef2a
Mgme1
Mpv17
Mrpl15
Mrpl17
Mrpl39
Msto1
Opa1
Slc25a33
Slc25a36
Tymp
false.

[debug] ?- go_org_symbol( hs, 'GO:0000002', Symbs ), write(Symbs), nl, fail.
AKT3
LONP1
MEF2A
MGME1
MPV17
MSTO1
OPA1
PIF1
SLC25A33
SLC25A36
SLC25A4
TYMP
false.
?- go_org_symbols( 'GO:0000375', human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP'|...],
Len = 26.

?- go_org_symbols( 'GO:0000375', mouse, Symbs ), length( Symbs, Len ).
Symbs = ['Dbr1', 'Rbm17', 'Rnu4atac', 'Rnu6atac', 'Scaf11', 'Sf3a2', 'Slu7', 'Srrm1', 'Srsf10'|...],
Len = 10.

?- go_org_symbols( 375, human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.
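Because Go may be an integer (2, 375) or a 'GO:' atom ('GO:0000002', 'GO:0000375'), the two forms have to be mapped to one canonical atom before lookup. The sketch below assumes the usual seven-digit, zero-padded GO identifier, which matches the examples above. go_id_atom/2 is a hypothetical helper. It is not the library's own conversion.

```prolog
% go_id_atom(+Go, -GoAtom)
% Hypothetical helper (not part of bio_analytics): normalises a GO
% identifier given as an integer (375) or as an atom ('GO:0000375')
% to 'GO:' followed by seven zero-padded digits.
go_id_atom(Go, GoAtom) :-
    integer(Go),
    !,
    Go >= 0,
    % ~`0t fills with zeros up to the column stop ~7|, so 375 -> 0000375
    format(atom(Digits), '~`0t~d~7|', [Go]),
    atom_concat('GO:', Digits, GoAtom).
go_id_atom(Go, GoAtom) :-
    atom(Go),
    sub_atom(Go, 0, 3, _, 'GO:'),
    GoAtom = Go.
```

Used this way, ?- go_id_atom(2, G). gives G = 'GO:0000002', the term queried in the go_org_symbol/3 examples.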
Opts
Listens to debug(go_symbols_reach).
?- go_symbols_reach( 'GO:0000375', Symbs, [] ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.

?- go_symbols_reach( 'GO:0000375', Symbs, organism(mouse) ), length( Symbs, Len ).
Symbs = ['4930595M18Rik', 'Aar2', 'Aqr', 'Bud13', 'Bud31', 'Casc3', 'Cdc40', 'Cdc5l', 'Cdk13'|...],
Len = 190.

?- go_symbols_reach( 375, Symbs, true ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.
The Graph can be saved via options to wgraph_plot/2.
Opts are passed to symbols_string_graph/3, go_term_symbols/3 and wgraph_plot/2.
Opts
?- go_string_graph( 'GO:0016601', G, true ).
?- go_string_graph( 'GO:0016601', G, plot(true) ).
Opts
?- Got = 'GO:0043552', gene_family( Got, hs, Symbs ), length( Symbs, SymbsLen ), symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['AMBRA1', 'ATG14', 'CCL19', 'CCL21', 'CCR7'|...],
SymbsLen = 31,
Graph = ['TNFAIP8L3', 'AMBRA1'-'ATG14':981, 'AMBRA1'-'PIK3R4':974|...],
GraphLen = 235.

% The following is flawed: it takes the mouse members of Got but builds the graph in human STRING,
% so no interactions are found and Graph contains only the input nodes.
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ), symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = Graph,
Graph = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8', 'Fgf2', 'Fgfr3', 'Fgr'|...],
SymbsLen = GraphLen,
GraphLen = 28.

% This is the correct way to run the first query for mouse:
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ), symbols_string_graph(Symbs, Graph, organism(mouse) ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8'|...],
SymbsLen = 28,
Graph = ['Ambra1', 'Cdc42', 'Lyn', 'Nod2', 'Pdgfra', 'Sh3glb1', 'Tnfaip8l3', 'Vav3', ... : ...|...],
GraphLen = 117.
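In the human query, Graph mixes an isolated node ('TNFAIP8L3') with weighted edges such as 'AMBRA1'-'ATG14':981. The flawed mouse query returns Symbs = Graph, which means nodes only. The sketch below tells the two cases apart. It assumes, from the printed output only, that edges are From-To:Weight terms and that every other element is an isolated node. wgraph_nodes_edges/3 is a hypothetical helper, not a bio_analytics predicate.

```prolog
:- use_module(library(apply)).   % partition/4

% wgraph_nodes_edges(+Graph, -Nodes, -Edges)
% Hypothetical helper (not part of bio_analytics): splits a graph list
% of the form printed above into isolated nodes (plain atoms) and
% edges (From-To:Weight terms), preserving the original order.
wgraph_nodes_edges(Graph, Nodes, Edges) :-
    partition(is_wgraph_edge, Graph, Edges, Nodes).

% From-To:Weight is read as -(From, :(To, Weight)).
is_wgraph_edge(_From-_To:_Weight).
```

An empty Edges list, as in ?- wgraph_nodes_edges(['Ambra1', 'Atg14'], Ns, Es). (Es = []), is a quick sign that the organism option was not passed on.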
We use ev as short for expression value, to avoid confusion with exp, which is short for experiment.
Opts
as_non(AsNon)
what to return as non-differentially expressed: either everything in Gcnm (AsNon=all, default when EvLog is true), or only those with a numeric p-value (AsNon=pvalue, default when EvLog is false)
exp_pv_cnm(ExpPcnm='adj.pvalue')
the experimental column (found in MsF) on which Pcut is applied as a filter
exp_pv_cut(Pcut=0.05)
p-value cut-off for experimental input from MSstats
exp_ev_log(EvLog=true)
whether the expression values are log values
exp_ev_cnm(EvCnm)
expression value column name (log2FC if EvLog=true, expression otherwise)
exp_ev_cut_get(EvGet)
expression values above (greater or equal) which values are selected (default is 2 if EvLog is false, and 1 otherwise)
exp_ev_include_inf(IncInf)
whether to include infinity values as diffexs (default is false if EvLog is false, and true otherwise)
gene_id_cnm(Gcnm='Symbols')
column name for the key value in the pair lists DEPrs and NonDEPrs

?- lib(mtx), absolute_file_name(pack('bio_analytics/data/silac/bt.csv'), CsvF), mtx(CsvF, Mtx, convert(true)), assert(mtx_data(Mtx)).
CsvF = '/usr/local/users/nicos/local/git/lib/swipl/pack/bio_analytics/data/silac/bt.csv',
Mtx = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('Q9P126', ...), row(..., ..., ..., ...)|...].

?- lib(debug_call), debug(testo), mtx_data(Mtx), debug_call(testo, dims, mtx/Mtx), exp_diffex(Mtx, DEPrs, NonDEPrs, []), debug_call(testo, length, de/DEPrs), debug_call(testo, length, nde/NonDEPrs).
% Dimensions for matrix, (mtx) nR: 1245, nC: 4.
% Length for list, de: 426.
% Length for list, nde: 818.
DEPrs = ['CLEC1B'- -4.80614893171469, 'LGALS1'- -2.065877096524, ... - ...|...],
NonDEPrs = ['CNN2'- -0.69348026828097, 'CXCR4'-0.73039667221395, ... - ...|...].

?- mtx_data(Mtx), exp_diffex(Mtx, DEPrs, NonDEPrs, exp_pv_cut(0.01)), debug_call(testo, length, de/DEPrs), debug_call(testo, length, nde/NonDEPrs).
% Length for list, de: 286.
% Length for list, nde: 958.

?- mtx_data(Mtx), Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf)], exp_diffex(Mtx, DEPrs, NonDEPrs, Opts), debug_call(testo, length, de/DEPrs), debug_call(testo, length, nde/NonDEPrs).
% Length for list, de: 581.
% Length for list, nde: 663.
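The selection rule behind these options can be sketched on toy rows. This is not the bio_analytics implementation, and several parts of it are assumptions. The p-value cut is treated as inclusive. The lower bound Let stands in for exp_ev_cut_let, which is used in the last example above but is not documented, and the value -1 below is an assumed symmetric counterpart of EvGet=1. All rows are assumed to carry numeric values, so infinity handling and AsNon=pvalue are not modelled. diffex_split/6 and the toy symbols a to d are hypothetical.

```prolog
:- use_module(library(apply)).   % partition/4, maplist/3

% diffex_split(+Rows, +Pcut, +Let, +Get, -DEPrs, -NonDEPrs)
% Hypothetical sketch (not the bio_analytics implementation).
% Rows are Symbol-Ev-PValue terms with numeric Ev and PValue.
% Assumed rule: a row is differentially expressed when
% PValue =< Pcut and (Ev >= Get or Ev =< Let).
% Both outputs are Symbol-Ev pairs, as in the transcript above.
diffex_split(Rows, Pcut, Let, Get, DEPrs, NonDEPrs) :-
    partition(is_diffex(Pcut, Let, Get), Rows, DERows, NonDERows),
    maplist(row_pair, DERows, DEPrs),
    maplist(row_pair, NonDERows, NonDEPrs).

is_diffex(Pcut, Let, Get, _Symb-Ev-Pv) :-
    Pv =< Pcut,
    (   Ev >= Get
    ->  true
    ;   Ev =< Let
    ).

row_pair(Symb-Ev-_Pv, Symb-Ev).
```

With toy rows, ?- diffex_split([a-2.5-0.01, b- -1.5-0.02, c-0.3-0.001, d-3.0-0.2], 0.05, -1, 1, DE, NonDE). puts a and b in DE and c and d in NonDE. Widening Let and Get to very large bounds selects on the p-value alone, which mirrors the exp_ev_cut_let(inf), exp_ev_cut_get(-inf) query that reports more DE entries (581 against 426).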
Note that de-regulation trumps identification. That is, there is currently no distinction between genes that are seen to be both significantly de-regulated and identified, and those that are simply significantly de-regulated.
Opts
Options are passed to a number of other predicates.
See [pack('bio_analytics/examples/bt.pl')].
?- lib(real), lib(mtx),
   absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   Ppts = [ vjust= -1, node_size(3), mode="fruchtermanreingold", format(svg), stem(bt)],
   Opts = [ exp_ev_cut_let(inf), exp_ev_cut_get(-inf), include_non_present(false),
            include_non_significant(false), minw(200), wgraph_plot_opts(Ppts) ],
   exp_gene_family_string_graph( CsvF, autophagy, G, Opts ).
Produces file: bt.svg
Opts
Options are passed to exp_gene_family_string_graph/4.
?- debug(real).
?- debug(exp_go_over_string_graphs).
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), Exp ), exp_go_over_string_graphs( Exp, Gov, Dir, [] ).
% Sending to R: pltv <- ggnet2(lp_adj,vjust = -1,size = 3,label = pl_v_1,color = pl_v_2,edge.size = pl_v_3,edge.color = "#BEAED4")
?- bio_analytics_version(V,D).
V = 0:3:0,
D = date(2019,5,12).
The following predicates are exported but are either undocumented or incorrectly documented.