A collection of data analytics tasks for biological data sets. This library provides analysis tools for data served by bio_db.
Installation and loading:
?- pack_install(bio_analytics).   % also installs pack(lib)
?- use_module(library(lib)).
?- lib(bio_analytics).
See example in exp_gene_family_string_graph/4.
Currently 2 organisms are supported: human and mouse.
Family can be a gene ontology term (an atom of the form GO:XXXXXXX, for example 'GO:0000375', or the corresponding integer, for example 375).
If Family is a list of numbers, or of atoms that map to numbers, the elements are taken to be Entrez ids and are converted to gene symbols in Symbols.
If Family is a list of symbols, it is passed on unchanged to Symbols.
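The input forms described above can be told apart with a few type checks. The following is a minimal sketch only: family_kind/2 and maps_to_integer/1 are hypothetical names introduced for illustration, they are not part of bio_analytics, and the pack's own dispatch may differ.

```prolog
:- use_module(library(apply)).   % maplist/2

% family_kind(+Family, -Kind)
% Hypothetical classifier (an assumption, not the pack's code).
% Kind is one of: go_term, entrez_ids, symbols, family_name.
family_kind(Family, go_term) :-          % 'GO:0000375'
    atom(Family),
    atom_concat('GO:', _, Family),
    !.
family_kind(Family, go_term) :-          % 375
    integer(Family),
    !.
family_kind([H|T], entrez_ids) :-        % [55626, 8542, 405]
    maplist(maps_to_integer, [H|T]),
    !.
family_kind([H|T], symbols) :-           % ['AMBRA1', 'APOL1']
    maplist(atom, [H|T]),
    !.
family_kind(Family, family_name) :-      % autophagy
    atom(Family),
    Family \== [].

% True for integers and for atoms whose text reads as an integer.
maps_to_integer(X) :-
    integer(X),
    !.
maps_to_integer(X) :-
    atom(X),
    atom_number(X, N),
    integer(N).
```

A query such as family_kind( [55626, '8542', 405], Kind ) classifies the list as Entrez ids, because every element is an integer or an atom that reads as one.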
Listens to debug(gene_family).
Known families:
Located family as the bio_analytics gene collection: autophagy

?- gene_family( autophagy, human, Auto ), length( Auto, Len ).
Auto = ['AMBRA1', 'APOL1', 'ARNT', 'ARSA', 'ARSB', 'ATF4', 'ATF6', 'ATG10', 'ATG12'|...],
Len = 232.

?- debug( gene_family ).
?- gene_family( 375, hs, Symbs ), length( Symbs, Len ).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.

?- gene_family( 'GO:0000375', hs, Symbs ), length( Symbs, Len ).
% Located GO term as the family identifier.
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.

?- gene_family( [55626, 8542, 405], hs, Auto ).
Converted input from Entrezes to Symbols.
Auto = ['AMBRA1', 'APOL1', 'ARNT'].

?- gene_family( unknown, hs, Auto ).
ERROR: Unhandled exception: gene_family(cannot_find_input_family_in_the_known_ones(unknown,[autophagy]))
Family datasets are in pack(bio_analytics/data/families).
Go can be either a GO: atom or an integer.
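Accepting both forms amounts to normalising the integer to the zero-padded atom. The following is a minimal sketch, assuming the standard seven-digit GO identifier width seen in the examples ('GO:0000375' for 375). go_term_atom/2 is a hypothetical helper name and is not exported by bio_analytics.

```prolog
:- use_module(library(lists)).   % member/2

% go_term_atom(+Go, -Atom)
% Hypothetical helper (an assumption, not the pack's code).
% Maps 375 to 'GO:0000375' and passes a well-formed 'GO:NNNNNNN' atom through.
go_term_atom(Go, Atom) :-
    integer(Go),
    Go >= 0,
    !,
    format(atom(Padded), '~`0t~d~7|', [Go]),   % left-pad with zeros to 7 columns
    atom_concat('GO:', Padded, Atom).
go_term_atom(Go, Go) :-
    atom(Go),
    atom_concat('GO:', Digits, Go),
    atom_codes(Digits, Codes),
    length(Codes, 7),
    forall(member(C, Codes), code_type(C, digit(_))).
```

With this in place, go_term_atom( 375, A ) and go_term_atom( 'GO:0000375', A ) both bind A to the same atom, so downstream code only has to deal with one representation.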
[debug] ?- go_org_symbol( mouse, 2, Symbs ), write(Symbs), nl, fail.
Akt3
Mef2a
Mgme1
Mpv17
Mrpl15
Mrpl17
Mrpl39
Msto1
Opa1
Slc25a33
Slc25a36
Tymp
false.

[debug] ?- go_org_symbol( hs, 'GO:0000002', Symbs ), write(Symbs), nl, fail.
AKT3
LONP1
MEF2A
MGME1
MPV17
MSTO1
OPA1
PIF1
SLC25A33
SLC25A36
SLC25A4
TYMP
false.
?- go_org_symbols( 'GO:0000375', human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP'|...],
Len = 26.

?- go_org_symbols( 'GO:0000375', mouse, Symbs ), length( Symbs, Len ).
Symbs = ['Dbr1', 'Rbm17', 'Rnu4atac', 'Rnu6atac', 'Scaf11', 'Sf3a2', 'Slu7', 'Srrm1', 'Srsf10'|...],
Len = 10.

?- go_org_symbols( 375, human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.
Opts
Listens to debug(go_symbols_reach).
?- go_symbols_reach( 'GO:0000375', Symbs, [] ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.

?- go_symbols_reach( 'GO:0000375', Symbs, organism(mouse) ), length( Symbs, Len ).
Symbs = ['4930595M18Rik', 'Aar2', 'Aqr', 'Bud13', 'Bud31', 'Casc3', 'Cdc40', 'Cdc5l', 'Cdk13'|...],
Len = 190.

?- go_symbols_reach( 375, Symbs, true ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.
The Graph can be saved via options to wgraph_plot/2.
Opts are passed to symbols_string_graph/3, go_term_symbols/3 and wgraph_plot/2.
Opts
?- go_string_graph( 'GO:0016601', G, true ).
?- go_string_graph( 'GO:0016601', G, plot(true) ).
Opts
?- Got = 'GO:0043552', gene_family( Got, hs, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph( Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['AMBRA1', 'ATG14', 'CCL19', 'CCL21', 'CCR7'|...],
SymbsLen = 31,
Graph = ['TNFAIP8L3', 'AMBRA1'-'ATG14':981, 'AMBRA1'-'PIK3R4':974|...],
GraphLen = 235.

% the following is flawed: it takes the symbols for Got from mouse but tries to build the graph in human STRING
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph( Symbs, Graph, true ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = Graph, Graph = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8', 'Fgf2', 'Fgfr3', 'Fgr'|...],
SymbsLen = GraphLen, GraphLen = 28.

% this is the correct way of running the first query for mouse:
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph( Symbs, Graph, organism(mouse) ), length( Graph, GraphLen ).
Got = 'GO:0043552',
Symbs = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8'|...],
SymbsLen = 28,
Graph = ['Ambra1', 'Cdc42', 'Lyn', 'Nod2', 'Pdgfra', 'Sh3glb1', 'Tnfaip8l3', 'Vav3', ... : ...|...],
GraphLen = 117.
We use ev as short for expression value, to avoid confusion with exp, which is short for experiment.
Opts
as_non(AsNon)
what to return as non differentially expressed: either everything in Gcnm (AsNon=all, default when EvLog is true), or only those with a numeric pvalue (AsNon=pvalue, default when EvLog is false)
exp_pv_cnm(ExpPcnm='adj.pvalue')
the experimental column (found in MsF) on which Pcut is applied as a filter
exp_pv_cut(Pcut=0.05)
p.value cut-off for experimental input from MSstats
exp_ev_log(EvLog=true)
whether the expression values are log values
exp_ev_cnm(EvCnm)
expression value column name (log2FC if EvLog=true, expression otherwise)
exp_ev_cut_get(EvGet)
threshold at or above which (greater or equal) expression values are selected (default is 2 if EvLog is false, and 1 otherwise)
exp_ev_include_inf(IncInf)
whether to include infinity values as diffexs (default is false if EvLog is false, and true otherwise)
gene_id_cnm(Gcnm='Symbols')
column name for the key value in the pair lists: DEPrs and NonDEPrs.

?- debug(testo), lib(debug_call), lib(mtx),
   D = '/home/nicos/ac/uci/proj/prcf010/totals/ms_stats-18.10.31',
   working_directory( Old, D ),
   CompF = 'ComparisonResult.normEqua.csv',
   mtx( CompF, Mtx, convert(true) ),
   debug_call( testo, dims, mtx_comp/Mtx ),
   mtx_column_values_select( Mtx, 'Label', 'D-C', Sub, _, true ),
   assert( sub(Sub) ),
   debug_call( testo, dims, mtx_sub/Sub ),
   exp_diffex( Sub, DEPrs, NonDEPrs, [] ),
   debug_call( testo, length, de/DEPrs ),
   debug_call( testo, length, nde/NonDEPrs ),
   working_directory( _, Old ).
% Dimensions for matrix, (mtx_comp) nR: 125077, nC: 13.
% Dimensions for matrix, (mtx_sub) nR: 5957, nC: 13.
% Length for list, de: 135.
% Length for list, nde: 5821.
true.

?- sub(Sub), exp_diffex( Sub, DEPrs, NonDEPrs, exp_pv_cut(0.01) ),
   debug_call( testo, length, de/DEPrs ),
   debug_call( testo, length, nde/NonDEPrs ).
% Length for list, de: 60.
% Length for list, nde: 5896.
true.

?- sub( Sub ), Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf)],
   exp_diffex( Sub, DEPrs, NonDEPrs, Opts ),
   debug_call( testo, length, de/DEPrs ),
   debug_call( testo, length, nde/NonDEPrs ).
% Length for list, de: 344.
% Length for list, nde: 5612.
true.
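One way to read the cut-off options is as a two-step filter: a row counts as differentially expressed when its p-value passes Pcut and its expression value lies at or beyond one of the two expression thresholds. The sketch below makes several assumptions that are not confirmed by the text above: that the p-value test is strict (Pv < Pcut), that exp_ev_cut_let (which appears only in the examples) is the matching "less than or equal" threshold, and that each pair carries the expression value. The predicate names and the row/3 term are hypothetical and are not the pack's code. Handling of exp_ev_include_inf and as_non is not modelled.

```prolog
:- use_module(library(apply)).   % partition/4, maplist/3

% split_diffex(+Rows, +Pcut, +EvGet, +EvLet, -DEPrs, -NonDEPrs)
% Rows are hypothetical row(Gene, Ev, Pv) terms standing in for matrix rows.
split_diffex(Rows, Pcut, EvGet, EvLet, DEPrs, NonDEPrs) :-
    partition(row_is_diffex(Pcut, EvGet, EvLet), Rows, DERows, NonRows),
    maplist(row_pair, DERows, DEPrs),
    maplist(row_pair, NonRows, NonDEPrs).

% Assumed rule: numeric p-value strictly below Pcut, and an expression
% value that is >= EvGet or =< EvLet.
row_is_diffex(Pcut, EvGet, EvLet, row(_Gene, Ev, Pv)) :-
    number(Pv),
    number(Ev),
    Pv < Pcut,
    (   Ev >= EvGet
    ;   Ev =< EvLet
    ),
    !.

% Assumed pair shape: Gene-ExpressionValue.
row_pair(row(Gene, Ev, _Pv), Gene-Ev).
```

Under these assumptions, passing inf for EvLet and -inf for EvGet, as in the last query above, makes the expression test succeed for every numeric value, so only the p-value cut-off filters rows. That is consistent with the larger number of differentially expressed entries reported for that query.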
Note that de-regulation trumps identification. That is, there is currently no distinction between genes that are both significantly de-regulated and identified, and those that are simply significantly de-regulated.
Opts
Options are passed to a number of other predicates.
See [pack('bio_analytics/examples/bt.pl')].
?- lib(real), lib(mtx),
   absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   Ppts = [ vjust= -1, node_size(3), mode="fruchtermanreingold", format(svg), stem(bt)],
   Opts = [ exp_ev_cut_let(inf), exp_ev_cut_get(-inf), include_non_present(false), include_non_significant(false), minw(200), wgraph_plot_opts(Ppts) ],
   exp_gene_family_string_graph( CsvF, autophagy, G, Opts ).
Produces file: bt.svg
?- bio_analytics_version(V,D).
V = 0:1:0,
D = date(2019,4,22).