A menagerie of machine learning utilities.
Currently it implements k-fold learning and k-fold comparative performance plots via pack(real).
Bootstrapping is likely to be added soon, along with a couple of additional types of comparative plots.
Split Data into N segments and run Goal on each one to learn
a Model or a sequence of models. Data should be a list;
normally it will be a list of data rows, such as one read by
csv_read_file/3. The predicate can use existing partitions of Data
(see option segments(Segms); when Segms is a variable, the partition
used is returned in it).
Opts
  fold(K)
    K is the number of folds
  predictor(Predictor=false)
    if not false, Predictor is called after each learner call. Predictor
    takes the Model and the (hold-out) Data as input and returns a Statistic.
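For illustration, a Predictor obeying the Model + hold-out Data -> Statistic convention above might be sketched as follows. This is a hedged sketch: the model representation and the helper model_classifies/2 are hypothetical, only the calling convention comes from the documentation.

```prolog
% accuracy_predictor( +Model, +HoldOut, -Statistic ) :
% Hypothetical predictor: Statistic is the fraction of hold-out rows
% for which the (assumed) helper model_classifies/2 succeeds.
accuracy_predictor( Model, HoldOut, Acc ) :-
    findall( Row, ( member(Row,HoldOut),
                    model_classifies(Model,Row)   % hypothetical helper
                  ), Correct ),
    length( Correct, NCorrect ),
    length( HoldOut, Total ),
    Acc is NCorrect / Total.
```

Passed via predictor(accuracy_predictor), such a predicate would be called once per fold, on that fold's learned Model and its held-out segment.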
?- k_fold_learn( [a,b,c], true, Mods, [is_matrix(false)] ).
ERROR: pack(mlu): Insufficient length of data (2) as 10 folds are requierd

?- assert(true(_,true)), assert(second(_,false)).

?- debug( mlu(k_fold_learn) ),
   numlist( 1, 33, Tthree ),
   k_fold_learn( Tthree, true, Mods, [is_matrix(false),segments(Segms),repeats(10)] ),
   length( Mods, LenMods ),
   flatten( Mods, AllMods ),
   length( AllMods, LenAllMods ).
% Data fold partition, segment lengths: [3,3,3,3,4,3,3,3,3,4]
LenMods = 10,
LenAllMods = 100.
Compare M learners, Goals, over N cross sections of Data. On each of the N iterations, the learners are run on the N-1 sections and tested on the held-out Nth section.
If a single Predictor is given (singleton list or unlisted), it is used for all Learners; otherwise the number of Predictors should equal the number of Learners. There should be at least two Learners; with a single learner you are better off calling k_fold_learn/4 directly (see the Predictor and Statistic options there).
Opts
  post(Posts)
    the Post calls; options for each Post are passed on explicitly
    via matching options (see below).
  The model-names option is not used in the main call; it defaults to
  model_01...model_NN. If given a variable, the generated names are returned.
Additional Opts for the Posts are allowed. For each Post, all options with a matching outer term name
are stripped and passed on. For instance, jitter(accuracy(2,'AUC'))
is passed to Post jitter, and
it signifies that the predictor passes, as its second argument, the AUC of the model against the
held-out segment. The implementation is expected to be correct at the predictor's end; here we only
provide the means to pass the information to the plotter.
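As a sketch of this option-stripping convention: in the hypothetical call below, only the jitter(accuracy(2,'AUC')) option itself comes from the text above; the comparative predicate's name and its learner goals are assumptions for illustration (check the pack's docs for the actual entry point).

```prolog
% Hypothetical comparative call: options whose outer name matches a
% Post (here: jitter) are stripped from Opts and handed to that Post.
?- k_fold_learns( Data, [learn_svm, learn_tree], Mods,
        [ post([jitter]),
          jitter(accuracy(2,'AUC'))   % tells Post jitter that each
                                      % predictor's 2nd arg is an AUC
        ] ).
```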
Also see options for k_fold_learn/4.
Jitter
call(Pname,N,Name)
rerun_ground(Rerun=true)
set to false to avoid re-running ground models. Convenient when running comparatives.
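A sketch of how rerun_ground(false) might be used to re-plot without re-learning; the comparative predicate's name and the Post used here are assumptions for illustration.

```prolog
% First call learns the models; in the second call Mods is now ground,
% so with rerun_ground(false) only the post-processing is re-run.
?- k_fold_learns( Data, Goals, Mods, [post([jitter])] ),
   k_fold_learns( Data, Goals, Mods, [post([jitter]), rerun_ground(false)] ).
```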
?- [pack(mlu/examples/stoic)].
?- stoic.

% if you don't have library(real), use:
?- stoic_ng.
Opts
@see options for k_fold_pair_predictions/7
@tbd allow distinct Predictors
Pack mlu uses pack pack_errors for throwing errors.
File mlu_errors.pl
defines local errors within the pack_errors infrastructure.