mlu.pl -- Machine learning utilities

A menagerie of machine learning utilities.

Currently implements k_fold learning and k_fold comparative performance plots via Real.

Bootstrapping and a couple of additional types of comparative plots are likely to be added soon.

author
- nicos angelopoulos
version
- 0.1, 2016/3/5
See also
- http://stoics.org.uk/~nicos/sware/mlu
- http://stoics.org.uk/~nicos/sware/mlu/html/mlu.html
- pack(mlu/examples/stoic.pl)
mlu_version(-Version, -Date)
Current version and release date for the library.
k_fold_learn(+Data, +Goal, ?Models, +Opts)
Split Data into N segments and run Goal on each one to learn a Model or a sequence of models. Data should be a list of things. Normally it will be a list of data rows, such as one read by csv_read_file/3. The predicate can also use existing partitions of Data (see option segments(Segms)).
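
As a rough illustration of an exhaustive and mutually exclusive split, the following minimal sketch partitions a list of rows into K consecutive segments. The predicate fold_segments/3 is hypothetical and is not part of mlu; it assumes the rows carry no header and it does not try to reproduce the segment ordering or any shuffling that k_fold_learn/4 itself may apply.

```prolog
:- use_module(library(lists)).

% fold_segments(+Rows, +K, -Segments)
% Hypothetical helper (not part of mlu): split Rows into K consecutive
% segments whose lengths differ by at most one. Each row ends up in
% exactly one segment. Fails if there are fewer rows than folds.
fold_segments(Rows, K, Segments) :-
    length(Rows, Len),
    K > 0,
    Len >= K,
    Base is Len // K,          % minimum segment length
    Extra is Len mod K,        % number of segments that get one extra row
    fold_segments_(K, Base, Extra, Rows, Segments).

fold_segments_(0, _, _, [], []).
fold_segments_(K, Base, Extra, Rows, [Seg|Segs]) :-
    K > 0,
    (   Extra > 0
    ->  SegLen is Base + 1, Extra1 is Extra - 1
    ;   SegLen = Base, Extra1 = 0
    ),
    length(Seg, SegLen),       % a fresh list of the required length
    append(Seg, Rest, Rows),   % peel the segment off the front
    K1 is K - 1,
    fold_segments_(K1, Base, Extra1, Rest, Segs).
```

For example, ?- fold_segments([a,b,c,d,e,f,g], 3, S). gives S = [[a,b,c],[d,e],[f,g]].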

Opts

call_options(Copts=false)
if anything other than false, Copts is added as the last argument to Goal (after it is passed through en_list/2). The option fold(K) is also added, where K is the number of the fold.
folds(Folds=10)
number of segments to split the data into (currently only exhaustive and mutually exclusive splits are supported)
has_header(HasHdr=true)
whether Data includes a Header as its first element
include_header(IncHdr=false)
whether to include the header in the segments; if HasHdr==true and IncHdr==false, 'false' is added as a header
include_stem(Stem=arity)
include an automatically generated stem in the call. The stem looks like <learner>_<LSeq>_<FSeq>[_<RSeq>], e.g. truism_A_01_r1
arity
include if there is definition for Goal+1 (accounts for Copts too)
false
does not add Stem no matter what
model
Stem replaces the model argument
true
adds it no matter what
is_matrix(IsM=true/false)
whether to pass the data through mtx/2. Default depends on whether mtx/2 is in memory (true) or not (false)
model_name_stem(Stem)
Stem for model name
?- k_fold_learn( [a,b,c], true, Mods, [is_matrix(false)] ).
ERROR: pack(mlu): Insufficient length of data (2) as 10 folds are requierd

?- assert(true(_,true)), assert(second(_,false)).

?- debug( mlu(k_fold_learn)), numlist( 1, 33, Tthree ),
   k_fold_learn( Tthree, true, Mods, [is_matrix(false),segments(Segms),repeats(10)]),
   length( Mods, LenMods ),
   flatten( Mods, AllMods ),
   length( AllMods, LenAllMods ).

% Data fold partition, segment lengths: [3,3,3,3,4,3,3,3,3,4]
LenMods = 10,
LenAllMods = 100.
author
- nicos angelopoulos
version
- 0.1 2015/12/17
See also
- example in pack(mlu/examples/stoic.pl)
k_fold_pairwise_comparisons(+Data, +Learners, +Predictors, -Models, -Statistics, +Opts)
Compare M learners (Goals) over N cross sections of Data. On each of the N iterations, the learners are run on N-1 sections and tested on the held-out Nth section.
If a single Predictor is given (as a singleton list or unlisted), it is used on all Learners. Otherwise, the number of Predictors should equal the number of Learners. There should be at least two Learners; otherwise it is better to call k_fold_learn directly (see Predictor and Statistic options there).
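
The train and hold-out scheme described above can be sketched as follows. This is only an illustration under stated assumptions: hold_out_splits/2 is a hypothetical helper, not a predicate exported by mlu, and it takes the segments as an already computed list of lists.

```prolog
:- use_module(library(lists)).

% hold_out_splits(+Segments, -Splits)
% Hypothetical helper (not part of mlu): Splits holds one Train-Test
% pair per segment. Test is the held-out segment and Train is the
% concatenation of all remaining segments, in their original order.
hold_out_splits(Segments, Splits) :-
    findall(Train-Test,
            ( select(Test, Segments, Others),  % hold out one segment
              append(Others, Train)            % join the other N-1
            ),
            Splits).
```

For example, ?- hold_out_splits([[a,b],[c],[d,e]], S). gives S = [[c,d,e]-[a,b], [a,b,d,e]-[c], [a,b,c]-[d,e]]. Each learner would then be run on every Train part and its predictor on the matching Test part.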

Opts

learner_names(Lnames)
used to distinguish the models for each learner, l<N> is used by default
names(Names)
Names for Models. Unlike other non-post() options, which are passed on to Post calls via options, this is passed explicitly. It is not used in the main call, and it defaults to model_01...model_NN. If a variable, the generated names are returned.
post(Post)
Post processing after Models + Statistics are constructed. Known Post values:
  • jitter
statistic_names(StaNames)
names for the statistics
Additional Opts for the Post are allowed. For each Post, all options with a matching outer term name are stripped and passed on. For instance, jitter(accuracy(2,'AUC')) is passed to Post jitter, and it signifies that the predictor passes the AUC of the model against the left-out segment as its second argument. The implementation is expected to be correct at the predictor end; here we just provide the means to pass the information to the plotter.
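
The stripping of Post-specific options can be pictured with the minimal sketch below. It is an assumption about the general idea, not the pack's actual code: post_options/3 is a hypothetical name, and the sketch only handles ground, single-argument wrappers such as jitter(...).

```prolog
:- use_module(library(lists)).

% post_options(+Post, +Opts, -PostOpts)
% Hypothetical helper (not part of mlu): collect the inner terms of
% all options of the form Post(Inner), in their original order.
% Options with any other name or arity are ignored.
post_options(Post, Opts, PostOpts) :-
    findall(Inner,
            ( member(Opt, Opts),
              Opt =.. [Post, Inner]   % outer term name matches Post
            ),
            PostOpts).
```

For example, ?- post_options(jitter, [folds(5), jitter(accuracy(2,'AUC')), post(jitter)], P). gives P = [accuracy(2,'AUC')].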

Also see options for k_fold_learn/4.

Jitter

accuracy(N, Name)
the Nth (N > 1) position of the statistic holds the accuracy, which is identified by Name
accuracy(Pname)
predicate name for obtaining the accuracy names, called as call(Pname,N,Name)
average_accuracy(Avg=mean)
R function for obtaining the single accuracy from the k_fold accuracies
rerun_ground(Rerun=true)
set to false to avoid re-running ground models. Convenient for running comparatives

        ?- [pack(mlu/examples/stoic)].

        ?- stoic.  % use ?- stoic_ng.  % if you don't have library(real).
k_fold_pairwise_predictions(+Dat, +Learners, +Predictor, -Models, -Stats, +Opts)
Run k_fold_pair_predictions/7 on pairs of Learners on a single k_fold segmentation. By default all pairwise comparisons are considered.

Opts

pairs(Pairs)
list or single pair (L1-L2) of predictions to consider
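
As an illustration of what "all pairwise comparisons" could look like, the sketch below enumerates every unordered pair of learners as L1-L2 terms. The predicate learner_pairs/2 is hypothetical, and the assumption that each pair is considered once (rather than in both orders) is not confirmed by the documentation above.

```prolog
:- use_module(library(lists)).

% learner_pairs(+Learners, -Pairs)
% Hypothetical helper (not part of mlu): Pairs lists every L1-L2 where
% L1 appears before L2 in Learners, so each pair is produced once.
learner_pairs(Learners, Pairs) :-
    findall(L1-L2,
            ( append(_, [L1|Later], Learners),  % pick L1 and what follows it
              member(L2, Later)                 % pair L1 with each later learner
            ),
            Pairs).
```

For example, ?- learner_pairs([a,b,c], P). gives P = [a-b, a-c, b-c].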

See also
- options for k_fold_pair_predictions/7
To be done
- allow distinct Predictors

mlu_errors
Documentation predicate.

Pack mlu uses pack pack_errors for throwing errors.

File mlu_errors.pl defines local errors within the pack_errors infrastructure.