<!doctype html>
< html lang = "en" >
< head >
< meta charset = "utf-8" / >
< meta name = "viewport" content = "width=device-width, initial-scale=1.0" / > < meta name = "generator" content = "Docutils 0.19: https://docutils.sourceforge.io/" / >
< title > Plotting — QuaPy 0.1.7 documentation< / title >
< link rel = "stylesheet" type = "text/css" href = "_static/pygments.css" / >
< link rel = "stylesheet" type = "text/css" href = "_static/bizstyle.css" / >
< script data-url_root = "./" id = "documentation_options" src = "_static/documentation_options.js" > < / script >
< script src = "_static/jquery.js" > < / script >
< script src = "_static/underscore.js" > < / script >
< script src = "_static/_sphinx_javascript_frameworks_compat.js" > < / script >
< script src = "_static/doctools.js" > < / script >
< script src = "_static/sphinx_highlight.js" > < / script >
< script src = "_static/bizstyle.js" > < / script >
< link rel = "index" title = "Index" href = "genindex.html" / >
< link rel = "search" title = "Search" href = "search.html" / >
< link rel = "next" title = "quapy" href = "modules.html" / >
< link rel = "prev" title = "Model Selection" href = "Model-Selection.html" / >
< meta name = "viewport" content = "width=device-width,initial-scale=1.0" / >
<!-- [if lt IE 9]>
< script src = "_static/css3-mediaqueries.js" > < / script >
<![endif]-->
< / head > < body >
< div class = "related" role = "navigation" aria-label = "related navigation" >
< h3 > Navigation< / h3 >
< ul >
< li class = "right" style = "margin-right: 10px" >
< a href = "genindex.html" title = "General Index"
accesskey="I">index< / a > < / li >
< li class = "right" >
< a href = "py-modindex.html" title = "Python Module Index"
>modules< / a > |< / li >
< li class = "right" >
< a href = "modules.html" title = "quapy"
accesskey="N">next< / a > |< / li >
< li class = "right" >
< a href = "Model-Selection.html" title = "Model Selection"
accesskey="P">previous< / a > |< / li >
< li class = "nav-item nav-item-0" > < a href = "index.html" > QuaPy 0.1.7 documentation< / a > » < / li >
< li class = "nav-item nav-item-this" > < a href = "" > Plotting< / a > < / li >
< / ul >
< / div >
< div class = "document" >
< div class = "documentwrapper" >
< div class = "bodywrapper" >
< div class = "body" role = "main" >
< section id = "plotting" >
< h1 > Plotting< a class = "headerlink" href = "#plotting" title = "Permalink to this heading" > ¶< / a > < / h1 >
< p > The module < em > qp.plot< / em > implements some basic plotting functions
that can help analyse the performance of a quantification method.< / p >
< p > All plotting functions receive as inputs the outcomes of
some experiments and include, for each experiment,
the following three main arguments:< / p >
< ul class = "simple" >
< li > < p > < em > method_names< / em > : a list containing the names of the quantification methods< / p > < / li >
< li > < p > < em > true_prevs< / em > : a list containing matrices of true prevalences< / p > < / li >
< li > < p > < em > estim_prevs< / em > : a list containing matrices of estimated prevalences
(each should have the same shape as the corresponding matrix in < em > true_prevs< / em > )< / p > < / li >
< / ul >
< p > Note that a method (as indicated by a name in < em > method_names< / em > ) can
appear more than once. This can happen when several datasets are
involved in the experiments. In this case, all experiments for the
method are merged and the plot represents the method’s
performance across those datasets.< / p >
< p > This is a very simple example of a valid input for the plotting functions:< / p >
< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "n" > method_names< / span > < span class = "o" > =< / span > < span class = "p" > [< / span > < span class = "s1" > ' classify & count' < / span > < span class = "p" > ,< / span > < span class = "s1" > ' EMQ' < / span > < span class = "p" > ,< / span > < span class = "s1" > ' classify & count' < / span > < span class = "p" > ]< / span >
< span class = "n" > true_prevs< / span > < span class = "o" > =< / span > < span class = "p" > [< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.5< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.5< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.25< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.75< / span > < span class = "p" > ]]),< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 1.0< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.25< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.75< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.1< / span > < span class = "p" > ]]),< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 1.0< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.25< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.75< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.1< / span > < span class = "p" > ]]),< / span >
< span class = "p" > ]< / span >
< span class = "n" > estim_prevs< / span > < span class = "o" > =< / span > < span class = "p" > [< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.45< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.55< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.6< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.4< / span > < span class = "p" > ]]),< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 1.0< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.5< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.5< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.2< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.8< / span > < span class = "p" > ]]),< / span >
< span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > array< / span > < span class = "p" > ([[< / span > < span class = "mf" > 0.1< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.9< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.3< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.7< / span > < span class = "p" > ],< / span > < span class = "p" > [< / span > < span class = "mf" > 0.0< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.1< / span > < span class = "p" > ]]),< / span >
< span class = "p" > ]< / span >
< / pre > < / div >
< / div >
< p > in which the < em > classify & count< / em > method has been tested on two datasets and
the < em > EMQ< / em > method has been tested on only one. In the first
experiment, only two (binary) test samples have been quantified,
while in the second and third experiments three test samples have
been quantified.< / p >
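< p > The merging of repeated method names can be pictured with the following minimal sketch. It uses only NumPy; the helper < em > merge_by_method< / em > is a hypothetical name introduced for illustration and is not part of QuaPy, whose internal merging logic may differ.< / p >

```python
import numpy as np
from collections import defaultdict


def merge_by_method(method_names, true_prevs, estim_prevs):
    """Hypothetical helper (not a QuaPy function): stack the prevalence
    matrices of all experiments that share the same method name."""
    if not (len(method_names) == len(true_prevs) == len(estim_prevs)):
        raise ValueError("the three lists must have the same length")
    grouped = defaultdict(lambda: ([], []))
    for name, true, estim in zip(method_names, true_prevs, estim_prevs):
        true, estim = np.asarray(true), np.asarray(estim)
        # each estimated matrix must match the shape of its true counterpart
        if true.shape != estim.shape:
            raise ValueError(f"shape mismatch for {name!r}: {true.shape} vs {estim.shape}")
        grouped[name][0].append(true)
        grouped[name][1].append(estim)
    # one (true, estimated) pair of stacked matrices per distinct method
    return {name: (np.vstack(t), np.vstack(e)) for name, (t, e) in grouped.items()}


# toy data (made up for illustration): 'classify & count' appears twice
method_names = ['classify & count', 'EMQ', 'classify & count']
true_prevs = [
    np.array([[0.5, 0.5], [0.25, 0.75]]),
    np.array([[0.0, 1.0], [0.25, 0.75], [0.5, 0.5]]),
    np.array([[0.0, 1.0], [0.25, 0.75], [0.5, 0.5]]),
]
estim_prevs = [
    np.array([[0.45, 0.55], [0.6, 0.4]]),
    np.array([[0.0, 1.0], [0.5, 0.5], [0.4, 0.6]]),
    np.array([[0.1, 0.9], [0.3, 0.7], [0.6, 0.4]]),
]
merged = merge_by_method(method_names, true_prevs, estim_prevs)
```

< p > With this toy input, the two < em > classify & count< / em > experiments (2 and 3 samples) end up in a single matrix of 5 rows, while < em > EMQ< / em > keeps its 3 rows.< / p >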
< p > In general, we would like to test the performance of the
quantification methods across different scenarios showcasing
the accuracy of the quantifier in predicting class prevalences
for a wide range of prior distributions. This can easily be
achieved by means of the
< a class = "reference external" href = "https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation" > artificial sampling protocol< / a >
that is implemented in QuaPy.< / p >
< p > The following code shows how to perform one simple experiment
in which the four < em > CC variants< / em > (CC, ACC, PCC, and PACC), all equipped with a linear SVM, are
applied to one binary dataset of reviews about < em > Kindle< / em > devices and
tested across the entire spectrum of class priors (taking 21 equally spaced
prevalence values in the interval [0,1], i.e., using prevalence steps of 0.05, and
generating 100 random samples at each prevalence).< / p >
< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "kn" > import< / span > < span class = "nn" > quapy< / span > < span class = "k" > as< / span > < span class = "nn" > qp< / span >
< span class = "kn" > from< / span > < span class = "nn" > quapy.method.aggregative< / span > < span class = "kn" > import< / span > < span class = "n" > CC< / span > < span class = "p" > ,< / span > < span class = "n" > ACC< / span > < span class = "p" > ,< / span > < span class = "n" > PCC< / span > < span class = "p" > ,< / span > < span class = "n" > PACC< / span >
< span class = "kn" > from< / span > < span class = "nn" > sklearn.svm< / span > < span class = "kn" > import< / span > < span class = "n" > LinearSVC< / span >
< span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > environ< / span > < span class = "p" > [< / span > < span class = "s1" > ' SAMPLE_SIZE' < / span > < span class = "p" > ]< / span > < span class = "o" > =< / span > < span class = "mi" > 500< / span >
< span class = "k" > def< / span > < span class = "nf" > gen_data< / span > < span class = "p" > ():< / span >
< span class = "k" > def< / span > < span class = "nf" > base_classifier< / span > < span class = "p" > ():< / span >
< span class = "k" > return< / span > < span class = "n" > LinearSVC< / span > < span class = "p" > ()< / span >
< span class = "k" > def< / span > < span class = "nf" > models< / span > < span class = "p" > ():< / span >
< span class = "k" > yield< / span > < span class = "n" > CC< / span > < span class = "p" > (< / span > < span class = "n" > base_classifier< / span > < span class = "p" > ())< / span >
< span class = "k" > yield< / span > < span class = "n" > ACC< / span > < span class = "p" > (< / span > < span class = "n" > base_classifier< / span > < span class = "p" > ())< / span >
< span class = "k" > yield< / span > < span class = "n" > PCC< / span > < span class = "p" > (< / span > < span class = "n" > base_classifier< / span > < span class = "p" > ())< / span >
< span class = "k" > yield< / span > < span class = "n" > PACC< / span > < span class = "p" > (< / span > < span class = "n" > base_classifier< / span > < span class = "p" > ())< / span >
< span class = "n" > data< / span > < span class = "o" > =< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > datasets< / span > < span class = "o" > .< / span > < span class = "n" > fetch_reviews< / span > < span class = "p" > (< / span > < span class = "s1" > ' kindle' < / span > < span class = "p" > ,< / span > < span class = "n" > tfidf< / span > < span class = "o" > =< / span > < span class = "kc" > True< / span > < span class = "p" > ,< / span > < span class = "n" > min_df< / span > < span class = "o" > =< / span > < span class = "mi" > 5< / span > < span class = "p" > )< / span >
< span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > tr_prevs< / span > < span class = "o" > =< / span > < span class = "p" > [],< / span > < span class = "p" > [],< / span > < span class = "p" > [],< / span > < span class = "p" > []< / span >
< span class = "k" > for< / span > < span class = "n" > model< / span > < span class = "ow" > in< / span > < span class = "n" > models< / span > < span class = "p" > ():< / span >
< span class = "n" > model< / span > < span class = "o" > .< / span > < span class = "n" > fit< / span > < span class = "p" > (< / span > < span class = "n" > data< / span > < span class = "o" > .< / span > < span class = "n" > training< / span > < span class = "p" > )< / span >
< span class = "n" > true_prev< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prev< / span > < span class = "o" > =< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > evaluation< / span > < span class = "o" > .< / span > < span class = "n" > artificial_sampling_prediction< / span > < span class = "p" > (< / span >
< span class = "n" > model< / span > < span class = "p" > ,< / span > < span class = "n" > data< / span > < span class = "o" > .< / span > < span class = "n" > test< / span > < span class = "p" > ,< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > environ< / span > < span class = "p" > [< / span > < span class = "s1" > ' SAMPLE_SIZE' < / span > < span class = "p" > ],< / span > < span class = "n" > n_repetitions< / span > < span class = "o" > =< / span > < span class = "mi" > 100< / span > < span class = "p" > ,< / span > < span class = "n" > n_prevpoints< / span > < span class = "o" > =< / span > < span class = "mi" > 21< / span >
< span class = "p" > )< / span >
< span class = "n" > method_names< / span > < span class = "o" > .< / span > < span class = "n" > append< / span > < span class = "p" > (< / span > < span class = "n" > model< / span > < span class = "o" > .< / span > < span class = "vm" > __class__< / span > < span class = "o" > .< / span > < span class = "vm" > __name__< / span > < span class = "p" > )< / span >
< span class = "n" > true_prevs< / span > < span class = "o" > .< / span > < span class = "n" > append< / span > < span class = "p" > (< / span > < span class = "n" > true_prev< / span > < span class = "p" > )< / span >
< span class = "n" > estim_prevs< / span > < span class = "o" > .< / span > < span class = "n" > append< / span > < span class = "p" > (< / span > < span class = "n" > estim_prev< / span > < span class = "p" > )< / span >
< span class = "n" > tr_prevs< / span > < span class = "o" > .< / span > < span class = "n" > append< / span > < span class = "p" > (< / span > < span class = "n" > data< / span > < span class = "o" > .< / span > < span class = "n" > training< / span > < span class = "o" > .< / span > < span class = "n" > prevalence< / span > < span class = "p" > ())< / span >
< span class = "k" > return< / span > < span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > tr_prevs< / span >
< span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > tr_prevs< / span > < span class = "o" > =< / span > < span class = "n" > gen_data< / span > < span class = "p" > ()< / span >
< / pre > < / div >
< / div >
< p > The plots that can be generated are explained below.< / p >
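< p > As a quick check of the numbers used in this experiment, the sketch below builds the grid of binary prevalences with NumPy. It only reproduces the arithmetic (21 points, step 0.05, 100 repetitions); how QuaPy draws the samples internally is not shown, and the variable names are illustrative.< / p >

```python
import numpy as np

n_prevpoints = 21     # number of prevalence values in [0, 1]
n_repetitions = 100   # samples drawn at each prevalence value

# prevalence of the positive class: 0.00, 0.05, ..., 1.00
pos_prevalences = np.linspace(0.0, 1.0, n_prevpoints)
step = pos_prevalences[1] - pos_prevalences[0]

# in the binary case the negative prevalence is fully determined by the positive one
grid = np.column_stack([1.0 - pos_prevalences, pos_prevalences])

# total number of test samples evaluated per method in the binary case
n_samples = n_prevpoints * n_repetitions
```

< p > Under these settings each method is evaluated on 2100 test samples in the binary case.< / p >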
< section id = "diagonal-plot" >
< h2 > Diagonal Plot< a class = "headerlink" href = "#diagonal-plot" title = "Permalink to this heading" > ¶< / a > < / h2 >
< p > The < em > diagonal< / em > plot offers a very insightful view of the
quantifier’s performance. It plots the predicted class
prevalence (on the y-axis) against the true class prevalence
(on the x-axis). Unfortunately, it is limited to binary quantification,
although one can generate as many < em > diagonal< / em > plots as
there are classes by indicating which class should be considered
the target of the plot.< / p >
< p > The following call will produce the plot:< / p >
< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > plot< / span > < span class = "o" > .< / span > < span class = "n" > binary_diagonal< / span > < span class = "p" > (< / span > < span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > train_prev< / span > < span class = "o" > =< / span > < span class = "n" > tr_prevs< / span > < span class = "p" > [< / span > < span class = "mi" > 0< / span > < span class = "p" > ],< / span > < span class = "n" > savepath< / span > < span class = "o" > =< / span > < span class = "s1" > ' ./plots/bin_diag.png' < / span > < span class = "p" > )< / span >
< / pre > < / div >
< / div >
< p > The last argument is optional and indicates the path where the plot
is saved (the file extension determines the format; typical extensions
are ‘.png’ or ‘.pdf’). If this path is not provided, the plot
is shown but not saved.
The resulting plot should look like:< / p >
< p > < img alt = "diagonal plot on Kindle" src = "_images/bin_diag.png" / > < / p >
< p > Note that in this case we are also indicating the training
prevalence, which is plotted on the diagonal as a cyan dot.
The color bands indicate the standard deviations of the predictions
and can be hidden by setting the argument < em > show_std=False< / em > (see
the complete list of arguments in the documentation).< / p >
< p > Finally, note how most quantifiers, and especially the “unadjusted”
variants CC and PCC, are strongly biased towards the
prevalence seen during training.< / p >
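< p > The quantities behind a diagonal plot can be sketched as follows: for each true prevalence of the target class, take the mean and the standard deviation of the estimated prevalences. This is an illustrative NumPy sketch, not QuaPy's plotting code; < em > diagonal_series< / em > and < em > pos_class< / em > are hypothetical names, and the data are made up.< / p >

```python
import numpy as np


def diagonal_series(true_prev, estim_prev, pos_class=1):
    """Hypothetical helper: mean and standard deviation of the estimated
    prevalence of `pos_class`, grouped by its true prevalence."""
    # rounding guards against tiny floating point differences between equal prevalences
    true_pos = np.round(np.asarray(true_prev)[:, pos_class], 6)
    estim_pos = np.asarray(estim_prev)[:, pos_class]
    xs = np.unique(true_pos)  # x-axis: distinct true prevalences, sorted
    means = np.array([estim_pos[true_pos == x].mean() for x in xs])  # plotted line
    stds = np.array([estim_pos[true_pos == x].std() for x in xs])    # band half-width
    return xs, means, stds


# two test samples at each of two true prevalences (toy numbers)
true_prev = np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8]])
estim_prev = np.array([[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.3, 0.7]])
xs, means, stds = diagonal_series(true_prev, estim_prev)
```

< p > A perfect quantifier would yield < em > means< / em > equal to < em > xs< / em > , i.e., a line lying exactly on the diagonal; in the toy data the estimates are pulled towards 0.5 instead.< / p >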
< / section >
< section id = "quantification-bias" >
< h2 > Quantification bias< a class = "headerlink" href = "#quantification-bias" title = "Permalink to this heading" > ¶< / a > < / h2 >
< p > This plot aims to reveal the bias that any quantifier
displays with respect to the training prevalences by
means of < a class = "reference external" href = "https://en.wikipedia.org/wiki/Box_plot" > box plots< / a > .
It can be generated by:< / p >
< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > plot< / span > < span class = "o" > .< / span > < span class = "n" > binary_bias_global< / span > < span class = "p" > (< / span > < span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > savepath< / span > < span class = "o" > =< / span > < span class = "s1" > ' ./plots/bin_bias.png' < / span > < span class = "p" > )< / span >
< / pre > < / div >
< / div >
< p > and should look like:< / p >
< p > < img alt = "bias plot on Kindle" src = "_images/bin_bias.png" / > < / p >
< p > The box plots show some interesting facts:< / p >
< ul class = "simple" >
< li > < p > all methods are biased towards the training prevalence, but especially
so CC and PCC (an unbiased quantifier would have a box centered at 0)< / p > < / li >
< li > < p > the bias is always positive, indicating that all methods tend to
overestimate the positive class prevalence< / p > < / li >
< li > < p > CC and PCC have high variability, while ACC and especially PACC exhibit
lower variability.< / p > < / li >
< / ul >
< p > Again, these plots could be generated for experiments ranging across
different datasets, and the plot will merge all data accordingly.< / p >
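< p > The values summarized by each box can be sketched as the signed difference between the estimated and the true prevalence of the positive class. The sign convention (estimated minus true) is an assumption inferred from the reading of the plot above, where a positive bias means overestimation; < em > signed_bias< / em > is a hypothetical helper, not a QuaPy function.< / p >

```python
import numpy as np


def signed_bias(true_prev, estim_prev, pos_class=1):
    """Hypothetical helper: estimated minus true prevalence of `pos_class`
    (assumed convention: positive values mean overestimation)."""
    return np.asarray(estim_prev)[:, pos_class] - np.asarray(true_prev)[:, pos_class]


# toy data: estimates are pulled towards 0.5
true_prev = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
estim_prev = np.array([[0.7, 0.3], [0.5, 0.5], [0.3, 0.7]])

bias = signed_bias(true_prev, estim_prev)
# the quartiles are what a box plot draws as the box and its central line
q1, median, q3 = np.percentile(bias, [25, 50, 75])
```

< p > In this toy case the median is 0, so the box would be centered at 0 even though individual samples are over- and underestimated.< / p >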
< p > Another illustrative example consists of
training several CC quantifiers at different
(artificially sampled) training prevalences.
For this example, we generate training samples of 5000
documents containing 10%, 20%, …, 90% of positives from the
IMDb dataset, and generate the bias plot again.
This example can be run by rewriting the < em > gen_data()< / em > function
like this:< / p >
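< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "kn" > import< / span > < span class = "nn" > numpy< / span > < span class = "k" > as< / span > < span class = "nn" > np< / span >
< span class = "k" > def< / span > < span class = "nf" > gen_data< / span > < span class = "p" > ():< / span >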
< span class = "n" > data< / span > < span class = "o" > =< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > datasets< / span > < span class = "o" > .< / span > < span class = "n" > fetch_reviews< / span > < span class = "p" > (< / span > < span class = "s1" > ' imdb' < / span > < span class = "p" > ,< / span > < span class = "n" > tfidf< / span > < span class = "o" > =< / span > < span class = "kc" > True< / span > < span class = "p" > ,< / span > < span class = "n" > min_df< / span > < span class = "o" > =< / span > < span class = "mi" > 5< / span > < span class = "p" > )< / span >
< span class = "n" > model< / span > < span class = "o" > =< / span > < span class = "n" > CC< / span > < span class = "p" > (< / span > < span class = "n" > LinearSVC< / span > < span class = "p" > ())< / span >
< span class = "n" > method_data< / span > < span class = "o" > =< / span > < span class = "p" > []< / span >
< span class = "k" > for< / span > < span class = "n" > training_prevalence< / span > < span class = "ow" > in< / span > < span class = "n" > np< / span > < span class = "o" > .< / span > < span class = "n" > linspace< / span > < span class = "p" > (< / span > < span class = "mf" > 0.1< / span > < span class = "p" > ,< / span > < span class = "mf" > 0.9< / span > < span class = "p" > ,< / span > < span class = "mi" > 9< / span > < span class = "p" > ):< / span >
< span class = "n" > training_size< / span > < span class = "o" > =< / span > < span class = "mi" > 5000< / span >
< span class = "c1" > # since the problem is binary, it suffices to specify the negative prevalence (the positive is constrained)< / span >
< span class = "n" > training< / span > < span class = "o" > =< / span > < span class = "n" > data< / span > < span class = "o" > .< / span > < span class = "n" > training< / span > < span class = "o" > .< / span > < span class = "n" > sampling< / span > < span class = "p" > (< / span > < span class = "n" > training_size< / span > < span class = "p" > ,< / span > < span class = "mi" > 1< / span > < span class = "o" > -< / span > < span class = "n" > training_prevalence< / span > < span class = "p" > )< / span >
< span class = "n" > model< / span > < span class = "o" > .< / span > < span class = "n" > fit< / span > < span class = "p" > (< / span > < span class = "n" > training< / span > < span class = "p" > )< / span >
< span class = "n" > true_prev< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prev< / span > < span class = "o" > =< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > evaluation< / span > < span class = "o" > .< / span > < span class = "n" > artificial_sampling_prediction< / span > < span class = "p" > (< / span >
< span class = "n" > model< / span > < span class = "p" > ,< / span > < span class = "n" > data< / span > < span class = "o" > .< / span > < span class = "n" > test< / span > < span class = "p" > ,< / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > environ< / span > < span class = "p" > [< / span > < span class = "s1" > ' SAMPLE_SIZE' < / span > < span class = "p" > ],< / span > < span class = "n" > n_repetitions< / span > < span class = "o" > =< / span > < span class = "mi" > 100< / span > < span class = "p" > ,< / span > < span class = "n" > n_prevpoints< / span > < span class = "o" > =< / span > < span class = "mi" > 21< / span >
< span class = "p" > )< / span >
< span class = "c1" > # method names can contain Latex syntax< / span >
< span class = "n" > method_name< / span > < span class = "o" > =< / span > < span class = "s1" > ' CC$_{' < / span > < span class = "o" > +< / span > < span class = "sa" > f< / span > < span class = "s1" > ' < / span > < span class = "si" > {< / span > < span class = "nb" > int< / span > < span class = "p" > (< / span > < span class = "mi" > 100< / span > < span class = "w" > < / span > < span class = "o" > *< / span > < span class = "w" > < / span > < span class = "n" > training_prevalence< / span > < span class = "p" > )< / span > < span class = "si" > }< / span > < span class = "s1" > ' < / span > < span class = "o" > +< / span > < span class = "s1" > ' \%}$' < / span >
< span class = "n" > method_data< / span > < span class = "o" > .< / span > < span class = "n" > append< / span > < span class = "p" > ((< / span > < span class = "n" > method_name< / span > < span class = "p" > ,< / span > < span class = "n" > true_prev< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prev< / span > < span class = "p" > ,< / span > < span class = "n" > training< / span > < span class = "o" > .< / span > < span class = "n" > prevalence< / span > < span class = "p" > ()))< / span >
< span class = "k" > return< / span > < span class = "nb" > zip< / span > < span class = "p" > (< / span > < span class = "o" > *< / span > < span class = "n" > method_data< / span > < span class = "p" > )< / span >
< / pre > < / div >
< / div >
< p > and the plot should now look like:< / p >
< p > < img alt = "bias plot on IMDb" src = "_images/bin_bias_cc.png" / > < / p >
< p > which clearly shows a negative bias for the CC variants trained on
data containing more negatives (i.e., fewer than 50% positives) and a positive bias
for those trained on data containing more positives (i.e., more than 50% positives). The CC trained
at 50% behaves as an unbiased estimator of the positive class
prevalence.< / p >
< p > The function < em > qp.plot.binary_bias_bins< / em > allows the user to
generate box plots broken down by bins of true test prevalence.
To this end, an argument < em > nbins< / em > is passed which indicates
how many equal-width subintervals to take. For example,
the following plot is produced for < em > nbins=3< / em > :< / p >
< p > < img alt = "bias plot on IMDb" src = "_images/bin_bias_bin_cc.png" / > < / p >
< p > Interestingly enough, the seemingly unbiased estimator (CC at 50%) happens to display
a positive bias (or a tendency to overestimate) in cases of low prevalence
(i.e., when the true prevalence of the positive class is below 33%),
and a negative bias (or a tendency to underestimate) in cases of high prevalence
(i.e., when the true prevalence is beyond 67%).< / p >
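< p > The breakdown by bins can be sketched as follows: the interval [0,1] is split into < em > nbins< / em > equal-width subintervals of true prevalence, and the signed bias is collected separately for each one. This is an illustrative NumPy sketch with a hypothetical helper < em > bias_by_bins< / em > and made-up data; it does not reproduce the internals of < em > qp.plot.binary_bias_bins< / em > .< / p >

```python
import numpy as np


def bias_by_bins(true_prev, estim_prev, nbins=3, pos_class=1):
    """Hypothetical helper: signed bias (estimated minus true) of `pos_class`,
    grouped by equal-width bins of the true prevalence."""
    true_pos = np.asarray(true_prev)[:, pos_class]
    bias = np.asarray(estim_prev)[:, pos_class] - true_pos
    edges = np.linspace(0.0, 1.0, nbins + 1)
    # use interior edges only, so that a true prevalence of 1.0 falls in the last bin
    bin_idx = np.digitize(true_pos, edges[1:-1])
    return [bias[bin_idx == b] for b in range(nbins)]


# toy data: low prevalences are overestimated, high ones underestimated
true_pos = np.array([0.1, 0.2, 0.5, 0.8, 1.0])
estim_pos = np.array([0.2, 0.3, 0.5, 0.7, 0.9])
true_prev = np.column_stack([1 - true_pos, true_pos])
estim_prev = np.column_stack([1 - estim_pos, estim_pos])

bins = bias_by_bins(true_prev, estim_prev, nbins=3)
```

< p > With < em > nbins=3< / em > the three groups correspond to true prevalences below 1/3, between 1/3 and 2/3, and from 2/3 upwards, which matches the 33% and 67% thresholds mentioned above.< / p >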
< p > Out of curiosity, the diagonal plot for this experiment looks like:< / p >
< p > < img alt = "diag plot on IMDb" src = "_images/bin_diag_cc.png" / > < / p >
< p > showing pretty clearly the dependency of CC on the prior probabilities
of the labeled set it was trained on.< / p >
< / section >
< section id = "error-by-drift" >
< h2 > Error by Drift< a class = "headerlink" href = "#error-by-drift" title = "Permalink to this heading" > ¶< / a > < / h2 >
< p > The plots discussed above are useful for analyzing and comparing
the performance of different quantification methods, but are
limited to the binary case. The “error by drift” plot
shows the error in predictions as a function of the
(prior probability) drift between each test sample and the
training set. Interestingly, the error and drift can both be measured
in terms of any evaluation measure for quantification (like the
ones available in < em > qp.error< / em > ) and can thus be computed
irrespective of the number of classes.< / p >
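< p > The idea can be sketched in a few lines of NumPy: the drift compares each true test prevalence with the training prevalence, the error compares it with the estimate, and errors are then averaged within bins of drift. The helper names, the local < em > absolute_error< / em > function (a stand-in defined here as the mean absolute difference across classes), and the binning scheme are assumptions for illustration, not QuaPy's actual implementation.< / p >

```python
import numpy as np


def absolute_error(p, q):
    """Stand-in error measure (defined here for illustration): mean absolute
    difference across classes, computed row-wise."""
    return np.abs(np.asarray(p) - np.asarray(q)).mean(axis=-1)


def error_by_drift_series(true_prev, estim_prev, train_prev, n_bins=10):
    """Hypothetical helper: average error within equal-width bins of drift."""
    drift = absolute_error(true_prev, train_prev)   # train_prev broadcasts over rows
    error = absolute_error(true_prev, estim_prev)
    edges = np.linspace(0.0, drift.max(), n_bins + 1)
    bin_idx = np.digitize(drift, edges[1:-1])       # interior edges only
    centers = (edges[:-1] + edges[1:]) / 2
    # keep only the bins that actually contain test samples
    return [(centers[b], error[bin_idx == b].mean())
            for b in range(n_bins) if np.any(bin_idx == b)]


# three-class toy example: nothing here is restricted to the binary case
train_prev = np.array([0.2, 0.3, 0.5])
true_prev = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]])
estim_prev = np.array([[0.25, 0.3, 0.45], [0.4, 0.3, 0.3]])

series = error_by_drift_series(true_prev, estim_prev, train_prev, n_bins=2)
```

< p > Each element of < em > series< / em > is a pair (drift bin center, mean error); in the toy data the sample with the larger drift also shows the larger error.< / p >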
< p > The following call shows how to generate the plot for the four CC variants,
using 10 bins for the drift
and < em > absolute error< / em > as the measure of the error (the
drift on the x-axis is always computed in terms of < em > absolute error< / em > since
other errors are harder to interpret):< / p >
< div class = "highlight-python notranslate" > < div class = "highlight" > < pre > < span > < / span > < span class = "n" > qp< / span > < span class = "o" > .< / span > < span class = "n" > plot< / span > < span class = "o" > .< / span > < span class = "n" > error_by_drift< / span > < span class = "p" > (< / span > < span class = "n" > method_names< / span > < span class = "p" > ,< / span > < span class = "n" > true_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > estim_prevs< / span > < span class = "p" > ,< / span > < span class = "n" > tr_prevs< / span > < span class = "p" > ,< / span >
< span class = "n" > error_name< / span > < span class = "o" > =< / span > < span class = "s1" > ' ae' < / span > < span class = "p" > ,< / span > < span class = "n" > n_bins< / span > < span class = "o" > =< / span > < span class = "mi" > 10< / span > < span class = "p" > ,< / span > < span class = "n" > savepath< / span > < span class = "o" > =< / span > < span class = "s1" > ' ./plots/err_drift.png' < / span > < span class = "p" > )< / span >
< / pre > < / div >
< / div >
< p > < img alt = "error by drift plot" src = "_images/err_drift.png" / > < / p >
< p > Note that all methods work reasonably well in cases of low prevalence
drift (i.e., any CC-variant is a good quantifier whenever the IID
assumption is approximately preserved). The higher the drift, the worse
those quantifiers tend to perform, although it is clear that PACC
yields the lowest error for the most difficult cases.< / p >
< p > Remember that any plot can be generated < em > across many datasets< / em > , and
that this would probably result in a more solid comparison.
In those cases, however, the variance of each
method is likely to increase, to the detriment of the visualization.
We recommend setting < em > show_std=False< / em > in those cases
in order to hide the color bands.< / p >
< / section >
< / section >
< div class = "clearer" > < / div >
< / div >
< / div >
< / div >
< div class = "sphinxsidebar" role = "navigation" aria-label = "main navigation" >
< div class = "sphinxsidebarwrapper" >
< div >
< h3 > < a href = "index.html" > Table of Contents< / a > < / h3 >
< ul >
< li > < a class = "reference internal" href = "#" > Plotting< / a > < ul >
< li > < a class = "reference internal" href = "#diagonal-plot" > Diagonal Plot< / a > < / li >
< li > < a class = "reference internal" href = "#quantification-bias" > Quantification bias< / a > < / li >
< li > < a class = "reference internal" href = "#error-by-drift" > Error by Drift< / a > < / li >
< / ul >
< / li >
< / ul >
< / div >
< div >
< h4 > Previous topic< / h4 >
< p class = "topless" > < a href = "Model-Selection.html"
title="previous chapter">Model Selection< / a > < / p >
< / div >
< div >
< h4 > Next topic< / h4 >
< p class = "topless" > < a href = "modules.html"
title="next chapter">quapy< / a > < / p >
< / div >
< div role = "note" aria-label = "source link" >
< h3 > This Page< / h3 >
< ul class = "this-page-menu" >
< li > < a href = "_sources/Plotting.md.txt"
rel="nofollow">Show Source< / a > < / li >
< / ul >
< / div >
< div id = "searchbox" style = "display: none" role = "search" >
< h3 id = "searchlabel" > Quick search< / h3 >
< div class = "searchformwrapper" >
< form class = "search" action = "search.html" method = "get" >
< input type = "text" name = "q" aria-labelledby = "searchlabel" autocomplete = "off" autocorrect = "off" autocapitalize = "off" spellcheck = "false" / >
< input type = "submit" value = "Go" / >
< / form >
< / div >
< / div >
< script > document . getElementById ( 'searchbox' ) . style . display = "block" < / script >
< / div >
< / div >
< div class = "clearer" > < / div >
< / div >
< div class = "related" role = "navigation" aria-label = "related navigation" >
< h3 > Navigation< / h3 >
< ul >
< li class = "right" style = "margin-right: 10px" >
< a href = "genindex.html" title = "General Index"
>index< / a > < / li >
< li class = "right" >
< a href = "py-modindex.html" title = "Python Module Index"
>modules< / a > |< / li >
< li class = "right" >
< a href = "modules.html" title = "quapy"
>next< / a > |< / li >
< li class = "right" >
< a href = "Model-Selection.html" title = "Model Selection"
>previous< / a > |< / li >
< li class = "nav-item nav-item-0" > < a href = "index.html" > QuaPy 0.1.7 documentation< / a > » < / li >
< li class = "nav-item nav-item-this" > < a href = "" > Plotting< / a > < / li >
< / ul >
< / div >
< div class = "footer" role = "contentinfo" >
© Copyright 2021, Alejandro Moreo.
Created using < a href = "https://www.sphinx-doc.org/" > Sphinx< / a > 5.3.0.
< / div >
< / body >
< / html >