forked from moreo/QuaPy
281 lines
23 KiB
HTML
281 lines
23 KiB
HTML
|
||
|
||
<!doctype html>
|
||
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||
|
||
<title>Evaluation — QuaPy 0.1.7 documentation</title>
|
||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||
<link rel="stylesheet" type="text/css" href="_static/bizstyle.css" />
|
||
|
||
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||
<script src="_static/jquery.js"></script>
|
||
<script src="_static/underscore.js"></script>
|
||
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||
<script src="_static/doctools.js"></script>
|
||
<script src="_static/sphinx_highlight.js"></script>
|
||
<script src="_static/bizstyle.js"></script>
|
||
<link rel="index" title="Index" href="genindex.html" />
|
||
<link rel="search" title="Search" href="search.html" />
|
||
<link rel="next" title="Protocols" href="Protocols.html" />
|
||
<link rel="prev" title="Datasets" href="Datasets.html" />
|
||
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
||
<!--[if lt IE 9]>
|
||
<script src="_static/css3-mediaqueries.js"></script>
|
||
<![endif]-->
|
||
</head><body>
|
||
<div class="related" role="navigation" aria-label="related navigation">
|
||
<h3>Navigation</h3>
|
||
<ul>
|
||
<li class="right" style="margin-right: 10px">
|
||
<a href="genindex.html" title="General Index"
|
||
accesskey="I">index</a></li>
|
||
<li class="right" >
|
||
<a href="py-modindex.html" title="Python Module Index"
|
||
>modules</a> |</li>
|
||
<li class="right" >
|
||
<a href="Protocols.html" title="Protocols"
|
||
accesskey="N">next</a> |</li>
|
||
<li class="right" >
|
||
<a href="Datasets.html" title="Datasets"
|
||
accesskey="P">previous</a> |</li>
|
||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
|
||
</ul>
|
||
</div>
|
||
|
||
<div class="document">
|
||
<div class="documentwrapper">
|
||
<div class="bodywrapper">
|
||
<div class="body" role="main">
|
||
|
||
<section id="evaluation">
|
||
<h1>Evaluation<a class="headerlink" href="#evaluation" title="Permalink to this heading">¶</a></h1>
|
||
<p>Quantification is an appealing tool in scenarios of dataset shift,
|
||
and particularly in scenarios of prior-probability shift.
|
||
That is, the interest in estimating the class prevalences arises
|
||
under the belief that those class prevalences might have changed
|
||
with respect to the ones observed during training.
|
||
In other words, one could simply return the training prevalence
|
||
as a predictor of the test prevalence if this change is assumed
|
||
to be unlikely (as is the case in general scenarios of
|
||
machine learning governed by the iid assumption).
|
||
In brief, quantification requires dedicated evaluation protocols,
|
||
which are implemented in QuaPy and explained here.</p>
|
||
<section id="error-measures">
|
||
<h2>Error Measures<a class="headerlink" href="#error-measures" title="Permalink to this heading">¶</a></h2>
|
||
<p>The module quapy.error implements the following error measures for quantification:</p>
|
||
<ul class="simple">
|
||
<li><p><em>mae</em>: mean absolute error</p></li>
|
||
<li><p><em>mrae</em>: mean relative absolute error</p></li>
|
||
<li><p><em>mse</em>: mean squared error</p></li>
|
||
<li><p><em>mkld</em>: mean Kullback-Leibler Divergence</p></li>
|
||
<li><p><em>mnkld</em>: mean normalized Kullback-Leibler Divergence</p></li>
|
||
</ul>
|
||
<p>Functions <em>ae</em>, <em>rae</em>, <em>se</em>, <em>kld</em>, and <em>nkld</em> are also available,
|
||
which return the individual errors (i.e., without averaging the whole).</p>
|
||
<p>Some errors of classification are also available:</p>
|
||
<ul class="simple">
|
||
<li><p><em>acce</em>: accuracy error (1-accuracy)</p></li>
|
||
<li><p><em>f1e</em>: F-1 score error (1-F1 score)</p></li>
|
||
</ul>
|
||
<p>The error functions implement the following interface, e.g.:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>in which the first argument is a ndarray containing the true
|
||
prevalences, and the second argument is another ndarray with
|
||
the estimations produced by some method.</p>
|
||
<p>Some error functions, e.g., <em>mrae</em>, <em>mkld</em>, and <em>mnkld</em>, are
|
||
smoothed for numerical stability. In those cases, there is a
|
||
third argument, e.g.:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">mrae</span><span class="p">(</span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">prevs_hat</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span> <span class="o">...</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>indicating the value for the smoothing parameter epsilon.
|
||
Traditionally, this value is set to 1/(2T) in past literature,
|
||
with T the sampling size. One could either pass this value
|
||
to the function each time, or to set a QuaPy’s environment
|
||
variable <em>SAMPLE_SIZE</em> once, and omit this argument
|
||
thereafter (recommended);
|
||
e.g.:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">qp</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">'SAMPLE_SIZE'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># once for all</span>
|
||
<span class="n">true_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span> <span class="c1"># let's assume 3 classes</span>
|
||
<span class="n">estim_prev</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">])</span>
|
||
<span class="n">error</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">mrae</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'mrae(</span><span class="si">{</span><span class="n">true_prev</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">estim_prev</span><span class="si">}</span><span class="s1">) = </span><span class="si">{</span><span class="n">error</span><span class="si">:</span><span class="s1">.3f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>will print:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mrae</span><span class="p">([</span><span class="mf">0.500</span><span class="p">,</span> <span class="mf">0.300</span><span class="p">,</span> <span class="mf">0.200</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.100</span><span class="p">,</span> <span class="mf">0.300</span><span class="p">,</span> <span class="mf">0.600</span><span class="p">])</span> <span class="o">=</span> <span class="mf">0.914</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Finally, it is possible to instantiate QuaPy’s quantification
|
||
error functions from strings using, e.g.:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">error_function</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">from_name</span><span class="p">(</span><span class="s1">'mse'</span><span class="p">)</span>
|
||
<span class="n">error</span> <span class="o">=</span> <span class="n">error_function</span><span class="p">(</span><span class="n">true_prev</span><span class="p">,</span> <span class="n">estim_prev</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
</section>
|
||
<section id="evaluation-protocols">
|
||
<h2>Evaluation Protocols<a class="headerlink" href="#evaluation-protocols" title="Permalink to this heading">¶</a></h2>
|
||
<p>An <em>evaluation protocol</em> is an evaluation procedure that uses
|
||
one specific <em>sample generation procotol</em> to genereate many
|
||
samples, typically characterized by widely varying amounts of
|
||
<em>shift</em> with respect to the original distribution, that are then
|
||
used to evaluate the performance of a (trained) quantifier.
|
||
These protocols are explained in more detail in a dedicated <a class="reference internal" href="Protocols.html"><span class="doc std std-doc">entry
|
||
in the wiki</span></a>. For the moment being, let us assume we already have
|
||
chosen and instantiated one specific such protocol, that we here
|
||
simply call <em>prot</em>. Let also assume our model is called
|
||
<em>quantifier</em> and that our evaluatio measure of choice is
|
||
<em>mae</em>. The evaluation comes down to:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mae</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metric</span><span class="o">=</span><span class="s1">'mae'</span><span class="p">)</span>
|
||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'MAE = </span><span class="si">{</span><span class="n">mae</span><span class="si">:</span><span class="s1">.4f</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>It is often desirable to evaluate our system using more than one
|
||
single evaluatio measure. In this case, it is convenient to generate
|
||
a <em>report</em>. A report in QuaPy is a dataframe accounting for all the
|
||
true prevalence values with their corresponding prevalence values
|
||
as estimated by the quantifier, along with the error each has given
|
||
rise.</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">report</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">evaluation_report</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">,</span> <span class="n">error_metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'mae'</span><span class="p">,</span> <span class="s1">'mrae'</span><span class="p">,</span> <span class="s1">'mkld'</span><span class="p">])</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>From a pandas’ dataframe, it is straightforward to visualize all the results,
|
||
and compute the averaged values, e.g.:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">pd</span><span class="o">.</span><span class="n">set_option</span><span class="p">(</span><span class="s1">'display.expand_frame_repr'</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
|
||
<span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span> <span class="o">=</span> <span class="n">report</span><span class="p">[</span><span class="s1">'estim-prev'</span><span class="p">]</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">strprev</span><span class="p">)</span>
|
||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="p">)</span>
|
||
|
||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Averaged values:'</span><span class="p">)</span>
|
||
<span class="nb">print</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>This will produce an output like:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span> <span class="n">true</span><span class="o">-</span><span class="n">prev</span> <span class="n">estim</span><span class="o">-</span><span class="n">prev</span> <span class="n">mae</span> <span class="n">mrae</span> <span class="n">mkld</span>
|
||
<span class="mi">0</span> <span class="p">[</span><span class="mf">0.308</span><span class="p">,</span> <span class="mf">0.692</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.314</span><span class="p">,</span> <span class="mf">0.686</span><span class="p">]</span> <span class="mf">0.005649</span> <span class="mf">0.013182</span> <span class="mf">0.000074</span>
|
||
<span class="mi">1</span> <span class="p">[</span><span class="mf">0.896</span><span class="p">,</span> <span class="mf">0.104</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.909</span><span class="p">,</span> <span class="mf">0.091</span><span class="p">]</span> <span class="mf">0.013145</span> <span class="mf">0.069323</span> <span class="mf">0.000985</span>
|
||
<span class="mi">2</span> <span class="p">[</span><span class="mf">0.848</span><span class="p">,</span> <span class="mf">0.152</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.809</span><span class="p">,</span> <span class="mf">0.191</span><span class="p">]</span> <span class="mf">0.039063</span> <span class="mf">0.149806</span> <span class="mf">0.005175</span>
|
||
<span class="mi">3</span> <span class="p">[</span><span class="mf">0.016</span><span class="p">,</span> <span class="mf">0.984</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.033</span><span class="p">,</span> <span class="mf">0.967</span><span class="p">]</span> <span class="mf">0.017236</span> <span class="mf">0.487529</span> <span class="mf">0.005298</span>
|
||
<span class="mi">4</span> <span class="p">[</span><span class="mf">0.728</span><span class="p">,</span> <span class="mf">0.272</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.751</span><span class="p">,</span> <span class="mf">0.249</span><span class="p">]</span> <span class="mf">0.022769</span> <span class="mf">0.057146</span> <span class="mf">0.001350</span>
|
||
<span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span> <span class="o">...</span>
|
||
<span class="mi">4995</span> <span class="p">[</span><span class="mf">0.72</span><span class="p">,</span> <span class="mf">0.28</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.698</span><span class="p">,</span> <span class="mf">0.302</span><span class="p">]</span> <span class="mf">0.021752</span> <span class="mf">0.053631</span> <span class="mf">0.001133</span>
|
||
<span class="mi">4996</span> <span class="p">[</span><span class="mf">0.868</span><span class="p">,</span> <span class="mf">0.132</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.888</span><span class="p">,</span> <span class="mf">0.112</span><span class="p">]</span> <span class="mf">0.020490</span> <span class="mf">0.088230</span> <span class="mf">0.001985</span>
|
||
<span class="mi">4997</span> <span class="p">[</span><span class="mf">0.292</span><span class="p">,</span> <span class="mf">0.708</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.298</span><span class="p">,</span> <span class="mf">0.702</span><span class="p">]</span> <span class="mf">0.006149</span> <span class="mf">0.014788</span> <span class="mf">0.000090</span>
|
||
<span class="mi">4998</span> <span class="p">[</span><span class="mf">0.24</span><span class="p">,</span> <span class="mf">0.76</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.220</span><span class="p">,</span> <span class="mf">0.780</span><span class="p">]</span> <span class="mf">0.019950</span> <span class="mf">0.054309</span> <span class="mf">0.001127</span>
|
||
<span class="mi">4999</span> <span class="p">[</span><span class="mf">0.948</span><span class="p">,</span> <span class="mf">0.052</span><span class="p">]</span> <span class="p">[</span><span class="mf">0.965</span><span class="p">,</span> <span class="mf">0.035</span><span class="p">]</span> <span class="mf">0.016941</span> <span class="mf">0.165776</span> <span class="mf">0.003538</span>
|
||
|
||
<span class="p">[</span><span class="mi">5000</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">5</span> <span class="n">columns</span><span class="p">]</span>
|
||
<span class="n">Averaged</span> <span class="n">values</span><span class="p">:</span>
|
||
<span class="n">mae</span> <span class="mf">0.023588</span>
|
||
<span class="n">mrae</span> <span class="mf">0.108779</span>
|
||
<span class="n">mkld</span> <span class="mf">0.003631</span>
|
||
<span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
|
||
|
||
<span class="n">Process</span> <span class="n">finished</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">code</span> <span class="mi">0</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Alternatively, we can simply generate all the predictions by:</p>
|
||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">true_prevs</span><span class="p">,</span> <span class="n">estim_prevs</span> <span class="o">=</span> <span class="n">qp</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">prediction</span><span class="p">(</span><span class="n">quantifier</span><span class="p">,</span> <span class="n">protocol</span><span class="o">=</span><span class="n">prot</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>All the evaluation functions implement specific optimizations for speeding-up
|
||
the evaluation of aggregative quantifiers (i.e., of instances of <em>AggregativeQuantifier</em>).
|
||
The optimization comes down to generating classification predictions (either crisp or soft)
|
||
only once for the entire test set, and then applying the sampling procedure to the
|
||
predictions, instead of generating samples of instances and then computing the
|
||
classification predictions every time. This is only possible when the protocol
|
||
is an instance of <em>OnLabelledCollectionProtocol</em>. The optimization is only
|
||
carried out when the number of classification predictions thus generated would be
|
||
smaller than the number of predictions required for the entire protocol; e.g.,
|
||
if the original dataset contains 1M instances, but the protocol is such that it would
|
||
at most generate 20 samples of 100 instances, then it would be preferable to postpone the
|
||
classification for each sample. This behaviour is indicated by setting
|
||
<em>aggr_speedup=”auto”</em>. Conversely, when indicating <em>aggr_speedup=”force”</em> QuaPy will
|
||
precompute all the predictions irrespectively of the number of instances and number of samples.
|
||
Finally, this can be deactivated by setting <em>aggr_speedup=False</em>. Note that this optimization
|
||
is not only applied for the final evaluation, but also for the internal evaluations carried
|
||
out during <em>model selection</em>. Since these are typically many, the heuristic can help reduce the
|
||
execution time a lot.</p>
|
||
</section>
|
||
</section>
|
||
|
||
|
||
<div class="clearer"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||
<div class="sphinxsidebarwrapper">
|
||
<div>
|
||
<h3><a href="index.html">Table of Contents</a></h3>
|
||
<ul>
|
||
<li><a class="reference internal" href="#">Evaluation</a><ul>
|
||
<li><a class="reference internal" href="#error-measures">Error Measures</a></li>
|
||
<li><a class="reference internal" href="#evaluation-protocols">Evaluation Protocols</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div>
|
||
<div>
|
||
<h4>Previous topic</h4>
|
||
<p class="topless"><a href="Datasets.html"
|
||
title="previous chapter">Datasets</a></p>
|
||
</div>
|
||
<div>
|
||
<h4>Next topic</h4>
|
||
<p class="topless"><a href="Protocols.html"
|
||
title="next chapter">Protocols</a></p>
|
||
</div>
|
||
<div role="note" aria-label="source link">
|
||
<h3>This Page</h3>
|
||
<ul class="this-page-menu">
|
||
<li><a href="_sources/Evaluation.md.txt"
|
||
rel="nofollow">Show Source</a></li>
|
||
</ul>
|
||
</div>
|
||
<div id="searchbox" style="display: none" role="search">
|
||
<h3 id="searchlabel">Quick search</h3>
|
||
<div class="searchformwrapper">
|
||
<form class="search" action="search.html" method="get">
|
||
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
|
||
<input type="submit" value="Go" />
|
||
</form>
|
||
</div>
|
||
</div>
|
||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||
</div>
|
||
</div>
|
||
<div class="clearer"></div>
|
||
</div>
|
||
<div class="related" role="navigation" aria-label="related navigation">
|
||
<h3>Navigation</h3>
|
||
<ul>
|
||
<li class="right" style="margin-right: 10px">
|
||
<a href="genindex.html" title="General Index"
|
||
>index</a></li>
|
||
<li class="right" >
|
||
<a href="py-modindex.html" title="Python Module Index"
|
||
>modules</a> |</li>
|
||
<li class="right" >
|
||
<a href="Protocols.html" title="Protocols"
|
||
>next</a> |</li>
|
||
<li class="right" >
|
||
<a href="Datasets.html" title="Datasets"
|
||
>previous</a> |</li>
|
||
<li class="nav-item nav-item-0"><a href="index.html">QuaPy 0.1.7 documentation</a> »</li>
|
||
<li class="nav-item nav-item-this"><a href="">Evaluation</a></li>
|
||
</ul>
|
||
</div>
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright 2021, Alejandro Moreo.
|
||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
|
||
</div>
|
||
</body>
|
||
</html> |