<spanid="quapy-data-base-module"></span><h2>quapy.data.base module<aclass="headerlink"href="#module-quapy.data.base"title="Link to this heading"></a></h2>
<p>Abstraction of training and test <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> objects.</p>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">SplitStratified</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">collection</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">train_size</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#Dataset.SplitStratified"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.Dataset.SplitStratified"title="Link to this definition"></a></dt>
<dd><p>Generates a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> from a stratified split of a <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> instance.
See <aclass="reference internal"href="#quapy.data.base.LabelledCollection.split_stratified"title="quapy.data.base.LabelledCollection.split_stratified"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.split_stratified()</span></code></a> for details.</p>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">binary</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.binary"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">classes_</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.classes_"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">kFCV</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">data</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nfolds</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nrepeats</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#Dataset.kFCV"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.Dataset.kFCV"title="Link to this definition"></a></dt>
<dd><p>Generator of stratified folds to be used in k-fold cross-validation. This method is a thin wrapper around
<aclass="reference internal"href="#quapy.data.base.LabelledCollection.kFCV"title="quapy.data.base.LabelledCollection.kFCV"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.kFCV()</span></code></a> that yields <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> instances made of training and test folds.</p>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">load</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">test_path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">loader_func</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">callable</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">loader_kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#Dataset.load"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.Dataset.load"title="Link to this definition"></a></dt>
<dd><p>Loads a training set and a test set of labelled data and converts them into a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> instance.
The function in charge of reading the instances must be specified. It can be a custom function or any of
the reading functions defined in the <aclass="reference internal"href="#module-quapy.data.reader"title="quapy.data.reader"><codeclass="xref py py-mod docutils literal notranslate"><spanclass="pre">quapy.data.reader</span></code></a> module.</p>
<li><p><strong>train_path</strong>– string, the path to the file containing the training instances</p></li>
<li><p><strong>test_path</strong>– string, the path to the file containing the test instances</p></li>
<li><p><strong>loader_func</strong>– a custom function that implements the data loader and returns a tuple with instances and
labels</p></li>
<li><p><strong>classes</strong>– array-like, the classes according to which the instances are labelled</p></li>
<li><p><strong>loader_kwargs</strong>– any argument that the <cite>loader_func</cite> function needs in order to read the instances.
See <aclass="reference internal"href="#quapy.data.base.LabelledCollection.load"title="quapy.data.base.LabelledCollection.load"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.load()</span></code></a> for further details.</p></li>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">n_classes</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.n_classes"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">reduce</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">n_train</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">100</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">n_test</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">100</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#Dataset.reduce"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.Dataset.reduce"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">stats</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">show</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#Dataset.stats"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.Dataset.stats"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">train_test</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.train_test"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">vocabulary_size</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.vocabulary_size"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">class</span><spanclass="w"></span></em><spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.base.</span></span><spanclass="sig-name descname"><spanclass="pre">LabelledCollection</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">instances</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">labels</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">X</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.X"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">Xp</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.Xp"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">Xy</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.Xy"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">binary</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.binary"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">counts</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.counts"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.counts"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">join</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">args</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">Iterable</span><spanclass="p"><spanclass="pre">[</span></span><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a><spanclass="p"><spanclass="pre">]</span></span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.join"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.join"title="Link to this definition"></a></dt>
<dd><p>Returns a new <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> as the union of the collections given in input.</p>
<ddclass="field-even"><p>a <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> representing the union of all the given collections</p>
<spanclass="sig-name descname"><spanclass="pre">kFCV</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">nfolds</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nrepeats</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.kFCV"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.kFCV"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">load</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">str</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">loader_func</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">callable</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">loader_kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.load"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.load"title="Link to this definition"></a></dt>
<dd><p>Loads a labelled set of data and converts it into a <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> instance. The function in charge
of reading the instances must be specified. It can be a custom function or any of the reading functions
defined in the <aclass="reference internal"href="#module-quapy.data.reader"title="quapy.data.reader"><codeclass="xref py py-mod docutils literal notranslate"><spanclass="pre">quapy.data.reader</span></code></a> module.</p>
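<p>As an illustration of the loader contract described above, the following minimal sketch defines a loader that returns a tuple of instances and labels. The function name <code>load_tsv</code>, the tab-separated file layout and the <code>encoding</code> keyword are assumptions made for this example only; they are not part of QuaPy.</p>

```python
import os
import tempfile


def load_tsv(path, encoding="utf-8"):
    """Hypothetical loader: reads 'label<TAB>text' lines and returns (instances, labels)."""
    instances, labels = [], []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            label, text = line.split("\t", 1)
            labels.append(int(label))
            instances.append(text)
    return instances, labels


# Write a tiny demo file so the sketch runs end to end.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, encoding="utf-8") as tmp:
    tmp.write("1\tgreat product\n0\tpoor quality\n1\tworks well\n")
    demo_path = tmp.name

instances, labels = load_tsv(demo_path)
os.remove(demo_path)

# With QuaPy installed, the same function would be handed over as loader_func,
# with extra keyword arguments forwarded through **loader_kwargs, e.g.:
#   LabelledCollection.load(some_path, loader_func=load_tsv, encoding="utf-8")
```

<p>Keeping the loader a plain function that returns <code>(instances, labels)</code> means it can be tested on its own before being passed as <code>loader_func</code>.</p>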
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">n_classes</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.n_classes"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">p</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.p"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">prevalence</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.prevalence"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.prevalence"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">sampling</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">prevs</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">shuffle</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.sampling"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling"title="Link to this definition"></a></dt>
<dd><p>Returns a random sample (an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>) of the desired size and desired prevalence
values. For each class, the sampling is drawn with replacement if the requested prevalence is larger than
the actual prevalence of the class, or without replacement otherwise.</p>
<ddclass="field-even"><p>an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> with length == <cite>size</cite> and prevalence close to <cite>prevs</cite> (or
prevalence == <cite>prevs</cite> if the exact prevalence values can be met as proportions of instances)</p>
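<p>The idea of sampling at a requested prevalence can be sketched in plain Python as follows. This is not QuaPy's implementation: the function name <code>sample_at_prevalence</code>, the dictionary form of <code>prevs</code> and the rule used here (draw with replacement only when a class has fewer items than requested) are assumptions of the sketch.</p>

```python
import random
from collections import Counter


def sample_at_prevalence(labels, size, prevs, seed=None):
    """Return indices of a sample of `size` items whose class proportions follow `prevs`.

    `prevs` maps each class to its requested prevalence; the values should sum to 1.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    classes = sorted(prevs)
    # Floor the per-class counts, then give the last class the remainder
    # so that the counts always add up to `size`.
    counts = [int(prevs[c] * size) for c in classes[:-1]]
    counts.append(size - sum(counts))

    chosen = []
    for cls, n in zip(classes, counts):
        pool = by_class.get(cls, [])
        if n > len(pool):
            # Not enough items of this class: draw with replacement.
            chosen.extend(rng.choices(pool, k=n))
        else:
            # Enough items: draw without replacement.
            chosen.extend(rng.sample(pool, n))
    rng.shuffle(chosen)
    return chosen


# 90 negatives and 10 positives; request a balanced sample of 40 items.
labels = [0] * 90 + [1] * 10
sample_idx = sample_at_prevalence(labels, size=40, prevs={0: 0.5, 1: 0.5}, seed=0)
sample_counts = Counter(labels[i] for i in sample_idx)
```

<p>In this example the positive class has only 10 items but 20 are requested, so its indices repeat, while the 20 negatives are all distinct.</p>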
<spanclass="sig-name descname"><spanclass="pre">sampling_from_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">index</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.sampling_from_index"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling_from_index"title="Link to this definition"></a></dt>
<dd><p>Returns an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> whose elements are sampled from this collection using the
<spanclass="sig-name descname"><spanclass="pre">sampling_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">prevs</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">shuffle</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.sampling_index"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling_index"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">split_random</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_prop</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.split_random"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.split_random"title="Link to this definition"></a></dt>
<dd><p>Returns two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> split randomly from this collection, at desired
<ddclass="field-even"><p>two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>, the first one with a proportion <cite>train_prop</cite> of the elements, and the
second one with the remaining <cite>1-train_prop</cite> proportion</p>
<spanclass="sig-name descname"><spanclass="pre">split_stratified</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_prop</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.split_stratified"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.split_stratified"title="Link to this definition"></a></dt>
<dd><p>Returns two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> split with stratification from this collection, at desired
<ddclass="field-even"><p>two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>, the first one with a proportion <cite>train_prop</cite> of the elements, and the
second one with the remaining <cite>1-train_prop</cite> proportion</p>
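<p>To make the notion of a stratified split concrete, here is a minimal plain-Python sketch that splits each class separately so that both parts keep roughly the original class proportions. The helper name <code>stratified_split</code> is hypothetical and the code does not reproduce QuaPy's internals.</p>

```python
import random
from collections import Counter


def stratified_split(labels, train_prop=0.6, seed=None):
    """Split item indices into (train, test), keeping about `train_prop` of each class in train."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for cls in sorted(by_class):
        members = by_class[cls][:]
        rng.shuffle(members)
        # Cut each class at the same proportion so prevalence is preserved.
        cut = int(round(train_prop * len(members)))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# 50 items of class 0 and 10 items of class 1.
labels = [0] * 50 + [1] * 10
train_idx, test_idx = stratified_split(labels, train_prop=0.6, seed=0)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

<p>Both parts keep the original 5:1 ratio between the classes, which a purely random split does not guarantee.</p>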
<spanclass="sig-name descname"><spanclass="pre">stats</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">show</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.stats"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.stats"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">uniform_sampling</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.uniform_sampling"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.uniform_sampling"title="Link to this definition"></a></dt>
<dd><p>Returns a uniform sample (an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>) of desired size. The sampling is drawn
<spanclass="sig-name descname"><spanclass="pre">uniform_sampling_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/base.html#LabelledCollection.uniform_sampling_index"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.base.LabelledCollection.uniform_sampling_index"title="Link to this definition"></a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">y</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.y"title="Link to this definition"></a></dt>
<spanid="quapy-data-datasets-module"></span><h2>quapy.data.datasets module<aclass="headerlink"href="#module-quapy.data.datasets"title="Link to this heading"></a></h2>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_IFCB</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">single_sample_train</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">for_model_selection</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/datasets.html#fetch_IFCB"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.fetch_IFCB"title="Link to this definition"></a></dt>
<dd><p>Loads the IFCB dataset for quantification from <aclass="reference external"href="https://zenodo.org/records/10036244">Zenodo</a> (for more
information on this dataset, please follow the Zenodo link).
This dataset is based on the data available publicly at
If false, a generator of training samples will be returned. Each example in the training set has an individual label.</p></li>
<li><p><strong>for_model_selection</strong>– if True, then returns a split in which 30% of the training set (86 out of 286 samples) is held out for model selection;
if False, then returns the full training set as the training set and the test set as the test set</p></li>
<aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>, if <cite>single_sample_train</cite> is true or
<codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data._ifcb.IFCBTrainSamplesFromDir</span></code>, i.e. a sampling protocol that returns a series of samples
labelled example by example. test_gen will be a <codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data._ifcb.IFCBTestSamples</span></code>,
<dd><p>Loads a UCI dataset as an instance of <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a>, as used in
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253516300628">Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</a>
and
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253517303652">Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
The datasets do not come with a predefined train-test split (see <codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">fetch_UCIBinaryLabelledCollection()</span></code> for further
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_UCIBinaryLabelledCollection</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset_name</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><spanclass="sig-return"><spanclass="sig-return-icon">→</span><spanclass="sig-return-typehint"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></span><aclass="reference internal"href="_modules/quapy/data/datasets.html#fetch_UCIBinaryLabelledCollection"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.fetch_UCIBinaryLabelledCollection"title="Link to this definition"></a></dt>
<dd><p>Loads a UCI collection as an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>, as used in
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253516300628">Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</a>
and
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253517303652">Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</a>.
The datasets do not come with a predefined train-test split, so Pérez-Gállego et al. adopted a 5FCVx2 evaluation
protocol: each collection was used to generate two rounds (hence the x2) of 5-fold cross-validation.
This can be reproduced by using <aclass="reference internal"href="#quapy.data.base.Dataset.kFCV"title="quapy.data.base.Dataset.kFCV"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset.kFCV()</span></code></a>, e.g.:</p>
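<p>As a rough illustration of the 5FCVx2 protocol itself, the following pure-Python sketch generates the index splits for two rounds of 5-fold cross-validation. It is not QuaPy's implementation: the function name <cite>repeated_kfold</cite> and its parameters are hypothetical, and the sketch shuffles indices without looking at class labels, so it does not stratify.</p>

```python
import random

def repeated_kfold(n_items, nfolds=5, nrepeats=2, seed=0):
    """Yield (train_idx, test_idx) for nrepeats rounds of nfolds-fold CV.

    Hypothetical helper for illustration only; it is not part of QuaPy.
    """
    rng = random.Random(seed)
    for _ in range(nrepeats):
        # Each round reshuffles the items before cutting them into folds.
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(nfolds):
            test_idx = order[fold::nfolds]  # every nfolds-th shuffled item
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx

# 5FCVx2 on a toy collection of 20 items: 2 rounds x 5 folds = 10 splits.
splits = list(repeated_kfold(n_items=20, nfolds=5, nrepeats=2))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

<p>Within each round every item is held out exactly once, so a model is trained and evaluated ten times per collection.</p>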
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_UCIMulticlassDataset</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset_name</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">test_split</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.3</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><spanclass="sig-return"><spanclass="sig-return-icon">→</span><spanclass="sig-return-typehint"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></span><aclass="reference internal"href="_modules/quapy/data/datasets.html#fetch_UCIMulticlassDataset"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.fetch_UCIMulticlassDataset"title="Link to this definition"></a></dt>
<dd><p>Loads a UCI multiclass dataset as an instance of <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a>.</p>
<p>The list of available datasets is taken from <aclass="reference external"href="https://archive.ics.uci.edu/">https://archive.ics.uci.edu/</a>, following these criteria:</p>
<ul>
<li><p>It has more than 1000 instances</p></li>
<li><p>It is suited for classification</p></li>
<li><p>It has more than two classes</p></li>
<li><p>It is available for Python import (requires the ucimlrepo package)</p></li>
</ul>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_UCIMulticlassLabelledCollection</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset_name</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><spanclass="sig-return"><spanclass="sig-return-icon">→</span><spanclass="sig-return-typehint"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></span><aclass="reference internal"href="_modules/quapy/data/datasets.html#fetch_UCIMulticlassLabelledCollection"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.fetch_UCIMulticlassLabelledCollection"title="Link to this definition"></a></dt>
<dd><p>Loads a UCI multiclass collection as an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>.</p>
<p>The list of available datasets is taken from <aclass="reference external"href="https://archive.ics.uci.edu/">https://archive.ics.uci.edu/</a>, following these criteria:</p>
<ul>
<li><p>It has more than 1000 instances</p></li>
<li><p>It is suited for classification</p></li>
<li><p>It has more than two classes</p></li>
<li><p>It is available for Python import (requires the ucimlrepo package)</p></li>
</ul>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_lequa2022</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">task</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/datasets.html#fetch_lequa2022"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.fetch_lequa2022"title="Link to this definition"></a></dt>
<dd><p>Loads the official datasets provided for the <aclass="reference external"href="https://lequa2022.github.io/index">LeQua</a> competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B are multiclass quantification
problems consisting of estimating the class prevalence values of 28 different merchandise products.
For a detailed description of the tasks and datasets, see <aclass="reference external"href="https://ceur-ws.org/Vol-3180/paper-146.pdf">Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.</a></p>
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
<p>See <cite>lequa2022_experiments.py</cite> provided in the example folder, that can serve as a guide on how to use these
<ddclass="field-even"><p>a tuple <cite>(train, val_gen, test_gen)</cite> where <cite>train</cite> is an instance of
<aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>, <cite>val_gen</cite> and <cite>test_gen</cite> are instances of
<dd><p>Loads a Reviews dataset as a Dataset instance, as used in
<aclass="reference external"href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287">Esuli, A., Moreo, A., and Sebastiani, F. “A recurrent neural network for sentiment quantification.”
Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.</a>.
The list of valid dataset names can be accessed in <cite>quapy.data.datasets.REVIEWS_SENTIMENT_DATASETS</cite></p>
<dd><p>Loads a Twitter dataset as a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> instance, as used in:
<aclass="reference external"href="https://link.springer.com/content/pdf/10.1007/s13278-016-0327-z.pdf">Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis.
Social Network Analysis and Mining 6(19), 1–22 (2016)</a>.
Note that the datasets ‘semeval13’, ‘semeval14’, ‘semeval15’ share the same training set.
The list of valid dataset names corresponding to training sets can be accessed in
<cite>quapy.data.datasets.TWITTER_SENTIMENT_DATASETS_TRAIN</cite>, while the test sets can be accessed in
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">warn</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">args</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/datasets.html#warn"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.datasets.warn"title="Link to this definition"></a></dt>
<spanid="quapy-data-preprocessing-module"></span><h2>quapy.data.preprocessing module<aclass="headerlink"href="#module-quapy.data.preprocessing"title="Link to this heading"></a></h2>
<emclass="property"><spanclass="pre">class</span><spanclass="w"></span></em><spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">IndexTransformer</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">add_word</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">word</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">id</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nogaps</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer.add_word"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.add_word"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">fit</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer.fit"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.fit"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">fit_transform</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">n_jobs</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer.fit_transform"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.fit_transform"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">transform</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">n_jobs</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer.transform"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.transform"title="Link to this definition"></a></dt>
<spanclass="sig-name descname"><spanclass="pre">vocabulary_size</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#IndexTransformer.vocabulary_size"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.vocabulary_size"title="Link to this definition"></a></dt>
<dd><p>Indexes the tokens of a textual <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of string documents.
To index a document means to replace each distinct token with a unique numerical index.
Rare words (i.e., words occurring fewer than <cite>min_df</cite> times) are replaced by a special token <cite>UNK</cite>.</p>
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> object where the instances of training and test documents
are lists of str</p></li>
<li><p><strong>min_df</strong>– minimum number of occurrences below which the term is replaced by a <cite>UNK</cite> index</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (if inplace=False) or a reference to the current
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (inplace=True) consisting of lists of integer values representing indices.</p>
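<p>The indexing idea described above can be sketched in pure Python as follows. This is only an illustration under stated assumptions, not QuaPy's code: the helpers <cite>build_vocabulary</cite> and <cite>index_docs</cite> are hypothetical, documents are tokenized by whitespace, and a token's frequency is taken to be the number of training documents that contain it.</p>

```python
from collections import Counter

UNK = "UNK"  # placeholder token for rare or unseen words

def build_vocabulary(train_docs, min_df=5):
    """Map each sufficiently frequent token to a unique integer index.

    Hypothetical helper; assumes whitespace tokenization and that
    frequency means the number of training documents with the token.
    """
    doc_freq = Counter(tok for doc in train_docs for tok in set(doc.split()))
    vocab = {UNK: 0}  # index 0 is reserved for the UNK token
    for tok in sorted(doc_freq):
        if tok != UNK and doc_freq[tok] >= min_df:
            vocab[tok] = len(vocab)
    return vocab

def index_docs(docs, vocab):
    """Replace every token by its index, falling back to the UNK index."""
    return [[vocab.get(tok, vocab[UNK]) for tok in doc.split()] for doc in docs]

train = ["a b a", "a b", "a c"]
vocab = build_vocabulary(train, min_df=2)   # "c" is too rare to be kept
indexed_train = index_docs(train, vocab)
indexed_test = index_docs(["a c b", "d a"], vocab)
print(vocab)
print(indexed_test)
```

<p>The vocabulary is built from the training documents only, so tokens that appear just in the test documents also map to the <cite>UNK</cite> index.</p>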
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">reduce_columns</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">min_df</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">inplace</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#reduce_columns"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.reduce_columns"title="Link to this definition"></a></dt>
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> in which instances are represented in sparse format (any
subtype of scipy.sparse.spmatrix)</p></li>
<li><p><strong>min_df</strong>– integer, minimum number of instances below which the columns are removed</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (if inplace=False) or a reference to the current
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (inplace=True) where the dimensions corresponding to infrequent terms
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">standardize</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">inplace</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/preprocessing.html#standardize"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.preprocessing.standardize"title="Link to this definition"></a></dt>
<dd><p>Standardizes the real-valued columns of a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a>.
Standardization, aka z-scoring, of a variable <cite>X</cite> comes down to subtracting the average and normalizing by the
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> object</p></li>
<li><p><strong>inplace</strong>– set to True if the transformation is to be applied inplace, or to False (default) if a new
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> is to be returned</p></li>
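<p>For a single real-valued column, z-scoring amounts to computing (x - mean) / standard deviation. The sketch below illustrates this with the standard library only. The helper names are hypothetical, and estimating the statistics on the training column and reusing them for the test column is an assumption about typical usage, not something stated in this reference.</p>

```python
from statistics import fmean, pstdev

def fit_zscore(column):
    """Return the mean and population standard deviation of a column.

    Hypothetical helper for illustration; not part of QuaPy.
    """
    mu = fmean(column)
    sigma = pstdev(column)
    # Guard against constant columns to avoid dividing by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma

def apply_zscore(column, mu, sigma):
    """Standardize a column with previously estimated statistics."""
    return [(x - mu) / sigma for x in column]

train_col = [1.0, 2.0, 3.0, 4.0]
test_col = [2.5, 5.0]

# Assumption: statistics come from the training data and are reused as-is.
mu, sigma = fit_zscore(train_col)
train_z = apply_zscore(train_col, mu, sigma)
test_z = apply_zscore(test_col, mu, sigma)
print(train_z, test_z)
```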
<dd><p>Transforms a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of textual instances into a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> where the instances of training and test collections are
lists of str</p></li>
<li><p><strong>min_df</strong>– minimum number of occurrences for a word to be considered as part of the vocabulary (default 3)</p></li>
<li><p><strong>sublinear_tf</strong>– whether or not to apply the log scaling to the tf counters (default True)</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<li><p><strong>kwargs</strong>– the rest of parameters of the transformation (as for sklearn’s
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> in <cite>csr_matrix</cite> format (if inplace=False) or a reference to the
current Dataset (if inplace=True) where the instances are stored in a <cite>csr_matrix</cite> of real-valued tfidf scores</p>
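<p>To make the effect of <cite>sublinear_tf</cite> concrete, here is a small pure-Python sketch of tf-idf weighting. It is an approximation under stated assumptions rather than the library's code: the function <cite>tfidf_rows</cite> is hypothetical, the formulas (tf replaced by 1 + ln(tf), smoothed idf, L2-normalized rows) mirror what sklearn's tf-idf vectorizer does by default as far as this sketch assumes, <cite>min_df</cite> filtering is omitted, and rows are plain dicts instead of a <cite>csr_matrix</cite>.</p>

```python
import math
from collections import Counter

def tfidf_rows(docs, sublinear_tf=True):
    """Return one {token: weight} dict per document (a sparse row).

    Hypothetical helper; assumes whitespace tokenization, smoothed idf
    ln((1 + n) / (1 + df)) + 1, and L2 normalization of each row.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1 for tok, d in df.items()}
    rows = []
    for toks in tokenized:
        weights = {}
        for tok, tf in Counter(toks).items():
            # Log scaling dampens the effect of repeated terms.
            tf_weight = 1 + math.log(tf) if sublinear_tf else tf
            weights[tok] = tf_weight * idf[tok]
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({tok: w / norm for tok, w in weights.items()})
    return rows

rows = tfidf_rows(["good good movie", "bad movie"])
raw_rows = tfidf_rows(["good good movie", "bad movie"], sublinear_tf=False)
for row in rows:
    print(row)
```

<p>With log scaling enabled, a term that occurs twice in a document weighs less than twice as much as it would with raw counts, which is why the weight of the repeated term is lower in <cite>rows</cite> than in <cite>raw_rows</cite>.</p>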
<spanid="quapy-data-reader-module"></span><h2>quapy.data.reader module<aclass="headerlink"href="#module-quapy.data.reader"title="Link to this heading"></a></h2>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">binarize</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">y</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">pos_class</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/reader.html#binarize"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.reader.binarize"title="Link to this definition"></a></dt>
<ddclass="field-even"><p>a binary np.ndarray, in which the value 1 corresponds to positions in which <cite>y</cite> had <cite>pos_class</cite> labels, and
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_csv</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">encoding</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">'utf-8'</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/reader.html#from_csv"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.reader.from_csv"title="Link to this definition"></a></dt>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_sparse</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/reader.html#from_sparse"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.reader.from_sparse"title="Link to this definition"></a></dt>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_text</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">encoding</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">'utf-8'</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">class2int</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/reader.html#from_text"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.reader.from_text"title="Link to this definition"></a></dt>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">reindex_labels</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">y</span></span></em><spanclass="sig-paren">)</span><aclass="reference internal"href="_modules/quapy/data/reader.html#reindex_labels"><spanclass="viewcode-link"><spanclass="pre">[source]</span></span></a><aclass="headerlink"href="#quapy.data.reader.reindex_labels"title="Link to this definition"></a></dt>