<p>Abstraction of training and test <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> objects.</p>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">SplitStratified</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">collection</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">train_size</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.Dataset.SplitStratified"title="Permalink to this definition">¶</a></dt>
<dd><p>Generates a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> from a stratified split of a <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> instance.
See <aclass="reference internal"href="#quapy.data.base.LabelledCollection.split_stratified"title="quapy.data.base.LabelledCollection.split_stratified"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.split_stratified()</span></code></a></p>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">binary</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.binary"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">classes_</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.classes_"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">kFCV</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">data</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><spanclass="pre">LabelledCollection</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nfolds</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nrepeats</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.Dataset.kFCV"title="Permalink to this definition">¶</a></dt>
<dd><p>Generator of stratified folds to be used in k-fold cross validation. This function is only a wrapper around
<aclass="reference internal"href="#quapy.data.base.LabelledCollection.kFCV"title="quapy.data.base.LabelledCollection.kFCV"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.kFCV()</span></code></a> that returns <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> instances made of training and test folds.</p>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">load</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">test_path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">loader_func</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">callable</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">loader_kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.Dataset.load"title="Permalink to this definition">¶</a></dt>
<dd><p>Loads a training and a test labelled set of data and converts them into a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">Dataset</span></code></a> instance.
The function in charge of reading the instances must be specified. This function can be a custom one, or any of
the reading functions defined in the <aclass="reference internal"href="#module-quapy.data.reader"title="quapy.data.reader"><codeclass="xref py py-mod docutils literal notranslate"><spanclass="pre">quapy.data.reader</span></code></a> module.</p>
<li><p><strong>train_path</strong>– string, the path to the file containing the training instances</p></li>
<li><p><strong>test_path</strong>– string, the path to the file containing the test instances</p></li>
<li><p><strong>loader_func</strong>– a custom function that implements the data loader and returns a tuple with instances and
labels</p></li>
<li><p><strong>classes</strong>– array-like, the classes according to which the instances are labelled</p></li>
<li><p><strong>loader_kwargs</strong>– any argument that the <cite>loader_func</cite> function needs in order to read the instances.
See <aclass="reference internal"href="#quapy.data.base.LabelledCollection.load"title="quapy.data.base.LabelledCollection.load"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">LabelledCollection.load()</span></code></a> for further details.</p></li>
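<p>As an illustration of the parameters above, the following is a minimal sketch of a custom loader. The function name <cite>tab_separated_loader</cite>, the tab-separated file format, and the <cite>encoding</cite> keyword are assumptions made for this example only; the sketch relies solely on the documented contract that the loader reads from a path and returns a tuple with instances and labels.</p>

```python
import os
import tempfile


def tab_separated_loader(path, encoding="utf-8"):
    """Hypothetical loader: reads one "<label>\t<text>" pair per line."""
    instances, labels = [], []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            label, text = line.split("\t", 1)
            labels.append(int(label))
            instances.append(text)
    # The documented contract: a tuple with instances and labels.
    return instances, labels


# Write a tiny illustrative file so that the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train.txt")
    with open(train_path, "w", encoding="utf-8") as handle:
        handle.write("1\tgreat product\n0\tnot worth it\n1\twould buy again\n")
    train_instances, train_labels = tab_separated_loader(train_path, encoding="utf-8")

print(train_instances, train_labels)
```

<p>A function of this shape could then be passed as <cite>loader_func</cite>, with extra keyword arguments such as <cite>encoding</cite> supplied through <cite>loader_kwargs</cite>.</p>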
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">n_classes</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.n_classes"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">stats</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">show</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.Dataset.stats"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">train_test</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.train_test"title="Permalink to this definition">¶</a></dt>
<dd><p>Alias to <cite>self.training</cite> and <cite>self.test</cite></p>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">vocabulary_size</span></span><aclass="headerlink"href="#quapy.data.base.Dataset.vocabulary_size"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">class</span><spanclass="w"></span></em><spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.base.</span></span><spanclass="sig-name descname"><spanclass="pre">LabelledCollection</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">instances</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">labels</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes_</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">X</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.X"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">Xp</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.Xp"title="Permalink to this definition">¶</a></dt>
<dd><p>Gets the instances and the true prevalence. This is useful when implementing evaluation protocols from
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">Xy</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.Xy"title="Permalink to this definition">¶</a></dt>
<dd><p>Gets the instances and labels. This is useful when working with <cite>sklearn</cite> estimators, e.g.:</p>
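<p>A minimal sketch of the unpacking pattern this property enables. Both classes below are hypothetical stand-ins written for this example (they are not QuaPy or sklearn classes); the toy estimator only mimics the sklearn-style <cite>fit(X, y)</cite> convention.</p>

```python
class TinyCollection:
    """Hypothetical stand-in exposing an ``Xy`` property like the one described above."""

    def __init__(self, instances, labels):
        self.instances = instances
        self.labels = labels

    @property
    def Xy(self):
        # Return instances and labels together as a tuple.
        return self.instances, self.labels


class MajorityClassifier:
    """Toy estimator following the sklearn-style ``fit(X, y)`` convention."""

    def fit(self, X, y):
        # Remember the most frequent label seen during training.
        self.majority_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


collection = TinyCollection([[0.1], [0.9], [0.8]], [0, 1, 1])
model = MajorityClassifier().fit(*collection.Xy)  # unpacks into fit(X, y)
print(model.predict([[0.5]]))
```

<p>With a real sklearn estimator the call has the same shape: the tuple is unpacked directly into the two positional arguments of <cite>fit</cite>.</p>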
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">binary</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.binary"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">counts</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.counts"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">kFCV</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">nfolds</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nrepeats</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.kFCV"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">classmethod</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">load</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">str</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">loader_func</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><spanclass="pre">callable</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">classes</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">loader_kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.load"title="Permalink to this definition">¶</a></dt>
<dd><p>Loads a labelled set of data and converts it into a <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> instance. The function in charge
of reading the instances must be specified. This function can be a custom one, or any of the reading functions
defined in the <aclass="reference internal"href="#module-quapy.data.reader"title="quapy.data.reader"><codeclass="xref py py-mod docutils literal notranslate"><spanclass="pre">quapy.data.reader</span></code></a> module.</p>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">n_classes</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.n_classes"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">p</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.p"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">prevalence</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.prevalence"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">sampling</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">prevs</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">shuffle</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling"title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a random sample (an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>) of desired size and desired prevalence
values. For each class, the sampling is drawn with replacement if the requested prevalence is larger than
the actual prevalence of the class, or without replacement otherwise.</p>
<ddclass="field-even"><p>an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> with length == <cite>size</cite> and prevalence close to <cite>prevs</cite> (or
prevalence == <cite>prevs</cite> if the exact prevalence values can be met as proportions of instances)</p>
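<p>The idea of sampling at a desired prevalence can be sketched in plain Python as follows. This is not QuaPy's implementation: the function name <cite>sample_at_prevalence</cite>, the <cite>seed</cite> parameter, and the rounding strategy are assumptions for illustration, and the sketch assumes that replacement is used only when a class does not have enough instances to satisfy the request.</p>

```python
import random


def sample_at_prevalence(labels, size, prevs, seed=0):
    """Return indices of a sample of `size` items with class proportions close to `prevs`."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Turn prevalence values into per-class counts; the last class absorbs rounding.
    counts = [round(size * p) for p in prevs[:-1]]
    counts.append(size - sum(counts))
    index = []
    for cls, n_requested in zip(classes, counts):
        candidates = [i for i, y in enumerate(labels) if y == cls]
        if n_requested > len(candidates):
            # Not enough distinct items: draw with replacement.
            index.extend(rng.choices(candidates, k=n_requested))
        else:
            # Enough items: draw without replacement.
            index.extend(rng.sample(candidates, n_requested))
    rng.shuffle(index)
    return index


labels = [0] * 8 + [1] * 2
index = sample_at_prevalence(labels, size=10, prevs=[0.5, 0.5])
sampled = [labels[i] for i in index]
print(sampled.count(0), sampled.count(1))
```

<p>Here class 1 has only two instances but five are requested, so its items are necessarily repeated, while the five items of class 0 are all distinct.</p>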
<spanclass="sig-name descname"><spanclass="pre">sampling_from_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">index</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling_from_index"title="Permalink to this definition">¶</a></dt>
<dd><p>Returns an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> whose elements are sampled from this collection using the
<spanclass="sig-name descname"><spanclass="pre">sampling_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">prevs</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">shuffle</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.sampling_index"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">split_random</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_prop</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.split_random"title="Permalink to this definition">¶</a></dt>
<dd><p>Returns two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> split randomly from this collection, at desired
<ddclass="field-even"><p>two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>, the first one containing a proportion <cite>train_prop</cite> of the elements, and the
second one containing the remaining <cite>1-train_prop</cite></p>
<spanclass="sig-name descname"><spanclass="pre">split_stratified</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">train_prop</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.6</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.split_stratified"title="Permalink to this definition">¶</a></dt>
<dd><p>Returns two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a> split with stratification from this collection, at desired
<ddclass="field-even"><p>two instances of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>, the first one containing a proportion <cite>train_prop</cite> of the elements, and the
second one containing the remaining <cite>1-train_prop</cite></p>
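<p>What a stratified split does can be sketched in plain Python as below. This is an illustrative sketch, not QuaPy's implementation; the function name <cite>stratified_split</cite> and the <cite>seed</cite> parameter are assumptions. Each class is shuffled and cut separately, so both parts keep roughly the class proportions of the original collection.</p>

```python
import random
from collections import defaultdict


def stratified_split(labels, train_prop=0.6, seed=0):
    """Split item indices into two parts, keeping class proportions similar in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_index, test_index = [], []
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        cut = round(len(members) * train_prop)  # per-class share for the first part
        train_index.extend(members[:cut])
        test_index.extend(members[cut:])
    return train_index, test_index


labels = [0] * 10 + [1] * 5
train_index, test_index = stratified_split(labels, train_prop=0.6)
print(len(train_index), len(test_index))
```

<p>With 10 items of class 0 and 5 of class 1, the first part receives 6 and 3 items respectively, so the 2:1 class ratio is preserved in both parts.</p>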
<spanclass="sig-name descname"><spanclass="pre">stats</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">show</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.stats"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">uniform_sampling</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.uniform_sampling"title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a uniform sample (an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">LabelledCollection</span></code></a>) of desired size. The sampling is drawn
with replacement if the requested size is greater than the number of instances, or without replacement otherwise.</p>
<spanclass="sig-name descname"><spanclass="pre">uniform_sampling_index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">size</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">random_state</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.uniform_sampling_index"title="Permalink to this definition">¶</a></dt>
<emclass="property"><spanclass="pre">property</span><spanclass="w"></span></em><spanclass="sig-name descname"><spanclass="pre">y</span></span><aclass="headerlink"href="#quapy.data.base.LabelledCollection.y"title="Permalink to this definition">¶</a></dt>
<spanid="quapy-data-datasets"></span><h2>quapy.data.datasets<aclass="headerlink"href="#module-quapy.data.datasets"title="Permalink to this heading">¶</a></h2>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_UCIDataset</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset_name</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">test_split</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">0.3</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><spanclass="sig-return"><spanclass="sig-return-icon">→</span><spanclass="sig-return-typehint"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></span><aclass="headerlink"href="#quapy.data.datasets.fetch_UCIDataset"title="Permalink to this definition">¶</a></dt>
<dd><p>Loads a UCI dataset as an instance of <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a>, as used in
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253516300628">Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</a>
and
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253517303652">Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</a>.
The datasets do not come with a predefined train-test split (see <aclass="reference internal"href="#quapy.data.datasets.fetch_UCILabelledCollection"title="quapy.data.datasets.fetch_UCILabelledCollection"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">fetch_UCILabelledCollection()</span></code></a> for further
information on how to use these collections), and so a train-test split is generated at desired proportion.
The list of valid dataset names can be accessed in <cite>quapy.data.datasets.UCI_DATASETS</cite></p>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_UCILabelledCollection</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset_name</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><spanclass="sig-return"><spanclass="sig-return-icon">→</span><spanclass="sig-return-typehint"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></span><aclass="headerlink"href="#quapy.data.datasets.fetch_UCILabelledCollection"title="Permalink to this definition">¶</a></dt>
<dd><p>Loads a UCI collection as an instance of <aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>, as used in
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253516300628">Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017).
Using ensembles for problems with characterizable changes in data distribution: A case study on quantification.
Information Fusion, 34, 87-100.</a>
and
<aclass="reference external"href="https://www.sciencedirect.com/science/article/pii/S1566253517303652">Pérez-Gállego, P., Castano, A., Quevedo, J. R., & del Coz, J. J. (2019).
Dynamic ensemble selection for quantification tasks.
Information Fusion, 45, 1-15.</a>.
The datasets do not come with a predefined train-test split, and so Pérez-Gállego et al. adopted a 5FCVx2 evaluation
protocol, meaning that each collection was used to generate two rounds (hence the x2) of 5 fold cross validation.
This can be reproduced by using <aclass="reference internal"href="#quapy.data.base.Dataset.kFCV"title="quapy.data.base.Dataset.kFCV"><codeclass="xref py py-meth docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset.kFCV()</span></code></a>, e.g.:</p>
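<p>The structure of the 5FCVx2 protocol can be pictured with the following plain-Python sketch, which is independent of QuaPy. The function name <cite>repeated_stratified_kfold</cite>, the <cite>seed</cite> parameter, and the round-robin fold assignment are assumptions for illustration; the sketch only shows how two rounds of 5 folds yield ten train/test partitions.</p>

```python
import random


def repeated_stratified_kfold(labels, nfolds=5, nrepeats=2, seed=0):
    """Yield (train_index, test_index) pairs for `nrepeats` rounds of stratified k-fold."""
    rng = random.Random(seed)
    for _ in range(nrepeats):
        folds = [[] for _ in range(nfolds)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            # Deal the members of each class round-robin so folds stay stratified.
            for position, item in enumerate(members):
                folds[position % nfolds].append(item)
        for k in range(nfolds):
            test_index = sorted(folds[k])
            train_index = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train_index, test_index


labels = [0] * 20 + [1] * 10
splits = list(repeated_stratified_kfold(labels, nfolds=5, nrepeats=2))
print(len(splits))
```

<p>Each of the ten partitions uses one fold for testing and the remaining four for training; every round reshuffles the data, so the two rounds produce different folds.</p>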
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">fetch_lequa2022</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">task</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">data_home</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.datasets.fetch_lequa2022"title="Permalink to this definition">¶</a></dt>
<dd><p>Loads the official datasets provided for the <aclass="reference external"href="https://lequa2022.github.io/index">LeQua</a> competition.
In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification
problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead.
Tasks T1A and T2A are binary sentiment quantification problems, while T1B and T2B are multiclass quantification
problems consisting of estimating the class prevalence values of 28 different merchandise products.
We refer to <aclass="reference external"href="https://ceur-ws.org/Vol-3180/paper-146.pdf">Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022).
A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify.</a> for a detailed description
of the tasks and datasets.</p>
<p>The datasets are downloaded only once, and stored for fast reuse.</p>
<p>See <cite>lequa2022_experiments.py</cite> provided in the example folder, that can serve as a guide on how to use these
<ddclass="field-even"><p>a tuple <cite>(train, val_gen, test_gen)</cite> where <cite>train</cite> is an instance of
<aclass="reference internal"href="#quapy.data.base.LabelledCollection"title="quapy.data.base.LabelledCollection"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.LabelledCollection</span></code></a>, <cite>val_gen</cite> and <cite>test_gen</cite> are instances of
<codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.protocol.SamplesFromDir</span></code>, i.e., are sampling protocols that return a series of samples
<dd><p>Loads a Reviews dataset as a Dataset instance, as used in
<aclass="reference external"href="https://dl.acm.org/doi/abs/10.1145/3269206.3269287">Esuli, A., Moreo, A., and Sebastiani, F. “A recurrent neural network for sentiment quantification.”
Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.</a>.
The list of valid dataset names can be accessed in <cite>quapy.data.datasets.REVIEWS_SENTIMENT_DATASETS</cite></p>
<dd><p>Loads a Twitter dataset as a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> instance, as used in:
<aclass="reference external"href="https://link.springer.com/content/pdf/10.1007/s13278-016-0327-z.pdf">Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis.
Social Network Analysis and Mining 6(19), 1–22 (2016)</a>.
Note that the datasets ‘semeval13’, ‘semeval14’, ‘semeval15’ share the same training set.
The list of valid dataset names corresponding to training sets can be accessed in
<cite>quapy.data.datasets.TWITTER_SENTIMENT_DATASETS_TRAIN</cite>, while the test sets can be accessed in
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.datasets.</span></span><spanclass="sig-name descname"><spanclass="pre">warn</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="o"><spanclass="pre">*</span></span><spanclass="n"><spanclass="pre">args</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.datasets.warn"title="Permalink to this definition">¶</a></dt>
<spanid="quapy-data-preprocessing"></span><h2>quapy.data.preprocessing<aclass="headerlink"href="#module-quapy.data.preprocessing"title="Permalink to this heading">¶</a></h2>
<emclass="property"><spanclass="pre">class</span><spanclass="w"></span></em><spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">IndexTransformer</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">add_word</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">word</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">id</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">nogaps</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.add_word"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">fit</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.fit"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">fit_transform</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">n_jobs</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.fit_transform"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">transform</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">X</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">n_jobs</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">None</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.transform"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-name descname"><spanclass="pre">vocabulary_size</span></span><spanclass="sig-paren">(</span><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.IndexTransformer.vocabulary_size"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">index</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">min_df</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">inplace</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em>, <emclass="sig-param"><spanclass="o"><spanclass="pre">**</span></span><spanclass="n"><spanclass="pre">kwargs</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.index"title="Permalink to this definition">¶</a></dt>
<dd><p>Indexes the tokens of a textual <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of string documents.
To index a document means to replace each distinct token with a unique numerical index.
Rare words (i.e., words occurring fewer than <cite>min_df</cite> times) are replaced by a special token <cite>UNK</cite>.</p>
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> object where the instances of training and test documents
are lists of str</p></li>
<li><p><strong>min_df</strong>– minimum number of occurrences below which the term is replaced by a <cite>UNK</cite> index</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (if inplace=False) or a reference to the current
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (inplace=True) consisting of lists of integer values representing indices.</p>
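<p>The indexing idea described above can be sketched in plain Python. This is an illustrative sketch, not QuaPy's implementation: the helper names <cite>build_vocabulary</cite> and <cite>index_documents</cite> are hypothetical, tokens are obtained by whitespace splitting, <cite>min_df</cite> is interpreted as a document frequency, and the <cite>UNK</cite> token is given index 0. All of these are assumptions.</p>

```python
from collections import Counter

UNK = "UNK"  # assumed name for the rare-word token


def build_vocabulary(documents, min_df=5):
    """Map every token seen in at least `min_df` documents to an integer index."""
    # Document frequency: the number of documents in which each token appears.
    df = Counter(token for doc in documents for token in set(doc.split()))
    vocabulary = {UNK: 0}  # assumption: index 0 is reserved for rare words
    for token in sorted(df):
        if df[token] >= min_df:
            vocabulary[token] = len(vocabulary)
    return vocabulary


def index_documents(documents, vocabulary):
    """Replace each token by its index, falling back to the UNK index."""
    unk_id = vocabulary[UNK]
    return [[vocabulary.get(tok, unk_id) for tok in doc.split()] for doc in documents]


docs = ["good film", "good plot", "bad film"]
vocab = build_vocabulary(docs, min_df=2)
indexed = index_documents(docs, vocab)
print(vocab)
print(indexed)
```

<p>In practice the vocabulary would be built on the training documents only and then reused to index the test documents, so that both share the same indices.</p>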
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">reduce_columns</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">min_df</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">5</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">inplace</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.reduce_columns"title="Permalink to this definition">¶</a></dt>
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> in which instances are represented in sparse format (any
subtype of scipy.sparse.spmatrix)</p></li>
<li><p><strong>min_df</strong>– integer, minimum number of instances below which the columns are removed</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (if inplace=False) or a reference to the current
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> (inplace=True) where the dimensions corresponding to infrequent terms
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.preprocessing.</span></span><spanclass="sig-name descname"><spanclass="pre">standardize</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">dataset</span></span><spanclass="p"><spanclass="pre">:</span></span><spanclass="w"></span><spanclass="n"><aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><spanclass="pre">Dataset</span></a></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">inplace</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">False</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.preprocessing.standardize"title="Permalink to this definition">¶</a></dt>
<dd><p>Standardizes the real-valued columns of a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a>.
Standardization, aka z-scoring, of a variable <cite>X</cite> comes down to subtracting the average and normalizing by the standard deviation.
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> object</p></li>
<li><p><strong>inplace</strong>– set to True if the transformation is to be applied inplace, or to False (default) if a new
<aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> is to be returned</p></li>
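<p>As a rough illustration of z-scoring, the following standard-library sketch standardizes each column. The function name <cite>zscore_columns</cite> is hypothetical, and both the use of training-set statistics for the test rows and the fallback for constant columns are assumptions of this sketch, not statements about QuaPy's behaviour.</p>

```python
from statistics import fmean, pstdev


def zscore_columns(train_rows, test_rows):
    """Standardize columns with the mean and deviation of the training rows."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # so that the division below is always defined (an assumption).
    stds = [pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(test_rows)


train = [[1.0, 10.0], [3.0, 10.0]]
test = [[5.0, 12.0]]
train_z, test_z = zscore_columns(train, test)
print(train_z)
print(test_z)
```

<p>After the transformation each non-constant training column has zero mean and unit variance, while the test rows are expressed on the same scale.</p>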
<dd><p>Transforms a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of textual instances into a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> of
<li><p><strong>dataset</strong>– a <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> where the instances of training and test collections are
lists of str</p></li>
<li><p><strong>min_df</strong>– minimum number of occurrences for a word to be considered as part of the vocabulary (default 3)</p></li>
<li><p><strong>sublinear_tf</strong>– whether or not to apply log scaling to the tf counters (default True)</p></li>
<li><p><strong>inplace</strong>– whether or not to apply the transformation inplace (True), or to a new copy (False, default)</p></li>
<li><p><strong>kwargs</strong>– the rest of parameters of the transformation (as for sklearn’s
<ddclass="field-even"><p>a new <aclass="reference internal"href="#quapy.data.base.Dataset"title="quapy.data.base.Dataset"><codeclass="xref py py-class docutils literal notranslate"><spanclass="pre">quapy.data.base.Dataset</span></code></a> in <cite>csr_matrix</cite> format (if inplace=False) or a reference to the
current Dataset (if inplace=True) where the instances are stored in a <cite>csr_matrix</cite> of real-valued tfidf scores</p>
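<p>The effect of <cite>sublinear_tf</cite> can be illustrated in isolation. The sketch below covers only the term-frequency scaling step, replacing a raw count tf by 1 + log(tf), which is how scikit-learn documents sublinear scaling. It is not a full tfidf computation, and the function name <cite>sublinear_tf_weights</cite> is hypothetical.</p>

```python
import math
from collections import Counter


def sublinear_tf_weights(tokens):
    """Return 1 + ln(tf) for every token occurring in a tokenized document."""
    counts = Counter(tokens)
    return {token: 1.0 + math.log(tf) for token, tf in counts.items()}


weights = sublinear_tf_weights(["a", "a", "a", "b"])
print(weights)
```

<p>A token occurring three times thus weighs roughly 2.1 rather than 3, which dampens the influence of very frequent terms within a document.</p>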
<spanid="quapy-data-reader"></span><h2>quapy.data.reader<aclass="headerlink"href="#module-quapy.data.reader"title="Permalink to this heading">¶</a></h2>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">binarize</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">y</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">pos_class</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.reader.binarize"title="Permalink to this definition">¶</a></dt>
<ddclass="field-even"><p>a binary np.ndarray in which the value 1 marks the positions in which <cite>y</cite> had <cite>pos_class</cite> labels, and
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_csv</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">encoding</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">'utf-8'</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.reader.from_csv"title="Permalink to this definition">¶</a></dt>
<dd><p>Reads a CSV file in which columns are separated by commas (‘,’).
File format: &lt;label&gt;,&lt;feat1&gt;,&lt;feat2&gt;,…,&lt;featn&gt;</p>
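<p>A minimal sketch of parsing this layout with the standard <cite>csv</cite> module follows. The function name <cite>parse_labelled_csv</cite> is hypothetical, and leaving the features as strings without any type conversion is an assumption of the sketch, not a description of what <cite>from_csv</cite> returns.</p>

```python
import csv
import io


def parse_labelled_csv(stream):
    """Split each row into its label (first column) and its features (the rest)."""
    instances, labels = [], []
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        labels.append(row[0])
        instances.append(row[1:])
    return instances, labels


# An in-memory file keeps the example self-contained.
sample = io.StringIO("pos,0.5,1.0\nneg,0.0,2.5\n")
X, y = parse_labelled_csv(sample)
print(X)
print(y)
```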
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_sparse</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.reader.from_sparse"title="Permalink to this definition">¶</a></dt>
<dd><p>Reads a labelled collection of real-valued instances expressed in sparse format.
File format: <-1 or 0 or 1>[s col(int):val(float)]</p>
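<p>To make the sparse layout concrete, here is a sketch that parses a single line into a label and a column-to-value mapping. The function name <cite>parse_sparse_line</cite> is hypothetical, and the sketch assumes that the label and the <cite>col:val</cite> pairs are separated by whitespace.</p>

```python
def parse_sparse_line(line):
    """Parse '<label> col:val col:val ...' into (label, {col: val})."""
    parts = line.split()
    label = int(parts[0])  # expected to be -1, 0 or 1
    features = {}
    for pair in parts[1:]:
        col, val = pair.split(":")
        features[int(col)] = float(val)
    return label, features


label, feats = parse_sparse_line("1 3:0.5 10:2.0")
print(label, feats)
```

<p>Only the non-zero columns are stored, which is what makes the format compact for high-dimensional data.</p>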
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">from_text</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">path</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">encoding</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">'utf-8'</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">verbose</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">1</span></span></em>, <emclass="sig-param"><spanclass="n"><spanclass="pre">class2int</span></span><spanclass="o"><spanclass="pre">=</span></span><spanclass="default_value"><spanclass="pre">True</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.reader.from_text"title="Permalink to this definition">¶</a></dt>
<spanclass="sig-prename descclassname"><spanclass="pre">quapy.data.reader.</span></span><spanclass="sig-name descname"><spanclass="pre">reindex_labels</span></span><spanclass="sig-paren">(</span><emclass="sig-param"><spanclass="n"><spanclass="pre">y</span></span></em><spanclass="sig-paren">)</span><aclass="headerlink"href="#quapy.data.reader.reindex_labels"title="Permalink to this definition">¶</a></dt>
<dd><p>Re-indexes a list of labels as a list of indexes, and returns the classnames corresponding to the indexes.
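<p>The re-indexing described above can be sketched as follows. The function name <cite>reindex</cite> is hypothetical, and assigning indices by the sorted order of the distinct labels is an assumption of this sketch.</p>

```python
def reindex(labels):
    """Map labels to integer indices and return the class names by index."""
    classnames = sorted(set(labels))  # assumption: classes are sorted
    label_to_index = {name: i for i, name in enumerate(classnames)}
    indexed = [label_to_index[label] for label in labels]
    return indexed, classnames


indexed, classnames = reindex(["pos", "neg", "neu", "pos"])
print(indexed)
print(classnames)
```

<p>The original label of any instance can be recovered as <cite>classnames[i]</cite>, where <cite>i</cite> is its index.</p>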