Welcome to QuaPy’s documentation!
+Welcome to QuaPy’s documentation!
QuaPy is a Python-based open-source framework for quantification.
This document contains the API of the modules included in QuaPy.
Installation
+Installation
pip install quapy
GitHub
+GitHub
QuaPy is hosted on GitHub at https://github.com/HLT-ISTI/QuaPy
Contents
+Contents
- quapy
-
@@ -128,12 +128,14 @@
CNNnet
LSTMnet
@@ -154,6 +156,7 @@
TextClassifierNet.forward()
TextClassifierNet.get_params()
TextClassifierNet.predict_proba()
+TextClassifierNet.training
TextClassifierNet.vocabulary_size
TextClassifierNet.xavier_uniform()
- Submodules
- quapy.method.aggregative module
AggregativeSoftQuantifier
+BayesianCC
+BinaryAggregativeQuantifier
BinaryAggregativeQuantifier.fit()
BinaryAggregativeQuantifier.neg_label
@@ -385,6 +399,7 @@
QuaNetModule
QuaNetTrainer
-
@@ -494,6 +509,12 @@
MaximumLikelihoodPrevalenceEstimation.quantify()
+ReadMe
+
- Module contents @@ -543,12 +564,15 @@
- quapy.functional module
HellingerDistance()
TopsoeDistance()
-adjusted_quantification()
argmin_prevalence()
as_binary_prevalence()
check_prevalence_vector()
+clip()
+condsoftmax()
+counts_from_labels()
get_divergence()
get_nprevpoints_approximation()
+l1_norm()
linear_search()
normalize_prevalence()
num_prevalence_combinations()
@@ -556,7 +580,12 @@
prevalence_from_labels()
prevalence_from_probabilities()
prevalence_linspace()
+projection_simplex_sort()
+softmax()
+solve_adjustment()
+solve_adjustment_binary()
strprev()
+ternary_search()
uniform_prevalence_sampling()
uniform_simplex_sampling()
get_quapy_home()
map_parallel()
parallel()
+parallel_unpack()
pickled_resource()
save_text_file()
temp_seed()
@@ -673,7 +703,7 @@
- diff --git a/docs/build/html/modules.html b/docs/build/html/modules.html index 4942493..be528e0 100644 --- a/docs/build/html/modules.html +++ b/docs/build/html/modules.html @@ -1,24 +1,24 @@ - + -
- quapy package
-
@@ -153,12 +153,15 @@
- quapy.functional module
HellingerDistance()
TopsoeDistance()
-adjusted_quantification()
argmin_prevalence()
as_binary_prevalence()
check_prevalence_vector()
+clip()
+condsoftmax()
+counts_from_labels()
get_divergence()
get_nprevpoints_approximation()
+l1_norm()
linear_search()
normalize_prevalence()
num_prevalence_combinations()
@@ -166,7 +169,12 @@
prevalence_from_labels()
prevalence_from_probabilities()
prevalence_linspace()
+projection_simplex_sort()
+softmax()
+solve_adjustment()
+solve_adjustment_binary()
strprev()
+ternary_search()
uniform_prevalence_sampling()
uniform_simplex_sampling()
get_quapy_home()
map_parallel()
parallel()
+parallel_unpack()
pickled_resource()
save_text_file()
temp_seed()
diff --git a/docs/build/html/objects.inv b/docs/build/html/objects.inv
index 35f1681..d0f1285 100644
Binary files a/docs/build/html/objects.inv and b/docs/build/html/objects.inv differ
diff --git a/docs/build/html/py-modindex.html b/docs/build/html/py-modindex.html
index 20305fe..de81f69 100644
--- a/docs/build/html/py-modindex.html
+++ b/docs/build/html/py-modindex.html
@@ -1,22 +1,23 @@
-
+
- - -class quapy.classification.calibration.BCTSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source] +class quapy.classification.calibration.BCTSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source]
Bases:
RecalibratedProbabilisticClassifierBase
Applies the Bias-Corrected Temperature Scaling (BCTS) calibration method from abstention.calibration, as defined in the paper by Alexandari et al.:
@@ -124,7 +125,7 @@ training set afterwards. Default value is 5.- -class quapy.classification.calibration.NBVSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source] +class quapy.classification.calibration.NBVSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source]
Bases:
RecalibratedProbabilisticClassifierBase
Applies the No-Bias Vector Scaling (NBVS) calibration method from abstention.calibration, as defined in the paper by Alexandari et al.:
@@ -145,7 +146,7 @@ training set afterwards. Default value is 5.- -class quapy.classification.calibration.RecalibratedProbabilisticClassifier[source] +class quapy.classification.calibration.RecalibratedProbabilisticClassifier[source]
Bases:
object
Abstract class for (re)calibration methods from abstention.calibration, as defined in Alexandari, A., Kundaje, A., & Shrikumar, A. (2020, November). Maximum likelihood with bias-corrected calibration @@ -154,7 +155,7 @@ is hard-to-beat at label shift adaptation. In International Conference on Machin
- -class quapy.classification.calibration.RecalibratedProbabilisticClassifierBase(classifier, calibrator, val_split=5, n_jobs=None, verbose=False)[source] +class quapy.classification.calibration.RecalibratedProbabilisticClassifierBase(classifier, calibrator, val_split=5, n_jobs=None, verbose=False)[source]
Bases:
BaseEstimator
,RecalibratedProbabilisticClassifier
Applies a (re)calibration method from abstention.calibration, as defined in the paper by Alexandari et al.
@@ -174,7 +175,7 @@ training set afterwards. Default value is 5.
- -property classes_ +property classes_
Returns the classes on which the classifier has been trained
- Returns: @@ -185,7 +186,7 @@ training set afterwards. Default value is 5.
- -fit(X, y)[source] +fit(X, y)[source]
Fits the calibration for the probabilistic classifier.
- Parameters: @@ -202,7 +203,7 @@ training set afterwards. Default value is 5.
- -fit_cv(X, y)[source] +fit_cv(X, y)[source]
Fits the calibration in a cross-validation manner, i.e., it generates posterior probabilities for all training instances via cross-validation, and then retrains the classifier on all training instances. The posterior probabilities thus generated are used for calibrating the outputs of the classifier.
@@ -221,7 +222,7 @@ The posterior probabilities thus generated are used for calibrating the outputs- -fit_tr_val(X, y)[source] +fit_tr_val(X, y)[source]
Fits the calibration in a train/val-split manner, i.e., it partitions the training instances into a training and a validation set, and then uses the training samples to learn a classifier, which is then used to generate posterior probabilities for the held-out validation data. These posteriors are used to calibrate @@ -241,7 +242,7 @@ the classifier. The classifier is not retrained on the whole dataset.
- -predict(X)[source] +predict(X)[source]
Predicts class labels for the data instances in X
- Parameters: @@ -255,7 +256,7 @@ the classifier. The classifier is not retrained on the whole dataset.
- -predict_proba(X)[source] +predict_proba(X)[source]
Generates posterior probabilities for the data instances in X
- Parameters: @@ -271,7 +272,7 @@ the classifier. The classifier is not retrained on the whole dataset.
- -class quapy.classification.calibration.TSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source] +class quapy.classification.calibration.TSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source]
Bases:
RecalibratedProbabilisticClassifierBase
Applies the Temperature Scaling (TS) calibration method from abstention.calibration, as defined in the paper by Alexandari et al.:
@@ -292,7 +293,7 @@ training set afterwards. Default value is 5.- -class quapy.classification.calibration.VSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source] +class quapy.classification.calibration.VSCalibration(classifier, val_split=5, n_jobs=None, verbose=False)[source]
Bases:
RecalibratedProbabilisticClassifierBase
Applies the Vector Scaling (VS) calibration method from abstention.calibration, as defined in the paper by Alexandari et al.:
@@ -313,10 +314,10 @@ training set afterwards. Default value is 5.
- -class quapy.classification.methods.LowRankLogisticRegression(n_components=100, **kwargs)[source] +class quapy.classification.methods.LowRankLogisticRegression(n_components=100, **kwargs)[source]
Bases:
BaseEstimator
An example of a classification method (i.e., an object that implements fit, predict, and predict_proba) that also generates embedded inputs (i.e., that implements transform), as those required for @@ -335,7 +336,7 @@ while classification is performed using
- -fit(X, y)[source]
+fit(X, y)[source]Fit the model according to the given training data. The fit consists of fitting TruncatedSVD and then LogisticRegression on the low-rank representation.
- -predict_proba(X)[source] +predict_proba(X)[source]
Predicts posterior probabilities for the instances X embedded into the low-rank space.
- -class quapy.classification.neural.CNNnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, kernel_heights=[3, 5, 7], stride=1, padding=0, drop_p=0.5)[source] +class quapy.classification.neural.CNNnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, kernel_heights=[3, 5, 7], stride=1, padding=0, drop_p=0.5)[source]
Bases:
TextClassifierNet
An implementation of
@@ -448,7 +449,7 @@ consecutive tokens that each kernel coversquapy.classification.neural.TextClassifierNet
based on Convolutional Neural Networks.- -document_embedding(input)[source] +document_embedding(input)[source]
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
-
@@ -466,7 +467,7 @@ dimensionality of the embedding
- -get_params()[source] +get_params()[source]
Get hyper-parameters for this estimator
- Returns: @@ -475,9 +476,14 @@ dimensionality of the embedding
- +training: bool +
- -property vocabulary_size +property vocabulary_size
Return the size of the vocabulary
- Returns: @@ -490,7 +496,7 @@ dimensionality of the embedding
- -class quapy.classification.neural.LSTMnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, lstm_class_nlayers=1, drop_p=0.5)[source] +class quapy.classification.neural.LSTMnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, lstm_class_nlayers=1, drop_p=0.5)[source]
Bases:
TextClassifierNet
An implementation of
@@ -509,7 +515,7 @@ Long Short Term Memory networks.quapy.classification.neural.TextClassifierNet
based on Long Short Term Memory networks.- -document_embedding(x)[source] +document_embedding(x)[source]
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
-
@@ -527,7 +533,7 @@ dimensionality of the embedding
- -get_params()[source] +get_params()[source]
Get hyper-parameters for this estimator
- Returns: @@ -536,9 +542,14 @@ dimensionality of the embedding
- +training: bool +
- -property vocabulary_size +property vocabulary_size
Return the size of the vocabulary
- Returns: @@ -551,7 +562,7 @@ dimensionality of the embedding
- -class quapy.classification.neural.NeuralClassifierTrainer(net: TextClassifierNet, lr=0.001, weight_decay=0, patience=10, epochs=200, batch_size=64, batch_size_test=512, padding_length=300, device='cuda', checkpointpath='../checkpoint/classifier_net.dat')[source] +class quapy.classification.neural.NeuralClassifierTrainer(net: TextClassifierNet, lr=0.001, weight_decay=0, patience=10, epochs=200, batch_size=64, batch_size_test=512, padding_length=300, device='cuda', checkpointpath='../checkpoint/classifier_net.dat')[source]
Bases:
object
Trains a neural network for text classification.
-
@@ -574,7 +585,7 @@ according to the evaluation in the held-out validation split (default ‘../chec
- -property device +property device
Gets the device in which the network is allocated
- Returns: @@ -585,7 +596,7 @@ according to the evaluation in the held-out validation split (default ‘../chec
- -fit(instances, labels, val_split=0.3)[source] +fit(instances, labels, val_split=0.3)[source]
Fits the model according to the given training data.
- Parameters: @@ -603,7 +614,7 @@ according to the evaluation in the held-out validation split (default ‘../chec
- -get_params()[source] +get_params()[source]
Get hyper-parameters for this estimator
- Returns: @@ -614,7 +625,7 @@ according to the evaluation in the held-out validation split (default ‘../chec
- -predict(instances)[source] +predict(instances)[source]
Predicts labels for the instances
- Parameters: @@ -629,7 +640,7 @@ instances in X
- -predict_proba(instances)[source] +predict_proba(instances)[source]
Predicts posterior probabilities for the instances
- Parameters: @@ -643,7 +654,7 @@ instances in X
- -reset_net_params(vocab_size, n_classes)[source] +reset_net_params(vocab_size, n_classes)[source]
Reinitialize the network parameters
- Parameters: @@ -657,7 +668,7 @@ instances in X
- -set_params(**params)[source] +set_params(**params)[source]
Set the parameters of this trainer and the learner it is training. In the current version, parameter names for the trainer and learner should be disjoint.
@@ -670,7 +681,7 @@ be disjoint.- -transform(instances)[source] +transform(instances)[source]
Returns the embeddings of the instances
- Parameters: @@ -687,12 +698,12 @@ where embed_size is defined by the classification network
- -class quapy.classification.neural.TextClassifierNet(*args, **kwargs)[source] +class quapy.classification.neural.TextClassifierNet(*args, **kwargs)[source]
Bases:
Module
Abstract Text classifier (torch.nn.Module)
- -dimensions()[source] +dimensions()[source]
Gets the number of dimensions of the embedding space
- Returns: @@ -703,7 +714,7 @@ where embed_size is defined by the classification network
- -abstract document_embedding(x)[source] +abstract document_embedding(x)[source]
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
-
@@ -721,7 +732,7 @@ dimensionality of the embedding
- -forward(x)[source] +forward(x)[source]
Performs the forward pass.
- Parameters: @@ -737,7 +748,7 @@ for each of the instances and classes
- -abstract get_params()[source] +abstract get_params()[source]
Get hyper-parameters for this estimator
- Returns: @@ -748,7 +759,7 @@ for each of the instances and classes
- -predict_proba(x)[source] +predict_proba(x)[source]
Predicts posterior probabilities for the instances in x
- Parameters: @@ -762,9 +773,14 @@ is length of the pad in the batch
- +training: bool +
- -property vocabulary_size +property vocabulary_size
Return the size of the vocabulary
- Returns: @@ -775,7 +791,7 @@ is length of the pad in the batch
- -xavier_uniform()[source] +xavier_uniform()[source]
Performs Xavier initialization of the network parameters
- -class quapy.classification.neural.TorchDataset(instances, labels=None)[source] +class quapy.classification.neural.TorchDataset(instances, labels=None)[source]
Bases:
Dataset
Transforms labelled instances into a Torch’s
torch.utils.data.DataLoader
object-
@@ -796,7 +812,7 @@ is length of the pad in the batch
- -asDataloader(batch_size, shuffle, pad_length, device)[source] +asDataloader(batch_size, shuffle, pad_length, device)[source]
Converts the labelled collection into a Torch DataLoader with dynamic padding for the batch
-
@@ -820,10 +836,10 @@ applied, meaning that if the longest document in the batch is shorter than
-
+
-
+
-
+
- -class quapy.classification.svmperf.SVMperf(svmperf_base, C=0.01, verbose=False, loss='01', host_folder=None)[source] +class quapy.classification.svmperf.SVMperf(svmperf_base, C=0.01, verbose=False, loss='01', host_folder=None)[source]
Bases:
BaseEstimator
,ClassifierMixin
A wrapper for the SVM-perf package by Thorsten Joachims. When using losses for quantification, the source code has to be patched. See @@ -848,7 +864,7 @@ for further details.
- -decision_function(X, y=None)[source] +decision_function(X, y=None)[source]
Evaluate the decision function for the samples in X.
- -class quapy.data.base.Dataset(training: LabelledCollection, test: LabelledCollection, vocabulary: dict | None = None, name='')[source] +class quapy.data.base.Dataset(training: LabelledCollection, test: LabelledCollection, vocabulary: Optional[dict] = None, name='')[source]
Bases:
object
Abstraction of training and test
LabelledCollection
objects.-
@@ -118,7 +119,7 @@
- -classmethod SplitStratified(collection: LabelledCollection, train_size=0.6)[source] +classmethod SplitStratified(collection: LabelledCollection, train_size=0.6)[source]
Generates a
Dataset
from a stratified split of aLabelledCollection
instance. SeeLabelledCollection.split_stratified()
-
@@ -136,7 +137,7 @@ See
- -property binary +property binary
Returns True if the training collection is labelled according to two classes
- Returns: @@ -147,7 +148,7 @@ See
- -property classes_ +property classes_
The classes according to which the training collection is labelled
- Returns: @@ -158,7 +159,7 @@ See
- -classmethod kFCV(data: LabelledCollection, nfolds=5, nrepeats=1, random_state=0)[source] +classmethod kFCV(data: LabelledCollection, nfolds=5, nrepeats=1, random_state=0)[source]
Generator of stratified folds to be used in k-fold cross validation. This function is only a wrapper around
LabelledCollection.kFCV()
that returnsDataset
instances made of training and test folds.-
@@ -177,7 +178,7 @@ See
- -classmethod load(train_path, test_path, loader_func: callable, classes=None, **loader_kwargs)[source] +classmethod load(train_path, test_path, loader_func: callable, classes=None, **loader_kwargs)[source]
Loads a training and a test labelled set of data and converts it into a
@@ -201,7 +202,7 @@ SeeDataset
instance. The function in charge of reading the instances must be specified. This function can be a custom one, or any of the reading functions defined inquapy.data.reader
module.- -property n_classes
+property n_classesThe number of classes according to which the training collection is labelled
- Returns: @@ -212,7 +213,7 @@ See
- -reduce(n_train=100, n_test=100)[source] +reduce(n_train=100, n_test=100)[source]
Reduce the number of instances in place for quick experiments. Preserves the prevalence of each set.
- Parameters: @@ -229,7 +230,7 @@ See
- -stats(show=True)[source] +stats(show=True)[source]
Returns (and optionally prints) a dictionary with some stats of this dataset. E.g.:
>>> data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5) >>> data.stats() @@ -252,7 +253,7 @@ the collection), prevs (the prevalence values for each class)
- -property train_test +property train_test
Alias to self.training and self.test
- Returns: @@ -266,7 +267,7 @@ the collection), prevs (the prevalence values for each class)
- -property vocabulary_size +property vocabulary_size
If the dataset is textual, and the vocabulary was indicated, returns the size of the vocabulary
- Returns: @@ -279,7 +280,7 @@ the collection), prevs (the prevalence values for each class)
- -class quapy.data.base.LabelledCollection(instances, labels, classes=None)[source] +class quapy.data.base.LabelledCollection(instances, labels, classes=None)[source]
Bases:
object
A LabelledCollection is a set of objects, each with a label attached. This class implements several sampling routines and other utilities.
@@ -296,7 +297,7 @@ from the labels. The classes must be indicated in cases in which some of the lab- -property X +property X
An alias to self.instances
- Returns: @@ -307,7 +308,7 @@ from the labels. The classes must be indicated in cases in which some of the lab
- -property Xp +property Xp
Gets the instances and the true prevalence. This is useful when implementing evaluation protocols from a
LabelledCollection
object.-
@@ -319,7 +320,7 @@ a
- -property Xy +property Xy
Gets the instances and labels. This is useful when working with sklearn estimators, e.g.:
@@ -333,7 +334,7 @@ a>>> svm = LinearSVC().fit(*my_collection.Xy)
- -property binary
+property binaryReturns True if the number of classes is 2
- Returns: @@ -344,7 +345,7 @@ a
- -counts()[source] +counts()[source]
Returns the number of instances for each of the classes in the codeframe.
- Returns: @@ -356,7 +357,7 @@ as listed by self.classes_
- -classmethod join(*args: Iterable[LabelledCollection])[source] +classmethod join(*args: Iterable[LabelledCollection])[source]
Returns a new
LabelledCollection
as the union of the collections given in input.- Parameters: @@ -370,7 +371,7 @@ as listed by self.classes_
- -kFCV(nfolds=5, nrepeats=1, random_state=None)[source] +kFCV(nfolds=5, nrepeats=1, random_state=None)[source]
Generator of stratified folds to be used in k-fold cross validation.
- Parameters: @@ -388,7 +389,7 @@ as listed by self.classes_
- -classmethod load(path: str, loader_func: callable, classes=None, **loader_kwargs)[source] +classmethod load(path: str, loader_func: callable, classes=None, **loader_kwargs)[source]
Loads a labelled set of data and converts it into a
@@ -411,7 +412,7 @@ these arguments are used to call loader_func(path, **loader_kwargs)LabelledCollection
instance. The function in charge of reading the instances must be specified. This function can be a custom one, or any of the reading functions defined inquapy.data.reader
module.- -property n_classes +property n_classes
The number of classes
- Returns: @@ -422,7 +423,7 @@ these arguments are used to call loader_func(path, **loader_kwargs)
- -property p +property p
An alias to self.prevalence()
- Returns: @@ -433,7 +434,7 @@ these arguments are used to call loader_func(path, **loader_kwargs)
- -prevalence()[source] +prevalence()[source]
Returns the prevalence, or relative frequency, of the classes in the codeframe.
- Returns: @@ -445,7 +446,7 @@ as listed by self.classes_
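The relation between counts() and prevalence() described above can be illustrated with a small standard-library sketch. This is not QuaPy's implementation: the function names `counts_sketch` and `prevalence_sketch` are hypothetical, and plain Python lists stand in for the arrays the library works with.

```python
from collections import Counter


def counts_sketch(labels, classes):
    """Number of instances per class, in the order given by `classes` (hypothetical helper)."""
    tally = Counter(labels)
    # Classes absent from `labels` get a count of 0
    return [tally[c] for c in classes]


def prevalence_sketch(labels, classes):
    """Relative frequency of each class, in the order given by `classes` (hypothetical helper)."""
    total = len(labels)  # assumes a non-empty collection; an empty one would divide by zero
    return [n / total for n in counts_sketch(labels, classes)]


labels = [0, 0, 1, 2]
print(counts_sketch(labels, classes=[0, 1, 2]))      # [2, 1, 1]
print(prevalence_sketch(labels, classes=[0, 1, 2]))  # [0.5, 0.25, 0.25]
```

Passing the classes explicitly mirrors the role of the codeframe: a class with no instances still receives an entry (with value 0) in the returned vector.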
- -sampling(size, *prevs, shuffle=True, random_state=None)[source] +sampling(size, *prevs, shuffle=True, random_state=None)[source]
Return a random sample (an instance of
@@ -469,7 +470,7 @@ prevalence == prevs if the exact prevalence values can be met as prLabelledCollection
) of desired size and desired prevalence values. For each class, the sampling is drawn with replacement if the requested prevalence is larger than the actual prevalence of the class, or without replacement otherwise.- -sampling_from_index(index)[source] +sampling_from_index(index)[source]
Returns an instance of
LabelledCollection
whose elements are sampled from this collection using the index.-
@@ -484,7 +485,7 @@ index.
- -sampling_index(size, *prevs, shuffle=True, random_state=None)[source] +sampling_index(size, *prevs, shuffle=True, random_state=None)[source]
Returns an index to be used to extract a random sample of desired size and desired prevalence values. If the prevalence values are not specified, then returns the index of a uniform sampling. For each class, the sampling is drawn with replacement if the requested prevalence is larger than @@ -508,7 +509,7 @@ it is constrained. E.g., for binary collections, only the prevalence p
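A rough sketch of prevalence-controlled index sampling, using only the standard library. It is an illustration under stated assumptions, not the library's code: the name `sampling_index_sketch` is hypothetical, prevalence values are given for every class, and the with/without-replacement decision is simplified to comparing the requested count against the number of available instances of that class.

```python
import random


def sampling_index_sketch(labels, size, prevs, seed=None):
    """Draw `size` indexes so that class proportions follow `prevs` (hypothetical helper)."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Requested count per class; the last class takes the remainder so counts sum to `size`
    requested = [int(size * p) for p in prevs[:-1]]
    requested.append(size - sum(requested))
    index = []
    for cls, n in zip(classes, requested):
        pool = [i for i, y in enumerate(labels) if y == cls]
        if n > len(pool):
            # Not enough instances of this class: sample with replacement
            index.extend(rng.choices(pool, k=n))
        else:
            # Enough instances: sample without replacement
            index.extend(rng.sample(pool, n))
    rng.shuffle(index)
    return index


labels = [0] * 8 + [1] * 2
idx = sampling_index_sketch(labels, size=10, prevs=[0.5, 0.5], seed=0)
print(sum(labels[i] for i in idx))  # 5 positives out of 10, drawn with replacement
```

Here class 1 has only two instances, so reaching a prevalence of 0.5 in a sample of ten requires repeating them.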
- -split_random(train_prop=0.6, random_state=None)[source]
+split_random(train_prop=0.6, random_state=None)[source]Returns two instances of
LabelledCollection
split randomly from this collection, at desired proportion.-
@@ -529,7 +530,7 @@ second one with 1-train_prop elements
- -split_stratified(train_prop=0.6, random_state=None)[source] +split_stratified(train_prop=0.6, random_state=None)[source]
Returns two instances of
LabelledCollection
split with stratification from this collection, at desired proportion.-
@@ -550,7 +551,7 @@ second one with 1-train_prop elements
- -stats(show=True)[source] +stats(show=True)[source]
Returns (and optionally prints) a dictionary with some stats of this collection. E.g.:
>>> data = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5) >>> data.training.stats() @@ -572,7 +573,7 @@ values for each class)
- -uniform_sampling(size, random_state=None)[source] +uniform_sampling(size, random_state=None)[source]
Returns a uniform sample (an instance of
@@ -591,7 +592,7 @@ otherwise.LabelledCollection
) of desired size. The sampling is drawn with replacement if the requested size is greater than the number of instances, or without replacement otherwise.- -uniform_sampling_index(size, random_state=None)[source] +uniform_sampling_index(size, random_state=None)[source]
Returns an index to be used to extract a uniform sample of desired size. The sampling is drawn with replacement if the requested size is greater than the number of instances, or without replacement otherwise.
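The replacement rule stated above can be sketched with the standard library as follows. The function name `uniform_sampling_index_sketch` and the `seed` parameter are hypothetical stand-ins for illustration; this is not the library's implementation.

```python
import random


def uniform_sampling_index_sketch(n_instances, size, seed=None):
    """Index of a uniform sample of `size` items out of `n_instances` (hypothetical helper)."""
    rng = random.Random(seed)
    if size > n_instances:
        # More items requested than available: draw with replacement
        return rng.choices(range(n_instances), k=size)
    # Otherwise draw without replacement, so no index repeats
    return rng.sample(range(n_instances), size)


small = uniform_sampling_index_sketch(n_instances=10, size=5, seed=1)
large = uniform_sampling_index_sketch(n_instances=3, size=7, seed=1)
print(len(set(small)))  # 5: all indexes are distinct
print(len(large))       # 7: some indexes necessarily repeat
```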
@@ -610,7 +611,7 @@ otherwise.- -property y +property y
An alias to self.labels
- Returns: @@ -623,10 +624,10 @@ otherwise.
- -quapy.data.datasets.fetch_IFCB(single_sample_train=True, for_model_selection=False, data_home=None)[source] +quapy.data.datasets.fetch_IFCB(single_sample_train=True, for_model_selection=False, data_home=None)[source]
Loads the IFCB dataset for quantification from Zenodo (for more information on this dataset, please follow the zenodo link). This dataset is based on the data available publicly at @@ -658,7 +659,7 @@ i.e., a sampling protocol that returns a series of samples labelled by prevalenc
- -quapy.data.datasets.fetch_UCIBinaryDataset(dataset_name, data_home=None, test_split=0.3, verbose=False) Dataset [source] +quapy.data.datasets.fetch_UCIBinaryDataset(dataset_name, data_home=None, test_split=0.3, verbose=False) Dataset [source]
Loads a UCI dataset as an instance of
quapy.data.base.Dataset
, as used in Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017). Using ensembles for problems with characterizable changes in data distribution: A case study on quantification. @@ -688,7 +689,7 @@ The list of valid dataset names can be accessed in quapy.data.datasets.UCI- -quapy.data.datasets.fetch_UCIBinaryLabelledCollection(dataset_name, data_home=None, verbose=False) LabelledCollection [source] +quapy.data.datasets.fetch_UCIBinaryLabelledCollection(dataset_name, data_home=None, verbose=False) LabelledCollection [source]
Loads a UCI collection as an instance of
quapy.data.base.LabelledCollection
, as used in Pérez-Gállego, P., Quevedo, J. R., & del Coz, J. J. (2017). Using ensembles for problems with characterizable changes in data distribution: A case study on quantification. @@ -725,7 +726,7 @@ This can be reproduced by using- -quapy.data.datasets.fetch_UCIMulticlassDataset(dataset_name, data_home=None, test_split=0.3, verbose=False) Dataset [source]
+quapy.data.datasets.fetch_UCIMulticlassDataset(dataset_name, data_home=None, test_split=0.3, verbose=False) Dataset [source]Loads a UCI multiclass dataset as an instance of
quapy.data.base.Dataset
.The list of available datasets is taken from https://archive.ics.uci.edu/, following these criteria: - It has more than 1000 instances @@ -758,7 +759,7 @@ This can be reproduced by using
- -quapy.data.datasets.fetch_UCIMulticlassLabelledCollection(dataset_name, data_home=None, verbose=False) LabelledCollection [source]
+quapy.data.datasets.fetch_UCIMulticlassLabelledCollection(dataset_name, data_home=None, verbose=False) LabelledCollection [source]Loads a UCI multiclass collection as an instance of
quapy.data.base.LabelledCollection
.The list of available datasets is taken from https://archive.ics.uci.edu/, following these criteria: - It has more than 1000 instances @@ -791,7 +792,7 @@ This can be reproduced by using
- -quapy.data.datasets.fetch_lequa2022(task, data_home=None)[source]
+quapy.data.datasets.fetch_lequa2022(task, data_home=None)[source]Loads the official datasets provided for the LeQua competition. In brief, there are 4 tasks (T1A, T1B, T2A, T2B) having to do with text quantification problems. Tasks T1A and T1B provide documents in vector form, while T2A and T2B provide raw documents instead. @@ -822,7 +823,7 @@ that return a series of samples stored in a directory which are labelled by prev
- -quapy.data.datasets.fetch_reviews(dataset_name, tfidf=False, min_df=None, data_home=None, pickle=False) Dataset [source] +quapy.data.datasets.fetch_reviews(dataset_name, tfidf=False, min_df=None, data_home=None, pickle=False) Dataset [source]
Loads a Reviews dataset as a Dataset instance, as used in Esuli, A., Moreo, A., and Sebastiani, F. “A recurrent neural network for sentiment quantification.” Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018. @@ -848,7 +849,7 @@ faster subsequent invokations
- -quapy.data.datasets.fetch_twitter(dataset_name, for_model_selection=False, min_df=None, data_home=None, pickle=False) Dataset [source] +quapy.data.datasets.fetch_twitter(dataset_name, for_model_selection=False, min_df=None, data_home=None, pickle=False) Dataset [source]
Loads a Twitter dataset as a
quapy.data.base.Dataset
instance, as used in: Gao, W., Sebastiani, F.: From classification to quantification in tweet sentiment analysis. Social Network Analysis and Mining 6(19), 1–22 (2016) @@ -879,15 +880,15 @@ faster subsequent invokations
- -class quapy.data.preprocessing.IndexTransformer(**kwargs)[source] +class quapy.data.preprocessing.IndexTransformer(**kwargs)[source]
Bases:
object
This class implements a sklearn-style transformer that indexes text as numerical ids for the tokens it contains, and that would be generated by sklearn’s @@ -901,7 +902,7 @@ contains, and that would be generated by sklearn’s
- -add_word(word, id=None, nogaps=True)[source] +add_word(word, id=None, nogaps=True)[source]
Adds a new token (regardless of whether it has been found in the text or not), with dedicated id. Useful to define special tokens for codifying unknown words, or padding tokens.
-
@@ -922,7 +923,7 @@ precedent ids stored so far
- -fit(X)[source] +fit(X)[source]
Fits the transformer, i.e., decides on the vocabulary, given a list of strings.
- Parameters: @@ -936,7 +937,7 @@ precedent ids stored so far
- -fit_transform(X, n_jobs=None)[source] +fit_transform(X, n_jobs=None)[source]
Fits the transform on X and transforms it.
- Parameters: @@ -953,7 +954,7 @@ precedent ids stored so far
- -transform(X, n_jobs=None)[source] +transform(X, n_jobs=None)[source]
Transforms the strings in X as lists of numerical ids
- Parameters: @@ -970,7 +971,7 @@ precedent ids stored so far
- -vocabulary_size()[source] +vocabulary_size()[source]
Gets the length of the vocabulary according to which the document tokens have been indexed
- Returns: @@ -983,7 +984,7 @@ precedent ids stored so far
- -quapy.data.preprocessing.index(dataset: Dataset, min_df=5, inplace=False, **kwargs)[source] +quapy.data.preprocessing.index(dataset: Dataset, min_df=5, inplace=False, **kwargs)[source]
Indexes the tokens of a textual
@@ -1007,7 +1008,7 @@ are lists of strquapy.data.base.Dataset
of string documents. To index a document means to replace each different token by a unique numerical index. Rare words (i.e., words occurring fewer than min_df times) are replaced by a special token UNK- -quapy.data.preprocessing.reduce_columns(dataset: Dataset, min_df=5, inplace=False)[source] +quapy.data.preprocessing.reduce_columns(dataset: Dataset, min_df=5, inplace=False)[source]
Reduces the dimensionality of the instances, represented as a csr_matrix (or any subtype of scipy.sparse.spmatrix), of training and test documents by removing the columns of words which are not present in at least min_df instances in the training set
@@ -1030,7 +1031,7 @@ in the training set have been removed- -quapy.data.preprocessing.standardize(dataset: Dataset, inplace=False)[source] +quapy.data.preprocessing.standardize(dataset: Dataset, inplace=False)[source]
Standardizes the real-valued columns of a
@@ -1050,7 +1051,7 @@ standard deviation.quapy.data.base.Dataset
. Standardization, aka z-scoring, of a variable X comes down to subtracting the average and normalizing by the standard deviation.- -quapy.data.preprocessing.text2tfidf(dataset: Dataset, min_df=3, sublinear_tf=True, inplace=False, **kwargs)[source] +quapy.data.preprocessing.text2tfidf(dataset: Dataset, min_df=3, sublinear_tf=True, inplace=False, **kwargs)[source]
Transforms a
quapy.data.base.Dataset
of textual instances into aquapy.data.base.Dataset
of tfidf weighted sparse vectors-
@@ -1074,10 +1075,10 @@ current Dataset (if inplace=True) where the instances are stored in a csr_
- -quapy.data.reader.binarize(y, pos_class)[source] +quapy.data.reader.binarize(y, pos_class)[source]
Binarizes a categorical array-like collection of labels towards the positive class pos_class. E.g.,:
>>> binarize([1, 2, 3, 1, 1, 0], pos_class=2) >>> array([0, 1, 0, 0, 0, 0]) @@ -1099,7 +1100,7 @@ current Dataset (if inplace=True) where the instances are stored in a csr_
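The binarization described above amounts to mapping the positive class to 1 and every other label to 0. A minimal list-based sketch follows; the name `binarize_sketch` is hypothetical, and a plain list is returned here where the documented function returns an array.

```python
def binarize_sketch(y, pos_class):
    """Map labels equal to `pos_class` to 1 and all others to 0 (hypothetical helper)."""
    return [1 if label == pos_class else 0 for label in y]


print(binarize_sketch([1, 2, 3, 1, 1, 0], pos_class=2))  # [0, 1, 0, 0, 0, 0]
```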
- -quapy.data.reader.from_csv(path, encoding='utf-8')[source] +quapy.data.reader.from_csv(path, encoding='utf-8')[source]
Reads a csv file in which columns are separated by ‘,’. File format <label>,<feat1>,<feat2>,…,<featn>
-
@@ -1117,7 +1118,7 @@ File format <label>,<feat1>,<feat2>,…,<featn>
- -quapy.data.reader.from_sparse(path)[source] +quapy.data.reader.from_sparse(path)[source]
Reads a labelled collection of real-valued instances expressed in sparse format File format <-1 or 0 or 1>[s col(int):val(float)]
-
@@ -1132,7 +1133,7 @@ File format <-1 or 0 or 1>[s col(int):val(float)]
- -quapy.data.reader.from_text(path, encoding='utf-8', verbose=1, class2int=True)[source] +quapy.data.reader.from_text(path, encoding='utf-8', verbose=1, class2int=True)[source]
Reads a labelled collection of documents. File format <0 or 1> <document>
-
@@ -1151,7 +1152,7 @@ File fomart <0 or 1> <document>
- -quapy.data.reader.reindex_labels(y)[source] +quapy.data.reader.reindex_labels(y)[source]
Re-indexes a list of labels as a list of indexes, and returns the classnames corresponding to the indexes. E.g.:
>>> reindex_labels(['B', 'B', 'A', 'C']) @@ -1170,7 +1171,7 @@ E.g.:
- diff --git a/docs/build/html/quapy.html b/docs/build/html/quapy.html index cfe4d60..0803b74 100644 --- a/docs/build/html/quapy.html +++ b/docs/build/html/quapy.html @@ -1,23 +1,24 @@ - + -Module contents
+Module contents
quapy package — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation - - +quapy package — QuaPy: A Python-based open-source framework for quantification 0.1.9 documentation + + - - - - - + + + + + + @@ -96,12 +97,15 @@- quapy.functional module
HellingerDistance()
TopsoeDistance()
-adjusted_quantification()
argmin_prevalence()
as_binary_prevalence()
check_prevalence_vector()
+clip()
+condsoftmax()
+counts_from_labels()
get_divergence()
get_nprevpoints_approximation()
+l1_norm()
linear_search()
normalize_prevalence()
num_prevalence_combinations()
@@ -109,7 +113,12 @@
prevalence_from_labels()
prevalence_from_probabilities()
prevalence_linspace()
+projection_simplex_sort()
+softmax()
+solve_adjustment()
+solve_adjustment_binary()
strprev()
+ternary_search()
uniform_prevalence_sampling()
uniform_simplex_sampling()
get_quapy_home()
map_parallel()
+parallel()
parallel_unpack()
pickled_resource()
save_text_file()
@@ -193,9 +203,9 @@temp_seed()
- quapy package
+quapy package
- Subpackages
+Subpackages
- quapy.classification package
-
@@ -233,12 +243,14 @@
CNNnet
LSTMnet
@@ -259,6 +271,7 @@
TextClassifierNet.forward()
TextClassifierNet.get_params()
TextClassifierNet.predict_proba()
+TextClassifierNet.training
TextClassifierNet.vocabulary_size
TextClassifierNet.xavier_uniform()
- Submodules
- quapy.method.aggregative module
AggregativeSoftQuantifier
+BayesianCC
+BinaryAggregativeQuantifier
BinaryAggregativeQuantifier.fit()
BinaryAggregativeQuantifier.neg_label
@@ -490,6 +514,7 @@
QuaNetModule
QuaNetTrainer
-
@@ -599,6 +624,12 @@
MaximumLikelihoodPrevalenceEstimation.quantify()
+ReadMe
+
- Module contents @@ -608,14 +639,14 @@
- Submodules
+Submodules
- quapy.error module
+quapy.error module
Implementation of error measures used for quantification
- -quapy.error.absolute_error(prevs, prevs_hat) +quapy.error.absolute_error(prevs, prevs_hat)
- Computes the absolute error between the two prevalence vectors.
Absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(AE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}|\hat{p}(y)-p(y)|\), @@ -637,7 +668,7 @@ where \(\mathcal{Y}\) are the
- -quapy.error.acc_error(y_true, y_pred) +quapy.error.acc_error(y_true, y_pred)
Computes the error in terms of 1-accuracy. The accuracy is computed as \(\frac{tp+tn}{tp+fp+fn+tn}\), with tp, fp, fn, and tn standing for true positives, false positives, false negatives, and true negatives, @@ -657,7 +688,7 @@ respectively
- -quapy.error.acce(y_true, y_pred)[source] +quapy.error.acce(y_true, y_pred)[source]
Computes the error in terms of 1-accuracy. The accuracy is computed as \(\frac{tp+tn}{tp+fp+fn+tn}\), with tp, fp, fn, and tn standing for true positives, false positives, false negatives, and true negatives, @@ -677,7 +708,7 @@ respectively
- -quapy.error.ae(prevs, prevs_hat)[source] +quapy.error.ae(prevs, prevs_hat)[source]
- Computes the absolute error between the two prevalence vectors.
Absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(AE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}|\hat{p}(y)-p(y)|\), @@ -699,7 +730,7 @@ where \(\mathcal{Y}\) are the
- -quapy.error.f1_error(y_true, y_pred) +quapy.error.f1_error(y_true, y_pred)
F1 error: simply computes the error in terms of macro \(F_1\), i.e., \(1-F_1^M\), where \(F_1\) is the harmonic mean of precision and recall, defined as \(\frac{2tp}{2tp+fp+fn}\), with tp, fp, and fn standing @@ -721,7 +752,7 @@ and then averaged.
- -quapy.error.f1e(y_true, y_pred)[source] +quapy.error.f1e(y_true, y_pred)[source]
F1 error: simply computes the error in terms of macro \(F_1\), i.e., \(1-F_1^M\), where \(F_1\) is the harmonic mean of precision and recall, defined as \(\frac{2tp}{2tp+fp+fn}\), with tp, fp, and fn standing @@ -743,7 +774,7 @@ and then averaged.
- -quapy.error.from_name(err_name)[source] +quapy.error.from_name(err_name)[source]
Gets an error function from its name. E.g., from_name(“mae”) will return function
quapy.error.mae()
-
@@ -758,7 +789,7 @@ will return function
- -quapy.error.kld(prevs, prevs_hat, eps=None)[source] +quapy.error.kld(prevs, prevs_hat, eps=None)[source]
- Computes the Kullback-Leibler divergence between the two prevalence distributions.
Kullback-Leibler divergence between two prevalence distributions \(p\) and \(\hat{p}\) is computed as @@ -787,7 +818,7 @@ If eps=None, the sample size will be taken from the environment var
- -quapy.error.mae(prevs, prevs_hat)[source] +quapy.error.mae(prevs, prevs_hat)[source]
Computes the mean absolute error (see
quapy.error.ae()
) across the sample pairs.- Parameters: @@ -805,7 +836,7 @@ prevalence values
- -quapy.error.mean_absolute_error(prevs, prevs_hat) +quapy.error.mean_absolute_error(prevs, prevs_hat)
Computes the mean absolute error (see
quapy.error.ae()
) across the sample pairs.- Parameters: @@ -823,7 +854,7 @@ prevalence values
- -quapy.error.mean_normalized_absolute_error(prevs, prevs_hat) +quapy.error.mean_normalized_absolute_error(prevs, prevs_hat)
Computes the mean normalized absolute error (see
quapy.error.nae()
) across the sample pairs.- Parameters: @@ -841,7 +872,7 @@ prevalence values
- -quapy.error.mean_normalized_relative_absolute_error(prevs, prevs_hat, eps=None) +quapy.error.mean_normalized_relative_absolute_error(prevs, prevs_hat, eps=None)
Computes the mean normalized relative absolute error (see
@@ -866,7 +897,7 @@ the environment variable SAMPLE_SIZE (which has thus to be set befoquapy.error.nrae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mean_relative_absolute_error(prevs, prevs_hat, eps=None) +quapy.error.mean_relative_absolute_error(prevs, prevs_hat, eps=None)
Computes the mean relative absolute error (see
@@ -891,7 +922,7 @@ the environment variable SAMPLE_SIZE (which has thus to be set befoquapy.error.rae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mkld(prevs, prevs_hat, eps=None)[source] +quapy.error.mkld(prevs, prevs_hat, eps=None)[source]
Computes the mean Kullback-Leibler divergence (see
@@ -916,7 +947,7 @@ If eps=None, the sample size will be taken from the environment varquapy.error.kld()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mnae(prevs, prevs_hat)[source] +quapy.error.mnae(prevs, prevs_hat)[source]
Computes the mean normalized absolute error (see
quapy.error.nae()
) across the sample pairs.- Parameters: @@ -934,7 +965,7 @@ prevalence values
- -quapy.error.mnkld(prevs, prevs_hat, eps=None)[source] +quapy.error.mnkld(prevs, prevs_hat, eps=None)[source]
Computes the mean Normalized Kullback-Leibler divergence (see
@@ -958,7 +989,7 @@ If eps=None, the sample size will be taken from the environment varquapy.error.nkld()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mnrae(prevs, prevs_hat, eps=None)[source] +quapy.error.mnrae(prevs, prevs_hat, eps=None)[source]
Computes the mean normalized relative absolute error (see
@@ -983,7 +1014,7 @@ the environment variable SAMPLE_SIZE (which has thus to be set befoquapy.error.nrae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mrae(prevs, prevs_hat, eps=None)[source] +quapy.error.mrae(prevs, prevs_hat, eps=None)[source]
Computes the mean relative absolute error (see
@@ -1008,7 +1039,7 @@ the environment variable SAMPLE_SIZE (which has thus to be set befoquapy.error.rae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- -quapy.error.mse(prevs, prevs_hat)[source] +quapy.error.mse(prevs, prevs_hat)[source]
Computes the mean squared error (see
quapy.error.se()
) across the sample pairs.- Parameters: @@ -1027,7 +1058,7 @@ predicted prevalence values
- -quapy.error.nae(prevs, prevs_hat)[source] +quapy.error.nae(prevs, prevs_hat)[source]
- Computes the normalized absolute error between the two prevalence vectors.
Normalized absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(NAE(p,\hat{p})=\frac{AE(p,\hat{p})}{z_{AE}}\), @@ -1050,7 +1081,7 @@ are the classes of interest.
- -quapy.error.nkld(prevs, prevs_hat, eps=None)[source] +quapy.error.nkld(prevs, prevs_hat, eps=None)[source]
- Computes the Normalized Kullback-Leibler divergence between the two prevalence distributions.
Normalized Kullback-Leibler divergence between two prevalence distributions \(p\) and \(\hat{p}\) is computed as @@ -1079,7 +1110,7 @@ size. If eps=None, the sample size will be taken from the environme
- -quapy.error.normalized_absolute_error(prevs, prevs_hat) +quapy.error.normalized_absolute_error(prevs, prevs_hat)
- Computes the normalized absolute error between the two prevalence vectors.
Normalized absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(NAE(p,\hat{p})=\frac{AE(p,\hat{p})}{z_{AE}}\), @@ -1102,7 +1133,7 @@ are the classes of interest.
- -quapy.error.normalized_relative_absolute_error(prevs, prevs_hat, eps=None) +quapy.error.normalized_relative_absolute_error(prevs, prevs_hat, eps=None)
- Computes the normalized absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as @@ -1132,7 +1163,7 @@ sample size. If eps=None, the sample size will be taken from the en
- -quapy.error.nrae(prevs, prevs_hat, eps=None)[source] +quapy.error.nrae(prevs, prevs_hat, eps=None)[source]
- Computes the normalized absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as @@ -1162,7 +1193,7 @@ sample size. If eps=None, the sample size will be taken from the en
- -quapy.error.rae(prevs, prevs_hat, eps=None)[source] +quapy.error.rae(prevs, prevs_hat, eps=None)[source]
- Computes the absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as @@ -1191,7 +1222,7 @@ sample size. If eps=None, the sample size will be taken from the en
- -quapy.error.relative_absolute_error(prevs, prevs_hat, eps=None) +quapy.error.relative_absolute_error(prevs, prevs_hat, eps=None)
- Computes the absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as @@ -1220,7 +1251,7 @@ sample size. If eps=None, the sample size will be taken from the en
- -quapy.error.se(prevs, prevs_hat)[source] +quapy.error.se(prevs, prevs_hat)[source]
- Computes the squared error between the two prevalence vectors.
Squared error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(SE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}(\hat{p}(y)-p(y))^2\), @@ -1243,7 +1274,7 @@ where
- quapy.evaluation module
+quapy.evaluation module
- -quapy.evaluation.evaluate(model: BaseQuantifier, protocol: AbstractProtocol, error_metric: str | Callable, aggr_speedup: str | bool = 'auto', verbose=False)[source] +quapy.evaluation.evaluate(model: BaseQuantifier, protocol: AbstractProtocol, error_metric: Union[str, Callable], aggr_speedup: Union[str, bool] = 'auto', verbose=False)[source]
Evaluates a quantification model according to a specific sample generation protocol and in terms of one evaluation metric (error).
-
@@ -1294,7 +1325,7 @@ a single float
- -quapy.evaluation.evaluate_on_samples(model: BaseQuantifier, samples: Iterable[LabelledCollection], error_metric: str | Callable, verbose=False)[source] +quapy.evaluation.evaluate_on_samples(model: BaseQuantifier, samples: Iterable[LabelledCollection], error_metric: Union[str, Callable], verbose=False)[source]
Evaluates a quantification model on a given set of samples and in terms of one evaluation metric (error).
- Parameters: @@ -1316,7 +1347,7 @@ a single float
- -quapy.evaluation.evaluation_report(model: BaseQuantifier, protocol: AbstractProtocol, error_metrics: Iterable[str | Callable] = 'mae', aggr_speedup: str | bool = 'auto', verbose=False)[source] +quapy.evaluation.evaluation_report(model: BaseQuantifier, protocol: AbstractProtocol, error_metrics: Iterable[Union[str, Callable]] = 'mae', aggr_speedup: Union[str, bool] = 'auto', verbose=False)[source]
Generates a report (a pandas DataFrame) containing information on the evaluation of the model according to a specific protocol and in terms of one or more evaluation metrics (errors).
-
@@ -1346,7 +1377,7 @@ have been indicated, each displaying the score in terms of that metric for every
- -quapy.evaluation.prediction(model: BaseQuantifier, protocol: AbstractProtocol, aggr_speedup: str | bool = 'auto', verbose=False)[source] +quapy.evaluation.prediction(model: BaseQuantifier, protocol: AbstractProtocol, aggr_speedup: Union[str, bool] = 'auto', verbose=False)[source]
Uses a quantification model to generate predictions for the samples generated via a specific protocol. This function is central to all evaluation processes, and is endowed with an optimization to speed-up the prediction of protocols that generate samples from a large collection. The optimization applies to aggregative @@ -1379,10 +1410,10 @@ convenient or not. Set to False to deactivate.
- quapy.functional module
+quapy.functional module
- -quapy.functional.HellingerDistance(P, Q) float [source] +quapy.functional.HellingerDistance(P: ndarray, Q: ndarray) float [source]
Computes the Hellinger Distance (HD) between (discretized) distributions P and Q. The HD for two discrete distributions of k bins is defined as:
@@ -1402,7 +1433,7 @@ The HD for two discrete distributions of k bins is defined as:- -quapy.functional.TopsoeDistance(P, Q, epsilon=1e-20)[source] +quapy.functional.TopsoeDistance(P: ndarray, Q: ndarray, epsilon: float = 1e-20)[source]
Topsoe distance between two (discretized) distributions P and Q. The Topsoe distance for two discrete distributions of k bins is defined as:
@@ -1422,42 +1453,36 @@ The Topsoe distance for two discrete distributions of k bins is def-
-
- -quapy.functional.adjusted_quantification(prevalence_estim, tpr, fpr, clip=True)[source] -
Implements the adjustment of ACC and PACC for the binary case. The adjustment for a prevalence estimate of the -positive class p comes down to computing:
--\[ACC(p) = \frac{ p - fpr }{ tpr - fpr }\]+- +quapy.functional.argmin_prevalence(loss: Callable, n_classes: int, method: Literal['optim_minimize', 'linear_search', 'ternary_search'] = 'optim_minimize')[source] +
Searches for the prevalence vector that minimizes a loss function.
- Parameters:
-
-
prevalence_estim – float, the estimated value for the positive class
-tpr – float, the true positive rate of the classifier
-fpr – float, the false positive rate of the classifier
-clip – set to True (default) to clip values that might exceed the range [0,1]
+loss – callable, the function to minimize
+n_classes – int, number of classes
+method – string indicating the search strategy. Possible values are: +‘optim_minimize’: uses scipy.optimize +‘linear_search’: carries out a linear search for binary problems in the space [0, 0.01, 0.02, …, 1] +‘ternary_search’: implements the ternary search (not yet implemented)
- Returns: -
float, the adjusted count
+np.ndarray, a prevalence vector
- -quapy.functional.as_binary_prevalence(positive_prevalence: float | ndarray, clip_if_necessary=False)[source] +quapy.functional.as_binary_prevalence(positive_prevalence: Union[float, _SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], clip_if_necessary: bool = False) ndarray [source]
Helper that, given a float representing the prevalence for the positive class, returns a np.ndarray of two values representing a binary distribution.
- Parameters:
-
-
positive_prevalence – prevalence for the positive class
-clip_if_necessary – if True, clips the value in [0,1] in order to guarantee the resulting distribution +
positive_prevalence – float or array-like of floats with the prevalence for the positive class
+clip_if_necessary (bool) – if True, clips the value in [0,1] in order to guarantee the resulting distribution is valid. If False, it then checks that the value is in the valid range, and raises an error if not.
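As a rough illustration of the behaviour described for as_binary_prevalence, the following sketch turns a positive-class prevalence p into the pair (1-p, p). The function name as_binary_prevalence_sketch and the choice of ValueError are assumptions made for this example; this is not the library's code.

```python
import numpy as np


def as_binary_prevalence_sketch(positive_prevalence, clip_if_necessary=False):
    """Hypothetical helper: maps p (float or array of floats) to [1 - p, p]."""
    p = np.asarray(positive_prevalence, dtype=float)
    if clip_if_necessary:
        # Force the value(s) into [0, 1] so the result is a valid distribution.
        p = np.clip(p, 0.0, 1.0)
    elif np.any(p < 0) or np.any(p > 1):
        # Assumption: the error type is illustrative only.
        raise ValueError("positive_prevalence must lie in [0, 1]")
    # Stack (negative, positive) along the last axis.
    return np.stack([1.0 - p, p], axis=-1)
```

With a scalar input the result has shape (2,); with an array of n values it has shape (n, 2).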
@@ -1469,35 +1494,95 @@ is valid. If False, it then checks that the value is in the valid range, and rai
- -quapy.functional.check_prevalence_vector(p, raise_exception=False, toleranze=1e-08)[source] -
Checks that p is a valid prevalence vector, i.e., that it contains values in [0,1] and that the values sum up to 1.
+quapy.functional.check_prevalence_vector(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], raise_exception: bool = False, tolerance: float = 1e-08, aggr=True)[source] +Checks that prevalences is a valid prevalence vector, i.e., it contains values in [0,1] and +the values sum up to 1. In other words, verifies that the prevalence vectors lie in the +probability simplex.
- Parameters: -
p – the prevalence vector to check
+-
+
prevalences (ArrayLike) – the prevalence vector, or vectors, to check
+raise_exception (bool) – whether to raise an exception if the vector (or any of the vectors) does +not lie in the simplex (default False)
+tolerance (float) – error tolerance for the check sum(prevalences) - 1 = 0
+aggr (bool) – if True (default) returns one single bool (True if all prevalence vectors are valid, +False otherwise), if False returns an array of bool, one for each prevalence vector
+
- Returns: -
True if p is valid, False otherwise
+ +a single bool True if prevalences is a vector of prevalence values that lies on the simplex, +or False otherwise; alternatively, if prevalences is a matrix of shape (num_vectors, n_classes,) +then it returns one such bool for each prevalence vector
+
- +quapy.functional.clip(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
Clips the values in [0,1] and then applies the L1 normalization.
+-
+
- Parameters: +
prevalences – array-like of shape (n_classes,) or of shape (n_samples, n_classes,) with prevalence values
+
+- Returns: +
np.ndarray representing a valid distribution
+
+
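The clip normalization described above (clip to [0,1], then L1-normalize) can be sketched as follows. The name clip_sketch is hypothetical, and mapping an all-zero row to the uniform distribution is an assumption of this example rather than documented behaviour of clip.

```python
import numpy as np


def clip_sketch(prevalences):
    """Hypothetical sketch: clip values to [0, 1], then rescale rows to sum to 1."""
    arr = np.asarray(prevalences, dtype=float)
    clipped = np.clip(arr, 0.0, 1.0)
    total = clipped.sum(axis=-1, keepdims=True)
    n_classes = clipped.shape[-1]
    # Assumption: rows that are all zero after clipping become uniform.
    uniform = np.full_like(clipped, 1.0 / n_classes)
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    return np.where(total > 0, clipped / safe_total, uniform)
```

The function accepts either a single vector of shape (n_classes,) or a matrix of shape (n_samples, n_classes,).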
- +quapy.functional.condsoftmax(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
- +quapy.functional.counts_from_labels(labels: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], classes: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
Computes the raw count values from a vector of labels.
+-
+
- Parameters: +
-
+
labels – array-like of shape (n_instances,) with the label for each instance
+classes – the class labels. This is needed in order to correctly compute the prevalence vector even when +some classes have no examples.
+
+- Returns: +
ndarray of shape (len(classes),) with the raw counts for each class, in the same order +as they appear in classes
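A minimal sketch of the counting step, assuming labels and classes are comparable with ==; the name counts_from_labels_sketch is hypothetical. Passing classes explicitly is what lets a class with no examples receive a count of 0.

```python
import numpy as np


def counts_from_labels_sketch(labels, classes):
    """Hypothetical sketch: raw count per class, in the order given by `classes`."""
    labels = np.asarray(labels)
    return np.asarray([int(np.sum(labels == c)) for c in classes])


counts = counts_from_labels_sketch(["a", "b", "a", "c"], classes=["a", "b", "c", "d"])
# Dividing the counts by their sum gives the class proportions.
proportions = counts / counts.sum()
```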
Guarantees that the divergence received as argument is a function. That is, if this argument is already +a callable, then it is returned, if it is instead a string, then tries to instantiate the corresponding +divergence from the string name.
+-
+
- Parameters: +
divergence – callable or string indicating the name of the divergence function
+
+- Returns: +
callable
+
+
-
+
-
+
-
+
- -quapy.functional.get_nprevpoints_approximation(combinations_budget: int, n_classes: int, n_repeats: int = 1)[source] +quapy.functional.get_nprevpoints_approximation(combinations_budget: int, n_classes: int, n_repeats: int = 1) int [source]
Searches for the largest number of (equidistant) prevalence points to define for each of the n_classes classes so that the number of valid prevalence values generated as combinations of prevalence points (points in an n_classes-dimensional simplex) does not exceed combinations_budget.
- Parameters:
-
-
combinations_budget – integer, maximum number of combinations allowed
-n_classes – integer, number of classes
-n_repeats – integer, number of repetitions for each prevalence combination
+combinations_budget (int) – maximum number of combinations allowed
+n_classes (int) – number of classes
+n_repeats (int) – number of repetitions for each prevalence combination
- Returns: @@ -1506,9 +1591,27 @@ that the number of valid prevalence values generated as combinations of prevalen
-
+
- +quapy.functional.l1_norm(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
Applies L1 normalization to prevalences so that it becomes a valid prevalence +vector. Zero vectors are mapped onto the uniform distribution. Raises an exception if +the resulting vectors are not valid distributions. This may happen when the original +prevalence vectors contain negative values. Use the clip normalization function +instead to avoid this possibility.
+-
+
- Parameters: +
prevalences – array-like of shape (n_classes,) or of shape (n_samples, n_classes,) with prevalence values
+
+- Returns: +
np.ndarray representing a valid distribution
+
+
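The L1 normalization just described could look roughly like the sketch below. The name l1_norm_sketch, the use of ValueError, and the tolerance used in the validity check are assumptions for illustration.

```python
import numpy as np


def l1_norm_sketch(prevalences):
    """Hypothetical sketch: divide each vector by its sum; zero vectors become uniform."""
    arr = np.asarray(prevalences, dtype=float)
    n_classes = arr.shape[-1]
    total = arr.sum(axis=-1, keepdims=True)
    is_zero = np.all(arr == 0, axis=-1, keepdims=True)
    safe_total = np.where(total == 0, 1.0, total)  # avoid division by zero
    out = np.where(is_zero, 1.0 / n_classes, arr / safe_total)
    # Negative inputs can produce values outside [0, 1]; reject those results.
    valid = np.all(out >= 0) and np.all(out <= 1) and np.allclose(out.sum(axis=-1), 1.0)
    if not valid:
        raise ValueError("result is not a valid distribution")
    return out
```

For example, [2, 2, 4] becomes [0.25, 0.25, 0.5], whereas [-1, 3] would yield [-0.5, 1.5] and is therefore rejected.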
- -quapy.functional.linear_search(loss, n_classes)[source] +quapy.functional.linear_search(loss: Callable, n_classes: int)[source]
Performs a linear search for the best prevalence value in binary problems. The search is carried out by exploring the range [0,1] stepping by 0.01. This search is inefficient, and is added only for completeness (some of the early methods in quantification literature used it, e.g., HDy). A more powerful alternative is optim_minimize.
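The grid exploration described above can be sketched in a few lines. This is only an illustration: the name linear_search_sketch is hypothetical, and the sketch assumes that the loss receives a binary prevalence vector [1-p, p].

```python
def linear_search_sketch(loss, step=0.01):
    """Hypothetical sketch: try p = 0, 0.01, ..., 1 and keep the p with the lowest loss."""
    n_steps = int(round(1.0 / step))
    best_p, best_score = None, float("inf")
    for i in range(n_steps + 1):
        p = i / n_steps
        score = loss([1.0 - p, p])  # assumption: loss takes [negative, positive]
        if score < best_score:
            best_p, best_score = p, score
    return [1.0 - best_p, best_p]


# Toy loss whose minimum is at a positive prevalence of 0.3.
best = linear_search_sketch(lambda prev: (prev[1] - 0.3) ** 2)
```

The cost is one loss evaluation per grid point, which is why the search is only practical in the binary case.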
@@ -1527,13 +1630,25 @@ early methods in quantification literature used it, e.g., HDy). A most powerful- -quapy.functional.normalize_prevalence(prevalences)[source] -
Normalize a vector or matrix of prevalence values. The normalization consists of applying a L1 normalization in +quapy.functional.normalize_prevalence(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], method='l1')[source] +
Normalizes a vector or matrix of prevalence values. The normalization consists of applying a L1 normalization in cases in which the prevalence values are not all-zeros, and to convert the prevalence values into 1/n_classes in cases in which all values are zero.
- Parameters: -
prevalences – array-like of shape (n_classes,) or of shape (n_samples, n_classes,) with prevalence values
+-
+
prevalences – array-like of shape (n_classes,) or of shape (n_samples, n_classes,) with prevalence values
+method (str) –
indicates the normalization method to employ, options are:
+-
+
l1: applies L1 normalization (default); a 0 vector is mapped onto the uniform prevalence
+clip: clip values in [0,1] and then rescales so that the L1 norm is 1
+mapsimplex: projects vectors onto the probability simplex. This implementation relies on +Mathieu Blondel’s projection_simplex_sort
+softmax: applies softmax to all vectors
+condsoftmax: applies softmax only to invalid prevalence vectors
+
+
- Returns:
a normalized vector or matrix of prevalence values
@@ -1543,7 +1658,7 @@ cases in which all values are zero.- -quapy.functional.num_prevalence_combinations(n_prevpoints: int, n_classes: int, n_repeats: int = 1)[source] +quapy.functional.num_prevalence_combinations(n_prevpoints: int, n_classes: int, n_repeats: int = 1) int [source]
Computes the number of valid prevalence combinations in the n_classes-dimensional simplex if n_prevpoints equally distant prevalence values are generated and n_repeats repetitions are requested. The computation comes down to calculating:
@@ -1555,21 +1670,22 @@ classes, and r is n_repeats. This solution comes from- Parameters:
-
-
n_classes – integer, number of classes
-n_prevpoints – integer, number of prevalence points.
-n_repeats – integer, number of repetitions for each prevalence combination
+n_classes (int) – number of classes
+n_prevpoints (int) – number of prevalence points.
+n_repeats (int) – number of repetitions for each prevalence combination
- Returns: -
The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the -number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
+The number of possible combinations. For example, if `n_classes`=2, `n_prevpoints`=5, `n_repeats`=1, +then the number of possible combinations is 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], +and [1.0,0.0]
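The count in the example above can be reproduced with a stars-and-bars argument: n_prevpoints equidistant values split the unit mass into n_prevpoints-1 equal parts, and each valid combination distributes those parts among the classes. The sketch below rests on that reasoning; the name num_prevalence_combinations_sketch is hypothetical, and the formula is stated here as an assumption rather than quoted from the library.

```python
from math import comb


def num_prevalence_combinations_sketch(n_prevpoints, n_classes, n_repeats=1):
    """Hypothetical sketch: stars-and-bars count of grid points on the simplex."""
    n_units = n_prevpoints - 1  # number of equal steps between 0 and 1
    return comb(n_units + n_classes - 1, n_classes - 1) * n_repeats


# Matches the documented example: 2 classes, 5 prevalence points, 1 repeat.
n_binary = num_prevalence_combinations_sketch(n_prevpoints=5, n_classes=2)
```

The count grows quickly with the number of classes: with 21 points, 2 classes give 21 combinations while 3 classes give 231.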
- -quapy.functional.optim_minimize(loss, n_classes)[source] +quapy.functional.optim_minimize(loss: Callable, n_classes: int)[source]
Searches for the optimal prevalence values, i.e., an n_classes-dimensional vector of the (n_classes-1)-simplex that yields the smallest loss. This optimization is carried out by means of a constrained search using scipy’s SLSQP routine.
@@ -1588,25 +1704,26 @@ SLSQP routine.- -quapy.functional.prevalence_from_labels(labels, classes)[source] -
Computed the prevalence values from a vector of labels.
+quapy.functional.prevalence_from_labels(labels: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], classes: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]])[source] +Computes the prevalence values from a vector of labels.
- Parameters:
-
-
labels – array-like of shape (n_instances) with the label for each instance
+labels – array-like of shape (n_instances,) with the label for each instance
classes – the class labels. This is needed in order to correctly compute the prevalence vector even when some classes have no examples.
- Returns: -
an ndarray of shape (len(classes)) with the class prevalence values
+ndarray of shape (len(classes),) with the class proportions for each class, in the same order +as they appear in classes
- -quapy.functional.prevalence_from_probabilities(posteriors, binarize: bool = False)[source] +quapy.functional.prevalence_from_probabilities(posteriors: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], binarize: bool = False)[source]
Returns a vector of prevalence values from a matrix of posterior probabilities.
- Parameters: @@ -1624,7 +1741,7 @@ converting the vectors of posterior probabilities into class indices, by taking
- -quapy.functional.prevalence_linspace(n_prevalences=21, repeats=1, smooth_limits_epsilon=0.01)[source] +quapy.functional.prevalence_linspace(grid_points: int = 21, repeats: int = 1, smooth_limits_epsilon: float = 0.01) ndarray [source]
Produces an array of uniformly separated values of prevalence. By default, produces an array of 21 prevalence values, with step 0.05 and with the limits smoothed, i.e.: @@ -1632,7 +1749,7 @@ step 0.05 and with the limits smoothed, i.e.:
- Parameters:
-
-
n_prevalences – the number of prevalence values to sample from the [0,1] interval (default 21)
+grid_points – the number of prevalence values to sample from the [0,1] interval (default 21)
repeats – number of times each prevalence is to be repeated (defaults to 1)
smooth_limits_epsilon – the quantity to add and subtract to the limits 0 and 1
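A sketch of how such a grid could be built, following the parameter descriptions above; the name prevalence_linspace_sketch is hypothetical and details may differ from the library's implementation.

```python
import numpy as np


def prevalence_linspace_sketch(grid_points=21, repeats=1, smooth_limits_epsilon=0.01):
    """Hypothetical sketch: equally spaced prevalences in [0, 1] with smoothed limits."""
    p = np.linspace(0.0, 1.0, num=grid_points, endpoint=True)
    # Move the two extremes inwards so that no prevalence is exactly 0 or 1.
    p[0] += smooth_limits_epsilon
    p[-1] -= smooth_limits_epsilon
    if repeats > 1:
        p = np.repeat(p, repeats)  # each value appears `repeats` times in a row
    return p
```

With the defaults this yields 21 values starting at 0.01, continuing with 0.05, 0.10, and so on, and ending at 0.99.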
- +quapy.functional.projection_simplex_sort(unnormalized_arr: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
Projects a point onto the probability simplex.
+The code is adapted from Mathieu Blondel’s BSD-licensed +implementation +(see function projection_simplex_sort in their repo) which accompanies the paper
+Mathieu Blondel, Akinori Fujino, and Naonori Ueda. +Large-scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex, +ICPR 2014, URL
+-
+
- Parameters: +
unnormalized_arr – point in n-dimensional space, shape (n,)
+
+- Returns: +
projection of unnormalized_arr onto the (n-1)-dimensional probability simplex, shape (n,)
+
+
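For readers unfamiliar with the sort-based projection, the following sketch shows the usual formulation of the algorithm: sort the coordinates, find the threshold theta, then shift and truncate at zero. It is written from the standard description of the method; the name projection_simplex_sort_sketch is hypothetical and the code is not copied from the library.

```python
import numpy as np


def projection_simplex_sort_sketch(v):
    """Hypothetical sketch: Euclidean projection of a point v of shape (n,) onto the simplex."""
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    u = np.sort(v)[::-1]            # coordinates in decreasing order
    cssv = np.cumsum(u) - 1.0       # cumulative sums minus the target total (1)
    ind = np.arange(n) + 1
    cond = u - cssv / ind > 0       # coordinates that stay positive after the shift
    rho = ind[cond][-1]
    theta = cssv[cond][-1] / float(rho)
    return np.maximum(v - theta, 0.0)
```

A point already on the simplex is left unchanged, while [2, 0] is mapped to [1, 0] and [-0.2, 0.6, 0.6] to [0, 0.5, 0.5].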
- +quapy.functional.softmax(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) ndarray [source] +
Applies the softmax function to all vectors even if the original vectors were valid distributions. +If you want to leave valid vectors untouched, use condsoftmax instead.
+-
+
- Parameters: +
prevalences – array-like of shape (n_classes,) or of shape (n_samples, n_classes,) with prevalence values
+
+- Returns: +
np.ndarray representing a valid distribution
+
+
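The point made above, that softmax alters even vectors that are already valid distributions, is easy to check with a small sketch. The name softmax_sketch is hypothetical; subtracting the row maximum is a common numerical-stability trick and is an assumption of this example.

```python
import numpy as np


def softmax_sketch(prevalences):
    """Hypothetical sketch: row-wise softmax over the last axis."""
    arr = np.asarray(prevalences, dtype=float)
    shifted = arr - arr.max(axis=-1, keepdims=True)  # does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


valid_input = np.array([0.2, 0.8])
transformed = softmax_sketch(valid_input)
```

Here transformed still sums to 1 but no longer equals [0.2, 0.8], which is the situation condsoftmax is meant to avoid.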
- +quapy.functional.solve_adjustment(class_conditional_rates: ndarray, unadjusted_counts: ndarray, method: Literal['inversion', 'invariant-ratio'], solver: Literal['exact', 'minimize', 'exact-raise', 'exact-cc']) ndarray [source] +
Function that tries to solve for \(p\) the equation \(q = M p\), where \(q\) is the vector of +unadjusted counts (as estimated, e.g., via classify and count) with \(q_i\) an estimate of +\(P(\hat{Y}=y_i)\), and where \(M\) is the matrix of class-conditional rates with \(M_{ij}\) an +estimate of \(P(\hat{Y}=y_i|Y=y_j)\).
+-
+
- Parameters: +
-
+
class_conditional_rates – array of shape (n_classes, n_classes,) with entry (i,j) being the estimate +of \(P(\hat{Y}=y_i|Y=y_j)\), that is, the probability that an instance that belongs to class \(y_j\) +ends up being classified as belonging to class \(y_i\)
+unadjusted_counts – array of shape (n_classes,) containing the unadjusted prevalence values (e.g., as +estimated by CC or PCC)
+method (str) –
indicates the adjustment method to be used. Valid options are:
+-
+
inversion: tries to solve the equation \(q = M p\) as \(p = M^{-1} q\) where +\(M^{-1}\) is the inverse of \(M\). This inverse may not exist in +degenerate cases.
+invariant-ratio: invariant ratio estimator of Vaz et al. 2018, +which replaces the last equation in \(M\) with the normalization condition (i.e., that the sum of +all prevalence values must equal 1).
+
+solver (str) –
the method to use for solving the system of linear equations. Valid options are:
+-
+
exact-raise: tries to solve the system using matrix inversion. Raises an error if the matrix has rank +strictly lower than n_classes.
+exact-cc: if the matrix is not full rank, returns \(q\) (i.e., the unadjusted counts) as the estimates
+exact: deprecated, defaults to ‘exact-cc’ (will be removed in future versions)
+minimize: minimizes a loss, so the solution always exists
+
+
+
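The two adjustment methods can be illustrated on a binary problem, where the system \(q = M p\) is only 2x2 and can be solved by hand. The sketch below is an assumption-laden toy: the helper names are hypothetical, the solver is Cramer's rule rather than anything QuaPy uses, and the rates in `M` are made-up values.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular, no unique solution")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def adjust_inversion(M, q):
    """'inversion' idea: solve q = M p directly."""
    return solve_2x2(M, q)


def adjust_invariant_ratio(M, q):
    """'invariant-ratio' idea: replace the last equation with sum(p) = 1."""
    M_ir = [M[0], [1.0, 1.0]]
    q_ir = [q[0], 1.0]
    return solve_2x2(M_ir, q_ir)


# M[i][j] estimates P(predicted = y_i | true = y_j); illustrative values only.
M = [[0.9, 0.2],
     [0.1, 0.8]]
true_p = [0.7, 0.3]
# Unadjusted (classify-and-count style) prevalence implied by M and true_p.
q = [M[0][0] * true_p[0] + M[0][1] * true_p[1],
     M[1][0] * true_p[0] + M[1][1] * true_p[1]]

p_inversion = adjust_inversion(M, q)
p_invariant = adjust_invariant_ratio(M, q)
```

In this noiseless toy case both methods recover the true prevalence; with estimated rates they can differ, which is why the choice is exposed as a parameter.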
- +quapy.functional.solve_adjustment_binary(prevalence_estim: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], tpr: float, fpr: float, clip: bool = True)[source] +
Implements the adjustment of ACC and PACC for the binary case. The adjustment for a prevalence estimate of the +positive class p comes down to computing:
++\[ACC(p) = \frac{ p - fpr }{ tpr - fpr }\]+-
+
- Parameters: +
-
+
prevalence_estim (float) – the estimated value for the positive class (p in the formula)
+tpr (float) – the true positive rate of the classifier
+fpr (float) – the false positive rate of the classifier
+clip (bool) – set to True (default) to clip values that might exceed the range [0,1]
+
+- Returns: +
float, the adjusted count
+
+
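The binary formula above is simple enough to write out directly. The following sketch uses a hypothetical name, `adjust_binary`, and returns the unadjusted estimate when `tpr == fpr`; that fallback is an assumption of this example, not a statement about how QuaPy handles the degenerate case.

```python
def adjust_binary(p, tpr, fpr, clip=True):
    """Binary ACC/PACC-style adjustment: (p - fpr) / (tpr - fpr)."""
    denominator = tpr - fpr
    if denominator == 0:
        # Assumption of this sketch: no adjustment is possible, keep p.
        return p
    adjusted = (p - fpr) / denominator
    if clip:
        # Keep the result inside the valid prevalence range [0, 1].
        adjusted = min(1.0, max(0.0, adjusted))
    return adjusted


corrected = adjust_binary(0.31, tpr=0.8, fpr=0.1)              # close to 0.3
clipped = adjust_binary(0.05, tpr=0.8, fpr=0.1)                # would be negative
unclipped = adjust_binary(0.05, tpr=0.8, fpr=0.1, clip=False)  # left negative
```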
- -quapy.functional.strprev(prevalences, prec=3)[source] +quapy.functional.strprev(prevalences: Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], prec: int = 3) str [source]
Returns a string representation for a prevalence vector. E.g.,
>>> strprev([1/3, 2/3], prec=2) >>> '[0.33, 0.67]' @@ -1654,8 +1867,8 @@ step 0.05 and with the limits smoothed, i.e.:
- Parameters:
-
-
prevalences – a vector of prevalence values
-prec – float precision
+prevalences – array-like of prevalence values
+prec – int, indicates the float precision (number of decimal values to print)
- Returns: @@ -1664,9 +1877,14 @@ step 0.05 and with the limits smoothed, i.e.:
- -quapy.functional.uniform_prevalence_sampling(n_classes, size=1)[source] +quapy.functional.uniform_prevalence_sampling(n_classes: int, size: int = 1) ndarray [source]
Implements the Kraemer algorithm for sampling uniformly at random from the unit simplex. This implementation is adapted from this post: https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex
@@ -1685,7 +1903,7 @@ for sampling uniformly at random from the unit simplex. This implementation is a- -quapy.functional.uniform_simplex_sampling(n_classes, size=1) +quapy.functional.uniform_simplex_sampling(n_classes: int, size: int = 1) ndarray
Implements the Kraemer algorithm for sampling uniformly at random from the unit simplex. This implementation is adapted from this post: https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex
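The idea behind this kind of sampling can be sketched with the standard library: draw `n_classes - 1` uniform cut points in [0, 1], sort them, and take the gaps between consecutive cuts as the prevalence values. The function name `sample_simplex` and the explicit `rng` argument are assumptions made for this illustration; they are not part of QuaPy's API.

```python
import random


def sample_simplex(n_classes, rng):
    """Draw one prevalence vector uniformly at random from the unit simplex."""
    if n_classes == 1:
        return [1.0]
    # n_classes - 1 sorted cut points split [0, 1] into n_classes segments.
    cuts = sorted(rng.random() for _ in range(n_classes - 1))
    bounds = [0.0] + cuts + [1.0]
    # The segment lengths are non-negative and sum to 1.
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


rng = random.Random(0)  # fixed seed so the draw is repeatable
prevalence = sample_simplex(4, rng)
```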
@@ -1704,26 +1922,26 @@ for sampling uniformly at random from the unit simplex. This implementation is a- quapy.model_selection module
+quapy.model_selection module
- -class quapy.model_selection.ConfigStatus(params, status, msg='')[source] +class quapy.model_selection.ConfigStatus(params, status, msg='')[source]
Bases:
object
- -class quapy.model_selection.GridSearchQ(model: ~quapy.method.base.BaseQuantifier, param_grid: dict, protocol: ~quapy.protocol.AbstractProtocol, error: ~typing.Callable | str = <function mae>, refit=True, timeout=-1, n_jobs=None, raise_errors=False, verbose=False)[source] +class quapy.model_selection.GridSearchQ(model: ~quapy.method.base.BaseQuantifier, param_grid: dict, protocol: ~quapy.protocol.AbstractProtocol, error: ~typing.Union[~typing.Callable, str] = <function mae>, refit=True, timeout=-1, n_jobs=None, raise_errors=False, verbose=False)[source]
Bases:
BaseQuantifier
Grid Search optimization targeting a quantification-oriented metric.
Optimizes the hyperparameters of a quantification method, based on an evaluation method and on an evaluation @@ -1750,7 +1968,7 @@ However, if no configuration yields a valid model, then a ValueError exception w
- -best_model()[source] +best_model()[source]
Returns the best model found after calling the
fit()
method, i.e., the one trained on the combination of hyper-parameters that minimized the error function.-
@@ -1762,7 +1980,7 @@ of hyper-parameters that minimized the error function.
- -fit(training: LabelledCollection)[source] +fit(training: LabelledCollection)[source]
- Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing
the error metric.
@@ -1779,7 +1997,7 @@ of hyper-parameters that minimized the error function.
- -get_params(deep=True)[source] +get_params(deep=True)[source]
Returns the dictionary of hyper-parameters to explore (param_grid)
- Parameters: @@ -1793,7 +2011,7 @@ of hyper-parameters that minimized the error function.
- -quantify(instances)[source] +quantify(instances)[source]
Estimate class prevalence values using the best model found after calling the
fit()
method.- Parameters: @@ -1808,7 +2026,7 @@ by the model selection process.
- -set_params(**parameters)[source] +set_params(**parameters)[source]
Sets the hyper-parameters to explore.
- Parameters: @@ -1821,34 +2039,34 @@ by the model selection process.
- -class quapy.model_selection.Status(value)[source] +class quapy.model_selection.Status(value)[source]
Bases:
Enum
An enumeration.
- -quapy.model_selection.cross_val_predict(quantifier: BaseQuantifier, data: LabelledCollection, nfolds=3, random_state=0)[source] +quapy.model_selection.cross_val_predict(quantifier: BaseQuantifier, data: LabelledCollection, nfolds=3, random_state=0)[source]
Akin to scikit-learn’s cross_val_predict but for quantification.
-
@@ -1868,7 +2086,7 @@ but for quantification.
- -quapy.model_selection.expand_grid(param_grid: dict)[source] +quapy.model_selection.expand_grid(param_grid: dict)[source]
Expands a param_grid dictionary as a list of configurations. Example:
>>> combinations = expand_grid({'A': [1, 10, 100], 'B': [True, False]}) @@ -1889,7 +2107,7 @@ to explore for that hyper-parameter
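The expansion itself is a Cartesian product over the value lists. A minimal sketch with `itertools.product` follows; `expand_grid_sketch` is a hypothetical stand-in and may order the configurations differently from QuaPy's `expand_grid`.

```python
import itertools


def expand_grid_sketch(param_grid):
    """Turn {'name': [values, ...]} into a list of {'name': value} configurations."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    # One dictionary per element of the Cartesian product of all value lists.
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


combinations = expand_grid_sketch({'A': [1, 10, 100], 'B': [True, False]})
```

With three values for `A` and two for `B`, the result contains six configurations.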
- -quapy.model_selection.group_params(param_grid: dict)[source] +quapy.model_selection.group_params(param_grid: dict)[source]
Partitions a param_grid dictionary into two lists of configurations, one for the classifier-specific hyper-parameters, and another for the quantifier-specific hyper-parameters.
-
@@ -1905,10 +2123,10 @@ to explore for that hyper-parameter
- -quapy.plot.binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=<matplotlib.colors.ListedColormap object>, vertical_xticks=False, legend=True, savepath=None)[source] +quapy.plot.binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=<matplotlib.colors.ListedColormap object>, vertical_xticks=False, legend=True, savepath=None)[source]
Box-plots displaying the local bias (i.e., signed error computed as the estimated value minus the true value) for different bins of (true) prevalence of the positive class, for each quantification method.
-
@@ -1933,7 +2151,7 @@ for each experiment
- -quapy.plot.binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None)[source] +quapy.plot.binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None)[source]
Box-plots displaying the global bias (i.e., signed error computed as the estimated value minus the true value) for each quantification method with respect to a given positive class.
-
@@ -1954,7 +2172,7 @@ for each experiment
- -quapy.plot.binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True, train_prev=None, savepath=None, method_order=None)[source] +quapy.plot.binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True, train_prev=None, savepath=None, method_order=None)[source]
The diagonal plot displays the predicted prevalence values (along the y-axis) as a function of the true prevalence values (along the x-axis). The optimal quantifier is described by the diagonal (0,0)-(1,1) of the plot (hence the name). It is convenient for binary quantification problems, though it can be used for multiclass problems by @@ -1985,7 +2203,7 @@ listed in the legend and associated with matplotlib colors).
- -quapy.plot.brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, binning='isomerous', x_error='ae', y_error='ae', ttest_alpha=0.005, tail_density_threshold=0.005, method_order=None, savepath=None)[source] +quapy.plot.brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, binning='isomerous', x_error='ae', y_error='ae', ttest_alpha=0.005, tail_density_threshold=0.005, method_order=None, savepath=None)[source]
Displays (only) the top performing methods for different regions of the train-test shift in the form of a broken bar chart, in which each method has bars only for those regions in which either one of the following conditions holds: (i) it is the best method (on average) for the bin, or (ii) it is not statistically significantly different @@ -2027,7 +2245,7 @@ listed in the legend and associated with matplotlib colors).
- -quapy.plot.error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=False, show_density=True, show_legend=True, logscale=False, title='Quantification error as a function of distribution shift', vlines=None, method_order=None, savepath=None)[source] +quapy.plot.error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=False, show_density=True, show_legend=True, logscale=False, title='Quantification error as a function of distribution shift', vlines=None, method_order=None, savepath=None)[source]
Plots the error (along the y-axis, as measured in terms of error_name) as a function of the train-test shift (along the x-axis, as measured in terms of
quapy.error.ae()
). This plot is especially useful for multiclass problems, in which “diagonal plots” may be cumbersome, and in order to gain understanding about how methods @@ -2061,10 +2279,10 @@ listed in the legend and associated with matplotlib colors).
- -class quapy.protocol.APP(data: LabelledCollection, sample_size=None, n_prevalences=21, repeats=10, smooth_limits_epsilon=0, random_state=0, sanity_check=10000, return_type='sample_prev')[source] +class quapy.protocol.APP(data: LabelledCollection, sample_size=None, n_prevalences=21, repeats=10, smooth_limits_epsilon=0, random_state=0, sanity_check=10000, return_type='sample_prev')[source]
Bases:
AbstractStochasticSeededProtocol
,OnLabelledCollectionProtocol
Implementation of the artificial prevalence protocol (APP). The APP consists of exploring a grid of prevalence values containing n_prevalences points (e.g., @@ -2093,7 +2311,7 @@ to “labelled_collection” to get instead instances of LabelledCollection
<- -prevalence_grid()[source] +prevalence_grid()[source]
Generates vectors of prevalence values from an exhaustive grid of prevalence values. The number of prevalence values explored for each dimension depends on n_prevalences, so that, if, for example, n_prevalences=11 then the prevalence values of the grid are taken from [0, 0.1, 0.2, …, 0.9, 1]. Only @@ -2113,7 +2331,7 @@ in the grid multiplied by repeat
- -sample(index)[source] +sample(index)[source]
Realizes the sample given the index of the instances.
- Parameters: @@ -2127,7 +2345,7 @@ in the grid multiplied by repeat
- -samples_parameters()[source] +samples_parameters()[source]
Returns all the parameters necessary to replicate the samples according to the APP protocol.
- Returns: @@ -2138,7 +2356,7 @@ in the grid multiplied by repeat
- -total()[source] +total()[source]
Returns the number of samples that will be generated
- Returns: @@ -2151,12 +2369,12 @@ in the grid multiplied by repeat
- -class quapy.protocol.AbstractProtocol[source] +class quapy.protocol.AbstractProtocol[source]
Bases:
object
Abstract parent class for sample generation protocols.
- -total()[source] +total()[source]
Indicates the total number of samples that the protocol generates.
- Returns: @@ -2169,7 +2387,7 @@ in the grid multiplied by repeat
- -class quapy.protocol.AbstractStochasticSeededProtocol(random_state=0)[source] +class quapy.protocol.AbstractStochasticSeededProtocol(random_state=0)[source]
Bases:
AbstractProtocol
An AbstractStochasticSeededProtocol is a protocol that generates, via any random procedure (e.g., via random sampling), sequences of
quapy.data.base.LabelledCollection
samples. @@ -2187,7 +2405,7 @@ the sequence will be consistent every time the protocol is called.- -collator(sample, *args)[source] +collator(sample, *args)[source]
The collator prepares the sample to accommodate the desired output format before returning the output. This collator simply returns the sample as it is. Classes inheriting from this abstract class can implement their custom collators.
@@ -2206,12 +2424,12 @@ implement their custom collators.- -abstract sample(params)[source] +abstract sample(params)[source]
Extract one sample determined by the given parameters
- Parameters: @@ -2225,7 +2443,7 @@ implement their custom collators.
- -abstract samples_parameters()[source] +abstract samples_parameters()[source]
This function has to return all the necessary parameters to replicate the samples
- Returns: @@ -2238,13 +2456,13 @@ implement their custom collators.
- -quapy.protocol.ArtificialPrevalenceProtocol +quapy.protocol.ArtificialPrevalenceProtocol
alias of
APP
- -class quapy.protocol.DomainMixer(domainA: LabelledCollection, domainB: LabelledCollection, sample_size, repeats=1, prevalence=None, mixture_points=11, random_state=0, return_type='sample_prev')[source] +class quapy.protocol.DomainMixer(domainA: LabelledCollection, domainB: LabelledCollection, sample_size, repeats=1, prevalence=None, mixture_points=11, random_state=0, return_type='sample_prev')[source]
Bases:
AbstractStochasticSeededProtocol
Generates mixtures of two domains (A and B) at controlled rates, but preserving the original class prevalence.
-
@@ -2268,7 +2486,7 @@ will be the same every time the protocol is called)
- -sample(indexes)[source] +sample(indexes)[source]
Realizes the sample given a pair of indexes of the instances from A and B.
- Parameters: @@ -2282,7 +2500,7 @@ will be the same every time the protocol is called)
- -samples_parameters()[source] +samples_parameters()[source]
Returns all the parameters necessary to replicate the samples according to this protocol.
- Returns: @@ -2293,7 +2511,7 @@ will be the same every time the protocol is called)
- -total()[source] +total()[source]
Returns the number of samples that will be generated (equal to “repeats * mixture_points”)
- Returns: @@ -2306,7 +2524,7 @@ will be the same every time the protocol is called)
- -class quapy.protocol.IterateProtocol(samples: [<class 'quapy.data.base.LabelledCollection'>])[source] +class quapy.protocol.IterateProtocol(samples: [<class 'quapy.data.base.LabelledCollection'>])[source]
Bases:
AbstractProtocol
A simple protocol that iterates over a list of previously generated samples
-
@@ -2316,7 +2534,7 @@ will be the same every time the protocol is called)
- -total()[source] +total()[source]
Returns the number of samples in this protocol
- Returns: @@ -2329,7 +2547,7 @@ will be the same every time the protocol is called)
- -class quapy.protocol.NPP(data: LabelledCollection, sample_size=None, repeats=100, random_state=0, return_type='sample_prev')[source] +class quapy.protocol.NPP(data: LabelledCollection, sample_size=None, repeats=100, random_state=0, return_type='sample_prev')[source]
Bases:
AbstractStochasticSeededProtocol
,OnLabelledCollectionProtocol
A generator of samples that implements the natural prevalence protocol (NPP). The NPP consists of drawing samples uniformly at random, therefore approximately preserving the natural prevalence of the collection.
@@ -2349,7 +2567,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<- -sample(index)[source] +sample(index)[source]
Realizes the sample given the index of the instances.
- Parameters: @@ -2363,7 +2581,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -samples_parameters()[source] +samples_parameters()[source]
Returns all the parameters necessary to replicate the samples according to the NPP protocol.
- Returns: @@ -2374,7 +2592,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -total()[source] +total()[source]
Returns the number of samples that will be generated (equal to “repeats”)
- Returns: @@ -2387,23 +2605,23 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -class quapy.protocol.OnLabelledCollectionProtocol[source] +class quapy.protocol.OnLabelledCollectionProtocol[source]
Bases:
object
Protocols that generate samples from a
qp.data.LabelledCollection
object.- -RETURN_TYPES = ['sample_prev', 'labelled_collection', 'index'] +RETURN_TYPES = ['sample_prev', 'labelled_collection', 'index']
- -classmethod get_collator(return_type='sample_prev')[source] +classmethod get_collator(return_type='sample_prev')[source]
Returns a collator function, i.e., a function that prepares the yielded data
- Parameters: @@ -2420,7 +2638,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -get_labelled_collection()[source] +get_labelled_collection()[source]
Returns the labelled collection on which this protocol acts.
- Returns: @@ -2431,7 +2649,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -on_preclassified_instances(pre_classifications, in_place=False)[source] +on_preclassified_instances(pre_classifications, in_place=False)[source]
Returns a copy of this protocol that acts on a modified version of the original
qp.data.LabelledCollection
in which the original instances have been replaced with the outputs of a classifier for each instance. (This is convenient for speeding-up @@ -2455,7 +2673,7 @@ with shape (n_instances,) when the classifier is a hard one, or wit- -class quapy.protocol.UPP(data: LabelledCollection, sample_size=None, repeats=100, random_state=0, return_type='sample_prev')[source] +class quapy.protocol.UPP(data: LabelledCollection, sample_size=None, repeats=100, random_state=0, return_type='sample_prev')[source]
Bases:
AbstractStochasticSeededProtocol
,OnLabelledCollectionProtocol
A variant of
APP
that, instead of using a grid of equidistant prevalence values, relies on the Kraemer algorithm for sampling the unit (k-1)-simplex uniformly at random, with @@ -2479,7 +2697,7 @@ to “labelled_collection” to get instead instances of LabelledCollection
- -sample(index)[source] +sample(index)[source]
Realizes the sample given the index of the instances.
- Parameters: @@ -2493,7 +2711,7 @@ to “labelled_collection” to get instead instances of LabelledCollection<
- -samples_parameters()[source] +samples_parameters()[source]
Returns all the parameters necessary to replicate the samples according to the UPP protocol.
- -class quapy.util.EarlyStop(patience, lower_is_better=True)[source] +class quapy.util.EarlyStop(patience, lower_is_better=True)[source]
Bases:
object
A class implementing the early-stopping condition typically used for training neural networks.
>>> earlystop = EarlyStop(patience=2, lower_is_better=True) @@ -2563,7 +2781,7 @@ stopping condition. An instance of this class is callable, and is t
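The early-stopping condition can be sketched as a small callable class. Everything below is an illustrative assumption: the class name `EarlyStopSketch` and its attribute names (`best_score`, `best_epoch`, `stop`) are chosen for this example and are not guaranteed to match `quapy.util.EarlyStop`.

```python
class EarlyStopSketch:
    """Stop once `patience` consecutive evaluations fail to improve the best score."""

    def __init__(self, patience, lower_is_better=True):
        self.patience = patience
        self.lower_is_better = lower_is_better
        self.best_score = None
        self.best_epoch = None
        self.remaining = patience
        self.stop = False

    def __call__(self, score, epoch):
        if self.best_score is None:
            improved = True
        elif self.lower_is_better:
            improved = score < self.best_score
        else:
            improved = score > self.best_score
        if improved:
            # New best value: remember it and reset the patience counter.
            self.best_score, self.best_epoch = score, epoch
            self.remaining = self.patience
        else:
            self.remaining -= 1
            if self.remaining <= 0:
                self.stop = True
        return improved


earlystop = EarlyStopSketch(patience=2, lower_is_better=True)
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.75, 0.6]):
    earlystop(loss, epoch)
    if earlystop.stop:
        break
```

In this run the loop ends at epoch 3, after two consecutive evaluations without improvement over the best loss seen at epoch 1.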
- -quapy.util.create_if_not_exist(path)[source] +quapy.util.create_if_not_exist(path)[source]
An alias to os.makedirs(path, exist_ok=True) that also returns the path. This is useful in cases like, e.g.:
@@ -2580,7 +2798,7 @@ stopping condition. An instance of this class is callable, and is t>>> path = create_if_not_exist(os.path.join(dir, subdir, anotherdir))
- -quapy.util.create_parent_dir(path)[source] +quapy.util.create_parent_dir(path)[source]
Creates the parent directory (if any) of a given path, if it does not already exist. E.g., for ./path/to/file.txt, the path ./path/to is created.
-
@@ -2592,7 +2810,7 @@ is created.
- -quapy.util.download_file(url, archive_filename)[source] +quapy.util.download_file(url, archive_filename)[source]
Downloads a file from a url
- Parameters: @@ -2606,7 +2824,7 @@ is created.
- -quapy.util.download_file_if_not_exists(url, archive_filename)[source] +quapy.util.download_file_if_not_exists(url, archive_filename)[source]
Downloads a file (using
download_file()
) if the file does not exist.- Parameters: @@ -2620,7 +2838,7 @@ is created.
- -quapy.util.get_quapy_home()[source] +quapy.util.get_quapy_home()[source]
Gets the home directory of QuaPy, i.e., the directory where QuaPy saves permanent data, such as downloaded datasets. This directory is ~/quapy_data
-
@@ -2632,7 +2850,7 @@ This directory is ~/quapy_data
- -quapy.util.map_parallel(func, args, n_jobs)[source] +quapy.util.map_parallel(func, args, n_jobs)[source]
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then func is applied in two parallel processes to args[0:50] and to args[50:99]. func is a function that already works with a list of arguments.
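The slicing described above can be sketched as follows. The helper names are hypothetical, the split rule is chosen only so that it reproduces the 99-item example in the text, and a thread pool is used here purely to keep the sketch self-contained (QuaPy's own implementation may use a different backend and a different split).

```python
from concurrent.futures import ThreadPoolExecutor


def slice_bounds(n_items, n_jobs):
    """Split range(n_items) into n_jobs contiguous (start, stop) slices."""
    base, extra = divmod(n_items, n_jobs)
    bounds, start = [], 0
    for job in range(n_jobs):
        # The first `extra` slices receive one additional item.
        stop = start + base + (1 if job < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def map_parallel_sketch(func, args, n_jobs):
    """Apply `func` (which takes a list) to n_jobs slices of `args` and merge the results."""
    chunks = [args[start:stop] for start, stop in slice_bounds(len(args), n_jobs)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(func, chunks))  # pool.map preserves chunk order
    return [item for chunk_result in results for item in chunk_result]


def square_all(values):
    return [v * v for v in values]


squares = map_parallel_sketch(square_all, list(range(99)), n_jobs=2)
```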
@@ -2649,7 +2867,7 @@ that already works with a list of arguments.- -quapy.util.parallel(func, args, n_jobs, seed=None, asarray=True, backend='loky')[source] +quapy.util.parallel(func, args, n_jobs, seed=None, asarray=True, backend='loky')[source]
A wrapper of multiprocessing:
>>> Parallel(n_jobs=n_jobs)( >>> delayed(func)(args_i) for args_i in args @@ -2666,6 +2884,31 @@ Seeds the child processes to ensure reproducibility when n_jobs>1.
seed – the numeric seed
asarray – set to True to return a np.ndarray instead of a list
+backend – indicates the backend used for handling parallel works
+ + + + + +open_args – if True, then the delayed function is called on *args_i, instead of on args_i
-
+
- +quapy.util.parallel_unpack(func, args, n_jobs, seed=None, asarray=True, backend='loky')[source] +
A wrapper of multiprocessing:
+++>>> Parallel(n_jobs=n_jobs)( +>>> delayed(func)(*args_i) for args_i in args +>>> ) +
that takes the quapy.environ variable as input silently. +Seeds the child processes to ensure reproducibility when n_jobs>1.
+-
+
- Parameters: +
-
+
func – callable
+args – args of func
+seed – the numeric seed
+asarray – set to True to return a np.ndarray instead of a list
+backend – indicates the backend used for handling parallel works
- -quapy.util.pickled_resource(pickle_path: str, generation_func: callable, *args)[source] +quapy.util.pickled_resource(pickle_path: str, generation_func: callable, *args)[source]
Allows for fast reuse of resources that are generated only once by calling generation_func(*args). On subsequent invocations, the function loads the pickled resource instead of regenerating it. Example:
>>> def some_array(n): # a mock resource created with one parameter (`n`) @@ -2698,7 +2941,7 @@ this function is invoked, it loads the pickled resource. Example:
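The caching pattern behind this function can be sketched with the standard library alone. The name `pickled_resource_sketch` and the demo resource `expensive_list` are hypothetical, and details such as the pickle protocol are assumptions of this example.

```python
import os
import pickle
import tempfile


def pickled_resource_sketch(pickle_path, generation_func, *args):
    """Load the resource from `pickle_path` if present; otherwise generate and store it."""
    if os.path.exists(pickle_path):
        with open(pickle_path, 'rb') as fin:
            return pickle.load(fin)
    resource = generation_func(*args)
    parent = os.path.dirname(pickle_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(pickle_path, 'wb') as fout:
        pickle.dump(resource, fout, pickle.HIGHEST_PROTOCOL)
    return resource


calls = []  # records how many times the generator actually runs


def expensive_list(n):
    calls.append(n)
    return list(range(n))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cache", "numbers.pkl")
    first = pickled_resource_sketch(path, expensive_list, 5)   # generates and pickles
    second = pickled_resource_sketch(path, expensive_list, 5)  # loads from disk
```

The generator runs only on the first call; the second call is served from the pickle file.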
- -quapy.util.save_text_file(path, text)[source] +quapy.util.save_text_file(path, text)[source]
Saves a text file to disk, given its full path, and creates the parent directory if missing.
- Parameters: @@ -2712,7 +2955,7 @@ this function is invoked, it loads the pickled resource. Example:
- -quapy.util.temp_seed(random_state)[source] +quapy.util.temp_seed(random_state)[source]
Can be used in a “with” context to set a temporary seed without modifying numpy’s outer random state. E.g.:
>>> with temp_seed(random_seed): >>> pass # do any computation depending on np.random functionality @@ -2727,7 +2970,7 @@ this function is invoked, it loads the pickled resource. Example:
- -quapy.util.timeout(seconds)[source] +quapy.util.timeout(seconds)[source]
Opens a context that will raise an exception if not closed after a given number of seconds
>>> def func(start_msg, end_msg): >>> print(start_msg) @@ -2750,7 +2993,7 @@ this function is invoked, it loads the pickled resource. Example:
- diff --git a/docs/build/html/quapy.method.html b/docs/build/html/quapy.method.html index e843d2a..c71d492 100644 --- a/docs/build/html/quapy.method.html +++ b/docs/build/html/quapy.method.html @@ -1,23 +1,24 @@ - + -Module contents
+Module contents
QuaPy module for quantification
quapy.method package — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation - - +quapy.method package — QuaPy: A Python-based open-source framework for quantification 0.1.9 documentation + + - - - - - + + + + + + @@ -95,15 +96,15 @@- quapy.method package
+quapy.method package
- Submodules
+Submodules
- quapy.method.aggregative module
+quapy.method.aggregative module
- -class quapy.method.aggregative.ACC(classifier: BaseEstimator, val_split=5, n_jobs=None, solver='minimize')[source] +class quapy.method.aggregative.ACC(classifier: BaseEstimator, val_split=5, solver: Literal['minimize', 'exact', 'exact-raise', 'exact-cc'] = 'minimize', method: Literal['inversion', 'invariant-ratio'] = 'inversion', norm: Literal['clip', 'mapsimplex', 'condsoftmax'] = 'clip', n_jobs=None)[source]
Bases:
AggregativeCrispQuantifier
Adjusted Classify & Count, the “adjusted” variant of
+CC
, that corrects the predictions of CC @@ -119,24 +120,57 @@ are to be generated in a k-fold cross-validation manner (with this for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated. +method (str) –
adjustment method to be used:
+-
+
’inversion’: matrix inversion method based on the matrix equality \(P(C)=P(C|Y)P(Y)\), +which tries to invert \(P(C|Y)\) matrix.
+’invariant-ratio’: invariant ratio estimator of Vaz et al. 2018, +which replaces the last equation with the normalization condition.
+
+solver (str) –
indicates the method to use for solving the system of linear equations. Valid options are:
+-
+
’exact-raise’: tries to solve the system using matrix inversion. Raises an error if the matrix has rank +strictly less than n_classes.
+’exact-cc’: if the matrix is not of full rank, returns p_c as the estimates, which corresponds to +no adjustment (i.e., the classify and count method. See
quapy.method.aggregative.CC
)
+’exact’: deprecated, defaults to ‘exact-cc’
+’minimize’: minimizes the L2 norm of \(|Ax-B|\). This solver generally works better, and is the +default. More details can be found in Bunse, M. “On Multi-Class Extensions of +Adjusted Classify and Count”, in Proceedings of the 2nd International Workshop on Learning to Quantify: +Methods and Applications (LQ 2022), ECML/PKDD 2022, Grenoble (France).
+
norm (str) –
the method to use for normalization.
+-
+
clip, the values are clipped to the range [0,1] and then L1-normalized.
+mapsimplex projects vectors onto the probability simplex. This implementation relies on +Mathieu Blondel’s projection_simplex_sort
+condsoftmax, applies a softmax normalization only to prevalence vectors that lie outside the simplex
+
-n_jobs – number of parallel workers
solver – indicates the method to be used for obtaining the final estimates. The choice -‘exact’ comes down to solving the system of linear equations \(Ax=B\) where A is a -matrix containing the class-conditional probabilities of the predictions (e.g., the tpr and fpr in -binary) and B is the vector of prevalence values estimated via CC, as \(x=A^{-1}B\). This solution -might not exist for degenerated classifiers, in which case the method defaults to classify and count -(i.e., does not attempt any adjustment). -Another option is to search for the prevalence vector that minimizes the L2 norm of \(|Ax-B|\). The latter -is achieved by indicating solver=’minimize’. This one generally works better, and is the default parameter. -More details about this can be consulted in Bunse, M. “On Multi-Class Extensions of Adjusted Classify and -Count”, on proceedings of the 2nd International Workshop on Learning to Quantify: Methods and Applications -(LQ 2022), ECML/PKDD 2022, Grenoble (France).
-
+
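As an illustration of the `clip` normalization described for the `norm` parameter, the sketch below clips each value to [0, 1] and then rescales so that the values sum to 1. The name `clip_and_normalize` is hypothetical, and returning a uniform vector when everything clips to zero is an assumption of this example rather than documented behavior.

```python
def clip_and_normalize(prevalences):
    """Clip values to [0, 1], then L1-normalize so that they sum to 1."""
    clipped = [min(1.0, max(0.0, p)) for p in prevalences]
    total = sum(clipped)
    if total == 0:
        # Assumption of this sketch: fall back to the uniform distribution.
        return [1.0 / len(clipped)] * len(clipped)
    return [p / total for p in clipped]


out_of_range = clip_and_normalize([1.2, -0.2])  # typical result of an over-correction
too_large = clip_and_normalize([0.6, 0.6])      # in range, but sums to more than 1
```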
- +METHODS = ['inversion', 'invariant-ratio'] +
-
+
- +NORMALIZATIONS = ['clip', 'mapsimplex', 'condsoftmax', None] +
-
+
- +SOLVERS = ['exact', 'minimize', 'exact-raise', 'exact-cc'] +
- -aggregate(classif_predictions)[source] +aggregate(classif_predictions)[source]
Implements the aggregation of label predictions.
- Parameters: @@ -150,62 +184,82 @@ Count”, on proceedings of the 2nd International Workshop on Learning to Quanti
- -aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source] +aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]
Estimates the misclassification rates.
- Parameters: -
classif_predictions – classifier predictions with true labels
+-
+
classif_predictions – a
quapy.data.base.LabelledCollection
containing, +as instances, the label predictions issued by the classifier and, as labels, the true labels
+data – a
quapy.data.base.LabelledCollection
consisting of the training data
+
- -classmethod solve_adjustment(PteCondEstim, prevs_estim, solver='exact')[source] -
Solves the system linear system \(Ax = B\) with \(A\) = PteCondEstim and \(B\) = prevs_estim
+classmethod getPteCondEstim(classes, y, y_)[source] +Estimate the matrix with entry (i,j) being the estimate of P(hat_yi|yj), that is, the probability that a +document that belongs to yj ends up being classified as belonging to yi
- Parameters:
-
-
PteCondEstim – a np.ndarray of shape (n_classes,n_classes,) with entry (i,j) being the estimate -of \(P(y_i|y_j)\), that is, the probability that an instance that belongs to \(y_j\) ends up being -classified as belonging to \(y_i\)
-prevs_estim – a np.ndarray of shape (n_classes,) with the class prevalence estimates
-solver – indicates the method to use for solving the system of linear equations. Valid options are -‘exact’ (tries to solve the system –may fail if the misclassificatin matrix has rank < n_classes) or -‘optim_minimize’ (minimizes a norm –always exists).
+classes – array-like with the class names
+y – array-like with the true labels
+y_ – array-like with the estimated labels
- Returns: -
an adjusted np.ndarray of shape (n_classes,) with the corrected class prevalence estimates
+np.ndarray
- +classmethod newInvariantRatioEstimation(classifier: BaseEstimator, val_split=5, n_jobs=None)[source] +
Constructs a quantifier that implements the Invariant Ratio Estimator of +Vaz et al. 2018. This amounts +to setting method to ‘invariant-ratio’ and clipping to ‘project’.
+-
+
- Parameters: +
-
+
classifier – a sklearn’s Estimator that generates a classifier
+val_split – specifies the data used for generating classifier predictions. This specification
+
+
can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to +be extracted from the training set; or as an integer (default 5), indicating that the predictions +are to be generated in a k-fold cross-validation manner (with this integer indicating the value +for k); or as a collection defining the specific set of data to use for validation. +Alternatively, this set can be specified at fit time by indicating the exact set of data +on which the predictions are to be generated. +:param n_jobs: number of parallel workers +:return: an instance of ACC configured so that it implements the Invariant Ratio Estimator
+
-
-
-
+
- -quapy.method.aggregative.AdjustedClassifyAndCount +quapy.method.aggregative.AdjustedClassifyAndCount
alias of
ACC
- -class quapy.method.aggregative.AggregativeCrispQuantifier[source] +class quapy.method.aggregative.AggregativeCrispQuantifier[source]
Bases:
-AggregativeQuantifier
,ABC
Abstract class for quantification methods that base their estimations on the aggregation of crips decisions +
Abstract class for quantification methods that base their estimations on the aggregation of crisp decisions as returned by a hard classifier. Aggregative crisp quantifiers thus extend Aggregative Quantifiers by implementing specifications about crisp predictions.
- -class quapy.method.aggregative.AggregativeMedianEstimator(base_quantifier: AggregativeQuantifier, param_grid: dict, random_state=None, n_jobs=None)[source] +class quapy.method.aggregative.AggregativeMedianEstimator(base_quantifier: AggregativeQuantifier, param_grid: dict, random_state=None, n_jobs=None)[source]
Bases:
BinaryQuantifier
This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the estimations returned by differently (hyper)parameterized base quantifiers. @@ -223,7 +277,7 @@ i.e., in cases of binary quantification.
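The idea of taking a median over base quantifiers can be sketched in a few lines of NumPy. The estimates below are made-up numbers for the binary case, and the way the median is turned back into a prevalence vector is an assumption of this sketch, not a description of QuaPy's internals.

```python
import numpy as np

# hypothetical [negative, positive] prevalence estimates, one row per
# differently parameterized base quantifier
estimates = np.array([
    [0.70, 0.30],
    [0.64, 0.36],
    [0.58, 0.42],
])

# median of the positive-class estimates across the base quantifiers
median_positive = np.median(estimates[:, 1])

# rebuild a binary prevalence vector that sums to 1
median_prev = np.array([1.0 - median_positive, median_positive])
```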
- -fit(training: LabelledCollection, **kwargs)[source] +fit(training: LabelledCollection, **kwargs)[source]
Trains a quantifier.
- Parameters: @@ -237,7 +291,7 @@ i.e., in cases of binary quantification.
- -get_params(deep=True)[source] +get_params(deep=True)[source]
Get parameters for this estimator.
- Parameters: @@ -255,7 +309,7 @@ contained subobjects that are estimators.
- -quantify(instances)[source] +quantify(instances)[source]
Generate class prevalence estimates for the sample’s instances
- Parameters: @@ -269,7 +323,7 @@ contained subobjects that are estimators.
- -set_params(**params)[source] +set_params(**params)[source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have @@ -292,7 +346,7 @@ possible to update each component of a nested object.- -class quapy.method.aggregative.AggregativeQuantifier[source] +class quapy.method.aggregative.AggregativeQuantifier[source]
Bases:
BaseQuantifier
,ABC
Abstract class for quantification methods that base their estimations on the aggregation of classification results. Aggregative quantifiers implement a pipeline that consists of generating classification predictions @@ -306,7 +360,7 @@ and
aggregate()
.- -abstract aggregate(classif_predictions: ndarray)[source] +abstract aggregate(classif_predictions: ndarray)[source]
Implements the aggregation of label predictions.
- Parameters: @@ -320,13 +374,13 @@ and
- -abstract aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source] +abstract aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]
Trains the aggregation function.
- Parameters:
-
-
classif_predictions – a LabelledCollection containing the label predictions issued -by the classifier
+classif_predictions – a
quapy.data.base.LabelledCollection
containing, +as instances, the predictions issued by the classifier and, as labels, the true labelsdata – a
quapy.data.base.LabelledCollection
consisting of the training data
@@ -335,7 +389,7 @@ by the classifier
- -property classes_ +property classes_
Class labels, in the same order in which class prevalence values are to be computed. This default implementation actually returns the class labels of the learner.
-
@@ -347,7 +401,7 @@ This default implementation actually returns the class labels of the learner.
- -property classifier +property classifier
Gives access to the classifier
- Returns: @@ -358,7 +412,7 @@ This default implementation actually returns the class labels of the learner.
- -classifier_fit_predict(data: LabelledCollection, fit_classifier=True, predict_on=None)[source] +classifier_fit_predict(data: LabelledCollection, fit_classifier=True, predict_on=None)[source]
Trains the classifier if requested (fit_classifier=True) and generates the necessary predictions to train the aggregation function.
-
@@ -380,7 +434,7 @@ the predictions.
- -classify(instances)[source] +classify(instances)[source]
Provides the label predictions for the given instances. The predictions should respect the format expected by
@@ -396,7 +450,7 @@ non-probabilistic quantifiers. The default one is “decision_function”.aggregate()
, e.g., posterior probabilities for probabilistic quantifiers, or crisp predictions for non-probabilistic quantifiers. The default one is “decision_function”.
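The two-step pipeline described for aggregative quantifiers (classify the instances, then aggregate the predictions) can be illustrated with a toy classify-and-count sketch. The functions below are stand-ins written for this example: the thresholding "classifier" and its `threshold` parameter are assumptions and do not reproduce any QuaPy class.

```python
import numpy as np

def classify(instances, threshold=0.5):
    # stand-in hard classifier: thresholds 1-D scores into crisp labels 0/1
    return (np.asarray(instances) >= threshold).astype(int)

def aggregate(classif_predictions, n_classes=2):
    # counts the crisp predictions per class and normalizes them to prevalences
    counts = np.bincount(classif_predictions, minlength=n_classes)
    return counts / counts.sum()

def quantify(instances):
    # classification followed by aggregation
    return aggregate(classify(instances))

prevalence = quantify([0.1, 0.4, 0.6, 0.9, 0.95])
```

A probabilistic quantifier would follow the same two steps, with `classify` returning posterior probabilities instead of crisp labels.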
- quapy.plot module
+quapy.plot module
- quapy.protocol module
+quapy.protocol module
- quapy.util module
+quapy.util module
- quapy.functional module
- quapy.data.datasets module
+quapy.data.datasets module
- quapy.data.preprocessing module
+quapy.data.preprocessing module
- quapy.data.reader module
+quapy.data.reader module
Python Module Index — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation - - +Python Module Index — QuaPy: A Python-based open-source framework for quantification 0.1.9 documentation + + - - - - - + + + + + + diff --git a/docs/build/html/quapy.classification.html b/docs/build/html/quapy.classification.html index b181a3b..95da4d7 100644 --- a/docs/build/html/quapy.classification.html +++ b/docs/build/html/quapy.classification.html @@ -1,23 +1,24 @@ - + -quapy.classification package — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation - - +quapy.classification package — QuaPy: A Python-based open-source framework for quantification 0.1.9 documentation + + - - - - - + + + + + + @@ -95,15 +96,15 @@- diff --git a/docs/build/html/quapy.data.html b/docs/build/html/quapy.data.html index fd7a730..0f0f06f 100644 --- a/docs/build/html/quapy.data.html +++ b/docs/build/html/quapy.data.html @@ -1,23 +1,24 @@ - + -quapy.classification package
+quapy.classification package
- Submodules
+Submodules
- quapy.classification.calibration module
+quapy.classification.calibration module
- quapy.classification.methods module
+quapy.classification.methods module
- quapy.classification.neural module
+quapy.classification.neural module
- quapy.classification.svmperf module
+quapy.classification.svmperf module
- Module contents
+Module contents
quapy.data package — QuaPy: A Python-based open-source framework for quantification 0.1.8 documentation - - +quapy.data package — QuaPy: A Python-based open-source framework for quantification 0.1.9 documentation + + - - - - - + + + + + + @@ -95,15 +96,15 @@- quapy.data package
+quapy.data package
- Submodules
+Submodules
- quapy.data.base module
+quapy.data.base module
- quapy.functional module