quapy package¶
Subpackages¶
Submodules¶
quapy.error module¶
- quapy.error.absolute_error(prevs, prevs_hat)¶
- Computes the absolute error between the two prevalence vectors.
Absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(AE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}|\hat{p}(y)-p(y)|\), where \(\mathcal{Y}\) are the classes of interest.
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
- Returns
absolute error
- quapy.error.acc_error(y_true, y_pred)¶
Computes the error in terms of 1-accuracy. The accuracy is computed as \(\frac{tp+tn}{tp+fp+fn+tn}\), with tp, fp, fn, and tn standing for true positives, false positives, false negatives, and true negatives, respectively
- Parameters
y_true – array-like of true labels
y_pred – array-like of predicted labels
- Returns
1-accuracy
- quapy.error.acce(y_true, y_pred)¶
Computes the error in terms of 1-accuracy. The accuracy is computed as \(\frac{tp+tn}{tp+fp+fn+tn}\), with tp, fp, fn, and tn standing for true positives, false positives, false negatives, and true negatives, respectively
- Parameters
y_true – array-like of true labels
y_pred – array-like of predicted labels
- Returns
1-accuracy
- quapy.error.ae(prevs, prevs_hat)¶
- Computes the absolute error between the two prevalence vectors.
Absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(AE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}|\hat{p}(y)-p(y)|\), where \(\mathcal{Y}\) are the classes of interest.
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
- Returns
absolute error
- quapy.error.f1_error(y_true, y_pred)¶
F1 error: simply computes the error in terms of macro \(F_1\), i.e., \(1-F_1^M\), where \(F_1\) is the harmonic mean of precision and recall, defined as \(\frac{2tp}{2tp+fp+fn}\), with tp, fp, and fn standing for true positives, false positives, and false negatives, respectively. Macro averaging means the \(F_1\) is computed for each category independently, and then averaged.
- Parameters
y_true – array-like of true labels
y_pred – array-like of predicted labels
- Returns
\(1-F_1^M\)
- quapy.error.f1e(y_true, y_pred)¶
F1 error: simply computes the error in terms of macro \(F_1\), i.e., \(1-F_1^M\), where \(F_1\) is the harmonic mean of precision and recall, defined as \(\frac{2tp}{2tp+fp+fn}\), with tp, fp, and fn standing for true positives, false positives, and false negatives, respectively. Macro averaging means the \(F_1\) is computed for each category independently, and then averaged.
- Parameters
y_true – array-like of true labels
y_pred – array-like of predicted labels
- Returns
\(1-F_1^M\)
- quapy.error.from_name(err_name)¶
Gets an error function from its name. E.g., from_name(“mae”) will return function
quapy.error.mae()
- Parameters
err_name – string, the error name
- Returns
a callable implementing the requested error
- quapy.error.kld(p, p_hat, eps=None)¶
- Computes the Kullback-Leibler divergence between the two prevalence distributions.
Kullback-Leibler divergence between two prevalence distributions \(p\) and \(\hat{p}\) is computed as \(KLD(p,\hat{p})=D_{KL}(p||\hat{p})=\sum_{y\in \mathcal{Y}} p(y)\log\frac{p(y)}{\hat{p}(y)}\), where \(\mathcal{Y}\) are the classes of interest. The distributions are smoothed using the eps factor (see
quapy.error.smooth()
).
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
eps – smoothing factor. KLD is not defined in cases in which the distributions contain zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
Kullback-Leibler divergence between the two distributions
- quapy.error.mae(prevs, prevs_hat)¶
Computes the mean absolute error (see
quapy.error.ae()
) across the sample pairs.- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
- Returns
mean absolute error
- quapy.error.mean_absolute_error(prevs, prevs_hat)¶
Computes the mean absolute error (see
quapy.error.ae()
) across the sample pairs.- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
- Returns
mean absolute error
- quapy.error.mean_relative_absolute_error(p, p_hat, eps=None)¶
Computes the mean relative absolute error (see
quapy.error.rae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
eps – smoothing factor. mrae is not defined in cases in which the true distribution contains zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
mean relative absolute error
- quapy.error.mkld(prevs, prevs_hat, eps=None)¶
Computes the mean Kullback-Leibler divergence (see
quapy.error.kld()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
eps – smoothing factor. KLD is not defined in cases in which the distributions contain zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
mean Kullback-Leibler distribution
- quapy.error.mnkld(prevs, prevs_hat, eps=None)¶
Computes the mean Normalized Kullback-Leibler divergence (see
quapy.error.nkld()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
eps – smoothing factor. NKLD is not defined in cases in which the distributions contain zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
mean Normalized Kullback-Leibler distribution
- quapy.error.mrae(p, p_hat, eps=None)¶
Computes the mean relative absolute error (see
quapy.error.rae()
) across the sample pairs. The distributions are smoothed using the eps factor (seequapy.error.smooth()
).- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
eps – smoothing factor. mrae is not defined in cases in which the true distribution contains zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
mean relative absolute error
- quapy.error.mse(prevs, prevs_hat)¶
Computes the mean squared error (see
quapy.error.se()
) across the sample pairs.- Parameters
prevs – array-like of shape (n_samples, n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_samples, n_classes,) with the predicted prevalence values
- Returns
mean squared error
- quapy.error.nkld(p, p_hat, eps=None)¶
- Computes the Normalized Kullback-Leibler divergence between the two prevalence distributions.
Normalized Kullback-Leibler divergence between two prevalence distributions \(p\) and \(\hat{p}\) is computed as \(NKLD(p,\hat{p}) = 2\frac{e^{KLD(p,\hat{p})}}{e^{KLD(p,\hat{p})}+1}-1\), where \(\mathcal{Y}\) are the classes of interest. The distributions are smoothed using the eps factor (see
quapy.error.smooth()
).
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
eps – smoothing factor. NKLD is not defined in cases in which the distributions contain zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
Normalized Kullback-Leibler divergence between the two distributions
- quapy.error.rae(p, p_hat, eps=None)¶
- Computes the absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(RAE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}\frac{|\hat{p}(y)-p(y)|}{p(y)}\), where \(\mathcal{Y}\) are the classes of interest. The distributions are smoothed using the eps factor (see
quapy.error.smooth()
).
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
eps – smoothing factor. rae is not defined in cases in which the true distribution contains zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
relative absolute error
- quapy.error.relative_absolute_error(p, p_hat, eps=None)¶
- Computes the absolute relative error between the two prevalence vectors.
Relative absolute error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(RAE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}\frac{|\hat{p}(y)-p(y)|}{p(y)}\), where \(\mathcal{Y}\) are the classes of interest. The distributions are smoothed using the eps factor (see
quapy.error.smooth()
).
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
eps – smoothing factor. rae is not defined in cases in which the true distribution contains zeros; eps is typically set to be \(\frac{1}{2T}\), with \(T\) the sample size. If eps=None, the sample size will be taken from the environment variable SAMPLE_SIZE (which has thus to be set beforehand).
- Returns
relative absolute error
- quapy.error.se(p, p_hat)¶
- Computes the squared error between the two prevalence vectors.
Squared error between two prevalence vectors \(p\) and \(\hat{p}\) is computed as \(SE(p,\hat{p})=\frac{1}{|\mathcal{Y}|}\sum_{y\in \mathcal{Y}}(\hat{p}(y)-p(y))^2\), where \(\mathcal{Y}\) are the classes of interest.
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
prevs_hat – array-like of shape (n_classes,) with the predicted prevalence values
- Returns
absolute error
- quapy.error.smooth(prevs, eps)¶
Smooths a prevalence distribution with \(\epsilon\) (eps) as: \(\underline{p}(y)=\frac{\epsilon+p(y)}{\epsilon|\mathcal{Y}|+\displaystyle\sum_{y\in \mathcal{Y}}p(y)}\)
- Parameters
prevs – array-like of shape (n_classes,) with the true prevalence values
eps – smoothing factor
- Returns
array-like of shape (n_classes,) with the smoothed distribution
quapy.evaluation module¶
- quapy.evaluation.artificial_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, verbose=False)¶
Performs the predictions for all samples generated according to the artificial sampling protocol. :param model: the model in charge of generating the class prevalence estimations :param test: the test set on which to perform arificial sampling :param sample_size: the size of the samples :param n_prevpoints: the number of different prevalences to sample (or set to None if eval_budget is specified) :param n_repetitions: the number of repetitions for each prevalence :param eval_budget: if specified, sets a ceil on the number of evaluations to perform. For example, if there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this will generate 15 different prevalences ([0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0]) and since setting it n_prevpoints to 6 would produce more than 20 evaluations. :param n_jobs: number of jobs to be run in parallel :param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect any other random process. :param verbose: if True, shows a progress bar :return: two ndarrays of shape (m,n) with m the number of samples (n_prevpoints*n_repetitions) and n the
number of classes. The first one contains the true prevalences for the samples generated while the second one contains the the prevalence estimations
- quapy.evaluation.artificial_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)¶
- quapy.evaluation.artificial_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_prevpoints=210, n_repetitions=1, eval_budget: Optional[int] = None, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)¶
- quapy.evaluation.evaluate(model: quapy.method.base.BaseQuantifier, test_samples: Iterable[quapy.data.base.LabelledCollection], err: Union[str, Callable], n_jobs: int = - 1)¶
- quapy.evaluation.gen_prevalence_prediction(model: quapy.method.base.BaseQuantifier, gen_fn: Callable, eval_budget=None)¶
- quapy.evaluation.natural_prevalence_prediction(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, verbose=False)¶
Performs the predictions for all samples generated according to the artificial sampling protocol. :param model: the model in charge of generating the class prevalence estimations :param test: the test set on which to perform arificial sampling :param sample_size: the size of the samples :param n_repetitions: the number of repetitions for each prevalence :param n_jobs: number of jobs to be run in parallel :param random_seed: allows to replicate the samplings. The seed is local to the method and does not affect any other random process. :param verbose: if True, shows a progress bar :return: two ndarrays of shape (m,n) with m the number of samples (n_repetitions) and n the
number of classes. The first one contains the true prevalences for the samples generated while the second one contains the the prevalence estimations
- quapy.evaluation.natural_prevalence_protocol(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metric: Union[str, Callable] = 'mae', verbose=False)¶
- quapy.evaluation.natural_prevalence_report(model: quapy.method.base.BaseQuantifier, test: quapy.data.base.LabelledCollection, sample_size, n_repetitions=1, n_jobs=1, random_seed=42, error_metrics: Iterable[Union[str, Callable]] = 'mae', verbose=False)¶
quapy.functional module¶
- quapy.functional.HellingerDistance(P, Q)¶
- quapy.functional.adjusted_quantification(prevalence_estim, tpr, fpr, clip=True)¶
- quapy.functional.artificial_prevalence_sampling(dimensions, n_prevalences=21, repeat=1, return_constrained_dim=False)¶
- quapy.functional.get_nprevpoints_approximation(combinations_budget: int, n_classes: int, n_repeats: int = 1)¶
Searches for the largest number of (equidistant) prevalence points to define for each of the n_classes classes so that the number of valid prevalences generated as combinations of prevalence points (points in a n_classes-dimensional simplex) do not exceed combinations_budget. :param n_classes: number of classes :param n_repeats: number of repetitions for each prevalence combination :param combinations_budget: maximum number of combinatios allowed :return: the largest number of prevalence points that generate less than combinations_budget valid prevalences
- quapy.functional.normalize_prevalence(prevalences)¶
- quapy.functional.num_prevalence_combinations(n_prevpoints: int, n_classes: int, n_repeats: int = 1)¶
Computes the number of prevalence combinations in the n_classes-dimensional simplex if nprevpoints equally distant prevalences are generated and n_repeats repetitions are requested :param n_classes: number of classes :param n_prevpoints: number of prevalence points. :param n_repeats: number of repetitions for each prevalence combination :return: The number of possible combinations. For example, if n_classes=2, n_prevpoints=5, n_repeats=1, then the number of possible combinations are 5, i.e.: [0,1], [0.25,0.75], [0.50,0.50], [0.75,0.25], and [1.0,0.0]
- quapy.functional.prevalence_from_labels(labels, classes_)¶
- quapy.functional.prevalence_from_probabilities(posteriors, binarize: bool = False)¶
- quapy.functional.prevalence_linspace(n_prevalences=21, repeat=1, smooth_limits_epsilon=0.01)¶
Produces a uniformly separated values of prevalence. By default, produces an array 21 prevalences, with step 0.05 and with the limits smoothed, i.e.: [0.01, 0.05, 0.10, 0.15, …, 0.90, 0.95, 0.99] :param n_prevalences: the number of prevalence values to sample from the [0,1] interval (default 21) :param repeat: number of times each prevalence is to be repeated (defaults to 1) :param smooth_limits_epsilon: the quantity to add and subtract to the limits 0 and 1 :return: an array of uniformly separated prevalence values
- quapy.functional.strprev(prevalences, prec=3)¶
- quapy.functional.uniform_prevalence_sampling(n_classes, size=1)¶
- quapy.functional.uniform_simplex_sampling(n_classes, size=1)¶
quapy.model_selection module¶
- class quapy.model_selection.GridSearchQ(model: quapy.method.base.BaseQuantifier, param_grid: dict, sample_size: Optional[int], protocol='app', n_prevpoints: Optional[int] = None, n_repetitions: int = 1, eval_budget: Optional[int] = None, error: Union[Callable, str] = <function mae>, refit=True, val_split=0.4, n_jobs=1, random_seed=42, timeout=-1, verbose=False)¶
Bases:
quapy.method.base.BaseQuantifier
Grid Search optimization targeting a quantification-oriented metric.
Optimizes the hyperparameters of a quantification method, based on an evaluation method and on an evaluation protocol for quantification.
- Parameters
model (BaseQuantifier) – the quantifier to optimize
param_grid – a dictionary with keys the parameter names and values the list of values to explore
sample_size – the size of the samples to extract from the validation set (ignored if protocl=’gen’)
protocol – either ‘app’ for the artificial prevalence protocol, ‘npp’ for the natural prevalence protocol, or ‘gen’ for using a custom sampling generator function
n_prevpoints – if specified, indicates the number of equally distant points to extract from the interval [0,1] in order to define the prevalences of the samples; e.g., if n_prevpoints=5, then the prevalences for each class will be explored in [0.00, 0.25, 0.50, 0.75, 1.00]. If not specified, then eval_budget is requested. Ignored if protocol!=’app’.
n_repetitions – the number of repetitions for each combination of prevalences. This parameter is ignored for the protocol=’app’ if eval_budget is set and is lower than the number of combinations that would be generated using the value assigned to n_prevpoints (for the current number of classes and n_repetitions). Ignored for protocol=’npp’ and protocol=’gen’ (use eval_budget for setting a maximum number of samples in those cases).
eval_budget – if specified, sets a ceil on the number of evaluations to perform for each hyper-parameter combination. For example, if protocol=’app’, there are 3 classes, n_repetitions=1 and eval_budget=20, then n_prevpoints will be set to 5, since this will generate 15 different prevalences, i.e., [0, 0, 1], [0, 0.25, 0.75], [0, 0.5, 0.5] … [1, 0, 0], and since setting it to 6 would generate more than 20. When protocol=’gen’, indicates the maximum number of samples to generate, but less samples will be generated if the generator yields less samples.
error – an error function (callable) or a string indicating the name of an error function (valid ones are those in qp.error.QUANTIFICATION_ERROR
refit – whether or not to refit the model on the whole labelled collection (training+validation) with the best chosen hyperparameter combination. Ignored if protocol=’gen’
val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set, or a callable returning a generator function each time it is invoked (only for protocol=’gen’).
n_jobs – number of parallel jobs
random_seed – set the seed of the random generator to replicate experiments. Ignored if protocol=’gen’.
timeout – establishes a timer (in seconds) for each of the hyperparameters configurations being tested. Whenever a run takes longer than this timer, that configuration will be ignored. If all configurations end up being ignored, a TimeoutError exception is raised. If -1 (default) then no time bound is set.
verbose – set to True to get information through the stdout
- best_model()¶
- property classes_¶
- fit(training: quapy.data.base.LabelledCollection, val_split: Optional[Union[quapy.data.base.LabelledCollection, float, Callable]] = None)¶
- Learning routine. Fits methods with all combinations of hyperparameters and selects the one minimizing
the error metric.
- Parameters
training – the training set on which to optimize the hyperparameters
val_split – either a LabelledCollection on which to test the performance of the different settings, or a float in [0,1] indicating the proportion of labelled data to extract from the training set
- get_params(deep=True)¶
Returns the dictionary of hyper-parameters to explore (param_grid)
- Parameters
deep – Unused
- Returns
the dictionary param_grid
- quantify(instances)¶
Estimate class prevalence values
- Parameters
instances – sample contanining the instances
- set_params(**parameters)¶
Sets the hyper-parameters to explore.
- Parameters
parameters – a dictionary with keys the parameter names and values the list of values to explore
quapy.plot module¶
- quapy.plot.binary_bias_bins(method_names, true_prevs, estim_prevs, pos_class=1, title=None, nbins=5, colormap=<matplotlib.colors.ListedColormap object>, vertical_xticks=False, legend=True, savepath=None)¶
- Parameters
method_names – array-like with the method names for each experiment
true_prevs – array-like with the true prevalence values (each being a ndarray with n_classes components) for each experiment
estim_prevs – array-like with the estimated prevalence values (each being a ndarray with n_classes components) for each experiment
pos_class – index of the positive class
title – the title to be displayed in the plot
nbins – number of bins
colormap – the matplotlib colormap to use (default cm.tab10)
vertical_xticks –
legend – whether or not to display the legend (default is True)
savepath – path where to save the plot. If not indicated (as default), the plot is shown.
- quapy.plot.binary_bias_global(method_names, true_prevs, estim_prevs, pos_class=1, title=None, savepath=None)¶
Box-plots displaying the global bias (i.e., signed error computed as the estimated value minus the true value) for each quantification method with respect to a given positive class.
- Parameters
method_names – array-like with the method names for each experiment
true_prevs – array-like with the true prevalence values (each being a ndarray with n_classes components) for each experiment
estim_prevs – array-like with the estimated prevalence values (each being a ndarray with n_classes components) for each experiment
pos_class – index of the positive class
title – the title to be displayed in the plot
savepath – path where to save the plot. If not indicated (as default), the plot is shown.
- quapy.plot.binary_diagonal(method_names, true_prevs, estim_prevs, pos_class=1, title=None, show_std=True, legend=True, train_prev=None, savepath=None, method_order=None)¶
The diagonal plot displays the predicted prevalence values (along the y-axis) as a function of the true prevalence values (along the x-axis). The optimal quantifier is described by the diagonal (0,0)-(1,1) of the plot (hence the name). It is convenient for binary quantification problems, though it can be used for multiclass problems by indicating which class is to be taken as the positive class. (For multiclass quantification problems, other plots like the
error_by_drift()
might be preferable though).- Parameters
method_names – array-like with the method names for each experiment
true_prevs – array-like with the true prevalence values (each being a ndarray with n_classes components) for each experiment
estim_prevs – array-like with the estimated prevalence values (each being a ndarray with n_classes components) for each experiment
pos_class – index of the positive class
title – the title to be displayed in the plot
show_std – whether or not to show standard deviations (represented by color bands). This might be inconvenient for cases in which many methods are compared, or when the standard deviations are high – default True)
legend – whether or not to display the leyend (default True)
train_prev – if indicated (default is None), the training prevalence (for the positive class) is hightlighted in the plot. This is convenient when all the experiments have been conducted in the same dataset.
savepath – path where to save the plot. If not indicated (as default), the plot is shown.
method_order – if indicated (default is None), imposes the order in which the methods are processed (i.e., listed in the legend and associated with matplotlib colors).
- quapy.plot.brokenbar_supremacy_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, binning='isomerous', x_error='ae', y_error='ae', ttest_alpha=0.005, tail_density_threshold=0.005, method_order=None, savepath=None)¶
- quapy.plot.error_by_drift(method_names, true_prevs, estim_prevs, tr_prevs, n_bins=20, error_name='ae', show_std=False, show_density=True, logscale=False, title='Quantification error as a function of distribution shift', savepath=None, vlines=None, method_order=None)¶
- quapy.plot.save_or_show(savepath)¶
quapy.util module¶
- class quapy.util.EarlyStop(patience, lower_is_better=True)¶
Bases:
object
- quapy.util.create_if_not_exist(path)¶
- quapy.util.create_parent_dir(path)¶
- quapy.util.download_file(url, archive_filename)¶
- quapy.util.download_file_if_not_exists(url, archive_path)¶
- quapy.util.get_quapy_home()¶
- quapy.util.map_parallel(func, args, n_jobs)¶
Applies func to n_jobs slices of args. E.g., if args is an array of 99 items and n_jobs=2, then func is applied in two parallel processes to args[0:50] and to args[50:99]
- quapy.util.parallel(func, args, n_jobs)¶
A wrapper of multiprocessing: Parallel(n_jobs=n_jobs)(
delayed(func)(args_i) for args_i in args
) that takes the quapy.environ variable as input silently
- quapy.util.pickled_resource(pickle_path: str, generation_func: callable, *args)¶
Allows for fast reuse of resources that are generated only once by calling generation_func(*args). The next times this function is invoked, it loads the pickled resource. Example: def some_array(n):
return np.random.rand(n)
pickled_resource(‘./my_array.pkl’, some_array, 10) # the resource does not exist: it is created by some_array(10) pickled_resource(‘./my_array.pkl’, some_array, 10) # the resource exists: it is loaded from ‘./my_array.pkl’ :param pickle_path: the path where to save (first time) and load (next times) the resource :param generation_func: the function that generates the resource, in case it does not exist in pickle_path :param args: any arg that generation_func uses for generating the resources :return: the resource
- quapy.util.save_text_file(path, text)¶
- quapy.util.temp_seed(seed)¶
Can be used in a “with” context to set a temporal seed without modifying the outer numpy’s current state. E.g.: with temp_seed(random_seed):
# do any computation depending on np.random functionality
- Parameters
seed – the seed to set within the “with” context
Module contents¶
- quapy.isbinary(x)¶