84 lines
4.4 KiB
Python
84 lines
4.4 KiB
Python
from quapy.method.confidence import BayesianCC
|
|
from quapy.method.confidence import AggregativeBootstrap
|
|
from quapy.method.aggregative import PACC
|
|
import quapy.functional as F
|
|
import quapy as qp
|
|
|
|
"""
|
|
Just like any other type of estimator, quantifier predictions are affected by error. It is therefore useful to provide,
|
|
along with the point estimate (the class prevalence values) a measure of uncertainty. These, typically come in the
|
|
form of credible regions around the point estimate.
|
|
|
|
QuaPy implements a method for deriving confidence regions around point estimates of class prevalence based on bootstrap.
|
|
|
|
Bootstrap method comes down to resampling the population several times, thus generating a series of point estimates.
|
|
QuaPy provides a variant of bootstrap for aggregative quantifiers, that only applies resampling to the pre-classified
|
|
instances.
|
|
|
|
Let see one example:
|
|
"""
|
|
|
|
# load some data
|
|
data = qp.datasets.fetch_UCIMulticlassDataset('molecular')
|
|
train, test = data.train_test
|
|
Xtr, ytr = train.Xy
|
|
|
|
# by simply wrapping an aggregative quantifier within the AggregativeBootstrap class, we can obtain confidence
|
|
# intervals around the point estimate, in this case, at 95% of confidence
|
|
pacc = AggregativeBootstrap(PACC(), n_test_samples=500, confidence_level=0.95)
|
|
|
|
|
|
with qp.util.temp_seed(0):
|
|
# we train the quantifier the usual way
|
|
pacc.fit(Xtr, ytr)
|
|
|
|
# let us simulate some shift in the test data
|
|
random_prevalence = F.uniform_prevalence_sampling(n_classes=test.n_classes)
|
|
shifted_test = test.sampling(200, *random_prevalence)
|
|
true_prev = shifted_test.prevalence()
|
|
|
|
# by calling "quantify_conf", we obtain the point estimate and the confidence intervals around it
|
|
pred_prev, conf_intervals = pacc.quantify_conf(shifted_test.X)
|
|
|
|
# conf_intervals is an instance of ConfidenceRegionABC, which provides some useful utilities like:
|
|
# - coverage: a function which computes the fraction of true values that belong to the confidence region
|
|
# - simplex_proportion: estimates the proportion of the simplex covered by the confidence region (amplitude)
|
|
# ideally, we are interested in obtaining confidence regions with high level of coverage and small amplitude
|
|
|
|
# the point estimate is computed as the mean of all bootstrap predictions; let us see the prediction error
|
|
error = qp.error.ae(true_prev, pred_prev)
|
|
|
|
# some useful outputs
|
|
print(f'train prevalence: {F.strprev(train.prevalence())}')
|
|
print(f'test prevalence: {F.strprev(true_prev)}')
|
|
print(f'point-estimate: {F.strprev(pred_prev)}')
|
|
print(f'absolute error: {error:.3f}')
|
|
print(f'Is the true value in the confidence region?: {conf_intervals.coverage(true_prev)==1}')
|
|
print(f'Proportion of simplex covered at confidence level {pacc.confidence_level*100:.1f}%: {conf_intervals.simplex_portion()*100:.2f}%')
|
|
|
|
"""
|
|
Final remarks:
|
|
There are various ways for performing bootstrap:
|
|
- the population-based approach (default): performs resampling of the test instances
|
|
e.g., use AggregativeBootstrap(PACC(), n_train_samples=1, n_test_samples=100, confidence_level=0.95)
|
|
- the model-based approach: performs resampling of the training instances, thus training several quantifiers
|
|
e.g., use AggregativeBootstrap(PACC(), n_train_samples=100, n_test_samples=1, confidence_level=0.95)
|
|
this implementation avoids retraining the classifier, and performs resampling only to train different aggregation functions
|
|
- the combined approach: a combination of the above
|
|
e.g., use AggregativeBootstrap(PACC(), n_train_samples=100, n_test_samples=100, confidence_level=0.95)
|
|
this example will generate 100 x 100 predictions
|
|
|
|
There are different ways for constructing confidence regions implemented in QuaPy:
|
|
- confidence intervals: the simplest way, and one that typically works well in practice
|
|
use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='intervals')
|
|
- confidence ellipse in the simplex: creates an ellipse, which lies on the probability simplex, around the point estimate
|
|
use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='ellipse')
|
|
- confidence ellipse in the Centered-Log Ratio (CLR) space: creates an ellipse in the CLR space (this should be
|
|
convenient for taking into account the inner structure of the probability simplex)
|
|
use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='ellipse-clr')
|
|
|
|
Other methods that return confidence regions in QuaPy include the BayesianCC method.
|
|
"""
|
|
|
|
|