improving conf regions docs

parent c79b76516c
commit c8235ddb2a

TODO.txt | 1
@@ -1,4 +1,3 @@
-- [TODO] document confidence in manuals
 - [TODO] Test the return_type="index" in protocols and finish the "distributing_samples.py" example
 - [TODO] Add EDy (an implementation is available at quantificationlib)
 - [TODO] add ensemble methods SC-MQ, MC-SQ, MC-MQ
@@ -221,7 +221,7 @@ Options are:
 * `"condsoftmax"` applies softmax normalization only if the prevalence vector lies outside of the probability simplex.
 
-#### BayesianCC (_New in v0.1.9_!)
+#### BayesianCC
 
 The `BayesianCC` is a variant of ACC introduced in
 [Ziegler, A. and Czyż, P. "Bayesian quantification with black-box estimators", arXiv (2023)](https://arxiv.org/abs/2302.09159),
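For intuition, here is a minimal sketch of the `"condsoftmax"` rule mentioned in the hunk above, assuming a NumPy prevalence vector; the function name and the simplex check are illustrative, not QuaPy's internal code:

```python
import numpy as np

def condsoftmax(prevs: np.ndarray) -> np.ndarray:
    # apply softmax only if the vector is not already a valid
    # point in the probability simplex (illustrative check)
    if np.all(prevs >= 0) and np.isclose(prevs.sum(), 1.0):
        return prevs
    exp = np.exp(prevs - prevs.max())  # shift by max for numerical stability
    return exp / exp.sum()
```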
@@ -280,8 +280,8 @@ See the API documentation for further details.
 ### Hellinger Distance y (HDy)
 
 Implementation of the method based on the Hellinger Distance y (HDy) proposed by
-[González-Castro, V., Alaiz-Rodrı́guez, R., and Alegre, E. (2013). Class distribution
-estimation based on the Hellinger distance. Information Sciences, 218:146–164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
+[González-Castro, V., Alaiz-Rodríguez, R., and Alegre, E. (2013). Class distribution
+estimation based on the Hellinger distance. Information Sciences, 218:146-164.](https://www.sciencedirect.com/science/article/pii/S0020025512004069)
 
 It is implemented in `qp.method.aggregative.HDy` (also accessible
 through the alias `qp.method.aggregative.HellingerDistanceY`).
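For reference (a standard formulation, not quoted from QuaPy's docs), the Hellinger distance between two discrete distributions $P=(p_1,\dots,p_n)$ and $Q=(q_1,\dots,q_n)$ that HDy seeks to minimize is:

$$
\mathrm{HD}(P, Q) = \frac{1}{\sqrt{2}}\sqrt{\sum_{i=1}^{n}\left(\sqrt{p_i}-\sqrt{q_i}\right)^2}
$$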
@@ -423,7 +423,7 @@ _New in v0.1.8_: QuaPy now provides implementations for the three variants
 of KDE-based methods proposed in
 _[Moreo, A., González, P. and del Coz, J.J., 2023.
 Kernel Density Estimation for Multiclass Quantification.
-arXiv preprint arXiv:2401.00490.](https://arxiv.org/abs/2401.00490)_.
+arXiv preprint arXiv:2401.00490](https://arxiv.org/abs/2401.00490)_.
 The variants differ in the divergence metric to be minimized:
 
 - KDEy-HD: minimizes the (squared) Hellinger Distance and solves the problem via a Monte Carlo approach
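As a hedged illustration of the Monte Carlo idea behind KDEy-HD (a sketch, not QuaPy's actual implementation): using the identity $\mathrm{HD}^2(p,q) = 1 - \mathbb{E}_{x\sim p}\left[\sqrt{q(x)/p(x)}\right]$, the squared Hellinger distance between two density models can be approximated by sampling. The sketch assumes scikit-learn `KernelDensity` models:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def squared_hellinger_mc(kde_p: KernelDensity, kde_q: KernelDensity,
                         n: int = 10_000, seed: int = 0) -> float:
    # HD^2(p, q) = 1 - E_{x~p}[ sqrt(q(x)/p(x)) ], estimated by sampling from p
    X = kde_p.sample(n, random_state=seed)
    log_p = kde_p.score_samples(X)   # log densities under p
    log_q = kde_q.score_samples(X)   # log densities under q
    return float(1.0 - np.mean(np.exp(0.5 * (log_q - log_p))))
```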
@@ -582,3 +582,25 @@ model.fit(dataset.training)
 estim_prevalence = model.quantify(dataset.test.instances)
 ```
 
+## Confidence Regions for Class Prevalence Estimation
+
+_(New in v0.1.10!)_ Some quantification methods go beyond providing a single point estimate of class prevalence values and also produce confidence regions, which characterize the uncertainty around the point estimate. In QuaPy, two such methods are currently implemented:
+
+* Aggregative Bootstrap: extends any aggregative quantifier by generating confidence regions for class prevalence estimates through bootstrapping. Key features of this method include:
+  * Optimized Computation: the bootstrap is applied to pre-classified instances, significantly speeding up training and inference.
+    During training, bootstrap repetitions are performed only after training the classifier once; the repetitions are used to train multiple aggregation functions.
+    During inference, the bootstrap is applied over pre-classified test instances.
+  * General Applicability: Aggregative Bootstrap can be applied to any aggregative quantifier.
+
+  For further information, check the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples) provided.
+
+* BayesianCC: a Bayesian variant of the Adjusted Classify & Count (ACC) quantifier (see more details in [Aggregative Quantifiers](#bayesiancc)).
+
+Confidence regions are constructed around a point estimate, which is typically computed as the mean value of a set of samples.
+The confidence region can be instantiated in three ways:
+
+* Confidence intervals: standard confidence intervals generated for each class independently (_method="intervals"_).
+* Confidence ellipse in the simplex: an ellipse constructed around the mean point; it lies on the simplex and takes into account possible inter-class dependencies in the data (_method="ellipse"_).
+* Confidence ellipse in the Centered Log-Ratio (CLR) space: a Gaussian ellipse assumes normally distributed components, whereas elements of the simplex have an inner structure; a better approach is to first transform the components into an unconstrained space (the CLR space) and then construct the ellipse there (_method="ellipse-clr"_).
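The CLR transform mentioned in the last bullet above is standard compositional-data machinery; a minimal sketch follows (illustrative, not QuaPy's code; the epsilon guard against zero prevalences is an added assumption). An ellipse fitted to CLR-transformed samples then lives in an unconstrained space where Gaussian assumptions are more tenable:

```python
import numpy as np

def clr(prevs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # centered log-ratio: log of each component minus the mean log,
    # mapping a point of the simplex to an unconstrained space
    logp = np.log(prevs + eps)  # eps guards against zero prevalences
    return logp - logp.mean()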
@@ -1,4 +1,3 @@
-from quapy.method.confidence import BayesianCC
 from quapy.method.confidence import AggregativeBootstrap
 from quapy.method.aggregative import PACC
 import quapy.functional as F
@@ -26,7 +25,6 @@ train, test = data.train_test
 # intervals around the point estimate, in this case, at 95% of confidence
 pacc = AggregativeBootstrap(PACC(), n_test_samples=500, confidence_level=0.95)
-
 
 with qp.util.temp_seed(0):
     # we train the quantifier the usual way
     pacc.fit(train)
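To make the "Optimized Computation" point concrete, here is a minimal sketch (plain NumPy, not QuaPy's API) of bootstrapping over pre-classified instances: the classifier runs once to produce posteriors, and only the cheap aggregation step is repeated per resample:

```python
import numpy as np

def bootstrap_prevalences(posteriors: np.ndarray, aggregate,
                          n_reps: int = 500, seed: int = 0) -> np.ndarray:
    # posteriors: (n_instances, n_classes) classifier outputs, computed once
    rng = np.random.default_rng(seed)
    n = posteriors.shape[0]
    estimates = []
    for _ in range(n_reps):
        idx = rng.integers(0, n, size=n)              # resample with replacement
        estimates.append(aggregate(posteriors[idx]))  # re-run only the aggregation
    return np.vstack(estimates)

# e.g., with a Probabilistic Classify & Count style aggregation:
# prevs = bootstrap_prevalences(P, aggregate=lambda P: P.mean(axis=0))
```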
@@ -447,8 +447,13 @@ class BayesianCC(AggregativeCrispQuantifier, WithConfidenceABC):
     `$ pip install quapy[bayes]`
 
     :param classifier: a scikit-learn Estimator that generates a classifier
-    :param val_split: a float in (0, 1) indicating the proportion of the training data to be used,
-        as a stratified held-out validation set, for generating classifier predictions.
+    :param val_split: specifies the data used for generating classifier predictions. This specification
+        can be made as a float in (0, 1), indicating the proportion of the training set to be extracted
+        as a stratified held-out validation set; or as an integer (default 5), indicating that the
+        predictions are to be generated in a `k`-fold cross-validation manner (with this integer
+        indicating the value for `k`); or as a collection defining the specific set of data to use
+        for validation. Alternatively, this set can be specified at fit time by indicating the exact
+        set of data on which the predictions are to be generated.
     :param num_warmup: number of warmup iterations for the MCMC sampler (default 500)
     :param num_samples: number of samples to draw from the posterior (default 1000)
     :param mcmc_seed: random seed for the MCMC sampler (default 0)
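A hedged instantiation sketch based on the docstring above; the `quapy.method.confidence` import path is taken from the example diff earlier in this commit, and all parameter values are the documented defaults:

```python
from sklearn.linear_model import LogisticRegression
from quapy.method.confidence import BayesianCC

quantifier = BayesianCC(
    classifier=LogisticRegression(),
    val_split=5,       # int k: predictions generated via k-fold cross-validation
    num_warmup=500,    # MCMC warmup iterations (documented default)
    num_samples=1000,  # posterior samples to draw (documented default)
    mcmc_seed=0,       # random seed for the MCMC sampler
)
```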