diff --git a/CHANGE_LOG.txt b/CHANGE_LOG.txt index 48ee2f3..a701186 100644 --- a/CHANGE_LOG.txt +++ b/CHANGE_LOG.txt @@ -1,4 +1,4 @@ -Change Log 0.1.10 +Change Log 0.2.0 ----------------- CLEAN TODO-FILE @@ -6,7 +6,7 @@ CLEAN TODO-FILE - Base code Refactor: - Removing coupling between LabelledCollection and quantification methods; the fit interface changes: def fit(data:LabelledCollection): -> def fit(X, y): - - Adding function "predict" (function "quantify" is still present as an alias) + - Adding function "predict" (function "quantify" is still present as an alias, for the nostalgic) - Aggregative methods's behavior in terms of fit_classifier and how to treat the val_split is now indicated exclusively at construction time, and it is no longer possible to indicate it at fit time. This is because, in v<=0.1.9, one could create a method (e.g., ACC) and then indicate: @@ -21,15 +21,16 @@ CLEAN TODO-FILE - A new parameter "on_calib_error" is passed to the constructor, which informs of the policy to follow in case the abstention's calibration functions failed (which happens sometimes). Options include: - 'raise': raises a RuntimeException (default) - - 'backup': reruns avoiding calibration + - 'backup': reruns by silently avoiding calibration - Parameter "recalib" has been renamed "calib" - Added aggregative bootstrap for deriving confidence regions (confidence intervals, ellipses in the simplex, or ellipses in the CLR space). This method is efficient as it leverages the two-phases of the aggregative quantifiers. This method applies resampling only to the aggregation phase, thus avoiding to train many quantifiers, or classify multiple times the instances of a sample. See: - quapy/method/confidence.py (new) - - the new example no. 15. -- BayesianCC moved to confidence.py, where methods having to do with confidence intervals live + - the new example no. 16.confidence_regions.py +- BayesianCC moved to confidence.py, where methods having to do with confidence intervals belong. +- Improved documentation of qp.plot module. Change Log 0.1.9 diff --git a/docs/source/manuals/datasets.md b/docs/source/manuals/datasets.md index c9e4169..b7d8827 100644 --- a/docs/source/manuals/datasets.md +++ b/docs/source/manuals/datasets.md @@ -340,10 +340,10 @@ and a set of test samples (for evaluation). QuaPy returns this data as a Labelle (training) and two generation protocols (for validation and test samples), as follows: ```python -training, val_generator, test_generator = fetch_lequa2022(task=task) +training, val_generator, test_generator = qp.datasets.fetch_lequa2022(task=task) ``` -See the `lequa2022_experiments.py` in the examples folder for further details on how to +See the `5a.lequa2022_experiments.py` in the examples folder for further details on how to carry out experiments using these datasets. The datasets are downloaded only once, and stored for fast reuse. @@ -365,6 +365,53 @@ Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2022). A Detailed Overview of LeQua@ CLEF 2022: Learning to Quantify. ``` +## LeQua 2024 Datasets + +QuaPy also provides the datasets used for the [LeQua 2024 competition](https://lequa2024.github.io/). 
+In brief, there are 4 tasks: +* T1: binary quantification (by sentiment) +* T2: multiclass quantification (28 classes, merchandise products) +* T3: ordinal quantification (5-stars sentiment ratings) +* T4: binary sentiment quantification under a combination of covariate shift and prior shift + +In all cases, the covariate space has 256 dimensions (extracted using the `ELECTRA-Small` model). + +Every task consists of a training set, a set of validation samples (for model selection) +and a set of test samples (for evaluation). QuaPy returns this data as a LabelledCollection +(training bags) and sampling generation protocols (for validation and test bags). +T3 also offers the possibility to obtain a series of training bags (in the form of a +sampling generation protocol) instead of one single training bag. Use it as follows: + +```python +training, val_generator, test_generator = qp.datasets.fetch_lequa2024(task=task) +``` + +See the `5b.lequa2024_experiments.py` in the examples folder for further details on how to +carry out experiments using these datasets. + +The datasets are downloaded only once, and stored for fast reuse. + +Some statistics are summarized below: + +| Dataset | classes | train size | validation samples | test samples | docs by sample | type | +|---------|:-------:|:-----------:|:------------------:|:------------:|:--------------:|:--------:| +| T1 | 2 | 5000 | 1000 | 5000 | 250 | vector | +| T2 | 28 | 20000 | 1000 | 5000 | 1000 | vector | +| T3 | 5 | 100 samples | 1000 | 5000 | 200 | vector | +| T4 | 2 | 5000 | 1000 | 5000 | 250 | vector | + +For further details on the datasets or the competition, we refer to +[the official site](https://lequa2024.github.io/data/) and +[the overview paper](http://nmis.isti.cnr.it/sebastiani/Publications/LQ2024.pdf). + +``` +Esuli, A., Moreo, A., Sebastiani, F., & Sperduti, G. (2024). +An Overview of LeQua 2024, the 2nd International Data Challenge on Learning to Quantify, +Proceedings of the 4th International Workshop on Learning to Quantify (LQ 2024), +ECML-PKDD 2024, Vilnius, Lithuania. +``` + + ## IFCB Plankton dataset IFCB is a dataset of plankton species in water samples hosted in Zenodo. @@ -410,8 +457,12 @@ Journal of Plankton Research 41 (4), 449-463](https://par.nsf.gov/servlets/purl/ ## Adding Custom Datasets +It is straightforward to import your own datasets into QuaPy. +In what follows, there are some code snippets for doing so; see also the example +[3.custom_collection.py](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/3.custom_collection.py). + QuaPy provides data loaders for simple formats dealing with -text, following the format: +text; for example, use `qp.data.reader.from_text` for the following format: ``` class-id \t first document's pre-processed text \n @@ -419,13 +470,16 @@ class-id \t second document's pre-processed text \n ... ``` -and sparse representations of the form: +or `qp.data.reader.from_sparse` for sparse representations of the form: ``` {-1, 0, or +1} col(int):val(float) col(int):val(float) ... \n ... ``` +both functions return a tuple `X, y` containing the instances (a list of strings, in the case of `from_text`) and the corresponding +labels, respectively.
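+For instance, loading a file in the text format above into a `LabelledCollection` boils down to the following (a minimal sketch; the file path is merely illustrative):
+
+```python
+import quapy as qp
+
+# parse a tab-separated file of the form "class-id \t document text"
+X, y = qp.data.reader.from_text('./my_data/train.dat')
+train = qp.data.LabelledCollection(X, y)
+print(train.classes_, train.prevalence())
+```
+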
+ The code in charge of loading a LabelledCollection is: ```python @@ -434,12 +488,13 @@ def load(cls, path:str, loader_func:callable): return LabelledCollection(*loader_func(path)) ``` -indicating that any _loader_func_ (e.g., a user-defined one) which +indicating that any `loader_func` (e.g., `from_text`, `from_sparse`, `from_csv`, or a user-defined one) which returns valid arguments for initializing a _LabelledCollection_ object will allow -to load any collection. In particular, the _LabelledCollection_ receives as -arguments the instances (as an iterable) and the labels (as an iterable) and, -additionally, the number of classes can be specified (it would otherwise be -inferred from the labels, but that requires at least one positive example for +to load any collection. More specifically, the _LabelledCollection_ receives as +arguments the _instances_ (iterable) and the _labels_ (iterable) and, +optionally, the number of classes (it would be +inferred from the labels if not indicated, but this requires at least one +positive example for all classes to be present in the collection). The same _loader_func_ can be passed to a Dataset, along with two @@ -452,20 +507,23 @@ import quapy as qp train_path = '../my_data/train.dat' test_path = '../my_data/test.dat' -def my_custom_loader(path): +def my_custom_loader(path, **custom_kwargs): with open(path, 'rb') as fin: ... return instances, labels -data = qp.data.Dataset.load(train_path, test_path, my_custom_loader) +data = qp.data.Dataset.load(train_path, test_path, my_custom_loader, **custom_kwargs) ``` ### Data Processing -QuaPy implements a number of preprocessing functions in the package _qp.data.preprocessing_, including: +QuaPy implements a number of preprocessing functions in the package `qp.data.preprocessing`, including: * _text2tfidf_: tfidf vectorization * _reduce_columns_: reducing the number of columns based on term frequency * _standardize_: transforms the column values into z-scores (i.e., subtracts the mean and normalizes by the standard deviation, so that the column values have zero mean and unit variance). * _index_: transforms textual tokens into lists of numeric ids + +These functions are applied to `Dataset` objects, and offer the possibility to apply the transformation +in place (thus modifying the original dataset), or to return a modified copy. \ No newline at end of file diff --git a/docs/source/manuals/evaluation.md b/docs/source/manuals/evaluation.md index e5404a3..aba7068 100644 --- a/docs/source/manuals/evaluation.md +++ b/docs/source/manuals/evaluation.md @@ -46,18 +46,18 @@ e.g.: ```python qp.environ['SAMPLE_SIZE'] = 100 # once for all -true_prev = np.asarray([0.5, 0.3, 0.2]) # let's assume 3 classes -estim_prev = np.asarray([0.1, 0.3, 0.6]) +true_prev = [0.5, 0.3, 0.2] # let's assume 3 classes +estim_prev = [0.1, 0.3, 0.6] error = qp.error.mrae(true_prev, estim_prev) print(f'mrae({true_prev}, {estim_prev}) = {error:.3f}') ``` will print: ``` -mrae([0.500, 0.300, 0.200], [0.100, 0.300, 0.600]) = 0.914 +mrae([0.5, 0.3, 0.2], [0.1, 0.3, 0.6]) = 0.914 ``` -Finally, it is possible to instantiate QuaPy's quantification +It is also possible to instantiate QuaPy's quantification error functions from strings using, e.g.: ```python @@ -85,7 +85,7 @@ print(f'MAE = {mae:.4f}') ``` It is often desirable to evaluate our system using more than one -single evaluatio measure. In this case, it is convenient to generate +single evaluation measure. In this case, it is convenient to generate a _report_.
A report in QuaPy is a dataframe accounting for all the true prevalence values with their corresponding prevalence values as estimated by the quantifier, along with the error each has given @@ -104,7 +104,7 @@ report['estim-prev'] = report['estim-prev'].map(F.strprev) print(report) print('Averaged values:') -print(report.mean()) +print(report.mean(numeric_only=True)) ``` This will produce an output like: @@ -141,11 +141,14 @@ true_prevs, estim_prevs = qp.evaluation.prediction(quantifier, protocol=prot) All the evaluation functions implement specific optimizations for speeding-up the evaluation of aggregative quantifiers (i.e., of instances of _AggregativeQuantifier_). + The optimization comes down to generating classification predictions (either crisp or soft) only once for the entire test set, and then applying the sampling procedure to the predictions, instead of generating samples of instances and then computing the classification predictions every time. This is only possible when the protocol -is an instance of _OnLabelledCollectionProtocol_. The optimization is only +is an instance of _OnLabelledCollectionProtocol_. + +The optimization is only carried out when the number of classification predictions thus generated would be smaller than the number of predictions required for the entire protocol; e.g., if the original dataset contains 1M instances, but the protocol is such that it would @@ -156,4 +159,4 @@ precompute all the predictions irrespectively of the number of instances and num Finally, this can be deactivated by setting _aggr_speedup=False_. Note that this optimization is not only applied for the final evaluation, but also for the internal evaluations carried out during _model selection_. Since these are typically many, the heuristic can help reduce the -execution time a lot. \ No newline at end of file +execution time significantly. \ No newline at end of file diff --git a/docs/source/manuals/methods.md b/docs/source/manuals/methods.md index 570075f..4fa8d08 100644 --- a/docs/source/manuals/methods.md +++ b/docs/source/manuals/methods.md @@ -1,7 +1,7 @@ # Quantification Methods Quantification methods can be categorized as belonging to -`aggregative` and `non-aggregative` groups. +`aggregative`, `non-aggregative`, and `meta-learning` groups. Most methods included in QuaPy at the moment are of type `aggregative` (though we plan to add many more methods in the near future), i.e., are methods characterized by the fact that @@ -12,21 +12,17 @@ Any quantifier in QuaPy should extend the class `BaseQuantifier`, and implement some abstract methods: ```python @abstractmethod - def fit(self, data: LabelledCollection): ... + def fit(self, X, y): ... @abstractmethod - def quantify(self, instances): ... + def predict(self, X): ... ``` The meaning of those functions should be familiar to those used to work with scikit-learn since the class structure of QuaPy is directly inspired by scikit-learn's _Estimators_. Functions -`fit` and `quantify` are used to train the model and to provide -class estimations (the reason why -scikit-learn' structure has not been adopted _as is_ in QuaPy responds to -the fact that scikit-learn's `predict` function is expected to return -one output for each input element --e.g., a predicted label for each -instance in a sample-- while in quantification the output for a sample -is one single array of class prevalences). +`fit` and `predict` (for which there is an alias `quantify`) +are used to train the model and to provide +class estimations.
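+In other words, training a quantifier and obtaining prevalence estimates for a test set reduces to a couple of calls; a minimal sketch (the dataset and the learner are merely illustrative):
+
+```python
+import quapy as qp
+from sklearn.linear_model import LogisticRegression
+
+train, test = qp.datasets.fetch_UCIBinaryDataset('haberman').train_test
+
+model = qp.method.aggregative.PACC(LogisticRegression())
+model.fit(*train.Xy)                 # the new fit(X, y) interface
+prevalences = model.predict(test.X)  # a single array of class prevalence values
+```
+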
Quantifiers also extend from scikit-learn's `BaseEstimator`, in order to simplify the use of `set_params` and `get_params` used in [model selection](./model-selection). @@ -40,21 +36,26 @@ The methods that any `aggregative` quantifier must implement are: ```python @abstractmethod - def aggregation_fit(self, classif_predictions: LabelledCollection, data: LabelledCollection): + def aggregation_fit(self, classif_predictions, labels): @abstractmethod - def aggregate(self, classif_predictions:np.ndarray): ... + def aggregate(self, classif_predictions): ... ``` -These two functions replace the `fit` and `quantify` methods, since those -come with default implementations. The `fit` function is provided and amounts to: +The argument `classif_predictions` is whatever the method `classify` returns. +QuaPy comes with default implementations that cover most common cases, but you can +override `classify` in case your method requires further or different information to work. + +These two functions replace the `fit` and `predict` methods, which +come with default implementations. For instance, the `fit` function is +provided and amounts to: ```python -def fit(self, data: LabelledCollection, fit_classifier=True, val_split=None): - self._check_init_parameters() - classif_predictions = self.classifier_fit_predict(data, fit_classifier, predict_on=val_split) - self.aggregation_fit(classif_predictions, data) - return self + def fit(self, X, y): + self._check_init_parameters() + classif_predictions, labels = self.classifier_fit_predict(X, y) + self.aggregation_fit(classif_predictions, labels) + return self ``` Note that this function fits the classifier, and generates the predictions. This is assumed @@ -72,11 +73,11 @@ overridden (if needed) and allows the method to quickly raise any exception based found in the `__init__` arguments, thus avoiding to break after training the classifier and generating predictions. -Similarly, the function `quantify` is provided, and amounts to: +Similarly, the function `predict` (alias `quantify`) is provided, and amounts to: ```python -def quantify(self, instances): - classif_predictions = self.classify(instances) +def predict(self, X): + classif_predictions = self.classify(X) return self.aggregate(classif_predictions) ``` @@ -84,12 +85,14 @@ in which only the function `aggregate` is required to be overridden in most cases Aggregative quantifiers are expected to maintain a classifier (which is accessed through the `@property` `classifier`). This classifier is -given as input to the quantifier, and can be already fit -on external data (in which case, the `fit_learner` argument should -be set to False), or be fit by the quantifier's fit (default). +given as input to the quantifier, and will be trained by the quantifier's fit (default). +Alternatively, the classifier can be already fit on external data; in this case, the `fit_learner` +argument in the `__init__` should be set to False (see [4.using_pretrained_classifier.py](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/4.using_pretrained_classifier.py) +for a full code example). -The above patterns (in training: fit the classifier, then fit the aggregation; -in test: classify, then aggregate) allows QuaPy to optimize many internal procedures. +The above patterns (in training: (i) fit the classifier, then (ii) fit the aggregation; +in test: (i) classify, then (ii) aggregate) allow QuaPy to optimize many internal procedures, +on the grounds that steps (i) are slower than steps (ii).
In particular, the model selection routine takes advantage of this two-step process and generates classifiers only for the valid combinations of hyperparameters of the classifier, and then _clones_ these classifiers and explores the combinations @@ -124,6 +127,7 @@ import quapy.functional as F from sklearn.svm import LinearSVC training, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test +Xtr, ytr = training.Xy # instantiate a classifier learner, in this case a SVM svm = LinearSVC() @@ -131,7 +135,7 @@ svm = LinearSVC() # instantiate a Classify & Count with the SVM # (an alias is available in qp.method.aggregative.ClassifyAndCount) model = qp.method.aggregative.CC(svm) -model.fit(training) +model.fit(Xtr, ytr) estim_prevalence = model.predict(test.instances) ``` @@ -153,26 +157,14 @@ predictions. This parameter can also be set with an integer, indicating that the parameters should be estimated by means of _k_-fold cross-validation, for which the integer indicates the number _k_ of folds (the default value is 5). Finally, `val_split` can be set to a -specific held-out validation set (i.e., an instance of `LabelledCollection`). - -The specification of `val_split` can be -postponed to the invokation of the fit method (if `val_split` was also -set in the constructor, the one specified at fit time would prevail), -e.g.: - -```python -model = qp.method.aggregative.ACC(svm) -# perform 5-fold cross validation for estimating ACC's parameters -# (overrides the default val_split=0.4 in the constructor) -model.fit(training, val_split=5) -``` +specific held-out validation set (i.e., a tuple `(X,y)`). The following code illustrates the case in which PCC is used: ```python model = qp.method.aggregative.PCC(svm) -model.fit(training) -estim_prevalence = model.predict(test.instances) +model.fit(Xtr, ytr) +estim_prevalence = model.predict(test.X) print('classifier:', model.classifier) ``` In this case, QuaPy will print: @@ -185,11 +177,11 @@ is not a probabilistic classifier (i.e., it does not implement the `predict_proba` method) and so, the classifier will be converted to a probabilistic one through [calibration](https://scikit-learn.org/stable/modules/calibration.html). As a result, the classifier that is printed in the second line points -to a `CalibratedClassifier` instance. Note that calibration can only -be applied to hard classifiers when `fit_learner=True`; an exception +to a `CalibratedClassifierCV` instance. Note that calibration can only +be applied to hard classifiers if `fit_learner=True`; an exception will be raised otherwise. -Lastly, everything we said aboud ACC and PCC +Lastly, everything we said about ACC and PCC applies to PACC as well.
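+For example, ACC's correction can be estimated via cross-validation, or on a held-out validation set passed at construction time; a minimal sketch reusing the `svm`, `Xtr`, `ytr`, and `test` objects defined above (`Xval`, `yval` in the commented line stand for a hypothetical held-out split):
+
+```python
+# estimate ACC's correction parameters via 5-fold cross-validation (the default)
+model = qp.method.aggregative.ACC(svm, val_split=5)
+model.fit(Xtr, ytr)
+estim_prevalence = model.predict(test.X)
+
+# alternatively, pass the validation data explicitly as a tuple (X, y)
+# model = qp.method.aggregative.ACC(svm, val_split=(Xval, yval))
+```
+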
_New in v0.1.9_: quantifiers ACC and PACC now have three additional arguments: `method`, `solver` and `norm`: @@ -259,22 +251,28 @@ An example of use can be found below: import quapy as qp from sklearn.linear_model import LogisticRegression -dataset = qp.datasets.fetch_twitter('hcr', pickle=True) +train, test = qp.datasets.fetch_twitter('hcr', pickle=True).train_test model = qp.method.aggregative.EMQ(LogisticRegression()) -model.fit(dataset.training) -estim_prevalence = model.predict(dataset.test.instances) +model.fit(*train.Xy) +estim_prevalence = model.predict(test.X) ``` -_New in v0.1.7_: EMQ now accepts two new parameters in the construction method, namely -`exact_train_prev` which allows to use the true training prevalence as the departing -prevalence estimation (default behaviour), or instead an approximation of it as +EMQ accepts additional parameters in the construction method: +* `exact_train_prev`: set to True for using the true training prevalence as the departing +prevalence estimation (default behaviour), or to False for using an approximation of it as suggested by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html) -(by setting `exact_train_prev=False`). -The other parameter is `recalib` which allows to indicate a calibration method, among those +* `calib`: allows one to indicate a calibration method, among those proposed by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html), -including the Bias-Corrected Temperature Scaling, Vector Scaling, etc. -See the API documentation for further details. +including Bias-Corrected Temperature Scaling +(`bcts`), Vector Scaling (`vs`), No-Bias Vector Scaling (`nbvs`), +or Temperature Scaling (`ts`); default is `None` (no calibration). +* `on_calib_error`: indicates the policy to follow in case the calibrator fails at runtime. + Options include `raise` (default), in which case a RuntimeException is raised; and `backup`, in which + case the calibrator is silently skipped. + +You can use the class method `EMQ_BCTS` to effortlessly instantiate EMQ with the best performing +heuristics found by [Alexandari et al. (2020)](http://proceedings.mlr.press/v119/alexandari20a.html). See the API documentation for further details. ### Hellinger Distance y (HDy) @@ -289,11 +287,10 @@ This method works with a probabilistic classifier (hard classifiers can be used as well and will be calibrated) and requires a validation set to estimate the parameters of the mixture model. Just like ACC and PACC, this quantifier receives a `val_split` argument -in the constructor (or in the fit method, in which case the previous -value is overridden) that can either be a float indicating the proportion +in the constructor that can either be a float indicating the proportion of training data to be taken as the validation set (in a random -stratified split), or a validation set (i.e., an instance of -`LabelledCollection`) itself. +stratified split), or the validation set itself (i.e., a tuple +`(X,y)`). HDy was proposed as a binary quantifier and the implementation provided in QuaPy accepts only binary datasets.
@@ -309,11 +306,11 @@ dataset = qp.datasets.fetch_reviews('hp', pickle=True) qp.data.preprocessing.text2tfidf(dataset, min_df=5, inplace=True) model = qp.method.aggregative.HDy(LogisticRegression()) -model.fit(dataset.training) -estim_prevalence = model.predict(dataset.test.instances) +model.fit(*dataset.training.Xy) +estim_prevalence = model.predict(dataset.test.X) ``` -_New in v0.1.7:_ QuaPy now provides an implementation of the generalized +QuaPy also provides an implementation of the generalized "Distribution Matching" approaches for multiclass, inspired by the framework of [Firat (2016)](https://arxiv.org/abs/1606.00868). One can instantiate a variant of HDy for multiclass quantification as follows: @@ -322,17 +319,22 @@ a variant of HDy for multiclass quantification as follows: mutliclassHDy = qp.method.aggregative.DMy(classifier=LogisticRegression(), divergence='HD', cdf=False) ``` -_New in v0.1.7:_ QuaPy now provides an implementation of the "DyS" +QuaPy also provides an implementation of the "DyS" framework proposed by [Maletzke et al (2020)](https://ojs.aaai.org/index.php/AAAI/article/view/4376) and the "SMM" method proposed by [Hassan et al (2019)](https://ieeexplore.ieee.org/document/9260028) (thanks to _Pablo González_ for the contributions!) ### Threshold Optimization methods -_New in v0.1.7:_ QuaPy now implements Forman's threshold optimization methods; +QuaPy implements Forman's threshold optimization methods; see, e.g., [(Forman 2006)](https://dl.acm.org/doi/abs/10.1145/1150402.1150423) and [(Forman 2008)](https://link.springer.com/article/10.1007/s10618-008-0097-y). -These include: T50, MAX, X, Median Sweep (MS), and its variant MS2. +These include: `T50`, `MAX`, `X`, Median Sweep (`MS`), and its variant `MS2`. + +These methods are binary-only and implement different heuristics for +improving the stability of the denominator of the ACC adjustment (`tpr-fpr`). +The methods are called "threshold" since said heuristics have to do +with different choices of the underlying classifier's threshold. ### Explicit Loss Minimization @@ -415,16 +417,18 @@ model.fit(dataset.training) estim_prevalence = model.predict(dataset.test.instances) ``` -Check the examples on [explicit_loss_minimization](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/5.explicit_loss_minimization.py) +Check the examples on [explicit loss minimization](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/17.explicit_loss_minimization.py) and on [one versus all quantification](https://github.com/HLT-ISTI/QuaPy/blob/devel/examples/10.one_vs_all.py) for more details. +**Note** that the _one versus all_ approach is considered inappropriate under prior probability shift, though. ### Kernel Density Estimation methods (KDEy) -_New in v0.1.8_: QuaPy now provides implementations for the three variants +QuaPy provides implementations for the three variants of KDE-based methods proposed in -_[Moreo, A., González, P. and del Coz, J.J., 2023. +_[Moreo, A., González, P. and del Coz, J.J.. Kernel Density Estimation for Multiclass Quantification. -arXiv preprint arXiv:2401.00490](https://arxiv.org/abs/2401.00490)_. +Machine Learning. Vol 114 (92), 2025](https://link.springer.com/article/10.1007/s10994-024-06726-5)_ +(a [preprint](https://arxiv.org/abs/2401.00490) is available online). 
The variants differ in the divergence metric to be minimized: - KDEy-HD: minimizes the (squared) Hellinger Distance and solves the problem via a Monte Carlo approach @@ -435,22 +439,27 @@ These methods are specifically devised for multiclass problems (although they ca binary problems too). All KDE-based methods depend on the hyperparameter `bandwidth` of the kernel. Typical values -that can be explored in model selection range in [0.01, 0.25]. The methods' performance -vary smoothing with smooth variations of this hyperparameter. +that can be explored in model selection range in [0.01, 0.25]. Previous experiments reveal the methods' performance +varies smoothly at small variations of this hyperparameter. ## Composable Methods -The [](quapy.method.composable) module allows the composition of quantification methods from loss functions and feature transformations. Any composed method solves a linear system of equations by minimizing the loss after transforming the data. Methods of this kind include ACC, PACC, HDx, HDy, and many other well-known methods, as well as an unlimited number of re-combinations of their building blocks. +The `quapy.method.composable` module integrates [qunfold](https://github.com/mirkobunse/qunfold), which allows the composition +of quantification methods from loss functions and feature transformations (thanks to Mirko Bunse for the integration!). + +Any composed method solves a linear system of equations by minimizing the loss after transforming the data. Methods of this kind include ACC, PACC, HDx, HDy, and many other well-known methods, as well as an unlimited number of re-combinations of their building blocks. ### Installation ```sh pip install --upgrade pip setuptools wheel pip install "jax[cpu]" -pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4" +pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.5" ``` +**Note:** since version 0.2.0, QuaPy is only compatible with qunfold >=0.1.5. + ### Basics The composition of a method is implemented through the [](quapy.method.composable.ComposableQuantifier) class. Its documentation also features an example to get you started in composing your own methods. @@ -529,10 +538,11 @@ from quapy.method.meta import Ensemble from sklearn.linear_model import LogisticRegression dataset = qp.datasets.fetch_UCIBinaryDataset('haberman') +train, test = dataset.train_test model = Ensemble(quantifier=ACC(LogisticRegression()), size=30, policy='ave', n_jobs=-1) -model.fit(dataset.training) -estim_prevalence = model.predict(dataset.test.instances) +model.fit(*train.Xy) +estim_prevalence = model.predict(test.X) ``` Other aggregation policies implemented in QuaPy include: @@ -579,13 +589,13 @@ learner = NeuralClassifierTrainer(cnn, device='cuda') # train QuaNet model = QuaNet(learner, device='cuda') -model.fit(dataset.training) -estim_prevalence = model.predict(dataset.test.instances) +model.fit(*dataset.training.Xy) +estim_prevalence = model.predict(dataset.test.X) ``` ## Confidence Regions for Class Prevalence Estimation -_(New in v0.1.10!)_ Some quantification methods go beyond providing a single point estimate of class prevalence values and also produce confidence regions, which characterize the uncertainty around the point estimate. +_(New in v0.2.0!)_ Some quantification methods go beyond providing a single point estimate of class prevalence values and also produce confidence regions, which characterize the uncertainty around the point estimate.
In QuaPy, two such methods are currently implemented: * Aggregative Bootstrap: The Aggregative Bootstrap method extends any aggregative quantifier by generating confidence regions for class prevalence estimates through bootstrapping. Key features of this method include: @@ -593,9 +603,9 @@ _(New in v0.1.10!)_ Some quantification methods go beyond providing a single poi During training, bootstrap repetitions are performed only after training the classifier once. These repetitions are used to train multiple aggregation functions. During inference, bootstrap is applied over pre-classified test instances. * General Applicability: Aggregative Bootstrap can be applied to any aggregative quantifier. - For further information, check the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples) provided. + For further information, check the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples/16.confidence_regions.py) provided. -* BayesianCC: is a Bayesian variant of the Adjusted Classify & Count (ACC) quantifier (see more details in [Aggregative Quantifiers](#bayesiancc)). +* BayesianCC: a Bayesian variant of the Adjusted Classify & Count (ACC) quantifier; see more details in the [example](https://github.com/HLT-ISTI/QuaPy/tree/master/examples/14.bayesian_quantification.py) provided. Confidence regions are constructed around a point estimate, which is typically computed as the mean value of a set of samples. The confidence region can be instantiated in three ways: diff --git a/docs/source/manuals/model-selection.md b/docs/source/manuals/model-selection.md index 097f902..6470ebf 100644 --- a/docs/source/manuals/model-selection.md +++ b/docs/source/manuals/model-selection.md @@ -87,7 +87,7 @@ model = qp.model_selection.GridSearchQ( error='mae', # the error to optimize is the MAE (a quantification-oriented loss) refit=True, # retrain on the whole labelled set once done verbose=True # show information as the process goes on -).fit(training) +).fit(*training.Xy) print(f'model selection ended: best hyper-parameters={model.best_params_}') model = model.best_model_ @@ -133,7 +133,7 @@ learner = GridSearchCV( LogisticRegression(), param_grid={'C': np.logspace(-4, 5, 10), 'class_weight': ['balanced', None]}, cv=5) -model = DistributionMatching(learner).fit(dataset.train) +model = DistributionMatching(learner).fit(*dataset.train.Xy) ``` However, this is conceptually flawed, since the model should be diff --git a/docs/source/manuals/plotting.md b/docs/source/manuals/plotting.md index ec080da..67f9f16 100644 --- a/docs/source/manuals/plotting.md +++ b/docs/source/manuals/plotting.md @@ -2,6 +2,9 @@ The module _qp.plot_ implements some basic plotting functions that can help analyse the performance of a quantification method. +See the provided +[code example](https://github.com/HLT-ISTI/QuaPy/blob/master/examples/13.plotting.py) +for a complete illustration.
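+As a quick taste, once the outcomes of the experiments have been gathered (method names, true prevalence values, and estimated prevalence values, as described next), producing a diagonal plot boils down to a single call; a sketch, in which the save path is merely illustrative:
+
+```python
+qp.plot.binary_diagonal(method_names, true_prevs, estim_prevs,
+                        savepath='./plots/bin_diag.png')
+```
+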
All plotting functions receive as inputs the outcomes of some experiments and include, for each experiment, @@ -77,7 +80,7 @@ def gen_data(): method_names, true_prevs, estim_prevs, tr_prevs = [], [], [], [] for method_name, model in models(): - model.fit(train) + model.fit(*train.Xy) true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0)) method_names.append(method_name) @@ -171,7 +174,7 @@ def gen_data(): training_size = 5000 # since the problem is binary, it suffices to specify the negative prevalence, since the positive is constrained train_sample = train.sampling(training_size, 1-training_prevalence) - model.fit(train_sample) + model.fit(*train_sample.Xy) true_prev, estim_prev = qp.evaluation.prediction(model, APP(test, repeats=100, random_state=0)) method_name = 'CC$_{'+f'{int(100*training_prevalence)}' + '\%}$' method_data.append((method_name, true_prev, estim_prev, train_sample.prevalence())) diff --git a/docs/source/manuals/protocols.md index 1d6193e..17bc41a 100644 --- a/docs/source/manuals/protocols.md +++ b/docs/source/manuals/protocols.md @@ -1,7 +1,5 @@ # Protocols -_New in v0.1.7!_ - Quantification methods are expected to behave robustly in the presence of shift. For this reason, quantification methods need to be confronted with samples exhibiting widely varying amounts of shift. @@ -106,15 +104,16 @@ train, test = qp.datasets.fetch_reviews('imdb', tfidf=True, min_df=5).train_test # model selection train, val = train.split_stratified(train_prop=0.75) +Xtr, ytr = train.Xy quantifier = qp.model_selection.GridSearchQ( quantifier, param_grid={'classifier__C': np.logspace(-2, 2, 5)}, protocol=APP(val) # <- this is the protocol we use for generating validation samples -).fit(train) +).fit(Xtr, ytr) # default values are n_prevalences=21, repeats=10, random_state=0; this is equivalent to: # val_app = APP(val, n_prevalences=21, repeats=10, random_state=0) -# quantifier = GridSearchQ(quantifier, param_grid, protocol=val_app).fit(train) +# quantifier = GridSearchQ(quantifier, param_grid, protocol=val_app).fit(Xtr, ytr) # evaluation with APP mae = qp.evaluation.evaluate(quantifier, protocol=APP(test), error_metric='mae')