Adapt examples; remaining: example 4 onwards; not working: 15 (qunfold)

Solve the warnings issue; right now there is a warning ignore in method/__init__.py

Add 'platt' to calib options in EMQ?

Allow n_prevpoints in APP to be specified by a user-defined grid?

Update READMEs, wiki, & examples for the new fit-predict interface

Add the fix suggested by Alexander: for a more general application, first establish a per-class threshold of
plausible prevalence based on the number of actual positives and the required sample size; e.g., for
sample_size=100 and actual positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled
at a prevalence of at most 0.1, while the others can be sampled at a prevalence of up to 1.0. Then, when a
prevalence value is requested, e.g., [0.33, 0.33, 0.33], we may either clip each value and normalize (as
suggested for the extreme case, e.g., [0.1, 0.33, 0.33]/sum) or scale each value by the per-class thresholds,
i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum. See the sketch at the end of this file.
    - This affects LabelledCollection
    - This functionality should be accessible via the sampling protocols and the evaluation functions

Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr
in order to have access to classes_; think also of the case in which the precomputed outputs have already been
generated, as in the unifying-problems code.

To remove LabelledCollection from the methods:
    - The mess comes from the confusing semantics of fit in aggregative methods, which takes 3 parameters:
        - data: a LabelledCollection, which can be:
            - the training set, if the classifier has to be trained
            - None, if the classifier does not have to be trained
            - the validation set (which conflicts with val_split), if the classifier does not have to be trained
        - fit_classifier: says whether the classifier has to be trained or not, and this changes the semantics of the other parameters
        - val_split, which can be:
            - an integer: the number of folds for kFCV, which implies fit_classifier=True and data=the whole training set
            - a fraction in [0,1]: the portion we use for validation; implies fit_classifier=True and data=train+val
            - a LabelledCollection: a specific validation set; implies neither fit_classifier=True nor False
    - The way to remove the methods' dependency on LabelledCollection should be as follows (see the interface sketch at the end of this file):
        - The constructor states whether the classifier received as a parameter has to be trained or is already trained; i.e., there is a fit_classifier=True or False.
        - fit_classifier=True:
            - data in fit is the whole training set, including validation and everything
            - val_split:
                - int: number of folds for kFCV
                - a proportion in [0,1]
        - fit_classifier=False:

- [TODO] document confidence in manuals
- [TODO] Test return_type="index" in protocols and finish the "distributing_samples.py" example
- [TODO] Add EDy (an implementation is available at quantificationlib)
- [TODO] add ensemble methods SC-MQ, MC-SQ, MC-MQ
- [TODO] add HistNetQ
- [TODO] add CDE-iteration and Bayes-CDE methods
- [TODO] add Friedman's method and DeBias
- [TODO] check the ignore-warning stuff; see https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings
         (a minimal pattern for a local suppression is sketched at the end of this file)
- [TODO] nmd and md are not selectable from qp.evaluation.evaluate as strings
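
A minimal sketch of Alexander's per-class prevalence threshold idea, using the numbers from the note above.
The helper names (per_class_thresholds, adjust_prevalence) are illustrative assumptions, not existing QuaPy
functions:

```python
import numpy as np

def per_class_thresholds(class_counts, sample_size):
    # maximum plausible prevalence per class: a class with c actual positives
    # cannot fill more than c of the sample_size slots
    return np.minimum(np.asarray(class_counts, dtype=float) / sample_size, 1.0)

def adjust_prevalence(requested, thresholds, strategy='clip'):
    requested = np.asarray(requested, dtype=float)
    if strategy == 'clip':
        adjusted = np.minimum(requested, thresholds)  # clip each value to its threshold...
    elif strategy == 'scale':
        adjusted = requested * thresholds             # ...or scale each value by its threshold
    else:
        raise ValueError(f'unknown strategy {strategy}')
    return adjusted / adjusted.sum()                  # renormalize to a valid prevalence vector

# values from the note: sample_size=100, actual positives [10, 100, 500]
thr = per_class_thresholds([10, 100, 500], sample_size=100)          # -> [0.1, 1.0, 1.0]
print(adjust_prevalence([0.33, 0.33, 0.33], thr, strategy='clip'))   # [0.1, 0.33, 0.33]/sum
print(adjust_prevalence([0.33, 0.33, 0.33], thr, strategy='scale'))  # [0.33*0.1, 0.33, 0.33]/sum
```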
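
Regarding removing LabelledCollection from the methods: a rough sketch of what moving fit_classifier into the
constructor could look like. The class and method names (AggregativeQuantifierSketch, aggregation_fit) are
illustrative assumptions, not QuaPy's actual API; sklearn utilities are used only to show the two meanings of
val_split when fit_classifier=True:

```python
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict, train_test_split

class AggregativeQuantifierSketch:
    def __init__(self, classifier, fit_classifier=True, val_split=5):
        self.classifier = classifier
        self.fit_classifier = fit_classifier
        self.val_split = val_split  # int (kFCV folds) or float in (0,1) (held-out fraction)

    def fit(self, X, y):
        if self.fit_classifier:
            if isinstance(self.val_split, int):
                # kFCV: obtain validation outputs via cross-validation, then train on everything
                val_posteriors = cross_val_predict(clone(self.classifier), X, y,
                                                   cv=self.val_split, method='predict_proba')
                val_labels = y
                self.classifier.fit(X, y)
            else:
                # fraction in (0,1): hold out part of the training data for validation
                Xtr, Xval, ytr, yval = train_test_split(X, y, test_size=self.val_split, stratify=y)
                self.classifier.fit(Xtr, ytr)
                val_posteriors, val_labels = self.classifier.predict_proba(Xval), yval
        else:
            # the classifier is already trained: X, y act directly as the validation data
            val_posteriors, val_labels = self.classifier.predict_proba(X), y
        self.aggregation_fit(val_posteriors, val_labels)
        return self

    def aggregation_fit(self, posteriors, labels):
        ...  # method-specific calibration/correction goes here
```

With this arrangement, fit(X, y) never needs a LabelledCollection, and the meaning of val_split no longer
depends on whether the classifier is pre-trained.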
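
For the ignore-warning TODO: the pattern from the linked Python docs suppresses a warning only around the
offending call, instead of the module-wide ignore currently in method/__init__.py (noisy_call is a
hypothetical stand-in for whatever call triggers the warning):

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # or filter by category/message for finer control
    noisy_call()                     # hypothetical: the call that emits the warning
```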