From 7f698b511ecb3662a3ccff19e3257bbe2672922c Mon Sep 17 00:00:00 2001
From: Alejandro Moreo
Date: Mon, 6 Oct 2025 16:12:15 +0200
Subject: [PATCH] cleaning readmes

---
 CHANGE_LOG.txt |  2 --
 TODO.txt       | 40 +++-------------------------------------
 2 files changed, 3 insertions(+), 39 deletions(-)

diff --git a/CHANGE_LOG.txt b/CHANGE_LOG.txt
index a701186..0b7a18c 100644
--- a/CHANGE_LOG.txt
+++ b/CHANGE_LOG.txt
@@ -1,8 +1,6 @@
 Change Log 0.2.0
 -----------------
 
-CLEAN TODO-FILE
-
 - Base code Refactor:
   - Removing coupling between LabelledCollection and quantification methods; the fit interface changes:
     def fit(data:LabelledCollection): -> def fit(X, y):
diff --git a/TODO.txt b/TODO.txt
index de40ed9..4b80a34 100644
--- a/TODO.txt
+++ b/TODO.txt
@@ -1,53 +1,19 @@
-Adapt examples; remaining: examples 4 onwards
-not working: 15 (qunfold)
-
 Solve the warnings issue; right now there is a warning ignore in method/__init__.py:
 
 Add 'platt' to calib options in EMQ?
 
 Allow n_prevpoints in APP to be specified by a user-defined grid?
 
-Update READMEs, wiki, & examples for the new fit-predict interface
-
-Add the fix suggested by Alexander:
-
-For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
+Add the fix suggested by Alexander?
+"For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
 based on the number of actual positives and the required sample size; e.g., for sample_size=100 and actual
 positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled at most at 0.1 prevalence,
 while the others can be sampled at up to 1.0 prevalence. Then, when a prevalence value is requested, e.g.,
 [0.33, 0.33, 0.33], we may either clip each value and normalize (as you suggest for the extreme case, e.g.,
 [0.1, 0.33, 0.33]/sum) or
-scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum.
+scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum."
 - This affects LabelledCollection
 - This functionality should be accessible via sampling protocols and evaluation functions
 
-Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr in
-order to have access to classes_; think also of the case in which the precomputed outputs have already been
-generated, as in the unifying problems code.
-
-To remove LabelledCollection from the methods:
-
-- The mess comes from the confusing semantics of fit in aggregative methods, which receives 3 parameters:
-  - data: LabelledCollection, which can be:
-    - the training set, if the classifier has to be trained
-    - None, if the classifier does not have to be trained
-    - the validation set (which conflicts with val_split), if the classifier does not have to be trained
-  - fit_classifier: says whether the classifier has to be trained or not, and this changes the semantics of the other two
-  - val_split: which can be:
-    - an int: the number of folds in kfcv, which implies fit_classifier=True and data=the whole training set
-    - a fraction in [0,1]: the proportion used for validation; implies fit_classifier=True and data=train+val
-    - a LabelledCollection: the specific validation set; implies neither fit_classifier=True nor False
-- The way to remove the methods' dependency on LabelledCollection should be as follows:
-  - The constructor states whether the classifier received as a parameter has to be trained or is already trained;
-    that is, there is a fit_classifier=True or False.
-  - fit_classifier=True:
-    - data in fit is the whole training set, validation included
-    - val_split:
-      - int: number of folds in kfcv
-      - a proportion in [0,1]
-  - fit_classifier=False:
-    -
-
 [TODO] document confidence in manuals
 [TODO] Test the return_type="index" in protocols and finish the "distributing_samples.py" example
 [TODO] Add EDy (an implementation is available at quantificationlib)
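
A minimal worked sketch of the per-class threshold idea quoted above, assuming NumPy; the helper names plausible_thresholds, clip_and_normalize and scale_by_thresholds are hypothetical, not QuaPy API:

    import numpy as np

    def plausible_thresholds(n_positives, sample_size):
        # a class with n actual positives cannot be sampled above
        # n/sample_size prevalence (capped at 1.0)
        return np.minimum(np.asarray(n_positives) / sample_size, 1.0)

    def clip_and_normalize(requested, thresholds):
        # option 1: clip each requested prevalence to its threshold, then renormalize
        clipped = np.minimum(np.asarray(requested), thresholds)
        return clipped / clipped.sum()

    def scale_by_thresholds(requested, thresholds):
        # option 2: scale each requested prevalence by its threshold, then renormalize
        scaled = np.asarray(requested) * thresholds
        return scaled / scaled.sum()

    thr = plausible_thresholds([10, 100, 500], sample_size=100)  # -> [0.1, 1.0, 1.0]
    print(clip_and_normalize([0.33, 0.33, 0.33], thr))   # [0.1, 0.33, 0.33]/0.76
    print(scale_by_thresholds([0.33, 0.33, 0.33], thr))  # [0.033, 0.33, 0.33]/0.693

Both variants renormalize, so either way the result is a valid prevalence vector summing to 1.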
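
A sketch of the constructor-based refactor described in the removed TODO note above, i.e., fit_classifier fixed at construction time and fit taking plain (X, y) arrays; the class and method names are illustrative, not QuaPy's actual implementation, and scikit-learn is assumed for cross-validation:

    from sklearn.base import clone
    from sklearn.model_selection import cross_val_predict, train_test_split

    class AggregativeQuantifierSketch:
        def __init__(self, classifier, fit_classifier=True, val_split=5):
            # whether the classifier must be trained is decided here, once,
            # so fit(X, y) needs no LabelledCollection nor overloaded 'data' semantics
            self.classifier = classifier
            self.fit_classifier = fit_classifier
            self.val_split = val_split

        def fit(self, X, y):
            if self.fit_classifier:
                if isinstance(self.val_split, int) and self.val_split > 1:
                    # val_split as int: number of folds in kfcv; validation
                    # posteriors come from cross-validation over all of (X, y)
                    val_post = cross_val_predict(clone(self.classifier), X, y,
                                                 cv=self.val_split, method='predict_proba')
                    self.classifier.fit(X, y)
                    val_y = y
                else:
                    # val_split as a proportion in [0,1]: hold out that fraction
                    Xtr, Xval, ytr, val_y = train_test_split(
                        X, y, test_size=self.val_split, stratify=y)
                    self.classifier.fit(Xtr, ytr)
                    val_post = self.classifier.predict_proba(Xval)
            else:
                # pre-trained classifier: (X, y) play the role of the validation set
                val_post = self.classifier.predict_proba(X)
                val_y = y
            self._aggregation_fit(val_post, val_y)
            return self

        def _aggregation_fit(self, posteriors, y):
            # method-specific: learn the aggregation function from validation outputs
            raise NotImplementedError

Under this design the pre-trained classifier issue mentioned above also goes away: with fit_classifier=False the quantifier reads classes_ from the already-trained classifier instead of requiring a mock_lr.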