cleaning readmes
parent 02b6a0cb05
commit 7f698b511e
@@ -1,8 +1,6 @@
Change Log 0.2.0
----------------

CLEAN TODO-FILE

- Base code refactor:
    - Removing the coupling between LabelledCollection and the quantification methods; the fit interface
      changes (see the sketch below):
        def fit(data:LabelledCollection): -> def fit(X, y):
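
A minimal sketch of what the change amounts to, assuming scikit-learn-style instance/label arrays
(ExampleQuantifier is a hypothetical class, not part of the codebase):

    import numpy as np

    class ExampleQuantifier:
        # old interface, coupled to the data wrapper:
        #   def fit(self, data: LabelledCollection): ...
        # new interface: plain instance/label arrays
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            # ...train the underlying classifier and aggregation here...
            return self

    X = np.random.rand(100, 5)
    y = np.random.randint(2, size=100)
    quantifier = ExampleQuantifier().fit(X, y)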

TODO.txt
@@ -1,53 +1,19 @@
Adapt examples; remaining: example 4 onwards
    not working: 15 (qunfold)

Solve the warnings issue; right now there is a warning ignore in method/__init__.py

Add 'platt' to calib options in EMQ?

Allow n_prevpoints in APP to be specified by a user-defined grid? (see the sketch below)

Update READMEs, wiki, & examples for the new fit-predict interface
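
A sketch of what a user-defined grid could replace, assuming APP enumerates equally spaced prevalence
vectors over the simplex (prevalence_grid and custom_ticks are hypothetical, not current API):

    import itertools
    import numpy as np

    def prevalence_grid(ticks, n_classes):
        # all combinations of tick values that form a valid prevalence vector
        return [p for p in itertools.product(ticks, repeat=n_classes)
                if abs(sum(p) - 1.0) < 1e-9]

    # current behaviour: n_prevpoints equally spaced values in [0, 1]
    default_ticks = np.linspace(0., 1., 21)
    # proposed: let the user supply the grid directly
    custom_ticks = [0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95]

    print(prevalence_grid(custom_ticks, n_classes=2))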

Add the fix suggested by Alexander (sketched below):
    "For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
    based on the number of actual positives and the required sample size; e.g., for sample_size=100 and actual
    positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled at most at 0.1 prevalence, while
    the others can be sampled up to 1. prevalence. Then, when a prevalence value is requested, e.g., [0.33, 0.33, 0.33],
    we may either clip each value and normalize (as you suggest for the extreme case, e.g., [0.1, 0.33, 0.33]/sum) or
    scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum."
    - This affects LabelledCollection
    - This functionality should be accessible via sampling protocols and evaluation functions
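
A minimal sketch of the two strategies on the quoted numbers (function names are hypothetical,
not LabelledCollection API):

    import numpy as np

    def plausible_thresholds(actual_positives, sample_size):
        # per-class upper bound on requestable prevalence, e.g.
        # sample_size=100, positives [10, 100, 500] -> [0.1, 1.0, 1.0]
        return np.minimum(np.asarray(actual_positives) / sample_size, 1.0)

    def clip_and_normalize(requested, thresholds):
        # strategy 1: clip each value at its per-class threshold, then renormalize
        clipped = np.minimum(requested, thresholds)
        return clipped / clipped.sum()

    def scale_by_thresholds(requested, thresholds):
        # strategy 2: scale each value by its per-class threshold, then renormalize
        scaled = requested * thresholds
        return scaled / scaled.sum()

    thr = plausible_thresholds([10, 100, 500], sample_size=100)
    req = np.array([0.33, 0.33, 0.33])
    print(clip_and_normalize(req, thr))   # [0.1, 0.33, 0.33] / 0.76
    print(scale_by_thresholds(req, thr))  # [0.033, 0.33, 0.33] / 0.693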

Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr in
order to have access to classes_; think also of the case in which the precomputed outputs are already generated,
as in the unifying problems code.
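
A sketch of the kind of mock the note refers to, assuming a scikit-learn-style probabilistic classifier
whose outputs were precomputed (MockClassifier and its indexing convention are hypothetical):

    import numpy as np

    class MockClassifier:
        # stands in for a pre-trained classifier: exposes classes_ and
        # replays precomputed posteriors instead of computing them
        def __init__(self, classes, precomputed_posteriors):
            self.classes_ = np.asarray(classes)
            self._posteriors = np.asarray(precomputed_posteriors)

        def fit(self, X, y=None):
            return self  # nothing to train

        def predict_proba(self, X):
            # here X is assumed to index into the precomputed outputs
            return self._posteriors[X]

        def predict(self, X):
            return self.classes_[self.predict_proba(X).argmax(axis=1)]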

To remove LabelledCollection from the methods:

- The mess comes from the confusing semantics of fit in aggregative methods, which receives 3 parameters:
    - data: LabelledCollection, which can be:
        - the training set, if the classifier must be trained
        - None, if the classifier does not need to be trained
        - the validation set (which conflicts with val_split), if the classifier does not need to be trained
    - fit_classifier: states whether the classifier must be trained or not, and this changes the semantics of
      the other parameters
    - val_split: which can be:
        - a number: the number of kfcv folds, which implies fit_classifier=True and data=the whole training set
        - a fraction in [0,1]: the portion we use for validation; implies fit_classifier=True and data=train+val
        - a LabelledCollection: the specific validation set; implies neither fit_classifier=True nor False
- The way to remove the methods' dependency on LabelledCollection should be as follows (see the sketch after
  this list):
    - The constructor states whether the classifier received as a parameter must be trained or is already
      trained; that is, there is a fit_classifier=True or False.
    - fit_classifier=True:
        - data in fit is the whole training set, validation included
        - val_split:
            - int: number of folds in kfcv
            - proportion in [0,1]
    - fit_classifier=False:
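
A sketch of the proposed semantics under those notes (AggregativeSketch and its internals are hypothetical;
only the move of fit_classifier to the constructor comes from the notes above):

    from sklearn.model_selection import cross_val_predict, train_test_split

    class AggregativeSketch:
        def __init__(self, classifier, fit_classifier=True, val_split=5):
            # fit_classifier is decided once, in the constructor: it states
            # whether the classifier passed in must be trained or already is
            self.classifier = classifier
            self.fit_classifier = fit_classifier
            self.val_split = val_split

        def fit(self, X, y):
            if self.fit_classifier:
                # X, y are the whole training set, validation included
                if isinstance(self.val_split, int):
                    # int: number of kfcv folds used to obtain validation outputs
                    val_posteriors = cross_val_predict(
                        self.classifier, X, y, cv=self.val_split, method='predict_proba')
                    self.classifier.fit(X, y)
                else:
                    # fraction in [0,1]: held-out proportion for validation
                    Xtr, Xval, ytr, yval = train_test_split(X, y, test_size=self.val_split)
                    self.classifier.fit(Xtr, ytr)
                    val_posteriors = self.classifier.predict_proba(Xval)
            else:
                # classifier already trained: X, y play the role of the validation set
                val_posteriors = self.classifier.predict_proba(X)
            # ...aggregation-specific training on val_posteriors goes here...
            return self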

- [TODO] Document confidence in manuals
- [TODO] Test the return_type="index" in protocols and finish the "distributing_samples.py" example (see the
  sketch below)
- [TODO] Add EDy (an implementation is available at quantificationlib)
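
A sketch of what return_type="index" means for a protocol, assuming samples can be handed back as indices and
materialized on demand (draw_sample is a hypothetical helper, not current API):

    import numpy as np

    rng = np.random.default_rng(0)

    def draw_sample(X, size, return_type="instances"):
        # a protocol can return indices instead of materialized samples,
        # which makes samples cheap to store and distribute
        idx = rng.choice(len(X), size=size, replace=True)
        return idx if return_type == "index" else X[idx]

    X = rng.normal(size=(1000, 5))
    idx = draw_sample(X, 100, return_type="index")  # lightweight: indices only
    sample = X[idx]                                 # materialize when needed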