From aa894a3472e2ce0675d0991d7801aec721d3c021 Mon Sep 17 00:00:00 2001 From: Alejandro Moreo Date: Tue, 19 Mar 2024 15:01:42 +0100 Subject: [PATCH] merging PR; I have taken this opportunity to refactor some issues I didnt like, including the normalization of prevalence vectors, and improving the documentation here and there --- TODO.txt | 97 ----- docs/build/html/genindex.html | 30 +- docs/build/html/index.html | 12 +- docs/build/html/modules.html | 10 +- docs/build/html/objects.inv | Bin 3505 -> 3532 bytes docs/build/html/quapy.html | 293 +++++++++----- docs/build/html/quapy.method.html | 66 +-- docs/build/html/searchindex.js | 2 +- quapy/functional.py | 643 +++++++++++++++++------------- quapy/method/aggregative.py | 96 +++-- quapy/model_selection.py | 1 + 11 files changed, 682 insertions(+), 568 deletions(-) diff --git a/TODO.txt b/TODO.txt index 6547a5b..e69de29 100644 --- a/TODO.txt +++ b/TODO.txt @@ -1,97 +0,0 @@ -check sphinks doc for enumerations (for example, the doc for ACC) - -ensembles seem to be broken; they have an internal model selection which takes the parameters, but since quapy now - works with protocols it would need to know the validation set in order to pass something like - "protocol: APP(val, etc.)" -sample_size should not be mandatory when qp.environ['SAMPLE_SIZE'] has been specified -clean all the cumbersome methods that have to be implemented for new quantifiers (e.g., n_classes_ prop, etc.) -make truly parallel the GridSearchQ -make more examples in the "examples" directory -merge with master, because I had to fix some problems with QuaNet due to an issue notified via GitHub! -added cross_val_predict in qp.model_selection (i.e., a cross_val_predict for quantification) --would be nice to have - it parallelized - -check the OneVsAll module(s) - -check the set_params de neural.py, because the separation of estimator__ is not implemented; see also - __check_params_colision - -HDy can be customized so that the number of bins is specified, instead of explored within the fit method - -Packaging: -========================================== -Document methods with paper references -unit-tests -clean wiki_examples! - -Refactor: -========================================== -Unify ThresholdOptimization methods, as an extension of PACC (and not ACC), the fit methods are almost identical and - use a prob classifier (take into account that PACC uses pcc internally, whereas the threshold methods use cc - instead). The fit method of ACC and PACC has a block for estimating the validation estimates that should be unified - as well... -Refactor protocols. APP and NPP related functionalities are duplicated in functional, LabelledCollection, and evaluation - - -New features: -========================================== -Add "measures for evaluating ordinal"? -Add datasets for topic. -Do we want to cover cross-lingual quantification natively in QuaPy, or does it make more sense as an application on top? - -Current issues: -========================================== -Revise the class structure of quantification methods and the methods they inherit... There is some confusion regarding - methods isbinary, isprobabilistic, and the like. The attribute "learner_" in aggregative quantifiers is also - confusing, since there is a getter and a setter. -Remove the "deep" in get_params. There is no real compatibility with scikit-learn as for now. -SVMperf-based learners do not remove temp files in __del__? -In binary quantification (hp, kindle, imdb) we used F1 in the minority class (which in kindle and hp happens to be the -negative class). This is not covered in this new implementation, in which the binary case is not treated as such, but as -an instance of single-label with 2 labels. Check -Add automatic reindex of class labels in LabelledCollection (currently, class indexes should be ordered and with no gaps) -OVR I believe is currently tied to aggregative methods. We should provide a general interface also for general quantifiers -Currently, being "binary" only adds one checker; we should figure out how to impose the check to be automatically performed -Add random seed management to support replicability (see temp_seed in util.py). -GridSearchQ is not trully parallelized. It only parallelizes on the predictions. -In the context of a quantifier (e.g., QuaNet or CC), the parameters of the learner should be prefixed with "estimator__", - in QuaNet this is resolved with a __check_params_colision, but this should be improved. It might be cumbersome to - impose the "estimator__" prefix for, e.g., quantifiers like CC though... This should be changed everywhere... -QuaNet needs refactoring. The base quantifiers ACC and PACC receive val_data with instances already transformed. This - issue is due to a bad design. - -Improvements: -========================================== -Explore the hyperparameter "number of bins" in HDy -Rename EMQ to SLD ? -Parallelize the kFCV in ACC and PACC? -Parallelize model selection trainings -We might want to think of (improving and) adding the class Tabular (it is defined and used on branch tweetsent). A more - recent version is in the project ql4facct. This class is meant to generate latex tables from results (highligting - best results, computing statistical tests, colouring cells, producing rankings, producing averages, etc.). Trying - to generate tables is typically a bad idea, but in this specific case we do have pretty good control of what an - experiment looks like. (Do we want to abstract experimental results? this could be useful not only for tables but - also for plots). -Add proper logging system. Currently we use print -It might be good to simplify the number of methods that have to be implemented for any new Quantifier. At the moment, - there are many functions like get_params, set_params, and, specially, @property classes_, which are cumbersome to - implement for quick experiments. A possible solution is to impose get_params and set_params only in cases in which - the model extends some "ModelSelectable" interface only. The classes_ should have a default implementation. - -Checks: -========================================== -How many times is the system of equations for ACC and PACC not solved? How many times is it clipped? Do they sum up - to one always? -Re-check how hyperparameters from the quantifier and hyperparameters from the classifier (in aggregative quantifiers) - is handled. In scikit-learn the hyperparameters from a wrapper method are indicated directly whereas the hyperparams - from the internal learner are prefixed with "estimator__". In QuaPy, combinations having to do with the classifier - can be computed at the begining, and then in an internal loop the hyperparams of the quantifier can be explored, - passing fit_learner=False. -Re-check Ensembles. As for now, they are strongly tied to aggregative quantifiers. -Re-think the environment variables. Maybe add new ones (like, for example, parameters for the plots) -Do we want to wrap prevalences (currently simple np.ndarray) as a class? This might be convenient for some interfaces - (e.g., for specifying artificial prevalences in samplings, for printing them -- currently supported through - F.strprev(), etc.). This might however add some overload, and prevent/difficult post processing with numpy. -Would be nice to get a better integration with sklearn. - - diff --git a/docs/build/html/genindex.html b/docs/build/html/genindex.html index 9b2dce2..0099e44 100644 --- a/docs/build/html/genindex.html +++ b/docs/build/html/genindex.html @@ -116,8 +116,6 @@
  • acce() (in module quapy.error)
  • add_word() (quapy.data.preprocessing.IndexTransformer method) -
  • -
  • adjusted_quantification() (in module quapy.functional)
  • AdjustedClassifyAndCount (in module quapy.method.aggregative)
  • @@ -297,9 +295,7 @@ +
  • condsoftmax() (in module quapy.functional) +
  • ConfigStatus (class in quapy.model_selection)
  • counts() (quapy.data.base.LabelledCollection method) @@ -647,18 +645,20 @@

    L

    + - - +
    @@ -1225,8 +1227,12 @@
  • SMM (class in quapy.method.aggregative)
  • smooth() (in module quapy.error) +
  • +
  • softmax() (in module quapy.functional)
  • solve_adjustment() (in module quapy.functional) +
  • +
  • solve_adjustment_binary() (in module quapy.functional)
  • SOLVERS (quapy.method.aggregative.ACC attribute)
  • @@ -1265,6 +1271,8 @@
  • T50 (class in quapy.method._threshold_optim)
  • temp_seed() (in module quapy.util) +
  • +
  • ternary_search() (in module quapy.functional)
  • text2tfidf() (in module quapy.data.preprocessing)
  • diff --git a/docs/build/html/index.html b/docs/build/html/index.html index 7d09502..e3156f3 100644 --- a/docs/build/html/index.html +++ b/docs/build/html/index.html @@ -263,8 +263,8 @@
  • Submodules
  • quapy.method.aggregative module