- Things to try:
  - Why not optimize the calibration of the classifier, instead of the classifier as a component of the quantifier?
  - Does the init chain help? [seems irrelevant in MAPLS...]
  - Is the Aitchison kernel better?
  - Other classifiers?
  - Optimize the classifier?
  - Use all datasets?
  - Improve KDE on wine-quality?
  - Add other methods that natively provide uncertainty quantification:
    - Ratio estimator of Card & Smith
  - MPIW (Mean Prediction Interval Width): the average of the interval amplitudes (without aggregating coverage in any way)
  - Implement the Interval Score or Winkler Score
  - Analyze across shift
  - Add Bayesian EM:
    - https://github.com/ChangkunYe/MAPLS/blob/main/label_shift/mapls.py
    - Take this opportunity to also add RLLS:
      - https://github.com/Angie-Liu/labelshift
      - https://github.com/ChangkunYe/MAPLS/blob/main/label_shift/rlls.py
  - Add CIFAR10 and MNIST? Maybe also consider previously tested types of shift (tweak-one-out, etc.) from the RLLS paper:
    - https://github.com/Angie-Liu/labelshift/tree/master
    - https://github.com/Angie-Liu/labelshift/blob/master/cifar10_for_labelshift.py
    - Note: MNIST is downloadable from https://archive.ics.uci.edu/dataset/683/mnist+database+of+handwritten+digits
    - There seem to be some pretrained models in:
      - https://github.com/geifmany/cifar-vgg
      - https://github.com/EN10/KerasMNIST
      - https://github.com/tohinz/SVHN-Classifier
  - Consider prior knowledge in the experiments:
    - One scenario in which our prior is uninformative (i.e., uniform)
    - One scenario in which our prior is wrong (e.g., alpha-prior = (3,2,1), protocol-prior = (1,1,5))
    - One scenario in which our prior is very good (e.g., alpha-prior = (3,2,1), protocol-prior = (3,2,1))
    - Do all my baseline methods come with the option to inform a prior?
  - Consider different bandwidths within the Bayesian approach?
  - How to improve the coverage (or how to increase the amplitude)?
  - Added temperature calibration; it improves things.
    - Is temperature calibration actually not equivalent to using a larger bandwidth in the kernels?
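The two interval metrics mentioned above are simple to state: MPIW is just the mean amplitude of the intervals (coverage is ignored entirely), while the Winkler score at level 1 - alpha adds to the width a penalty of 2/alpha times the distance to the violated bound whenever the true value falls outside the interval. A minimal pure-Python sketch (function names are mine, not from the repo):

```python
def mpiw(lowers, uppers):
    """Mean Prediction Interval Width: the average interval amplitude,
    without aggregating coverage in any way."""
    widths = [u - l for l, u in zip(lowers, uppers)]
    return sum(widths) / len(widths)


def winkler_score(lowers, uppers, ys, alpha=0.05):
    """Winkler (interval) score for central (1 - alpha) intervals:
    interval width, plus a 2/alpha penalty on the distance to the
    nearest bound whenever the true value lies outside the interval."""
    scores = []
    for l, u, y in zip(lowers, uppers, ys):
        s = u - l
        if y < l:
            s += (2.0 / alpha) * (l - y)  # undershoot penalty
        elif y > u:
            s += (2.0 / alpha) * (y - u)  # overshoot penalty
        scores.append(s)
    return sum(scores) / len(scores)
```

Note that for perfectly calibrated intervals the Winkler score reduces to MPIW, which is why optimizing Winkler trades off amplitude against coverage directly.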
  - Consider W as a measure of quantification error (the current one, e.g., w-CI, is the Winkler...)
  - Optimize also C and class_weight? [I don't think so, but could be done easily now]
  - Remove wikis from the repo
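The three prior-knowledge scenarios listed above could be instantiated by treating alpha-prior and protocol-prior as Dirichlet concentration parameters: the protocol-prior generates the true test prevalences, while the alpha-prior is what we feed the method. A pure-Python sketch of the sampling step (the scenario table and the `sample_dirichlet` helper are illustrative, not from the repo):

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Sample one prevalence vector from a Dirichlet distribution
    by normalizing independent Gamma(a, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


# Hypothetical scenario table: (alpha-prior we inform, protocol-prior
# that actually generates the test prevalences), as in the notes above.
scenarios = {
    "uninformative": ((1, 1, 1), (3, 2, 1)),
    "wrong":         ((3, 2, 1), (1, 1, 5)),
    "very_good":     ((3, 2, 1), (3, 2, 1)),
}

for name, (alpha_prior, protocol_prior) in scenarios.items():
    true_prevalence = sample_dirichlet(protocol_prior)
    # ...run the quantifier with alpha_prior as its informed prior...
```

In the "very_good" scenario the informed prior matches the generating protocol exactly; in the "wrong" scenario it concentrates mass on the wrong classes, which should stress-test how sensitive each method is to prior misspecification.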