QuaPy/test.py

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
import quapy as qp
import quapy.functional as F
import sys

#qp.datasets.fetch_reviews('hp')
#qp.datasets.fetch_twitter('sst')

#sys.exit()

SAMPLE_SIZE = 500
binary = False
svmperf_home = './svm_perf_quantification'

if binary:
    dataset = qp.datasets.fetch_reviews('kindle', tfidf=True, min_df=5)

else:
    dataset = qp.datasets.fetch_twitter('semeval13', model_selection=False, min_df=10)
    dataset.training = dataset.training.sampling(SAMPLE_SIZE, 0.2, 0.5, 0.3)

print('dataset loaded')
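
# Illustrative sketch (not QuaPy code): one plausible way to draw a sample of a
# given size at fixed class prevalences, similar in spirit to the call
# dataset.training.sampling(SAMPLE_SIZE, 0.2, 0.5, 0.3) above. The helper name
# sample_indices_at_prevalence, its signature, and the rounding policy are all
# assumptions made for illustration; the library's own sampling may differ.
import random


def sample_indices_at_prevalence(labels, size, prevalences, seed=0):
    """Return `size` indices into `labels` whose classes follow `prevalences`."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    assert len(classes) == len(prevalences), 'one prevalence value per class'
    assert abs(sum(prevalences) - 1.0) < 1e-9, 'prevalences must sum to 1'
    # round the first n-1 class counts; the last class takes the remainder
    counts = [int(round(size * p)) for p in prevalences[:-1]]
    counts.append(size - sum(counts))
    chosen = []
    for cls, n in zip(classes, counts):
        pool = [i for i, y in enumerate(labels) if y == cls]
        if n <= len(pool):
            chosen.extend(rng.sample(pool, n))     # without replacement
        else:
            chosen.extend(rng.choices(pool, k=n))  # with replacement if the class is too small
    return chosen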

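# training a quantifier
# note: `learner` is only used by the commented-out alternatives below;
# the active SVMQ model is built from svmperf_home and does not receive it
learner = LogisticRegression()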
# model = qp.method.aggregative.ClassifyAndCount(learner)
# model = qp.method.aggregative.AdjustedClassifyAndCount(learner)
# model = qp.method.aggregative.ProbabilisticClassifyAndCount(learner)
# model = qp.method.aggregative.ProbabilisticAdjustedClassifyAndCount(learner)
# model = qp.method.aggregative.ExpectationMaximizationQuantifier(learner)
# model = qp.method.aggregative.ExplicitLossMinimisationBinary(svmperf_home, loss='q', C=100)
model = qp.method.aggregative.SVMQ(svmperf_home, C=1)

if not binary:
    model = qp.method.aggregative.OneVsAll(model)

print('fitting model')
model.fit(dataset.training)


# estimating class prevalences
print('quantifying')
prevalences_estim = model.quantify(dataset.test.instances)
prevalences_true  = dataset.test.prevalence()

# evaluation (one single prediction)
error = qp.error.mae(prevalences_true, prevalences_estim)

print(f'method {model.__class__.__name__}')

print('Evaluation in test (1 eval)')
print(f'true prevalence {F.strprev(prevalences_true)}')
print(f'estim prevalence {F.strprev(prevalences_estim)}')
print(f'mae={error:.3f}')
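
# Illustrative sketch (not QuaPy code): the mean absolute error between two
# prevalence vectors, i.e. the average of the per-class absolute differences.
# It is assumed, not verified here, that qp.error.mae computes this quantity
# for a single prediction. The name mean_absolute_error_sketch is hypothetical.
def mean_absolute_error_sketch(true_prev, estim_prev):
    """Average absolute difference between true and estimated class prevalences."""
    assert len(true_prev) == len(estim_prev), 'vectors must have one value per class'
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)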


max_evaluations = 5000
n_prevpoints = F.get_nprevpoints_approximation(combinations_budget=max_evaluations, n_classes=dataset.n_classes)
n_evaluations = F.num_prevalence_combinations(n_prevpoints, dataset.n_classes)
print(f'the prevalence interval [0,1] will be split into {n_prevpoints} prevalence points for each class, so that\n'
      f'the requested maximum number of sample evaluations ({max_evaluations}) is not exceeded.\n'
      f'For the {dataset.n_classes} classes this dataset has, this will yield a total of {n_evaluations} evaluations.')
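
# Illustrative sketch (not QuaPy code): how the number of evaluations follows
# from the number of prevalence points. Splitting [0,1] into n_prevpoints
# equally spaced values and keeping only the class-prevalence vectors that sum
# to 1 leaves C(n_prevpoints + n_classes - 2, n_classes - 1) combinations
# (a stars-and-bars count). It is assumed that F.num_prevalence_combinations
# returns this count for a single repetition; both helper names below are
# hypothetical.
from itertools import product
from math import comb


def count_prevalence_combinations(n_prevpoints, n_classes):
    """Closed-form number of grid points on the probability simplex."""
    return comb(n_prevpoints + n_classes - 2, n_classes - 1)


def enumerate_prevalence_grid(n_prevpoints, n_classes):
    """Brute-force enumeration of the same grid, useful to check the formula."""
    steps = n_prevpoints - 1
    grid = []
    for combo in product(range(steps + 1), repeat=n_classes - 1):
        if sum(combo) <= steps:
            # the last class takes whatever mass is left
            last = steps - sum(combo)
            grid.append(tuple(c / steps for c in combo) + (last / steps,))
    return grid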

true_prev, estim_prev = qp.evaluation.artificial_sampling_prediction(model, dataset.test, SAMPLE_SIZE, n_prevpoints)

qp.error.SAMPLE_SIZE = SAMPLE_SIZE
print(f'Evaluation according to the artificial sampling protocol ({len(true_prev)} evals)')
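# use a distinct loop variable so the single-evaluation `error` value above is not shadowed
for error_fn in qp.error.QUANTIFICATION_ERROR:
    score = error_fn(true_prev, estim_prev)
    print(f'{error_fn.__name__}={score:.5f}')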

# File history (from the repository blame view):
# 2020-12-09 12:46:50 +01:00  aggregation methods updated
# 2020-12-10 19:04:33 +01:00  evaluation by artificial prevalence sampling added; new methods added;
#                             new util functions added to quapy.functional and quapy.utils
# 2020-12-10 19:08:22 +01:00  merged
# 2020-12-11 19:28:17 +01:00  aggregative methods refactored to implement 'aggregate' in addition to
#                             'classify' and 'quantify'; by default 'quantify' is a pipeline of
#                             'classify' and 'aggregate'. This speeds up evaluation considerably:
#                             documents can be pre-classified, samples are drawn from the
#                             pre-classified values (labels or posterior probabilities), and only
#                             'aggregate' is called repeatedly within the artificial sampling protocol
# 2020-12-14 18:36:19 +01:00  dataset fetch for polarity reviews (hp, kindle, imdb) and
#                             twitter sentiment (11 datasets) added