Updated UCI binary notes
parent
76b38cb81c
commit
daa275d325
|
@@ -243,24 +243,15 @@ are summarized below.
|
||||||
| wine-q-white | 2 | 4898 | 11 | [0.335, 0.665] | dense |
|
| wine-q-white | 2 | 4898 | 11 | [0.335, 0.665] | dense |
|
||||||
| yeast | 2 | 1484 | 8 | [0.711, 0.289] | dense |
|
| yeast | 2 | 1484 | 8 | [0.711, 0.289] | dense |
|
||||||
|
|
||||||
### Issues:
|
#### Notes:
|
||||||
All datasets will be downloaded automatically the first time they are requested, and
|
All datasets will be downloaded automatically the first time they are requested, and
|
||||||
stored in the _quapy_data_ folder for faster further reuse.
|
stored in the _quapy_data_ folder for faster further reuse.
|
||||||
However, some datasets require special actions that at the moment are not fully
|
|
||||||
automated.
|
|
||||||
|
|
||||||
* Datasets with ids "ctg.1", "ctg.2", and "ctg.3" (_Cardiotocography Data Set_) load
|
However, note that it is a good idea to ignore the following datasets:
|
||||||
an Excel file, which requires the user to install the _xlrd_ Python module in order
|
* _acute.a_ and _acute.b_: these are very easy and many classifiers would score 100% accuracy
|
||||||
to open it.
|
* _balance.2_: this is extremely difficult; there is probably some problem with this dataset, since
|
||||||
* The dataset with id "pageblocks.5" (_Page Blocks Classification (5)_) needs to
|
the errors it tends to produce are orders of magnitude greater than for other datasets,
|
||||||
open a "unix compressed file" (extension .Z), which is not directly doable with
|
and this has a disproportionate impact on the average performance.
|
||||||
standard Python packages like gzip or zip. This file would need to be uncompressed using
|
|
||||||
OS-dependent software manually. Information on how to do it will be printed the first
|
|
||||||
time the dataset is invoked.
|
|
||||||
* It is a good idea to ignore datasets _acute.a_, _acute.b_ and _balance.2_, since the former two
|
|
||||||
are very easy (many classifiers would score 100% accuracy) while the latter is extremely difficult
|
|
||||||
(probably there is some problem with this dataset, the errors it tends to produce are orders of magnitude
|
|
||||||
greater than for other datasets, and this has a disproportionate impact on the average performance).
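
The recommendation to skip _acute.a_, _acute.b_ and _balance.2_ could be applied with a simple filter over the dataset ids before running an experiment. The following is only a minimal sketch: the `dataset_ids` list is a hand-written sample built from ids mentioned in these notes (not the library's full list), and `select_datasets` is a hypothetical helper, not part of the library.

```python
# Hand-written sample of dataset ids taken from these notes (assumption:
# the real, complete list is provided by the library itself).
dataset_ids = [
    "acute.a", "acute.b", "balance.2",
    "ctg.1", "pageblocks.5", "wine-q-white", "yeast",
]

# Ids the notes suggest ignoring: two are trivially easy, one yields
# errors that dominate the average.
EXCLUDED = {"acute.a", "acute.b", "balance.2"}


def select_datasets(ids, excluded=EXCLUDED):
    """Hypothetical helper: keep the original order, drop excluded ids."""
    return [name for name in ids if name not in excluded]


selected = select_datasets(dataset_ids)
print(selected)
```

Keeping the excluded ids in a single set makes it easy to report which datasets were left out when publishing averaged results.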
|
|
||||||
|
|
||||||
### Multiclass datasets
|
### Multiclass datasets
|
||||||
|
|
||||||
|
|