Updated UCI binary notes

2024-07-02 16:33:27 +02:00 · daa275d325
parent 76b38cb81c
commit daa275d325
1 changed file with 6 additions and 15 deletions
--- a/docs/source/wiki_editable/Datasets.md
+++ b/docs/source/wiki_editable/Datasets.md
@@ -243,24 +243,15 @@ are summarized below.
 | wine-q-white | 2 | 4898 | 11 | [0.335, 0.665] | dense |
 | yeast | 2 | 1484 | 8 | [0.711, 0.289] | dense |

-### Issues:
+#### Notes:
 All datasets will be downloaded automatically the first time they are requested, and
 stored in the _quapy_data_ folder for faster further reuse. 
-However, some datasets require special actions that at the moment are not fully
-automated.

-* Datasets with ids "ctg.1", "ctg.2", and "ctg.3" (_Cardiotocography Data Set_) load
-an Excel file, which requires the user to install the _xlrd_ Python module in order 
-to open it.
-* The dataset with id "pageblocks.5" (_Page Blocks Classification (5)_) needs to
-open a "unix compressed file" (extension .Z), which is not directly doable with
-standard Python packages like gzip or zip. This file would need to be uncompressed using
-OS-dependent software manually. Information on how to do it will be printed the first
-time the dataset is invoked. 
-* It is a good idea to ignore datasets _acute.a_, _acute.b_ and _balance.2_, since the former two
-are very easy (many classifiers would score 100% accuracy) while the latter is extremely difficult
-  (probably there is some problem with this dataset, the errors it tends to produce are orders of magnitude 
-greater than for other datasets, and this has a disproportionate impact in the average performance).
+However, it is a good idea to ignore the following datasets:
+* _acute.a_ and _acute.b_: these are very easy, and many classifiers would score 100% accuracy.
+* _balance.2_: this is extremely difficult; there is probably some problem with this dataset, since
+the errors it tends to produce are orders of magnitude greater than for other datasets,
+and this has a disproportionate impact on the average performance.

 ### Multiclass datasets