Data

For your convenience, you can download from this page the datasets that were released as part of past AutoML challenges. This does NOT include labels for the validation and test sets. The challenge websites remain open if you want to obtain your performance on validation or test data.

DATASETS AVAILABLE FOR DOWNLOAD:

Round 0 data:
Set 1: adult.zip -- 1.1 MB
Set 2: cadata.zip -- 0.7 MB
Set 3: digits.zip -- 117.7 MB
Set 4: dorothea.zip -- 4.7 MB
Set 5: newsgroups.zip -- 6.4 MB
We released the validation data truth values at the end of round 0 for practice purposes: phase0_valid.zip
Round 1 data:
Set 1: christine.zip -- 19 MB
Set 2: jasmine.zip -- 224 KB
Set 3: philippine.zip -- 12 MB
Set 4: madeline.zip -- 2.3 MB
Set 5: sylvine.zip -- 663 KB
Round 2 data:
Set 1: albert.zip -- 55 MB
Set 2: dilbert.zip -- 168 MB
Set 3: fabert.zip -- 572 KB
Set 4: robert.zip -- 192 MB
Set 5: volkert.zip -- 19 MB
Round 3 data:
Set 1: alexis.zip -- 17.4 MB
Set 2: dionis.zip -- 53.6 MB
Set 3: grigoris.zip -- 86.8 MB
Set 4: jannis.zip -- 20.6 MB
Set 5: wallis.zip -- 4.1 MB
Round 4 data:
Set 1: Evita -- 13.4 MB
Set 2: Flora -- 104 MB
Set 3: Helena -- 9.9 MB
Set 4: Tania -- 62.8 MB
Set 5: Yolanda -- 175.2 MB

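| Round | Num | Name | Task | Metric | Time | Cnum | Cbal | Sparse | Missng | Catvar | Irrvar | Pte | Pva | Ptr | N | Ptr/N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ADULT | multilabel | F1 | 300 | 3 | 1 | 0.16 | 0.011 | 1 | 0.5 | 9768 | 4884 | 34190 | 24 | 1,424.58 |
| 0 | 2 | CADATA | regression | R2 | 200 | 0 | NaN | 0 | 0 | 0 | 0.5 | 10640 | 5000 | 5000 | 16 | 312.5 |
| 0 | 3 | DIGITS | multiclass | BAC | 300 | 10 | 1 | 0.42 | 0 | 0 | 0.5 | 35000 | 20000 | 15000 | 1568 | 9.57 |
| 0 | 4 | DOROTHEA | binary | AUC | 100 | 2 | 0.46 | 0.99 | 0 | 0 | 0.5 | 800 | 350 | 800 | 100000 | 0.01 |
| 0 | 5 | NEWSGROUPS | multiclass | PAC | 300 | 20 | 1 | 1 | 0 | 0 | 0 | 3755 | 1877 | 13142 | 61188 | 0.21 |
| 1 | 1 | CHRISTINE | binary | BAC | 1200 | 2 | 1 | 0.071 | 0 | 0 | 0.5 | 2084 | 834 | 5418 | 1636 | 3.31 |
| 1 | 2 | JASMINE | binary | BAC | 1200 | 2 | 1 | 0.78 | 0 | 0 | 0.5 | 1756 | 526 | 2984 | 144 | 20.72 |
| 1 | 3 | MADELINE | binary | BAC | 1200 | 2 | 1 | 1.2 E-06 | 0 | 0 | 0.92 | 3240 | 1080 | 3140 | 259 | 12.12 |
| 1 | 4 | PHILIPPINE | binary | BAC | 1200 | 2 | 1 | 0.0012 | 0 | 0 | 0.5 | 4664 | 1166 | 5832 | 308 | 18.94 |
| 1 | 5 | SYLVINE | binary | BAC | 1200 | 2 | 1 | 0.01 | 0 | 0 | 0.5 | 10244 | 5124 | 5124 | 20 | 256.2 |
| 2 | 1 | ALBERT | binary | F1 | 1200 | 2 | 1 | 0.049 | 0.14 | 1 | 0.5 | 51048 | 25526 | 425240 | 78 | 5,451.79 |
| 2 | 2 | DILBERT | multiclass | PAC | 1200 | 5 | 1 | 0 | 0 | 0 | 0.16 | 9720 | 4860 | 10000 | 2000 | 5 |
| 2 | 3 | FABERT | multiclass | PAC | 1200 | 7 | 0.96 | 0.99 | 0 | 0 | 0.5 | 2354 | 1177 | 8237 | 800 | 10.3 |
| 2 | 4 | ROBERT | multiclass | BAC | 1200 | 10 | 1 | 0.01 | 0 | 0 | 0 | 5000 | 2000 | 10000 | 7200 | 1.39 |
| 2 | 5 | VOLKERT | multiclass | PAC | 1200 | 10 | 0.89 | 0.34 | 0 | 0 | 0 | 7000 | 3500 | 58310 | 180 | 323.94 |
| 3 | 1 | ALEXIS | multilabel | AUC | 1200 | 18 | 0.92 | 0.98 | 0 | 0 | 0 | 15569 | 7784 | 54491 | 5000 | 10.9 |
| 3 | 2 | DIONIS | multiclass | BAC | 1200 | 355 | 1 | 0.11 | 0 | 0 | 0 | 12000 | 6000 | 416188 | 60 | 6,936.47 |
| 3 | 3 | GRIGORIS | multilabel | AUC | 1200 | 91 | 0.87 | 1 | 0 | 0 | 0 | 9920 | 6486 | 45400 | 301561 | 0.15 |
| 3 | 4 | JANNIS | multiclass | BAC | 1200 | 4 | 0.8 | 7.3 E-05 | 0 | 0 | 0.5 | 9851 | 4926 | 83733 | 54 | 1,550.61 |
| 3 | 5 | WALLIS | multiclass | AUC | 1200 | 11 | 0.91 | 1 | 0 | 0 | 0 | 8196 | 4098 | 10000 | 193731 | 0.05 |
| 4 | 1 | EVITA | binary | AUC | 1200 | 2 | 0.21 | 0.91 | 0 | 0 | 0.46 | 14000 | 8000 | 20000 | 3000 | 6.67 |
| 4 | 2 | FLORA | regression | ABS | 1200 | 0 | NaN | 0.99 | 0 | 0 | 0.25 | 2000 | 2000 | 15000 | 200000 | 0.08 |
| 4 | 3 | HELENA | multiclass | BAC | 1200 | 100 | 0.9 | 6 E-05 | 0 | 0 | 0 | 18628 | 9314 | 65196 | 27 | 2,414.67 |
| 4 | 4 | TANIA | multilabel | PAC | 1200 | 95 | 0.79 | 1 | 0 | 0 | 0 | 44635 | 22514 | 157599 | 47236 | 3.34 |
| 4 | 5 | YOLANDA | regression | R2 | 1200 | 0 | NaN | 1 E-07 | 0 | 0 | 0.1 | 30000 | 30000 | 400000 | 100 | 4000 |
| 5 | 1 | ARTURO | multiclass | F1 | 1200 | 20 | 1 | 0.82 | 0 | 0 | 0.5 | 2733 | 1366 | 9565 | 400 | 23.91 |
| 5 | 2 | CARLO | binary | PAC | 1200 | 2 | 0.097 | 0.0027 | 0 | 0 | 0.5 | 10000 | 10000 | 50000 | 1070 | 46.73 |
| 5 | 3 | MARCO | multilabel | AUC | 1200 | 180 | 0.76 | 0.99 | 0 | 0 | 0 | 20482 | 20482 | 163860 | 15299 | 10.71 |
| 5 | 4 | PABLO | regression | ABS | 1200 | 0 | NaN | 0.11 | 0 | 0 | 0.5 | 23565 | 23565 | 188524 | 120 | 1,571.03 |
| 5 | 5 | WALDO | multiclass | BAC | 1200 | 4 | 1 | 0.029 | 0 | 1 | 0.5 | 2430 | 2430 | 19439 | 270 | 72 |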
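
The last column of the table, Ptr/N, appears to be the Ptr value divided by the N value, rounded to two decimals. The column names are not defined on this page, so the sketch below only assumes that relationship (Ptr is presumably the number of training examples and N the number of features). The rows are copied from the table; the function names are illustrative.

```python
# Rows copied from the table above: (name, Ptr, N, Ptr/N as listed).
ROWS = [
    ("ADULT", 34190, 24, 1424.58),
    ("DIGITS", 15000, 1568, 9.57),
    ("DOROTHEA", 800, 100000, 0.01),
    ("JASMINE", 2984, 144, 20.72),
    ("YOLANDA", 400000, 100, 4000.0),
]


def ratio(ptr, n):
    """Assumed definition of the Ptr/N column: Ptr divided by N, 2 decimals."""
    return round(ptr / n, 2)


def mismatches(rows, tol=1e-9):
    """Return the names of rows whose listed Ptr/N differs from the recomputed one."""
    return [name for name, ptr, n, listed in rows
            if abs(ratio(ptr, n) - listed) > tol]


if __name__ == "__main__":
    for name, ptr, n, listed in ROWS:
        print(f"{name:10s} computed={ratio(ptr, n):>10.2f} listed={listed:>10.2f}")
    print("mismatches:", mismatches(ROWS))
```

A very small ratio (DOROTHEA) means far more columns than training rows, while a large one (ADULT, YOLANDA) means the opposite.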



AutoML2 - 2018 
(we are releasing public data sets only)


Feedback phase
Set 1: ada.zip -- 0.6 MB
Set 2: arcene.zip -- 8.5 MB
Set 3: gina.zip -- 19.1 MB
Set 4: guillermo.zip -- 243.0 MB
Set 5: RL.zip -- 2.4 MB

Final phase
Set 1: PM.zip 
Set 2: RH.zip 
Set 3: RI.zip 
Set 4: riccardo.zip -- 202.0 MB
Set 5: RM.zip
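
Once an archive from the lists above has been downloaded, a quick way to see what it contains is to list its members before extracting. The following is a minimal sketch using Python's standard zipfile module. The local path "adult.zip" and the destination directory name are assumptions for illustration, and the file layout inside each archive is not described on this page.

```python
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each zip entry."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_archive(path, dest_dir):
    """Extract every member into dest_dir and return the member names."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical local path: assumes adult.zip was saved next to this script.
    for name, size in list_archive("adult.zip"):
        print(f"{size:>12d}  {name}")
    extract_archive("adult.zip", "adult_data")
```

Listing first is useful for the larger archives, since it shows the uncompressed sizes before anything is written to disk.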