Data

Round 0 data:
Set 1: adult.zip -- 1.1 MB
Set 2: cadata.zip -- 0.7 MB
Set 3: digits.zip -- 117.7 MB
Set 4: dorothea.zip -- 4.7 MB
Set 5: newsgroups.zip -- 6.4 MB
We released the validation data truth values at the end of round 0 for practice purposes: phase0_valid.zip
Round 1 data:
Set 1: christine.zip -- 19 MB
Set 2: jasmine.zip -- 224 KB
Set 3: philippine.zip -- 12 MB
Set 4: madeline.zip -- 2.3 MB
Set 5: sylvine.zip -- 663 KB
Round 2 data:
Set 1: albert.zip -- 55 MB
Set 2: dilbert.zip -- 168 MB
Set 3: fabert.zip -- 572 KB
Set 4: robert.zip -- 192 MB
Set 5: volkert.zip -- 19 MB
Round 3 data:
Set 1: alexis.zip -- 17.4 MB
Set 2: dionis.zip -- 53.6 MB
Set 3: grigoris.zip -- 86.8 MB
Set 4: jannis.zip -- 20.6 MB
Set 5: wallis.zip -- 4.1 MB
Round 4 data:
Set 1: Evita -- 13.4 MB
Set 2: Flora -- 104 MB
Set 3: Helena -- 9.9 MB
Set 4: Tania -- 62.8 MB
Set 5: Yolanda -- 175.2 MB

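| Round | Num | Name | Task | Metric | Time | Cnum | Cbal | Sparse | Missng | Catvar | Irrvar | Pte | Pva | Ptr | N | Ptr/N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ADULT | multilabel | F1 | 300 | 3 | 1 | 0.16 | 0.011 | 1 | 0.5 | 9768 | 4884 | 34190 | 24 | 1,424.58 |
| 0 | 2 | CADATA | regression | R2 | 200 | 0 | NaN | 0 | 0 | 0 | 0.5 | 10640 | 5000 | 5000 | 16 | 312.5 |
| 0 | 3 | DIGITS | multiclass | BAC | 300 | 10 | 1 | 0.42 | 0 | 0 | 0.5 | 35000 | 20000 | 15000 | 1568 | 9.57 |
| 0 | 4 | DOROTHEA | binary | AUC | 100 | 2 | 0.46 | 0.99 | 0 | 0 | 0.5 | 800 | 350 | 800 | 100000 | 0.01 |
| 0 | 5 | NEWSGROUPS | multiclass | PAC | 300 | 20 | 1 | 1 | 0 | 0 | 0 | 3755 | 1877 | 13142 | 61188 | 0.21 |
| 1 | 1 | CHRISTINE | binary | BAC | 1200 | 2 | 1 | 0.071 | 0 | 0 | 0.5 | 2084 | 834 | 5418 | 1636 | 3.31 |
| 1 | 2 | JASMINE | binary | BAC | 1200 | 2 | 1 | 0.78 | 0 | 0 | 0.5 | 1756 | 526 | 2984 | 144 | 20.72 |
| 1 | 3 | MADELINE | binary | BAC | 1200 | 2 | 1 | 1.2E-06 | 0 | 0 | 0.92 | 3240 | 1080 | 3140 | 259 | 12.12 |
| 1 | 4 | PHILIPPINE | binary | BAC | 1200 | 2 | 1 | 0.0012 | 0 | 0 | 0.5 | 4664 | 1166 | 5832 | 308 | 18.94 |
| 1 | 5 | SYLVINE | binary | BAC | 1200 | 2 | 1 | 0.01 | 0 | 0 | 0.5 | 10244 | 5124 | 5124 | 20 | 256.2 |
| 2 | 1 | ALBERT | binary | F1 | 1200 | 2 | 1 | 0.049 | 0.14 | 1 | 0.5 | 51048 | 25526 | 425240 | 78 | 5,451.79 |
| 2 | 2 | DILBERT | multiclass | PAC | 1200 | 5 | 1 | 0 | 0 | 0 | 0.16 | 9720 | 4860 | 10000 | 2000 | 5 |
| 2 | 3 | FABERT | multiclass | PAC | 1200 | 7 | 0.96 | 0.99 | 0 | 0 | 0.5 | 2354 | 1177 | 8237 | 800 | 10.3 |
| 2 | 4 | ROBERT | multiclass | BAC | 1200 | 10 | 1 | 0.01 | 0 | 0 | 0 | 5000 | 2000 | 10000 | 7200 | 1.39 |
| 2 | 5 | VOLKERT | multiclass | PAC | 1200 | 10 | 0.89 | 0.34 | 0 | 0 | 0 | 7000 | 3500 | 58310 | 180 | 323.94 |
| 3 | 1 | ALEXIS | multilabel | AUC | 1200 | 18 | 0.92 | 0.98 | 0 | 0 | 0 | 15569 | 7784 | 54491 | 5000 | 10.9 |
| 3 | 2 | DIONIS | multiclass | BAC | 1200 | 355 | 1 | 0.11 | 0 | 0 | 0 | 12000 | 6000 | 416188 | 60 | 6,936.47 |
| 3 | 3 | GRIGORIS | multilabel | AUC | 1200 | 91 | 0.87 | 1 | 0 | 0 | 0 | 9920 | 6486 | 45400 | 301561 | 0.15 |
| 3 | 4 | JANNIS | multiclass | BAC | 1200 | 4 | 0.8 | 7.3E-05 | 0 | 0 | 0.5 | 9851 | 4926 | 83733 | 54 | 1,550.61 |
| 3 | 5 | WALLIS | multiclass | AUC | 1200 | 11 | 0.91 | 1 | 0 | 0 | 0 | 8196 | 4098 | 10000 | 193731 | 0.05 |
| 4 | 1 | EVITA | binary | AUC | 1200 | 2 | 0.21 | 0.91 | 0 | 0 | 0.46 | 14000 | 8000 | 20000 | 3000 | 6.67 |
| 4 | 2 | FLORA | regression | ABS | 1200 | 0 | NaN | 0.99 | 0 | 0 | 0.25 | 2000 | 2000 | 15000 | 200000 | 0.08 |
| 4 | 3 | HELENA | multiclass | BAC | 1200 | 100 | 0.9 | 6E-05 | 0 | 0 | 0 | 18628 | 9314 | 65196 | 27 | 2,414.67 |
| 4 | 4 | TANIA | multilabel | PAC | 1200 | 95 | 0.79 | 1 | 0 | 0 | 0 | 44635 | 22514 | 157599 | 47236 | 3.34 |
| 4 | 5 | YOLANDA | regression | R2 | 1200 | 0 | NaN | 1E-07 | 0 | 0 | 0.1 | 30000 | 30000 | 400000 | 100 | 4000 |
| 5 | 1 | ARTURO | multiclass | F1 | 1200 | 20 | 1 | 0.82 | 0 | 0 | 0.5 | 2733 | 1366 | 9565 | 400 | 23.91 |
| 5 | 2 | CARLO | binary | PAC | 1200 | 2 | 0.097 | 0.0027 | 0 | 0 | 0.5 | 10000 | 10000 | 50000 | 1070 | 46.73 |
| 5 | 3 | MARCO | multilabel | AUC | 1200 | 180 | 0.76 | 0.99 | 0 | 0 | 0 | 20482 | 20482 | 163860 | 15299 | 10.71 |
| 5 | 4 | PABLO | regression | ABS | 1200 | 0 | NaN | 0.11 | 0 | 0 | 0.5 | 23565 | 23565 | 188524 | 120 | 1,571.03 |
| 5 | 5 | WALDO | multiclass | BAC | 1200 | 4 | 1 | 0.029 | 0 | 1 | 0.5 | 2430 | 2430 | 19439 | 270 | 72 |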
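
The Ptr/N column in the table above appears to be the Ptr value divided by N, rounded to two decimals. The page does not define the column names, so the sketch below assumes that Ptr is the number of training examples and N is the number of features. It recomputes the ratio for a few rows copied from the table and lists the datasets where the ratio falls below 1. The names `DATASETS` and `ptr_over_n` are illustrative only and are not part of the challenge starter kit.

```python
# (name, Ptr, N) triples copied from the table; a small subset for illustration.
DATASETS = [
    ("ADULT", 34190, 24),
    ("DOROTHEA", 800, 100000),
    ("CHRISTINE", 5418, 1636),
    ("WALLIS", 10000, 193731),
    ("YOLANDA", 400000, 100),
]


def ptr_over_n(ptr: int, n: int) -> float:
    """Return Ptr divided by N, rounded to two decimals as in the table."""
    return round(ptr / n, 2)


# Recompute the Ptr/N column for each selected dataset.
ratios = {name: ptr_over_n(ptr, n) for name, ptr, n in DATASETS}

# Datasets whose unrounded ratio is below 1 (under the assumption above:
# fewer training examples than features).
wide = [name for name, ptr, n in DATASETS if ptr < n]

for name, ratio in ratios.items():
    print(f"{name:10s} Ptr/N = {ratio}")
print("Ptr < N:", wide)
```

Under that reading, a very small ratio (DOROTHEA, WALLIS) marks a dataset with far more columns than training rows, while a large one (ADULT, YOLANDA) marks the opposite case.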