GPU track ICML

Table of contents:

Introduction

Chapter 1: Running the GPU starter KIT on the server

Chapter 2: Explore the code

Chapter 3: Putting files on the server

CODE: https://sites.google.com/a/chalearn.org/automl/hackathon/gpu-track-instructions/gpu.zip?attredirects=0&d=1

FAQ (typical problems):

1. I ran the code, but cannot see the predictions.

Answer: The predictions are packed into a .zip (the sample code plus the predictions for each of the datasets, ready for submission), which is created in the directory ABOVE the directory with the code.

2. Where do I put the data folder?

Answer: Not in the folder with the code, since everything in that folder is included in the submission. Put it somewhere else (for example, above the directory with the code). We do not want you to upload all the datasets to the platform in addition to the predictions.

3. What should the .zip contain (what does a good submission look like)?

Answer: The .zip should contain: run.py, metadata, res/, lib/
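
The list of required contents can be checked automatically before submitting. The following is a minimal sketch, not part of the starter kit: the function names are hypothetical, and it assumes that the four entries sit at the top level of the archive rather than inside a subfolder.

```python
import zipfile

# Top-level entries that the FAQ says a submission should contain.
REQUIRED = ("run.py", "metadata", "res/", "lib/")


def missing_entries(names, required=REQUIRED):
    """Return the required entries absent from a list of archive member names."""
    missing = []
    for entry in required:
        if entry.endswith("/"):
            # A directory counts as present if any member lives under it.
            found = any(name.startswith(entry) for name in names)
        else:
            found = entry in names
        if not found:
            missing.append(entry)
    return missing


def check_submission(zip_path):
    """Open a submission .zip and list which required entries are missing."""
    with zipfile.ZipFile(zip_path) as archive:
        return missing_entries(archive.namelist())
```

Calling check_submission on your zip file returns an empty list when nothing from the list above is missing.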

-------------------- Introduction

At the hackathon, you will be able to log in to our common AWS server, using a key pair that we will provide, to work on a GPU. Chapter 1 describes how to connect to the server and run the code; Chapter 2 describes how to explore the code.

Chapter 3 describes which files to put on a server, and where, to get the same set-up as on our server. You might also need to read it to learn how to upload files to the server during the hackathon.

If you would like to work on your own server (also before or after the hackathon), we have prepared instructions:

https://sites.google.com/a/chalearn.org/automl/general-gpus-on-aws

If you work on your own server or computer with a GPU, you need to put the code there and keep the datasets in a directory outside your code. The program packs the directory with the code and the predictions together, creating a .zip that can be submitted to the competition on CodaLab. The data should not be included. You will need to change the path to your data folder in run.py.
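
The rule that the data must live outside the code directory can be verified before running the code. This is a minimal sketch under stated assumptions: the names is_inside, code_dir and data_dir are hypothetical (the actual variable used in run.py may be named differently), and the paths are only examples.

```python
import os


def is_inside(path, parent):
    """Return True if `path` is `parent` itself or lies somewhere beneath it."""
    path = os.path.abspath(path)
    parent = os.path.abspath(parent)
    return os.path.commonpath([path, parent]) == parent


# Example paths only; replace them with your own layout.
code_dir = "/home/ubuntu/hackathon/gpu"
data_dir = "/home/ubuntu/Data"

if is_inside(data_dir, code_dir):
    print("Move the data: it would be packed into the submission .zip")
else:
    print("OK: the data folder is outside the code directory")
```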

-------------------- Chapter 1: Running the GPU starter KIT on the server

The username is 'ubuntu', so you should log in as ubuntu@DNS, authenticating with the key. Save your private key in a safe place on your computer. The key is called:

X.pem

Run the commands below in a console (to get a console on Windows, install git: http://git-scm.com/download/win, then right-click and choose 'Git Bash').

For convenience, you can add the key to your key ring. First restrict its permissions:

> chmod 0400 X.pem

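Now it is ready to use. To store it in your key ring, run the command below (the -K option is specific to the macOS version of ssh-add; on other systems, try ssh-add without it):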

> ssh-add -K X.pem

Now you can connect (remember to replace DNS with the proper server address):

> ssh ubuntu@DNS

If you cannot add the key to your key ring or have any other problems, you can simply pass the key directly:

> ssh -i X.pem ubuntu@DNS

EXAMPLE (hackathon.pem key in the current folder):

"ssh -i hackathon.pem ubuntu@ec2-52-2-192-148.us-west-2.compute.amazonaws.com"

1. Check the installation:

We will create a directory for you called 'hackathon' (we refer to it as [mydir]). Move to your own subdirectory [mydir] and list its contents:

> ls

You should see a directory [mydir]/gpu/

You should see all the code pre-installed.

2. Run the code:

> python run.py

The results will end up in [mydir]/gpu/res/.

For the .zip, go one level up:

> cd ..

The code also creates a zip file in the directory [mydir], ready to submit (it is a bundle of all the contents of the gpu subdirectory).

The zip includes the code as well, so keep all your files in the folder with the code. When you make modifications, we will then be able to run the code from your submission to verify the results.

3. Make a submission:

Download the zip file to your computer by executing the scp command below:

> scp -i [my_path]/X.pem ubuntu@DNS:/home/ubuntu/[mydir]/[NAME_OF_RESULTS].zip ./

*** WARNING! ***

SCP checklist (five details). REMEMBER TO:

1. Open a new console on your local computer (do not type the command in the AWS server shell!)

2. Change [NAME_OF_RESULTS] to the exact name of your zip.

3. Replace '/home/ubuntu/[mydir]' with the proper path. You can type 'pwd' in the server shell to see the current path.

4. Replace [my_path]/X.pem with the path and filename of your key.

5. Change DNS to the proper server address.

To upload a file from your computer to the server, swap the source and the destination, e.g.:

> scp -i X.pem /C/local/path/a.zip ubuntu@DNS:/home/ubuntu/[mydir]/
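
The two scp directions differ only in the order of the last two arguments. The sketch below makes that explicit; scp_command is a hypothetical helper, not part of the starter kit, and it only builds the argument list (you could pass the list to subprocess.run to execute it on your local computer).

```python
def scp_command(key_path, host, remote_path, local_path, upload=False):
    """Build an scp argument list; `upload=True` swaps source and destination."""
    remote = f"ubuntu@{host}:{remote_path}"
    if upload:
        source, destination = local_path, remote
    else:
        source, destination = remote, local_path
    return ["scp", "-i", key_path, source, destination]


# Download: the remote file is the source, the local folder the destination.
print(" ".join(scp_command("X.pem", "DNS", "/home/ubuntu/[mydir]/[NAME_OF_RESULTS].zip", "./")))

# Upload: the local file is the source, the remote folder the destination.
print(" ".join(scp_command("X.pem", "DNS", "/home/ubuntu/[mydir]/", "/C/local/path/a.zip", upload=True)))
```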

ENSURE YOU HAVE PREDICTIONS FOR EACH OF THE 5 DATASETS IN THE /res FOLDER!!!

IF ANY IS MISSING, IT WILL TRY TO EXECUTE THE CODE AND WILL FAIL.
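
A quick check of the res folder before zipping can catch a missing dataset. This is a minimal sketch, assuming that every prediction file name starts with the name of its dataset; that naming convention, the helper names, and the placeholder dataset names are assumptions, so adapt them to what run.py actually writes.

```python
import os


def missing_predictions(file_names, dataset_names):
    """Return the datasets for which no file name starts with the dataset name."""
    return [name for name in dataset_names
            if not any(f.startswith(name) for f in file_names)]


def check_res_folder(res_dir, dataset_names):
    """List the datasets that have no prediction file in `res_dir`."""
    return missing_predictions(os.listdir(res_dir), dataset_names)


# 'christine' appears in this document; 'dataset2' is a placeholder name.
print(missing_predictions(["christine_valid.predict"], ["christine", "dataset2"]))
```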

4. Submit the .zip to https://www.codalab.org/competitions/4191?secret_key=9c8a787d-c06d-4d3e-8961-1eade3e738e1

-------------------- Chapter 2: Explore the code

1. You must get familiar with neural networks in Theano beforehand, so that you will be able to work during the workshop. Read at least these Theano tutorials (logistic regression, MLP):

http://deeplearning.net/tutorial/gettingstarted.html

http://deeplearning.net/tutorial/logreg.html

http://deeplearning.net/tutorial/mlp.html

Windows users: please also install git: http://git-scm.com/download/win

2. Check the performance with different numbers of epochs (e.g. from 20 to 500; the current value is 50). To change it, edit line 19 of the gpu/lib/run_nn.py file.

3. The provided NN has 20 and 10 units in its two hidden layers. Try other combinations (e.g. 50, 50) on lines 79 and 80.

4. Experiment with the learning rate and the regularization (line 15).

5. Experiment with each dataset separately (use a different NN for each dataset).
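
One way to organize the experiments in steps 2 to 5 is to enumerate the settings up front and keep the best one per dataset. The sketch below only builds the grid. The key names (n_epochs, hidden_units, learning_rate) and the learning-rate values are hypothetical and do not come from run_nn.py; the epoch counts and layer sizes are the examples mentioned above.

```python
from itertools import product

EPOCHS = (20, 50, 100, 500)             # 50 is the current value
HIDDEN_UNITS = ((20, 10), (50, 50))     # (first layer, second layer)
LEARNING_RATES = (0.01, 0.1)            # hypothetical values


def experiment_grid(epochs=EPOCHS, hidden=HIDDEN_UNITS, rates=LEARNING_RATES):
    """Return one settings dictionary per combination of the three options."""
    return [
        {"n_epochs": e, "hidden_units": h, "learning_rate": lr}
        for e, h, lr in product(epochs, hidden, rates)
    ]


grid = experiment_grid()
print(len(grid), "configurations to try per dataset")

# Step 5: keep a separate best configuration for each dataset.
best_per_dataset = {"christine": grid[0]}  # fill in after comparing the scores
```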

-------------------- Chapter 3: Putting files on the server

0. You need to put the files in the same places as we did.

1. Connect using an SFTP client. If you do not have one, use WinSCP (Windows) or FileZilla (all operating systems): their file explorer makes it convenient to upload datasets and files and to download predictions.

You can get datasets from here (click participate>get data):

https://www.codalab.org/competitions/4191?secret_key=9c8a787d-c06d-4d3e-8961-1eade3e738e1

Instead of an SFTP client, you may prefer to use wget to download each of the 5 datasets to the current folder, e.g.:

> wget http://www.causality.inf.ethz.ch/AutoML/christine.zip

Put the 5 folders in the Data directory, so that the path to the above dataset is:

/home/ubuntu/Data/christine/
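
If you downloaded the datasets as .zip files, unpacking each one into its own folder under Data can be scripted. This is a minimal sketch with hypothetical helper names; it assumes that each archive stores its files at the top level (if an archive already contains a folder named after the dataset, extract it directly into Data instead).

```python
import os
import zipfile


def dataset_dir(zip_path, data_root):
    """Return the folder for a dataset zip, e.g. christine.zip -> <data_root>/christine."""
    name = os.path.splitext(os.path.basename(zip_path))[0]
    return os.path.join(data_root, name)


def unpack_dataset(zip_path, data_root="/home/ubuntu/Data"):
    """Extract one dataset archive into its own folder under `data_root`."""
    target = dataset_dir(zip_path, data_root)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, unpack_dataset("christine.zip") would extract the archive into the christine folder under /home/ubuntu/Data.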

2. The starter kit is located here: https://sites.google.com/a/chalearn.org/automl/hackathon/gpu-track-instructions/gpu.zip?attredirects=0&d=1

Put it in the 'gpu' folder.