GPU track instructions
Table of contents:
Introduction
Chapter 1: Running the GPU starter KIT on the server
Chapter 2: Explore the code
Chapter 3: Starting a server on the Amazon Web Services (AWS)
Chapter 4: Putting files on the server
-------------------- Introduction
At the hackathon, you will be able to log in to our common server on AWS, using a key pair that we will provide, to work on a GPU. Chapter 1 describes how to connect to the server and run the code; Chapter 2 describes how to explore the code.
If you would like to work on your own server (also after the hackathon), Chapter 3 describes how to start one, and Chapter 4 describes which files to put there to get the same set-up as on our server. You can do this before the hackathon.
Even if you do not want to do anything else before the hackathon, you must at least read the Theano tutorials listed in Chapter 2. Chapter 1 is intended for 22nd April, but you can read it earlier.
-------------------- Chapter 1: Running the GPU starter KIT on the server
The username is 'ubuntu', so you should log in as ubuntu@DNS (where DNS is the public DNS address of the server), authenticating with the key. Save your private key in a safe place on your computer. The key is called:
yourname.pem
In the console (to get a console on Windows, install git from http://git-scm.com/download/win, then right-click and choose 'Git Bash'), restrict the permissions of the key file:
> chmod 0400 yourname.pem
Now it is ready to use. To store it in your key ring, do the following (the -K option is specific to macOS; on other systems, run ssh-add without it):
> ssh-add -K yourname.pem
Now you can do:
> ssh ubuntu@DNS
If you cannot add a key to your key ring, you can do:
> ssh -i yourname.pem ubuntu@DNS
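ssh usually refuses a private key that other users can read, which is why the chmod step comes first. As a minimal sketch, the same step and a check of its result can be written in Python. The function names restrict_key and key_is_private are made up for this illustration, and the code assumes a Linux or macOS file system.

```python
import os
import stat


def restrict_key(path):
    """Make the key file readable by its owner only (same effect as chmod 0400)."""
    os.chmod(path, 0o400)


def key_is_private(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Example use (yourname.pem is the key file from the text above):
#   restrict_key("yourname.pem")
#   print(key_is_private("yourname.pem"))
```

If key_is_private returns False, run the chmod command again before trying to connect.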
1. Check the installation:
We will create a directory for you, called [mydir]. List the contents of the home directory:
> ls
You should see a directory [mydir]. Move to its gpu subdirectory:
> cd [mydir]/gpu
> ls
You should see all the code pre-installed.
2. Run the code:
> python run.py
The results will end up in [mydir]/gpu/res/.
Go one level up:
> cd ..
The code also creates a zip file in the directory [mydir], ready to submit (it is a bundle of all the contents of the gpu subdirectory).
3. Make a submission:
Exit ssh and download the zip file to the current directory:
> scp -i yourname.pem ubuntu@DNS:/home/ubuntu/[mydir]/automl_*.zip ./
On Windows, you can use WinSCP or the same scp command.
4. Submit it to http://codalab.org/AutoML
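Before uploading, it can be worth checking that the downloaded zip file is intact and contains what you expect. The following is a minimal sketch using Python's standard zipfile module; the function name list_submission is made up for this illustration, and the exact contents of the archive depend on the starter kit.

```python
import zipfile


def list_submission(zip_path):
    """Return the file names stored in the archive, after an integrity check."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad = archive.testzip()
        if bad is not None:
            raise ValueError("corrupt member in archive: %s" % bad)
        return archive.namelist()


# Example use (replace the name with the automl_*.zip file you downloaded):
#   for name in list_submission("automl_example.zip"):
#       print(name)
```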
-------------------- Chapter 2: Explore the code
1. You must get familiar with neural networks in Theano, so that you are able to work during the workshop. Read at least these Theano tutorials (getting started, logistic regression, MLP):
http://deeplearning.net/tutorial/gettingstarted.html
http://deeplearning.net/tutorial/logreg.html
http://deeplearning.net/tutorial/mlp.html
Windows users: please also install git: http://git-scm.com/download/win
2. Check the performance for different numbers of epochs (for example from 20 to 500; the current value is 50). To change it, edit line 19 of the file gpu/lib/run_nn.py.
3. In the provided NN, there are 20 and 10 units in the two hidden layers. Check other combinations (e.g. 50, 50) by editing lines 79 and 80.
4. Experiment with the learning rate and the regularization (line 15).
5. Experiment with all datasets separately (use a different NN for each dataset).
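The experiments in items 2 to 4 amount to a small grid search. The sketch below only organizes the combinations and picks the best one; it does not call the starter kit. train_and_score is a hypothetical function that you would write yourself (for example by wrapping your edited run_nn.py), and the learning rate values are assumptions chosen for illustration.

```python
import itertools


def make_grid(epochs, hidden_layers, learning_rates):
    """Return one dict per combination of the given hyperparameter values."""
    return [
        {"epochs": e, "hidden": h, "learning_rate": lr}
        for e, h, lr in itertools.product(epochs, hidden_layers, learning_rates)
    ]


def best_config(grid, train_and_score):
    """Score every configuration and return (best_score, best_config).

    train_and_score is a hypothetical callable that takes a config dict
    and returns a validation score where higher is better.
    """
    scored = [(train_and_score(cfg), cfg) for cfg in grid]
    return max(scored, key=lambda pair: pair[0])


# Epoch counts and layer sizes are taken from the text; learning rates are assumed.
grid = make_grid([20, 50, 500], [(20, 10), (50, 50)], [0.01, 0.1])
print(len(grid))  # 12 combinations
```

For item 5, best_config can be run once per dataset, so that each dataset ends up with its own configuration.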
-------------------- Chapter 3: Starting a server on the Amazon Web Services (AWS)
0. Here we describe how to run your own server, so that you can work on your own, also after the hackathon.
Introduction:
There are two ways to create a server on AWS: on-demand and through bidding (they are called spot instances).
- Spot instances are much cheaper. Their price can vary, but because demand for GPU instances is low, it is almost constant, around 10 times lower than the on-demand price, so we will use spot instances. In recent months the price was below $2 per day.
- The drawback is that, in theory, Amazon might stop a spot instance if it has no free servers left for on-demand use, but this shouldn't happen for the GPU ones.
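The price figures can be related with a little arithmetic. The sketch below assumes that the maximum price entered in the spot request form (0.11 in step 4 of this chapter) is a price in dollars per hour and that the server runs 24 hours a day; the function names are made up for this illustration.

```python
HOURS_PER_DAY = 24


def daily_cost(hourly_price):
    """Cost in dollars of running one instance for a full day."""
    return round(hourly_price * HOURS_PER_DAY, 2)


def hourly_price_for(daily_budget):
    """Hourly price in dollars that corresponds to a given daily cost."""
    return round(float(daily_budget) / HOURS_PER_DAY, 4)


print(daily_cost(0.11))     # 2.64: the most you would pay per day at the maximum price
print(hourly_price_for(2))  # 0.0833: the hourly price behind "below $2 per day"
```

Under these assumptions, a maximum price of 0.11 leaves some margin above the recent price level while capping the cost at $2.64 per day.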
1. Open an AWS account (free tier) at http://aws.amazon.com/free/ and provide your credit/debit card number. Limited basic services are then free (account, storage, basic server), so we will pay only for the GPU server.
2. Sign in to the AWS console page > EC2. In Instances>Spot Requests click 'Request Spot Instances'.
3. Now let's choose an OS and server: various operating systems are available, and servers with various capabilities, but we will choose Ubuntu with Theano and CUDA drivers pre-installed (hereafter "Theano image"), and a GPU server.
In the top-right corner, you must choose "N. California" region if it is not yet selected, because the Theano image is there.
Now we will find the Theano image. Click 'Community AMIs' on the left. In the search field type 'theano', press enter, select 'Theano - CUDA 7'. Click next, choose g2.2xlarge instance (a GPU one), click next.
4. The only thing to change in the form is the maximum price ($): enter 0.11. Click review and launch, then click launch.
There should be 2 warnings:
1) The warning 'improve your instances security' means that anyone who has the key can log in from anywhere, so you can limit the allowed IP addresses later (see the AWS manual) if your data is confidential.
2) Because we selected a paid GPU, it will also say that it is not included in the free tier.
To log in, we need to select a key pair. If you are using AWS for the first time, create a new one now, then click 'create spot instances'.
5. Starting the server (instance) will take around 5 minutes. Go to instances>instances, where you can see the current state of your server, and wait until it is 'running'. Click on the instance to see its address (public DNS).
Log in to your server as in Chapter 1.
6. Python and Theano are already there, so let's install sklearn:
> sudo apt-get install build-essential python-dev python-setuptools \
python-numpy python-scipy \
libatlas-dev libatlas3gf-base
> pip install --user --install-option="--prefix=" -U scikit-learn
(will take 2 minutes)
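After the installation, you can check from Python that the packages can be imported. This is a minimal sketch using only the standard library; the function name installed_versions is made up for this illustration, and the module names in the example are the usual import names of the packages mentioned above.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if the import fails."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import counts as "not usable".
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Example use on the server:
#   for name, version in installed_versions(["numpy", "scipy", "theano", "sklearn"]).items():
#       print(name, version)
```

A value of None for any of the packages means that its installation step needs to be repeated.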
-------------------- Chapter 4: Putting files on the server
0. You need to put files in the same places as we did.
1. Connect using an sftp client. If you don't have one, use WinSCP (Windows) or FileZilla (all OS); their file explorer makes it convenient to upload datasets and files and to download predictions.
You can get datasets from here (click participate>get data):
https://www.codalab.org/competitions/3311?secret_key=ae748108-0594-4bda-ad2c-7c61a38ab5fd
Instead of using an sftp client, you may prefer to download each of the 5 datasets to the current folder with wget, e.g.:
> wget http://www.causality.inf.ethz.ch/AutoML/christine.zip
Put the 5 folders (unzipped) in the Data directory, so that the path to the above dataset is:
/home/ubuntu/Data/christine/
2. The starter kit is located here: https://sites.google.com/a/chalearn.org/automl/hackathon/gpu-track-instructions/gpu.zip?attredirects=0&d=1
Put it in the 'gpu' folder.
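Once the files are uploaded, you can check that the expected directories exist. The sketch below is only an illustration: the function name missing_dirs is made up, only the dataset name christine is known from the text above (add the other four names yourself), and /home/ubuntu/gpu is an assumed location for the 'gpu' folder.

```python
import os


def missing_dirs(data_dir, dataset_names, gpu_dir):
    """Return the expected directories that do not exist yet."""
    expected = [os.path.join(data_dir, name) for name in dataset_names]
    expected.append(gpu_dir)
    return [path for path in expected if not os.path.isdir(path)]


# Example use on the server (the gpu path is an assumption):
#   print(missing_dirs("/home/ubuntu/Data", ["christine"], "/home/ubuntu/gpu"))
```

An empty list means that every directory you listed is in place.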