At the bootcamp, you will be able to log in to our common AWS server, using a key that we will provide, to work on a GPU.
Chapter 1 describes how to connect to the server and run the code; Chapter 2, how to explore the code.
Chapter 3 describes what files to put on your computer to get ready to start
(you should not need this if you use the servers we provide; they are pre-configured).
-------------------- Chapter 1: Connecting to the server and running the code
For convenience, you can add the key to your key ring. First restrict its permissions (ssh refuses keys that are readable by others):
LocalComputer> chmod 0400 hackathon.pem
Now the key is ready to use. To store it in your key ring, do:
LocalComputer> ssh-add -K hackathon.pem
Now you can connect (remember to replace YOUR_DNS_SERVER_ID with the proper server address):
LocalComputer> ssh ubuntu@YOUR_DNS_SERVER_ID
If you cannot add the key to your key ring or have any other problems, you can simply do:
LocalComputer> ssh -i hackathon.pem ubuntu@YOUR_DNS_SERVER_ID
EXAMPLE (hackathon.pem key in the current folder):
"ssh -i hackathon.pem email@example.com"
1. Check your GPU installation:
We installed everything for you. Once logged in, list your home directory:
GPU> ls
You should see two directories: Data and hackathon.
GPU> ls Data
christine christine.zip jasmine jasmine.zip madeline madeline.zip philippine philippine.zip sylvine sylvine.zip
You should see the 5 datasets of Round 1 (both zipped and unzipped).
GPU> ls hackathon
You should see a directory gpu/
GPU> ls hackathon/gpu
gpu.zip lib metadata README.txt run.py
You should see all the code pre-installed.
You may want to save a copy of the original code for further reference.
2. Run the code:
GPU> python run.py
The results will end up in hackathon/gpu/res/.
For the .zip, go one level up:
GPU> cd ..
The code also creates, in the directory hackathon, a zip file ready to submit. It is a bundle of all the contents of the gpu subdirectory; its name looks like automl_sample_submission_15-08-07-07-35.zip.
It includes the code as well: for submission, you have to keep all your files in the folder, including the code.
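The bundling step can be sketched as follows. This is a simplified, hypothetical version, not run.py's actual implementation; the function name and timestamp format are assumptions.

```python
# Hypothetical sketch of how the submission bundle could be built.
import os
import time
import zipfile

def make_submission_zip(gpu_dir, out_dir):
    """Zip every file under gpu_dir into a timestamped archive in out_dir."""
    stamp = time.strftime("%y-%m-%d-%H-%M")
    zip_path = os.path.join(out_dir, "automl_sample_submission_%s.zip" % stamp)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(gpu_dir):
            for name in files:
                full = os.path.join(root, name)
                # store paths relative to gpu_dir so the archive unpacks flat
                zf.write(full, os.path.relpath(full, gpu_dir))
    return zip_path
```

The key point is that the archive contains the contents of gpu/ (code, metadata, res/), not the gpu/ directory itself.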
3. Make a submission:
Download the zip file to your computer by executing one of the scp commands below (use the -i form if the key is not in your key ring).
LocalComputer> scp ubuntu@YOUR_DNS_SERVER_ID:/home/ubuntu/hackathon/[NAME_OF_RESULTS].zip ./
LocalComputer> scp -i [my_path]/X.pem ubuntu@YOUR_DNS_SERVER_ID:/home/ubuntu/hackathon/[NAME_OF_RESULTS].zip ./
*** WARNING! ***
Five-detail scp checklist -- remember to:
1. Open a new console on your local computer (do not type the command in the AWS server shell!)
2. Change [NAME_OF_RESULTS] to the exact name of your zip.
3. Provide the proper path, i.e. substitute '/home/ubuntu/hackathon' with wherever you put your result zip file. You can type 'pwd' in the GPU server shell to see the current path.
4. Provide the path to the X.pem key on your local computer (the [my_path]/X.pem part).
5. Change YOUR_DNS_SERVER_ID to the proper server address.
To upload a file from your computer to the server, swap the source and destination, e.g.:
scp -i X.pem /C/local/path/a.zip ubuntu@YOUR_DNS_SERVER_ID:/home/ubuntu/[mydir]/
MAKE SURE YOU HAVE PREDICTIONS FOR EACH OF THE 5 DATASETS IN THE res/ FOLDER!
IF ANY IS MISSING, THE SUBMISSION PLATFORM WILL TRY TO EXECUTE THE CODE AND WILL FAIL.
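Before downloading, you can sanity-check the res/ folder with a short script like this one. The prediction file-name pattern (e.g. christine_valid.predict) is an assumption; check what your res/ folder actually contains.

```python
# Sketch: verify predictions exist for all 5 Round 1 datasets before zipping.
# The "<dataset>_<set>.predict" naming below is an assumption.
import os

DATASETS = ["christine", "jasmine", "madeline", "philippine", "sylvine"]

def missing_predictions(res_dir, sets=("valid", "test")):
    """Return the list of expected prediction files absent from res_dir."""
    missing = []
    for name in DATASETS:
        for s in sets:
            fname = "%s_%s.predict" % (name, s)
            if not os.path.exists(os.path.join(res_dir, fname)):
                missing.append(fname)
    return missing
```

An empty return value means all five datasets are covered and the zip is safe to submit.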
4. Submit results to Codalab (Round 1, Stanford hackathon).
Go to the Participate tab, then to the link "View/Submit results". Always include a description with your zip file.
Refresh to check the status. When the submission status turns to "Finished", check your submission on the leaderboard under the "Results" tab.
-------------------- Chapter 2: Exploring the code
1. Get familiar with Neural Networks in Theano
To be able to work during the workshop, read at least these Theano tutorials (logistic regression, mlp):
2. Get familiar with the code.
Your task is to make modifications to run_nn.py, in the lib/ subdirectory of the gpu code. Note: the code is already uploaded to your GPU server.
run_nn is called from run.py, which is the main script. In run.py, you should not need to make changes above line 231:
# ================ @CODE SUBMISSION (SUBTITUTE YOUR CODE) =================
Most of the code should remain unchanged. In particular:
D = DataManager(basename, input_dir, replace_missing=True, filter_features=True, verbose=verbose)
just loads the data; keep it as is.
Anything you may need to modify is below line 280:
GPU = True
Note: to run the code on CPU, just change this to:
GPU = False
Leave everything that has to do with the time budget alone. The GPU example uses just 2 cycles (the loop while cycle <= 1:).
The first one (cycle = 0) calibrates the time budget. For most learning machines, we use ensemble methods voting over n_estimators base estimators.
Cycle 0 computes how many estimators we have time to compute; cycle 1 trains that many. For neural networks,
this translates to a number of epochs.
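The calibration idea can be sketched like this. This is a simplified illustration, not the actual run.py code; train_one_estimator and the 0.9 safety margin are assumptions.

```python
import time

def calibrated_budget(train_one_estimator, time_budget, probe=1):
    """Cycle 0: time `probe` estimators, then compute how many fit in budget."""
    start = time.time()
    for _ in range(probe):
        train_one_estimator()
    # avoid division by zero for instantaneous estimators
    per_estimator = max((time.time() - start) / probe, 1e-6)
    remaining = time_budget - (time.time() - start)
    # the 0.9 factor leaves a safety margin so cycle 1 does not overrun
    return max(1, int(0.9 * remaining / per_estimator))
```

For the neural network, the returned count plays the role of n_estimators, from which run.py derives the number of epochs.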
For the GPU track we are only running binary classification examples from Round 1 of the AutoML challenge. So all you should care about is line 300:
Y = run_nn.fit_predict( D.data['X_train'], D.data['Y_train'], [D.data['X_valid'], D.data['X_test']], n_epochs = n_estimators/10)
Go to run_nn.py and note that there are default parameters that you can change:
batch_size = 100
There is a safety feature: the number of epochs is capped at 50 (lines 18-19):
if n_epochs > 50:
n_epochs = 50
You can remove that if you know what you are doing.
Go to nn.py and find where the architecture of the neural network is defined.
From line 130 on, 2 layers are defined:
self.hiddenLayer = HiddenLayer(
self.hiddenLayer2 = HiddenLayer(
Notice how the number of outputs n_out is identical to the number of inputs n_in in the next layer.
Notice that hidden1 and hidden2 are the number of hidden units supplied as arguments in the constructor of the NN object:
def __init__(self, rng, input, n_in, n_hidden1,n_hidden2, n_out):
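To see why the shapes must match, here is a minimal NumPy illustration. This is not the Theano code itself; layer_shapes and forward are hypothetical helpers.

```python
# Each layer's n_out must equal the next layer's n_in.
import numpy as np

def layer_shapes(n_in, n_hidden1, n_hidden2, n_out):
    """Weight-matrix shapes for a 2-hidden-layer MLP like the one in nn.py."""
    return [(n_in, n_hidden1),       # hiddenLayer
            (n_hidden1, n_hidden2),  # hiddenLayer2
            (n_hidden2, n_out)]      # logRegressionLayer

def forward(x, shapes, seed=0):
    """Forward pass through random weights, just to check the shapes line up."""
    rng = np.random.RandomState(seed)
    for shape in shapes:
        x = np.tanh(x.dot(rng.uniform(-1, 1, size=shape)))
    return x
```

With n_in=10, n_hidden1=20, n_hidden2=10, n_out=2, a batch of shape (5, 10) comes out with shape (5, 2).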
1. Modify the default arguments, e.g. change the learning rate in run.py:
Y = run_nn.fit_predict( D.data['X_train'], D.data['Y_train'], [D.data['X_valid'], D.data['X_test']], n_epochs = n_estimators/10, learning_rate=0.1)
Similarly, you can change the regularization and the batch size.
2. Check the performance with different numbers of epochs (e.g. 20 to 500; currently capped at 50 -- edit the gpu/lib/run_nn.py file, line 19):
if n_epochs > 500:
n_epochs = 500
You can also remove these lines and control the time budget from run.py:
If on line 112 debug_mode = 0, then the time budget used is that of the "info" file of the datasets.
If on line 112 debug_mode = 1, the maximum running time on line 120 is used:
max_time = 90 means that the code runs in less than 90 sec per dataset.
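The selection between the two budgets can be summarized in one line. The function name is an assumption; the real logic lives inline in run.py.

```python
def effective_time_budget(debug_mode, info_budget, max_time=90):
    """debug_mode=0: use the dataset's "info"-file budget; otherwise max_time."""
    return max_time if debug_mode else info_budget
```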
3. The provided NN has 20 and 10 units in its two hidden layers. Check other combinations (e.g. 50 and 50), lines 79-80:
classifier = NN(
4. Modify the number of layers of the neural network NN in nn.py. Add, for example, one layer.
This means adding one argument to the constructor:
def __init__(self, rng, input, n_in, n_hidden1, n_hidden2, n_hidden3, n_out):
and one extra layer:
self.hiddenLayer3 = HiddenLayer(
You also need to modify the logistic regression layer: change the 2 to a 3, so it takes its input from hiddenLayer3:
self.logRegressionLayer = LogisticRegression(
Also add the extra layer information in the regularization terms L1 and L2:
+ (self.hiddenLayer3.W ** 2).sum()
Finally, in self.params, also add the new layer's parameters:
+ self.hiddenLayer3.params
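Putting step 4 together, here is a compact NumPy sketch of how the extra layer extends the weights, the L1/L2 terms, and params. This is not the Theano NN class itself; TinyNN is hypothetical and only mirrors the structure described above.

```python
# NumPy sketch of the three-hidden-layer structure: constructor arguments,
# layer wiring, and the L1/L2 regularization terms.
import numpy as np

class TinyNN(object):
    def __init__(self, rng, n_in, n_hidden1, n_hidden2, n_hidden3, n_out):
        dims = [n_in, n_hidden1, n_hidden2, n_hidden3, n_out]
        # one weight matrix per layer, each output feeding the next input
        self.W = [rng.uniform(-1, 1, size=(a, b))
                  for a, b in zip(dims[:-1], dims[1:])]
        # regularization terms sum over every layer, including the new one
        self.L1 = sum(abs(w).sum() for w in self.W)
        self.L2_sqr = sum((w ** 2).sum() for w in self.W)
        # all trainable parameters, analogous to self.params in nn.py
        self.params = list(self.W)
```

Note that adding the third hidden layer adds exactly one more weight matrix, one more term in each regularizer, and one more entry in params.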
5. Experiment with all datasets separately (use different NNs for each dataset)