author: Étienne Simon <esimon@esimon.eu> 2015-07-14 07:53:03 -0400
committer: Étienne Simon <esimon@esimon.eu> 2015-07-14 07:53:03 -0400
commit: c97af300b17ac042c52cfc54f43d4f01fd61fbe9 (patch)
tree: ad4e847a8942f2b6e120e7f811d472b93d7766cf /README.md
parent: dc430951d6cb660ab804c7e6250aea1acc2dcd9d (diff)
download: taxi-c97af300b17ac042c52cfc54f43d4f01fd61fbe9.tar.gz, taxi-c97af300b17ac042c52cfc54f43d4f01fd61fbe9.zip
Add prepare.sh to prepare the kaggle data
Diffstat (limited to 'README.md')
-rw-r--r--  README.md | 6
1 file changed, 4 insertions, 2 deletions
@@ -38,10 +38,12 @@ Here is a brief description of the Python files in the archive:
 * `train.py` contains the main code for the training and testing
 
 ## How to reproduce the winning results?
 
+There is a helper script `prepare.sh` which might help you (by performing steps 1-6 and some other checks), but if you encounter an error, the script will re-execute all the steps from the beginning (before the actual training, steps 2, 4 and 5 are quite long).
+
 1. Set the `TAXI_PATH` environment variable to the path of the folder containing the CSV files.
-2. Run `data/csv_to_hdf5.py` to generate the HDF5 file (which is generated in `TAXI_PATH`, along the CSV files). This takes around 20 minutes on our machines.
-3. Run `data/init_valid.py` to initialize the validation set HDF5 file.
+2. Run `data/csv_to_hdf5.py "$TAXI_PATH" "$TAXI_PATH/data.hdf5"` to generate the HDF5 file (which is written to `TAXI_PATH`, alongside the CSV files). This takes around 20 minutes on our machines.
+3. Run `data/init_valid.py valid.hdf5` to initialize the validation set HDF5 file.
 4. Run `data/make_valid_cut.py test_times_0` to generate the validation set. This can take a few minutes.
 5. Run `data_analysis/cluster_arrival.py` to generate the arrival point clustering. This can take a few minutes.
 6. Create a folder `model_data` and a folder `output` (next to the training script), which will receive respectively a regular save of the model parameters and many submission files generated from the model at a regular interval.
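The numbered steps in the README diff above can be sketched as a single shell script. This is a dry-run sketch, not the repository's actual `prepare.sh`: each command is echoed through a `run` wrapper instead of executed, the script arguments are copied from the README text, and the placeholder CSV path is an assumption.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-6 from the README. Commands are echoed via
# the 'run' wrapper rather than executed; change 'run' to execute "$@"
# to actually run the pipeline. This is NOT the repository's prepare.sh.
run() { printf '+ %s\n' "$*"; }

# Step 1: TAXI_PATH must point at the folder holding the Kaggle CSV
# files (the default below is a placeholder, not a real path).
TAXI_PATH="${TAXI_PATH:-/path/to/kaggle/csv}"

# Step 2: convert the CSV files into one HDF5 file, written to
# TAXI_PATH alongside the CSV files (around 20 minutes).
run python data/csv_to_hdf5.py "$TAXI_PATH" "$TAXI_PATH/data.hdf5"

# Step 3: initialize the validation-set HDF5 file.
run python data/init_valid.py valid.hdf5

# Step 4: generate the validation set (a few minutes).
run python data/make_valid_cut.py test_times_0

# Step 5: generate the arrival point clustering (a few minutes).
run python data_analysis/cluster_arrival.py

# Step 6: folders for model parameter saves and submission files,
# next to the training script.
run mkdir -p model_data output
```

Keeping the commands behind a wrapper makes it easy to review the full sequence before committing to the long-running steps (2, 4 and 5).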