From c1edd640d56b2fb9fa2a75d83dd39313ff1a58c8 Mon Sep 17 00:00:00 2001
From: AdeB
Date: Sat, 11 Jul 2015 19:00:28 -0400
Subject: Fix a few typos in the report.

---
 doc/report.tm | 66 +++++++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/doc/report.tm b/doc/report.tm
index 6165cd1..b199841 100644
--- a/doc/report.tm
+++ b/doc/report.tm
@@ -9,28 +9,27 @@
-  Our model is based on a multi-layer perceptron (MLP), a simple feed-forward
-  neural network architecture. Our MLP model is trained by stochastic
-  gradient descent (SGD) on the training trajectories. The inputs to our MLP
-  are the 5 first and 5 last positions of the known part of the trajectory,
-  as well as embeddings for the context information (date, client and taxi
-  identification). \ The embeddings are trained with SGD jointly with the MLP
-  parameters. The MLP outputs probabilities for 3392 target points, and a
-  mean is calculated to get a unique destination point as an output. We did
-  no ensembling and used no external data.
+  Our model is based on a multi-layer perceptron (MLP). Our MLP model is
+  trained by stochastic gradient descent (SGD) on the training trajectories.
+  The inputs of our MLP are the 5 first and 5 last positions of the known
+  part of the trajectory, as well as embeddings for the context information
+  (date, client and taxi identification). \ The embeddings are trained with
+  SGD jointly with the MLP parameters. The MLP outputs probabilities for 3392
+  target points, and a mean is calculated to get a unique destination point
+  as an output. We did no ensembling and did not use any external data.
 
   We used a mean-shift algorithm on the destination points of all the
   training trajectories to extract 3392 classes for the destination point.
-  These classes were used as a fixed output layer for the MLP architecture.
+  These classes were used as a fixed softmax layer in the MLP architecture.
 
   We used the embedding method which is common in neural language modeling
   approaches (see [1]) to take the metainformation into account in our model.
@@ -73,8 +72,7 @@
   We use a single hidden layer MLP. The hidden layer is of size 500, and the
   activation function is a Rectifier Linear
-  Unit (ie ReLU(x) = max(x, 0)). See [2] for more
-  information about ReLUs.
+  Unit (ie ReLU(x) = max(x, 0)) [2].
 
   The output layer predicts a probability vector for the 3392 output classes
   that we obtained with our clustering
@@ -88,18 +86,22 @@
   Since the probability vector sums to one, this is a valid point on the map.
 
-  We directly train using an approximation of the mean
-  Haversine Distance as a cost.
+  We directly train using an approximation
+  (Equirectangular projection) of the mean Haversine Distance as a cost.
 
   We used a minibatch size of 200. The
-  optimization algorithm is simple SGD with a learning rate of 0.01 and a
-  momentum of 0.9.
+  optimization algorithm is simple SGD with a fixed learning rate of 0.01
+  and a momentum of 0.9.
 
   To generate our validation set, we tried to create a set that looked like
   the training set. For that we generated ``cuts'' from the training set,
   i.e. extracted all the taxi rides that were occurring at given times. The
   times we selected for our validation
-  set are similar to those of the test set, only one year before.
+  set are similar to those of the test set, only one year before:
+
+      1380616200,  # 2013-10-01 08:30
+      1381167900,  # 2013-10-07 17:45
+      1383364800,  # 2013-11-02 04:00
+      1387722600   # 2013-12-22 14:30
@@ -170,18 +172,17 @@
   testing
 
-  In the archive we have included only the files listed above, which are
-  strictly necessary for reproducing our results. More files for the other
-  models we have tried are available on GitHub at
-  .
+  In the archive we have included only the files listed above, which are the
+  strict minimum to reproduce our results. More files for the other models we
+  tried are available on GitHub at .
 
   We used the following packages developed at the MILA lab:
 
   <\itemize>
-    A general GPU-accelerated python math library, with
-    an interface similar to numpy (see [3, 4]).
+    A general GPU-accelerated python math library,
+    with an interface similar to numpy (see [3, 4]).
 
     A deep-learning and neural network framework for
@@ -215,7 +216,7 @@
   arrival point clustering. This can take a few minutes.
 
   Create a folder and a folder
-  (next to the training script), which will recieve
+  (next to the training script), which will receive
   respectively a regular save of the model parameters and many submission
   files generated from the model at a regular interval.
@@ -224,16 +225,16 @@
   every 1000 iterations. Interrupt the model with three consecutive Ctrl+C at
   any time. The training script is set to stop training after 10 000 000
   iterations, but a result file produced after less than 2 000 000
-  iterations is already the winning solution. The training is quite long
-  though: we trained our model on a GeForce GTX 680 card and it took about
-  an afternoon to generate the winning solution.
+  iterations is already the winning solution. We trained our model on a
+  GeForce GTX 680 card and it took about an afternoon to generate the
+  winning solution.
 
   When running the training script, set the following Theano flags
   environment variable to exploit GPU parallelism:
 
-  Theano is only compatible with CUDA, which requires an Nvidia GPUs.
+  Theano is only compatible with CUDA, which requires an Nvidia GPU.
   Training on the CPU is also possible but much slower.
@@ -296,15 +297,18 @@
 <\references>
   <\collection>
     [auto-generated TeXmacs reference labels; content not recoverable]
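
To connect the prose in the patch to something concrete, here is a minimal numpy sketch of the prediction head it describes. This is not the authors' implementation (which was written with Theano and trained with SGD on a GPU); the function names, the synthetic data and the use of numpy are illustrative assumptions. It only shows how a softmax over the 3392 mean-shift cluster centres is turned into a single destination by a weighted mean, and how the equirectangular approximation of the Haversine distance can serve as the training cost.

# Sketch only, not the authors' code. All names and values are illustrative.
import numpy as np

R_EARTH_KM = 6371.0


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predicted_destination(logits, centres):
    # logits  : (batch, 3392) scores produced by the hidden layer
    # centres : (3392, 2) lat/lon of the fixed mean-shift cluster centres
    # Because the softmax weights sum to one, the weighted mean is itself
    # a valid point on the map.
    p = softmax(logits)          # (batch, 3392)
    return p @ centres           # (batch, 2) -> (lat, lon)


def equirectangular_distance(pred, target):
    # Equirectangular approximation of the Haversine distance, in km.
    # Cheap and differentiable, and accurate enough at the scale of one city.
    lat1, lon1 = np.radians(pred[:, 0]), np.radians(pred[:, 1])
    lat2, lon2 = np.radians(target[:, 0]), np.radians(target[:, 1])
    x = (lon2 - lon1) * np.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return R_EARTH_KM * np.sqrt(x * x + y * y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic cluster centres and targets; illustrative values only.
    centres = rng.normal([41.16, -8.62], [0.05, 0.05], size=(3392, 2))
    targets = rng.normal([41.16, -8.62], [0.05, 0.05], size=(4, 2))
    logits = rng.normal(size=(4, 3392))
    dest = predicted_destination(logits, centres)
    print(equirectangular_distance(dest, targets).mean())  # mean cost over the batch

Because the weighted mean is differentiable with respect to the logits, the whole pipeline can be optimised end-to-end on this distance, which is what the report means by training directly on an approximation of the mean Haversine distance as a cost.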