Winner Team's Report

1 Summary

Our model is based on a multi-layer perceptron (MLP), a simple feed-forward neural network architecture. The MLP is trained by stochastic gradient descent (SGD) on the training trajectories. The inputs to the MLP are the 5 first and 5 last positions of the known part of the trajectory, as well as embeddings for the context information (date, client and taxi identification). The embeddings are trained with SGD jointly with the MLP parameters. The MLP outputs probabilities for 3392 target points, and a weighted mean of those points is calculated to obtain a unique destination point as output. We did no ensembling and used no external data.

2 Feature Selection/Extraction

We used a mean-shift algorithm on the destination points of all the training trajectories to extract 3392 classes for the destination point. These classes were used as a fixed output layer for the MLP architecture.

We used the embedding method common in neural language modelling approaches (see [1]) to take the meta-information into account in our model. The following embeddings were used (listed with their corresponding dimensionality):

[Table: Embeddings and corresponding dimensions used by the model]

The embeddings were initialized randomly and were then allowed to evolve freely with SGD along with the other model parameters.

The geographical data fed to the network is a centered and normalized version of the GPS data points.

We did no other preprocessing or feature selection.
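As an illustration of the clustering step above, here is a minimal sketch of how the destination clusters could be computed with scikit-learn's MeanShift. The file names, the bandwidth and the other parameter values below are illustrative assumptions for this sketch, not our exact settings.

    # Minimal sketch of the destination clustering step, assuming scikit-learn's
    # MeanShift. File names and parameter values are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import MeanShift

    # destinations: array of shape (n_trajectories, 2) holding the final
    # (latitude, longitude) point of every training trajectory.
    destinations = np.load("train_destinations.npy")  # hypothetical file

    ms = MeanShift(bandwidth=0.001, bin_seeding=True, min_bin_freq=5)
    ms.fit(destinations)

    # The cluster centres become the fixed set of target points (3392 in our
    # case) used as the output layer of the MLP.
    centroids = ms.cluster_centers_
    np.save("arrival_clusters.npy", centroids)  # hypothetical file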
3 Modelling Techniques and Training

Here is a brief description of the model we used:

- The input layer of the MLP is the concatenation of the following inputs:
  - the five first and five last points of the known part of the trajectory;
  - embeddings for all the metadata.
- We use a single-hidden-layer MLP. The hidden layer is of size 500, and the activation function is a Rectified Linear Unit (i.e. $\mathrm{ReLU}(x) = \max(0, x)$). See [2] for more information about ReLUs.
- The output layer predicts a probability vector for the 3392 output classes that we obtained with our clustering preprocessing step. If $p_i$ is the probability of class $i$ output by our MLP (through a softmax layer) and $c_i$ is the centroid of cluster $i$, our prediction is given by:

      $\hat{y} = \sum_{i=1}^{3392} p_i \, c_i$

  Since the $p_i$ sum to one, this is a valid point on the map.

We train directly using an approximation of the mean Haversine distance as a cost (a sketch of the prediction rule and of this cost is given at the end of this section). We used a minibatch size of 200. The optimization algorithm is plain SGD with a learning rate of 0.01 and a momentum of 0.9.

To generate our validation set, we tried to create a set that looked like the test set. For that we generated "cuts" from the training set, i.e. we extracted all the taxi rides that were in progress at a number of given times. The times we selected for our validation set are similar to those of the test set, only one year earlier.
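To make the prediction rule concrete, here is a minimal numpy sketch of the forward pass and of the Haversine distance whose mean our training cost approximates. It is an illustration only: the actual model is implemented with Theano/Blocks, and all names below (forward, haversine, W_hidden, ...) are assumptions made for this sketch, not identifiers from our code.

    # Illustrative numpy sketch of the prediction rule and cost described above.
    # The real model is implemented in Theano/Blocks; all names are assumptions.
    import numpy as np

    R_EARTH = 6371.0  # Earth radius in km

    def haversine(pred, target):
        """Haversine distance (km) between arrays of (lat, lon) points in degrees."""
        lat1, lon1 = np.radians(pred[:, 0]), np.radians(pred[:, 1])
        lat2, lon2 = np.radians(target[:, 0]), np.radians(target[:, 1])
        dlat, dlon = lat2 - lat1, lon2 - lon1
        a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
        return 2 * R_EARTH * np.arcsin(np.sqrt(a))

    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def forward(prefix_points, embedded_meta, params, centroids):
        """Concatenated inputs -> ReLU hidden layer (size 500) -> softmax over the
        3392 fixed centroids -> prediction as the probability-weighted mean of the
        centroids, i.e. y_hat = sum_i p_i * c_i."""
        x = np.concatenate([prefix_points, embedded_meta], axis=1)
        h = np.maximum(0.0, x @ params["W_hidden"] + params["b_hidden"])  # ReLU
        p = softmax(h @ params["W_out"] + params["b_out"])                # (batch, 3392)
        return p @ centroids                                              # (batch, 2)

    def cost(prediction, destination):
        """Mean Haversine distance; the training cost approximates this quantity."""
        return haversine(prediction, destination).mean()

In the actual implementation the whole computation graph, including the embedding lookups, is built in Theano so that the gradient of the cost can be back-propagated to all parameters, embeddings included, by SGD with momentum.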
4 Code Description

Here is a brief description of the Python files in the archive:

- configuration files for the different models we have experimented with, including the configuration that produces the best solution
- files related to the data pipeline:
  - some general statistics about the data
  - conversion of the CSV data file into an HDF5 file usable directly by Fuel
  - utility functions for exploiting the HDF5 file
  - initialization of the HDF5 file for the validation set
  - generation of a validation set using a list of time cuts; cut lists are stored in Python files (we used a single cut file)
  - the Fuel pipeline for transforming the training dataset into structures usable by our model
- scripts for various statistical analyses of the dataset, including the script used to generate the mean-shift clustering of the destination points, producing the 3392 target points
- source code for the various models we tried:
  - code common to all the models, including the code for embedding the metadata
  - code common to all MLP models
  - code for our MLP destination prediction model using target points for the output layer
- the functions for calculating the error based on the Haversine distance
- a Blocks extension for saving and reloading the model parameters so that training can be interrupted
- a Blocks extension that runs the model on the test set and produces an output CSV submission file
- the main code for the training and testing

In the archive we have included only the files listed above, which are strictly necessary for reproducing our results. More files for the other models we have tried are available on GitHub.

5 Dependencies

We used the following packages developed at the MILA lab:

- Theano: a general GPU-accelerated Python math library, with an interface similar to numpy (see [3, 4]).
- Blocks: a deep-learning and neural network framework for Python, based on Theano.
- Fuel: a data pipelining framework for Blocks.

We also used the scikit-learn Python library for its mean-shift clustering algorithm. A few other standard Python libraries are used in various places.

6 How To Generate The Solution

1. Set the environment variable to the path of the folder containing the CSV files.
2. Run the CSV-to-HDF5 conversion script to generate the HDF5 file (it is written alongside the CSV files). This takes around 20 minutes on our machines.
3. Run the script that initializes the validation set HDF5 file.
4. Run the script that generates the validation set. This can take a few minutes.
5. Run the script that generates the arrival-point clustering. This can take a few minutes.
6. Create a folder for model parameters and a folder for submissions (next to the training script); they will receive, respectively, regular saves of the model parameters and the submission files generated from the model at regular intervals.
7. Run the training script to train the model. Output solutions are generated every 1000 iterations. Training can be interrupted at any time with three consecutive Ctrl+C.

The training script is set to stop after 10,000,000 iterations, but a result file produced after fewer than 2,000,000 iterations is already the winning solution. The training is quite long though: we trained our model on a GeForce GTX 680 card and it took about an afternoon to generate the winning solution.

When running the training script, set the THEANO_FLAGS environment variable so that Theano runs on the GPU. Theano is only compatible with CUDA, which requires an Nvidia GPU. Training on the CPU is also possible, but much slower.

7 Additional Comments and Observations

The training examples fed to the model are not full trajectories, since that would make no sense, but prefixes of those trajectories, generated on the fly by a Fuel transformer whose code is included in the archive. The data pipeline is as follows:

- Select a random full trajectory from the dataset.
- Generate a maximum of 100 prefixes for that trajectory. If the trajectory has fewer than 100 data points, generate all possible prefixes; otherwise, choose a random subset of prefixes. Keep the final destination, as it is used as the target during training.
- Take only the 5 first and 5 last points of each prefix.
- At this point we have a stream of prefixes successively taken from different trajectories. We create batches of size 200 with the items of this stream, taken in the order in which they come. The prefixes generated from a single trajectory may therefore end up in two successive batches, or all in a single batch.

8 References

[1] Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137-1155.
[2] Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (pp. 315-323).
[3] Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., ... & Bengio, Y. (2011). Theano: Deep learning on GPUs with Python. In Big Learning Workshop, NIPS 2011.
[4] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., ... & Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.