doc/report.tm


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352

<TeXmacs|1.99.2>

<style|generic>

<\body>
  <doc-data|<doc-title|Taxi Destination Prediction Challenge<next-line>Winner
  Team's Report>|<doc-author|<author-data|<\author-affiliation>
    <em|Montr�al, July 2015>
  </author-affiliation>>>>

  <center|<tabular*|<tformat|<table|<row|<cell|<name|Alex
  Auvolat>>|<cell|<name|Alexandre De Br�bisson>>|<cell|<name|�tienne
  Simon>>>|<row|<cell|ENS Paris>|<cell|Universit� de Montr�al>|<cell|ENS
  Cachan>>|<row|<cell|France>|<cell|Qu�bec,
  Canada>|<cell|France>>|<row|<cell|<verbatim|alexis211@gmail.com>>|<cell|<verbatim|<strong|adbrebs@gmail.com>>>|<cell|<verbatim|esimon@esimon.eu>>>>>>>

  <section|Summary>

  Our model is based on a multi-layer perceptron (MLP), a simple feed-forward
  neural network architecture. Our MLP model is trained by stochastic
  gradient descent (SGD) on the training trajectories. The inputs to our MLP
  are the 5 first and 5 last positions of the known part of the trajectory,
  as well as embeddings for the context information (date, client and taxi
  identification). \ The embeddings are trained with SGD jointly with the MLP
  parameters. The MLP outputs probabilities for 3392 target points, and a
  mean is calculated to get a unique destination point as an output. We did
  no ensembling and used no external data.

  <section|Feature Selection/Extraction>

  We used a mean-shift algorithm on the destination points of all the
  training trajectories to extract 3392 classes for the destination point.
  These classes were used as a fixed output layer for the MLP architecture.

  We used the embedding method which is common in neural language modeling
  approaches (see [1]) to take the metainformation into account in our model.
  The following embeddings were used (listed with corresponding
  dimensionnality):

  <big-table|<tabular|<tformat|<table|<row|<cell|<tabular|<tformat|<cwith|1|1|1|-1|cell-bborder|1px>|<table|<row|<cell|<strong|Meta-data>>|<cell|<strong|Embedding
  Dimension>>|<cell|<strong|Number of classes>>>|<row|<cell|Unique caller
  number>|<cell|10>|<cell|57125>>|<row|<cell|Unique stand
  number>|<cell|10>|<cell|64>>|<row|<cell|Unique taxi
  number>|<cell|10>|<cell|448>>|<row|<cell|Week of
  year>|<cell|10>|<cell|54>>|<row|<cell|Day of
  week>|<cell|10>|<cell|7>>|<row|<cell|1/4 of hour of the
  day>|<cell|10>|<cell|96>>|<row|<cell|Day type (invalid
  data)>|<cell|10>|<cell|3>>>>>>>>>>|Embeddings and corresponding dimensions
  used by the model>

  The embeddings were first initialized to random variables and were then let
  to evolve freely with SGD along with the other model parameters.

  The geographical data input in the network is a centered and normalized
  version of the GPS data points.

  We did no other preprocessing or feature selection.

  <section|Modelling Techniques and Training>

  Here is a brief description of the model we used:

  <\itemize>
    <item><strong|Input.> The input layer of the MLP is the concatenation of
    the following inputs:

    <\itemize>
      <item>Five first and five last points of the known part of the
      trajectory.

      <item>Embeddings for all the metadata.
    </itemize>

    <item><strong|Hidden layer.> We use a single hidden layer MLP. The hidden
    layer is of size 500, and the activation function is a Rectifier Linear
    Unit (ie <math|f<around*|(|x|)>=max<around*|(|0,x|)>>). See [2] for more
    information about ReLUs.

    <item><strong|Output layer.> The output layer predicts a probability
    vector for the 3392 output classes that we obtained with our clustering
    preprocessing step. If <math|\<b-p\>> is the probability vector output by
    our MLP (output by a softmax layer) and <math|c<rsub|i>> is the centroid
    of cluster <math|i>, our prediciton is given by:

    <\eqnarray*>
      <tformat|<table|<row|<cell|<wide|y|^>>|<cell|=>|<cell|<big|sum><rsub|i>p<rsub|i>*c<rsub|i>>>>>
    </eqnarray*>

    Since <math|\<b-p\>> sums to one, this is a valid point on the map.

    <item><strong|Cost.> We directly train using an approximation of the mean
    Haversine Distance as a cost.

    <item><strong|SGD and optimization.> We used a minibatch size of 200. The
    optimization algorithm is simple SGD with a learning rate of 0.01 and a
    momentum of 0.9.

    <item><strong|Validation.> To generate our validation set, we tried to
    create a set that looked like the training set. For that we generated
    ``cuts'' from the training set, i.e. extracted all the taxi rides that
    were occuring at given times. The times we selected for our validation
    set are similar to those of the test set, only one year before.
  </itemize>

  <section|Code Description>

  Here is a brief description of the Python files in the archive:

  <\itemize>
    <item><verbatim|config/*.py> : configuration files for the different
    models we have experimented with

    The model which gets the best solution is
    <verbatim|mlp_tgtcls_1_cswdtx_alexandre.py>

    <item><verbatim|data/*.py> : files related to the data pipeline:

    <\itemize>
      <item><verbatim|__init__.py> contains some general statistics about the
      data

      <item><verbatim|csv_to_hdf5.py> : convert the CSV data file into an
      HDF5 file usable directly by Fuel

      <item><verbatim|hdf5.py> : utility functions for exploiting the HDF5
      file

      <item><verbatim|init_valid.py> : initializes the HDF5 file for the
      validation set

      <item><verbatim|make_valid_cut.py> : generate a validation set using a
      list of time cuts. Cut lists are stored in Python files in
      <verbatim|data/cuts/> (we used a single cut file)

      <item><verbatim|transformers.py> : Fuel pipeline for transforming the
      training dataset into structures usable by our model
    </itemize>

    <item><strong|<verbatim|data_analysis/*.py>> : scripts for various
    statistical analyses on the dataset

    <\itemize>
      <item><verbatim|cluster_arrival.py> : the script used to generate the
      mean-shift clustering of the destination points, producing the 3392
      target points
    </itemize>

    <item><verbatim|model/*.py> : source code for the various models we tried

    <\itemize>
      <item><verbatim|__init__.py> contains code common to all the models,
      including the code for embedding the metadata

      <item><verbatim|mlp.py> contains code common to all MLP models

      <item><verbatim|dest_mlp_tgtcls.py> containts code for our MLP
      destination prediction model using target points for the output layer
    </itemize>

    <item><verbatim|error.py> contains the functions for calculating the
    error based on the Haversine Distance

    <item><verbatim|ext_saveload.py> contains a Blocks extension for saving
    and reloading the model parameters so that training can be interrupted

    <item><verbatim|ext_test.py> contains a Blocks extension that runs the
    model on the test set and produces an output CSV submission file

    <item><verbatim|train.py> contains the main code for the training and
    testing
  </itemize>

  In the archive we have included only the files listed above, which are
  strictly necessary for reproducing our results. More files for the other
  models we have tried are available on GitHub at
  <hlink|https://github.com/adbrebs/taxi|><hlink||https://github.com/adbrebs/taxi>.

  <section|Dependencies>

  We used the following packages developped at the MILA lab:

  <\itemize>
    <item><strong|Thano.> A general GPU-accelerated python math library, with
    an interface similar to numpy (see [3, 4]).
    <hlink|http://deeplearning.net/software/theano/|>

    <item><strong|Blocks.> A deep-learning and neural network framework for
    Python based on Theano. <hlink|https://github.com/mila-udem/blocks|>

    <item><strong|Fuel.> A data pipelining framework for Blocks.
    <hlink|https://github.com/mila-udem/fuel|>
  </itemize>

  We also used the <verbatim|scikit-learn> Python library for their
  mean-shift clustering algorithm. <verbatim|numpy>, <verbatim|cPickle> and
  <verbatim|h5py> are also used at various places.

  <section|How To Generate The Solution>

  <\enumerate>
    <item>Set the <verbatim|TAXI_PATH> environment variable to the path of
    the folder containing the CSV files.

    <item>Run <verbatim|data/csv_to_hdf5.py> to generate the HDF5 file (which
    is generated in <verbatim|TAXI_PATH>, along the CSV files). This takes
    around 20 minutes on our machines.

    <item>Run <verbatim|data/init_valid.py> to initialize the validation set
    HDF5 file.

    <item>Run <verbatim|data/make_valid_cut.py test_times_0> to generate the
    validation set. This can take a few minutes.

    <item>Run <verbatim|data_analysis/cluster_arrival.py> to generate the
    arrival point clustering. This can take a few minutes.

    <item>Create a folder <verbatim|model_data> and a folder
    <verbatim|output> (next to the training script), which will recieve
    respectively a regular save of the model parameters and many submission
    files generated from the model at a regular interval.

    <item>Run <verbatim|./train.py dest_mlp_tgtcls_1_cswdtx_alexandre> to
    train the model. Output solutions are generated in <verbatim|output/>
    every 1000 iterations. Interrupt the model with three consecutive Ctrl+C
    at any times. The training script is set to stop training after 10 000
    000 iterations, but a result file produced after less than 2 000 000
    iterations is already the winning solution. The training is quite long
    though: we trained our model on a GeForce GTX 680 card and it took about
    an afternoon to generate the winning solution.

    When running the training script, set the following Theano flags
    environment variable to exploit GPU parallelism:

    <verbatim|THEANO_FLAGS=floatX=float32,device=gpu,optimizer=FAST_RUN>

    Theano is only compatible with CUDA, which requires an Nvidia GPUs.
    Training on the CPU is also possible but much slower.
  </enumerate>

  <section|Additional Comments and Observations>

  The training examples fed to the model are not full trajectories, since
  that would make no sense, but prefixes of those trajectories that are
  generated on-the-fly by a Fuel transformer, <verbatim|TaxiGenerateSplits>,
  whose code is available in <verbatim|data/transformers.py>. The data
  pipeline is as follows:

  <\itemize>
    <item>Select a random full trajectory from the dataset

    <item>Generate a maximum of 100 prefixes for that trajectory. If the
    trajectory is smaller than 100 data points, generate all possible
    prefixes. Otherwise, chose a random subset of prefixes. Keep the final
    destination somewhere as it is used as a target for the training.

    <item>Take only the 5 first and 5 last points of the trajectory.

    <item>At this points we have a stream of prefixes sucessively taken from
    different trajectories. We create batches of size 200 with the items of
    the previous stream, taken in the order in which they come. The prefixes
    generated from a single trajectory may end up in two sucessive batches,
    or all in a single batch.
  </itemize>

  <section|References>

  <\enumerate>
    <item><label|gs_cit0>Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C.
    (2003). A neural probabilistic language model.
    <with|font-shape|italic|The Journal of Machine Learning Research>,
    <with|font-shape|italic|3>, 1137-1155.

    <item><label|gs_cit0>Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep
    sparse rectifier neural networks. In <with|font-shape|italic|International
    Conference on Artificial Intelligence and Statistics> (pp. 315-323).

    <item><label|gs_cit0>Bergstra, J., Bastien, F., Breuleux, O., Lamblin,
    P., Pascanu, R., Delalleau, O., ... & Bengio, Y. (2011). Theano: Deep
    learning on gpus with python. In <with|font-shape|italic|NIPS 2011,
    BigLearning Workshop, Granada, Spain>.

    <item><label|gs_cit0>Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J.,
    Goodfellow, I., Bergeron, A., ... & Bengio, Y. (2012). Theano: new
    features and speed improvements. <with|font-shape|italic|arXiv preprint
    arXiv:1211.5590>.
  </enumerate>
</body>

<\initial>
  <\collection>
    <associate|page-medium|paper>
    <associate|page-screen-margin|true>
  </collection>
</initial>

<\references>
  <\collection>
    <associate|auto-1|<tuple|1|1>>
    <associate|auto-10|<tuple|8|?>>
    <associate|auto-2|<tuple|2|1>>
    <associate|auto-3|<tuple|1|1>>
    <associate|auto-4|<tuple|3|2>>
    <associate|auto-5|<tuple|4|2>>
    <associate|auto-6|<tuple|5|3>>
    <associate|auto-7|<tuple|6|3>>
    <associate|auto-8|<tuple|7|4>>
    <associate|auto-9|<tuple|8|4>>
    <associate|gs_cit0|<tuple|4|4>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|table>
      <tuple|normal|Embeddings and corresponding dimensions used by the
      model|<pageref|auto-3>>
    </associate>
    <\associate|toc>
      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Summary>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>Feature
      Selection/Extraction> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Modelling
      Techniques and Training> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>Code
      Description> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-5><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|5<space|2spc>Dependencies>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-6><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|6<space|2spc>How
      To Generate The Solution> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-7><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|7<space|2spc>Additional
      Comments and Observations> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-8><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|8<space|2spc>References>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-9><vspace|0.5fn>
    </associate>
  </collection>
</auxiliary>