From c1edd640d56b2fb9fa2a75d83dd39313ff1a58c8 Mon Sep 17 00:00:00 2001
From: AdeB
Date: Sat, 11 Jul 2015 19:00:28 -0400
Subject: Fix a few typos in the report.

---
 doc/report.tm | 66 +++++++++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/doc/report.tm b/doc/report.tm
index 6165cd1..b199841 100644
--- a/doc/report.tm
+++ b/doc/report.tm
@@ -9,28 +9,27 @@
-  Our model is based on a multi-layer perceptron (MLP), a simple feed-forward
-  neural network architecture. Our MLP model is trained by stochastic
-  gradient descent (SGD) on the training trajectories. The inputs to our MLP
-  are the 5 first and 5 last positions of the known part of the trajectory,
-  as well as embeddings for the context information (date, client and taxi
-  identification). \ The embeddings are trained with SGD jointly with the MLP
-  parameters. The MLP outputs probabilities for 3392 target points, and a
-  mean is calculated to get a unique destination point as an output. We did
-  no ensembling and used no external data.
+  Our model is based on a multi-layer perceptron (MLP). Our MLP model is
+  trained by stochastic gradient descent (SGD) on the training trajectories.
+  The inputs of our MLP are the 5 first and 5 last positions of the known
+  part of the trajectory, as well as embeddings for the context information
+  (date, client and taxi identification). \ The embeddings are trained with
+  SGD jointly with the MLP parameters. The MLP outputs probabilities for 3392
+  target points, and a mean is calculated to get a unique destination point
+  as an output. We did no ensembling and did not use any external data.
 
   We used a mean-shift algorithm on the destination points of all the
   training trajectories to extract 3392 classes for the destination point.
-  These classes were used as a fixed output layer for the MLP architecture.
+  These classes were used as a fixed softmax layer in the MLP architecture.
 
   We used the embedding method which is common in neural language modeling
   approaches (see [1]) to take the metainformation into account in our
   model.
@@ -73,8 +72,7 @@
   We use a single hidden layer MLP. The hidden layer is of size 500, and
   the activation function is a Rectifier Linear
-  Unit (ie ReLU(x) = max(0, x)). See [2] for more
-  information about ReLUs.
+  Unit (ie ReLU(x) = max(0, x)) [2].
 
   The output layer predicts a probability vector for the 3392 output
   classes that we obtained with our clustering
@@ -88,18 +86,22 @@
   Since the probability vector sums to one, this is a valid point on the
   map.
 
-  We directly train using an approximation of the mean
-  Haversine Distance as a cost.
+  We directly train using an approximation
+  (Equirectangular projection) of the mean Haversine Distance as a cost.
 
   We used a minibatch size of 200. The
-  optimization algorithm is simple SGD with a learning rate of 0.01 and a
-  momentum of 0.9.
+  optimization algorithm is simple SGD with a fixed learning rate of 0.01
+  and a momentum of 0.9.
 
   To generate our validation set, we tried to create a set that looked like
   the training set. For that we generated ``cuts'' from the training set,
   i.e. extracted all the taxi rides that were occuring at given times. The
   times we selected for our validation
-  set are similar to those of the test set, only one year before.
+  set are similar to those of the test set, only one year before:
+
+    1380616200,  # 2013-10-01 08:30
+    1381167900,  # 2013-10-07 17:45
+    1383364800,  # 2013-11-02 04:00
+    1387722600   # 2013-12-22 14:30
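A minimal Python sketch of the ``cut'' extraction described in the hunk
above; this is not the authors' script. It assumes each ride carries a UNIX
start time and a GPS polyline sampled every 15 seconds (the competition
data's sampling interval), and all names in it (CUT_TIMES, rides_active_at,
rides) are ours.

    CUT_TIMES = [
        1380616200,  # 2013-10-01 08:30
        1381167900,  # 2013-10-07 17:45
        1383364800,  # 2013-11-02 04:00
        1387722600,  # 2013-12-22 14:30
    ]

    def rides_active_at(rides, t):
        """Known prefix and true destination of every ride underway at time t."""
        cut = []
        for start, polyline in rides:
            end = start + 15 * (len(polyline) - 1)  # time of the last GPS fix
            if start <= t < end:
                n_known = (t - start) // 15 + 1     # fixes observed by time t
                cut.append((polyline[:n_known], polyline[-1]))
        return cut

    # Usage (given `rides` as a list of (unix_start_time, [(lat, lon), ...])):
    #   validation = [p for t in CUT_TIMES for p in rides_active_at(rides, t)]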
@@ -170,18 +172,17 @@ testing
 
-  In the archive we have included only the files listed above, which are
-  strictly necessary for reproducing our results. More files for the other
-  models we have tried are available on GitHub at
-  .
+  In the archive we have included only the files listed above, which are the
+  strict minimum to reproduce our results. More files for the other models we
+  tried are available on GitHub at .
 
   We used the following packages developped at the MILA lab:
 
   <\itemize>
-    A general GPU-accelerated python math library, with
-    an interface similar to numpy (see [3, 4]).
+    A general GPU-accelerated python math library,
+    with an interface similar to numpy (see [3, 4]).
 
     A deep-learning and neural network framework for
@@ -215,7 +216,7 @@
   arrival point clustering. This can take a few minutes.
 
   Create a folder and a folder
-  (next to the training script), which will recieve
+  (next to the training script), which will receive
   respectively a regular save of the model parameters and many submission
   files generated from the model at a regular interval.
@@ -224,16 +225,16 @@
   every 1000 iterations. Interrupt the model with three consecutive Ctrl+C
   at any times. The training script is set to stop training after 10 000 000
   iterations, but a result file produced after less than 2 000 000
-  iterations is already the winning solution. The training is quite long
-  though: we trained our model on a GeForce GTX 680 card and it took about
-  an afternoon to generate the winning solution.
+  iterations is already the winning solution. We trained our model on a
+  GeForce GTX 680 card and it took about an afternoon to generate the
+  winning solution.
 
   When running the training script, set the following Theano flags
   environment variable to exploit GPU parallelism:
 
-  Theano is only compatible with CUDA, which requires an Nvidia GPUs.
+  Theano is only compatible with CUDA, which requires an Nvidia GPU.
   Training on the CPU is also possible but much slower.
@@ -296,15 +297,18 @@
 <\references>
   <\collection>
     [associate entries for bibliography labels; markup stripped in extraction]
--
cgit v1.2.3
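For reference, the prediction head and cost described in the hunks above (a
softmax over the 3392 mean-shift centers, whose weighted mean gives the
destination, trained against an equirectangular approximation of the
Haversine distance) can be sketched in plain numpy. The authors'
implementation uses Theano; the code below is an illustrative
reconstruction and every name in it is ours.

    import numpy as np

    R_EARTH = 6371.0  # mean Earth radius, km

    def equirectangular_km(pred, true):
        """Equirectangular approximation of the Haversine distance.

        pred, true: arrays of (latitude, longitude) pairs in degrees.
        """
        p, t = np.radians(pred), np.radians(true)
        dlat = p[..., 0] - t[..., 0]
        dlon = (p[..., 1] - t[..., 1]) * np.cos((p[..., 0] + t[..., 0]) / 2)
        return R_EARTH * np.sqrt(dlat ** 2 + dlon ** 2)

    def predicted_destination(probs, centers):
        """Destination as the probability-weighted mean of the cluster centers.

        probs:   (batch, 3392) softmax outputs; each row sums to one.
        centers: (3392, 2) fixed mean-shift cluster centers, as (lat, lon).
        """
        return probs @ centers

    # Training minimizes the mean approximate distance over a minibatch:
    #   cost = equirectangular_km(predicted_destination(probs, centers),
    #                             true_destinations).mean()

Since each row of probs sums to one, the prediction is a convex combination
of the fixed centers, which is why the report can say the output is always
a valid point on the map.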