From 9729e8f0ac43516c2c7d25f7e51c46df0be27ab6 Mon Sep 17 00:00:00 2001
From: Alex Auvolat <alex@adnab.me>
Date: Tue, 26 Sep 2023 13:12:21 +0200
Subject: write why this method

---
 README.md | 130 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 102 insertions(+), 28 deletions(-)

diff --git a/README.md b/README.md
index 1ef9c4d..daf6fb9 100644
--- a/README.md
+++ b/README.md
@@ -1,28 +1,100 @@
 # Datagengo
 
-Datagengo (データ言語) is a new (experimental) method for learning japanese kanji.
+Datagengo (データ言語) is a new (experimental) method for learning Japanese kanji.
 
 Datagengo is an algorimically-generated list of lessons, each containing a batch of 20 kanji and just enough example sentences to learn those kanji in context.
-The crux of the method consists in memorizing the example sentences and writing them down repeatedly on paper.
+The core of the method consists in memorizing the example sentences and writing them down repeatedly on paper.
 
 Lessons are in increasing difficulty according to JLPT levels and school grade indicated in [KANJIDIC2](http://www.edrdg.org/wiki/index.php/KANJIDIC_Project)
 (the JLPT levels used are the old levels N4-N1 and not the new levels, see [this document](https://jlpt.jp/e/reference/pdf/guide2011_e_02.pdf) for correspondence).
 Example sentences are sourced from the [Tanaka corpus](http://www.edrdg.org/wiki/index.php/Tanaka_Corpus).
 
+## Why this method?
+
+I'm putting this section first to make it visible, but you might want to skim
+[how to use Datagengo](#how_to_use_datagengo) first.
+
+I had previous experience with learning kanji using books, starting with
+Heisig's *Remembering the Kanji* (RTK), and spaced repetition systems (SRS)
+such as Anki or WaniKani.  I started with RTK, which I think was truly useful
+to me as it gave me a sense of how kanji are constructed and therefore how to
+approach them, but I never completed the book.  I went on and started using an
+Anki deck that contained all the RTK kanji, which I didn't use for very long. A
+bit later I started WaniKani and went through a bunch of that, but I still
+stopped at some point, partly for lack of motivation and partly out of
+frustration that this was going nowhere.
+
+Here are my main frustrations with SRS methods:
+
+
+- The methods I used were based on individual kanji or words, and did not
+  provide real-world context in which those kanji or words might be used.
+
+- Items (kanji or words) are taken in isolation and not grouped by level or by
+  lesson, which means that the full effort of memorization has to be made for
+  each item independently.
+
+- As a consequence of the lack of context and logical grouping, I found it very
+  easy to confuse different kanji or different words.
+
+- I was always doubtful of "spaced repetition with increasing intervals": it
+  feels way less efficient to me than consistently repeating something for some
+  time and then assuming that it is definitively learnt. In my head, "assuming
+  something is learnt", after having spent a definite amount of effort on it and
+  having decided it was over with, helps create a category of things I'm supposed to
+  know, which in turn helps cement that knowledge.
+
+During my multi-year break from Japanese (something like 2016-2023), I pondered
+this question occasionally and started thinking about a new method that
+eventually became Datagengo.
+
+The basic idea behind Datagengo is to attach as much context as possible
+(explicit or implicit) to the learning of each kanji, so that all of these
+contextual clues can be used when recalling them.
+
+Here is what Datagengo does, and why I expect it to work much better, at least for me:
+
+- Datagengo exploits the Tanaka corpus to provide in-context example sentences
+  for all of the learnt kanji.
+
+- Datagengo tries to help you learn all the kanji using as few example
+  sentences as possible, to be efficient.
+
+- Datagengo requires the learner to write by hand, which is in my experience the
+  best way to learn difficult things.
+
+- Lessons, or "batches", are logical units that somewhat "work together" (even
+  if the sentences or kanji have nothing to do with one another), helping to cement
+  that entire unit of knowledge in one go instead of lots of independent
+  efforts on tiny things.
+
+- Once a lesson has been studied around a dozen times, it becomes very easy to
+  recite, and it is therefore natural to declare it as "acquired knowledge".
+  Even if you forget the details of the individual sentences, memory of the
+  kanji will stay, embedded within the context of the lesson and therefore
+  easier to pick up when called upon.
+
+- The period at which you were studying each kanji also becomes part of its
+  learning context, again making it easier to recall.  If, like me, you
+  have a visual understanding of the passage of time (I personally see the
+  year as a big loop, with months of the year each having a relatively
+  precise position on it), then this effect can be even stronger.
+
+
 ## How to use Datagengo
 
 ### How to study a lesson
 
-**High-level overview:**
+#### High-level overview
 
 1. Write down the 20 kanji for each lesson.
 2. Write down all of the example sentences in the lesson from memory.
 3. Check what you did.
 4. Repeat every day for about 10 days.
 
-**Detailed explanation:**
+#### Detailed explanation
 
-1. Write down the number of the lesson, the current date and time, and how many times you have studied this lesson (including this time).
+1. Write down the number of the lesson, the current date and time, and how many times you have studied this lesson.
 
 2. Write down the 20 kanji composing the lesson.
    If possible, do this from memory, otherwise it's fine to look at the lesson page.
@@ -57,54 +129,56 @@ start/finish!).
 If you are having a harder time memorizing the kanji and the sentences, you can
 adapt the schedule to your learning speed. Here are some examples:
 
-- Slowest: study each lesson for 12 days, add a new lesson every 6 days (average 2 lessons to study every day).
-- Slow: study each lesson for 10 to 12 days, add a new lesson every 5 days (average 2 lessons to study every day).
-- Medium: study each lesson for 10 to 12 days, add a new lesson every 4 days (average 2-3 lessons to study every day).
-- Fast: study each lesson for 10 to 12 days, add a new lesson every 3 days (average 3-4 lessons to study every day).
-- Extra-fast: study each lesson for 8 to 10 days, add a new lesson every 2 days (average 4-5 lessons to study every day).
+- __Slowest:__ study each lesson for 12 days, add a new lesson every 6 days (average 2 lessons to study every day).
+- __Slow:__ study each lesson for 10 to 12 days, add a new lesson every 5 days (average 2 lessons to study every day).
+- __Medium:__ study each lesson for 10 to 12 days, add a new lesson every 4 days (average 2-3 lessons to study every day).
+- __Fast:__ study each lesson for 10 to 12 days, add a new lesson every 3 days (average 3-4 lessons to study every day).
+- __Extra-fast:__ study each lesson for 8 to 10 days, add a new lesson every 2 days (average 4-5 lessons to study every day).
 
 The two parameters can be tuned separately according to your needs:
 
-- How many days you will keep studying each lesson: you can reduce this
+- __How many days you will keep studying each lesson:__ you can reduce this
   if you feel that the last repetitions are becoming boring/useless, but those
   last repetitions will also become very fast and it's always good to do them
   as practice.
 
-- How frequently you add a new lesson: being consistent with this will help you
+- __How frequently you add a new lesson:__ being consistent with this will help you
   plan long-term. For instance if you are on average adding one lesson every 6
   days you will know all JLPT N2 kanji within a year, and if you consistently
   add a new lesson every 4 days you will know all JLPT N1 kanji in slightly
   over one year.
 
-**Note that 65 JLPT N1 *jōyō* kanji, as well as 186 *jinmeiyō* kanji also
-marked for N1 in KANJIDIC2, did not have an example sentence in the Tanaka
-corpus and are therefore not included in Datagengo.** The list can be found
-below the batch list in the level list, in the "missing chars" column, in rows
-N1a and N1b for the *jōyō* kanji and N1-9 for the *jinmeiyō* kanji.  You might
-want to study at least the 65 *jōyō* kanji separately before attempting to pass
-JLPT N1.
+__Note that 65 JLPT N1 *jōyō* kanji, as well as 186 *jinmeiyō* kanji also marked for N1 in KANJIDIC2, did not have an example sentence in the Tanaka corpus and are therefore not included in Datagengo.__
+The list can be found below the batch list in the level list, in the "missing
+chars" column, in rows N1a and N1b for the *jōyō* kanji and N1-9 for the
+*jinmeiyō* kanji.  You might want to study at least the 65 *jōyō* kanji
+separately before attempting to pass JLPT N1.
 
 ### Which lessons should I learn?
 
 Here is how the lessons are organized, currently:
 
-- 000 to 005: old JLPT N4, current JLPT N5
-- 005 to 014: old JLPT N3, current JLPT N4
-- 014 to 051: JLPT N2
+- __000 to 005:__ old JLPT N4, current JLPT N5
+
+- __005 to 014:__ old JLPT N3, current JLPT N4
+
+- __014 to 051:__ JLPT N2
    - Lessons 014 to 043 (marked N2a) contain kanji learnt in Japanese elementary school.
    - Lessons 043 to 051 (marked N2b) contain kanji learnt in Japanese high school.
-- 051 to 098: JLPT N1
+
+- __051 to 098:__ JLPT N1
    - Lessons 051 to 058 (marked N1a) contain kanji learnt in Japanese elementary school.
    - Lessons 058 to 095 (marked N1b) contain kanji learnt in Japanese high school.
    - Lessons 095 to 098 (marked N1-9) contain *jinmeiyō* kanji (for use in names).
-- 098 to 105 (marked N0a and N0b): extra *jōyō* kanji not part of JLPT but learnt in Japanese elementary or high school
-- 105 to 114 (marked N0-9): extra *jinmeiyō* kanji
-- 114 to 126 (marked N0+): even more kanji, not part of JLPT, *jōyō* or *jinmeiyō*
+
+- __098 to 105:__ extra *jōyō* kanji not part of JLPT but learnt in Japanese elementary or high school (marked N0a and N0b)
+
+- __105 to 114:__ extra *jinmeiyō* kanji (marked N0-9)
+
+- __114 to 126:__ even more kanji, not part of JLPT, *jōyō* or *jinmeiyō* (marked N0+)
 
 If you are studying for advanced levels, make sure to check the character table
 below the lesson list, and in particular the "missing chars" column, to know
 all characters for which no example sentences were found in the Tanaka corpus
 and which are therefore not included in Datagengo.
 
-
-## Why this method?
-- 
cgit v1.2.3