# Datagengo

Datagengo (データ言語) is a new (experimental) method for learning Japanese kanji.

Datagengo is an algorithmically generated list of lessons, each containing a batch of 20 kanji and just enough example sentences to learn those kanji in context.
The core of the method consists of memorizing the example sentences and writing them down repeatedly on paper.

Lessons are ordered by increasing difficulty, according to the JLPT levels and school grades indicated in [KANJIDIC2](http://www.edrdg.org/wiki/index.php/KANJIDIC_Project).
Example sentences are sourced from the [Tanaka corpus](http://www.edrdg.org/wiki/index.php/Tanaka_Corpus).

## Why this method?

I'm putting this section first to make it visible, but you might want to skim
[how to use Datagengo](#how_to_use_datagengo) first.

I had previous experience with learning kanji using books, starting with
Heisig's *Remembering the Kanji* (RTK), and spaced repetition systems (SRS)
such as Anki or WaniKani.  RTK was truly useful to me, as it gave me a sense
of how kanji are constructed and therefore how to apprehend them, but I never
completed the book.  I then moved on to an Anki deck that contained all the RTK
kanji, which I didn't use for very long. A bit later I started WaniKani and
went through a good part of it, but I still stopped at some point, partly for
lack of motivation and partly out of frustration that this was going nowhere.

Here are my main frustrations with SRS methods:


- The methods I used were based on individual kanji or words, and did not
  provide real-world context in which those kanji or words might be used.

- Items (kanji or words) are taken in isolation and not grouped by level or by
  lesson, which means that the full effort of memorization has to be made for
  each item independently.

- As a consequence of the lack of context and logical grouping, I found it very
  easy to confuse different kanji or different words.

- I was always doubtful of "spaced repetition with increasing intervals": it
  feels far less efficient to me than consistently repeating something for some
  time and then assuming that it is definitively learnt. In my head, "assuming
  something is learnt", after having spent a definite amount of effort on it and
  having decided it was over with, helps create a category of things I'm supposed to
  know, which in turn helps cement that knowledge.

During my multi-year break from Japanese (roughly 2016 to 2023), I pondered
this question occasionally and started thinking about a new method that
eventually became Datagengo.

The basic idea behind Datagengo is to attach as much context as possible
(explicit or implicit) to the learning of each kanji, so that all of these
contextual clues can be used when recalling it.

Here is what Datagengo does, and why I expect it to work much better, at least for me:

- Datagengo exploits the Tanaka corpus to provide in-context example sentences
  for all of the learnt kanji.

- Datagengo tries to help you learn all the kanji using as few example
  sentences as possible, to be efficient.

- Datagengo requires the learner to write by hand, which is in my experience the
  best way to learn difficult things.

- Lessons, or "batches", are logical units that somewhat "work together" (even
  if the sentences or kanji have nothing to do with one another), helping to cement
  that entire unit of knowledge in one go instead of lots of independent
  efforts on tiny things.

- Once a lesson has been studied around a dozen times, it becomes very easy to
  recite, and it is therefore natural to declare it as "acquired knowledge".
  Even if you forget the details of the individual sentences, memory of the
  kanji will stay, embedded within the context of the lesson and therefore
  easier to pick up when called upon.

- The period at which you were studying each kanji also becomes part of its
  learning context, making it again easier to recall.  If like me you
  have a visual understanding of the passage of time (I personally see the
  year as a big loop, with months of the year each having a relatively
  precise position on it), then this effect can be even stronger.
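
Covering a batch of kanji with as few example sentences as possible resembles the classic set-cover problem. The sketch below uses a simple greedy heuristic to illustrate the idea. It is only an assumption about how such a selection could work, not a description of the actual Datagengo generator, and the function name `pick_sentences` is made up for this example.

```python
def pick_sentences(batch: set[str], sentences: list[str]) -> list[str]:
    """Greedy set cover: repeatedly take the sentence that contains
    the most kanji of the batch that are not yet covered."""
    remaining = set(batch)
    chosen: list[str] = []
    while remaining:
        # Sentence covering the largest number of still-uncovered kanji.
        best = max(sentences, key=lambda s: len(remaining & set(s)))
        gained = remaining & set(best)
        if not gained:
            break  # no sentence contains any of the remaining kanji
        chosen.append(best)
        remaining -= gained
    return chosen


if __name__ == "__main__":
    batch = {"日", "本", "学"}
    corpus = ["日本", "学校", "本"]
    print(pick_sentences(batch, corpus))  # ['日本', '学校']
```

A greedy choice does not always give the smallest possible set, but it is short to write and usually close to it. A real generator would likely weigh other criteria as well, such as sentence length or vocabulary difficulty.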


## How to use Datagengo

### How to study a lesson

#### High-level overview

1. Write down the 20 kanji of the lesson.
2. Write down all of the example sentences in the lesson from memory.
3. Check what you did.
4. Repeat every day for about 10 days.

#### Detailed explanation

1. Write down the number of the lesson, the current date and time, and how many times you have studied this lesson.

2. Write down the 20 kanji composing the lesson.
   If possible, do this from memory, otherwise it's fine to look at the lesson page.
   Make sure that you are not mixing up different kanji and that you know the correct stroke order for each kanji.

3. If this is your first encounter with this lesson, just read the sentences, familiarize yourself with them and with the new words they contain, and be done with it.
   Otherwise, proceed to the next step.

4. If this is one of the first few times you are studying this lesson, re-read all of the sentences to ensure you have them in mind.

5. Close the lesson's web page, and write down all of the sentences in the lesson from memory.
   You can use the list of 20 kanji you just wrote to help you remember the sentences.

6. Return to the lesson's web page and check that you wrote all sentences correctly.
   If you have any doubts about stroke order or pronunciation, take the time to check.

7. Write down the time at which you finished studying the lesson.

8. If you have the patience, have a look at the "extra vocabulary" section at the bottom of the lesson's page.
   You don't need to actively memorize this; it's just there so that, when you encounter these words elsewhere, you know that they can be written using the kanji of this lesson.

The overall process takes 10 to 15 minutes per lesson, sometimes less, sometimes more. Each lesson will take less and less time as you repeat it.

### Planning your study

I recommend studying each lesson for 10 to 12 days, every day, and adding a new
lesson about every three or four days. This means that you will have 3 or 4
lessons to study each day, which takes me between 30 and 45 minutes total
(that's why I'm suggesting that you write down the time when you
start/finish!).

If you are having a harder time memorizing the kanji and the sentences, you can
adapt the schedule to your learning speed. Here are some examples:

- __Slowest:__ study each lesson for 12 days, add a new lesson every 6 days (average 2 lessons to study every day).
- __Slow:__ study each lesson for 10 to 12 days, add a new lesson every 5 days (average 2 lessons to study every day).
- __Medium:__ study each lesson for 10 to 12 days, add a new lesson every 4 days (average 2-3 lessons to study every day).
- __Fast:__ study each lesson for 10 to 12 days, add a new lesson every 3 days (average 3-4 lessons to study every day).
- __Extra-fast:__ study each lesson for 8 to 10 days, add a new lesson every 2 days (average 4-5 lessons to study every day).
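
The averages in the list above are simply the number of days each lesson is studied divided by the number of days between two new lessons. Here is a minimal sketch of that arithmetic, together with a helper listing which lessons are due on a given day. It assumes that lesson `k` is started on day `k * interval_days`, and the function names are made up for this example.

```python
def average_daily_lessons(study_days: int, interval_days: int) -> float:
    """Average number of lessons to study per day once the schedule is running."""
    return study_days / interval_days


def lessons_due(day: int, study_days: int, interval_days: int) -> list[int]:
    """Lessons to study on `day` (day 0 is the first day of study),
    assuming lesson k is started on day k * interval_days."""
    due = []
    k = 0
    while k * interval_days <= day:
        # Lesson k is studied on days [start, start + study_days).
        if day < k * interval_days + study_days:
            due.append(k)
        k += 1
    return due


if __name__ == "__main__":
    print(average_daily_lessons(12, 6))  # 2.0 (the "slowest" schedule)
    print(average_daily_lessons(10, 4))  # 2.5 (low end of the "medium" schedule)
    print(lessons_due(12, 12, 4))        # [1, 2, 3]
```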

The two parameters can be tuned separately according to your needs:

- __How many days you will keep studying each lesson:__ you can reduce this
  if you feel that the last repetitions are becoming boring/useless, but those
  last repetitions will also become very fast and it's always good to do them
  as practice.

- __How frequently you add a new lesson:__ being consistent with this will help you
  plan long-term. For instance if you are on average adding one lesson every 6
  days you will know all JLPT N2 kanji within a year, and if you consistently
  add a new lesson every 4 days you will know all JLPT N1 kanji in slightly
  over one year.
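
These estimates can be checked with a few lines of arithmetic. The sketch below takes the last lesson number of each level from the lesson list in "Which lessons should I learn?" and assumes that lesson 000 is started on day 0, that one lesson is added every `interval_days` days, and that the last lesson is then studied for `study_days` days. The names are made up for this example.

```python
# Last lesson containing kanji of each level, as given in the lesson list.
LAST_LESSON = {"N5": 5, "N4": 14, "N3": 31, "N2": 51, "N1": 98}


def days_to_finish(level: str, interval_days: int, study_days: int = 12) -> int:
    """Days needed to finish the last lesson of `level`."""
    start_of_last = LAST_LESSON[level] * interval_days
    return start_of_last + study_days


if __name__ == "__main__":
    print(days_to_finish("N2", 6))  # 318, i.e. within a year
    print(days_to_finish("N1", 4))  # 404, i.e. slightly over one year
```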

__Note that 65 JLPT N1 *jōyō* kanji, as well as 186 *jinmeiyō* kanji also marked for N1 in KANJIDIC2, did not have an example sentence in the Tanaka corpus and are therefore not included in Datagengo.__
The list can be found below the batch list in the level list, in the "missing
kanji" column, in rows N1a and N1b for the *jōyō* kanji and N1-9 for the
*jinmeiyō* kanji.  You might want to study at least the 65 *jōyō* kanji
separately before attempting to pass JLPT N1.

### Which lessons should I learn?

Here is how the lessons are organized, currently:

- __000 to 005:__ JLPT N5

- __005 to 014:__ JLPT N4

- __014 to 031:__ JLPT N3

- __031 to 051:__ JLPT N2

- __051 to 098:__ JLPT N1, further split into three parts for easier processing:
   - Lessons 051 to 058 (marked N1a) contain kanji learnt in Japanese elementary school.
   - Lessons 058 to 095 (marked N1b) contain kanji learnt in Japanese secondary school.
   - Lessons 095 to 098 (marked N1-9) contain *jinmeiyō* kanji (for use in names).

- __098 to 105:__ extra *jōyō* kanji not part of JLPT but learnt in Japanese elementary or secondary school (marked N0a and N0b respectively)

- __105 to 114:__ extra *jinmeiyō* kanji (marked N0-9)

- __114 to 126:__ even more kanji, not part of JLPT, *jōyō* or *jinmeiyō* (marked N0+)
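
The ranges above can also be written as a small lookup table. The sketch below assumes that both bounds of each range are inclusive, so that a boundary lesson such as 005 belongs to two levels; this is an interpretation of the list, not something the lesson pages are guaranteed to follow. The name `levels_of` is made up for this example.

```python
# (first lesson, last lesson, label), copied from the list above.
RANGES = [
    (0, 5, "N5"),
    (5, 14, "N4"),
    (14, 31, "N3"),
    (31, 51, "N2"),
    (51, 58, "N1a"),
    (58, 95, "N1b"),
    (95, 98, "N1-9"),
    (98, 105, "N0a/N0b"),
    (105, 114, "N0-9"),
    (114, 126, "N0+"),
]


def levels_of(lesson: int) -> list[str]:
    """Labels of all ranges that contain the given lesson number."""
    return [label for first, last, label in RANGES if first <= lesson <= last]


if __name__ == "__main__":
    print(levels_of(3))   # ['N5']
    print(levels_of(5))   # ['N5', 'N4']
    print(levels_of(60))  # ['N1b']
```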

If you are studying for advanced levels, make sure to check the character table
below the lesson list, and in particular the "missing kanji" column, to see
which characters had no example sentence in the Tanaka corpus and are therefore
not included in Datagengo.