
Datagengo

Datagengo (データ言語) is a new (experimental) method for learning Japanese kanji.

Datagengo is an algorithmically generated list of lessons, each containing a batch of 20 kanji and just enough example sentences to learn those kanji in context. The core of the method consists of memorizing the example sentences and writing them down repeatedly on paper.

Lessons are ordered by increasing difficulty, following the JLPT levels and school grades indicated in KANJIDIC2. Example sentences are sourced from the Tanaka corpus.

Why this method?

I'm putting this section first to make it visible, but you might want to skim how to use Datagengo first.

I had previous experience learning kanji with books, starting with Heisig's Remembering the Kanji (RTK), and with spaced repetition systems (SRS) such as Anki or WaniKani. I started with RTK, which I think was truly useful to me, as it gave me a sense of how kanji are constructed and therefore how to approach them, but I never completed the book. I then moved on to an Anki deck that contained all the RTK kanji, which I didn't use for very long. A bit later I started WaniKani and went through a bunch of it, but I still stopped at some point, partly for lack of motivation and partly out of frustration that this was going nowhere.

Here are my main frustrations with SRS methods:

  • The methods I used were based on individual kanji or words, and did not provide real-world context in which those kanji or words might be used.

  • Items (kanji or words) are taken in isolation and not grouped by level or by lesson, which means that the full effort of memorization has to be made for each item independently.

  • As a consequence of the lack of context and logical grouping, I found it very easy to confuse different kanji or different words.

  • I was always doubtful of "spaced repetition with increasing intervals": it feels much less efficient to me than consistently repeating something for some time and then assuming that it is definitively learnt. In my head, "assuming something is learnt", after having spent a definite amount of effort on it and having decided that I was done with it, helps create a category of things I'm supposed to know, which in turn helps cement that knowledge.

During my multi-year break from Japanese (something like 2016-2023), I pondered this question occasionally and started thinking about a new method that eventually became Datagengo.

The basic idea behind Datagengo is to attach as much context as possible (explicit or implicit) to the learning of each kanji, so that all of these contextual clues can be used when recalling it.

Here is what Datagengo does, and why I expect it to work much better, at least for me:

  • Datagengo exploits the Tanaka corpus to provide in-context example sentences for all of the learnt kanji.

  • Datagengo tries to help you learn all the kanji using as few example sentences as possible, to be efficient.

  • Datagengo requires the learner to write by hand, which is in my experience the best way to learn difficult things.

  • Lessons, or "batches", are logical units that somewhat "work together" (even if the sentences or kanji have nothing to do with one another), helping to cement that entire unit of knowledge in one go instead of lots of independent efforts on tiny things.

  • Once a lesson has been studied around a dozen times, it becomes very easy to recite, and it is therefore natural to declare it as "acquired knowledge". Even if you forget the details of the individual sentences, memory of the kanji will stay, embedded within the context of the lesson and therefore easier to pick up when called upon.

  • The period at which you were studying each kanji also becomes part of its learning context, making it again easier to recall. If like me you have a visual understanding of the passage of time (I personally see the year as a big loop, with months of the year each having a relatively precise position on it), then this effect can be even stronger.
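To make the "as few example sentences as possible" goal concrete, here is a minimal sketch of one way such a selection could work: a greedy set cover that keeps picking the sentence containing the most not-yet-covered kanji. This is an illustration only, not Datagengo's actual algorithm. The function name `pick_sentences`, the toy batch, and the toy sentences are all made up for the example.

```python
def pick_sentences(kanji_batch, sentences):
    """Greedy set cover (illustrative, not Datagengo's real code).

    Repeatedly pick the sentence that contains the most kanji of the
    batch that are not covered yet. Returns the chosen sentences and
    the set of kanji that no sentence could cover.
    """
    remaining = set(kanji_batch)
    chosen = []
    while remaining:
        # Sentence covering the largest number of still-missing kanji.
        best = max(sentences, key=lambda s: len(remaining & set(s)))
        gained = remaining & set(best)
        if not gained:
            break  # no sentence contains any of the remaining kanji
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Toy data, invented for this example.
batch = "日本語学生"
corpus = ["私は学生です", "日本語を学ぶ", "日本に行く", "先生は日本人です"]
chosen, missing = pick_sentences(batch, corpus)
print(chosen, missing)
```

In this toy case two sentences are enough to cover all five kanji. Any kanji left in `missing` would be one with no usable example sentence in the corpus.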

How to use Datagengo

How to study a lesson

High-level overview

  1. Write down the 20 kanji of the lesson.
  2. Write down all of the example sentences in the lesson from memory.
  3. Check what you did.
  4. Repeat every day for about 10 days.

Detailed explanation

  1. Write down the number of the lesson, the current date and time, and how many times you have studied this lesson.

  2. Write down the 20 kanji composing the lesson. If possible, do this from memory; otherwise, it's fine to look at the lesson page. Make sure that you are not mixing up different kanji and that you know the correct stroke order for each kanji.

  3. If this is your first encounter with this lesson, just read the sentences and familiarize yourself with all of the sentences and the new words they contain, and be done with it. Otherwise, proceed to the next step.

  4. If this is one of the first few times you are studying this lesson, re-read all of the sentences to ensure you have them in mind.

  5. Close the lesson's web page, and write down all of the sentences in the lesson from memory. You can use the list of 20 kanji you just wrote to help you remember the sentences.

  6. Return to the lesson's web page and check that you wrote all sentences correctly. If you have any doubts on stroke order or pronunciation, take the time to check.

  7. Write down the time at which you finished studying the lesson.

  8. If you have the patience, have a look at the "extra vocabulary" section at the bottom of the lesson's page. You don't need to actively memorize it: the point is only to know that these words can be written using the kanji of this lesson, for when you encounter them elsewhere.

The overall process takes 10 to 15 minutes per lesson, sometimes less, sometimes more. Each lesson will take less and less time as you repeat it.

Planning your study

I recommend studying each lesson for 10 to 12 days, every day, and adding a new lesson about every three or four days. This means that you will have 3 or 4 lessons to study each day, which takes me between 30 and 45 minutes total (that's why I'm suggesting that you write down the time when you start/finish!).
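As a rough illustration of this rotation, the sketch below lists which lessons are due on a given day. It assumes (my simplification, not something the method prescribes) that lesson 0 starts on day 0, that a new lesson starts exactly every `interval` days, and that each lesson is studied on `study_days` consecutive days. The function name `lessons_due` is hypothetical.

```python
def lessons_due(day, interval=4, study_days=12):
    """Lesson numbers to study on `day` (day 0 is the first study day).

    Assumes lesson k is introduced on day k * interval and studied on
    `study_days` consecutive days starting from that day.
    """
    due = []
    lesson = 0
    while lesson * interval <= day:
        start = lesson * interval
        if day < start + study_days:
            due.append(lesson)
        lesson += 1
    return due


print(lessons_due(0))                              # only the first lesson
print(lessons_due(20, interval=4, study_days=12))  # three lessons in rotation
print(lessons_due(30, interval=3, study_days=12))  # four lessons in rotation
```

With a new lesson every 4 days the steady-state rotation holds 3 lessons, and with a new lesson every 3 days it holds 4, which matches the "3 or 4 lessons" figure above.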

If you are having a harder time memorizing the kanji and the sentences, you can adapt the schedule to your learning speed. Here are some examples:

  • Slowest: study each lesson for 12 days, add a new lesson every 6 days (average 2 lessons to study every day).
  • Slow: study each lesson for 10 to 12 days, add a new lesson every 5 days (average 2 lessons to study every day).
  • Medium: study each lesson for 10 to 12 days, add a new lesson every 4 days (average 2-3 lessons to study every day).
  • Fast: study each lesson for 10 to 12 days, add a new lesson every 3 days (average 3-4 lessons to study every day).
  • Extra-fast: study each lesson for 8 to 10 days, add a new lesson every 2 days (average 4-5 lessons to study every day).
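The averages quoted in this list follow from dividing the number of days a lesson is studied by the number of days between new lessons. The short sketch below recomputes them. The `PRESETS` table simply transcribes the list above, and `average_load` is a name chosen for this example.

```python
# (name, min study days, max study days, days between new lessons)
PRESETS = [
    ("Slowest", 12, 12, 6),
    ("Slow", 10, 12, 5),
    ("Medium", 10, 12, 4),
    ("Fast", 10, 12, 3),
    ("Extra-fast", 8, 10, 2),
]


def average_load(study_days, interval):
    """Average number of lessons in rotation per day at steady state."""
    return study_days / interval


for name, lo, hi, interval in PRESETS:
    low, high = average_load(lo, interval), average_load(hi, interval)
    print(f"{name}: {low:.1f} to {high:.1f} lessons per day")
```

For the Medium preset, for instance, this gives between 2.5 and 3 lessons per day.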

The two parameters can be tuned separately according to your needs:

  • How many days you will keep studying each lesson: you can reduce this if you feel that the last repetitions are becoming boring/useless, but those last repetitions will also become very fast and it's always good to do them as practice.

  • How frequently you add a new lesson: being consistent with this will help you plan long-term. For instance if you are on average adding one lesson every 6 days you will know all JLPT N2 kanji within a year, and if you consistently add a new lesson every 4 days you will know all JLPT N1 kanji in slightly over one year.
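The long-term estimates above can be checked with a one-line formula: the last lesson starts after `(n_lessons - 1) * interval` days and is then studied for `study_days` days. The sketch below assumes that lessons are numbered from 000, that the last N2 lesson is 051 and the last N1 lesson is 098 (as in the lesson list further down), and that each lesson is studied for 12 days. The function name `days_to_finish` is hypothetical.

```python
def days_to_finish(n_lessons, interval, study_days=12):
    """Days until the last of `n_lessons` lessons has been fully studied.

    Assumes one new lesson every `interval` days, starting on day 0,
    and `study_days` consecutive days of study per lesson.
    """
    return (n_lessons - 1) * interval + study_days


# Lessons 000 to 051 inclusive: 52 lessons, one new lesson every 6 days.
print(days_to_finish(52, interval=6))
# Lessons 000 to 098 inclusive: 99 lessons, one new lesson every 4 days.
print(days_to_finish(99, interval=4))
```

Under these assumptions, the N2 plan takes 318 days (within a year) and the N1 plan takes 404 days (slightly over one year).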

Note that 65 JLPT N1 jōyō kanji, as well as 186 jinmeiyō kanji also marked for N1 in KANJIDIC2, did not have an example sentence in the Tanaka corpus and are therefore not included in Datagengo. The list can be found below the batch list in the level list, in the "missing kanji" column, in rows N1a and N1b for the jōyō kanji and N1-9 for the jinmeiyō kanji. You might want to study at least the 65 jōyō kanji separately before attempting to pass JLPT N1.

Which lessons should I learn?

Here is how the lessons are currently organized:

  • 000 to 005: JLPT N5

  • 005 to 014: JLPT N4

  • 014 to 031: JLPT N3

  • 031 to 051: JLPT N2

  • 051 to 098: JLPT N1, further split into three parts for easier processing:

      • Lessons 051 to 058 (marked N1a) contain kanji learnt in Japanese elementary school.
      • Lessons 058 to 095 (marked N1b) contain kanji learnt in Japanese secondary school.
      • Lessons 095 to 098 (marked N1-9) contain jinmeiyō kanji (for use in names).

  • 098 to 105: extra jōyō kanji not part of JLPT but learnt in Japanese elementary or secondary school (marked N0a and N0b respectively)

  • 105 to 114: extra jinmeiyō kanji (marked N0-9)

  • 114 to 126: even more kanji, not part of JLPT, jōyō or jinmeiyō (marked N0+)
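If you want to look up the level of a given lesson quickly, the ranges above can be put in a small table, as in the sketch below. It reads each range as inclusive on both ends, exactly as written, so a boundary lesson such as 005 is reported under both neighbouring levels. Whether boundary lessons really mix kanji from two levels is an assumption on my part. The names `LEVELS` and `levels_for` are made up for this example.

```python
# (label, first lesson, last lesson), transcribed from the list above.
LEVELS = [
    ("N5", 0, 5),
    ("N4", 5, 14),
    ("N3", 14, 31),
    ("N2", 31, 51),
    ("N1a", 51, 58),
    ("N1b", 58, 95),
    ("N1-9", 95, 98),
    ("N0a/N0b", 98, 105),
    ("N0-9", 105, 114),
    ("N0+", 114, 126),
]


def levels_for(lesson):
    """Labels of every range that contains `lesson` (bounds inclusive)."""
    return [name for name, first, last in LEVELS if first <= lesson <= last]


print(levels_for(5))    # boundary lesson, listed under two levels
print(levels_for(60))   # lesson in the middle of a range
print(levels_for(200))  # no such lesson: empty list
```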

If you are studying for advanced levels, make sure to check the character table below the lesson list, and in particular the "missing kanji" column, to know all characters for which no example sentences were found in the Tanaka corpus and which are therefore not included in Datagengo.