Ordinary Dictionary

Dictionary import format 辞書形式

od-dict/1 · the contract for importing custom dictionaries

The app can load your own dictionaries. One dictionary is one JSON file in the od-dict/1 format. You produce it by converting an existing dictionary (for example a Yomitan dictionary) — by hand or with an LLM — and the app builds its own database from it.

The format is a superset of jmdict-simplified (scriptin/jmdict-simplified): a valid jmdict-simplified word is already valid here, so almost no conversion is needed for that source.

This page is a readable summary. The full, code-verified contract lives in the repository: docs/od-dict-format.md.

1. The whole file

{
  "format": "od-dict/1",          // required, exact string
  "source": "custom",             // required; an id for this dictionary (§2)
  "metadata": {                   // optional; for display/provenance only
    "title": "My Dictionary",
    "license": "CC BY-SA 4.0",
    "url": "https://…"
  },
  "tags": { "v1": "Ichidan verb" }, // optional; NOT consumed (§6)
  "words": [
    {
      "id": "1259420",            // required; UNIQUE within this file (§3)
      "kanji": ["食べる"],         // optional; omit for kana-only words
      "kana":  ["たべる"],         // at least one form (kanji or kana) must exist
      "sense": [                  // required; ≥1
        {
          "partOfSpeech": ["v1"], // optional; coded POS ONLY, never free text (§4)
          "misc": ["transitive"], // optional; free labels, shown verbatim
          "gloss": [              // the translations
            { "lang": "en", "text": "to eat" },
            "пожирать"            // bare string = { lang: "ru", origin: "original" }
          ],
          "examples": [ ["ご飯を食べる", "to eat a meal"] ],  // optional; [japanese, translation]
          "references": []        // optional; cross-links (§5)
        }
      ]
    }
  ]
}
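
The `//` comments in the listing above are annotations for the reader; standard JSON has no comment syntax, so a generated file would normally leave them out. One way to avoid hand-editing mistakes is to build the document as plain data and serialize it. The following is a minimal sketch, not the app's tooling: `build_document` is a hypothetical helper name, and only the field names come from this page.

```python
import json

def build_document(source, words, metadata=None):
    """Assemble an od-dict/1 document as a plain dict (hypothetical helper)."""
    doc = {"format": "od-dict/1", "source": source, "words": words}
    if metadata:
        doc["metadata"] = metadata  # optional, display/provenance only
    return doc

word = {
    "id": "1259420",
    "kanji": ["食べる"],
    "kana": ["たべる"],
    "sense": [{"partOfSpeech": ["v1"],
               "gloss": [{"lang": "en", "text": "to eat"}]}],
}
doc = build_document("custom", [word], {"title": "My Dictionary"})

# ensure_ascii=False keeps Japanese and Cyrillic text readable in the file.
text = json.dumps(doc, ensure_ascii=False, indent=2)
```

Writing `text` to a UTF-8 file should then give one dictionary in one JSON file, as described above.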

2. Top level

field     req.  meaning
format    yes   Exactly "od-dict/1".
source    yes   An id string for the whole dictionary. A stable slug ("custom", "jitendex"). It is the namespace for ids and cross-references. An unknown source is simply ranked after the built-in dictionaries, never rejected.
metadata  no    Free object (title, license, url). Stored for display only; the builder does not read it.
tags      no    A code → label map for jmdict-simplified compatibility. Not consumed (§6). Put labels you want shown directly in misc.
words     yes   The array of entries.

Security: the design intends to force source = "custom" at import so a file can't claim to be "jmdict". That is not implemented yet — treat source as advisory until the validating importer lands.
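
Because the validating importer is not in place yet, it may help to check the top-level fields yourself before importing. This is a rough sketch of the rules in the table above; `check_top_level` is a hypothetical name and is not part of the app.

```python
def check_top_level(doc):
    """Return a list of problems in the top-level fields (empty list = ok)."""
    problems = []
    if doc.get("format") != "od-dict/1":
        problems.append('format must be exactly "od-dict/1"')
    source = doc.get("source")
    if not isinstance(source, str) or not source:
        problems.append("source must be a non-empty string")
    if not isinstance(doc.get("words"), list):
        problems.append("words must be an array")
    if "metadata" in doc and not isinstance(doc["metadata"], dict):
        problems.append("metadata, when present, must be an object")
    # "tags" is accepted but not consumed, so it is not checked here.
    return problems
```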

3. Words and identity

{
  "id": "1259420",
  "homograph": "II",          // optional; disambiguator when head+reading collide
  "label": "colloquial",      // optional; entry-level tag shown on the card
  "isExpression": true,       // optional; multi-word phrase
  "common": true,             // optional; a high-frequency entry
  "kanji": ["食べる", "喰べる"],  // 0…n written forms
  "kana":  ["たべる"],          // 0…n readings
  "sense": [ /* … */ ]
}

Identity is (source, id), and it is UNIQUE.

  • id must be unique within the file. Rows are written with a plain INSERT against a UNIQUE(source, source_entry_id) constraint, so a duplicate id throws and aborts the import batch rather than being silently ignored. Use your source's own id; if it has none, synthesise a stable one (a counter, or the headword).
  • kanji/kana items may be a bare string (the normal case) or a jmdict-simplified object { "text": "…" }; only text is read, the rest is accepted but dropped. A bare string is preferred.
  • A word needs at least one form. For Japanese, give kana; omit kanji for kana-only words.
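
Since a duplicate id aborts the batch, a pre-flight check over the words array can save a failed import. The sketch below covers the three rules above (unique id, at least one form, at least one sense); `form_text` and `check_words` are hypothetical names, and the message wording is made up for illustration.

```python
def form_text(item):
    """Text of a kanji/kana item given as a bare string or {"text": ...}."""
    return item if isinstance(item, str) else item["text"]

def check_words(words):
    """Return problems: missing or duplicate ids, no forms, no senses."""
    problems, seen = [], set()
    for i, word in enumerate(words):
        word_id = word.get("id")
        if not word_id:
            problems.append(f"word #{i}: missing id")
        elif word_id in seen:
            problems.append(f"word #{i}: duplicate id {word_id!r}")
        else:
            seen.add(word_id)
        # Only the text of each form is read; other keys are dropped.
        forms = [form_text(x) for x in word.get("kanji", []) + word.get("kana", [])]
        if not forms:
            problems.append(f"word #{i}: needs at least one kanji or kana form")
        if not word.get("sense"):
            problems.append(f"word #{i}: needs at least one sense")
    return problems
```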

Grouping. The app shows one card per (head + reading) group, merging across dictionaries. In your file, emit one word per entry — collect all written forms into kanji, all readings into kana, all meanings into sense. If your source splits one entry across rows (Yomitan does, via a shared sequence), merge them into one word first, or you get several cards for one entry.
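
The merge step might look like the sketch below. It assumes the source rows have already been reduced to a simple hypothetical shape (`sequence`, `kanji`, `kana`, `glosses`) rather than raw Yomitan term-bank rows, and that all glosses are English; adapt both assumptions to your source.

```python
def merge_rows(rows):
    """Merge source rows that share a sequence number into one word each."""
    merged = {}  # sequence -> word; dicts preserve insertion order
    for row in rows:
        word = merged.setdefault(row["sequence"], {
            "id": str(row["sequence"]), "kanji": [], "kana": [], "sense": [],
        })
        if row.get("kanji") and row["kanji"] not in word["kanji"]:
            word["kanji"].append(row["kanji"])
        if row["kana"] not in word["kana"]:
            word["kana"].append(row["kana"])
        # Assumption: glosses are English; identical senses are kept once.
        sense = {"gloss": [{"lang": "en", "text": g} for g in row["glosses"]]}
        if sense not in word["sense"]:
            word["sense"].append(sense)
    words = list(merged.values())
    for word in words:
        if not word["kanji"]:
            del word["kanji"]  # kana-only word: omit the field
    return words
```

Using the shared sequence as the id also keeps the id stable across re-conversions, which the identity rules above ask for.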

4. partOfSpeech — the one field that can silently break search

A wrong value in this field is the only mistake that can hide a word from search without raising any error. POS does not affect finding a word by its dictionary form or reading: 食べる and たべる are always found. POS is used in exactly one place, the deinflection gate, which finds a word from an inflected surface the user typed (食べた, 静かな).

entry POS                             verdict    effect
absent / not a conjugation class      weak       still found, ranked a little lower
a conjugation class that matches      confirmed  found, ranked normally
a conjugation class that mismatches   rejected   dropped; not found from that inflection

If you are not sure of the conjugation class, omit partOfSpeech entirely. Absent is safe (weak), wrong is fatal (rejected). Omit beats guess.

  • Codes only, never free text. Put readable labels ("verb", "honorific") in misc, where they are shown verbatim.
  • Only the conjugation classes matter. Other codes (n, exp, vt, adv …) are ignored by search — harmless but useless.
  • The Yomitan trap: never copy the collapsed v5. The engine has no v5 class — it needs the specific one (v5k, v5r …). A bare "v5" matches nothing → rejected, i.e. worse than untagged. Take the specific code from definitionTags, or expand by the dictionary form's final kana, or omit.

The conjugation classes the engine understands:

v1                              ichidan (-ru):       食べる, 見る
v5u v5k v5g v5s v5t v5n v5b v5m v5r   godan, by the dictionary-form final kana
vk                              来る (kuru)
vs vs-i vs-s                    する verbs
adj-i                           い-adjectives:        高い, 良い
adj-na                          な-adjectives:        静か, 綺麗

5. Glosses, languages and cross-references

"гулять"                                  // bare string = { "lang": "ru", "origin": "original" }
{ "lang": "en", "text": "to take a walk" }
{ "lang": "ru", "text": "то же, что {0}", "origin": "original" }

  • A bare string is Russian (lang: "ru"). For any other language use the object form with lang.
  • origin is "original" (default) or "machine" (AI-produced); display only.
  • How lang affects search: a gloss with lang: "ru" goes in the Russian search channel; every other lang goes in the English channel. The channel is chosen by the script the user typed: Cyrillic → ru, Latin → en. So a Japanese-English dictionary's glosses must be lang: "en" to be reachable by an English search. A monolingual Japanese gloss (lang: "ja") lands in the en channel and is effectively display-only — still found by headword and reading, but not by its definition text.
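
The gloss rules above can be sketched in a few lines. This is an illustration of the described behaviour, not the app's code: the function names are hypothetical, and how the real engine treats mixed-script queries or queries in neither script is an assumption here (Cyrillic is checked first, and anything else returns None).

```python
def normalize_gloss(gloss):
    """Expand a bare-string gloss to the object form; default the origin."""
    if isinstance(gloss, str):
        return {"lang": "ru", "text": gloss, "origin": "original"}
    return {"origin": "original", **gloss}

def gloss_channel(gloss):
    """Channel a gloss is indexed in: "ru" for Russian, "en" for all else."""
    return "ru" if normalize_gloss(gloss).get("lang") == "ru" else "en"

def query_channel(query):
    """Channel chosen from the script the user typed (assumed heuristic)."""
    if any("\u0400" <= ch <= "\u04ff" for ch in query):  # Cyrillic block
        return "ru"
    if any(ch.isascii() and ch.isalpha() for ch in query):  # Latin letters
        return "en"
    return None  # neither script: not covered by this page
```

A gloss is reachable by its text only when `gloss_channel` and `query_channel` agree, which is why a lang: "ja" gloss ends up display-only.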

Cross-references — ICU placeholders {0}, {1} … in the gloss text, resolved against a references array:

{
  "gloss": [ { "lang": "en", "text": "polite form of {0}" } ],
  "references": [ { "label": "", "to": "1234567", "text": "言う" } ]
}

  • {0} is replaced in place by references[0].text as a tappable link.
  • to is the id of the target word in this same file; the link opens { source: <this dictionary>, id: to }. A to that matches nothing simply opens nothing.
  • label is a relation marker ("see", "cf.") for trailing references (no {n}). A literal { in gloss text must be written '{' (ICU quoting).
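
To preview how placeholders resolve before importing, a converter could run something like the sketch below. It handles only the pieces mentioned above ({n} placeholders and apostrophe quoting such as '{'), not the full ICU MessageFormat syntax. `render_gloss` is a hypothetical name, and leaving an out-of-range placeholder as raw text is an assumption, not documented app behaviour.

```python
import re

# '' is an escaped apostrophe, '{...' is a quoted literal, {n} is a placeholder.
_TOKEN = re.compile(r"''|'([{}][^']*)'|\{(\d+)\}")

def render_gloss(text, references):
    """Split gloss text into ("text", str) and ("link", str, to) segments."""
    segments, pos = [], 0

    def add_text(s):
        if not s:
            return
        if segments and segments[-1][0] == "text":
            segments[-1] = ("text", segments[-1][1] + s)  # join adjacent text
        else:
            segments.append(("text", s))

    for m in _TOKEN.finditer(text):
        add_text(text[pos:m.start()])
        pos = m.end()
        if m.group(2) is not None:
            index = int(m.group(2))
            if index < len(references):
                ref = references[index]
                segments.append(("link", ref["text"], ref["to"]))
            else:
                add_text(m.group(0))  # assumption: keep unresolved {n} as text
        elif m.group(1) is not None:
            add_text(m.group(1))  # quoted literal such as '{'
        else:
            add_text("'")
    add_text(text[pos:])
    return segments
```

A converter can also use the result to confirm that every "to" value matches an id in the same file, since an unmatched target simply opens nothing.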

6. What the format does not carry

These are added by the app from its own shared data, keyed by surface/reading, so every dictionary agrees — you do not provide them: pitch accent, furigana, romaji, frequency / "common" ranking, audio, JLPT/WaniKani levels, kanji breakdown. Also not consumed (though accepted for compatibility): the top-level tags map and per-form/per-gloss extras. The card renders misc/field/dialect as free strings, verbatim — no code expansion — so put readable labels there directly.

7. Minimal examples

The smallest word — kana only with one gloss:

{ "id": "w1", "kana": ["ありがとう"], "sense": [ { "gloss": [ { "lang": "en", "text": "thank you" } ] } ] }

// (a) JA→EN godan. POS is the SPECIFIC class v5k, not Yomitan's "v5".
{ "id": "1578850", "kanji": ["書く"], "kana": ["かく"],
  "sense": [ { "partOfSpeech": ["v5k", "vt"],
    "gloss": [ { "lang": "en", "text": "to write; to compose; to pen" } ] } ] }

// (b) JA→EN with an inline cross-reference.
{ "id": "敷衍", "kanji": ["敷衍", "敷延"], "kana": ["ふえん"],
  "sense": [ { "gloss": [ { "lang": "en", "text": "amplification (cf. {0})" } ],
    "references": [ { "label": "", "to": "演繹", "text": "演繹" } ] } ] }

// (c) kana-only な-adjective. adj-na IS a deinflection class, so tag it.
{ "id": "1000230", "kana": ["きれい"],
  "sense": [ { "partOfSpeech": ["adj-na"], "gloss": [ { "lang": "en", "text": "pretty; clean; neat" } ] } ] }

8. POS allowlist

Emit a code into partOfSpeech only if it is one of these (or a jmdict subclass starting with one). Anything else: put the human label in misc and omit it here.

v1
v5u v5k v5g v5s v5t v5n v5b v5m v5r
vk
vs vs-i vs-s
adj-i
adj-na

Collapsed Yomitan "v5" → expand by the dictionary-form final kana: う→v5u く→v5k ぐ→v5g す→v5s つ→v5t ぬ→v5n ぶ→v5b む→v5m る→v5r. Unsure → omit.
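
A converter can apply the allowlist and the v5 expansion mechanically. The sketch below reads the "subclass starting with one" rule literally as a prefix test, which is an assumption about how strict the engine is; `sanitize_pos` is a hypothetical name.

```python
CONJUGATION_CLASSES = (
    "v1",
    "v5u", "v5k", "v5g", "v5s", "v5t", "v5n", "v5b", "v5m", "v5r",
    "vk", "vs", "vs-i", "vs-s", "adj-i", "adj-na",
)
V5_BY_FINAL_KANA = {
    "う": "v5u", "く": "v5k", "ぐ": "v5g", "す": "v5s", "つ": "v5t",
    "ぬ": "v5n", "ぶ": "v5b", "む": "v5m", "る": "v5r",
}

def sanitize_pos(codes, dictionary_form_kana):
    """Keep allowlisted codes, expand a bare "v5", drop everything else."""
    kept = []
    for code in codes:
        if code == "v5":
            # Expand by the final kana of the dictionary form; unsure -> omit.
            code = V5_BY_FINAL_KANA.get(dictionary_form_kana[-1:])
            if code is None:
                continue
        elif not code.startswith(CONJUGATION_CLASSES):
            continue  # not a conjugation class (n, exp, vt ...): leave it out
        if code not in kept:
            kept.append(code)
    return kept
```

If the result is empty, omit partOfSpeech from the sense altogether instead of emitting an empty or guessed value.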
