Dictionary import format 辞書形式
od-dict/1 · the contract for importing custom dictionaries
The app can load your own dictionaries. One dictionary is one JSON file in the
od-dict/1 format. You produce it by converting an existing dictionary (for example a
Yomitan dictionary) — by hand or with an LLM — and the app builds its own database from it.
The format is a superset of jmdict-simplified
(scriptin/jmdict-simplified): a valid
jmdict-simplified word is already valid here, so almost no conversion is needed for that source.
docs/od-dict-format.md.
1. The whole file
{
  "format": "od-dict/1",  // required, exact string
  "source": "custom",  // required; an id for this dictionary (§2)
  "metadata": {  // optional; for display/provenance only
    "title": "My Dictionary",
    "license": "CC BY-SA 4.0",
    "url": "https://…"
  },
  "tags": { "v1": "Ichidan verb" },  // optional; NOT consumed (§6)
  "words": [
    {
      "id": "1259420",  // required; UNIQUE within this file (§3)
      "kanji": ["食べる"],  // optional; omit for kana-only words
      "kana": ["たべる"],  // at least one form (kanji or kana) must exist
      "sense": [  // required; ≥1
        {
          "partOfSpeech": ["v1"],  // optional; coded POS ONLY, never free text (§4)
          "misc": ["transitive"],  // optional; free labels, shown verbatim
          "gloss": [  // the translations
            { "lang": "en", "text": "to eat" },
            "пожирать"  // bare string = { lang: "ru", origin: "original" }
          ],
          "examples": [ ["ご飯を食べる", "to eat a meal"] ],  // optional; [japanese, translation]
          "references": []  // optional; cross-links (§5)
        }
      ]
    }
  ]
}
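As a rough illustration of producing such a file from a converter script, the sketch below assembles the top-level object and serialises it. It assumes the importer expects strict JSON, so the `//` comments in the sample above are treated as annotations only; `build_document` and `dump_document` are hypothetical helper names, not part of the app.

```python
import json


def build_document(source, words, metadata=None):
    """Assemble the top-level od-dict/1 object (hypothetical helper)."""
    doc = {"format": "od-dict/1", "source": source}
    if metadata:
        doc["metadata"] = metadata  # optional, display/provenance only
    doc["words"] = words
    return doc


def dump_document(doc):
    # ensure_ascii=False keeps kana, kanji and Cyrillic readable in the file
    # instead of escaping them as \uXXXX sequences.
    return json.dumps(doc, ensure_ascii=False, indent=2)


word = {
    "id": "w1",
    "kana": ["ありがとう"],
    "sense": [{"gloss": [{"lang": "en", "text": "thank you"}]}],
}
text = dump_document(build_document("custom", [word]))
```

Writing `text` to a UTF-8 file gives one dictionary per file, as described above.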
2. Top level
| field | req. | meaning |
|---|---|---|
| format | yes | Exactly "od-dict/1". |
| source | yes | An id string for the whole dictionary. A stable slug ("custom", "jitendex"). It is the namespace for ids and cross-references. An unknown source is simply ranked after the built-in dictionaries; it is never rejected. |
| metadata | no | Free object (title, license, url). Stored for display only; the builder does not read it. |
| tags | no | A code → label map for jmdict-simplified compatibility. Not consumed (§6). Put labels you want shown directly in misc. |
| words | yes | The array of entries. |
Security: the design intends to force source = "custom" at import so a file can't claim to be "jmdict". That is not implemented yet — treat source as advisory until the validating importer lands.
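The top-level rules in the table can be checked before attempting an import. The following is a minimal sketch, not the app's validator: `check_top_level` is a hypothetical name, and treating an empty `source` string as an error is an assumption that goes slightly beyond the table.

```python
def check_top_level(doc):
    """Return a list of problems found in the top-level object (empty = OK)."""
    problems = []
    if doc.get("format") != "od-dict/1":
        problems.append('format must be exactly "od-dict/1"')
    source = doc.get("source")
    # Assumption: an empty string is not a usable id.
    if not isinstance(source, str) or not source:
        problems.append("source must be a non-empty string")
    if not isinstance(doc.get("words"), list):
        problems.append("words must be an array")
    # metadata and tags are optional and not read by the builder,
    # so they are only type-checked when present.
    for key in ("metadata", "tags"):
        if key in doc and not isinstance(doc[key], dict):
            problems.append(f"{key} must be an object when present")
    return problems
```

Note that the check deliberately accepts any `source` slug, matching the rule that an unknown source is ranked lower rather than rejected.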
3. Words and identity
{
  "id": "1259420",
  "homograph": "II",  // optional; disambiguator when head+reading collide
  "label": "colloquial",  // optional; entry-level tag shown on the card
  "isExpression": true,  // optional; multi-word phrase
  "common": true,  // optional; a high-frequency entry
  "kanji": ["食べる", "喰べる"],  // 0…n written forms
  "kana": ["たべる"],  // 0…n readings
  "sense": [ /* … */ ]
}
Identity is (source, id), and it is UNIQUE.
- id must be unique within the file. The table is written with a plain INSERT against UNIQUE(source, source_entry_id), so a duplicate id throws and aborts the import batch; it is not silently ignored. Use your source's own id; if it has none, synthesise a stable one (a counter, or the headword).
- kanji / kana items may be a bare string (the normal case) or a jmdict-simplified object { "text": "…" }; only text is read, the rest is accepted but dropped. A bare string is preferred.
- A word needs at least one form. For Japanese, give kana; omit kanji for kana-only words.
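Because a duplicate id aborts the whole batch, it can be worth catching it in the converter first. The sketch below checks the identity rules above on a list of words; `form_text` and `check_words` are hypothetical names, and the error wording is illustrative only.

```python
def form_text(item):
    """Accept a bare string or a jmdict-simplified object; keep only the text."""
    if isinstance(item, str):
        return item
    return item["text"]  # any other keys on the object are ignored


def check_words(words):
    """Return problems with ids and forms (empty list = OK)."""
    problems = []
    seen = set()
    for i, word in enumerate(words):
        wid = word.get("id")
        if wid is None:
            problems.append(f"word {i}: missing id")
        elif wid in seen:
            problems.append(f"word {i}: duplicate id {wid!r}")
        seen.add(wid)
        forms = [form_text(x) for x in word.get("kanji", []) + word.get("kana", [])]
        if not forms:
            problems.append(f"word {i}: needs at least one kanji or kana form")
    return problems
```

For example, two words sharing `"id": "a"` produce one "duplicate id" problem for the second word, which is the case that would otherwise abort the import.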
Grouping. The app shows one card per (head + reading) group, merging
across dictionaries. In your file, emit one word per entry — collect all
written forms into kanji, all readings into kana, all meanings into
sense. If your source splits one entry across rows (Yomitan does, via a shared
sequence), merge them into one word first, or you get several cards for one entry.
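The merge step can be sketched as follows. The input shape is an assumption made for illustration: each row is a pre-parsed dict with hypothetical keys `sequence`, `term`, `reading` (empty for kana-only terms) and `glosses`, not Yomitan's on-disk layout. Glosses are tagged `"en"` here only because the example assumes a Japanese-English source.

```python
def merge_rows(rows):
    """Merge rows that share a sequence into one od-dict/1 word each."""
    merged = {}  # sequence -> word; dicts keep insertion order
    for row in rows:
        word = merged.setdefault(row["sequence"], {
            "id": str(row["sequence"]), "kanji": [], "kana": [], "sense": [],
        })
        term, reading = row["term"], row["reading"]
        # A term with a distinct reading is a written (kanji) form.
        if reading and term != reading and term not in word["kanji"]:
            word["kanji"].append(term)
        kana = reading or term  # kana-only rows carry the kana in term
        if kana not in word["kana"]:
            word["kana"].append(kana)
        sense = {"gloss": [{"lang": "en", "text": g} for g in row["glosses"]]}
        if sense not in word["sense"]:  # skip senses repeated across rows
            word["sense"].append(sense)
    for word in merged.values():
        if not word["kanji"]:
            del word["kanji"]  # omit kanji for kana-only words
    return list(merged.values())
```

Two rows for 敷衍 and 敷延 with the same sequence then become a single word with both written forms, one reading and one sense, so the app shows one card instead of two.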
4. partOfSpeech — the one field that can silently break search
This is the only field where a wrong value removes the word from search with no error.
POS does not affect finding a word by its dictionary form or reading — 食べる
and たべる are always found. POS is used in exactly one place: the
deinflection gate — finding a word from an inflected surface the user typed
(食べた, 静かな).
| entry POS | verdict | effect |
|---|---|---|
| absent / not a conjugation class | weak | still found, ranked a little lower |
| a conjugation class that matches | confirmed | found, ranked normally |
| a conjugation class that mismatches | rejected | dropped — not found from that inflection |
If you are not sure of the conjugation class, omit partOfSpeech
entirely. Absent is safe (weak), wrong is fatal (rejected). Omit beats guess.
- Codes only, never free text. Put readable labels ("verb", "honorific") in misc, where they are shown verbatim.
- Only the conjugation classes matter. Other codes (n, exp, vt, adv …) are ignored by search: harmless but useless.
- The Yomitan trap: never copy the collapsed v5. The engine has no v5 class; it needs the specific one (v5k, v5r …). A bare "v5" matches nothing → rejected, i.e. worse than untagged. Take the specific code from definitionTags, or expand by the dictionary form's final kana, or omit.
The conjugation classes the engine understands:
v1 ichidan (-ru): 食べる, 見る
v5u v5k v5g v5s v5t v5n v5b v5m v5r godan, by the dictionary-form final kana
vk 来る (kuru)
vs vs-i vs-s する verbs
adj-i い-adjectives: 高い, 良い
adj-na な-adjectives: 静か, 綺麗
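The three verdicts can be illustrated with a toy model. This is not the engine's code: how the real gate decides that a code "is a conjugation class" is not specified here, so the family prefixes in `CLASS_FAMILIES` and the prefix-matching rule are assumptions chosen only to reproduce the table and the v5 trap described above. `gate_verdict` is a hypothetical name.

```python
# Assumption: a code counts as "claiming" a conjugation class when it
# starts with one of these family prefixes. Codes like n, exp, vt do not.
CLASS_FAMILIES = ("v1", "v5", "vk", "vs", "adj-i", "adj-na")


def gate_verdict(entry_pos, needed):
    """Toy model of the deinflection gate for one inflected lookup.

    entry_pos: the word's partOfSpeech codes (possibly empty).
    needed:    the class the typed inflection implies, e.g. "v5k".
    """
    claimed = [p for p in entry_pos if p.startswith(CLASS_FAMILIES)]
    if not claimed:
        return "weak"        # absent / not a conjugation class
    if any(p.startswith(needed) for p in claimed):
        return "confirmed"   # a conjugation class that matches
    return "rejected"        # a conjugation class that mismatches
```

Under this model `gate_verdict(["v5"], "v5k")` is "rejected" while `gate_verdict([], "v5k")` is "weak", which is the reason omitting partOfSpeech is safer than copying a collapsed or guessed code.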
5. Glosses, languages and cross-references
"гулять" // bare string = { "lang": "ru", "origin": "original" }
{ "lang": "en", "text": "to take a walk" }
{ "lang": "ru", "text": "то же, что {0}", "origin": "original" }
- A bare string is Russian (lang: "ru"). For any other language use the object form with lang.
- origin is "original" (default) or "machine" (AI-produced); display only.
- How lang affects search: a gloss with lang: "ru" goes in the Russian search channel; every other lang goes in the English channel. The channel is chosen by the script the user typed: Cyrillic → ru, Latin → en. So a Japanese-English dictionary's glosses must be lang: "en" to be reachable by an English search. A monolingual Japanese gloss (lang: "ja") lands in the en channel and is effectively display-only: still found by headword and reading, but not by its definition text.
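The gloss rules above can be sketched as a small normaliser plus the two channel decisions. The function names are hypothetical, and `query_channel` makes an assumption the text does not spell out: it decides by the first Cyrillic or Latin letter it finds and returns `None` for any other script.

```python
def normalize_gloss(gloss):
    """Expand a bare string to the object form; fill the default origin."""
    if isinstance(gloss, str):
        return {"lang": "ru", "text": gloss, "origin": "original"}
    return {
        "lang": gloss["lang"],
        "text": gloss["text"],
        "origin": gloss.get("origin", "original"),
    }


def gloss_channel(gloss):
    """ru glosses go to the Russian channel, every other lang to English."""
    return "ru" if normalize_gloss(gloss)["lang"] == "ru" else "en"


def query_channel(query):
    """Cyrillic -> "ru", Latin -> "en"; None for other scripts (assumption)."""
    for ch in query:
        if "\u0400" <= ch <= "\u04FF":  # basic Cyrillic block
            return "ru"
        if ch.isascii() and ch.isalpha():
            return "en"
    return None
```

A gloss tagged `"ja"` maps to the en channel, but a Japanese query maps to neither channel in this sketch, which mirrors why such glosses end up display-only.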
Cross-references — ICU placeholders {0}, {1} … in the gloss
text, resolved against a references array:
{
"gloss": [ { "lang": "en", "text": "polite form of {0}" } ],
"references": [ { "label": "", "to": "1234567", "text": "言う" } ]
}
- {0} is replaced in place by references[0].text as a tappable link.
- to is the id of the target word in this same file; the link opens { source: <this dictionary>, id: to }. A to that matches nothing simply opens nothing.
- label is a relation marker ("see", "cf.") for trailing references (no {n}).
- A literal { in gloss text must be written '{' (ICU quoting).
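Placeholder resolution can be sketched with a single regular expression. This is a simplified illustration, not a full ICU MessageFormat parser: it handles only `{n}` placeholders and the quoted literal `'{'`, and it returns plain text rather than tappable links. Leaving an out-of-range placeholder visible is an assumption; `render_gloss` is a hypothetical name.

```python
import re

# Either the quoted literal '{' or a numbered placeholder {n}.
_TOKEN = re.compile(r"'\{'|\{(\d+)\}")


def render_gloss(text, references):
    """Return (rendered text, trailing references not used inline)."""
    used = set()

    def repl(match):
        if match.group(1) is None:  # matched the quoted literal '{'
            return "{"
        index = int(match.group(1))
        if index >= len(references):
            return match.group(0)   # assumption: keep unresolved {n} visible
        used.add(index)
        return references[index]["text"]

    rendered = _TOKEN.sub(repl, text)
    trailing = [r for i, r in enumerate(references) if i not in used]
    return rendered, trailing
```

References that no placeholder consumed come back as the trailing list, which is where a label such as "see" or "cf." would be shown.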
6. What the format does not carry
These are added by the app from its own shared data, keyed by surface/reading, so every dictionary
agrees — you do not provide them: pitch accent, furigana, romaji, frequency /
"common" ranking, audio, JLPT/WaniKani levels, kanji breakdown. Also not consumed (though accepted for
compatibility): the top-level tags map and per-form/per-gloss extras. The card renders
misc/field/dialect as free strings, verbatim —
no code expansion — so put readable labels there directly.
7. Minimal examples
The smallest word — kana only with one gloss:
{ "id": "w1", "kana": ["ありがとう"], "sense": [ { "gloss": [ { "lang": "en", "text": "thank you" } ] } ] }
// (a) JA→EN godan. POS is the SPECIFIC class v5k, not Yomitan's "v5".
{ "id": "1578850", "kanji": ["書く"], "kana": ["かく"],
"sense": [ { "partOfSpeech": ["v5k", "vt"],
"gloss": [ { "lang": "en", "text": "to write; to compose; to pen" } ] } ] }
// (b) JA→EN with an inline cross-reference.
{ "id": "敷衍", "kanji": ["敷衍", "敷延"], "kana": ["ふえん"],
"sense": [ { "gloss": [ { "lang": "en", "text": "amplification (cf. {0})" } ],
"references": [ { "label": "", "to": "演繹", "text": "演繹" } ] } ] }
// (c) kana-only な-adjective. adj-na IS a deinflection class, so tag it.
{ "id": "1000230", "kana": ["きれい"],
"sense": [ { "partOfSpeech": ["adj-na"], "gloss": [ { "lang": "en", "text": "pretty; clean; neat" } ] } ] }
8. POS allowlist
Emit a code into partOfSpeech only if it is one of these (or a jmdict subclass starting
with one). Anything else: put the human label in misc and omit it here.
v1
v5u v5k v5g v5s v5t v5n v5b v5m v5r
vk
vs vs-i vs-s
adj-i
adj-na
Collapsed Yomitan "v5" → expand by the dictionary-form final kana:
う→v5u く→v5k ぐ→v5g す→v5s つ→v5t ぬ→v5n ぶ→v5b む→v5m る→v5r.
Unsure → omit.
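The allowlist and the v5 expansion can be combined into one clean-up pass. The following is a minimal sketch with a hypothetical `clean_pos` helper; it returns the dropped codes separately so the converter can map them to readable labels for misc, and it drops a collapsed "v5" whenever the final kana is not one of the nine endings listed above.

```python
ALLOWED = (
    "v1", "v5u", "v5k", "v5g", "v5s", "v5t", "v5n", "v5b", "v5m", "v5r",
    "vk", "vs", "vs-i", "vs-s", "adj-i", "adj-na",
)
V5_BY_FINAL_KANA = {
    "う": "v5u", "く": "v5k", "ぐ": "v5g", "す": "v5s", "つ": "v5t",
    "ぬ": "v5n", "ぶ": "v5b", "む": "v5m", "る": "v5r",
}


def clean_pos(codes, dictionary_form):
    """Return (kept, dropped): codes safe for partOfSpeech, and the rest."""
    kept, dropped = [], []
    for code in codes:
        if code == "v5":
            # Expand the collapsed class by the dictionary form's final kana.
            specific = V5_BY_FINAL_KANA.get(dictionary_form[-1:])
            if specific is None:
                dropped.append(code)  # unsure -> omit
                continue
            code = specific
        # Prefix match also admits jmdict subclasses of an allowed code.
        if code.startswith(ALLOWED):
            if code not in kept:
                kept.append(code)
        else:
            dropped.append(code)
    return kept, dropped
```

For 書く with Yomitan codes `["v5", "vt"]` this yields `(["v5k"], ["vt"])`; when nothing survives, leave partOfSpeech out of the sense entirely.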