BatchVocabListGenerator/instructions.md
jonasgaudian eabe2e2969 welcome gitea
2026-02-19 17:18:23 +01:00

Core Rules & App Mechanics

Any Python logic that generates the JSON must adhere to these rules, which stem from how the app's Kotlin data models and Room database are structured:

  1. Export Structure: The root object is a CategoryExport. It must contain exactly one category definition and an array of items belonging to it.
  2. Dummy IDs: Do not attempt to guess or fetch the target database IDs. The app handles ID remapping (ConflictResolution) natively during import.
    • Always use 99999 for the category id.
    • Number item ids sequentially, starting at 100000 (100000, 100001, and so on).
  3. Hardcoded Dates: The Kotlin parser uses kotlinx.serialization and strictly expects ISO-8601 timestamps for date fields. If they are missing or empty, the app will crash. However, the dates do not need to be accurate for new imports. Hardcode "2024-01-01T00:00:00.000Z" for the exportDate and createdAt fields.
  4. Minimal Data:
    • formatVersion must be 1.
    • The states array must be completely empty: [] (since these are new words without learning history).
    • The features property on every item must be the string "{}": an empty JSON object serialized as a string, not a nested object.
    • Completely omit the zipfFrequencyFirst and zipfFrequencySecond fields.
  5. Stage Mappings: Every generated item needs a corresponding entry in the stageMappings array, setting its learning stage to "NEW".
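
The rules above can be checked mechanically before a file is handed to the app. The following is a minimal sketch, not part of the app: the function name check_export and its messages are hypothetical, and the field names are taken from the target schema in this document. It only tests the constraints listed here and makes no claim about what the Kotlin parser accepts beyond them.

```python
from datetime import datetime

CATEGORY_ID = 99999      # dummy category id (rule 2)
FIRST_ITEM_ID = 100000   # first dummy item id (rule 2)


def _is_iso_timestamp(value):
    """Return True if value is a non-empty, parseable ISO-8601 string."""
    if not isinstance(value, str) or not value:
        return False
    try:
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so it is normalised to an explicit UTC offset first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def check_export(export):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if export.get("formatVersion") != 1:
        problems.append("formatVersion must be 1")
    if not _is_iso_timestamp(export.get("exportDate")):
        problems.append("exportDate must be an ISO-8601 timestamp")
    if export.get("states") != []:
        problems.append("states must be an empty array")
    if export.get("category", {}).get("id") != CATEGORY_ID:
        problems.append(f"category id must be {CATEGORY_ID}")

    items = export.get("items", [])
    for offset, item in enumerate(items):
        expected_id = FIRST_ITEM_ID + offset
        if item.get("id") != expected_id:
            problems.append(f"item {offset}: id must be {expected_id}")
        if not _is_iso_timestamp(item.get("createdAt")):
            problems.append(f"item {offset}: createdAt must be an ISO-8601 timestamp")
        if item.get("features") != "{}":
            problems.append(f'item {offset}: features must be the string "{{}}"')
        for field in ("zipfFrequencyFirst", "zipfFrequencySecond"):
            if field in item:
                problems.append(f"item {offset}: {field} must be omitted")

    # Every item needs a stage mapping set to "NEW" (rule 5).
    stages = {m.get("vocabularyItemId"): m.get("stage")
              for m in export.get("stageMappings", [])}
    for item in items:
        if stages.get(item.get("id")) != "NEW":
            problems.append(f'item id {item.get("id")}: missing stage mapping "NEW"')
    return problems
```

Calling check_export on a parsed export and refusing to write the file when the returned list is non-empty keeps malformed data away from the importer.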

Target JSON Schema

Your Python script must output a JSON file that matches this structure exactly. Angle-bracket placeholders such as <CATEGORY_NAME> stand for values taken from the input, and the items and stageMappings arrays grow by one entry per input word.

{
  "type": "Category",
  "formatVersion": 1,
  "exportDate": "2024-01-01T00:00:00.000Z",
  "metadata": {
    "itemCount": 1, 
    "categoryCount": 1,
    "exportScope": "Category: <CATEGORY_NAME>"
  },
  "category": {
    "type": "TagCategory",
    "id": 99999,
    "name": "<CATEGORY_NAME>"
  },
  "items": [
    {
      "id": 100000,
      "languageFirstId": <LANG_FIRST_ID>,
      "languageSecondId": <LANG_SECOND_ID>,
      "wordFirst": "<WORD_1>",
      "wordSecond": "<WORD_2>",
      "createdAt": "2024-01-01T00:00:00.000Z",
      "features": "{}"
    }
  ],
  "states": [],
  "stageMappings": [
    {
      "vocabularyItemId": 100000,
      "stage": "NEW"
    }
  ]
}
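
One way to produce this structure from a list of word pairs is sketched below. It is an illustration under stated assumptions: the function name build_category_export, the category name "Food", the example words, and the language ids 1 and 2 are all made up. Setting itemCount to the number of items is also an assumption, since the schema above only shows the single-item case.

```python
import json

CATEGORY_ID = 99999                              # dummy category id
FIRST_ITEM_ID = 100000                           # first dummy item id
PLACEHOLDER_DATE = "2024-01-01T00:00:00.000Z"    # hardcoded ISO-8601 date


def build_category_export(category_name, word_pairs, lang_first_id, lang_second_id):
    """Build the export dict for one category from (wordFirst, wordSecond) pairs."""
    items = []
    stage_mappings = []
    for item_id, (word_first, word_second) in enumerate(word_pairs, start=FIRST_ITEM_ID):
        items.append({
            "id": item_id,
            "languageFirstId": lang_first_id,
            "languageSecondId": lang_second_id,
            "wordFirst": word_first,
            "wordSecond": word_second,
            "createdAt": PLACEHOLDER_DATE,
            "features": "{}",  # a string, not a nested object
            # zipfFrequencyFirst / zipfFrequencySecond are deliberately omitted
        })
        stage_mappings.append({"vocabularyItemId": item_id, "stage": "NEW"})

    return {
        "type": "Category",
        "formatVersion": 1,
        "exportDate": PLACEHOLDER_DATE,
        "metadata": {
            "itemCount": len(items),  # assumption: mirrors the number of items
            "categoryCount": 1,
            "exportScope": f"Category: {category_name}",
        },
        "category": {
            "type": "TagCategory",
            "id": CATEGORY_ID,
            "name": category_name,
        },
        "items": items,
        "states": [],  # new words carry no learning history
        "stageMappings": stage_mappings,
    }


# Hypothetical input: the words and language ids are placeholders.
export = build_category_export("Food", [("Apfel", "apple"), ("Brot", "bread")], 1, 2)
print(json.dumps(export, ensure_ascii=False, indent=2))
```

Passing ensure_ascii=False keeps non-ASCII vocabulary readable in the output. To write a file instead of printing, use json.dump with a handle opened with encoding="utf-8".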