How we built a multilingual culinary atlas with AI translation

A peppercorn carries its origin in its name. Voatsiperifery is Malagasy — *voa* meaning fruit, *tsiperifery* the climbing vine that produces it. Saying it in French does not change what it is; *poivre voatsiperifery* leaves both halves of the name intact because no European word points to the same berry. The translation problem begins the moment a name is forced through a language that has no concept for what is being named.

Sapor catalogues 101 single-origin spices and herbs across four languages: English, French, German, Spanish. Each ingredient carries a region, an IGP or PDO file when one exists, a producer where we have one, a phenology, an aromatic profile, a botanical synonym list. When we started the project we knew translation would be hard. We did not anticipate just how often the failure mode would be not mistranslation but erasure — the soft levelling of a name that means something specific into a name that means nothing at all.

The problem the atlas posed

Translating a recipe is a different job from translating an atlas. A recipe can adapt: "échalote" can become "shallot" without ceremony, and the dish still works. An atlas cannot. When we describe Penja pepper as a white pepper from the volcanic plain of the Moungo, the word "Penja" is not interchangeable with "white pepper from Cameroon." It refers to one administrative subdivision, one set of soils, one IGP file deposited in Yaoundé in 2013. Penja is a proper noun in the strictest sense: a place, a community, a registered designation. A translator who treats it as a flavour descriptor has already broken the article.

The same logic applies to *voatsiperifery*, to Tellicherry, to Aleppo, to Espelette. The names are not labels stuck onto generic products; they are the products themselves. The information loss when a model decides to "simplify" or "localise" such a term is not stylistic. It dismantles the entire purpose of the page.

What DeepL does well, and where it stops

In our testing, DeepL remains the most accurate engine for ordinary prose. Its handling of register is excellent: it knows when a sentence wants the formal *vous* in French and when the informal *du* works in German. For paragraphs of pure description (altitude, rainfall, harvest months, soil pH), DeepL produces output we can ship with light editing.

It stops at proper nouns the moment those nouns drift from familiar geography. DeepL was confident in early tests that *Piper longum* should be translated as "long pepper" in English — correct as a common name, wrong as a botanical reference — and equally confident that *voatsiperifery* was a typo. It would silently correct the name to a phonetically closer French word, strip the asterisks around the binomial, or replace *Karimunda* with a paraphrase. None of these errors are visible without a domain-aware review.

For ingredient databases, DeepL is a fast first pass. It cannot be the last pass.
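One of the failures described above, the stripped asterisks around a binomial, is the kind of thing a small script can catch before a human reads the draft. The following is a minimal sketch, assuming italics are marked with single asterisks as in this article and that binomials follow the simple "Genus species" shape. The function name `dropped_binomials` is hypothetical.

```python
import re

# Matches an italicised two-part Latin name such as *Piper longum*.
# Assumption: capitalised genus, lowercase species, single asterisks.
BINOMIAL = re.compile(r"\*([A-Z][a-z]+ [a-z]+)\*")


def dropped_binomials(source: str, translation: str) -> list[str]:
    """Return binomials italicised in the source but not in the translation."""
    wanted = set(BINOMIAL.findall(source))
    kept = set(BINOMIAL.findall(translation))
    # A name counts as dropped if it was translated, respelled,
    # or left in place with its asterisks removed.
    return sorted(wanted - kept)


source = "*Piper longum* and *Piper nigrum* are distinct species."
draft = "Le poivre long et *Piper nigrum* sont des espèces distinctes."
print(dropped_binomials(source, draft))  # ['Piper longum']
```

A check like this only reports that something went missing. Deciding whether the engine translated the name, respelled it, or merely dropped the markup is still a reviewer's job.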

Comparing the generative models

The choice of which large language model to use for the second pass took longer than we expected. Our assessment draws on the AI-writing tools comparator we maintain, in which each engine is tested with a paid account over weeks of real work. It settled on Claude as the most reliable for editorial translation when paired with a precise prompt.

The difference between the major generative engines, in our testing, comes down to two behaviours: how aggressively a model "improves" the text on its own initiative, and how it handles uncertainty about a proper noun.

GPT, in its various 4-class iterations, tended to rewrite. Asked to translate a paragraph about Wayanad pepper into German, it would add stylistic flourishes that read as marketing — adjectives that were not in the source, mild hyperbole, a tendency to call things "exquisite" or "unique" without warrant. For an atlas, this is a discipline problem. We spent revision time pulling the text back to neutral.

Claude tended to preserve. Given an English source that named *Piper nigrum* and *Karimunda*, Claude's French and German outputs kept the binomial intact, kept the cultivar name italicised, and flagged in a brief note when it was uncertain about a regional spelling. The model also responded reliably to the instruction "do not paraphrase IGP names" — once told, it stayed told for the duration of a session.

Gemini we tested less extensively. Its handling of Asian scripts is excellent for languages we do not currently publish in, but for our four target locales it sat between the two others without clear advantage.

The choice was not about which model is "smartest." It was about which model could be instructed to behave like a careful editor rather than a confident author.

Heritage terms machines mistranslate by default

A short catalogue of failure modes we logged during the first three months of production.

*Penja* was repeatedly recast as "Cameroonian white pepper" — geographically accurate but legally meaningless. The IGP file covers a single volcanic plain in the Moungo département, not a country.

*Voatsiperifery* was variously transcribed *voa-tsipérifère*, *voatsi-péri-féry*, or simply "wild Madagascar pepper." None of these is the name on the producer invoice.

*Tellicherry* was paraphrased as "large-berry Indian pepper." This erases the grade specification — Tellicherry is a sieve calibre of 4.75 mm and above, not a synonym for "large."

*Espelette* was occasionally translated as "Basque chilli" in English drafts. Espelette is a village, an AOC since 2000, and a specific variety of *Capsicum annuum*. The word "Basque" is too broad by an order of magnitude.

*Sansho*, the Japanese pepper-relative used in *unagi* glazes, was rendered as "Japanese Sichuan pepper" by more than one model. It is a different species (*Zanthoxylum piperitum*) from those sold as Sichuan pepper, although it shares their genus, and it has a different profile of pungent compounds.

Each of these is a five-second fix once a human notices. None of them surface from automated quality scoring. The corpus was large enough that without a glossary mechanism the errors would have outpaced our manual review capacity within weeks.

The glossary that made it possible

What unlocked the workflow was a bilingual glossary file — a flat list of proper nouns, IGP names, cultivars, regions, and producer names that should never be translated, paraphrased, or "improved." Roughly 800 entries by the time the atlas reached 101 ingredients.

We pass this glossary to the model as part of the system context. Each translation pass begins with: "the following terms are proper nouns and must appear verbatim in the output." Then the list. Then the source text.
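The ordering described above (instruction, then list, then source text) could be assembled along these lines. This is a minimal sketch rather than a description of any particular vendor's API. The function name `build_messages`, the returned dictionary shape, and the sample terms are assumptions for illustration.

```python
def build_messages(glossary: list[str], source_text: str, target_lang: str) -> dict[str, str]:
    """Build a system prompt listing protected terms, plus the user text."""
    # Drop blank entries and duplicates while keeping the original order,
    # so the prompt stays stable from one run to the next.
    terms = list(dict.fromkeys(t.strip() for t in glossary if t.strip()))
    system = "\n".join(
        [
            f"Translate the user's text into {target_lang}.",
            "The following terms are proper nouns and must appear verbatim in the output:",
            *[f"- {term}" for term in terms],
        ]
    )
    # The source text travels separately, after the instruction and the list.
    return {"system": system, "user": source_text}


messages = build_messages(
    ["Penja", "voatsiperifery", "Tellicherry", "Penja"],
    "Penja pepper grows on the volcanic plain of the Moungo.",
    "French",
)
print(messages["system"])
```

Keeping the glossary in the system context rather than mixing it into the text to translate means the protected list cannot be mistaken for content.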

The discipline this imposes on the model is substantial. The translation becomes a structured task rather than a creative one. The model's freedom is limited to the prose between the protected terms. The boundary holds well in Claude, holds less well in GPT, and holds adequately in DeepL once integrated through its Pro glossary API.

The glossary also doubles as a quality-assurance checklist for our human reviewers. A translated paragraph that contains a paraphrase of a glossary term has, by definition, failed. The check is mechanical.
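A mechanical check of this kind might look like the sketch below: for every glossary term present in the source, confirm that the same string appears verbatim in the translation. The function name `missing_terms` and the sample sentences are hypothetical, and the whole-word matching rule is an assumption about what "verbatim" should mean.

```python
import re


def missing_terms(source: str, translation: str, glossary: list[str]) -> list[str]:
    """Return glossary terms found in the source but absent from the translation."""
    missing = []
    for term in glossary:
        # Match the term as a whole word, case-sensitively, so that
        # "Penja" does not match inside a longer word.
        pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)")
        if pattern.search(source) and not pattern.search(translation):
            missing.append(term)
    return missing


glossary = ["Penja", "Moungo", "Tellicherry"]
source = "Penja pepper grows on the volcanic plain of the Moungo."
draft = "Le poivre blanc camerounais pousse sur la plaine volcanique du Moungo."
print(missing_terms(source, draft, glossary))  # ['Penja']
```

Because the match is on whole words, a fused German compound would also be flagged. That is arguably the right outcome, since it sends the sentence to a reviewer instead of passing it silently.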

When the right answer is to keep the original word

Some terms do not have a target-language equivalent because no equivalent exists. *Voatsiperifery* in English, French, German, and Spanish remains *voatsiperifery*. *Sansho*, in all four languages, is *sansho*. *Berbere* is *berbere*. The atlas treats these as loanwords, italicised on first mention, and explained in the surrounding text rather than substituted.

This is a deliberate editorial choice that runs against the default behaviour of most translation tooling, which prefers to find a target-language word. The reason is precision. The names carry centuries of agricultural and culinary specificity. A reader who learns *voatsiperifery* learns something they can search, source, and order. A reader who reads "wild Madagascar pepper" learns nothing useful.

What still slips through

The errors we have not yet found a clean automated guard against fall into three classes.

The first is silent gender drift in French and German. A translation correctly preserves a proper noun but assigns the wrong article or adjective agreement. *Le voatsiperifery* is correct; *la voatsiperifery* is not. These slips pass under every model's own confidence check.

The second is the name that exists in two forms: *Espelette* and *Ezpeleta* (Basque), *Kampot* and *Kâmpôt* (romanised with diacritics), *Cinnamomum verum* and *Cinnamomum zeylanicum* (older binomial). Choosing the wrong variant is not a translation error per se, but it changes how findable the page is in each language's search index.

The third is metaphor. When a French source talks about a spice's *tenue en bouche* — its persistence on the palate — the literal English translation "hold in the mouth" sounds clinical. The right phrasing depends on the surrounding paragraph. No model we have tested handles this reliably. These passages still require human rewriting, sentence by sentence.
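Of the three classes above, the second lends itself to a partial guard once an editor has decided which form each locale should use. The decision itself stays editorial. The sketch below only reports where a non-preferred form appears. The mapping shown is a hypothetical example of such a decision, not a recommendation, and `variant_hits` is an illustrative name.

```python
import re

# Hypothetical editorial choice for one locale: preferred form -> forms to flag.
PREFERRED_FR = {
    "Espelette": ["Ezpeleta"],
    "Kampot": ["Kâmpôt"],
    "Cinnamomum verum": ["Cinnamomum zeylanicum"],
}


def variant_hits(text: str, preferred: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (found_variant, preferred_form) pairs for each flagged spelling."""
    hits = []
    for canonical, variants in preferred.items():
        for variant in variants:
            # Whole-word, case-sensitive match on the non-preferred form.
            if re.search(rf"(?<!\w){re.escape(variant)}(?!\w)", text):
                hits.append((variant, canonical))
    return hits


draft = "Le piment d'Ezpeleta et le poivre de Kampot."
print(variant_hits(draft, PREFERRED_FR))  # [('Ezpeleta', 'Espelette')]
```

This does nothing for gender drift or metaphor, which still need a human reader.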

A multilingual atlas is not finished when the last sentence is translated. It is finished when readers in Lyon, Hamburg, Madrid, and Boston can land on the same page, find the same names, the same producers, the same IGP files, and trust that they are looking at the same product. Translation, done with the right tools and the right discipline, makes that trust possible. Done badly, it quietly dissolves it.
