Multilingual Automation: My AI-Powered Markdown Translator Script

View project → GitHub

I’m delighted to present my project AI-Powered Markdown Translator, an open-source Python script that automatically translates the Markdown files from my blog and some README/documentation files from my GitHub repositories. By integrating models from four AI providers (OpenAI, Mistral AI, Anthropic with Claude, and Google with Gemini), this tool translates articles, README files and technical documentation into 14 languages while preserving their structure and formatting. This project highlights my skills in automation, AI integration and reliability engineering, as well as my passion for making technical content accessible to everyone.

This is not just a script: it’s proof of my expertise and my vision for a more inclusive digital world.

Why this project?

Markdown files are essential to my digital ecosystem: they contain my blog posts, tutorials and open-source documentation. By automating their translation, I make my content accessible to a global audience. My blog is now available in 14 languages thanks to this script: nearly 1,800 translated versions (not counting the French sources) are online today on jls42.org, and the counter keeps climbing with every publication.

v1.9 (May 2026) marks a milestone: the code was written by vibe coding in AI pair-programming (Claude Code + Codex) and secured by an industrial-grade quality stack (14 hooks, 229 tests, SonarCloud, AI-assisted PR review), with the aim of keeping the code clean even when not every line is manually reviewed.

Here are concrete examples of the script in action:

This jls42.org blog in 14 languages: the entire multilingual editorial experience (articles, projects, AI news) is produced by this script. For example, you can browse the German, Japanese, Chinese, Spanish or Arabic version of the site. Every translated editorial piece has gone through it (the interface elements, by contrast, come from Astro’s native i18n system).
The project’s own README is translated into 14 languages on GitHub. Examples: English, Spanish, Chinese.

This project shows how AI can solve practical problems while improving accessibility.

My skills in the spotlight

This project showcases my technical know-how. Here’s what it highlights:

Multi-model orchestration: Claude Code in Opus for development, Codex as fallback, GPT-5.5 with extra-high reasoning to challenge the plans, /pr-review-toolkit for review before merge
Integration of multiple AI APIs: 4 connected providers (OpenAI, Mistral AI, Claude, Gemini), with adaptation to each API’s specifics (handling of finish_reason / stop_reason, response formats, token limits)
Reliability engineering: two-layer post-translation validation (deterministic anti-verbatim-leak + probabilistic langdetect), detection of silent failures, explicit status returns
Industrial-grade quality stack: 14 automated hooks (ruff, mypy, shellcheck, Opengrep SAST, pip-audit, Lizard…), 229 unittest tests, 11 SonarCloud badges, plus Codacy and CodeFactor
Open-source mindset: available on GitHub, GPLv3, README translated into 14 languages

These aspects demonstrate my ability to create powerful, reliable and maintainable tools over the long term.
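To illustrate the finish_reason / stop_reason handling mentioned in the list above, here is a minimal sketch of how truncation signals could be normalized across the four providers. It is not the script's actual code: the names TRUNCATION_REASONS and is_truncated are hypothetical, and the reason values reflect my understanding of each provider's public API, so they should be checked against the current documentation.

```python
from typing import Optional

# Assumption: values each provider reports when output hits the token limit.
# Stored in lowercase; the keys mirror the provider names used in this article.
TRUNCATION_REASONS = {
    "openai": {"length"},       # finish_reason
    "mistral": {"length"},      # finish_reason
    "claude": {"max_tokens"},   # stop_reason
    "gemini": {"max_tokens"},   # finish_reason (enum member MAX_TOKENS)
}


def is_truncated(provider: str, reason: Optional[object]) -> bool:
    """Return True when the provider signals that the output was cut short.

    `reason` may be a plain string or an enum-like object; only the part
    after the last dot of its string form is compared, case-insensitively.
    """
    if reason is None:
        return False
    name = str(reason).rsplit(".", 1)[-1].lower()
    return name in TRUNCATION_REASONS.get(provider.lower(), set())
```

A caller would treat a True result as a failed translation rather than writing a partial file, which is one way to turn a silent failure into an explicit status.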

Main features

Here’s what the script offers:

Multi-Provider: support for 4 APIs (OpenAI, Mistral AI, Claude, Gemini)
2026 Models: GPT-5.5, Claude Sonnet 4.6, Gemini 3.1 Pro by default
Economy Mode (--eco): faster and cheaper models
Single File (--file): translate a single file instead of an entire directory
Name Preservation (--keep_filename): keeps the original name and extension (ideal for Astro, Hugo, etc.)
Support for .env: automatic loading of API keys from a .env file
Support for .mdx files: in addition to classic .md files
Formatting preservation: code blocks, inline code, links and metadata remain intact
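One common way to keep code blocks and inline code intact is to swap them for opaque placeholders before the text is sent to the model, then restore them afterwards. The sketch below shows that general technique only; the function names and the [[CODE_n]] token format are hypothetical and may differ from what the script really does.

```python
import re

# Fenced blocks are tried first, then single-backtick inline spans.
CODE_PATTERN = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)


def mask_code(markdown: str):
    """Replace code spans with placeholders; return (masked_text, saved_spans)."""
    saved = []

    def _stash(match):
        saved.append(match.group(0))
        return f"[[CODE_{len(saved) - 1}]]"

    return CODE_PATTERN.sub(_stash, markdown), saved


def unmask_code(text: str, saved):
    """Put the saved spans back; raise if the model dropped a placeholder."""
    for index, original in enumerate(saved):
        token = f"[[CODE_{index}]]"
        if token not in text:
            raise ValueError(f"placeholder {token} missing after translation")
        text = text.replace(token, original)
    return text
```

Usage: call mask_code on the source, translate only the masked text, then call unmask_code on the result. A missing placeholder raises an error instead of producing a file with lost code.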

New in v1.9 (May 2026):

Post-translation validation: automatic detection of silent failures, with the target language verified and truncations intercepted across all providers.
Multi-position note (--note_position, --note_format): top, bottom or both; legacy format or marker format, the latter compatible with the GitHub embed card.
Enhanced --news mode: already introduced in v1.8 to protect English source quotes with placeholders, this mode gains hardened post-restore validation in v1.9 (residual placeholder = error, original quote and attribution URL verified, target/source flags checked). It is used on all ia-actualites posts of the blog.

Provider	Quality (default)	Economy (`--eco`)
OpenAI	`gpt-5.5`	`gpt-5.4-mini`
Claude	`claude-sonnet-4-6`	`claude-haiku-4-5-20251001`
Mistral	`mistral-large-latest`	`mistral-small-latest`
Gemini	`gemini-3.1-pro-preview`	`gemini-3.1-flash-lite-preview`
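The table above maps naturally onto a small lookup structure. As a sketch (the names MODELS and pick_model are hypothetical, only the model identifiers come from the table), the choice between quality and --eco could look like this:

```python
# Model identifiers copied from the table; the structure itself is an assumption.
MODELS = {
    "openai": {"quality": "gpt-5.5", "eco": "gpt-5.4-mini"},
    "claude": {"quality": "claude-sonnet-4-6", "eco": "claude-haiku-4-5-20251001"},
    "mistral": {"quality": "mistral-large-latest", "eco": "mistral-small-latest"},
    "gemini": {"quality": "gemini-3.1-pro-preview", "eco": "gemini-3.1-flash-lite-preview"},
}


def pick_model(provider: str, eco: bool = False) -> str:
    """Return the model name for a provider, honoring the economy switch."""
    try:
        tiers = MODELS[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return tiers["eco" if eco else "quality"]
```

Keeping the mapping in one place means a model upgrade, such as the move from v1.8 to v1.9 defaults, touches a single dictionary.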

Evolution v1.0 → v1.9

Version	Date	Main contribution
1.0–1.4	2024	OpenAI, then Mistral, then Claude
1.5	Sept. 2024	Client refactor, 2024 models (gpt-4o, claude-3.5-sonnet)
1.6	Jan. 2026	2026 models (gpt-5, claude-sonnet-4-5, gemini-3-pro), Gemini, `--eco` mode, single file (`--file`)
1.7	Jan. 2026	`--keep_filename`, `.env`, preserved inline code
1.8	Mar. 2026	GPT-5.4 models by default, `--news` mode with citation placeholders
1.9	May 2026	Post-translation validation, multi-position note, quality stack 14 hooks + 229 tests + AI review

Vibe coding + guardrails

The whole of v1.9 was written in AI pair-programming. My workflow: Claude Code (Opus, exclusively) writes the code, Codex takes over when Opus gets blocked or the usage window is saturated, GPT-5.5 (extra-high reasoning) challenges the plans before execution, and the /pr-review-toolkit:review-pr skill reviews the PR before every merge. I do not review the code myself. To make this development mode viable in production, I invested in a proportionate safety-net stack:

14 automated hooks (pre-commit + pre-push): shellcheck, ruff, prettier, detect-secrets, Lizard CCN, mypy, Opengrep SAST, pip-audit, unittest
229 unittest tests (~98% coverage on the new v1.9 code)
Practical tests: multi-repo on varied READMEs, internal product dogfooding on the blog (production = live test), visual rendering verification (browser or Markdown preview)
3 external platforms: SonarCloud (11 badges), Codacy, CodeFactor
/pr-review-toolkit:review-pr skill: AI-assisted multi-agent review before merge
Two-layer post-translation validation: deterministic (anti-verbatim-leak) + probabilistic (langdetect)

The point is not to prove that we know how to do classic engineering. It’s that we have no choice: AI code that no human reviews deserves more guardrails, not fewer. This discipline is detailed in the technical deep-dive.
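To make the two-layer post-translation validation more concrete, here is a minimal sketch under stated assumptions. I assume the anti-verbatim-leak check means "the output is almost identical to the source", and I approximate it with difflib; the real rule may be different. The language detector is injected as a callable so the example runs without third-party packages. All function names, the threshold and the status strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def looks_like_verbatim_leak(source: str, translated: str,
                             threshold: float = 0.9) -> bool:
    """Deterministic layer: flag output that is nearly identical to the source."""
    ratio = SequenceMatcher(None, source.strip(), translated.strip()).ratio()
    return ratio >= threshold


def validate_translation(source: str, translated: str, target_lang: str,
                         detect: Optional[Callable[[str], str]] = None) -> str:
    """Return an explicit status string instead of failing silently."""
    if not translated.strip():
        return "error:empty"
    if looks_like_verbatim_leak(source, translated):
        return "error:verbatim_leak"
    # Probabilistic layer: only runs when a detector is supplied.
    if detect is not None and detect(translated) != target_lang:
        return "error:wrong_language"
    return "ok"
```

In real use, the detect argument could be langdetect's detect function, which returns codes such as "en" or "de". Running the deterministic check first keeps the cheap, reproducible test ahead of the statistical one.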

In production on this blog

The project translates itself: its README is in 14 languages, and it generates all the multilingual versions of this blog.

The blog posts, 4 projects and 98 AI-news articles add up to nearly 1,800 translated versions, not counting the French sources (coverage varies by language depending on the content)
--news mode used systematically on ia-actualites articles to preserve the English source quotes
v1.9 safeguard active since May 2026: since introducing the two-layer post-translation validation, I have not detected a single target-language silent failure
Meta-consistency: the page you are reading in English, German, Japanese… is translated by this script

To go further

To understand how v1.9 was produced (the new features in detail, the multi-model workflow, the guardrails put in place to aim for clean code without manual review), see the full technical deep-dive.

And to compare the tone with an earlier release, the 2024 article on v1.5 follows a more classic release-notes format.

Try it yourself

Discover the project on GitHub, test it with your Markdown files, and share your feedback. Your ideas help me improve it!

Contact: contact@jls42.org