On April 17, xAI launches two audio APIs, speech recognition (Speech to Text) and speech synthesis (Text to Speech), priced below all established competitors. Anthropic makes Claude directly accessible in Microsoft Word for Pro, Max, Team, and Enterprise subscribers. Midjourney rolls out V8.1 with native 2K rendering, three times faster and three times cheaper than V8. Meanwhile, Luma and Wonder Project launch the AWS-backed Innovative Dreams studio, MiniMax partners with NousResearch on MaxHermes, Kimi publishes a cross-datacenter inference architecture, and Google adds Gemini Skills to Chrome.
Grok STT and TTS — the cheapest audio APIs on the market
April 17 — xAI launches two standalone audio APIs at once: a speech recognition API (Speech to Text, STT) and a speech synthesis API (Text to Speech, TTS). The pitch is simple: each one is priced below every competitor in its segment.
STT API (speech recognition)
Grok’s STT API offers two modes: batch REST and streaming WebSocket. The prices are $0.10/hour (batch) and $0.20/hour (streaming), compared with $0.22 and $0.39 at ElevenLabs, $0.21 and $0.45 at AssemblyAI, and $0.31 and $0.55 at Deepgram.
| Provider | Batch (REST) | Streaming (WebSocket) |
|---|---|---|
| Grok | $0.10/h | $0.20/h |
| ElevenLabs | $0.22/h | $0.39/h |
| AssemblyAI | $0.21/h | $0.45/h |
| Deepgram | $0.31/h | $0.55/h |
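To make the gap concrete, here is a minimal Python sketch that turns the list prices in the table into the cost of a transcription workload. It assumes flat per-hour pricing with no volume discounts, free tiers, or minimum billing increments, which real invoices may include; the function names are illustrative, not part of any provider's SDK.

```python
# List prices from the table above, in USD per hour of audio: (batch, streaming).
STT_PRICES = {
    "Grok": (0.10, 0.20),
    "ElevenLabs": (0.22, 0.39),
    "AssemblyAI": (0.21, 0.45),
    "Deepgram": (0.31, 0.55),
}


def stt_cost(provider: str, hours: float, streaming: bool = False) -> float:
    """List-price cost in USD of transcribing `hours` of audio (no discounts assumed)."""
    batch, stream = STT_PRICES[provider]
    return hours * (stream if streaming else batch)


def price_ratio_vs_grok(provider: str, streaming: bool = False) -> float:
    """How many times more expensive `provider` is than Grok in the same mode."""
    return stt_cost(provider, 1.0, streaming) / stt_cost("Grok", 1.0, streaming)


if __name__ == "__main__":
    hours = 1_000  # illustrative workload, e.g. a month of recorded calls
    for name in STT_PRICES:
        print(
            f"{name:<10} batch ${stt_cost(name, hours):>7.2f} "
            f"({price_ratio_vs_grok(name):.1f}x)  "
            f"streaming ${stt_cost(name, hours, streaming=True):>7.2f} "
            f"({price_ratio_vs_grok(name, streaming=True):.1f}x)"
        )
```

At these list prices the competitors come out roughly 2 to 3 times more expensive than Grok, in both batch and streaming mode.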
On quality, xAI reports an overall word error rate (WER) of 6.9% for Grok STT, compared with 9.0% for ElevenLabs, 11.0% for Deepgram, and 12.9% for AssemblyAI. Grok STT covers more than 25 languages and supports word-level timestamps, speaker diarization, multichannel audio, and inverse text normalization (writing spoken numbers and dates in numeric form).
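WER is the standard metric behind these figures: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. A WER of 6.9% therefore means roughly 7 errors per 100 reference words. The sketch below computes it with a word-level edit distance; it applies no text normalization (casing, punctuation, number formatting), whereas published benchmarks usually normalize both texts before scoring, so it won't reproduce xAI's numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution (sat -> sit), one deletion (the)
    print(f"WER = {word_error_rate(ref, hyp):.1%}")
```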
TTS API (speech synthesis)
Grok’s TTS API is priced at $4.20 per million characters, versus $30 at OpenAI, $40 at InWorld, $46.70 at Cartesia, and $50 at ElevenLabs. It supports both REST and streaming WebSocket, and introduces expressive tags for controlling tone and pacing: [laugh], [sigh], [whisper], <emphasis>, <slow>, and <pause>.
| Provider | Price per million characters |
|---|---|
| Grok | $4.20 |
| OpenAI | $30.00 |
| InWorld | $40.00 |
| Cartesia | $46.70 |
| ElevenLabs | $50.00 |
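The same arithmetic applies to synthesis. This minimal sketch converts the per-million-character list prices into the cost of a given script. Whether expressive tags such as [laugh] count toward billed characters isn't stated in the announcement; the sketch assumes every character of the input string is billed, and the 500,000-character script size is purely illustrative.

```python
# List prices from the table above, in USD per million characters.
TTS_PRICE_PER_MILLION = {
    "Grok": 4.20,
    "OpenAI": 30.00,
    "InWorld": 40.00,
    "Cartesia": 46.70,
    "ElevenLabs": 50.00,
}


def tts_cost(provider: str, characters: int) -> float:
    """List-price cost in USD of synthesizing `characters` characters."""
    return characters / 1_000_000 * TTS_PRICE_PER_MILLION[provider]


def tts_cost_for_text(provider: str, text: str) -> float:
    """Assumption: every character is billed, including tags like [laugh]."""
    return tts_cost(provider, len(text))


if __name__ == "__main__":
    script_chars = 500_000  # illustrative long-form script, e.g. an audiobook
    grok_price = TTS_PRICE_PER_MILLION["Grok"]
    for name, price in TTS_PRICE_PER_MILLION.items():
        print(f"{name:<10} ${tts_cost(name, script_chars):>6.2f}  ({price / grok_price:.1f}x Grok)")
    print(f"${tts_cost_for_text('Grok', '[sigh] Another Monday.'):.8f}")
```

At list price, ElevenLabs costs almost 12 times as much as Grok for the same script, and OpenAI about 7 times as much.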
“xAI announces the launch of Grok speech to text and text to speech APIs. Grok STT has the world’s lowest word error rate and price. Grok TTS has the world’s most expressive voice and lowest price.” (@xai on X)
🔗 xAI announcement 🔗 @xai tweet
Claude for Word — Anthropic’s Word add-in in beta
April 17 — Anthropic launches Claude for Word in beta for Pro, Max, Team, and Enterprise subscribers. The add-in runs inside the Microsoft Word interface, with no separate window, and operates on the document as a whole.
| Feature | Description |
|---|---|
| Native tracked changes | All of Claude’s edits appear as Word revisions that can be accepted/rejected |
| Comment handling | Claude reads comments, edits anchored text, and replies in the thread |
| Formatting preservation | Inherits heading styles, numbering, and defined terms |
| Cross-context | Shares context with Excel and PowerPoint add-ins in the same conversation |
| Enterprise security | Sign in with a Claude account or through an existing cloud provider |
Supported formats are .docx and .docm. The extension is installed via the Microsoft Marketplace under the identifier WA200010453.
🔗 claude.com/claude-for-word 🔗 @claudeai tweet
Midjourney V8.1 — native 2K rendering, 3× faster
April 14 — Midjourney released V8.1 of its image generator. The update brings native 2K HD rendering and generates images three times faster than V8 at one third of the cost.
V8.1 is a significant refinement of the V8 engine: images are rendered directly at 2K with no separate upscaling pass, which improves fine detail and avoids the artifacts that enlargement steps typically introduce. The combination of speed, price, and resolution makes V8.1 the most accessible version in the V8 lineup.
Luma × Wonder Project — the Innovative Dreams studio, backed by AWS
April 16 — Luma AI and Wonder Project (a faith-and-values production studio and Prime Video partner) jointly announce Innovative Dreams, a new company combining a film production studio, an R&D lab, and a VFX house, backed and funded by Amazon Web Services (AWS).
Innovative Dreams is presented as the first studio to deploy Realtime Hybrid Filmmaking at scale — an approach that combines performance capture, virtual production, and generative AI (notably Luma Agents) across all production stages: concept, pre-visualization, filming, and post-production.
| Aspect | Detail |
|---|---|
| CEO | Jon Erwin (Wonder Project founder) |
| CTO / Luma | Amit Jain (CEO of Luma AI) |
| Infrastructure | AWS cloud + AI for R&D and virtual production tools |
| Technology | Luma Agents + Realtime Hybrid Filmmaking |
| Location | MBS Media Campus, Manhattan Beach, California |
| First project | “The Old Stories: Moses” (3 episodes) with Ben Kingsley and O-T Fagbenle, for Prime Video |
The “Realtime Hybrid Filmmaking” approach removes the traditional delays between shooting, rendering, and editing. Actors can react to digital environments in real time, shortening the distance between creative idea and final pixel while preserving human performance. Innovative Dreams also offers its tools to other Hollywood studios.
🔗 Luma announcement 🔗 @LumaLabsAI tweet
MiniMax M2.7 × NousResearch — MaxHermes, Hermes Agent without setup
April 16 — MiniMax announces a deep partnership with NousResearch to integrate the M2.7 model into the Hermes Agent harness. The announcement introduces MaxHermes — a managed cloud version of Hermes Agent accessible directly from @MiniMaxAgent, with no terminal setup or local installation.
The stated goal of co-developing M2.7 and Hermes Agent is more capable agents: Hermes’s self-improvement loop is designed to get the most out of M2.7 on agentic tasks. Users who run Hermes locally can also connect their agent to MaxHermes to use the managed cloud infrastructure.
Gemini Skills in Chrome — your prompts in one click
April 14 — Google Chrome integrates a new feature called “Skills” for Gemini in the browser. You can now save your most useful prompts and rerun them with a single click, without retyping. A library of predefined prompts is also available to get started quickly.
The feature was announced on April 14 and confirmed available on April 15, 2026, then included in the April 17 weekly @GoogleAI recap.
🔗 @googlechrome tweet (Apr 14) 🔗 @googlechrome tweet (Apr 15)
Gemini API — prepayment (Prepay Billing) in Google AI Studio
April 15 — Google AI Studio introduces “Prepay Billing” for the Gemini API. Developers can now buy credits in advance and consume them as they go, eliminating end-of-month billing surprises.
Auto-reload is available when the balance is low. The feature is compatible with Spend Caps (launched previously) and Usage Tiers. It is available in the United States for new Google Cloud billing accounts, with global rollout in the coming weeks. Established accounts with high usage tiers will be able to switch to postpaid.
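Google hasn't published how these mechanisms are implemented. The toy model below only illustrates how prepaid credits, auto-reload, and a spend cap can interact. Every name in it (`PrepaidAccount`, `reload_threshold`, `reload_amount`, `spend_cap`) is hypothetical, and it is not the Gemini API or the Google Cloud Billing API. It also assumes the spend cap limits usage, not credit purchases.

```python
from dataclasses import dataclass


@dataclass
class PrepaidAccount:
    """Toy model of prepaid API credits (hypothetical, not Google's implementation)."""

    balance: float           # prepaid credit remaining, in USD
    reload_threshold: float  # auto-reload triggers when balance drops below this
    reload_amount: float     # credit added by each auto-reload
    spend_cap: float         # maximum usage allowed in the billing period
    spent: float = 0.0       # usage charged so far in the period
    reloads: int = 0         # number of auto-reloads performed

    def charge(self, amount: float) -> bool:
        """Deduct `amount` for a request; return False if the request is refused."""
        if self.spent + amount > self.spend_cap:
            return False  # the spend cap would be exceeded
        if amount > self.balance:
            return False  # not enough prepaid credit
        self.balance -= amount
        self.spent += amount
        if self.balance < self.reload_threshold:
            self.balance += self.reload_amount  # auto-reload tops the balance back up
            self.reloads += 1
        return True


if __name__ == "__main__":
    acct = PrepaidAccount(balance=10.0, reload_threshold=5.0, reload_amount=20.0, spend_cap=30.0)
    for cost in (6.0, 20.0, 5.0):
        ok = acct.charge(cost)
        status = "ok" if ok else "refused"
        print(f"charge {cost:>5.2f}: {status:<7} balance={acct.balance:.2f} spent={acct.spent:.2f}")
```

In this run, the first two charges each trigger an auto-reload. The third is refused because it would push usage past the spend cap, even though credit remains. That is the kind of guardrail that prevents end-of-month surprises.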
Kimi Prefill-as-a-Service — cross-datacenter inference
April 18 — Moonshot AI (Kimi) publishes a technical advance in inference infrastructure: Prefill-as-a-Service (PraaS). The architecture extends prefill/decode (PD) disaggregation beyond a single cluster to a cross-datacenter setup running on heterogeneous hardware.
Reported results: 1.54× higher throughput and a 64% reduction in P90 TTFT (time to first token). The key enabler is the hybrid Kimi Linear model, which reduces the cost of transferring the KV (key-value) cache between datacenters. This is not a consumer launch but a research publication on distributed inference infrastructure; its most direct payoff would be a lower cost per token for Kimi.
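Why the KV cache size matters: in PD disaggregation, a prefill node processes the prompt and produces the KV cache, which must be shipped to the decode node before generation can continue. Across datacenters that link is much slower than inside a cluster. Full-attention layers produce a cache that grows linearly with prompt length. Linear-attention layers instead keep a fixed-size state. The back-of-the-envelope sketch below compares the two. The model shape, the 1-in-4 layer mix, the plain grouped-query KV layout, and the 100 Gbps link are all hypothetical and do not describe Kimi Linear's actual architecture or the paper's setup.

```python
def kv_cache_bytes(seq_len: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache of full-attention layers: K and V per layer, each growing with seq_len."""
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem


def linear_state_bytes(n_linear_layers: int, n_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Fixed-size state of linear-attention layers (head_dim x head_dim per head)."""
    return n_linear_layers * n_heads * head_dim * head_dim * bytes_per_elem


def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Time to ship `num_bytes` over a link of `link_gbps` gigabits per second."""
    return num_bytes * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical 48-layer model, 8 KV heads of dim 128, 16-bit values, 32k-token prompt.
    seq, layers, heads, dim = 32_768, 48, 8, 128
    full = kv_cache_bytes(seq, layers, heads, dim)
    # Hypothetical hybrid: 1 full-attention layer in 4, the rest linear attention.
    n_full = layers // 4
    hybrid = kv_cache_bytes(seq, n_full, heads, dim) + linear_state_bytes(layers - n_full, heads, dim)
    link = 100  # hypothetical cross-datacenter bandwidth, in Gbps
    for name, size in (("full attention", full), ("hybrid", hybrid)):
        print(f"{name:<15} {size / 1e9:6.2f} GB -> {transfer_seconds(size, link):.3f} s at {link} Gbps")
```

Under these assumptions the hybrid model ships roughly a quarter of the bytes per request, and the gap widens as prompts get longer, since only the full-attention part scales with `seq_len`.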
🔗 @Kimi_Moonshot tweet 🔗 arXiv paper
Claude Code v2.1.114 and Runway Seedance 2.0 API
April 18 — Claude Code v2.1.114 fixes a crash that occurred when a member of an agent team requested access to a tool via the permissions dialog.
April 16 — Runway makes Seedance 2.0 available to developers through the Runway API. Together with the web launch (April 9), 1080p rendering (April 16), and the iOS app (April 17), API access completes the model’s multi-channel rollout. Documentation is available at dev.runwayml.com.
🔗 Claude Code CHANGELOG 🔗 @runwayml tweet — Seedance API
What this means
The simultaneous launch of Grok’s STT and TTS APIs is the most aggressive pricing move of the week. With list prices roughly 2 to 12 times lower than those of ElevenLabs, AssemblyAI, Deepgram, and OpenAI TTS, xAI is clearly signaling that AI audio is becoming a commodity. That should accelerate adoption among independent developers and startups while squeezing the margins of established players. The combination of what xAI reports as the lowest recognition error rate on the market, bargain prices, and expressive tags makes these APIs a serious candidate for production use.
Claude for Word and Gemini Skills in Chrome reflect two different strategies: Anthropic embeds its model into existing office productivity tools, where its users already spend their days; Google, meanwhile, enriches its browser to make Gemini indispensable in everyday life. Both approaches aim to reduce friction in accessing the model.
Luma × Wonder Project × AWS illustrates the emergence of a new Hollywood studio model: generative AI integrated into every production stage, AWS cloud infrastructure, and the ambition to keep in Los Angeles productions that would otherwise be outsourced. The announcement is as symbolic as it is technical: it positions Realtime Hybrid Filmmaking as an industrial-scale pipeline, not just a concept.
Sources
- xAI announcement — Grok STT and TTS APIs
- @xai tweet — Grok STT and TTS
- @claudeai tweet — Claude for Word
- claude.com/claude-for-word
- Luma AI announcement — Innovative Dreams
- @LumaLabsAI tweet — Innovative Dreams
- @MiniMax_AI tweet — M2.7 × NousResearch
- @googlechrome tweet — Gemini Skills (Apr 14)
- @googlechrome tweet — Gemini Skills (Apr 15)
- @GoogleAIStudio tweet — Prepay Billing
- @Kimi_Moonshot tweet — PraaS
- arXiv paper — Kimi PraaS
- Claude Code CHANGELOG — v2.1.114
- @runwayml tweet — Seedance 2.0 API
This document was translated from French into English using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator