Busy Week for AI Agents
From January 21 to 23, 2026, several major announcements regarding coding agents and infrastructure. Anthropic launches Claude in Excel and publishes three articles on multi-agent systems, OpenAI details the internal architecture of Codex and its PostgreSQL infrastructure, Qwen open-sources its text-to-speech model, and Runway adds Image to Video to Gen-4.5.
Anthropic: Claude in Excel and Claude Code
Claude in Excel
January 23 — Claude is now available in Microsoft Excel in beta. The integration allows analyzing complete Excel workbooks with their nested formulas and dependencies between tabs.
Features:
- Understanding of the entire workbook (formulas, multi-tab dependencies)
- Explanations with cell-level citations
- Updating assumptions while preserving formulas
Available for Claude Pro, Max, Team, and Enterprise subscribers.
Claude Code v2.1.19: Tasks System
January 23 — Version 2.1.19 introduces Tasks, a new task management system for complex multi-session projects.
We’re turning Todos into Tasks in Claude Code. Tasks are a new primitive that help Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents.
Tasks Features:
| Aspect | Detail |
|---|---|
| Storage | ~/.claude/tasks (files, allows building tools on top) |
| Collaboration | CLAUDE_CODE_TASK_LIST_ID=name claude to share between sessions |
| Dependencies | Tasks with dependencies and blockers stored in metadata |
| Broadcast | Update of a Task broadcasted to all sessions on the same Task List |
| Compatibility | Works with claude -p and AgentSDK |
What it’s for: On a complex project (multi-file refactoring, migration, long feature), Claude can break down the work into tasks, track what is done and what remains. Tasks are persisted on disk — they survive context compaction, session closing, and restart. Multiple sessions or subagents can collaborate on the same task list in real-time.
In practice: Claude creates tasks (TaskCreate), lists them (TaskList), and updates their status (TaskUpdate: pending → in_progress → completed). Example on an authentication refactoring:
#1 [completed] Migrate session storage to Redis
#2 [in_progress] Implement refresh token rotation
#3 [pending] Add OAuth integration tests
#4 [pending] Update API documentation
Tasks are stored in ~/.claude/tasks/ and can be shared between sessions via CLAUDE_CODE_TASK_LIST_ID.
Other new features v2.1.19:
- Shorthand
$0,$1for arguments in custom commands - VSCode session forking and rewind for everyone
- Skills without permissions run without approval
CLAUDE_CODE_ENABLE_TASKS=falseto temporarily disable
🔗 CHANGELOG Claude Code | Thread @trq212
Claude Code v2.1.18: Customizable Keybindings
Previous version adding the ability to configure keybindings by context and create chord sequences.
Command: /keybindings
⚠️ Note: This feature is currently in preview and is not available for all users.
Petri 2.0: Automated Alignment Audits
January 22 — Anthropic publishes Petri 2.0, an update to its automated behavioral audit tool for language models.
What it’s for: Petri tests if an LLM could behave mainly problematically — manipulation, deception, rule circumvention. The tool generates realistic scenarios and observes the model’s responses to detect unwanted behaviors before they occur in production.
| Improvement | Description |
|---|---|
| 70 new scenarios | Extended seed library to cover more edge cases |
| Eval-awareness mitigations | The model must not know it is being tested — otherwise it adapts its behavior. Petri 2.0 improves scenario realism to avoid this detection. |
| Frontier comparisons | Evaluation results for recent models (Claude, GPT, Gemini) |
Blog: When to Use (or Not) Multi-Agent Systems
January 23 — Anthropic publishes a pragmatic guide on multi-agent architectures. The main message: do not use multi-agent by default.
We’ve seen teams invest months building elaborate multi-agent architectures only to discover that improved prompting on a single agent achieved equivalent results.
The article identifies 3 cases where multi-agent truly brings value:
| Case | Problem | Multi-agent Solution |
|---|---|---|
| Context Pollution | An agent generates voluminous data of which only a summary is useful afterwards | A sub-agent retrieves 2000 tokens of history, returns just “order delivered” to the main agent |
| Parallelization | Multiple independent searches to be done | Launch 5 agents in parallel on 5 different sources instead of processing them sequentially |
| Specialization | Too many tools (20+) in a single agent degrades its ability to choose the right one | Separate into specialized agents: one for CRM, one for marketing, one for messaging |
The trap to avoid: Dividing by type of work (one agent plans, another implements, another tests). Each handover loses context and degrades quality. It is better for a single agent to handle a feature from end to end.
Real cost: 3-10x more tokens than a single agent for the same task.
Other articles in the series:
Building agents with Skills (Jan 22)
Instead of building agents specialized by domain, Anthropic proposes building skills: collections of files (workflows, scripts, best practices) that a generalist agent loads on demand.
Progressive disclosure in 3 levels:
| Level | Content | Size |
|---|---|---|
| 1 | Metadata (name, description) | ~50 tokens |
| 2 | Full SKILL.md file | ~500 tokens |
| 3 | Reference documentation | 2000+ tokens |
Each level is loaded only if necessary. Result: an agent can have hundreds of skills without saturating its context.
Eight trends 2026 (Jan 21)
Anthropic identifies 8 trends for software development in 2026.
Key message: Engineers are moving from writing code to coordinating agents that write code.
Important nuance: AI is used in ~60% of work, but only 0-20% can be fully delegated — human supervision remains essential.
| Company | Result |
|---|---|
| Rakuten | Claude Code on vLLM codebase (12.5M lines), 7h of autonomous work |
| TELUS | 30% faster, 500k hours saved |
| Zapier | 89% AI adoption, 800+ internal agents |
OpenAI: Codex Architecture and Infrastructure
Unrolling the Codex agent loop
January 23 — OpenAI opens the scenes of Codex CLI. First article of a series on the internal functioning of their software agent.
What we learn:
The agent loop is simple in theory: user sends a request → model generates a response or requests a tool → agent executes the tool → model resumes with the result → until a final response. In practice, the subtleties are in context management.
Prompt caching — the key to performance:
Each conversation turn adds content to the prompt. Without optimization, it is quadratic in sent tokens. Prompt caching allows reusing calculations from previous turns. Condition: the new prompt must be an exact prefix of the old one. OpenAI details the pitfalls that break the cache (changing MCP tools order, modifying config mid-conversation).
Automatic compaction:
When context exceeds a threshold, Codex calls /responses/compact which returns a compressed version of the conversation. The model keeps latent understanding via an opaque encrypted_content.
Zero Data Retention (ZDR):
For clients who do not want their data stored, encrypted_content allows preserving the model’s reasoning between turns without storing data server-side.
First article of a series — the next ones will cover CLI architecture, tool implementation, and sandboxing.
🔗 Unrolling the Codex agent loop | Codex GitHub
Scaling PostgreSQL: 800 million ChatGPT users
January 22 — OpenAI details how PostgreSQL powers ChatGPT and the API for 800 million users with millions of requests per second.
| Metric | Value |
|---|---|
| Users | 800 million |
| Throughput | Millions of QPS |
| Replicas | ~50 multi-region read replicas |
| p99 Latency | Double digit ms client-side |
| Availability | Five-nines (99.999%) |
Architecture:
- Single primary Azure PostgreSQL flexible server
- PgBouncer for connection pooling (connection latency: 50ms → 5ms)
- Write-heavy workloads migrated to Azure Cosmos DB
- Cache locking to protect against cache miss storms
- Cascading replication in testing to exceed 100 replicas
Only SEV-0 PostgreSQL in the last 12 months: during the viral launch of ChatGPT ImageGen (100M new users in one week, write traffic x10).
Qwen: Qwen3-TTS Open-Source
January 22-23 — Alibaba releases Qwen3-TTS in open-source under Apache 2.0 license.
| Feature | Detail |
|---|---|
| License | Apache 2.0 |
| Voice cloning | Yes |
| MLX-Audio support | Available |
Installation:
uv pip install -U mlx-audio --prerelease=allow
Runway: Gen-4.5 Image to Video
January 21 — Runway adds Image to Video functionality to Gen-4.5.
| Feature | Description |
|---|---|
| Image to Video | Transformation of an image into cinematic video |
| Camera control | Precise camera control |
| Coherent narratives | Coherent narratives over time |
| Character consistency | Characters that remain consistent |
Available for all Runway paid plans. Temporary promo: 15% discount.
What This Means
This week marks a maturation of coding agents tools. The two giants (Anthropic and OpenAI) publish detailed technical documentation on their agent architecture — a sign that the market is moving from the “demo” phase to the “production” phase.
On the infrastructure side, OpenAI’s PostgreSQL article shows that a single-primary architecture can hold up at the scale of hundreds of millions of users with the right optimizations.
The arrival of Claude in Excel opens a new front: AI integrated directly into daily productivity tools.