AI News Jan 23, 2026: Claude in Excel, Tasks Claude Code, Codex Agent Loop

Busy Week for AI Agents

From January 21 to 23, 2026, several major announcements regarding coding agents and infrastructure. Anthropic launches Claude in Excel and publishes three articles on multi-agent systems, OpenAI details the internal architecture of Codex and its PostgreSQL infrastructure, Qwen open-sources its text-to-speech model, and Runway adds Image to Video to Gen-4.5.

Anthropic: Claude in Excel and Claude Code

Claude in Excel

January 23 — Claude is now available in Microsoft Excel in beta. The integration allows analyzing complete Excel workbooks with their nested formulas and dependencies between tabs.

Features:

Understanding of the entire workbook (formulas, multi-tab dependencies)
Explanations with cell-level citations
Updating assumptions while preserving formulas

Available for Claude Pro, Max, Team, and Enterprise subscribers.

🔗 Claude in Excel

Claude Code v2.1.19: Tasks System

January 23 — Version 2.1.19 introduces Tasks, a new task management system for complex multi-session projects.

We’re turning Todos into Tasks in Claude Code. Tasks are a new primitive that help Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents.

— Thariq (@trq212), Claude Code team Anthropic

Tasks Features:

Aspect	Detail
Storage	`~/.claude/tasks` (files, allows building tools on top)
Collaboration	`CLAUDE_CODE_TASK_LIST_ID=name claude` to share between sessions
Dependencies	Tasks with dependencies and blockers stored in metadata
Broadcast	Update of a Task broadcasted to all sessions on the same Task List
Compatibility	Works with `claude -p` and AgentSDK

What it’s for: On a complex project (multi-file refactoring, migration, long feature), Claude can break down the work into tasks, track what is done and what remains. Tasks are persisted on disk — they survive context compaction, session closing, and restart. Multiple sessions or subagents can collaborate on the same task list in real-time.

In practice: Claude creates tasks (TaskCreate), lists them (TaskList), and updates their status (TaskUpdate: pending → in_progress → completed). Example on an authentication refactoring:

#1 [completed] Migrate session storage to Redis
#2 [in_progress] Implement refresh token rotation
#3 [pending] Add OAuth integration tests
#4 [pending] Update API documentation

Tasks are stored in ~/.claude/tasks/ and can be shared between sessions via CLAUDE_CODE_TASK_LIST_ID.

Other new features v2.1.19:

Shorthand $0, $1 for arguments in custom commands
VSCode session forking and rewind for everyone
Skills without permissions run without approval
CLAUDE_CODE_ENABLE_TASKS=false to temporarily disable

🔗 CHANGELOG Claude Code | Thread @trq212

Claude Code v2.1.18: Customizable Keybindings

Previous version adding the ability to configure keybindings by context and create chord sequences.

Command: /keybindings

⚠️ Note: This feature is currently in preview and is not available for all users.

🔗 Keybindings Documentation

Petri 2.0: Automated Alignment Audits

January 22 — Anthropic publishes Petri 2.0, an update to its automated behavioral audit tool for language models.

What it’s for: Petri tests if an LLM could behave mainly problematically — manipulation, deception, rule circumvention. The tool generates realistic scenarios and observes the model’s responses to detect unwanted behaviors before they occur in production.

Improvement	Description
70 new scenarios	Extended seed library to cover more edge cases
Eval-awareness mitigations	The model must not know it is being tested — otherwise it adapts its behavior. Petri 2.0 improves scenario realism to avoid this detection.
Frontier comparisons	Evaluation results for recent models (Claude, GPT, Gemini)

🔗 Petri 2.0 | GitHub

Blog: When to Use (or Not) Multi-Agent Systems

January 23 — Anthropic publishes a pragmatic guide on multi-agent architectures. The main message: do not use multi-agent by default.

We’ve seen teams invest months building elaborate multi-agent architectures only to discover that improved prompting on a single agent achieved equivalent results.

The article identifies 3 cases where multi-agent truly brings value:

Case	Problem	Multi-agent Solution
Context Pollution	An agent generates voluminous data of which only a summary is useful afterwards	A sub-agent retrieves 2000 tokens of history, returns just “order delivered” to the main agent
Parallelization	Multiple independent searches to be done	Launch 5 agents in parallel on 5 different sources instead of processing them sequentially
Specialization	Too many tools (20+) in a single agent degrades its ability to choose the right one	Separate into specialized agents: one for CRM, one for marketing, one for messaging

The trap to avoid: Dividing by type of work (one agent plans, another implements, another tests). Each handover loses context and degrades quality. It is better for a single agent to handle a feature from end to end.

Real cost: 3-10x more tokens than a single agent for the same task.

Other articles in the series:

Building agents with Skills (Jan 22)

Instead of building agents specialized by domain, Anthropic proposes building skills: collections of files (workflows, scripts, best practices) that a generalist agent loads on demand.

Progressive disclosure in 3 levels:

Level	Content	Size
1	Metadata (name, description)	~50 tokens
2	Full SKILL.md file	~500 tokens
3	Reference documentation	2000+ tokens

Each level is loaded only if necessary. Result: an agent can have hundreds of skills without saturating its context.

🔗 Building agents with Skills

Eight trends 2026 (Jan 21)

Anthropic identifies 8 trends for software development in 2026.

Key message: Engineers are moving from writing code to coordinating agents that write code.

Important nuance: AI is used in ~60% of work, but only 0-20% can be fully delegated — human supervision remains essential.

Company	Result
Rakuten	Claude Code on vLLM codebase (12.5M lines), 7h of autonomous work
TELUS	30% faster, 500k hours saved
Zapier	89% AI adoption, 800+ internal agents

🔗 Eight trends 2026

OpenAI: Codex Architecture and Infrastructure

Unrolling the Codex agent loop

January 23 — OpenAI opens the scenes of Codex CLI. First article of a series on the internal functioning of their software agent.

What we learn:

The agent loop is simple in theory: user sends a request → model generates a response or requests a tool → agent executes the tool → model resumes with the result → until a final response. In practice, the subtleties are in context management.

Prompt caching — the key to performance:

Each conversation turn adds content to the prompt. Without optimization, it is quadratic in sent tokens. Prompt caching allows reusing calculations from previous turns. Condition: the new prompt must be an exact prefix of the old one. OpenAI details the pitfalls that break the cache (changing MCP tools order, modifying config mid-conversation).

Automatic compaction:

When context exceeds a threshold, Codex calls /responses/compact which returns a compressed version of the conversation. The model keeps latent understanding via an opaque encrypted_content.

Zero Data Retention (ZDR):

For clients who do not want their data stored, encrypted_content allows preserving the model’s reasoning between turns without storing data server-side.

First article of a series — the next ones will cover CLI architecture, tool implementation, and sandboxing.

🔗 Unrolling the Codex agent loop | Codex GitHub

Scaling PostgreSQL: 800 million ChatGPT users

January 22 — OpenAI details how PostgreSQL powers ChatGPT and the API for 800 million users with millions of requests per second.

Metric	Value
Users	800 million
Throughput	Millions of QPS
Replicas	~50 multi-region read replicas
p99 Latency	Double digit ms client-side
Availability	Five-nines (99.999%)

Architecture:

Single primary Azure PostgreSQL flexible server
PgBouncer for connection pooling (connection latency: 50ms → 5ms)
Write-heavy workloads migrated to Azure Cosmos DB
Cache locking to protect against cache miss storms
Cascading replication in testing to exceed 100 replicas

Only SEV-0 PostgreSQL in the last 12 months: during the viral launch of ChatGPT ImageGen (100M new users in one week, write traffic x10).

🔗 Scaling PostgreSQL

Qwen: Qwen3-TTS Open-Source

January 22-23 — Alibaba releases Qwen3-TTS in open-source under Apache 2.0 license.

Feature	Detail
License	Apache 2.0
Voice cloning	Yes
MLX-Audio support	Available

Installation:

uv pip install -U mlx-audio --prerelease=allow

🔗 Qwen3-TTS on X

Runway: Gen-4.5 Image to Video

January 21 — Runway adds Image to Video functionality to Gen-4.5.

Feature	Description
Image to Video	Transformation of an image into cinematic video
Camera control	Precise camera control
Coherent narratives	Coherent narratives over time
Character consistency	Characters that remain consistent

Available for all Runway paid plans. Temporary promo: 15% discount.

🔗 Runway on X

What This Means

This week marks a maturation of coding agents tools. The two giants (Anthropic and OpenAI) publish detailed technical documentation on their agent architecture — a sign that the market is moving from the “demo” phase to the “production” phase.

On the infrastructure side, OpenAI’s PostgreSQL article shows that a single-primary architecture can hold up at the scale of hundreds of millions of users with the right optimizations.

The arrival of Claude in Excel opens a new front: AI integrated directly into daily productivity tools.