OpenAI launches GPT-5.5, Anthropic opens memory to Managed Agents, Kimi K2.6 Agent Swarm

April 23, 2026 is a packed day: OpenAI launches GPT-5.5, scoring 85% on ARC-AGI-2 at an API price of $5 per million input tokens, while Anthropic opens persistent memory in beta for its Managed Agents and publishes a post-mortem on Claude Code. At the same time, GitHub Copilot delivers seven updates in three days, Kimi K2.6 deploys a swarm of 300 sub-agents, and SpaceX seals a coding partnership with Cursor.


GPT-5.5: OpenAI's frontier model

April 23 — OpenAI launches GPT-5.5, its most powerful model to date, designed for real work and agents. It significantly improves agentic coding, computer use, knowledge work, and scientific research, while preserving GPT-5.4 latency.

Availability and pricing

GPT-5.5 is available immediately for ChatGPT Plus, Pro, Business, and Enterprise subscribers, as well as in Codex. API access is coming "very soon".

| Offering | API access | Input | Output |
|---|---|---|---|
| GPT-5.5 standard | Soon | $5 / M tokens | $30 / M tokens |
| GPT-5.5 Pro | Soon | $30 / M tokens | $180 / M tokens |

The context window in Codex reaches 400K tokens. A Fast mode, 1.5× faster at 2.5× the cost, is available.

Benchmarks

| Evaluation | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | 69.4% | 68.5% |
| Expert-SWE (internal) | 73.1% | 68.5% | — | — |
| SWE-Bench Pro | 58.6% | 57.7% | 64.3% | 54.2% |
| GDPval | 84.9% | 83.0% | 80.3% | 67.3% |
| OSWorld-Verified | 78.7% | 75.0% | 78.0% | — |
| ARC-AGI-2 | 85.0% | 73.3% | 75.8% | 77.1% |
| FrontierMath Tier 4 | 35.4% | 27.1% | 22.9% | 16.7% |
| CyberGym | 81.8% | 79.0% | 73.1% | — |
| BixBench (bioinformatics) | 80.5% | 74.0% | — | — |

GPT-5.5 leads on most benchmarks, with one notable exception: SWE-Bench Pro, where Claude Opus 4.7 keeps the edge (64.3% vs 58.6%).

Infrastructure and safety

The model was co-designed with NVIDIA GB200/GB300 NVL72. Codex used GPT-5.5 to optimize its own infrastructure, gaining +20% token generation speed. On the cybersecurity side, GPT-5.5 is classified High in OpenAI's Preparedness Framework (not Critical); the Trusted Access Cyber program is extended to it.

Scientific research

Beyond code, GPT-5.5 helped prove a new theorem on Ramsey numbers (combinatorics), formally verified in Lean. It also analyzed a genomic dataset of 62 samples and 28,000 genes in a few minutes, a task that would have taken a team of researchers months.

"GPT-5.5 is noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use." — Michael Truell, co-founder and CEO of Cursor

🔗 GPT-5.5 announcement


The wave of persistent agents

Three major announcements converge on April 23 around the persistent agent, capable of acting autonomously over long periods and retaining context from one session to the next.

OpenAI Workspace Agents in ChatGPT

April 22 — OpenAI introduces Workspace Agents: shared agents that a team creates once, uses together in ChatGPT or Slack, and improves over time. Powered by Codex in the cloud, they can carry out complex tasks even when the user is offline. Workspace Agents are gradually replacing GPTs, which remain available during the transition.

| Agent type | Function |
|---|---|
| Software verifier | Reviews requests, compares policies, creates IT tickets |
| Product feedback router | Monitors Slack/support/forums → prioritized tickets |
| Report generator | Extracts data on Friday, creates charts, summary |
| Prospecting agent | Searches leads, scores them, drafts emails, updates CRM |
| Third-party risk manager | Evaluates vendors, produces structured report |

Available in research preview for Business, Enterprise, Edu, and Teachers; free until May 6, 2026, then billed in credits.

According to Ankur Bhatt (AI Engineering, Rippling), what used to take salespeople 5 to 6 hours per week now runs automatically in the background on every opportunity.

🔗 Workspace Agents


Anthropic — Memory for Claude Managed Agents

April 23 — Memory for Claude Managed Agents is available in public beta on the Claude Platform. Agents can now learn from one session to the next thanks to a memory layer mounted directly on a file system: the agents use the same bash and code execution capabilities they already employ for agentic tasks.

| Feature | Detail |
|---|---|
| Shareable stores | Multiple agents, different access scopes (read-only / read-write) |
| Concurrent access | No overwriting between parallel sessions |
| Audit log | Which session, which agent, which memory |
| Rollback | To any previous version |
| Exportability | Memories manageable via the API |
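Anthropic does not document the internal layout of the memory filesystem, so as a rough illustration of the features listed above (versioned writes, rollback to any version, an audit log), here is a minimal Python sketch. The class name, directory layout, and field names are all assumptions, not Anthropic's implementation; writing each update as a new numbered version is one simple way to get rollback and avoid overwrites between concurrent sessions.

```python
import json
import os
import time


class FileMemoryStore:
    """Minimal sketch of a filesystem-backed agent memory store.

    Each write creates a new numbered version of the memory entry, so
    any previous version can be read back (rollback) and concurrent
    writers never overwrite each other. Every access is appended to an
    audit log recording which agent touched which memory.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.audit_path = os.path.join(root, "audit.jsonl")

    def _versions(self, key):
        d = os.path.join(self.root, key)
        if not os.path.isdir(d):
            return []
        return sorted(int(f.split(".")[0]) for f in os.listdir(d) if f.endswith(".txt"))

    def _log(self, agent, action, key, version):
        entry = {"ts": time.time(), "agent": agent, "action": action,
                 "key": key, "version": version}
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def write(self, agent, key, text):
        d = os.path.join(self.root, key)
        os.makedirs(d, exist_ok=True)
        versions = self._versions(key)
        version = (versions[-1] + 1) if versions else 1
        with open(os.path.join(d, f"{version}.txt"), "w") as f:
            f.write(text)
        self._log(agent, "write", key, version)
        return version

    def read(self, agent, key, version=None):
        versions = self._versions(key)
        if not versions:
            return None
        version = version or versions[-1]
        with open(os.path.join(self.root, key, f"{version}.txt")) as f:
            text = f.read()
        self._log(agent, "read", key, version)
        return text
```

A reader agent gets the latest version by default and can pass an explicit `version` to roll back.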

Customer results illustrate the concrete impact:

| Customer | Result |
|---|---|
| Rakuten | -97% first-pass errors, -27% cost, -34% latency |
| Wisedocs | +30% document review speed |
| Netflix | Context continuity across sessions without manual updates |
| Ando | Platform memory without dedicated infrastructure |

"Memory in Claude Managed Agents lets us put continuous learning into production at scale. Our agents distill lessons from every session, delivering 97% fewer first-pass errors at 27% lower cost and 34% lower latency." — Yusuke Kaji, General Manager AI for Business, Rakuten

🔗 Managed Agents memory


Claude Code: quality post-mortem and two new versions

Post-mortem and reset of limits

April 23 — The Claude Code team published a post-mortem on three quality issues reported over the past month. All are fixed in v2.1.116+. Usage limits have been reset for all subscribers.

"Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we've reset usage limits for all subscribers." — @ClaudeDevs

v2.1.117 and v2.1.118

| Version | Main features |
|---|---|
| v2.1.118 | Visual Vim mode (v/V) with selection and operators; unified /usage (merges /cost and /stats); custom themes in /theme; hooks invoking MCP tools via type: "mcp_tool"; strict DISABLE_UPDATES; Windows managed settings inheritance via WSL |
| v2.1.117 | Default effort moved to high for Pro/Max on Opus 4.6 and Sonnet 4.6 (was medium); sub-agent fork enabled on external builds; Glob/Grep replaced by embedded bfs/ugrep for faster search; Opus 4.7 session fix (1M context calculated correctly); Bedrock + Opus 4.7 fix with thinking disabled |

🔗 Claude Code CHANGELOG
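The changelog names only the new hook type, type: "mcp_tool". The exact settings schema is not documented here, so everything in the sketch below other than the type value is a guess modeled on Claude Code's existing hook configuration (event names, matchers); treat the "tool" field name and the tool identifier as hypothetical.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "mcp_tool",
            "tool": "mcp__linter__check_file"
          }
        ]
      }
    ]
  }
}
```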


New Claude connectors for everyday life

April 23 — Anthropic expands its connector catalog to consumer apps. Since July 2025, more than 200 connectors for professional tools have been available; this update adds 15 everyday services.

| Application | Category |
|---|---|
| AllTrails | Hiking |
| Audible | Audiobooks |
| Booking.com | Travel |
| Instacart | Online grocery |
| Intuit Credit Karma | Finance |
| Intuit TurboTax | Tax |
| Resy | Restaurant reservations |
| Spotify | Music |
| StubHub | Ticketing |
| Taskrabbit | Home services |
| Thumbtack | Local professionals |
| TripAdvisor | Travel |
| Uber | Transportation |
| Uber Eats | Food delivery |
| Viator | Tourist activities |

Claude now automatically suggests relevant connectors based on conversation context. Available on all plans (free included), web, desktop, and mobile (mobile in beta). No paid placement or sponsored response; data from an app is not used to train the models.

🔗 Everyday life connectors


GitHub Copilot — Seven updates in three days

GitHub Copilot published seven changelog entries between April 22 and April 23.

Chat for pull requests (3 new capabilities)

April 23 — Copilot Chat now integrates three capabilities for pull requests, accessible via github.com/copilot or the Copilot button on diffs (public preview):

  • PR understanding: comments, changes, commits, and reviews included as context
  • PR review: structured review on demand
  • PR summary: concise summary of changes

🔗 Copilot Chat PR improvements

Controllable agent sessions from issues and projects

April 23 — The cloud agent can now be controlled directly from GitHub issues and project boards: a session indicator in the issue header, a side progress panel, and sessions enabled by default in all project views.

🔗 Agent sessions from issues

Structured debugging of stack traces on the web

April 23 — Copilot Chat on github.com now guides stack trace analysis in six structured steps: what failed, why, root cause, evidence from code, confidence level, and next checks.

🔗 Stack trace debugging

BYOK VS Code available (GA)

April 22 — Bring Your Own Key is generally available for Copilot Business and Enterprise users in VS Code. Anthropic, Gemini, OpenAI, OpenRouter, and Azure are supported, as well as local models via Ollama and Foundry Local. Billing is direct through the chosen provider, outside Copilot quotas.

🔗 BYOK VS Code GA

C++ Language Server in public preview for Copilot CLI

April 22 — The Microsoft C++ Language Server (the Visual Studio/VS Code IntelliSense engine) is available in public preview for Copilot CLI. It provides precise semantic data (symbol definitions, references, call hierarchies, types) instead of iterative grep searches. Requirements: Copilot CLI authentication plus a compile_commands.json file.
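compile_commands.json is the standard Clang JSON compilation database that build systems can emit (for example, CMake with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON). Each entry records the directory, the exact compile command, and the source file; the paths and flags below are illustrative:

```json
[
  {
    "directory": "/home/user/project/build",
    "command": "clang++ -Iinclude -std=c++20 -c ../src/main.cpp -o main.o",
    "file": "../src/main.cpp"
  }
]
```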

🔗 C++ Language Server

New Business self-serve signups paused

April 22 — GitHub is pausing new self-serve signups for Copilot Business on GitHub Free and GitHub Team plans. Existing customers are not affected.

🔗 Pause Business self-serve

used_copilot_cloud_agent field in API metrics

April 23 — Following the "coding agent" → "cloud agent" rebrand, the metrics API adds the used_copilot_cloud_agent field in user reports (1-day and 28-day rolling windows). The old used_copilot_coding_agent field is maintained until August 1, 2026.

🔗 Cloud agent metrics
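During the migration window, consumers of the metrics API can prefer the new field and fall back to the deprecated one. A small Python sketch; the surrounding record shape is a simplified assumption, not the full API payload (only the two field names come from the announcement):

```python
def used_cloud_agent(record):
    """Return whether a metrics record shows cloud-agent usage.

    Prefers the new used_copilot_cloud_agent field and falls back to
    the deprecated used_copilot_coding_agent field, which GitHub
    maintains until August 1, 2026.
    """
    if "used_copilot_cloud_agent" in record:
        return bool(record["used_copilot_cloud_agent"])
    return bool(record.get("used_copilot_coding_agent", False))
```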


Gemini CLI v0.39.0 and Deep Think for all Ultra subscribers

Gemini CLI v0.39.0

April 23 — Google releases Gemini CLI v0.39.0, a stable version labeled "Latest". The highlight is the new /memory inbox command to review and validate skills automatically extracted by the CLI during work sessions.

| Feature | Description |
|---|---|
| /memory inbox | Review of automatically extracted skills |
| Unified invoke_subagent | Refactored sub-agent tool into a single interface |
| Compact formatting | Better readability in compact mode |
| Plan Mode confirmations | Validation required before skill activation |
| Lightweight startup | Lightweight parent process for faster startup |
| JSONL streaming migration | Recording chat sessions in JSONL |

New keyboard shortcuts: Ctrl+Backspace for word-by-word deletion (Windows Terminal) and Ctrl+Shift+G.
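JSONL (one JSON object per line) is a convenient format for streaming session records because each turn can be appended without rewriting the file. A minimal Python sketch of the idea; the "role"/"content" field names are illustrative assumptions, not Gemini CLI's actual schema:

```python
import json


def append_turn(path, role, content):
    """Append one chat turn to a JSONL session log (one object per line)."""
    with open(path, "a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def load_session(path):
    """Read a recorded session back as a list of turns."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```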

🔗 Gemini CLI v0.39.0

Deep Think open to all Ultra subscribers

April 22 — Google opens Deep Think mode (deep reasoning, extended thinking) to all Gemini Ultra subscribers. Previously in limited access, it is now available directly from the Gemini app tools menu (web and mobile).

🔗 Tweet @GeminiApp


Kimi K2.6: swarm of 300 sub-agents and open-weights benchmarks

Agent Swarm — 300 parallel sub-agents

April 23 — Moonshot AI launches Kimi K2.6 Agent Swarm: a system capable of deploying 300 sub-agents in parallel over 4,000 steps per run, compared with 100 agents and 1,500 steps for K2.5.

| Capability | K2.5 | K2.6 |
|---|---|---|
| Parallel sub-agents | 100 | 300 |
| Steps per run | 1,500 | 4,000 |
| Output types | Chat text | 100+ real files, 100,000-word reviews, 20,000-line datasets |

The sub-agents combine heterogeneous skills: web search, data analysis, coding, long-form writing, and visual generation. Available on kimi.com/agent-swarm.

🔗 Tweet @Kimi_Moonshot

Benchmarks: #1 open-weights

April 23 — Kimi K2.6 reaches the top spot among open-weights models on two benchmarks:

  • Design Arena: same performance band as Claude Opus 4.7
  • MathArena open (Think mode): ahead of GLM 5.1

🔗 Design Arena


SpaceXAI × Cursor and Grok Imagine

SpaceXAI × Cursor partnership

April 22 — SpaceXAI (an entity resulting from the merger of xAI/SpaceX) and Cursor announce a partnership to create "the world's most capable coding and knowledge-work AI". SpaceX provides the Colossus supercomputer (equivalent to one million H100s); Cursor grants it the right to acquire the company later in 2026 for $60 billion, or to pay $10 billion for the collaboration alone.

🔗 Tweet @SpaceX

Grok Imagine — Shareable custom templates

April 22 — SuperGrok and Premium+ subscribers can now create custom templates in Grok Imagine and share them publicly.

🔗 Tweet @imagine


NVIDIA × Google Cloud Next

April 22 — At Google Cloud Next (Las Vegas), NVIDIA and Google Cloud announce several major advances around agentic AI infrastructure.

| Announcement | Detail |
|---|---|
| A5X instances (Vera Rubin NVL72) | Up to 960,000 Rubin GPUs in a multi-site cluster, 10× cheaper per token, 10× more throughput per megawatt |
| Gemini on Google Distributed Cloud | Preview with Blackwell and Blackwell Ultra GPUs for data sovereignty |
| Confidential VMs Blackwell | First Blackwell confidential computing offer in the public cloud |
| Nemotron 3 Super | Available on the Gemini Enterprise Agent Platform |
| NeMo RL API | Managed large-scale reinforcement learning |

🔗 NVIDIA × Google Cloud Blog


Kling AI Video 3.0 — Native 4K mode

April 23 — Kling AI launches native 4K mode in its Video 3.0 series. 4K generation happens in a single click, with no extra upscaling step. Visual consistency (characters, text, styles, lighting) is preserved at native resolution for high-end production. Also available via fal.ai for enterprises.

Kling AI is simultaneously running a 4K Short Film Creative Contest, a global competition inviting creators to submit short films made with the new mode.

🔗 Tweet @Kling_ai


ChatGPT for Clinicians and OpenAI Privacy Filter

ChatGPT for Clinicians + HealthBench Professional

April 22 — OpenAI launches ChatGPT for Clinicians, a free version for verified U.S. healthcare professionals (physicians, nurse practitioners, physician assistants, pharmacists). The service includes access to frontier models for complex clinical questions, skills for repetitive workflows (referral letters, prior authorizations), cited clinical search in real time, and automatic generation of continuing medical education (CME) credits. HIPAA processing is available as an option via agreement.

OpenAI also releases HealthBench Professional, an open benchmark evaluating AI on real clinical tasks (700,000+ physician-evaluated answers). GPT-5.4 in ChatGPT for Clinicians outperforms human doctors on this benchmark under unlimited-time conditions with web access.

🔗 ChatGPT for Clinicians

OpenAI Privacy Filter

April 22 — OpenAI releases Privacy Filter, an open-weight model (Apache 2.0) to detect and mask personally identifiable information (PII) in text. The model runs locally (no data sent to a server), supports 128K tokens of context, and reaches an F1 score of 97.43% on the PII-Masking-300k benchmark.

| Feature | Value |
|---|---|
| Architecture | Bidirectional token classifier (constrained Viterbi decoding) |
| Size | 1.5B total parameters, 50M active |
| Context | 128,000 tokens |
| License | Apache 2.0 (Hugging Face + GitHub) |
| F1 | 97.43% on corrected PII-Masking-300k |

PII categories covered: private_person, private_address, private_email, private_phone, private_url, private_date, account_number, secret (passwords and API keys).
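OpenAI describes the architecture as a token classifier with constrained Viterbi decoding. As a generic illustration of that decoding step (not OpenAI's implementation), here is a minimal Viterbi decoder over per-token label scores that enforces the standard BIO constraint: an I-X label may only follow B-X or I-X of the same entity type. The labels and scores in the usage example are invented for illustration.

```python
import math


def viterbi_bio(scores, labels):
    """Decode the best label sequence under BIO transition constraints.

    scores: list of dicts mapping each label to a per-token score
    (stand-ins for classifier logits). Invalid transitions get -inf,
    so the decoder never emits an I-X without a preceding B-X/I-X.
    """
    def allowed(prev, cur):
        if cur.startswith("I-"):
            return prev is not None and prev[:2] in ("B-", "I-") and prev[2:] == cur[2:]
        return True

    # best[i][label] = (cumulative score, backpointer label)
    best = [{} for _ in scores]
    for lab in labels:
        best[0][lab] = (scores[0][lab] if allowed(None, lab) else -math.inf, None)
    for i in range(1, len(scores)):
        for lab in labels:
            best[i][lab] = max(
                ((best[i - 1][p][0] + scores[i][lab] if allowed(p, lab) else -math.inf, p)
                 for p in labels),
                key=lambda t: t[0],
            )
    # backtrack from the best final label
    lab = max(labels, key=lambda l: best[-1][l][0])
    path = [lab]
    for i in range(len(scores) - 1, 0, -1):
        lab = best[i][lab][1]
        path.append(lab)
    return list(reversed(path))
```

Note how a greedy decoder would pick the invalid "I-email" on the first token below; the constrained decoder recovers a well-formed span instead.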

🔗 OpenAI Privacy Filter


Perplexity and Cohere

Perplexity integrates Kimi K2.6

April 23 — Moonshot AI's Kimi K2.6 is now available to all Perplexity Pro and Max subscribers.

🔗 Tweet @perplexity_ai

Cohere — production-ready W4A8 in vLLM

April 22 — Cohere announces the integration of its W4A8 inference path (4-bit quantization for weights, 8-bit for activations) into vLLM. On Hopper GPUs, versus W4A16, time to first token (TTFT) improves by 58% and time per output token (TPOT) by 45%. The integration primarily targets large-scale MoE Command A models in production.
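W4A8 means weights stored in 4 bits and activations in 8 bits. A minimal NumPy sketch of symmetric per-tensor quantization shows the core idea; real kernels such as Cohere's keep the matmul in integer arithmetic and fuse the rescale, whereas this version dequantizes explicitly for clarity and is only an illustration of the numerics, not their implementation.

```python
import numpy as np


def quantize(x, bits):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale


def w4a8_matmul(w, a):
    """Approximate w @ a with 4-bit weights and 8-bit activations.

    The integer product is rescaled by the two quantization scales to
    recover an approximation of the floating-point result.
    """
    qw, sw = quantize(w, bits=4)
    qa, sa = quantize(a, bits=8)
    return (qw @ qa) * (sw * sa)
```

Most of the approximation error comes from the 4-bit weight grid (15 levels); the 8-bit activation grid contributes far less.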

🔗 Cohere W4A8 Blog


Briefs

Suno #1 in the App Store music category

April 21 — Suno, the AI music generation platform, reaches first place in the App Store music category. CEO Mikey Shulman announces: "The future of music is one where everyone enjoys creating."

🔗 Tweet @suno

Anthropic Economic Index Survey

April 22 — Anthropic launches the Anthropic Economic Index Survey, a monthly survey conducted via Anthropic Interviewer with a random sample of Claude users. The goal is to collect qualitative data on AI's economic impact: delegated tasks, productivity gains, role changes. The results will feed future Anthropic Economic Index reports.

🔗 Survey announcement

Anthropic — MCP agents in production: the numbers

April 22 — A technical article from Anthropic documents the benefits of MCP for production agents: MCP SDKs exceed 300 million downloads per month, tool search reduces tool-definition tokens by 85%, and programmatic tool calling cuts token usage by 37% on complex multi-step workflows.

🔗 MCP production agents blog

OpenAI — WebSockets in the Responses API: 40% latency gain

April 22 — An OpenAI retrospective article explains how WebSocket mode in the Responses API reduces agent-loop latency by 40%. The persistent connection keeps an in-memory cache of prior response state, avoiding reprocessing of the full history on each call. Already in production: Codex, Vercel AI SDK, Cline (+39%), Cursor (+30%).
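The gain comes from not resending and reprocessing the whole conversation on every call. A toy accounting model (my illustration, not OpenAI's implementation) makes the difference concrete: a stateless call pays for the full history each time, while a stateful connection with cached prior state pays only for the new delta.

```python
def tokens_processed(turns, stateful):
    """Count prompt tokens processed across an agent loop.

    turns: token length of each new message. A stateless API call
    reprocesses the full history every time; a stateful connection
    that caches prior response state only processes the new delta.
    """
    total, history = 0, 0
    for t in turns:
        history += t
        total += t if stateful else history  # delta vs full history
    return total
```

For a 10-turn loop of 100-token messages, the stateless path processes 5,500 tokens against 1,000 for the stateful one; the gap grows quadratically with the number of turns.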

🔗 WebSockets article

Perplexity Research — Training retrieval-augmented models

April 22 — Perplexity publishes research on its SFT + RL pipeline (Supervised Fine-Tuning + Reinforcement Learning) to improve search answer quality. Key result: post-trained Qwen models reach GPT-level factuality at lower cost.

🔗 Perplexity Research


What this means

April 23, 2026 outlines two converging trends. On one side, GPT-5.5 confirms that OpenAI has reclaimed the lead on agentic benchmarks (Terminal-Bench, ARC-AGI-2, OSWorld) after several months in which Claude Opus 4.7 dominated. The gap remains tight on SWE-Bench Pro, where Anthropic keeps the advantage, a sign that both labs agree on the same priority use cases.

On the other side, the day marks entry into the era of persistent agents with memory: OpenAI Workspace Agents, Anthropic Managed Agents Memory, and Kimi K2.6 Agent Swarm arrive simultaneously with different approaches (Slack integration, filesystem-based memory, a swarm of sub-agents) but the same goal: for the agent to remember, learn, and act without constant supervision. Rakuten's figures (-97% errors, -27% cost) provide a first industrial measure of the impact.

GitHub Copilot continues its strategy of deep integration into GitHub.com (PR chat, agent sessions from issues, structured stack traces) while opening up externally via BYOK. The BYOK VS Code GA signals that Copilot is positioning itself as much an interface as a model.


Sources

This document was translated from the French version into English using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator