OpenAI launches GPT-5.5, Anthropic opens memory to Managed Agents, Kimi K2.6 Agent Swarm

April 23, 2026 is a packed day: OpenAI launches GPT-5.5, scoring 85% on ARC-AGI-2 at an API price of $5 per million input tokens, while Anthropic opens persistent memory in beta for its Managed Agents and publishes a post-mortem on Claude Code. At the same time, GitHub Copilot delivers seven updates in three days, Kimi K2.6 deploys a swarm of 300 sub-agents, and SpaceX seals a coding partnership with Cursor.


GPT-5.5: OpenAI's frontier model

April 23 — OpenAI launches GPT-5.5, its most powerful model to date, designed for real work and agents. It significantly improves agentic coding, computer use, knowledge work, and scientific research, while preserving GPT-5.4 latency.

Availability and pricing

GPT-5.5 is available immediately for ChatGPT Plus, Pro, Business, and Enterprise subscribers, as well as in Codex. API access is coming "very soon".

| Offering | API access | Input | Output |
|---|---|---|---|
| GPT-5.5 standard | Soon | $5 / M tokens | $30 / M tokens |
| GPT-5.5 Pro | Soon | $30 / M tokens | $180 / M tokens |

The context window in Codex reaches 400K tokens. A Fast mode, 1.5× faster at 2.5× the cost, is available.

Benchmarks

| Evaluation | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | 69.4% | 68.5% |
| Expert-SWE (internal) | 73.1% | 68.5% | — | — |
| SWE-Bench Pro | 58.6% | 57.7% | 64.3% | 54.2% |
| GDPval | 84.9% | 83.0% | 80.3% | 67.3% |
| OSWorld-Verified | 78.7% | 75.0% | 78.0% | — |
| ARC-AGI-2 | 85.0% | 73.3% | 75.8% | 77.1% |
| FrontierMath Tier 4 | 35.4% | 27.1% | 22.9% | 16.7% |
| CyberGym | 81.8% | 79.0% | 73.1% | — |
| BixBench (bioinformatics) | 80.5% | 74.0% | — | — |

GPT-5.5 leads on most benchmarks, with one notable exception: SWE-Bench Pro, where Claude Opus 4.7 keeps the edge (64.3% vs 58.6%).

Infrastructure and safety

The model was co-designed with NVIDIA GB200/GB300 NVL72. Codex used GPT-5.5 to optimize its own infrastructure, gaining +20% token generation speed. On the cybersecurity side, GPT-5.5 is classified High in OpenAI's Preparedness Framework (not Critical); the Trusted Access Cyber program is extended to it.

Scientific research

Beyond code, GPT-5.5 helped prove a new theorem on Ramsey numbers (combinatorics), formally verified in Lean. It also analyzed a genomic dataset of 62 samples and 28,000 genes in a few minutes, a task that would have taken a team of researchers months.

"GPT-5.5 is noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use." — Michael Truell, co-founder and CEO of Cursor

🔗 GPT-5.5 announcement


The wave of persistent agents

Three major announcements converge on April 23 around the persistent agent, capable of acting autonomously over long periods and retaining context from one session to the next.

OpenAI Workspace Agents in ChatGPT

April 22 — OpenAI introduces Workspace Agents: shared agents that a team creates once, uses together in ChatGPT or Slack, and improves over time. Powered by Codex in the cloud, they can carry out complex tasks even when the user is offline. Workspace Agents are gradually replacing GPTs, which remain available during the transition.

| Agent type | Function |
|---|---|
| Software verifier | Reviews requests, compares policies, creates IT tickets |
| Product feedback router | Monitors Slack/support/forums → prioritized tickets |
| Report generator | Extracts data on Friday, creates charts, summary |
| Prospecting agent | Searches leads, scores them, drafts emails, updates CRM |
| Third-party risk manager | Evaluates vendors, produces structured report |

Available in research preview for Business, Enterprise, Edu, and Teachers; free until May 6, 2026, then billed in credits.

According to Ankur Bhatt (AI Engineering, Rippling), what used to take salespeople 5 to 6 hours per week now runs automatically in the background on every opportunity.

🔗 Workspace Agents


Anthropic — Memory for Claude Managed Agents

April 23 — Memory for Claude Managed Agents is available in public beta on the Claude Platform. Agents can now learn from one session to the next thanks to a memory layer mounted directly on a file system: the agents use the same bash and code execution capabilities they already employ for agentic tasks.

| Feature | Detail |
|---|---|
| Shareable stores | Multiple agents, different access scopes (read-only / read-write) |
| Concurrent access | No overwriting between parallel sessions |
| Audit log | Which session, which agent, which memory |
| Rollback | To any previous version |
| Exportability | Memories manageable via the API |
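Anthropic does not document the internal layout of the memory filesystem, so as a rough illustration of the features listed above (versioned writes, rollback to any version, an audit log), here is a minimal Python sketch. The class name, directory layout, and field names are all assumptions, not Anthropic's implementation; writing each update as a new numbered version is one simple way to get rollback and avoid overwrites between concurrent sessions.

```python
import json
import os
import time


class FileMemoryStore:
    """Minimal sketch of a filesystem-backed agent memory store.

    Each write creates a new numbered version of the memory entry, so
    any previous version can be read back (rollback) and concurrent
    writers never overwrite each other. Every access is appended to an
    audit log recording which agent touched which memory.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.audit_path = os.path.join(root, "audit.jsonl")

    def _versions(self, key):
        d = os.path.join(self.root, key)
        if not os.path.isdir(d):
            return []
        return sorted(int(f.split(".")[0]) for f in os.listdir(d) if f.endswith(".txt"))

    def _log(self, agent, action, key, version):
        entry = {"ts": time.time(), "agent": agent, "action": action,
                 "key": key, "version": version}
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def write(self, agent, key, text):
        d = os.path.join(self.root, key)
        os.makedirs(d, exist_ok=True)
        versions = self._versions(key)
        version = (versions[-1] + 1) if versions else 1
        with open(os.path.join(d, f"{version}.txt"), "w") as f:
            f.write(text)
        self._log(agent, "write", key, version)
        return version

    def read(self, agent, key, version=None):
        versions = self._versions(key)
        if not versions:
            return None
        version = version or versions[-1]
        with open(os.path.join(self.root, key, f"{version}.txt")) as f:
            text = f.read()
        self._log(agent, "read", key, version)
        return text
```

A reader agent gets the latest version by default and can pass an explicit `version` to roll back.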

Customer results illustrate the concrete impact:

| Customer | Result |
|---|---|
| Rakuten | -97% first-pass errors, -27% cost, -34% latency |
| Wisedocs | +30% document review speed |
| Netflix | Context continuity across sessions without manual updates |
| Ando | Platform memory without dedicated infrastructure |

"Memory in Claude Managed Agents lets us put continuous learning into production at scale. Our agents distill lessons from every session, delivering 97% fewer first-pass errors at 27% lower cost and 34% lower latency." — Yusuke Kaji, General Manager AI for Business, Rakuten

🔗 Managed Agents memory


Claude Code: quality post-mortem and two new versions

Post-mortem and reset of limits

April 23 — The Claude Code team published a post-mortem on three quality issues reported over the past month. All are fixed in v2.1.116+. Usage limits have been reset for all subscribers.

"Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we've reset usage limits for all subscribers." — @ClaudeDevs

v2.1.117 and v2.1.118

| Version | Main features |
|---|---|
| v2.1.118 | Visual Vim mode (v/V) with selection and operators; unified /usage (merges /cost and /stats); custom themes in /theme; hooks invoking MCP tools via type: "mcp_tool"; strict DISABLE_UPDATES; Windows managed settings inheritance via WSL |
| v2.1.117 | Default effort moved to high for Pro/Max on Opus 4.6 and Sonnet 4.6 (was medium); sub-agent fork enabled on external builds; Glob/Grep replaced by embedded bfs/ugrep for faster search; Opus 4.7 session fix (1M context calculated correctly); Bedrock + Opus 4.7 fix with thinking disabled |

🔗 Claude Code CHANGELOG
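The changelog names only the new hook type, type: "mcp_tool". The exact settings schema is not documented here, so everything in the sketch below other than the type value is a guess modeled on Claude Code's existing hook configuration (event names, matchers); treat the "tool" field name and the tool identifier as hypothetical.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "mcp_tool",
            "tool": "mcp__linter__check_file"
          }
        ]
      }
    ]
  }
}
```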


New Claude connectors for everyday life

April 23 — Anthropic expands its connector catalog to consumer apps. Since July 2025, more than 200 connectors for professional tools have been available; this update adds 15 everyday services.

| Application | Category |
|---|---|
| AllTrails | Hiking |
| Audible | Audiobooks |
| Booking.com | Travel |
| Instacart | Online grocery |
| Intuit Credit Karma | Finance |
| Intuit TurboTax | Tax |
| Resy | Restaurant reservations |
| Spotify | Music |
| StubHub | Ticketing |
| Taskrabbit | Home services |
| Thumbtack | Local professionals |
| TripAdvisor | Travel |
| Uber | Transportation |
| Uber Eats | Food delivery |
| Viator | Tourist activities |

Claude now automatically suggests relevant connectors based on conversation context. Available on all plans (free included), web, desktop, and mobile (mobile in beta). No paid placement or sponsored response; data from an app is not used to train the models.

🔗 Everyday life connectors


GitHub Copilot — Seven updates in three days

GitHub Copilot published seven changelog entries between April 22 and April 23.

Chat for pull requests (3 new capabilities)

April 23 — Copilot Chat now integrates three capabilities for pull requests, accessible via github.com/copilot or the Copilot button on diffs (public preview):

  • PR understanding: comments, changes, commits, and reviews included as context
  • PR review: structured review on demand
  • PR summary: concise summary of changes

🔗 Copilot Chat PR improvements

Controllable agent sessions from issues and projects

April 23 — The cloud agent can now be controlled directly from GitHub issues and project boards: a session indicator in the issue header, a side progress panel, and sessions enabled by default in all project views.

🔗 Agent sessions from issues

Structured debugging of stack traces on the web

April 23 — Copilot Chat on github.com now guides stack trace analysis in six structured steps: what failed, why, root cause, evidence from code, confidence level, and next checks.

🔗 Stack trace debugging

BYOK VS Code available (GA)

April 22 — Bring Your Own Key is generally available for Copilot Business and Enterprise users in VS Code. Anthropic, Gemini, OpenAI, OpenRouter, and Azure are supported, as well as local models via Ollama and Foundry Local. Billing is direct through the chosen provider, outside Copilot quotas.

🔗 BYOK VS Code GA

C++ Language Server in public preview for Copilot CLI

April 22 — The Microsoft C++ Language Server (the Visual Studio/VS Code IntelliSense engine) is available in public preview for Copilot CLI. It provides precise semantic data (symbol definitions, references, call hierarchies, types) instead of iterative grep searches. Requirements: Copilot CLI authentication plus a compile_commands.json file.
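compile_commands.json is the standard Clang JSON compilation database that build systems can emit (for example, CMake with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON). Each entry records the directory, the exact compile command, and the source file; the paths and flags below are illustrative:

```json
[
  {
    "directory": "/home/user/project/build",
    "command": "clang++ -Iinclude -std=c++20 -c ../src/main.cpp -o main.o",
    "file": "../src/main.cpp"
  }
]
```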

🔗 C++ Language Server

New Business self-serve signups paused

April 22 — GitHub is pausing new self-serve signups for Copilot Business on GitHub Free and GitHub Team plans. Existing customers are not affected.

🔗 Pause Business self-serve

used_copilot_cloud_agent field in API metrics

April 23 — Following the "coding agent" → "cloud agent" rebrand, the metrics API adds the used_copilot_cloud_agent field in user reports (1-day and 28-day rolling windows). The old used_copilot_coding_agent field is maintained until August 1, 2026.

🔗 Cloud agent metrics
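During the migration window, consumers of the metrics API can prefer the new field and fall back to the deprecated one. A small Python sketch; the surrounding record shape is a simplified assumption, not the full API payload (only the two field names come from the announcement):

```python
def used_cloud_agent(record):
    """Return whether a metrics record shows cloud-agent usage.

    Prefers the new used_copilot_cloud_agent field and falls back to
    the deprecated used_copilot_coding_agent field, which GitHub
    maintains until August 1, 2026.
    """
    if "used_copilot_cloud_agent" in record:
        return bool(record["used_copilot_cloud_agent"])
    return bool(record.get("used_copilot_coding_agent", False))
```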


Gemini CLI v0.39.0 and Deep Think for all Ultra subscribers

Gemini CLI v0.39.0

April 23 — Google releases Gemini CLI v0.39.0, a stable version labeled "Latest". The highlight is the new /memory inbox command to review and validate skills automatically extracted by the CLI during work sessions.

| Feature | Description |
|---|---|
| /memory inbox | Review of automatically extracted skills |
| Unified invoke_subagent | Refactored sub-agent tool into a single interface |
| Compact formatting | Better readability in compact mode |
| Plan Mode confirmations | Validation required before skill activation |
| Lightweight startup | Lightweight parent process for faster startup |
| JSONL streaming migration | Recording chat sessions in JSONL |

New keyboard shortcuts: Ctrl+Backspace for word-by-word deletion (Windows Terminal) and Ctrl+Shift+G.
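JSONL (one JSON object per line) is a convenient format for streaming session records because each turn can be appended without rewriting the file. A minimal Python sketch of the idea; the "role"/"content" field names are illustrative assumptions, not Gemini CLI's actual schema:

```python
import json


def append_turn(path, role, content):
    """Append one chat turn to a JSONL session log (one object per line)."""
    with open(path, "a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def load_session(path):
    """Read a recorded session back as a list of turns."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```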

🔗 Gemini CLI v0.39.0

Deep Think open to all Ultra subscribers

April 22 — Google opens Deep Think mode (deep reasoning, extended thinking) to all Gemini Ultra subscribers. Previously in limited access, it is now available directly from the Gemini app tools menu (web and mobile).

🔗 Tweet @GeminiApp


Kimi K2.6: swarm of 300 sub-agents and open-weights benchmarks

Agent Swarm — 300 parallel sub-agents

April 23 — Moonshot AI launches Kimi K2.6 Agent Swarm: a system capable of deploying 300 sub-agents in parallel over 4,000 steps per run, compared with 100 agents and 1,500 steps for K2.5.

| Capability | K2.5 | K2.6 |
|---|---|---|
| Parallel sub-agents | 100 | 300 |
| Steps per run | 1,500 | 4,000 |
| Output types | Chat text | 100+ real files, 100,000-word reviews, 20,000-line datasets |

The sub-agents combine heterogeneous skills: web search, data analysis, coding, long-form writing, and visual generation. Available on kimi.com/agent-swarm.

🔗 Tweet @Kimi_Moonshot

Benchmarks: #1 open-weights

April 23 — Kimi K2.6 reaches the top spot among open-weights models on two benchmarks:

  • Design Arena: same performance band as Claude Opus 4.7
  • MathArena open (Think mode): ahead of GLM 5.1

🔗 Design Arena


SpaceXAI × Cursor and Grok Imagine

SpaceXAI × Cursor partnership

April 22 — SpaceXAI (an entity resulting from the merger of xAI/SpaceX) and Cursor announce a partnership to create "the world's most capable coding and knowledge-work AI". SpaceX provides the Colossus supercomputer (equivalent to one million H100s); Cursor grants it the right to acquire the company later in 2026 for $60 billion, or to pay $10 billion for the collaboration alone.

🔗 Tweet @SpaceX

Grok Imagine — Shareable custom templates

April 22 — SuperGrok and Premium+ subscribers can now create custom templates in Grok Imagine and share them publicly.

🔗 Tweet @imagine


NVIDIA × Google Cloud Next

April 22 — At Google Cloud Next (Las Vegas), NVIDIA and Google Cloud announce several major advances around agentic AI infrastructure.

| Announcement | Detail |
|---|---|
| A5X instances (Vera Rubin NVL72) | Up to 960,000 Rubin GPUs in a multi-site cluster, 10× cheaper per token, 10× more throughput per megawatt |
| Gemini on Google Distributed Cloud | Preview with Blackwell and Blackwell Ultra GPUs for data sovereignty |
| Confidential VMs Blackwell | First Blackwell confidential computing offer in the public cloud |
| Nemotron 3 Super | Available on the Gemini Enterprise Agent Platform |
| NeMo RL API | Managed large-scale reinforcement learning |

🔗 NVIDIA × Google Cloud Blog


Kling AI Video 3.0 — Native 4K mode

April 23 — Kling AI launches native 4K mode in its Video 3.0 series. 4K generation happens in a single click, with no extra upscaling step. Visual consistency (characters, text, styles, lighting) is preserved at native resolution for high-end production. Also available via fal.ai for enterprises.

Kling AI is simultaneously running a 4K Short Film Creative Contest, a global competition inviting creators to submit short films made with the new mode.

🔗 Tweet @Kling_ai


ChatGPT for Clinicians and OpenAI Privacy Filter

ChatGPT for Clinicians + HealthBench Professional

April 22 — OpenAI launches ChatGPT for Clinicians, a free version for verified U.S. healthcare professionals (physicians, nurse practitioners, physician assistants, pharmacists). The service includes access to frontier models for complex clinical questions, skills for repetitive workflows (referral letters, prior authorizations), cited clinical search in real time, and automatic generation of continuing medical education (CME) credits. HIPAA processing is available as an option via agreement.

OpenAI also releases HealthBench Professional, an open benchmark evaluating AI on real clinical tasks (700,000+ physician-evaluated answers). GPT-5.4 in ChatGPT for Clinicians outperforms human doctors on this benchmark under unlimited-time conditions with web access.

🔗 ChatGPT for Clinicians

OpenAI Privacy Filter

April 22 — OpenAI releases Privacy Filter, an open-weight model (Apache 2.0) to detect and mask personally identifiable information (PII) in text. The model runs locally (no data sent to a server), supports 128K tokens of context, and reaches an F1 score of 97.43% on the PII-Masking-300k benchmark.

| Feature | Value |
|---|---|
| Architecture | Bidirectional token classifier (constrained Viterbi decoding) |
| Size | 1.5B total parameters, 50M active |
| Context | 128,000 tokens |
| License | Apache 2.0 (Hugging Face + GitHub) |
| F1 | 97.43% on corrected PII-Masking-300k |

PII categories covered: private_person, private_address, private_email, private_phone, private_url, private_date, account_number, secret (passwords and API keys).
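OpenAI describes the architecture as a token classifier with constrained Viterbi decoding. As a generic illustration of that decoding step (not OpenAI's implementation), here is a minimal Viterbi decoder over per-token label scores that enforces the standard BIO constraint: an I-X label may only follow B-X or I-X of the same entity type. The labels and scores in the usage example are invented for illustration.

```python
import math


def viterbi_bio(scores, labels):
    """Decode the best label sequence under BIO transition constraints.

    scores: list of dicts mapping each label to a per-token score
    (stand-ins for classifier logits). Invalid transitions get -inf,
    so the decoder never emits an I-X without a preceding B-X/I-X.
    """
    def allowed(prev, cur):
        if cur.startswith("I-"):
            return prev is not None and prev[:2] in ("B-", "I-") and prev[2:] == cur[2:]
        return True

    # best[i][label] = (cumulative score, backpointer label)
    best = [{} for _ in scores]
    for lab in labels:
        best[0][lab] = (scores[0][lab] if allowed(None, lab) else -math.inf, None)
    for i in range(1, len(scores)):
        for lab in labels:
            best[i][lab] = max(
                ((best[i - 1][p][0] + scores[i][lab] if allowed(p, lab) else -math.inf, p)
                 for p in labels),
                key=lambda t: t[0],
            )
    # backtrack from the best final label
    lab = max(labels, key=lambda l: best[-1][l][0])
    path = [lab]
    for i in range(len(scores) - 1, 0, -1):
        lab = best[i][lab][1]
        path.append(lab)
    return list(reversed(path))
```

Note how a greedy decoder would pick the invalid "I-email" on the first token below; the constrained decoder recovers a well-formed span instead.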

🔗 OpenAI Privacy Filter


Perplexity and Cohere

Perplexity integrates Kimi K2.6

April 23 — Moonshot AI's Kimi K2.6 is now available to all Perplexity Pro and Max subscribers.

🔗 Tweet @perplexity_ai

Cohere — production-ready W4A8 in vLLM

April 22 — Cohere announces the integration of its W4A8 inference path (4-bit quantization for weights, 8-bit for activations) into vLLM. On Hopper GPUs, versus W4A16, time to first token (TTFT) improves by 58% and time per output token (TPOT) by 45%. The integration primarily targets large-scale MoE Command A models in production.
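W4A8 means weights stored in 4 bits and activations in 8 bits. A minimal NumPy sketch of symmetric per-tensor quantization shows the core idea; real kernels such as Cohere's keep the matmul in integer arithmetic and fuse the rescale, whereas this version dequantizes explicitly for clarity and is only an illustration of the numerics, not their implementation.

```python
import numpy as np


def quantize(x, bits):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale


def w4a8_matmul(w, a):
    """Approximate w @ a with 4-bit weights and 8-bit activations.

    The integer product is rescaled by the two quantization scales to
    recover an approximation of the floating-point result.
    """
    qw, sw = quantize(w, bits=4)
    qa, sa = quantize(a, bits=8)
    return (qw @ qa) * (sw * sa)
```

Most of the approximation error comes from the 4-bit weight grid (15 levels); the 8-bit activation grid contributes far less.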

🔗 Cohere W4A8 Blog


Briefs

Suno #1 in the App Store music category

April 21 — Suno, the AI music generation platform, reaches first place in the App Store music category. CEO Mikey Shulman announces: "The future of music is one where everyone enjoys creating."

🔗 Tweet @suno

Anthropic Economic Index Survey

April 22 — Anthropic launches the Anthropic Economic Index Survey, a monthly survey conducted via Anthropic Interviewer with a random sample of Claude users. The goal is to collect qualitative data on AI's economic impact: delegated tasks, productivity gains, role changes. The results will feed future Anthropic Economic Index reports.

🔗 Survey announcement

Anthropic — MCP agents in production: the numbers

April 22 — A technical article from Anthropic documents the benefits of MCP for production agents: MCP SDKs exceed 300 million downloads per month, tool search reduces tool-definition tokens by 85%, and programmatic tool calling cuts token usage by 37% on complex multi-step workflows.

🔗 MCP production agents blog

OpenAI — WebSockets in the Responses API: 40% latency gain

April 22 — An OpenAI retrospective article explains how WebSocket mode in the Responses API reduces agent-loop latency by 40%. The persistent connection keeps an in-memory cache of prior response state, avoiding reprocessing of the full history on each call. Already in production: Codex, Vercel AI SDK, Cline (+39%), Cursor (+30%).
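The gain comes from not resending and reprocessing the whole conversation on every call. A toy accounting model (my illustration, not OpenAI's implementation) makes the difference concrete: a stateless call pays for the full history each time, while a stateful connection with cached prior state pays only for the new delta.

```python
def tokens_processed(turns, stateful):
    """Count prompt tokens processed across an agent loop.

    turns: token length of each new message. A stateless API call
    reprocesses the full history every time; a stateful connection
    that caches prior response state only processes the new delta.
    """
    total, history = 0, 0
    for t in turns:
        history += t
        total += t if stateful else history  # delta vs full history
    return total
```

For a 10-turn loop of 100-token messages, the stateless path processes 5,500 tokens against 1,000 for the stateful one; the gap grows quadratically with the number of turns.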

🔗 WebSockets article

Perplexity Research — Training retrieval-augmented models

April 22 — Perplexity publishes research on its SFT + RL pipeline (Supervised Fine-Tuning + Reinforcement Learning) to improve search answer quality. Key result: post-trained Qwen models reach GPT-level factuality at lower cost.

🔗 Perplexity Research


What this means

April 23, 2026 outlines two converging trends. On one side, GPT-5.5 confirms that OpenAI has reclaimed the lead on agentic benchmarks (Terminal-Bench, ARC-AGI-2, OSWorld) after several months in which Claude Opus 4.7 dominated. The gap remains tight on SWE-Bench Pro, where Anthropic keeps the advantage, a sign that both labs agree on the same priority use cases.

On the other side, the day marks entry into the era of persistent agents with memory: OpenAI Workspace Agents, Anthropic Managed Agents Memory, and Kimi K2.6 Agent Swarm arrive simultaneously with different approaches (Slack integration, filesystem-based memory, a swarm of sub-agents) but the same goal: for the agent to remember, learn, and act without constant supervision. Rakuten's figures (-97% errors, -27% cost) provide a first industrial measure of the impact.

GitHub Copilot continues its strategy of deep integration into GitHub.com (PR chat, agent sessions from issues, structured stack traces) while opening up externally via BYOK. The BYOK VS Code GA signals that Copilot is positioning itself as much an interface as a model.


Sources

This document was translated from the French version into English using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator