April 23, 2026 marks a packed day: OpenAI launches GPT-5.5 with 85% on ARC-AGI-2 and an announced API price of $5 per million input tokens, while Anthropic opens persistent memory in beta for its Managed Agents and publishes a post-mortem on Claude Code. At the same time, GitHub Copilot delivers seven updates in three days, Kimi K2.6 deploys a swarm of 300 sub-agents, and SpaceX seals a coding partnership with Cursor.
GPT-5.5: OpenAI's frontier model
April 23: OpenAI launches GPT-5.5, its most powerful model to date, designed for real work and agents. It significantly improves agentic coding, computer use, knowledge work, and scientific research, while preserving GPT-5.4 latency.
Availability and pricing
GPT-5.5 is available immediately for ChatGPT Plus, Pro, Business, and Enterprise subscribers, as well as in Codex. API access is coming "very soon".
| Offering | API Access | Input | Output |
|---|---|---|---|
| GPT-5.5 standard | Soon | $5 / M tokens | $30 / M tokens |
| GPT-5.5 Pro | Soon | $30 / M tokens | $180 / M tokens |
The context window in Codex reaches 400K tokens. A Fast mode (1.5× faster, 2.5× the cost) is available.
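To get a feel for the announced prices, the table above can be turned into a small cost estimator. This is a minimal sketch: `PRICES_PER_MILLION` and `estimate_cost` are illustrative names, not part of any OpenAI SDK, and the figures are only the list prices quoted above (API access is not yet open, so actual billing may differ).

```python
# Announced list prices in USD per million tokens (taken from the table above).
PRICES_PER_MILLION = {
    "standard": {"input": 5.0, "output": 30.0},
    "pro": {"input": 30.0, "output": 180.0},
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    price = PRICES_PER_MILLION[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200K-token prompt with a 10K-token answer on the standard tier.
    print(f"${estimate_cost(200_000, 10_000):.2f}")
```

Because output tokens cost six times more than input tokens on both tiers, long generated answers weigh on the bill far more than long prompts.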
Benchmarks
| Evaluation | GPT-5.5 | GPT-5.4 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 75.1% | 69.4% | 68.5% |
| Expert-SWE (internal) | 73.1% | 68.5% | n/a | n/a |
| SWE-Bench Pro | 58.6% | 57.7% | 64.3% | 54.2% |
| GDPval | 84.9% | 83.0% | 80.3% | 67.3% |
| OSWorld-Verified | 78.7% | 75.0% | 78.0% | n/a |
| ARC-AGI-2 | 85.0% | 73.3% | 75.8% | 77.1% |
| FrontierMath Tier 4 | 35.4% | 27.1% | 22.9% | 16.7% |
| CyberGym | 81.8% | 79.0% | 73.1% | n/a |
| BixBench (bioinformatics) | 80.5% | 74.0% | n/a | n/a |
GPT-5.5 leads on most benchmarks, with one notable exception: SWE-Bench Pro, where Claude Opus 4.7 keeps the edge (64.3% vs 58.6%).
Infrastructure and safety
The model was co-designed with NVIDIA GB200/GB300 NVL72. Codex used GPT-5.5 to optimize its own infrastructure, gaining +20% token generation speed. On the cybersecurity side, GPT-5.5 is classified High in OpenAI's Preparedness Framework (not Critical); the Trusted Access Cyber program is extended to it.
Scientific research
Beyond code, GPT-5.5 helped prove a new theorem on Ramsey numbers (combinatorics), formally verified in Lean. It also analyzed a genomic dataset of 62 samples and 28,000 genes in a few minutes, a task that would have taken a team of researchers months.
"GPT-5.5 is noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use." (Michael Truell, co-founder and CEO of Cursor)
🔗 GPT-5.5 announcement
The wave of persistent agents
Three major announcements converge on April 23 around the persistent agent, capable of acting autonomously over long periods and retaining context from one session to the next.
OpenAI Workspace Agents in ChatGPT
April 22: OpenAI introduces Workspace Agents: shared agents that a team creates once, uses together in ChatGPT or Slack, and improves over time. Powered by Codex in the cloud, they can carry out complex tasks even when the user is offline. Workspace Agents are gradually replacing GPTs, which remain available during the transition.
| Agent type | Function |
|---|---|
| Software verifier | Reviews requests, compares policies, creates IT tickets |
| Product feedback router | Monitors Slack, support, and forums, then produces prioritized tickets |
| Report generator | Extracts data on Friday, creates charts, summary |
| Prospecting agent | Searches leads, scores them, drafts emails, updates CRM |
| Third-party risk manager | Evaluates vendors, produces structured report |
Available in research preview for Business, Enterprise, Edu, and Teachers; free until May 6, 2026, then billed in credits.
According to Ankur Bhatt (AI Engineering, Rippling), what used to take salespeople 5 to 6 hours per week now runs automatically in the background on every opportunity.
🔗 Workspace Agents
Anthropic: Memory for Claude Managed Agents
April 23: Memory for Claude Managed Agents is available in public beta on the Claude Platform. Agents can now learn from one session to the next thanks to a memory layer mounted directly on a file system: the agents use the same bash and code execution capabilities they already employ for agentic tasks.
| Feature | Detail |
|---|---|
| Shareable stores | Multiple agents, different access scopes (read-only / read-write) |
| Concurrent access | No overwriting between parallel sessions |
| Audit log | Which session, which agent, which memory |
| Rollback | To any previous version |
| Exportability | Memories manageable via the API |
Customer results illustrate the concrete impact:
| Customer | Result |
|---|---|
| Rakuten | -97% first-pass errors, -27% cost, -34% latency |
| Wisedocs | +30% document review speed |
| Netflix | Context continuity across sessions without manual updates |
| Ando | Platform memory without dedicated infrastructure |
"Memory in Claude Managed Agents lets us put continuous learning into production at scale. Our agents distill lessons from every session, delivering 97% fewer first-pass errors at 27% lower cost and 34% lower latency." (Yusuke Kaji, General Manager AI for Business, Rakuten)
Claude Code: quality post-mortem and two new versions
Post-mortem and reset of limits
April 23: The Claude Code team published a post-mortem on three quality issues reported over the past month. All are fixed in v2.1.116+. Usage limits have been reset for all subscribers.
"Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we've reset usage limits for all subscribers." (@ClaudeDevs)
v2.1.117 and v2.1.118
| Version | Main features |
|---|---|
| v2.1.118 | Visual Vim mode (v/V) with selection and operators; unified /usage (merges /cost and /stats); custom themes in /theme; hooks invoking MCP tools via type: "mcp_tool"; strict DISABLE_UPDATES; Windows managed settings inheritance via WSL |
| v2.1.117 | Default effort moved to high for Pro/Max on Opus 4.6 and Sonnet 4.6 (was medium); sub-agent fork enabled on external builds; Glob/Grep replaced by embedded bfs/ugrep for faster search; Opus 4.7 session fix (1M context calculated correctly); Bedrock+Opus 4.7 fix with thinking disabled |
New Claude connectors for everyday life
April 23: Anthropic expands its connector catalog to consumer apps. More than 200 connectors for professional tools have been available since July 2025; this update adds 15 everyday services.
| Application | Category |
|---|---|
| AllTrails | Hiking |
| Audible | Audiobooks |
| Booking.com | Travel |
| Instacart | Online grocery |
| Intuit Credit Karma | Finance |
| Intuit TurboTax | Tax |
| Resy | Restaurant reservations |
| Spotify | Music |
| StubHub | Ticketing |
| Taskrabbit | Home services |
| Thumbtack | Local professionals |
| TripAdvisor | Travel |
| Uber | Transportation |
| Uber Eats | Food delivery |
| Viator | Tourist activities |
Claude now automatically suggests relevant connectors based on conversation context. Available on all plans (free included), web, desktop, and mobile (mobile in beta). No paid placement or sponsored response; data from an app is not used to train the models.
GitHub Copilot: seven updates in three days
GitHub Copilot published seven changelog entries between April 22 and April 23.
Chat for pull requests (3 new capabilities)
April 23: Copilot Chat now integrates three capabilities for pull requests, accessible via github.com/copilot or the Copilot button on diffs (public preview):
- PR understanding: comments, changes, commits, and reviews included as context
- PR review: structured review on demand
- PR summary: concise summary of changes
🔗 Copilot Chat PR improvements
Controllable agent sessions from issues and projects
April 23: The cloud agent can now be controlled directly from GitHub issues and project boards: session indicator in the issue header, side progress panel, sessions enabled by default in all project views.
🔗 Agent sessions from issues
Structured debugging of stack traces on the web
April 23: Copilot Chat on github.com now guides stack trace analysis in six structured steps: what failed, why, root cause, evidence from code, confidence level, and next checks.
BYOK VS Code available (GA)
April 22: Bring Your Own Key is generally available for Copilot Business and Enterprise users in VS Code. Anthropic, Gemini, OpenAI, OpenRouter, and Azure are supported, as well as local models via Ollama and Foundry Local. Billing is direct through the chosen provider, outside Copilot quotas.
🔗 BYOK VS Code GA
C++ Language Server in public preview for Copilot CLI
April 22: The Microsoft C++ Language Server (Visual Studio/VS Code IntelliSense engine) is available in public preview for Copilot CLI. It provides precise semantic data (symbol definitions, references, call hierarchies, types) instead of iterative grep search. Requirements: Copilot CLI authentication and a compile_commands.json file.
🔗 C++ Language Server
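For readers unfamiliar with the second requirement: compile_commands.json is the Clang-style JSON compilation database, a list of entries with `directory`, `arguments` (or `command`), and `file` keys. Build systems usually generate it (CMake does so with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`). The sketch below writes a minimal one by hand; the `write_compile_commands` helper, the compiler flags, and the source paths are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path


def write_compile_commands(project_dir: Path, sources: list[str]) -> Path:
    """Write a minimal compile_commands.json for the given source files."""
    entries = [
        {
            "directory": str(project_dir),
            # Hypothetical compiler invocation; adjust flags to the real build.
            "arguments": ["c++", "-std=c++20", "-Iinclude", "-c", src],
            "file": src,
        }
        for src in sources
    ]
    out = project_dir / "compile_commands.json"
    out.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_compile_commands(Path(tmp), ["src/main.cpp", "src/util.cpp"])
        print(path.read_text(encoding="utf-8"))
```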
New Business self-serve signups paused
April 22: GitHub is pausing new self-serve signups for Copilot Business on GitHub Free and GitHub Team plans. Existing customers are not affected.
🔗 Pause Business self-serve
used_copilot_cloud_agent field in API metrics
April 23: Following the rebrand from "coding agent" to "cloud agent", the metrics API adds the used_copilot_cloud_agent field in user reports (1-day and 28-day rolling windows). The old used_copilot_coding_agent field is maintained until August 1, 2026.
🔗 Cloud agent metrics
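Dashboards that consume these reports may want to handle both field names during the transition. The following is a minimal sketch, assuming each user report arrives as a flat mapping with boolean-like values; that record shape and the `used_cloud_agent` helper are assumptions, and only the two field names come from the changelog.

```python
from typing import Any, Mapping

NEW_FIELD = "used_copilot_cloud_agent"
OLD_FIELD = "used_copilot_coding_agent"  # maintained until August 1, 2026


def used_cloud_agent(user_report: Mapping[str, Any]) -> bool:
    """Prefer the new field; fall back to the legacy one if it is absent."""
    if NEW_FIELD in user_report:
        return bool(user_report[NEW_FIELD])
    return bool(user_report.get(OLD_FIELD, False))


if __name__ == "__main__":
    # Hypothetical records: one already migrated, one still using the old name.
    print(used_cloud_agent({"used_copilot_cloud_agent": True}))
    print(used_cloud_agent({"used_copilot_coding_agent": True}))
```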
Gemini CLI v0.39.0 and Deep Think for all Ultra subscribers
Gemini CLI v0.39.0
April 23: Google releases Gemini CLI v0.39.0, a stable version labeled "Latest". The highlight is the new /memory inbox command to review and validate skills automatically extracted by the CLI during work sessions.
| Feature | Description |
|---|---|
| `/memory inbox` | Review of automatically extracted skills |
| Unified `invoke_subagent` | Refactored sub-agent tool into a single interface |
| Compact formatting | Better readability in compact mode |
| Plan Mode confirmations | Validation required before skill activation |
| Lightweight startup | Lightweight parent process for faster startup |
| JSONL streaming migration | Recording chat sessions in JSONL |
Added keyboard shortcuts: Ctrl+Backspace for word-by-word deletion (Windows Terminal), Ctrl+Shift+G.
🔗 Gemini CLI v0.39.0
Deep Think open to all Ultra subscribers
April 22: Google opens Deep Think mode (deep reasoning, extended thinking) to all Gemini Ultra subscribers. This mode was previously in limited access; it is now available directly from the Gemini app tools menu (web and mobile).
🔗 Tweet @GeminiApp
Kimi K2.6: swarm of 300 sub-agents and open-weights benchmarks
Agent Swarm: 300 parallel sub-agents
April 23: Moonshot AI launches Kimi K2.6 Agent Swarm: a system capable of deploying 300 sub-agents in parallel over 4,000 steps per run, compared with 100 agents and 1,500 steps for K2.5.
| Capability | K2.5 | K2.6 |
|---|---|---|
| Parallel sub-agents | 100 | 300 |
| Steps per run | 1,500 | 4,000 |
| Output types | Chat text | 100+ real files, 100,000-word reviews, 20,000-line datasets |
The sub-agents combine heterogeneous skills: web search, data analysis, coding, long-form writing, and visual generation. Available on kimi.com/agent-swarm.
🔗 Tweet @Kimi_Moonshot
Benchmarks: #1 open-weights
April 23: Kimi K2.6 reaches the top spot among open-weights models on two benchmarks:
- Design Arena: same performance band as Claude Opus 4.7
- MathArena open (Think mode): ahead of GLM 5.1
🔗 Design Arena
SpaceXAI × Cursor and Grok Imagine
SpaceXAI × Cursor partnership
April 22: SpaceXAI (an entity resulting from the merger of xAI and SpaceX) and Cursor announce a partnership to create "the world's most capable coding and knowledge-work AI". SpaceX provides the Colossus supercomputer (equivalent to one million H100s); Cursor grants it the right to acquire the company later in 2026 for $60 billion, or to pay $10 billion for the collaboration alone.
🔗 Tweet @SpaceX
Grok Imagine: shareable custom templates
April 22: SuperGrok and Premium+ subscribers can now create custom templates in Grok Imagine and share them publicly.
🔗 Tweet @imagine
NVIDIA × Google Cloud Next
April 22: At Google Cloud Next (Las Vegas), NVIDIA and Google Cloud announce several major advances around agentic AI infrastructure.
| Announcement | Detail |
|---|---|
| A5X instances (Vera Rubin NVL72) | Up to 960,000 Rubin GPUs in a multi-site cluster, 10× cheaper per token, 10× more throughput per megawatt |
| Gemini on Google Distributed Cloud | Preview with Blackwell and Blackwell Ultra GPUs, for data sovereignty |
| Confidential VMs Blackwell | First Blackwell confidential computing offer in the public cloud |
| Nemotron 3 Super | Available on the Gemini Enterprise Agent Platform |
| NeMo RL API | Managed large-scale Reinforcement Learning |
🔗 NVIDIA × Google Cloud Blog
Kling AI Video 3.0: native 4K mode
April 23: Kling AI launches native 4K mode in its Video 3.0 series. 4K generation happens in a single click, with no extra upscaling step. Visual consistency (characters, text, styles, lighting) is preserved at native resolution for high-end production. Also available via fal.ai for enterprises.
Kling AI is simultaneously running a 4K Short Film Creative Contest, a global competition inviting creators to submit short films made with the new mode.
🔗 Tweet @Kling_ai
ChatGPT for Clinicians and OpenAI Privacy Filter
ChatGPT for Clinicians + HealthBench Professional
April 22: OpenAI launches ChatGPT for Clinicians, a free version for verified U.S. healthcare professionals (physicians, nurse practitioners, physician assistants, pharmacists). The service includes access to frontier models for complex clinical questions, skills for repetitive workflows (referral letters, prior authorizations), cited clinical search in real time, and automatic generation of continuing medical education (CME) credits. HIPAA processing is available as an option via agreement.
OpenAI also releases HealthBench Professional, an open benchmark evaluating AI on real clinical tasks (700,000+ physician-evaluated answers). GPT-5.4 in ChatGPT for Clinicians outperforms human doctors on this benchmark under unlimited-time conditions with web access.
OpenAI Privacy Filter
April 22: OpenAI releases Privacy Filter, an open-weight model (Apache 2.0) to detect and mask personally identifiable information (PII) in text. The model runs locally (no data sent to a server), supports 128K tokens of context, and reaches an F1 score of 97.43% on the PII-Masking-300k benchmark.
| Feature | Value |
|---|---|
| Architecture | Bidirectional token classifier (constrained Viterbi decoding) |
| Size | 1.5B total parameters, 50M active |
| Context | 128,000 tokens |
| License | Apache 2.0 (Hugging Face + GitHub) |
| F1 | 97.43% on corrected PII-Masking-300k |
PII categories covered: private_person, private_address, private_email, private_phone, private_url, private_date, account_number, secret (passwords and API keys).
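Once a token classifier has labeled spans with these categories, masking is a simple post-processing step. The sketch below shows only that step and does not call the Privacy Filter model: the `mask_spans` helper and the `(start, end, label)` span format are assumptions chosen for illustration, and spans are assumed not to overlap.

```python
from typing import Iterable

Span = tuple[int, int, str]  # (start, end, label), end exclusive


def mask_spans(text: str, spans: Iterable[Span]) -> str:
    """Replace each detected span with a bracketed label placeholder."""
    masked = text
    # Apply replacements from right to left so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


if __name__ == "__main__":
    # Hypothetical detections for a short sentence.
    sentence = "Call 555-0100 or mail a@b.co"
    detections = [(5, 13, "private_phone"), (22, 28, "private_email")]
    print(mask_spans(sentence, detections))
```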
Perplexity and Cohere
Perplexity integrates Kimi K2.6
April 23: Moonshot AI's Kimi K2.6 is now available to all Perplexity Pro and Max subscribers.
🔗 Tweet @perplexity_ai
Cohere: production-ready W4A8 in vLLM
April 22: Cohere announces integration of its W4A8 inference (4-bit quantization for weights, 8-bit for activations) into vLLM. Results on Hopper GPUs versus W4A16: a 58% improvement in time to first token (TTFT) and a 45% improvement in time per output token (TPOT). The integration primarily targets large-scale MoE Command A models in production.
🔗 Cohere W4A8 Blog
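To make the "W4A8" label concrete, here is a toy symmetric quantizer applied at 4 bits to weights and 8 bits to activations. It is a sketch of the general idea only, assuming per-tensor symmetric scaling; it is not Cohere's kernel or the vLLM API, and the sample values are invented.

```python
def quantize(values: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to signed integers of `bits` width."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit, 127 for 8-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit, -128 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    quantized = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights_q, w_scale = quantize([1.0, -1.0, 0.3], bits=4)   # the "W4" part
    acts_q, a_scale = quantize([0.02, -0.5, 0.25], bits=8)    # the "A8" part
    print(weights_q, dequantize(weights_q, w_scale))
    print(acts_q, dequantize(acts_q, a_scale))
```

Compared with W4A16, the weights occupy the same space; the difference is that activations are also held as 8-bit integers, which is what allows lower-precision matrix multiplications.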
Briefs
Suno number 1 in the music App Store
April 21: Suno, the AI music generation platform, reaches first place in the App Store music category. CEO Mikey Shulman announces: "The future of music is one where everyone enjoys creating."
🔗 Tweet @suno
Anthropic Economic Index Survey
April 22: Anthropic launches the Anthropic Economic Index Survey, a monthly survey conducted via Anthropic Interviewer with a random sample of Claude users. The goal is to collect qualitative data on AI's economic impact: delegated tasks, productivity gains, role changes. The results will feed future Anthropic Economic Index reports.
🔗 Survey announcement
Anthropic: MCP agents in production, the numbers
April 22: A technical Anthropic article documents the benefits of MCP for production agents: MCP SDKs exceed 300 million downloads per month, tool search reduces tool definition tokens by 85%, and programmatic tool calling reduces token usage by 37% on complex multi-step workflows.
🔗 MCP production agents blog
OpenAI: WebSockets in the Responses API, 40% latency gain
April 22: OpenAI publishes a retrospective article explaining how WebSocket mode in the Responses API reduces agent loop latency by 40%. The persistent connection keeps an in-memory cache of prior response state, avoiding reprocessing the full history on each call. Already in production: Codex, Vercel AI SDK, Cline (+39%), Cursor (+30%).
🔗 WebSockets article
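The benefit of keeping prior response state in memory can be seen with a toy token count. This sketch only models how much input must be processed per call in an agent loop; the `tokens_processed` function and the turn sizes are illustrative assumptions, and it does not attempt to reproduce the 40% latency figure.

```python
def tokens_processed(turn_sizes: list[int], cached: bool) -> int:
    """Total tokens the server must process across an agent loop (toy model)."""
    total = 0
    history = 0
    for new_tokens in turn_sizes:
        if cached:
            total += new_tokens            # prior state is already in memory
        else:
            total += history + new_tokens  # full history is reprocessed
        history += new_tokens
    return total


if __name__ == "__main__":
    turns = [100, 50, 50]  # hypothetical tokens added at each step of the loop
    print(tokens_processed(turns, cached=False))
    print(tokens_processed(turns, cached=True))
```

Without the cache the work grows roughly quadratically with the number of turns, while with it the work grows linearly, which is why long agent loops benefit most.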
Perplexity Research โ Training retrieval-augmented models
April 22: Perplexity publishes research on its SFT + RL pipeline (Supervised Fine-Tuning + Reinforcement Learning) to improve search answer quality. Key result: post-trained Qwen models reach GPT-level factuality at lower cost.
🔗 Perplexity Research
What this means
April 23, 2026 outlines two converging trends. On one side, GPT-5.5 confirms that OpenAI has reclaimed the lead on agentic benchmarks (Terminal-Bench, ARC-AGI-2, OSWorld) after several months in which Claude Opus 4.7 dominated. The gap remains tight on SWE-Bench Pro, where Anthropic keeps the advantage, a sign that both labs are targeting the same priority use cases.
On the other side, the day marks entry into the era of persistent agents with memory: OpenAI Workspace Agents, Anthropic Managed Agents Memory, and Kimi K2.6 Agent Swarm arrive simultaneously with different approaches (Slack integration, filesystem-based memory, swarm of sub-agents) but the same goal: an agent that remembers, learns, and acts without constant supervision. Rakuten's figures (-97% errors, -27% cost) provide an initial industrial measure of the impact.
GitHub Copilot continues its strategy of deep integration into GitHub.com (PR chat, agent sessions from issues, structured stack traces) while opening up externally via BYOK. The BYOK VS Code GA signals that Copilot is positioning itself as much as an interface as a model.
Sources
- GPT-5.5 - OpenAI
- Tweet OpenAI GPT-5.5
- Workspace Agents - OpenAI
- Tweet Workspace Agents
- ChatGPT for Clinicians
- OpenAI Privacy Filter
- WebSockets API Responses - OpenAI
- Managed Agents Memory - Anthropic
- Everyday life connectors - Anthropic
- Connector tweet - @claudeai
- Claude Code post-mortem - @ClaudeDevs
- Tweet @bcherny
- CHANGELOG Claude Code
- MCP production agents - Anthropic
- Anthropic Economic Index Survey
- Copilot Chat PR improvements
- Copilot agent sessions from issues
- Copilot stack trace debugging
- Copilot BYOK VS Code GA
- Copilot C++ Language Server
- Copilot Business self-serve pause
- Copilot cloud agent metrics
- Gemini CLI v0.39.0
- Gemini Deep Think Ultra - @GeminiApp
- Kimi K2.6 Agent Swarm - @Kimi_Moonshot
- Kimi K2.6 Design Arena
- Kimi K2.6 MathArena
- SpaceXAI × Cursor - @SpaceX
- Grok Imagine templates - @imagine
- NVIDIA × Google Cloud Next
- Kling AI Video 3.0 Mode 4K
- Kling AI 4K Short Film Contest
- Perplexity Kimi K2.6
- Perplexity Research Search-Augmented LMs
- Cohere W4A8 vLLM
- Suno number 1 App Store
This document was translated from the fr version into the en language using the gpt-5.4-mini model. For more information about the translation process, see https://gitlab.com/jls42/ai-powered-markdown-translator