Claude Opus 4.6 and GPT-5.3-Codex: Dual Launch, Gemini 3 Update

A double launch at the top: Anthropic releases Claude Opus 4.6 with a 1M-token context and agent teams, while OpenAI responds with GPT-5.3-Codex and an enterprise platform. Google pushes Gemini 3 on all fronts, and GitHub finally answers an 8-year-old request.

Claude Opus 4.6: SOTA in agentic coding and 1M context

February 5 — Anthropic launches Claude Opus 4.6, a major update to its most intelligent model. The model improves at planning, long-running sessions, and code review, and for the first time an Opus model offers a 1 million token context window (in beta).

Benchmark	Score	Detail
Terminal-Bench 2.0	SOTA	Highest agentic coding score
Humanity’s Last Exam	SOTA	Multidisciplinary reasoning
GDPval-AA	+144 Elo vs GPT-5.2	Professional work (finance, legal)
BrowseComp	SOTA	Complex information retrieval
MRCR v2 (8-needle 1M)	76%	vs 18.5% for Sonnet 4.5
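
For a sense of scale, an Elo gap can be converted into an expected head-to-head score. The sketch below assumes GDPval-AA uses the conventional logistic Elo scale with a 400-point divisor, which the announcement summarized here does not confirm; the function name is illustrative.

```python
def expected_score(elo_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo formula.

    Assumption: the benchmark uses the usual base-10 logistic curve with a
    400-point scale. A different scale would change the result.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


# +144 Elo is the gap reported in the table above.
p = expected_score(144)
print(f"Expected score at +144 Elo: {p:.1%}")
```

Under that assumption, a +144 gap corresponds to an expected score of roughly 70% for the higher-rated model in a pairwise comparison.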

API and Product New Features

Feature	Description
Agent teams	Multiple Claude Code agents in parallel (research preview)
Adaptive thinking	The model chooses when to use deep thinking
Effort controls	4 levels: low, medium, high (default), max
Context compaction	Automatic context summarization for long sessions
128k output tokens	Longer outputs in a single request
Claude in PowerPoint	Research preview (Max, Team, Enterprise)
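
To illustrate the general idea behind context compaction, here is a minimal sketch: once a conversation exceeds a token budget, older messages are replaced by a summary and only the most recent ones are kept verbatim. This is not Anthropic's implementation. The function names, the whitespace-based token count, and the placeholder summarizer are all assumptions made for the example.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def compact(
    messages: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary when the history is over budget."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return list(messages)  # nothing to compact
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent


def naive_summary(older: List[str]) -> str:
    # Hypothetical summarizer; a real system would call a model here.
    return f"[summary of {len(older)} earlier messages]"


history = ["a b c"] * 10  # 10 messages, 3 "tokens" each
compacted = compact(history, budget=10, keep_recent=2, summarize=naive_summary)
print(compacted)
```

Keeping the latest messages intact while summarizing the rest is one common design choice; the trade-off is that details in the summarized portion may be lost.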

Pricing: Unchanged at $5/$25 per million tokens (input/output). Premium pricing beyond 200k tokens ($10/$37.50).
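
The rates above translate into per-request costs as in the rough estimator below. It assumes the 200k threshold is measured on input tokens and that the premium rate then applies to the entire request; the text here does not spell out either point, so treat both as assumptions. The function name is illustrative.

```python
STANDARD = (5.00, 25.00)   # USD per million tokens (input, output)
PREMIUM = (10.00, 37.50)   # USD per million tokens beyond the threshold
THRESHOLD = 200_000        # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: if input_tokens exceeds THRESHOLD, the whole request is
    billed at the premium rates.
    """
    in_rate, out_rate = PREMIUM if input_tokens > THRESHOLD else STANDARD
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(estimate_cost(100_000, 10_000))  # standard rates
print(estimate_cost(300_000, 10_000))  # premium rates
```

With these assumptions, a 100k-in / 10k-out request costs about $0.75, and a 300k-in / 10k-out request about $3.38.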

Availability: claude.ai, API (claude-opus-4-6), and all major cloud platforms.

Engineering blogs: Infrastructure noise and C compiler

Anthropic publishes two technical articles on the same day. The first quantifies infrastructure noise in agentic coding benchmarks: on Terminal-Bench 2.0, resource configuration alone can create gaps of 6 percentage points between setups. The second documents the construction of a C compiler in Rust by 16 Claude agents in parallel: 100,000 lines of code, capable of compiling the Linux 6.9 kernel on x86, ARM, and RISC-V, in ~2,000 Claude Code sessions for ~$20,000.
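
The compiler figures can be put in per-unit terms with some quick arithmetic. All inputs are the approximate values quoted above; the even split of sessions across agents is an assumption made only for illustration.

```python
# Approximate figures from the article (the source marks them with "~").
total_cost_usd = 20_000
sessions = 2_000
lines_of_code = 100_000
agents = 16

cost_per_session = total_cost_usd / sessions
cost_per_line = total_cost_usd / lines_of_code
# Assumption: sessions were spread evenly across the 16 agents.
sessions_per_agent = sessions / agents

print(f"~${cost_per_session:.2f} per session")
print(f"~${cost_per_line:.2f} per line of code")
print(f"~{sessions_per_agent:.0f} sessions per agent")
```

That works out to roughly $10 per session and $0.20 per line of code.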

Opus 4.6 in GitHub Copilot

The same day, Claude Opus 4.6 becomes generally available in GitHub Copilot via Agent HQ, following the public preview announced the day before.

🔗 Opus 4.6 Announcement | Infrastructure noise | Building a C compiler

GPT-5.3-Codex: coding frontier + pro knowledge

February 5 — OpenAI launches GPT-5.3-Codex, which combines the coding performance of GPT-5.2-Codex with the reasoning capabilities of GPT-5.2 while running 25% faster.

Benchmark	Score
SWE-Bench Pro (Public)	56.8%
Terminal-Bench 2.0	77.3%
OSWorld-Verified	64.7%
GDPval (wins or ties)	70.9%
Cybersecurity CTF	77.6%
SWE-Lancer IC Diamond	81.4%

GPT-5.3-Codex is the first model to have contributed to its own creation: the team used preliminary versions to debug training, manage deployment, and analyze test results.

Beyond code

The model produces presentations, spreadsheets, data analysis, and handles productivity tasks in a desktop environment (64.7% on OSWorld-Verified).

Cybersecurity: high capability

GPT-5.3-Codex is the first model rated high capability for cybersecurity under OpenAI’s preparedness framework, and the first specifically trained to identify software vulnerabilities.

🔗 GPT-5.3-Codex Blog | System Card

OpenAI: Frontier, MCP Apps, security and biotech

OpenAI Frontier: enterprise agent platform

February 5 — OpenAI launches Frontier, a platform to develop, deploy, and manage AI agents in the enterprise. Agents receive shared business context, permissions, and learn from experience.

Aspect	Detail
First customers	HP, Intuit, Oracle, State Farm, Thermo Fisher, Uber
AI Partners	Abridge, Clay, Ambience, Decagon, Harvey, Sierra
Approach	Forward Deployed Engineers (FDE) integrated into teams
Standards	Open standards, compatible with existing systems

ChatGPT: MCP Apps in beta

February 5 — MCP Apps arrive in beta in ChatGPT Business, Enterprise, and Edu. New partner connectors: Amplitude, Fireflies, Vercel, Monday.com, Stripe, Hex, Egnyte, and others. Organizations can build custom MCP apps via developer mode.

Trusted Access for Cyber

February 5 — OpenAI launches Trusted Access for Cyber, a trust-based access pilot program for advanced cyber capabilities. Users can verify their identity at chatgpt.com/cyber. $10 million in API credits are allocated to cyber defense via the Cybersecurity Grant Program.

GPT-5 reduces protein synthesis cost

February 5 — In partnership with Ginkgo Bioworks, OpenAI connects GPT-5 to a robotic lab to optimize cell-free protein synthesis (CFPS). Result: 40% reduction in production cost and 57% improvement in reagent cost, after 36,000 compositions tested on 580 automated plates in six rounds of experimentation.

🔗 OpenAI Frontier | MCP Apps | Trusted Access for Cyber | GPT-5 proteins

Google: Gemini 3, Super Bowl and NotebookLM

Gemini 3: updates and Super Bowl

February 5-6 — Google pushes Gemini 3 on all fronts. Gemini 3 Flash, launched recently, offers Pro-level reasoning at Flash speed: 90.4% on GPQA Diamond and 33.7% on Humanity’s Last Exam (without tools). Gemini 3 becomes the default model for AI Overviews in Google Search.

Google is also preparing a 60-second Gemini ad for Super Bowl LX (February 8) — the “New Home” spot shows a child preparing for a move with the help of Gemini, illustrating search capabilities in Google Photos and image generation.

NotebookLM: Infographics and Slide Decks

NotebookLM, now built on Gemini 3, rolls out Infographics and Slide Decks for Free and Pro users. Slide Decks are already the second most popular Studio output. Ultra users can remove the watermark.

🔗 Gemini 3 Flash | Gemini 3 App | NotebookLM Infographics

GitHub: pinned comments on Issues

February 5 — GitHub launches pinned comments on Issues. A comment can now be pinned to the top of an issue from its context menu. The feature, requested since 2017, helps highlight decisions, updates, and key next steps in long threads.

🔗 Changelog

What this means

February 5, 2026 will be remembered as a landmark day: Anthropic and OpenAI simultaneously launch their most advanced coding models. Claude Opus 4.6 dominates professional work and information retrieval benchmarks, while GPT-5.3-Codex excels in terminal coding and computer use. Both models claim SOTA (state of the art) on Terminal-Bench 2.0, which is exactly why Anthropic’s article on infrastructure noise matters: resource configuration alone can move scores by up to 6 percentage points.
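
One way to read competing SOTA claims is to ask whether the gap between two scores exceeds the spread that infrastructure alone can cause. The sketch below treats the 6-point figure as a rough noise band, which is a simplification for illustration and not Anthropic's methodology. The second score is a hypothetical placeholder, not a reported result.

```python
NOISE_BAND_PP = 6.0  # percentage points, from the infrastructure-noise article


def distinguishable(score_a: float, score_b: float,
                    noise_band: float = NOISE_BAND_PP) -> bool:
    """Return True if two scores differ by more than the noise band."""
    return abs(score_a - score_b) > noise_band


gpt_53_codex = 77.3        # Terminal-Bench 2.0 score reported above
hypothetical_other = 74.0  # placeholder value, NOT a reported score

print(distinguishable(gpt_53_codex, hypothetical_other))
```

Under this simplified reading, two models a few points apart on Terminal-Bench 2.0 cannot be ranked with confidence unless they were run on comparable setups.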

Beyond the models, the platform battle is intensifying: OpenAI Frontier targets the enterprise with agents deployed at Oracle and Uber, while Anthropic bets on the developer ecosystem (GitHub, Xcode, Claude Code). Google advances on all fronts with Gemini 3 in Search, Chrome, and NotebookLM, and is using the Super Bowl to anchor Gemini in the mainstream.