A double launch at the top: Anthropic releases Claude Opus 4.6 with a 1M-token context window and agent teams, while OpenAI responds with GPT-5.3-Codex and an enterprise agent platform. Google pushes Gemini 3 on all fronts, and GitHub finally answers an 8-year-old feature request.
Claude Opus 4.6: SOTA in agentic coding and 1M context
February 5: Anthropic launches Claude Opus 4.6, a major update to its most intelligent model. It improves planning, long-running sessions, and code review, and offers a 1-million-token context window (in beta) for the first time on an Opus model.
| Benchmark | Score | Detail |
|---|---|---|
| Terminal-Bench 2.0 | SOTA | Highest agentic coding score |
| Humanity’s Last Exam | SOTA | Multidisciplinary reasoning |
| GDPval-AA | +144 Elo vs GPT-5.2 | Professional work (finance, legal) |
| BrowseComp | SOTA | Complex information retrieval |
| MRCR v2 (8-needle 1M) | 76% | vs 18.5% for Sonnet 4.5 |
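
To put the +144 Elo figure in perspective, here is a minimal sketch that converts an Elo gap into an expected head-to-head score. It assumes the benchmark uses the conventional 400-point logistic Elo scale, which the announcement summarized here does not confirm; the function name is illustrative.

```python
def elo_expected_score(elo_gap: float) -> float:
    """Expected score of the higher-rated side (a tie counts as half a win).

    Assumes the standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


gap = 144  # Elo lead over GPT-5.2 on GDPval-AA, from the table above
print(f"{elo_expected_score(gap):.1%}")  # prints 69.6%
```

Under that assumption, a 144-point lead corresponds to being preferred in roughly seven comparisons out of ten.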
New API and Product Features
| Feature | Description |
|---|---|
| Agent teams | Multiple Claude Code agents in parallel (research preview) |
| Adaptive thinking | The model chooses when to use deep thinking |
| Effort controls | 4 levels: low, medium, high (default), max |
| Context compaction | Automatic context summarization for long sessions |
| 128k output tokens | Longer outputs in a single request |
| Claude in PowerPoint | Research preview (Max, Team, Enterprise) |
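
The idea behind context compaction can be sketched in a few lines. This is a generic illustration, not Anthropic's implementation: the token estimate (about 4 characters per token), the `compact` function, and the `naive_summary` stub are all assumptions made for the example. In practice the summary would come from a model call.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: about 4 characters per token (assumption).
    return max(1, len(text) // 4)


def compact(messages: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace older messages with one summary once the history exceeds the budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # nothing to compact
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent


def naive_summary(old: List[str]) -> str:
    # Hypothetical placeholder; a real system would ask a model to summarize.
    return f"[summary of {len(old)} earlier messages]"


history = [f"message {i}: " + "x" * 400 for i in range(10)]
compacted = compact(history, budget=500, keep_recent=3, summarize=naive_summary)
print(len(compacted))  # prints 4: one summary plus the 3 most recent messages
```

Keeping the most recent turns verbatim while summarizing the rest is one common trade-off between fidelity and context usage; other strategies are possible.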
Pricing: Unchanged at $5/$25 per million tokens (input/output). Premium pricing applies beyond 200k tokens ($10/$37.50).
Availability: claude.ai, API (claude-opus-4-6), and all major cloud platforms.
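As a rough illustration of what the two pricing tiers mean for a single request, here is a small cost estimator. Treat it as a sketch: the input-side rates ($5 and $10 per million tokens) and the rule that the premium tier applies to the whole request once the prompt exceeds 200k tokens are assumptions to verify against the current price list, and `estimate_cost_usd` is a name made up for this example.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: the premium tier applies to the entire request
    as soon as the prompt exceeds 200k input tokens.
    """
    premium = input_tokens > 200_000
    # Rates in USD per million tokens: (input, output)
    in_rate, out_rate = (10.00, 37.50) if premium else (5.00, 25.00)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(estimate_cost_usd(100_000, 10_000))  # standard tier
print(estimate_cost_usd(500_000, 10_000))  # premium tier (long prompt)
```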
Engineering blogs: Infrastructure noise and C compiler
Anthropic publishes two technical articles on the same day. The first quantifies infrastructure noise in agentic coding benchmarks: on Terminal-Bench 2.0, resource configuration alone can create gaps of 6 percentage points between setups. The second documents how 16 Claude agents working in parallel built a C compiler written in Rust: 100,000 lines of code, capable of compiling the Linux 6.9 kernel on x86, ARM, and RISC-V, produced in ~2,000 Claude Code sessions for ~$20,000.
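One practical reading of the noise finding: a score gap smaller than the configuration-induced spread should not be taken as a real difference. The sketch below applies that rule of thumb. The 6-point band comes from the article as summarized above and the 77.3% figure is the GPT-5.3-Codex score reported later in this roundup; the comparison scores and the function name are hypothetical.

```python
def gap_exceeds_noise(score_a: float, score_b: float, noise_pp: float) -> bool:
    """Return True when the gap between two scores (percentage points) exceeds the noise band."""
    return abs(score_a - score_b) > noise_pp


NOISE_PP = 6.0      # gap attributable to resource configuration alone
CODEX_SCORE = 77.3  # GPT-5.3-Codex on Terminal-Bench 2.0

for other in (72.0, 65.0):  # hypothetical competitor scores, for illustration only
    verdict = "outside" if gap_exceeds_noise(CODEX_SCORE, other, NOISE_PP) else "within"
    print(f"{CODEX_SCORE} vs {other}: gap is {verdict} the noise band")
```

A single threshold is a simplification; a more careful comparison would rerun both models on identical infrastructure and report the variance.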
Opus 4.6 in GitHub Copilot
The same day, Claude Opus 4.6 becomes generally available in GitHub Copilot via Agent HQ, following the public preview announced the day before.
🔗 Opus 4.6 Announcement | Infrastructure noise | Building a C compiler
GPT-5.3-Codex: coding frontier + pro knowledge
February 5: OpenAI launches GPT-5.3-Codex, which combines the coding performance of GPT-5.2-Codex with the reasoning capabilities of GPT-5.2 while running 25% faster.
| Benchmark | Score |
|---|---|
| SWE-Bench Pro (Public) | 56.8% |
| Terminal-Bench 2.0 | 77.3% |
| OSWorld-Verified | 64.7% |
| GDPval (wins or ties) | 70.9% |
| Cybersecurity CTF | 77.6% |
| SWE-Lancer IC Diamond | 81.4% |
According to OpenAI, GPT-5.3-Codex is its first model to have contributed to its own creation: the team used preliminary versions to debug training, manage deployment, and analyze test results.
Beyond code
The model produces presentations, spreadsheets, data analysis, and handles productivity tasks in a desktop environment (64.7% on OSWorld-Verified).
Cybersecurity: high capability
GPT-5.3-Codex is the first model rated "high capability" for cybersecurity under OpenAI's preparedness framework, and the first that OpenAI has trained specifically to identify software vulnerabilities.
🔗 GPT-5.3-Codex Blog | System Card
OpenAI: Frontier, MCP Apps, security and biotech
OpenAI Frontier: enterprise agent platform
February 5: OpenAI launches Frontier, a platform for building, deploying, and managing AI agents in the enterprise. Agents receive shared business context and permissions, and they learn from experience.
| Aspect | Detail |
|---|---|
| First customers | HP, Intuit, Oracle, State Farm, Thermo Fisher, Uber |
| AI Partners | Abridge, Clay, Ambience, Decagon, Harvey, Sierra |
| Approach | Forward Deployed Engineers (FDE) integrated into teams |
| Standards | Open standards, compatible with existing systems |
ChatGPT: MCP Apps in beta
February 5: MCP Apps arrive in beta for ChatGPT Business, Enterprise, and Edu. New partner connectors include Amplitude, Fireflies, Vercel, Monday.com, Stripe, Hex, Egnyte, and others. Organizations can build custom MCP apps via developer mode.
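For readers new to MCP (Model Context Protocol): it is built on JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The sketch below only shows the shape of such a message. The tool name `search_tickets` and its arguments are hypothetical, and nothing here describes how ChatGPT's developer mode wires an app together.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool exposed by a custom MCP server
print(make_tool_call("search_tickets", {"query": "refund", "limit": 5}))
```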
Trusted Access for Cyber
February 5: OpenAI launches Trusted Access for Cyber, a pilot program that grants trust-based access to advanced cyber capabilities. Users can verify their identity at chatgpt.com/cyber. OpenAI also allocates $10 million in API credits to cyber defense through its Cybersecurity Grant Program.
GPT-5 reduces protein synthesis cost
February 5: In partnership with Ginkgo Bioworks, OpenAI connects GPT-5 to a robotic lab to optimize cell-free protein synthesis (CFPS). The result is a 40% reduction in production cost and a 57% improvement in reagent cost, after testing 36,000 compositions on 580 automated plates over six rounds of experimentation.
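A quick back-of-the-envelope calculation gives a sense of the experiment's scale. These are simple averages derived from the figures above; how compositions and plates were actually distributed across rounds is not stated.

```python
compositions = 36_000
plates = 580
rounds = 6

per_plate = compositions / plates   # average compositions per plate
plates_per_round = plates / rounds  # average plates per round
per_round = compositions / rounds   # average compositions per round

print(f"{per_plate:.1f} compositions per plate")   # 62.1
print(f"{plates_per_round:.1f} plates per round")  # 96.7
print(f"{per_round:.0f} compositions per round")   # 6000
```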
🔗 OpenAI Frontier | MCP Apps | Trusted Access for Cyber | GPT-5 proteins
Google: Gemini 3, Super Bowl and NotebookLM
Gemini 3: updates and Super Bowl
February 5-6: Google pushes Gemini 3 on all fronts. Gemini 3 Flash, launched recently, offers Pro-level reasoning at Flash speed, scoring 90.4% on GPQA Diamond and 33.7% on Humanity's Last Exam (without tools). Gemini 3 becomes the default model for AI Overviews in Google Search.
Google is also preparing a 60-second Gemini ad for Super Bowl LX (February 8). The "New Home" spot shows a child preparing for a move with the help of Gemini, illustrating search capabilities in Google Photos and image generation.
NotebookLM: Infographics and Slide Decks
NotebookLM, now built on Gemini 3, rolls out Infographics and Slide Decks for Free and Pro users. Slide Decks are already the second most popular Studio output. Ultra users can remove the watermark.
🔗 Gemini 3 Flash | Gemini 3 App | NotebookLM Infographics
GitHub: pinned comments on Issues
February 5: GitHub launches pinned comments on Issues. A comment can now be pinned to the top of an issue from its context menu. The feature, requested since 2017, helps highlight decisions, updates, and key next steps in long threads.
What this means
February 5, 2026, will be remembered as a landmark day: Anthropic and OpenAI launch their most advanced coding models on the same day. Claude Opus 4.6 leads on professional-work and information-retrieval benchmarks, while GPT-5.3-Codex excels at terminal coding and computer use. Both vendors claim state-of-the-art (SOTA) results on Terminal-Bench 2.0, which is where Anthropic's article on infrastructure noise becomes relevant: if resource configuration alone can move scores by 6 percentage points, competing claims are hard to compare directly.
Beyond the models, the platform battle is intensifying: OpenAI Frontier targets the enterprise with agents deployed at Oracle and Uber, while Anthropic bets on the developer ecosystem (GitHub, Xcode, Claude Code). Google advances on all fronts with Gemini 3 in Search, Chrome, and NotebookLM, and is using the Super Bowl to bring Gemini to a mainstream audience.
Sources
- Introducing Claude Opus 4.6
- Quantifying infrastructure noise
- Building a C compiler with parallel Claudes
- Introducing GPT-5.3-Codex
- GPT-5.3-Codex System Card
- Introducing OpenAI Frontier
- Introducing apps in ChatGPT
- Trusted Access for Cyber
- GPT-5 lowers protein synthesis cost
- Gemini 3 Flash
- NotebookLM Infographics
- Pinned comments on GitHub Issues