This Week in AI
Busy day for major announcements: Anthropic publishes a research paper on LLM safety after 1700 hours of intensive red-teaming. OpenAI launches an enterprise offering dedicated to hospitals with HIPAA support. ElevenLabs unveils Scribe v2, its new speech-to-text transcription model.
Constitutional Classifiers++: Anthropic Strengthens Safety
January 9, 2026 — Anthropic publishes a major new research paper on the robustness of its defenses against jailbreaks.
Context
Last year, Anthropic introduced Constitutional Classifiers, a system that trains classifiers based on a “constitution” specifying which queries Claude should or should not answer. This system reduced jailbreak success rates from 86% to 4.4%, but presented two problems: computationally expensive and prone to refusing legitimate queries.
Three Key Innovations
The new Constitutional Classifiers++ system brings three major improvements:
| Innovation | Description |
|---|---|
| Exchange Classifiers | Evaluate responses in their full conversational context, correcting the vulnerability of previous systems that examined outputs in isolation |
| Two-Stage Cascade | Lightweight classifiers filter all traffic, escalating only suspicious exchanges to more powerful classifiers |
| Linear Probes | Practical application of interpretability: probes observe Claude’s internal activations (“gut instincts”) to detect suspicious queries |
Results
| Metric | Performance |
|---|---|
| Cost Reduction | 40x compared to baseline |
| Compute overhead | Only ~1% |
| Production Refusal Rate | 0.05% |
| False Refusal Drop | 87% |
| Red-teaming | 1700h without universal jailbreak |
After 1,700 cumulative hours of red-teaming, we’ve yet to identify a universal jailbreak (a consistent attack strategy that works across many queries) that works on our new system.
Why It Matters
The system uses Claude’s internal activations as a “gut instinct” that is difficult to trick. When the probe detects a suspicious query, it sends it to a more powerful “exchange” classifier that analyzes both sides of the conversation. This cascade architecture allows for robust protection without the prohibitive computational cost of previous generations.
OpenAI for Healthcare: AI Enters Hospitals
January 8, 2026 — OpenAI launches an enterprise offering dedicated to the healthcare sector, distinct from ChatGPT Health announced the day before.
Difference from ChatGPT Health
| Product | Target | Focus |
|---|---|---|
| ChatGPT Health | General Public | Personal wellness, health app connection |
| OpenAI for Healthcare | Enterprises | Hospitals, clinics, clinical workflows |
ChatGPT for Healthcare
An enterprise version of ChatGPT designed for healthcare organizations:
- Healthcare-Optimized Models: GPT-5.2 with evaluations by 260+ physicians in 60 countries on HealthBench
- Transparent Medical Citations: Responses sourced from peer-reviewed studies, clinical guidelines, with titles, journals, and dates
- Institutional Alignment: SharePoint integration to respect facility protocols and pathways
- Reusable Templates: Discharge summaries, patient instructions, clinical letters, prior authorization support
Launch Partners
| Institution | Specialty |
|---|---|
| Boston Children’s Hospital | Pediatrics |
| Stanford Medicine Children’s Health | Pediatrics |
| Memorial Sloan Kettering | Oncology |
| Cedars-Sinai Medical Center | General Hospital |
| HCA Healthcare | Hospital Network |
| UCSF | Academic Medical Center |
| AdventHealth | Hospital Network |
| Baylor Scott & White Health | Hospital Network |
HIPAA Compliance
| Aspect | Support |
|---|---|
| BAA | Business Associate Agreement with OpenAI |
| Data residency | Data residency options |
| Audit logs | Comprehensive audit logs |
| Encryption | Customer-managed encryption keys |
| Training | Data not used to train models |
Healthcare is among the fastest-growing enterprise markets adopting AI, and hospitals and academic medical centers are already rolling out ChatGPT for Healthcare across their teams.
— OpenAI
ElevenLabs Scribe v2: Next-Gen Transcription
January 9, 2026 — ElevenLabs announces the availability of the Scribe v2 API for developers and enterprises.
🔗 Scribe v2 Documentation | X Thread
Main Capabilities
| Feature | Details |
|---|---|
| Languages | 90+ supported languages |
| Keyterm prompting | Up to 100 terms to bias the model towards specific words |
| Entity detection | 56 entity types (names, card numbers, medical conditions, SSN) |
| Speaker diarization | Up to 48 distinct speakers |
| Timestamps | Word-level precision |
| Audio tagging | Automatic detection of audio events (laughter, applause) |
Realtime Version
Scribe v2 also exists in a real-time version:
| Metric | Performance |
|---|---|
| Latency | ~150ms |
| Languages | 90+ |
| Transcription | Real-time via WebSockets |
Enterprise Compliance
ElevenLabs offers a Business Associate Agreement (BAA) for clients requiring HIPAA compliance, making Scribe v2 usable in medical contexts.
With Scribe v2, developers and enterprises can automate complex audio pipelines, achieve higher accuracy in global content workflows, and scale with full compliance and data residency controls.
What This Means
Anthropic continues to lead on LLM safety. The combination of interpretability + classifier cascade is elegant: using Claude’s “gut instincts” to detect attacks is harder to bypass than explicit rules. The 87% reduction in false refusals is crucial for enterprise adoption.
OpenAI is attacking the B2B healthcare market head-on, one of the most regulated sectors. The complete offering with HIPAA, BAA, and prestigious hospital partnerships positions OpenAI for Healthcare as a serious alternative to legacy solutions. The differentiation from ChatGPT Health (B2C) shows a mature product strategy.
ElevenLabs completes its audio stack with a state-of-the-art STT. The combination of TTS (voice) + STT (transcription) + HIPAA compliance makes it a full-stack solution for enterprise voice applications. Keyterm prompting is particularly useful for technical terms or proper names.