MiniMax M2.5 reaches 80% on SWE-Bench in open-source, Kling 3.0 transforms AI video, Perplexity launches Model Council

MiniMax releases M2.5, an open-source frontier model reaching 80.2% on SWE-Bench Verified. Kling launches its 3.0 model with 1080p video and realistic dialogue. On the research side, Perplexity deploys Model Council to run three models simultaneously, and runs Deep Research on Claude Opus 4.6. Mistral announces its biggest global hackathon with $200K in prizes.


MiniMax M2.5 — open-source frontier model

February 12 — MiniMax announces M2.5, an open-source frontier model designed for real-world productivity. The model achieves state-of-the-art performance in four critical areas: coding, web search, agentic tool calls, and office work.

Benchmark | Score | Category
SWE-Bench Verified | 80.2% | Real bug fixing
BrowseComp | 76.3% | Web search and navigation
BFCL | 76.8% | Agentic tool calls
Office Work | Optimized | Document productivity

The 80.2% score on SWE-Bench Verified places M2.5 among the top-scoring coding models, open or proprietary. On BrowseComp, OpenAI’s web navigation benchmark, it reaches 76.3%, a sign of solid autonomous search capability.

MiniMax claims 37% faster execution on complex tasks than competing models, at a cost of $1 per hour when generating 100 tokens per second. The stated goal: to make scaling long-horizon agents economically viable.
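
To put the quoted price in more familiar terms, the short Python sketch below converts it into a cost per million tokens. It assumes the model generates continuously at the quoted 100 tokens per second for the full hour and that the $1 covers only those generated tokens. Neither assumption is confirmed by MiniMax, and the function name is purely illustrative.

```python
# Back-of-the-envelope conversion of the quoted hourly price into a
# per-million-token cost. Assumes continuous generation at a constant rate.

PRICE_PER_HOUR_USD = 1.0   # hourly price quoted in the announcement
TOKENS_PER_SECOND = 100    # throughput quoted in the announcement


def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Return the USD cost of one million tokens at a constant generation rate."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


tokens_per_hour = TOKENS_PER_SECOND * 3600
cost = cost_per_million_tokens(PRICE_PER_HOUR_USD, TOKENS_PER_SECOND)

print(f"Tokens per hour: {tokens_per_hour}")        # 360000
print(f"Cost per million tokens: ${cost:.2f}")      # $2.78
```

Under those assumptions, the quoted rate works out to 360,000 tokens per hour, or roughly $2.78 per million tokens.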

The model is available via MiniMax Agent (agent.minimax.io) and the developer API (platform.minimax.io). As an open-source frontier model, M2.5 positions itself directly against leading proprietary models.

🔗 MiniMax M2.5 Announcement


MiniMax Forge — RL framework for production agents

February 12 — Alongside M2.5, MiniMax releases Forge, a scalable reinforcement learning (RL) framework and algorithm for training production AI agents.

Forge addresses a recurring problem in agent training: the instability of learning at scale. The framework offers an optimized approach for agent reward modeling, targeting ML developers and researchers deploying autonomous agents.

The dual announcement of M2.5 + Forge signals MiniMax’s ambition to offer a complete stack for AI agents: frontier model + training framework.

🔗 Forge on MiniMax News


Kling 3.0 — “Everyone a Director”

February 1 — Kling AI launches its 3.0 model, a major update to its video generation engine positioned around the concept “Everyone a Director”. The model aims to make cinematic creation accessible without technical expertise.

Main improvements focus on visual quality and realism of human interactions:

Capability | Detail
Resolution | Native 1080p
Dialogue | Realistic facial expressions and gestures
Consistency | Visual style maintained over long sequences
Flexibility | From simple prompt to full cinematic storyboard

Feedback from the creative community is positive, especially on dialogue realism and the ability to produce scenes with convincing human interactions, historically a weak point of AI video models.

🔗 Kling 3.0 Announcement


Perplexity launches Model Council

February 5 — Perplexity deploys Model Council, a feature that executes the same query on three frontier models simultaneously and produces a single synthesized answer.

Instead of manually switching between models, Model Council runs the query on Claude Opus 4.6, GPT 5.2, and Gemini 3.0 in parallel. A synthesizer model analyzes the results, resolves conflicts between answers, and shows where models converge or diverge.
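
The fan-out and synthesis pattern described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Perplexity's implementation: the model names, the canned answers, and the `ask_model`, `synthesize`, and `council` functions are all hypothetical, and the synthesizer here is a simple majority vote rather than a language model.

```python
import asyncio
from collections import Counter

# Hypothetical canned answers standing in for real model API calls.
CANNED_ANSWERS = {"model_a": "Paris", "model_b": "Paris", "model_c": "Lyon"}


async def ask_model(name: str, query: str) -> tuple[str, str]:
    """Pretend to query one model; a real version would make a network call."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound call would
    return name, CANNED_ANSWERS[name]


def synthesize(answers: dict[str, str]) -> dict:
    """Pick the majority answer and report which models agree or dissent."""
    top_answer, votes = Counter(answers.values()).most_common(1)[0]
    return {
        "answer": top_answer,
        "agree": sorted(m for m, a in answers.items() if a == top_answer),
        "dissent": sorted(m for m, a in answers.items() if a != top_answer),
        "unanimous": votes == len(answers),
    }


async def council(query: str, models: list[str]) -> dict:
    """Run the same query on every model concurrently, then merge the results."""
    results = await asyncio.gather(*(ask_model(m, query) for m in models))
    return synthesize(dict(results))


result = asyncio.run(
    council("What is the capital of France?", ["model_a", "model_b", "model_c"])
)
print(result)
```

In this toy run, two of the three stand-in models agree, so the sketch returns their answer and lists the third as dissenting. A production synthesizer would need to compare free-form answers rather than exact strings.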

Use Case | Detail
Investment | Balanced market perspectives
Complex Decisions | Business strategy, major purchases
Brainstorming | Diversified creative ideas
Verification | Validate information with increased confidence

The feature is available immediately on the web for Perplexity Max subscribers. The mobile version is in development.

🔗 Introducing Model Council


Perplexity Deep Research moves to Opus 4.6

February 9 — Perplexity announces that Deep Research now runs on Claude Opus 4.6, improving state-of-the-art results on internal and external benchmarks. The upgrade strengthens reasoning capabilities in deep research.

The feature is available immediately for Max users, with a progressive rollout to Pro users.

🔗 Deep Research Opus 4.6 Announcement


Perplexity releases DRACO Benchmark as open-source

February 4 — Perplexity releases DRACO, an open-source benchmark designed to evaluate deep research tools. The rubrics and full methodology are publicly available.

Perplexity reports that, on DRACO, its Deep Research tool achieves state-of-the-art performance, surpassing other deep research tools in accuracy and reliability.

🔗 DRACO Announcement


Mistral announces its biggest hackathon — $200K in prizes

February 10 — Mistral AI announces its biggest global hackathon to date, scheduled for February 28 to March 1, 2026.

Detail | Information
Format | 48 hours
Locations | Paris, London, New York, San Francisco, Tokyo, Singapore, Sydney + online
Prizes | $200K in rewards
Partners | NVIDIA, AWS, Weights & Biases, Hugging Face
Special Prizes | ElevenLabs, Hugging Face

The event takes place simultaneously in 8 cities and online. The list of partners (NVIDIA, AWS, Weights & Biases, Hugging Face) signals confidence from major AI ecosystem players in the Mistral platform.

🔗 Mistral Hackathon Announcement


Cohere signs Magnus Carlsen as ambassador

February 13 — Cohere announces a partnership with Magnus Carlsen, five-time World Chess Champion and world No. 1 player, as global brand ambassador.

Carlsen will participate in visibility campaigns, thought leadership initiatives, and high-profile Cohere events. The partnership aims to illustrate the parallels between chess strategy and Cohere’s approach to enterprise AI: focus on fundamentals, anticipation, and sustainable advantages.

🔗 Cohere + Magnus Carlsen Announcement


In brief

February 12: Runway launches Story Panels, a new workflow for creating full films or ads from a single image, with character, location, and style consistency.

🔗 Runway Story Panels

February 12-13: Mooncake, a PyTorch memory allocator co-developed by Moonshot AI (Kimi) and Tsinghua University, joins the PyTorch ecosystem. The tool reduces peak memory use and fragmentation, which matters for long-context LLM deployment.

🔗 Mooncake Announcement

February 9: Ideogram highlights its natural-language image editing, which lets users modify generated images with simple text instructions.

January 30: Perplexity integrates Kimi K2.5, Moonshot AI’s open-source reasoning model, for its Pro and Max subscribers. Inference runs on Perplexity’s own infrastructure in the US.

February 4: MiniMax and Hyperbond Studio announce a partnership to develop conversational AI companions, starting with “Call Me Sensei”, using MiniMax LLM and agent APIs.


What this means

The first half of February 2026 confirms several underlying trends. MiniMax M2.5 shows that a less publicized player can release an open-source model rivaling the leaders on coding benchmarks: 80.2% on SWE-Bench Verified is a remarkable score for an open model. With Forge complementing it, MiniMax offers a complete agent stack.

Perplexity accelerates its differentiation with Model Council, a pragmatic approach acknowledging that no single model dominates all use cases. Integrating Opus 4.6 into Deep Research and open-sourcing DRACO reinforce the platform’s transparency and credibility.

Kling 3.0 marks an advance in video generation with realistic dialogue, a step towards accessible cinematic production tools. On the community side, the $200K Mistral hackathon in 8 cities shows the maturity of the European open-source ecosystem.

