---
title: "Kimi K2.6"
url: "https://handyai.news/modeldrop/kimi-k26"
published: "2026-04-22T14:33:45.000Z"
section: "Model Drop"
source: "https://handyai.substack.com/p/model-drop-kimi-k26"
description: "Model: Kimi K2.6 (kimi-k2.6)Model type: Text + vision, with native image and video inputShip date: April 20, 2026Maker: Moonshot AI (Beijing)Pricing: $0.60 /…"
---

# Kimi K2.6

*Published April 22, 2026 · Model Drop*

![](https://substackcdn.com/image/fetch/$s_!5kr8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa74f6350-80b7-43c0-8565-d7b1ca975140_1456x1048.png)

**Model**: Kimi K2.6 (`kimi-k2.6`)

**Model type**: Text + vision, with native image and video input

**Ship date**: April 20, 2026

**Maker**: [Moonshot AI](https://www.moonshot.ai/) (Beijing)

**Pricing**: $0.60 / $2.50 per million input / output tokens on the [Moonshot API](https://platform.moonshot.ai/). $0.60 / $2.80 on [OpenRouter](https://openrouter.ai/moonshotai). Free weights on Hugging Face for self-hosting.

**Available on**: [Kimi.com](https://www.kimi.com/), the Kimi App, [Kimi API](https://platform.moonshot.ai/), [Kimi Code](https://www.kimi.com/code), [Hugging Face](https://huggingface.co/moonshotai) (open weights), [OpenRouter](https://openrouter.ai/moonshotai), and [Vercel AI Gateway](https://vercel.com/ai-gateway)

**Headline benchmarks**: SWE-Bench Pro 58.6% (leads GPT-5.4 and Claude Opus 4.6), HLE-Full with tools 54.0% (leads every model Moonshot tested against), BrowseComp 83.2% (with Agent Swarm: 86.3%), DeepSearchQA F1 92.5%, Terminal-Bench 2.0 (Terminus-2) 66.7%, SWE-Bench Verified 80.2%.

**Other info**: 256K context window. Mixture-of-experts: 1 trillion total parameters, 32B active per token, 384 experts (8 selected + 1 shared), 61 transformer layers, Multi-head Latent Attention, SwiGLU activation, 160K vocab, 15.5T training tokens. Knowledge cutoff April 2025. Agent Swarm scales to 300 concurrent sub-agents across 4,000 coordinated steps (up from 100 / 1,500 on K2.5). License: Modified MIT (free commercial use; visible “Kimi K2.6” credit required on products with 100M+ MAU or $20M+/month revenue).

**More details**: [Kimi K2.6 tech blog](https://www.kimi.com/blog/kimi-k2-6)

## What shipped

Moonshot AI dropped Kimi K2.6 yesterday, as an open-weight successor to K2.5 aimed squarely at long-horizon coding, agent swarms, and autonomous execution. It’s a mixture-of-experts model (at the same 1T / 32B-active parameter budget as K2.5), with a 256K context window, native multimodal input including video, and a Modified MIT license that lets you use it commercially.

Moonshot claims frontier-grade coding and agent performance at roughly 88% less than [Claude Opus 4.7](https://handyai.substack.com/p/model-drop-claude-opus-47). The headline numbers support the framing on specific benchmarks. SWE-Bench Pro at 58.6% beats GPT-5.4 (57.7%) and Opus 4.6 (53.4%). Humanity’s Last Exam with tools at 54.0% leads every frontier model Moonshot compared against. And, Moonshot shipped workload proofs that are hard to fake: a 13-hour autonomous rewrite of `exchange-core` (8-year-old open-source financial matching engine) that produced a 185% throughput gain across 4,000+ lines of code and 1,000+ tool calls, plus a 12-hour port of Qwen 0.8B inference to Zig on a Mac.

Math (AIME 2026, HMMT), general reasoning (HLE without tools), and vision (MMMU-Pro, MathVision) still trail the closed frontier by 3-6 points.

## What’s new

K2.6 is an iteration on the K2 MoE family with a handful of capabilities that don’t have clean analogues in the closed frontier.

- Agent Swarm, scaled out. K2.6 can orchestrate up to 300 concurrent sub-agents across 4,000 steps, tripling K2.5’s 100-agent / 1,500-step ceiling. This is the closest thing the open ecosystem has to a “manager agent plus specialist workforce” primitive.
- Sustained autonomous execution. Moonshot shipped a 5-day continuous-ops agent trace (monitoring, incident response, scheduled tasks) alongside the 12-hour Zig port and 13-hour `exchange-core` refactor.
- Native multimodal input, now including video. K2 Thinking was text-only. K2.5 added vision. K2.6 adds video input at the same parameter budget.
- Claw Groups (research preview). A new orchestration layer where humans and agents running on different devices, different models, and different vendor stacks operate in a shared space. K2.6 acts as the coordinator, matches tasks to agents by skill profile, and reassigns when an agent stalls.
- Skills from documents. Upload a PDF, a spreadsheet, or a slide deck and K2.6 extracts the structural and stylistic DNA as a reusable “Skill.” The McKinsey-deck reproduction is the obvious demo, the less obvious use is reproducing a regulator’s filing format or a brand deck.

## How and where to use it

Where it runs, what it actually does well, and where you’ll regret reaching for it.

- Where it’s available: Kimi.com and the Kimi App for chat
- Kimi Code for the coding agent in terminal and IDE
- Moonshot API (OpenAI-SDK compatible, one-line base URL swap)
- Hugging Face for open weights, served via vLLM or SGLang
- OpenRouter for multi-provider routing

**What it’s good at**:

- Long-horizon coding across Rust, Go, Python, and front-end
- Multi-file refactors on large codebases
- Agent orchestration where you actually want 100+ parallel sub-agents
- Tool-heavy browsing and deep research
- Workloads where the cost-per-token ratio dominates the decision and you need near-Opus-class output at a fraction of the price

**What it’s bad at / shouldn’t be used for**:

- Anything where mathematical correctness is load-bearing
- Complex tool scheduling
- Vision-heavy workloads
- Regulated workloads where a Chinese-jurisdiction model is a non-starter regardless of capability
- Anything where the K2.5 family’s documented hallucination tendency is a dealbreaker (Moonshot hasn’t published a K2.6 system card yet and nothing in the public materials claims that tendency has been fixed)

## First impressions

#### The positives

Clement Delangue at Hugging Face framed K2.6 as the standout open-source model at launch. Simon Paxton’s writeup captured where that framing actually lands:

> “Kimi K2.6 sets a new bar for open-source. It excels on coding tasks at a level comparable to leading closed source models... In early testing, it sustains long multi-step sessions with impressive stability, far beyond typical models.”

The single most-cited community signal: the `exchange-core` rewrite demo. Thirteen hours of unsupervised work, 1,000+ tool calls, 4,000+ lines of code, 185% throughput gain on an 8-year-old matching engine that was already operating near its performance limits. Described by [Simon Paxton at dev.to](https://dev.to/simon_paxton/kimi-k26-rewrote-legacy-code-for-185-more-throughput-1580) as the kind of workload proof that distinguishes “actual long-horizon work” from “benchmark wins.”

The [ComputeLeap cost analysis](https://dev.to/max_quimby/kimi-k26-vs-claude-opus-47-the-88-cost-advantage-2916) boiled the structural case down to a line every procurement team will run with:

> “Kimi K2.6, the latest open-weight model from Beijing-based Moonshot AI, runs at $0.60 per million input tokens on the official API. Claude Opus 4.7, Anthropic’s frontier model, costs $5.00 per million input tokens. That’s an 8.3× difference — or roughly 88% cheaper.”

Eight-times-cheaper with OpenAI-compatible SDK support means the switching cost for an A/B is a one-line base URL change.

#### The negatives

Hacker News user nikcub posted the honest capability summary from someone with no skin in the game:

> “Below sonnet and opus 4.0 on capability... better than gemini 2.5 pro on tool calling.”

That’s the working mental model most independent reviewers arrived at: K2.6 is not the best model available, it’s a price-for-capability tradeoff that works for specific workloads and breaks down on others. The same HN thread flagged a second concern, that K2.6 “does only slightly better than Kimi K2.5” on day-to-day work, and “struggles with domain-specific tasks.”

[Blockchain.news](https://blockchain.news/ainews/kimi-k2-6-open-weights-model-vs-claude-opus-4-6-latest-benchmark-analysis-real-world-gaps-and-6-business-takeaways) kept returning to the same gap every independent reviewer is naming:

> “Open-weights models underperform in real-world usage compared with closed models such as Claude Opus 4.6.”

Moonshot’s vendor benchmark table shows K2.6 winning on several agentic metrics. Third-party evaluations, where they exist, still put Claude Opus ahead on sustained multi-step reliability.

On safety, the [independent evaluation of K2.5](https://arxiv.org/abs/2604.03121v1) documented significantly fewer refusals on CBRNE-adjacent prompts than GPT-5.2 and Claude Opus 4.5, plus elevated compliance on disinformation and copyright-infringement requests, plus political bias in Chinese-language outputs. Moonshot has not published a K2.6 system card. Until an independent red team retests on K2.6, the working assumption should be that the safety profile has not meaningfully changed.

## Jake’s take

From the K2-family, K2.6 is the first open-weights release where the price-per-capability math starts hurting the closed frontier in an obvious way. Sixty cents in and two-fifty out against Opus 4.7’s five dollars and twenty-five is significant (especially as frontier labs continue to raise prices across the board).

For the long-horizon coding, K2.6 may eat a lot of volume out of the Claude and GPT-5 tier. If you’re spending five figures a month on Opus for code generation, you owe it to your budget to run the A/B. The OpenAI-compatible endpoint makes the test a one-line change.

The safety profile inherited from K2.5 is real; it’s a standard red-team doc showing the model will help with CBRNE (Chemical, Biological, Radiological, Nuclear, and high-yield Explosives)-adjacent prompts that Claude and GPT-5 refuse. Moonshot’s answer to that has been “it’s open weights, that’s the tradeoff,” which is honest and also a non-answer for anyone running K2.6 inside a regulated workload. Stack that on top of the data-jurisdiction question (Beijing-based lab, nation-state interest in agentic infrastructure, political censorship findings baked into the K2.5 safety paper), the hallucination inheritance community testers keep flagging, and you get a model that is an unambiguously great deal for the right workload and a liability for the wrong one.

The interesting question for the rest of the year is whether Moonshot ships a system card that changes the safety calculus, and whether anyone outside China is willing to trust the answer when they do.


---

Source: https://handyai.news/modeldrop/kimi-k26

Original: https://handyai.substack.com/p/model-drop-kimi-k26
