Model directory
Choose the right model stack.
32 models for PR review, Remedy, and Critique Chat. Filter by job, compare benchmark scores, and line up lead, specialist, and Remedy lanes before you ship.
32 models in the review catalog
0.5 cr lowest credit floor (Gemma-4, MiMo, Ling, DeepSeek Flash)
4 Team-only escalation lanes
Benchmarks
Compare published scores on the same suite.
Each tab is one exact benchmark — SWE-bench Verified and SWE-bench Pro are never mixed. Scores come from vendor launch posts and public eval tables in the Critique catalog, not Critique-run tests.
SWE-bench Verified
500 human-validated real GitHub issues; the standard cross-vendor coding ruler.
Source: vendor model cards and launch posts. Harnesses differ between vendors, so treat small gaps between scores with caution.
25 models with data
| Rank | Model | Provider | SWE-bench Verified | Floor |
|---|---|---|---|---|
| #1 | Claude Opus 4.8 | Anthropic | 88.6% | 37 cr |
| #2 | Claude Opus 4.8 (Fast) | Anthropic | 88.6% | 74 cr |
| #3 | GPT-5.5 | OpenAI | 82.6% | 40 cr |
| #4 | DeepSeek V4 Pro | DeepSeek | 80.6% | 1 cr |
| #5 | Gemini 3.1 Pro | Google | 80.6% | 16 cr |
| #6 | Qwen3.7-Max | Alibaba | 80.4% | 6 cr |
| #7 | MiniMax-M2.7 | MiniMax | 80.2% | 1.5 cr |
| #8 | Kimi K2.6 | MoonshotAI | 80.2% | 4 cr |
| #9 | GPT-5.2 Pro | OpenAI | 80% | 180 cr |
| #10 | Claude Sonnet 4.6 | Anthropic | 79.6% | 22 cr |
| #11 | DeepSeek V4 Flash | DeepSeek | 79% | 0.5 cr |
| #12 | MiMo v2.5 Pro | Xiaomi | 78.9% | 1 cr |
| #13 | Qwen3.6 Plus | Alibaba | 78.8% | 2 cr |
| #14 | GPT-5.4 | OpenAI | 78.2% | 20 cr |
| #15 | Gemini 3 Flash | Google | 78% | 4 cr |
| #16 | GLM-5.1 | Z.AI | 77.8% | 3 cr |
| #17 | StepFun-3.7 Flash | StepFun | 74.4% | 1 cr |
| #18 | Ring-2.6-1T | InclusionAI | 74% | 1.5 cr |
| #19 | GLM-5V-Turbo | Z.AI | 73.8% | 3 cr |
| #20 | MiMo v2.5 | Xiaomi | 73.4% | 0.5 cr |
| #21 | KAT Coder Pro V2 | KwaiPilot | 73.4% | 2 cr |
| #22 | GPT-5.4 Mini | OpenAI | 73% | 6 cr |
| #23 | Trinity-Large-Thinking | Arcee AI | 63.2% | 1 cr |
| #24 | Ling-2.6-Flash | InclusionAI | 61.2% | 0.5 cr |
| #25 | GPT-5.4 Nano | OpenAI | 46.5% | 2 cr |
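To compare score against credit floor, the leaderboard above can be filtered by a floor budget. The following is a minimal sketch, not a Critique API: the `LEADERBOARD` subset and the helper `best_within_floor` are hypothetical names introduced for illustration, and the numbers are copied from the listing above.

```python
# Hypothetical subset of the leaderboard: (model, SWE-bench Verified %, floor in cr).
LEADERBOARD = [
    ("Claude Opus 4.8", 88.6, 37.0),
    ("GPT-5.5", 82.6, 40.0),
    ("DeepSeek V4 Pro", 80.6, 1.0),
    ("Qwen3.7-Max", 80.4, 6.0),
    ("MiniMax-M2.7", 80.2, 1.5),
    ("DeepSeek V4 Flash", 79.0, 0.5),
    ("GLM-5V-Turbo", 73.8, 3.0),
]


def best_within_floor(rows, max_floor_cr):
    """Return rows whose credit floor fits the budget, highest score first."""
    affordable = [row for row in rows if row[2] <= max_floor_cr]
    # Sort by score descending, then by floor ascending to break ties.
    return sorted(affordable, key=lambda row: (-row[1], row[2]))


for name, score, floor in best_within_floor(LEADERBOARD, max_floor_cr=2.0):
    print(f"{name}: {score}% at {floor} cr")
```

Because the scores are vendor-published and harnesses differ, a ranking like this is a shortlist for your own trial runs rather than a verdict.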
Start with a stack, not a blank slate.
Pre-configured lead + specialist + Remedy combinations. Pin any layer in the dashboard when you have data.
| Profile | Lead | Specialist | Remedy |
|---|---|---|---|
| Default stack | GLM-5V-Turbo | Qwen3.6 Plus | MiMo v2.5 Pro |
| Cheap volume | DeepSeek V4 Flash | Ling-2.6-Flash | MiMo v2.5 |
| Quality first | GPT-5.4 | Claude Sonnet 4.6 | GLM-5.1 |
| Team escalation | Claude Sonnet 4.6 | GPT-5.5 | Claude Opus 4.8 |
Optimize for
Balanced quality and cost. The best starting point for most engineering teams: a strong lead plus a reliable Remedy.
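A stack is three named layers, and pinning swaps one of them while leaving the others alone. The sketch below models that idea in plain Python. The `Stack` class and `pin` helper are hypothetical and do not reflect how the dashboard stores its configuration; only the model names come from the profile table above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stack:
    """Hypothetical model of a review stack: one model per layer."""
    lead: str
    specialist: str
    remedy: str


# The "Default stack" row from the profile table.
DEFAULT_STACK = Stack(
    lead="GLM-5V-Turbo",
    specialist="Qwen3.6 Plus",
    remedy="MiMo v2.5 Pro",
)


def pin(stack, layer, model):
    """Return a copy of the stack with one layer pinned to a different model."""
    if layer not in ("lead", "specialist", "remedy"):
        raise ValueError(f"unknown layer: {layer}")
    return replace(stack, **{layer: model})


# Swap only the specialist; lead and Remedy stay on the defaults.
custom = pin(DEFAULT_STACK, "specialist", "Claude Sonnet 4.6")
print(custom)
```

Keeping the stack immutable means the default profile is never modified in place, so you can always compare a pinned variant against the original.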
Named routing templates
| Template | Models | Cost per review |
|---|---|---|
| Best Default | GLM-5V-Turbo · Qwen3.6-35B-A3B · MiMo v2.5 Pro | 3–8 cr |
| Cheap Volume | DeepSeek V4 Flash · Ling-2.6-Flash · MiMo v2.5 | 1–3 cr |
| Balanced Engineering | GLM-5.1 · GPT-5.5 · GLM-5V-Turbo | 40–90 cr |
| Frontier Escalation | Claude Sonnet 4.6 · GPT-5.5 · GLM-5.1 | 40–240+ cr |
Full catalog
Click a row for routing guidance, benchmark receipts, and OpenRouter IDs. Select up to three models to compare side by side.
32 models
| Model | OpenRouter ID | Provider | Floor | Context | Speed | Plan |
|---|---|---|---|---|---|---|
| GLM-5V-Turbo | z-ai/glm-5v-turbo | Z.AI | 3 cr | 203K | Fast | Solo + Pro |
| MiMo v2.5 Pro (New) | xiaomi/mimo-v2.5-pro | Xiaomi | 1 cr | 1M | Deep | Solo + Pro |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | 22 cr | 1M | Balanced | Solo + Pro |
| GPT-5.4 | openai/gpt-5.4 | OpenAI | 20 cr | 1M | Balanced | Solo + Pro |
| Qwen3.6 Plus | qwen/qwen3.6-plus | Alibaba | 2 cr | 1M | Balanced | Solo + Pro |
| Gemma-4-31B | google/gemma-4-31b-it | Google | 0.5 cr | 262K | Fast | Solo + Pro |
| MiMo v2.5 | xiaomi/mimo-v2.5 | Xiaomi | 0.5 cr | 1M | Instant | Solo + Pro |
| Ling-2.6-Flash (New) | inclusionai/ling-2.6-flash | InclusionAI | 0.5 cr | 262K | Instant | Solo + Pro |
| DeepSeek V4 Flash (New) | deepseek/deepseek-v4-flash | DeepSeek | 0.5 cr | 1M | Fast | Solo + Pro |
| StepFun-3.7 Flash | stepfun/step-3.7-flash | StepFun | 1 cr | 262K | Instant | Solo + Pro |
| Trinity-Large-Thinking (New) | arcee-ai/trinity-large-thinking | Arcee AI | 1 cr | 262K | Balanced | Solo + Pro |
| DeepSeek V4 Pro (New) | deepseek/deepseek-v4-pro | DeepSeek | 1 cr | 1M | Balanced | Solo + Pro |
| MiniMax-M2.7 | minimax/minimax-m2.7 | MiniMax | 1.5 cr | 197K | Balanced | Solo + Pro |
| Ring-2.6-1T | inclusionai/ring-2.6-1t | InclusionAI | 1.5 cr | 262K | Balanced | Solo + Pro |
| KAT Coder Pro V2 | kwaipilot/kat-coder-pro-v2 | KwaiPilot | 2 cr | 256K | Fast | Solo + Pro |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite-preview | Google | 2 cr | 1M | Instant | Solo + Pro |
| GPT-5.4 Nano | openai/gpt-5.4-nano | OpenAI | 2 cr | 400K | Instant | Solo + Pro |
| GLM-5.1 | z-ai/glm-5.1 | Z.AI | 3 cr | 203K | Balanced | Solo + Pro |
| Grok Build 0.1 | x-ai/grok-build-0.1 | xAI | 3 cr | 256K | Balanced | Solo + Pro |
| Grok 4.3 (New) | x-ai/grok-4.3 | xAI | 3 cr | 1M | Balanced | Solo + Pro |
| Kimi K2.6 (New) | moonshotai/kimi-k2.6 | MoonshotAI | 4 cr | 262K | Balanced | Solo + Pro |
| Gemini 3 Flash | google/gemini-3-flash-preview | Google | 4 cr | 1M | Fast | Solo + Pro |
| GPT-5.4 Mini | openai/gpt-5.4-mini | OpenAI | 6 cr | 400K | Fast | Solo + Pro |
| Qwen3.7-Max | qwen/qwen3.7-max | Alibaba | 6 cr | 262K | Balanced | Solo + Pro |
| Gemini 3.5 Flash | google/gemini-3.5-flash | Google | 10 cr | 1M | Balanced | Solo + Pro |
| Gemini 3.1 Pro | google/gemini-3.1-pro-preview | Google | 16 cr | 1M | Deep | Solo + Pro |
| GPT-5.3 Codex | openai/gpt-5.3-codex | OpenAI | 18 cr | 400K | Deep | Solo + Pro |
| Claude Opus 4.8 | anthropic/claude-opus-4.8 | Anthropic | 37 cr | 1M | Balanced | Team |
| GPT-5.5 (New) | openai/gpt-5.5 | OpenAI | 40 cr | 1M | Balanced | Solo + Pro |
| Claude Opus 4.8 (Fast) | anthropic/claude-opus-4.8-fast | Anthropic | 74 cr | 1M | Balanced | Team |
| GPT-5.2 Pro | openai/gpt-5.2-pro | OpenAI | 180 cr | 400K | Deep | Team |
| GPT-5.5 Pro | openai/gpt-5.5-pro | OpenAI | 237 cr | 1M | Deep | Team |
Credit floor vs. real cost
Floor is not the full review bill
Actual spend is lead + specialist + depth multiplier. Remedy adds execution overhead and possible re-review.
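The exact billing formula is not spelled out here, so the following is only a rough sketch of how a floor-based estimate might be built. It assumes, purely for illustration, that the depth multiplier scales the sum of the lead and specialist floors, and it ignores Remedy overhead and re-review entirely. `FLOORS_CR` and `estimate_review_cr` are hypothetical names; see the pricing examples for real figures.

```python
# Credit floors copied from the catalog for the default stack's lead and specialist.
FLOORS_CR = {
    "GLM-5V-Turbo": 3.0,
    "Qwen3.6 Plus": 2.0,
}


def estimate_review_cr(lead, specialist, depth_multiplier=1.0):
    """Rough estimate only.

    ASSUMPTION: spend = (lead floor + specialist floor) * depth multiplier.
    The real formula may differ; Remedy execution and re-review are excluded.
    """
    if depth_multiplier < 1.0:
        raise ValueError("depth_multiplier is assumed to be at least 1.0")
    return (FLOORS_CR[lead] + FLOORS_CR[specialist]) * depth_multiplier


# A hypothetical 1.5x depth setting on the default lead + specialist pair.
print(estimate_review_cr("GLM-5V-Turbo", "Qwen3.6 Plus", depth_multiplier=1.5))
```

Even under these assumptions, the takeaway matches the point above: a single model's floor understates what a full review costs.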
See pricing examples →
Critique Chat
Two free chat models
Chat does not spend PR review credits. Current roster: Ling-2.6-Flash and DeepSeek V4 Flash.
Open Critique Chat →
Start with the default stack
GLM-5V-Turbo lead, Qwen3.6 Plus specialist, MiMo v2.5 Pro for Remedy — then swap layers when your PR data says so.