
Model directory

Choose the right model stack.

32 models for PR review, Remedy, and Critique Chat. Filter by job, compare benchmark scores, and line up lead, specialist, and Remedy lanes before you ship.

32

Models in the review catalog

0.5 cr

Lowest credit floor (Gemma-4, MiMo, Ling, DeepSeek Flash)

4

Team-only escalation lanes

Benchmarks

Compare published scores on the same suite.

Each tab is one exact benchmark — SWE-bench Verified and SWE-bench Pro are never mixed. Scores come from vendor launch posts and public eval tables in the Critique catalog, not Critique-run tests.

SWE-bench Verified

500 human-validated real GitHub issues; the standard cross-vendor coding yardstick.

Source: Vendor model cards & launch posts · harnesses differ

25 models with data

Rank | Model | Vendor | SWE-bench Verified | Credit floor
#1 | Claude Opus 4.8 | Anthropic | 88.6% | 37 cr
#2 | Claude Opus 4.8 (Fast) | Anthropic | 88.6% | 74 cr
#3 | GPT-5.5 | OpenAI | 82.6% | 40 cr
#4 | DeepSeek V4 Pro | DeepSeek | 80.6% | 1 cr
#5 | Gemini 3.1 Pro | Google | 80.6% | 16 cr
#6 | Qwen3.7-Max | Alibaba | 80.4% | 6 cr
#7 | MiniMax-M2.7 | MiniMax | 80.2% | 1.5 cr
#8 | Kimi K2.6 | MoonshotAI | 80.2% | 4 cr
#9 | GPT-5.2 Pro | OpenAI | 80% | 180 cr
#10 | Claude Sonnet 4.6 | Anthropic | 79.6% | 22 cr
#11 | DeepSeek V4 Flash | DeepSeek | 79% | 0.5 cr
#12 | MiMo v2.5 Pro | Xiaomi | 78.9% | 1 cr
#13 | Qwen3.6 Plus | Alibaba | 78.8% | 2 cr
#14 | GPT-5.4 | OpenAI | 78.2% | 20 cr
#15 | Gemini 3 Flash | Google | 78% | 4 cr
#16 | GLM-5.1 | Z.AI | 77.8% | 3 cr
#17 | StepFun-3.7 Flash | StepFun | 74.4% | 1 cr
#18 | Ring-2.6-1T | InclusionAI | 74% | 1.5 cr
#19 | GLM-5V-Turbo | Z.AI | 73.8% | 3 cr
#20 | MiMo v2.5 | Xiaomi | 73.4% | 0.5 cr
#21 | KAT Coder Pro V2 | KwaiPilot | 73.4% | 2 cr
#22 | GPT-5.4 Mini | OpenAI | 73% | 6 cr
#23 | Trinity-Large-Thinking | Arcee AI | 63.2% | 1 cr
#24 | Ling-2.6-Flash | InclusionAI | 61.2% | 0.5 cr
#25 | GPT-5.4 Nano | OpenAI | 46.5% | 2 cr
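One way to read the leaderboard is to fix a minimum score and then look for the cheapest model that clears it. The snippet below is a minimal sketch of that idea in Python. The `Entry` class and the `cheapest_at_or_above` helper are hypothetical names introduced for illustration, and the data is a subset of the published scores and credit floors listed above, not a Critique API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float  # SWE-bench Verified, percent, as published on this page
    floor: float  # credit floor in cr, as published on this page


# Subset of the leaderboard above; extend with the remaining rows as needed.
LEADERBOARD = [
    Entry("Claude Opus 4.8", 88.6, 37),
    Entry("GPT-5.5", 82.6, 40),
    Entry("DeepSeek V4 Pro", 80.6, 1),
    Entry("Gemini 3.1 Pro", 80.6, 16),
    Entry("Qwen3.7-Max", 80.4, 6),
    Entry("MiniMax-M2.7", 80.2, 1.5),
    Entry("Kimi K2.6", 80.2, 4),
    Entry("Claude Sonnet 4.6", 79.6, 22),
    Entry("DeepSeek V4 Flash", 79.0, 0.5),
]


def cheapest_at_or_above(entries, min_score):
    """Return entries scoring at least min_score, cheapest floor first."""
    qualifying = [e for e in entries if e.score >= min_score]
    # Ties on floor are broken by the higher score.
    return sorted(qualifying, key=lambda e: (e.floor, -e.score))


if __name__ == "__main__":
    for e in cheapest_at_or_above(LEADERBOARD, 80.0):
        print(f"{e.model}: {e.score}% at {e.floor} cr")
```

Because the scores come from vendor posts with differing harnesses, a threshold like this is best treated as a coarse shortlist filter rather than a precise ranking.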

Start with a stack, not a blank slate.

Pre-configured lead + specialist + Remedy combinations. Pin any layer in the dashboard when you have data.

Profile | Lead | Specialist | Remedy
Default stack | GLM-5V-Turbo | Qwen3.6 Plus | MiMo v2.5 Pro
Cheap volume | DeepSeek V4 Flash | Ling-2.6-Flash | MiMo v2.5
Quality first | GPT-5.4 | Claude Sonnet 4.6 | GLM-5.1
Team escalation | Claude Sonnet 4.6 | GPT-5.5 | Claude Opus 4.8
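The four profiles above can be written down as plain data, which makes it easy to compare them by their published credit floors. The following is a rough sketch, not Critique's configuration format: the `STACKS` and `FLOORS` names and the `combined_floor` helper are assumptions made for illustration, and the floor values are copied from the catalog on this page. The combined figure is only useful for comparing stacks with each other; it is not the review bill.

```python
# Credit floors (cr) copied from the catalog on this page.
FLOORS = {
    "GLM-5V-Turbo": 3,
    "Qwen3.6 Plus": 2,
    "MiMo v2.5 Pro": 1,
    "DeepSeek V4 Flash": 0.5,
    "Ling-2.6-Flash": 0.5,
    "MiMo v2.5": 0.5,
    "GPT-5.4": 20,
    "Claude Sonnet 4.6": 22,
    "GLM-5.1": 3,
    "GPT-5.5": 40,
    "Claude Opus 4.8": 37,
}

# Each profile maps to (lead, specialist, Remedy), as in the table above.
STACKS = {
    "Default stack": ("GLM-5V-Turbo", "Qwen3.6 Plus", "MiMo v2.5 Pro"),
    "Cheap volume": ("DeepSeek V4 Flash", "Ling-2.6-Flash", "MiMo v2.5"),
    "Quality first": ("GPT-5.4", "Claude Sonnet 4.6", "GLM-5.1"),
    "Team escalation": ("Claude Sonnet 4.6", "GPT-5.5", "Claude Opus 4.8"),
}


def combined_floor(profile):
    """Sum the published floors of the three lanes (hypothetical comparison figure)."""
    return sum(FLOORS[model] for model in STACKS[profile])


if __name__ == "__main__":
    for name in STACKS:
        print(f"{name}: {combined_floor(name)} cr combined floor")
```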

Optimize for

Balanced quality and cost for most engineering teams

Best starting point for most teams. Strong lead + reliable Remedy.

Named routing templates

Best Default

GLM-5V-Turbo · Qwen3.6 Plus · MiMo v2.5 Pro

3–8 cr / review

Cheap Volume

DeepSeek V4 Flash · Ling-2.6-Flash · MiMo v2.5

1–3 cr / review

Balanced Engineering

GLM-5.1 · GPT-5.5 · GLM-5V-Turbo

40–90 cr / review

Frontier Escalation

Claude Sonnet 4.6 · GPT-5.5 · GLM-5.1

40–240+ cr / review

Full catalog

Click a row for routing guidance, benchmark receipts, and OpenRouter IDs. Select up to three models to compare side by side.

32 models

Model | OpenRouter ID | Provider | Floor | Context | Speed | Plan
GLM-5V-Turbo | z-ai/glm-5v-turbo | Z.AI | 3 cr | 203K | Fast | Solo + Pro
MiMo v2.5 Pro (New) | xiaomi/mimo-v2.5-pro | Xiaomi | 1 cr | 1M | Deep | Solo + Pro
Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | 22 cr | 1M | Balanced | Solo + Pro
GPT-5.4 | openai/gpt-5.4 | OpenAI | 20 cr | 1M | Balanced | Solo + Pro
Qwen3.6 Plus | qwen/qwen3.6-plus | Alibaba | 2 cr | 1M | Balanced | Solo + Pro
Gemma-4-31B | google/gemma-4-31b-it | Google | 0.5 cr | 262K | Fast | Solo + Pro
MiMo v2.5 | xiaomi/mimo-v2.5 | Xiaomi | 0.5 cr | 1M | Instant | Solo + Pro
Ling-2.6-Flash (New) | inclusionai/ling-2.6-flash | InclusionAI | 0.5 cr | 262K | Instant | Solo + Pro
DeepSeek V4 Flash (New) | deepseek/deepseek-v4-flash | DeepSeek | 0.5 cr | 1M | Fast | Solo + Pro
StepFun-3.7 Flash | stepfun/step-3.7-flash | StepFun | 1 cr | 262K | Instant | Solo + Pro
Trinity-Large-Thinking (New) | arcee-ai/trinity-large-thinking | Arcee AI | 1 cr | 262K | Balanced | Solo + Pro
DeepSeek V4 Pro (New) | deepseek/deepseek-v4-pro | DeepSeek | 1 cr | 1M | Balanced | Solo + Pro
MiniMax-M2.7 | minimax/minimax-m2.7 | MiniMax | 1.5 cr | 197K | Balanced | Solo + Pro
Ring-2.6-1T | inclusionai/ring-2.6-1t | InclusionAI | 1.5 cr | 262K | Balanced | Solo + Pro
KAT Coder Pro V2 | kwaipilot/kat-coder-pro-v2 | KwaiPilot | 2 cr | 256K | Fast | Solo + Pro
Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite-preview | Google | 2 cr | 1M | Instant | Solo + Pro
GPT-5.4 Nano | openai/gpt-5.4-nano | OpenAI | 2 cr | 400K | Instant | Solo + Pro
GLM-5.1 | z-ai/glm-5.1 | Z.AI | 3 cr | 203K | Balanced | Solo + Pro
Grok Build 0.1 | x-ai/grok-build-0.1 | xAI | 3 cr | 256K | Balanced | Solo + Pro
Grok 4.3 (New) | x-ai/grok-4.3 | xAI | 3 cr | 1M | Balanced | Solo + Pro
Kimi K2.6 (New) | moonshotai/kimi-k2.6 | MoonshotAI | 4 cr | 262K | Balanced | Solo + Pro
Gemini 3 Flash | google/gemini-3-flash-preview | Google | 4 cr | 1M | Fast | Solo + Pro
GPT-5.4 Mini | openai/gpt-5.4-mini | OpenAI | 6 cr | 400K | Fast | Solo + Pro
Qwen3.7-Max | qwen/qwen3.7-max | Alibaba | 6 cr | 262K | Balanced | Solo + Pro
Gemini 3.5 Flash | google/gemini-3.5-flash | Google | 10 cr | 1M | Balanced | Solo + Pro
Gemini 3.1 Pro | google/gemini-3.1-pro-preview | Google | 16 cr | 1M | Deep | Solo + Pro
GPT-5.3 Codex | openai/gpt-5.3-codex | OpenAI | 18 cr | 400K | Deep | Solo + Pro
Claude Opus 4.8 | anthropic/claude-opus-4.8 | Anthropic | 37 cr | 1M | Balanced | Team
GPT-5.5 (New) | openai/gpt-5.5 | OpenAI | 40 cr | 1M | Balanced | Solo + Pro
Claude Opus 4.8 (Fast) | anthropic/claude-opus-4.8-fast | Anthropic | 74 cr | 1M | Balanced | Team
GPT-5.2 Pro | openai/gpt-5.2-pro | OpenAI | 180 cr | 400K | Deep | Team
GPT-5.5 Pro | openai/gpt-5.5-pro | OpenAI | 237 cr | 1M | Deep | Team
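The catalog rows above lend themselves to simple filtering by plan or credit floor. As a hedged illustration, the sketch below transcribes the rows into Python tuples and recomputes two of the headline figures from the top of the page: the four Team-only models and the set of models sharing the 0.5 cr floor. The `CATALOG` layout and the helper names are assumptions for this example, not an export format offered by Critique.

```python
# (model, floor in cr, context, speed, plan), transcribed from the catalog above.
CATALOG = [
    ("GLM-5V-Turbo", 3, "203K", "Fast", "Solo + Pro"),
    ("MiMo v2.5 Pro", 1, "1M", "Deep", "Solo + Pro"),
    ("Claude Sonnet 4.6", 22, "1M", "Balanced", "Solo + Pro"),
    ("GPT-5.4", 20, "1M", "Balanced", "Solo + Pro"),
    ("Qwen3.6 Plus", 2, "1M", "Balanced", "Solo + Pro"),
    ("Gemma-4-31B", 0.5, "262K", "Fast", "Solo + Pro"),
    ("MiMo v2.5", 0.5, "1M", "Instant", "Solo + Pro"),
    ("Ling-2.6-Flash", 0.5, "262K", "Instant", "Solo + Pro"),
    ("DeepSeek V4 Flash", 0.5, "1M", "Fast", "Solo + Pro"),
    ("StepFun-3.7 Flash", 1, "262K", "Instant", "Solo + Pro"),
    ("Trinity-Large-Thinking", 1, "262K", "Balanced", "Solo + Pro"),
    ("DeepSeek V4 Pro", 1, "1M", "Balanced", "Solo + Pro"),
    ("MiniMax-M2.7", 1.5, "197K", "Balanced", "Solo + Pro"),
    ("Ring-2.6-1T", 1.5, "262K", "Balanced", "Solo + Pro"),
    ("KAT Coder Pro V2", 2, "256K", "Fast", "Solo + Pro"),
    ("Gemini 3.1 Flash Lite", 2, "1M", "Instant", "Solo + Pro"),
    ("GPT-5.4 Nano", 2, "400K", "Instant", "Solo + Pro"),
    ("GLM-5.1", 3, "203K", "Balanced", "Solo + Pro"),
    ("Grok Build 0.1", 3, "256K", "Balanced", "Solo + Pro"),
    ("Grok 4.3", 3, "1M", "Balanced", "Solo + Pro"),
    ("Kimi K2.6", 4, "262K", "Balanced", "Solo + Pro"),
    ("Gemini 3 Flash", 4, "1M", "Fast", "Solo + Pro"),
    ("GPT-5.4 Mini", 6, "400K", "Fast", "Solo + Pro"),
    ("Qwen3.7-Max", 6, "262K", "Balanced", "Solo + Pro"),
    ("Gemini 3.5 Flash", 10, "1M", "Balanced", "Solo + Pro"),
    ("Gemini 3.1 Pro", 16, "1M", "Deep", "Solo + Pro"),
    ("GPT-5.3 Codex", 18, "400K", "Deep", "Solo + Pro"),
    ("Claude Opus 4.8", 37, "1M", "Balanced", "Team"),
    ("GPT-5.5", 40, "1M", "Balanced", "Solo + Pro"),
    ("Claude Opus 4.8 (Fast)", 74, "1M", "Balanced", "Team"),
    ("GPT-5.2 Pro", 180, "400K", "Deep", "Team"),
    ("GPT-5.5 Pro", 237, "1M", "Deep", "Team"),
]


def models_on_plan(plan):
    """Names of models whose Plan column equals the given plan."""
    return [name for name, _, _, _, p in CATALOG if p == plan]


def lowest_floor_models():
    """Return (lowest floor, names of all models priced at that floor)."""
    lowest = min(floor for _, floor, _, _, _ in CATALOG)
    return lowest, [name for name, floor, _, _, _ in CATALOG if floor == lowest]


if __name__ == "__main__":
    print("Team-only:", models_on_plan("Team"))
    print("Lowest floor:", lowest_floor_models())
```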

Credit floor vs. real cost

Floor is not the full review bill

Actual spend depends on the lead model, the specialist model, and a depth multiplier. Remedy adds execution overhead and may trigger a re-review.

See pricing examples →

Critique Chat

Two free chat models

Chat does not spend PR review credits. Current roster: Ling-2.6-Flash and DeepSeek V4 Flash.

Open Critique Chat →

Start with the default stack

GLM-5V-Turbo lead, Qwen3.6 Plus specialist, MiMo v2.5 Pro for Remedy — then swap layers when your PR data says so.

Get started