Ship log
App artifact v5.5.0
Ship log for reviewers and operators: the v4 AI Change Control Platform, Change Passports, merge policy as code, agent risk scoring, verified repair, and the Control Board—each drop explained in plain language, semver-tied to the app artifact.
Ship notes describe what changed in Critique and what operators and reviewers should notice—plain language, no fluff. Newest releases appear first.
v5.5 — Review pipeline hardening, API scope enforcement, and public assistant rate limits
At a glance
Review pipeline reliability
- QStash delivery processing is now idempotent. Finished GitHub webhook deliveries (COMPLETED, IGNORED) short-circuit with HTTP 200 instead of re-entering the worker; a compare-and-set claim (updateMany on non-terminal status) ensures only one worker owns a delivery at a time. QStash retries on FAILED deliveries still work as the recovery path.
- Review-run queueing no longer resurrects finished runs. queuePullRequestReview preserves COMPLETED, FAILED, and CANCELLED runs for the same (repository, PR, head SHA) unless the caller passes forceRerun: true. Webhook-driven queueing updates metadata only and skips re-enqueueing; dashboard, chat, MCP, checkpoint override, and explicit PR-comment reruns still force a fresh run when you ask for one. Callers gate review/run.requested on wasRequeued so duplicate deliveries do not spawn parallel pipelines.
- runReviewPipeline acquires a concurrency claim before work starts. Only one worker can own a QUEUED run, a stale IN_PROGRESS run (no heartbeat for 10+ minutes), or a watchdog-marked FAILED recovery run at a time; duplicate QStash deliveries return HTTP 200 with skipped: true instead of starting a second sandbox. Sandbox progress heartbeats touch reviewRun.updatedAt during live OpenCode review so healthy runs are not mistaken for stalls.
- Watchdog recovery is better coordinated. After marking a stalled run FAILED, Critique requeues the backend fallback first, then refunds sandbox OpenCode usage (idempotent refund paths unchanged). Refunds no longer race ahead of the requeue decision.
- Characterization tests lock queue and credit-gate decisions in lib/review/queue-decision.ts, lib/review/run-claim.ts, and lib/usage/credit-gate-decision.ts so future pipeline refactors must prove they changed only what they intended.
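The concurrency claim described for runReviewPipeline can be sketched as a pure eligibility check plus a compare-and-set over an in-memory store. This is a minimal illustration, not Critique's implementation: the ReviewRun shape, the watchdogMarked flag, and the canClaim and tryClaim names are assumptions. Only the statuses, the 10-minute heartbeat threshold, and the skipped: true outcome come from the notes above.

```typescript
type RunStatus = "QUEUED" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "CANCELLED";

// Hypothetical run record; field names are illustrative only.
interface ReviewRun {
  id: string;
  status: RunStatus;
  updatedAt: number; // epoch ms, touched by sandbox progress heartbeats
  watchdogMarked: boolean; // assumed flag: FAILED by the watchdog, awaiting recovery
  owner: string | null;
}

const STALE_AFTER_MS = 10 * 60 * 1000; // "no heartbeat for 10+ minutes"

// A run is claimable when it is queued, stalled, or a watchdog recovery run.
function canClaim(run: ReviewRun, now: number): boolean {
  if (run.status === "QUEUED") return true;
  if (run.status === "IN_PROGRESS") return now - run.updatedAt >= STALE_AFTER_MS;
  if (run.status === "FAILED") return run.watchdogMarked;
  return false; // COMPLETED and CANCELLED are never reclaimed
}

// Check-then-set in one synchronous step. Against a real database this would
// be a single conditional update whose affected-row count decides ownership.
function tryClaim(
  store: Map<string, ReviewRun>,
  id: string,
  worker: string,
  now: number,
): { skipped: boolean } {
  const run = store.get(id);
  if (!run || !canClaim(run, now)) return { skipped: true };
  store.set(id, {
    ...run,
    status: "IN_PROGRESS",
    updatedAt: now,
    owner: worker,
    watchdogMarked: false,
  });
  return { skipped: false };
}
```

A duplicate delivery that arrives while the first worker is still heartbeating gets skipped: true; the same call succeeds once the heartbeat is at least ten minutes old.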
API security
- Scoped v1 routes no longer treat browser sessions as all-powerful API keys. requireApiScope defaults to api-key-only; session cookies no longer satisfy write:inference, write:builder, read:passports, or other scoped checks on programmatic endpoints. Use a crt_ key with the right scope for Inference API, Coding Agent API, and passport export. No first-party dashboard UI called these v1 routes with cookie auth, so day-to-day product flows are unchanged.
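As a rough sketch of what an api-key-only default means for a scope check, assuming a simplified auth context: checkScope, AuthContext, and the "api-key-or-session" mode name are hypothetical and stand in for whatever requireApiScope does internally.

```typescript
// Hypothetical auth context; the real request object is not described in the notes.
type AuthContext =
  | { kind: "api-key"; scopes: string[] }
  | { kind: "session"; userId: string };

// "api-key-only" is the default named in the notes; the second mode name is assumed.
type ScopeMode = "api-key-only" | "api-key-or-session";

function checkScope(
  auth: AuthContext,
  scope: string,
  mode: ScopeMode = "api-key-only",
): boolean {
  // A browser session passes only when a route explicitly opts in.
  if (auth.kind === "session") return mode === "api-key-or-session";
  // An API key passes only when it carries the exact scope.
  return auth.scopes.includes(scope);
}
```

With the default mode, a signed-in session fails write:inference while a key holding that scope passes.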
Abuse prevention
- Public marketing assistants are rate-limited by default. Blog, legal, and dear-investors assistant routes share a per-IP limiter: 10 requests per minute when CRITIQUE_PUBLIC_ASSISTANT_RATE_LIMIT_PER_MINUTE is unset. Set the env var to another positive integer to tune, or 0 to disable (previous legal-assistant-only opt-in limiter behavior). Limits are per-instance best-effort on serverless; pair with edge/KV limiting if you see sustained abuse.
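The env-var semantics above (unset means 10, a positive integer tunes, 0 disables) and a per-instance limiter could look roughly like this. It is a sketch under assumptions: resolveLimit and FixedWindowLimiter are hypothetical names, a fixed one-minute window is one of several possible strategies, and falling back to the default on invalid input is a guess rather than documented behavior.

```typescript
const DEFAULT_LIMIT_PER_MINUTE = 10;

// Returns the per-minute limit, or null when limiting is disabled.
function resolveLimit(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return DEFAULT_LIMIT_PER_MINUTE;
  const parsed = Number(raw);
  // Assumption: non-integer or negative values fall back to the default.
  if (!Number.isInteger(parsed) || parsed < 0) return DEFAULT_LIMIT_PER_MINUTE;
  return parsed === 0 ? null : parsed;
}

// In-memory fixed-window counter keyed by IP. State lives in one process,
// which is why limits are best-effort across serverless instances.
class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(ip: string, now: number): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A route handler would call resolveLimit once at startup with the env var's value, skip limiting when the result is null, and otherwise reject requests for which allow returns false.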
v5.4 — Unified sandbox PR review
At a glance
- Premium PR review now runs collectors and OpenCode in one sandbox session. Critique clones the pull request once, runs deterministic checks (diff enrichment, typecheck, lint, tests, and configured security scanners), then hands that evidence to the OpenCode reviewer in the same workspace—so the agent starts from real command output instead of re-discovering the repo from scratch.
- AUTO, Sandbox, and OpenCode review modes share the same unified path. Policy labels that used to imply different backends now route to one deep sandbox review; GitHub API evidence plus backend synthesis remains the bounded fallback when sandbox execution fails or is unavailable.
- Collector-backed review is legacy. Explicit Collector policy still runs the older two-stage flow (sandbox collectors, then backend specialist synthesis) for teams that want it during migration; new defaults favor unified sandbox review.
- OpenCode review input carries full collector context when unified mode is on: changed-file evidence, deterministic findings, collector summaries, and the analysis report—so inline comments and verdicts can cite failing tests, lint, or scanner output the agent did not have to rediscover.
- Operator controls: set CRITIQUE_REVIEW_UNIFIED_SANDBOX=true (default when sandbox-agent execution is enabled) to keep collectors before OpenCode; set false to restore OpenCode-only exploration inside the sandbox. Installation and repository policy cards describe the unified flow and fallback behavior.
- Stall recovery unchanged in spirit: if a sandbox review hangs, Critique still finishes the PR with GitHub-backed evidence rather than leaving the check stuck, without starting a second sandbox pass.
v5.3 — Inference API expansion, workspace runtime ledger, OSS program, and operator surfaces
At a glance
Inference API
- Critique Inference API adds Kimi K2.6, GLM-5.1, and Trinity Large Thinking alongside DeepSeek V4 Flash, Tencent Hy3 Preview, and NVIDIA Nemotron 3 Ultra. Use the same OpenAI-compatible surface at /inference-api (GET /api/v1/models, POST /api/v1/chat/completions) and the same Critique credit pool on managed billing.
- The /inference-api landing page promotes rates that are among the lowest published rates we track for Kimi K2.6 and GLM-5.1, with per-model rate cards for Trinity, Hy3, DeepSeek, and Nemotron intro pricing where active. Western-hosted routing and private-by-default payloads stay the standard tier.
- Training opt-in now covers Kimi K2.6, GLM-5.1, and Trinity Large Thinking in addition to DeepSeek V4 Flash and Hy3 Preview: opt in to prompt logging for future model improvement and pay 75% less per token (25% of list price). Nemotron Inference API rates are unchanged. Enable account-wide in Settings → Connections or per request with X-Critique-DeepSeek-Training-Opt-In: true (legacy header name retained).
- Operators must accept training opt-in conditions before previewing or enabling discounted rates on the Inference API page or in Settings: a clear warning, bullet conditions, and a consent checkbox so teams know logging applies before they turn the deal on.
- A new Billing section on /inference-api explains how prompt and completion tokens convert to USD at the active rate card, then to Critique credits at Solo-plan economics, with a worked example and response headers X-Critique-Credits-Charged and X-Critique-Estimated-Usd on non-streaming completions.
- Landing page refresh: “Why switch,” value-promo rate cards, compatible-IDE strip (Cursor, Windsurf, Zed, VS Code, JetBrains), and expanded per-model pricing panels with training-opt-in strikethroughs.
- Managed Inference API and review traffic now identify Critique as the calling app with https://critique.sh attribution on upstream model requests, including sandbox review paths that proxy usage through the same headers. Shared buildOpenRouterFetchHeaders centralizes attribution across review, inference, and change-control paths.
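A per-request training opt-in could be assembled roughly as follows. This sketch only builds the request; it does not send it. The buildChatRequest helper is hypothetical, Bearer authorization is assumed from the OpenAI-compatible convention rather than stated in these notes, and the base URL and model id are the ones listed in the v5.2 entry further down.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: returns the URL and init object you would pass to fetch().
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  trainingOptIn: boolean,
) {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Assumption: crt_ keys are sent as a Bearer token, as with OpenAI SDKs.
    Authorization: `Bearer ${apiKey}`,
  };
  // Legacy header name, retained for every model covered by the opt-in.
  if (trainingOptIn) headers["X-Critique-DeepSeek-Training-Opt-In"] = "true";

  return {
    url: "https://critique.sh/api/v1/chat/completions",
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Passing the result to fetch(url, init) on a non-streaming completion should, per the Billing bullet, return the X-Critique-Credits-Charged and X-Critique-Estimated-Usd response headers alongside the JSON body.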
Workspace runtime consolidation
- First-class workspace runs: new workspace_agent_run table and lib/workspace/session-runs.ts persist runtime executions (status, sandbox, raw run/session IDs, metadata) per Builder/Chat session.
- Turn kinds replace mode-only dispatch: turnKind (ask | plan | build | review | remedy) on turns and runs; POST /api/workspace/runs accepts turnKind and returns workspaceRun in the payload.
- Review ↔ workspace binding: review_run_runtime_binding links each PR review run to a workspace session/turn/run; the review pipeline creates and syncs bindings through queue, in-progress, and completion so PR review activity appears alongside Builder/Chat sessions.
- Runtime adapters updated (agent-critique, Claude Code, Codex, external) to emit turn kinds and workspace run records; session sync propagates turnKind on run upserts.
- Operator doc: docs/chat-builder-rebuild/WORKSPACE_RUNTIME_CONSOLIDATION_PLAN.md captures the consolidation plan for the runtime model.
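The five turn kinds lend themselves to a string-literal union with a runtime guard. The sketch below assumes a request body carrying a sessionId next to turnKind; that field name, CreateRunBody, and parseCreateRunBody are hypothetical, and only the turnKind values come from the notes.

```typescript
const TURN_KINDS = ["ask", "plan", "build", "review", "remedy"] as const;
type TurnKind = (typeof TURN_KINDS)[number];

// Runtime guard so untyped JSON can be narrowed to the union.
function isTurnKind(value: unknown): value is TurnKind {
  return typeof value === "string" && (TURN_KINDS as readonly string[]).includes(value);
}

// Hypothetical body for POST /api/workspace/runs; sessionId is an assumed field.
interface CreateRunBody {
  sessionId: string;
  turnKind: TurnKind;
}

function parseCreateRunBody(input: unknown): CreateRunBody | null {
  if (typeof input !== "object" || input === null) return null;
  const { sessionId, turnKind } = input as Record<string, unknown>;
  if (typeof sessionId !== "string" || !isTurnKind(turnKind)) return null;
  return { sessionId, turnKind };
}
```

Deriving the type from the constant array keeps the compile-time union and the runtime check in one place, so adding a sixth kind is a one-line change.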
Repo dashboard
- Repo home quick setup card prompts new operators to pin a default repository or keep the all-repos queue; recommends up to four “most active” repos from current attention counts.
- Persistent default via cookie + API: POST /api/dashboard/repo-home-preferences saves choice; critique-repo-home-default-repository httpOnly cookie restores landing view on return visits.
- Pre-run credit estimate on repo-home review launches (run/rerun variants) calls the same estimate heuristics as the server before you queue.
Open Source Program
- New /oss-program landing page announces €100,000 in committed compute: €50k for OSS review/fix/build support, €50k for Critique beta capacity (faster reviews, larger repos, more models, expanded access).
- Application flow with eligibility signals, checklist, FAQs, JSON-LD, partner thanks (Google Cloud, Microsoft Azure), and announcement graphic at /marketing/oss-program-announcement.png.
- Product facts updated to distinguish the program from the existing $5/mo student/OSS lane and Pro/Team foundation pricing.
Blog, SEO, and marketing
- Six new essays (8 Jun 2026): Vercel/local-vs-prod builds, Cursor dead-loop TDD, Next.js circular imports, silent any TypeScript failures, vibe-coder secrets checklist, and sandbox verification workflow.
- Blog taxonomy is now three tracks: Essays, Product ships, and Model updates. The index feed filters/sections by category with refreshed header layout.
- New SEO landing page /vercel-build-failed-on-pull-request targets build-failure, TypeScript, circular-import, and sandbox-verification queries; added to sitemap and llms.txt Q&A.
- llms.txt expanded with CodeRabbit seat-pricing alternatives, OSS program link, Vercel-build guidance, and richer buyer-intent classification hints for AI answer engines.
- Alternatives + compare pages sharpened for CodeRabbit seat tax, free OSS alternatives, CodiumAI/Qodo keywords, and sandbox-verification positioning; competitor entries get dedicated SEO titles/descriptions.
- Global site keywords refocused toward sandbox verification, safe merging, and competitor pricing terms.
Developer surfaces
- New /api Developer APIs hub: one-page map of Inference API, Coding Agent API, and Platform/MCP with crt_ key guidance, canonical metadata, and JSON-LD.
- /developers permanently redirects to /api.
Secrets / BYOK storage
- Optional BYOK secret tables fail gracefully when storage isn’t provisioned: shared storage-availability helper lets Anthropic, OpenAI, OpenRouter, and Crof key loaders degrade cleanly instead of hard-crashing local/preview environments.
SDK / monorepo
- New @critique/sdk-core package (pnpm workspace) ships client-side PII scrubbing (scrubText / scrubPayload) and agent loop detection (LoopDetector, CritiqueLoopError) with tests; wired into root test:unit and build:sdk-core.
Database / deploy
- Migration 20260608120000_workspace_runtime_runs: adds workspace_agent_turn.turnKind, workspace_agent_run, and review_run_runtime_binding with indexes for session/turn/run lookups.
- .env.example notes that production builds run prisma migrate deploy and documents preview/migrate escape hatches.
v5.2 — Critique Inference API, spend transparency, and Western-hosted models
At a glance
- Repo-home review launches show a pre-run credit estimate before you queue — lane, lead model, specialist count, sandbox/runtime, BYOK note, and active promos (for example MiniMax M3 welcome). Ranges come from the same catalog heuristics as the server; actual credits appear on the post-run receipt.
- Completed review runs ship a unified spend receipt — Critique credits charged, models used, BYOK/promo flags, and a link to Insights → Cost. Remedy handoffs add an economics table so teams can choose managed Remedy, BYOA (vendor plan — not Critique credits), or a free fix prompt without burning the wrong budget.
- Insights acts as a finance companion: 7-day credit burn, month-end forecast, estimated Checkpoint savings, and drill-down links from cost attribution to repositories and evidence runs.
- GitHub Actions CI now gates every push and pull request with typecheck, unit tests, and lint — eating our own merge discipline in public. Live E2B smoke stays operator-driven.
- Critique Inference API is live at /inference-api. OpenAI-compatible GET /api/v1/models and POST /api/v1/chat/completions bill token spend from your Critique credit balance, the same pool as review and Builder. Built for Coding Agent sidecars, internal tools, and any app that already has Critique credits. Western servers, private-by-default payloads, and a quick acceptable-use policy on the landing page. Use any OpenAI SDK with baseURL: https://critique.sh/api/v1 and a crt_ key (write:inference or write:builder).
- Inference usage dashboard at /inference-dashboard (signed in): charts for credits and tokens, model and API-key attribution, limit status, and a paginated activity log. Settings → Connections includes a mini Inference API panel with today/month spend, optional daily/monthly credit caps, request caps, credit reserve for review, enable/disable toggle, and DeepSeek training opt-in (GET/PATCH /api/settings/inference-api). Header links to the API docs and full dashboard.
- Essay: Why we built the Critique Inference API. Covers Western sweetener servers, NVIDIA-powered capacity, stress-tested launch, privacy defaults, and the DeepSeek V4 Flash training opt-in deal (75% off token rates).
- Docs & README: Inference API platform guide, Connections/billing cross-links, and repository map updates for v5.2.
- Hosted model lineup on Inference API (GET /api/v1/models):
  - DeepSeek V4 Flash (deepseek/deepseek-v4-flash): default, Western-hosted, 1M context, 284B MoE (13B active). Standard private inference at $0.15 / $0.30 per million input/output tokens. DeepSeek-only training opt-in: allow prompt logging for future model training at 75% off ($0.0375 / $0.075 per M) via Settings or X-Critique-DeepSeek-Training-Opt-In: true.
  - Tencent Hy3 Preview (tencent/hy3-preview): 0.5 credits per PR review run; Inference API at 10% below market ($0.0567 / $0.189 per M vs $0.063 / $0.21). 205B MoE, 262K context, Western-hosted with no logs and no training data retained.
  - NVIDIA Nemotron 3 Ultra (nvidia/nemotron-3-ultra-550b-a55b): frontier MoE for agent orchestration and coding pipelines. Temporary intro pricing through 19 June 2026 (UTC): 2 credits per PR review run (then 3 credits shelf) and API token rates at 50% off market ($0.25 / $1.25 per M vs $0.50 / $2.50). After the intro window, API pricing returns to standard market rates. Nemotron joins the review/Remedy catalog on /models with promo strikethrough while the window is open.
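The discounted rates in the lineup above follow directly from the list prices, which a few lines of arithmetic can confirm. The Rate shape and the applyDiscount and costUsd helpers are illustrative names, not part of the API; the dollar figures are the ones quoted in this entry.

```typescript
interface Rate {
  inputPerM: number; // USD per million input tokens
  outputPerM: number; // USD per million output tokens
}

// discount is a fraction: 0.75 means "75% off", i.e. pay 25% of list.
function applyDiscount(rate: Rate, discount: number): Rate {
  return {
    inputPerM: rate.inputPerM * (1 - discount),
    outputPerM: rate.outputPerM * (1 - discount),
  };
}

function costUsd(rate: Rate, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * rate.inputPerM +
    (outputTokens / 1_000_000) * rate.outputPerM
  );
}

const deepseekList: Rate = { inputPerM: 0.15, outputPerM: 0.3 };
const hy3Market: Rate = { inputPerM: 0.063, outputPerM: 0.21 };
const nemotronMarket: Rate = { inputPerM: 0.5, outputPerM: 2.5 };

const deepseekOptIn = applyDiscount(deepseekList, 0.75); // about 0.0375 / 0.075
const hy3Critique = applyDiscount(hy3Market, 0.1); // about 0.0567 / 0.189
const nemotronIntro = applyDiscount(nemotronMarket, 0.5); // 0.25 / 1.25
```

For a job with 2M input tokens and 1M output tokens on DeepSeek V4 Flash, costUsd gives about $0.60 at the standard rate and about $0.15 with training opt-in. Results are floating-point, so compare with a small tolerance rather than strict equality.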
v5.1.0 — Transparent review runs, one Coding Agent product, and repo-first Platform
At a glance
- OpenCode review runs now show why Critique waited, nudged, followed up, accepted, or aborted. Controller decisions persist on the run so operators can audit agent behavior instead of treating sandbox review as a one-shot black box.
- Blocked-merge review runs open with one clear next step. Completed runs summarize verdict, passport, merge policy, and evidence quality, then offer a single primary action: Start Remedy, Queue BYOA, or Dismiss with feedback.
- Evidence quality is one badge in the UI: strong, usable, or limited. Legacy source labels collapse for faster scanning; full proof stays in the verdict panel when you need it.
- Repository quick settings add policy-aware presets for auth/security and payments/data paths: concrete path rules, stronger specialists, OpenCode review mode, and the strongest allowed lead model from your catalog.
- Finding feedback explains how it shapes the next run. Accepted, false positive, fixed, and suppress actions state that Findings Memory updates future reviews on that repository.
- Yarn repositories now get dependency audit signal in sandbox collectors (high and critical advisories from Yarn JSON/NDJSON output).
- Coding Agent is one product with two doors: Builder and the API. The public /coding-agent-api page, Workspace Builder, and POST /api/v1/coding-agent/runs share the same job model; responses link to Builder, status, stream, and passport when a draft PR exists.
- See credit cost before you start a Builder or API run. Builder shows model floor, balance, and projected balance after the minimum charge and blocks submit when credits are insufficient; the API supports "preview": true on create without queueing a sandbox.
- Automation gets idempotent creates, cursor-based lists, durable SSE resume, models discovery, compact status polls, cancel endpoints, safety budgets, and signed webhooks on the Coding Agent API for CI and long-running agents.
- Builder hands off to PR review when a draft PR exists. Open the PR, jump to passport/review for that branch, or return to the repository passport queue from the Branch / PR tab.
- Skills marketplace surfaces real performance on cards and detail pages (attributed runs, false-positive rate, label counts, leaderboard readiness) so publishers see how installs become ranked quality.
- crt_ keys can read marketplace skills at pinned versions (read:skills, content fetch by version, portable bundles for CI); installed skills can be the default PR-review lens per repo with attribution on manual launches.
- Publish flow documents official badge criteria (identity, safe SKILL.md, performance visibility, fixtures, marketplace review) versus instant community listings.
- New installs land on repo home by default with cross-repo attention inbox at /dashboard/pull-requests, repository context on every row, and webhook/cache health on repo picker rows (Healthy, Stale cache, Missed events, and related states).
- The control-room experience is labeled Platform in navigation while repo-first PR work stays primary; settings still let you switch experiences.
- Connections is framed as a beta platform layer: scoped crt_ keys, MCP v0.1.0 (list_passports, get_review_run, queue_review), REST v1 for passports and review runs, Coding Agent API, native Linear and Slack, and Zapier-first wiring for the rest of the stack.
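The Builder pre-run check described in this release (model floor, balance, projected balance, blocked submit) reduces to a small calculation. The sketch below is a guess at its shape: CreditPreflight and preflight are hypothetical names, and allowing a run that lands exactly on zero is an assumption.

```typescript
interface CreditPreflight {
  balance: number; // credits available before the run
  modelFloor: number; // minimum charge for the chosen model
  projectedBalance: number; // balance after the minimum charge
  canSubmit: boolean;
}

function preflight(balance: number, modelFloor: number): CreditPreflight {
  const projectedBalance = balance - modelFloor;
  return {
    balance,
    modelFloor,
    projectedBalance,
    // Assumption: a projected balance of exactly zero still allows submit.
    canSubmit: projectedBalance >= 0,
  };
}
```

With 5 credits and a 1.5-credit floor the projected balance is 3.5 and submit stays enabled; with 1 credit the same model is blocked.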
v5.0.0 — Agent Skill Marketplace, merge policy, passport exports, M3 welcome, Cursor SDK on Composer 2.5, Coding Agent API, and repo-first dashboard
At a glance
- The Agent Skill Marketplace is live at /skills. Browse versioned critique-review lenses (official and community), search by category, and install skills into Critique Chat or any agent runtime with portable npx skills add bundles, similar to the public skills.sh directory, tuned for merge-ready PR review.
- Publishing requires a Critique account. Signed-in users can ship new skills or patch versions at /skills/publish, choose public or unlisted listings, and decide whether performance stats appear on the global board or stay org-internal only.
- A Skill Performance Leaderboard at /skills/leaderboard ranks lenses by real outcomes. Scores blend human acceptance, false-positive rate, actionable fixes, and post-merge incident correlation from finding feedback on review runs, not install counts alone.
- Org-internal leaderboard mode shows the same metrics scoped to your GitHub App installation, visible only after sign-in, so teams can compare custom lenses without exposing private repo detail on the public board.
- Critique Chat links to the marketplace from the skills sheet, and the critique-review landing page now points to browse, publish, and leaderboard flows alongside the existing Markdown download.
- Control Board now includes a natural language merge policy editor for repository scopes. Describe rules in plain English; Critique compiles them into strict policy JSON with MiniMax-M3 (falling back to MiniMax-M2.7 and DeepSeek V4 Flash when needed), shows JSON and canonical YAML previews, a live rule diff, assumptions, unsupported clauses, and a confidence badge before you save.
- Merge-time enforcement stays deterministic. The LLM only translates operator intent at compile time; Critique validates the result, renders .critique/policy.yml, and evaluates pull requests with the same server-owned Critique / Merge Policy check, with no model calls at merge time.
- Merge policy v2 adds path-aware rules operators actually ask for: block or warn when changed paths match risk tags (auth, migrations, infra, and related lanes) or glob patterns; require touched test files in the PR; require a minimum count of current-head GitHub approvals (stale approvals on older commits do not count).
- Safety modes gate how far automation goes before a human signs off: Require review blocks apply and policy PR when confidence is low; Allow draft PR still opens a draft policy pull request for manual review; Ask followups holds apply until clarification questions are answered.
- Three save paths after compile: persist to the dashboard, open a branch that writes .critique/policy.yml and opens a draft or ready PR (depending on safety mode), or copy or download YAML for manual review, without bypassing validation or confidence gates when review is required.
- MiniMax M3 is on the paid PR review and Remedy catalog (minimax/minimax-m3) with a two-week welcome price: 1.5 credits per run through 17 June 2026 (UTC), the same effective floor as MiniMax-M2.7 today, then 3 credits per run on the shelf (double M2.7). The /models table and /pricing page show the active promo, strikethrough shelf price, and countdown while the window is open.
- Qwen3.7 Plus remains a paid review lane at 1.5 credits on lead, specialist, and Remedy-capable stacks. Legacy Qwen ids in saved repo policies still alias forward to qwen/qwen3.7-plus; this release does not change that floor or move Plus into free chat.
- Critique Chat is unchanged: Ling 2.6 Flash and DeepSeek V4 Flash only, with no extra chat fee. M3 and Qwen3.7 Plus are not in the chat model picker; older MiniMax or Qwen chat preferences normalize to DeepSeek V4 Flash so review models are not silently treated as free chat.
- A launch essay at /blog/minimax-m3-qwen37-plus-welcome-june-2026 welcomes MiniMax M3 and Qwen3.7 Plus for operators: vendor-reported SWE-Bench Pro, terminal, and multimodal benchmarks vs M2.7, Qwen3.6 Plus, GLM-5.1, Kimi K2.6, Composer 2.5, and Claude Opus 4.8, plus credit-math framing for when to use M3 vs Opus on review.
- Blog promo strips with gold treasury styling now accept custom headlines and intros so new model launches (including this M3/Qwen essay) no longer reuse DeepSeek-only copy from earlier catalog posts.
- The /models and /pricing surfaces show the active MiniMax M3 welcome window with strikethrough shelf pricing, promo countdown, and clear post-promo credit floor so operators see 1.5 credits vs 3 credits without opening the essay.
- Cursor BYOA now queues through the Cursor Agent SDK on Composer 2.5 (composer-2.5) in Cursor cloud VMs: isolated repo clone, tool loop, PR URL, and workOnCurrentBranch on the head you reviewed. Execution bills your Cursor plan, not Critique review credits. If the SDK path is unavailable in a given deploy, Critique falls back to the Cloud Agents REST API with the same handoff shape.
- Settings → Cursor agent (BYOA) and completed review runs explain the SDK path: save your key once, optionally add operator instructions, Queue Cursor agent, then Open in Cursor when the cloud run URL is ready. JSON export at GET /api/review-runs/{reviewRunId}/byoa/cursor still works for custom CI.
- A deep harness essay at /blog/critique-cursor-composer-25-byoa covers Cursor as a top-tier agent runtime, Composer 2.5 specs and published benchmarks (SWE-Bench Multilingual, Terminal-Bench 2.0, pricing tiers), how that compares to Opus, GPT-5.5, Kimi K2.6, MiniMax M3, and Qwen3.7 Plus on Critique’s review catalog, and why review stays on Critique while fixes run in Cursor cloud, with Colossus framed as training scale, not where your agent executes.
- BYOA docs at /docs/platform/byoa now describe the Cursor SDK default; Claude Managed Agents and OpenAI Codex queued paths are unchanged from v3.5, with the same “review on Critique, execution on your vendor” split.
- Repo pickers on the repo-first dashboard now surface every connected repository, not a short alphabetical slice. The old empty state capped quick picks at six repos with no search; /dashboard/pull-requests instead shows an always-visible searchable list (scroll or filter by owner or name) with a connected-repo count, and the header selector opens in a portaled menu so long lists are not clipped inside the scrolling dashboard shell.
- Issues and review-run repo filters use the same full installation list as repo home: native selects and the shared selector receive all active repos from your GitHub App installations, ordered by full name.
- Critique Chat and Workspace repo menus list the full catalog when opened. You no longer need to type before any repo appears; search still narrows the list, and scrolling reaches repos beyond the first screen.
- Dashboard and Settings share one workspace chrome (sidebar, top bar, spacing) so repo home, review runs, issues, control, usage, help, and settings subpages feel like one signed-in product—not a mix of marketing chrome and dashboard panels.
- Change Passports now export a signed audit evidence bundle for compliance workflows. From the passport timeline, the passport queue, or a review run linked to a passport, download redacted JSON with manifest section hashes, snapshots, provenance, risk, merge policy decisions, remedy proof, incidents, and timeline—HMAC-signed for integrity. Batch export covers up to 25 filtered passports in one file. Copy positions this as audit evidence formatted for SOC 2 / ISO review, not a compliance certification.
- Platform API and signed-in routes mirror the same export: GET /api/v1/passports/:passportId/export (API key with read:passports) and POST /api/passports/export for filtered batches in the dashboard.
- Review-run findings gain one-click feedback without leaving the dashboard. Expand a finding to mark Accepted, False positive, Fixed, or Suppress; action buttons no longer compete with the expand control. Private memory, suppressions, and the existing feedback ledger still apply by default.
- Optional anonymized model-feedback sharing is strictly opt-in. Grant or revoke consent per installation (or single repo) from Control Board → Memory, export a signed batch of queued examples, or tick Share anonymized feedback on a finding when consent is active. GitHub slash commands accept --share-feedback only with the same consent gate, so there is no silent training export from PR comments.
- Anonymized examples strip patches, file contents, and secrets before they leave your account; they reinforce Skill Performance Leaderboard metrics at /skills/leaderboard, not a promise of automatic model training.
- Coding Agent as API ships for automation teams: public overview at /coding-agent-api, POST /api/v1/coding-agent/runs to start an OpenCode-backed run (repo + prompt + model), POST …/runs/{id}/messages for follow-ups, and GET …/runs/{id} for status, timeline events, patch, and draft-PR linkage. Choose managed billing (Critique credits) or pass an OpenRouter key for that run only; optional draft PR publish and validation mode match Builder semantics.
- Coding Agent API now keeps a warm OpenCode session between turns on the same run. After the first turn finishes, status becomes idle (sessionActive: true until sessionExpiresAt) while the E2B sandbox and OpenCode session stay connected; follow-ups on POST …/runs/{id}/messages send the next prompt into that live session instead of spinning a new sandbox with chained summary text.
- Live activity streams over SSE at GET /api/v1/coding-agent/runs/{id}/stream. Subscribe while a turn is running (optional ?after= event cursor) to receive OpenCode activity rows and a terminal run.status event when the turn reaches idle, completed, or failed.
- Close a persistent session explicitly with { "endSession": true } on the messages route when automation is done; Critique tears down the sandbox and marks the run completed. If the session already expired or predates persistent sessions, follow-ups still fall back to a chained new run with bounded prior context so older integrations keep working.
- Workspace adds a durable agent queue in the explorer: line up prompts for Critique, Claude Code, or Codex in Ask or Build mode, send or remove items via /api/workspace/queue, scoped to your active repository and chat or builder session, so you can stage work before kicking off a long run.
- Workspace inspector → Processes shows live run and request state (streaming chat, builder job, sandbox, retrieval) in one panel so operators can tell whether work is still moving without digging through raw logs first.
- Settings → Agents groups Cursor, Anthropic, and OpenAI BYOA key panels in one place with consistent copy and links to /docs/platform/byoa, so agent-owner setup is not scattered across unrelated settings tabs.
- Insights at /dashboard/insights gives operators and leadership one signed-in hub for velocity, risk, spend, retrospectives, compliance exports, staffing signals, and optional fleet benchmarks, grounded in Change Passport and gate evidence, not a separate analytics silo.
- Velocity vs risk charts plot daily merges against high-risk merges over 30- and 90-day windows so you can see whether throughput rose while risky merges held flat or fell: the “speed vs safety” story in one glance.
- Cost attribution for the current month breaks down Critique credits by repository, surfaces your most expensive pull requests, and estimates how much BYOK routing saved versus bundled credits on the same usage.
- Blame-aware retrospective reports summarize a sprint or date range: merges, incidents linked to passports, checkpoint and merge-policy overrides, and heuristic likely correct vs likely incorrect override signals (14-day incident window) — with optional AI narrative and cited evidence, not individual blame callouts.
- Generate retrospectives or staffing reports from Insights (or the reports API) for a chosen installation and period; staffing view projects weekly review load and suggests when added senior reviewer capacity may be needed to keep your current risk posture.
- One-click compliance period export for a calendar month builds a signed outer bundle plus per-passport audit JSON: review trails, merge policy decisions, checkpoint gates, coverage summary (% reviewed, override counts), and the same SOC 2 / ISO audit evidence disclaimer as single-passport export — not a certification.
- Fleet insights (strictly opt-in) place your installation in an anonymized cohort (similar stack size and risk tier). When enough teams participate, you see benchmark comparisons (for example auth-path block rates) and dry-run policy suggestions — no repo names or secrets leave your account.
- Daily insight rollups update live as reviews complete, merge policy and checkpoint events fire, overrides are recorded, usage is metered, and merged pull requests close. After upgrade, account operators can backfill historical daily metrics from existing reviews, gates, and usage so 30- and 90-day charts are useful immediately.
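The merge policy v2 rule on approvals ("a minimum count of current-head GitHub approvals; stale approvals on older commits do not count") can be illustrated with a deterministic evaluator. This is a sketch under assumptions: the Review shape, both function names, and the choice to count only each reviewer's latest review on the head commit are illustrative, not Critique's evaluator.

```typescript
// Hypothetical, simplified review record.
interface Review {
  reviewer: string;
  state: "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED";
  commitSha: string; // commit the review was submitted against
  submittedAt: number; // epoch ms
}

function countCurrentHeadApprovals(reviews: Review[], headSha: string): number {
  // Keep each reviewer's most recent review on the head commit only.
  const latest = new Map<string, Review>();
  for (const review of reviews) {
    if (review.commitSha !== headSha) continue; // stale: older commit
    const previous = latest.get(review.reviewer);
    if (!previous || review.submittedAt > previous.submittedAt) {
      latest.set(review.reviewer, review);
    }
  }
  return Array.from(latest.values()).filter((r) => r.state === "APPROVED").length;
}

function evaluateApprovalRule(
  reviews: Review[],
  headSha: string,
  minApprovals: number,
): "pass" | "block" {
  return countCurrentHeadApprovals(reviews, headSha) >= minApprovals ? "pass" : "block";
}
```

Because the function depends only on its inputs, the same pull request state always yields the same verdict, which is the property the notes describe as deterministic enforcement with no model calls at merge time.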
v4.2.0 — Repo-first PR dashboard and GitHub inbox
At a glance
- The signed-in dashboard can now center on pull requests per repository, not only the legacy control-room home. On first visit you choose Repo home (inbox-style PR work) or Control room (passports and platform overview). Switch anytime under Settings → Workspace; your choice is remembered across sessions.
- GitHub pull requests and issues are cached in Critique so the PR table, repo picker, and issues list load from durable inbox data instead of hammering the API on every click. Refresh GitHub on a repo pulls the latest open PRs; webhooks keep snapshots warm when pull_request and issues events arrive.
- Repo home at /dashboard/pull-requests gives you a searchable PR table with attention states (needs review, running, blocked, passed, failed, closed), a side inspector for the selected PR, linked-issue context, checkpoint blockers, and one-click Run review / Rerun / View live actions.
- Quick settings on launch let you pick model lane (auto, fast, balanced, premium, or a specific model), runtime (auto, GitHub-backed, collector sandbox, OpenCode agent), depth (standard or deep), whether to post the GitHub review on completion, and which context packs to include (linked issues, incident signals, repository memory). Those choices apply to the queued run, not ignored defaults.
- Global and filtered PR views use the same inbox rows: filter by repository, text search, GitHub state, and dashboard views such as needs attention, running, or blocked. /dashboard/issues lists cached issues for the selected repo with working repo navigation.
- New dashboard APIs back the UI (repositories, repo home, refresh, issues, global pull requests, review queue, PR chat) with checkpoint-aware 409 responses when the Agent Firewall blocks a run. Settings adds Agents and Appearance entry points alongside the experience toggle.
- Control room remains one click away for passport queues, platform connections, and cross-repo activity when you need the v4 operating picture—not a forced migration.
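The quick-settings options listed above can be pictured as a small validated record. This is only a sketch: the field names and the short value strings (`github`, `collector`, `opencode`, `linked_issues`, and so on) are hypothetical stand-ins for the options named in the notes, not Critique's actual request schema.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the options described in the release note.
MODEL_LANES = {"auto", "fast", "balanced", "premium"}
RUNTIMES = {"auto", "github", "collector", "opencode"}
DEPTHS = {"standard", "deep"}
CONTEXT_PACKS = {"linked_issues", "incident_signals", "repository_memory"}


@dataclass
class LaunchSettings:
    model_lane: str = "auto"          # a lane name, or a specific model id
    runtime: str = "auto"
    depth: str = "standard"
    post_github_review: bool = True
    context_packs: set = field(default_factory=set)

    @property
    def uses_specific_model(self) -> bool:
        # Anything that is not a named lane is treated as a specific model id.
        return self.model_lane not in MODEL_LANES

    def validate(self) -> list:
        """Return a list of problems; an empty list means the settings are usable."""
        problems = []
        if self.runtime not in RUNTIMES:
            problems.append(f"unknown runtime: {self.runtime}")
        if self.depth not in DEPTHS:
            problems.append(f"unknown depth: {self.depth}")
        unknown = self.context_packs - CONTEXT_PACKS
        if unknown:
            problems.append(f"unknown context packs: {sorted(unknown)}")
        return problems


settings = LaunchSettings(runtime="opencode", depth="deep", context_packs={"linked_issues"})
print(settings.validate())
```

Validating before the run is queued is one way to make sure the chosen options, and not silent defaults, reach the review.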
v4.1.2 — Deep Critique Review Skill upgrade
At a glance
- `critique-review` now ships as a much deeper review protocol, not just a compact findings-first prompt. The skill now starts with explicit intake and triage, names review mode up front (PR/diff, codebase slice, or focused lane), and pushes agents to classify risk before they start writing findings.
- The built-in runtime skill now has dedicated reference packs for adaptive depth and blast-radius triage, stack-specific review lenses, and a stricter output contract. That means Critique agents can review backend/API paths, React/frontend changes, migrations, async jobs, infrastructure, and AI-agent systems with different checks instead of one generic pass.
- False-positive control is materially stronger. The upgraded rubric now forces research-before-reporting on ambiguous issues, warns against pattern-matching common framework-safe constructs as vulnerabilities, and separates actionable findings from open questions and residual risk more aggressively.
- The public downloadable Markdown at `/api/skills/critique-review` and `/skills/critique-review.md` was upgraded in parallel. Teams installing the free open-source skill now get the deeper standalone protocol, while Critique’s internal runtime copy keeps the richer multi-file reference structure.
- The public landing page at `/skills/critique-review` now reflects the new shape of the skill. It describes adaptive review depth, stack lenses, independent-review discipline, and the stricter review artifact contract instead of only the original launch framing.
v4.1.1 — Critique Review Skill, downloads, and agent-owner updates
At a glance
- `critique-review` is now a built-in, open-source skill for AI code review. It is designed for agents working in Codex, Cursor, Claude, and similar coding environments: start from the diff, inspect call sites and contracts, prioritize correctness/security/data-loss risks, and publish findings before summary.
- A public skill landing page ships at `/skills/critique-review`. It explains why Critique made the skill, how it compares with broad code-quality and PR-request skills, gives teams a direct Markdown download through `/api/skills/critique-review`, and links the standalone MIT-licensed GitHub repo at `github.com/repath500/critique-review`.
- Critique Chat can recommend the built-in skill from inside the product. When the conversation calls for code review help, the built-in skills context can surface `critique-review` alongside the rest of the Critique system guidance.
- The homepage now carries a Latest updates section linking the skill, BYOA, BYOK, and the full ship log so production visitors can see the newest agent-owner workflows without hunting through docs.
- Type-check cleanup keeps the production audit green. The BYOK pricing preview, BYOA agent panel, Slack client, and built-in skills chat label now compile cleanly under the full TypeScript pass.
v4.1.0 — Connections, Platform API, and Agents (MCP)
At a glance
- Critique enters your engineering ecosystem. A new Connections hub at `/settings/connections` lets you link Linear (API key today), manage where each connection applies (Chat, Review, Remedy, Builder), and issue Critique API keys (`crt_`) for tools that call Critique on your behalf.
- Platform API (v1) exposes passport and review-run truth to automation: `GET /api/v1/passports` and `GET /api/v1/review-runs/:id` accept your browser session or a scoped API key, so internal portals and scripts can read the same queue the dashboard uses without scraping HTML.
- Agents API via MCP ships at `POST /api/mcp`: connect Cursor, Claude Desktop, or any MCP host with a `crt_` key to list passports, fetch a review run, or queue a review for a pull request head commit. The Workspace inspector shows your endpoint and setup steps.
- Dashboard control room now surfaces platform connections (GitHub, Linear, Sentry, Jira, Vercel health), a passport queue preview, cross-stack activity (incidents, firewall blocks, merge policy, evidence, delivery failures), and Linear workspace panels when connected, so the home screen reads as an operating picture, not only install metadata.
- Chat gains `searchIssuesAndRoadmap` when Linear is connected, so repo work and ticket context can sit on one thread. Slack, Sentry OAuth, and outbound webhooks remain on the roadmap; production incidents still ingest via Control Board webhooks as in v4.
- Architecture and API contracts for partners live in `docs/critique/critique-platform-api-agents-and-connections.md`. A launch essay at `/blog/critique-v41-connections-ecosystem` explains why Critique is building outward from passports, not inward toward another dashboard tab.
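As a rough illustration of the Platform API and MCP entry points above, the sketch below builds the requests with Python's standard library without sending them. The host name, the example key, the run id, and the use of a Bearer `Authorization` header are all assumptions made for illustration; the notes only say that the endpoints accept a scoped `crt_` key. The `tools/list` method comes from the MCP specification, not from these notes.

```python
import json
from urllib.request import Request

BASE_URL = "https://critique.example"  # hypothetical host; substitute your deployment
API_KEY = "crt_example_key"            # placeholder, not a real key


def platform_get(path: str) -> Request:
    """Build (but do not send) a GET request for a Platform API v1 path."""
    # Assumption: the crt_ key travels as a Bearer token.
    return Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"},
        method="GET",
    )


def mcp_list_tools(request_id: int = 1) -> Request:
    """Build a JSON-RPC 2.0 tools/list call for the MCP endpoint."""
    body = json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})
    return Request(
        BASE_URL + "/api/mcp",
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )


passports = platform_get("/api/v1/passports")
run = platform_get("/api/v1/review-runs/run_123")  # run_123 is a made-up id
tools = mcp_list_tools()
print(passports.get_method(), passports.full_url)
print(tools.get_method(), tools.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; check the partner contract document named above for the real authentication scheme and response shapes before relying on this.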
v4.0.0 — The AI Change Control Platform
At a glance
- Critique is now an AI Change Control Platform. The product is reframed around the merge boundary: every pull request becomes a Change Passport that records provenance, risk, gate events, evidence runs, merge-policy decisions, Remedy proof, findings memory, and incident learnings. The dashboard gravity shifts so Passports come first and evidence runs become a commit-level drill-down.
- Change Passports are the new system of record. `/dashboard/passports` is the primary queue (filterable by repo, risk, state, and verdict) and `/dashboard/[owner]/[repo]/pulls/[n]` renders the PR-level passport with summary, provenance, risk, gate, evidence runs, merge permission, Remedy proof, memory, incidents, and a single chronological timeline.
- Agent risk scoring persists a score, band, and reasons on review runs and flows into every operator surface. Evidence Contract v1 normalizes legacy review artifacts and exposes blocking decisions plus per-finding evidence so a block always cites something.
- Merge policy as code ships a schema, evaluator, and the `Critique / Merge Policy` check. Policy lives in the dashboard or a repo file (`.critique/policy.yml|yaml|json`) and runs in dry-run, warn, or enforce mode. Operator overrides record provenance and can patch the GitHub check-run status.
- Verified repair stores a proof bundle on Remedy attempts (patch hash, validation, push/export mode, verification linkage). Findings memory surfaces suppressions and a feedback ledger; incident feedback ingests Sentry, Linear, Jira, Vercel, and generic/manual events, links them to passports, and drafts learnings you can promote into rules.
- Control Board unifies Gate, Policy, Delivery, Memory, and Learnings into one operator surface, including inline merge-permission controls. Checkpoint is presented as the Agent Firewall in the UI, but GitHub check names stay stable (`Critique / Checkpoint`) so branch protection keeps working. Legacy `/dashboard/change-gate` and `/dashboard/checkpoint` redirect into the Control Board; the legacy automation editor is preserved directly.
- A launch essay at `/blog/critique-v4-change-control-platform` explains v4 vs v1–v3, why passports replaced review-as-product, how AI-powered sandbox reviews continue as evidence runs, and the WHO / WHY / WHAT NOT control framing, plus the six operator surfaces and five control layers.
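To make the three merge-policy modes above concrete, here is a minimal sketch of how a single rule might be evaluated under dry-run, warn, and enforce. The rule (a risk-score ceiling), the `Decision` shape, and the mapping to GitHub check-run conclusions are assumptions for illustration; the notes do not describe the real policy schema or evaluator.

```python
from dataclasses import dataclass

MODES = ("dry-run", "warn", "enforce")


@dataclass(frozen=True)
class Decision:
    violated: bool      # did the change break the policy rule?
    conclusion: str     # GitHub check-run conclusion to report
    blocks_merge: bool  # whether the check should stop the merge


def evaluate(mode: str, risk_score: int, max_risk: int) -> Decision:
    """Apply one hypothetical rule (risk score ceiling) under the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    violated = risk_score > max_risk
    if not violated:
        return Decision(False, "success", False)
    if mode == "enforce":
        return Decision(True, "failure", True)
    # dry-run and warn both record the violation without blocking;
    # only warn changes what the check shows in this sketch.
    conclusion = "neutral" if mode == "warn" else "success"
    return Decision(True, conclusion, False)


for mode in MODES:
    print(mode, evaluate(mode, risk_score=80, max_risk=50))
```

Keeping the violation flag separate from the conclusion is what lets a dry-run rollout collect evidence about a rule before anyone is blocked by it.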
v3.6.0 — BYOK: CrofAI (nahcrof) direct billing
At a glance
- Bring your own key now supports CrofAI alongside OpenRouter: save a Crof API key in Settings and Critique routes chat and sandbox PR review through `https://crof.ai/v1` with OpenRouter catalog ids mapped to Crof model slugs at request time.
- The settings panel explains why the in-app model picker can look narrower than Crof’s live catalog (OpenRouter-shaped ids vs Crof slugs; frontier GPT/Claude lanes are OpenRouter-only).
- When both Crof and OpenRouter keys are saved, Crof takes precedence; remove the Crof key to fall back to OpenRouter BYOK.
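The precedence rule above (Crof first, then OpenRouter) can be sketched as a small resolver. The slug table and its entries are made up, and falling back to the unmapped id when no slug exists is an assumption; only the Crof base URL comes from the notes.

```python
from typing import Dict, Optional, Tuple

CROF_BASE_URL = "https://crof.ai/v1"                  # from the release note
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"  # OpenRouter's public API base

# Hypothetical mapping from OpenRouter-shaped ids to Crof slugs; the ids are made up.
CROF_SLUGS: Dict[str, str] = {"example-vendor/example-model": "example-model"}


def resolve_byok(
    crof_key: Optional[str], openrouter_key: Optional[str], model_id: str
) -> Optional[Tuple[str, str, str]]:
    """Return (provider, base_url, model) or None when no BYOK key is saved."""
    if crof_key:
        # Crof wins when both keys exist; translate the id at request time.
        return ("crof", CROF_BASE_URL, CROF_SLUGS.get(model_id, model_id))
    if openrouter_key:
        return ("openrouter", OPENROUTER_BASE_URL, model_id)
    return None


print(resolve_byok("crof-key", "or-key", "example-vendor/example-model"))
print(resolve_byok(None, "or-key", "example-vendor/example-model"))
```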
v3.5.0 — BYOA: Cursor, Claude Managed Agents, and OpenAI Codex
At a glance
- Bring your own agent now includes three queued execution paths from completed review runs: Cursor Cloud Agents, Claude Managed Agents (Anthropic API with PR checkout), and OpenAI Codex-style runs via the Responses API—each using your encrypted API key server-side, not Critique execution credits.
- Settings adds panels for Cursor, Anthropic, and OpenAI keys; review run pages show handoff promos with queue actions, JSON export, and latest run status for each provider.
- Docs and essays at `/docs/platform/byoa` and new blog posts (`critique-welcomes-cursor`, `critique-welcomes-claude-code`, `critique-welcomes-codex`) explain when to use Remedy vs BYOA and how billing splits between Critique and your agent vendors.
v3.4.0 — Workspace: review, chat, build, and repair in one place
At a glance
- Workspace is now the recommended signed-in surface at `/workspace`, unifying Critique Chat, Builder history, review context, and Remedy handoffs into one left rail, one repo context bundle, and one credit ledger.
- The `/chat` and `/builder` routes still work; Workspace is the merged control room: same shell styling, searchable repo and model selectors, Build and Plan lanes, activity timelines, and microphone capture in the composer.
- A launch essay at `/blog/introducing-critique-workspace` explains why the four surfaces merged, what shipped in v0.3.0, and how Checkpoint and automated PR review stay on the GitHub path while humans operate from Workspace.
- Docs under Chat & workspace describe the mode handoffs (`?mode=build`) and how Workspace relates to Dashboard, Remedy, and the GitHub App.
- SEO & discovery: `/workspace` is in the sitemap; the launch essay ships FAQPage + BlogPosting `@graph` JSON-LD, RSS alternates, and `/llms.txt` citations so Google and AI answer engines can index and quote the Workspace story.
v3.3.0 — May model catalog, pricing calculator, and marketing polish
At a glance
- The review model catalog was refreshed for May 2026: permanent credit floors on DeepSeek V4, MiMo, Ling, and MiniMax M2.7; new lanes including Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Grok Build 0.1, and Step 3.7 Flash; OpenRouter `:floor` routing for eligible models; and legacy Qwen IDs remapped so saved defaults keep working.
- Critique Chat now offers two models, Ling 2.6 Flash (default) and DeepSeek V4 Flash, with pricing and banner copy aligned to the slimmer roster.
- The PR review cost calculator moved under `/pricing/calculator` with sliders, plan-aware ROI math, and a link from the main pricing page; the old `/tools/pr-review-cost-calculator` URL redirects permanently.
- The public `/checkpoint` landing page was redesigned: editorial layout, stats instead of pill badges, a rule catalog table, a numbered pipeline, and a checkbox-driven demo feed without pill clutter.
- Blog posts use a centered editorial layout (Instrument Serif headings, DM Sans body); the May model-catalog spring post includes an optional +100 credit claim for eligible readers.
- SEO linking improved across compare pages, the sitemap, footer competitor links, and internal blog CTAs.
v3.2.0 — Pricing reset, BYOK OpenRouter, and safer review credit gates
At a glance
- Critique pricing now separates two buying paths: bundled monthly credit plans for teams that want one Critique invoice, and a new Bring Your Own OpenRouter Key harness for teams that want OpenRouter to bill model tokens directly.
- Customer-facing tiers now read as Solo, Pro, and Team with larger review credit pools, clearer plan positioning, and pricing copy that explains when to choose bundled credits versus direct provider billing.
- The new $8/month BYOK harness covers Critique orchestration—sandboxes, OpenCode PR review runs, chat streaming, GitHub checks, repository retrieval, encrypted key storage, and usage ledgers—while model spend stays in the customer’s OpenRouter account.
- Signed-in users can save an OpenRouter key in Settings. New Critique Chat calls and sandbox-native OpenCode PR review calls prefer that key when present, without exposing the saved secret back to the browser.
- BYOK model usage is now marked as externally billed in usage records so connected OpenRouter-key runs remain traceable without draining Critique review credits.
- Automated PR review now has a low-balance credit gate before expensive work starts, helping teams avoid starting a model-heavy review when the remaining bundled credit pool is already below the review floor.
- The public pricing page, home pricing preview, dashboard usage page, machine-readable `/pricing.md`, SEO metadata, and the new pricing blog all explain the updated credit calculation, Checkpoint’s spend-control role, and the BYOK tradeoff in customer-facing language.
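The low-balance credit gate described above amounts to a check that runs before any model-heavy work starts. The sketch below is one way to express it; the function name, the numeric floor, and the choice to let externally billed (BYOK) runs bypass the bundled pool are assumptions, not documented behavior.

```python
def should_start_review(
    remaining_credits: float, review_floor: float, externally_billed: bool = False
) -> bool:
    """Decide whether an automated review may start."""
    if externally_billed:
        # Assumption: BYOK runs bill the provider directly,
        # so the bundled credit pool is not consulted.
        return True
    # Start only when the pool is at or above the floor for one review.
    return remaining_credits >= review_floor


print(should_start_review(remaining_credits=4, review_floor=10))
print(should_start_review(remaining_credits=25, review_floor=10))
```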
v3.1.0 — Critique Checkpoint, the deterministic PR trust gate
At a glance
- Critique Checkpoint is now the first mini-app under critique.sh: a pre-review trust layer that runs before Critique review, before review credits burn, and before maintainers spend attention on low-trust pull requests.
- The public `/checkpoint` landing page introduces the product with a live gate demo, interactive rule toggles, rule catalog cards, the Checkpoint → Critique Review → Merge pipeline, dry-run positioning, and “coming soon as open-source” copy.
- Signed-in teams now get a global Dashboard → Checkpoint overview with gated-today counts, blocked/warned/passed totals, top triggered rules, enabled repository count, recent events, and direct repo configuration links.
- Each repository has its own `/dashboard/[owner]/[repo]/checkpoint` configuration surface with Checkpoint enablement, Dry Run / Warn / Block mode selection, a Checkpoint-only switch for repos that do not want Critique review after a pass, autosaved rule cards, editable thresholds, slop pattern textarea, language allow list, per-login allow/block overrides, and an expandable recent gate log.
- Every gate creates a durable event detail page at `/dashboard/[owner]/[repo]/checkpoint/events/[eventId]` showing the contributor fingerprint, PR metadata, rule-by-rule threshold/value receipt, decision timeline, GitHub check link, and a maintainer “trust contributor” override action.
- The backend now persists Checkpoint policies, overrides, and gate events in Prisma; evaluates deterministic identity, activity, and content-shape rules; publishes a dedicated `Critique / Checkpoint` GitHub check run; and short-circuits the pull request webhook before review queues when Block mode fails.
- Checkpoint can run with or without Critique review automation. Repositories can leave reviews on after a pass, or switch to standalone Checkpoint-only mode where the gate publishes its GitHub check and event log without queueing Critique review.
- The launch surface now tells the story more clearly in production: the public landing page leans harder into the live gate demo, receipts-style rule evaluation, and Checkpoint-first pipeline, while public visitors fall back cleanly when auth session state or advanced WebGL effects are unavailable.
- Final launch hardening adds native Postgres enum migration correction, idempotent gate-event handling, Checkpoint check-run reruns, GitHub check success updates when maintainers override an event, accessible dashboard controls, a Checkpoint launch essay, RSS output, and `llms.txt` coverage for AI-facing discovery.
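The rule-by-rule threshold/value receipts and the Dry Run / Warn / Block modes above can be sketched as a deterministic evaluator. The rule names (`account_age_days`, `merged_prs`), the "value must meet a minimum" shape, and the decision strings are hypothetical; the notes do not list the actual rules or their schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Receipt:
    rule: str
    threshold: int
    value: int
    passed: bool


def evaluate_gate(
    login: str,
    signals: Dict[str, int],
    minimums: Dict[str, int],
    mode: str,                      # "dry-run", "warn", or "block"
    trusted: frozenset = frozenset(),
) -> Tuple[str, List[Receipt]]:
    """Return (decision, receipts) for one pull request author."""
    if login in trusted:
        return "passed", []         # per-login allow override skips the rules
    receipts = [
        Receipt(rule, minimum, signals.get(rule, 0), signals.get(rule, 0) >= minimum)
        for rule, minimum in sorted(minimums.items())
    ]
    if all(r.passed for r in receipts):
        return "passed", receipts
    if mode == "block":
        return "blocked", receipts
    if mode == "warn":
        return "warned", receipts
    return "passed", receipts       # dry run: keep receipts, never gate


minimums = {"account_age_days": 30, "merged_prs": 1}
signals = {"account_age_days": 3, "merged_prs": 0}
decision, receipts = evaluate_gate("new-user", signals, minimums, mode="block")
print(decision, [r.rule for r in receipts if not r.passed])
```

Because the evaluator has no randomness or model calls, the same contributor signals and thresholds always yield the same receipts, which is what makes an event page auditable after the fact.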
v0.3.0 — Workspace unification, live usage truth, and sturdier OpenCode review recovery
At a glance
- Critique advances from v0.2.8 to v0.3.0 by combining the May 5 shipping wave into one coherent release: the product now reads more like one system across Workspace, Builder, review runs, and the public-facing usage story.
- `/workspace` now serves as the shared operating surface for both chat and Builder history, with one left rail for recent chats and Builder runs, cleaner mode-aware actions, and Builder shell styling that matches the rest of the signed-in workspace instead of feeling like a separate console.
- Builder is materially more usable for real repo work: repository and model selectors are searchable, Build and Plan lanes are explicit, activity reads as a timeline rather than raw log output, and the composer can publish to a prompt-derived `codex/...` branch and open or reuse a draft PR after the run finishes.
- OpenCode review and Builder execution are more durable under failure. Critique now bounds stalled turns, resets timers on real activity, aborts idle sessions with context, marks stale sandbox runs explicitly, and queues fallback publishing so a wedged sandbox is less likely to strand a pull request without a verdict.
- Usage and credits are now much closer to source-of-truth accounting. Critique persists per-call OpenRouter-style usage for both backend and sandbox paths, ships it through signed ingest with QStash retry fallback, reconciles dashboards and totals against the durable ledger, and preserves failed or rerouted calls so large review spend no longer disappears behind misleadingly tiny or empty usage rows.
- Signed-in review run detail is easier to scan during active triage: verdict and live status come first, KPI blocks are larger, findings expand into available space when no snapshot exists, in-progress states show skeleton placeholders instead of dead air, and usage/models move behind expandable sections while audit history reads more like a timeline.
- Operators also get a clearer support and policy surface: the Help area adds onboarding progress, workspace health, recent failures, guided fix flows, and support shortcuts, while installation-level Light, Standard, and Max presets explain review behavior and model posture before teams save policy changes.
- Signed-in workspaces now have a Help surface at `/dashboard/help` with a setup checklist, live workspace health, a recent-failures feed, an activity timeline, a recommended next step from account state, guided fix flows, contextual doc shortcuts, PR or run id lookup, categorized product feedback, and pre-filled support mailto with an account context bundle, so operators move from “something broke” to the right dashboard view faster.
- The Help page now shows at a glance how far through onboarding a workspace is: the checklist displays a completion counter and segmented progress bar, completed steps are marked through, each health row carries a live status indicator that pulses when healthy, the activity section renders as a proper timeline, and guided fix flows are numbered so the right path is obvious without reading every accordion header.
- Installation defaults are easier to tune: Light, Standard, and Max review presets explain the expected GitHub check behavior, specialist coverage, rerun rules, execution mode, and model choices before teams save a policy.
- Sandbox OpenCode reviews now capture usage through an in-sandbox OpenRouter logger and richer session extraction, improving token, cache, reasoning, cost, and per-call credit traceability across nested agents, follow-up turns, Builder, Remedy, and stalled review runs.
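The stalled-turn handling described in this release (bounding a turn, resetting timers on real activity, flagging idle sessions) can be sketched as a small watchdog. The class name, the status strings, and the limits are hypothetical; timestamps are passed in explicitly so the behavior is easy to follow.

```python
class TurnWatchdog:
    """Tracks one agent turn using caller-supplied timestamps (seconds)."""

    def __init__(self, started_at: float, idle_limit: float, turn_limit: float):
        self.started_at = started_at
        self.last_activity = started_at
        self.idle_limit = idle_limit
        self.turn_limit = turn_limit

    def record_activity(self, now: float) -> None:
        # Real activity (tool call, token, message) resets the idle timer only.
        self.last_activity = now

    def check(self, now: float) -> str:
        if now - self.started_at > self.turn_limit:
            return "turn-limit-exceeded"  # hard bound on a single turn
        if now - self.last_activity > self.idle_limit:
            return "idle"                 # candidate for abort or fallback publish
        return "healthy"


dog = TurnWatchdog(started_at=0, idle_limit=60, turn_limit=600)
dog.record_activity(50)
print(dog.check(100))   # healthy: 50s since last activity
print(dog.check(120))   # idle: 70s without activity
print(dog.check(700))   # turn-limit-exceeded
```

Keeping the idle timer separate from the turn bound means a busy but very long turn and a silent one are reported differently, so the recovery path can differ too.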
May 2026 — Sandbox OpenCode metering, liveness, and stalled-turn handling
At a glance
- OpenCode sandbox reviews cap how long a single hung turn can block the run, give stall recovery more room before the session times out, and check for liveness on a shorter cadence so quiet stretches surface as activity in the live feed sooner instead of looking frozen.
- Nested sandbox agent work inside OpenCode (subagents and related tool paths) now rolls into the same token and credit accounting as the primary review path, so dashboards and per-run usage reflect the full OpenRouter-style workload instead of undercounting nested calls.
- Critique captures and merges OpenCode usage snapshots through the session so per-model spend stays traceable after nudges, retries, or partial completions; credits shown in dashboards still follow the stored charged amount wherever it exists so usage views, review summaries, and credit pool math stay aligned with what was actually deducted.
May 2026 — Model catalog refresh across Chat, review, and Remedy
At a glance
- The public model catalog now reflects the new routing ladder: Grok 4.3 replaces the older Grok slot at a lower floor, Qwen3.6-35B-A3B replaces the retired Qwen3.5-27B lane, Ling-2.6-Flash joins as a new 1-credit fast agent model, and Qwen3.6-Max-Preview joins as the higher-end Alibaba lane.
- Pricing moved in both directions to match current shelf reality: GLM-5.1 dropped by 1 credit, MiMo v2.5 rose by 1 credit, and KAT Coder Pro V2 returned to its normal 2-credit price after a month-long discounted partnership window.
- Critique Chat’s picker was cleaned up to match the intended free-chat roster, and saved defaults pointing at removed Chat IDs remap automatically so existing threads reopen without manual repair.
May 2026 — Speech input across Chat and Builder
At a glance
- Critique Chat voice input now runs through OpenRouter's dedicated speech-to-text path instead of a conversational audio workaround, which makes transcription behavior simpler and easier to track.
- Builder now has the same microphone capture flow as Chat, so operators can dictate a build prompt directly into the workspace composer before launching a run.
- Speech transcription responses now carry the provider generation identifier alongside the transcript, which gives operators and support a cleaner handle for tracing voice requests.
May 2026 — Repository map intelligence, usage clarity, and snapshot-first dashboards
At a glance
- The dashboard repository vector map now reads like an architecture brief: it highlights central nodes, fan-out risk, isolated files, and test relationships, and a Signals panel calls out why parts of the topology matter—not only that they exist.
- Act on this map connects the graph to the rest of Critique: open a node or neighborhood in Chat with a prefilled prompt, copy an agent handoff, or export JSON, Markdown, agent handoff text, or DOT/Graphviz for tools outside the app.
- From the same view you can run GitHub-oriented actions—copy review-ready Markdown, draft an issue from the selection, or attach the selected context to a PR by number—without losing map context.
- The usage area emphasizes managed execution instead of raw provider plumbing; hidden model visibility rules are respected, and charts and labels resolve to catalog display names where Critique has them.
- PR reviews now support a clearer premium-first path: when a repository is set to Auto or Premium OpenCode Review, Critique attempts the full OpenCode-led audit first, falls back to the collector-backed review path if that premium run fails, and labels the resulting review authority so operators can tell whether the output came from native OpenCode, collector fallback, or backend-only synthesis.
- Metering and credits now line up with how reviews actually run: multi-pass work (reasoning plus structured output) rolls into one consolidated usage record, OpenRouter-style native and reasoning token fields parse consistently, sandbox and OpenCode completions surface customer-safe labels, and credits are no longer rounded down to a tiny value when a large call with uncataloged tokens warrants more; legacy rows can be corrected upward when token evidence shows they were too low. Assistant and fix-prompt traffic now leaves durable usage records.
- OpenRouter traffic defaults site attribution to Critique’s public home, and review calls send the current attribution headers expected by the provider.
- Dashboard GitHub data defaults to persisted snapshots in Critique: full reconciliation runs after Connect or Re-Sync repositories, not on ordinary loads of usage analytics or review-run detail—the UI states that repository access stays cached until you re-sync.
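The premium-first review path above (try the OpenCode-led audit, fall back on failure, label which path produced the output) follows a common ordered-fallback pattern. This sketch uses hypothetical authority labels and stand-in runner functions; it does not reflect Critique's actual runners or error types.

```python
from typing import Callable, List, Tuple


def review_with_fallback(paths: List[Tuple[str, Callable[[], str]]]) -> dict:
    """Try each review path in order and label which one produced the output."""
    failures = []
    for authority, run in paths:
        try:
            output = run()
        except Exception as exc:  # a failed path falls through to the next one
            failures.append(f"{authority}: {exc}")
            continue
        return {"authority": authority, "output": output, "failures": failures}
    raise RuntimeError("every review path failed: " + "; ".join(failures))


def premium_opencode() -> str:
    raise TimeoutError("sandbox stalled")  # simulate a premium run failing


def collector_review() -> str:
    return "2 findings"


result = review_with_fallback([
    ("native-opencode", premium_opencode),
    ("collector-fallback", collector_review),
    ("backend-only", lambda: "summary only"),
])
print(result["authority"], result["failures"])
```

Returning the authority label alongside the output is what lets an operator tell a full audit from a degraded one at a glance.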
April 2026 — PR review runtime unification and live-run clarity
At a glance
- PR review routing now separates OpenCode agent, Collect sandbox, and GitHub-backed review types so operators can choose the exact backend and understand whether an agent stream should exist.
- OpenCode agent reviews now start from a single sandbox path: Critique prepares the PR context, launches the OpenCode audit, streams the run, and consumes the final review output without a separate collector sandbox blocking the agent.
- Review usage and live-run diagnostics are more accountable: token-bearing calls no longer display zero credits from legacy rows, and sandbox setup or failure events remain visible even when the agent never reaches a rich tool stream.
April 2026 — OpenCode operator hardening for longer review sessions
At a glance
- The live review stream is easier to read at a glance: tool calls, thinking, and assistant messages now surface with clearer tags and cleaner output, and the dashboard shows whether the sandbox, OpenRouter, and QStash prerequisites are ready.
- Long-running sandbox reviews now recover more gracefully when the agent goes quiet. Critique can send a follow-up into the same OpenCode session, and if the sandbox stalls entirely, the review can fall back to the backend publish path instead of dying silently.
- Review prompts now push for deeper autonomous work by default: Critique expects subagents to be used early on non-trivial changes, avoids permission-seeking loops, and treats missing env-dependent validation as a cue to switch to targeted checks instead of stalling.
April 2026 — Workspace Build: branch-aware sandboxes and the full Builder model roster
At a glance
- Build mode in `/workspace` now offers searchable repository and model pickers; models come from the same remedy / Builder catalog as `/builder`, filtered by the selected repo’s plan (ultra-only checkpoints stay ultra-only).
- You can choose a Git branch or tag (GitHub-backed list plus manual ref), optionally set a sandbox-only branch (`git checkout -b`) before OpenCode runs, and those choices persist on the Builder job record for later inspection.
- The builder execution path checks out the requested ref in E2B, then creates the local work branch when provided—still no automatic push to GitHub; PR flow stays manual.
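The ref-then-branch sequence above can be sketched as a function that only builds the git command lines, leaving execution to the sandbox. The helper names and the ref-name check are illustrative assumptions; the real Builder path may fetch, validate, or check out differently.

```python
from typing import List, Optional


def _check_ref(name: str) -> str:
    # Reject values git could read as an option or that contain whitespace.
    if not name or name.startswith("-") or any(c.isspace() for c in name):
        raise ValueError(f"unsafe ref name: {name!r}")
    return name


def sandbox_git_steps(ref: str, work_branch: Optional[str] = None) -> List[List[str]]:
    """Return argv lists: check out the requested ref, then an optional local branch."""
    steps = [["git", "checkout", _check_ref(ref)]]
    if work_branch is not None:
        steps.append(["git", "checkout", "-b", _check_ref(work_branch)])
    # Deliberately no "git push": the note says publishing stays manual.
    return steps


print(sandbox_git_steps("v1.2.0", "sandbox/try-fix"))
```

Each inner list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with user-supplied ref names.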
April 2026 — Transparent automation credits end to end
At a glance
- Usage analytics now reconcile OpenRouter-style payloads (including nested cache fields) across review, remedy, sandbox OpenCode completions, and builder jobs so token math and credits derive from identical inputs.
- The automation ledger separates review-agent, remedy, and builder rows; each exposes prompt versus completion totals, billed credits, latency, and textual purpose fields so accountants can chase down the exact workload.
- When sandbox PR reviews emit granular OpenRouter completions, aggregated lead rows skip double-accounting—the quota matches one OpenRouter session on the bookkeeping side while granular rows carry the explanatory trail.