
Ship log

App artifact v6.5.0

Ship log for reviewers and operators: newest releases first — v6 Platform API write paths, lifecycle webhooks, and lighter default UI; v5 marketplace, Coding Agent API, and Inference API; v4 Change Passports and merge policy as code — each drop in plain language, semver-tied to the app artifact.

Ship notes describe what changed in Critique and what operators and reviewers should notice—plain language, no fluff. Newest releases appear first.

v6.5 — OpenCode review runtime hardening and PR chat readiness

26 June 2026

At a glance

Sandbox-native review runs get a longer worker window: the QStash review worker now has a 30-minute serverless budget, up from five minutes, because a single OpenCode sandbox review may need to clone, inspect, run commands, call OpenRouter, and write the final artifact before returning.
OpenCode review turns no longer monopolize E2B command execution — the long /session/:id/message request runs through an E2B background command handle so Critique can keep polling OpenCode activity, writing Control Board heartbeats, and aborting stalled turns with diagnostics.
Duplicate review claims wait for real sandbox budgets — in-progress review runs are no longer reclaimed after ten minutes while a legitimate OpenCode pass may still be running; the claim window now sits above the review worker duration.
Watchdog activity checks respect backend heartbeats — stale-run detection uses the newest of sandbox-live board activity and review-run progress updates so recovery attempts are not repeatedly failed and requeued while they are still active.
Interactive PR chat is production-wired for @critique-bot — hosted installs document and default to @critique-bot while still accepting the legacy @critique alias. Help text, docs, and environment examples now show the hosted bot login.
PR comment threads have first-class storage — the pr_comment_thread migration records trigger comments, prompts, context, replies, status, and errors for GitHub PR chat runs.
Comment webhooks can self-heal stale repository sync — PR comment handling can upsert the repository from the webhook payload when the installation sync has not materialized the repo row yet, instead of failing with an opaque missing-repository error.
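
The claim and watchdog rules above reduce to two checks. First, take the newest of the two activity signals. Second, treat an in-progress run as reclaimable only once that activity is older than a window longer than the 30-minute worker budget. The sketch below is illustrative only. The run shape, field names, function names, and the 35-minute window are all assumptions, not Critique's implementation, and watchdog-marked recovery runs are left out.

```typescript
// Hypothetical run shape; field names are illustrative, not Critique's schema.
type RunStatus = "QUEUED" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "CANCELLED";

interface ReviewRunActivity {
  status: RunStatus;
  updatedAt: Date; // review-run progress heartbeat
  boardActivityAt?: Date; // newest sandbox-live Control Board activity, if any
}

const WORKER_BUDGET_MS = 30 * 60 * 1000; // 30-minute review worker budget
// Assumed value: anything strictly above the worker budget satisfies the rule.
const CLAIM_WINDOW_MS = 35 * 60 * 1000;

// Newest of the two heartbeats, so activity on either channel keeps a run live.
function lastActivityMs(run: ReviewRunActivity): number {
  const progress = run.updatedAt.getTime();
  const board = run.boardActivityAt?.getTime() ?? 0;
  return Math.max(progress, board);
}

function isStale(run: ReviewRunActivity, nowMs: number, thresholdMs: number): boolean {
  return nowMs - lastActivityMs(run) > thresholdMs;
}

// A worker may claim a queued run, or an in-progress run that has been silent
// for longer than the claim window. Other statuses are never claimed here
// (watchdog recovery runs are out of scope for this sketch).
function canClaim(run: ReviewRunActivity, nowMs: number): boolean {
  if (run.status === "QUEUED") return true;
  if (run.status !== "IN_PROGRESS") return false;
  return isStale(run, nowMs, CLAIM_WINDOW_MS);
}

// Guard the configuration itself: the window must outlast the worker.
if (CLAIM_WINDOW_MS <= WORKER_BUDGET_MS) {
  throw new Error("claim window must exceed the worker budget");
}
```

Tying the window to the worker budget, rather than to a fixed ten minutes, is the point of the change. Raising one without the other reopens the double-claim gap.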

Docs: Bot mentions · PR review · Change control

v6.4 — Agent Workspace usability, operator boundaries, and layout polish

26 June 2026

At a glance

The session header is slimmer — toggle controls, connected-agent settings, and a single status line with the active lane title. The old cloud icon row, duplicate runtime labels, frame chips, and bottom legend bar are gone so the transcript column has more room.
The left session rail is tabbed — browse Lanes, Repos, Sources, and Projects instead of one long scroll. The mobile session sheet uses the same tabs. Runtime picking and prompt/sandbox mode live in the composer footer and agent settings, not as always-visible rail sections.
Cloud lane rows are compact — each row shows title, status, repository, and age; continue, sync, rename, fork, pin, color, archive, and project moves sit in a … menu. Pinned, running, and attention lanes still stand out through status and presentation cues.
Lane lists are deduplicated — the rail is the place to browse lanes. The empty canvas shows at most three recent lanes, and the composer Target menu only lists continuation targets that match the selected runtime and repository.
The sticky composer dock is an input bar again: the command-center title, dense preflight chip rows, dispatch banner, and duplicate runtime switcher are removed from the dock. One compact route line, a Preflight summary control for detail on demand, the target menu, and the footer dropdowns carry runtime, model, and agent context without repeating it three times.
Composer menus and slash commands are readable — footer dropdowns, the target menu, and the slash-command palette render on elevated, opaque surfaces with correct stacking so options stay legible over the transcript, sticky dock, and inspector instead of showing through as transparent layers.
The composer dock no longer nests its own scroll — only the message field scrolls when text grows; the dock wrapper stays fixed at the bottom without a separate scrollbar fighting the transcript.
Deployment setup language stays out of the main workspace: when something blocks a runtime, users see “connect your key” or “not available in this workspace”, not encryption storage labels, sandbox template names, or env-var instructions. Inline blockers and provider cards use the same plain language.
Provider and agent key setup moved to a dedicated settings sheet: “Connected agents” in the header opens Anthropic and OpenAI connection settings in a side sheet instead of a permanent block inside the sticky composer dock.
The operations inspector floats by default on desktop — it overlays the transcript instead of permanently shrinking the center column. Use Pin in the inspector header to dock it into the layout when you want a fixed side panel; narrow widths still open operations in a sheet.
Fewer nested boxes across workspace chrome — provider connection cards, operations navigation, status chips, and lane lists lean on spacing and typography rather than stacked bordered cards.

Docs: Agent Workspace · Change control

v6.3 — Agent Workspace layout aligned to a calmer T3-style shell

23 June 2026

At a glance

Agent Workspace at /workspace now follows a fixed three-region shell — session rail on the left, transcript or empty prompt in the center, and an optional operations inspector on the right. The center column is no longer stacked with separate command-center, activity, agent-board, and setup cards fighting for attention.
Empty workspace state is sparse — a centered prompt, repository and runtime context, recent cloud lanes, and the sticky composer. The old multi-card setup checklist no longer fills the canvas before the first run.
Active cloud lanes own the center — when a lane is selected, the transcript and composer are the primary surface. Runtime, model, mode, queue intent, and send controls stay in the composer footer instead of duplicating in a second command center.
Prompt queue management moved to the inspector: reorder, edit, dispatch, and clear queued prompts from the Agents tab when items are queued. The session rail’s “N queued” link opens the same inspector view, and keyboard and slash-command queue focus route there too.
Provider setup appears only when credentials are missing — Anthropic and OpenAI connection cards sit above the composer until runtimes are ready, instead of occupying a permanent center panel.
Visual chrome is flatter — fewer nested rounded cards, thinner borders, a single ~920px readable column, and build mode still opens the inspector on Diffs by default. No API or persistence changes; existing /api/workspace/* routes and cloud-lane behavior are unchanged.

Docs: Agent Workspace · Change control

v6.2 — Critique Intake and sharper sandbox review recovery

23 June 2026

At a glance

Critique Intake ships as agentic bug intake — an embeddable report widget, one focused follow-up question, browser context capture, deterministic triage, and an agent-ready handoff prompt operators can route into Builder, GitHub, Linear, or Review runs.
Install with one script tag at /intake/widget.js. The widget records page URL, user agent, viewport, recent clicks, console errors, and failed fetch calls, then posts to POST /api/intake/report with CORS support for customer apps.
Dashboard → Intake lists investigated reports with severity, likely owner, evidence, reproduction steps, and a copyable Critique run prompt. Operators can mark reports investigated, queued, resolved, or ignored via PATCH /api/intake/reports.
Triage classifies reports without a model round-trip — category, severity, confidence, suggested owner, evidence bullets, and a copyable agent prompt. Each report keeps the full browser context and triage packet for later handoff.
Platform docs and launch essay cover install, API payloads, inbox workflow, and how Intake connects to Review runs, Change Passports, and the Merge Gate API — see Critique Intake and Feedback That Arrives Already Debugged.
Marketing and discovery surfaces include Intake — primary nav, sitemap, docs index, change-control cross-links, and llms.txt Q&A so agents and search tools can find the workflow.
Sandbox-native review runs lean on OpenCode end-to-end — the unified sandbox path drops the separate clone and collector stages; OpenCode owns repository checkout, dependency probes, verification commands, and artifact generation in agent mode.
OpenCode review turns now enforce a hard wall-clock limit — stalled sandbox turns abort, report progress, and kill the sandbox instead of hanging until the worker times out.
Backend fallback recovery is more explicit — QStash replay can force backend-led synthesis, write Control Board progress for the recovery phase, and heartbeat run progress during long fallback attempts.
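
A hard wall-clock limit is usually a race between the turn and a timer. The sketch below assumes the turn accepts an AbortSignal and that the caller supplies a cleanup hook for reporting progress and killing the sandbox. The names runWithWallClockLimit, TurnTimeoutError, and onTimeout are hypothetical.

```typescript
// Raised when a turn runs past its wall-clock budget.
class TurnTimeoutError extends Error {
  constructor(readonly limitMs: number) {
    super(`turn exceeded its ${limitMs} ms wall-clock limit`);
    this.name = "TurnTimeoutError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

// Races the turn against a timer. On timeout: abort the turn's signal, run the
// cleanup hook (for example report progress and kill the sandbox), then reject.
async function runWithWallClockLimit<T>(
  turn: (signal: AbortSignal) => Promise<T>,
  limitMs: number,
  onTimeout: () => Promise<void>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      // A failing cleanup hook must not mask the timeout itself.
      void onTimeout()
        .catch(() => undefined)
        .then(() => reject(new TurnTimeoutError(limitMs)));
    }, limitMs);
  });
  try {
    return await Promise.race([turn(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // the timer is irrelevant once the race settles
  }
}
```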

Docs: Critique Intake · PR review · Change control

v6.1.2 — Review queue hardening, webhook delivery audit rows, and current naming cleanup

18 June 2026

At a glance

Review queue and active-repository enforcement

Inactive repositories can no longer slip through non-dashboard paths. REST POST /api/v1/reviews, MCP queue_review, dashboard PR snapshot access, chat-triggered reruns, checkpoint-triggered queueing, and webhook/process-delivery paths now all converge on active-repository checks.
Review queue helpers were centralized so callers share the same repository access filter and the same enqueue behavior instead of partially re-implementing queue logic in separate surfaces.
Inactive-repo rejection is covered by characterization tests to keep later refactors from reopening the gap.

Safer enqueue semantics and webhook evidence

review.run.queued fires only after QStash publish succeeds. The old ordering could say “queued” to downstream systems before the background job was actually accepted.
Failed enqueue attempts now mark the run FAILED with a reason and emit failed-run events on the lifecycle and per-run webhook paths, instead of leaving a misleading queued state behind.
Per-run webhook delivery audit rows now persist for review-run callbacks, including delivery status, status code, target URL, and error text. This gives operators a first-class record for CI receivers and merge-gate integrations, not just installation-scoped lifecycle endpoints.
Delivery audit storage ships in a dedicated migration, review_run_webhook_delivery.
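
The ordering rule amounts to three steps. Publish first, announce second, and turn a failed publish into a recorded failure. This is a minimal sketch, assuming injected publish, markFailed, and emit functions that stand in for the QStash client, the run store, and the webhook emitter. Every name here is hypothetical.

```typescript
type LifecycleEvent = "review.run.queued" | "review.run.failed";

// Hypothetical dependencies standing in for QStash, the run store, and webhooks.
interface EnqueueDeps {
  publish: (runId: string) => Promise<void>;
  markFailed: (runId: string, reason: string) => Promise<void>;
  emit: (event: LifecycleEvent, runId: string) => Promise<void>;
}

type EnqueueResult = { ok: true } | { ok: false; reason: string };

async function enqueueReviewRun(runId: string, deps: EnqueueDeps): Promise<EnqueueResult> {
  try {
    await deps.publish(runId);
  } catch (err) {
    // Publish was not accepted: record why and announce failure, never "queued".
    const reason = err instanceof Error ? err.message : String(err);
    await deps.markFailed(runId, reason);
    await deps.emit("review.run.failed", runId);
    return { ok: false, reason };
  }
  // Only reached once the background job has been accepted.
  await deps.emit("review.run.queued", runId);
  return { ok: true };
}
```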

Docs, API guidance, and naming

Platform API docs now state the cookie/auth model correctly. Scoped v1 review endpoints require a crt_ key with the right scope; browser session cookies do not satisfy those programmatic checks.
Current docs and AI-facing guidance stop presenting “v4 dashboard” as the live product name. Historical v4/v5/v6 references remain release history, but active product copy now uses current surface names such as Dashboard, Change Passports, Review runs, and Merge Gate API.
AGENTS.md was added so coding agents working in the repo stop anchoring on historical naming when generating copy or explanations.

Docs: Merge Gate API · Connections & API · Agent stack integrations

v6.1.1 — Agent stack integrations and upstream signals on the merge gate

17 June 2026

At a glance

Upstream signals (merge gate)

upstreamSignals (max 8) on review queue requests — attest RWX CI runs, Swytchcode tool policy, Iron Book agent identity, or custom partner outcomes.
Stored on the review run, returned on GET /api/v1/review-runs/:id, echoed on lifecycle and per-run webhooks, and copied into Change Passport snapshot provenance for compliance export.
Provider enum: rwx · swytchcode · identity_machines · custom. Each signal requires provider + kind; optional status, label, externalId, url, recordedAt, metadata.
Orchestrator-attested in v6.1.1 — Critique does not call partner APIs to verify signals yet; your supervisor records what already passed upstream before queueing the judge.
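
Because signals are orchestrator-attested, Critique can only check their shape, not their truth. A minimal validator for the constraints listed above might look like this. It checks for at most eight entries, a known provider, and a non-empty kind. The function name and error strings are hypothetical, and treating an absent field as valid is an assumption. Type checks on the optional fields are left out for brevity.

```typescript
type SignalProvider = "rwx" | "swytchcode" | "identity_machines" | "custom";

// Field names from the notes; the value types are assumptions.
interface UpstreamSignal {
  provider: SignalProvider;
  kind: string;
  status?: string;
  label?: string;
  externalId?: string;
  url?: string;
  recordedAt?: string; // assumed ISO-8601 string
  metadata?: Record<string, unknown>;
}

const PROVIDERS: readonly string[] = ["rwx", "swytchcode", "identity_machines", "custom"];
const MAX_UPSTREAM_SIGNALS = 8;

// Returns validation errors; an empty array means the field is acceptable.
function validateUpstreamSignals(input: unknown): string[] {
  if (input === undefined) return []; // assumed optional on queue requests
  if (!Array.isArray(input)) return ["upstreamSignals must be an array"];
  const errors: string[] = [];
  if (input.length > MAX_UPSTREAM_SIGNALS) {
    errors.push(`upstreamSignals allows at most ${MAX_UPSTREAM_SIGNALS} entries`);
  }
  input.forEach((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      errors.push(`upstreamSignals[${i}] must be an object`);
      return;
    }
    const s = entry as Partial<UpstreamSignal>;
    if (typeof s.provider !== "string" || !PROVIDERS.includes(s.provider)) {
      errors.push(`upstreamSignals[${i}].provider is not a supported provider`);
    }
    if (typeof s.kind !== "string" || s.kind.length === 0) {
      errors.push(`upstreamSignals[${i}].kind is required`);
    }
  });
  return errors;
}
```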

Docs and cookbooks

Agent stack integrations — RWX vs Swytchcode vs Iron Book vs Critique, recommended loops, FAQ.
Four new cookbooks on /merge-gate-api: RWX green → gate, Swytchcode + gate, Iron Book provenance, full stack loop.
OpenAPI QueueReviewRequest updated with upstreamSignals schema.

Positioning

RWX proves the build; Critique proves the change may merge.
Swytchcode secures prod API calls; Critique secures repo changes.
Iron Book secures runtime agent IAM; Critique secures the SDLC merge boundary with evidence on the passport.

Docs: Merge Gate API · Agent stack integrations · /merge-gate-api

Essay: Critique v6.1.1: Agent Stack Integrations — RWX, Swytchcode, Iron Book, and the Merge Gate — why upstreamSignals exists, partner-by-partner loops, data flow, scope boundaries, and operator checklist.

v6.1 — Platform API v1, GitHub API hardening, GLM-5.2 catalog rollover, lifecycle webhooks, and calmer default chrome

17 June 2026

At a glance

Platform API v1

POST /api/v1/reviews is live. Queue a pull request review over HTTP with a crt_ key that has write:reviews — the same outcome as MCP queue_review. Poll GET /api/v1/review-runs/{id} for status and verdict when CI or internal portals need REST instead of JSON-RPC.
MCP enforces API key scopes on every tool call. Keys without write:reviews cannot queue reviews or Remedy; read tools require read:passports or read:reviews as documented. This closes the gap where any valid crt_ key could queue work regardless of scopes.
MCP expands to ten tools aligned with in-app chat: passport and review reads, review queueing, Remedy queue and status, workspace status, finding-memory search, and merge-policy draft compilation — each gated by the same scopes as REST v1.
Insights, checkpoint policy, finding suppressions, and merge-policy compile accept crt_ keys with read:insights, write:insights, or review scopes; signed-in dashboard sessions still work unchanged.
OpenAPI contract at GET /api/v1/openapi (YAML) documents passports, reviews, insights, repository policy, Coding Agent, inference, and MCP for automation teams treating the beta surface as stable.
New crt_ keys include read:insights and write:insights by default. Rotate older keys in Settings → Connections if programmatic insights or compliance export returns 403.
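
The scope rules above combine two checks. The caller must be a crt_ key, because session cookies never satisfy a scope. The key must also hold the scope the tool requires. This sketch assumes a simple tool-to-scope table. Only queue_review and get_review_run are tool names taken from these notes, and mapping get_review_run to read:reviews is an assumption.

```typescript
type Scope =
  | "read:passports"
  | "read:reviews"
  | "write:reviews"
  | "read:insights"
  | "write:insights"
  | "manage:webhooks";

// Browser sessions carry no scopes at all; only crt_ keys do.
type Caller =
  | { kind: "api_key"; scopes: ReadonlySet<Scope> }
  | { kind: "session" };

// Illustrative table for the two tools named in these notes.
const TOOL_SCOPES: ReadonlyMap<string, Scope> = new Map<string, Scope>([
  ["queue_review", "write:reviews"],
  ["get_review_run", "read:reviews"], // assumed mapping
]);

function hasScope(caller: Caller, required: Scope): boolean {
  return caller.kind === "api_key" && caller.scopes.has(required);
}

// Checked on every tool call; tools without a known scope are denied.
function authorizeToolCall(caller: Caller, tool: string): boolean {
  const required = TOOL_SCOPES.get(tool);
  return required !== undefined && hasScope(caller, required);
}
```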

GitHub reliability and repo inbox

GitHub fallback paths are now rate-aware per installation. Review evidence fallback, repository code search, tree/file indexing, and inbox sync all route through a shared budget layer that respects retry headers, backs off on rate limits, and spaces code-search traffic so large repos and retry storms stop stampeding the GitHub API.
Repo inbox sync now prefers GitHub GraphQL for PR and issue snapshots, with REST fallback kept in place. Critique pulls the dashboard card shape directly — node IDs, review decisions, counts, labels, assignees, and review requests — then drops back to REST pagination if GraphQL fails or GitHub schema drift shows up, so operators get richer repo-home context without a brittle single-path dependency.
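
Respecting retry headers boils down to picking a wait time from whatever GitHub sends back. The sketch below reads GitHub's documented retry-after, x-ratelimit-remaining, and x-ratelimit-reset headers. When neither hint is present, it falls back to capped exponential backoff. It does not show the per-installation shared budget. The function name, the lowercased header map, and the one-minute cap are assumptions.

```typescript
// Header keys are assumed to be lowercased before they reach this function.
type HeaderMap = Record<string, string | undefined>;

const MAX_BACKOFF_MS = 60_000; // assumed cap

function retryDelayMs(headers: HeaderMap, attempt: number, nowMs: number): number {
  // Secondary rate limits: GitHub may state how long to wait, in seconds.
  // (An HTTP-date form of retry-after is not handled in this sketch.)
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined && /^\d+$/.test(retryAfter)) {
    return Number(retryAfter) * 1000;
  }
  // Primary limit exhausted: wait until the reset time (epoch seconds).
  const reset = headers["x-ratelimit-reset"];
  if (headers["x-ratelimit-remaining"] === "0" && reset !== undefined) {
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }
  // No hint from GitHub: exponential backoff starting at one second, capped.
  return Math.min(MAX_BACKOFF_MS, 1000 * 2 ** attempt);
}
```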

GLM catalog

GLM-5.2 replaces the older model in the Z.AI GLM review lane at the same price point. Critique now routes the Z.AI lane through z-ai/glm-5.2 while keeping the 3-credit review and Remedy floor that teams already budgeted for GLM-5.1.
Older GLM entries were removed from the active catalog and aliased forward. Legacy z-ai/glm-5.1, z-ai/glm-5-turbo, and z-ai/glm-5v-turbo IDs now normalize to z-ai/glm-5.2 so saved policies and inference clients keep working while the public catalog stops advertising split older GLM SKUs.
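
Aliasing forward is a lookup with a pass-through default. In this sketch only the model IDs come from the notes, and the function name is hypothetical. A Map is used rather than a plain object, so keys such as "constructor" cannot resolve to inherited properties.

```typescript
// Retired GLM IDs from these notes, mapped forward to the active catalog entry.
const GLM_ALIASES: ReadonlyMap<string, string> = new Map([
  ["z-ai/glm-5.1", "z-ai/glm-5.2"],
  ["z-ai/glm-5-turbo", "z-ai/glm-5.2"],
  ["z-ai/glm-5v-turbo", "z-ai/glm-5.2"],
]);

// Unknown IDs pass through unchanged, so other models are unaffected.
function normalizeModelId(id: string): string {
  return GLM_ALIASES.get(id) ?? id;
}
```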

Calmer default chrome

Light mode is the default on first visit. Dark mode remains available from the theme toggle and Settings → Appearance; saved preferences are unchanged.
Primary buttons and links use neutral charcoal and stone white instead of bright lime-green pills and glow effects on the homepage, marketing navigation, footer, and sign-up calls to action.
Green is reserved for success and pass states (for example review PASS, status indicators) as a muted sage — not as the default accent across the whole product shell.
Marketing and blog surfaces inherit the active theme rather than forcing a dark green-tinted background; homepage panels, cards, and CTAs use shared shell tokens so light and dark modes both read cleanly.
Docs and editorial long-form pages pick up the same calmer palette: softer borders, neutral inline code and callouts, and links that do not default to neon green.
Workspace and dashboard chrome align with neutral dark grays when you switch to dark mode, without the previous green wash on rails, composer rings, and selection highlights.

Lifecycle webhooks (installation subscriptions)

Signed HTTPS callbacks now cover the review and passport lifecycle. Critique POSTs JSON to URLs you register per GitHub App installation, with HMAC-SHA256 body signing matching Coding Agent run webhooks (X-Critique-Webhook-Signature, X-Critique-Webhook-Event, X-Critique-Webhook-ID). User-Agent is Critique-Lifecycle-Webhook/1.0 so log filters stay distinct from agent traffic.
Manage endpoints with crt_ keys that include the manage:webhooks scope: GET / POST /api/v1/webhook-endpoints, GET / DELETE /api/v1/webhook-endpoints/:id. Secrets are encrypted at rest; responses never return the raw secret.
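
Receivers should verify the signature against the raw request body before parsing JSON. This sketch uses Node's crypto module. It assumes X-Critique-Webhook-Signature carries a hex-encoded HMAC-SHA256 digest of the raw body with no prefix. These notes do not specify the encoding, so confirm it in the webhook docs before relying on it.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: the header value is the hex HMAC-SHA256 of the raw body, no prefix.
function verifyWebhookSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const received = Buffer.from(signatureHeader.trim().toLowerCase(), "utf8");
  const computed = Buffer.from(expected, "utf8");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return received.length === computed.length && timingSafeEqual(received, computed);
}
```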

Merge gate for agents (Platform API packaging)

Why we packaged it: Coding agents multiplied PR volume faster than human review scaled. The same model that wrote a patch is the wrong final judge — confirmation bias at machine speed. Critique’s wedge is not another coding host; it is merge governance: a separate judge role with PASS / WARN / FAIL, evidence on the Change Passport, and a programmatic contract so orchestrators close the writer → gate → fix loop without scraping GitHub check summaries. v6.0 shipped the routes; v6.1 names the story, fattens the payload, and documents it end to end.

Essay: The Merge Gate API: Why Agent Teams Need a Judge That Is Not the Writer — writer vs judge vs fixer, benefits for CI and compliance, structured findings, two webhook channels, and where Critique sits as agent infrastructure consolidates (not competing on git hosting).

Docs: Merge Gate API reference · product page /merge-gate-api · cookbooks on the product page.

/merge-gate-api product page documents the agent merge loop: queue review → webhook or poll → structured findings. Same crt_ keys and /api/v1/* routes — packaged for orchestrators, not dashboard operators.
GET /api/v1/review-runs/:id?findings=all returns the full findings list for automation; default responses cap at five findings for chat-sized payloads.
review.run.completed lifecycle payloads now include inline findings[] and findingsCount so agents do not need a second fetch for blockers.
Per-run webhook on POST /api/v1/reviews mirrors Coding Agent runs (Critique-Review-Webhook/1.0) for single CI jobs; installation subscriptions still fire in parallel.
MCP get_review_run returns all findings for IDE agents (Cursor, Claude Desktop).
Default subscription delivers review.run.completed and review.run.failed. Opt in to review.run.queued, Remedy terminal events, checkpoint.gate.evaluated, passport.snapshot.created, and merge_policy.evaluated when wiring deploy gates, Slack, or Jira.
Subscriptions attach to your GitHub App installation, so callbacks fire on the normal PR review path—not only when you queue a run over HTTP. That closes the gap where Coding Agent runs had signed webhooks but review, Remedy, Checkpoint, and passport milestones still required polling GET /api/v1/review-runs/:id or scraping GitHub checks.
Remedy, Checkpoint, merge policy, and passport snapshot events share the same endpoint when you opt in: wire one receiver for “review finished → deploy gate” and add remedy.run.completed, checkpoint.gate.evaluated, or merge_policy.evaluated without standing up separate integrations.
Webhook payloads stay credential-free — no API keys or repo secrets in JSON. review.run.completed includes structured findings[] (severity, path, lines, fingerprint) plus verdict, risk band, PR number, head SHA, and links to GET /api/v1/review-runs/:id?findings=all for the full record; other lifecycle events remain summary-shaped.
V1 delivery is single-attempt (8s timeout, with delivery audit rows). Until automatic retries ship, poll review-run status if your receiver misses an event.

Benefits at a glance

For	What the merge gate buys you
Agent orchestrators	Branch on PASS / WARN / FAIL in code; feed `findings[]` to fix agents; re-queue on new SHAs without human copy-paste.
Platform / SRE	One `crt_` integration for review queue, passport reads, and lifecycle webhooks — distinct User-Agent strings for log filters.
Security & compliance	Same passport and signed export trail as dashboard reviews; programmatic queueing does not bypass Checkpoint or merge policy.
Cost control	Checkpoint first on agent PR floods; full sandbox review only when the gate says the PR deserves spend.
IDE agents (MCP)	`queue_review` / `get_review_run` with scope enforcement — same runs as REST, all findings for local fix loops.
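
Branching on the verdict in code might look like the sketch below. The verdict values come from these notes, and the Finding fields are a subset of those listed for review.run.completed. The policy itself is a hypothetical choice, not a Critique default. Under it, WARN goes to a human, FAIL goes to a fix agent, and a result for an outdated head SHA triggers a re-queue instead.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Subset of the finding fields listed for review.run.completed; others omitted.
interface Finding {
  severity: string;
  path: string;
  fingerprint: string;
}

interface ReviewOutcome {
  verdict: Verdict;
  headSha: string;
  findings: Finding[];
}

type NextStep =
  | { action: "merge" }
  | { action: "ask_human"; findings: Finding[] }
  | { action: "dispatch_fix_agent"; findings: Finding[] }
  | { action: "requeue_on_new_sha" };

function nextStep(outcome: ReviewOutcome, currentHeadSha: string): NextStep {
  // The branch moved after this review started: judge the new head instead.
  if (outcome.headSha !== currentHeadSha) return { action: "requeue_on_new_sha" };
  switch (outcome.verdict) {
    case "PASS":
      return { action: "merge" };
    case "WARN":
      return { action: "ask_human", findings: outcome.findings };
    case "FAIL":
      return { action: "dispatch_fix_agent", findings: outcome.findings };
    default: {
      const unexpected: never = outcome.verdict;
      throw new Error(`unexpected verdict: ${String(unexpected)}`);
    }
  }
}
```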

Public site navigation

Top navigation and footer now share one link map. Compare always opens the alternatives hub at /alternatives — the same label no longer sends you to a different compare index from the footer.
Footer primary links are capped at eight (Blog, Changelog, About, Founder, Docs, Pricing, Compare, Workspace), plus Privacy & Terms, Feedback, and Support. PR control, git control, and other intent guides stay on /guides and in search — not repeated in the footer row.
/introduction is out of the top bar. The technical product guide remains at /introduction; reach it from About, Docs → Getting started, or the Guides hub when you want the pipeline map instead of the company story.
Introduction uses the same editorial layout as About, Founder, and the developer API pages — one long-form reading shell instead of workspace-style chrome when you move from company story to product depth.
/api is in the main navigation so the Coding Agent and Inference API hubs are easier to find from any marketing page.
Founder is in the footer primary row for company context without adding another top-bar peer beside About.

Platform setup and repository activation

The sidebar “Platform” page is now an admin setup surface, not an internal ops dump. Repo home still handles day-to-day PR work; Platform explains when to use it, splits Setup blockers, Policy controls, and System health, and points you back to Repo home or Workspace when you do not need governance tooling.
Setup blocker cards now have explicit next steps — open passport setup, configure incident providers, review merge policy, or open the delivery queue — instead of status-only labels like “Backfill required” with no click path.
Free workspaces can activate up to 3 repositories for automated review (was 0). Credits and repository slots are separate: having review credits does not by itself turn repos on until you pick which ones Critique should govern.
Choose repositories inside Critique — no GitHub settings detour required. On Repo home (empty state) and Platform → Delivery, use Discover repositories, check up to your plan limit, then Save repository selection. Critique stores your picks on the installation and marks only those repos active even when the GitHub App grants “All repositories.”
GitHub sync errors read like user tasks, not raw infra logs. Plan-limit and permission failures show numbered steps, Upgrade plan links to Pricing, and CLI/env-var detail stays behind Technical details instead of wrapping one character per line in a narrow column.
Database: installations persist activatedRepositoryGithubIds so your in-app repository selection survives re-sync and redeploys.
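
Saving a repository selection is, in effect, a bounded set operation on the installation record. This sketch assumes the installation also knows which repository IDs the GitHub App can see. Apart from activatedRepositoryGithubIds and the free-plan limit of 3, every name and check is hypothetical.

```typescript
const FREE_PLAN_REPOSITORY_LIMIT = 3; // from these notes (was 0)

interface InstallationRepos {
  grantedRepositoryGithubIds: number[]; // repos the GitHub App can access (assumed field)
  activatedRepositoryGithubIds: number[]; // in-app selection persisted on the installation
}

// Validates and stores a selection. Duplicate IDs are collapsed first; repos the
// app cannot see, or selections larger than the plan allows, are rejected.
function saveRepositorySelection(
  inst: InstallationRepos,
  selected: number[],
  planLimit: number,
): InstallationRepos {
  const unique = Array.from(new Set(selected));
  const granted = new Set(inst.grantedRepositoryGithubIds);
  const notGranted = unique.filter((id) => !granted.has(id));
  if (notGranted.length > 0) {
    throw new Error(`not accessible to the GitHub App: ${notGranted.join(", ")}`);
  }
  if (unique.length > planLimit) {
    throw new Error(`selection exceeds the plan limit of ${planLimit}`);
  }
  return { ...inst, activatedRepositoryGithubIds: unique };
}

// Only selected repos are active, even when the app was granted "All repositories".
function isRepositoryActive(inst: InstallationRepos, githubId: number): boolean {
  return inst.activatedRepositoryGithubIds.includes(githubId);
}
```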

v5.5 — Review pipeline hardening, API scope enforcement, and public assistant rate limits

12 June 2026

At a glance

Review pipeline reliability

QStash delivery processing is now idempotent. Finished GitHub webhook deliveries (COMPLETED, IGNORED) short-circuit with HTTP 200 instead of re-entering the worker; a compare-and-set claim (updateMany on non-terminal status) ensures only one worker owns a delivery at a time. QStash retries on FAILED deliveries still work as the recovery path.
Review-run queueing no longer resurrects finished runs. queuePullRequestReview preserves COMPLETED, FAILED, and CANCELLED runs for the same (repository, PR, head SHA) unless the caller passes forceRerun: true. Webhook-driven queueing updates metadata only and skips re-enqueueing; dashboard, chat, MCP, checkpoint override, and explicit PR-comment reruns still force a fresh run when you ask for one. Callers gate review/run.requested on wasRequeued so duplicate deliveries do not spawn parallel pipelines.
runReviewPipeline acquires a concurrency claim before work starts. Only one worker can own a QUEUED run, a stale IN_PROGRESS run (no heartbeat for 10+ minutes), or a watchdog-marked FAILED recovery run at a time; duplicate QStash deliveries return HTTP 200 with skipped: true instead of starting a second sandbox. Sandbox progress heartbeats touch reviewRun.updatedAt during live OpenCode review so healthy runs are not mistaken for stalls.
Watchdog recovery is better coordinated. After marking a stalled run FAILED, Critique requeues the backend fallback first, then refunds sandbox OpenCode usage (idempotent refund paths unchanged). Refunds no longer race ahead of the requeue decision.
Characterization tests lock queue and credit-gate decisions in lib/review/queue-decision.ts, lib/review/run-claim.ts, and lib/usage/credit-gate-decision.ts so future pipeline refactors must prove they changed only what they intended.
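
The queueing rule can be read as a small decision function keyed on the existing run for one (repository, PR, head SHA) triple. This is a simplified sketch, not the contents of lib/review/queue-decision.ts. The types and names are hypothetical, and treating forceRerun as overriding any existing status is an assumption.

```typescript
type RunStatus = "QUEUED" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "CANCELLED";

type QueueDecision =
  | { kind: "create"; wasRequeued: true } // no run for this head SHA yet
  | { kind: "force_rerun"; wasRequeued: true } // explicit rerun (dashboard, chat, MCP, ...)
  | { kind: "preserve_finished"; wasRequeued: false } // finished run left intact
  | { kind: "already_active"; wasRequeued: false }; // queued or running run keeps its owner

function isTerminal(status: RunStatus): boolean {
  return status === "COMPLETED" || status === "FAILED" || status === "CANCELLED";
}

// Webhook-driven calls pass forceRerun = false, so a replayed delivery only
// refreshes metadata; callers emit review/run.requested only when wasRequeued.
function decideQueue(existing: { status: RunStatus } | undefined, forceRerun: boolean): QueueDecision {
  if (existing === undefined) return { kind: "create", wasRequeued: true };
  if (forceRerun) return { kind: "force_rerun", wasRequeued: true };
  if (isTerminal(existing.status)) return { kind: "preserve_finished", wasRequeued: false };
  return { kind: "already_active", wasRequeued: false };
}
```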

API security

Scoped v1 routes no longer treat browser sessions as all-powerful API keys. requireApiScope defaults to api-key-only; session cookies no longer satisfy write:inference, write:builder, read:passports, or other scoped checks on programmatic endpoints. Use a crt_ key with the right scope for Inference API, Coding Agent API, and passport export. No first-party dashboard UI called these v1 routes with cookie auth, so day-to-day product flows are unchanged.

Abuse prevention

Public marketing assistants are rate-limited by default. Blog, legal, and dear-investors assistant routes share a per-IP limiter: 10 requests per minute when CRITIQUE_PUBLIC_ASSISTANT_RATE_LIMIT_PER_MINUTE is unset. Set the env var to another positive integer to tune, or 0 to disable (previous legal-assistant-only opt-in limiter behavior). Limits are per-instance best-effort on serverless; pair with edge/KV limiting if you see sustained abuse.
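
A per-instance limiter with the env semantics above can be sketched as follows. The notes do not say which algorithm is used. A fixed one-minute window keyed by IP is an assumption, as are the names and the fallback to 10 for empty or unparseable values. Because the map lives in process memory, each serverless instance counts separately, which is why the limit is best-effort.

```typescript
// Unset -> 10 per minute, positive integer -> that value, 0 -> disabled (null).
function resolveLimit(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return 10;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) return 10; // assumption: invalid values fall back to the default
  return n === 0 ? null : n;
}

// Fixed one-minute window per IP, held in this process's memory only.
class PerIpLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(private readonly limitPerMinute: number | null) {}

  allow(ip: string, nowMs: number): boolean {
    if (this.limitPerMinute === null) return true; // limiter disabled
    const w = this.windows.get(ip);
    if (w === undefined || nowMs - w.start >= 60_000) {
      this.windows.set(ip, { start: nowMs, count: 1 }); // open a fresh window
      return true;
    }
    if (w.count >= this.limitPerMinute) return false;
    w.count += 1;
    return true;
  }
}
```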

v5.4 — Unified sandbox PR review

9 June 2026

At a glance

Premium PR review now runs collectors and OpenCode in one sandbox session. Critique clones the pull request once, runs deterministic checks (diff enrichment, typecheck, lint, tests, and configured security scanners), then hands that evidence to the OpenCode reviewer in the same workspace—so the agent starts from real command output instead of re-discovering the repo from scratch.
AUTO, Sandbox, and OpenCode review modes share the same unified path. Policy labels that used to imply different backends now route to one deep sandbox review; GitHub API evidence plus backend synthesis remains the bounded fallback when sandbox execution fails or is unavailable.
Collector-backed review is legacy. Explicit Collector policy still runs the older two-stage flow (sandbox collectors, then backend specialist synthesis) for teams that want it during migration; new defaults favor unified sandbox review.
OpenCode review input carries full collector context when unified mode is on: changed-file evidence, deterministic findings, collector summaries, and the analysis report—so inline comments and verdicts can cite failing tests, lint, or scanner output the agent did not have to rediscover.
Operator controls: set CRITIQUE_REVIEW_UNIFIED_SANDBOX=true (default when sandbox-agent execution is enabled) to keep collectors before OpenCode; set false to restore OpenCode-only exploration inside the sandbox. Installation and repository policy cards describe the unified flow and fallback behavior.
Stall recovery is unchanged in spirit: if a sandbox review hangs, Critique still finishes the PR with GitHub-backed evidence rather than leaving the check stuck, and it does so without starting a second sandbox pass.
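
The operator switch resolves to one of three paths. This sketch assumes a boolean that stands in for however sandbox-agent execution is enabled. Only the CRITIQUE_REVIEW_UNIFIED_SANDBOX name and its default come from these notes. The legacy Collector policy is left out, and treating any value other than "true" as false is an assumption.

```typescript
type ReviewPath = "unified_sandbox" | "opencode_only_sandbox" | "github_api_fallback";

function resolveReviewPath(
  env: Record<string, string | undefined>,
  sandboxAgentEnabled: boolean, // hypothetical stand-in for the real switch
): ReviewPath {
  // Without sandbox execution, GitHub API evidence plus backend synthesis is the path.
  if (!sandboxAgentEnabled) return "github_api_fallback";
  const flag = env["CRITIQUE_REVIEW_UNIFIED_SANDBOX"];
  // Unset defaults to true when sandbox-agent execution is enabled.
  const unified = flag === undefined ? true : flag.trim().toLowerCase() === "true";
  return unified ? "unified_sandbox" : "opencode_only_sandbox";
}
```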

v5.3 — Inference API expansion, workspace runtime ledger, OSS program, and operator surfaces

8 June 2026

At a glance

Inference API

Critique Inference API adds Kimi K2.6, GLM-5.1, and Trinity Large Thinking alongside DeepSeek V4 Flash, Tencent Hy3 Preview, and NVIDIA Nemotron 3 Ultra. Use the same OpenAI-compatible surface at /inference-api (GET /api/v1/models, POST /api/v1/chat/completions) and the same Critique credit pool on managed billing.
The /inference-api landing page highlights Kimi K2.6 and GLM-5.1 rates that are among the lowest published rates we track, with per-model rate cards for Trinity, Hy3, DeepSeek, and Nemotron intro pricing where active. Western-hosted routing and private-by-default payloads remain the standard tier.
Training opt-in now covers Kimi K2.6, GLM-5.1, and Trinity Large Thinking in addition to DeepSeek V4 Flash and Hy3 Preview: opt in to prompt logging for future model improvement and pay 75% less per token (25% of list price). Nemotron Inference API rates are unchanged. Enable account-wide in Settings → Connections or per request with X-Critique-DeepSeek-Training-Opt-In: true (legacy header name retained).
Operators must accept training opt-in conditions before previewing or enabling discounted rates on the Inference API page or in Settings — a clear warning, bullet conditions, and a consent checkbox so teams know logging applies before they turn the deal on.
A new Billing section on /inference-api explains how prompt and completion tokens convert to USD at the active rate card, then to Critique credits at Solo-plan economics, with a worked example and response headers X-Critique-Credits-Charged and X-Critique-Estimated-Usd on non-streaming completions.
Landing page refresh: “Why switch,” value-promo rate cards, compatible-IDE strip (Cursor, Windsurf, Zed, VS Code, JetBrains), and expanded per-model pricing panels with training-opt-in strikethroughs.
Managed Inference API and review traffic now identify Critique as the calling app with https://critique.sh attribution on upstream model requests, including sandbox review paths that proxy usage through the same headers. Shared buildOpenRouterFetchHeaders centralizes attribution across review, inference, and change-control paths.
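
The Billing conversion comes down to two multiplications and a division. The sketch below uses a made-up rate card of 0.50 and 2.00 USD per million prompt and completion tokens. It also uses a made-up 0.25 USD per credit, since the real rate cards and Solo-plan conversion are not given here. It applies the training opt-in rate of 25% of list price and leaves out any rounding Critique may apply. The names are hypothetical.

```typescript
interface RateCard {
  promptUsdPerMillion: number;
  completionUsdPerMillion: number;
}

const TRAINING_OPT_IN_MULTIPLIER = 0.25; // opt-in pays 25% of list (covered models only)

// Tokens -> USD at the rate card (optionally discounted) -> credits.
function estimateCharge(
  promptTokens: number,
  completionTokens: number,
  rate: RateCard,
  trainingOptIn: boolean,
  usdPerCredit: number, // assumed conversion, not a published figure
): { usd: number; credits: number } {
  const listUsd =
    (promptTokens / 1_000_000) * rate.promptUsdPerMillion +
    (completionTokens / 1_000_000) * rate.completionUsdPerMillion;
  const usd = trainingOptIn ? listUsd * TRAINING_OPT_IN_MULTIPLIER : listUsd;
  return { usd, credits: usd / usdPerCredit };
}
```

Consider 2,000,000 prompt tokens and 500,000 completion tokens under these assumed numbers. They cost 2.00 USD (8 credits) at list price, or 0.50 USD (2 credits) with training opt-in.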

Workspace runtime consolidation

First-class workspace runs: new workspace_agent_run table and lib/workspace/session-runs.ts persist runtime executions (status, sandbox, raw run/session IDs, metadata) per Builder/Chat session.
Turn kinds replace mode-only dispatch: turnKind (ask | plan | build | review | remedy) on turns and runs; POST /api/workspace/runs accepts turnKind and returns workspaceRun in the payload.
Review ↔ workspace binding: review_run_runtime_binding links each PR review run to a workspace session/turn/run; the review pipeline creates and syncs bindings through queue, in-progress, and completion so PR review activity appears alongside Builder/Chat sessions.
Runtime adapters updated (agent-critique, Claude Code, Codex, external) to emit turn kinds and workspace run records; session sync propagates turnKind on run upserts.
Operator doc: docs/chat-builder-rebuild/WORKSPACE_RUNTIME_CONSOLIDATION_PLAN.md captures the consolidation plan for the runtime model.

Repo dashboard

Repo home quick setup card prompts new operators to pin a default repository or keep the all-repos queue; recommends up to four “most active” repos from current attention counts.
Persistent default via cookie + API: POST /api/dashboard/repo-home-preferences saves choice; critique-repo-home-default-repository httpOnly cookie restores landing view on return visits.
Repo-home review launches (run and rerun) show a pre-run credit estimate, computed with the same heuristics the server uses, before you queue.

Open Source Program

New /oss-program landing page announces €100,000 in committed compute — €50k for OSS review/fix/build support, €50k for Critique beta capacity (faster reviews, larger repos, more models, expanded access).
Application flow with eligibility signals, checklist, FAQs, JSON-LD, partner thanks (Google Cloud, Microsoft Azure), and announcement graphic at /marketing/oss-program-announcement.png.
Product facts updated to distinguish the program from the existing $5/mo student/OSS lane and Pro/Team foundation pricing.

Blog, SEO, and marketing

Six new essays (8 Jun 2026): Vercel local-vs-prod builds, Cursor dead-loop TDD, Next.js circular imports, TypeScript failures hidden by silent any types, a vibe-coder secrets checklist, and the sandbox verification workflow.
Blog taxonomy is now three tracks: Essays, Product ships, and Model updates. The index feed filters and groups posts by category under a refreshed header layout.
New SEO landing page /vercel-build-failed-on-pull-request targets build-failure, TypeScript, circular-import, and sandbox-verification queries; added to sitemap and llms.txt Q&A.
llms.txt expanded with CodeRabbit seat-pricing alternatives, OSS program link, Vercel-build guidance, and richer buyer-intent classification hints for AI answer engines.
Alternatives + compare pages sharpened for CodeRabbit seat tax, free OSS alternatives, CodiumAI/Qodo keywords, and sandbox-verification positioning; competitor entries get dedicated SEO titles/descriptions.
Global site keywords refocused toward sandbox verification, safe merging, and competitor pricing terms.

Developer surfaces

New /api Developer APIs hub — one-page map of Inference API, Coding Agent API, and Platform/MCP with crt_ key guidance, canonical metadata, and JSON-LD.
/developers permanently redirects to /api.

Secrets / BYOK storage

Optional BYOK secret tables fail gracefully when storage isn’t provisioned: shared storage-availability helper lets Anthropic, OpenAI, OpenRouter, and Crof key loaders degrade cleanly instead of hard-crashing local/preview environments.

SDK / monorepo

New @critique/sdk-core package (pnpm workspace) ships client-side PII scrubbing (scrubText / scrubPayload) and agent loop detection (LoopDetector, CritiqueLoopError) with tests; wired into root test:unit and build:sdk-core.
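To illustrate what agent loop detection means here, the sketch below shows a minimal repeated-action detector. It is not the @critique/sdk-core API. The class names, the window and threshold defaults, and the rule "same tool plus same arguments counts as a repeat" are all assumptions.

```typescript
// Illustrative only; LoopDetector / CritiqueLoopError in the package may differ.
class LoopError extends Error {
  constructor(readonly signature: string, readonly count: number) {
    super(`agent repeated ${signature} ${count} times`);
    this.name = "LoopError";
  }
}

class SimpleLoopDetector {
  private recent: string[] = [];

  constructor(private readonly windowSize = 10, private readonly threshold = 3) {}

  // Records one agent action. Throws once the same signature appears
  // `threshold` times within the last `windowSize` actions.
  record(tool: string, args: unknown): void {
    // Note: JSON.stringify is key-order sensitive, so {a,b} and {b,a} differ.
    const signature = `${tool}:${JSON.stringify(args)}`;
    this.recent.push(signature);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const count = this.recent.filter((s) => s === signature).length;
    if (count >= this.threshold) throw new LoopError(signature, count);
  }
}
```

A sliding window tolerates legitimate re-reads spread across a long session while still stopping an agent that hammers the same call in quick succession.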

Database / deploy

Migration 20260608120000_workspace_runtime_runs: adds workspace_agent_turn.turnKind, workspace_agent_run, and review_run_runtime_binding with indexes for session/turn/run lookups.
.env.example notes that production builds run prisma migrate deploy and documents preview/migrate escape hatches.

v5.2 — Critique Inference API, spend transparency, and Western-hosted models

5 June 2026

At a glance

Repo-home review launches show a pre-run credit estimate before you queue — lane, lead model, specialist count, sandbox/runtime, BYOK note, and active promos (for example MiniMax M3 welcome). Ranges come from the same catalog heuristics as the server; actual credits appear on the post-run receipt.
Completed review runs ship a unified spend receipt — Critique credits charged, models used, BYOK/promo flags, and a link to Insights → Cost. Remedy handoffs add an economics table so teams can choose managed Remedy, BYOA (vendor plan — not Critique credits), or a free fix prompt without burning the wrong budget.
Insights acts as a finance companion: 7-day credit burn, month-end forecast, estimated Checkpoint savings, and drill-down links from cost attribution to repositories and evidence runs.
GitHub Actions CI now gates every push and pull request with typecheck, unit tests, and lint, so Critique applies its own merge discipline in public. Live E2B smoke tests stay operator-driven.
Critique Inference API is live at /inference-api. OpenAI-compatible GET /api/v1/models and POST /api/v1/chat/completions bill token spend from your Critique credit balance — same pool as review and Builder. Built for Coding Agent sidecars, internal tools, and any app that already has Critique credits. Western servers, private-by-default payloads, and a quick acceptable-use policy on the landing page. Use any OpenAI SDK with baseURL: https://critique.sh/api/v1 and a crt_ key (write:inference or write:builder).
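This sketch shows the raw HTTP call behind "any OpenAI SDK with baseURL". It assumes the OpenAI-style Authorization: Bearer scheme that those SDKs send. The helper name is illustrative, and the request is only built here, not sent.

```typescript
// Sketch: builds (but does not send) an OpenAI-compatible chat completion request.
const CRITIQUE_BASE_URL = "https://critique.sh/api/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatCompletionRequest(apiKey: string, model: string, messages: ChatMessage[]): HttpRequest {
  // The note says keys are crt_ keys scoped write:inference or write:builder.
  if (!apiKey.startsWith("crt_")) throw new Error("expected a crt_ API key");
  return {
    url: `${CRITIQUE_BASE_URL}/chat/completions`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: false }),
  };
}
```

Send it with fetch(req.url, req). With the openai npm package, passing baseURL: "https://critique.sh/api/v1" and the crt_ key as apiKey to the client constructor should produce an equivalent request.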
Inference usage dashboard at /inference-dashboard (signed in) — charts for credits and tokens, model and API-key attribution, limit status, and a paginated activity log. Settings → Connections includes a mini Inference API panel with today/month spend, optional daily/monthly credit caps, request caps, credit reserve for review, enable/disable toggle, and DeepSeek training opt-in (GET/PATCH /api/settings/inference-api). Header links to the API docs and full dashboard.
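As an illustration of how those caps could interact, here is a hypothetical guard. The field names, the order of checks, and the rule that the review reserve must remain after an inference charge are assumptions made for the sketch. They are not the documented shape of GET/PATCH /api/settings/inference-api.

```typescript
// Hypothetical limit check; all field names are illustrative.
interface InferenceLimits {
  enabled: boolean;
  dailyCreditCap?: number;
  monthlyCreditCap?: number;
  dailyRequestCap?: number;
  reviewReserveCredits: number;
}

interface InferenceUsage {
  creditsToday: number;
  creditsThisMonth: number;
  requestsToday: number;
  balance: number;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function checkInferenceRequest(limits: InferenceLimits, usage: InferenceUsage, estimatedCredits: number): Decision {
  if (!limits.enabled) return { allowed: false, reason: "inference API disabled" };
  if (limits.dailyRequestCap !== undefined && usage.requestsToday + 1 > limits.dailyRequestCap)
    return { allowed: false, reason: "daily request cap reached" };
  if (limits.dailyCreditCap !== undefined && usage.creditsToday + estimatedCredits > limits.dailyCreditCap)
    return { allowed: false, reason: "daily credit cap reached" };
  if (limits.monthlyCreditCap !== undefined && usage.creditsThisMonth + estimatedCredits > limits.monthlyCreditCap)
    return { allowed: false, reason: "monthly credit cap reached" };
  // Keep the reserved credits available for PR review runs.
  if (usage.balance - estimatedCredits < limits.reviewReserveCredits)
    return { allowed: false, reason: "would dip into review reserve" };
  return { allowed: true };
}
```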
Essay: Why we built the Critique Inference API, covering Western-hosted servers, NVIDIA-powered capacity, a stress-tested launch, privacy defaults, and the DeepSeek V4 Flash training opt-in deal (75% off token rates).
Docs & README — Inference API platform guide, Connections/billing cross-links, and repository map updates for v5.2.
Hosted model lineup on Inference API (GET /api/v1/models):
- DeepSeek V4 Flash (deepseek/deepseek-v4-flash) — default, Western-hosted, 1M context, 284B MoE (13B active). Standard private inference at $0.15 / $0.30 per million input/output tokens. DeepSeek-only training opt-in: allow prompt logging for future model training at 75% off ($0.0375 / $0.075 per M) via Settings or X-Critique-DeepSeek-Training-Opt-In: true.
- Tencent Hy3 Preview (tencent/hy3-preview) — 0.5 credits per PR review run; Inference API at 10% below market ($0.0567 / $0.189 per M vs $0.063 / $0.21). 205B MoE, 262K context, Western-hosted with no logs and no training data retained.
- NVIDIA Nemotron 3 Ultra (nvidia/nemotron-3-ultra-550b-a55b) — frontier MoE for agent orchestration and coding pipelines. Temporary intro pricing through 19 June 2026 (UTC): 2 credits per PR review run (then 3 credits shelf) and API token rates at 50% off market ($0.25 / $1.25 per M vs $0.50 / $2.50). After the intro window, API pricing returns to standard market rates. Nemotron joins the review/Remedy catalog on /models with promo strikethrough while the window is open.
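The token rates above reduce to one formula: USD = (input tokens × input rate + output tokens × output rate) / 1,000,000. The sketch below encodes only the rates quoted in this release. It applies the 75% training opt-in discount to DeepSeek V4 Flash alone, which reproduces the listed $0.0375 / $0.075 per-million rates. Nemotron is entered at its intro rate, and the sketch does not model the window ending.

```typescript
// Rates in USD per 1M tokens, as listed in this release.
interface Rate {
  input: number;
  output: number;
}

const RATES: Record<string, Rate> = {
  "deepseek/deepseek-v4-flash": { input: 0.15, output: 0.3 },
  "tencent/hy3-preview": { input: 0.0567, output: 0.189 },
  // Intro pricing through 19 June 2026 (UTC); market rate is 0.50 / 2.50.
  "nvidia/nemotron-3-ultra-550b-a55b": { input: 0.25, output: 1.25 },
};

const DEEPSEEK_TRAINING_DISCOUNT = 0.75; // 75% off with the training opt-in

function estimateUsd(model: string, inputTokens: number, outputTokens: number, trainingOptIn = false): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`no rate for ${model}`);
  const base = (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
  // The opt-in is DeepSeek-only; other models ignore the flag.
  const discounted = trainingOptIn && model === "deepseek/deepseek-v4-flash";
  return discounted ? base * (1 - DEEPSEEK_TRAINING_DISCOUNT) : base;
}
```

For example, 2M input and 1M output tokens on DeepSeek V4 Flash cost 2 × $0.15 + 1 × $0.30 = $0.60 at standard private rates, or $0.15 with the training opt-in.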

v5.1.0 — Transparent review runs, one Coding Agent product, and repo-first Platform

5 June 2026

At a glance

OpenCode review runs now show why Critique waited, nudged, followed up, accepted, or aborted. Controller decisions persist on the run so operators can audit agent behavior instead of treating sandbox review as a one-shot black box.
Blocked-merge review runs open with one clear next step. Completed runs summarize verdict, passport, merge policy, and evidence quality, then offer a single primary action: Start Remedy, Queue BYOA, or Dismiss with feedback.
Evidence quality is one badge in the UI: strong, usable, or limited. Legacy source labels collapse for faster scanning; full proof stays in the verdict panel when you need it.
Repository quick settings add policy-aware presets for auth/security and payments/data paths: concrete path rules, stronger specialists, OpenCode review mode, and the strongest allowed lead model from your catalog.
Finding feedback explains how it shapes the next run. Accepted, false positive, fixed, and suppress actions state that Findings Memory updates future reviews on that repository.
Yarn repositories now get dependency audit signal in sandbox collectors (high and critical advisories from Yarn JSON/NDJSON output).
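This sketch counts high and critical advisories from NDJSON audit output. It assumes the Yarn classic `yarn audit --json` format, where advisories arrive as lines with type "auditAdvisory" and data.advisory.severity. Yarn Berry's `yarn npm audit --json` output differs, and the collector Critique actually runs is not described here.

```typescript
// Sketch: counts severe advisories in Yarn classic NDJSON audit output.
interface SevereCounts {
  high: number;
  critical: number;
}

function countSevereYarnAdvisories(ndjson: string): SevereCounts {
  const counts: SevereCounts = { high: 0, critical: 0 };
  // The same advisory may be reported once per dependency path; count it once.
  const seen = new Set<string>();
  for (const line of ndjson.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let row: any;
    try {
      row = JSON.parse(line);
    } catch {
      continue; // ignore non-JSON lines such as progress output
    }
    if (row?.type !== "auditAdvisory") continue;
    const advisory = row.data?.advisory;
    const severity = advisory?.severity;
    if (severity !== "high" && severity !== "critical") continue;
    const key = String(advisory.id ?? `${advisory.module_name}:${advisory.title}`);
    if (seen.has(key)) continue;
    seen.add(key);
    counts[severity as keyof SevereCounts] += 1;
  }
  return counts;
}
```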
Coding Agent is one product with two doors: Builder and the API. The public /coding-agent-api page, Workspace Builder, and POST /api/v1/coding-agent/runs share the same job model; responses link to Builder, status, stream, and passport when a draft PR exists.
See credit cost before you start a Builder or API run. Builder shows the model floor, your balance, and your projected balance after the minimum charge, and blocks submission when credits are insufficient; the API accepts "preview": true on create to return the cost without queueing a sandbox.
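Below is a sketch of a preview-then-run flow against POST /api/v1/coding-agent/runs. The route and the "preview": true flag come from this log. The repo, prompt, and model field names follow the later description of the API (repo + prompt + model) and are assumptions, as is the Bearer scheme for crt_ keys. The function name is hypothetical.

```typescript
// Sketch: builds the create-run request, optionally as a cost preview.
interface CodingAgentRunInput {
  repo: string; // assumed "owner/name" format
  prompt: string;
  model: string;
}

function buildCreateRunRequest(apiKey: string, input: CodingAgentRunInput, preview: boolean) {
  return {
    url: "https://critique.sh/api/v1/coding-agent/runs",
    method: "POST" as const,
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    // With preview: true the server is described as estimating cost
    // without queueing a sandbox.
    body: JSON.stringify(preview ? { ...input, preview: true } : input),
  };
}
```

A CI job could send the preview first, compare the estimate against its own budget, and only then resend the same input without the flag.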
Automation gets idempotent creates, cursor-based lists, durable SSE resume, models discovery, compact status polls, cancel endpoints, safety budgets, and signed webhooks on the Coding Agent API for CI and long-running agents.
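Here is a minimal sketch of verifying a signed webhook on the receiving side. The note does not describe the signing scheme. This sketch assumes a hex-encoded HMAC-SHA256 of the raw request body with a shared secret, and a real scheme may also sign a timestamp or use a different header and encoding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch under an assumed scheme: hex HMAC-SHA256 over the raw body.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  // Buffer.from(..., "hex") stops at the first invalid character, so a
  // malformed signature simply yields a buffer of the wrong length.
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```

Verify against the raw body bytes before parsing JSON. Re-serializing a parsed object can change whitespace or key order and break the comparison.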
Builder hands off to PR review when a draft PR exists. Open the PR, jump to passport/review for that branch, or return to the repository passport queue from the Branch / PR tab.
Skills marketplace surfaces real performance on cards and detail pages (attributed runs, false-positive rate, label counts, leaderboard readiness) so publishers see how installs become ranked quality.
crt_ keys can read marketplace skills at pinned versions (read:skills, content fetch by version, portable bundles for CI); installed skills can be the default PR-review lens per repo with attribution on manual launches.
Publish flow documents official badge criteria (identity, safe SKILL.md, performance visibility, fixtures, marketplace review) versus instant community listings.
New installs land on repo home by default, with a cross-repo attention inbox at /dashboard/pull-requests, repository context on every row, and webhook/cache health on repo picker rows (Healthy, Stale cache, Missed events, and related states).
The control-room experience is labeled Platform in navigation while repo-first PR work stays primary; settings still let you switch experiences.
Connections is framed as a beta platform layer: scoped crt_ keys, MCP v0.1.0 (list_passports, get_review_run, queue_review), REST v1 for passports and review runs, Coding Agent API, native Linear and Slack, and Zapier-first wiring for the rest of the stack.

v5.0.0 — Agent Skill Marketplace, merge policy, passport exports, M3 welcome, Cursor SDK on Composer 2.5, Coding Agent API, and repo-first dashboard

3 June 2026

At a glance

The Agent Skill Marketplace is live at /skills. Browse versioned critique-review lenses (official and community), search by category, and install skills into Critique Chat or any agent runtime with portable npx skills add bundles—similar to the public skills.sh directory, tuned for merge-ready PR review.
Publishing requires a Critique account. Signed-in users can ship new skills or patch versions at /skills/publish, choose public or unlisted listings, and decide whether performance stats appear on the global board or stay org-internal only.
A Skill Performance Leaderboard at /skills/leaderboard ranks lenses by real outcomes. Scores blend human acceptance, false-positive rate, actionable fixes, and post-merge incident correlation from finding feedback on review runs—not install counts alone.
Org-internal leaderboard mode shows the same metrics scoped to your GitHub App installation, visible only after sign-in, so teams can compare custom lenses without exposing private repo detail on the public board.
Critique Chat links to the marketplace from the skills sheet, and the critique-review landing page now points to browse, publish, and leaderboard flows alongside the existing Markdown download.
Control Board now includes a natural language merge policy editor for repository scopes. Describe rules in plain English; Critique compiles them into strict policy JSON with MiniMax-M3 (falling back to MiniMax-M2.7 and DeepSeek V4 Flash when needed), shows JSON and canonical YAML previews, a live rule diff, assumptions, unsupported clauses, and a confidence badge before you save.
Merge-time enforcement stays deterministic. The LLM only translates operator intent at compile time; Critique validates the result, renders .critique/policy.yml, and evaluates pull requests with the same server-owned Critique / Merge Policy check—no model calls at merge time.
Merge policy v2 adds path-aware rules operators actually ask for: block or warn when changed paths match risk tags (auth, migrations, infra, and related lanes) or glob patterns; require touched test files in the PR; require a minimum count of current-head GitHub approvals (stale approvals on older commits do not count).
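The "current-head approvals" rule can be sketched against GitHub's list-reviews response. GitHub returns those reviews in chronological order with state, commit_id, and user fields. Treating each reviewer's latest decisive review as their vote is an assumption about how Critique's evaluator works.

```typescript
// Field names follow GitHub's "list reviews for a pull request" response.
interface PullReview {
  user: { login: string } | null;
  state: "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED" | "DISMISSED" | "PENDING";
  commit_id: string;
}

function countCurrentHeadApprovals(reviews: PullReview[], headSha: string): number {
  const latestByUser = new Map<string, PullReview>();
  for (const review of reviews) {
    // Comments and pending drafts do not change a reviewer's decision.
    if (!review.user || review.state === "COMMENTED" || review.state === "PENDING") continue;
    // Reviews arrive in chronological order, so later entries win.
    latestByUser.set(review.user.login, review);
  }
  let count = 0;
  latestByUser.forEach((review) => {
    // Approvals left on an older commit are stale and do not count.
    if (review.state === "APPROVED" && review.commit_id === headSha) count += 1;
  });
  return count;
}

function meetsApprovalRule(reviews: PullReview[], headSha: string, minimum: number): boolean {
  return countCurrentHeadApprovals(reviews, headSha) >= minimum;
}
```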
Safety modes gate how far automation goes before a human signs off. Require review: blocks both apply and the policy PR when confidence is low. Allow draft PR: still opens a draft policy pull request for manual review. Ask followups: holds apply until clarification questions are answered.
Three save paths after compile: persist to the dashboard, open a branch that writes .critique/policy.yml and opens a draft or ready PR (depending on safety mode), or copy or download YAML for manual review—without bypassing validation or confidence gates when review is required.
MiniMax M3 is on the paid PR review and Remedy catalog (minimax/minimax-m3) with a two-week welcome price: 1.5 credits per run through 17 June 2026 (UTC)—the same effective floor as MiniMax-M2.7 today—then 3 credits per run on the shelf (double M2.7). The /models table and /pricing page show the active promo, strikethrough shelf price, and countdown while the window is open.
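The welcome window is a simple time check. In this sketch, "through 17 June 2026 (UTC)" is read as including that whole UTC day, so the shelf price starts at 2026-06-18T00:00Z. That reading is an assumption. The function and table names are hypothetical.

```typescript
// Per-run credit floor with a time-boxed welcome price (values from this note).
interface PromoPrice {
  promoCredits: number;
  shelfCredits: number;
  promoEndsExclusiveUtc: number; // epoch ms; promo applies strictly before this
}

const PROMOS: Record<string, PromoPrice> = {
  // Date.UTC months are zero-based: 5 = June.
  "minimax/minimax-m3": { promoCredits: 1.5, shelfCredits: 3, promoEndsExclusiveUtc: Date.UTC(2026, 5, 18) },
};

function creditsPerRun(model: string, now: Date): number {
  const price = PROMOS[model];
  if (!price) throw new Error(`no promo entry for ${model}`);
  return now.getTime() < price.promoEndsExclusiveUtc ? price.promoCredits : price.shelfCredits;
}
```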
Qwen3.7 Plus remains a paid review lane at 1.5 credits on lead, specialist, and Remedy-capable stacks. Legacy Qwen ids in saved repo policies still alias forward to qwen/qwen3.7-plus; this release does not change that floor or move Plus into free chat.
Critique Chat is unchanged: Ling 2.6 Flash and DeepSeek V4 Flash only, with no extra chat fee. M3 and Qwen3.7 Plus are not in the chat model picker; older MiniMax or Qwen chat preferences normalize to DeepSeek V4 Flash so review models are not silently treated as free chat.
A launch essay at /blog/minimax-m3-qwen37-plus-welcome-june-2026 welcomes MiniMax M3 and Qwen3.7 Plus for operators: vendor-reported SWE-Bench Pro, terminal, and multimodal benchmarks vs M2.7, Qwen3.6 Plus, GLM-5.1, Kimi K2.6, Composer 2.5, and Claude Opus 4.8, plus credit-math framing for when to use M3 vs Opus on review.
Blog promo strips with gold treasury styling now accept custom headlines and intros so new model launches (including this M3/Qwen essay) no longer reuse DeepSeek-only copy from earlier catalog posts.
The /models and /pricing surfaces show the active MiniMax M3 welcome window with strikethrough shelf pricing, promo countdown, and clear post-promo credit floor so operators see 1.5 credits vs 3 credits without opening the essay.
Cursor BYOA now queues through the Cursor Agent SDK on Composer 2.5 (composer-2.5) in Cursor cloud VMs — isolated repo clone, tool loop, PR URL, and workOnCurrentBranch on the head you reviewed. Execution bills your Cursor plan, not Critique review credits. If the SDK path is unavailable in a given deploy, Critique falls back to the Cloud Agents REST API with the same handoff shape.
Settings → Cursor agent (BYOA) and completed review runs explain the SDK path: save your key once, optionally add operator instructions, Queue Cursor agent, then Open in Cursor when the cloud run URL is ready. JSON export at GET /api/review-runs/{reviewRunId}/byoa/cursor still works for custom CI.
A deep harness essay at /blog/critique-cursor-composer-25-byoa covers Cursor as a top-tier agent runtime, Composer 2.5 specs and published benchmarks (SWE-Bench Multilingual, Terminal-Bench 2.0, pricing tiers), how that compares to Opus, GPT-5.5, Kimi K2.6, MiniMax M3, and Qwen3.7 Plus on Critique’s review catalog, and why review stays on Critique while fixes run in Cursor cloud — with Colossus framed as training scale, not where your agent executes.
BYOA docs at /docs/platform/byoa now describe the Cursor SDK default; Claude Managed Agents and OpenAI Codex queued paths are unchanged from v3.5 — same “review on Critique, execution on your vendor” split.
Repo pickers on the repo-first dashboard now surface every connected repository, not a short alphabetical slice. The old empty state capped quick picks at six repos with no search; /dashboard/pull-requests instead shows an always-visible searchable list (scroll or filter by owner or name) with a connected-repo count, and the header selector opens in a portaled menu so long lists are not clipped inside the scrolling dashboard shell.
Issues and review-run repo filters use the same full installation list as repo home—native selects and the shared selector receive all active repos from your GitHub App installations, ordered by full name.
Critique Chat and Workspace repo menus list the full catalog when opened. You no longer need to type before any repo appears; search still narrows the list, and scrolling reaches repos beyond the first screen.
Dashboard and Settings share one workspace chrome (sidebar, top bar, spacing) so repo home, review runs, issues, control, usage, help, and settings subpages feel like one signed-in product—not a mix of marketing chrome and dashboard panels.
Change Passports now export a signed audit evidence bundle for compliance workflows. From the passport timeline, the passport queue, or a review run linked to a passport, download redacted JSON with manifest section hashes, snapshots, provenance, risk, merge policy decisions, remedy proof, incidents, and timeline—HMAC-signed for integrity. Batch export covers up to 25 filtered passports in one file. Copy positions this as audit evidence formatted for SOC 2 / ISO review, not a compliance certification.
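To show what "manifest section hashes" plus an HMAC signature buys a reviewer, here is an illustrative verifier. Only the idea of per-section hashes and an HMAC signature comes from the note. The bundle layout, the use of JSON.stringify as the canonical form, and HMAC-SHA256 over the manifest are assumptions.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical bundle layout for illustration.
interface EvidenceBundle {
  sections: Record<string, unknown>;
  manifest: { sectionHashes: Record<string, string> };
  signature: string; // hex HMAC over the manifest
}

// JSON.stringify is key-order sensitive; a real format needs a canonical encoding.
const sha256Hex = (value: unknown): string =>
  createHash("sha256").update(JSON.stringify(value)).digest("hex");

function signManifest(manifest: EvidenceBundle["manifest"], key: string): string {
  return createHmac("sha256", key).update(JSON.stringify(manifest)).digest("hex");
}

function verifyEvidenceBundle(bundle: EvidenceBundle, key: string): boolean {
  // 1. Every section must hash to the value recorded in the manifest.
  const names = Object.keys(bundle.sections);
  if (names.length !== Object.keys(bundle.manifest.sectionHashes).length) return false;
  for (const name of names) {
    if (bundle.manifest.sectionHashes[name] !== sha256Hex(bundle.sections[name])) return false;
  }
  // 2. The manifest itself must carry a valid HMAC.
  const expected = Buffer.from(signManifest(bundle.manifest, key), "hex");
  const provided = Buffer.from(bundle.signature, "hex");
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}
```

Because HMAC uses a shared key, this kind of check proves integrity to whoever holds the key. It does not give third parties public verifiability the way an asymmetric signature would.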
Platform API and signed-in routes mirror the same export: GET /api/v1/passports/:passportId/export (API key with read:passports) and POST /api/passports/export for filtered batches in the dashboard.
Review-run findings gain one-click feedback without leaving the dashboard. Expand a finding to mark Accepted, False positive, Fixed, or Suppress; action buttons no longer compete with the expand control. Private memory, suppressions, and the existing feedback ledger still apply by default.
Optional anonymized model-feedback sharing is strictly opt-in. Grant or revoke consent per installation (or single repo) from Control Board → Memory, export a signed batch of queued examples, or tick Share anonymized feedback on a finding when consent is active. GitHub slash commands accept --share-feedback only with the same consent gate—no silent training export from PR comments.
Anonymized examples strip patches, file contents, and secrets before they leave your account; they reinforce Skill Performance Leaderboard metrics at /skills/leaderboard, not a promise of automatic model training.
Coding Agent as API ships for automation teams: public overview at /coding-agent-api, POST /api/v1/coding-agent/runs to start an OpenCode-backed run (repo + prompt + model), POST …/runs/{id}/messages for follow-ups, and GET …/runs/{id} for status, timeline events, patch, and draft-PR linkage. Choose managed billing (Critique credits) or pass an OpenRouter key for that run only; optional draft PR publish and validation mode match Builder semantics.
Coding Agent API now keeps a warm OpenCode session between turns on the same run. After the first turn finishes, status becomes idle (sessionActive: true until sessionExpiresAt) while the E2B sandbox and OpenCode session stay connected—follow-ups on POST …/runs/{id}/messages send the next prompt into that live session instead of spinning a new sandbox with chained summary text.
Live activity streams over SSE at GET /api/v1/coding-agent/runs/{id}/stream. Subscribe while a turn is running (optional ?after= event cursor) to receive OpenCode activity rows and a terminal run.status event when the turn reaches idle, completed, or failed.
Close a persistent session explicitly with { "endSession": true } on the messages route when automation is done—Critique tears down the sandbox and marks the run completed. If the session already expired or predates persistent sessions, follow-ups still fall back to a chained new run with bounded prior context so older integrations keep working.
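Below is a sketch of consuming the run stream. The parser follows the standard Server-Sent Events line format: event:, data:, id:, comment lines starting with ":", and a blank line to dispatch. Using the last seen event id as the ?after= cursor on reconnect is an assumption about that parameter, and the sketch expects text that contains complete events.

```typescript
// Minimal SSE parser plus a resume URL builder (helper names hypothetical).
interface StreamEvent {
  id?: string;
  event: string;
  data: string;
}

function parseSseChunk(text: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  let current: { id?: string; event?: string; data: string[] } = { data: [] };
  for (const line of text.split(/\r?\n/)) {
    if (line === "") {
      // A blank line dispatches the buffered event, if it carried data.
      if (current.data.length > 0) {
        events.push({ id: current.id, event: current.event ?? "message", data: current.data.join("\n") });
      }
      current = { data: [] };
      continue;
    }
    if (line.startsWith(":")) continue; // comment or keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "data") current.data.push(value);
    else if (field === "event") current.event = value;
    else if (field === "id") current.id = value;
  }
  return events;
}

function resumeUrl(runId: string, lastEventId?: string): string {
  const base = `https://critique.sh/api/v1/coding-agent/runs/${encodeURIComponent(runId)}/stream`;
  return lastEventId ? `${base}?after=${encodeURIComponent(lastEventId)}` : base;
}
```

A client would stop reading once it sees a run.status event whose payload reports idle, completed, or failed. If the connection drops earlier, it reconnects with the last id it processed.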
Workspace adds a durable agent queue in the explorer: line up prompts for Critique, Claude Code, or Codex in Ask or Build mode, send or remove items via /api/workspace/queue, scoped to your active repository and chat or builder session — so you can stage work before kicking off a long run.
Workspace inspector → Processes shows live run and request state (streaming chat, builder job, sandbox, retrieval) in one panel so operators can tell whether work is still moving without digging through raw logs first.
Settings → Agents groups Cursor, Anthropic, and OpenAI BYOA key panels in one place with consistent copy and links to /docs/platform/byoa, so agent-owner setup is not scattered across unrelated settings tabs.
Insights at /dashboard/insights gives operators and leadership one signed-in hub for velocity, risk, spend, retrospectives, compliance exports, staffing signals, and optional fleet benchmarks — grounded in Change Passport and gate evidence, not a separate analytics silo.
Velocity vs risk charts plot daily merges against high-risk merges over 30- and 90-day windows so you can see whether throughput rose while risky merges held flat or fell — the “speed vs safety” story in one glance.
Cost attribution for the current month breaks down Critique credits by repository, surfaces your most expensive pull requests, and estimates how much BYOK routing saved versus bundled credits on the same usage.
Blame-aware retrospective reports summarize a sprint or date range: merges, incidents linked to passports, checkpoint and merge-policy overrides, and heuristic likely correct vs likely incorrect override signals (14-day incident window) — with optional AI narrative and cited evidence, not individual blame callouts.
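The 14-day override heuristic can be sketched as a window check. In this version, an override is "likely incorrect" if an incident linked to the same passport opens within 14 days after the merge. The record shapes and the extra "too_early" state (used when the window has not closed yet) are assumptions, and Critique's real heuristic may weigh more signals.

```typescript
// Illustrative heuristic; record shapes are hypothetical.
const WINDOW_MS = 14 * 24 * 60 * 60 * 1000;

interface OverrideRecord { passportId: string; mergedAt: Date }
interface IncidentRecord { passportId: string; openedAt: Date }
type OverrideSignal = "likely_correct" | "likely_incorrect" | "too_early";

function classifyOverride(o: OverrideRecord, incidents: IncidentRecord[], now: Date): OverrideSignal {
  const mergedMs = o.mergedAt.getTime();
  const windowEnd = mergedMs + WINDOW_MS;
  const linked = incidents.some(
    (i) => i.passportId === o.passportId && i.openedAt.getTime() >= mergedMs && i.openedAt.getTime() <= windowEnd,
  );
  if (linked) return "likely_incorrect";
  // Until the full 14 days have passed, no incident is not yet a signal.
  return now.getTime() >= windowEnd ? "likely_correct" : "too_early";
}
```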
Generate retrospectives or staffing reports from Insights (or the reports API) for a chosen installation and period; staffing view projects weekly review load and suggests when added senior reviewer capacity may be needed to keep your current risk posture.
One-click compliance period export for a calendar month builds a signed outer bundle plus per-passport audit JSON: review trails, merge policy decisions, checkpoint gates, coverage summary (% reviewed, override counts), and the same SOC 2 / ISO audit evidence disclaimer as single-passport export — not a certification.
Fleet insights (strictly opt-in) place your installation in an anonymized cohort (similar stack size and risk tier). When enough teams participate, you see benchmark comparisons (for example auth-path block rates) and dry-run policy suggestions — no repo names or secrets leave your account.
Daily insight rollups update live as reviews complete, merge policy and checkpoint events fire, overrides are recorded, usage is metered, and merged pull requests close. After upgrade, account operators can backfill historical daily metrics from existing reviews, gates, and usage so 30- and 90-day charts are useful immediately.

v4.2.0 — Repo-first PR dashboard and GitHub inbox

3 June 2026

At a glance

The signed-in dashboard can now center on pull requests per repository, not only the legacy control-room home. On first visit you choose Repo home (inbox-style PR work) or Control room (passports and platform overview). Switch anytime under Settings → Workspace; your choice is remembered across sessions.
GitHub pull requests and issues are cached in Critique so the PR table, repo picker, and issues list load from durable inbox data instead of hammering the API on every click. Refresh GitHub on a repo pulls the latest open PRs; webhooks keep snapshots warm when pull_request and issues events arrive.
Repo home at /dashboard/pull-requests gives you a searchable PR table with attention states (needs review, running, blocked, passed, failed, closed), a side inspector for the selected PR, linked-issue context, checkpoint blockers, and one-click Run review / Rerun / View live actions.
Quick settings on launch let you pick model lane (auto, fast, balanced, premium, or a specific model), runtime (auto, GitHub-backed, collector sandbox, OpenCode agent), depth (standard or deep), whether to post the GitHub review on completion, and which context packs to include (linked issues, incident signals, repository memory). Those choices apply to the queued run—not ignored defaults.
Global and filtered PR views use the same inbox rows: filter by repository, text search, GitHub state, and dashboard views such as needs attention, running, or blocked. /dashboard/issues lists cached issues for the selected repo with working repo navigation.
New dashboard APIs back the UI (repositories, repo home, refresh, issues, global pull requests, review queue, PR chat) with checkpoint-aware 409 responses when the Agent Firewall blocks a run. Settings adds Agents and Appearance entry points alongside the experience toggle.
Control room remains one click away for passport queues, platform connections, and cross-repo activity when you need the v4 operating picture—not a forced migration.

v4.1.2 — Deep Critique Review Skill upgrade

2 June 2026

At a glance

critique-review now ships as a much deeper review protocol, not just a compact findings-first prompt. The skill now starts with explicit intake and triage, names review mode up front (PR/diff, codebase slice, or focused lane), and pushes agents to classify risk before they start writing findings.
The built-in runtime skill now has dedicated reference packs for adaptive depth and blast-radius triage, stack-specific review lenses, and a stricter output contract. That means Critique agents can review backend/API paths, React/frontend changes, migrations, async jobs, infrastructure, and AI-agent systems with different checks instead of one generic pass.
False-positive control is materially stronger. The upgraded rubric now forces research-before-reporting on ambiguous issues, warns against pattern-matching common framework-safe constructs as vulnerabilities, and separates actionable findings from open questions and residual risk more aggressively.
The public downloadable Markdown at /api/skills/critique-review and /skills/critique-review.md was upgraded in parallel. Teams installing the free open-source skill now get the deeper standalone protocol, while Critique’s internal runtime copy keeps the richer multi-file reference structure.
The public landing page at /skills/critique-review now reflects the new shape of the skill. It describes adaptive review depth, stack lenses, independent-review discipline, and the stricter review artifact contract instead of only the original launch framing.

v4.1.1 — Critique Review Skill, downloads, and agent-owner updates

1 June 2026

At a glance

critique-review is now a built-in, open-source skill for AI code review. It is designed for agents working in Codex, Cursor, Claude, and similar coding environments: start from the diff, inspect call sites and contracts, prioritize correctness/security/data-loss risks, and publish findings before summary.
A public skill landing page ships at /skills/critique-review. It explains why Critique made the skill, how it compares with broad code-quality and PR-request skills, gives teams a direct Markdown download through /api/skills/critique-review, and links the standalone MIT-licensed GitHub repo at github.com/repath500/critique-review.
Critique Chat can recommend the built-in skill from inside the product. When the conversation calls for code review help, the built-in skills context can surface critique-review alongside the rest of the Critique system guidance.
The homepage now carries a Latest updates section linking the skill, BYOA, BYOK, and the full ship log so production visitors can see the newest agent-owner workflows without hunting through docs.
Type-check cleanup keeps the production audit green. The BYOK pricing preview, BYOA agent panel, Slack client, and built-in skills chat label now compile cleanly under the full TypeScript pass.

v4.1.0 — Connections, Platform API, and Agents (MCP)

31 May 2026

At a glance

Critique enters your engineering ecosystem. A new Connections hub at /settings/connections lets you link Linear (API key today), manage where each connection applies (Chat, Review, Remedy, Builder), and issue Critique API keys (crt_) for tools that call Critique on your behalf.
Platform API (v1) exposes passport and review-run truth to automation: GET /api/v1/passports and GET /api/v1/review-runs/:id accept your browser session or a scoped API key — so internal portals and scripts can read the same queue the dashboard uses without scraping HTML.
Agents API via MCP ships at POST /api/mcp: connect Cursor, Claude Desktop, or any MCP host with a crt_ key to list passports, fetch a review run, or queue a review for a pull request head commit. The Workspace inspector shows your endpoint and setup steps.
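For reference, this sketch shows the JSON-RPC 2.0 message an MCP client sends to invoke one of those tools. The tools/call method and the { name, arguments } params shape come from the MCP specification. The critique.sh host, the Bearer scheme for crt_ keys, and any argument names you pass are assumptions. A real client also performs the MCP initialize handshake before calling tools.

```typescript
// Sketch of an MCP tools/call request over HTTP.
type CritiqueTool = "list_passports" | "get_review_run" | "queue_review";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 1;

function buildToolCall(tool: CritiqueTool, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name: tool, arguments: args } };
}

function mcpHttpRequest(apiKey: string, message: JsonRpcRequest) {
  return {
    url: "https://critique.sh/api/mcp",
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed scheme for crt_ keys
      "Content-Type": "application/json",
      // MCP's HTTP transport lets the server reply with JSON or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(message),
  };
}
```

MCP hosts such as Cursor or Claude Desktop build these messages themselves. Hand-rolling them is mainly useful for scripted smoke tests of the endpoint.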
Dashboard control room now surfaces platform connections (GitHub, Linear, Sentry, Jira, Vercel health), a passport queue preview, cross-stack activity (incidents, firewall blocks, merge policy, evidence, delivery failures), and Linear workspace panels when connected — so the home screen reads as an operating picture, not only install metadata.
Chat gains searchIssuesAndRoadmap when Linear is connected, so repo work and ticket context can sit on one thread. Slack, Sentry OAuth, and outbound webhooks remain on the roadmap; production incidents still ingest via Control Board webhooks as in v4.
Architecture and API contracts for partners live in docs/critique/critique-platform-api-agents-and-connections.md. A launch essay at /blog/critique-v41-connections-ecosystem explains why Critique is building outward from passports, not inward toward another dashboard tab.

v4.0.0 — The AI Change Control Platform

30 May 2026

At a glance

Critique is now an AI Change Control Platform. The product is reframed around the merge boundary: every pull request becomes a Change Passport that records provenance, risk, gate events, evidence runs, merge-policy decisions, Remedy proof, findings memory, and incident learnings. The dashboard gravity shifts so Passports come first and evidence runs become a commit-level drill-down.
Change Passports are the new system of record. /dashboard/passports is the primary queue (filterable by repo, risk, state, and verdict) and /dashboard/[owner]/[repo]/pulls/[n] renders the PR-level passport with summary, provenance, risk, gate, evidence runs, merge permission, Remedy proof, memory, incidents, and a single chronological timeline.
Agent risk scoring persists a score, band, and reasons on review runs and flows into every operator surface. Evidence Contract v1 normalizes legacy review artifacts and exposes blocking decisions plus per-finding evidence so a block always cites something.
Merge policy as code ships a schema, evaluator, and the Critique / Merge Policy check. Policy lives in the dashboard or a repo file (.critique/policy.yml|yaml|json) and runs in dry-run, warn, or enforce. Operator overrides record provenance and can patch the GitHub check-run status.
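One way to picture the three modes is as a mapping from evaluation results to a GitHub check-run conclusion. The mode names come from this note. Which conclusion each mode posts, and treating an override as a pass, are assumptions for illustration. GitHub branch protection accepts success, neutral, and skipped conclusions on required checks.

```typescript
// Hypothetical mode-to-conclusion mapping for the Critique / Merge Policy check.
type PolicyMode = "dry-run" | "warn" | "enforce";
type CheckConclusion = "success" | "neutral" | "failure";

interface PolicyViolation {
  ruleId: string;
  message: string;
}

function mergePolicyConclusion(mode: PolicyMode, violations: PolicyViolation[], overridden = false): CheckConclusion {
  if (violations.length === 0) return "success";
  // An operator override records provenance and lets the PR through.
  if (overridden) return "success";
  switch (mode) {
    case "dry-run":
      return "success"; // evaluate and record only
    case "warn":
      return "neutral"; // visible in the PR, not blocking
    case "enforce":
      return "failure"; // blocks merge when the check is required
  }
}
```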
Verified repair stores a proof bundle on Remedy attempts (patch hash, validation, push/export mode, verification linkage). Findings memory surfaces suppressions and a feedback ledger; incident feedback ingests Sentry, Linear, Jira, Vercel, and generic/manual events, links them to passports, and drafts learnings you can promote into rules.
Control Board unifies Gate, Policy, Delivery, Memory, and Learnings into one operator surface, including inline merge-permission controls. Checkpoint is presented as the Agent Firewall in the UI, but GitHub check names stay stable (Critique / Checkpoint) so branch protection keeps working. Legacy /dashboard/change-gate and /dashboard/checkpoint redirect into the Control Board; the legacy automation editor remains directly reachable.
A launch essay at /blog/critique-v4-change-control-platform explains v4 vs v1–v3, why passports replaced review-as-product, how AI-powered sandbox reviews continue as evidence runs, and the WHO / WHY / WHAT NOT control framing — plus the six operator surfaces and five control layers.

v3.6.0 — BYOK: CrofAI (nahcrof) direct billing

29 May 2026

At a glance

Bring your own key now supports CrofAI alongside OpenRouter: save a Crof API key in Settings and Critique routes chat and sandbox PR review through https://crof.ai/v1 with OpenRouter catalog ids mapped to Crof model slugs at request time.
The settings panel explains why the in-app model picker can look narrower than Crof’s live catalog (OpenRouter-shaped ids vs Crof slugs; frontier GPT/Claude lanes are OpenRouter-only).
When both Crof and OpenRouter keys are saved, Crof takes precedence; remove the Crof key to fall back to OpenRouter BYOK.
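The precedence rule and the request-time id mapping can be sketched as follows. The Crof-over-OpenRouter order comes from this release; the function names, the "bundled" fallback label, and the slug table entries are hypothetical placeholders, not Critique's code or Crof's real catalog.

```typescript
// Hypothetical sketch of BYOK provider selection and model id resolution.
// The slug table below is a placeholder, not Crof's actual catalog.
type ByokProvider = "crof" | "openrouter" | "bundled";

interface SavedKeys {
  crofKey?: string;
  openRouterKey?: string;
}

function selectProvider(keys: SavedKeys): ByokProvider {
  if (keys.crofKey) return "crof"; // Crof takes precedence when both keys are saved
  if (keys.openRouterKey) return "openrouter";
  return "bundled"; // assumption: no BYOK key means bundled Critique credits
}

// Illustrative mapping from OpenRouter-shaped catalog ids to Crof slugs.
const CROF_SLUGS: Record<string, string> = {
  "example-vendor/example-model": "example-model",
};

function resolveModel(provider: ByokProvider, catalogId: string): string | null {
  if (provider !== "crof") return catalogId; // OpenRouter and bundled paths keep catalog ids
  // Ids with no Crof slug (for example OpenRouter-only frontier lanes) cannot route via Crof.
  return CROF_SLUGS[catalogId] ?? null;
}
```

Returning null for unmapped ids is why a picker driven by this kind of table can look narrower than the live Crof catalog.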

v3.5.0 — BYOA: Cursor, Claude Managed Agents, and OpenAI Codex

29 May 2026

At a glance

Bring your own agent now includes three queued execution paths from completed review runs: Cursor Cloud Agents, Claude Managed Agents (Anthropic API with PR checkout), and OpenAI Codex-style runs via the Responses API—each using your encrypted API key server-side, not Critique execution credits.
Settings adds panels for Cursor, Anthropic, and OpenAI keys; review run pages show handoff promos with queue actions, JSON export, and latest run status for each provider.
Docs and essays at /docs/platform/byoa and new blog posts (critique-welcomes-cursor, critique-welcomes-claude-code, critique-welcomes-codex) explain when to use Remedy vs BYOA and how billing splits between Critique and your agent vendors.

v3.4.0 — Workspace: review, chat, build, and repair in one place

29 May 2026

At a glance

Workspace is now the recommended signed-in surface at /workspace, unifying Critique Chat, Builder history, review context, and Remedy handoffs into one left rail, one repo context bundle, and one credit ledger.
The /chat and /builder routes still work; Workspace is the merged control room — same shell styling, searchable repo and model selectors, Build and Plan lanes, activity timelines, and microphone capture in the composer.
A launch essay at /blog/introducing-critique-workspace explains why the four surfaces merged, what shipped in v0.3.0, and how Checkpoint and automated PR review stay on the GitHub path while humans operate from Workspace.
Docs under Chat & workspace describe the mode handoffs (?mode=build) and how Workspace relates to Dashboard, Remedy, and the GitHub App.
SEO & discovery: /workspace is in the sitemap; the launch essay ships FAQPage + BlogPosting @graph JSON-LD, RSS alternates, and /llms.txt citations so Google and AI answer engines can index and quote the Workspace story.

v3.3.0 — May model catalog, pricing calculator, and marketing polish

29 May 2026

At a glance

The review model catalog refreshed for May 2026: permanent credit floors on DeepSeek V4, MiMo, Ling, and MiniMax M2.7; new lanes including Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Grok Build 0.1, and Step 3.7 Flash; OpenRouter :floor routing for eligible models; and legacy Qwen IDs remapped so saved defaults keep working.
Critique Chat now offers two models — Ling 2.6 Flash (default) and DeepSeek V4 Flash — with pricing and banner copy aligned to the slimmer roster.
The PR review cost calculator moved under /pricing/calculator with sliders, plan-aware ROI math, and a link from the main pricing page; the old /tools/pr-review-cost-calculator URL redirects permanently.
The public /checkpoint landing page was redesigned: an editorial layout, stats instead of pill badges, a rule catalog table, a numbered pipeline, and a checkbox-driven demo feed without pill clutter.
Blog posts use a centered editorial layout (Instrument Serif headings, DM Sans body); the May model-catalog spring post includes an optional +100 credit claim for eligible readers.
SEO linking improved across compare pages, the sitemap, footer competitor links, and internal blog CTAs.

v3.2.0 — Pricing reset, BYOK OpenRouter, and safer review credit gates

12 May 2026

At a glance

Critique pricing now separates two buying paths: bundled monthly credit plans for teams that want one Critique invoice, and a new Bring Your Own OpenRouter Key harness for teams that want OpenRouter to bill model tokens directly.
Customer-facing tiers now read as Solo, Pro, and Team with larger review credit pools, clearer plan positioning, and pricing copy that explains when to choose bundled credits versus direct provider billing.
The new $8/month BYOK harness covers Critique orchestration—sandboxes, OpenCode PR review runs, chat streaming, GitHub checks, repository retrieval, encrypted key storage, and usage ledgers—while model spend stays in the customer’s OpenRouter account.
Signed-in users can save an OpenRouter key in Settings. New Critique Chat calls and sandbox-native OpenCode PR review calls prefer that key when present, without exposing the saved secret back to the browser.
BYOK model usage is now marked as externally billed in usage records so connected OpenRouter-key runs remain traceable without draining Critique review credits.
Automated PR review now has a low-balance credit gate before expensive work starts, helping teams avoid starting a model-heavy review when the remaining bundled credit pool is already below the review floor.
The public pricing page, home pricing preview, dashboard usage page, machine-readable /pricing.md, SEO metadata, and the new pricing blog all explain the updated credit calculation, Checkpoint’s spend-control role, and the BYOK tradeoff in customer-facing language.
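A minimal sketch of the low-balance gate described above, assuming a numeric review floor and a simple account shape (both hypothetical). Letting BYOK accounts skip the bundled-credit check is also an assumption, based on BYOK model usage being billed externally rather than drained from Critique credits.

```typescript
// Hypothetical sketch of a pre-review credit gate; field names and the
// BYOK bypass are assumptions for illustration.
interface Account {
  bundledCredits: number;
  hasOpenRouterKey: boolean;
}

type GateDecision =
  | { start: true; billing: "byok" | "bundled" }
  | { start: false; reason: string };

function reviewCreditGate(account: Account, reviewFloor: number): GateDecision {
  if (account.hasOpenRouterKey) {
    // Model tokens bill to the customer's OpenRouter account, not the credit pool.
    return { start: true, billing: "byok" };
  }
  if (account.bundledCredits < reviewFloor) {
    // Stop before sandbox and model work starts.
    return { start: false, reason: `balance ${account.bundledCredits} is below review floor ${reviewFloor}` };
  }
  return { start: true, billing: "bundled" };
}
```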

v3.1.0 — Critique Checkpoint, the deterministic PR trust gate

6 May 2026

At a glance

Critique Checkpoint is now the first mini-app under critique.sh: a pre-review trust layer that runs before Critique review, before review credits burn, and before maintainers spend attention on low-trust pull requests.
The public /checkpoint landing page introduces the product with a live gate demo, interactive rule toggles, rule catalog cards, the Checkpoint → Critique Review → Merge pipeline, dry-run positioning, and “coming soon as open-source” copy.
Signed-in teams now get a global Dashboard → Checkpoint overview with gated-today counts, blocked/warned/passed totals, top triggered rules, enabled repository count, recent events, and direct repo configuration links.
Each repository has its own /dashboard/[owner]/[repo]/checkpoint configuration surface with Checkpoint enablement, Dry Run / Warn / Block mode selection, a Checkpoint-only switch for repos that do not want Critique review after a pass, autosaved rule cards, editable thresholds, slop pattern textarea, language allow list, per-login allow/block overrides, and an expandable recent gate log.
Every gate creates a durable event detail page at /dashboard/[owner]/[repo]/checkpoint/events/[eventId] showing the contributor fingerprint, PR metadata, rule-by-rule threshold/value receipt, decision timeline, GitHub check link, and a maintainer “trust contributor” override action.
The backend now persists Checkpoint policies, overrides, and gate events in Prisma; evaluates deterministic identity, activity, and content-shape rules; publishes a dedicated Critique / Checkpoint GitHub check run; and short-circuits the pull request webhook before review queues when Block mode fails.
Checkpoint can run with or without Critique review automation. Repositories can leave reviews on after a pass, or switch to standalone Checkpoint-only mode where the gate publishes its GitHub check and event log without queueing Critique review.
The launch surface now tells the story more clearly in production: the public landing page leans harder into the live gate demo, receipts-style rule evaluation, and Checkpoint-first pipeline, while public visitors fall back cleanly when auth session state or advanced WebGL effects are unavailable.
Final launch hardening adds native Postgres enum migration correction, idempotent gate-event handling, Checkpoint check-run reruns, GitHub check success updates when maintainers override an event, accessible dashboard controls, a Checkpoint launch essay, RSS output, and llms.txt coverage for AI-facing discovery.
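The routing decision Checkpoint makes on a pull request webhook can be summarized in a small sketch. The config shape and function name are hypothetical; the behavior encoded is what the notes state: Block mode failures short-circuit before review queues, and Checkpoint-only repos publish the gate check without queueing review. Treating a disabled Checkpoint as "queue review" assumes review automation is otherwise on.

```typescript
// Hypothetical sketch of the Checkpoint step in a pull_request webhook handler.
type CheckpointMode = "dry_run" | "warn" | "block";

interface RepoCheckpointConfig {
  enabled: boolean;
  mode: CheckpointMode;
  checkpointOnly: boolean; // publish the gate check but never queue Critique review
}

interface WebhookPlan {
  publishCheckpointCheck: boolean;
  queueReview: boolean;
}

function planPullRequest(config: RepoCheckpointConfig, gatePassed: boolean): WebhookPlan {
  if (!config.enabled) {
    // Assumption: without Checkpoint, normal review automation proceeds.
    return { publishCheckpointCheck: false, queueReview: true };
  }
  if (!gatePassed && config.mode === "block") {
    // Short-circuit before any review credits are spent.
    return { publishCheckpointCheck: true, queueReview: false };
  }
  // Dry Run and Warn never stop review; Checkpoint-only repos stop after the gate.
  return { publishCheckpointCheck: true, queueReview: !config.checkpointOnly };
}
```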

v0.3.0 — Workspace unification, live usage truth, and sturdier OpenCode review recovery

5 May 2026

At a glance

Critique advances from v0.2.8 to v0.3.0 by combining the May 5 shipping wave into one coherent release: the product now reads more like one system across Workspace, Builder, review runs, and the public-facing usage story.
/workspace now serves as the shared operating surface for both chat and Builder history, with one left rail for recent chats and Builder runs, cleaner mode-aware actions, and Builder shell styling that matches the rest of the signed-in workspace instead of feeling like a separate console.
Builder is materially more usable for real repo work: repository and model selectors are searchable, Build and Plan lanes are explicit, activity reads as a timeline rather than raw log output, and the composer can publish to a prompt-derived codex/... branch and open or reuse a draft PR after the run finishes.
OpenCode review and Builder execution are more durable under failure. Critique now bounds stalled turns, resets timers on real activity, aborts idle sessions with context, marks stale sandbox runs explicitly, and queues fallback publishing so a wedged sandbox is less likely to strand a pull request without a verdict.
Usage and credits are now much closer to source-of-truth accounting. Critique persists per-call OpenRouter-style usage for both backend and sandbox paths, ships it through signed ingest with QStash retry fallback, reconciles dashboards and totals against the durable ledger, and preserves failed or rerouted calls so large review spend no longer disappears behind misleadingly tiny or empty usage rows.
Signed-in review run detail is easier to scan during active triage: verdict and live status come first, KPI blocks are larger, findings expand into available space when no snapshot exists, in-progress states show skeleton placeholders instead of dead air, and usage/models move behind expandable sections while audit history reads more like a timeline.
Operators also get a clearer support and policy surface: the Help area adds onboarding progress, workspace health, recent failures, guided fix flows, and support shortcuts, while installation-level Light, Standard, and Max presets explain review behavior and model posture before teams save policy changes.
Signed-in workspaces now have a Help surface at /dashboard/help with a setup checklist, live workspace health, a recent-failures feed, an activity timeline, a recommended next step from account state, guided fix flows, contextual doc shortcuts, PR or run id lookup, categorized product feedback, and pre-filled support mailto with an account context bundle—so operators move from “something broke” to the right dashboard view faster.
The Help page now shows at a glance how far through onboarding a workspace is: the checklist displays a completion counter and segmented progress bar, completed steps are marked through, each health row carries a live status indicator that pulses when healthy, the activity section renders as a proper timeline, and guided fix flows are numbered so the right path is obvious without reading every accordion header.
Installation defaults are easier to tune: Light, Standard, and Max review presets explain the expected GitHub check behavior, specialist coverage, rerun rules, execution mode, and model choices before teams save a policy.
Sandbox OpenCode reviews now capture usage through an in-sandbox OpenRouter logger and richer session extraction, improving token, cache, reasoning, cost, and per-call credit traceability across nested agents, follow-up turns, Builder, Remedy, and stalled review runs.

May 2026 — Sandbox OpenCode metering, liveness, and stalled-turn handling

3 May 2026

At a glance

OpenCode sandbox reviews cap how long a single hung turn can block the run, give stall recovery more room before the session times out, and check for liveness on a shorter cadence so quiet stretches surface as activity in the live feed sooner instead of looking frozen.
Nested sandbox agent work inside OpenCode (subagents and related tool paths) now rolls into the same token and credit accounting as the primary review path, so dashboards and per-run usage reflect the full OpenRouter-style workload instead of undercounting nested calls.
Critique captures and merges OpenCode usage snapshots through the session so per-model spend stays traceable after nudges, retries, or partial completions; credits shown in dashboards still follow the stored charged amount wherever it exists so usage views, review summaries, and credit pool math stay aligned with what was actually deducted.
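The stalled-turn handling above combines a hard cap per turn, an idle window that resets on real activity, and a follow-up nudge into the same session before giving up. A minimal sketch as a pure decision function, with hypothetical names and caller-supplied thresholds (Critique's actual values are not stated here):

```typescript
// Hypothetical sketch of a per-turn liveness check; thresholds are illustrative.
interface TurnTimers {
  turnStartedAt: number; // ms timestamp when the OpenCode turn began
  lastActivityAt: number; // ms timestamp of the latest tool call, token, or event
}

interface WatchdogLimits {
  idleMs: number; // quiet period that triggers a nudge
  maxTurnMs: number; // hard cap on a single turn
}

type WatchdogAction = "continue" | "nudge" | "abort";

function checkTurn(t: TurnTimers, limits: WatchdogLimits, now: number, nudged: boolean): WatchdogAction {
  // A hung turn cannot block the run past the hard cap.
  if (now - t.turnStartedAt >= limits.maxTurnMs) return "abort";
  // Real activity moves lastActivityAt forward, which resets the idle window.
  if (now - t.lastActivityAt < limits.idleMs) return "continue";
  // First quiet stretch: send a follow-up into the same session; after that, abort.
  return nudged ? "abort" : "nudge";
}
```

Keeping the check pure (time passed in as `now`) lets a poller call it on a short cadence and makes the thresholds easy to test.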

May 2026 — Model catalog refresh across Chat, review, and Remedy

2 May 2026

At a glance

The public model catalog now reflects the new routing ladder: Grok 4.3 replaces the older Grok slot at a lower floor, Qwen3.6-35B-A3B replaces the retired Qwen3.5-27B lane, Ling-2.6-Flash joins as a new 1-credit fast agent model, and Qwen3.6-Max-Preview joins as the higher-end Alibaba lane.
Pricing moved in both directions to match current shelf reality: GLM-5.1 dropped by 1 credit, MiMo v2.5 rose by 1 credit, and KAT Coder Pro V2 returned to its normal 2-credit price after a month-long discounted partnership window.
Critique Chat’s picker was cleaned up to match the intended free-chat roster, and saved defaults pointing at removed Chat IDs remap automatically so existing threads reopen without manual repair.

May 2026 — Speech input across Chat and Builder

2 May 2026

At a glance

Critique Chat voice input now runs through OpenRouter's dedicated speech-to-text path instead of a conversational audio workaround, which makes transcription behavior simpler and easier to track.
Builder now has the same microphone capture flow as Chat, so operators can dictate a build prompt directly into the workspace composer before launching a run.
Speech transcription responses now carry the provider generation identifier alongside the transcript, which gives operators and support a cleaner handle for tracing voice requests.

May 2026 — Repository map intelligence, usage clarity, and snapshot-first dashboards

2 May 2026

At a glance

The dashboard repository vector map now reads like an architecture brief: it highlights central nodes, fan-out risk, isolated files, and test relationships, and a Signals panel calls out why parts of the topology matter—not only that they exist.
Act on this map connects the graph to the rest of Critique: open a node or neighborhood in Chat with a prefilled prompt, copy an agent handoff, or export JSON, Markdown, agent handoff text, or DOT/Graphviz for tools outside the app.
From the same view you can run GitHub-oriented actions—copy review-ready Markdown, draft an issue from the selection, or attach the selected context to a PR by number—without losing map context.
The usage area emphasizes managed execution instead of raw provider plumbing; hidden model visibility rules are respected, and charts and labels resolve to catalog display names where Critique has them.
PR reviews now support a clearer premium-first path: when a repository is set to Auto or Premium OpenCode Review, Critique attempts the full OpenCode-led audit first, falls back to the collector-backed review path if that premium run fails, and labels the resulting review authority so operators can tell whether the output came from native OpenCode, collector fallback, or backend-only synthesis.
Metering and credits now line up with how reviews actually run: multi-pass work (reasoning plus structured output) rolls into one honest usage picture, OpenRouter-style native and reasoning token fields parse consistently, sandbox and OpenCode completions surface customer-safe labels, and credits for a large call with uncataloged token pricing no longer snap to a tiny amount. Legacy rows can correct upward when token evidence shows they were too low. Assistant and fix-prompt traffic now leaves durable usage records.
OpenRouter traffic defaults site attribution to Critique’s public home, and review calls send the current attribution headers expected by the provider.
Dashboard GitHub data defaults to persisted snapshots in Critique: full reconciliation runs after Connect or Re-Sync repositories, not on ordinary loads of usage analytics or review-run detail—the UI states that repository access stays cached until you re-sync.
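The premium-first path can be sketched as a fallback chain that records which path produced the output. This assumes the repository is already set to Auto or Premium OpenCode Review; the names are hypothetical, and dropping to backend-only synthesis when the collector path also fails is an assumption made for this sketch, since the note lists that label without saying exactly when it applies.

```typescript
// Hypothetical sketch of premium-first review routing with a labeled authority.
type ReviewAuthority = "native_opencode" | "collector_fallback" | "backend_only";

interface ReviewResult {
  authority: ReviewAuthority;
  output: string;
}

interface ReviewPaths {
  openCode: () => Promise<string>; // full OpenCode-led audit in the sandbox
  collector: () => Promise<string>; // collector-backed review path
  backend: () => Promise<string>; // backend-only synthesis
}

async function runPremiumFirstReview(paths: ReviewPaths): Promise<ReviewResult> {
  try {
    return { authority: "native_opencode", output: await paths.openCode() };
  } catch {
    // Premium run failed: fall back instead of leaving the PR without a review.
  }
  try {
    return { authority: "collector_fallback", output: await paths.collector() };
  } catch {
    // Assumption for this sketch: a failed collector run drops to backend-only synthesis.
  }
  return { authority: "backend_only", output: await paths.backend() };
}
```

Carrying the authority label alongside the output is what lets operators tell which path actually produced a given review.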

April 2026 — PR review runtime unification and live-run clarity

30 April 2026

At a glance

PR review routing now separates OpenCode agent, Collect sandbox, and GitHub-backed review types so operators can choose the exact backend and understand whether an agent stream should exist.
OpenCode agent reviews now start from a single sandbox path: Critique prepares the PR context, launches the OpenCode audit, streams the run, and consumes the final review output without a separate collector sandbox blocking the agent.
Review usage and live-run diagnostics are more accountable: token-bearing calls no longer display zero credits from legacy rows, and sandbox setup or failure events remain visible even when the agent never reaches a rich tool stream.

April 2026 — OpenCode operator hardening for longer review sessions

29 April 2026

At a glance

The live review stream is easier to read at a glance: tool calls, thinking, and assistant messages now surface with clearer tags and cleaner output, and the dashboard shows whether the sandbox, OpenRouter, and QStash prerequisites are ready.
Long-running sandbox reviews now recover more gracefully when the agent goes quiet. Critique can send a follow-up into the same OpenCode session, and if the sandbox stalls entirely, the review can fall back to the backend publish path instead of dying silently.
Review prompts now push for deeper autonomous work by default: Critique expects subagents to be used early on non-trivial changes, avoids permission-seeking loops, and treats missing env-dependent validation as a cue to switch to targeted checks instead of stalling.

April 2026 — Workspace Build: branch-aware sandboxes and the full Builder model roster

29 April 2026

At a glance

Build mode in /workspace now offers searchable repository and model pickers; models come from the same remedy / Builder catalog as /builder, filtered by the selected repo’s plan (ultra-only checkpoints stay ultra-only).
You can choose a Git branch or tag (GitHub-backed list plus manual ref), optionally set a sandbox-only branch (git checkout -b) before OpenCode runs, and those choices persist on the Builder job record for later inspection.
The builder execution path checks out the requested ref in E2B, then creates the local work branch when provided—still no automatic push to GitHub; PR flow stays manual.

April 2026 — Transparent automation credits end to end

24 April 2026

At a glance

Usage analytics now reconcile OpenRouter-style payloads (including nested cache fields) across review, remedy, sandbox OpenCode completions, and builder jobs so token math and credits derive from identical inputs.
The automation ledger separates review-agent, remedy, and builder rows; each exposes prompt versus completion totals, billed credits, latency, and textual purpose fields so accountants can chase down the exact workload.
When sandbox PR reviews emit granular OpenRouter completions, aggregated lead rows skip double-accounting: on the bookkeeping side, the quota counts one OpenRouter session once, while the granular rows carry the explanatory trail.
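One way to count a session only once is to sum its granular rows when they exist and fall back to the aggregated lead row otherwise. This is a minimal sketch with a hypothetical row shape; which row type Critique actually bills against is not specified in the note.

```typescript
// Hypothetical sketch: avoid double-counting a sandbox session that has both
// an aggregated lead row and granular per-completion rows.
interface UsageRow {
  sessionId: string;
  kind: "lead" | "granular";
  credits: number;
}

function billableCredits(rows: UsageRow[]): number {
  // Sessions that already have granular rows should not also bill their lead row.
  const sessionsWithGranular = new Set(
    rows.filter((r) => r.kind === "granular").map((r) => r.sessionId),
  );
  return rows
    .filter((r) => !(r.kind === "lead" && sessionsWithGranular.has(r.sessionId)))
    .reduce((sum, r) => sum + r.credits, 0);
}
```

For a session with a lead row of 10 and granular rows of 4 and 6, plus a second session with only a lead row of 3, this returns 13 rather than 23.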

Commit- and PR-level verification stays in your internal source tools; this page intentionally avoids linking to private repositories.

Search paths

Looking for the product pages behind these releases?

The changelog helps with branded discovery, but the pages below are the ones built for comparison, buyer research, and broader AI code review search intent.

Open source