The Review Committee

A diff that passes CI is a diff that doesn’t crash on the first build. That doesn’t make it a diff a senior engineer would approve. The gap between “compiles, tests pass” and “merge this” is where most agent output dies, and where most teams revert to manual review of every line because they can’t trust the agent to police itself.

The Review Committee is SprintLoop’s mechanism for closing that gap. Three reviewer agents — Architect, Security, QA — read every closed lane’s diff and emit a verdict before the human sign-off is requested. They don’t replace the human reviewer; they raise the floor.

What each reviewer does

Each reviewer is a specialized agent with its own system prompt, its own tool set, and its own scoring rubric. They run in parallel after the lane signals “ready for review.”

Reviewer	Looks for	Tools
Architect	Layering violations, circular deps, ownership boundary breaks, public API stability	Repo map, dependency graph, exported-symbol diff
Security	Auth bypasses, missing input validation, secret leaks, RLS gaps, dangerous functions	SAST output, dependency CVE check, RLS policy diff
QA	Coverage drop, missing edge-case tests, broken contract tests, flaky-test reintroduction	Coverage tool output, test framework AST

Each reviewer emits one of three verdicts: Approve, Has questions, or Block. A verdict carries a rationale — usually 2–4 sentences plus links to the specific lines or files that triggered it — and a confidence score the comparator uses when racing.

How verdicts gate merges

The default policy:

All three reviewers approve → sign-off slot opens. Owner clicks Approve, lane merges.
One or more reviewers say “Has questions” → sign-off slot opens with a yellow banner naming the questions. Owner can answer inline or override and merge.
Any reviewer says “Block” → sign-off slot is disabled. Owner must either fix the issue (re-dispatch with corrections) or explicitly override with a documented reason.
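
The three rules above can be read as a small decision function. The following is a minimal sketch, not SprintLoop’s actual implementation: the type names, field names, and the gate function are all hypothetical, and it assumes every reviewer has already reported.

```typescript
// Hypothetical types for illustration; not SprintLoop's real API.
type Verdict = "approve" | "has_questions" | "block";

interface ReviewerVerdict {
  reviewer: string;   // e.g. "Architect", "Security", "QA"
  verdict: Verdict;
  rationale: string;  // short explanation with file/line references
  confidence: number; // 0..1
}

interface SignOffState {
  slot: "open" | "disabled";
  banner: string[];    // questions shown in the yellow banner; empty if none
  blockedBy: string[]; // reviewers whose Block disabled the slot
}

// Default policy: any Block disables the slot; otherwise the slot opens,
// with a banner when at least one reviewer has questions.
function gate(verdicts: ReviewerVerdict[]): SignOffState {
  const blockedBy = verdicts
    .filter(v => v.verdict === "block")
    .map(v => v.reviewer);
  if (blockedBy.length > 0) {
    return { slot: "disabled", banner: [], blockedBy };
  }
  const banner = verdicts
    .filter(v => v.verdict === "has_questions")
    .map(v => v.reviewer + ": " + v.rationale);
  return { slot: "open", banner, blockedBy: [] };
}
```

Checking Block first matters: a lane with one Block and one “Has questions” stays disabled rather than opening with a banner.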

Override is logged as a signed entry on the lane’s chain. It carries the reason text and the overrider’s identity. Auditors look at override frequency and reasons; if a workspace overrides Block verdicts 30% of the time, the rule is broken and someone should retune the reviewer.

The policy is configurable per workspace. Common variations:

Strict mode. Block from any reviewer cannot be overridden. Used by teams in regulated industries.
Architect-only mode. Security and QA become advisory; only Architect verdicts gate merge. Used when SAST and coverage are enforced by CI separately.
Off. Reviewers run, verdicts are visible, but never gate merge. Used during evaluation periods or for very small teams.
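
One way to picture the variations is as a mode switch applied before the gate. This is a sketch under stated assumptions: the mode names, the decide function, and the result shape are invented for illustration, and it covers only Block verdicts, not the “Has questions” banner.

```typescript
// Hypothetical names; the real workspace settings may be shaped differently.
type Verdict = "approve" | "has_questions" | "block";
type Mode = "default" | "strict" | "architect_only" | "off";

interface ReviewerVerdict {
  reviewer: string; // "Architect", "Security", "QA", or a custom reviewer
  verdict: Verdict;
}

interface GateDecision {
  mergeBlocked: boolean;    // sign-off slot is disabled
  overrideAllowed: boolean; // owner may override the Block with a reason
}

function decide(mode: Mode, verdicts: ReviewerVerdict[]): GateDecision {
  // Off: reviewers still run and verdicts stay visible, but nothing gates merge.
  if (mode === "off") {
    return { mergeBlocked: false, overrideAllowed: false };
  }
  // Architect-only: Security and QA verdicts become advisory.
  const gating = mode === "architect_only"
    ? verdicts.filter(v => v.reviewer === "Architect")
    : verdicts;
  const mergeBlocked = gating.some(v => v.verdict === "block");
  // Strict: a Block can never be overridden.
  return { mergeBlocked, overrideAllowed: mergeBlocked && mode !== "strict" };
}
```

With a single Security Block, this sketch blocks with override available in default mode, blocks without override in strict mode, and does not block at all in Architect-only mode.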

What good reviewer output looks like

The Security reviewer on a recent lane that touched a webhook handler:

Block. The new handler at src/handlers/stripe.ts:34 reads req.body directly into the database without verifying Stripe’s signature header. This bypasses the signature check that’s enforced on the existing handlers in this file. Either call verifyStripeSignature() (line 12) before the insert, or document why this endpoint is allowed to skip verification. Confidence 0.91.

That’s the floor we want. Specific line, specific file, specific contradiction with existing code, specific remediation. Vague verdicts (“This might be insecure”) don’t make it through the reviewer’s own self-check; they get rewritten or downgraded to “Has questions.”
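
How the self-check works internally isn’t described here. As a rough illustration of the idea only, the sketch below downgrades any Block whose rationale cites no file and line. The regex heuristic, the type names, and the selfCheck function are all assumptions; a real self-check would likely be richer than a pattern match.

```typescript
// Hypothetical sketch: the real self-check is not specified in this document.
type Verdict = "approve" | "has_questions" | "block";

interface ReviewerVerdict {
  verdict: Verdict;
  rationale: string;
}

// Matches citations such as "src/handlers/stripe.ts:34":
// a path ending in a file extension, a colon, then a line number.
const FILE_LINE = /[\w./-]+\.\w+:\d+/;

// A Block that names no specific file and line is downgraded to "Has questions".
function selfCheck(v: ReviewerVerdict): ReviewerVerdict {
  if (v.verdict === "block" && !FILE_LINE.test(v.rationale)) {
    return { verdict: "has_questions", rationale: v.rationale };
  }
  return v;
}
```

Under this heuristic, the Security verdict quoted above stays a Block because it cites src/handlers/stripe.ts:34, while “This might be insecure” would not.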

When the Architect reviewer renders a Block, it’s usually about a layering violation — code in a leaf module reaching back into a higher-level module, or a service taking a direct dependency on another service’s internal types. The rationale will name the violated boundary and the file where the boundary is defined.

QA Block verdicts are almost always either “you removed a test for behavior X without explaining why” or “the new code has a branch with no test.” Coverage thresholds alone don’t trigger a QA Block; missing tests for changed behavior do.

Adding a custom reviewer

Workspaces can add custom reviewers under Settings → Review Committee → Add reviewer. A custom reviewer needs:

A name (e.g., Accessibility, Database performance, Localization).
A scope — which paths trigger this reviewer. Default is everywhere.
A system prompt describing what the reviewer is looking for.
An optional tool spec — e.g., a custom edge function the reviewer can call to get domain data.
A merge policy — block, advise, or off.
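
To make the five fields concrete, here is a sketch of what a custom reviewer definition might look like as data, along with a scope check. The field names, the use of path prefixes for scope, and the example paths and edge function name are assumptions for illustration; the settings UI may store these differently.

```typescript
// Hypothetical shape; field names are illustrative, not SprintLoop's schema.
type MergePolicy = "block" | "advise" | "off";

interface CustomReviewer {
  name: string;
  scope: string[];                     // path prefixes; empty means everywhere (the default)
  systemPrompt: string;
  toolSpec?: { edgeFunction: string }; // optional tool the reviewer can call
  mergePolicy: MergePolicy;
}

// True when at least one changed path falls inside the reviewer's scope.
// Assumes scope entries are plain path prefixes, not globs.
function isTriggered(reviewer: CustomReviewer, changedPaths: string[]): boolean {
  if (reviewer.scope.length === 0) return true;
  return changedPaths.some(path =>
    reviewer.scope.some(prefix => path.indexOf(prefix) === 0)
  );
}

// Example definition; the scope path and function name are made up.
const hipaa: CustomReviewer = {
  name: "HIPAA",
  scope: ["src/patients/"],
  systemPrompt:
    "Flag PHI in logs, missing encryption-at-rest annotations, and audit-log gaps.",
  toolSpec: { edgeFunction: "lookup-phi-fields" },
  mergePolicy: "block",
};
```

In this sketch, a diff touching src/patients/export.ts would trigger the reviewer and a docs-only diff would not.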

The most common custom reviewer in healthcare workspaces is HIPAA, which checks for PHI in logs, missing encryption-at-rest annotations, and audit-log gaps. The most common in fintech is Compliance, which flags new data flows that need to be added to the SOC 2 system description.

Custom reviewers are rate-limited at the workspace level. Each diff currently runs against up to 8 reviewers concurrently; beyond that, additional reviewers queue and surface verdicts after the first batch finishes.
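
The queueing behavior can be sketched as simple batching. This is an illustration, not the scheduler itself: the planBatches function and the constant name are hypothetical, and the text doesn’t say whether the three built-in reviewers count toward the limit of 8 (the sketch assumes they do).

```typescript
// The limit comes from the text ("currently" 8, so it may change).
const MAX_CONCURRENT_REVIEWERS = 8;

// Splits reviewers into batches that run one after another.
// Reviewers within a batch run concurrently; later batches wait.
function planBatches<T>(reviewers: T[], limit: number = MAX_CONCURRENT_REVIEWERS): T[][] {
  if (limit < 1) {
    throw new Error("limit must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < reviewers.length; i += limit) {
    batches.push(reviewers.slice(i, i + limit));
  }
  return batches;
}
```

A workspace with the three built-in reviewers plus eight custom ones would get two batches under this assumption: eight reviewers first, then the remaining three.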

Tuning a reviewer

If a reviewer is wrong too often — too many false-positive Blocks, or too many missed issues that the human catches in sign-off — open the reviewer’s settings and look at its Calibration tab. The tab shows recent verdicts side-by-side with the human sign-off outcome. Five common patterns:

High false-positive Blocks. Tighten the prompt to specify what doesn’t count as a violation. Add 2–3 examples of approved diffs that the reviewer wrongly Blocked.
High false-negative rate. Widen the scope: the reviewer may be looking at only certain paths while the issues slip in through others. Or the prompt is missing a class of issue entirely.
Verdict drift. Same code triggers different verdicts on different days. Usually a sign the prompt is over-anchored on style and missing the load-bearing rule.
Confidence inflation. Every verdict comes back at 0.95+ confidence. Lower the temperature and ask the reviewer to cite a specific line for every claim.
Slow. Verdicts take more than 60 seconds. Trim the tool set or split the reviewer into two.
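
The first, second, and fourth patterns can be measured directly from the kind of data the Calibration tab shows. The sketch below is one possible way to compute them. The row shape, the humanFoundIssue field, and the calibrate function are assumptions; the 0.95 cutoff is the one named above.

```typescript
// Hypothetical row shape: one reviewer verdict paired with the human outcome.
type Verdict = "approve" | "has_questions" | "block";

interface CalibrationRow {
  verdict: Verdict;
  humanFoundIssue: boolean; // did the human sign-off confirm a real problem?
  confidence: number;       // 0..1
}

interface CalibrationStats {
  falsePositiveBlockRate: number; // Blocks where the human found no issue
  missRate: number;               // Approves where the human did find an issue
  highConfidenceShare: number;    // verdicts at 0.95+ confidence
}

function calibrate(rows: CalibrationRow[]): CalibrationStats {
  // Avoid dividing by zero when a category has no rows.
  const ratio = (n: number, d: number): number => (d === 0 ? 0 : n / d);
  const blocks = rows.filter(r => r.verdict === "block");
  const approves = rows.filter(r => r.verdict === "approve");
  return {
    falsePositiveBlockRate: ratio(blocks.filter(r => !r.humanFoundIssue).length, blocks.length),
    missRate: ratio(approves.filter(r => r.humanFoundIssue).length, approves.length),
    highConfidenceShare: ratio(rows.filter(r => r.confidence >= 0.95).length, rows.length),
  };
}
```

Verdict drift and latency need different inputs (repeat runs on the same diff, and timing data), so they are left out of this sketch.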

How the human sign-off changes

The human’s job after the Review Committee runs is not “read the diff line by line.” It’s:

Read the verdicts. Usually 30 seconds.
Spot-check anything the reviewers flagged as “Has questions.”
If you disagree with a Block, override with a written reason.
Approve.

A human sign-off after a clean reviewer pass is typically 2–4 minutes for a 200-line diff. The same human reviewing the diff cold, without the reviewer pass, would take 15–25 minutes for the same change. The committee earns its keep by making the human’s review faster and more focused — not by replacing it.