AI Judge: pre-screen flagged messages so humans only see the hard ones

A small LLM reviews each draft reply that the rules engine flagged for approval. SAFE → auto-release; UNSAFE → optionally auto-block; UNCERTAIN → human review.

Updated June 26, 2026

The approval queue is a great safety net — but most flagged messages
are perfectly fine, and reviewing every one is exhausting. The AI
Judge runs a cheap LLM pass on each flagged draft and decides what to
do with it. Most operators see queue volume drop ~80% the day they
turn it on.

Open it from Approvals → 🤖 AI Judge settings.

How it works

When the existing rules engine flags a draft for approval (low sentiment,
first contact, refund mention, etc.), the judge:

1. Reads the inbound message and the draft reply.
2. Reads your custom rubric (optional, see below).
3. Returns one of SAFE, UNSAFE, or UNCERTAIN with a one-line reason.

The platform then decides what to do based on your per-agent settings.
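
As a rough illustration of the verdict step, the sketch below parses a judge reply into a verdict and a reason. The reply format (`VERDICT: reason` on one line), the `JudgeResult` type, and `parse_judge_response` are assumptions made for this example, not the platform's actual wire format. Anything unrecognized is treated as UNCERTAIN so that it lands in front of a human.

```python
from dataclasses import dataclass

VERDICTS = ("SAFE", "UNSAFE", "UNCERTAIN")


@dataclass(frozen=True)
class JudgeResult:
    verdict: str  # one of VERDICTS
    reason: str   # the judge's one-line explanation


def parse_judge_response(text: str) -> JudgeResult:
    """Parse a hypothetical 'VERDICT: reason' reply from the judge model."""
    first_line = text.strip().splitlines()[0] if text.strip() else ""
    label, _, reason = first_line.partition(":")
    label = label.strip().upper()
    if label not in VERDICTS:
        # Unparseable output is never trusted: hand it to a human.
        return JudgeResult("UNCERTAIN", "could not parse judge output")
    return JudgeResult(label, reason.strip())
```

Defaulting to UNCERTAIN keeps a malformed reply from ever being auto-sent or auto-blocked.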

Verdict routing

Per-agent toggles in the AI Judge settings modal:

| Verdict | If judgeAutoSend ON | If OFF |
|---------|------------------------|--------|
| SAFE | Released automatically + sent to contact | Stays in queue for human |
| UNSAFE | (depends on judgeAutoBlock) | Stays in queue for human |
| UNCERTAIN | Always stays in queue | Always stays in queue |

| Verdict | If judgeAutoBlock ON | If OFF |
|---------|--------------------------|--------|
| UNSAFE | Auto-rejected, never sent | Stays in queue for human |

Recommended starting config:

- judgeAutoSend: ON. SAFE verdicts release without a human.
- judgeAutoBlock: OFF. UNSAFE messages still surface to a human.
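
The routing tables can be read as a small decision function. The sketch below is one interpretation, not the platform's code: `JudgeSettings` and `route` are hypothetical names, and it assumes the two toggles act independently (judgeAutoSend only affects SAFE, judgeAutoBlock only affects UNSAFE).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeSettings:
    # Field names mirror the per-agent toggles described above.
    judgeAutoSend: bool = True
    judgeAutoBlock: bool = False


def route(verdict: str, settings: JudgeSettings) -> str:
    """Return 'send', 'block', or 'queue' for a judged draft."""
    if verdict == "SAFE" and settings.judgeAutoSend:
        return "send"
    if verdict == "UNSAFE" and settings.judgeAutoBlock:
        return "block"
    # UNCERTAIN, unknown verdicts, and disabled toggles all wait for a human.
    return "queue"


# The recommended starting config: auto-send ON, auto-block OFF.
recommended = JudgeSettings(judgeAutoSend=True, judgeAutoBlock=False)
for verdict in ("SAFE", "UNSAFE", "UNCERTAIN"):
    print(verdict, "->", route(verdict, recommended))
```

With the recommended config, only SAFE drafts leave the queue without a human; UNSAFE and UNCERTAIN both route to `queue`.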

The rubric

Custom rubric (optional) is where you encode your specific policies.
The judge reads this on every call. Examples:

- Never auto-send anything that quotes a price
- Auto-send anything that's just confirming a meeting time
- UNSAFE: any reply that promises a refund or guaranteed outcome
- UNSAFE: any reply that mentions specific dollar amounts
- UNCERTAIN: anything mentioning a competitor by name

The rubric is per-agent. Different agents can have different policies.
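
To make "the judge reads this on every call" concrete, here is a minimal sketch of how a per-agent rubric might be folded into the judge's prompt. The prompt wording, `build_judge_prompt`, the `RUBRICS` mapping, and the agent names are all illustrative assumptions; the real prompt is not documented here.

```python
from typing import Optional

BASE_INSTRUCTIONS = (
    "You review a draft reply before it is sent to a contact. "
    "Answer with SAFE, UNSAFE, or UNCERTAIN, then a one-line reason."
)


def build_judge_prompt(inbound: str, draft: str, rubric: Optional[str] = None) -> str:
    """Assemble the text sent to the judge for one flagged draft."""
    sections = [BASE_INSTRUCTIONS]
    if rubric and rubric.strip():
        # The rubric is optional; include it only when the agent has one.
        sections.append("Operator rubric:\n" + rubric.strip())
    sections.append("Inbound message:\n" + inbound.strip())
    sections.append("Draft reply:\n" + draft.strip())
    return "\n\n".join(sections)


# Hypothetical per-agent rubrics: different agents carry different policies.
RUBRICS = {
    "sales-agent": "- UNSAFE: any reply that mentions specific dollar amounts",
    "scheduling-agent": "- Auto-send anything that's just confirming a meeting time",
}

prompt = build_judge_prompt(
    inbound="Can we meet Tuesday at 3pm?",
    draft="Tuesday at 3pm works. See you then!",
    rubric=RUBRICS.get("scheduling-agent"),
)
```

An agent with no entry in `RUBRICS` gets a prompt without the rubric section.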

Models

Haiku (default) — fast, ~30× cheaper than Sonnet, great for
Sonnet — slower, more expensive, better at nuanced cases (legal,

Most operators run Haiku across all agents.

What you see in the queue

Every pending and decided row in the approval queue now shows the
judge's verdict as a chip — green for SAFE, red for UNSAFE, blue for
UNCERTAIN. Hover for the judge's one-line reason.

If you reject a SAFE-judged message, that's a signal the rubric needs
tightening. Add the case to the rubric and the judge will catch it
next time.
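
If you can export decided rows, a quick filter surfaces the cases worth adding to the rubric. This is only a sketch: the row shape (`verdict` and `decision` keys) and the `safe_but_rejected` helper are hypothetical, not a documented export format.

```python
from typing import Iterable


def safe_but_rejected(rows: Iterable[dict]) -> list[dict]:
    """Return decided rows the judge called SAFE but a human rejected."""
    return [
        row for row in rows
        if row.get("verdict") == "SAFE" and row.get("decision") == "rejected"
    ]


# Hypothetical decided rows from the approval queue.
rows = [
    {"id": 1, "verdict": "SAFE", "decision": "approved"},
    {"id": 2, "verdict": "SAFE", "decision": "rejected"},
    {"id": 3, "verdict": "UNSAFE", "decision": "rejected"},
]
candidates = safe_but_rejected(rows)
```

Each row in `candidates` is a disagreement between the judge and a reviewer, and so a candidate for a new rubric line.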

Cost

Each flagged message costs about one judge call (Haiku by default, a few
hundred tokens). For a workspace with 100 flagged messages a day, that's
pennies, and it saves the operator hours of review time.
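
The arithmetic behind that estimate can be sketched as follows. The token count and the price per million tokens are placeholder inputs, not actual model pricing; substitute your provider's current rates.

```python
def daily_judge_cost(
    flagged_per_day: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate daily spend in USD, assuming one judge call per flagged message."""
    total_tokens = flagged_per_day * tokens_per_call
    return total_tokens * usd_per_million_tokens / 1_000_000


# 100 flagged messages, ~400 tokens each, at a placeholder $1 per million tokens.
estimate = daily_judge_cost(100, 400, 1.0)
print(f"${estimate:.2f} per day")  # prints "$0.04 per day"
```

Under these placeholder numbers the judge processes 40,000 tokens a day, so the total stays in the cents range even if the real price is several times higher.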

Failure mode: falls back to human review

If the judge call fails (API timeout, model error), the message stays
in the queue for human review. The judge never auto-sends or
auto-rejects on its own error, so a failed call can't make outcomes
worse than running without the judge: the message gets the human
review it would have had anyway.
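
One way to get this behavior is to wrap the judge call so that any error yields "no verdict" rather than a guess. The sketch below is an assumption about structure, not the platform's implementation; `try_judge`, `must_stay_in_queue`, and `timing_out_judge` are hypothetical names.

```python
from typing import Callable, Optional


def try_judge(
    call_judge: Callable[[str, str], str], inbound: str, draft: str
) -> Optional[str]:
    """Return the judge's verdict, or None when the call fails."""
    try:
        return call_judge(inbound, draft)
    except Exception:
        # API timeout, model error, etc.: report "no verdict" instead of guessing.
        return None


def must_stay_in_queue(verdict: Optional[str]) -> bool:
    """True for outcomes that wait for a human regardless of the toggles."""
    return verdict is None or verdict == "UNCERTAIN"


def timing_out_judge(inbound: str, draft: str) -> str:
    # Stand-in for a judge call that fails.
    raise TimeoutError("simulated judge timeout")


verdict = try_judge(timing_out_judge, "Where is my refund?", "It is on the way.")
```

Here `verdict` is `None`, so the draft is treated like any other item awaiting review.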