Threshold Configuration

Understanding thresholds

Thresholds determine when a score becomes a violation. Each scorer produces a score from 0.0 to 1.0:

  • 0.0 = no problem detected
  • 1.0 = maximum confidence of a problem

A threshold of 0.70 means: flag/block when the scorer is 70% or more confident that a problem exists.
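The decision rule above can be sketched in a few lines. This is illustrative only (not the tool's code); the function name `is_violation` is a hypothetical stand-in for the gate's internal check.

```python
# Minimal sketch of the flag/block decision: a score is a violation
# when the scorer's confidence meets or exceeds the threshold.
def is_violation(score: float, threshold: float = 0.70) -> bool:
    return score >= threshold

print(is_violation(0.72))  # True: 72% confidence >= 0.70
print(is_violation(0.55))  # False: below the 0.70 threshold
```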

Default thresholds

Scorer     Default     Rationale
Security   0.70        High-confidence PII/injection only
Bias       0.60        Slightly more sensitive: subtle bias matters
Accuracy   0.65        Middle ground for hallucination detection
Drift      0.25        Small drift changes matter more
Cost       (disabled)  Cost not relevant in CI context

Threshold by use case

Customer-facing application

# High stakes: strict thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.60  # Catch more security issues
      bias:
        threshold: 0.50  # More sensitive to bias
      accuracy:
        threshold: 0.60

Internal tool

# Lower stakes: standard thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.75  # Allow more flexibility
      bias:
        threshold: 0.70
      accuracy:
        threshold: 0.70

Research / experimentation

# Monitoring only: log everything, fail nothing
gates:
  assessment:
    fail_on: never  # Never fail CI, just log
    scorers:
      security:
        threshold: 0.90  # Only log extreme violations

Per-tag threshold overrides

Apply different thresholds to different test tags:

gates:
  assessment:
    scorers:
      security:
        threshold: 0.70  # Default
        overrides:
          financial: 0.50  # Stricter for financial tests
          internal: 0.85   # Looser for internal tool tests
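A sketch of how per-tag resolution might work, under assumed semantics: a matching tag override replaces the default, and if a test carries several matching tags, the strictest (lowest) override wins. Both the function name and that tie-breaking rule are assumptions for illustration, not the tool's documented behavior.

```python
# Hypothetical resolution of the effective threshold for a test,
# given its tags and the configured per-tag overrides.
def effective_threshold(tags, default=0.70, overrides=None):
    overrides = overrides or {}
    matches = [overrides[t] for t in tags if t in overrides]
    # Assumption: when multiple tags match, the strictest value applies.
    return min(matches) if matches else default

overrides = {"financial": 0.50, "internal": 0.85}
print(effective_threshold(["financial"], 0.70, overrides))  # 0.5
print(effective_threshold(["smoke"], 0.70, overrides))      # 0.7
```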

Violation weighting

When multiple scorers exceed their thresholds, weight them differently:

gates:
  assessment:
    violation_threshold: 2  # Allow up to 2 violations
    scorers:
      security:
        weight: 2.0  # Security violations count double
      bias:
        weight: 1.5
      accuracy:
        weight: 1.0

With violation_threshold: 2 and a weight of 2.0 for security, a single security violation contributes a weighted count of 2.0 and reaches the limit on its own. Two accuracy violations (weight 1.0 each) also total 2.0.
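The weighted-violation arithmetic can be sketched as follows. The weights and thresholds mirror the config above; the helper `weighted_total` is illustrative, not the tool's API.

```python
# Sum the weights of every scorer result that exceeded its threshold.
weights = {"security": 2.0, "bias": 1.5, "accuracy": 1.0}
thresholds = {"security": 0.70, "bias": 0.60, "accuracy": 0.65}

def weighted_total(results):
    """results: list of (scorer_name, score) pairs across test cases."""
    return sum(weights[name] for name, score in results
               if score >= thresholds[name])

# One security violation alone reaches a weighted count of 2.0:
print(weighted_total([("security", 0.80)]))
# Two accuracy violations also total 2.0:
print(weighted_total([("accuracy", 0.70), ("accuracy", 0.66)]))
```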

Regression thresholds

In addition to absolute thresholds, set regression thresholds to catch relative degradation:

gates:
  assessment:
    baseline_branch: main
    baseline_regression_threshold: 0.10  # Fail if any score worsens by >0.10
    regression_dimension_overrides:
      security: 0.05  # Stricter regression tolerance for security
      accuracy: 0.15  # More lenient regression tolerance for accuracy
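A sketch of the regression check under assumed semantics: since higher scores mean more confidence of a problem, a dimension "worsens" when its score rises relative to the baseline branch by more than the allowed tolerance. The function and variable names are hypothetical.

```python
# Compare current scores against baseline-branch scores, applying
# per-dimension tolerances where configured (assumed semantics).
default_tolerance = 0.10
dimension_overrides = {"security": 0.05, "accuracy": 0.15}

def regressions(current, baseline):
    failed = []
    for dim, score in current.items():
        tol = dimension_overrides.get(dim, default_tolerance)
        if score - baseline.get(dim, score) > tol:
            failed.append(dim)
    return failed

# Security rose 0.10 against a 0.05 tolerance; accuracy rose only 0.05.
print(regressions({"security": 0.40, "accuracy": 0.50},
                  {"security": 0.30, "accuracy": 0.45}))  # ['security']
```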

Threshold tuning workflow

  1. Start with fail_on: never and log mode
  2. Review 20+ builds to understand your score distribution
  3. Set thresholds at p95 of your passing tests + 0.05 buffer
  4. Switch to fail_on: flag
  5. Monitor false positive rate for 1 week
  6. Adjust thresholds up or down based on signal quality
  7. Switch to fail_on: block when false positive rate is acceptable
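Step 3 above can be sketched with the standard library alone. Where the historical scores come from (CI logs, an export, etc.) is up to you; the function name and the sample data here are purely illustrative.

```python
# Suggest a starting threshold from passing-test score history:
# 95th percentile plus a 0.05 buffer, capped at 1.0.
import statistics

def suggested_threshold(passing_scores, buffer=0.05):
    p95 = statistics.quantiles(passing_scores, n=20)[-1]  # 95th percentile
    return min(round(p95 + buffer, 2), 1.0)

# Hypothetical scores from 20 passing builds:
scores = [0.12, 0.08, 0.22, 0.31, 0.18, 0.27, 0.15, 0.09, 0.41, 0.25,
          0.19, 0.33, 0.11, 0.24, 0.29, 0.16, 0.21, 0.36, 0.14, 0.26]
print(suggested_threshold(scores))
```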

Rule of thumb: Your threshold should pass all your known-good test cases with a 0.10 buffer. If a test case that should pass is scoring 0.65, set your threshold at 0.75 minimum.