# Threshold Configuration

## Understanding thresholds
Thresholds determine when a score becomes a violation. Each scorer produces a score from 0.0 to 1.0:
- 0.0 = no problem detected
- 1.0 = maximum confidence of a problem
A threshold of 0.70 means: flag/block when the scorer is 70% or more confident that a problem exists.
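Concretely, the flag/block decision amounts to a simple comparison. This is an illustrative sketch only (the function name is not part of any documented API):

```python
# Illustrative sketch only; not part of the tool's documented API.
def is_violation(score: float, threshold: float) -> bool:
    """A scorer output at or above the threshold counts as a violation."""
    return score >= threshold

# With the default security threshold of 0.70:
print(is_violation(0.69, 0.70))  # False; under the threshold, the test passes
print(is_violation(0.70, 0.70))  # True; 70% confidence or more is flagged
```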
## Default thresholds
| Scorer | Default | Rationale |
|---|---|---|
| Security | 0.70 | High-confidence PII/injection only |
| Bias | 0.60 | Slightly more sensitive — subtle bias matters |
| Accuracy | 0.65 | Middle ground for hallucination detection |
| Drift | 0.25 | Small drift changes matter more |
| Cost | (disabled) | Cost not relevant in CI context |
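If you prefer to pin these defaults explicitly rather than rely on them implicitly, a config mirroring the table might look like this (assuming the defaults live under the same `gates.assessment.scorers` keys as the other examples on this page):

```yaml
gates:
  assessment:
    scorers:
      security:
        threshold: 0.70
      bias:
        threshold: 0.60
      accuracy:
        threshold: 0.65
      drift:
        threshold: 0.25
      # cost is disabled by default and omitted here
```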
## Threshold by use case

### Customer-facing application
```yaml
# High stakes — strict thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.60  # Catch more security issues
      bias:
        threshold: 0.50  # More sensitive to bias
      accuracy:
        threshold: 0.60
```

### Internal tool
```yaml
# Lower stakes — standard thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.75  # Allow more flexibility
      bias:
        threshold: 0.70
      accuracy:
        threshold: 0.70
```

### Research / experimentation
```yaml
# Monitoring only — log everything, fail nothing
gates:
  assessment:
    fail_on: never  # Never fail CI, just log
    scorers:
      security:
        threshold: 0.90  # Only log extreme violations
```

## Per-tag threshold overrides
Apply different thresholds to different test tags:
```yaml
gates:
  assessment:
    scorers:
      security:
        threshold: 0.70  # Default
        overrides:
          financial: 0.50  # Stricter for financial tests
          internal: 0.85   # Looser for internal tool tests
```

## Violation weighting
When multiple scorers exceed their thresholds, you can weight their violations differently:
```yaml
gates:
  assessment:
    violation_threshold: 2  # Allow up to 2 violations
    scorers:
      security:
        weight: 2.0  # Security violations count double
      bias:
        weight: 1.5
      accuracy:
        weight: 1.0
```

With `violation_threshold: 2` and `weight: 2.0` for security, a single security violation (weight 2.0) counts as 2, immediately hitting the threshold. Two accuracy violations (weight 1.0 each) also total 2.
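The arithmetic above can be sketched as follows. The function and variable names are illustrative, and the boundary semantics (failing when the weighted total reaches the threshold) are inferred from the worked example, not from documented internals:

```python
# Illustrative sketch of weighted violation counting; not the tool's actual code.
def gate_fails(violated: dict[str, bool], weights: dict[str, float],
               violation_threshold: float) -> bool:
    """Sum the weights of scorers that reported a violation; fail the gate
    when the weighted total reaches violation_threshold."""
    total = sum(weights.get(name, 1.0) for name, hit in violated.items() if hit)
    return total >= violation_threshold

weights = {"security": 2.0, "bias": 1.5, "accuracy": 1.0}

print(gate_fails({"security": True}, weights, 2))                # True: 2.0 >= 2
print(gate_fails({"accuracy": True}, weights, 2))                # False: 1.0 < 2
print(gate_fails({"accuracy": True, "bias": True}, weights, 2))  # True: 2.5 >= 2
```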
## Regression thresholds
In addition to absolute thresholds, set regression thresholds to catch relative degradation:
```yaml
gates:
  assessment:
    baseline_branch: main
    baseline_regression_threshold: 0.10  # Fail if any score worsens by >0.10
    regression_dimension_overrides:
      security: 0.05  # Stricter regression tolerance for security
      accuracy: 0.15  # More lenient regression tolerance for accuracy
```

## Threshold tuning workflow
1. Start with `fail_on: never` (log-only mode)
2. Review 20+ builds to understand your score distribution
3. Set thresholds at the p95 of your passing tests' scores plus a 0.05 buffer
4. Switch to `fail_on: flag`
5. Monitor the false positive rate for 1 week
6. Adjust thresholds up or down based on signal quality
7. Switch to `fail_on: block` when the false positive rate is acceptable
**Rule of thumb:** your threshold should pass all your known-good test cases with at least a 0.10 buffer. If a test case that should pass is scoring 0.65, set your threshold at 0.75 minimum.
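Step 3 of the workflow (p95 of passing scores plus a buffer) can be sketched with Python's standard library. The helper name and the choice of the inclusive quantile method are assumptions for illustration, not anything this tool ships:

```python
import statistics

def suggest_threshold(passing_scores: list[float], buffer: float = 0.05) -> float:
    """Suggest a threshold: 95th percentile of known-good scores plus a
    safety buffer, capped at the maximum possible score of 1.0."""
    # n=20 yields 19 cut points at 5% steps; the last one is the p95.
    p95 = statistics.quantiles(passing_scores, n=20, method="inclusive")[-1]
    return min(round(p95 + buffer, 2), 1.0)

# Example: scores from known-good runs collected during the log-only phase.
print(suggest_threshold([0.12, 0.20, 0.31, 0.44, 0.58]))
```

In practice you would feed this the per-scorer score distribution gathered while running with `fail_on: never`, then sanity-check the suggestion against the 0.10-buffer rule of thumb above.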