Threshold Configuration

Understanding thresholds

Thresholds determine when a score becomes a violation. Each scorer produces a score from 0.0 to 1.0:

  • 0.0 = no problem detected
  • 1.0 = maximum confidence of a problem

A threshold of 0.70 means: flag/block when the scorer is 70% or more confident that a problem exists.
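The decision rule above can be sketched in a few lines. This is illustrative only (not the tool's code); the function name `is_violation` is a hypothetical stand-in for the gate's internal check.

```python
# Minimal sketch of the flag/block decision: a score is a violation
# when the scorer's confidence meets or exceeds the threshold.
def is_violation(score: float, threshold: float = 0.70) -> bool:
    return score >= threshold

print(is_violation(0.72))  # True: 72% confidence >= 0.70
print(is_violation(0.55))  # False: below the 0.70 threshold
```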

Default thresholds

Scorer     Default     Rationale
Security   0.70        High-confidence PII/injection only
Bias       0.60        Slightly more sensitive: subtle bias matters
Accuracy   0.65        Middle ground for hallucination detection
Drift      0.25        Small drift changes matter more
Cost       (disabled)  Cost not relevant in CI context

Threshold by use case

Customer-facing application

# High stakes: strict thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.60  # Catch more security issues
      bias:
        threshold: 0.50  # More sensitive to bias
      accuracy:
        threshold: 0.60

Internal tool

# Lower stakes: standard thresholds
gates:
  assessment:
    scorers:
      security:
        threshold: 0.75  # Allow more flexibility
      bias:
        threshold: 0.70
      accuracy:
        threshold: 0.70

Research / experimentation

# Monitoring only: log everything, fail nothing
gates:
  assessment:
    fail_on: never  # Never fail CI, just log
    scorers:
      security:
        threshold: 0.90  # Only log extreme violations

Per-tag threshold overrides

Apply different thresholds to different test tags:

gates:
  assessment:
    scorers:
      security:
        threshold: 0.70  # Default
        overrides:
          financial: 0.50  # Stricter for financial tests
          internal: 0.85   # Looser for internal tool tests
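A sketch of how per-tag resolution might work, under assumed semantics: a matching tag override replaces the default, and if a test carries several matching tags, the strictest (lowest) override wins. Both the function name and that tie-breaking rule are assumptions for illustration, not the tool's documented behavior.

```python
# Hypothetical resolution of the effective threshold for a test,
# given its tags and the configured per-tag overrides.
def effective_threshold(tags, default=0.70, overrides=None):
    overrides = overrides or {}
    matches = [overrides[t] for t in tags if t in overrides]
    # Assumption: when multiple tags match, the strictest value applies.
    return min(matches) if matches else default

overrides = {"financial": 0.50, "internal": 0.85}
print(effective_threshold(["financial"], 0.70, overrides))  # 0.5
print(effective_threshold(["smoke"], 0.70, overrides))      # 0.7
```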

Violation weighting

When multiple scorers exceed their thresholds, weight them differently:

gates:
  assessment:
    violation_threshold: 2  # Allow up to 2 violations
    scorers:
      security:
        weight: 2.0  # Security violations count double
      bias:
        weight: 1.5
      accuracy:
        weight: 1.0

With violation_threshold: 2 and a weight of 2.0 for security, a single security violation contributes a weighted count of 2.0 and reaches the limit on its own. Two accuracy violations (weight 1.0 each) also total 2.0.
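The weighted-violation arithmetic can be sketched as follows. The weights and thresholds mirror the config above; the helper `weighted_total` is illustrative, not the tool's API.

```python
# Sum the weights of every scorer result that exceeded its threshold.
weights = {"security": 2.0, "bias": 1.5, "accuracy": 1.0}
thresholds = {"security": 0.70, "bias": 0.60, "accuracy": 0.65}

def weighted_total(results):
    """results: list of (scorer_name, score) pairs across test cases."""
    return sum(weights[name] for name, score in results
               if score >= thresholds[name])

# One security violation alone reaches a weighted count of 2.0:
print(weighted_total([("security", 0.80)]))
# Two accuracy violations also total 2.0:
print(weighted_total([("accuracy", 0.70), ("accuracy", 0.66)]))
```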

Regression thresholds

In addition to absolute thresholds, set regression thresholds to catch relative degradation:

gates:
  assessment:
    baseline_branch: main
    baseline_regression_threshold: 0.10  # Fail if any score worsens by >0.10
    regression_dimension_overrides:
      security: 0.05  # Stricter regression tolerance for security
      accuracy: 0.15  # More lenient regression tolerance for accuracy
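A sketch of the regression check under assumed semantics: since higher scores mean more confidence of a problem, a dimension "worsens" when its score rises relative to the baseline branch by more than the allowed tolerance. The function and variable names are hypothetical.

```python
# Compare current scores against baseline-branch scores, applying
# per-dimension tolerances where configured (assumed semantics).
default_tolerance = 0.10
dimension_overrides = {"security": 0.05, "accuracy": 0.15}

def regressions(current, baseline):
    failed = []
    for dim, score in current.items():
        tol = dimension_overrides.get(dim, default_tolerance)
        if score - baseline.get(dim, score) > tol:
            failed.append(dim)
    return failed

# Security rose 0.10 against a 0.05 tolerance; accuracy rose only 0.05.
print(regressions({"security": 0.40, "accuracy": 0.50},
                  {"security": 0.30, "accuracy": 0.45}))  # ['security']
```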

Threshold tuning workflow

  1. Start with fail_on: never and log mode
  2. Review 20+ builds to understand your score distribution
  3. Set thresholds at p95 of your passing tests + 0.05 buffer
  4. Switch to fail_on: flag
  5. Monitor false positive rate for 1 week
  6. Adjust thresholds up or down based on signal quality
  7. Switch to fail_on: block when false positive rate is acceptable
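Step 3 above can be sketched with the standard library alone. Where the historical scores come from (CI logs, an export, etc.) is up to you; the function name and the sample data here are purely illustrative.

```python
# Suggest a starting threshold from passing-test score history:
# 95th percentile plus a 0.05 buffer, capped at 1.0.
import statistics

def suggested_threshold(passing_scores, buffer=0.05):
    p95 = statistics.quantiles(passing_scores, n=20)[-1]  # 95th percentile
    return min(round(p95 + buffer, 2), 1.0)

# Hypothetical scores from 20 passing builds:
scores = [0.12, 0.08, 0.22, 0.31, 0.18, 0.27, 0.15, 0.09, 0.41, 0.25,
          0.19, 0.33, 0.11, 0.24, 0.29, 0.16, 0.21, 0.36, 0.14, 0.26]
print(suggested_threshold(scores))
```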

Rule of thumb: Your threshold should pass all your known-good test cases with a 0.10 buffer. If a test case that should pass is scoring 0.65, set your threshold at 0.75 minimum.