The red/green loop (measure vs threshold)
🟡 Partial: the statistical power caveat (`UNDERPOWERED`) is advisory in v1. It is flagged but does not bring down the gate.
A control associates a metric with a threshold and a mode that decides whether a failure blocks. The state of the control (red or green) is the state of the treatment cycle for that risk.
What defines a control
A control consists of three elements:
- Metric: the measured value (for example the demographic parity difference or another fairness metric, test coverage, etc.)
- Threshold: the limit value declared in `sei.yaml`, derived from the risk appetite
- Mode: `blocking` (brings down the gate on failure) or `advisory` (flags the failure but does not block)
The risk gate is the conjunction of all blocking controls. If any of them is red, `sei run` returns a non-zero exit code, even though the evidence is still anchored and signed.
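The gate logic can be sketched as follows. This is a minimal illustration, not the `sei` engine's code: the `Control` class, its fields and `gate_exit_code` are hypothetical names, and the pass direction (lower is better unless `higher_is_better` is set) is an assumption chosen so that both the parity example and a coverage-style metric fit.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Control:
    # Hypothetical model of a control: measured metric, threshold (as declared
    # in sei.yaml) and mode. Not the actual sei data model.
    name: str
    value: float
    threshold: float
    mode: Literal["blocking", "advisory"]
    higher_is_better: bool = False  # assumption: e.g. test coverage

    def is_green(self) -> bool:
        # Lower-is-better metrics pass strictly below the threshold
        # (demographic parity); higher-is-better ones pass at or above it.
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value < self.threshold


def gate_exit_code(controls: list[Control]) -> int:
    """Return 0 if every blocking control is green, 1 otherwise.

    Advisory controls are reported but never change the exit code.
    """
    for c in controls:
        if not c.is_green():
            level = "FAIL" if c.mode == "blocking" else "WARN"
            print(f"{level}: {c.name} = {c.value} (threshold {c.threshold})")
    gate_passes = all(c.is_green() for c in controls if c.mode == "blocking")
    return 0 if gate_passes else 1
```

With a green blocking parity control and a red advisory coverage control (illustrative values), the gate still exits with 0 and only prints a `WARN` line; turning the parity value red would make it exit with 1.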
Real example: the loan and demographic parity
The loan scenario (consumer credit, EU AI Act Annex III §5) uses the demographic parity difference (`demographic_parity_diff`, the difference in approval rates between gender groups) as its fairness control. The gate passes if this difference is below 0.03.
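To make the metric concrete, here is a minimal sketch of a demographic parity difference. The exact definition used by the engine is not stated here; this sketch assumes the common convention of taking the largest minus the smallest per-group approval rate, which for two groups equals the absolute difference between them. The function name mirrors the metric name but is otherwise hypothetical.

```python
def demographic_parity_diff(approved: list[bool], group: list[str]) -> float:
    """Max minus min approval rate across groups (equals |rate_A - rate_B| for two groups)."""
    if len(approved) != len(group):
        raise ValueError("approved and group must have the same length")
    totals: dict[str, int] = {}
    approvals: dict[str, int] = {}
    for is_approved, g in zip(approved, group):
        totals[g] = totals.get(g, 0) + 1
        approvals[g] = approvals.get(g, 0) + int(is_approved)
    # Approval rate per group, then the spread between the extremes.
    rates = [approvals[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


approved = [True, True, False, True, False, False]
group = ["F", "M", "F", "M", "F", "M"]
print(demographic_parity_diff(approved, group))  # F: 1/3, M: 2/3, difference 1/3
```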
In the demonstrated run:
- measured metric: `demographic_parity_diff = 0.0151`
- declared threshold: `0.03`
- point estimate: PASSES (0.0151 < 0.03)

However, the engine also computes a 95% cluster-aware bootstrap confidence interval:
- bootstrap CI: `[0.001, 0.072]`

The upper bound of the CI (0.072) lies above the threshold of 0.03, so the interval crosses it. The point estimate passes the gate, but the sample is too small to tell whether the true parity difference is below or above the threshold. The engine therefore emits the `UNDERPOWERED` warning.
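The check applied to the numbers above can be written in a few lines. This is a sketch under two assumptions: the metric is lower-is-better and passes strictly below the threshold, and "crosses the threshold" means the threshold falls inside the CI. The function name `evaluate` is hypothetical.

```python
def evaluate(point: float, ci: tuple[float, float], threshold: float) -> tuple[str, bool]:
    """Return (point-estimate verdict, underpowered flag) for a lower-is-better metric."""
    low, high = ci
    verdict = "PASS" if point < threshold else "FAIL"
    # The data cannot tell which side of the threshold the true value lies on
    # when the threshold falls inside the interval, whatever the point estimate says.
    underpowered = low < threshold <= high
    return verdict, underpowered


print(evaluate(0.0151, (0.001, 0.072), 0.03))  # ('PASS', True)
```

The flag is computed independently of the verdict, so a failing point estimate with a CI that straddles the threshold is also reported as underpowered.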
Statistical reliability: CI, not just point estimates
Reporting only the point estimate is insufficient for high-risk systems: the point estimate can sit below the threshold while the CI is wide enough to include values above it, so the system may “pass” only by statistical chance.
The sei engine reports confidence intervals per control:
- Cluster-aware bootstrap: when observations are not independent (e.g., multiple images per patient in the medical case), the bootstrap resamples whole clusters (patients) rather than individual observations, producing a CI that respects intra-cluster dependence.
- Crossed threshold: if the CI crosses the threshold, the engine emits `UNDERPOWERED` with the explicit CI, regardless of the point estimate.
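A cluster bootstrap can be sketched in plain Python as below. This is not the engine's implementation: `cluster_bootstrap_ci` and its parameters are hypothetical, and the sketch uses the simple percentile interval, which may differ from the interval construction `sei` actually uses. The key point is that whole clusters are drawn with replacement, so observations from the same patient always travel together.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Hashable


def cluster_bootstrap_ci(
    records: list[tuple[Hashable, Any]],
    statistic: Callable[[list[Any]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile CI of `statistic`, resampling whole clusters with replacement."""
    # Group observations by cluster id (e.g. patient id).
    clusters: dict[Hashable, list[Any]] = defaultdict(list)
    for cluster_id, obs in records:
        clusters[cluster_id].append(obs)
    ids = list(clusters)
    rng = random.Random(seed)  # fixed seed for reproducible evidence

    stats = []
    for _ in range(n_boot):
        # Draw as many clusters as the original data has, with replacement,
        # and keep every observation of each drawn cluster.
        sample: list[Any] = []
        for cluster_id in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cluster_id])
        stats.append(statistic(sample))

    # Percentile interval: alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution.
    stats.sort()
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Illustrative data: (patient_id, 1 if the image was classified correctly else 0).
records = [("p1", 1), ("p1", 1), ("p2", 0), ("p2", 1), ("p3", 1), ("p3", 0), ("p4", 1)]
print(cluster_bootstrap_ci(records, lambda xs: sum(xs) / len(xs)))
```

For a scenario without natural clusters, giving every observation its own cluster id reduces this to an ordinary bootstrap.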
This pattern applies to both the loan scenario (resampling by individual sample) and the medical scenario (resampling by patient).
The “refactor” is the treatment
When a control is red, the next step is to choose the lowest-cost treatment that brings the metric to green. The available options (code change, parameter adjustment, dataset change) are described in Treatment modalities.
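The selection rule "lowest cost that reaches green" can be illustrated with a small sketch. Everything here is hypothetical: `Candidate`, `cheapest_green`, the cost scale and the metric values are invented for illustration, and the sketch assumes each candidate has already been measured (for example on a branch) before one is committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical treatment option: its estimated cost and the metric value
    # it produced when tried out before committing.
    description: str
    cost: float
    measured_metric: float


def cheapest_green(candidates: list[Candidate], threshold: float) -> Optional[Candidate]:
    """Lowest-cost candidate whose metric is below the threshold, or None if none qualifies."""
    green = [c for c in candidates if c.measured_metric < threshold]
    return min(green, key=lambda c: c.cost, default=None)


candidates = [
    Candidate("parameter adjustment", cost=1.0, measured_metric=0.034),
    Candidate("code change", cost=3.0, measured_metric=0.025),
    Candidate("dataset change", cost=5.0, measured_metric=0.012),
]
print(cheapest_green(candidates, threshold=0.03))  # picks the code change
```

Given the `UNDERPOWERED` caveat above, a stricter variant could compare each candidate's CI upper bound rather than its point estimate against the threshold.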
Committing the treatment to git closes the cycle: the next `sei run` re-measures the control and, if the treatment works, shifts its state from red to green.