The treatment modalities

🟡 Partial: Treatment by DIRECT code change is built and demonstrated across the five scripted scenarios (iris, loan, retina, student, talent); the prompt modality is in design (santander, Art. 25) and HOTL is declared (spine).

The three treatment modalities apply ISO 23894 §6.5 (risk treatment) to a versioned AI system. All three close the same red/green loop: the treatment is committed to git, froga run re-measures the controls, and the evidence bundle is re-anchored.

Modality 1 — Versioned code change

The treatment step’s diff is the real change. It takes many forms depending on the risk, but each one is a code change committed to git that froga run re-measures:

iris (landing): the treatment adds the petal features to training. V1, trained on sepal features only, measures poorly (RED); V2, with petals, measures GREEN. A clean red→green arc.
loan (credit, Annex III §5(b)): the estimator goes from a plain LogisticRegression to a fairlearn reduction (DemographicParity), and fairness is measured OUT-OF-FOLD over all 1,000 rows (5-fold, not a small held-out set). In V1 the bootstrap CI crosses the 0.092 threshold → AMBER (the point estimate already exceeds the threshold, but a small held-out set can’t demonstrate it). In V2, at the SAME OOF power, the CI falls entirely below the threshold → conclusive GREEN. An honest amber→green arc: the green isn’t fabricated, it’s demonstrated with proper measurement.
retina (clinical screening, SaMD MDR → high-risk): V2 replaces train.py with a version that uses class_weight='balanced' and a decision threshold of 0.30. Sensitivity (recall) goes from RED (< 0.80) to GREEN (≈ 0.88). A red→green arc (GPU).
student (education, Annex III §3): the treatment generalises the minors’ data (exact age → terciles): k-anonymity goes from k=1 (216 unique students, RED) to k=27 (GREEN), closing the data-minimisation risk for vulnerable subjects (GDPR Art. 5(1)(c)). Gender fairness, measured properly (cluster-aware OOF over mat+por, n=1044), is a certified-green control (audit, not the gate): at honest n the disparity is genuinely small. A red→green arc on the minors control.
talent (employment, Annex III §4): the treatment removes the leaky feature (a label proxy), so the SAFETY defect (feature-leakage) goes from RED to GREEN. Data adequacy (n=215, capped) and the underpowered fairness measurement stay as honest advisory findings: measured, documented, monitored (Art. 72) and consciously accepted at approval, NOT certified (Option B). A red→green arc on the certifiable defect.
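
To make the student treatment concrete, here is a minimal sketch of how generalising exact age into bands raises k-anonymity. It is an illustration only, not froga's measurement code: the function names, the cut points and the toy ages are all assumptions, and the real scenario's figures (k=1 → k=27) come from the actual dataset, not from this example.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values on every quasi-identifier."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def age_band(age, low_cut, high_cut):
    """Generalise an exact age into one of three bands (terciles)."""
    if age <= low_cut:
        return "T1"
    if age <= high_cut:
        return "T2"
    return "T3"

# Hypothetical toy ages, NOT the real student data.
ages = [15, 16, 16, 17, 17, 18, 18, 19, 20]
records = [{"age": a} for a in ages]

# V1: exact age is a quasi-identifier, so the single 15-year-old is unique.
k_before = k_anonymity(records, ["age"])

# V2: the treatment replaces exact age with a band (cut points are assumed).
treated = [{"age_band": age_band(r["age"], 16, 18)} for r in records]
k_after = k_anonymity(treated, ["age_band"])

print(k_before, k_after)  # 1 2
```

In a real pipeline the quasi-identifier set would contain more than one column, which is why a single generalised column can move k so sharply.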

# 1. Apply the treatment: replace train.py (the diff IS the change)
git add train.py
git commit -m "treat: apply the risk treatment (§6.5)"

# 2. Re-run the pipeline and re-anchor the evidence
froga run

# 3. Check the control state
froga status          # the treated control turns GREEN (or stays honest advisory where the data can't certify)

The bundle’s git log (.froga/bundle.json) records the treatment: the commit that applied the change is visible and attributable.

Modality 2 — Prompt adjustment (LLM case, Art. 25)

Reference scenario: santander (LLM-based credit, EU AI Act Art. 25 — the prompt activates the provider condition).

In an LLM-based AI system the prompt is the artifact that controls model behaviour, analogous to training code in supervised systems. Treating a detected bias takes two stages: (1) a versioned prompt (instruction + rubric + few-shot examples, whose diff IS the treatment) and (2) on top of it, a debias-only LoRA adaptation (a rank-limited bias-direction ablation that uses the same low-rank mathematics as tensor-network compression; see the Art. 25 design) that removes bias without improving capability.

🚧 Planned: The prompt modality (santander, Art. 25) is in DESIGN; the debias LoRA still needs to be de-risked on GPU. See the Art. 25 design spec.

Specific challenges: bias detection in text is less stable than in binary classifiers, and at small scale (sub-1B) bias and capability may not separate cleanly. You therefore measure both the bias delta AND the accuracy delta and report both, without pretending the two are independent.
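
A minimal sketch of reporting the two deltas side by side follows. The bias metric here is a demographic parity difference, which is an assumption borrowed from the loan scenario; the source does not say which metric the prompt modality will use. The function names and the toy predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = []
    for g in sorted(set(groups)):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)

def treatment_report(y_true, groups, pred_before, pred_after):
    """Report the bias delta AND the accuracy delta of a treatment."""
    bias_before = demographic_parity_difference(pred_before, groups)
    bias_after = demographic_parity_difference(pred_after, groups)
    return {
        "bias_delta": bias_after - bias_before,
        "accuracy_delta": accuracy(y_true, pred_after) - accuracy(y_true, pred_before),
    }

# Hypothetical toy data: two groups of four, binary decisions.
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true      = [1, 0, 1, 0, 1, 0, 1, 0]
pred_before = [1, 1, 1, 0, 1, 0, 0, 0]  # group a favoured: rates 0.75 vs 0.25
pred_after  = [1, 0, 0, 0, 1, 0, 0, 1]  # gap shrinks, one extra error appears

report = treatment_report(y_true, groups, pred_before, pred_after)
print(report)  # {'bias_delta': -0.25, 'accuracy_delta': -0.125}
```

In this toy case the bias gap halves but accuracy also drops, which is exactly the situation where showing only one of the two numbers would mislead.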

Modality 3 — Human oversight (HOTL)

Here the treatment is not a model change but a human-oversight provision (HITL/HOTL, Art. 14): a role with the competence and authority to review or reverse the output. It is declared in froga.yaml (oversight_mode_declared) and travels attested in the dossier. The oversight prescriptor (elevated line) determines which level applies given exposure and autonomy.

The loop is the same across all modalities

Step	Action
Control in red	`froga run` reports a `blocking` control outside the threshold
Choose modality	Code change / prompt adjustment / human oversight
Commit the treatment	`git commit` documents the change with authorship and date
Re-measure	`froga run` re-runs the pipeline and re-anchors the evidence
Control in green (or honest advisory)	If the blocking gate passes, `froga reconstruct` classifies the cycle as CLOSED. Where the data can’t certify (e.g. talent’s fairness power at n=215), the control stays advisory (measured, documented, monitored and consciously accepted) rather than becoming a fabricated green
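
The states used on this page can be read as a comparison between a confidence interval and a threshold. The sketch below is one way to express that reading, not froga's actual implementation: the `Control` class, the `state` and `gate_passes` functions, the lower-is-better convention and the rule that RED means the interval lies entirely above the threshold are all assumptions. Only the 0.092 threshold comes from the loan scenario; every interval is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Control:
    name: str
    ci_low: float      # lower bound of the metric's confidence interval
    ci_high: float     # upper bound of the metric's confidence interval
    threshold: float   # metric must stay below this (lower is better)
    blocking: bool     # advisory controls never block the gate

def state(control):
    """Classify a control from its interval: conclusive only when the
    whole interval sits on one side of the threshold."""
    if control.ci_high < control.threshold:
        return "GREEN"
    if control.ci_low > control.threshold:
        return "RED"
    return "AMBER"  # interval crosses the threshold: inconclusive

def gate_passes(controls):
    """The gate passes when every blocking control is conclusively GREEN."""
    return all(state(c) == "GREEN" for c in controls if c.blocking)

# Hypothetical intervals around the loan scenario's 0.092 threshold.
loan_v1 = Control("loan_fairness", 0.05, 0.14, 0.092, blocking=True)
loan_v2 = Control("loan_fairness", 0.01, 0.06, 0.092, blocking=True)
# A hypothetical advisory control: measured and documented, but not gating.
advisory = Control("fairness_power", 0.02, 0.20, 0.10, blocking=False)

print(state(loan_v1), gate_passes([loan_v1, advisory]))  # AMBER False
print(state(loan_v2), gate_passes([loan_v2, advisory]))  # GREEN True
```

Keeping the advisory control out of the gate, while still computing its state, mirrors the idea that an underpowered measurement is recorded and accepted rather than certified.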

The red/green loop is described in detail in The red/green loop.