
Calibration & LR output

Use when: your verifier outputs raw scores (distances, fractions, probabilities) that must be converted to calibrated log-likelihood ratios before they go into an evidential report. Don't use when: your scorer already emits well-calibrated LRs; skip straight to evaluation. Expect: a scorer wrapper whose predict_proba and predict_log_lr outputs are calibrated against a labelled development set.

Raw scores from a verifier are rarely honest probabilities out-of-the-box. This page covers the two standard post-hoc calibration methods plus the chain-of-custody metadata that turns a calibrated score into a court-ready LR statement.

Forensic reporting expects likelihood ratios derived from calibrated posteriors, since the LR carries the evidential semantics courts understand. tamga.forensic provides both steps: calibration and LR conversion.

The workflow

flowchart LR
  S["raw score<br>(GI, Delta, SVM, ...)"] --> C["CalibratedScorer<br>fit on (cal_scores, cal_labels)"]
  C --> P["calibrated p(H1|E)"]
  P --> L["log<sub>10</sub>(LR)"]

  style C fill:#FBF3DE,stroke:#C9A34A
  style L fill:#FBF3DE,stroke:#C9A34A

The calibration fold must be disjoint from the test fold. Fitting the calibrator on test trials makes C_llr and ECE look better (lower) than they will be on unseen data.
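One way to keep the two folds apart is to split trial indices once, before any calibrator is fitted. The sketch below uses scikit-learn's train_test_split on synthetic scores; the data, the 50/50 split and the variable names are illustrative assumptions, not part of tamga.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real trials: 0/1 labels and one raw score per trial.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
scores = rng.normal(loc=labels * 1.5, scale=1.0)

# Split trial indices once, stratified so both folds keep the class balance.
idx = np.arange(len(scores))
cal_idx, test_idx = train_test_split(
    idx, test_size=0.5, stratify=labels, random_state=0
)

cal_scores, cal_labels = scores[cal_idx], labels[cal_idx]
test_scores, test_labels = scores[test_idx], labels[test_idx]

# No trial may appear in both folds.
assert set(cal_idx).isdisjoint(test_idx)
```

A trial-level split like this is the minimum. If several trials share an author or a document, it may also be necessary to keep those groups within a single fold.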

CalibratedScorer

Use when: you want the raw scores from any scorer (GeneralImpostors, Unmasking, a custom Delta classifier) turned into calibrated probabilities and log-LRs by a single object. Don't use when: your upstream scorer already emits calibrated output. Expect: fit(calibration_scores, calibration_labels) learns the mapping; predict_proba(scores) returns calibrated probabilities; predict_log_lr(scores) returns the evidential quantity.

Wraps a 1-D monotone calibrator — either Platt (logistic) or isotonic.

from tamga.forensic import CalibratedScorer

# Fit on the calibration fold only, never on the test trials.
scorer = CalibratedScorer(method="platt").fit(calibration_scores, calibration_labels)
probs   = scorer.predict_proba(test_scores)               # calibrated p(H1 | score)
log_lrs = scorer.predict_log_lr(test_scores, base=10.0)   # flat-prior log10(LR)

Choosing the method

Method When
"platt" Small calibration sets (< 100 / class). Parametric; assumes sigmoidal mapping. Robust.
"isotonic" Larger calibration sets (≥ 100 / class). Non-parametric; flexible.

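Both are monotone, so the rank order of inputs is preserved up to ties. Platt is strictly monotone and leaves AUC unchanged; isotonic can shift AUC slightly where it merges neighbouring scores into a single step.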

Platt calibration

Use when: your scorer's raw score is approximately linear in log-odds, so the score-to-probability curve has a logistic-regression-like shape. Platt has fewer parameters than isotonic and needs fewer labelled trials. Don't use when: the score-to-probability relationship is non-monotonic or sharply bent; Platt's sigmoid will underfit. Expect: a two-parameter sigmoid fit (slope a, offset b); predict_proba outputs calibrated probabilities via 1 / (1 + exp(a*score + b)), where a is negative when higher scores favour H1.
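As a minimal sketch of Platt scaling, the block below fits a one-dimensional scikit-learn LogisticRegression on synthetic scores and checks that its output matches the sigmoid form 1 / (1 + exp(a*score + b)). It is not tamga's internal code; the data, the large C (used to approximate an unregularised fit) and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic calibration fold: 50 trials per class, higher score favours class 1.
rng = np.random.default_rng(0)
cal_labels = np.repeat([0, 1], 50)
cal_scores = rng.normal(loc=cal_labels * 2.0, scale=1.0)

# A very large C makes the default L2 penalty negligible.
platt = LogisticRegression(C=1e6).fit(cal_scores.reshape(-1, 1), cal_labels)

# scikit-learn models p = 1 / (1 + exp(-(w*s + c))); in the a, b form
# used on this page that means a = -w and b = -c.
a = -platt.coef_[0, 0]
b = -platt.intercept_[0]

test_scores = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
probs_manual = 1.0 / (1.0 + np.exp(a * test_scores + b))
probs_sklearn = platt.predict_proba(test_scores.reshape(-1, 1))[:, 1]
```

Because the fit has only two free parameters, it stays stable on small calibration sets, at the cost of forcing a sigmoidal shape onto the mapping.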

Isotonic calibration

Use when: the score-to-probability mapping is monotone but not sigmoidal, and you have enough labelled trials (rule of thumb: ≥ 100 per class) to fit a non-parametric curve. Don't use when: your dev set is small; isotonic overfits with few points. Expect: a piecewise-constant calibration function; predict_proba outputs a monotone non-decreasing step function of the score.
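For comparison, here is a small isotonic sketch using scikit-learn's IsotonicRegression on synthetic scores. The data and settings are illustrative assumptions rather than tamga's configuration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Synthetic calibration fold: 150 trials per class.
rng = np.random.default_rng(0)
cal_labels = np.repeat([0, 1], 150)
cal_scores = rng.normal(loc=cal_labels * 2.0, scale=1.0)

# Bound outputs to [0, 1] and clip scores outside the fitted range
# instead of returning NaN for them.
iso = IsotonicRegression(
    y_min=0.0, y_max=1.0, increasing=True, out_of_bounds="clip"
)
iso.fit(cal_scores, cal_labels)

# Evaluate the fitted curve on a grid wider than the training range.
grid = np.linspace(-6.0, 8.0, 281)
probs = iso.predict(grid)
```

scikit-learn interpolates linearly between neighbouring fitted points, so the curve is flat within a pooled block and ramps between blocks. With well-separated tails the curve typically reaches exactly 0 and 1 at the extremes, which is why the log-LR conversion needs a clip bound.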

Log-LR conversion

Under flat priors ($p(H_1) = p(H_0) = 0.5$), log-LR is just the logit of the calibrated posterior:

$$ \log_{10}(\text{LR}) = \log_{10}\left(\frac{p(H_1 \mid E)}{1 - p(H_1 \mid E)}\right) $$

from tamga.forensic import log_lr_from_probs, log_lr_from_probs_with_priors

log_lrs = log_lr_from_probs(probs)                                # flat priors
log_lrs = log_lr_from_probs_with_priors(probs, prior_target=0.3)  # non-flat

Use log_lr_from_probs_with_priors when the calibration set was not balanced. It divides the posterior odds by the prior odds implied by prior_target, so the reported LR no longer carries the calibration set's class balance.
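The arithmetic behind both conversions can be sketched in plain NumPy. The helper logit_base is hypothetical and only mirrors the documented behaviour (posterior odds divided by prior odds); it is not the library's implementation.

```python
import numpy as np

def logit_base(p, base=10.0, eps=1e-12):
    """Return log_base(p / (1 - p)), clipping p away from 0 and 1.

    Hypothetical helper for illustration; not part of tamga.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return (np.log(p) - np.log1p(-p)) / np.log(base)

probs = np.array([0.5, 0.9, 0.99])

# Flat priors: the log-LR is the logit of the calibrated posterior.
flat = logit_base(probs)

# Non-flat prior p(H1) = 0.3: subtract the log prior odds.
prior_target = 0.3
with_prior = flat - logit_base(prior_target)
```

A prior below 0.5 has negative log prior odds, so subtracting it raises every log-LR by the same constant, here log10(7/3), or about 0.37.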

Verbal scale

Report log-LR magnitudes alongside the six-band Nordgaard et al. (2012) / ENFSI (2015) scale:

log₁₀(LR) Verbal support
0 – 1 weak
1 – 2 moderate
2 – 3 moderately strong
3 – 4 strong
4 – 5 very strong
> 5 extremely strong

The build_forensic_report template renders this scale automatically beside each method's LR value. See Reporting.
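For use outside the report template, the table can be turned into a small lookup. The function verbal_support below is a hypothetical helper, not a tamga API. It treats each band as half-open (lower edge included, upper edge excluded) and applies the same bands to negative values as support for H0; both choices are assumptions, since the table does not specify them.

```python
import math

# Upper band edges and labels copied from the verbal scale table.
BANDS = [
    (1.0, "weak"),
    (2.0, "moderate"),
    (3.0, "moderately strong"),
    (4.0, "strong"),
    (5.0, "very strong"),
]

def verbal_support(log10_lr: float) -> str:
    """Map a log10(LR) value to a verbal label (hypothetical helper)."""
    if math.isnan(log10_lr):
        raise ValueError("log10_lr must not be NaN")
    # Assumption: the sign picks the hypothesis, the magnitude picks the band.
    direction = "H1" if log10_lr >= 0 else "H0"
    magnitude = abs(log10_lr)
    for upper, label in BANDS:
        if magnitude < upper:
            return f"{label} support for {direction}"
    # Anything at or above 5 (including infinity) lands in the top band.
    return f"extremely strong support for {direction}"
```

Note that a value of exactly 0 (LR = 1) favours neither hypothesis; this sketch labels it "weak support for H1" for simplicity, which a real report would likely word differently.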

Reference

CalibratedScorer

Fit a monotone calibrator mapping raw scores to calibrated posteriors.

Parameters:

Name Type Description Default
method ('platt', 'isotonic')
  • platt: one-dimensional LogisticRegression (Platt scaling). Parametric; assumes the score-to-probability mapping is sigmoidal. Robust on small calibration sets.
  • isotonic: IsotonicRegression. Non-parametric, monotone. More flexible but requires more calibration data (rule of thumb: >= 100 trials per class).
"platt"

Attributes:

Name Type Description
method str
fitted bool

predict_proba

predict_proba(scores: ndarray) -> np.ndarray

Return calibrated p(H1 | score) for each input score.

predict_log_lr

predict_log_lr(scores: ndarray, *, base: float = 10.0) -> np.ndarray

Calibrated posteriors → log-LR (flat-prior). Thin wrapper around log_lr_from_probs.

tamga.forensic.lr.log_lr_from_probs

log_lr_from_probs(probs: ndarray, *, eps: float = 1e-12, base: float = 10.0) -> np.ndarray

Convert calibrated posteriors p(H1 | E) to log-likelihood ratios, flat-prior case.

Under flat priors (p(H1) = p(H0) = 0.5), log-LR = log(p / (1 - p)) (the logit). When calibrated on a balanced set, this is the forensically-appropriate evidential output.

Parameters:

Name Type Description Default
probs ndarray

Calibrated probabilities of the target hypothesis, in [0, 1].

required
eps float

Clip bound to avoid log(0). Defaults to 1e-12.

1e-12
base float

Logarithm base. Defaults to 10 (the standard forensic convention).

10.0

Returns:

Type Description
ndarray

log_base(LR) for each trial.
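One practical consequence of eps is a ceiling on the reported magnitude. The sketch below assumes the clip is applied symmetrically, to [eps, 1 - eps]; the reference only says it avoids log(0), so treat the symmetry as an assumption.

```python
import numpy as np

eps = 1e-12
probs = np.array([0.0, 1.0])  # saturated calibrator outputs

# Assumed clipping: keep probabilities inside [eps, 1 - eps].
clipped = np.clip(probs, eps, 1.0 - eps)
log10_lr = np.log10(clipped / (1.0 - clipped))
```

Under that assumption, a posterior of exactly 0 or 1 yields a log10(LR) of roughly -12 or +12 with the default eps, not an infinite value.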

tamga.forensic.lr.log_lr_from_probs_with_priors

log_lr_from_probs_with_priors(probs: ndarray, *, prior_target: float, eps: float = 1e-12, base: float = 10.0) -> np.ndarray

Convert p(H1 | E) to log-LR with a user-specified prior.

posterior-odds = LR * prior-odds, so LR = posterior-odds / prior-odds.
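Taking logarithms of that identity gives the log-LR as a difference of two logits. Here $\pi$ stands for prior_target; the symbol is introduced for brevity and is not taken from the library.

$$ \log_{10}(\text{LR}) = \log_{10}\left(\frac{p(H_1 \mid E)}{1 - p(H_1 \mid E)}\right) - \log_{10}\left(\frac{\pi}{1 - \pi}\right) $$

With $\pi = 0.5$ the second term is zero and the expression reduces to the flat-prior logit. As a worked case, $p(H_1 \mid E) = 0.9$ and $\pi = 0.3$ give $\log_{10}(9) - \log_{10}(3/7) \approx 0.954 + 0.368 = 1.322$.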

Parameters:

Name Type Description Default
probs ndarray

Calibrated probabilities of the target hypothesis, in [0, 1].

required
prior_target float

Prior probability of H1 used when training the calibrator, in (0, 1).

required