# Choosing a method
Not sure which tamga method fits your question? This page answers "I want to do X; what should I reach for?" for the most common cases. The method names link to their primary entries in Methods and the Forensic toolkit for full detail.
## Attribution, comparison, exploration
| I want to… | Required data | Method | Headline metric | Tutorial |
|---|---|---|---|---|
| Attribute 1 questioned doc to N candidate authors | N authors × ~2k+ words of known writing each; 1 questioned doc | CosineDelta (robust default) or BurrowsDelta (classic) | nearest-author rank | Federalist |
| Cluster an unknown corpus by stylistic similarity | 20+ docs; labels optional | PCAReducer + KMeansCluster or HDBSCANCluster | silhouette score, visual inspection | — |
| Compare two pre-defined author groups | 10+ docs per group | ZetaClassic or ZetaEder | per-word distinctiveness score | — |
| Classify docs into groups with ML | 20+ docs per class | build_classifier + cross_validate_tamga | CV accuracy / F1 | — |
| Reduce features for visualisation | any FeatureMatrix | PCAReducer / UMAPReducer / TSNEReducer / MDSReducer | visual inspection | — |
| Bayesian single-candidate attribution | N candidates × ≥1k words; 1 questioned doc | BayesianAuthorshipAttributor | posterior probability per candidate | — |
| Bootstrap-consensus tree across MFW bands | 10+ docs; multiple MFW bands | BootstrapConsensus | Newick tree with clade support | — |
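Whichever Delta variant you pick, the ranking rests on z-scored most-frequent-word (MFW) frequencies. The sketch below is a minimal, self-contained illustration of what Burrows's Delta computes, in plain numpy; it does not use tamga's API, and the function name and array shapes are illustrative assumptions, not library code:

```python
import numpy as np

def burrows_delta(known_freqs, questioned_freqs):
    """Burrows's Delta: mean absolute z-score difference over the MFW.

    known_freqs: (n_authors, n_words) relative frequencies of the MFW
    questioned_freqs: (n_words,) relative frequencies in the questioned doc
    Returns one Delta score per candidate author (lower = more similar).
    """
    mu = known_freqs.mean(axis=0)
    sigma = known_freqs.std(axis=0)
    sigma[sigma == 0] = 1e-12               # guard against zero-variance words
    z_known = (known_freqs - mu) / sigma    # one z-score profile per author
    z_q = (questioned_freqs - mu) / sigma
    return np.abs(z_known - z_q).mean(axis=1)

# Toy example: 3 candidate profiles over 4 most-frequent words
known = np.array([[0.05, 0.03, 0.02, 0.01],
                  [0.04, 0.05, 0.01, 0.02],
                  [0.02, 0.02, 0.05, 0.04]])
questioned = np.array([0.05, 0.03, 0.02, 0.01])  # identical to author 0
scores = burrows_delta(known, questioned)
print(scores.argmin())  # → 0, the nearest author by Delta
```

CosineDelta replaces the mean absolute difference with a cosine distance between the same z-score vectors, which is less sensitive to a handful of outlier words.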
## Forensic: one-case verification
| I want to… | Required data | Method | Headline metric | Tutorial |
|---|---|---|---|---|
| Verify "same author?" between 1 questioned doc and 1 candidate | 1 candidate's known writings + an impostor pool (~100 docs) | GeneralImpostors | calibrated log-LR + C_llr | PAN-CLEF |
| Verify same-author with topic robustness | questioned (Q) and known (K) long prose + impostor pool | Unmasking | accuracy-drop curve | PAN-CLEF |
| Minimise topic bias in verification features | any corpus | CategorizedCharNgramExtractor with categories=("prefix", "suffix", "punct"), or distort_corpus(mode="dv_ma") | same as upstream verifier | PAN-CLEF |
| Turn raw verifier scores into an evidential LR | verifier outputs on labelled dev trials | CalibratedScorer + compute_pan_report | log-LR, C_llr, ECE | PAN-CLEF |
| Generate a court-ready LR-framed report | Result with chain-of-custody fields | build_forensic_report | ENFSI verbal scale | — |
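C_llr, the headline metric in several rows above, is worth understanding before you quote it: it penalises same-author trials that receive low LRs and different-author trials that receive high LRs. The sketch below is a self-contained illustration in plain Python of the standard C_llr formula; it is not tamga's compute_pan_report, whose signature this page does not document:

```python
import math

def cllr(same_llrs, diff_llrs):
    """Log-likelihood-ratio cost (Brümmer's C_llr).

    same_llrs: natural-log LRs from same-author trials
    diff_llrs: natural-log LRs from different-author trials
    0 = perfectly informative; 1 = uninformative; >1 = misleading.
    """
    pen_same = sum(math.log2(1 + math.exp(-llr)) for llr in same_llrs)
    pen_diff = sum(math.log2(1 + math.exp(llr)) for llr in diff_llrs)
    return 0.5 * (pen_same / len(same_llrs) + pen_diff / len(diff_llrs))

# A verifier that always outputs LR = 1 (log-LR = 0) is uninformative:
print(cllr([0.0, 0.0], [0.0, 0.0]))               # → 1.0
# Well-separated, well-oriented log-LRs push the cost toward 0:
print(round(cllr([5.0, 6.0], [-5.0, -6.0]), 3))   # → 0.007
```

Note that C_llr scores the LRs themselves, not a hard accept/reject decision, which is why the verification rows pair a raw verifier with a calibration step before this metric is reported.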
## How to read this page
- "Required data" is a minimum — more is always better.
- "Headline metric" is the output you should quote in write-ups, not the only output the method produces.
- When two methods are listed for the same task, the first is the recommended default and the second a published alternative worth considering.
## Next
- Methods — full catalogue with gloss + detail per method.
- Features — extractor catalogue with gloss + detail per extractor.
- Forensic toolkit — calibration, evaluation, reporting.