Choosing a method

Not sure which tamga method fits your question? This page answers "I want to do X — what should I reach for?" for the most common cases. The method names link to their primary entries in Methods and Forensic toolkit for full details.

Attribution, comparison, exploration

| I want to… | Required data | Method | Headline metric | Tutorial |
| --- | --- | --- | --- | --- |
| Attribute 1 questioned doc to N candidate authors | N authors × ~2k+ words known each; 1 questioned doc | CosineDelta (robust default) or BurrowsDelta (classic) | nearest-author rank | Federalist |
| Cluster an unknown corpus by stylistic similarity | 20+ docs; labels optional | PCAReducer + KMeansCluster or HDBSCANCluster | silhouette, visual inspection | |
| Compare two pre-defined author groups | 10+ docs per group | ZetaClassic or ZetaEder | per-word distinctiveness score | |
| Classify docs into groups with ML | 20+ docs per class | build_classifier + cross_validate_tamga | CV accuracy / F1 | |
| Reduce features for visualisation | any FeatureMatrix | PCAReducer / UMAPReducer / TSNEReducer / MDSReducer | visual inspection | |
| Bayesian single-candidate attribution | N candidates × ≥1k words; 1 questioned doc | BayesianAuthorshipAttributor | posterior probability per candidate | |
| Bootstrap-consensus tree across MFW bands | 10+ docs; multiple MFW bands | BootstrapConsensus | Newick tree with clade support | |
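The Delta family in the first row has a compact core worth seeing once: z-score the relative frequencies of the most frequent words (MFW) across the whole corpus, then score each candidate by the mean absolute difference between their z-vector and the questioned document's. The sketch below is a minimal NumPy rendition of classic Burrows's Delta, not tamga's BurrowsDelta implementation, and the toy frequencies are invented for illustration:

```python
import numpy as np

def burrows_delta(known_freqs, questioned_freq):
    """Classic Burrows's Delta over MFW relative frequencies.

    known_freqs: (n_authors, n_words) relative frequencies, one row per candidate.
    questioned_freq: (n_words,) relative frequencies of the questioned doc.
    Returns one delta score per candidate; the smallest is the nearest author.
    """
    corpus = np.vstack([known_freqs, questioned_freq])
    mu = corpus.mean(axis=0)
    sigma = corpus.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant features
    z = (corpus - mu) / sigma         # z-score each word across the corpus
    z_known, z_q = z[:-1], z[-1]
    return np.abs(z_known - z_q).mean(axis=1)

# Toy example: candidate 0's profile sits much closer to the questioned doc.
known = np.array([[0.050, 0.030, 0.020],
                  [0.010, 0.060, 0.040]])
questioned = np.array([0.048, 0.032, 0.021])
deltas = burrows_delta(known, questioned)
print(int(deltas.argmin()))  # → 0 (candidate 0 is the nearest author)
```

CosineDelta differs only in the final step: it takes the cosine distance between the z-vectors instead of the mean absolute difference, which is why it tends to be the more robust default.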

Forensic — one-case verification

| I want to… | Required data | Method | Headline metric | Tutorial |
| --- | --- | --- | --- | --- |
| Verify "same author?" between 1 questioned doc and 1 candidate | 1 candidate's known writings + an impostor pool (~100 docs) | GeneralImpostors | calibrated log-LR + C_llr | PAN-CLEF |
| Verify same-author with topic robustness | questioned (Q) and known (K) long prose + impostor pool | Unmasking | accuracy-drop curve | PAN-CLEF |
| Minimise topic bias in verification features | any corpus | CategorizedCharNgramExtractor with categories=("prefix", "suffix", "punct"), or distort_corpus(mode="dv_ma") | same as upstream verifier | PAN-CLEF |
| Turn raw verifier scores into evidential LRs | verifier outputs on labelled dev trials | CalibratedScorer + compute_pan_report | log-LR, C_llr, ECE | PAN-CLEF |
| Generate a court-ready LR-framed report | Result with chain-of-custody fields | build_forensic_report | ENFSI verbal scale | |
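The score-to-LR step in the fourth row follows a standard recipe: fit an affine (logistic) map from raw verifier scores to posterior log-odds on labelled dev trials, subtract the prior log-odds to obtain a log-likelihood ratio, and summarise calibration quality with C_llr. The sketch below shows that recipe with scikit-learn on synthetic scores; it is the textbook logistic-calibration procedure, not CalibratedScorer's actual code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cllr(log_lrs, labels):
    """Cost of log-likelihood ratios (lower is better; 1.0 = uninformative)."""
    lr = np.exp(log_lrs)
    same, diff = lr[labels == 1], lr[labels == 0]
    return 0.5 * (np.mean(np.log2(1 + 1 / same)) +
                  np.mean(np.log2(1 + diff)))

# Labelled dev trials: raw verifier scores, 1 = same author, 0 = different.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.5, 1.0, 200),    # same-author trials
                         rng.normal(-1.5, 1.0, 200)])  # different-author trials
labels = np.concatenate([np.ones(200), np.zeros(200)]).astype(int)

# Logistic calibration: affine map from raw score to posterior log-odds;
# subtracting the prior log-odds (0 here, classes are balanced) gives a log-LR.
clf = LogisticRegression().fit(scores.reshape(-1, 1), labels)
prior_logodds = np.log(labels.mean() / (1 - labels.mean()))
log_lrs = clf.decision_function(scores.reshape(-1, 1)) - prior_logodds

print(round(cllr(log_lrs, labels), 3))  # well below 1.0 for separated scores
```

C_llr near 0 means the system emits discriminating, well-calibrated LRs; a system that always outputs LR = 1 scores exactly 1.0. This is the number GeneralImpostors and compute_pan_report quote alongside the log-LR itself.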

How to read this page

  • "Required data" is a minimum — more is always better.
  • "Headline metric" is the output you should quote in write-ups, not the only output the method produces.
  • When two methods are listed for the same task, the first one is the recommended default and the second is a published alternative worth considering.

Next

  • Methods — full catalogue with gloss + detail per method.
  • Features — extractor catalogue with gloss + detail per extractor.
  • Forensic toolkit — calibration, evaluation, reporting.