Choosing a method

Not sure which tamga method fits your question? This page answers "I want to do X — what should I reach for?" for the most common cases. The method names link to their primary entries in Methods and Forensic toolkit for full details.

Attribution, comparison, exploration

| I want to… | Required data | Method | Headline metric | Tutorial |
| --- | --- | --- | --- | --- |
| Attribute 1 questioned doc to N candidate authors | N authors × ~2k+ words known each; 1 questioned doc | CosineDelta (robust default) or BurrowsDelta (classic) | nearest-author rank | Federalist |
| Cluster an unknown corpus by stylistic similarity | 20+ docs; labels optional | PCAReducer + KMeansCluster or HDBSCANCluster | silhouette, visual inspection | |
| Compare two pre-defined author groups | 10+ docs per group | ZetaClassic or ZetaEder | per-word distinctiveness score | |
| Classify docs into groups with ML | 20+ docs per class | build_classifier + cross_validate_tamga | CV accuracy / F1 | |
| Reduce features for visualisation | any FeatureMatrix | PCAReducer / UMAPReducer / TSNEReducer / MDSReducer | visual inspection | |
| Bayesian single-candidate attribution | N candidates × ≥1k words; 1 questioned doc | BayesianAuthorshipAttributor | posterior probability per candidate | |
| Bootstrap-consensus tree across MFW bands | 10+ docs; multiple MFW bands | BootstrapConsensus | Newick tree with clade support | |
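The Delta family in the first row has a compact core worth seeing once: z-score the relative frequencies of the most frequent words (MFW) across the whole corpus, then score each candidate by the mean absolute difference between their z-vector and the questioned document's. The sketch below is a minimal NumPy rendition of classic Burrows's Delta, not tamga's BurrowsDelta implementation, and the toy frequencies are invented for illustration:

```python
import numpy as np

def burrows_delta(known_freqs, questioned_freq):
    """Classic Burrows's Delta over MFW relative frequencies.

    known_freqs: (n_authors, n_words) relative frequencies, one row per candidate.
    questioned_freq: (n_words,) relative frequencies of the questioned doc.
    Returns one delta score per candidate; the smallest is the nearest author.
    """
    corpus = np.vstack([known_freqs, questioned_freq])
    mu = corpus.mean(axis=0)
    sigma = corpus.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant features
    z = (corpus - mu) / sigma         # z-score each word across the corpus
    z_known, z_q = z[:-1], z[-1]
    return np.abs(z_known - z_q).mean(axis=1)

# Toy example: candidate 0's profile sits much closer to the questioned doc.
known = np.array([[0.050, 0.030, 0.020],
                  [0.010, 0.060, 0.040]])
questioned = np.array([0.048, 0.032, 0.021])
deltas = burrows_delta(known, questioned)
print(int(deltas.argmin()))  # → 0 (candidate 0 is the nearest author)
```

CosineDelta differs only in the final step: it takes the cosine distance between the z-vectors instead of the mean absolute difference, which is why it tends to be the more robust default.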

Forensic — one-case verification

| I want to… | Required data | Method | Headline metric | Tutorial |
| --- | --- | --- | --- | --- |
| Verify "same author?" between 1 questioned doc and 1 candidate | 1 candidate's known writings + an impostor pool (~100 docs) | GeneralImpostors | calibrated log-LR + C_llr | PAN-CLEF |
| Verify same-author with topic robustness | questioned (Q) and known (K) long prose + impostor pool | Unmasking | accuracy-drop curve | PAN-CLEF |
| Minimise topic bias in verification features | any corpus | CategorizedCharNgramExtractor with categories=("prefix", "suffix", "punct"), or distort_corpus(mode="dv_ma") | same as upstream verifier | PAN-CLEF |
| Turn raw verifier scores into evidential LRs | verifier outputs on labelled dev trials | CalibratedScorer + compute_pan_report | log-LR, C_llr, ECE | PAN-CLEF |
| Generate a court-ready LR-framed report | Result with chain-of-custody fields | build_forensic_report | ENFSI verbal scale | |
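The score-to-LR step in the fourth row follows a standard recipe: fit an affine (logistic) map from raw verifier scores to posterior log-odds on labelled dev trials, subtract the prior log-odds to obtain a log-likelihood ratio, and summarise calibration quality with C_llr. The sketch below shows that recipe with scikit-learn on synthetic scores; it is the textbook logistic-calibration procedure, not CalibratedScorer's actual code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cllr(log_lrs, labels):
    """Cost of log-likelihood ratios (lower is better; 1.0 = uninformative)."""
    lr = np.exp(log_lrs)
    same, diff = lr[labels == 1], lr[labels == 0]
    return 0.5 * (np.mean(np.log2(1 + 1 / same)) +
                  np.mean(np.log2(1 + diff)))

# Labelled dev trials: raw verifier scores, 1 = same author, 0 = different.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.5, 1.0, 200),    # same-author trials
                         rng.normal(-1.5, 1.0, 200)])  # different-author trials
labels = np.concatenate([np.ones(200), np.zeros(200)]).astype(int)

# Logistic calibration: affine map from raw score to posterior log-odds;
# subtracting the prior log-odds (0 here, classes are balanced) gives a log-LR.
clf = LogisticRegression().fit(scores.reshape(-1, 1), labels)
prior_logodds = np.log(labels.mean() / (1 - labels.mean()))
log_lrs = clf.decision_function(scores.reshape(-1, 1)) - prior_logodds

print(round(cllr(log_lrs, labels), 3))  # well below 1.0 for separated scores
```

C_llr near 0 means the system emits discriminating, well-calibrated LRs; a system that always outputs LR = 1 scores exactly 1.0. This is the number GeneralImpostors and compute_pan_report quote alongside the log-LR itself.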

How to read this page

  • "Required data" is a minimum — more is always better.
  • "Headline metric" is the output you should quote in write-ups, not the only output the method produces.
  • When two methods are listed for the same task, the first one is the recommended default and the second is a published alternative worth considering.

Next

  • Methods — full catalogue with gloss + detail per method.
  • Features — extractor catalogue with gloss + detail per extractor.
  • Forensic toolkit — calibration, evaluation, reporting.