Results & provenance
Every method returns a Result — the shared return type across tamga.
The Result
@dataclass
class Result:
    method_name: str
    params: dict[str, Any]
    values: dict[str, Any]        # JSON-safe (ndarray encoded as {"__ndarray__": ...})
    tables: list[pd.DataFrame]    # exported as parquet
    figures: list[Any]            # matplotlib figures or raw bytes
    provenance: Provenance | None
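To make the "JSON-safe" note on values concrete, here is a minimal sketch of how an ndarray could be tagged for JSON and restored afterwards. The payload stored under "__ndarray__" is not specified above, so the layout used here (a nested list plus a dtype string) and the helper names encode and decode are assumptions for illustration, not tamga's actual encoder.

```python
import json
from typing import Any

import numpy as np


def encode(value: Any) -> Any:
    """Recursively replace ndarrays with a tagged, JSON-safe dict."""
    if isinstance(value, np.ndarray):
        # Hypothetical payload layout; tamga's real one may differ.
        return {"__ndarray__": {"data": value.tolist(), "dtype": str(value.dtype)}}
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode(item) for item in value]
    return value


def decode(value: Any) -> Any:
    """Inverse of encode: rebuild ndarrays from tagged dicts."""
    if isinstance(value, dict):
        if set(value) == {"__ndarray__"}:
            payload = value["__ndarray__"]
            return np.array(payload["data"], dtype=payload["dtype"])
        return {key: decode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode(item) for item in value]
    return value


values = {"loadings": np.array([[0.5, -0.25], [1.0, 2.0]]), "n_components": 2}
text = json.dumps(encode(values))      # plain JSON text, no numpy objects left
restored = decode(json.loads(text))    # arrays come back with their dtype
```

Tagging arrays with a sentinel key keeps the rest of values ordinary JSON, so the file stays readable by tools that know nothing about numpy.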
Persistence
result.save(directory) writes:
- result.json: method_name, params, values (with numpy encoded), provenance
- table_0.parquet, table_1.parquet, …: one per DataFrame in tables
- Figures are deferred to the viz layer (render_figures.py per example)
Round-trip with Result.from_json("results/demo/pca/result.json").
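The save and reload cycle can be sketched with a small stand-in class. MiniResult below is hypothetical: it mirrors only the method_name, params and values fields, uses the standard library alone, and omits tables, figures and provenance. It illustrates the round-trip pattern and is not tamga's implementation.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class MiniResult:
    """Hypothetical stand-in for Result (tables, figures, provenance omitted)."""

    method_name: str
    params: dict[str, Any] = field(default_factory=dict)
    values: dict[str, Any] = field(default_factory=dict)

    def save(self, directory: str | Path) -> Path:
        """Write result.json into directory and return its path."""
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / "result.json"
        # Sorted keys keep the file layout stable from run to run.
        path.write_text(json.dumps(asdict(self), sort_keys=True, indent=2), encoding="utf-8")
        return path

    @classmethod
    def from_json(cls, path: str | Path) -> MiniResult:
        """Rebuild an instance from a saved result.json."""
        return cls(**json.loads(Path(path).read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    original = MiniResult("pca", {"n_components": 2}, {"explained_variance": [0.6, 0.3]})
    saved_path = original.save(Path(tmp) / "pca")
    loaded = MiniResult.from_json(saved_path)
```

Because dataclasses compare field by field, loaded == original is a quick check that nothing was lost on the way through JSON.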
Provenance
Every Result's .provenance carries the full reproducibility envelope:
@dataclass
class Provenance:
    tamga_version: str
    python_version: str
    spacy_model: str
    spacy_version: str
    corpus_hash: str
    feature_hash: str | None
    seed: int
    timestamp: datetime
    resolved_config: dict[str, Any]
    # Forensic (all optional):
    questioned_description: str | None
    known_description: str | None
    hypothesis_pair: str | None
    acquisition_notes: str | None
    custody_notes: str | None
    source_hashes: dict[str, str]
Provenance.current(...) builds a record from the runtime + your inputs.
Provenance.from_dict(...) round-trips from a saved result.json.
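As an illustration of what corpus_hash and source_hashes could hold, the sketch below fingerprints each file in a directory and folds the per-file digests into one corpus-level digest. The hashing scheme is not described above, so the choice of SHA-256 over raw bytes and the helper names sha256_file, hash_sources and hash_corpus are assumptions, not tamga's actual scheme.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_sources(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def hash_corpus(source_hashes: dict[str, str]) -> str:
    """Fold per-file digests into one digest, independent of dict order."""
    digest = hashlib.sha256()
    for name in sorted(source_hashes):
        digest.update(f"{name}\0{source_hashes[name]}\n".encode("utf-8"))
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "questioned.txt").write_bytes(b"Dear sir,\n")
    (root / "known.txt").write_bytes(b"To whom it may concern,\n")
    source_hashes = hash_sources(root)
    corpus_hash = hash_corpus(source_hashes)
```

Hashing the sorted name and digest pairs means the corpus digest changes when any file is edited, renamed, added or removed, but not when the files are merely listed in a different order.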
Reproducibility contract
Two runs of the same study.yaml against the same corpus with the same seed produce
byte-identical result.json. The runner threads cfg.seed through:
- numpy's default RNG in any sampling method
- scikit-learn's random_state on every stochastic estimator (k-means, LogReg cross-validation, RandomForest, …)
- PyMC's random_seed on every Bayesian pm.sample() call
- The Stratified K-Fold shuffle
Non-determinism would be a bug — please report.
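The contract can be illustrated with a toy stochastic method in which every random draw comes from one seeded generator. The run_once function below is hypothetical and uses numpy only; the same pattern would apply to passing the seed as random_state or random_seed in the other libraries listed above.

```python
import json

import numpy as np


def run_once(seed: int) -> bytes:
    """Hypothetical stochastic method: bootstrap the mean of a fixed sample."""
    data = np.arange(1.0, 11.0)
    rng = np.random.default_rng(seed)  # all randomness flows from this generator
    idx = rng.integers(0, data.size, size=(200, data.size))
    boot_means = data[idx].mean(axis=1)
    values = {
        "seed": seed,
        "mean": float(boot_means.mean()),
        "ci95": [float(q) for q in np.quantile(boot_means, [0.025, 0.975])],
    }
    # Sorted keys and fixed separators keep the serialized bytes stable.
    return json.dumps(values, sort_keys=True, separators=(",", ":")).encode("utf-8")


first = run_once(seed=42)
second = run_once(seed=42)   # same seed: same bytes
other = run_once(seed=43)    # different seed: different bytes
```

Comparing the serialized bytes, rather than the in-memory values, also catches instability introduced at the serialization step, such as unordered keys.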
Loading multi-method runs
tamga run study.yaml produces a directory structure:
results/demo/
├── resolved_config.json
├── burrows/
│   └── result.json
├── pca/
│   ├── result.json
│   └── figure.png    # if rendered
└── zeta/
    ├── result.json
    ├── table_0.parquet
    └── table_1.parquet
build_report("results/demo", output="report.html") loads every result.json under the
directory and renders a single HTML report.
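The discovery step can be sketched with the standard library. The load_results helper below is hypothetical: it only shows how every result.json under a run directory might be collected, keyed by its method directory, and it does not reproduce build_report or its HTML rendering.

```python
import json
import tempfile
from pathlib import Path


def load_results(run_dir: Path) -> dict[str, dict]:
    """Return {method directory name: parsed result.json} for a run directory."""
    results = {}
    for path in sorted(run_dir.rglob("result.json")):
        results[path.parent.name] = json.loads(path.read_text(encoding="utf-8"))
    return results


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout shown above with placeholder payloads.
    run_dir = Path(tmp) / "results" / "demo"
    for method in ("burrows", "pca", "zeta"):
        (run_dir / method).mkdir(parents=True)
        payload = {"method_name": method, "params": {}, "values": {}}
        (run_dir / method / "result.json").write_text(json.dumps(payload), encoding="utf-8")
    (run_dir / "resolved_config.json").write_text("{}", encoding="utf-8")
    loaded = load_results(run_dir)
```

Matching on the exact file name result.json skips resolved_config.json and any parquet or figure files that sit alongside the results.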
Next
- Forensic toolkit — where Provenance's forensic fields come into play.