
geno_lewm.provenance.manifest

manifest

Manifest schema for GenoLeWM artifacts (RFC-0011).

A manifest is the trust anchor: every byte downstream of the model file is identified by its content hash, and model_id = SHA-256(canonical_json(manifest)).
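This page does not spell out the canonical JSON rules, so the following is only a minimal sketch. It assumes "canonical" means sorted keys, no insignificant whitespace, UTF-8 output, and NaN/Infinity rejected, and that the id is the lowercase hex digest. The real `canonical_json_sha256` helper may differ (for example in number formatting or a digest prefix).

```python
import hashlib
import json
from typing import Any


def canonical_json(obj: Any) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    text = json.dumps(
        obj,
        sort_keys=True,          # key order in the input must not matter
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # emit non-ASCII as UTF-8 rather than \u escapes
        allow_nan=False,         # NaN/Infinity are not valid JSON
    )
    return text.encode("utf-8")


def canonical_json_sha256(obj: Any) -> str:
    """SHA-256 over the canonical bytes, as a lowercase hex string (assumed)."""
    return hashlib.sha256(canonical_json(obj)).hexdigest()


# Two dicts with the same content but different insertion order hash identically.
a = {"model_name": "demo", "schema_version": "1.0.0"}
b = {"schema_version": "1.0.0", "model_name": "demo"}
assert canonical_json(a) == b'{"model_name":"demo","schema_version":"1.0.0"}'
assert canonical_json_sha256(a) == canonical_json_sha256(b)
```

Sorting keys is what lets the id survive changes in field declaration order or dict construction order; anything that alters a value, including a single artifact hash, changes the id.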

This module provides a strict, JSON-native dataclass mirror of the schema, the round-trip-stable write_manifest / load_manifest pair, and the model_id derivation.

The schema is intentionally narrow: every field is required, and the loader rejects unknown top-level keys. Adding a field is a MINOR schema bump (handled by raising SCHEMA_VERSION to 1.1.0 and accepting both versions); removing or renaming a field is a MAJOR bump.
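The strictness rules above could be enforced roughly as follows. This is a sketch, not the module's `_from_dict`: the key set is read off the `Manifest` signature on this page, `"1.0.0"` as the current version is an assumption, and `ValueError` stands in for the module's `InputError`.

```python
from typing import Any

# Required top-level keys, taken from the documented Manifest signature.
MANIFEST_KEYS = frozenset({
    "schema_version", "model_name", "model_version", "release_id",
    "encoder", "predictor", "action_encoder", "calibration", "training", "eval",
})

# Assumed current version; a MINOR bump would add "1.1.0" here and accept both.
ACCEPTED_SCHEMA_VERSIONS = frozenset({"1.0.0"})


def check_top_level(d: dict[str, Any]) -> None:
    """Reject unknown or missing top-level keys and unsupported schema versions."""
    unknown = set(d) - MANIFEST_KEYS
    if unknown:
        raise ValueError(f"unknown top-level keys: {sorted(unknown)}")
    missing = MANIFEST_KEYS - set(d)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    version = d["schema_version"]
    if version not in ACCEPTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
```

Rejecting unknown keys, rather than ignoring them, means a manifest written by a newer schema fails loudly instead of silently hashing to a different model_id than the writer intended.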

ManifestArtifact dataclass

ManifestArtifact(file: str, hash: str, dtype: str | None = None, version: str | None = None)

A single artifact file referenced by the manifest.

ManifestEncoder dataclass

ManifestEncoder(id: str, revision: str, hash: str)

Carbon encoder identity.

ManifestTraining dataclass

ManifestTraining(config_file: str, hash: str, data_snapshot: dict[str, str] = dict())

Training config + data-snapshot identifiers.
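The rendered default `data_snapshot = dict()` suggests the field is declared with `field(default_factory=dict)`, since `dataclasses` rejects a literal `{}` default for being mutable. A minimal mirror under that assumption; `frozen=True` follows the "nested frozen dataclasses" remark under `to_canonical_dict`, and the argument values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManifestTraining:
    config_file: str
    hash: str
    # default_factory gives each instance its own dict; a bare `= {}` raises ValueError.
    data_snapshot: dict[str, str] = field(default_factory=dict)


t1 = ManifestTraining(config_file="train.yaml", hash="<hash>")
t2 = ManifestTraining(config_file="train.yaml", hash="<hash>")
assert t1.data_snapshot == {}
assert t1.data_snapshot is not t2.data_snapshot  # no shared mutable default
```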

Manifest dataclass

Manifest(schema_version: str, model_name: str, model_version: str, release_id: str, encoder: ManifestEncoder, predictor: ManifestArtifact, action_encoder: ManifestArtifact, calibration: ManifestArtifact, training: ManifestTraining, eval: ManifestArtifact)

Top-level manifest (RFC-0011 §3.7).
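Following the field names in the signatures above, the canonical dict (and therefore the on-disk JSON) nests one object per component. The sketch below shows that shape with placeholder values only; it is not a real release. `asdict` keeps fields whose value is `None`, so unset optional fields such as `dtype` appear as JSON `null` rather than being dropped.

```python
import json

# Placeholder values only; the structure follows the dataclass signatures on this page.
artifact = {"file": "<file>", "hash": "<hash>", "dtype": None, "version": None}
manifest_dict = {
    "schema_version": "<schema_version>",
    "model_name": "<model_name>",
    "model_version": "<model_version>",
    "release_id": "<release_id>",
    "encoder": {"id": "<id>", "revision": "<revision>", "hash": "<hash>"},
    "predictor": dict(artifact),
    "action_encoder": dict(artifact),
    "calibration": dict(artifact),
    "training": {"config_file": "<config_file>", "hash": "<hash>", "data_snapshot": {}},
    "eval": dict(artifact),
}

# Optional fields left as None serialize as JSON null.
print(json.dumps(manifest_dict["predictor"], sort_keys=True))
# {"dtype": null, "file": "<file>", "hash": "<hash>", "version": null}
```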

to_canonical_dict

to_canonical_dict() -> dict[str, Any]

Return the dict mirror used for canonical JSON.

Dataclass asdict walks nested frozen dataclasses and is deterministic, so the result is byte-stable when fed to the canonical JSON encoder (which also sorts keys).
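To see what `asdict` produces for nested frozen dataclasses, here is a reduced sketch. `Encoder` and `MiniManifest` are hypothetical stand-ins for `ManifestEncoder` and `Manifest`, and the canonical encoder is assumed to be `json.dumps` with sorted keys and compact separators.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Encoder:  # hypothetical stand-in for ManifestEncoder
    id: str
    revision: str
    hash: str


@dataclass(frozen=True)
class MiniManifest:  # hypothetical two-field stand-in for Manifest
    model_name: str
    encoder: Encoder


m = MiniManifest(model_name="demo", encoder=Encoder(id="enc", revision="r1", hash="h1"))
d = asdict(m)  # recursively converts the nested dataclass into a plain dict
assert d == {"model_name": "demo", "encoder": {"id": "enc", "revision": "r1", "hash": "h1"}}

# Sorted keys make the bytes independent of field declaration order.
encoded = json.dumps(d, sort_keys=True, separators=(",", ":")).encode("utf-8")
assert encoded == b'{"encoder":{"hash":"h1","id":"enc","revision":"r1"},"model_name":"demo"}'
```

`asdict` preserves declaration order, but the sorted-key encoder, not `asdict`, is what makes the final bytes order-independent.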

Source code in geno_lewm/provenance/manifest.py
def to_canonical_dict(self) -> dict[str, Any]:
    """Return the dict mirror used for canonical JSON.

    Dataclass ``asdict`` walks nested frozen dataclasses and is
    deterministic, so the result is byte-stable when fed to the
    canonical JSON encoder (which also sorts keys).
    """
    return asdict(self)

model_id

model_id() -> str

Return model_id = SHA-256(canonical_json(manifest)).

def model_id(self) -> str:
    """Return ``model_id = SHA-256(canonical_json(manifest))``."""
    return canonical_json_sha256(self.to_canonical_dict())
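Because every artifact hash is a field of the manifest, changing any one of them changes `model_id`. A small illustration using a hypothetical two-field dataclass and the same assumed canonical form (sorted keys, compact separators, UTF-8, lowercase hex digest); the real method hashes the full manifest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Any


def canonical_json_sha256(obj: Any) -> str:
    # Assumed canonical form; the real helper's exact rules are not documented here.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class MiniManifest:  # hypothetical reduced manifest
    model_version: str
    predictor_hash: str

    def model_id(self) -> str:
        return canonical_json_sha256(asdict(self))


base = MiniManifest(model_version="0.1.0", predictor_hash="aaaa")
retrained = replace(base, predictor_hash="bbbb")  # same metadata, new weights
assert base.model_id() == MiniManifest("0.1.0", "aaaa").model_id()  # deterministic
assert base.model_id() != retrained.model_id()  # any hash change yields a new id
```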

write_manifest

write_manifest(manifest: Manifest, path: str | Path) -> Path

Write a manifest to disk as canonical JSON.

The on-disk bytes are byte-stable across platforms.

def write_manifest(manifest: Manifest, path: str | Path) -> Path:
    """Write a manifest to disk as canonical JSON.

    The on-disk bytes are byte-stable across platforms.
    """
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_bytes(manifest.to_canonical_json())
    return p
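Writing with `Path.write_bytes` sidesteps text-mode newline translation and locale-dependent default encodings, which is what keeps the file identical across platforms. A round-trip check one might run, sketched with a placeholder payload and the assumed canonical form rather than the real `Manifest`:

```python
import json
import tempfile
from pathlib import Path


def canonical_bytes(obj: dict) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


payload = {"release_id": "demo", "schema_version": "1.0.0"}

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "nested" / "manifest.json"
    p.parent.mkdir(parents=True, exist_ok=True)  # create missing parents, as write_manifest does
    p.write_bytes(canonical_bytes(payload))
    first = p.read_bytes()
    # Re-encoding the parsed file reproduces the same bytes (round-trip stable).
    assert canonical_bytes(json.loads(first)) == first
```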

load_manifest

load_manifest(path: str | Path) -> Manifest

Load and validate a manifest from disk.

def load_manifest(path: str | Path) -> Manifest:
    """Load and validate a manifest from disk."""
    p = Path(path)
    raw = p.read_bytes()
    try:
        d = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Non-UTF-8 bytes raise UnicodeDecodeError, not JSONDecodeError.
        raise InputError(
            "manifest is not valid UTF-8 JSON",
            details={"path": str(p), "error": str(exc)},
        ) from exc
    return _from_dict(d)
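`bytes.decode` raises `UnicodeDecodeError`, not `json.JSONDecodeError`, so an `except` clause that names only the latter lets non-UTF-8 files escape as an unwrapped exception. A sketch of the parsing step that wraps both and also rejects a non-object top level before any field validation; `ManifestInputError` and `parse_manifest_bytes` are hypothetical names, the former standing in for the module's `InputError`.

```python
import json
from pathlib import Path
from typing import Any


class ManifestInputError(ValueError):
    """Hypothetical stand-in for the module's InputError."""

    def __init__(self, message: str, details: dict[str, str]) -> None:
        super().__init__(message)
        self.details = details


def parse_manifest_bytes(raw: bytes, path: Path) -> dict[str, Any]:
    try:
        d = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Both are ValueError subclasses; wrap them in one user-facing error.
        raise ManifestInputError(
            "manifest is not valid UTF-8 JSON",
            details={"path": str(path), "error": str(exc)},
        ) from exc
    if not isinstance(d, dict):
        # A valid JSON array or scalar still cannot be a manifest.
        raise ManifestInputError(
            "manifest top level must be a JSON object",
            details={"path": str(path), "type": type(d).__name__},
        )
    return d
```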