geno_lewm.provenance.hashing
Content-addressing primitives for GenoLeWM artifacts (RFC-0011 §3.1).
Two functions:
- canonical_json_sha256: SHA-256 of the canonical JSON serialization of a value. Canonical JSON per RFC-0011 §3.7: keys sorted lexicographically, no whitespace, UTF-8, NaN / Infinity rejected. Byte-stable across platforms and Python releases.
- sha256_file / sha256_bytes: stream-friendly file / in-memory hashing used to compute the per-artifact hash fields in Manifest.
All hashes are returned as "sha256:<hex>" strings to match the on-disk manifest convention.
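The output convention can be illustrated with a minimal sketch. The two functions below are stand-ins written for this example, not the module's actual source; in particular, building the canonical form with json.dumps and ensure_ascii=False is an assumption.

```python
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    # Stand-in: hash an in-memory buffer and add the "sha256:" prefix.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def canonical_json_sha256(value) -> str:
    # Stand-in: serialize with sorted keys and compact separators, then hash.
    # ensure_ascii=False is an assumption; the page only says "UTF-8".
    text = json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return sha256_bytes(text.encode("utf-8"))


# Key order in the input does not affect the digest.
first = canonical_json_sha256({"b": 1, "a": [1, 2]})
second = canonical_json_sha256({"a": [1, 2], "b": 1})
print(first == second)
```

Because the keys are sorted before hashing, two dictionaries with the same content produce the same "sha256:<hex>" string regardless of insertion order.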
canonical_json_bytes
Return the canonical-JSON byte string of value.
Canonical form (RFC-0011 §3.7, similar to RFC 8785):
- Keys are sorted lexicographically at every level.
- No whitespace (compact separators=(",", ":")).
- UTF-8 encoded.
- NaN / Infinity rejected.
- bytes rejected (must be hashed separately and embedded by reference).
Source code in geno_lewm/provenance/hashing.py
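The rules listed for the canonical form map closely onto standard json.dumps options. The following is a sketch under that assumption, not a copy of the module's source; the real implementation may differ, for example in how it orders non-ASCII keys or reports errors.

```python
import json


def canonical_json_bytes(value) -> bytes:
    # Sketch only: sorted keys, compact separators, NaN/Infinity rejected.
    # json.dumps already raises TypeError for bytes values by default.
    text = json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,  # assumption: emit raw UTF-8 rather than \u escapes
        allow_nan=False,
    )
    return text.encode("utf-8")


print(canonical_json_bytes({"b": [1, 2], "a": {"d": None, "c": "x"}}))
# b'{"a":{"c":"x","d":null},"b":[1,2]}'

# Values the canonical form refuses, with the exception type each one raises.
rejected = []
for bad in (float("nan"), float("inf"), b"raw"):
    try:
        canonical_json_bytes({"x": bad})
    except (TypeError, ValueError) as exc:
        rejected.append(type(exc).__name__)
print(rejected)
```

Note that sort_keys orders keys by Python string comparison (Unicode code points), which can differ from the UTF-16 code-unit ordering that RFC 8785 specifies for keys outside the Basic Multilingual Plane.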
canonical_json_sha256
sha256_bytes
sha256_file
Return "sha256:<hex>" for the file at path.
Streams the file in 1 MiB chunks; safe for arbitrarily large artifacts (weights files can be multi-GB).
Source code in geno_lewm/provenance/hashing.py
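Chunked hashing of the kind described for sha256_file can be sketched as follows. This is an illustrative stand-in rather than the module's code; the CHUNK_SIZE constant name and the temporary-file demo are assumptions made for the example.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching the chunk size stated above


def sha256_file(path) -> str:
    # Feed the file to the hasher one chunk at a time so memory use stays
    # bounded regardless of file size.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demo: a file slightly larger than two chunks hashes the same as its bytes.
payload = b"\x00" * (2 * CHUNK_SIZE + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name
try:
    streamed = sha256_file(tmp_path)
finally:
    os.remove(tmp_path)

expected = "sha256:" + hashlib.sha256(payload).hexdigest()
print(streamed == expected)
```

Only one chunk is held in memory at a time, which is what makes this approach suitable for multi-GB weights files.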
looks_like_sha256
Return True iff s matches the "sha256:<64hex>" shape.
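A shape check like this is commonly written as a full-match regular expression. The sketch below assumes lowercase hex digits and returns False for non-string input; the page does not state either detail, so the actual function may behave differently.

```python
import re

# Assumption: exactly 64 lowercase hex digits after the "sha256:" prefix.
_SHA256_SHAPE = re.compile(r"sha256:[0-9a-f]{64}")


def looks_like_sha256(s) -> bool:
    # fullmatch rejects leading or trailing characters, including newlines.
    return isinstance(s, str) and _SHA256_SHAPE.fullmatch(s) is not None


print(looks_like_sha256("sha256:" + "0" * 64))
print(looks_like_sha256("sha256:" + "0" * 63))
print(looks_like_sha256("md5:" + "0" * 64))
```

The check validates shape only; it does not confirm that the string is the digest of any particular artifact.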