geno_lewm.encoder.cache
Parquet window-embedding cache and SQLite index.
Implements the on-disk cache contract from RFC-0002 §3.6 and
docs/spec/03-data-model.md#on-disk-window-embedding-cache.
Parquet support is intentionally imported lazily so the base package
keeps its minimal dependency surface; install geno-lewm[train] or
the development extra to use this module.
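The lazy-import pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name `lazy_import` and its error message are assumptions, and only the extra name `geno-lewm[train]` comes from the text above.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, *, extra: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper).

    Deferring the import keeps the base package importable without the
    optional dependency; the failure surfaces only when the feature is used.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install geno-lewm[{extra}] to enable it."
        ) from exc
```

A cache function would call something like `lazy_import("pyarrow", extra="train")` inside its body rather than at module import time.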
WindowCacheKey
dataclass
WindowCacheKey(window_hash: bytes, encoder_hash: bytes, state_layer: int, pool_type: str, pool_radius: int, dtype: str)
Content-addressed key for a cached embedding row.
WindowCacheRecord
dataclass
WindowCacheRecord(chrom: str, start_bp: int, end_bp: int, window_hash: bytes, encoder_hash: bytes, state_layer: int, pool_type: str, pool_radius: int, dtype: str, embedding: tuple[float, ...], untargeted: bool, created_at: int = 0, schema_version: str = CACHE_SCHEMA_VERSION)
One row in the window-embedding cache schema.
with_created_at
Fill created_at with the current UTC time in nanoseconds when it is absent.
Source code in geno_lewm/encoder/cache.py
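As a rough illustration of the with_created_at behavior, the sketch below uses a reduced stand-in dataclass. It assumes that the record is frozen, that a value of 0 (the documented default) means "absent", and that a new instance is returned; none of these details are confirmed by this page. `Record` is a hypothetical name.

```python
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical, reduced stand-in for WindowCacheRecord."""

    chrom: str
    created_at: int = 0  # assumption: 0 marks a missing timestamp

    def with_created_at(self) -> "Record":
        # Keep an existing timestamp untouched.
        if self.created_at:
            return self
        # time.time_ns() returns nanoseconds since the Unix epoch.
        return replace(self, created_at=time.time_ns())
```

Returning a copy instead of mutating in place keeps the record hashable and safe to share, which is one reason a frozen dataclass is a plausible choice here.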
CacheReindexReport
dataclass
Summary of a SQLite index rebuild.
CacheRepairReport
dataclass
Summary of a repair pass over Parquet shards.
default_cache_dir
shard_path_for
shard_path_for(cache_dir: Path | str, *, encoder_id: str, state_layer: int, pool_type: str, pool_radius: int, contig: str, stride_block: int) -> Path
Return the canonical Parquet shard path for a cache block.
write_shard
write_shard(cache_dir: Path | str, *, encoder_id: str, contig: str, stride_block: int, records: Sequence[WindowCacheRecord]) -> Path
Write one immutable Parquet shard and index its rows.
If the shard already exists with the same rows, this is a no-op. If it exists and new or conflicting rows are supplied, the function raises instead of rewriting in place (INV-DATA-3 / INV-DATA-10).
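The write-once rule above (no-op on identical rows, error on anything else) can be sketched independently of Parquet. The example below substitutes a JSON file for the shard so that it runs without optional dependencies; `write_once`, the use of `FileExistsError`, and the order-sensitive comparison are assumptions made for illustration, not the behavior of write_shard itself.

```python
import json
from pathlib import Path


def write_once(path: Path, rows: list[dict]) -> Path:
    """Write rows to path exactly once; never rewrite in place (sketch)."""
    payload = json.dumps(rows, sort_keys=True)
    if path.exists():
        if path.read_text() == payload:
            return path  # same rows already on disk: no-op
        # New or conflicting rows: refuse rather than mutate the shard.
        raise FileExistsError(f"{path} already exists with different rows")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary sibling, then rename, so readers never
    # observe a partially written shard.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(payload)
    tmp.replace(path)
    return path
```

Treating shards as immutable means a corrected or extended block needs a new shard path rather than an overwrite.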
read_embedding
Return an embedding by content key, or None on cache miss.
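One way a content-keyed lookup with a None-on-miss contract could work against a SQLite index is sketched below. The table name `window_index`, the `shard_path` and `row_idx` columns, and both helper functions are hypothetical; only the six key fields are taken from WindowCacheKey. The real index.sqlite schema may differ.

```python
import sqlite3
from typing import Optional

# The six fields of WindowCacheKey, used here as a composite primary key.
KEY_COLUMNS = (
    "window_hash", "encoder_hash", "state_layer",
    "pool_type", "pool_radius", "dtype",
)


def create_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical index table mapping keys to shard rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS window_index ("
        " window_hash BLOB, encoder_hash BLOB, state_layer INTEGER,"
        " pool_type TEXT, pool_radius INTEGER, dtype TEXT,"
        " shard_path TEXT, row_idx INTEGER,"
        f" PRIMARY KEY ({', '.join(KEY_COLUMNS)}))"
    )


def locate(conn: sqlite3.Connection, key: tuple) -> Optional[tuple[str, int]]:
    """Return (shard_path, row_idx) for a key, or None on a cache miss."""
    where = " AND ".join(f"{col} = ?" for col in KEY_COLUMNS)
    row = conn.execute(
        f"SELECT shard_path, row_idx FROM window_index WHERE {where}", key
    ).fetchone()
    return None if row is None else (row[0], row[1])
```

In this sketch a caller would then open the located shard and read the embedding at `row_idx`; a None result signals that the window must be re-encoded.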
reindex_cache
Rebuild index.sqlite from every readable Parquet shard.
repair_cache
Quarantine unreadable Parquet shards and rebuild the SQLite index.
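The quarantine step of a repair pass might look roughly like the sketch below. The `quarantine` directory name, the `quarantine_unreadable` helper, and the pluggable `is_readable` check are all assumptions; this page does not say where repair_cache moves bad shards or how it tests readability.

```python
from pathlib import Path
from typing import Callable


def quarantine_unreadable(
    cache_dir: Path,
    is_readable: Callable[[Path], bool],
    pattern: str = "*.parquet",
) -> list[Path]:
    """Move shards that fail is_readable into cache_dir/quarantine (sketch)."""
    quarantine = cache_dir / "quarantine"
    moved: list[Path] = []
    # sorted() materializes the listing before any file is moved.
    for shard in sorted(cache_dir.rglob(pattern)):
        if quarantine in shard.parents:
            continue  # already quarantined on an earlier pass
        if is_readable(shard):
            continue
        quarantine.mkdir(exist_ok=True)
        # Simplification: shards with the same file name would collide here.
        target = quarantine / shard.name
        shard.replace(target)
        moved.append(target)
    return moved
```

After quarantining, the index would be rebuilt from the remaining shards, which is the job reindex_cache performs. A simple `is_readable` could check that the file starts with the Parquet magic bytes `PAR1`, though fully reading the file with a Parquet library is a stronger test.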