geno_lewm
GenoLeWM — action-conditioned JEPA world model for DNA.
GenoLeWM treats genetic edits as first-class actions and learns latent
transitions on top of a frozen DNA foundation model. The package today
ships the production infrastructure layer (typed errors, structured
observability with privacy redaction, content-addressed provenance
receipts, canonical edit specs, the verify CLI). The training, predictor, and
deployment surfaces land incrementally — see ROADMAP.md and the
rfcs/ corpus.
The public Python surface is the set of names listed in __all__, documented below. Every
symbol's stability is governed by RFC-0014 (api stability) and the
tests/api/public_surface.json snapshot is the binding contract.
ActionEncoder
ActionEncoder(*, d_action: int = 512, d_pos: int = 128, d_type: int = 64, d_seq: int = 256, max_window_bp: int = 12288, carbon_tokenizer: Any | None = None)
Bases: Module
Encode RelEdit objects into learned action embeddings.
Source code in geno_lewm/action/encoder.py
EditSpec
dataclass
A canonical, frozen genomic edit (RFC-0003 §3.1).
Construct with absolute VCF-style coordinates; the derived
edit_type attribute is filled in by __post_init__.
pos is 1-based per VCF convention; both ref and alt are
explicit base strings. Symbolic alleles such as <DEL> and <INS> are not
accepted; they are deferred to v2.
relative_to
Return the window-relative form (RFC-0003 §3.3).
window_start_bp and window_end_bp are 0-based inclusive
coordinates on the same chromosome as chrom. The
predictor sees only the relative offset; absolute coordinates
never enter the model.
Source code in geno_lewm/action/spec.py
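The coordinate conversion behind relative_to can be sketched as follows. This is a minimal illustration, not the geno_lewm implementation. relative_offset is a hypothetical helper. It raises ValueError where the package would presumably raise OutOfWindowError, and it assumes the edit's full reference allele must fall inside the window.

```python
def relative_offset(pos: int, ref: str, window_start_bp: int, window_end_bp: int) -> int:
    """Map a 1-based VCF position to a 0-based offset inside a window (hypothetical helper).

    window_start_bp and window_end_bp are 0-based and inclusive, on the same
    chromosome as the edit. Only the returned offset would reach the model.
    """
    start0 = pos - 1                # 1-based VCF coordinate -> 0-based
    end0 = start0 + len(ref) - 1    # last reference base the edit touches
    if start0 < window_start_bp or end0 > window_end_bp:
        # Stand-in for OutOfWindowError in this sketch.
        raise ValueError(
            f"edit spans [{start0}, {end0}] but window is [{window_start_bp}, {window_end_bp}]"
        )
    return start0 - window_start_bp
```

For example, relative_offset(1001, "A", 1000, 1999) returns 0. VCF position 1001 is 0-based coordinate 1000, which is the first base of the window.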
EditType
Bases: IntEnum
The six v1 edit categories (RFC-0003 §3.2).
Members are deterministic functions of (len(ref), len(alt)) —
callers do not pass this value; it is computed during construction.
RelEdit
dataclass
Window-relative form consumed by the action encoder.
BackendProbe
dataclass
Capability probe result for one runtime backend.
GenoLeWMRuntime
GenoLeWMRuntime(model_dir: str | Path, backend: str = BACKEND_AUTO, *, encoder: object | None = None, action_encoder: object | None = None, predictor: object | None = None, calibration: CalibrationTable | None = None)
Top-level runtime facade for on-device inference workflows.
Source code in geno_lewm/deploy/runtime.py
score_variant
score_variant(variant: EditSpec, window: str | None = None, *, receipt_path: str | Path | None = None) -> Any
Score a single variant through local scorer components when available.
Source code in geno_lewm/deploy/runtime.py
score_vcf
score_vcf(vcf_path: str | Path, fasta_path: str | Path, output_path: str | Path, batch_size: int = 64, progress: bool = True, *, receipt_path: str | Path | None = None) -> None
Score a VCF through local scorer components when available.
When receipt_path is provided, the runtime writes JSONL with
one canonical v1 receipt per scored alternate. The v1 schema
commits a single output, so this is intentionally not a batch
aggregate receipt.
Source code in geno_lewm/deploy/runtime.py
encode_window
Encode a DNA window once the encoder backend is installed.
Source code in geno_lewm/deploy/runtime.py
predict
Run the predictor once a predictor backend is installed.
Source code in geno_lewm/deploy/runtime.py
CarbonStateEncoder
CarbonStateEncoder(model_id: str, revision: str, *, dtype: str = 'bf16', state_layer: int = -1, pool_type: str = POOL_CENTERED_MEAN, pool_radius: int = DEFAULT_POOL_RADIUS_TOKENS, normalize: bool = True, lora_config: object | None = None, model: object | None = None, tokenizer: object | None = None, encoder_hash: bytes | str | None = None, local_files_only: bool = True, trust_remote_code: bool = False, device: str | None = None)
Encode DNA windows with Carbon hidden states plus deterministic pooling.
Source code in geno_lewm/encoder/carbon.py
encode
encode_batch
encode_batch(windows: Sequence[str], edit_loci: Sequence[int | None]) -> tuple[tuple[float, ...], ...]
Encode and pool a batch of DNA windows.
Source code in geno_lewm/encoder/carbon.py
BackendUnsupportedError
BackendUnsupportedError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: DeployError
Requested runtime backend is not available on the host.
Source code in geno_lewm/errors.py
CacheCorruptError
CacheCorruptError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
CollapseDetectedError
CollapseDetectedError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: TrainingError
A representation-collapse alert tripped (RFC-0005).
Source code in geno_lewm/errors.py
ConfigError
ConfigError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
DataLoaderError
DataLoaderError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: TrainingError
Data pipeline raised an exception the trainer cannot recover from.
Source code in geno_lewm/errors.py
DeployError
DeployError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
DiskFullError
DiskFullError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
ErrorCodeEntry
EvalDatasetError
EvalDatasetError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: EvalError
A benchmark file or dataset could not be loaded.
Source code in geno_lewm/errors.py
EvalError
EvalError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
EvalRegressionError
EvalRegressionError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
ExportFormatError
ExportFormatError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
GenoLeWMError
GenoLeWMError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: Exception
Root of the GenoLeWM exception hierarchy.
Every typed exception in this package inherits from GenoLeWMError.
Subclasses must set a code class attribute that matches an entry
in ERROR_CODES.
Source code in geno_lewm/errors.py
to_dict
Return the structured payload as a plain dict.
The output is JSON-serializable provided details values are
JSON-native. Keys mirror the structured-log error event so
the CLI dispatcher and observability layer can pass it through
without translation.
Source code in geno_lewm/errors.py
to_json
Return the structured payload as a JSON string.
Adds a UTC ts field so log sinks receive a self-contained
record.
Source code in geno_lewm/errors.py
InputCommitmentMismatchError
InputCommitmentMismatchError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ProvenanceError
Recomputed input commitment does not match the receipt.
Source code in geno_lewm/errors.py
InputError
InputError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: GenoLeWMError
Caller-supplied input violates a documented invariant.
Source code in geno_lewm/errors.py
InternalError
InternalError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: GenoLeWMError
An internal bug that GenoLeWM detected; it should never surface to end users.
Source code in geno_lewm/errors.py
InvalidEditError
InvalidEditError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InputError
An EditSpec fails one of its constructor invariants.
Source code in geno_lewm/errors.py
InvariantViolation
InvariantViolation(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InternalError
A runtime invariant marked INV-* was breached.
Source code in geno_lewm/errors.py
ManifestHashMismatchError
ManifestHashMismatchError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ProvenanceError
Recomputed manifest hash does not match the stated model_id.
Source code in geno_lewm/errors.py
MissingConfigError
MissingConfigError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
ModelNotFoundError
ModelNotFoundError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ResourceError
A model checkpoint is missing locally and cannot be downloaded.
Source code in geno_lewm/errors.py
NaNLossError
NaNLossError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
NetworkCallProhibitedError
NetworkCallProhibitedError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ResourceError
A post-setup network call was attempted under fail-closed policy.
See RFC-0010 §3.7 ("on-device fail-closed").
Source code in geno_lewm/errors.py
OutOfMemoryError
OutOfMemoryError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ResourceError
Re-raise of CUDA / host OOM with attached GenoLeWM context.
Python has no builtin named OutOfMemoryError (the builtin exception is
MemoryError), so this name shadows nothing in builtins; code that needs
the builtin should catch MemoryError.
Source code in geno_lewm/errors.py
OutOfWindowError
OutOfWindowError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InputError
An edit's rel_pos falls outside the encoder window.
Source code in geno_lewm/errors.py
OutputCommitmentMismatchError
OutputCommitmentMismatchError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ProvenanceError
Recomputed output bytes do not match the receipt.
Source code in geno_lewm/errors.py
OverlappingEditsError
OverlappingEditsError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InputError
Two or more edits in a haplotype overlap in genomic coordinates.
Source code in geno_lewm/errors.py
ProvenanceError
ProvenanceError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: GenoLeWMError
Receipt or artifact-provenance failure (RFC-0011).
Source code in geno_lewm/errors.py
ProvenanceKindUnsupportedError
ProvenanceKindUnsupportedError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ProvenanceError
Verifier does not understand the receipt provenance kind.
Source code in geno_lewm/errors.py
QuantizationError
QuantizationError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
ReceiptSchemaError
ReceiptSchemaError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
ResourceError
ResourceError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
RuntimeSetupError
RuntimeSetupError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ResourceError
First-run network setup step (download / verification) failed.
Source code in geno_lewm/errors.py
SchemaCompatError
SchemaCompatError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ConfigError
An on-disk artifact's schema MAJOR version is incompatible.
Source code in geno_lewm/errors.py
TrainingError
TrainingError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
UnknownTopLevelKeyError
UnknownTopLevelKeyError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: ConfigError
A configuration payload contained a top-level key not in the schema.
Source code in geno_lewm/errors.py
UnreachableError
UnreachableError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InternalError
Control flow reached a branch marked unreachable.
Source code in geno_lewm/errors.py
UnsupportedEditError
UnsupportedEditError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InputError
An edit's type or length is outside the v1 scope (RFC-0003).
Source code in geno_lewm/errors.py
VcfParseError
VcfParseError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
WindowMismatchError
WindowMismatchError(message: str = '', *, details: Mapping[str, Any] | None = None, remediation: str | None = None)
Bases: InputError
Window reference bases do not match EditSpec.ref at rel_pos.
Source code in geno_lewm/errors.py
EventSpec
dataclass
A single row in the EVENTS registry.
allowed_keys lists the data keys that the per-event redaction
allowlist permits (RFC-0013 §3.5). Standardized fields (step,
epoch, phase, duration_ms, trace_id, span_id,
error_code) are promoted out of data before redaction and
are always allowed at the top level — they need not appear here.
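One way to read the allowlist rule is that the standardized fields are lifted out of data first, and then only allowlisted keys survive. The sketch below drops disallowed keys outright. Whether the real logger drops or masks them is not stated here, so treat redact as a hypothetical helper.

```python
from typing import Any

# Standardized fields that the text says are promoted out of `data` and always allowed.
STANDARD_FIELDS = ("step", "epoch", "phase", "duration_ms", "trace_id", "span_id", "error_code")


def redact(data: dict[str, Any], allowed_keys: frozenset[str]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a raw payload into (promoted top-level fields, allowlisted data).

    Hypothetical helper: disallowed keys are simply dropped in this sketch.
    """
    promoted = {k: data[k] for k in STANDARD_FIELDS if k in data}
    remaining = {k: v for k, v in data.items() if k not in promoted}
    kept = {k: v for k, v in remaining.items() if k in allowed_keys}
    return promoted, kept
```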
GenoLeWMLogger
GenoLeWMLogger(component: str, *, run_id: str, log_dir: Path, sink: _Sink, level: Severity = 'info', pretty: bool = False)
Component-scoped structured logger.
Loggers are cheap to obtain (get_logger caches instances) and
thread-safe: the underlying sink serializes writes.
Source code in geno_lewm/observability.py
LogRecord
dataclass
LogRecord(ts: str, severity: Severity, event: str, run_id: str, component: str, data: dict[str, Any] = dict(), step: int | None = None, epoch: int | None = None, phase: str | None = None, duration_ms: int | None = None, trace_id: str | None = None, span_id: str | None = None, error_code: str | None = None)
One row written by the logger.
The record carries the spec-required fields directly and stashes
event-specific structured fields under data. to_dict
returns the exact wire shape; its keys are stable across versions.
DtypeConfig
dataclass
Numerical-precision commit shape.
Manifest
dataclass
Manifest(schema_version: str, model_name: str, model_version: str, release_id: str, encoder: ManifestEncoder, predictor: ManifestArtifact, action_encoder: ManifestArtifact, calibration: ManifestArtifact, training: ManifestTraining, eval: ManifestArtifact)
Top-level manifest (RFC-0011 §3.7).
to_canonical_dict
Return the dict mirror used for canonical JSON.
dataclasses.asdict recurses into the nested frozen dataclasses and is
deterministic, so the result is byte-stable when fed to the
canonical JSON encoder (which also sorts keys).
Source code in geno_lewm/provenance/manifest.py
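The sketch below shows why asdict combined with a key-sorting JSON encoder gives byte-stable output. The separator and ensure_ascii choices are assumptions for illustration, as are the Artifact and MiniManifest dataclasses. The package's actual canonical JSON encoder (and canonical_json_sha256) may apply further rules, for example around floats.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any


def canonical_json_bytes(obj: Any) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_sha256(obj: Any) -> str:
    return "sha256:" + hashlib.sha256(canonical_json_bytes(obj)).hexdigest()


@dataclass(frozen=True)
class Artifact:  # hypothetical stand-in for ManifestArtifact
    path: str
    sha256: str


@dataclass(frozen=True)
class MiniManifest:  # hypothetical, much smaller than the real Manifest
    schema_version: str
    predictor: Artifact
```

asdict(MiniManifest(...)) turns the nested Artifact into a plain dict, and sorting keys removes any dependence on field declaration order.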
ManifestArtifact
dataclass
A single artifact file referenced by the manifest.
ManifestEncoder
dataclass
Carbon encoder identity.
ManifestTraining
dataclass
Training config + data-snapshot identifiers.
PoolingConfig
dataclass
State-encoder pooling configuration commit shape (RFC-0002).
Receipt
dataclass
Receipt(schema_version: str, model_id: str, input_commitment: str, output: ReceiptOutput, output_commitment: str, calibration_hash: str, runtime: ReceiptRuntime, timestamp: str, provenance: ReceiptProvenance)
Top-level receipt (RFC-0011 §3.3).
ReceiptOutput
dataclass
ReceiptOutput(sigma_raw: float, sigma_calibrated: float, bucket_id: str, confidence: float, low_confidence: bool)
Score-call output committed by the receipt.
ReceiptProvenance
dataclass
Checksum provenance block, serialized under the provenance key in v1 JSON.
ReceiptRuntime
dataclass
Runtime / environment block.
SurpriseResult
dataclass
SurpriseResult(sigma_raw: float, sigma_calibrated: float, bucket_id: str, confidence: float, low_confidence: bool)
Calibrated surprise score for one edit.
to_dict
Return a JSON-native payload for CLI and JSONL outputs.
Source code in geno_lewm/surprise/score.py
apply_edit
Return window with edit applied.
window is the pre-edit base string (uppercase ACGTN). The
function does not validate window contents beyond what the edit
locus requires; that is the caller's responsibility.
The reference bases at the edit locus must match edit.ref_bases
case-insensitively; otherwise WindowMismatchError is
raised with the locus context attached.
Pass preserve_length=True to truncate or pad the result back to
the original window length on the side opposite the edit. The
default leaves the indel's length change intact; restoring the
length for the s_{t+1} encoding is the trainer's responsibility.
Source code in geno_lewm/action/apply.py
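The splice, the case-insensitive reference check, and the preserve_length behaviour can be sketched on plain strings. apply_edit_sketch is hypothetical. It takes rel_pos/ref/alt directly instead of an edit object and raises ValueError in place of WindowMismatchError. It also makes two assumptions that are not documented behaviour: "the side opposite the edit" is read as the window end farther from rel_pos, and deletions are padded with N.

```python
def apply_edit_sketch(window: str, rel_pos: int, ref: str, alt: str,
                      *, preserve_length: bool = False, pad_base: str = "N") -> str:
    """Splice alt in place of ref at 0-based rel_pos (hypothetical helper)."""
    observed = window[rel_pos:rel_pos + len(ref)]
    if observed.upper() != ref.upper():
        # The documented function raises WindowMismatchError here.
        raise ValueError(f"ref mismatch at {rel_pos}: window has {observed!r}, edit says {ref!r}")
    edited = window[:rel_pos] + alt + window[rel_pos + len(ref):]
    if not preserve_length or len(edited) == len(window):
        return edited
    n = len(window)
    # Assumption: "the side opposite the edit" = the end of the window farther from rel_pos.
    edit_on_left = rel_pos < n // 2
    if len(edited) > n:
        # Insertion grew the window: trim the far end.
        return edited[:n] if edit_on_left else edited[len(edited) - n:]
    # Deletion shrank the window: pad the far end (the pad base is an assumption).
    pad = pad_base * (n - len(edited))
    return edited + pad if edit_on_left else pad + edited
```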
apply_edits
Apply a sequence of edits to window.
The edits are sorted by descending rel_pos and applied in that
order (INV-ARCH-4). Edits must not overlap in genomic coordinates;
overlap raises OverlappingEditsError.
Equivalent inputs (the same set of edits in any caller-supplied order) produce equivalent outputs: after the internal sort the function is order-invariant, which is the property the training pipeline relies on.
The preserve_length flag truncates / pads back to the input
window length using the position of the first (left-most)
edit as the reference locus, so the side opposite the edit cluster
is the one trimmed.
Source code in geno_lewm/action/apply.py
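Applying edits in descending rel_pos order means each splice only shifts bases to its right. The offsets of the edits still to be applied therefore stay valid. The sketch below assumes edits are simple (rel_pos, ref, alt) tuples and that "overlap" means overlapping reference spans. Edit and apply_edits_sketch are hypothetical names, and preserve_length is omitted.

```python
from typing import NamedTuple


class Edit(NamedTuple):  # hypothetical stand-in for RelEdit
    rel_pos: int
    ref: str
    alt: str


def apply_edits_sketch(window: str, edits: list[Edit]) -> str:
    """Apply non-overlapping edits right-to-left so left-hand offsets stay valid."""
    ordered = sorted(edits, key=lambda e: e.rel_pos, reverse=True)
    # Reject overlapping reference spans [rel_pos, rel_pos + len(ref)).
    for right, left in zip(ordered, ordered[1:]):
        if left.rel_pos + len(left.ref) > right.rel_pos:
            raise ValueError(f"edits at {left.rel_pos} and {right.rel_pos} overlap")
    out = window
    for e in ordered:
        if out[e.rel_pos:e.rel_pos + len(e.ref)].upper() != e.ref.upper():
            raise ValueError(f"ref mismatch at {e.rel_pos}")
        out = out[:e.rel_pos] + e.alt + out[e.rel_pos + len(e.ref):]
    return out
```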
indel
indel(window: str, n: int, *, rng: Random, length_dist: Mapping[int, float] | Sequence[float] | None = None, type_mix: tuple[float, float] = (0.5, 0.5), edge_margin: int = DEFAULT_EDGE_MARGIN) -> list[RelEdit]
Sample n indels (INS or DEL).
length_dist is the distribution over event length (the number of bases
inserted or deleted, excluding the VCF anchor base). The default is a
truncated geometric distribution over [1, V1_MAX_LEN-1].
type_mix is (p_ins, p_del); the default is 50/50.
Source code in geno_lewm/action/synthetic.py
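The default length distribution can be sketched as a geometric distribution truncated to [1, max_len - 1] and then renormalized. The success probability p = 0.5 is an assumption. The max_len argument stands in for V1_MAX_LEN, whose value is not given here. All helpers below are hypothetical, including the separate INS/DEL draw from type_mix.

```python
import random


def truncated_geometric_weights(max_len: int, p: float = 0.5) -> dict[int, float]:
    """P(L = k) proportional to (1 - p)**(k - 1) * p for k in [1, max_len - 1], renormalized."""
    support = range(1, max_len)  # mirrors [1, V1_MAX_LEN-1]; max_len is a stand-in
    raw = {k: (1 - p) ** (k - 1) * p for k in support}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def sample_indel_lengths(n: int, rng: random.Random, max_len: int, p: float = 0.5) -> list[int]:
    """Draw n event lengths (bases inserted or deleted, excluding the anchor base)."""
    weights = truncated_geometric_weights(max_len, p)
    return rng.choices(list(weights), weights=list(weights.values()), k=n)


def sample_indel_type(rng: random.Random, type_mix: tuple[float, float] = (0.5, 0.5)) -> str:
    """Pick "INS" with probability p_ins, otherwise "DEL" (assumes p_ins + p_del == 1)."""
    p_ins, _p_del = type_mix
    return "INS" if rng.random() < p_ins else "DEL"
```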
mnv
mnv(window: str, n: int, *, rng: Random, length_dist: Mapping[int, float] | Sequence[float] | None = None, edge_margin: int = DEFAULT_EDGE_MARGIN) -> list[RelEdit]
Sample n MNVs (length-preserving multi-base substitutions).
Length is drawn from length_dist (default: uniform over [2, 8],
per the RFC text). The alt is guaranteed to differ from ref at every
base (otherwise constructing a RelEdit with that ref/alt would be
rejected by EditSpec validation).
Source code in geno_lewm/action/synthetic.py
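To guarantee that the alt differs at every base, draw each alt base from the three bases other than the reference base. mnv_alt and sample_mnv below are hypothetical helpers that return plain tuples rather than RelEdit objects. The handling of non-ACGT reference bases and the placement of edge_margin are also assumptions.

```python
import random

BASES = "ACGT"


def mnv_alt(ref: str, rng: random.Random) -> str:
    """Draw an alt that differs from ref at every position (hypothetical helper)."""
    # For each reference base, pick uniformly among the other bases.
    # A non-ACGT base such as N leaves all four choices open (an assumption).
    return "".join(rng.choice([b for b in BASES if b != r.upper()]) for r in ref)


def sample_mnv(window: str, rng: random.Random, edge_margin: int = 0) -> tuple[int, str, str]:
    """Return (rel_pos, ref, alt) for one MNV; length uniform over [2, 8]."""
    length = rng.randint(2, 8)  # randint is inclusive on both ends
    start = rng.randint(edge_margin, len(window) - edge_margin - length)
    ref = window[start:start + length]
    return start, ref, mnv_alt(ref, rng)
```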
uniform_snv
uniform_snv(window: str, n: int, *, rng: Random, edge_margin: int = DEFAULT_EDGE_MARGIN) -> list[RelEdit]
Sample n uniform SNVs anchored inside window.
Each SNV's alt is uniformly drawn from the three non-reference
bases at the chosen position, so the contract "alt is always
non-reference" is enforced by construction.
Returns edits in the order they were sampled. Positions are drawn with replacement, so the list may contain several edits at the same position; the caller (data pipeline) is responsible for deduplication if it needs disjoint edits.
Source code in geno_lewm/action/synthetic.py
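The "alt is always non-reference" contract follows from choosing among the three other bases. uniform_snvs is a hypothetical helper that returns (rel_pos, ref, alt) tuples. Skipping non-ACGT positions is an assumption, and because positions are sampled with replacement, duplicates can occur, as the text notes.

```python
import random

BASES = "ACGT"


def uniform_snvs(window: str, n: int, *, rng: random.Random,
                 edge_margin: int = 0) -> list[tuple[int, str, str]]:
    """Sample n (rel_pos, ref, alt) SNVs; positions may repeat (hypothetical helper)."""
    # Assumption: only ACGT positions outside the edge margins are eligible.
    candidates = [i for i in range(edge_margin, len(window) - edge_margin)
                  if window[i].upper() in BASES]
    out = []
    for _ in range(n):
        pos = rng.choice(candidates)  # with replacement, so duplicates are possible
        ref = window[pos].upper()
        alt = rng.choice([b for b in BASES if b != ref])  # one of the three non-ref bases
        out.append((pos, ref, alt))
    return out
```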
deprecated
Mark obj as deprecated.
Usage::

    @deprecated("use new_thing() instead; removed in v0.3")
    def old_thing(...): ...

Emits DeprecationWarning once per process per call
site, that is, once per (filename, lineno) of the calling
code. Multiple call sites all warn; the same site warns only once.
reason is appended to the warning message; pass a short
actionable string ("use X instead").
Source code in geno_lewm/api.py
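The once-per-call-site rule can be sketched with a per-function set of (filename, lineno) pairs. deprecated_sketch is hypothetical and relies on the CPython-specific sys._getframe. Python's default warning filters hide DeprecationWarning outside __main__ unless a filter enables it; that behaviour is separate from the de-duplication shown here.

```python
import functools
import sys
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_sketch(reason: str) -> Callable[[F], F]:
    """Warn once per (filename, lineno) of the caller; not geno_lewm.api.deprecated itself."""
    def decorate(func: F) -> F:
        seen: set[tuple[str, int]] = set()

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            caller = sys._getframe(1)  # CPython-specific: the frame that called func
            site = (caller.f_code.co_filename, caller.f_lineno)
            if site not in seen:
                seen.add(site)
                warnings.warn(f"{func.__qualname__} is deprecated; {reason}",
                              DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]
    return decorate
```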
experimental
Mark obj (a function or class) as experimental.
Usage::

    @experimental
    def f(...): ...

    @experimental(reason="API shape under review")
    class C: ...

Emits FutureWarning once per process when the decorated
object is first invoked (function) or instantiated (class). Later
calls are silent.
Source code in geno_lewm/api.py
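A decorator that works both bare and with a keyword argument can be built by checking whether it received the target directly. experimental_sketch is a hypothetical illustration of the documented behaviour: a FutureWarning on the first call or instantiation, then silence. It is not the package's code, and the warning text is an assumption.

```python
import functools
import warnings
from typing import Any


def experimental_sketch(obj: Any = None, *, reason: str = "") -> Any:
    """Usable as @experimental_sketch or @experimental_sketch(reason=...)."""
    def decorate(target: Any) -> Any:
        state = {"warned": False}
        name = getattr(target, "__qualname__", repr(target))

        def warn_once() -> None:
            if not state["warned"]:
                state["warned"] = True
                msg = f"{name} is experimental" + (f": {reason}" if reason else "")
                warnings.warn(msg, FutureWarning, stacklevel=3)

        if isinstance(target, type):
            # Classes: warn on first instantiation by wrapping __init__.
            original_init = target.__init__

            @functools.wraps(original_init)
            def __init__(self: Any, *args: Any, **kwargs: Any) -> None:
                warn_once()
                original_init(self, *args, **kwargs)

            target.__init__ = __init__
            return target

        # Functions: warn on first call.
        @functools.wraps(target)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warn_once()
            return target(*args, **kwargs)
        return wrapper

    return decorate(obj) if obj is not None else decorate
```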
fail_closed_network_guard
Block common network entry points inside an inference path.
Source code in geno_lewm/deploy/runtime.py
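One common way to fail closed is to patch socket entry points for the duration of a block. This is a sketch, not the package's guard. NetworkBlocked stands in for NetworkCallProhibitedError. The list of patched entry points is an assumption, because the text only says "common network entry points".

```python
import contextlib
import socket
from typing import Any, Iterator
from unittest import mock


class NetworkBlocked(RuntimeError):
    """Hypothetical stand-in for NetworkCallProhibitedError."""


@contextlib.contextmanager
def no_network() -> Iterator[None]:
    """Make common socket entry points raise inside the block, then restore them."""
    def refuse(*args: Any, **kwargs: Any) -> Any:
        raise NetworkBlocked("network call attempted inside a fail-closed inference path")

    targets = (
        (socket.socket, "connect"),
        (socket.socket, "connect_ex"),
        (socket, "create_connection"),
        (socket, "getaddrinfo"),  # DNS lookups that go through socket.getaddrinfo
    )
    with contextlib.ExitStack() as stack:
        for owner, name in targets:
            # mock.patch.object restores (or removes) each attribute on exit.
            stack.enter_context(mock.patch.object(owner, name, refuse))
        yield
```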
probe_backends
Probe runtime backends in RFC-0010 auto-selection order.
Source code in geno_lewm/deploy/runtime.py
select_backend
Select a backend from probe results, or raise if the requested one is unavailable.
Source code in geno_lewm/deploy/runtime.py
exit_code_for
Return the CLI exit code for exc.
Used by geno_lewm.cli._dispatch. Non-GenoLeWM exceptions map to
exit code 1 ("uncategorized failure"); KeyboardInterrupt maps to
130.
Source code in geno_lewm/errors.py
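The documented mapping can be sketched as follows: typed errors carry their own code, unknown exceptions map to 1, and KeyboardInterrupt maps to 130. SketchAppError, SketchInputError, and their exit_code values are hypothetical. The real per-error codes presumably come from the ERROR_CODES registry, which is not reproduced here.

```python
class SketchAppError(Exception):
    """Hypothetical stand-in for GenoLeWMError with a per-class exit code."""
    exit_code = 2  # hypothetical value


class SketchInputError(SketchAppError):
    exit_code = 3  # hypothetical value


def exit_code_for_sketch(exc: BaseException) -> int:
    if isinstance(exc, KeyboardInterrupt):
        return 130  # shell convention: 128 + SIGINT (signal 2)
    if isinstance(exc, SketchAppError):
        return exc.exit_code
    return 1  # "uncategorized failure" for exceptions outside the hierarchy
```

A CLI entry point would typically catch BaseException at the top level and pass the result to sys.exit.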
current_trace_context
get_logger
get_logger(component: str, *, run_id: str | None = None, log_dir: str | PathLike[str] | None = None, level: Severity | None = None, pretty: bool | None = None) -> GenoLeWMLogger
Return a logger bound to component.
Loggers are cached by (component, run_id, log_dir); calling
get_logger twice with the same arguments returns the same
instance, so independent subsystems share one ordered stream per
run.
Defaults:
- run_id: $GENO_LEWM_RUN_ID, or a random run-<hex>.
- log_dir: $GENO_LEWM_LOG_DIR, or ~/.geno-lewm/logs.
- level: $GENO_LEWM_LOG_LEVEL (default info).
- pretty: TTY-detected, overridable by $GENO_LEWM_LOG_FORMAT.
Source code in geno_lewm/observability.py
logged_run
logged_run(component: str = 'runtime', *, run_id: str | None = None, log_dir: str | PathLike[str] | None = None, start_event: str | None = None, end_event: str | None = None, start_data: Mapping[str, Any] | None = None) -> Iterator[GenoLeWMLogger]
Open a sink for the run; flush on exit; never swallow exceptions.
The wrapper guarantees that any records emitted up to a crash are
flushed to disk (INV-OBS-6: "a crash before logger init still
produces a sanitized minimal record"). Optional start_event /
end_event book-end the run. If the block raises and the
exception is a geno_lewm.errors.GenoLeWMError, an error
record is emitted before the exception propagates.
Source code in geno_lewm/observability.py
set_trace_context
Push (trace_id, span_id) into the contextvar for the block.
Source code in geno_lewm/observability.py
canonical_json_sha256
compute_input_commitment
compute_input_commitment(reference_window: str, edit_spec: EditSpec, pooling_config: PoolingConfig, dtype_config: DtypeConfig) -> str
Return the "sha256:<hex>" input commitment for a scoring call.
The canonical payload is a dict with fixed keys; canonical-JSON encoding handles ordering and stability.
Source code in geno_lewm/provenance/commitment.py
compute_output_commitment
Compute the output-commitment hash for an output block.
Separated from Receipt so callers can pre-compute the
commitment before assembling the receipt.
Source code in geno_lewm/provenance/receipt.py
load_manifest
Load and validate a manifest from disk.
Source code in geno_lewm/provenance/manifest.py
read_receipt
Load and validate a receipt from disk.
Source code in geno_lewm/provenance/receipt.py
sha256_bytes
sha256_file
Return "sha256:<hex>" for the file at path.
Streams the file in 1 MiB chunks; safe for arbitrarily large artifacts (weights files can be multi-GB).
Source code in geno_lewm/provenance/hashing.py
write_manifest
Write a manifest to disk as canonical JSON.
The on-disk bytes are byte-stable across platforms.
Source code in geno_lewm/provenance/manifest.py
write_receipt
Write a receipt as canonical JSON; round-trip byte-stable.
Source code in geno_lewm/provenance/receipt.py
score_variant
score_variant(variant: EditSpec, encoder: object, action_encoder: object, predictor: object, calibration: CalibrationTable, *, reference_window: str, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean', min_bucket_size: int = DEFAULT_MIN_BUCKET_SIZE) -> SurpriseResult
Score one edit against a caller-supplied reference window.
The scorer is intentionally model-object agnostic: callers can pass
the concrete training-time modules or small deterministic fakes.
FASTA-backed window extraction is available through score_vcf;
checkpoint loading is owned by higher runtime layers.
Source code in geno_lewm/surprise/score.py
score_vcf
score_vcf(vcf_path: str | Path, encoder: object, action_encoder: object, predictor: object, calibration: CalibrationTable, output_path: str | Path, *, reference_windows: Mapping[str, str] | None = None, reference_fasta: str | Path | None = None, window_bp: int = DEFAULT_WINDOW_BP, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean', show_progress: bool = True, batch_size: int = 64, min_bucket_size: int = DEFAULT_MIN_BUCKET_SIZE) -> Path
Score VCF rows and write one JSON object per scored alternate.
Pass reference_fasta for local FASTA-backed window extraction.
reference_windows remains useful for tests and already-extracted
windows. Mapping keys are tried in this order:
chrom:pos:ref:alt, chrom:pos, then chrom.