geno_lewm.surprise.score
score
Surprise scoring for action-conditioned DNA world-model predictions.
Aggregation
module-attribute
Supported aggregation modes for multi-step predictor outputs.
SurpriseResult
dataclass
SurpriseResult(sigma_raw: float, sigma_calibrated: float, bucket_id: str, confidence: float, low_confidence: bool)
Calibrated surprise score for one edit.
to_dict
Return a JSON-native payload for CLI and JSONL outputs.
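As a rough illustration of what a JSON-native payload means here, the sketch below uses a hypothetical stand-in dataclass, `SurpriseResultSketch`, that mirrors the five documented fields. The real `to_dict` body is not shown on this page, so the `asdict`-based implementation and all field values are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SurpriseResultSketch:
    """Hypothetical stand-in mirroring the documented SurpriseResult fields."""

    sigma_raw: float
    sigma_calibrated: float
    bucket_id: str
    confidence: float
    low_confidence: bool

    def to_dict(self) -> dict:
        # Assumption: every field is already a float, str, or bool,
        # so asdict() yields a payload json.dumps can serialize directly.
        return asdict(self)


# Illustrative values only; they are not real model output.
result = SurpriseResultSketch(
    sigma_raw=2.5,
    sigma_calibrated=1.25,
    bucket_id="example_bucket",
    confidence=0.9,
    low_confidence=False,
)

# One serialized object per line is the JSONL convention.
jsonl_line = json.dumps(result.to_dict())
```

Because the payload contains only plain floats, strings, and booleans, it round-trips through `json.loads` without custom encoders.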
Source code in geno_lewm/surprise/score.py
score_variant
score_variant(variant: EditSpec, encoder: object, action_encoder: object, predictor: object, calibration: CalibrationTable, *, reference_window: str, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean', min_bucket_size: int = DEFAULT_MIN_BUCKET_SIZE) -> SurpriseResult
Score one edit against a caller-supplied reference window.
The scorer is intentionally model-object agnostic: callers can pass
the concrete training-time modules or small deterministic fakes.
FASTA-backed window extraction is available through :func:`score_vcf`;
checkpoint loading is owned by higher runtime layers.
raw_surprise_example
raw_surprise_example(variant: EditSpec, encoder: object, action_encoder: object, predictor: object, *, reference_window: str, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean') -> CalibrationExample
Score one reference variant into a :class:`CalibrationExample`.
Runs the model (no calibration table needed) and returns the variant's
functional `bucket_id` paired with its raw surprise, ready to feed
:func:`geno_lewm.surprise.calibration.build_calibration_table`.
score_vcf
score_vcf(vcf_path: str | Path, encoder: object, action_encoder: object, predictor: object, calibration: CalibrationTable, output_path: str | Path, *, reference_windows: Mapping[str, str] | None = None, reference_fasta: str | Path | None = None, window_bp: int = DEFAULT_WINDOW_BP, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean', show_progress: bool = True, batch_size: int = 64, min_bucket_size: int = DEFAULT_MIN_BUCKET_SIZE) -> Path
Score VCF rows and write one JSON object per scored alternate.
Pass `reference_fasta` for local FASTA-backed window extraction.
`reference_windows` remains useful for tests and already-extracted
windows. Mapping keys are tried in this order:
`chrom:pos:ref:alt`, `chrom:pos`, then `chrom`.
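The key fallback order described above can be sketched as a small lookup helper. `lookup_reference_window` is a hypothetical name, not part of the module, and the exact key formatting (colon-joined fields, position rendered as a plain integer) is an assumption based on the pattern shown above.

```python
from typing import Mapping, Optional


def lookup_reference_window(
    windows: Mapping[str, str],
    chrom: str,
    pos: int,
    ref: str,
    alt: str,
) -> Optional[str]:
    """Hypothetical helper: return the first window matching the documented key order."""
    # Most specific key first, then progressively broader fallbacks.
    candidates = (
        f"{chrom}:{pos}:{ref}:{alt}",
        f"{chrom}:{pos}",
        chrom,
    )
    for key in candidates:
        if key in windows:
            return windows[key]
    return None


# Illustrative windows only; real windows would be much longer sequences.
windows = {"chr1": "ACGTACGT", "chr1:5": "TTTTACGT"}

specific = lookup_reference_window(windows, "chr1", 5, "A", "G")   # hits "chr1:5"
fallback = lookup_reference_window(windows, "chr1", 9, "C", "T")   # falls back to "chr1"
missing = lookup_reference_window(windows, "chr2", 1, "A", "C")    # no matching key
```

Trying the most specific key first lets a test supply one chromosome-wide window and override it for individual sites only where needed.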
build_calibration_examples_from_vcf
build_calibration_examples_from_vcf(vcf_path: str | Path, encoder: object, action_encoder: object, predictor: object, *, reference_windows: Mapping[str, str] | None = None, reference_fasta: str | Path | None = None, window_bp: int = DEFAULT_WINDOW_BP, window_start_bp: int = 0, region: str | Sequence[str] | None = None, repeat: str | Sequence[str] | None = None, aggregation: str = 'mean') -> list[CalibrationExample]
Score a background VCF into :class:`CalibrationExample` rows.
Runs the model over every scoreable VCF alternate (FASTA-backed window
extraction) and records `(bucket_id, sigma_raw)` per variant. No
calibration table is required, so the result can seed
:func:`geno_lewm.surprise.calibration.build_calibration_table`.
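To picture the shape of the returned data, the sketch below groups `(bucket_id, sigma_raw)` pairs into per-bucket lists of raw scores. `ExampleRow` and `group_by_bucket` are hypothetical names, and the bucket labels and scores are made up. How `build_calibration_table` actually consumes the rows is not described on this page, so this shows only the data shape, not the calibration procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ExampleRow(NamedTuple):
    """Hypothetical stand-in for a CalibrationExample row."""

    bucket_id: str
    sigma_raw: float


def group_by_bucket(rows: Iterable[ExampleRow]) -> Dict[str, List[float]]:
    """Collect raw surprise scores into one list per bucket, preserving input order."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        grouped[row.bucket_id].append(row.sigma_raw)
    return dict(grouped)


# Illustrative rows only; bucket names and scores are not real output.
rows = [
    ExampleRow("bucket_a", 0.5),
    ExampleRow("bucket_b", 2.0),
    ExampleRow("bucket_a", 1.5),
]
by_bucket = group_by_bucket(rows)
```

Viewed this way, each bucket carries its own background distribution of raw scores, and the number of rows in a bucket is simply the length of its list.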