geno_lewm.action.spec

spec

Canonical edit types — EditSpec, EditType, RelEdit.

Defined by RFC-0003 §3.1–§3.3 and docs/spec/03-data-model.md#editspec.

Every downstream subsystem (encoder, predictor, planner, scorer) consumes these types. They are intentionally tiny dataclasses with hard validation — no Optional fields, no string-typed kinds, no post-construction mutation.

Validation maps to the typed error hierarchy from RFC-0012:

  • malformed input (bad bases, ref == alt, pos < 1, etc.) → geno_lewm.errors.InvalidEditError.
  • len(ref) > 16 or len(alt) > 16 (i.e. a structural variant) → geno_lewm.errors.UnsupportedEditError, whose details payload includes edit_type=EditType.SV.
  • rel_pos outside the window during EditSpec.relative_to → geno_lewm.errors.OutOfWindowError.
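
The first two rules can be sketched as a standalone validator. This is a minimal illustration, not the package's code: the exception classes are local stand-ins for those in geno_lewm.errors, validate_edit is a hypothetical helper, and both the accepted alphabet (ACGT only) and the order of the checks are assumptions. The real UnsupportedEditError carries edit_type=EditType.SV in its details; the sketch substitutes the string "SV".

```python
MAX_ALLELE_LEN = 16               # from the rule above: longer alleles count as structural variants
VALID_BASES = frozenset("ACGT")   # assumption: the real alphabet may differ (e.g. allow N)


class EditError(Exception):
    """Stand-in base class carrying a structured ``details`` payload."""

    def __init__(self, message, details=None):
        super().__init__(message)
        self.details = details or {}


class InvalidEditError(EditError):
    """Local stand-in for geno_lewm.errors.InvalidEditError."""


class UnsupportedEditError(EditError):
    """Local stand-in for geno_lewm.errors.UnsupportedEditError."""


def validate_edit(pos: int, ref: str, alt: str) -> None:
    """Hypothetical validator: raise the error class the rules above call for."""
    if pos < 1:
        raise InvalidEditError("pos must be >= 1 (VCF is 1-based)", {"pos": pos})
    if not (set(ref) <= VALID_BASES and set(alt) <= VALID_BASES):
        raise InvalidEditError("alleles must be explicit ACGT strings", {"ref": ref, "alt": alt})
    if ref == alt:
        raise InvalidEditError("ref and alt are identical", {"ref": ref, "alt": alt})
    if len(ref) > MAX_ALLELE_LEN or len(alt) > MAX_ALLELE_LEN:
        raise UnsupportedEditError(
            "structural variants are not supported",
            {"edit_type": "SV"},  # the real payload uses EditType.SV
        )


# Usage: a caller can branch on the error class instead of parsing messages.
for args in [(100, "A", "G"), (0, "A", "G"), (100, "A", "A"), (100, "A", "C" * 17)]:
    try:
        validate_edit(*args)
        print(args[0], args[1], "ok")
    except EditError as exc:
        print(args[0], args[1], type(exc).__name__, exc.details)
```

Because each failure mode has its own class, a pipeline can skip unsupported edits while still treating malformed input as a hard error.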

EditType

Bases: IntEnum

The six v1 edit categories (RFC-0003 §3.2).

Members are deterministic functions of (len(ref), len(alt)) — callers do not pass this value; it is computed during construction.

EditSpec dataclass

EditSpec(chrom: str, pos: int, ref: str, alt: str, edit_type: EditType = EditType.SNV)

A canonical, frozen genomic edit (RFC-0003 §3.1).

Construct with absolute VCF-style coordinates; the derived edit_type attribute is filled in by __post_init__.

pos is 1-based per VCF convention; both ref and alt are explicit base strings (no <DEL> / <INS> symbolic alleles — they're deferred to v2).
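
One common way to fill a derived field on a frozen dataclass is to assign it in __post_init__ through object.__setattr__. The sketch below shows that pattern only; it is not the EditSpec implementation. SketchEdit, SketchEditType, and classify are hypothetical names, the two-member enum is a reduced stand-in (not the six real categories), and the rule "SNV means one reference base and one alternate base" is an assumption. Unlike the signature shown above, the sketch excludes edit_type from the constructor altogether.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class SketchEditType(IntEnum):
    """Reduced stand-in enum; NOT the six members of the real EditType."""

    SNV = 0
    OTHER = 1  # placeholder for the categories this page does not enumerate


def classify(ref_len: int, alt_len: int) -> SketchEditType:
    """Deterministic function of the two allele lengths (assumed rule)."""
    if ref_len == 1 and alt_len == 1:
        return SketchEditType.SNV
    return SketchEditType.OTHER


@dataclass(frozen=True)
class SketchEdit:
    chrom: str
    pos: int  # 1-based, VCF convention
    ref: str
    alt: str
    edit_type: SketchEditType = field(init=False)  # derived, never passed by callers

    def __post_init__(self) -> None:
        # frozen=True blocks normal attribute assignment, so the derived
        # field is set through object.__setattr__ during construction.
        object.__setattr__(self, "edit_type", classify(len(self.ref), len(self.alt)))


snv = SketchEdit("chr1", 100, "A", "G")
deletion = SketchEdit("chr1", 100, "AT", "A")
print(snv.edit_type.name, deletion.edit_type.name)
```

After construction, any attempt such as snv.pos = 5 raises dataclasses.FrozenInstanceError, which is what rules out post-construction mutation in this pattern.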

relative_to

relative_to(window_start_bp: int, window_end_bp: int) -> RelEdit

Return the window-relative form (RFC-0003 §3.3).

window_start_bp and window_end_bp are 0-based inclusive coordinates on the same chromosome as chrom. The predictor sees only the relative offset; absolute coordinates never enter the model.

Source code in geno_lewm/action/spec.py
def relative_to(self, window_start_bp: int, window_end_bp: int) -> RelEdit:
    """Return the window-relative form (RFC-0003 §3.3).

    ``window_start_bp`` and ``window_end_bp`` are 0-based inclusive
    coordinates on the same chromosome as :attr:`chrom`. The
    predictor sees only the relative offset; absolute coordinates
    never enter the model.
    """
    if window_end_bp < window_start_bp:
        raise InvalidEditError(
            "window_end_bp must be >= window_start_bp",
            details={"start": window_start_bp, "end": window_end_bp},
        )
    rel_pos = self.pos - 1 - window_start_bp  # convert 1-based VCF → 0-based offset
    if rel_pos < 0 or rel_pos + len(self.ref) > (window_end_bp - window_start_bp + 1):
        raise OutOfWindowError(
            "edit falls outside the window",
            details={
                "pos": self.pos,
                "ref_len": len(self.ref),
                "window_start_bp": window_start_bp,
                "window_end_bp": window_end_bp,
                "rel_pos": rel_pos,
            },
            remediation="re-center the encoder window over the edit, or skip the edit",
        )
    return RelEdit(
        rel_pos=rel_pos,
        edit_type=self.edit_type,
        ref_bases=self.ref,
        alt_bases=self.alt,
    )
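
The coordinate arithmetic in the listing above can be traced with a few concrete numbers. The helper below is a hypothetical, dependency-free copy of the same bounds check (relative_offset and OutOfWindow are names invented for this sketch); it returns only the offset rather than a RelEdit.

```python
class OutOfWindow(Exception):
    """Local stand-in for geno_lewm.errors.OutOfWindowError."""


def relative_offset(pos: int, ref_len: int, window_start_bp: int, window_end_bp: int) -> int:
    """Hypothetical helper mirroring the bounds check in relative_to."""
    if window_end_bp < window_start_bp:
        raise ValueError("window_end_bp must be >= window_start_bp")
    window_len = window_end_bp - window_start_bp + 1  # both ends are inclusive
    rel_pos = pos - 1 - window_start_bp               # 1-based VCF pos -> 0-based offset
    # The whole reference allele must fit inside the window.
    if rel_pos < 0 or rel_pos + ref_len > window_len:
        raise OutOfWindow(f"rel_pos={rel_pos}, ref_len={ref_len}, window_len={window_len}")
    return rel_pos


# Window covering 0-based positions 1000..1999, i.e. 1000 bases.
print(relative_offset(1001, 1, 1000, 1999))  # 0: first base of the window
print(relative_offset(1005, 1, 1000, 1999))  # 4
print(relative_offset(2000, 1, 1000, 1999))  # 999: last base of the window

for pos, ref_len in [(1000, 1), (2000, 2)]:
    try:
        relative_offset(pos, ref_len, 1000, 1999)
    except OutOfWindow as exc:
        print("out of window:", exc)
```

The two rejected cases show both edges: pos=1000 lies one base before the window start (rel_pos is -1), and a two-base reference allele starting at pos=2000 runs one base past the window end.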

RelEdit dataclass

RelEdit(rel_pos: int, edit_type: EditType, ref_bases: str, alt_bases: str)

Window-relative form consumed by the action encoder.