GenoLeWM

Action-conditioned latent world models for genomic edits.

GenoLeWM treats a genomic edit as an action. A frozen DNA encoder maps a reference window to a latent state, and a trainable predictor estimates the post-edit latent state from that state plus the structured edit.
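The state-plus-action framing above can be sketched with a toy example. This is an illustration under stated assumptions, not GenoLeWM's API: `Edit`, `frozen_encoder`, `encode_action`, `LinearPredictor`, and `apply_edit` are hypothetical names, the encoder is a base-composition stand-in for a frozen DNA model, and the predictor is a simple residual linear map trained with plain SGD.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


@dataclass(frozen=True)
class Edit:
    """Hypothetical structured edit: 0-based position in the window, ref and alt alleles."""
    pos: int
    ref: str
    alt: str


def apply_edit(window: str, edit: Edit) -> str:
    """Return the window with ref replaced by alt at edit.pos."""
    end = edit.pos + len(edit.ref)
    if window[edit.pos:end] != edit.ref:
        raise ValueError("ref allele does not match the window")
    return window[:edit.pos] + edit.alt + window[end:]


def frozen_encoder(window: str) -> list[float]:
    """Stand-in for a frozen DNA encoder: base composition as a 4-d latent state."""
    return [window.count(b) / len(window) for b in BASES]


def encode_action(edit: Edit, window_len: int) -> list[float]:
    """Toy action features: relative position plus net count change per base."""
    delta = [float(edit.alt.count(b) - edit.ref.count(b)) for b in BASES]
    return [edit.pos / window_len] + delta


class LinearPredictor:
    """Trainable part of the sketch: z_next_hat = z + W @ action_features."""

    def __init__(self, latent_dim: int, action_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(action_dim)]
                  for _ in range(latent_dim)]

    def predict(self, z: list[float], a: list[float]) -> list[float]:
        return [zi + sum(w * aj for w, aj in zip(row, a))
                for zi, row in zip(z, self.W)]

    def sgd_step(self, z, a, target, lr: float = 0.1) -> float:
        """One squared-error SGD step on W; returns the loss before the update."""
        err = [p - t for p, t in zip(self.predict(z, a), target)]
        for i, e in enumerate(err):
            for j, aj in enumerate(a):
                self.W[i][j] -= lr * 2.0 * e * aj
        return sum(e * e for e in err)


window = "ACGTACGTAC"
edit = Edit(pos=3, ref="T", alt="G")

z = frozen_encoder(window)                          # pre-edit latent state
target = frozen_encoder(apply_edit(window, edit))   # post-edit latent state
a = encode_action(edit, len(window))

predictor = LinearPredictor(latent_dim=len(z), action_dim=len(a))
losses = [predictor.sgd_step(z, a, target) for _ in range(50)]
```

Only `LinearPredictor.W` changes during training; the encoder is never updated, which mirrors the frozen-encoder and trainable-predictor split described above.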

The package is alpha research software. It is installable, typed, tested, benchmarked, and published with public artifacts.

Checkpoint validity notice (2026-07-10): Every published v0.1 and v0.2.1 checkpoint uses legacy_raw_v1. The affected runs have the following defects:

encoder.normalize: true was not applied, so those runs trained and evaluated on raw pooled Carbon states.
Training sources used global pooling while targets were edit-centered.
Historical centered pools were shifted one hidden token left because the leading <dna> token was omitted from the token offset.
Rollout and candidate centers also differed, and cache v1 omitted the center identity.
The pinned Carbon tokenizer.py also performed an unpinned, network-capable Qwen/Qwen3-4B-Base lookup, so the prior runtime was not self-contained.
In the v0.2.1 Phase 2 run, KL was computed from frozen target states and supplied no gradient to the trainable parameters.

Published metrics are historical legacy-implementation outputs. These defects invalidate the checkpoints and metrics as evidence for the intended normalized method. Corrected results require a new l2_normalized_v2 lineage. The local pure-DNA tokenizer and token-layout-aware centering repairs establish code contracts only, not model quality.
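Two of the defects in the notice, the one-token centering shift and the missing normalization, can be illustrated with a small sketch. It assumes a hypothetical layout of one leading special token followed by one hidden state per DNA token, and it assumes that the normalization intended by the l2_normalized_v2 name is a plain L2 rescaling of the pooled vector. `centered_pool` and `l2_normalize` are illustrative names, not functions from the package.

```python
import math


def centered_pool(hidden, center_token, radius, leading_tokens):
    """Mean-pool hidden states in a window around a DNA-token index.

    hidden: per-token vectors, including any leading special tokens.
    center_token: 0-based index counted over DNA tokens only.
    leading_tokens: number of special tokens before the first DNA token.
    """
    center = center_token + leading_tokens       # index into `hidden`
    lo = max(leading_tokens, center - radius)    # never pool special tokens
    hi = min(len(hidden), center + radius + 1)
    rows = hidden[lo:hi]
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]


def l2_normalize(vec, eps=1e-12):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


# Toy hidden states: index 0 stands for the leading <dna> token,
# followed by six DNA tokens whose first coordinate equals their DNA index.
hidden = [[-1.0, -1.0]] + [[float(i), 10.0] for i in range(6)]

# Offset applied: the window covers DNA tokens 2, 3, 4.
correct = centered_pool(hidden, center_token=3, radius=1, leading_tokens=1)

# Offset omitted: the window covers DNA tokens 1, 2, 3, one token to the left.
shifted = centered_pool(hidden, center_token=3, radius=1, leading_tokens=0)

raw_state = correct                  # analogous to a raw pooled state
unit_state = l2_normalize(correct)   # analogous to a normalized state
```

In this toy layout the first coordinate of the pooled vector equals the mean DNA-token index of the window, so the two results differ by exactly one token position.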

Start Here

Public Artifacts

Artifact	Link
PyPI package	https://pypi.org/project/geno-lewm/0.2.1/
Source/wheel release	https://github.com/AbdelStark/GenoLeWM/releases/tag/v0.2.1
Model package	https://huggingface.co/abdelstark/geno-lewm
Dataset package	https://huggingface.co/datasets/abdelstark/geno-lewm-data
Verified v0.3 variant-membership candidate	Exact Hub commit `96e97a7f…`
Verified v0.3 membership split evidence	Exact Hub commit `6d2ec7dd…`
Verified schema-1.1 v0.3 snapshot candidate	Exact Hub commit `712d612d…`
Historical v0.2.1 benchmark/planning/paper tree	https://huggingface.co/abdelstark/geno-lewm-runs/tree/main/geno-lewm-v021-strong-4f36eef-10k-r1
Historical v0.2.1 generated paper	https://huggingface.co/abdelstark/geno-lewm-runs/resolve/main/geno-lewm-v021-strong-4f36eef-10k-r1/paper/paper.serious-completion.md

Core Commands

python -m pip install geno-lewm
geno-lewm-verify examples/data/verify_receipt/receipt.json \
  --manifest examples/data/verify_receipt/manifest.json
geno-lewm-train --fixture-smoke --run-dir /tmp/geno-lewm-smoke --steps 50

Use geno-lewm[eval] for FASTA/VCF evaluation and geno-lewm[train] for Carbon-backed training paths.

Current Boundaries

No corrected normalized-method result is published. Existing checkpoints and metrics are legacy_raw_v1 historical outputs.
The v0.2.1 Phase 2 KL changed the reported scalar loss but had no gradient with respect to the optimized predictor and action encoder.
No clinical utility claim.
No broad model-quality claim.
No K=20 rollout-speed closure.
No useful-planning claim from the current planning demo.
A real checksum-closed v0.3 variant-membership candidate is published and independently verified at exact Hub commit 96e97a7ffe1e9ad8f9a98f690b220a32ac75ddc2. It is not a released v0.3 snapshot or phased-haplotype holdout. Its placed-window and held-role split evidence is separately published and independently verified at exact Hub commit 6d2ec7dd68af636ba8c594774c3c55a236c0995f. A schema-1.1.0 snapshot candidate assembled from those inputs is published and independently verified at exact Hub commit 712d612d85ea6341b8ce17bd3460ff5c2207b802. Its report explicitly keeps released_v03_snapshot=false and phased_haplotype_membership=false; it is not corrected model-quality, benchmark, representativeness, or clinical evidence.
No GenoLeWM-FX model or demo ships; the FX pivot is stopped at the feasibility gate.
The FX precomputed-Borzoi rescue is complete as a no-positive-claim result: the residual lift is small and non-significant, no model-quality claim is open, and exact fipip overlap is not claimed.
No runtime or privacy assurance beyond local execution contracts and checksum provenance.
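The KL boundary listed above (a term that changes the reported scalar loss but supplies no gradient) can be checked numerically on a toy problem. The sketch below is not the project's loss: the diagonal-Gaussian KL against a standard normal, the scalar parameter `theta`, and all function names are assumptions chosen only to show why a term computed from frozen targets cannot move trainable parameters.

```python
import math


def gaussian_kl_to_standard_normal(values):
    """KL(N(mean, var) || N(0, 1)) from the batch mean and variance of a 1-d sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))


def predict(theta, states):
    """Toy trainable predictor with a single scalar parameter."""
    return [theta * s for s in states]


def finite_diff(fn, theta, h=1e-6):
    """Central finite-difference estimate of d fn / d theta."""
    return (fn(theta + h) - fn(theta - h)) / (2.0 * h)


states = [0.5, -1.0, 2.0, 1.5]
frozen_targets = [0.6, -0.8, 2.2, 1.1]   # fixed; do not depend on theta


def kl_from_targets(theta):
    # Legacy-style term: theta never enters the computation.
    return gaussian_kl_to_standard_normal(frozen_targets)


def kl_from_predictions(theta):
    # Term computed from predictor outputs: theta enters through predict().
    return gaussian_kl_to_standard_normal(predict(theta, states))


theta0 = 0.9
grad_from_targets = finite_diff(kl_from_targets, theta0)
grad_from_predictions = finite_diff(kl_from_predictions, theta0)
```

Here `grad_from_targets` is exactly zero because the function ignores `theta`, while `grad_from_predictions` is non-zero. Adding the first term to a training loss shifts the logged value without changing any parameter update.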