01 — Architecture
- Status: Authoritative for v0.1
- Companion RFCs: RFC-0002, RFC-0003, RFC-0004, RFC-0008, RFC-0009, RFC-0010
- Narrative companion: ARCHITECTURE.md
Module boundaries
The repository is organized so that every RFC has exactly one home module and every module has exactly one defining RFC. This map lists the implemented package layout; planned contracts that do not yet have code are tracked in the roadmap rather than shown as present modules.
geno_lewm/
├── encoder/ # RFC-0002 — Carbon state encoder + window cache
│ ├── carbon.py
│ ├── windowing.py
│ ├── pooling.py
│ └── cache.py
├── action/ # RFC-0003 — EditSpec + action encoder
│ ├── encoder.py
│ ├── spec.py
│ ├── apply.py
│ └── synthetic.py
├── predictor/ # RFC-0004 — cross-attention predictor + losses
│ ├── model.py
│ ├── ar.py
│ └── losses.py
├── data/ # RFC-0006 — corpus, loaders, holdouts, tuple builder
│ ├── corpus.py
│ ├── gnomad.py
│ ├── clinvar.py
│ ├── builder.py
│ └── _vcf.py
├── evaluation.py # RFC-0007 — artifact metrics and eval report payloads
├── carbon_zero_shot.py # RFC-0007 — Carbon baseline score artifacts
├── planning/ # RFC-0008 — cost functions and action sampling
│ ├── costs.py
│ └── sampling.py
├── surprise/ # RFC-0009 — raw + calibrated surprise
│ ├── context.py
│ ├── score.py
│ └── calibration.py
├── deploy/ # RFC-0010 + RFC-0019 — runtime and import helpers
│ ├── runtime.py
│ └── import_/
├── provenance/ # RFC-0011 — preferred manifests, receipts, hashes API
│ ├── commitment.py
│ ├── hashing.py
│ ├── manifest.py
│ └── receipt.py
├── cli/ # RFC-0018 — command-line surface
│ ├── train.py
│ ├── score.py
│ ├── rollout.py
│ ├── plan.py
│ ├── eval.py
│ ├── eval_all.py
│ ├── carbon_baseline.py
│ ├── export.py
│ ├── prepare_gnomad.py
│ ├── prepare_clinvar.py
│ ├── update.py
│ └── verify.py
├── config/ # RFC-0017 — Hydra config schema
│ ├── loader.py
│ ├── schema.py
│ └── defaults/
├── training/ # RFC-0005/RFC-0006 — training and preflight helpers
│ ├── fixture.py
│ ├── preflight.py
│ ├── real.py
│ ├── trainer.py
│ ├── collapse.py
│ └── sampling.py
├── errors.py # RFC-0012 — exception hierarchy (single file by design)
├── observability.py # RFC-0013 — logger factory, metric names, redaction
├── metrics.py # RFC-0013 — metrics registry and export
└── __init__.py
Tests mirror the layout under tests/, with tests/unit/ for unit tests,
tests/integration/ for cross-module tests, tests/property/ for
property-based tests, and tests/ml/ for ML-specific tests (collapse
checks, rollout fidelity smoke tests). See
07-testing-strategy.md.
Runtime data flow
Training
corpus.py + synthetic samplers ──► builder.py
│
│ (w_ref, a, w_alt)
▼
encoder/carbon.py (frozen)
│
│ s_t, s_{t+1}
▼
action/encoder.py ──► a_emb
│
▼
predictor/model.py ──► ŝ_{t+1}
│
▼
predictor/losses.py ──► L_pred
│
▼
optimizer step
(predictor params only in Phase 1)
Cache hits dominate from epoch 2 onward; see RFC-0006 §6.
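The cache behaviour can be illustrated with a small sketch. It assumes the key is a SHA-256 digest over the five fields listed in INV-ARCH-5 and uses an in-memory dictionary in place of the on-disk shard store. The names `cache_key`, `WindowCache`, and `run_epochs`, the encoder tag `"encoder-v0"`, and the layer and pooling values are all hypothetical; the real layout lives in encoder/cache.py and RFC-0006.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple


def cache_key(window_hash: str, encoder_hash: str, state_layer: int,
              pool_type: str, pool_radius: int) -> str:
    """Derive one deterministic key from the five INV-ARCH-5 fields (assumed scheme)."""
    payload = json.dumps(
        [window_hash, encoder_hash, state_layer, pool_type, pool_radius],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class WindowCache:
    """In-memory stand-in for the on-disk store; hypothetical, for illustration."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_encode(self, key: str, encode: Callable[[], List[float]]) -> List[float]:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # A miss is not an error: encode live, then remember the result.
        self.misses += 1
        state = encode()
        self._store[key] = state
        return state


def run_epochs(windows: List[str], n_epochs: int = 2) -> List[Tuple[int, int]]:
    """Return (hits, misses) per epoch over a fixed set of windows."""
    cache = WindowCache()
    per_epoch = []
    for _ in range(n_epochs):
        hits0, misses0 = cache.hits, cache.misses
        for window in windows:
            window_hash = hashlib.sha256(window.encode("utf-8")).hexdigest()
            key = cache_key(window_hash, "encoder-v0", 12, "mean", 4)
            # Toy "encoding": a one-element vector; the real call is the frozen encoder.
            cache.get_or_encode(key, lambda w=window: [float(len(w))])
        per_epoch.append((cache.hits - hits0, cache.misses - misses0))
    return per_epoch
```

With a fixed set of distinct windows, the first epoch consists only of misses and every later epoch only of hits, which is the pattern the sentence above describes. Changing any key field, for example `pool_radius`, yields a different key and therefore a fresh entry.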
Inference — single-variant scoring
variant ──► window lookup ──► [cache hit?]
                                   │
                        ┌──────────┴──────────┐
                        │ yes                 │ no
                        ▼                     ▼
                  s_t from disk        Carbon-500M call
                        │                     │
                        └──────────┬──────────┘
                                   ▼
                  action encoder + predictor ──► ŝ_{t+1}
                                   │
                                   ▼
                  Carbon-500M on edited window ──► s_{t+1}
                                   │
                                   ▼
                  surprise.score ──► σ_raw, σ_calibrated
                                   │
                                   ▼
                             receipt.write
For haplotype rollout, the second Carbon call is omitted; the predictor's output is the predicted final latent.
For planning, neither Carbon call occurs inside the search loop. The search amortizes one Carbon call across thousands of predictor rollouts.
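The difference in encoder-call counts between scoring and planning can be sketched with stubs. Everything here is a stand-in: `StubEncoder` replaces Carbon with a base-count vector, `stub_predictor` is a toy update rule, and the raw surprise is assumed to be the squared Euclidean distance between predicted and observed latents. Calibration and receipts are omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
BASES = "ACGT"


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class StubEncoder:
    """Stand-in for the frozen Carbon encoder; counts live calls."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, window: str) -> Vector:
        self.calls += 1
        return l2_normalize([window.count(b) for b in BASES])


def stub_predictor(s_t: Vector, alt: str) -> Vector:
    """Toy predictor: nudge the latent toward the alt base, then renormalize."""
    shifted = list(s_t)
    shifted[BASES.index(alt)] += 0.25
    return l2_normalize(shifted)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_variant(window: str, rel_pos: int, alt: str,
                  encoder: StubEncoder, cache: Dict[str, Vector]) -> float:
    """Single-variant scoring: at most two encoder calls, one on a cache hit."""
    s_t = cache.get(window)
    if s_t is None:                      # cache miss: live call on the reference window
        s_t = encoder.encode(window)
        cache[window] = s_t
    s_hat = stub_predictor(s_t, alt)     # predicted next latent
    edited = window[:rel_pos] + alt + window[rel_pos + 1:]
    s_next = encoder.encode(edited)      # second call, on the edited window
    return sq_dist(s_hat, s_next)        # assumed form of the raw surprise


def plan(window: str, candidates: List[List[str]], target: Vector,
         encoder: StubEncoder) -> List[str]:
    """Planning: one encoder call up front, predictor-only rollouts in the loop."""
    s_0 = encoder.encode(window)

    def cost(alts: List[str]) -> float:
        s = s_0
        for alt in alts:
            s = stub_predictor(s, alt)
        return sq_dist(s, target)        # hypothetical cost: distance to a target latent

    return min(candidates, key=cost)
```

In this sketch, scoring the same window twice costs three encoder calls in total (the reference latent is reused), while `plan` makes exactly one regardless of how many candidate edit sequences it evaluates.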
Invariants
These hold across all subsystems. Violations are bugs, not features. Release-dependent invariants describe required behavior for published inference paths; until the first real terminal demo ships, they remain implementation and release gates rather than public artifact evidence.
| ID | Invariant | Enforced by |
|---|---|---|
| INV-ARCH-1 | The encoder is frozen in Phase 1 (no gradients touch Carbon weights) | requires_grad_(False) at construction; checked at trainer start |
| INV-ARCH-2 | State and predictor outputs live in the same d_state-dim L2-normalized space | encoder L2-norms outputs; predictor output projection normalizes |
| INV-ARCH-3 | Action embeddings are projected to the predictor's hidden dim by exactly one learned linear layer | predictor input-projection module is a single nn.Linear |
| INV-ARCH-4 | Multi-edit application is right-to-left by descending rel_pos | action/apply.py::apply_edits |
| INV-ARCH-5 | Cache lookups are content-addressed by (window_hash, encoder_hash, state_layer, pool_type, pool_radius) | encoder/cache.py |
| INV-ARCH-6 | Every published inference path can produce a checksum receipt | provenance/receipt.py::write_receipt |
| INV-ARCH-7 | Published inference paths perform no network call after first-run setup | runtime fail-closed guard in deploy/runtime.py |
| INV-ARCH-8 | Every public function with a side effect on disk takes an explicit path argument | linted (custom AST check in CI) |
| INV-ARCH-9 | bf16 is the default predictor dtype; up-cast to fp32 occurs only at the loss boundary | trainer config defaults |
| INV-ARCH-10 | Reference embeddings cached on disk are never overwritten in place; new configs produce new shards | encoder/cache.py::write_shard |
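INV-ARCH-4 is easiest to see with a small example. The sketch below is not the code in action/apply.py; it assumes a hypothetical `Edit` record with a 0-based `rel_pos`, a `ref` string, and an `alt` string, and it assumes edits do not overlap. Applying edits from the highest `rel_pos` down means an insertion or deletion never shifts the coordinates of an edit that has yet to be applied.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; the real EditSpec is defined in RFC-0003."""
    rel_pos: int  # 0-based offset into the reference window (assumed convention)
    ref: str      # bases expected at rel_pos in the reference window
    alt: str      # replacement bases


def apply_edits(window: str, edits: Iterable[Edit]) -> str:
    """Apply non-overlapping edits right-to-left (descending rel_pos)."""
    seq = window
    for edit in sorted(edits, key=lambda e: e.rel_pos, reverse=True):
        end = edit.rel_pos + len(edit.ref)
        # Everything left of the previous edit is still reference sequence,
        # so the ref check uses the original coordinates unchanged.
        if seq[edit.rel_pos:end] != edit.ref:
            raise ValueError(
                f"ref mismatch at rel_pos={edit.rel_pos}: "
                f"expected {edit.ref!r}, found {seq[edit.rel_pos:end]!r}"
            )
        seq = seq[:edit.rel_pos] + edit.alt + seq[end:]
    return seq


edited = apply_edits(
    "AACCGGTT",
    [Edit(1, "A", "ATTT"), Edit(5, "G", "C")],
)
```

Here the substitution at position 5 is applied first and the insertion at position 1 second. Applied in the opposite order, the three inserted bases would push the original position 5 to position 8, and the second edit would land on the wrong base.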
Concurrency model
- Training uses single-process single-GPU by default. Multi-GPU support is via `accelerate` and is opt-in. No multi-process data sharding inside the trainer; the `DataLoader` uses N worker processes for IO and on-the-fly encoding.
- Inference (server) uses async batching: requests queue into a predictor batch with a 5 ms gather window, then issue one forward pass.
- Inference (laptop) is single-threaded by default. Batched VCF scoring uses a producer-consumer pipeline with one Carbon worker and one predictor worker.
- Planning is single-process. It can be parallelized across multiple GPUs in Phase 2 if CEM sample counts exceed single-GPU memory.
There is no asynchronous training, no parameter server, no distributed optimizer state. Multi-node training is out of scope for v1.
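The server-side gather window can be sketched with `asyncio`. This is an illustration of the batching pattern only, not the server code: `GatherBatcher`, `demo`, and the doubling "forward pass" are hypothetical, and the 5 ms default mirrors the figure quoted above.

```python
import asyncio
from typing import Callable, List, Tuple


class GatherBatcher:
    """Collect requests for a short window, then run one batched forward pass."""

    def __init__(self, forward: Callable[[List[float]], List[float]],
                 window_s: float = 0.005) -> None:
        self._forward = forward
        self._window_s = window_s
        self._queue: asyncio.Queue = asyncio.Queue()
        self.forward_passes = 0

    async def submit(self, x: float) -> float:
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((x, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]       # block until the first request
            deadline = loop.time() + self._window_s
            while True:                             # gather until the window closes
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item = await asyncio.wait_for(self._queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
            outputs = self._forward([x for x, _ in batch])  # one forward pass
            self.forward_passes += 1
            for (_, fut), y in zip(batch, outputs):
                fut.set_result(y)


async def demo(n_requests: int = 8, window_s: float = 0.005) -> Tuple[List[float], int]:
    # Construct the batcher inside the running loop so the queue binds to it.
    batcher = GatherBatcher(lambda xs: [2.0 * x for x in xs], window_s)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(
        *(batcher.submit(float(i)) for i in range(n_requests))
    )
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(results), batcher.forward_passes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Requests that arrive together are typically answered from a single forward pass; a request that arrives after the window closes starts a new batch. The trade-off is up to one gather window of added latency per request in exchange for larger batches.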
Determinism
| Path | Determinism guarantee |
|---|---|
| Training with `--deterministic` and pinned seeds | bit-exact within same hardware + driver |
| Training without `--deterministic` | reproducible loss curves to within sampling noise |
| Inference (CUDA bf16) | bit-exact within same GPU + driver version |
| Inference (Core ML) | bit-exact within same OS minor version |
| Inference (ONNX Runtime CPU) | bit-exact across machines |
| Verifier re-run | bit-exact when backend matches receipt's runtime.backend |
The receipt format (RFC-0011 §3.3) explicitly records the runtime backend so that verifiers know which determinism class to apply.
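One way a verifier might turn the inference rows of the table into a check is sketched below. Only the `backend` field is taken from this document (as `runtime.backend`); the backend identifiers and the `gpu`, `driver`, and `os_minor` keys are hypothetical, and the actual receipt schema is defined in RFC-0011.

```python
from typing import Dict, Mapping, Tuple

# Environment fields that must match for a bit-exact re-run, per backend.
# Derived from the determinism table; all key names are assumptions.
DETERMINISM_SCOPE: Dict[str, Tuple[str, ...]] = {
    "cuda_bf16": ("gpu", "driver"),   # bit-exact within same GPU + driver version
    "coreml": ("os_minor",),          # bit-exact within same OS minor version
    "onnxruntime_cpu": (),            # bit-exact across machines
}


def bit_exact_expected(receipt_runtime: Mapping[str, str],
                       verifier_runtime: Mapping[str, str]) -> bool:
    """Return True if a re-run should reproduce the receipt bit for bit."""
    backend = receipt_runtime["backend"]
    if backend not in DETERMINISM_SCOPE:
        raise KeyError(f"unknown backend: {backend!r}")
    if verifier_runtime.get("backend") != backend:
        return False  # different backend: no bit-exact guarantee applies
    return all(
        receipt_runtime.get(field) == verifier_runtime.get(field)
        for field in DETERMINISM_SCOPE[backend]
    )


receipt = {"backend": "cuda_bf16", "gpu": "gpu-model-a", "driver": "1.0"}
same_env = {"backend": "cuda_bf16", "gpu": "gpu-model-a", "driver": "1.0"}
new_driver = {"backend": "cuda_bf16", "gpu": "gpu-model-a", "driver": "1.1"}
```

With these values, `bit_exact_expected(receipt, same_env)` holds and `bit_exact_expected(receipt, new_driver)` does not, so a verifier on the newer driver would know not to treat a differing checksum as evidence of tampering on its own.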
Failure semantics
Subsystems fail fast and fail loud. The contract:
- Validation errors (malformed input) raise a typed exception from `geno_lewm.errors` (see `04-error-model.md`) with a structured payload.
- Capacity errors (out-of-memory, disk full) report the resource that was exhausted and include a remediation hint.
- Cache misses are not errors; they trigger a fall-through to live encoding with a `cache_miss` metric increment.
- Network failures during first-run setup raise `RuntimeSetupError` with a clear retry strategy.
- Backend mismatches during receipt verification raise `ProvenanceError` with the expected and observed backends.
No subsystem swallows exceptions. No subsystem returns None on failure.
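A minimal sketch of what "typed exception with a structured payload" could look like follows. `RuntimeSetupError` and `ProvenanceError` are named in this document; the base class `GenoLewmError`, the `ValidationError` name, the `payload` attribute, and both helper functions are assumptions made for illustration. The authoritative hierarchy is in errors.py and 04-error-model.md.

```python
from typing import Any, Dict


class GenoLewmError(Exception):
    """Hypothetical base class carrying a machine-readable payload."""

    def __init__(self, message: str, **payload: Any) -> None:
        super().__init__(message)
        self.payload: Dict[str, Any] = payload


class ValidationError(GenoLewmError):
    """Malformed input (name assumed)."""


class RuntimeSetupError(GenoLewmError):
    """First-run setup failed, for example a network failure."""


class ProvenanceError(GenoLewmError):
    """Receipt verification failed."""


def parse_rel_pos(raw: str, window_len: int) -> int:
    """Parse a position or raise; never return None on failure."""
    try:
        pos = int(raw)
    except ValueError as exc:
        # Chain the original error instead of swallowing it.
        raise ValidationError(
            "rel_pos is not an integer", field="rel_pos", value=raw
        ) from exc
    if not 0 <= pos < window_len:
        raise ValidationError(
            "rel_pos is outside the window",
            field="rel_pos", value=pos, window_len=window_len,
        )
    return pos


def check_backend(expected: str, observed: str) -> None:
    """Raise with both backends attached when they differ."""
    if expected != observed:
        raise ProvenanceError(
            "receipt backend does not match verifier backend",
            expected=expected, observed=observed,
        )
```

A caller can branch on the exception type and read `exc.payload` for the details, and the `raise ... from exc` form keeps the underlying cause in the traceback.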
What is not in the architecture
To keep scope honest, the following are intentionally absent in v1:
- Discriminator networks. Surprise is the predictor residual; no separate classifier is trained on `(s_t, ŝ_{t+1}, s_{t+1})`.
- Language modeling heads. Decoding to DNA is Carbon's job.
- Protein heads.
- EMA target encoders (LeWM's stability result removes the need).
- Mixed-precision training of Carbon — encoder dtype is fixed at load time.
- Online model updates (auto-update is disabled by design; see RFC-0010 §3.8).
- A web service. No HTTP API in v1.