Implementation Tracker
Last updated: 2026-06-06
This file is a human-maintained snapshot of the current execution plan. GitHub issues remain the source of truth for issue state.
Current Status
The repository has moved beyond the original spec-bootstrap phase. The implemented surface now includes:
- error taxonomy and error-code gates;
- observability, redaction, metrics, and WandB sink integration;
- action representation, edit application, and synthetic samplers;
- RFC-0006 tuple-builder contracts for source mix, ClinVar fallback, absolute variant providers, and holdout filtering;
- local gnomAD and ClinVar VCF-to-Parquet shard builders with schema-checked Parquet loaders;
- lazy Carbon state encoder wrapper for optional local Transformers runtimes;
- base cross-attention `Predictor`, `ARPredictor` rollout wrapper, predictor losses, and training stability helpers;
- deterministic fixture smoke trainer that writes config, metrics, log, checkpoint, dataset manifest, and training-run metadata;
- preflight-gated Carbon trainer launcher with compatible checkpoint resume validation for run id, dataset snapshot, seed split, and config identity;
- surprise score library, FASTA-backed VCF scoring, and the score CLI path for manifest-verified injected scorer components;
- optional native runtime component loading from manifest-backed local artifacts when the ML stack is installed;
- artifact manifests, checksum receipts, single-variant score receipt emission, per-row VCF receipt JSONL sidecars, and the verify notebook;
- `geno-lewm-eval` artifact-level ClinVar-style metrics with deterministic bootstrap confidence intervals from score and label JSONL files, plus optional matched measured-baseline comparisons that require a recorded baseline score artifact;
- `geno-lewm-carbon-baseline` generation of Carbon zero-shot baseline score JSONL from a local Carbon LM, held-out VCF, FASTA, and optional sequence log-likelihood cache;
- `geno-lewm-eval-all` aggregation of validated metrics JSON into packaged source `eval_metrics.json` plus generated `eval_report.md`, with `eval_config.effective.yaml` recorded as a required report artifact;
- `bench.inference --release-efficiency` generation of measured latency, throughput, peak memory, command, hardware/runtime notes, and input identities as validated `efficiency_report.json`;
- dedicated fixture-backed `tests/ml` smoke coverage for finite fixture training loss, collapse-health signals, deterministic resume identity, and optional torch predictor initialization/learning when torch is installed; CI runs it as the separate `ml-smoke` job;
- hosted fixture-backed eval smoke regression checking with `python -m tools.ci.eval_smoke_gate`, which generates public score/label JSONL fixtures, runs `geno-lewm-eval` and `geno-lewm-eval-all`, enforces AUROC/AP/balanced-accuracy/baseline-delta thresholds, and records why real checkpoint/dataset evaluation is not attempted by this CI gate;
- source-distribution inventory checking with `python -m tools.release.check_sdist_assets dist/*.tar.gz`, wired into CI and the PyPI release workflow after package metadata validation;
- `tools.release.paper_package` validation of generated eval-report Summary/Artifacts markers, required artifact rows, model/dataset identity lines, baseline score artifact rows, exact `eval_report.md` rendering from packaged `eval_metrics.json`, and valid `efficiency_report.json`;
- runtime/update/desktop scaffolds;
- release tooling, API-snapshot tooling, duplicate-free `__all__` checks, and current public module-map docs.
The v0.1 paper/demo release is complete. Remaining high-priority work is post-release evidence: broader held-out benchmarks, GenoLeWM-vs-Carbon baseline deltas, RFC-0004 attention KV-cache speedups, rollout-fidelity state-row generation beyond the implemented metrics aggregator, planning-ready APIs/CLI, and the first PyPI package tag.
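The deterministic bootstrap confidence intervals listed above for `geno-lewm-eval` can be illustrated with a minimal sketch. It assumes a percentile bootstrap over AUROC driven by a fixed seed; the function names, the resample count, and the percentile method are illustrative assumptions, not the package's actual implementation.

```python
import random
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    *,
    seed: int = 0,            # hypothetical default; any fixed seed works
    n_resamples: int = 200,   # hypothetical default
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Percentile bootstrap interval that is reproducible for a given seed."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)  # private generator: no global RNG state involved
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip degenerate single-class resamples
        stats.append(auroc([scores[i] for i in idx], resampled_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_resamples - 1))]
    upper = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper
```

Because the generator is seeded and private to the call, two runs over the same score and label rows return the same interval, which is the property a regression gate can check.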
Active Milestones
| Milestone | Purpose | Exit Signal |
|---|---|---|
| Direction cleanup | remove stale claims and align issues/docs | completed for v0.1; ongoing for v0.2 |
| Real inference slice | terminal command runs one true score path | completed by the v0.1 public terminal transcript |
| Dataset snapshot | reproducible first experiment data | completed by the v0.1 public dataset package |
| First training run | SNV predictor checkpoint | completed by the published geno-lewm-coherent-cd2bfcc run |
| Evaluation report | first paper-grade results | completed for the narrow v0.1 chr21 ClinVar slice |
| Paper/demo release | public showcase | completed by public model, dataset, demo, paper, and final binder |
| v0.2 benchmark readiness | stronger held-out science evidence | measured benchmark report with exact variants and negative findings |
| v0.2 rollout/planning readiness | scalable rollout and planning substrate | AR rollout speed gate plus planning demo backed by measured evidence |
Release Evidence Ledger
Use this ledger as the implementation tracker's source of truth for what the local release contracts proved in v0.1 and what future releases must re-run. Do not use local fixture/tooling evidence alone for v0.2 claims.
| Issue | Local contract | v0.1 status and future boundary |
|---|---|---|
| #163 dataset snapshot | python -m tools.release.dataset_snapshot; python -m tools.release.dataset_package | Completed for v0.1; v0.2 needs broader benchmark snapshots and refreshed split evidence |
| #164 first Carbon-backed run | geno-lewm-train --carbon-preflight; geno-lewm-train --carbon-train --package-release-run | Completed for v0.1; v0.2 training should wait for stronger data/eval gates |
| #165 results report | geno-lewm-eval; geno-lewm-carbon-baseline; geno-lewm-eval-all --require-v02-vep-metrics --require-v02-rollout-metrics; geno-lewm-rollout; python -m tools.release.rollout_state_examples; python -m tools.release.rollout_state_rows; python -m tools.release.v02_benchmark_suite; python -m bench.inference --release-efficiency | Completed for the narrow v0.1 release; broader benchmark and real held-out latent rollout specs/states remain open |
| #197 v0.2 benchmark readiness | python -m tools.release.v02_benchmark_readiness --metrics-json ... --rollout-speed-report ... --rollout-speed-scope-report ... --efficiency-report ... --suite-report ... --output ... --require-ok | New coverage/provenance contract; --require-ok records measured values/deltas, measured efficiency latency/throughput/memory, efficiency command provenance, suite output identities, row-derived readiness/blocker entries with issue refs, and metric conclusions with split/track, confidence-interval, evaluated variant-key, missing-metric, baseline-gap, failed-target, and release-input context, and also requires CI-bearing VEP metrics, rollout generation reports, an executed passing suite report, and non-fixture, package-relative release inputs; readiness input identities and readiness, efficiency, suite, or nested rollout-speed command path arguments use public-safe paths plus SHA-256 and size where applicable, while the release_inputs row records checked metrics artifact paths, efficiency input identities, and suite output identities, requiring the consumed bench.rollout report plus suite outputs to preserve claim boundaries and requiring suite outputs to include the consumed metrics JSON artifacts; expected ok=false until broader benchmark rows pass and #42 rollout speed either passes from measured artifacts or is explicitly re-scoped with tools.release.rollout_speed_scope |
| #20 release packaging | python -m build; twine check; python -m tools.release.check_sdist_assets dist/*.tar.gz over the full first-publication toolchain | Tagged package release built by the protected workflow from the checked tree |
| #166 terminal showcase | python tools/demo/terminal_inference.py; python -m tools.release.clean_machine_demo | Completed for v0.1; future demos should demonstrate stronger benchmark/planning behavior without clinical claims |
| #167/#101 paper and publication | python -m tools.release.paper_draft; python -m tools.release.paper_package; python -m tools.release.hub_release; python -m tools.release.hub_publish; python -m tools.release.publication_report | Completed for v0.1 with public model, dataset, demo, paper, and final binder links |
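Several ledger rows refer to identities recorded as a public-safe path plus SHA-256 and size. A minimal sketch of such a record follows, assuming that "public-safe" means a path relative to the package root; `file_identity` and the dictionary keys are hypothetical names, not the release tooling's schema.

```python
import hashlib
import tempfile
from pathlib import Path


def file_identity(path: Path, package_root: Path) -> dict:
    """Return a package-relative path, SHA-256 digest, and byte size for a file."""
    resolved = path.resolve()
    root = package_root.resolve()
    # relative_to raises ValueError for files outside the package root,
    # so a private absolute path never ends up in the record.
    relative = resolved.relative_to(root)
    digest = hashlib.sha256()
    with resolved.open("rb") as handle:
        # Hash in 1 MiB chunks so large artifacts are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": relative.as_posix(),
        "sha256": digest.hexdigest(),
        "size": resolved.stat().st_size,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        report = root / "reports" / "example.json"  # hypothetical artifact
        report.parent.mkdir()
        report.write_bytes(b"{}")
        print(file_identity(report, root))
```

Comparing all three fields catches a replaced file (digest), a truncated file (size), and a record that points at a different artifact (path).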
Remaining High-Priority Gaps
| Gap | Existing Issue(s) | Notes |
|---|---|---|
| Carbon encoder and cache scale | #32, #36 | v0.1 proved the released artifact path; v0.2 still needs broader Carbon validation and cache-build throughput evidence |
| Trainer reproducibility and regression gates | #44, #47 | v0.1 run evidence exists; v0.2 needs deterministic repeatability and benchmark gates beyond the first run |
| Dataset builders and split enforcement | #49, #50, #51, #52 | v0.1 dataset publication exists; audit remaining issue deltas against the actual pipeline, then narrow v0.2 work around larger shards, holdouts, and warm-cache throughput |
| ClinVar, baseline, and rollout evaluation | #53, #55, #56, #57 | v0.1 measured release evidence exists; v0.2 needs broader coding/non-coding benchmarks, Carbon baseline deltas, real rollout state-row artifacts, and exact evaluated variant identities |
| Score CLI and terminal demo | #62, #65 | v0.1 clean-machine scoring transcript exists; close or re-scope remaining work to reusable examples, quickstart polish, and future benchmark/planning demos |
| Model checkpoint Hub release | #101 | v0.1 model release is complete; future work is PyPI/source-package publication and v0.2 model package evidence |
| Hosted ML smoke gate | #89 | Dedicated tests/ml fixture smoke coverage and CI ml-smoke job exist; this remains separate from #54's hosted eval smoke-regression gate |
| Hosted eval smoke gate | #54 | Dedicated tools.ci.eval_smoke_gate, tests/eval, and CI eval-smoke job exist; this remains separate from real ClinVar/rollout benchmark execution |
| Paper-grade docs and tutorials | #94, #95, #96, #97, #98 | Should wait for real artifacts where possible |
| Public provenance API naming | #162 | geno_lewm.provenance is now the active namespace; the legacy import package and receipt JSON field have been removed |
De-Scoped Work
The active roadmap no longer includes runtime assurance mechanisms beyond checksum provenance. The package accepts only checksum receipts today. Closed historical issues that referenced the previous direction should stay closed and should not block paper/demo work.
Paper-Ready Definition
The first paper/demo release is not ready until:
- dataset snapshot and preprocessing scripts are public;
- dataset package artifacts are generated from a checked snapshot spec validated by `python -m tools.release.dataset_snapshot --spec-json configs/first_experiment/dataset-snapshot-snv.json --check-spec`, staged-input identities checked with the same spec and `--check-inputs`, then built from staged local upstream files with the same spec and `--dataset-dir ... --overwrite`, including normalized `dataset_package.json`, `dataset_manifest.json`, `data_card.md`, `split_integrity.json`, `dataset_input_check_report.json`, `dataset_snapshot_report.json`, and `SHA256SUMS`; the snapshot report records the checked spec hash and upstream source file identities without private absolute input paths and binds input-check evidence plus generated metadata, manifest, data-card, split-integrity, and nested package-file artifacts by path/hash/size; the release verifier requires that report in `SHA256SUMS`, rejects stale report file identities, rejects duplicate snapshot file entries, rejects stale generated package identities, and rejects stale card or manifest output that no longer matches `dataset_package.json`; generated dataset package metadata must carry `generated_by=tools.release.dataset_package`; checksum inventories must reject invalid digests and duplicate paths; `split_integrity.json` also records observed label/class balance plus the `tools.release.dataset_integrity` source header and fails when no train/eval comparable-key comparison can be made; `data_card.md` renders the same class-balance summary;
- train/eval configs are committed under `configs/first_experiment/`, and the Carbon preflight records the effective training config hash plus resolved closed-schema config payload and CUDA/VRAM accelerator readiness;
- real training inputs are preflighted with `geno-lewm-train --carbon-preflight`; preflight requires the generated dataset package evidence set, including `dataset_package.json`, `dataset_input_check_report.json`, `dataset_snapshot_report.json`, and `SHA256SUMS`, requires `runtime.device: cuda` for the first-experiment config, checks the default 40 GiB CUDA memory threshold, and rejects stale input-check evidence before launch;
- Carbon-encoded minibatches can be trained through `geno_lewm.training.TorchTrainer` with AdamW parameter groups, WSD LR scheduling, gradient clipping, and distinct data/predictor/LoRA seed records; the real launcher places the Carbon encoder, predictor, action encoder, and encoded minibatches on the configured device;
- completed training run evidence is generated with `python -m tools.release.training_run` or `geno-lewm-train --carbon-train --package-release-run`, including checksum-covered `training_preflight_report.json` for release Carbon-backed runs and `generated_by=tools.release.training_run`; release-mode verification requires the preflight report's dataset core-file evidence for `dataset_package.json`, `dataset_input_check_report.json`, `dataset_snapshot_report.json`, and `SHA256SUMS`; the final package verifier rejects stale `training_run_card.md` content that no longer matches `training_run_manifest.json`;
- checkpoint and model card are published;
- checkpoint package artifacts are generated with `python -m tools.release.model_package`, including normalized `model_package.json`, rendered `model_card.md`, packaged `eval_metrics.json`, `efficiency_report.json`, and model-local eval artifact references from the metrics payload in the checksum set; generated `model_package.json` must carry `generated_by=tools.release.model_package`, model metadata must include the training preflight report, training run manifest/card, and training-run checksums as `extra_files`, and the package verifier rejects stale model cards that do not re-render from `model_package.json` plus `manifest.json`, rejects invalid or duplicate checksum paths, rejects training-run dataset/config/commit evidence that does not match the manifest plus eval/efficiency evidence, and rejects mixed eval/efficiency release id, dataset snapshot, commit, or model-result identity across artifacts;
- evaluation metrics JSON and confidence intervals are generated with `geno-lewm-eval`, including matched baseline score artifacts when a measured baseline is reported, and accepted metrics payloads carry `generated_by=geno-lewm-eval` or `generated_by=geno-lewm-eval-all`; `geno-lewm-eval` records its report artifact table as package-relative paths under `--artifact-root`, defaulting to the metrics output directory, writes `eval_config.effective.yaml` beside `eval_metrics.json`, and rejects absolute paths outside that root;
- Carbon zero-shot baseline scores are generated with `geno-lewm-carbon-baseline --artifact-root ... --vcf ... --fasta ... --carbon-model-dir ... --output-scores ...` and consumed by `geno-lewm-eval` with `--baseline-score-field carbon_zero_shot_score`; optional log-likelihood cache rows are scoped to the Carbon model and revision before reuse, with unique sequence SHA-256 keys within that scope; generated baseline summary metadata records package-relative model, input, output, and cache paths under `--artifact-root`; `geno-lewm-eval` requires primary score rows from `geno-lewm-score` and Carbon baseline rows from `geno-lewm-carbon-baseline`;
- evaluation report is generated from packaged measured metrics JSON with `geno-lewm-eval-all --require-v02-vep-metrics --require-v02-rollout-metrics`, which refreshes and records `eval_config.effective.yaml` plus metrics inputs as package-relative artifact paths under the aggregate metrics directory and fails incomplete v0.2 VEP or rollout-fidelity coverage; `geno-lewm-rollout` can add rollout-fidelity metric rows from measured latent-state JSONL with per-K stratification; the eval-report parser rejects metrics payloads missing the required `eval_config` artifact; baseline comparisons must supply `baseline`, `baseline_value`, and `delta_vs_baseline` together; conclusions must explicitly reference every measured metric name, split, measured value, and baseline delta when present from `eval_metrics.json`;
- `python -m tools.release.v02_benchmark_suite --manifest ... --output-report ...` can plan or execute the v0.2 benchmark commands from a JSON manifest. `configs/first_experiment/v0.2_benchmark_suite.template.json` is a checked planning template for the required coding ClinVar, non-coding ClinVar, BRCA2 saturation, TraitGym Mendelian, phased-haplotype rollout, and synthetic edit-chain rollout rows. ClinVar VEP rows use binary ClinVar metrics; BRCA2 and TraitGym rows use `geno-lewm-eval --metric-mode spearman` over continuous labels. Plan-only reports keep `ok=false`, while execute-mode clears each step's declared output files, then requires the command to exit successfully and write those outputs again, recording passed-step output identities by package-local path plus SHA-256 and size, with measured claims still deferred to generated artifact validators; the suite report records the manifest by package-local path plus SHA-256 and size identity; final release-input readiness must run after the executed suite report exists and pass it with `--suite-report`; a second-pass suite manifest can express that command with `readiness.suite_report`;
- `python -m tools.release.rollout_state_examples --spec-jsonl ... --cache-dir ...` resolves explicit cache keys for measured source, target, and candidate latent states into examples JSONL; `python -m tools.release.rollout_state_rows --examples-jsonl ... --model-dir ...` generates rollout-state JSONL from those measured latent examples and the manifest-backed action encoder/predictor, rejecting example rows without the supported `schema_version=1.0.0` and generator marker. These helpers bridge precomputed source/target/candidate states to `geno-lewm-rollout`; they are not Carbon encoder runs or held-out haplotype generators. Release readiness requires rollout metrics to carry both generation reports as package-relative artifacts;
- `python -m tools.release.rollout_speed_scope --rollout-speed-report ... --output ...` records an accepted #42/#197 re-scope for failed RFC-0004 AR rollout speed targets. Readiness consumes it only when the report binds the exact failing `bench.rollout` path/SHA-256/size identity, failed K targets, valid GitHub issue refs including #42 and #197, UTC generated and accepted timestamps, HTTP(S) decision URL, and public-safe scope plus nested rollout command path identities. Scope negative findings and claim boundaries must preserve that the failed target is not passing rollout-speed evidence, and scope generation requires the source `bench.rollout` report's own claim boundary first. The readiness row remains `rescoped` rather than passing speed evidence, and `scope_decisions` preserves the accepted report identity, accepter, rationale, replacement target, timestamps, decision URL, and issue refs. Metric conclusions must also include failed-target details and accepted decision context for re-scoped rows;
- eval-report `negative_findings` must be non-empty and render as `## Negative Findings`; baseline delta rows must carry matching evaluated variant-key hashes; `tools.release.paper_package` resolves eval artifact paths inside the package and validates score JSONL `generated_by` markers;
- efficiency evidence is generated with `python -m bench.inference --release-efficiency` and records measured single-variant latency, batched throughput, peak memory, command, hardware/runtime notes, package-relative or inline input identities, samples, warm-up, limitations, and the `tools.release.efficiency_report` source header;
- terminal demo runs real model inference from released artifacts;
- demo transcript is generated by `tools/demo/terminal_inference.py` from the actual `geno-lewm-score` command, including generated time, exit code, model release/version/id, artifact-input paths, and an explicit claim-boundary sentence;
- demo command, model/input identities, VCF input summary, transcript hash, score/receipt hashes, JSONL field names, generated report hashes, and compact score/receipt batch metadata are summarized by generated `terminal_demo_manifest.json`;
- demo runtime readiness is summarized by generated `runtime_preflight_report.json`, which must require native runtime dependencies and must record fixture/test manifest allowance as false;
- demo score and receipt JSONL streams are summarized by generated `batch_receipt_report.json`, including checked score fields, model id, calibration hash, runtime identity, receipt stream, and record count; the demo runner clears owned score, receipt, batch-report, and demo-manifest outputs before invoking the score command so stale JSONL rows cannot satisfy a later run; the demo runner re-opens `runtime_preflight_report.json` before writing `terminal_demo_manifest.json` and rejects stale or mutated model, input, command, backend, runtime-requirement, or model-artifact evidence from a different run; the package verifier rejects stale transcript claim-boundary or artifact-input markers and stale terminal-demo manifest `runtime_preflight` summaries; the package verifier rejects stale manifest JSONL field lists, stale `score_receipt_batch` summaries, and score/receipt batches whose model id or calibration hash do not match the packaged model manifest;
- first experiment paper draft is generated with `python -m tools.release.paper_draft`, rejecting stale `eval_report.md` output that no longer matches `eval_metrics.json`, rejecting stale terminal-demo VCF summaries, requiring a UTC `Generated: ...Z` timestamp, rendering the scored-input summary in Demo Evidence, including generated Citation Metadata, including Negative Findings copied from the generated eval report, and naming `model_package.json`, `dataset_package.json`, `dataset_input_check_report.json`, `dataset_snapshot_report.json`, `eval_metrics.json`, `eval_config.effective.yaml`, `eval_report.md`, `efficiency_report.json`, and demo evidence paths using package-local artifact names rather than build-machine root paths; the package verifier rejects paper drafts that no longer match the current artifact set or that are missing Citation Metadata or Negative Findings;
- `python -m tools.release.paper_package` passes for the model, dataset, demo, and paper artifacts;
- `python -m tools.release.hub_release` emits a versioned dry-run Hub upload plan for the verified release candidate, requiring a public paper URL when a paper artifact is included, and records model upload inventories from both `SHA256SUMS` and `training_run_SHA256SUMS`, dataset upload inventories including `SHA256SUMS`, and portable terminal-demo upload inventories with unique GitHub release asset names. When a paper URL is present, it also binds the verified paper file path/hash/size before emitting publication commands for recognized Hub/GitHub targets; Hugging Face commands upload each verified model/dataset file to its planned destination instead of syncing whole package directories;
- `.github/workflows/release-hub-dry-run.yml` runs the package verifier, Hub dry-run planner, and release-candidate report without publishing weights or requiring Hub credentials;
- `python -m tools.release.hub_publish` and `.github/workflows/release-hub-publish.yml` publish the verified model, dataset, terminal-demo, and matching paper artifacts after a clean dry-run, requiring `HF_TOKEN`, GitHub release credentials, supported Hugging Face/GitHub target URLs, direct GitHub release download paper URLs whose final asset name matches the verified paper file, and protected `release` environment approval, with the workflow syncing the locked `dev`, `train`, `eval`, and `deploy` extras for the native clean-machine replay, then uploading only files named by the verified Hub plan before regenerating the final release-candidate report from the public links and fetched public artifact bytes; the protected workflow runs the clean-machine terminal replay from that report with native runtime checks enabled before running `python -m tools.release.publication_assets` to bind the GitHub release target, upload command, and publication-evidence asset identities, then uploading those assets to the demo release tag, using release credentials only for scoped artifact fetches;
- `python -m tools.release.release_candidate` emits `release_candidate_report.json` with `ready=true` for the same model, dataset, demo, paper, public URL reachability checks, commit SHA, Hub repo id, model package metadata, dataset package metadata, dataset snapshot report, source metrics JSON, effective eval config, generated eval report, efficiency report, manifest-backed predictor/action/calibration and training-config artifacts, `training_preflight_report.json`, `training_run_SHA256SUMS`, and Hub model/dataset/demo upload inventories, provider-backed public artifact exact file-set, hash, and size checks, direct paper byte hash/size checks, plus a `readiness` checklist that records which publication requirements are satisfied or blocked; public checks can be skipped only for explicit fixture rehearsals that allow fixture manifests, otherwise the report remains `ready=false`;
- `python -m tools.release.clean_machine_demo` consumes the generated ready release-candidate report, rejects hand-authored reports and stale embedded Hub plans by source header and model/repo/URL identity, rejects missing or failed readiness rows, non-empty candidate blockers, skipped or failed public link checks, skipped, missing, incomplete, or failed public artifact checks, unsafe embedded Hub-plan destinations, or malformed expected hashes, downloads the published model files, dataset snapshot files, and GitHub release demo assets, verifies their SHA-256 values against the Hub upload plan, re-runs the release-package verifier on the downloaded model/dataset/demo package, reruns the terminal demo from those public bytes, and rejects replayed terminal demo manifests with invalid source headers, non-passing status, model id mismatch, downloaded `model/manifest.json` hash/size mismatch, stale VCF/FASTA input identities, stale `runtime_preflight` summaries, stale `score_receipt_batch` summaries, or replay artifact hash/size drift before writing the clean-machine report. The final publication binder also checks the replay manifest's VCF/FASTA input identities against the downloaded demo artifacts and checks the replay manifest's artifact table against the clean-machine replay report for the transcript, scores, receipts, runtime preflight, and batch report. Before scoring, the replay helper checks the downloaded demo VCF/FASTA hashes and sizes against the downloaded demo manifest; after scoring, it rejects replay manifests whose VCF/FASTA identities do not match those downloaded inputs. The replay helper writes `clean_machine_demo_report.json` with the release-candidate report filename plus hash/size identity, output-directory-relative downloaded artifact identities, package-verification result, replay transcript and manifest identities, and replay score, receipt, runtime-preflight, and batch-report artifact hashes, without serializing fetch tokens or private absolute workstation paths;
- `python -m tools.release.publication_report` runs after credentialed Hub publication and clean-machine replay, writes `publication_evidence_report.json`, and binds the Hub release plan, release-candidate report, publish report, and clean-machine replay report by public-safe filename plus hash/size identity, including the clean-machine replay's recorded release-candidate report filename/path, hash, and size identity plus the verified paper file source name, URL, hash, and size identity plus the full paper-critical release-candidate artifact table for model, dataset, eval, demo, and paper identities, public-safe release-candidate readiness rows plus public link and public artifact check summaries, with every uploaded release-candidate artifact identity in that table checked against the Hub plan and downloaded public artifact, plus the replayed terminal-demo manifest's model id, downloaded `manifest.json` identity, VCF/FASTA input identities, `runtime_preflight` summary, and replayed runtime-preflight model/input identities without private absolute paths, while failing on a candidate embedded-plan mismatch, a missing generated readiness checklist, non-empty candidate blockers, stale readiness `issue_refs`, missing or failed candidate `public_links` or `public_artifacts` checks, exact download-set, public source URL, hash, or replay-artifact mismatches; the protected workflow uploads the Hub plan, release-candidate report, publish report, clean-machine replay report, final publication evidence report, publication evidence asset manifest, replay transcript, replay manifest, score/receipt JSONL streams, runtime preflight report, and batch receipt report as public GitHub release assets;
- receipt semantics cover the published demo mode without implying unsupported trust guarantees;
- README and docs show measured values only where they are measured;
- privacy and safety docs match the demo behavior.
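The rule above that checksum inventories must reject invalid digests and duplicate paths can be sketched as follows. This assumes `SHA256SUMS` uses the common `sha256sum` text-mode layout (a 64-character lowercase hex digest, two spaces, then the path); `parse_sha256sums` is a hypothetical helper, not the release verifier's API.

```python
import re
from typing import Dict

# A SHA-256 digest is exactly 64 lowercase hexadecimal characters.
_DIGEST = re.compile(r"[0-9a-f]{64}")


def parse_sha256sums(text: str) -> Dict[str, str]:
    """Parse an inventory into {path: digest}, failing closed on bad rows."""
    entries: Dict[str, str] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        digest, separator, path = line.partition("  ")
        if not separator or not path:
            raise ValueError(f"line {number}: expected '<digest>  <path>'")
        if not _DIGEST.fullmatch(digest):
            raise ValueError(f"line {number}: invalid SHA-256 digest")
        if path in entries:
            raise ValueError(f"line {number}: duplicate path {path!r}")
        entries[path] = digest
    return entries
```

Failing on the first bad row, rather than skipping it, keeps a malformed inventory from silently covering fewer files than the package contains.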
GitHub Issue State
- #15 now tracks artifact provenance and checksum receipts, not external runtime assurance mechanisms beyond checksum provenance.
- #62 and #65 have local implementation coverage plus v0.1 clean-machine artifact-backed scoring evidence. Their stale `status:blocked` labels should be removed; close them if no non-v0.1 acceptance remains, or re-scope any quickstart/tutorial follow-up under #197.
- #101 tracked the v0.1 model Hub release with `model_card.md`, `eval_metrics.json`, `eval_config.effective.yaml`, `eval_report.md`, `efficiency_report.json`, `manifest.json`, training config, checksum files, and links to the dataset snapshot and terminal demo transcript. That release is now public and closed through PR #196; future model packages should preserve the same `python -m tools.release.paper_package` contract.
- #162 now has a local `geno_lewm.provenance` public namespace. The legacy import package has been removed from the active public surface, and receipt JSON now serializes the field as `provenance`.
- #49 and #50 now have local release-file prep commands for gnomAD and ClinVar, and the v0.1 dataset publication used public dataset artifacts. Audit the remaining issue bodies against that pipeline and keep only narrower v0.2 shard/data-quality deltas.
- #51 now has a local tuple-builder contract plus `geno_lewm.data.GenoLeWMDataset`, which deterministically streams source windows and training tuples without importing torch in core environments. `geno_lewm.training.encode_training_batch` now looks up untargeted source `s_t` states in the documented cache index when present and falls back to live encoding on misses. Remaining work should focus on v0.2 prepared-shard scale, holdout membership deltas, and measured warm-cache throughput validation.
- #44 and #47 now have deterministic fixture smoke coverage through `geno-lewm-train --fixture-smoke` plus a torch trainer core for Carbon-encoded minibatches, AdamW groups, WSD scheduling, gradient clipping, distinct seed records, and a preflight-gated `geno-lewm-train --carbon-train` launcher. v0.1 supplied a real Carbon-backed run; remaining work is deterministic repeatability on supported backends and benchmark gates beyond the first run.
- #163 through #167 tracked the v0.1 paper/demo chain: dataset snapshot, first training run, generated evaluation report, terminal showcase, and paper package. They are closed with public evidence through PR #196. Remaining data/eval/demo work should be routed through #197 or narrower v0.2 issues, not reopened as missing v0.1 publication evidence.
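The cache behavior described for #51 (look up a source state in the cache index when present, fall back to live encoding on a miss) follows a standard read-through pattern. The sketch below is a generic illustration under stated assumptions: `encode_with_cache`, its arguments, and the miss counter are hypothetical and do not reflect the signature of `geno_lewm.training.encode_training_batch`.

```python
from typing import Callable, Hashable, List, Mapping, Sequence, Tuple

State = List[float]  # stand-in for an encoded latent state


def encode_with_cache(
    keys: Sequence[Hashable],
    cache_index: Mapping[Hashable, State],
    encode_live: Callable[[Hashable], State],
) -> Tuple[List[State], int]:
    """Return one state per key, preferring cached states over live encoding."""
    states: List[State] = []
    misses = 0
    for key in keys:
        cached = cache_index.get(key)
        if cached is None:
            # Cache miss: pay the cost of the live encoder for this key only.
            cached = encode_live(key)
            misses += 1
        states.append(cached)
    return states, misses
```

Returning the miss count alongside the states gives a warm-cache throughput check something concrete to assert on, for example that a second pass over the same keys reports zero misses once the cache is populated.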
Validation Expectations
For project-direction changes:
- `rg` finds no active docs promising unsupported runtime assurance beyond checksum provenance;
- receipt tests confirm unsupported runtime assurance modes are rejected;
- score CLI/runtime tests cover single-variant receipt emission and per-row VCF receipt JSONL sidecars;
- public module-map docs match the current package layout and do not list removed or absent paths such as `eval/`, `holdouts.py`, `deploy/provenance.py`, or export modules that have not landed;
- public API docs/RFC-0014 point to `tests/api/public_surface.json` as the exhaustive enforced symbol list, and upcoming planning solver types are not described as stable top-level exports;
- CLI scaffold factory helpers remain private to the shared stub factory and do not leak into command-module public surfaces;
- README and roadmap state current implementation gaps honestly;
- docs build and focused tests pass.
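The public-surface expectation above (an exhaustive enforced symbol list, with duplicate-free exports) can be sketched as a comparison between a module's exported names and a recorded snapshot. The schema of `tests/api/public_surface.json` is not described here, so this sketch assumes the snapshot is already available as a plain list of names; `check_public_surface` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_public_surface(
    exported: Iterable[str], snapshot: Iterable[str]
) -> Dict[str, List[str]]:
    """Report duplicate exports and drift between exports and the snapshot."""
    exported_names = list(exported)
    counts = Counter(exported_names)
    exported_set = set(exported_names)
    expected = set(snapshot)
    return {
        # Names listed more than once in the export list.
        "duplicates": sorted(name for name, count in counts.items() if count > 1),
        # Names the snapshot requires but the module no longer exports.
        "missing": sorted(expected - exported_set),
        # Names exported without being recorded in the snapshot.
        "unexpected": sorted(exported_set - expected),
    }
```

A check of this shape passes only when all three lists are empty, so adding or removing a public symbol forces a deliberate snapshot update.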