Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog 1.1.0, and this project adheres to Semantic Versioning 2.0.0.
For pre-1.0 versioning policy (no breaking changes in MINOR until 1.0),
see docs/spec/09-release-and-versioning.md.
[Unreleased]
Changed
- Narrowed RFC-0011 and the public roadmap to artifact provenance and checksum receipts. The receipt schema now accepts only provenance.kind="checksum_only"; unsupported future runtime assurance modes are rejected instead of treated as forward-compatible placeholders.
- The scope-language guard now checks notebook source and text outputs so public examples cannot reintroduce unsupported trust claims.
- Made geno_lewm.provenance the active public namespace for manifest, checksum, commitment, and receipt helpers. The old legacy import package has been removed from the active public API.
- Pruned CLI scaffold factory helpers from command-module public surfaces. Stub command modules now expose app and cli_main; the reusable build_stub_app / make_cli_main helpers remain internal to geno_lewm.cli._stub_main.
- Removed the healthcare-industry package classifier so future package metadata matches the documented research-only, non-clinical safety boundary.
- Rewrote the README, roadmap, implementation tracker, contributor guide, and AGENTS.md agent context around the current alpha status, first experiment gaps, dataset/model release plan, and terminal demo target.
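A minimal sketch of the closed provenance check described in the first entry above, assuming a receipt is a plain mapping with a nested provenance object. The function name, the error class, and the dictionary layout are illustrative assumptions, not the project's actual API; only the provenance.kind="checksum_only" rule comes from this changelog.

```python
from typing import Any, Mapping

# Only this kind is accepted; the set is closed on purpose.
ALLOWED_PROVENANCE_KINDS = frozenset({"checksum_only"})


class UnsupportedProvenanceKind(ValueError):
    """Hypothetical error type; the project's real error taxonomy may differ."""


def check_provenance_kind(receipt: Mapping[str, Any]) -> str:
    """Return the provenance kind, or raise if it is missing or unsupported."""
    provenance = receipt.get("provenance")
    if not isinstance(provenance, Mapping):
        raise UnsupportedProvenanceKind("receipt has no provenance object")
    kind = provenance.get("kind")
    # Unknown kinds are rejected outright, never treated as placeholders.
    if not isinstance(kind, str) or kind not in ALLOWED_PROVENANCE_KINDS:
        raise UnsupportedProvenanceKind(f"unsupported provenance.kind: {kind!r}")
    return kind
```

Rejecting unknown kinds, rather than ignoring them, is what keeps a future assurance mode from being read as already supported.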
Added
- Terminal demo transcript generator.
  - tools/demo/terminal_inference.py runs geno-lewm-score with explicit score and receipt outputs, records stdout/stderr and manifest identity into a Markdown transcript, and rejects fixture/test manifests by default so release demos do not publish scaffold output as model behavior.
  - Demo transcript generation now fails unless scores.jsonl and receipts.jsonl exist, are non-empty JSONL, and have their hashes, row counts, and JSONL field names recorded in the transcript.
  - terminal_demo_manifest.json records JSONL field names and a compact score_receipt_batch summary with record count, checked score fields, receipt stream, model id, calibration hash, and runtime identity.
  - The paper/demo package verifier rejects stale terminal-demo JSONL
    field lists and stale score_receipt_batch summaries that no longer match the generated score, receipt, and batch-report artifacts.
  - tools/release/batch_receipt_report.py generates batch_receipt_report.json from the terminal-demo score and receipt JSONL streams, checking row counts, model id, calibration hash, runtime identity, row ordering, output commitments, and score-output equality.
  - tools/release/paper_draft.py generates the first-experiment paper draft from model, dataset, eval-report, terminal transcript, and batch receipt report artifacts; it records dataset package metadata, model package metadata, source metrics, eval report, efficiency report, and demo evidence in Artifact Availability, and rejects placeholder wording by default.
- Paper/demo release package verifier.
  - configs/first_experiment/train-carbon-500m-snv.yaml and configs/first_experiment/eval-clinvar-snv.yaml provide checked first paper-run config files that load through the closed GenoLeWM config schema.
  - Carbon training preflight now validates --training-config through the same closed schema and records the resolved config payload in training_preflight_report.json.
  - tools/release/paper_package.py validates the model directory, dataset snapshot, generated terminal transcript, and optional artifact-grounded paper draft before a first paper/demo release is linked publicly.
  - tools/release/dataset_package.py builds data_card.md, normalized dataset_package.json, dataset_manifest.json, split_integrity.json, and SHA256SUMS from release metadata and already-produced dataset shard files.
  - tools/release/dataset_snapshot.py now writes dataset_snapshot_report.json with the checked snapshot-spec hash, upstream source file hashes, staged file identities, and public-safe source references, and includes the report in SHA256SUMS.
  - The paper/demo verifier rejects stale dataset cards or manifests that no longer match the packaged dataset_package.json metadata.
  - The paper/demo verifier now requires
    dataset_snapshot_report.json and rejects missing reports, private absolute source paths, and stale staged-file identities.
  - The paper/demo verifier re-renders generated paper drafts from the current release artifacts and rejects stale paper Markdown, including drafts that no longer name model_package.json.
  - tools/release/dataset_integrity.py recomputes dataset file identities, split record counts, and train/eval comparable-key leakage checks from dataset_manifest.json.
  - Split-integrity checks now inspect Parquet row counts and variant keys when shard files expose chrom, pos, ref, and alt.
  - tools/release/training_run.py builds training_run_manifest.json, training_run_card.md, and training_run_SHA256SUMS from run metadata, metrics, logs, config, dataset manifest, checkpoint files, seeds, hardware/runtime notes, monitoring flags, and optional training_preflight_report.json evidence.
  - Release training-run verification now requires
    training_preflight_report.json, checks its schema, ok=true status, dataset snapshot, resolved config identity, checksums, and public-relative path references before a paper/demo package can pass.
  - geno-lewm-train --carbon-train now preflights the exact training_config.effective.yaml used by the launch, mirrors training_preflight_report.json into the run directory, and supports --package-release-run to write training_run_manifest.json, training_run_card.md, and training_run_SHA256SUMS immediately after a successful Carbon-backed run.
  - tools/release/model_package.py builds normalized model_package.json, model_card.md, and SHA256SUMS from manifest.json, manifest-backed checkpoint artifacts, model-release metadata, packaged eval_metrics.json, and efficiency_report.json.
  - Model and paper package verification now reject eval/efficiency evidence whose model release id, dataset snapshot, commit, or model-result identity is mixed across otherwise valid artifacts.
  - tools/release/hub_release.py dry-runs the Hugging Face Hub upload plan after package verification and records model files, checksum-covered training-run artifacts, model and dataset SHA256SUMS, terminal-demo files with portable package-local paths, hashes, release links, commit SHA, and upload commands for the model, dataset, and GitHub release artifact set when the URLs identify supported targets.
  - Hub dry-runs now require a public
    paper_url whenever a paper artifact is part of the candidate, and .github/workflows/release-hub-dry-run.yml runs the package, Hub-plan, and release-candidate gates without publishing weights or requiring Hub credentials.
  - tools/release/hub_publish.py and .github/workflows/release-hub-publish.yml add the credentialed publication path for verified model, dataset, and terminal-demo artifacts. The workflow requires HF_TOKEN, GitHub release credentials, protected release environment approval, and supported Hugging Face/GitHub target URLs. After upload it regenerates the final release_candidate_report.json from public links and fetched public artifact bytes, then runs the clean-machine terminal replay from that report before uploading publication evidence. Replay fetches pass the Hugging Face token only to Hub artifact downloads and the GitHub token only to the release asset listing.
  - tools/release/clean_machine_demo.py downloads the published model files, dataset snapshot files, and GitHub release demo assets from a ready release-candidate report, verifies their SHA-256 values against the Hub upload plan, re-runs the release-package verifier on the downloaded model/dataset/demo package, reruns the terminal demo from those public bytes, and writes clean_machine_demo_report.json with the release-candidate report identity, downloaded artifact identities, package-verification result, replay transcript and manifest identities, and replay score, receipt, runtime-preflight, and batch-report artifact hashes, without serializing fetch tokens.
  - tools/release/publication_report.py writes publication_evidence_report.json after credentialed Hub publication and clean-machine replay, binding the Hub release plan, release-candidate report, publish report, and replay report by file identity and failing on candidate, readiness, download, or replay-artifact mismatches.
  - The verifier checks manifest artifact hashes,
    SHA256SUMS, model card/data card sections, stale model-card metadata, generated eval-report sections, dataset manifest file hashes, dataset split-integrity evidence, training-run evidence, fixture-manifest rejection, transcript markers from the real score command, and demo score/receipt JSONL hashes, row counts, and batch receipt report.
  - Demo batch receipt reports are rejected unless their model id and calibration hash match the packaged model manifest.
  - tools/release/release_candidate.py now records manifest-backed predictor, action-encoder, calibration, training-config, model_package.json, dataset_snapshot_report.json, and training_preflight_report.json plus training_run_SHA256SUMS artifact identities, Hub upload inventories, provider-backed public model/dataset/demo artifact hash checks, and a machine-readable readiness checklist in the final publication decision report. --skip-public-link-check is now accepted only for explicit fixture rehearsals that also pass --allow-fixture-manifest; a non-fixture release candidate remains ready=false when public link or artifact hash checks are skipped.
  - tools/release/eval_report.py renders eval_report.md from measured metrics JSON and rejects empty metrics or placeholder wording before the package verifier accepts the artifact.
  - geno-lewm-eval --scores-jsonl ... --labels-jsonl ... --output-metrics ... now computes artifact-level ClinVar P/LP versus B/LB metrics and writes the measured metrics JSON consumed by the eval-report generator.
  - Eval metrics include deterministic stratified bootstrap confidence
    intervals by default (--bootstrap-resamples, --bootstrap-seed, --ci-level) and record an explicit omission note when resampling is disabled.
  - geno-lewm-eval can attach a matched measured baseline score artifact with --baseline-scores-jsonl ... --baseline-name ...; report metrics then include the baseline value and delta while preserving the baseline score artifact path.
  - geno-lewm-carbon-baseline --vcf ... --fasta ... --carbon-model-dir ... --output-scores ... now writes Carbon zero-shot baseline score JSONL with carbon_zero_shot_score = -(logLik_alt - logLik_ref) and optional sequence log-likelihood cache rows for reuse across real eval runs.
  - geno-lewm-eval-all --metrics-json ... --output-metrics ... --output-report ... now validates and aggregates measured metrics JSON artifacts into aggregate metrics JSON plus eval_report.md, writes eval_config.effective.yaml, records it as an eval_config artifact, and rejects mixed model, dataset, commit, hardware, or core artifact identities.
- RFC-0006 training tuple-builder contract.
  - geno_lewm.data.builder adds WindowContext, TrainingTuple, EditSourceCount, HoldoutInterval, and HoldoutPolicy so the future trainer receives a checked (window_id, action, target_window) stream instead of ad hoc tuples.
  - build_training_tuples() enforces the RFC-0006 3/3/1/1 source allocation, supports explicit ClinVar-to-synthetic-SNV fallback, filters chromosome/interval/edit-key/record holdouts, and validates provider output before target windows are materialized.
  - Synthetic SNV/indel providers and an absolute EditSpec provider are available for fixture tests and for wiring prepared gnomAD and ClinVar shards later.
- Local gnomAD and ClinVar shard preparation.
  - geno-lewm-prepare-gnomad --input-vcf ... --output ... converts a local gnomAD VCF/VCF.gz into the documented Parquet shard, keeping PASS variants above the global AF threshold and splitting multi-allelic rows per alternate.
  - geno-lewm-prepare-clinvar --input-vcf ... --release ... --output ... converts a local ClinVar VCF/VCF.gz into the documented Parquet shard, retaining VUS/OTHER rows while label_set() excludes them from labelled eval sets.
- Deterministic fixture training smoke path.
  - geno-lewm-train --fixture-smoke --run-dir ... --steps 50 now writes a resolved config, metrics JSON, log, fixture checkpoint, fixture dataset manifest, and training_run.json metadata.
  - Fixture smoke runs can resume from the fixture checkpoint and are covered by bit-equal continuation tests. The output is release plumbing evidence only, not a Carbon-backed model result.
- PyPI release workflow hardening (issue #100).
  - .github/workflows/release-pypi.yml is now the trusted-publisher workflow path for tagged releases.
  - Release artifacts build from the committed uv.lock, publish to PyPI via OIDC trusted publishing, and emit GitHub/Sigstore build provenance with SHA256SUMS attached to the GitHub release.
- Receipt-verification tutorial notebook (issue #99).
  - examples/07_verify_receipt.ipynb verifies a committed checksum-only fixture receipt against its manifest, recomputes the input commitment, and validates the output commitment through the public geno-lewm-verify path.
- CLI discovery flags wired to the config layer (issue #29; RFC-0017 §3.8 + RFC-0018 §3.2).
  - --print-config renders the resolved config of the invoked command as canonical YAML on stdout.
  - --print-config-tree prepends a # resolved from: <path> provenance comment (single source today; Hydra-style multi-source composition lands with future work).
  - --explain encoder.dtype (or any dotted-key) renders the schema docstring + type + default, implemented via geno_lewm.config.describe_field.
  - Each Typer stub now passes default_config_name through to the dispatcher (train/score/eval/plan) so the right YAML template loads when the discovery flags fire.
  - tests/unit/test_cli_dispatcher.py extended with parametrised coverage of each flag against every stub (34 new cases) plus a targeted --explain unknown.key → exit code 3 test.
- Configuration schema and YAML defaults (issue #28; RFC-0017).
  - geno_lewm/config/schema.py: frozen dataclasses for every subsystem (EncoderConfig, PredictorConfig, ActionEncoderConfig, OptimizerConfig, DataConfig, EvalConfig, ObservabilityConfig, RuntimeConfig) plus the top-level GenoLeWMConfig. RFC-0017 §3.2 left the choice of Pydantic vs dataclasses open; dataclasses keep the base runtime dep footprint minimal (only pyyaml was added).
  - geno_lewm/config/loader.py: YAML loader + typed validator. Coerces YAML payloads through the schema, rejects unknown top-level keys with the new UnknownTopLevelKeyError (CONFIG.UNKNOWN_TOP_LEVEL_KEY), and rejects unknown sub-fields with ConfigError. Type coercion handles Literal enums, tuple[str, ...] (from YAML lists), bool / int / float / str, and X | None unions.
  - geno_lewm/config/defaults/{train,score,eval,plan}.yaml: canonical Phase 1 defaults mirroring every field in the schema.
  - write_resolved_config() emits the resolved config as canonical YAML so ${run_id}/config.resolved.yaml is byte-stable (RFC-0017 §3.5).
  - describe_field() returns {name, type, default, doc} for any dotted-key in the schema; consumed by the --explain CLI flag (PR #29 wiring still to come).
  - pyyaml>=6 and types-PyYAML>=6 added to base / dev deps.
- CLI dispatcher and stub command surface (issues #30, #31; RFC-0018).
  - geno_lewm/cli/_dispatch.py: shared Typer dispatch helpers (SharedOptions dataclass, shared_option_decls, finalize_shared, print_banner, run_app, not_yet_implemented). Catches GenoLeWMError at exactly one place and maps each subclass to the exit code documented in docs/spec/04-error-model.md (2 / 3 / 4 / 5 / 6 / 7 / 8 / 9; 130 for KeyboardInterrupt).
  - Eleven new Typer stub commands: train, score, rollout, plan, eval, eval-all, export, cache-windows, prepare-gnomad, prepare-clinvar, update. Each accepts the full shared flag set from RFC-0018 §3.2, prints the non-dismissible safety banner (RFC-0018 §3.7, suppressible only with both --quiet and --no-banner), and exits with code 9 advertising the GitHub tracking issue for the eventual implementation.
  - [project.scripts] registers all 12 console scripts (the 11 new stubs + the existing geno-lewm-verify).
  - Shell completion (issue #31) via Typer's built-in --install-completion / --show-completion; install steps documented in CONTRIBUTING.md.
  - typer>=0.12 added to the base runtime dependency set; previously the package shipped with zero runtime dependencies.
  - tests/unit/test_cli_dispatcher.py: 84 tests covering the banner contract, the shared-flag validator, exit-code mapping for every error family, --version for every stub, the pyproject.toml ↔ module-layout invariant, and the --help smoke test for every console script.
- Test pyramid scaffold and shared fixtures (issue #85; RFC-0015 §3.6).
  - tests/conftest.py resolves PYTEST_RANDOM_SEED (env override, or HEAD SHA mod 2**32 by default) and surfaces it in the pytest header for reproducible failures. Exposes the resolved value via the random_seed fixture; seeded_random returns a per-test random.Random so randomness never leaks across tests.
  - New synthetic fixtures usable by every test layer: synthetic_window (4 kB ACGT), synthetic_edit_spec, synthetic_pooling_config, synthetic_dtype_config, synthetic_receipt_output, fixtures_dir, utc_now, stable_isoformat.
  - New test directories created (per RFC-0015 §3.1): tests/ml/, tests/integration/, tests/eval/, tests/typecheck/, tests/fixtures/. Each ships an __init__.py that documents the layer's purpose. tests/typecheck/ already holds runtime checks for the py.typed marker and the sortedness / completeness of geno_lewm.__all__.
  - tests/fixtures/sample_window.fa and tests/fixtures/sample_receipt.json: small canned data files; tests/integration/test_fixtures_load.py smokes them through the public read_receipt loader.
- Performance harness, microbench suite, and regression detector (issues #90, #91, #92; RFC-0016).
  - bench/_harness.py: stdlib-only timing library. time_callable returns a BenchResult with samples / median / IQR (P25, P75) and a metadata block (commit, machine, Python, platform, dtype). write_result persists JSON at bench/results/<machine>/<benchmark>.json. Machine slug honours GENO_LEWM_BENCH_MACHINE so CI runners write to distinct trees.
  - bench/inference.py, bench/training.py, bench/planning.py, bench/profile.py: per-target benchmark scripts and profiler invocations. Planning emits placeholder JSON until the CEM solver lands (#59 / #60 / #61).
  - tests/benchmark/test_microbench.py: pytest-benchmark suite over the hot paths (canonical-JSON hashing, sha256 file/bytes, receipt commitments, EditSpec validation, apply_edit / apply_edits batches). Marked bench and deselected from the default pytest run; the nightly job opts in with pytest -m bench --benchmark-only --benchmark-json=....
  - tools/ci/perf_regression.py: diffs current results against the committed baseline at bench/results/baseline/. Handles both the bench-harness JSON shape and pytest-benchmark JSON; fails when any benchmark's median exceeds the baseline by more than the configured threshold (default 5 %, RFC-0016 §3.7). Treats missing baselines as warm-up and never gates on new benchmarks.
  - .github/workflows/perf-nightly.yml: daily cron that runs the harness, the pytest microbench suite, and the regression detector, uploading the result tree as a workflow artifact.
  - pytest-benchmark>=4 added to the [dev] optional extras.
- Changed-files coverage gate (issue #88; tools/ci/coverage_gate.py).
  - Cobertura XML +
    git diff origin/<base>...HEAD → per-file coverage on the lines a PR adds or modifies; fails if any touched Python file under geno_lewm/ falls below the configured threshold (default 90 %). Avoids the project-wide ratchet pathology called out in RFC-0015 §4.2.
  - Wired into .github/workflows/ci.yml as a step on the canonical matrix combo (Ubuntu × Python 3.12), gated to pull_request events. The actions/checkout@v6 step now uses fetch-depth: 0 so the gate can resolve the base ref locally.
  - Inputs are explicit (--coverage-xml, --base, --threshold, --prefix, --diff-file) so the gate is unit-testable without a real git repo.
- Release tooling (issue #102;
  tools/release/).
  - tools/release/bump.py rewrites the canonical __version__ assignment in geno_lewm/__init__.py after validating the new string against the project's PEP 440 subset (release, aN/bN/rcN, .postN, .devN) and enforcing strict-monotone ordering. --dry-run emits the unified diff without touching the tree; --show prints the current version.
  - tools/release/changelog.py synthesises a Keep-a-Changelog 1.1.0 section from git log <since>..<until>, mapping conventional / area-prefixed commits to Added / Changed / Deprecated / Removed / Fixed / Security buckets and flagging breaking (feat!: / fix!:) commits. Default --dry-run mode prints the section to stdout; --write lifts the existing [Unreleased] block in CHANGELOG.md into a dated [X.Y.Z] heading and re-opens an empty placeholder.
  - Both helpers are pure stdlib and run as python -m tools.release.{bump,changelog} so the release runner does not need optional dependencies installed.
- Distribution & packaging.
  - PEP 440-compliant version (0.1.0.dev0) sourced dynamically by Hatch from geno_lewm/__init__.py so package metadata and the runtime __version__ cannot drift.
  - py.typed marker so downstream type checkers honour the package's mypy-strict signatures.
  - Curated top-level geno_lewm.__all__ re-exporting the implemented surface (errors, observability, provenance, action specs, decorators).
  - tools/__init__.py so python -m tools.* runs as documented.
  - Optional dependency groups split into train / eval / deploy / dev / docs / all.
- Modern quality tooling.
  - Ruff lint+format with the full B, C4, UP, N, RUF, SIM, PIE, PTH, PL, PERF, FURB, LOG, ASYNC rule set; zero remaining findings.
  - Mypy --strict clean across geno_lewm/ and tools/ (25 source files, 0 errors).
  - [tool.pytest.ini_options] with strict markers / strict config / filterwarnings = ["error"].
  - Branch coverage at a 95 % gate.
  - Pre-commit configuration mirroring every CI gate (.pre-commit-config.yaml).
  - .editorconfig and .gitattributes for cross-editor / cross-OS consistency.
- CI/CD pipeline.
  - .github/workflows/ci.yml: matrix tests on Python 3.10 / 3.11 / 3.12 / 3.13 across Linux / macOS / Windows, ruff lint+format, mypy --strict, the five contract gates (errors / events / surface / no-print / network), mkdocs --strict build, sdist+wheel build with import-sanity smoke test, codecov upload, single required-check fan-in.
  - .github/workflows/release.yml: tag-driven PyPI publish via OIDC Trusted Publishing, TestPyPI dry-run on manual dispatch, GitHub release with extracted changelog notes.
  - .github/workflows/codeql.yml: weekly + per-PR static analysis (security-extended queries).
  - .github/workflows/docs.yml: GitHub Pages deploy.
- Documentation site (mkdocs-material).
  - https://abdelstark.github.io/GenoLeWM/ with material theme, mkdocstrings, dark/light palette, search, and code annotations.
  - Auto-generated API reference, error-code table, log-event table.
  - RFC corpus rendered into the docs tree at build time with rewritten cross-links.
  - docs/quickstart.md walking through every shipped module.
- Open-source hygiene.
  - .github/CODEOWNERS mapping spec / RFC / privacy / security paths to project lead review.
  - .github/dependabot.yml for weekly minor/patch updates + security advisories on pip and GitHub Actions.
  - .github/FUNDING.yml.
  - README badges (CI, CodeQL, docs, PyPI, Python, license, mypy strict, ruff, pre-commit).
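The gating rule in the tools/ci/perf_regression.py entry above can be sketched as follows. This is a simplified illustration under stated assumptions: results are reduced to a mapping from benchmark name to median seconds, and find_regressions is a hypothetical name rather than the tool's real interface. Only the 5 % default and the warm-up treatment of missing baselines come from this changelog.

```python
from typing import Mapping


def find_regressions(
    current: Mapping[str, float],
    baseline: Mapping[str, float],
    threshold: float = 0.05,
) -> list[str]:
    """Return names whose current median exceeds baseline by more than threshold."""
    regressions: list[str] = []
    for name in sorted(current):
        base = baseline.get(name)
        if base is None or base <= 0.0:
            # No usable baseline: treat as warm-up and never gate on it.
            continue
        if current[name] > base * (1.0 + threshold):
            regressions.append(name)
    return regressions


if __name__ == "__main__":
    current = {"canonical_json_hash": 1.06, "sha256_bytes": 1.04, "new_bench": 9.0}
    baseline = {"canonical_json_hash": 1.0, "sha256_bytes": 1.0}
    print(find_regressions(current, baseline))
```

Comparing medians rather than means keeps a single noisy sample from tripping the gate, which is presumably why the harness records the median and IQR.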
Changed
- tools/api/snapshot.py now emits a Python-version-stable signature for enums (enum[IntEnum](SNV=0, INS=1, …) instead of the synthesized __init__ signature that drifted between 3.10, 3.11, 3.12, 3.13). The committed snapshot at tests/api/public_surface.json was regenerated.
- geno_lewm.observability.logged_run drops two unused locals and uses contextlib.suppress for the on-crash flush path.
- geno_lewm.metrics.Histogram.snapshot returns a typed HistogramSnapshot TypedDict rather than dict[str, object].
- geno_lewm.cli.verify.verify accepts stream: IO[str] | None (was untyped object | None).
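To illustrate what the HistogramSnapshot change buys callers, here is a small sketch of a histogram whose snapshot is a TypedDict. The field names and the toy Histogram class are assumptions made for the example; the changelog names only the HistogramSnapshot type, not its keys.

```python
from typing import TypedDict


class HistogramSnapshot(TypedDict):
    # Field names are illustrative assumptions, not the project's schema.
    count: int
    total: float
    minimum: float
    maximum: float


class Histogram:
    """Toy stand-in for a metrics histogram; not the project's class."""

    def __init__(self) -> None:
        self._values: list[float] = []

    def observe(self, value: float) -> None:
        self._values.append(value)

    def snapshot(self) -> HistogramSnapshot:
        values = self._values
        # A type checker now knows every key and its value type.
        return {
            "count": len(values),
            "total": float(sum(values)),
            "minimum": min(values, default=0.0),
            "maximum": max(values, default=0.0),
        }
```

With dict[str, object], a caller would need a cast or an isinstance check before doing arithmetic on snapshot()["total"]; with the TypedDict, a strict type checker accepts it directly.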
Removed
- Legacy .github/workflows/lint-errors.yml (subsumed by the new multi-job ci.yml).
- docs/rfcs filesystem symlink (replaced by a docs-build-time generator that emits a docs-tree mirror with rewritten links).
Security
- PyPI Trusted Publishing (OIDC) on the release workflow: no long-lived API tokens are stored in repository secrets.
- CodeQL Python analysis on every PR + weekly schedule.
[0.1.0-draft] - 2026-05-20
Added
- Initial repository scaffold.
- 19 design RFCs (0001–0019) covering scope, encoder, action, predictor, training, data, eval, planning, surprise, deployment, artifact provenance, error taxonomy, observability, API stability, testing, performance budget, configuration, CLI, and the desktop app.
- SPECIFICATION.md synthesized canonical view.
- SPEC.md top-level index of the specification corpus.
- ARCHITECTURE.md narrative walk-through.
- ROADMAP.md phase plan.
- Eleven-section spec corpus at docs/spec/ covering overview, architecture, public API, data model, error model, observability, security, testing strategy, performance budget, release and versioning, and glossary.
- Open-source process documents: SECURITY.md, PRIVACY.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md.
- Implementation tracker at docs/roadmap/IMPLEMENTATION.md.
- Glossary, FAQ, design-decision log under docs/.
- Apache-2.0 license.
- pyproject.toml package stub.
- Phase 1 infrastructure modules implemented and tested (errors, observability, _redaction, metrics, action, provenance, cli.verify, api).
Security
- Network fail-closed contract documented in docs/spec/06-security.md and enforced by the check_network_confined AST linter.
- Redaction-by-default observability filter; the GENO_LEWM_REDACTION_STRICT=1 strict mode is the documented default.
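For readers unfamiliar with AST-based confinement checks, the sketch below shows the general technique of flagging network-capable imports without executing the code. It is not the check_network_confined implementation: find_network_imports and the module list are hypothetical, and the matching is deliberately coarse (it would also flag urllib.parse, for example).

```python
import ast

# Assumed deny-list of top-level modules; the real linter's rules may differ.
NETWORK_MODULES = frozenset({"socket", "http", "urllib", "requests"})


def find_network_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, module) pairs for imports of deny-listed modules."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y"; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package name.
            if name.split(".")[0] in NETWORK_MODULES:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    sample = "import os\nimport socket\nfrom urllib.request import urlopen\n"
    for lineno, module in find_network_imports(sample):
        print(f"line {lineno}: network import {module}")
```

A static check like this fails closed in the sense that any new import of a listed module is reported until someone reviews it, at the cost of occasional false positives.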