Engineering Quality Standards
WorldForge treats engineering quality as part of the public API. Provider capabilities, package metadata, docs, tests, and optional robotics runtimes must stay aligned so users can reason about what is installed, what is deterministic, and what remains host-owned.
Reference Baseline
WorldForge's quality rules are grounded in the same upstream sources a production Python ML framework should track:
- Python Packaging User Guide: pyproject.toml metadata for static project metadata, build-system configuration, SPDX license metadata, entry points, and dependency declarations.
- Python Packaging User Guide: packaging projects for src layout packaging and distribution structure.
- uv package guide for building and publishing packages through a modern build frontend while preserving the backend contract.
- pytest good integration practices for src layout test isolation and --import-mode=importlib.
- Ruff linter configuration and Ruff formatter guidance for a single fast lint and format toolchain.
- PEP 561 and the typing distribution spec for shipping inline type information through py.typed.
- PyTorch reproducibility notes for honest ML determinism boundaries: seeds reduce nondeterminism within a platform and release, but exact results are not guaranteed across releases, devices, or platforms.
- scikit-learn developer guide for stable public imports, estimator-style validation discipline, and explicit randomness contracts.
- Scientific Python SPEC 0 for the expectation that scientific Python projects document support windows and dependency policy rather than letting compatibility drift silently.
Project Rules
Packaging
- The package source lives under src/worldforge, and tests run against installed/importable package semantics rather than accidental repository-root imports.
- pyproject.toml is the single source of truth for package metadata, Python support, scripts, optional extras, uv package mode, and tool configuration.
- Wheels contain runtime package files only. Source distributions contain tests, docs, examples, scripts, and release metadata so downstream users can inspect and rebuild the project.
- src/worldforge/py.typed is part of the wheel contract. Removing it is a typing regression.
- Optional ML and robotics runtimes stay outside the base dependency set. torch, LeRobot, LeWorldModel, GR00T, CUDA, robot controllers, checkpoints, and datasets are supplied by the host environment for the specific smoke or showcase that needs them.
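The py.typed rule above can be checked mechanically against a built wheel. The following is a minimal sketch, not the project's actual package contract script; the function name wheel_ships_py_typed is hypothetical, and it assumes the wheel places the package at worldforge/ in the archive root, as a src layout build normally does.

```python
import zipfile
from pathlib import Path


def wheel_ships_py_typed(wheel_path: Path, package: str = "worldforge") -> bool:
    """Return True if the wheel archive contains <package>/py.typed.

    Hypothetical helper: a wheel is a zip archive, so the marker file can be
    looked up by name without installing the distribution.
    """
    marker = f"{package}/py.typed"
    with zipfile.ZipFile(wheel_path) as wheel:
        return marker in wheel.namelist()
```

A package contract test could call this on each wheel under dist/ and fail if it returns False.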
Testing
- pytest runs with --import-mode=importlib so tests do not depend on implicit sys.path mutation.
- Test fixtures must be deterministic unless a test explicitly validates nondeterministic runtime handling.
- Every provider capability must have both a positive contract test and a failure-mode test for the boundary it documents.
- Reusable provider contract helpers must use explicit exceptions instead of Python assert, so adapter validation does not disappear when tests run under optimized Python.
- Public exception assertions should match literal messages precisely enough to catch regressions without depending on unrelated text.
- xfail is strict. A test that starts passing should be investigated and either promoted or removed.
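The rule about explicit exceptions can be illustrated with a short sketch. The names ProviderContractError, require_capability, and ScoreOnlyProvider are hypothetical and are not taken from the WorldForge codebase; the point is only that a raise statement still runs under python -O, whereas an assert statement is compiled away.

```python
class ProviderContractError(Exception):
    """Hypothetical error raised when an adapter violates its contract."""


def require_capability(provider: object, capability: str) -> None:
    """Check that the provider exposes a callable with the given name.

    Uses an explicit raise rather than `assert`, so the check is not
    stripped when Python runs with optimizations enabled.
    """
    method = getattr(provider, capability, None)
    if not callable(method):
        raise ProviderContractError(
            f"provider {type(provider).__name__} does not expose '{capability}'"
        )


class ScoreOnlyProvider:
    """Illustrative provider that implements only the score capability."""

    def score(self, observation: object) -> float:
        return 0.0
```

A positive contract test would call require_capability(ScoreOnlyProvider(), "score") and expect no error. The matching failure-mode test could wrap the "predict" case in pytest.raises(ProviderContractError, match="does not expose 'predict'"), which keeps the message assertion literal but narrow.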
Linting And Style
- Ruff owns formatting-compatible linting and import ordering for src, tests, examples, and scripts.
- The enforced Ruff surface includes bugbear-adjacent quality families for comprehensions, returns, simplification, pytest style, performance footguns, and Ruff-native correctness checks.
- Public __all__ exports stay sorted so public API diffs are reviewable.
- Mutable class metadata such as Textual BINDINGS and SCREENS must be annotated as ClassVar to separate framework declarations from instance state.
- Tests use direct pytest imports and split compound assertions when doing so improves failure localization.
ML And Robotics Boundaries
- Deterministic in-repo suites are contract harnesses, not evidence of physical fidelity.
- Real robotics showcase paths must state which runtime owns preprocessing, checkpoints, observations, action translation, safety checks, and hardware execution.
- WorldForge validates tensor and action boundaries; it must not pad, project, or reinterpret mismatched action spaces.
- Score providers expose score. Policy providers expose policy. Predictive world models expose predict. Branding must not override executable capability truth.
- Provider events are log-facing records. Targets, messages, and metadata must remain sanitized before reaching JSON logs, in-memory sinks, or metrics aggregation.
- Downloaded PyTorch weight files load through torch.load(..., weights_only=True) by default. Falling back to pickle deserialization must be explicit and limited to trusted artifacts.
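The action-boundary rule above (validate, but never pad, project, or reinterpret) can be sketched in plain Python. ActionShapeError and validate_action are hypothetical names, and a flat sequence of floats stands in for whatever tensor type a real provider would use.

```python
from collections.abc import Sequence


class ActionShapeError(ValueError):
    """Hypothetical error for an action that does not match the declared space."""


def validate_action(action: Sequence[float], expected_dim: int) -> tuple[float, ...]:
    """Return the action unchanged in content, or raise on a size mismatch.

    The function never pads short actions or truncates long ones; a mismatch
    is reported to the caller, who owns the translation between action spaces.
    """
    if len(action) != expected_dim:
        raise ActionShapeError(
            f"expected action of dimension {expected_dim}, got {len(action)}"
        )
    return tuple(float(value) for value in action)
```

Rejecting the mismatch keeps responsibility for action translation with the host runtime, instead of silently producing a plausible-looking but wrong command.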
Local Gate
Run the full gate from the repository root before publishing behavior, docs, or distribution changes:
uv lock --check
uv run ruff check src tests examples scripts
uv run ruff format --check src tests examples scripts
uv run python scripts/generate_provider_docs.py --check
uv run mkdocs build --strict
uv run pytest
uv run --extra harness pytest --cov=src/worldforge --cov-report=term-missing --cov-fail-under=90
bash scripts/test_package.sh
uv build --out-dir dist --clear --no-build-logs
The local gate runs the lock check, Ruff, generated-provider-doc drift check, strict MkDocs build, full pytest, harness coverage gate, wheel/sdist package contract, and distribution build. Before a release tag, also run:
tmp_req="$(mktemp requirements-audit.XXXXXX)"
uv export --frozen --all-groups --no-emit-project --no-hashes -o "$tmp_req" >/dev/null
uvx --from pip-audit pip-audit -r "$tmp_req" --no-deps --disable-pip --progress-spinner off
rm -f "$tmp_req"
For release hardening, use the dependency audit in Operations.
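As a convenience, the local gate commands listed earlier could be wrapped in a small fail-fast runner. This is only a sketch: the GATE_COMMANDS list copies the commands from this page, while run_gate and its injectable run parameter are hypothetical and are not part of the repository's scripts.

```python
import subprocess
from collections.abc import Callable, Sequence

# The local gate, in the order given in this document.
GATE_COMMANDS: list[list[str]] = [
    ["uv", "lock", "--check"],
    ["uv", "run", "ruff", "check", "src", "tests", "examples", "scripts"],
    ["uv", "run", "ruff", "format", "--check", "src", "tests", "examples", "scripts"],
    ["uv", "run", "python", "scripts/generate_provider_docs.py", "--check"],
    ["uv", "run", "mkdocs", "build", "--strict"],
    ["uv", "run", "pytest"],
    ["uv", "run", "--extra", "harness", "pytest", "--cov=src/worldforge",
     "--cov-report=term-missing", "--cov-fail-under=90"],
    ["bash", "scripts/test_package.sh"],
    ["uv", "build", "--out-dir", "dist", "--clear", "--no-build-logs"],
]


def run_gate(
    commands: Sequence[Sequence[str]],
    run: Callable[..., object] = subprocess.run,
) -> int:
    """Run each command in order and stop at the first nonzero exit code."""
    for command in commands:
        result = run(list(command))
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling run_gate(GATE_COMMANDS) from the repository root would execute the gate and return the first failing exit code, or 0 if every step passes. The run parameter exists so the runner itself can be tested without invoking uv.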