Engineering Quality Standards

WorldForge treats engineering quality as part of the public API. Provider capabilities, package metadata, docs, tests, and optional robotics runtimes must stay aligned so users can reason about what is installed, what is deterministic, and what remains host-owned.

Reference Baseline

WorldForge's quality rules are grounded in the same upstream sources a production Python ML framework should track:

Project Rules

Packaging

  • The package source lives under src/worldforge, and tests run against installed/importable package semantics rather than accidental repository-root imports.
  • pyproject.toml is the single source of truth for package metadata, Python support, scripts, optional extras, uv package mode, and tool configuration.
  • Wheels contain runtime package files only. Source distributions contain tests, docs, examples, scripts, and release metadata so downstream users can inspect and rebuild the project.
  • src/worldforge/py.typed is part of the wheel contract. Removing it is a typing regression.
  • Optional ML and robotics runtimes stay outside the base dependency set. torch, LeRobot, LeWorldModel, GR00T, CUDA, robot controllers, checkpoints, and datasets are supplied by the host environment for the specific smoke test or showcase that needs them.
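
The wheel rules above can be checked mechanically. The following is a minimal sketch, not the project's actual package contract script: the function names and the SDIST_ONLY_ROOTS set are hypothetical, and it assumes the wheel lays the package out under a top-level worldforge/ directory.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Hypothetical: top-level directories that belong in the sdist, not the wheel.
SDIST_ONLY_ROOTS = {"tests", "docs", "examples", "scripts"}


def wheel_contract_problems(member_names: list[str]) -> list[str]:
    """Return human-readable contract violations for a wheel's member list."""
    problems: list[str] = []
    # The typing marker must ship with the runtime package.
    if "worldforge/py.typed" not in member_names:
        problems.append("missing worldforge/py.typed")
    # Anything rooted in an sdist-only directory is not a runtime file.
    for name in member_names:
        root = name.split("/", 1)[0]
        if root in SDIST_ONLY_ROOTS:
            problems.append(f"non-runtime file in wheel: {name}")
    return problems


def check_wheel(path: Path) -> list[str]:
    """Open a built wheel (a zip archive) and check its member names."""
    with zipfile.ZipFile(path) as wheel:
        return wheel_contract_problems(wheel.namelist())
```

An empty list from check_wheel means the archive satisfies both rules. Keeping the name check separate from the file I/O lets it be tested without building a wheel first.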

Testing

  • pytest runs with --import-mode=importlib so tests do not depend on implicit sys.path mutation.
  • Test fixtures must be deterministic unless a test explicitly validates nondeterministic runtime handling.
  • Every provider capability must have both a positive contract test and a failure-mode test for the boundary it documents.
  • Reusable provider contract helpers must use explicit exceptions instead of Python assert, so adapter validation does not disappear when tests run under optimized Python.
  • Public exception assertions should match literal messages precisely enough to catch regressions without depending on unrelated text.
  • xfail is strict. A test that starts passing should be investigated and either promoted or removed.
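
To illustrate the rule about explicit exceptions in reusable contract helpers, here is a minimal sketch. ProviderContractError, require_capability, and ScoreOnlyProvider are hypothetical names chosen for this example, not WorldForge APIs. The point is that a raise statement still executes under python -O, whereas an assert statement is compiled away.

```python
from __future__ import annotations


class ProviderContractError(Exception):
    """Hypothetical exception raised when a provider violates its contract."""


def require_capability(provider: object, capability: str) -> None:
    """Raise if the provider does not expose a callable named `capability`.

    An explicit raise is used instead of `assert` so the check survives
    optimized runs, where assert statements are removed.
    """
    method = getattr(provider, capability, None)
    if not callable(method):
        raise ProviderContractError(
            f"provider {type(provider).__name__} does not expose "
            f"callable '{capability}'"
        )


class ScoreOnlyProvider:
    """Hypothetical provider that implements only the score capability."""

    def score(self, observation: list[float]) -> float:
        return sum(observation)
```

A positive contract test would call require_capability(ScoreOnlyProvider(), "score") and expect no exception. The matching failure-mode test could wrap the "predict" case in pytest.raises(ProviderContractError, match="predict"), which pins the message closely enough to catch regressions without depending on the rest of the text.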

Linting And Style

  • Ruff owns formatting-compatible linting and import ordering for src, tests, examples, and scripts.
  • The enforced Ruff surface includes bugbear-adjacent quality families for comprehensions, returns, simplification, pytest style, performance footguns, and Ruff-native correctness checks.
  • Public __all__ exports stay sorted so public API diffs are reviewable.
  • Mutable class metadata such as Textual BINDINGS and SCREENS must be annotated as ClassVar to separate framework declarations from instance state.
  • Tests use direct pytest imports and split compound assertions when doing so improves failure localization.
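
The ClassVar rule can be pictured with a small sketch. DemoApp is a hypothetical stand-in for a Textual App subclass so that the example runs without Textual installed, and the binding and screen values are illustrative only.

```python
from __future__ import annotations

from typing import ClassVar


class DemoApp:
    """Hypothetical stand-in for a Textual App subclass."""

    # Framework declarations: shared by every instance, so annotated ClassVar.
    BINDINGS: ClassVar[list[tuple[str, str, str]]] = [("q", "quit", "Quit")]
    SCREENS: ClassVar[dict[str, str]] = {"home": "HomeScreen"}

    def __init__(self) -> None:
        # Instance state: created per object, never stored on the class.
        self.status = "idle"
```

Without the annotation, a mutable class-level list or dict looks like a per-instance default, which is the pattern Ruff's mutable-class-default check (RUF012) is designed to flag.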

ML And Robotics Boundaries

  • Deterministic in-repo suites are contract harnesses, not evidence of physical fidelity.
  • Real robotics showcase paths must state which runtime owns preprocessing, checkpoints, observations, action translation, safety checks, and hardware execution.
  • WorldForge validates tensor and action boundaries; it must not pad, project, or reinterpret mismatched action spaces.
  • Score providers expose score. Policy providers expose policy. Predictive world models expose predict. Branding must not override executable capability truth.
  • Provider events are log-facing records. Targets, messages, and metadata must remain sanitized before reaching JSON logs, in-memory sinks, or metrics aggregation.
  • Downloaded PyTorch weight files load through torch.load(..., weights_only=True) by default. Falling back to pickle deserialization must be explicit and limited to trusted artifacts.
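
The "validate, do not repair" boundary for action spaces could look like the following sketch. ActionSpaceMismatch and validate_action are hypothetical names, and a flat sequence of floats stands in for whatever tensor type the host runtime actually supplies.

```python
from __future__ import annotations

from collections.abc import Sequence


class ActionSpaceMismatch(ValueError):
    """Hypothetical error for an action that does not fit the declared space."""


def validate_action(action: Sequence[float], expected_dim: int) -> tuple[float, ...]:
    """Return the action unchanged as a tuple, or raise on a size mismatch.

    The function deliberately never pads, truncates, or projects: a wrong
    dimension is reported to the caller instead of being silently repaired.
    """
    if len(action) != expected_dim:
        raise ActionSpaceMismatch(
            f"expected action of dimension {expected_dim}, got {len(action)}"
        )
    return tuple(float(value) for value in action)
```

Rejecting the mismatch leaves the decision about action translation with the host runtime that owns it, which is the ownership split the bullets above describe.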

Local Gate

Run the full gate from the repository root before publishing behavior, docs, or distribution changes:

uv lock --check
uv run ruff check src tests examples scripts
uv run ruff format --check src tests examples scripts
uv run python scripts/generate_provider_docs.py --check
uv run mkdocs build --strict
uv run pytest
uv run --extra harness pytest --cov=src/worldforge --cov-report=term-missing --cov-fail-under=90
bash scripts/test_package.sh
uv build --out-dir dist --clear --no-build-logs

The local gate runs, in order, the lock check, the Ruff lint and format checks, the generated-provider-doc drift check, a strict MkDocs build, the full pytest suite, the harness coverage gate (90% minimum), the wheel/sdist package contract, and the distribution build. Before a release tag, also run the dependency audit:

tmp_req="$(mktemp requirements-audit.XXXXXX)"
uv export --frozen --all-groups --no-emit-project --no-hashes -o "$tmp_req" >/dev/null
uvx --from pip-audit pip-audit -r "$tmp_req" --no-deps --disable-pip --progress-spinner off
rm -f "$tmp_req"

For release hardening, use the dependency audit in Operations.