Capability Fixture Corpus

WorldForge ships a small, packaged corpus of canonical input fixtures for every provider capability — predict, reason, embed, generate, transfer, score, and policy. Conformance tests, evaluation suites, and provider authors can reuse the corpus instead of hand-rolling payloads in each test file.

The corpus lives at src/worldforge/testing/fixtures/ and is exposed through the public testing API at worldforge.testing (so it is part of the wheel; no checkout-time path discovery is needed).

Cardinality

For each capability the corpus ships:

  • exactly one valid_baseline.json representing a minimal-but-realistic public input,
  • at least two invalid_<reason>.json fixtures that name distinct boundary failures.

This guarantees seven valid baselines (one per capability) and at least fourteen invalid boundary fixtures, so every conformance helper can be exercised without re-deriving payloads.
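This invariant is cheap to assert in a corpus test. A minimal sketch over fixture file stems (the stem names below are hypothetical, not necessarily real corpus files):

```python
def check_cardinality(stems: list[str]) -> bool:
    """True iff a capability ships exactly one valid baseline and
    at least two distinct invalid boundary fixtures."""
    valid = [s for s in stems if s == "valid_baseline"]
    invalid = {s for s in stems if s.startswith("invalid_")}
    return len(valid) == 1 and len(invalid) >= 2

# Hypothetical stems for one capability:
assert check_cardinality(
    ["valid_baseline", "invalid_empty_state", "invalid_negative_steps"]
)
assert not check_cardinality(["valid_baseline", "invalid_empty_state"])
```

In the real suite, list_fixture_names(capability) would supply the stems for each capability.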

Envelope

Each fixture file is a JSON object with schema_version: 1 and the following keys:

  • id: <capability>.<name> matching the file path (e.g. predict.valid_baseline).
  • capability: one of predict, reason, embed, generate, transfer, score, policy.
  • data_class: synthetic (hand-authored), captured (recorded from a real provider or run), or host-supplied (provided by an integrator at runtime; the file ships an example).
  • expected: valid or invalid.
  • expected_error_pattern: regex hint matching the message of the WorldForgeError the payload should trigger. Required for invalid fixtures; must be null for valid fixtures.
  • description: one-sentence note on what the fixture exercises.
  • payload: capability-specific dict whose keys map directly onto the matching assert_*_conformance() keyword arguments.
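Putting the envelope together, an invalid fixture might look like this (the field values below are illustrative, not copied from a real corpus file):

```json
{
  "schema_version": 1,
  "id": "predict.invalid_negative_steps",
  "capability": "predict",
  "data_class": "synthetic",
  "expected": "invalid",
  "expected_error_pattern": "steps must be a positive integer",
  "description": "Rejects a prediction request whose steps count is negative.",
  "payload": {
    "world_state": {"entities": []},
    "action": {"name": "noop"},
    "steps": -1
  }
}
```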

The complete contributor charter — including ownership, when to add a new fixture, and what not to put in the corpus — lives in src/worldforge/testing/fixtures/README.md.
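The loader validates this envelope before returning a fixture. A minimal sketch of the checks involved (assumed behaviour, not the real WorldForge loader):

```python
REQUIRED_KEYS = {
    "schema_version", "id", "capability", "data_class",
    "expected", "expected_error_pattern", "description", "payload",
}
CAPABILITIES = {"predict", "reason", "embed", "generate", "transfer", "score", "policy"}

def validate_envelope(fixture: dict) -> None:
    """Raise ValueError if the fixture dict violates the envelope contract."""
    missing = REQUIRED_KEYS - fixture.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if fixture["schema_version"] != 1:
        raise ValueError("unsupported schema_version")
    if fixture["capability"] not in CAPABILITIES:
        raise ValueError(f"unknown capability: {fixture['capability']}")
    if fixture["expected"] == "invalid" and not fixture["expected_error_pattern"]:
        raise ValueError("invalid fixtures must set expected_error_pattern")
    if fixture["expected"] == "valid" and fixture["expected_error_pattern"] is not None:
        raise ValueError("valid fixtures must keep expected_error_pattern null")
```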

Loading fixtures

from worldforge.testing import (
    iter_capability_fixtures,
    load_capability_fixture,
)

baseline = load_capability_fixture("score", "valid_baseline")
score_info = baseline.payload["info"]
candidates = baseline.payload["action_candidates"]

for fixture in iter_capability_fixtures("policy"):
    print(fixture.id, fixture.expected, fixture.description)

Available helpers (re-exported from worldforge.testing):

  • CAPABILITY_FIXTURE_NAMES: tuple of capability names covered by the corpus.
  • CapabilityFixture: frozen dataclass returned by the loader.
  • FIXTURE_SCHEMA_VERSION: currently 1; bump only with a loader migration.
  • list_fixture_names(capability): sorted file stems for a capability.
  • load_capability_fixture(capability, name): validate and return one fixture.
  • iter_capability_fixtures(capability): iterate validated fixtures for one capability.
  • iter_all_fixtures(): iterate every fixture across the corpus.

Using fixtures in conformance tests

The payload keys mirror the keyword arguments of assert_*_conformance(), so a fixture can flow straight into the existing helpers:

from worldforge import Action, MockProvider
from worldforge.testing import (
    assert_embed_conformance,
    assert_predict_conformance,
    load_capability_fixture,
)

provider = MockProvider()

predict_fx = load_capability_fixture("predict", "valid_baseline")
assert_predict_conformance(
    provider,
    world_state=predict_fx.payload["world_state"],
    action=Action.from_dict(predict_fx.payload["action"]),
    steps=predict_fx.payload["steps"],
)

embed_fx = load_capability_fixture("embed", "valid_baseline")
assert_embed_conformance(provider, text=embed_fx.payload["text"])

For invalid fixtures, compile expected_error_pattern into a regex and assert that the corresponding API surface raises WorldForgeError with a matching message. Some invalid fixtures document a contract claim that the framework does not yet enforce; those fixtures still ship so future validators can lock against them, and their description field spells the situation out.
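The matching step reduces to re.search against the raised message. A sketch with hypothetical values (the pattern and message below are illustrative, not taken from the real corpus):

```python
import re

# Hypothetical fields from an invalid fixture and a hypothetical raised message:
expected_error_pattern = r"steps must be a positive integer"
raised_message = "predict: steps must be a positive integer, got -3"

# A conformance test would wrap the provider call in pytest.raises(WorldForgeError)
# and then assert the exception message matches the fixture's pattern:
assert re.search(expected_error_pattern, raised_message) is not None
```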

Data ownership

Fixtures are owned by the evaluation/quality stream alongside the deterministic evaluation suites. They are deliberately:

  • synthetic by default — hand-authored JSON shapes that exercise public-input boundaries.
  • tiny — JSON only; binary clip frames are inlined as frames_base64 strings rather than shipped as media files.
  • provider-agnostic where possible — provider-specific shapes (e.g. policy_info variants for LeRobot vs GR00T) belong here only if more than one consumer would benefit; otherwise keep them in tests/fixtures/providers/ next to the provider fixtures used for parser and retry tests.

Captured payloads (data_class: "captured") are accepted when they reproduce a real bug or regression and are sanitized of secrets, signed URLs, host paths, and proprietary content. Host-supplied payloads (data_class: "host-supplied") ship a synthetic placeholder; the description field documents what the integrator should substitute.

Validation

Validate corpus changes with the fixture tests, the packaging smoke test, and a strict docs build:

uv run pytest tests/test_capability_fixtures.py tests/test_provider_contracts.py
bash scripts/test_package.sh
uv run mkdocs build --strict