CLI Reference
WorldForge ships a local-first CLI for provider diagnostics, persisted mock worlds, evaluation, benchmarking, packaged examples, and optional visual harnesses. Commands run against the same typed Python surfaces as the library.
Use uv run worldforge --help and each subcommand's --help output as the exact parser contract.
This page is the stable operator map.
Discovery
Contributor setup and static release-support checks:
uv run python scripts/contributor_doctor.py --format markdown
uv run python scripts/contributor_doctor.py --format json
uv run python scripts/check_docs_commands.py
uv run python scripts/check_wrapper_portability.py
uv run python scripts/check_core_performance.py
contributor_doctor.py reports missing required tools as setup failures, optional runtime
dependencies as skips, and missing GitHub CLI auth as a publishing warning rather than a local
validation failure.
Operator failure drills:
uv run worldforge drills list
uv run worldforge drills run missing-credentials --workspace-dir .worldforge/drills
uv run worldforge drills run unsafe-event-metadata --workspace-dir .worldforge/drills --bundle
The drill command surface is checkout-safe by default. Each run preserves a manifest, records the expected failure and recovery command, and confines generated state to the requested temporary or documented workspace.
Provider Diagnostics
uv run worldforge doctor --registered-only
uv run worldforge provider list
uv run worldforge provider info mock
uv run worldforge provider contract mock --format json
uv run worldforge provider docs
uv run worldforge provider health mock
uv run worldforge negotiate --list
uv run worldforge negotiate --workflow policy-plus-score
Use doctor first when a provider is missing. Optional providers such as LeWorldModel, LeRobot,
GR00T, and Cosmos-Policy register only when their host-owned environment variables and runtimes are
available. worldforge negotiate answers the higher-level question "can my providers satisfy this
workflow before I run it?" See Capability Negotiation for details.
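The "doctor first" advice can be scripted. The sketch below is not part of WorldForge: it runs diagnostic commands taken from this page in order and reports the first one that fails. It assumes each command signals failure through a non-zero exit status, which is conventional but not confirmed here, and the helper names (`first_failing_step`, `DIAGNOSTIC_STEPS`) are hypothetical.

```python
import shlex
import subprocess
from typing import Callable, Optional, Sequence

# Commands copied from this page; the ordering is a suggestion, not a WorldForge rule.
DIAGNOSTIC_STEPS = [
    "uv run worldforge doctor --registered-only",
    "uv run worldforge provider health mock",
    "uv run worldforge negotiate --workflow policy-plus-score",
]


def _run(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def first_failing_step(
    steps: Sequence[str],
    run: Callable[[Sequence[str]], int] = _run,
) -> Optional[str]:
    """Return the first command that exits non-zero, or None if all pass.

    Assumption: a non-zero exit status means the check failed.
    """
    for step in steps:
        if run(shlex.split(step)) != 0:
            return step  # stop early so later steps do not hide the root cause
    return None
```

Calling `first_failing_step(DIAGNOSTIC_STEPS)` from a checkout would execute the real commands; the `run` parameter exists so the ordering logic can be exercised without them.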
Prediction
worldforge predict seeds a plain world-state dict, rolls the requested move through the
action-conditioned predict provider, and prints the provider name, physics score, confidence,
and the resulting world-state JSON. The positional argument is a scenario label only — there is no
symbolic World runtime or local JSON persistence. Compose predict with score/policy
providers through LatentMPCController (see Python API) for receding-horizon
planning.
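For readers new to receding-horizon planning, the idea can be shown with a toy loop. The sketch below does not use LatentMPCController or any WorldForge API; the `toy_predict` and `toy_score` functions are hypothetical stand-ins for a predict provider and a score provider, and the exhaustive search over action sequences is chosen only because it is deterministic and small.

```python
from itertools import product
from typing import Callable, List, Sequence


def plan_first_action(
    state: float,
    predict: Callable[[float, float], float],
    score: Callable[[float], float],
    actions: Sequence[float],
    horizon: int,
) -> float:
    """Return the first action of the cheapest imagined action sequence.

    Cost is the sum of scores along the rollout; lower is better.
    """
    best_cost = float("inf")
    best_first = actions[0]
    for sequence in product(actions, repeat=horizon):
        rolled, cost = state, 0.0
        for action in sequence:
            rolled = predict(rolled, action)  # imagined step, nothing is executed
            cost += score(rolled)
        if cost < best_cost:
            best_cost, best_first = cost, sequence[0]
    return best_first


def receding_horizon(
    state: float,
    predict: Callable[[float, float], float],
    score: Callable[[float], float],
    actions: Sequence[float],
    horizon: int,
    steps: int,
) -> List[float]:
    """Replan at every step and execute only the first planned action."""
    trajectory = [state]
    for _ in range(steps):
        action = plan_first_action(state, predict, score, actions, horizon)
        state = predict(state, action)  # stand-in for executing the action
        trajectory.append(state)
    return trajectory


# Toy 1-D world: the state is a position, actions are displacements, the goal is x = 3.
def toy_predict(x: float, a: float) -> float:
    return x + a


def toy_score(x: float) -> float:
    return abs(x - 3.0)


trajectory = receding_horizon(
    0.0, toy_predict, toy_score, (-1.0, 0.0, 1.0), horizon=2, steps=5
)
print(trajectory)
```

With these toy functions the position advances one unit per step until it reaches the goal and then holds there. Only the first action of each plan is ever executed, which is the defining feature of the receding-horizon pattern.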
Evaluation
uv run worldforge eval --suite physics --provider mock
uv run worldforge eval --suite planning --provider mock --format json
Built-in suites are deterministic contract checks. They are useful for adapter regression testing, not claims of physical fidelity, media quality, or real-world safety.
Benchmarking
uv run worldforge benchmark --provider mock --iterations 5 --format json
uv run worldforge benchmark --provider mock --operation embed --input-file examples/benchmark-inputs.json
uv run worldforge benchmark --provider mock --operation predict --budget-file examples/benchmark-budget.json
Budget files set limits on latency, throughput, success rate, retry count, and error count; when a run violates any limit, the command exits with a non-zero code. Preserve benchmark artifacts before using numbers in a release note, paper, or public claim.
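The budget-gating idea can be illustrated independently of the CLI. The sketch below is an assumption-laden illustration, not the WorldForge implementation: the key names (`max_latency_ms`, `min_success_rate`, and so on) are hypothetical, the real budget-file schema may differ, and the sample numbers are made up for the example.

```python
from typing import Dict, List


def budget_violations(metrics: Dict[str, float], budget: Dict[str, float]) -> List[str]:
    """Compare measured metrics against max_*/min_* limits.

    Hypothetical convention: "max_<metric>" is an upper bound and
    "min_<metric>" is a lower bound on metrics["<metric>"].
    """
    violations = []
    for key, limit in budget.items():
        bound, _, metric = key.partition("_")
        if bound not in ("max", "min") or not metric:
            raise ValueError(f"unrecognized budget key: {key}")
        value = metrics[metric]
        exceeded = value > limit if bound == "max" else value < limit
        if exceeded:
            violations.append(f"{metric}={value} violates {key}={limit}")
    return violations


def exit_status(violations: List[str]) -> int:
    """Map violations to a process exit status: 0 when clean, 1 otherwise."""
    return 1 if violations else 0


# Illustrative numbers only; they are not measurements of any provider.
sample_metrics = {"latency_ms": 12.5, "success_rate": 1.0, "error_count": 0}
sample_budget = {"max_latency_ms": 10.0, "min_success_rate": 0.99, "max_error_count": 0}
sample_violations = budget_violations(sample_metrics, sample_budget)
print(sample_violations, exit_status(sample_violations))
```

A gate of this shape lets a CI job fail on a regression without parsing human-readable output.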
Robotics Showcase TUI
scripts/robotics-showcase
scripts/robotics-showcase --no-tui
uv run worldforge runs list --status failed --artifact-type json
The robotics showcase report is optional and Textual-backed. It keeps Textual out of the base package while visualizing the prepared-host LeRobot plus LeWorldModel policy+score run. The run history commands are checkout-safe and work without Textual; they report preserved-run filters, sanitized rerun commands, and first recovery actions without printing secret values.
Expected success signal: the selected flow reaches a completed run workspace and the inspector
shows its saved artifact paths. For cosmos-policy, the replay should report
raw_action_shape: [50, 14], translated_actions: 50, and
saved_replay_artifact: artifacts/cosmos-policy-replay.json. For gr00t-replay, expect
translated_actions: 40 and saved_replay_artifact: artifacts/gr00t-replay.json. For
robotics-compare, expect total_translated_actions: 92 and
comparison_artifact: artifacts/robotics-policy-comparison.json.
First triage step: open the saved run workspace and inspect logs/provider-events.jsonl plus the
flow-specific artifact: artifacts/cosmos-policy-replay.json, artifacts/gr00t-replay.json, or
artifacts/robotics-policy-comparison.json. For live-provider readiness checks, run
uv run worldforge provider workbench cosmos-policy --format json --live.
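The expected success signals above can be encoded as a small checklist. The sketch below copies the values from this page and compares them with a report dictionary. It assumes the report is available as a flat mapping with exactly these keys, which this page does not confirm; `signal_mismatches` is a hypothetical helper, not a WorldForge function.

```python
from typing import Any, Dict, List

# Expected values copied from the success signals described on this page.
EXPECTED_SIGNALS: Dict[str, Dict[str, Any]] = {
    "cosmos-policy": {
        "raw_action_shape": [50, 14],
        "translated_actions": 50,
        "saved_replay_artifact": "artifacts/cosmos-policy-replay.json",
    },
    "gr00t-replay": {
        "translated_actions": 40,
        "saved_replay_artifact": "artifacts/gr00t-replay.json",
    },
    "robotics-compare": {
        "total_translated_actions": 92,
        "comparison_artifact": "artifacts/robotics-policy-comparison.json",
    },
}


def signal_mismatches(flow: str, report: Dict[str, Any]) -> List[str]:
    """List every expected signal that the report is missing or disagrees with.

    Assumption: `report` is a flat dict keyed like the signals above.
    """
    mismatches = []
    for key, expected in EXPECTED_SIGNALS[flow].items():
        actual = report.get(key)
        if actual != expected:
            mismatches.append(f"{key}: expected {expected!r}, got {actual!r}")
    return mismatches
```

An empty list means the run matched every documented signal for that flow; any entry points at the field to inspect first during triage.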
Packaged Demos
Checkout-safe demos use injected deterministic runtimes:
uv run worldforge-demo-leworldmodel
uv run worldforge-demo-lerobot
uv run --extra rerun worldforge-demo-rerun
uv run python scripts/demo_showcases.py list
uv run python scripts/demo_showcases.py run all --workspace-dir .worldforge/demo-showcases
They validate WorldForge's provider adapters, planning, execution, persistence, reload, and event
paths without installing optional model runtimes or downloading checkpoints. The Rerun demo
requires the rerun extra and records events, worlds, plans, 3D object boxes, and benchmark
metrics to a .rrd artifact. The demo showcase runner preserves ten issue-backed workflows with
run_manifest.json, JSON summaries, Markdown summaries, safe issue bundles, and first triage
steps. See Demo Showcase Workflows and
Use Case Cookbook.
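After a showcase run, the preserved manifests can be located with a few lines of Python. The sketch below assumes only that files named run_manifest.json exist somewhere under the workspace directory and contain JSON; the directory layout and manifest fields are not documented here, so the helper (a hypothetical `manifest_keys`) reports top-level keys rather than interpreting them.

```python
import json
from pathlib import Path
from typing import Dict, List, Union


def manifest_keys(workspace_dir: Union[str, Path]) -> Dict[str, List[str]]:
    """Map each run_manifest.json under workspace_dir to its sorted top-level keys.

    Paths in the result are relative to workspace_dir and use "/" separators.
    """
    root = Path(workspace_dir)
    found: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("run_manifest.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Only JSON objects have keys; anything else is reported as empty.
        keys = sorted(data) if isinstance(data, dict) else []
        found[path.relative_to(root).as_posix()] = keys
    return found
```

For example, `manifest_keys(".worldforge/demo-showcases")` would list the manifests written by the `run all` command shown earlier, assuming that run completed.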
Optional Runtime Smokes
Real LeWorldModel checkpoint scoring:
LeRobot policy plus LeWorldModel checkpoint scoring replay:
scripts/robotics-showcase --health-only
scripts/robotics-showcase
uvx --from "rerun-sdk>=0.24,<0.32" rerun /tmp/worldforge-robotics-showcase/real-run.rrd
Open the run's TensorBoard logs without launching the Textual report. This is useful for non-interactive verification or for opening TensorBoard from a shell:
uv run worldforge-open-tensorboard --logdir .worldforge/tensorboard/<run>
uv run worldforge-open-tensorboard --logdir .worldforge/tensorboard/<run> --probe --no-browser
See TensorBoard Integration for the full flag reference.
Live smoke helpers for Cosmos-Policy, JEPA-WMS, LeRobot plus LeWorldModel, GR00T, and LeRobot:
uv run worldforge-smoke-cosmos-policy --help
uv run worldforge-smoke-jepa-wms --help
uv run worldforge-smoke-lerobot-leworldmodel --help
uv run python scripts/smoke_gr00t_policy.py --help
uv run python scripts/smoke_lerobot_policy.py --help
Optional-runtime commands require host-owned runtimes, checkpoints, credentials, observations, and task-specific action translators. WorldForge does not add those dependencies to the base package and does not treat injected demos as real upstream inference.
More detail: