Skip to content

Examples And CLI Commands

Use the CLI index for the current runnable examples and optional smoke paths:

uv run worldforge examples
uv run worldforge examples --format json

For the full command surface, see the CLI Reference.

Demo Showcase Runner

Workflow Surface Command
demo-showcases Checkout-safe issue-backed workflows for local worlds, diagnostics, replay, provider failures, evidence review, release drills, and cookbook evidence uv run python scripts/demo_showcases.py run all --workspace-dir .worldforge/demo-showcases
uv run python scripts/demo_showcases.py list
uv run python scripts/demo_showcases.py run first-run --workspace-dir .worldforge/demo-showcases
uv run python scripts/demo_showcases.py run all --workspace-dir .worldforge/demo-showcases --format json --overwrite

Each selected workflow writes a preserved run_manifest.json, results/summary.json, and reports/summary.md under .worldforge/demo-showcases/<workflow>/runs/<run-id>/. The runner does not install optional model runtimes, call paid providers, open a GUI, or control robots. See Demo Showcase Workflows for the artifact matrix and Use Case Cookbook for task-oriented recipes.

Robotics Showcase TUI

Example Surface Command
robotics-showcase Real PushT policy+score run with Textual report scripts/robotics-showcase

The robotics showcase report is optional and Textual-backed. The same run can be executed without the TUI for JSON, Rerun, and terminal evidence.

scripts/robotics-showcase
scripts/robotics-showcase --no-tui

Textual stays out of the base dependency set.

Available flows:

Flow Purpose
leworldmodel Visual score-planning path through the LeWorldModel provider surface.
lerobot Visual policy-plus-score path through the LeRobot provider surface.
cosmos-policy Visual saved ALOHA /act replay through the Cosmos-Policy provider surface.
gr00t-replay Visual saved GR00T N1.7 PolicyClient replay through the GR00T provider surface.
robotics-compare Visual LeRobot, Cosmos-Policy, and GR00T policy comparison through one replay surface.
diagnostics Visual provider diagnostics and benchmark comparison path.
workbench Visual provider-authoring evidence path for adapter promotion work.

Rerun Recording

Example Surface Command
rerun-observability-showcase Provider events, world snapshots, 3D object boxes, plan artifacts, benchmark metrics uv run --extra rerun worldforge-demo-rerun
rerun-robotics-showcase Real PushT policy+score run with candidate targets, selected trajectory, score bars, latency bars, provider events, and replay snapshots scripts/robotics-showcase

The Rerun showcase writes .worldforge/rerun/worldforge-rerun-showcase.rrd by default. Open it with:

uv run --extra rerun rerun .worldforge/rerun/worldforge-rerun-showcase.rrd

See Rerun Integration for live viewer modes and Python API usage.

Prediction And Evaluation

Example Command Purpose
basic-prediction uv run python examples/basic_prediction.py Create a mock world, predict, plan, and print a physics evaluation report.

Provider Comparison

Example Command Purpose
cross-provider-compare uv run python examples/cross_provider_compare.py Register a second deterministic provider and compare prediction outputs.

Score Planning

Example Command Runtime boundary
leworldmodel-score-planning uv run worldforge-demo-leworldmodel Uses LeWorldModelProvider with an injected deterministic cost runtime.

Policy Plus Score Planning

Example Command Runtime boundary
lerobot-policy-score-planning uv run worldforge-demo-lerobot Uses LeRobotPolicyProvider with an injected deterministic policy runtime.

Both packaged demos validate the WorldForge adapter, planning, execution, persistence, reload, and event path in a clean checkout. They do not install optional ML runtimes or run upstream neural checkpoint inference.

Robot Decision Traces

Example Command Runtime boundary
so101-replay-trace uv run worldforge-demo-so101-replay-trace Scores deterministic SO-101 pick-and-place candidates and emits a reusable decision trace without LeRobot, torch, DimOS, or hardware.
go2-controlbench-decisiontrace uv run python examples/go2-controlbench-decisiontrace/run.py Replays a public Go2 Air ControlBench trace, reranks candidate commands through score, and reports measured native-odom regret without DimOS, Unitree SDKs, Hugging Face downloads, or hardware.

The SO-101 replay trace records observation -> goal -> candidate_actions -> candidate_scores -> selected_action -> outcome -> counterfactuals. It is a checkout-safe replay fixture shaped after the public lerobot/svla_so101_pickplace metadata, not a hardware-success or policy-quality claim.

The Go2 ControlBench DecisionTrace records the same evidence shape for navigation: target motion, candidate Unitree sport-mode commands, predicted-error scores, the score-selected command, counterfactual measured outcomes, and regret against the best measured command. It uses a bundled copy of a public espejelomar/go2-air-controlbench-v1 trace; source outcomes are native Go2 odometry means, not mocap-grade ground truth or live-robot execution by the demo.

Service Host Reference

Example Command Runtime boundary
service-host uv run python examples/hosts/service/app.py --provider mock --port 8080 Stdlib HTTP reference host; the embedding service owns deployment, credentials, telemetry export, alerting, and upstream SLA handling.

The service host exposes:

Endpoint Purpose
GET /healthz Process liveness only.
GET /readyz Framework alive, configured provider, provider health, traffic decision, and doctor() summary.
GET /providers Registered-provider diagnostics for the current host process.
POST /workflows/mock-predict Safe deterministic mock prediction smoke.
POST /workflows/predict Configurable provider prediction workflow using a JSON body with provider, world_state, and action.

/readyz reports ready, provider_unconfigured, or provider_unhealthy. Only ready means the host should accept provider-backed workflow traffic; the other states tell the host load balancer or job runner to drain this process while operators inspect checks.provider_health and the embedded doctor summary.

Every response includes or echoes a request id. Provider events are sent through JsonLoggerSink with that request id so host logs can correlate HTTP requests with provider calls. Public errors use typed JSON payloads and redact obvious secret-shaped values, but production services still own credential storage, request authentication, dashboards, alert routing, and provider SLA policy.

Batch Evaluation Host

Example Command Runtime boundary
batch-eval-host uv run python examples/hosts/batch-eval/app.py benchmark --provider mock Stdlib job reference host; the embedding batch system owns scheduling, durable storage, credentials, and provider-specific runtime setup.

Run deterministic mock evaluation and benchmark jobs in a clean checkout:

uv run python examples/hosts/batch-eval/app.py \
  --workspace .worldforge/batch-eval \
  eval --suite planning --provider mock

uv run python examples/hosts/batch-eval/app.py \
  --workspace .worldforge/batch-eval \
  benchmark --provider mock --operation predict --iterations 1 \
  --input-file examples/benchmark-inputs.json \
  --budget-file examples/benchmark-budget.json

Each job writes a shared run workspace under .worldforge/batch-eval/runs/<run-id>/ with run_manifest.json, JSON/Markdown/CSV reports, copied input and budget files for benchmark jobs, and a JSON stdout summary that points to the manifest. Benchmark budget violations return exit code 1 after preserving the failed run, which lets CI or a scheduler fail the job while still keeping issue-safe artifacts.

To swap in a real provider, run the same command on a prepared host that has the provider registered, credentials configured, optional runtime dependencies installed, and benchmark inputs that match that provider's advertised capability. Keep scheduling, retry policy above the process, long-term artifact storage, and credential rotation outside the base package.

Robotics Operator Host

Example Command Runtime boundary
robotics-operator-host uv run python examples/hosts/robotics-operator/app.py review --sample-translator Stdlib offline operator-review host; the lab application owns action translators, checklist policy, approval, controller integration, interlocks, and safety certification.

The default mode does not call robot controllers. It runs a deterministic LeRobot policy surface and score provider through an explicit sample PushT translator, then writes a preserved run workspace under .worldforge/robotics-operator/runs/<run-id>/ with:

  • results/action_chunks.json for all candidate action chunks and the selected chunk.
  • results/score_rationale.json for score values, best index, and score metadata.
  • logs/provider-events.jsonl for the provider event stream.
  • results/approval.json for host-owned checklist and dry-run approval state.
  • results/replay.json for an offline replay artifact.

Controller execution remains disabled unless the embedding host supplies an explicit controller hook in code, all checklist items are true, and dry-run approval is recorded. WorldForge only produces typed policy, score, event, replay, and run-manifest artifacts; it does not certify robot hardware, task safety, emergency stops, workspace readiness, or controller behavior.

Reference Host Deployment Recipes

These recipes are for embedding WorldForge in a host process, not for deploying WorldForge as a managed service. WorldForge owns provider contracts, typed validation, local run manifests, report rendering, and sanitized evidence helpers. The host owns deployment, authentication, queueing, durable storage, controller integration, alerting, rollback, uptime, and provider or robot safety policy.

Path taxonomy:

Path When to use it Success signal Host-owned boundary
checkout-safe mock provider, deterministic examples, no credentials command exits 0, /readyz is ready, or a run manifest is written no physical-fidelity, uptime, or durability claim
prepared-host optional provider/runtime already installed by the host provider health is healthy and the smoke writes evidence dependency install, checkpoints, cache directories, and runtime patches
credentialed a remote provider with secrets loaded outside the repo provider info shows redacted config and health passes secret storage, credential rotation, upstream SLA, and network egress
GPU-bound LeWorldModel, LeRobot, GR00T, or another device-bound runtime host preflight names the device and the smoke records a manifest CUDA/Metal drivers, device scheduling, model assets, and memory pressure
robotics-lab operator review around task-specific policy/action translation dry-run review records approval and no controller calls by default workspace safety, interlocks, controller hooks, operator approval, and safety certification

No new provider environment variables are introduced by these recipes, so .env.example stays unchanged unless a future provider adds real configuration.

Stdlib Service Host Recipe

Owned boundary: WorldForge maps provider diagnostics to /readyz, emits typed public errors, and can run deterministic mock prediction workflows. The embedding service owns HTTP deployment, request authentication, routing, queueing, durable state, log shipping, alerting, upstream SLA policy, and rollback.

Env template:

export WORLDFORGE_SERVICE_PROVIDER=mock
export WORLDFORGE_SERVICE_STATE_DIR=.worldforge/service-worlds
export WORLDFORGE_SERVICE_PORT=8080
# Prepared-host path only: export LEWORLDMODEL_POLICY=... or LEROBOT_POLICY_PATH=...

Process command:

mkdir -p .worldforge/service/logs
PYTHONUNBUFFERED=1 uv run python examples/hosts/service/app.py \
  --provider "${WORLDFORGE_SERVICE_PROVIDER:-mock}" \
  --state-dir "${WORLDFORGE_SERVICE_STATE_DIR:-.worldforge/service-worlds}" \
  --port "${WORLDFORGE_SERVICE_PORT:-8080}" \
  2>&1 | tee .worldforge/service/logs/service.log

Readiness command:

curl -fsS http://127.0.0.1:8080/readyz | python -m json.tool

Expected success signal: status is ready, traffic is accept, and checks.provider_healthy is true. First failure triage step: if the status is provider_unconfigured or provider_unhealthy, run uv run worldforge doctor --registered-only and uv run worldforge provider health <provider>, then inspect the redacted checks.provider_health.details field.

Smoke command:

curl -fsS -X POST http://127.0.0.1:8080/workflows/mock-predict \
  -H 'content-type: application/json' \
  -H 'x-request-id: service-smoke-001' \
  -d '{}' | python -m json.tool

Logging command:

tail -f .worldforge/service/logs/service.log

Evidence export command:

mkdir -p .worldforge/service/evidence
curl -fsS http://127.0.0.1:8080/readyz > .worldforge/service/evidence/readyz.json
curl -fsS http://127.0.0.1:8080/providers > .worldforge/service/evidence/providers.json

First rollback step: drain this host in the embedding service, restart it with the last known good provider configuration or WORLDFORGE_SERVICE_PROVIDER=mock, and keep upstream traffic disabled until /readyz returns ready.

Batch Eval Host Recipe

Owned boundary: WorldForge runs deterministic evaluation and benchmark contracts, writes run_manifest.json, report artifacts, and budget verdicts. The batch platform owns scheduling, queueing, retries above the process, credentials, long-term artifact storage, budget-change review, alerting, and rollback.

Env template:

export WORLDFORGE_BATCH_WORKSPACE=.worldforge/batch-eval
export WORLDFORGE_BATCH_STATE_DIR=.worldforge/batch-eval/worlds
# Prepared-host or credentialed path only: load provider-specific env from .env.example.

Readiness command:

uv run worldforge doctor --registered-only
uv run worldforge doctor --capability predict
uv run worldforge provider health mock

Process command:

uv run python examples/hosts/batch-eval/app.py \
  --workspace "${WORLDFORGE_BATCH_WORKSPACE:-.worldforge/batch-eval}" \
  --state-dir "${WORLDFORGE_BATCH_STATE_DIR:-.worldforge/batch-eval/worlds}" \
  benchmark --provider mock --operation predict --iterations 1 \
  --input-file examples/benchmark-inputs.json \
  --budget-file examples/benchmark-budget.json

Smoke command: use the same checkout-safe benchmark command with --provider mock; it exercises report rendering, budget evaluation, manifest preservation, and the batch host exit code without live credentials.

Expected success signal: JSON stdout contains status: "passed", exit_code: 0, run_manifest, and a budget.passed value of true when a budget file is supplied. Budget violations exit 1 only after the failed run workspace is preserved.

Logging command:

uv run worldforge runs list \
  --workspace-dir "${WORLDFORGE_BATCH_WORKSPACE:-.worldforge/batch-eval}" \
  --format markdown

Evidence export command:

uv run worldforge runs bundle <run-id> \
  --workspace-dir "${WORLDFORGE_BATCH_WORKSPACE:-.worldforge/batch-eval}"

First failure triage step: open <workspace>/runs/<run-id>/run_manifest.json, then export the bundle before deleting or retrying the run. First rollback step: stop the scheduler queue, restore the last known good input or budget file, rerun the checkout-safe mock benchmark, and only then reenable prepared-host or credentialed provider jobs.

Robotics Operator Host Recipe

Owned boundary: WorldForge runs an offline policy-plus-score review, preserves selected action chunks, score rationale, provider events, dry-run approval, replay artifacts, and a manifest. The lab host owns task observations, action translators, workspace readiness, emergency stops, operator approval, controller hooks, robot hardware, and safety certification.

Env template:

export WORLDFORGE_ROBOTICS_WORKSPACE=.worldforge/robotics-operator
export WORLDFORGE_ROBOTICS_STATE_DIR=.worldforge/robotics-operator/worlds
# Prepared-host, GPU-bound, or robotics-lab path only:
# export LEROBOT_POLICY_PATH=...
# export LEROBOT_DEVICE=cuda
# export LEWORLDMODEL_DEVICE=cuda

Readiness command:

uv run worldforge provider health mock
scripts/robotics-showcase --health-only

Use the first command for the checkout-safe operator recipe. Use the --health-only showcase preflight on prepared-host, GPU-bound, or robotics-lab paths before any real LeRobot or LeWorldModel run.

Process command:

uv run python examples/hosts/robotics-operator/app.py \
  --workspace "${WORLDFORGE_ROBOTICS_WORKSPACE:-.worldforge/robotics-operator}" \
  --state-dir "${WORLDFORGE_ROBOTICS_STATE_DIR:-.worldforge/robotics-operator/worlds}" \
  review --sample-translator --approve-dry-run \
  --check workspace_clear \
  --check emergency_stop_available \
  --check operator_present \
  --check controller_isolated

Smoke command: use the same checkout-safe review command with the sample translator. It records a dry-run review and keeps controller execution disabled.

Expected success signal: JSON stdout contains status: "passed", exit_code: 0, run_manifest, controller_executed: false, and paths for approval, action_chunks, score_rationale, provider_events, and replay.

Logging command:

tail -n +1 \
  "${WORLDFORGE_ROBOTICS_WORKSPACE:-.worldforge/robotics-operator}"/runs/<run-id>/logs/provider-events.jsonl

Evidence export command:

uv run worldforge runs bundle <run-id> \
  --workspace-dir "${WORLDFORGE_ROBOTICS_WORKSPACE:-.worldforge/robotics-operator}"

First failure triage step: inspect results/approval.json and logs/provider-events.jsonl before changing translator or controller code. First rollback step: keep controller execution disabled, discard the produced action chunks, return to the last approved translator/checklist bundle, and rerun the offline review before any lab controller hook is reenabled.

Optional Runtime Smoke

Example Command Runtime boundary
leworldmodel-real-checkpoint-smoke scripts/lewm-real --checkpoint ~/.stable-wm/pusht/lewm_object.ckpt --device cpu Requires host-owned stable_worldmodel, torch, datasets, OpenCV, imageio, and LeWM checkpoint assets; loads the official LeWorldModel object checkpoint through stable_worldmodel.policy.AutoCostModel and prints visual pipeline, tensor, latency, event, and candidate-cost output.
lerobot-leworldmodel-health scripts/robotics-showcase --health-only Non-mutating preflight for LeRobot, LeWorldModel, and checkpoint presence before running the full showcase.
lerobot-leworldmodel-real-robotics scripts/robotics-showcase Requires host-owned LeRobot, stable_worldmodel, torch, datasets, a real policy checkpoint, LeWM checkpoint assets, and PushT simulation dependencies; uses LeRobot's compatible rerun-sdk resolution for the default Rerun artifact path, opens a staged Textual report with an o shortcut for Rerun, and writes /tmp/worldforge-robotics-showcase/real-run.rrd by default. See the robotics replay showcase walkthrough.

Operational Commands

uv run worldforge doctor --registered-only
uv run worldforge provider list
uv run worldforge provider docs
uv run worldforge provider info mock
uv run worldforge predict kitchen --provider mock --x 0.3 --y 0.8 --z 0.0 --steps 2
uv run worldforge eval --suite planning --provider mock --format json
uv run worldforge benchmark --provider mock --iterations 5 --format json

Object add/update/remove commands write typed mutation entries into world history; predictions append their provider action entries after the provider returns the next state.