Examples And CLI Commands¶

Use the CLI index for the current runnable examples and optional smoke paths:

uv run worldforge examples
uv run worldforge examples --format json

For the full command surface, see the CLI Reference.

Visual Harness¶

Example	Surface	Command
`theworldharness`	E2E flows, provider diagnostics, benchmark comparison	`uv run --extra harness worldforge-harness`

TheWorldHarness is an optional Textual TUI for running the packaged E2E demos as visible provider workflows.

uv run --extra harness worldforge-harness
uv run --extra harness worldforge-harness --flow lerobot
uv run --extra harness worldforge-harness --flow diagnostics
uv run worldforge harness --list

The harness keeps Textual out of the base dependency set. Install or run with the harness extra when you want the visual interface.

Available flows:

Flow	Purpose
`leworldmodel`	Visual score-planning path through the LeWorldModel provider surface.
`lerobot`	Visual policy-plus-score path through the LeRobot provider surface.
`diagnostics`	Visual provider diagnostics and benchmark comparison path.

Rerun Recording¶

Example	Surface	Command
`rerun-observability-showcase`	Provider events, world snapshots, 3D object boxes, plan artifacts, benchmark metrics	`uv run --extra rerun worldforge-demo-rerun`
`rerun-robotics-showcase`	Real PushT policy+score run with candidate targets, selected trajectory, score bars, latency bars, provider events, and replay snapshots	`scripts/robotics-showcase`

The Rerun showcase writes .worldforge/rerun/worldforge-rerun-showcase.rrd by default. Open it with:

uv run --extra rerun rerun .worldforge/rerun/worldforge-rerun-showcase.rrd

See Rerun Integration for live viewer modes and Python API usage.

Prediction And Evaluation¶

Example	Command	Purpose
`basic-prediction`	`uv run python examples/basic_prediction.py`	Create a mock world, predict, plan, and print a physics evaluation report.

Provider Comparison¶

Example	Command	Purpose
`cross-provider-compare`	`uv run python examples/cross_provider_compare.py`	Register a second deterministic provider and compare prediction outputs.

Score Planning¶

Example	Command	Runtime boundary
`leworldmodel-score-planning`	`uv run worldforge-demo-leworldmodel`	Uses `LeWorldModelProvider` with an injected deterministic cost runtime.

Policy Plus Score Planning¶

Example	Command	Runtime boundary
`lerobot-policy-score-planning`	`uv run worldforge-demo-lerobot`	Uses `LeRobotPolicyProvider` with an injected deterministic policy runtime.

Both packaged demos validate the WorldForge adapter, planning, execution, persistence, reload, and event path in a clean checkout. They do not install optional ML runtimes or run upstream neural checkpoint inference.

Service Host Reference¶

Example	Command	Runtime boundary
`service-host`	`uv run python examples/hosts/service/app.py --provider mock --port 8080`	Stdlib HTTP reference host; the embedding service owns deployment, credentials, telemetry export, alerting, and upstream SLA handling.

The service host exposes:

Endpoint	Purpose
`GET /healthz`	Process liveness only.
`GET /readyz`	Framework alive, configured provider, provider health, traffic decision, and `doctor()` summary.
`GET /providers`	Registered-provider diagnostics for the current host process.
`POST /workflows/mock-predict`	Safe deterministic mock prediction smoke.
`POST /workflows/generate`	Configurable provider generate workflow using a JSON body with `provider`, `prompt`, and `duration_seconds`.

/readyz reports ready, provider_unconfigured, or provider_unhealthy. Only ready means the host should accept provider-backed workflow traffic; the other states tell the host load balancer or job runner to drain this process while operators inspect checks.provider_health and the embedded doctor summary.

Every response includes or echoes a request id. Provider events are sent through JsonLoggerSink with that request id so host logs can correlate HTTP requests with provider calls. Public errors use typed JSON payloads and redact obvious secret-shaped values, but production services still own credential storage, request authentication, dashboards, alert routing, and provider SLA policy.

Batch Evaluation Host¶

Example	Command	Runtime boundary
`batch-eval-host`	`uv run python examples/hosts/batch-eval/app.py benchmark --provider mock`	Stdlib job reference host; the embedding batch system owns scheduling, durable storage, credentials, and provider-specific runtime setup.

Run deterministic mock evaluation and benchmark jobs in a clean checkout:

uv run python examples/hosts/batch-eval/app.py \
  --workspace .worldforge/batch-eval \
  eval --suite planning --provider mock

uv run python examples/hosts/batch-eval/app.py \
  --workspace .worldforge/batch-eval \
  benchmark --provider mock --operation generate --iterations 1 \
  --input-file examples/benchmark-inputs.json \
  --budget-file examples/benchmark-budget.json

Each job writes a shared run workspace under .worldforge/batch-eval/runs/<run-id>/ with run_manifest.json, JSON/Markdown/CSV reports, copied input and budget files for benchmark jobs, and a JSON stdout summary that points to the manifest. Benchmark budget violations return exit code 1 after preserving the failed run, which lets CI or a scheduler fail the job while still keeping issue-safe artifacts.

To swap in a real provider, run the same command on a prepared host that has the provider registered, credentials configured, optional runtime dependencies installed, and benchmark inputs that match that provider's advertised capability. Keep scheduling, retry policy above the process, long-term artifact storage, and credential rotation outside the base package.

Robotics Operator Host¶

Example	Command	Runtime boundary
`robotics-operator-host`	`uv run python examples/hosts/robotics-operator/app.py review --sample-translator`	Stdlib offline operator-review host; the lab application owns action translators, checklist policy, approval, controller integration, interlocks, and safety certification.

The default mode does not call robot controllers. It runs a deterministic LeRobot policy surface and score provider through an explicit sample PushT translator, then writes a preserved run workspace under .worldforge/robotics-operator/runs/<run-id>/ with:

results/action_chunks.json for all candidate action chunks and the selected chunk.
results/score_rationale.json for score values, best index, and score metadata.
logs/provider-events.jsonl for the provider event stream.
results/approval.json for host-owned checklist and dry-run approval state.
results/replay.json for an offline replay artifact.

Controller execution remains disabled unless the embedding host supplies an explicit controller hook in code, all checklist items are true, and dry-run approval is recorded. WorldForge only produces typed policy, score, event, replay, and run-manifest artifacts; it does not certify robot hardware, task safety, emergency stops, workspace readiness, or controller behavior.

Optional Runtime Smoke¶

Example	Command	Runtime boundary
`leworldmodel-real-checkpoint-smoke`	`scripts/lewm-real --checkpoint ~/.stable-wm/pusht/lewm_object.ckpt --device cpu`	Requires host-owned `stable_worldmodel`, torch, datasets, OpenCV, imageio, and LeWM checkpoint assets; loads the official LeWorldModel object checkpoint through `stable_worldmodel.policy.AutoCostModel` and prints visual pipeline, tensor, latency, event, and candidate-cost output.
`lerobot-leworldmodel-health`	`scripts/robotics-showcase --health-only`	Non-mutating preflight for LeRobot, LeWorldModel, and checkpoint presence before running the full showcase.
`lerobot-leworldmodel-real-robotics`	`scripts/robotics-showcase`	Requires host-owned LeRobot, `stable_worldmodel`, torch, datasets, a real policy checkpoint, LeWM checkpoint assets, and PushT simulation dependencies; uses LeRobot's compatible `rerun-sdk` resolution for the default Rerun artifact path, opens a staged Textual report with an `o` shortcut for Rerun, and writes `/tmp/worldforge-robotics-showcase/real-run.rrd` by default. See the robotics replay showcase walkthrough.

Operational Commands¶

uv run worldforge doctor --registered-only
uv run worldforge world create lab --provider mock
uv run worldforge world add-object <world-id> cube --x 0 --y 0.5 --z 0 --object-id cube-1
uv run worldforge world predict <world-id> --object-id cube-1 --x 0.4 --y 0.5 --z 0
uv run worldforge world list
uv run worldforge world objects <world-id>
uv run worldforge world history <world-id>
uv run worldforge world export <world-id> --output world.json
uv run worldforge world delete <world-id>
uv run worldforge provider list
uv run worldforge provider docs
uv run worldforge provider info mock
uv run worldforge predict kitchen --provider mock --x 0.3 --y 0.8 --z 0.0 --steps 2
uv run worldforge eval --suite planning --provider mock --format json
uv run worldforge benchmark --provider mock --iterations 5 --format json

Object add/update/remove commands write typed mutation entries into world history; predictions append their provider action entries after the provider returns the next state.