
Demo Showcase Workflows

scripts/demo_showcases.py is the checkout-safe demo evidence runner for the current showcase roadmap stream. It composes existing WorldForge examples, diagnostics, preserved-run workspaces, issue bundles, replay fixtures, and host examples without installing optional model runtimes, calling paid providers, opening GUIs, or controlling robots.

uv run python scripts/demo_showcases.py list
uv run python scripts/demo_showcases.py run all --workspace-dir .worldforge/demo-showcases
uv run python scripts/demo_showcases.py run first-run --format json --overwrite
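To drive several workflows from a Python script, the commands above can be wrapped with subprocess. This is a minimal sketch that reuses only the subcommand and flags shown here (run, --workspace-dir, --format, --overwrite). The helper names build_run_command and run_workflows are hypothetical. The sketch assumes uv is on PATH and that the working directory is the repository checkout.

```python
import subprocess

# Runner entry point exactly as invoked in the commands above.
RUNNER = ["uv", "run", "python", "scripts/demo_showcases.py"]


def build_run_command(workflow, workspace_dir=None, fmt=None, overwrite=False):
    """Return the argv for `demo_showcases.py run <workflow>`.

    Only the flags shown in the documented examples are emitted:
    --workspace-dir, --format, and --overwrite.
    """
    argv = [*RUNNER, "run", workflow]
    if workspace_dir is not None:
        argv += ["--workspace-dir", str(workspace_dir)]
    if fmt is not None:
        argv += ["--format", fmt]
    if overwrite:
        argv.append("--overwrite")
    return argv


def run_workflows(workflows, **options):
    """Run each workflow in turn and return {workflow: exit_code}."""
    exit_codes = {}
    for workflow in workflows:
        # check=False so one failing workflow does not stop the rest.
        completed = subprocess.run(build_run_command(workflow, **options), check=False)
        exit_codes[workflow] = completed.returncode
    return exit_codes
```

For example, run_workflows(["first-run", "failure-lab"], fmt="json", overwrite=True) runs two workflows in sequence and returns their exit codes; any non-zero value marks a workflow that needs triage. Calling the runner once per workflow keeps each exit code attributable to a single workflow, while run all remains simpler when one overall exit status is enough.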

Expected success signal: the command exits 0, reports status: passed, and writes one run_manifest.json plus results/summary.json and reports/summary.md for every selected workflow. First triage step: open the failed workflow's workflow-result.json, then inspect the referenced preserved run manifest.
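The success signal can also be checked after a run by walking the workspace. This sketch assumes the default layout from the Artifact Layout section: one directory per workflow, a workflow-result.json with a top-level "status" key, and preserved runs under runs/<run-id>/. The "status" key name and the per-workflow runs/ directory are assumptions, not confirmed against the runner's code, and check_workspace is a hypothetical helper.

```python
import json
from pathlib import Path

# Files the success signal expects inside each preserved run directory.
REQUIRED_RUN_FILES = ("run_manifest.json", "results/summary.json", "reports/summary.md")


def check_workspace(workspace):
    """Return {workflow: [problem, ...]} for workflows that do not look passed.

    Assumes one directory per workflow, each holding workflow-result.json with a
    top-level "status" key and preserved runs under runs/<run-id>/.
    """
    problems = {}
    for workflow_dir in sorted(p for p in Path(workspace).iterdir() if p.is_dir()):
        issues = []
        result_path = workflow_dir / "workflow-result.json"
        if not result_path.is_file():
            issues.append("missing workflow-result.json")
        else:
            status = json.loads(result_path.read_text(encoding="utf-8")).get("status")
            if status != "passed":
                issues.append(f"status is {status!r}")
        runs_dir = workflow_dir / "runs"
        run_dirs = sorted(p for p in runs_dir.iterdir() if p.is_dir()) if runs_dir.is_dir() else []
        if not run_dirs:
            issues.append("no preserved runs under runs/")
        for run_dir in run_dirs:
            for name in REQUIRED_RUN_FILES:
                if not (run_dir / name).is_file():
                    issues.append(f"{run_dir.name}: missing {name}")
        if issues:
            problems[workflow_dir.name] = issues
    return problems
```

Called as check_workspace(".worldforge/demo-showcases"), an empty dict means every workflow directory matched the expected shape. Otherwise each reported workflow name points at the workflow-result.json to open first.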

Artifact Layout

The default workspace is .worldforge/demo-showcases/.

| Path | Purpose | Safe attachment note |
| --- | --- | --- |
| <workflow>/workflow-result.json | short machine-readable workflow result | safe when safe_to_attach is true |
| <workflow>/runs/<run-id>/run_manifest.json | preserved command, provider, operation, status, artifacts | safe by construction; no raw secrets or signed URLs |
| <workflow>/runs/<run-id>/results/summary.json | full workflow summary | safe unless the workflow marks otherwise |
| <workflow>/runs/<run-id>/reports/summary.md | human summary with claim boundary and first triage step | safe unless the workflow marks otherwise |
| <workflow>/issue-bundle/ | issue-ready evidence bundle for the diagnostics workflow | attach only when evidence_manifest.json says safe_to_attach: true |
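Because the attachment notes above hinge on safe_to_attach being true, a strict check is safer than a truthiness test. The sketch below treats anything other than a JSON boolean true as not safe: a missing key, the string "true", or an unreadable or malformed file all fail. It assumes safe_to_attach is a top-level key in both workflow-result.json and evidence_manifest.json; is_safe_to_attach is a hypothetical helper, not part of WorldForge.

```python
import json
from pathlib import Path


def is_safe_to_attach(manifest_path):
    """Return True only when the file's top-level safe_to_attach is JSON true.

    Missing files, malformed JSON, missing keys, and non-boolean values such as
    the string "true" all return False, so the default is to keep evidence local.
    """
    try:
        data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and data.get("safe_to_attach") is True
```

Checking with `is True` rather than plain truthiness means a manifest that stores "yes" or 1 is still kept out of an issue upload until someone inspects it.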

Workflow Matrix

| Workflow | Issue | Command | Expected output | Primary artifact | First triage step |
| --- | --- | --- | --- | --- | --- |
| first-run | #189 | uv run python scripts/demo_showcases.py run first-run | object seeded into a world-state dict, three mock prediction steps recorded, final world-state JSON exported | first-run/exported-world-state.json | run uv run worldforge doctor --registered-only and inspect the exported world-state JSON |
| diagnostics-issue-bundle | #190 | uv run python scripts/demo_showcases.py run diagnostics-issue-bundle | skipped provider diagnostic preserved and bundled | diagnostics-issue-bundle/issue-bundle/issue.md | inspect evidence_manifest.json before attaching |
| robotics-replay | #191 | uv run python scripts/demo_showcases.py run robotics-replay | deterministic policy-plus-score replay summary | robotics-replay/robotics-replay-manifest.json | run uv run worldforge-demo-lerobot before prepared-host commands |
| provider-event-redaction-dry-run | #192 | uv run python scripts/demo_showcases.py run provider-event-redaction-dry-run | sanitized provider event fixtures | provider-event-redaction-dry-run/provider-event-redaction-events.json | inspect redacted provider event targets before any live smoke |
| adapter-author | #193 | uv run python scripts/demo_showcases.py run adapter-author | provider scaffold generated under demo output and promotion blockers reported | adapter-author/generated-provider/ | replace placeholder fixtures, then run the generated provider test |
| batch-eval | #194 | uv run python scripts/demo_showcases.py run batch-eval | eval success and controlled benchmark budget failure preserved | batch-eval/batch-host/runs/<run-id>/run_manifest.json | inspect the failed benchmark manifest before changing budgets |
| service-host | #195 | uv run python scripts/demo_showcases.py run service-host | stdlib service host readiness and one mock request summary | service-host/runs/<run-id>/results/summary.json | run uv run python examples/hosts/service/app.py --help and inspect /readyz |
| rerun-gallery | #196 | uv run python scripts/demo_showcases.py run rerun-gallery | manifest-only Rerun gallery with missing-extra status | rerun-gallery/rerun-gallery-manifest.json | install the rerun extra before opening .rrd files |
| failure-lab | #197 | uv run python scripts/demo_showcases.py run failure-lab | isolated failure drills, preflight, and recovery commands | failure-lab/failure-lab-report.json | read recovery_commands before touching real .worldforge state |
| use-case-cookbook | #198 | uv run python scripts/demo_showcases.py run use-case-cookbook | cookbook recipe count and docs artifact reference | docs/src/use-case-cookbook.md | open the recipe matching the failed command and artifact |
| external-provider-package | #237 | uv run python scripts/demo_showcases.py run external-provider-package | temp external provider package generated and entry-point discovery report preserved | external-provider-package/external-provider-discovery.json | inspect the discovery report, then run the generated package tests before publishing |
| custom-evaluation-suite | #238 | uv run python scripts/demo_showcases.py run custom-evaluation-suite | custom suite runs with provenance, one controlled failure, and report artifacts | custom-evaluation-suite/custom-eval-artifacts/ | open markdown, then inspect failure_gallery.md for the controlled failed case |
| policy-score-candidate-lab | #239 | uv run python scripts/demo_showcases.py run policy-score-candidate-lab | deterministic action candidates ranked by a score provider with raw policy actions preserved | policy-score-candidate-lab/policy-score-candidate-lab.json | verify the selected row matches score_result.best_index |
| fixture-drift-review | #240 | uv run python scripts/demo_showcases.py run fixture-drift-review | temp fixture manifest review with missing, changed, schema-change, unsafe, and intended-update cases | fixture-drift-review/fixture-drift-review.md | inspect every fixture diff before approving an intended manifest update |
| capability-negotiation-preflight | #241 | uv run python scripts/demo_showcases.py run capability-negotiation-preflight | negotiation reports for ready, missing-config, missing-dependency, unsupported, and not-registered preflight cases | capability-negotiation-preflight/capability-negotiation/preflight-report.md | follow the first recommended action for the blocked capability slot |
| embodied-policy-replay-comparison | #242 | uv run python scripts/demo_showcases.py run embodied-policy-replay-comparison | side-by-side LeRobot, GR00T, and Cosmos-Policy replay contracts with provider-specific raw action fields | embodied-policy-replay-comparison/embodied-policy-replay-comparison.md | inspect raw fields and missing-translator blockers before running a prepared-host smoke |
| non-developer-evidence-review | #245 | uv run python scripts/demo_showcases.py run non-developer-evidence-review | static HTML/JSON/Markdown review package for evaluation, benchmark, world diff, and issue-bundle evidence | non-developer-evidence-review/non-developer-evidence-review/review-package.html | attach only the review package and keep local-only rows out of issue uploads |
| provider-failure-gallery | #246 | uv run python scripts/demo_showcases.py run provider-failure-gallery | fixture-backed provider failure gallery with expected events, errors, safe artifacts, owners, and first triage commands | provider-failure-gallery/provider-failure-gallery/provider-failure-gallery.md | match the failed provider signal to a row before attaching evidence |
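The policy-score-candidate-lab triage step (verify the selected row matches score_result.best_index) can be scripted as a small comparison. The score_result.best_index path comes from the matrix above. The selected_index key is a hypothetical stand-in for whatever field the real report uses to record the selected row, so adjust it after opening policy-score-candidate-lab.json.

```python
import json
from pathlib import Path


def selected_matches_best(report):
    """Return True when the selected candidate equals score_result.best_index.

    "selected_index" is a hypothetical key name; check the real report for the
    field that records the selected row before relying on this.
    """
    best_index = report["score_result"]["best_index"]
    return report["selected_index"] == best_index


def check_candidate_lab(path):
    """Load the lab's JSON artifact and run the comparison above."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    return selected_matches_best(report)
```

A False result means the ranking and the selection disagree, which is worth resolving before reading anything into the preserved raw policy actions.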

Runtime Boundaries

These workflows prove the WorldForge integration layer and artifact contracts, not upstream model quality or physical execution. Optional runtimes remain host-owned:

  • LeWorldModel, LeRobot, GR00T, torch, checkpoints, simulators, and robot controllers are not installed by this runner.
  • Provider-event redaction paths use fixture-backed events and do not make paid API calls.
  • Rerun is represented by a manifest in the checkout path; .rrd generation still requires the rerun extra or a prepared-host robotics run.
  • Provider scaffolds generated by the adapter-author workflow are intentionally incomplete and must not be registered or promoted until real fixtures, runtime manifests, docs, and tests pass.
  • External provider packages generated by the demo live under the selected workspace and prove package shape plus discovery behavior only; they are not published, installed globally, or treated as real adapter evidence.
  • Custom evaluation suite output is deterministic adapter-contract evidence. Its controlled failed case demonstrates report and failure-gallery handling, not provider quality or physical fidelity.
  • The policy+score candidate lab uses local deterministic providers to show candidate generation, scoring, raw action preservation, translator boundaries, and safe artifact shape. It is not a robot controller, simulator, checkpoint run, or physical-performance claim.
  • The fixture drift review workflow mutates only the selected demo workspace. It demonstrates review states and the approved update path; it does not refresh remote fixtures or rewrite the committed snapshot manifest.
  • The capability negotiation preflight workflow reports blockers only. It does not install optional dependencies, configure credentials, or execute fallback workflows.
  • The embodied policy replay comparison preserves LeRobot, GR00T, and Cosmos-Policy action shapes separately. It does not perform cross-provider action conversion, contact a live GPU server, or claim controller safety.
  • The non-developer evidence review package escapes display text and links only relative safe artifacts. Host-local paths, signed URLs, and raw provider payloads are marked local-only or excluded; the package does not host a dashboard or execute JavaScript.
  • The provider failure gallery is fixture-backed. It does not call paid APIs, install optional runtimes, preserve signed URLs, or turn scaffold provider rows into integration claims.
  • Benchmark failures in the batch workflow are controlled budget failures so the issue and release evidence path can be tested without changing production thresholds.
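Run manifests are described as free of signed URLs, and the review package marks them local-only. Before attaching a file by hand, a quick heuristic scan can catch a signed object-storage URL that slipped into an artifact outside those guarantees. This is an illustrative spot check, not WorldForge's redaction logic. It flags URLs whose query string contains a parameter commonly used for request signatures (X-Amz-Signature for AWS S3 presigned URLs, X-Goog-Signature for Google Cloud Storage V4 signed URLs, sig for Azure SAS tokens, Signature for older AWS and CloudFront signed URLs), and it will miss credentials carried in other forms. The helper names find_signed_urls and scan_artifact are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import parse_qsl, urlsplit

# Query parameters commonly used to carry request signatures: AWS S3 SigV4
# presigned URLs, Google Cloud Storage V4 signed URLs, Azure SAS tokens, and
# older AWS SigV2 / CloudFront signed URLs. Heuristic only.
SIGNATURE_PARAMS = {"x-amz-signature", "x-goog-signature", "sig", "signature"}

# Rough URL matcher: stops at whitespace, quotes, and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_signed_urls(text):
    """Return URLs in text whose query string has a signature-like parameter."""
    hits = []
    for url in URL_PATTERN.findall(text):
        query = urlsplit(url).query
        names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
        if names & SIGNATURE_PARAMS:
            hits.append(url)
    return hits


def scan_artifact(path):
    """Read one artifact as text and return any signed-looking URLs in it."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_signed_urls(text)
```

Running scan_artifact over each file in an issue bundle and expecting an empty list for every one is a cheap extra check; a non-empty result means that file should stay local.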

For task-oriented commands, see the Use Case Cookbook.