Architecture¶
WorldForge is the Python integration layer around testable physical-AI world-model workflows. Its job is to expose each provider through an honest typed capability surface, validate the boundary, and let host applications compose planning, prediction, generation, evaluation, persistence, and observability without pretending every provider means the same thing by "world model."
The architecture centers on capability-specific contracts. LeWorldModel scores action candidates. GR00T and LeRobot select embodied action chunks. Cosmos and Runway return media artifacts. The framework keeps those surfaces distinct, then composes them through typed planning, evaluation, diagnostics, and observability.
System Map¶
Repository layout:
worldforge/
|-- src/worldforge/
| |-- framework.py # WorldForge facade, World runtime, planning, persistence
| |-- models.py # public data contracts and validation
| |-- capabilities/ # narrow runtime-checkable capability protocols
| |-- providers/
| | |-- base.py # provider interface, ProviderError, PredictionPayload
| | |-- catalog.py # provider factories and auto-registration policy
| | |-- mock.py # deterministic reference provider
| | |-- observable.py # event/health wrapper for protocol implementations
| | |-- leworldmodel.py# local JEPA cost-model adapter
| | |-- cosmos.py # HTTP video generation adapter
| | |-- gr00t.py # host-owned embodied policy client adapter
| | |-- lerobot.py # host-owned LeRobot policy adapter
| | |-- runway.py # HTTP video generation/transfer adapter
| | `-- remote.py # scaffold adapters for JEPA and Genie
| |-- observability.py # ProviderEvent sinks
| |-- rerun.py # optional Rerun event and artifact bridge
| |-- benchmark.py # provider benchmark harness
| |-- evaluation/ # built-in suites and report rendering
| `-- testing/ # reusable provider contract assertions
|-- docs/
|-- examples/
|-- scripts/
`-- tests/
Runtime layers:
Host application
|
| Python API / CLI
v
+------------------------+
| WorldForge facade |
| provider + capability |
| registries, diagnostics|
+-----------+------------+
|
| owns
v
+-------------------+
| World runtime |
| state/history |
| planning/execution|
+---------+---------+
|
| dispatches by capability
v
+-------------------------+ +---------------------------+
| Provider adapter or | ----> | upstream model/API/runtime|
| capability implementation| <---- | checkpoint/task/artifact |
| validation/events | | |
+-------------------------+ +---------------------------+
|
v
+-------------------+
| typed result |
| Prediction |
| VideoClip |
| ActionScoreResult |
| ActionPolicyResult|
| ReasoningResult |
+-------------------+
Mermaid equivalent:
flowchart TD
Host[Host app or CLI]
Forge[WorldForge facade\nprovider + capability registries,\ndiagnostics, persistence]
World[World\nstate, history, planning]
Provider[Provider adapter or capability impl\ncapability contract]
Upstream[Upstream runtime or API\nLeWM, GR00T, LeRobot, Cosmos, Runway, mock]
Models[Typed public models\nPrediction, VideoClip, ActionScoreResult, ActionPolicyResult]
Obs[ProviderEvent sinks\nlogs, recorder, metrics]
Store[Local JSON state]
Host --> Forge
Forge --> World
World --> Provider
Provider --> Upstream
Upstream --> Provider
Provider --> Models
Provider --> Obs
World --> Store
Forge --> Store
Operational Ownership Map¶
WorldForge owns the framework boundary. The host owns production operation around that boundary.
| Layer | WorldForge responsibility | Host responsibility |
|---|---|---|
| provider catalog | factory metadata, auto-registration rules, provider profiles | deciding which optional providers are configured in each environment |
| provider call | typed inputs, explicit capabilities, result validation, provider events | credentials, endpoints, model packages, checkpoints, robot stacks, and upstream SLAs |
| planning | composition across predict, score, and policy surfaces |
task preprocessing, action-space mapping, execution policy, and safety checks |
| local state | validated single-writer JSON import/export | backups, retention, locking, migrations, object storage, and multi-writer durability |
| evaluation and benchmarks | deterministic contract suites and capability-aware reports | preserving run artifacts and making empirical claims only from appropriate data |
| observability | ProviderEvent hook, JSON logger, recorder, and in-memory metrics |
trace IDs, dashboards, alerting, distributed tracing, and on-call runbooks |
See User And Operator Playbooks for concrete commands that exercise these boundaries.
Module Responsibilities¶
framework.py
WorldForge: top-level object for provider registration, diagnostics, persistence helpers, and provider-wide operations such asgenerate(...),transfer(...),reason(...),embed(...),score_actions(...), andselect_actions(...).World: mutable world state with scene objects, history, prediction, comparison, planning, plan execution, and evaluation entry points.Prediction,Plan,PlanExecution, andComparison: workflow-level result objects.
models.py
- Public data contracts such as
Action,SceneObject,StructuredGoal,VideoClip,ProviderProfile,ProviderHealth,ProviderEvent,ActionScoreResult, andActionPolicyResult. - Validation helpers and public framework errors:
WorldForgeErrorandWorldStateError.
providers/base.py
BaseProvider, which defines the common capability surface.ProviderError, the public provider/runtime failure type.PredictionPayload, the validated payload returned bypredict(...)implementations.
capabilities/__init__.py
- Runtime-checkable protocol contracts for narrow integrations:
Cost,Policy,Generator,Predictor,Reasoner,Embedder,Transferer, and reservedPlanner. RunnableModel, an optional bundle for implementations that genuinely expose multiple capability protocols under one logical model.
providers/observable.py
_ObservableCapability, the internal wrapper that adds provider events, timing, health, profile, and info surfaces around pure capability protocol implementations.- Capability method mapping used by the
WorldForgefacade when dispatching protocol-registered implementations.
providers/catalog.py
PROVIDER_CATALOG, the single in-repo list of known provider factories.- Registration policy for providers that are always available versus providers enabled by host environment configuration.
providers/leworldmodel.py
- Optional local adapter for
stable_worldmodel.policy.AutoCostModel. - Exposes only
score=True. - Validates
pixels,goal,action, four-dimensional action candidates, finite cost outputs, andbest_index.
providers/cosmos.py and providers/runway.py
- Real HTTP adapters with typed request policy, parser boundaries, retry events, and artifact validation.
providers/gr00t.py and providers/lerobot.py
- Host-owned policy adapters for NVIDIA Isaac GR00T PolicyClient and Hugging Face LeRobot
PreTrainedPolicyinference. - Exposes only
policy=True. - Requires an explicit action translator because robot actions are embodiment-specific.
observability.py and rerun.py
- Host-side provider event composition through
JsonLoggerSink,InMemoryRecorderSink,ProviderMetricsSink, andcompose_event_handlers(...). - Optional Rerun SDK bridge through
RerunEventSinkandRerunArtifactLoggerfor events, world snapshots, object boxes, plans, benchmark artifacts, and robotics-showcase visual layers.
evaluation/ and benchmark.py
- Deterministic evaluation suites and capability-aware benchmark reports for adapter comparison.
harness/
- Optional Textual front face for the same APIs: worlds CRUD, provider capability inspection, live provider events, evaluation, benchmark, and preserved report inspection.
tui.pyis the only Textual import surface;flows.py,models.py, and helper modules remain importable without theharnessextra.
End-to-End Pipeline¶
The shortest accurate pipeline is:
configure providers
-> create/load a World
-> choose a workflow
-> dispatch through a capability-specific provider method
-> validate provider result
-> return a typed result
-> optionally persist, observe, evaluate, or benchmark
Expanded:
1. Provider discovery
- mock registers unconditionally
- optional providers register only when their env vars are present
- hosts may call register_provider(...) for full custom adapters
- hosts may call register_cost(...), register_policy(...), or register(...) for narrow
capability protocol implementations
2. World setup
- create_world(...) starts from empty typed state
- create_world_from_prompt(...) creates a deterministic local seed scene
- load_world(...) restores local JSON state after validation
- delete_world(...) validates the world id before removing local JSON state
3. Workflow call
- world.predict(...) requires a provider with predict=True
- forge.generate(...) requires generate=True
- forge.transfer(...) requires transfer=True
- forge.select_actions(...) requires policy=True
- world.plan(...) can use predictive, score-based, policy, or policy+score planning
- world.evaluate(...) and benchmark harnesses select operations by capability
4. Provider boundary
- provider receives a JSON world snapshot, media request, query, embedding request, score
payload, or policy observation
- adapter validates local inputs before network/model calls when possible
- provider emits ProviderEvent records for success, failure, and retries where supported
5. Result boundary
- PredictionPayload updates world state only after validation
- VideoClip validates media metadata and bytes/source paths
- ActionScoreResult validates finite scores and an in-range best_index
- ActionPolicyResult validates executable actions and JSON-compatible raw actions
- ProviderError surfaces provider/runtime failures with context
- unexpected protocol exceptions are wrapped as ProviderError after failure events are emitted
6. Host-owned operation
- JSON persistence is single-writer local storage
- production logging, metrics export, trace IDs, dashboards, and locks remain host-owned
Provider Injection¶
There are three injection points.
Construction-time auto-registration
WorldForge(auto_register_remote=True)
|
|-- mock always registered
|-- cosmos if COSMOS_BASE_URL is set
|-- runway if RUNWAYML_API_SECRET or RUNWAY_API_SECRET is set
|-- leworldmodel if LEWORLDMODEL_POLICY or LEWM_POLICY is set
|-- gr00t if GROOT_POLICY_HOST is set
|-- jepa if JEPA_MODEL_NAME is set
`-- genie if GENIE_API_KEY is set
Manual full-provider registration
forge = WorldForge(auto_register_remote=False)
forge.register_provider(MyProvider(...))
world = forge.create_world("lab", provider="my-provider")
Manual capability-protocol registration
forge = WorldForge(auto_register_remote=False)
forge.register_cost(MyCostModel(name="my-cost"))
forge.register_policy(MyPolicy(name="my-policy"))
result = forge.score_actions(cost="my-cost", info=info, action_candidates=candidates)
plan = world.plan(
goal="pick best policy candidate",
policy_provider="my-policy",
score_provider="my-cost",
policy_info=policy_info,
score_info=score_info,
)
Call-site override
world = forge.create_world("lab", provider="mock")
world.predict(action) # uses world.provider
world.predict(action, provider="other") # overrides for this call
world.plan(..., provider="leworldmodel") # planner/scorer provider
world.execute_plan(plan, provider="mock") # execution provider
forge.generate("prompt", generator=impl) # direct one-off capability instance
Provider lookup is name-based for registered full providers and registered protocol implementations. Provider dispatch is capability-based. A full provider should never advertise a capability unless the corresponding method is implemented end to end; a protocol implementation is indexed only into the registry for the method it structurally implements.
flowchart LR
Env[Environment variables] --> Auto[WorldForge auto-registration]
Custom[Custom BaseProvider instance] --> Manual[register_provider]
Protocol[Capability protocol impl] --> ManualProtocol[register_cost / register_policy / register]
Auto --> Registry[Provider registry]
Manual --> Registry
ManualProtocol --> CapabilityRegistry[Capability registries]
WorldProvider[world.provider default] --> Resolve[provider resolution]
CallOverride[method provider override] --> Resolve
Direct[direct capability instance] --> Resolve
PlanExec[Plan metadata execution_provider] --> Resolve
Registry --> Resolve
CapabilityRegistry --> Resolve
Resolve --> Capability{capability supported?}
Capability -- yes --> Invoke[call provider method]
Capability -- no --> Error[WorldForgeError or ProviderError]
Predictive Planning Pipeline¶
Predictive planning is the path for providers that can take a world state plus an action and return a future world state.
World.plan(goal_spec=..., provider="mock")
|
|-- parse goal_spec or goal_json into StructuredGoal
|-- resolve scene object selectors
|-- create one or more WorldForge Action objects
|-- for each action:
| provider.predict(simulated_state, action, steps=1)
| validate PredictionPayload
| append predicted state and physics score
|
`-- return Plan(
planning_mode="predict",
actions=[...],
predicted_states=[...],
success_probability=heuristic average of physics scores
)
Mermaid sequence:
sequenceDiagram
participant App as Host app
participant World
participant Provider
participant Plan
App->>World: plan(goal_spec, provider)
World->>World: normalize StructuredGoal
loop per action
World->>Provider: predict(snapshot, action, steps=1)
Provider-->>World: PredictionPayload
World->>World: validate and collect predicted state
end
World-->>Plan: Plan(planning_mode="predict")
Plan-->>App: actions + predicted_states
Score-Based Planning Pipeline¶
Score-based planning is the LeWorldModel-shaped path. It keeps WorldForge-native actions separate from optional model-native scorer payloads.
Host owns task preprocessing
|
|-- score_info
| |-- pixels task-shaped observation history
| |-- goal task-shaped goal observation
| `-- action task-shaped action history
|
|-- score_action_candidates (optional)
| |-- omitted: WorldForge serializes candidate_actions with Action.to_dict()
| `-- provided: tensor or nested numeric array shaped for the score provider
|
`-- candidate_actions
`-- WorldForge Action sequences, one sequence per scored candidate
World.plan(..., provider="leworldmodel", execution_provider="mock")
|
|-- require provider.capabilities.score
|-- call provider.score_actions(info, action_candidates)
|-- receive ActionScoreResult(scores, best_index, lower_is_better=True)
|-- choose candidate_actions[best_index]
|-- return Plan(planning_mode="score", predicted_states=[])
World.execute_plan(plan)
|
|-- choose execution_provider from explicit arg or plan metadata
|-- require execution provider supports predict
`-- apply selected WorldForge actions through predict(...)
The separation is intentional:
- LeWorldModel ranks action tensors in its own task space.
- WorldForge stores and executes
Actionobjects in its own public API. - The host supplies the mapping between those two spaces because task preprocessing is model- and checkpoint-specific.
- A score provider is allowed to be a planner without being a predictor, generator, or reasoner.
sequenceDiagram
participant Host
participant World
participant LeWM as LeWorldModelProvider
participant Exec as Execution Provider
Host->>World: plan(provider="leworldmodel", candidate_actions, score_info, score_action_candidates)
World->>World: validate score planning inputs
World->>LeWM: score_actions(info, action_candidates)
LeWM->>LeWM: load AutoCostModel, tensorize inputs, get_cost(...)
LeWM-->>World: ActionScoreResult(best_index)
World-->>Host: Plan(actions=candidate_actions[best_index], planning_mode="score")
Host->>World: execute_plan(plan)
World->>Exec: predict(snapshot, selected action, steps=1)
Exec-->>World: PredictionPayload
World-->>Host: PlanExecution(final_world)
Concrete score-planning shape:
from worldforge import Action
plan = world.plan(
goal="select the lowest-cost LeWorldModel candidate",
provider="leworldmodel",
planner="leworldmodel-mpc",
candidate_actions=[
[Action.move_to(0.1, 0.5, 0.0)],
[Action.move_to(0.4, 0.5, 0.0)],
],
score_info={
"pixels": pixels,
"goal": goal_pixels,
"action": action_history,
},
score_action_candidates=action_candidate_tensor,
execution_provider="mock",
)
print(plan.metadata["score_result"]["best_index"])
execution = world.execute_plan(plan)
Policy Planning Pipeline¶
Policy planning treats an embodied policy as an actor that proposes executable action chunks from observations and instructions.
policy_info
|
|-- observation
| |-- video camera streams or image history
| |-- state proprioception or environment state
| `-- language task instruction
|
|-- embodiment_tag
`-- action_horizon
World.plan(..., provider="gr00t", policy_info=...)
|
|-- require provider.capabilities.policy
|-- call provider.select_actions(info)
|-- receive ActionPolicyResult(actions, raw_actions, action_candidates)
|-- return Plan(planning_mode="policy", predicted_states=[])
Policy plus score planning composes an actor with a world-model scorer:
GR00T policy provider
-> proposes one or more candidate action chunks
LeWorldModel / JEPA-WMS score provider
-> scores serialized policy candidates or a host-supplied model-native candidate tensor
WorldForge
-> selects policy_candidates[score_result.best_index]
Concrete shape:
plan = world.plan(
goal="choose the lowest-cost policy candidate",
policy_provider="gr00t",
score_provider="leworldmodel",
policy_info=policy_info,
score_info=lewm_info,
execution_provider="mock",
)
WorldForge passes serialized policy candidates to the score provider by default. Pass
score_action_candidates=... when the scorer requires native tensors or latents. The host still
owns the mapping between GR00T raw actions, WorldForge Action objects, and score-provider native
payloads. WorldForge validates that each provider returns a typed result and that the selected
candidate index is in range. Score-based planning also requires the score result length to match
the executable candidate count, so provider-native tensors cannot silently drift from the actions
that WorldForge can execute or report.
Provider Capability Surface¶
WorldForge does not ask "is this a world model?" at runtime. It asks which operations a provider can honestly perform.
BaseProvider subclass
|-- declares ProviderCapabilities(...)
|-- implements every advertised method end to end
`-- inherits ProviderError defaults for unsupported methods
Capability protocol implementation
|-- declares name and optional ProviderProfileSpec
|-- implements exactly the protocol method it exposes
|-- is wrapped for ProviderEvent, health, profile, and info surfaces
`-- can be registered with register_cost/register_policy/... or register(...)
In-repo provider mapping:
| Provider | Surface | Primary capability | Runtime kind |
|---|---|---|---|
mock |
implemented | predict, generate, reason, embed, transfer | deterministic local surrogate |
leworldmodel |
optional runtime adapter | score | local JEPA cost model |
gr00t |
optional runtime adapter | policy | host-owned Isaac GR00T policy client |
lerobot |
optional runtime adapter | policy | host-owned LeRobot policy checkpoint |
cosmos |
HTTP adapter | generate | remote physical-AI video foundation model API |
runway |
HTTP adapter | generate, transfer | remote video generation API |
jepa |
optional runtime adapter | score | host-owned facebookresearch/jepa-wms torch-hub runtime |
genie |
scaffold | capability-fail-closed | reservation for future interactive simulator work |
src/worldforge/providers/catalog.py owns the in-repo provider factory list and auto-registration
policy. Provider profiles expose the same information through Python and CLI diagnostics.
doctor() also includes known but unregistered optional providers by default, so missing local
dependencies and missing credentials show up before a workflow fails.
Data Contracts¶
Important public result contracts:
Action
type: non-empty string
parameters: JSON object
SceneObject
id: non-empty string
metadata: JSON object
PredictionPayload
state: JSON object
confidence: probability
physics_score: probability
frames: list[bytes]
metadata: JSON object
latency_ms: finite non-negative number
ActionScoreResult
provider: non-empty string
scores: non-empty list of finite numbers
best_index: index into scores
best_score: scores[best_index]
lower_is_better: bool
metadata: JSON object
ActionPolicyResult
provider: non-empty string
actions: non-empty list of Action objects
raw_actions: JSON object
action_horizon: optional positive integer
embodiment_tag: optional non-empty string
metadata: JSON object
action_candidates: non-empty list of action plans
Plan
goal: string
planner: string
provider: planner/scorer/predictor provider name
actions: list[Action]
predicted_states: list[JSON objects]
success_probability: probability
metadata: JSON object
State invariants:
- persisted worlds contain
id,name, andprovider - persisted worlds declare the current schema_version before nested state is accepted
- world
stepis always a non-negative integer scene.objectsis a JSON object keyed by object ID- embedded scene object IDs must match their map keys
- action parameters and metadata fields are JSON-native objects with string keys and finite numbers
- history entries have non-negative steps, validated snapshot states, non-empty summaries, and valid serialized action payloads when actions are present
- history entry steps cannot exceed the current world step
- provider capability names are a closed set; unknown capability filters fail explicitly instead of silently excluding every provider
- invalid public inputs fail explicitly instead of being silently coerced
- score providers return finite scores and an in-range
best_index - policy providers return executable actions and preserve raw provider actions
- remote media artifacts reject unsupported content types before returning a
VideoClip - provider events sanitize log-facing targets, messages, and metadata before event sinks record them; signed URL query strings and obvious credential fields are redacted
Failure Boundaries¶
WorldForge uses three public failure families:
WorldForgeError
invalid caller input, unsupported local operation, invalid public model values
WorldStateError
malformed persisted state or provider-supplied state that cannot be restored
ProviderError
provider credentials, optional dependency failures, transport failures, malformed upstream
responses, provider-specific input limits, expired artifacts, invalid downloaded media,
unsupported provider operations, malformed model outputs
Boundary rule:
caller input error -> fail before provider call when possible
provider/runtime error -> ProviderError with provider-specific context
state mutation -> only after provider output validates
remote artifact -> content type and body validated before VideoClip is returned
score output -> finite scores + valid best_index before Plan is returned
Observability¶
Provider events are deliberately small. They are not a complete production telemetry stack; they are the framework-level hook that host applications can fan out to their own logging and metrics.
Provider operation
|
|-- ProviderEvent(provider, operation, phase, duration_ms, attempt, status_code, sanitized target, message, metadata)
|
`-- host event_handler
|-- JsonLoggerSink
|-- RunJsonLogSink
|-- InMemoryRecorderSink
|-- ProviderMetricsSink
|-- ProviderMetricsExporterSink
`-- RerunEventSink
Targets in provider events are intentionally route-level. They keep enough context to identify the provider endpoint or artifact path, but query strings, fragments, URL userinfo, bearer tokens, and secret-like metadata fields are removed before JSON logging, run log export, or in-memory recording.
Example:
import logging
from pathlib import Path
from worldforge import WorldForge
from worldforge.observability import (
JsonLoggerSink,
OpenTelemetryProviderEventSink,
ProviderMetricsExporterSink,
ProviderMetricsSink,
RunJsonLogSink,
compose_event_handlers,
)
from worldforge.rerun import RerunEventSink, RerunRecordingConfig, RerunSession
run_id = "demo-run"
metrics = ProviderMetricsSink()
host_metrics_exporter = ... # supplied by your service
rerun_session = RerunSession(RerunRecordingConfig(save_path=".worldforge/rerun/events.rrd"))
forge = WorldForge(
event_handler=compose_event_handlers(
JsonLoggerSink(logger=logging.getLogger("demo.worldforge"), extra_fields={"run_id": run_id}),
RunJsonLogSink(Path(".worldforge") / "runs" / run_id / "provider-events.jsonl", run_id),
RerunEventSink(session=rerun_session),
ProviderMetricsExporterSink(host_metrics_exporter),
metrics,
)
)
forge.generate("orbiting cube", "mock", duration_seconds=1.0)
print(metrics.get("mock", "generate").to_dict())
rerun_session.close()
Persistence Ownership¶
Persistence is intentionally host-owned beyond local JSON state.
WorldForge today
local JSON files
single-writer assumption
import/export/delete validation
Host application owns
locks
transactions
database adapters
object storage
retention policy
multi-writer coordination
production backup/restore
This keeps the library small and makes persistence ownership explicit. A future persistence adapter must preserve the same state validation invariants before it becomes a supported runtime surface.
ADR 0001, Persistence Adapter Boundary, names the
future WorldPersistenceAdapter interface and rejects replacing local JSON with an implicit
database backend.
Design Implications¶
- Capability reporting is part of correctness. A provider that only scores must not be presented as a generator or predictor.
- LeWorldModel defines the canonical score-planning path: model-native candidate tensors in,
ActionScoreResult.best_indexout, WorldForgeActionsequence selected. - GR00T defines the first embodied-policy path: multimodal observation in, action chunk out, with host-owned translation from embodiment-specific raw actions to WorldForge actions.
- Generative video providers are useful, but video output is not the core abstraction of WorldForge.
- Host applications own preprocessing between sensors, robot task tensors, and WorldForge actions.
- The framework should stay strict at boundaries and flexible in provider registration.
- New provider integrations should add tests for every advertised capability and every documented failure mode.