Control And Planning¶
This is WorldForge's backbone loop: planning and scoring action candidates in latent space with an action-conditioned predictive world model. It separates three roles:
- Policy providers propose actions from observations and instructions.
- Score providers rank candidate actions or action horizons against an observation and goal, acting as a cost oracle.
- Controllers own the optimizer: they decide how to sample, score, refine, and return executable WorldForge Action chunks.
The world model stays a pure oracle; the controller stays a pure optimizer.
LatentMPCController is the built-in controller. It implements a checkout-safe Cross-Entropy Method
(CEM) loop over the score_actions(...) capability:
sample action horizons
-> encode candidates for the score provider
-> score candidates
-> refit the Gaussian proposal from elites
-> return the best execute_k actions
The controller runs entirely in Python action space. It does not import torch, execute a robot,
step an environment, or train a world model. Tensor conversion, image preprocessing, model
inference, simulation, hardware execution, and safety interlocks remain host-owned or
provider-owned.
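The loop above can be illustrated with a small, dependency-free sketch. This is not the LatentMPCController implementation: `cem_plan` and `distance_to_target` are hypothetical names, actions are plain floats rather than WorldForge Action objects, each horizon step gets its own independent Gaussian, and the standard-deviation floor is an assumption made here to keep the proposal from collapsing. Lower scores are treated as better.

```python
import random
from statistics import mean, pstdev


def cem_plan(score_fn, horizon=2, num_samples=64, num_iterations=5,
             num_elites=8, execute_k=1, bounds=(-2.0, 2.0), seed=19):
    """Toy Gaussian CEM over 1-D action horizons; lower score is better."""
    rng = random.Random(seed)
    low, high = bounds
    # One independent Gaussian per horizon step, centred on the bounds.
    means = [(low + high) / 2.0] * horizon
    stds = [(high - low) / 2.0] * horizon
    best_horizon, best_score = None, float("inf")
    iteration_best_scores = []

    for _ in range(num_iterations):
        # Sample candidate horizons and clip them to the action bounds.
        candidates = [
            [min(high, max(low, rng.gauss(m, s))) for m, s in zip(means, stds)]
            for _ in range(num_samples)
        ]
        # Score every candidate, then rank so the cheapest come first.
        scores = [score_fn(c) for c in candidates]
        ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
        elites = [c for _, c in ranked[:num_elites]]
        if ranked[0][0] < best_score:
            best_score, best_horizon = ranked[0]
        iteration_best_scores.append(ranked[0][0])
        # Refit the Gaussian proposal from the elites (assumed std floor of 1e-3).
        means = [mean(e[t] for e in elites) for t in range(horizon)]
        stds = [max(pstdev([e[t] for e in elites]), 1e-3) for t in range(horizon)]

    # Return only the first execute_k actions of the best horizon found.
    return best_horizon[:execute_k], best_score, iteration_best_scores


def distance_to_target(actions, target=0.5):
    """Hypothetical cost oracle: squared gap between the summed actions and a target."""
    return (sum(actions) - target) ** 2


actions, best_score, history = cem_plan(distance_to_target)
```

The optimizer only ever calls `score_fn`, which mirrors the split described above: the scoring side knows nothing about sampling, and the sampling side knows nothing about how a score is produced.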
Standalone Controller (primary entry point)¶
Drive LatentMPCController directly against anything that exposes the score_actions(...) surface,
whether a registered WorldForge provider or your own cost oracle. This is the recommended way to
run the backbone loop, and it needs no local world state.
from worldforge import LatentMPCController, PlannerConfig, WorldForge

# Bind the controller to an explicit score provider.
controller = LatentMPCController(
    forge=WorldForge(),
    score_provider="my-score-provider",
    config=PlannerConfig(horizon=1, num_samples=64, num_iterations=5),
)

# Solve once for the current observation and goal.
step = controller.plan_step(
    observation_info={"observation_id": "frame-001"},
    goal_info={"target": [0.6, 0.5, 0.2]},
)
print(step.actions[0], step.best_score, step.metadata["control_mode"])
examples/latent_mpc_planning.py is a runnable, checkout-safe version that brings its own score
oracle, so the whole loop runs with no credentials, GPU, or robot. The host owns receding-horizon
closure: execute up to execute_k actions, collect a new observation, then call plan_step(...)
again.
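The host-side closure described above might look like the following sketch. Everything here is a stand-in: `run_receding_horizon`, `observe`, `execute`, and the toy `plan_step` are hypothetical and do not call WorldForge; in a real host, `plan_step` would wrap `controller.plan_step(...)` and `execute` would drive the simulator or robot behind the host's own safety checks.

```python
def run_receding_horizon(plan_step, execute, observe, max_steps=20, execute_k=1):
    """Host-side closure: plan, execute up to execute_k actions, re-observe, replan."""
    observation = observe()
    executed = []
    for _ in range(max_steps):
        actions = plan_step(observation)
        if not actions:
            break  # the planner has nothing left to propose
        for action in actions[:execute_k]:
            execute(action)
            executed.append(action)
        observation = observe()  # always replan from a fresh observation
    return executed, observation


# Hypothetical 1-D host: the state is a position, actions are bounded steps.
state = {"x": 0.0}
TARGET = 0.5


def observe():
    return {"x": state["x"]}


def execute(action):
    state["x"] += action


def plan_step(observation):
    # Stand-in planner: move toward TARGET by at most 0.2 per action.
    error = TARGET - observation["x"]
    if abs(error) < 1e-9:
        return []
    step = max(-0.2, min(0.2, error))
    return [step, step]  # two-action horizon; the host executes only execute_k of it


executed, final_observation = run_receding_horizon(plan_step, execute, observe)
```

Only the first `execute_k` actions of each plan are executed; the rest of the horizon is discarded and recomputed from the next observation, which is what keeps the loop robust to model error.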
World.plan Entry Point (local-state convenience)¶
World.plan(planner="latent-mpc", ...) runs the same solve from within the local-state World
runtime. It is a convenience for hosts already using local JSON state; the standalone controller
above is the primary entry point.
from worldforge import PlannerConfig

# `world` is an existing local-state World instance.
plan = world.plan(
    goal="move toward the goal latent",
    planner="latent-mpc",
    score_provider="my-score-provider",
    score_info={"observation_id": "frame-001"},
    goal_info={"target_x": 0.5},
    planner_config=PlannerConfig(
        horizon=2,
        num_samples=96,
        num_iterations=5,
        num_elites=12,
        execute_k=1,
        seed=19,
        action_kind="velocity",
        action_parameter_bounds={"x": (-2.0, 2.0)},
    ),
)
Latent MPC requires an explicit score_provider. The world default provider is not used as an
implicit cost oracle because control loops should make the scoring model visible in plan metadata
and workflow traces.
The returned plan uses:
- plan.planner == "latent-mpc"
- plan.metadata["planning_mode"] == "latent-mpc"
- plan.metadata["control_mode"] == "mpc"
- plan.metadata["optimizer"] == "cem"
- plan.metadata["iteration_best_scores"] for every CEM iteration
- plan.metadata["iteration_costs"] when the score provider declares lower_is_better=True
The host owns receding-horizon closure: execute up to execute_k actions, collect a new
observation, then call World.plan(planner="latent-mpc", ...) again.
Encoders¶
ScoreCandidateEncoder maps sampled WorldForge action horizons into the payload expected by a
score provider. The default ActionPlanCandidateEncoder serializes candidate actions with
Action.to_dict() and sends observation and goal dictionaries:
info = {"observation": observation_info, "goal": goal_info}
action_candidates = [[action.to_dict(), ...], ...]
Prepared-host robotics or simulation integrations can supply a custom encoder that converts the same sampled action horizons into provider-native tensors or nested numeric arrays. Keep that conversion behind the provider or host boundary; do not add optional runtime dependencies to the base package.
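As a rough illustration of that conversion, the sketch below turns serialized action horizons into nested numeric arrays. It is not a ScoreCandidateEncoder implementation: `encode_as_arrays` is a hypothetical helper, and the assumption that each serialized action carries a "parameters" mapping is made here for illustration only, since the actual Action.to_dict() layout is not shown in this page.

```python
def encode_as_arrays(action_candidates, parameter_order):
    """Flatten [[action_dict, ...], ...] into [candidate][step][parameter] floats."""
    encoded = []
    for horizon in action_candidates:
        steps = []
        for action in horizon:
            params = action["parameters"]  # assumed layout, not a confirmed API
            # A fixed parameter order keeps every row aligned across candidates.
            steps.append([float(params[name]) for name in parameter_order])
        encoded.append(steps)
    return encoded


# Two candidate horizons of one action each (hypothetical serialized form).
candidates = [
    [{"kind": "velocity", "parameters": {"x": 0.5, "y": -1}}],
    [{"kind": "velocity", "parameters": {"x": 1.5, "y": 0}}],
]
arrays = encode_as_arrays(candidates, parameter_order=("x", "y"))
```

The result is plain nested lists, so a host or provider can hand it to its own tensor library without the base package needing that dependency.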
Current Scope¶
The first implementation supports Gaussian CEM sampling. Policy warm-start is intentionally
rejected with a typed WorldForgeError until the public contract is implemented and tested.
Use existing policy+score planning when a policy provider should propose concrete candidate
actions directly. Use latent MPC when WorldForge should sample and refine action horizons through
a score provider.