Python API¶

Entry points¶

from worldforge import (
    Action,
    ActionPolicyResult,
    ActionScoreResult,
    BenchmarkBudget,
    Cost,
    Policy,
    load_benchmark_inputs,
    RunnableModel,
    WorldForge,
)

`WorldForge`¶

Top-level framework object responsible for:

provider registration
world creation and persistence
generation, transfer, reasoning, embedding, action-scoring, and action-policy helpers
provider profiles and environment diagnostics

Common inspection helpers:

from worldforge import WorldForge

forge = WorldForge()

profiles = forge.builtin_provider_profiles()
doctor = forge.doctor()

print(profiles[0].supported_tasks)
print(doctor.issues)

Provider capability filters are strict. Valid capability names are predict, generate, reason, embed, plan, transfer, score, and policy; unknown names raise WorldForgeError instead of producing an empty result by typo.

Capability protocols¶

Provider adapters can still subclass BaseProvider, but small integrations can also register a single capability object. The object declares a non-empty name, optional ProviderProfileSpec, and the method for exactly the capability it implements.

from worldforge import ActionScoreResult, WorldForge
from worldforge.providers import ProviderProfileSpec


class LocalCost:
    name = "local-cost"
    profile = ProviderProfileSpec(description="Local score model")

    def score_actions(self, *, info, action_candidates):
        return ActionScoreResult(provider=self.name, scores=[0.2, 0.7], best_index=0)


forge = WorldForge()
forge.register_cost(LocalCost())

result = forge.score_actions(cost="local-cost", info={}, action_candidates=[{}, {}])
print(result.best_index)

The same pattern is available through register_policy, register_generator, register_predictor, register_reasoner, register_embedder, and register_transferer. forge.register(...) dispatches a pure object by protocol membership, and RunnableModel(...) can group several capability implementations. Registered protocol implementations appear in providers(), provider_profile(...), doctor(...), planning, and the benchmark harness. Existing calls such as forge.generate("prompt", "mock") keep resolving through the legacy provider registry.

Persistence¶

from worldforge import WorldForge

forge = WorldForge(state_dir=".worldforge/worlds")
world = forge.create_world("lab", provider="mock")
world_id = forge.save_world(world)

payload = forge.export_world(world_id)
copy = forge.import_world(payload, new_id=True, name="lab-copy")
copy_id = forge.save_world(copy)

forge.delete_world(world_id)
print(forge.list_worlds(), copy_id)

save_world(...), load_world(...), import_world(...), fork_world(...), and delete_world(...) all validate world identifiers before touching the filesystem. Delete removes only the local JSON file for a file-safe world id; missing worlds raise WorldStateError instead of being treated as successful no-ops.

Observability¶

import logging
from pathlib import Path

from worldforge import WorldForge
from worldforge.observability import (
    JsonLoggerSink,
    ProviderMetricsExporterSink,
    OpenTelemetryProviderEventSink,
    ProviderMetricsSink,
    RunJsonLogSink,
    compose_event_handlers,
)
from worldforge.rerun import RerunArtifactLogger, RerunEventSink, RerunRecordingConfig, RerunSession

run_id = "demo-run"
metrics = ProviderMetricsSink()
host_metrics_exporter = ...  # supplied by your service
forge = WorldForge(
    event_handler=compose_event_handlers(
        JsonLoggerSink(logger=logging.getLogger("demo.worldforge"), extra_fields={"run_id": run_id}),
        RunJsonLogSink(Path(".worldforge") / "runs" / run_id / "provider-events.jsonl", run_id),
        ProviderMetricsExporterSink(host_metrics_exporter),
        metrics,
    )
)

forge.generate("orbiting cube", "mock", duration_seconds=1.0)
print(metrics.get("mock", "generate").to_dict())

Provider events are log-safe by default. The target field keeps endpoint or artifact path context but strips URL userinfo, query strings, and fragments; message, metadata, and sink extra fields redact obvious bearer tokens, API keys, signatures, passwords, and signed URLs. RunJsonLogSink appends one JSON object per line and stamps every record with run_id for manifest correlation. OpenTelemetryProviderEventSink is optional and accepts an injected host tracer, so the base package does not import OpenTelemetry or configure collectors. ProviderMetricsExporterSink is also optional and accepts a host-owned metrics exporter with bounded labels for provider, operation, phase, status class, and capability.

Rerun is available as an optional observability and artifact layer:

session = RerunSession(RerunRecordingConfig(save_path=".worldforge/rerun/run.rrd"))
rerun_events = RerunEventSink(session=session)
artifacts = RerunArtifactLogger(session=session)

forge = WorldForge(event_handler=rerun_events)
world = forge.create_world("lab", provider="mock")
plan = world.plan("move the first object right")

artifacts.log_world(world)
artifacts.log_plan(plan)
session.close()

Install with worldforge-ai[rerun]. Rerun is not a provider and does not advertise WorldForge capabilities.

Action Scoring¶

Providers that expose the score capability can rank candidate action sequences without claiming prediction, generation, or reasoning support. LeWorldModel uses this path because its upstream runtime is a JEPA cost model.

from worldforge import WorldForge

forge = WorldForge()
result = forge.score_actions(
    "leworldmodel",
    info={
        "pixels": [[[0.0, 0.1, 0.2]]],
        "goal": [[[0.8, 0.9, 1.0]]],
        "action": [[[0.0, 0.0, 0.0]]],
    },
    action_candidates=[
        [
            [[0.0], [0.1], [0.2]],
            [[0.3], [0.2], [0.1]],
        ]
    ],
)

print(result.best_index, result.best_score)

ActionScoreResult validates finite scores, exposes best_index and best_score, and includes lower_is_better so callers do not have to infer score direction from provider-specific docs. Metadata must be JSON-native: dict keys are strings, numbers are finite, and object instances or tuples are rejected instead of being coerced silently.

The planner can consume the same score surface when callers provide WorldForge actions that correspond to each scored candidate. By default, those actions are serialized and passed to the score provider. Pass score_action_candidates when the scorer expects a provider-native tensor or latent candidate payload:

from worldforge import Action

plan = world.plan(
    goal="choose the lowest-cost LeWorldModel action",
    provider="leworldmodel",
    planner="leworldmodel-mpc",
    candidate_actions=[
        [Action.move_to(0.1, 0.5, 0.0)],
        [Action.move_to(0.4, 0.5, 0.0)],
    ],
    score_info=info,
    score_action_candidates=action_candidates,
    execution_provider="mock",
)

print(plan.actions, plan.metadata["score_result"]["best_index"])

Score-based plans do not ask the score provider to predict state. Plan.predicted_states stays empty, score details are stored in Plan.metadata, and execute_plan(...) uses execution_provider when the scoring provider does not implement predict(). The planner requires one score per candidate action plan; mismatched score counts fail before a plan is returned.

Action Policy¶

Providers that expose the policy capability select executable action chunks from observations. NVIDIA Isaac GR00T uses this surface because it is an embodied VLA policy, not a predictive world model.

result = forge.select_actions(
    "gr00t",
    info={
        "observation": {
            "video": {"front": video_array},
            "state": {"eef": state_array},
            "language": {"task": [["pick up the cube"]]},
        },
        "embodiment_tag": "LIBERO_PANDA",
        "action_horizon": 16,
    },
)

print(result.actions, result.raw_actions)

ActionPolicyResult validates that the provider returned at least one WorldForge Action, preserves provider-native raw actions for debugging, and can carry multiple candidate action chunks for downstream scoring. Preserved raw actions and metadata must be JSON-native so run artifacts can be serialized without hidden encoder behavior.

Policy-only planning:

plan = world.plan(
    goal="pick up the cube",
    provider="gr00t",
    policy_info=policy_info,
    execution_provider="mock",
)

Policy plus score planning:

plan = world.plan(
    goal="choose the lowest-cost policy candidate",
    policy_provider="gr00t",
    score_provider="leworldmodel",
    policy_info=policy_info,
    score_info=lewm_info,
    execution_provider="mock",
)

By default, WorldForge serializes the policy candidates as lists of Action.to_dict() payloads before calling the score provider. Pass score_action_candidates when the scorer needs a provider-native tensor or latent candidate format. The host still owns embodiment-specific action translation and any model-native mapping.

`World`¶

Stateful runtime object responsible for:

scene object management
prediction
comparison
planning with heuristic strings or typed StructuredGoal
evaluation

Example:

from worldforge import Position, StructuredGoal

plan = world.plan(
    goal_spec=StructuredGoal.object_at(
        object_name="red_mug",
        position=Position(0.3, 0.8, 0.0),
    )
)

Typed structured goals cover:

StructuredGoal.object_at(...)
StructuredGoal.object_near(...)
StructuredGoal.spawn_object(...)
StructuredGoal.swap_objects(...)

Evaluation¶

from worldforge.evaluation import EvaluationSuite

print(EvaluationSuite.builtin_names())

suite = EvaluationSuite.from_builtin("reasoning")
report = suite.run_report(["mock"], forge=forge)
print(report.results[0].passed)
print(report.to_markdown())

Benchmarking¶

from worldforge import BenchmarkBudget, ProviderBenchmarkHarness, load_benchmark_inputs

harness = ProviderBenchmarkHarness(forge=forge)
inputs = load_benchmark_inputs(
    {
        "embedding_text": "benchmark cube state",
        "generation_prompt": "benchmark orbiting cube",
    }
)
report = harness.run(
    ["mock"],
    operations=["predict", "generate", "embed"],
    iterations=5,
    inputs=inputs,
)
print(report.to_json())

budget = BenchmarkBudget.from_dict({"max_p95_latency_ms": 25.0})
print(report.evaluate_budgets([budget]).passed)

Provider contract testing¶

from worldforge.providers import MockProvider
from worldforge.testing import assert_provider_contract

report = assert_provider_contract(MockProvider())
print(report.to_dict())

For score providers, pass provider-specific score payloads so the helper can exercise score_actions(...):

report = assert_provider_contract(
    provider,
    score_info={"observation": [[0.0]], "goal": [[1.0]]},
    score_action_candidates=[[[[0.0]]]],
)

For policy providers, pass provider-specific policy observations:

report = assert_provider_contract(provider, policy_info=policy_info)

Public failure modes¶

WorldForge uses three public exception families for runtime workflows:

WorldForgeError: invalid caller input, invalid model values, unsupported formats, and invalid local configuration values, including non-file-safe world IDs used for persistence lookup.
WorldStateError: malformed persisted state or provider-supplied world state that cannot be safely restored or applied, including invalid scene-object maps and invalid history entries.
ProviderError: provider credentials, transport failures, unsupported provider operations, malformed upstream responses, provider-specific input limits, expired artifacts, invalid downloaded media, optional dependency failures, and malformed model score outputs.

Provider-facing workflows touched by remote adapters fail before returning partial results:

from worldforge import GenerationOptions, WorldForge
from worldforge.providers import ProviderError

forge = WorldForge()

try:
    clip = forge.generate(
        "a rainy alley at night",
        "runway",
        duration_seconds=4.0,
        options=GenerationOptions(ratio="1280:720"),
    )
except ProviderError as exc:
    # Inspect emitted ProviderEvent records for transport status and attempts.
    raise

Important boundary checks:

Position, Rotation, VideoClip, request policies, provider events, embeddings, reasoning confidence, and prediction payload metrics reject non-finite numbers.
Action.parameters, SceneObject.metadata, provider-event metadata, score metadata, policy raw actions, policy metadata, and prediction payload state/metadata reject non-JSON-native values rather than accepting object instances that only fail at persistence time.
Evaluation and benchmark result objects validate finite metrics, score ranges, coherent counts, and JSON-native metrics before JSON, Markdown, or CSV artifacts are rendered.
World.add_object(...) rejects duplicate scene object IDs.
Imported or provider-supplied world state rejects scene-object keys that disagree with embedded object IDs.
Cosmos generation responses must include a non-empty base64 b64_video field and typed optional metadata.
Runway task creation, polling, and artifact download responses are validated before constructing a returned VideoClip.
LeWorldModel scoring requires pixels, goal, and action info fields, action candidates shaped as (batch=1, samples, horizon, action_dim), optional stable_worldmodel and torch runtime dependencies, one returned score per candidate sample, and finite model scores.