Provider Routing And Fallback
WorldForge ships a typed routing helper for trying a preferred provider first and falling back to a prioritized list of alternates when calls fail. The helper validates capability compatibility before each attempt, preserves the underlying provider events emitted by the observable capability wrapper, and returns the full attempt history to the caller. Failures are never silently masked.
When fallback is appropriate
- Optional providers: a host has the mock provider registered for offline development and a prepared-host provider configured when its runtime is present. Routing prefers the runtime-backed provider and falls back to mock when that adapter is missing on a checkout.
- Transient remote failures: a remote provider returns a retryable status code its own retry policy could not absorb. Routing logs the failure and tries the next provider so a one-off network blip does not break a batch evaluation.
- Cross-host portability: the same workflow runs on machines with different provider mixes. A deterministic preferred + fallback chain keeps the call path predictable across hosts.
When fallback is not appropriate
- Correctness-sensitive contracts: downstream pipelines that depend on the chosen provider's specific semantics (a particular world-model, a particular policy network). A silent fallback to a different provider changes the result without changing the caller. Pick one provider and let the call surface its error.
- Billing-sensitive operations: retrying with another paid provider multiplies cost. Configure a single provider and surface its failure.
- Robotics control loops: deterministic provider selection matters more than fault tolerance. A fallback in a control loop hides which model produced an action; fail loudly so an operator can decide whether to re-engage.
- Malformed provider output: the routing helper does not mask contract-validation errors. Adapters that pass schema-invalid data to the observable capability wrapper raise `ProviderError`; the chain records the failure as a normal failed attempt and tries the next provider, but the malformed output never reaches the caller.
Public surface
worldforge.provider_routing (re-exported on the top-level package):
| Symbol | Purpose |
|---|---|
| `ProviderRoutingPolicy` | Frozen policy: capability + preferred + fallbacks + capability-check toggle + operation tag |
| `RoutingAttempt` | One step in the chain: provider, capability, status, optional reason / error type / sanitized message |
| `RoutingResult` | Outcome: chosen provider, success flag, full attempt history, returned value |
| `ROUTING_ATTEMPT_STATUSES` | Enumerated statuses: succeeded, failed, skipped-not-registered, skipped-incompatible |
| `route_capability(policy, forge, *, invoke)` | Try the chain, return the first success |
route_capability is deterministic. Providers are tried in the order
(preferred, *fallbacks) and the chain stops at the first success. Each
unregistered or capability-incompatible provider is recorded as a skipped
attempt without invoking the call; only registered, compatible providers
are passed to invoke. Any exception raised by invoke is captured as a
failed attempt with the exception class name and a redacted str(exc) and
the chain continues.
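The ordering and skip rules above can be sketched in a few lines. This is a minimal, self-contained illustration and not WorldForge's implementation: `Attempt`, `route`, the `registry` mapping, and `fake_invoke` are hypothetical stand-ins, and message redaction is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


# Hypothetical stand-in for a routing attempt record; not the library's class.
@dataclass(frozen=True)
class Attempt:
    provider: str
    status: str  # succeeded | failed | skipped-not-registered | skipped-incompatible
    error_type: Optional[str] = None


def route(
    preferred: str,
    fallbacks: tuple[str, ...],
    capability: str,
    registry: dict[str, set[str]],  # assumed shape: provider name -> capabilities
    invoke: Callable[[str], Any],
) -> tuple[Optional[str], Any, tuple[Attempt, ...]]:
    """Try providers in (preferred, *fallbacks) order and stop at the first success."""
    attempts: list[Attempt] = []
    for name in (preferred, *fallbacks):
        if name not in registry:
            # Unregistered providers are recorded but never invoked.
            attempts.append(Attempt(name, "skipped-not-registered"))
            continue
        if capability not in registry[name]:
            # Registered but lacking the capability: also skipped without a call.
            attempts.append(Attempt(name, "skipped-incompatible"))
            continue
        try:
            value = invoke(name)
        except Exception as exc:
            # Any exception becomes a failed attempt and the chain continues.
            attempts.append(Attempt(name, "failed", type(exc).__name__))
            continue
        attempts.append(Attempt(name, "succeeded"))
        return name, value, tuple(attempts)
    return None, None, tuple(attempts)


def fake_invoke(name: str) -> str:
    # Simulated provider call: one provider always times out.
    if name == "flaky":
        raise TimeoutError("upstream timed out")
    return f"prediction from {name}"


registry = {"flaky": {"predict"}, "embed-only": {"embed"}, "mock": {"predict"}}
chosen, value, attempts = route(
    "local-predictor", ("embed-only", "flaky", "mock"), "predict", registry, fake_invoke
)
print(chosen, [(a.provider, a.status) for a in attempts])
```

In this run the first two providers are skipped without being invoked, `flaky` is recorded as failed with `error_type="TimeoutError"`, and `mock` is chosen. The history keeps all four steps, so a caller can see why the preferred provider was not used.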
Example
```python
from worldforge import Action, ProviderRoutingPolicy, WorldForge, route_capability

forge = WorldForge()

# Forge auto-registers `mock` everywhere. Prepared-host providers register only
# when their required environment is present.
policy = ProviderRoutingPolicy(
    capability="predict",
    preferred="local-predictor",
    fallbacks=("mock",),
    operation="prediction-fallback",
)

result = route_capability(
    policy,
    forge,
    invoke=lambda name: forge.predict(
        world_state={"objects": {}},
        action=Action("noop", {"target": "world"}),
        provider=name,
    ),
)

if not result.succeeded:
    raise RuntimeError(
        "no provider in chain satisfied predict(): "
        + ", ".join(
            f"{a.provider}={a.status}" for a in result.attempts
        )
    )

print(f"chosen={result.chosen} via_chain={[a.provider for a in result.attempts]}")
prediction = result.value
```
When the preferred provider is not configured on the host, the attempt history records
skipped-not-registered for it and the chain proceeds to mock
without invoking any remote call or charging the credentialed account.
Events and provenance
route_capability does not emit its own ProviderEvent objects. The
events produced by the observable capability wrapper inside
forge.predict(...), forge.embed(...), score, or policy calls are preserved unchanged and
flow through whatever event_handler was attached to the forge. The
RoutingResult.attempts tuple is the chain-level companion to those
per-call events.
Failed attempts in attempts carry:
- `status="failed"`
- `error_type`: the exception class name (e.g. `"ProviderError"`, `"ProviderBudgetExceededError"`)
- `error_message`: redacted `str(exc)`. Adapters are still responsible for raising sanitized provider errors, but routing attempts are artifact-facing records and defensively strip common bearer token, API key, secret assignment, and signed-URL query shapes before serialization.
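To make the defensive stripping concrete, here is a minimal sketch of regex-based redaction over an exception message. The patterns, the `[REDACTED]` placeholder, and the `redact` helper are assumptions chosen for illustration; the shapes WorldForge actually strips may differ.

```python
import re

# Hypothetical patterns for illustration only; not WorldForge's real rules.
_REDACTIONS = (
    # Bearer tokens, e.g. "Authorization: Bearer abc.def".
    (
        re.compile(r"\bbearer\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE),
        "Bearer [REDACTED]",
    ),
    # API key and secret assignments such as "api_key=..." or "secret: ...".
    (
        re.compile(
            r"\b(api[_-]?key|secret|token|password)\b(\s*[=:]\s*)[^\s&,;]+",
            re.IGNORECASE,
        ),
        r"\1\2[REDACTED]",
    ),
    # Signed-URL query parameters; other query parameters are left alone.
    (
        re.compile(r"([?&](?:signature|sig|x-amz-signature)=)[^&\s]+", re.IGNORECASE),
        r"\1[REDACTED]",
    ),
)


def redact(message: str) -> str:
    """Apply each pattern in turn and return the sanitized message."""
    for pattern, replacement in _REDACTIONS:
        message = pattern.sub(replacement, message)
    return message


raw = (
    "401 from host: Authorization: Bearer abc.def-123 "
    "api_key=sk-live-42 url=https://x.test/f?expires=9&Signature=deadbeef"
)
clean = redact(raw)
print(clean)
```

Pattern-based redaction of this kind is a second line of defense. It only catches the shapes it knows about, which is why adapters remain responsible for raising sanitized errors in the first place.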
Validation gates
Routing inputs are validated at construction:
- `capability` must be one of `CAPABILITY_NAMES`
- `preferred` must be a non-empty provider name
- `fallbacks` must be a sequence of non-empty provider names
- the chain (`preferred + fallbacks`) cannot contain duplicates
- `require_capability` must be a `bool`
- `operation` must be a non-empty string
- `RoutingResult` attempts must all match the result capability
- succeeded results must have one final succeeded attempt, a matching chosen provider, and a returned value
- failed results must not carry a chosen provider, stale value, or succeeded attempt
Each violation raises WorldForgeError so misconfiguration surfaces at
policy construction, not in a dispatch path.
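The policy-level gates can be pictured as checks in a frozen dataclass's `__post_init__`. The sketch below is a stand-in, not the library's `ProviderRoutingPolicy`: the contents of `CAPABILITY_NAMES`, the local `WorldForgeError` class, the `PolicySketch` name, and the field defaults are all assumptions made so the example runs on its own.

```python
from dataclasses import dataclass

# Assumed capability set for illustration; the real CAPABILITY_NAMES may differ.
CAPABILITY_NAMES = frozenset({"predict", "embed", "score", "policy"})


class WorldForgeError(Exception):
    """Local stand-in for the library's error type."""


@dataclass(frozen=True)
class PolicySketch:
    capability: str
    preferred: str
    fallbacks: tuple = ()
    require_capability: bool = True  # assumed default
    operation: str = "routing"  # assumed default

    def __post_init__(self) -> None:
        if self.capability not in CAPABILITY_NAMES:
            raise WorldForgeError(f"unknown capability: {self.capability!r}")
        chain = (self.preferred, *self.fallbacks)
        # Every name in the chain must be a non-empty string.
        if any(not isinstance(n, str) or not n.strip() for n in chain):
            raise WorldForgeError("provider names must be non-empty strings")
        # A duplicate would make the same provider run twice in one chain.
        if len(set(chain)) != len(chain):
            raise WorldForgeError("preferred + fallbacks contains duplicates")
        if not isinstance(self.require_capability, bool):
            raise WorldForgeError("require_capability must be a bool")
        if not isinstance(self.operation, str) or not self.operation.strip():
            raise WorldForgeError("operation must be a non-empty string")


try:
    PolicySketch(capability="predict", preferred="mock", fallbacks=("mock",))
except WorldForgeError as exc:
    print(f"rejected at construction: {exc}")
```

Because the checks run in `__post_init__`, an invalid policy never exists as an object, so dispatch code can assume a well-formed chain.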