commit and audit, not a proving circuit

Verify a quantized world model ran as specified.

ProvableWorldModel lets anyone verify, on a CPU with no floating point, that a committed quantized JEPA world model ran exactly as specified: rolling out candidate action plans, scoring them against a goal, and picking the best. It Freivalds-checks the large matmuls and exactly re-executes everything else, all bound by Merkle commitments and a Fiat-Shamir transcript. The full 192-dim, 6-block, 16-head le-wm predictor proves and verifies with the real pretrained checkpoint weights.

2⁻⁴⁴ soundness error at 10⁵ checks no_std float-free verifier zero proving circuits

What it does not claim. It proves an exact arithmetic relation. It does not claim floating-point or PyTorch equivalence, the physical truth of the predictions, or zero knowledge. Public claims must be a subset of the proven statement.

If a model planned your robot's next move, can you check it told the truth?

A world model takes a history of states and a candidate plan and predicts what happens next, then a planner scores plans and picks one. Re-running the model yourself defeats the purpose, and a model on someone else's GPU is a black box.

ProvableWorldModel turns that black box into a checkable claim. The party that runs the model emits a compact audit artifact. A float-free verifier audits the trajectory, the costs, and the selected action without repeating the expensive fixed-weight matmuls. Change one number anywhere and the proof fails.

From a checkpoint to an accepted proof, or a rejection.

The end to end flow at a glance below, then press play to walk it step by step. Flip the tamper switch to forge a matmul and watch the verifier catch it.

  1. 1

    Checkpoint

    Start from the trained le-wm JEPA checkpoint (quentinll/lewm-pusht): real weights and config.

  2. 2

    Quantize and export

    Read the checkpoint, quantize the full 192-dim V0 subgraph to an exact integer graph, fold BatchNorm, emit a canonical manifest, weights, and committed lookup tables.

  3. 3

    Commit inputs

    Encode a real PushT expert episode through the checkpoint's own encoders into a latent history and action embedding, canonicalize them, and check the manifest hash against the model commitment before any work.

  4. 4

    Run inference

    Run the exact integer reference forward pass, recording every accumulator, requant, attention dot product, cost, and the argmin into a trace.

  5. 5

    Assemble the proof

    Merkle-commit the trace, run Fiat-Shamir to squeeze the Freivalds vectors, and emit a self-contained AuditArtifact.

  6. 6

    Verify

    The verifier replays the transcript, checks every commitment, Freivalds-checks each matmul, and exactly recomputes everything else.

  7. 7

    Accept

    All checks pass. The trajectory, costs, and selected action are exactly the committed quantized inference.

The default docker compose up runs this exact pipeline with synthetic weights (no checkpoint download); --profile real proves the pretrained quentinll/lewm-pusht weights shown here, on a real PushT expert observation and action.

Freivalds: check a matmul without redoing it.

For a linear layer z = W·x the verifier draws a random r, precomputes v = rᵀW once, then checks v·x == r·z. It holds because rᵀ(Wx) = (rᵀW)x. Draw a challenge, then tamper z and watch the equality break.

W
·
x
=
z
r
secret challenge, drawn after z is committed
v·x ?
=?=
r·z ?
Draw a challenge to start.

A wrong z slips past one random r with probability at most 1/p. The real system uses p = 2⁶¹ − 1, so the error per check is about 2⁻⁶¹, and a union bound over many checks stays around 2⁻⁴⁴.

Six pieces, no magic.

Freivalds checks

For each fixed weight matrix, precompute v = rᵀW once and check v·x == r·z per use. The same weights run across every candidate, rollout step, and block, so one precompute is reused many times.

Exact integer replay

The cheap, deterministic ops are recomputed exactly from committed inputs: requant, residual, attention dot products, softmax, GELU, SiLU, LayerNorm, MSE, and argmin. No tolerance, no probabilistic gap, no circuit.

Merkle commitments

Every load-bearing value is bound by a Blake2s commitment: weights, scales, rounding, tables, op order, planner config, inputs, outputs, and every trace cell. Anything not committed is mutable, and therefore unsound.

Fiat-Shamir transcript

The prover commits the trace first, then derives the challenge r from a transcript of the commitments. Because r comes after z is fixed, the prover cannot fake z to pass.

Two fields, real soundness

An M31 field (2³¹−1) encodes values; a larger Fp61 field (2⁶¹−1) carries the Freivalds dot products for soundness headroom. Per-check error is at most 1/p.

No attention hole

CommitLLM cannot verify bf16 FlashAttention because it is not reproducible. Here attention is small and integer-exact, so the verifier recomputes it directly. Integer-relation verification, no residual attention hole, no floating point.

Statement tiers, from one step to a full plan.

implemented and tested, full model

One predictor step

Prove z_next = PredProj(ARPredictor(z_hist, ActEnc(actions))) over the committed quantized graph: a single forward pass through the predictor. The full 192-dim, 6-block, 16-head predictor (2,437 ops: AdaLN-zero, multi-head attention, GELU feed-forward, gated residuals) proves and verifies in the Rust prover with the real quentinll/lewm-pusht checkpoint weights and a real PushT expert observation and action.

implemented and tested

Autoregressive rollout

Apply the predictor over a horizon and prove each step feeds the next. The recurrence and windowing wiring are checked, with a history of three.

implemented and tested, V0

Fixed-candidate planning

Roll out every candidate action sequence, score each by goal MSE, and prove the selected candidate is the argmin with a smallest-index tie-break. All candidate costs must be proven, or the prover could cherry-pick.

deferred

Full CEM planner

The sampling loop: seeded candidates, clipping, top-k elites, and distribution updates across iterations.

deferred

Pixel to plan

End to end including the ViT encoder, from pixels through prediction to the chosen plan.

What changed from CommitLLM.

This project reuses CommitLLM's Freivalds plus exact-replay audit and adapts it from language models to a quantized world model.

AspectCommitLLMProvableWorldModel
Matmul checkFreivalds over F_pReused, p = 2⁶¹−1 over bounded ints
ChallengeVerifier-secret rFiat-Shamir r by default; verifier-secret mode available
Non-matmul opsCanonical f64 replay of bridge opsExact integer replay, no f64, no tolerance
AttentionUnverifiable hole, bf16 is not reproducibleExactly recomputed, quantized integer, small sequence
Verifierstd plus rayonno_std, float-free
DomainDecode, sampling, LM-head logitsLatent prediction, rollout, MSE, argmin

Five small crates, one trust anchor.

pwm-corefields, fixed point, tensors, Merkle commitments, Fiat-Shamir, the Freivalds check, the trace model. no_std.
pwm-exportcheckpoint to quantized integer graph, manifest, golden vectors, the Rust integer reference.
pwm-proverrun the reference inference, commit the trace, derive challenges, emit the artifact.
pwm-verifierCPU, no_std, float-free. Freivalds-check linears, recompute the rest, check rollout, cost, argmin.
pwm-testkitgolden vectors, accept and reject suites, the mutation harness, the demo CLI.
pwm-prover --> pwm-export --> pwm-core <-- pwm-verifier
                              ^
                       pwm-testkit

The verifier depends on neither the exporter nor any Python or float runtime. It verifies arithmetic only, sharing one Freivalds and trace implementation with the prover through pwm-core.

Run the full pipeline in one command.

The demo

# the full le-wm predictor: prove, verify, reject a forged matmul
git clone https://github.com/AbdelStark/ProvableWorldModel
cd ProvableWorldModel
docker compose up --build
# or prove the real pretrained checkpoint weights
docker compose --profile real up --build export predictor-real

Without Docker

# the full 192-dim predictor, synthetic weights
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor
# the real pretrained checkpoint bundle
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor bundle.json