commit and audit, not a proving circuit

Verify a quantized world model ran as specified.

ProvableWorldModel lets anyone verify, on a CPU with no floating point, that a committed quantized JEPA world model ran exactly as specified: rolling out candidate action plans, scoring them against a goal, and picking the best. It Freivalds-checks the large matmuls and exactly re-executes everything else, all bound by Merkle commitments and a Fiat-Shamir transcript. The full 192-dim, 6-block, 16-head le-wm predictor proves and verifies with the real pretrained checkpoint weights.

See how it works Run the demo Source

2⁻⁴⁴ soundness error at 10⁵ checks no_std float-free verifier zero proving circuits

What it does not claim. It proves an exact arithmetic relation. It does not claim floating-point or PyTorch equivalence, the physical truth of the predictions, or zero knowledge. Public claims must be a subset of the proven statement.

If a model planned your robot's next move, can you check it told the truth?

A world model takes a history of states and a candidate plan and predicts what happens next, then a planner scores plans and picks one. Re-running the model yourself defeats the purpose, and a model on someone else's GPU is a black box.

ProvableWorldModel turns that black box into a checkable claim. The party that runs the model emits a compact audit artifact. A float-free verifier audits the trajectory, the costs, and the selected action without repeating the expensive fixed-weight matmuls. Change one number anywhere and the proof fails.

From a checkpoint to an accepted proof, or a rejection.

The end to end flow at a glance below, then press play to walk it step by step. Flip the tamper switch to forge a matmul and watch the verifier catch it.

End to end pipeline. The prover side is trusted and offline. The verifier side is cheap, no_std, and float free, and audits arithmetic only.

prover trusted, offline

1
Checkpoint

le-wm JEPA checkpoint: real weights and config (quentinll/lewm-pusht).
2
Export

Quantize to an exact integer graph, fold BatchNorm, emit a canonical manifest and committed lookup tables.
3
Encode

Real PushT episode (lerobot/pusht): frames to ViT encoder and projector to latent history; expert action to action embedding.
4
Commit

Canonicalize the latent history and action embedding, check the manifest hash against the model commitment before any work.
5
Prove

Exact integer inference of the full 6-block, 16-head, 192-dim predictor (2,437 ops): AdaLN-zero, multi-head attention, GELU FFN, gated residuals. Merkle-commit the trace, Fiat-Shamir squeeze the Freivalds challenge r.

verifier no_std, float free

7
Verify

Replay the transcript. Freivalds check every fixed weight matmul, then exactly recompute attention, softmax, GELU, LayerNorm, residuals, rollout, cost, argmin. It never re-runs the model.
8
Verdict

ACCEPT Ok when every check passes. The rollout, cost, and argmin are exactly the committed quantized inference.

REJECT a typed VerifyError when any load bearing value is mutated. Forge one matmul output, get FreivaldsCheckFailed.

The model runs in exact integer fixed point, so attention is reproducible: the one hole CommitLLM leaves open is closed here. The verifier audits arithmetic only.

Tamper the proof

1
Checkpoint
Start from the trained le-wm JEPA checkpoint (quentinll/lewm-pusht): real weights and config.
2
Quantize and export
Read the checkpoint, quantize the full 192-dim V0 subgraph to an exact integer graph, fold BatchNorm, emit a canonical manifest, weights, and committed lookup tables.
3
Commit inputs
Encode a real PushT expert episode through the checkpoint's own encoders into a latent history and action embedding, canonicalize them, and check the manifest hash against the model commitment before any work.
4
Run inference
Run the exact integer reference forward pass, recording every accumulator, requant, attention dot product, cost, and the argmin into a trace.
5
Assemble the proof
Merkle-commit the trace, run Fiat-Shamir to squeeze the Freivalds vectors, and emit a self-contained AuditArtifact.
6
Verify
The verifier replays the transcript, checks every commitment, Freivalds-checks each matmul, and exactly recomputes everything else.
7
Accept
All checks pass. The trajectory, costs, and selected action are exactly the committed quantized inference.

docker compose --profile real up

ProvableWorldModel  commit-and-audit over the le-wm world model
  pipeline   checkpoint -> quantize -> commit -> encode -> run -> prove -> verify

[stage 1/4] EXPORT  offline, trusted
  ├ model    le-wm V0 predictor (6 blocks, 16 heads), real checkpoint weights
  ├ source   quentinll/lewm-pusht (Hugging Face, MIT)
  ├ config   dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6
  ├ weights  30 tensors, 10,764,288 int8 params
  ├ inputs   z_history [3x192], action embedding [3x192]
  └ source   real PushT expert episode (lerobot/pusht): 3 frames → ViT encoder; real 2D action + state

[stage 2/4] PROVE  exact integer inference + commitment
  ├ graph    2,437 ops over the named-buffer block DAG
  │          per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals
  ├ infer    exact integer forward pass in 130 ms
  └ z_next   [11, 55, 32, -73, -57, 13]  (predicted next-latent head)

[stage 3/4] VERIFY  no_std, float-free
  ├ challenge replayed the Fiat-Shamir transcript, derived the Freivalds r
  ├ checks   Freivalds v·x == r·z on every projection
  │          exact recompute of attention, softmax, GELU, LayerNorm, residuals
  └ verdict  ACCEPT  in 38 ms

[stage 4/4] TAMPER  forge one matmul output
  └ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 }  (caught)

The default docker compose up runs this exact pipeline with synthetic weights (no checkpoint download); --profile real proves the pretrained quentinll/lewm-pusht weights shown here, on a real PushT expert observation and action.

Freivalds: check a matmul without redoing it.

For a linear layer z = W·x the verifier draws a random r, precomputes v = rᵀW once, then checks v·x == r·z. It holds because rᵀ(Wx) = (rᵀW)x. Draw a challenge, then tamper z and watch the equality break.

secret challenge, drawn after z is committed

v·x ?

=?=

r·z ?

Draw a challenge to start.

A wrong z slips past one random r with probability at most 1/p. The real system uses p = 2⁶¹ − 1, so the error per check is about 2⁻⁶¹, and a union bound over many checks stays around 2⁻⁴⁴.

Six pieces, no magic.

Freivalds checks

For each fixed weight matrix, precompute v = rᵀW once and check v·x == r·z per use. The same weights run across every candidate, rollout step, and block, so one precompute is reused many times.

Exact integer replay

The cheap, deterministic ops are recomputed exactly from committed inputs: requant, residual, attention dot products, softmax, GELU, SiLU, LayerNorm, MSE, and argmin. No tolerance, no probabilistic gap, no circuit.

Merkle commitments

Every load-bearing value is bound by a Blake2s commitment: weights, scales, rounding, tables, op order, planner config, inputs, outputs, and every trace cell. Anything not committed is mutable, and therefore unsound.

Fiat-Shamir transcript

The prover commits the trace first, then derives the challenge r from a transcript of the commitments. Because r comes after z is fixed, the prover cannot fake z to pass.

Two fields, real soundness

An M31 field (2³¹−1) encodes values; a larger Fp61 field (2⁶¹−1) carries the Freivalds dot products for soundness headroom. Per-check error is at most 1/p.

No attention hole

CommitLLM cannot verify bf16 FlashAttention because it is not reproducible. Here attention is small and integer-exact, so the verifier recomputes it directly. Integer-relation verification, no residual attention hole, no floating point.

Statement tiers, from one step to a full plan.

implemented and tested, full model

One predictor step

Prove z_next = PredProj(ARPredictor(z_hist, ActEnc(actions))) over the committed quantized graph: a single forward pass through the predictor. The full 192-dim, 6-block, 16-head predictor (2,437 ops: AdaLN-zero, multi-head attention, GELU feed-forward, gated residuals) proves and verifies in the Rust prover with the real quentinll/lewm-pusht checkpoint weights and a real PushT expert observation and action.

implemented and tested

Autoregressive rollout

Apply the predictor over a horizon and prove each step feeds the next. The recurrence and windowing wiring are checked, with a history of three.

implemented and tested, V0

Fixed-candidate planning

Roll out every candidate action sequence, score each by goal MSE, and prove the selected candidate is the argmin with a smallest-index tie-break. All candidate costs must be proven, or the prover could cherry-pick.

deferred

Full CEM planner

The sampling loop: seeded candidates, clipping, top-k elites, and distribution updates across iterations.

deferred

Pixel to plan

End to end including the ViT encoder, from pixels through prediction to the chosen plan.

What changed from CommitLLM.

This project reuses CommitLLM's Freivalds plus exact-replay audit and adapts it from language models to a quantized world model.

Aspect	CommitLLM	ProvableWorldModel
Matmul check	Freivalds over `F_p`	Reused, `p = 2⁶¹−1` over bounded ints
Challenge	Verifier-secret r	Fiat-Shamir r by default; verifier-secret mode available
Non-matmul ops	Canonical f64 replay of bridge ops	Exact integer replay, no f64, no tolerance
Attention	Unverifiable hole, bf16 is not reproducible	Exactly recomputed, quantized integer, small sequence
Verifier	std plus rayon	no_std, float-free
Domain	Decode, sampling, LM-head logits	Latent prediction, rollout, MSE, argmin

Five small crates, one trust anchor.

pwm-corefields, fixed point, tensors, Merkle commitments, Fiat-Shamir, the Freivalds check, the trace model. no_std.

pwm-exportcheckpoint to quantized integer graph, manifest, golden vectors, the Rust integer reference.

pwm-proverrun the reference inference, commit the trace, derive challenges, emit the artifact.

pwm-verifierCPU, no_std, float-free. Freivalds-check linears, recompute the rest, check rollout, cost, argmin.

pwm-testkitgolden vectors, accept and reject suites, the mutation harness, the demo CLI.

pwm-prover --> pwm-export --> pwm-core <-- pwm-verifier
                              ^
                       pwm-testkit

The verifier depends on neither the exporter nor any Python or float runtime. It verifies arithmetic only, sharing one Freivalds and trace implementation with the prover through pwm-core.

Run the full pipeline in one command.

The demo

# the full le-wm predictor: prove, verify, reject a forged matmul
git clone https://github.com/AbdelStark/ProvableWorldModel
cd ProvableWorldModel
docker compose up --build
# or prove the real pretrained checkpoint weights
docker compose --profile real up --build export predictor-real

Without Docker

# the full 192-dim predictor, synthetic weights
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor
# the real pretrained checkpoint bundle
cargo run -p pwm-testkit --bin pwm --release -- prove-predictor bundle.json

Specification Roadmap GitHub

Verify a quantized world model ran as specified.

If a model planned your robot's next move, can you check it told the truth?

From a checkpoint to an accepted proof, or a rejection.

Checkpoint

Export

Encode

Commit

Prove

Verify

Verdict