Freivalds checks
For each fixed weight matrix, precompute v = rᵀW once and check v·x == r·z per use. The same weights run across every candidate, rollout step, and block, so one precompute is reused many times.
commit and audit, not a proving circuit
ProvableWorldModel lets anyone verify, on a CPU with no floating point, that a committed quantized JEPA world model ran exactly as specified: rolling out candidate action plans, scoring them against a goal, and picking the best. It Freivalds-checks the large matmuls and exactly re-executes everything else, all bound by Merkle commitments and a Fiat-Shamir transcript. The full 192-dim, 6-block, 16-head le-wm predictor proves and verifies with the real pretrained checkpoint weights.
2⁻⁴⁴ soundness error at 10⁵ checks
no_std float-free verifier
zero proving circuits
What it does not claim. It proves an exact arithmetic relation. It does not claim floating-point or PyTorch equivalence, the physical truth of the predictions, or zero knowledge. Public claims must be a subset of the proven statement.
A world model takes a history of states and a candidate plan and predicts what happens next, then a planner scores plans and picks one. Re-running the model yourself defeats the purpose, and a model on someone else's GPU is a black box.
ProvableWorldModel turns that black box into a checkable claim. The party that runs the model emits a compact audit artifact. A float-free verifier audits the trajectory, the costs, and the selected action without repeating the expensive fixed-weight matmuls. Change one number anywhere and the proof fails.
The end to end flow at a glance below, then press play to walk it step by step. Flip the tamper switch to forge a matmul and watch the verifier catch it.
le-wm JEPA checkpoint: real weights and config (quentinll/lewm-pusht).
Quantize to an exact integer graph, fold BatchNorm, emit a canonical manifest and committed lookup tables.
Real PushT episode (lerobot/pusht): frames to ViT encoder and projector to latent history; expert action to action embedding.
Canonicalize the latent history and action embedding, check the manifest hash against the model commitment before any work.
Exact integer inference of the full 6-block, 16-head, 192-dim predictor (2,437 ops): AdaLN-zero, multi-head attention, GELU FFN, gated residuals. Merkle-commit the trace, Fiat-Shamir squeeze the Freivalds challenge r.
Replay the transcript. Freivalds check every fixed weight matmul, then exactly recompute attention, softmax, GELU, LayerNorm, residuals, rollout, cost, argmin. It never re-runs the model.
Ok when every check passes. The rollout, cost, and argmin are exactly the committed quantized inference.
VerifyError when any load bearing value is mutated. Forge one matmul output, get FreivaldsCheckFailed.
The model runs in exact integer fixed point, so attention is reproducible: the one hole CommitLLM leaves open is closed here. The verifier audits arithmetic only.
Start from the trained le-wm JEPA checkpoint (quentinll/lewm-pusht): real weights and config.
Read the checkpoint, quantize the full 192-dim V0 subgraph to an exact integer graph, fold BatchNorm, emit a canonical manifest, weights, and committed lookup tables.
Encode a real PushT expert episode through the checkpoint's own encoders into a latent history and action embedding, canonicalize them, and check the manifest hash against the model commitment before any work.
Run the exact integer reference forward pass, recording every accumulator, requant, attention dot product, cost, and the argmin into a trace.
Merkle-commit the trace, run Fiat-Shamir to squeeze the Freivalds vectors, and emit a self-contained AuditArtifact.
The verifier replays the transcript, checks every commitment, Freivalds-checks each matmul, and exactly recomputes everything else.
All checks pass. The trajectory, costs, and selected action are exactly the committed quantized inference.
ProvableWorldModel commit-and-audit over the le-wm world model pipeline checkpoint -> quantize -> commit -> encode -> run -> prove -> verify [stage 1/4] EXPORT offline, trusted ├ model le-wm V0 predictor (6 blocks, 16 heads), real checkpoint weights ├ source quentinll/lewm-pusht (Hugging Face, MIT) ├ config dim=192, history=3, heads=16, dim_head=64, mlp=2048, depth=6 ├ weights 30 tensors, 10,764,288 int8 params ├ inputs z_history [3x192], action embedding [3x192] └ source real PushT expert episode (lerobot/pusht): 3 frames → ViT encoder; real 2D action + state [stage 2/4] PROVE exact integer inference + commitment ├ graph 2,437 ops over the named-buffer block DAG │ per block: AdaLN-zero, 16-head attention, GELU FFN, gated residuals ├ infer exact integer forward pass in 130 ms └ z_next [11, 55, 32, -73, -57, 13] (predicted next-latent head) [stage 3/4] VERIFY no_std, float-free ├ challenge replayed the Fiat-Shamir transcript, derived the Freivalds r ├ checks Freivalds v·x == r·z on every projection │ exact recompute of attention, softmax, GELU, LayerNorm, residuals └ verdict ACCEPT in 38 ms [stage 4/4] TAMPER forge one matmul output └ forged matmul op 2 -> REJECT FreivaldsCheckFailed { op_id: 2 } (caught)
The default docker compose up runs this exact pipeline with synthetic weights (no checkpoint download); --profile real proves the pretrained quentinll/lewm-pusht weights shown here, on a real PushT expert observation and action.
For a linear layer z = W·x the verifier draws a random
r, precomputes v = rᵀW once, then checks
v·x == r·z. It holds because
rᵀ(Wx) = (rᵀW)x. Draw a challenge, then tamper
z and watch the equality break.
v·x ?r·z ?
A wrong z slips past one random r with probability at
most 1/p. The real system uses p = 2⁶¹ − 1,
so the error per check is about 2⁻⁶¹, and a union
bound over many checks stays around 2⁻⁴⁴.
For each fixed weight matrix, precompute v = rᵀW once and check v·x == r·z per use. The same weights run across every candidate, rollout step, and block, so one precompute is reused many times.
The cheap, deterministic ops are recomputed exactly from committed inputs: requant, residual, attention dot products, softmax, GELU, SiLU, LayerNorm, MSE, and argmin. No tolerance, no probabilistic gap, no circuit.
Every load-bearing value is bound by a Blake2s commitment: weights, scales, rounding, tables, op order, planner config, inputs, outputs, and every trace cell. Anything not committed is mutable, and therefore unsound.
The prover commits the trace first, then derives the challenge r from a transcript of the commitments. Because r comes after z is fixed, the prover cannot fake z to pass.
An M31 field (2³¹−1) encodes values; a larger Fp61 field (2⁶¹−1) carries the Freivalds dot products for soundness headroom. Per-check error is at most 1/p.
CommitLLM cannot verify bf16 FlashAttention because it is not reproducible. Here attention is small and integer-exact, so the verifier recomputes it directly. Integer-relation verification, no residual attention hole, no floating point.
Prove z_next = PredProj(ARPredictor(z_hist, ActEnc(actions))) over the committed quantized graph: a single forward pass through the predictor. The full 192-dim, 6-block, 16-head predictor (2,437 ops: AdaLN-zero, multi-head attention, GELU feed-forward, gated residuals) proves and verifies in the Rust prover with the real quentinll/lewm-pusht checkpoint weights and a real PushT expert observation and action.
Apply the predictor over a horizon and prove each step feeds the next. The recurrence and windowing wiring are checked, with a history of three.
Roll out every candidate action sequence, score each by goal MSE, and prove the selected candidate is the argmin with a smallest-index tie-break. All candidate costs must be proven, or the prover could cherry-pick.
The sampling loop: seeded candidates, clipping, top-k elites, and distribution updates across iterations.
End to end including the ViT encoder, from pixels through prediction to the chosen plan.
This project reuses CommitLLM's Freivalds plus exact-replay audit and adapts it from language models to a quantized world model.
| Aspect | CommitLLM | ProvableWorldModel |
|---|---|---|
| Matmul check | Freivalds over F_p | Reused, p = 2⁶¹−1 over bounded ints |
| Challenge | Verifier-secret r | Fiat-Shamir r by default; verifier-secret mode available |
| Non-matmul ops | Canonical f64 replay of bridge ops | Exact integer replay, no f64, no tolerance |
| Attention | Unverifiable hole, bf16 is not reproducible | Exactly recomputed, quantized integer, small sequence |
| Verifier | std plus rayon | no_std, float-free |
| Domain | Decode, sampling, LM-head logits | Latent prediction, rollout, MSE, argmin |
pwm-corefields, fixed point, tensors, Merkle commitments, Fiat-Shamir, the Freivalds check, the trace model. no_std.pwm-exportcheckpoint to quantized integer graph, manifest, golden vectors, the Rust integer reference.pwm-proverrun the reference inference, commit the trace, derive challenges, emit the artifact.pwm-verifierCPU, no_std, float-free. Freivalds-check linears, recompute the rest, check rollout, cost, argmin.pwm-testkitgolden vectors, accept and reject suites, the mutation harness, the demo CLI.pwm-prover --> pwm-export --> pwm-core <-- pwm-verifier
^
pwm-testkit
The verifier depends on neither the exporter nor any Python or float runtime. It verifies arithmetic only, sharing one Freivalds and trace implementation with the prover through pwm-core.
# the full le-wm predictor: prove, verify, reject a forged matmul git clone https://github.com/AbdelStark/ProvableWorldModel cd ProvableWorldModel docker compose up --build # or prove the real pretrained checkpoint weights docker compose --profile real up --build export predictor-real
# the full 192-dim predictor, synthetic weights cargo run -p pwm-testkit --bin pwm --release -- prove-predictor # the real pretrained checkpoint bundle cargo run -p pwm-testkit --bin pwm --release -- prove-predictor bundle.json