geno_lewm.cli.rollout
geno-lewm-rollout — aggregate measured rollout-fidelity state rows.
RolloutStateRow (dataclass)
RolloutStateRow(row_id: str, split: str, horizon: int, source_state: tuple[float, ...], predicted_state: tuple[float, ...], target_state: tuple[float, ...], target_rank: int, baseline_target_rank: int)
One measured rollout-fidelity row.
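As a minimal sketch, a row can be pictured as a plain dataclass with the fields listed in the signature above. The class below is a local stand-in so the example runs without the package installed. Making it frozen is an assumption, and every field value is illustrative rather than measured.

```python
from dataclasses import dataclass


# Local stand-in mirroring the documented signature; the real class lives in
# geno_lewm.cli.rollout. frozen=True is an assumption, not confirmed here.
@dataclass(frozen=True)
class RolloutStateRow:
    row_id: str
    split: str
    horizon: int
    source_state: tuple[float, ...]
    predicted_state: tuple[float, ...]
    target_state: tuple[float, ...]
    target_rank: int
    baseline_target_rank: int


# Illustrative values only, not measured data.
row = RolloutStateRow(
    row_id="example-0",
    split="val",
    horizon=1,
    source_state=(1.0, 0.0),
    predicted_state=(0.6, 0.8),
    target_state=(0.0, 1.0),
    target_rank=1,
    baseline_target_rank=3,
)
print(row.split, row.horizon, len(row.predicted_state))
```

The three state fields are written here with the same length, which is what a comparison between predicted and target states would need.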
RolloutSummary (dataclass)
RolloutSummary(split: str, horizon: int | None, n: int, row_identity: str, cosine_similarity_mean: float, naive_baseline_cosine_mean: float, l2_distance_mean: float, naive_baseline_l2_mean: float, recall_at_k: float, naive_baseline_recall_at_k: float)
Aggregate metrics for one split or split/horizon group.
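The field names suggest how such a summary could be computed. The sketch below is one plausible reading, not the package's implementation. It assumes that the naive baseline treats the unchanged source_state as the prediction, that ranks are 1-based, and that recall at k is the fraction of rows whose rank is at most k. The helper names `cosine` and `summarize` are hypothetical, and rows are plain dicts keyed by the RolloutStateRow field names.

```python
import math
from statistics import fmean


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(rows, recall_k):
    """Hypothetical aggregation over one group of rows (assumptions in the lead-in)."""
    return {
        "n": len(rows),
        # Model: predicted state compared against the target state.
        "cosine_similarity_mean": fmean(
            [cosine(r["predicted_state"], r["target_state"]) for r in rows]
        ),
        "l2_distance_mean": fmean(
            [math.dist(r["predicted_state"], r["target_state"]) for r in rows]
        ),
        # Assumed naive baseline: the source state carried forward unchanged.
        "naive_baseline_cosine_mean": fmean(
            [cosine(r["source_state"], r["target_state"]) for r in rows]
        ),
        "naive_baseline_l2_mean": fmean(
            [math.dist(r["source_state"], r["target_state"]) for r in rows]
        ),
        # Assumed recall@k: share of rows whose 1-based rank is within k.
        "recall_at_k": fmean(
            [1.0 if r["target_rank"] <= recall_k else 0.0 for r in rows]
        ),
        "naive_baseline_recall_at_k": fmean(
            [1.0 if r["baseline_target_rank"] <= recall_k else 0.0 for r in rows]
        ),
    }


# Illustrative rows, not measured data.
rows = [
    {"source_state": (1.0, 0.0), "predicted_state": (0.6, 0.8),
     "target_state": (0.0, 1.0), "target_rank": 1, "baseline_target_rank": 3},
    {"source_state": (0.0, 1.0), "predicted_state": (0.0, 1.0),
     "target_state": (0.0, 1.0), "target_rank": 1, "baseline_target_rank": 1},
]
summary = summarize(rows, recall_k=1)
print(summary)
```

Reporting each model metric next to its naive counterpart makes it easy to see whether the rollout beats simply predicting no change.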
load_rollout_state_rows
Load and validate measured rollout-state JSONL.
Source code in geno_lewm/cli/rollout.py
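The signature and validation rules of load_rollout_state_rows are not shown on this page, so the following is only a sketch of what loading and validating such a JSONL file might involve. The function `load_rows_sketch`, the required-key check, and the equal-length check are all assumptions for illustration. The only grounded detail is the set of field names, taken from RolloutStateRow.

```python
import json
import tempfile
from pathlib import Path

# Field names taken from the documented RolloutStateRow signature.
REQUIRED = (
    "row_id", "split", "horizon", "source_state", "predicted_state",
    "target_state", "target_rank", "baseline_target_rank",
)
STATE_FIELDS = ("source_state", "predicted_state", "target_state")


def load_rows_sketch(path):
    """Hypothetical loader: one JSON object per line, with basic checks."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines (an assumption)
            record = json.loads(line)
            missing = [key for key in REQUIRED if key not in record]
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            # Assumed rule: the three state vectors share one dimension.
            if len({len(record[key]) for key in STATE_FIELDS}) != 1:
                raise ValueError(f"line {lineno}: state vectors differ in length")
            rows.append(record)
    return tuple(rows)


# Round-trip one illustrative row through a temporary file.
example = {
    "row_id": "example-0", "split": "val", "horizon": 1,
    "source_state": [1.0, 0.0], "predicted_state": [0.6, 0.8],
    "target_state": [0.0, 1.0], "target_rank": 1, "baseline_target_rank": 3,
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rows.jsonl"
    path.write_text(json.dumps(example) + "\n", encoding="utf-8")
    loaded = load_rows_sketch(path)
print(len(loaded), loaded[0]["row_id"])
```

Returning a tuple matches the `tuple[RolloutStateRow, ...]` type that build_rollout_metrics_payload accepts, although the sketch keeps rows as dicts rather than dataclass instances.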
build_rollout_metrics_payload
build_rollout_metrics_payload(rows: tuple[RolloutStateRow, ...], *, recall_k: int, model_id: str, model_release: str, dataset_snapshot: str, commit: str, hardware: str, artifacts: dict[str, str]) -> dict[str, object]
Build eval-compatible rollout-fidelity metrics from measured state rows.