05 - Observability
- Status: Authoritative for v0.1
- Companion RFC: RFC-0013
GenoLeWM's observability surface has three layers — logging, metrics, tracing — plus a strict redaction filter that runs over all of them. Logs and metrics are structured; nothing is free-form. The redaction filter is the single defense against personal-data exfiltration, and its tests are non-negotiable.
Principles
- Structured by default. Every event is a key-value record. Free-text `print` is forbidden in `geno_lewm/`.
- Redaction by default. The logger receives no user variants, no personal genome data, no FASTA bytes. The redaction filter rejects any field that fails its allowlist.
- Stable keys. Event names and metric names are part of the public contract. Renaming an event name is a MAJOR change.
- Local-first. Default sink is the local filesystem; remote sinks (wandb, OpenTelemetry collectors) are opt-in.
- Cost-bounded. The observability layer must not dominate training compute. Sampling and rate-limiting are explicit.
Logging
Format
JSON Lines (.jsonl), one record per line, UTF-8.

    {
      "ts": "2026-05-20T15:42:01.123Z",
      "severity": "info",
      "event": "training.step",
      "run_id": "run-2026-05-20-h100-0",
      "step": 12345,
      "epoch": 2,
      "phase": "phase1",
      "duration_ms": 412,
      "data": { ... event-specific fields ... }
    }

Required fields on every record
| Field | Type | Notes |
|---|---|---|
| `ts` | string | ISO-8601 UTC, millisecond resolution |
| `severity` | enum | one of `debug`, `info`, `warn`, `error` |
| `event` | string | dotted-lowercase event name |
| `run_id` | string | random or human-supplied per process |
| `data` | object | event-specific structured fields |
Optional but standardized:
| Field | Type | Notes |
|---|---|---|
| `step`, `epoch`, `phase` | ints / string | training loop |
| `duration_ms` | int | for spans |
| `trace_id`, `span_id` | string | OpenTelemetry-compatible hex: 16-byte trace ID, 8-byte span ID |
| `error_code` | string | only on `severity=error` records |
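
The field rules above can be sketched as a small record builder. This is a minimal illustration, not the project's logger: `make_record` and `to_jsonl` are hypothetical names, and the check that `error_code` appears on, and only on, error records is one reading of the two tables.

```python
import json
from datetime import datetime, timezone

SEVERITIES = ("debug", "info", "warn", "error")


def make_record(event, run_id, data, severity="info", now=None, **optional):
    """Build one log record carrying the five required fields."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    # Assumed reading: error_code is required on, and only on, error records.
    if bool(optional.get("error_code")) != (severity == "error"):
        raise ValueError("error_code is required on, and only on, error records")
    now = now or datetime.now(timezone.utc)
    # ISO-8601 UTC with millisecond resolution and a trailing "Z".
    ts = now.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    ts = ts.replace("+00:00", "Z")
    record = {"ts": ts, "severity": severity, "event": event, "run_id": run_id}
    # Optional standardized fields (step, epoch, phase, duration_ms, ...).
    record.update({k: v for k, v in optional.items() if v is not None})
    record["data"] = data
    return record


def to_jsonl(record):
    """Serialize a record as one JSON Lines row (no embedded newlines)."""
    return json.dumps(record, separators=(",", ":"), ensure_ascii=False)
```

A call such as `to_jsonl(make_record("training.step", run_id, {"loss": 0.5}, step=12345))` yields a single line suitable for appending to the `.jsonl` file.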
Event registry
Event names are registered in `geno_lewm/observability.py::EVENTS`.
The registry is the source of truth and is regenerated into
`docs/api/log-events.md` on each release.
Canonical events for v0.1:
| Event | Severity | When emitted |
|---|---|---|
| `training.run.start` | info | trainer initialized |
| `training.run.end` | info | trainer exited |
| `training.step` | debug | every step |
| `training.epoch.end` | info | every epoch |
| `training.checkpoint.write` | info | checkpoint saved |
| `training.collapse.alert` | warn | one of the alert criteria tripped |
| `training.metric` | info | scalar metric logged |
| `eval.run.start` | info | benchmark started |
| `eval.run.end` | info | benchmark finished |
| `eval.regression` | error | smoke eval regression |
| `data.cache.hit` | debug | cache hit |
| `data.cache.miss` | debug | cache miss |
| `data.shard.write` | info | new shard written |
| `inference.score.start` | debug | scoring call entered |
| `inference.score.end` | debug | scoring call returned |
| `inference.batch.end` | info | batch finished |
| `inference.network.blocked` | error | fail-closed guard tripped |
| `provenance.receipt.write` | info | receipt written |
| `provenance.verify.start` | info | verifier started |
| `provenance.verify.end` | info | verifier returned |
| `provenance.verify.mismatch` | error | a hash check failed |
| `error` | error | any `GenoLeWMError` raised inside a logged span |
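
A registry lookup of the kind the table implies might look like the sketch below. The shape of `EVENTS` (name mapped to severity) is an assumption, only a few rows are reproduced, and `check_event` is a hypothetical helper; the real registry lives in `geno_lewm/observability.py`.

```python
import re

# Assumed shape: event name -> severity. Only a few rows of the table are
# reproduced here for illustration.
EVENTS = {
    "training.run.start": "info",
    "training.step": "debug",
    "training.collapse.alert": "warn",
    "provenance.verify.mismatch": "error",
    "error": "error",
}

# Dotted-lowercase: lowercase segments joined by single dots. Allowing digits
# and underscores inside a segment is an assumption of this sketch.
EVENT_NAME_RE = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*")


def check_event(name):
    """Return the registered severity, or raise if the name is unusable."""
    if not EVENT_NAME_RE.fullmatch(name):
        raise ValueError(f"not a dotted-lowercase event name: {name!r}")
    if name not in EVENTS:
        raise KeyError(f"event not registered in EVENTS: {name!r}")
    return EVENTS[name]
```

Running a check like this at emit time, or in a linter over call sites, is one way to keep unregistered names out of the log stream.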
Severities
- `debug`: high-volume; off by default in release builds.
- `info`: durable record of normal operation.
- `warn`: degraded behavior; investigation suggested.
- `error`: typed failure; always accompanied by `error_code`.
Sinks
Default: local filesystem at `${GENO_LEWM_LOG_DIR}/{run_id}.jsonl`.
Optional sinks:
- wandb: training metrics + summary events. Opt-in via `--wandb-project`.
- stderr: human-readable rendering for interactive use. Filtered by severity.
- OpenTelemetry: OTLP/gRPC export if `OTEL_EXPORTER_OTLP_ENDPOINT` is set. Disabled by default.
There is no default cloud sink. Telemetry is never enabled by default.
Redaction
The redaction filter operates on the `data` field of every record. Rules:
- Allowlist of keys: keys not in the per-event allowlist are dropped with a `redacted_keys` counter increment.
- Type allowlist: nested values may be scalars, lists of scalars, or shallow dicts of scalars. Bytes objects and tensors are always dropped.
- Pattern filters: any string field whose value matches a DNA-string pattern (`^[ACGTN]{20,}$`) is dropped.
- Personal-data fields: explicit deny list for `vcf_content`, `genotype`, `sample_id`, `user_email`, etc.
The redaction filter has unit tests for every event in the registry and property tests over random payloads.
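
The four rules can be sketched as a single pass over `data`. This is an illustration under stated assumptions, not the shipped filter: the `ALLOWLIST` contents are invented for the example, the deny list contains only the four keys named above, and the `strict` flag is one interpretation of `GENO_LEWM_REDACTION_STRICT`.

```python
import re

DNA_RE = re.compile(r"^[ACGTN]{20,}$")  # pattern quoted in the rules above
DENY_KEYS = {"vcf_content", "genotype", "sample_id", "user_email"}
SCALARS = (str, int, float, bool, type(None))

# Hypothetical per-event allowlist; the real allowlists are not given here.
ALLOWLIST = {"inference.score.end": {"variant_id", "label", "scores", "meta"}}


def _scalar_ok(value):
    """A scalar passes unless it is a string that looks like a DNA sequence."""
    if isinstance(value, str):
        return DNA_RE.match(value) is None
    return isinstance(value, SCALARS)  # bytes and tensors are not scalars


def _value_ok(value):
    """Scalars, lists of scalars, or one level of dict of scalars."""
    if isinstance(value, list):
        return all(_scalar_ok(v) for v in value)
    if isinstance(value, dict):
        return all(
            isinstance(k, str) and k not in DENY_KEYS and _scalar_ok(v)
            for k, v in value.items()
        )
    return _scalar_ok(value)


def redact(event, data, strict=False):
    """Return (clean_data, dropped_count) for one record's data field."""
    allowed = ALLOWLIST.get(event, set())
    clean, dropped = {}, 0
    for key, value in data.items():
        if key in allowed and key not in DENY_KEYS and _value_ok(value):
            clean[key] = value
        else:
            dropped += 1  # would feed the redacted_keys counter
    if strict and dropped:
        # The message deliberately omits key names and values.
        raise ValueError(f"{dropped} disallowed field(s) in {event!r} payload")
    return clean, dropped
```

Because an event with no allowlist entry gets an empty allowlist, unknown events lose every field rather than passing through.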
Metrics
Format
Counter / gauge / histogram, with stable names. Metric names are
dotted-lowercase, follow Prometheus naming conventions, and are
registered in `geno_lewm/metrics.py::METRICS`.
Canonical metrics
| Name | Kind | Unit | Notes |
|---|---|---|---|
| `geno_lewm.training.step.duration` | histogram | ms | per training step |
| `geno_lewm.training.loss.pred` | gauge | unitless | last logged pred loss |
| `geno_lewm.training.loss.reg` | gauge | unitless | last logged reg loss (Phase 2 active, Phase 1 monitoring) |
| `geno_lewm.training.collapse.alert` | counter | events | total alerts during run |
| `geno_lewm.training.collapse.pred_cos_mean` | gauge | unitless | last collapse-monitor mean prediction-target cosine similarity |
| `geno_lewm.training.collapse.pred_l2_mean` | gauge | unitless | last collapse-monitor mean prediction-target L2 distance |
| `geno_lewm.training.collapse.target_var_per_dim` | gauge | unitless | last collapse-monitor mean target variance per latent dimension |
| `geno_lewm.training.collapse.pred_var_per_dim` | gauge | unitless | last collapse-monitor mean prediction variance per latent dimension |
| `geno_lewm.training.collapse.pred_target_corr` | gauge | unitless | last collapse-monitor flattened prediction-target correlation |
| `geno_lewm.training.collapse.pairwise_pred_dist_mean` | gauge | unitless | last collapse-monitor mean pairwise prediction distance |
| `geno_lewm.training.collapse.kl_reg` | gauge | unitless | last collapse-monitor LeJEPA KL regularizer value |
| `geno_lewm.data.cache.hit` | counter | ops | per-call increment |
| `geno_lewm.data.cache.miss` | counter | ops | per-call increment |
| `geno_lewm.data.encode.duration` | histogram | ms | per-window encoder call |
| `geno_lewm.inference.score.duration` | histogram | ms | per variant |
| `geno_lewm.inference.batch.throughput` | gauge | variants/s | last batched scoring |
| `geno_lewm.inference.memory.peak_bytes` | gauge | bytes | RSS peak per process |
| `geno_lewm.planning.cem.iter.duration` | histogram | ms | per CEM iteration |
| `geno_lewm.planning.cem.calls` | counter | predictor calls | per planning run |
| `geno_lewm.provenance.verify.duration` | histogram | ms | per verification |
| `geno_lewm.observability.redacted_keys` | counter | keys | total dropped by filter |
| `geno_lewm.errors.raised` | counter | events | tagged with code |
Exporters
- Prometheus textfile: written to `${GENO_LEWM_LOG_DIR}/metrics.prom` every `metrics_interval_s` (default 30 s).
- OpenTelemetry OTLP/gRPC: opt-in via env var.
- wandb: training metrics auto-forwarded when `--wandb-project` is set.
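
As a rough sketch of the textfile exporter, the snippet below renders counters and gauges in the Prometheus text exposition format and writes the file atomically. Two assumptions are worth flagging: dots in the registered names are mapped to underscores (the classic Prometheus metric-name grammar does not allow dots, and this document does not specify the mapping), and histograms are left out because their bucket layout is not defined here. All function names are hypothetical.

```python
import os
import tempfile


def prom_name(dotted):
    """Map a dotted metric name to a Prometheus-safe one (assumed mapping)."""
    return dotted.replace(".", "_")


def render_textfile(samples):
    """Render (dotted_name, kind, value) tuples; kind is counter or gauge."""
    lines = []
    for name, kind, value in samples:
        flat = prom_name(name)
        lines.append(f"# TYPE {flat} {kind}")
        lines.append(f"{flat} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(path, samples):
    """Write via a temp file and rename so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(render_textfile(samples))
    os.replace(tmp, path)  # atomic when tmp and path share a filesystem
```

A periodic caller would invoke `write_textfile` once per `metrics_interval_s`; the scheduling loop is omitted.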
Tracing
Optional in v0.1; OpenTelemetry-compatible.
- Spans wrap top-level operations (`training.step`, `inference.score`, `planning.cem.iter`).
- Spans are emitted only when `OTEL_EXPORTER_OTLP_ENDPOINT` is set.
- Trace IDs propagate into logs via `trace_id` / `span_id` fields.
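
The gating and ID propagation described above could be approximated as follows. This sketch does not use the OpenTelemetry SDK; it only shows how log fields might be produced, and `span_fields` is a hypothetical helper. The ID widths follow the 16-byte trace ID and 8-byte span ID convention.

```python
import os
import secrets
import time
from contextlib import contextmanager


def tracing_enabled(env=os.environ):
    """Spans are emitted only when an OTLP endpoint is configured."""
    return bool(env.get("OTEL_EXPORTER_OTLP_ENDPOINT"))


@contextmanager
def span_fields(trace_id=None, env=os.environ):
    """Yield the log fields for one span; an empty dict when tracing is off."""
    if not tracing_enabled(env):
        yield {}
        return
    fields = {
        "trace_id": trace_id or secrets.token_hex(16),  # 16 bytes, 32 hex chars
        "span_id": secrets.token_hex(8),                # 8 bytes, 16 hex chars
    }
    start = time.monotonic()
    try:
        yield fields
    finally:
        # Filled in on exit so the record written afterwards carries it.
        fields["duration_ms"] = int((time.monotonic() - start) * 1000)
```

Passing the parent's `trace_id` into nested calls keeps child spans on the same trace while each still gets its own `span_id`.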
Sampling and rate limits
- Per-step training records are sampled at `1 / log_every` steps (default `log_every = 100`). Other event types are unsampled unless noted below.
- Cache hit/miss debug records are sampled at `1 / 1000` by default.
- Metric flushes are not sampled.
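
One way to realize these rates is deterministic every-nth sampling, sketched below. Whether the real sampler is deterministic or random is not stated in this document, so treat the modulo scheme and both names as assumptions.

```python
import itertools


def should_log_step(step, log_every=100):
    """Keep one training.step record per log_every steps."""
    return step % log_every == 0


class EveryNth:
    """Deterministic 1-in-n sampler: keeps calls 0, n, 2n, ..."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self._calls = itertools.count()

    def keep(self):
        return next(self._calls) % self.n == 0


# Cache hit/miss debug records at the default 1 / 1000 rate.
cache_sampler = EveryNth(1000)
```

A deterministic sampler keeps log volume exactly bounded and makes tests reproducible, at the cost of always missing events that fall between sample points.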
Configuration
The observability layer is configured via:
| Env var | Default | Meaning |
|---|---|---|
| `GENO_LEWM_LOG_DIR` | `~/.geno-lewm/logs` | local log root |
| `GENO_LEWM_LOG_LEVEL` | `info` | global log severity threshold |
| `GENO_LEWM_LOG_FORMAT` | `jsonl` | `jsonl` or `pretty` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | unset | enables tracing if set |
| `GENO_LEWM_METRICS_INTERVAL_S` | `30` | flush interval (seconds) |
| `GENO_LEWM_REDACTION_STRICT` | `1` | force-fail on disallowed payload (recommended) |
CLI flags override env vars; env vars override defaults.
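
The precedence rule reduces to a three-step lookup, sketched here. `resolve` and `log_path` are hypothetical helpers, and `cli_value` stands in for whatever the argument parser produced, since the CLI flag names for these settings are not listed in this document.

```python
import os

DEFAULTS = {
    "GENO_LEWM_LOG_DIR": "~/.geno-lewm/logs",
    "GENO_LEWM_LOG_LEVEL": "info",
    "GENO_LEWM_LOG_FORMAT": "jsonl",
    "GENO_LEWM_METRICS_INTERVAL_S": "30",
    "GENO_LEWM_REDACTION_STRICT": "1",
}


def resolve(name, cli_value=None, env=os.environ):
    """CLI flag beats env var; env var beats the documented default."""
    if cli_value is not None:
        return str(cli_value)
    if name in env:
        return env[name]
    return DEFAULTS[name]


def log_path(run_id, env=os.environ):
    """Default sink location: ${GENO_LEWM_LOG_DIR}/{run_id}.jsonl."""
    root = os.path.expanduser(resolve("GENO_LEWM_LOG_DIR", env=env))
    return os.path.join(root, f"{run_id}.jsonl")
```

Values are kept as strings here; parsing `GENO_LEWM_METRICS_INTERVAL_S` to a number or `GENO_LEWM_REDACTION_STRICT` to a boolean would happen at the point of use.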
Privacy contract
- The default log root excludes any personal-data fields. The runtime fails closed when `GENO_LEWM_REDACTION_STRICT=1` and a redaction rule is bypassed.
- Crash logs strip local variables from stack traces before writing; only model identifiers and stack frames are preserved.
- Per-variant scoring never logs the variant bases; it logs only the variant id (hash of chrom+pos+ref+alt) and the categorical outputs.
Failures of the privacy contract are bugs and surface as `InternalError.InvariantViolation`.
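
The variant id mentioned in the third bullet could be derived along these lines. The hash algorithm, field separator, and output length are not specified in this document; SHA-256 over a colon-joined key is an assumption made for illustration, and `variant_id` is a hypothetical function name.

```python
import hashlib


def variant_id(chrom, pos, ref, alt):
    """Hash chrom+pos+ref+alt so the log never carries the bases themselves."""
    # A separator keeps ("chr1", 12, ...) distinct from ("chr11", 2, ...).
    key = f"{chrom}:{pos}:{ref}:{alt}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()
```

The lowercase hex digest also cannot match the `^[ACGTN]{20,}$` pattern filter, so the id passes redaction as an ordinary string.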
Invariants
| ID | Invariant | Enforced by |
|---|---|---|
| INV-OBS-1 | Every logged event has a name registered in `EVENTS` | linter rule |
| INV-OBS-2 | Every metric name is registered in `METRICS` | linter rule |
| INV-OBS-3 | The redaction filter is the only path from `logger.<level>(...)` to a sink | call-graph check |
| INV-OBS-4 | DNA strings ≥ 20 bp never appear in any log record | property test in CI |
| INV-OBS-5 | Telemetry is opt-in; no observability sink phones home without an env var | integration test |
| INV-OBS-6 | A crash before logger init still produces a sanitized minimal record | wrapper in `__main__` |
Open questions
| ID | Question | Owner | Target |
|---|---|---|---|
| OQ-OBS-1 | Whether to add an `aggregation.window_s` knob for batched metric exports | core | v0.2 |
| OQ-OBS-2 | Whether the redaction filter should also strip `clinvar_id` (it can be re-personalized via lookup) | core | before v0.2 |
| OQ-OBS-3 | Whether tracing should be on by default for CLI commands (off currently) | core | v0.3 |