05 — Observability¶

Status: Authoritative for v0.1
Companion RFC: RFC-0013

GenoLeWM's observability surface has three layers — logging, metrics, tracing — plus a strict redaction filter that runs over all of them. Logs and metrics are structured; nothing is free-form. The redaction filter is the single defense against personal-data exfiltration, and its tests are non-negotiable.

Principles¶

Structured by default. Every event is a key-value record. Free-text print is forbidden in geno_lewm/.
Redaction by default. The logger receives no user variants, no personal genome data, no FASTA bytes. The redaction filter rejects any field that fails its allowlist.
Stable keys. Event names and metric names are part of the public contract. Renaming an event name is a MAJOR change.
Local-first. Default sink is the local filesystem; remote sinks (wandb, OpenTelemetry collectors) are opt-in.
Cost-bounded. The observability layer must not dominate training compute. Sampling and rate-limiting are explicit.

Logging¶

Format¶

JSON Lines (.jsonl), one record per line, UTF-8.

{
  "ts": "2026-05-20T15:42:01.123Z",
  "severity": "info",
  "event": "training.step",
  "run_id": "run-2026-05-20-h100-0",
  "step": 12345,
  "epoch": 2,
  "phase": "phase1",
  "duration_ms": 412,
  "data": { ... event-specific fields ... }
}

Required fields on every record¶

Field	Type	Notes
`ts`	string	ISO-8601 UTC, millisecond resolution
`severity`	enum	`debug` \| `info` \| `warn` \| `error`
`event`	string	dotted-lowercase event name
`run_id`	string	random or human-supplied per process
`data`	object	event-specific structured fields

Optional but standardized:

Field	Type	Notes
`step`, `epoch`, `phase`	ints / string	training loop
`duration_ms`	int	for spans
`trace_id`, `span_id`	string	OpenTelemetry-compatible 16/8-byte hex
`error_code`	string	only on `severity=error` records

Event registry¶

Event names are registered in geno_lewm/observability.py::EVENTS. The registry is the source of truth and is regenerated into docs/api/log-events.md on each release.

Canonical events for v0.1:

Event	Severity	When emitted
`training.run.start`	info	trainer initialized
`training.run.end`	info	trainer exited
`training.step`	debug	every step
`training.epoch.end`	info	every epoch
`training.checkpoint.write`	info	checkpoint saved
`training.collapse.alert`	warn	one of the alert criteria tripped
`training.metric`	info	scalar metric logged
`eval.run.start`	info	benchmark started
`eval.run.end`	info	benchmark finished
`eval.regression`	error	smoke eval regression
`data.cache.hit`	debug	cache hit
`data.cache.miss`	debug	cache miss
`data.shard.write`	info	new shard written
`inference.score.start`	debug	scoring call entered
`inference.score.end`	debug	scoring call returned
`inference.batch.end`	info	batch finished
`inference.network.blocked`	error	fail-closed guard tripped
`provenance.receipt.write`	info	receipt written
`provenance.verify.start`	info	verifier started
`provenance.verify.end`	info	verifier returned
`provenance.verify.mismatch`	error	a hash check failed
`error`	error	any `GenoLeWMError` raised inside a logged span

Severities¶

debug: high-volume; off by default in release builds.
info: durable record of normal operation.
warn: degraded behavior; investigation suggested.
error: typed failure; always accompanied by error_code.

Sinks¶

Default: local filesystem at ${GENO_LEWM_LOG_DIR}/{run_id}.jsonl.

Optional sinks:

wandb: training metrics + summary events. Opt-in via --wandb-project.
stderr: human-readable rendering for interactive use. Filtered by severity.
OpenTelemetry: OTLP/gRPC export if OTEL_EXPORTER_OTLP_ENDPOINT is set. Disabled by default.

There is no default cloud sink. Telemetry is never enabled by default.

Redaction¶

The redaction filter operates on the data field of every record. Rules:

Allowlist of keys: keys not in the per-event allowlist are dropped with a redacted_keys counter increment.
Type allowlist: nested values may be scalars, lists of scalars, or shallow dicts of scalars. Bytes objects and tensors are always dropped.
Pattern filters: any string field whose value matches a DNA-string pattern (^[ACGTN]{20,}$) is dropped.
Personal-data fields: explicit deny list for vcf_content, genotype, sample_id, user_email, etc.

The redaction filter has unit tests for every event in the registry and property tests over random payloads.

Metrics¶

Format¶

Counter / gauge / histogram, with stable names. Metrics names are dotted-lowercase, follow Prometheus naming conventions, and are registered in geno_lewm/metrics.py::METRICS.

Canonical metrics¶

Name	Kind	Unit	Notes
`geno_lewm.training.step.duration`	histogram	ms	per training step
`geno_lewm.training.loss.pred`	gauge	unitless	last logged pred loss
`geno_lewm.training.loss.reg`	gauge	unitless	last logged reg loss (Phase 2 active, Phase 1 monitoring)
`geno_lewm.training.collapse.alert`	counter	events	total alerts during run
`geno_lewm.training.collapse.pred_cos_mean`	gauge	unitless	last collapse-monitor mean prediction-target cosine similarity
`geno_lewm.training.collapse.pred_l2_mean`	gauge	unitless	last collapse-monitor mean prediction-target L2 distance
`geno_lewm.training.collapse.target_var_per_dim`	gauge	unitless	last collapse-monitor mean target variance per latent dimension
`geno_lewm.training.collapse.pred_var_per_dim`	gauge	unitless	last collapse-monitor mean prediction variance per latent dimension
`geno_lewm.training.collapse.pred_target_corr`	gauge	unitless	last collapse-monitor flattened prediction-target correlation
`geno_lewm.training.collapse.pairwise_pred_dist_mean`	gauge	unitless	last collapse-monitor mean pairwise prediction distance
`geno_lewm.training.collapse.kl_reg`	gauge	unitless	last collapse-monitor LeJEPA KL regularizer value
`geno_lewm.data.cache.hit`	counter	ops	per-call increment
`geno_lewm.data.cache.miss`	counter	ops	per-call increment
`geno_lewm.data.encode.duration`	histogram	ms	per-window encoder call
`geno_lewm.inference.score.duration`	histogram	ms	per variant
`geno_lewm.inference.batch.throughput`	gauge	variants/s	last batched scoring
`geno_lewm.inference.memory.peak_bytes`	gauge	bytes	RSS peak per process
`geno_lewm.planning.cem.iter.duration`	histogram	ms	per CEM iteration
`geno_lewm.planning.cem.calls`	counter	predictor calls	per planning run
`geno_lewm.provenance.verify.duration`	histogram	ms	per verification
`geno_lewm.observability.redacted_keys`	counter	keys	total dropped by filter
`geno_lewm.errors.raised`	counter	events	tagged with `code`

Exporters¶

Prometheus textfile: written to ${GENO_LEWM_LOG_DIR}/metrics.prom every metrics_interval_s (default 30 s).
OpenTelemetry OTLP/gRPC: opt-in via env var.
wandb: training metrics auto-forwarded when --wandb-project set.

Tracing¶

Optional in v1; OpenTelemetry-compatible.

Spans wrap top-level operations (training.step, inference.score, planning.cem.iter).
Spans are emitted only when OTEL_EXPORTER_OTLP_ENDPOINT is set.
Trace IDs propagate into logs via trace_id / span_id fields.

Sampling and rate limits¶

Per-step training records are sampled at 1 / log_every steps (default log_every = 100). Other event types are unsampled.
Cache hit/miss debug records are sampled at 1 / 1000 by default.
Metric flushes are not sampled.

Configuration¶

The observability layer is configured via:

Env var	Default	Meaning
`GENO_LEWM_LOG_DIR`	`~/.geno-lewm/logs`	local log root
`GENO_LEWM_LOG_LEVEL`	`info`	global log severity threshold
`GENO_LEWM_LOG_FORMAT`	`jsonl`	`jsonl` \| `pretty`
`OTEL_EXPORTER_OTLP_ENDPOINT`	unset	enables tracing if set
`GENO_LEWM_METRICS_INTERVAL_S`	`30`	flush interval
`GENO_LEWM_REDACTION_STRICT`	`1`	force-fail on disallowed payload (recommended)

CLI flags override env vars; env vars override defaults.

Privacy contract¶

The default log root excludes any personal-data fields. The runtime fails closed on GENO_LEWM_REDACTION_STRICT=1 if a redaction rule is bypassed.
Crash logs sanitize stack traces of locals before writing; only model identifiers and stack frames are preserved.
Per-variant scoring never logs the variant bases; it logs only the variant id (hash of chrom+pos+ref+alt) and the categorical outputs.

Failures of the privacy contract are bugs and surface as InternalError.InvariantViolation.

Invariants¶

ID	Invariant	Enforced by
INV-OBS-1	Every logged event has a name registered in `EVENTS`	linter rule
INV-OBS-2	Every metric name is registered in `METRICS`	linter rule
INV-OBS-3	The redaction filter is the only path from `logger.<level>(...)` to a sink	call-graph check
INV-OBS-4	DNA strings ≥ 20 bp never appear in any log record	property test in CI
INV-OBS-5	Telemetry is opt-in; no observability sink phones home without an env var	integration test
INV-OBS-6	A crash before logger init still produces a sanitized minimal record	wrapper in `__main__`

Open questions¶

ID	Question	Owner	Target
OQ-OBS-1	Whether to add an `aggregation.window_s` knob for batched metric exports	core	v0.2
OQ-OBS-2	Whether the redaction filter should also strip `clinvar_id` (it can be re-personalized via lookup)	core	before v0.2
OQ-OBS-3	Whether tracing should be on by default for CLI commands (off currently)	core	v0.3