geno_lewm.encoder.pooling
Pooling strategies for Carbon hidden states.
Defined by RFC-0002 §3.4. This module is intentionally independent of torch so the pooling contract, cache metadata, and downstream schema behavior can be validated before the Carbon runtime wrapper lands.
PoolingResult
dataclass
PoolingResult(vector: tuple[float, ...], pool_type: Literal['centered_mean', 'global_mean'], pool_radius: int, untargeted: bool, center_token: int | None, token_count: int)
Pooled state vector plus cache-key metadata.
as_cache_fields
Return fields shared with the window-cache schema.
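The exact set of fields shared with the window-cache schema is not listed on this page. As a rough sketch only, assuming the cache fields are every dataclass field except the pooled vector itself, the method could look like the following. `PoolingResultSketch` is a hypothetical stand-in, not the real class.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class PoolingResultSketch:
    """Hypothetical stand-in that mirrors the documented PoolingResult fields."""

    vector: tuple[float, ...]
    pool_type: Literal["centered_mean", "global_mean"]
    pool_radius: int
    untargeted: bool
    center_token: Optional[int]
    token_count: int

    def as_cache_fields(self) -> dict[str, object]:
        # Assumption: the cache schema keeps the metadata and drops the vector.
        fields = asdict(self)
        fields.pop("vector")
        return fields


result = PoolingResultSketch(
    vector=(0.5,),
    pool_type="centered_mean",
    pool_radius=2,
    untargeted=False,
    center_token=7,
    token_count=5,
)
cache_fields = result.as_cache_fields()
```

Keeping the metadata separate from the vector would let a cache key be built without hashing floating-point values, but RFC-0002 remains the authority on which fields actually belong in the key.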
global_mean
Mean-pool every token vector in hidden_states.
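Because the module avoids torch, mean pooling can be expressed over plain Python sequences. The sketch below illustrates the idea and is not the module's source: `global_mean_sketch` is a hypothetical name, and rejecting empty or ragged input is an assumption.

```python
from typing import Sequence


def global_mean_sketch(hidden_states: Sequence[Sequence[float]]) -> tuple[float, ...]:
    """Average all token vectors column by column (illustrative only)."""
    if not hidden_states:
        # Assumption: empty input is an error rather than a zero vector.
        raise ValueError("hidden_states must contain at least one token vector")
    dim = len(hidden_states[0])
    if any(len(row) != dim for row in hidden_states):
        # Assumption: ragged input is rejected.
        raise ValueError("all token vectors must share one dimension")
    count = len(hidden_states)
    return tuple(sum(row[j] for row in hidden_states) / count for j in range(dim))


pooled = global_mean_sketch([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# pooled == (3.0, 4.0)
```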
centered_mean
centered_mean(hidden_states: Sequence[Sequence[float]], *, center_token: int, pool_radius: int = DEFAULT_POOL_RADIUS_TOKENS) -> tuple[float, ...]
Mean-pool the inclusive token span center_token ± pool_radius.
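The inclusive span covers up to `2 * pool_radius + 1` tokens. The sketch below shows one way to compute it in pure Python. It is not the module's code: `centered_mean_sketch` is a hypothetical name, the default radius of 2 is a placeholder for `DEFAULT_POOL_RADIUS_TOKENS` (whose value is not shown here), and clamping the span at the window edges is an assumption.

```python
from typing import Sequence


def centered_mean_sketch(
    hidden_states: Sequence[Sequence[float]],
    *,
    center_token: int,
    pool_radius: int = 2,  # placeholder; the real default is not shown on this page
) -> tuple[float, ...]:
    """Mean-pool tokens center_token - pool_radius .. center_token + pool_radius."""
    count = len(hidden_states)
    if not 0 <= center_token < count:
        raise IndexError("center_token is outside the hidden-state sequence")
    if pool_radius < 0:
        raise ValueError("pool_radius must be non-negative")
    # Assumption: the span is clamped at the window edges instead of raising.
    start = max(0, center_token - pool_radius)
    stop = min(count, center_token + pool_radius + 1)  # +1 makes the span inclusive
    span = hidden_states[start:stop]
    dim = len(span[0])
    return tuple(sum(row[j] for row in span) / len(span) for j in range(dim))


states = [[0.0], [1.0], [2.0], [3.0], [4.0]]
middle = centered_mean_sketch(states, center_token=2, pool_radius=1)  # tokens 1..3
edge = centered_mean_sketch(states, center_token=0, pool_radius=1)    # tokens 0..1
```

With clamping, a center near the window edge averages fewer tokens, so `edge` above pools two tokens where `middle` pools three.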
pool_hidden_states
pool_hidden_states(hidden_states: Sequence[Sequence[float]], *, edit_locus: int | None = None, pool_type: Literal['centered_mean', 'global_mean'] = POOL_CENTERED_MEAN, pool_radius: int = DEFAULT_POOL_RADIUS_TOKENS, token_bp: int = CARBON_TOKEN_BP) -> PoolingResult
Pool token-level hidden states into a state vector.
edit_locus is a 0-based base-pair offset within the encoder
window. When it is None (the default), RFC-0002 requires a global-mean
fallback tagged untargeted=True, so that cache consumers do not mix
arbitrary reference-window embeddings with edit-local embeddings.
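The dispatch described above can be sketched as follows. This illustrates the contract and is not the module's implementation. Every `*_sketch` or `*Sketch` name is hypothetical, the defaults `pool_radius=2` and `token_bp=4` are arbitrary placeholders for `DEFAULT_POOL_RADIUS_TOKENS` and `CARBON_TOKEN_BP`, and three behaviors are assumed rather than documented: the locus maps to a token by integer division, the span is clamped at the window edges, and `token_count` is the number of tokens pooled.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Sequence


@dataclass(frozen=True)
class PoolingResultSketch:
    """Hypothetical stand-in that mirrors the documented PoolingResult fields."""

    vector: tuple[float, ...]
    pool_type: Literal["centered_mean", "global_mean"]
    pool_radius: int
    untargeted: bool
    center_token: Optional[int]
    token_count: int


def _mean(rows: Sequence[Sequence[float]]) -> tuple[float, ...]:
    dim = len(rows[0])
    return tuple(sum(row[j] for row in rows) / len(rows) for j in range(dim))


def pool_hidden_states_sketch(
    hidden_states: Sequence[Sequence[float]],
    *,
    edit_locus: Optional[int] = None,
    pool_radius: int = 2,  # placeholder value
    token_bp: int = 4,     # placeholder value
) -> PoolingResultSketch:
    rows = list(hidden_states)
    if not rows:
        raise ValueError("hidden_states must contain at least one token vector")
    if edit_locus is None:
        # No edit locus: global-mean fallback, tagged untargeted.
        return PoolingResultSketch(_mean(rows), "global_mean", pool_radius, True, None, len(rows))
    # Assumption: a base-pair offset maps to a token index by integer division.
    center = edit_locus // token_bp
    if not 0 <= center < len(rows):
        raise IndexError("edit_locus falls outside the encoder window")
    start = max(0, center - pool_radius)
    span = rows[start : center + pool_radius + 1]  # slicing clamps the upper edge
    return PoolingResultSketch(_mean(span), "centered_mean", pool_radius, False, center, len(span))


states = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
targeted = pool_hidden_states_sketch(states, edit_locus=9, pool_radius=1, token_bp=4)
fallback = pool_hidden_states_sketch(states)
```

Here `targeted` pools tokens 1 through 3 around token 2, while `fallback` averages all six tokens and carries `untargeted=True`, so a cache consumer can filter it out of edit-local comparisons.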