Length-Normalized Sequence Probability#

normalized_probability

Length-Normalized Token Probability (LNTP) computes a length-normalized analog of joint token probability, making it invariant to response length.

Definition#

Length-normalized token probability, also called length-normalized sequence probability, is the joint token probability raised to the power \(1/L_i\):

\[LNTP(y_i) = \prod_{t \in y_i} p_t^{\frac{1}{L_i}}\]

where \(p_t\) is the probability the model assigned to token \(t\) of response \(y_i\), and \(L_i\) is the number of tokens in \(y_i\).

Key Properties:

- Equivalent to the geometric mean of the token probabilities in response \(y_i\)
- Normalized for length, so responses of different lengths can be compared directly
- Score range: \([0, 1]\)
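The equivalence with the geometric mean can be checked numerically. The snippet below is a minimal sketch that does not use uqlm: the helper name `lntp` and the token probabilities are made up for illustration, and the result is compared against Python's standard-library `statistics.geometric_mean`.

```python
import math
from statistics import geometric_mean


def lntp(token_probs):
    """Product of p_t ** (1 / L), following the definition above.

    `lntp` is a hypothetical helper for illustration, not a uqlm function.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    length = len(token_probs)
    return math.prod(p ** (1.0 / length) for p in token_probs)


# Made-up token probabilities for a three-token response.
token_probs = [0.9, 0.8, 0.7]

score = lntp(token_probs)
reference = geometric_mean(token_probs)

print(f"LNTP:           {score:.6f}")
print(f"Geometric mean: {reference:.6f}")
```

Both values agree up to floating-point error, and the score stays within \([0, 1]\) because every factor does.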

How It Works#

1. Generate a response with logprobs enabled
2. Extract the probability for each token in the response
3. Compute the geometric mean of all token probabilities
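The last two steps can be sketched in log space, which avoids numerical underflow on long responses. This is an illustration under stated assumptions, not uqlm's internal implementation: the `records` structure (a list of dicts with `token` and `logprob` keys) and the helper `lntp_from_logprobs` are hypothetical, and the log-probabilities are made up.

```python
import math


def lntp_from_logprobs(token_logprobs):
    """Geometric mean of token probabilities, computed from log-probabilities.

    exp(mean(log p_t)) equals the product of p_t ** (1 / L).
    Hypothetical helper for illustration, not a uqlm function.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


# Hypothetical per-token records, shaped like the output of a model
# called with logprobs enabled (assumed structure, made-up values).
records = [
    {"token": "Paris", "logprob": math.log(0.95)},
    {"token": " is", "logprob": math.log(0.80)},
    {"token": " the", "logprob": math.log(0.60)},
]

# Step 2: extract the per-token log-probabilities.
logprobs = [record["logprob"] for record in records]

# Step 3: average in log space, then exponentiate.
score = lntp_from_logprobs(logprobs)
print(f"LNTP: {score:.4f}")
```

Averaging log-probabilities and exponentiating once gives the same result as multiplying the \(L_i\)-th roots directly, without multiplying many small numbers together.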

This normalization addresses the fact that joint sequence probability is a product of values no greater than 1, so it shrinks as a response gets longer. Taking the geometric mean instead allows fair comparison across responses of varying lengths.
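To make the length effect concrete, the sketch below compares the unnormalized joint probability with the length-normalized score for two made-up responses whose tokens all have probability 0.9. The function names are hypothetical and the example does not call uqlm.

```python
import math


def joint_probability(token_probs):
    """Unnormalized sequence probability: the product of token probabilities."""
    return math.prod(token_probs)


def length_normalized(token_probs):
    """Geometric mean of token probabilities, computed in log space."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))


# Two made-up responses with identical per-token confidence but different lengths.
short_response = [0.9] * 5
long_response = [0.9] * 50

joint_short = joint_probability(short_response)
joint_long = joint_probability(long_response)
norm_short = length_normalized(short_response)
norm_long = length_normalized(long_response)

print(f"joint:      short={joint_short:.5f}  long={joint_long:.5f}")
print(f"normalized: short={norm_short:.5f}  long={norm_long:.5f}")
```

The joint probability of the longer response is far smaller even though the model is equally confident in every token, while the normalized score is 0.9 for both.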

Parameters#

When using WhiteBoxUQ, specify "normalized_probability" in the scorers list.

Note

This scorer will be deprecated in favor of sequence_probability with length_normalize=True in a future version.

Example#

from uqlm import WhiteBoxUQ

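# `llm` and `prompts` must be defined before this snippet runs. `llm` is assumed
# to be a chat model configured to return logprobs, and `prompts` a list of strings.

# Initialize with normalized_probability scorer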
wbuq = WhiteBoxUQ(
    llm=llm,
    scorers=["normalized_probability"]
)

# Generate responses and compute scores
# (top-level `await` works in a notebook; in a script, call this inside an async function)
results = await wbuq.generate_and_score(prompts=prompts)

# Access the normalized_probability scores
print(results.to_df()["normalized_probability"])

References#

Malinin, A. & Gales, M. (2021). Uncertainty Estimation in Autoregressive Structured Prediction. arXiv.
