Length-Normalized Sequence Probability#

normalized_probability

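Length-Normalized Token Probability (LNTP) is the geometric mean of a response's token probabilities. Unlike the joint token probability, it does not shrink simply because a response contains more tokens, so scores are comparable across responses of different lengths.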

Definition#

For a response \(y_i\), the length-normalized token (sequence) probability is the joint token probability raised to the power \(1/L_i\):

\[LNTP(y_i) = \prod_{t \in y_i} p_t^{\frac{1}{L_i}}\]

where \(p_t\) is the probability the model assigned to token \(t\) and \(L_i\) is the number of tokens in response \(y_i\).

Key Properties:

  • Equivalent to the geometric mean of token probabilities for response \(y_i\)

  • Length-normalized: adding tokens does not by itself lower the score, making it suitable for comparing responses of different lengths

  • Score range: \([0, 1]\)
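As a quick check on the properties above, the following minimal sketch computes the score from a list of token probabilities. The helper name `lntp` and the sample probabilities are assumptions made for illustration; they are not part of uqlm. The sketch averages log-probabilities and exponentiates, which is algebraically the same as the product form in the definition but less prone to underflow on long responses.

```python
import math

def lntp(token_probs):
    """Geometric mean of per-token probabilities (hypothetical helper, not uqlm's API)."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    if any(not 0.0 <= p <= 1.0 for p in token_probs):
        raise ValueError("each token probability must lie in [0, 1]")
    if any(p == 0.0 for p in token_probs):
        return 0.0  # a single zero-probability token forces the product to zero
    # exp(mean(log p_t)) equals prod(p_t) ** (1 / L)
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_log_prob)

probs = [0.9, 0.8, 0.7]  # illustrative values only
direct = math.prod(probs) ** (1 / len(probs))  # literal reading of the formula
score = lntp(probs)
print(score, direct)
```

Because the result is a geometric mean, it always falls between the smallest and largest token probability in the response, which is why the score stays within \([0, 1]\).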

How It Works#

  1. Generate a response with logprobs enabled

  2. Extract the probability for each token in the response

  3. Compute the geometric mean of all token probabilities

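Joint sequence probability is a product of values no greater than 1, so it tends to decrease as a response grows longer, regardless of how confident the model is in each token. Taking the geometric mean removes this dependence on length, allowing fair comparison across responses of varying lengths.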
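The effect of the normalization can be seen in a small sketch that starts from token log-probabilities, as returned when logprobs are enabled. The function names and the synthetic log-probabilities below are assumptions for illustration and do not reflect how uqlm implements the scorer internally.

```python
import math

def joint_probability(logprobs):
    # Product of token probabilities, computed as exp of the summed logprobs
    return math.exp(sum(logprobs))

def length_normalized_probability(logprobs):
    # Geometric mean of token probabilities: exp of the mean logprob
    return math.exp(sum(logprobs) / len(logprobs))

# Two synthetic responses with identical per-token confidence (p = 0.9)
# but different lengths
short_logprobs = [math.log(0.9)] * 5
long_logprobs = [math.log(0.9)] * 50

joint_short = joint_probability(short_logprobs)
joint_long = joint_probability(long_logprobs)
lntp_short = length_normalized_probability(short_logprobs)
lntp_long = length_normalized_probability(long_logprobs)

print(joint_short, joint_long)  # the longer response scores far lower
print(lntp_short, lntp_long)    # both are approximately 0.9
```

In this sketch the joint probability falls from about 0.59 for the 5-token response to about 0.005 for the 50-token response, while the length-normalized score is the same for both.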

Parameters#

When using WhiteBoxUQ, specify "normalized_probability" in the scorers list.

Note

This scorer will be deprecated in favor of sequence_probability with length_normalize=True in a future version.

Example#

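from uqlm import WhiteBoxUQ

# Assumed to be defined beforehand:
#   llm     - a configured LangChain chat model that can return token logprobs
#   prompts - a list of prompt strings

# Initialize with normalized_probability scorer
wbuq = WhiteBoxUQ(
    llm=llm,
    scorers=["normalized_probability"]
)

# Generate responses and compute scores
# (top-level await works in a notebook; in a script, call this inside an async function)
results = await wbuq.generate_and_score(prompts=prompts)

# Access the normalized_probability scores
print(results.to_df()["normalized_probability"])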
