Length-Normalized Sequence Probability#
normalized_probability
Length-Normalized Token Probability (LNTP) computes a length-normalized analog of joint token probability, so that scores can be compared across responses of different lengths.
Definition#
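Length-normalized token (sequence) probability computes a length-normalized analog of joint token probability:
\[ LNTP(y_i) = \prod_{t \in y_i} p_t^{\frac{1}{L_i}} \]
where \(p_t\) is the probability of token \(t\) and \(L_i\) is the number of tokens in response \(y_i\).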
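For numerical work it can help to restate the definition in log space. The following is a sketch of that restatement, with \(p_t\) denoting the probability the model assigned to token \(t\) of response \(y_i\); it follows from the geometric-mean form of the score rather than from any library-specific detail:
\[ \log LNTP(y_i) = \frac{1}{L_i} \sum_{t \in y_i} \log p_t \quad\Longrightarrow\quad LNTP(y_i) = \exp\left( \frac{1}{L_i} \sum_{t \in y_i} \log p_t \right) \]
In other words, the score is the exponential of the mean token log-probability. Averaging log-probabilities avoids the underflow that a direct product of many small probabilities can cause.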
Key Properties:
Equivalent to the geometric mean of token probabilities for response \(y_i\)
Length-invariant, making it suitable for comparing responses of different lengths
Score range: \([0, 1]\)
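The properties above can be checked on toy numbers. The sketch below is illustrative only: the helper names `joint_probability` and `lntp` and the per-token probabilities are made up for the example and are not part of uqlm.

```python
import math
from statistics import geometric_mean

def joint_probability(token_probs):
    """Product of token probabilities (not length-normalized)."""
    return math.prod(token_probs)

def lntp(token_probs):
    """Joint probability raised to the power 1/L, where L is the token count."""
    return math.prod(token_probs) ** (1.0 / len(token_probs))

# Hypothetical responses with the same per-token confidence but different lengths.
short_response = [0.9] * 5
long_response = [0.9] * 50

joint_short = joint_probability(short_response)
joint_long = joint_probability(long_response)
lntp_short = lntp(short_response)
lntp_long = lntp(long_response)

print("joint:", joint_short, joint_long)  # the longer response scores far lower
print("LNTP: ", lntp_short, lntp_long)    # both stay close to the per-token value

# LNTP agrees with the standard-library geometric mean.
print(math.isclose(lntp_short, geometric_mean(short_response)))
```

With these inputs the joint probability drops from roughly 0.59 to roughly 0.005 as the response grows from 5 to 50 tokens, while the normalized score stays at about 0.9 for both.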
How It Works#
Generate a response with logprobs enabled
Extract the probability for each token in the response
Compute the geometric mean of all token probabilities
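This normalization addresses the fact that joint sequence probability shrinks as a response gets longer, since every additional token multiplies in a factor of at most 1. Taking the geometric mean removes that length penalty and allows fair comparison across responses of varying lengths.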
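The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the uqlm implementation: the function name `length_normalized_probability` and the shape of `response_logprobs` (a list of dicts with `token` and `logprob` keys) are assumptions made for the example, and real logprobs payloads vary by provider.

```python
import math

def length_normalized_probability(token_logprobs):
    """Geometric mean of token probabilities, computed in log space."""
    if not token_logprobs:
        raise ValueError("response contains no tokens")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# Step 1 (assumed already done): a response generated with logprobs enabled.
# Hypothetical payload; the keys are illustrative, not a specific API's schema.
response_logprobs = [
    {"token": "Paris", "logprob": math.log(0.95)},
    {"token": " is", "logprob": math.log(0.80)},
    {"token": " the", "logprob": math.log(0.99)},
]

# Step 2: extract the log-probability of each generated token.
token_logprobs = [entry["logprob"] for entry in response_logprobs]

# Step 3: exponentiate the mean log-probability (the geometric mean).
score = length_normalized_probability(token_logprobs)
print(score)
```

Working with log-probabilities directly, rather than converting to probabilities and multiplying, keeps the computation stable for long responses.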
Parameters#
When using WhiteBoxUQ, specify "normalized_probability" in the scorers list.
Note
This scorer will be deprecated in favor of sequence_probability with length_normalize=True in a future version.
Example#
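from uqlm import WhiteBoxUQ

# Assumes `llm` is an already-configured chat model that returns logprobs
# and `prompts` is a list of prompt strings.

# Initialize with normalized_probability scorer
wbuq = WhiteBoxUQ(
    llm=llm,
    scorers=["normalized_probability"]
)

# Generate responses and compute scores
# (top-level `await` needs an async context, e.g. a notebook)
results = await wbuq.generate_and_score(prompts=prompts)

# Access the normalized_probability scores
print(results.to_df()["normalized_probability"])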
References#
Malinin, A. & Gales, M. (2021). Uncertainty Estimation in Autoregressive Structured Prediction. arXiv.
See Also#
WhiteBoxUQ - Main class for white-box uncertainty quantification
Sequence Probability - Non-normalized sequence probability
Minimum Token Probability - Minimum token probability across the response