Monte Carlo Sequence Probability
monte_carlo_probability
Monte Carlo Sequence Probability (MCSP) computes the average length-normalized sequence probability across multiple sampled responses.
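Definition
Let \(y_1, y_2, ..., y_m\) denote \(m\) sampled responses generated from the same prompt. Monte Carlo Sequence Probability is defined as:
\[
\text{MCSP}(y_1, ..., y_m) = \frac{1}{m} \sum_{i=1}^{m} \left( \prod_{t=1}^{L_i} p_t \right)^{1/L_i}
\]
where \(L_i\) is the number of tokens in response \(y_i\) and \(p_t\) is the probability of the \(t\)-th token of that response.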
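As a worked illustration with made-up numbers, suppose \(m = 2\) responses are sampled. The first has two tokens with probabilities 0.9 and 0.8; the second has three tokens, each with probability 0.5. Taking the geometric mean of the token probabilities within each response and then averaging across responses gives:
\[
(0.9 \times 0.8)^{1/2} \approx 0.8485, \qquad (0.5 \times 0.5 \times 0.5)^{1/3} = 0.5, \qquad \frac{0.8485 + 0.5}{2} \approx 0.674
\]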
Key Properties:
Combines multiple response samples for more robust probability estimation
Length-normalized to allow fair comparison across responses
Score range: \([0, 1]\)
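To illustrate why length normalization matters, here is a minimal sketch in plain Python. It is not part of the uqlm API; the function names and token probabilities are hypothetical. It compares the raw joint probability of a sequence with its geometric-mean normalization for a short and a long response whose tokens are all equally likely.

```python
import math


def joint_probability(token_probs):
    """Product of token probabilities; shrinks as the sequence grows."""
    return math.prod(token_probs)


def length_normalized_probability(token_probs):
    """Geometric mean of token probabilities, computed in log space."""
    mean_logprob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_logprob)


# Hypothetical responses: same per-token probability, different lengths
short_response = [0.9] * 5
long_response = [0.9] * 50

# The joint probability penalizes the longer response heavily
print(joint_probability(short_response), joint_probability(long_response))

# The length-normalized score is (up to rounding) the same for both
print(length_normalized_probability(short_response),
      length_normalized_probability(long_response))
```

Because every token probability lies in \([0, 1]\), the geometric mean does too, which is why the averaged score stays in the same range.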
How It Works
Generate multiple responses from the same prompt with logprobs enabled
For each response, compute the length-normalized sequence probability (geometric mean of token probabilities)
Average across all sampled responses
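The three steps above can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not uqlm's internal implementation: the function names are hypothetical, and each sampled response is assumed to arrive as a list of per-token log-probabilities.

```python
import math
from typing import Sequence


def length_normalized_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities for one response."""
    if not token_logprobs:
        raise ValueError("a response must contain at least one token")
    # exp(mean of log p_t) equals (prod p_t) ** (1 / L)
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def monte_carlo_sequence_probability(
    sampled_logprobs: Sequence[Sequence[float]],
) -> float:
    """Average the length-normalized probability over all sampled responses."""
    if not sampled_logprobs:
        raise ValueError("at least one sampled response is required")
    scores = [length_normalized_probability(lp) for lp in sampled_logprobs]
    return sum(scores) / len(scores)


# Hypothetical token log-probabilities for three responses to one prompt
samples = [
    [-0.1, -0.2, -0.3],
    [-0.5, -0.5],
    [-1.0],
]
score = monte_carlo_sequence_probability(samples)
print(round(score, 4))
```

Working in log space avoids numerical underflow when multiplying many small token probabilities, and it matches the form in which most LLM APIs return token-level information.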
This scorer combines the sampling approach of black-box methods with token probability information, providing a more robust estimate than single-response probability.
Parameters
When using WhiteBoxUQ, specify "monte_carlo_probability" in the scorers list.
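Example
from uqlm import WhiteBoxUQ

# `llm` is assumed to be a LangChain chat model that can return token logprobs,
# and `prompts` a list of prompt strings; both are defined elsewhere.

# Initialize with monte_carlo_probability scorer
wbuq = WhiteBoxUQ(
    llm=llm,
    scorers=["monte_carlo_probability"],
    sampling_temperature=1.0,
)

# Generate responses and compute scores (requires multiple samples).
# Top-level `await` works in a notebook; in a script, call this from an async function.
results = await wbuq.generate_and_score(prompts=prompts, num_responses=5)

# Access the monte_carlo_probability scores
print(results.to_df()["monte_carlo_probability"])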
References
Kuhn, L., et al. (2023). Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. arXiv.
See Also
WhiteBoxUQ - Main class for white-box uncertainty quantification
Consistency and Confidence (CoCoA) - Alternative multi-generation scorer
Length-Normalized Sequence Probability - Single-generation length-normalized probability