White-Box Scorers

White-box Uncertainty Quantification (UQ) methods use token probabilities to estimate uncertainty. Most of these scorers need only a single generation, which makes them significantly faster and cheaper than black-box methods, but all of them require access to the LLM’s token probabilities.

Key Characteristics:

  • Minimal Latency: Token probabilities are already returned by the LLM

  • No Added Cost: Doesn’t require additional LLM calls (for single-generation scorers)

  • High Performance: Token probabilities come directly from the model and provide rich uncertainty signals

Trade-offs:

  • Limited Compatibility: Requires access to token probabilities, which are not available for all LLMs/APIs

Notation:

Let the tokenization of LLM response \(y_i\) be denoted as \(\{t_1,\ldots,t_{L_i}\}\), where \(L_i\) denotes the number of tokens in the response. Let \(p_t\) denote the token probability for token \(t\).
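To make the notation concrete, here is a minimal sketch of how one response might be represented in code. It assumes the LLM returns a log-probability per token, as many APIs do. The data and the helper name `token_probabilities` are hypothetical and are not part of any library's API.

```python
import math

# Hypothetical example: (token, log-probability) pairs for one response y_i.
# Many APIs return log-probabilities rather than raw probabilities.
response_tokens = [
    ("Paris", -0.05),
    (" is", -0.20),
    (" the", -0.10),
    (" capital", -0.70),
]


def token_probabilities(tokens_with_logprobs):
    """Return p_t for each token t by exponentiating its log-probability."""
    return [math.exp(logprob) for _, logprob in tokens_with_logprobs]


probs = token_probabilities(response_tokens)  # p_t for t_1, ..., t_{L_i}
L_i = len(probs)                              # number of tokens in the response
```

Working with log-probabilities and exponentiating only at the end avoids numerical underflow when a response is long.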

Single-Generation Scorers

These scorers require only one LLM generation and use the token probabilities from that single response.
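As an illustration, the sketch below shows two simple ways to aggregate the token probabilities of a single response into a confidence score: the length-normalized (geometric mean) probability and the minimum token probability. These are common choices in the literature, shown here as assumptions rather than as the exact scorers this page refers to. The function names are hypothetical.

```python
import math


def normalized_probability(logprobs):
    """Geometric mean of token probabilities: (prod_t p_t) ** (1 / L)."""
    if not logprobs:
        raise ValueError("response has no tokens")
    # Averaging log-probabilities is the log of the geometric mean.
    return math.exp(sum(logprobs) / len(logprobs))


def min_probability(logprobs):
    """Probability of the least likely token in the response."""
    if not logprobs:
        raise ValueError("response has no tokens")
    return math.exp(min(logprobs))


# Hypothetical log-probabilities for a four-token response.
logprobs = [-0.05, -0.20, -0.10, -0.70]
normalized_score = normalized_probability(logprobs)
min_score = min_probability(logprobs)
```

Both scores lie in (0, 1], with higher values indicating higher confidence. The minimum is more sensitive to a single uncertain token, while the geometric mean smooths over the whole response.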

Multi-Generation Scorers

These scorers generate multiple responses from the same prompt, combining the sampling approach of black-box UQ with token-probability-based signals.
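One way to combine sampling with token probabilities is a Monte Carlo estimate of sequence-level entropy: sample several responses to the same prompt and average their negative log-likelihoods. The sketch below assumes the per-token log-probabilities of each sampled response are already available. It is one possible aggregation, not necessarily the one used by the scorers described here, and all names are hypothetical.

```python
def sequence_logprob(logprobs, length_normalize=True):
    """Log-probability of one response, optionally divided by its length."""
    if not logprobs:
        raise ValueError("response has no tokens")
    total = sum(logprobs)
    return total / len(logprobs) if length_normalize else total


def monte_carlo_entropy(sampled_logprobs, length_normalize=True):
    """Estimate -E[log p(y)] by averaging over sampled responses."""
    if not sampled_logprobs:
        raise ValueError("need at least one sampled response")
    nll = [-sequence_logprob(lp, length_normalize) for lp in sampled_logprobs]
    return sum(nll) / len(nll)


# Hypothetical token log-probabilities for three sampled responses.
samples = [
    [-0.1, -0.2],
    [-0.3, -0.1, -0.2],
    [-1.2, -0.9],
]
entropy_estimate = monte_carlo_entropy(samples)
```

A larger estimate indicates that the model spreads probability over many different responses, which suggests higher uncertainty. Length normalization keeps long responses from dominating the average.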