Black-Box Scorers

Black-box Uncertainty Quantification (UQ) methods treat the LLM as a black box and evaluate the consistency of multiple responses generated from the same prompt to estimate response-level confidence. These scorers are compatible with any LLM and don't require access to internal model states or token probabilities.
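Because only generated text is needed, the "black box" can be modeled as any function from a prompt to a string. The sketch below illustrates that interface under stated assumptions: `LLM`, `sample_responses`, and the `temperature` / `original_temperature` parameters are hypothetical names for illustration, not this library's API. The original response's temperature is left to the caller, since this page does not fix it.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: the LLM is any function mapping (prompt, temperature)
# to generated text. No token probabilities or internal states are used.
LLM = Callable[[str, float], str]


def sample_responses(
    llm: LLM,
    prompt: str,
    m: int,
    temperature: float = 1.0,
    original_temperature: float = 1.0,  # assumed: caller's choice for y_i
) -> Tuple[str, List[str]]:
    """Return the original response y_i and m sampled candidates y~_i1..y~_im."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if temperature <= 0:
        # Candidates are only informative if sampling can produce variation.
        raise ValueError("candidate sampling needs a non-zero temperature")
    original = llm(prompt, original_temperature)
    candidates = [llm(prompt, temperature) for _ in range(m)]
    return original, candidates


if __name__ == "__main__":
    canned = iter(["Paris", "Paris", "Lyon", "Paris"])

    def toy_llm(prompt: str, temperature: float) -> str:
        # Toy stand-in "LLM": ignores its inputs, returns the next canned answer.
        return next(canned)

    print(sample_responses(toy_llm, "What is the capital of France?", m=3))
    # prints ('Paris', ['Paris', 'Lyon', 'Paris'])
```

Any provider client can be wrapped to fit this signature, which is what makes black-box scorers model-agnostic; the price is the extra \(m\) calls per prompt.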

Key Characteristics:

Universal Compatibility: Works with any LLM
Intuitive: Easy to understand and implement
No Internal Access Required: Doesn’t need token probabilities or model internals

Trade-offs:

Higher Cost: Requires multiple generations per prompt
Slower: Multiple generations and comparison calculations increase latency

Notation:

For a given prompt \(x_i\), these approaches generate \(m\) candidate responses \(\tilde{\mathbf{y}}_i = \{ \tilde{y}_{i1}, \dots, \tilde{y}_{im} \}\) from the same prompt using a non-zero temperature, and compare these candidates to the original response \(y_i\).
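One common way to turn this comparison into a confidence score is to average a pairwise similarity \(s(y_i, \tilde{y}_{ij})\) over the \(m\) candidates. The sketch below shows that pattern, assuming a similarity in \([0, 1]\). `consistency_score`, `exact_match`, and `jaccard` are hypothetical helpers for illustration, not this library's implementation. With `exact_match` the average is the fraction of candidates identical to \(y_i\), which is the idea behind Exact Match Rate (assuming exact string equality as the match criterion). `jaccard` is a swapped-in lexical stand-in, not BERTScore or an embedding-based cosine similarity.

```python
from typing import Callable, Sequence

# Hypothetical signature: similarity(a, b) returns a score in [0, 1].
Similarity = Callable[[str, str], float]


def consistency_score(
    original: str, candidates: Sequence[str], similarity: Similarity
) -> float:
    """Average similarity between y_i and each candidate y~_ij, j = 1..m."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return sum(similarity(original, c) for c in candidates) / len(candidates)


def exact_match(a: str, b: str) -> float:
    # 1.0 if the two responses are identical strings, else 0.0.
    return 1.0 if a == b else 0.0


def jaccard(a: str, b: str) -> float:
    # Stand-in lexical similarity: overlap of lower-cased word sets.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    candidates = ["Paris", "Paris", "Lyon", "Paris"]
    print(consistency_score("Paris", candidates, exact_match))  # prints 0.75
```

Swapping the `similarity` argument is the only change needed to move from a strict string match to a softer semantic comparison; the averaging over candidates stays the same.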

Available Scorers

Normalized Semantic Negentropy
Semantic Sets Confidence
Non-Contradiction Probability
Entailment Probability
Exact Match Rate
BERTScore
Normalized Cosine Similarity
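Normalized Semantic Negentropy, the first scorer in the list above, groups responses by meaning instead of comparing them pairwise. The sketch below assumes that (a) the original and the \(m\) candidates are pooled into \(m + 1\) responses, (b) each cluster's probability is its size divided by \(m + 1\), and (c) the entropy is normalized by \(\log(m + 1)\), its maximum when every response lands in its own cluster. Semantic equivalence is typically decided with an NLI model via bidirectional entailment; here `normalized_equal` is a hypothetical string-matching stand-in, so this sketch only groups near-identical wording. All function names are illustrative, not this library's API.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for an NLI-based bidirectional-entailment check.
Equivalent = Callable[[str, str], bool]


def normalized_equal(a: str, b: str) -> bool:
    # Treat responses as equivalent if they match after trimming,
    # lower-casing, and dropping a trailing period.
    return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")


def cluster(responses: Sequence[str], equivalent: Equivalent) -> List[List[str]]:
    """Greedy clustering: each response joins the first cluster whose
    representative it is equivalent to, otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for r in responses:
        for c in clusters:
            if equivalent(c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters


def normalized_semantic_negentropy(
    original: str,
    candidates: Sequence[str],
    equivalent: Equivalent = normalized_equal,
) -> float:
    responses = [original, *candidates]
    n = len(responses)  # m + 1
    if n < 2:
        raise ValueError("need at least one candidate response")
    clusters = cluster(responses, equivalent)
    # Discrete semantic entropy with cluster probabilities |c| / (m + 1).
    entropy = -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
    # 1.0 when all responses agree, 0.0 when every response is distinct.
    return 1.0 - entropy / math.log(n)


if __name__ == "__main__":
    print(normalized_semantic_negentropy("Paris", ["paris", "Paris.", "Lyon"]))
```

Normalizing by \(\log(m + 1)\) keeps the score in \([0, 1]\) regardless of how many candidates are sampled, so scores from runs with different \(m\) stay comparable.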
