Black-Box Scorers ================= Black-box Uncertainty Quantification (UQ) methods treat the LLM as a black box and evaluate consistency of multiple responses generated from the same prompt to estimate response-level confidence. These scorers are compatible with any LLM and don't require access to internal model states or token probabilities. **Key Characteristics:** - **Universal Compatibility:** Works with any LLM - **Intuitive:** Easy to understand and implement - **No Internal Access Required:** Doesn't need token probabilities or model internals **Trade-offs:** - **Higher Cost:** Requires multiple generations per prompt - **Slower:** Multiple generations and comparison calculations increase latency **Notation:** For a given prompt :math:`x_i`, these approaches involve generating :math:`m` responses :math:`\tilde{\mathbf{y}}_i = \{ \tilde{y}_{i1},...,\tilde{y}_{im}\}`, using a non-zero temperature, from the same prompt and comparing these responses to the original response :math:`y_{i}`. .. toctree:: :maxdepth: 1 :caption: Available Scorers semantic_negentropy semantic_sets_confidence noncontradiction entailment exact_match bert_score cosine_sim