Long-Text Scorers
=================

Long-form uncertainty quantification implements a three-stage pipeline after response generation:

1. Response Decomposition: The response :math:`y` is decomposed into units (claims or sentences), where a unit is denoted as :math:`s`.

2. Unit-Level Confidence Scoring: Confidence scores are computed using a unit-level scoring function with values in :math:`[0, 1]`. Higher scores indicate greater likelihood of factual correctness. Units with scores below a threshold :math:`\tau` are flagged as potential hallucinations.

3. Response-Level Aggregation: Unit scores are combined to provide an overall response confidence.

**Key Characteristics:**

- **Universal Compatibility:** Works with any LLM without requiring token probability access
- **Fine-Grained Scoring:** Score at sentence or claim-level to localize likely hallucinations
- **Uncertainty-Aware Decoding:** Improve factual precision by dropping high-uncertainty claims

**Trade-offs:**

- **Higher Cost:** Requires multiple generations per prompt
- **Higher Latency:** Multiple generations and comparison calculations increase latency

Long-Text Scoring Methods
-------------------------

There are three main categories of long-text scoring methods offered by UQLM:

.. toctree::
   :maxdepth: 1

   luq
   graph
   qa
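
The three-stage pipeline described at the top of this page (decompose, score each unit, aggregate) can be illustrated with a minimal, self-contained sketch. This is not the UQLM API: the function names ``decompose``, ``score_unit``, and ``score_response`` are hypothetical. The token-overlap score is only a stand-in for a real unit-level scorer, and mean aggregation and the default threshold ``tau=0.5`` are assumptions chosen for illustration.

```python
import re
from statistics import mean


def decompose(response: str) -> list[str]:
    """Stage 1: split a response into sentence-level units."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def token_overlap(unit: str, sample: str) -> float:
    """Fraction of the unit's distinct word tokens that also occur in a sample."""
    unit_tokens = set(re.findall(r"\w+", unit.lower()))
    sample_tokens = set(re.findall(r"\w+", sample.lower()))
    if not unit_tokens:
        return 0.0
    return len(unit_tokens & sample_tokens) / len(unit_tokens)


def score_unit(unit: str, samples: list[str]) -> float:
    """Stage 2: toy confidence in [0, 1], averaged over sampled responses.

    A real scorer would use a stronger consistency measure; token overlap
    is an illustrative placeholder only.
    """
    if not samples:
        return 0.0
    return mean([token_overlap(unit, s) for s in samples])


def score_response(response: str, samples: list[str], tau: float = 0.5) -> dict:
    """Run all three stages and flag units scoring below the threshold tau."""
    units = decompose(response)
    scores = [score_unit(u, samples) for u in units]
    flagged = [u for u, sc in zip(units, scores) if sc < tau]
    # Stage 3: aggregate unit scores (simple mean, by assumption).
    confidence = mean(scores) if scores else 0.0
    return {
        "units": units,
        "scores": scores,
        "flagged": flagged,
        "confidence": confidence,
    }


if __name__ == "__main__":
    result = score_response(
        "Paris is the capital of France. It has ten million moons.",
        samples=[
            "Paris is the capital of France.",
            "The capital of France is Paris.",
        ],
    )
    print(result["flagged"], result["confidence"])
```

The sampled responses in ``samples`` reflect the trade-off noted above: each additional generation adds cost and latency, but gives the unit-level scorer more evidence to compare against.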