Long-Text Uncertainty Quantification (LUQ)#
Definition#
The Long-text UQ (LUQ) approach demonstrated here is adapted from Zhang et al. (2024). As with standard black-box UQ, this approach requires generating an original response and sampled candidate responses to the same prompt. The original response is then decomposed into units (claims or sentences). Unit-level confidence scores are then obtained by averaging entailment probabilities across candidate responses:

\[
\text{LUQ}(s) = \frac{1}{m} \sum_{j=1}^{m} P(\text{entail} \mid y_j, s),
\]

where \(\mathbf{y}^{(s)}_{\text{cand}} = \{y_1^{(s)}, ..., y_m^{(s)}\}\) are \(m\) candidate responses, and \(P(\text{entail} \mid y_j, s)\) denotes the NLI-estimated probability that \(s\) is entailed in \(y_j\).
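As a quick numeric illustration of the averaging above, a single unit's confidence is just the mean of its entailment probabilities across candidates (the probability values here are made up for the example):

```python
from statistics import mean

# Illustrative NLI entailment probabilities P(entail | y_j, s) for one
# claim s against m = 4 sampled candidate responses.
entail_probs = [0.92, 0.85, 0.40, 0.88]

# LUQ unit-level confidence: mean entailment probability across candidates.
confidence = mean(entail_probs)
print(round(confidence, 4))  # 0.7625
```

Note the score stays in \([0, 1]\) because each entailment probability does.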
Key Properties:
Claim- or sentence-level scoring
Less complex (cost and latency) than other long-form scoring methods
Score range: \([0, 1]\)
How It Works#
Generate an original response and sampled responses
Decompose original response into units (claims or sentences)
Obtain entailment probabilities of units in original response with respect to sampled responses
For each unit, average entailment probabilities across sampled responses
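The four steps above can be sketched end to end. The `entail_prob` callable below is a hypothetical stand-in for an NLI model, and the toy substring-match version exists only so the example runs; it is not how LUQ or `uqlm` score entailment:

```python
from statistics import mean
from typing import Callable


def luq_scores(
    original_units: list[str],
    candidates: list[str],
    entail_prob: Callable[[str, str], float],
) -> list[float]:
    """Return one confidence score per unit of the original response.

    entail_prob(candidate, unit) is a placeholder for an NLI model's
    probability that `unit` is entailed by `candidate`.
    """
    return [
        mean(entail_prob(cand, unit) for cand in candidates)
        for unit in original_units
    ]


# Toy "NLI" for demonstration: 1.0 if the unit appears verbatim, else 0.0.
toy_nli = lambda cand, unit: 1.0 if unit in cand else 0.0

units = ["Paris is in France.", "Paris has 10 million residents."]
cands = [
    "Paris is in France. It is the capital.",
    "Paris is in France.",
    "Paris is a large city.",
]
print(luq_scores(units, cands, toy_nli))  # first claim ≈ 0.67, second 0.0
```

The first claim is supported by two of three candidates, so it scores higher; unsupported claims trend toward 0.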
Parameters#
When using `LongTextUQ`, specify "entailment" (or an alternative scoring function) in the `scorers` list.
Example#
```python
from uqlm import LongTextUQ

# Initialize
luq = LongTextUQ(
    llm=original_llm,
    claim_decomposition_llm=claim_decomposition_llm,
    scorers=["entailment"],
    sampling_temperature=1.0,
)

# Generate responses and compute scores
results = await luq.generate_and_score(prompts=prompts, num_responses=5)

# Access the claim-level scores
print(results.to_df()["claims_data"])
```
References#
Zhang, C., et al. (2024). LUQ: Long-text Uncertainty Quantification for LLMs. arXiv.
See Also#
`LongTextUQ`: Class for LUQ-style scoring of long-form generations