QA-Based Uncertainty Quantification (LUQ)
Definition
The Claim-QA approach demonstrated here is adapted from Farquhar et al. (2024). It uses an LLM to convert each unit (sentence or claim) into a question for which that unit would be the answer. The method then measures consistency across multiple sampled responses to each question, effectively applying standard black-box uncertainty quantification at the unit level. Formally, a claim-QA scorer \(c_g(s;\cdot)\) is defined as follows:
\[c_g(s; y_0^{(s)}, \mathbf{y}^{(s)}_{\text{cand}}) = \frac{1}{m} \sum_{j=1}^{m} \eta(y_0^{(s)}, y_j^{(s)})\]
where \(y_0^{(s)}\) is the original unit response, \(\mathbf{y}^{(s)}_{\text{cand}} = \{y_1^{(s)}, \ldots, y_m^{(s)}\}\) are \(m\) candidate responses to the unit’s question, and \(\eta\) is a consistency function such as contradiction probability, cosine similarity, or BERTScore F1. Semantic entropy, which follows a slightly different functional form, can also be used to measure consistency.
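To make the roles of \(y_0^{(s)}\), the candidates, and \(\eta\) concrete, here is a minimal sketch of such a scorer. It assumes the score is the plain average of \(\eta\) between the original unit response and each of the \(m\) candidates. The names `token_jaccard` and `claim_qa_score` are hypothetical and not part of uqlm, and the token-overlap function is only a toy stand-in for a real consistency function such as contradiction probability or BERTScore F1.

```python
from typing import Callable, Sequence


def token_jaccard(a: str, b: str) -> float:
    """Toy stand-in for eta: Jaccard overlap of lowercase token sets, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def claim_qa_score(
    y0: str,
    candidates: Sequence[str],
    eta: Callable[[str, str], float] = token_jaccard,
) -> float:
    """Average consistency between the original unit response and m candidates.

    Assumption: the scorer is a simple mean of eta(y0, y_j) over the candidates.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    return sum(eta(y0, y) for y in candidates) / len(candidates)


# Three of the four candidate answers agree with the original unit response.
score = claim_qa_score("Paris", ["Paris", "paris", "Lyon", "Paris"])
print(score)  # 0.75
```

Because `eta` is passed in as a callable, swapping the toy overlap for a model-based consistency function would not change the averaging logic.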
Key Properties:
Claim- or sentence-level scoring
Less complex (cost and latency) than other long-form scoring methods
Score range: \([0, 1]\)
How It Works
1. Generate an original response and sampled responses
2. Decompose original response into units (claims or sentences)
3. For each claim/sentence, generate one or more questions that have that claim/sentence as the answer
4. Generate multiple responses for each question generated in step 3
5. Measure consistency in the LLM responses to the claim/sentence questions to estimate claim/sentence-level confidence
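The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the uqlm implementation: the original response is taken as given, the question generator, answer sampler, and consistency function are hypothetical callables that would be backed by an LLM in practice, and sentence splitting is a naive split on periods. Deterministic stubs stand in for the LLM so the sketch runs on its own.

```python
from typing import Callable, Dict, List


def split_into_sentences(text: str) -> List[str]:
    """Naive unit decomposition on periods (stand-in for an LLM-based decomposer)."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def score_response(
    response: str,
    ask_question_for: Callable[[str], str],
    sample_answers: Callable[[str, int], List[str]],
    consistency: Callable[[str, List[str]], float],
    num_answers: int = 5,
) -> List[Dict[str, object]]:
    """Decompose a response, build one question per unit, sample answers, score."""
    rows = []
    for unit in split_into_sentences(response):           # decompose into units
        question = ask_question_for(unit)                 # question whose answer is the unit
        answers = sample_answers(question, num_answers)   # multiple responses per question
        rows.append({"unit": unit, "question": question,
                     "score": consistency(unit, answers)})  # unit-level confidence
    return rows


# Deterministic stubs (hypothetical) so the sketch runs without an LLM.
CANNED = {
    "The Eiffel Tower is in Paris.": ["The Eiffel Tower is in Paris."] * 4,
    "It was completed in 1889.": ["It was completed in 1889.",
                                  "It was completed in 1887."] * 2,
}


def fake_question(unit: str) -> str:
    return "Q: " + unit


def fake_sampler(question: str, n: int) -> List[str]:
    return CANNED[question[3:]][:n]


def exact_match_rate(unit: str, answers: List[str]) -> float:
    return sum(a == unit for a in answers) / len(answers)


rows = score_response(
    "The Eiffel Tower is in Paris. It was completed in 1889.",
    fake_question, fake_sampler, exact_match_rate, num_answers=4,
)
for row in rows:
    print(row["unit"], row["score"])
# The Eiffel Tower is in Paris. 1.0
# It was completed in 1889. 0.5
```

The second sentence scores lower because half of its sampled answers disagree with the original unit, which is the signal the method uses to flag less reliable claims.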
Parameters
When using LongTextQA, specify "semantic_negentropy" (or an alternative scoring function) in the scorers list.
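For intuition about what a semantic-negentropy style score measures, here is a rough sketch. It is not the uqlm implementation: the clustering rule (`same_meaning`), the count-based cluster probabilities, and the normalization by the log of the number of responses are all assumptions made for illustration. In practice the equivalence check would typically be done by a model rather than by string comparison.

```python
import math
from typing import Callable, List, Sequence


def cluster_by_equivalence(
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group responses that the (hypothetical) equivalence check treats as the same."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if same_meaning(cluster[0], response):
                cluster.append(response)
                break
        else:
            clusters.append([response])
    return clusters


def semantic_negentropy(
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Return 1 - H / log(n), where H is the entropy over meaning clusters.

    Assumption: cluster probabilities are count fractions and the entropy is
    normalized by log(n), so the score is clamped into [0, 1].
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses")
    clusters = cluster_by_equivalence(responses, same_meaning)
    entropy = -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
    return min(1.0, max(0.0, 1.0 - entropy / math.log(n)))


def case_insensitive_match(a: str, b: str) -> bool:
    """Toy equivalence check standing in for a semantic comparison."""
    return a.strip().lower() == b.strip().lower()


high = semantic_negentropy(["Paris", "paris", "PARIS", "Paris"], case_insensitive_match)
low = semantic_negentropy(["Paris", "Lyon", "Rome", "Nice"], case_insensitive_match)
print(high)  # 1.0: all answers share one meaning
print(low)   # approximately 0.0: every answer has a different meaning
```

Scores near 1 indicate that the sampled answers agree in meaning, and scores near 0 indicate that they are spread across many distinct meanings.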
Example
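from uqlm import LongTextQA

# Assumes `original_llm` and `claim_decomposition_llm` are already-initialized
# LLM objects and `prompts` is a list of prompt strings.

# Initialize
ltqa = LongTextQA(
    llm=original_llm,
    claim_decomposition_llm=claim_decomposition_llm,
    scorers=["semantic_negentropy"],
    sampling_temperature=1.0,
)

# Generate responses and compute scores
# (top-level `await` works in a notebook; elsewhere, call this inside an async function)
results = await ltqa.generate_and_score(prompts=prompts, num_claim_qa_responses=5)

# Access the claim-level scores
print(results.to_df()["claims_data"])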
References
Farquhar, S., et al. (2024). Detecting hallucinations in large language models using semantic entropy. Nature.
See Also
LongTextQA - Class for QA-based UQ for long-form generations