Semantic Sets Confidence
semantic_sets_confidence
Semantic Sets Confidence (SSC) counts the number of distinct semantic sets (clusters) formed when sampled responses are grouped during the computation of semantic entropy, and normalizes this count into a confidence score in \([0,1]\).
Definition
Let \(N_C\) denote the number of unique semantic clusters and \(m\) denote the number of sampled responses. We normalize this count to obtain a confidence score in \([0,1]\) as follows:
\[ \text{SSC} = \frac{m - N_C}{m - 1}, \qquad m \geq 2. \]
Interpretation:
When \(N_C = 1\): All sampled responses are semantically equivalent, so the confidence score is 1
When \(N_C = m\): All responses are semantically distinct, so the confidence score is 0
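The endpoint behavior above can be checked numerically. The sketch below assumes the normalization is the linear map \((m - N_C)/(m - 1)\), the simplest form that yields 1 when \(N_C = 1\) and 0 when \(N_C = m\); the function name `semantic_sets_confidence` is hypothetical here and is not taken from any library.

```python
def semantic_sets_confidence(num_clusters: int, num_responses: int) -> float:
    """Map the number of semantic clusters N_C to a score in [0, 1].

    Assumes the linear normalization (m - N_C) / (m - 1), so that
    N_C = 1 gives 1.0 and N_C = m gives 0.0.
    """
    if num_responses < 2:
        raise ValueError("need at least two sampled responses (m >= 2)")
    if not 1 <= num_clusters <= num_responses:
        raise ValueError("num_clusters must lie between 1 and num_responses")
    return (num_responses - num_clusters) / (num_responses - 1)


# With m = 5 sampled responses, each additional cluster lowers the score by 1/4.
for n_c in range(1, 6):
    print(n_c, semantic_sets_confidence(n_c, 5))
```

Under this assumption, the score decreases in equal steps of \(1/(m-1)\) as the number of clusters grows, so with \(m = 5\) the possible values are 1.0, 0.75, 0.5, 0.25 and 0.0.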
How It Works
Sample \(m\) responses \(\tilde{\mathbf{y}}_1, \dots, \tilde{\mathbf{y}}_m\) from the same prompt
Use an NLI model to cluster semantically equivalent responses based on mutual entailment
Count the number of unique semantic clusters \(N_C\)
Normalize using the formula above to get a score in \([0,1]\)
Fewer semantic clusters indicate greater consistency among the sampled responses, which tends to correlate with a higher likelihood that the response is correct.
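The steps above can be sketched in plain Python. This is a minimal illustration, not uqlm's implementation: `cluster_by_mutual_entailment` greedily compares each response with the first member of each existing cluster (in the spirit of the clustering procedure described by Kuhn et al.), and `toy_entails` is a hypothetical string-matching stand-in for a real NLI model such as the one selected with `nli_model_name`. The final score assumes the normalization \((m - N_C)/(m - 1)\).

```python
from typing import Callable, List


def cluster_by_mutual_entailment(
    responses: List[str], entails: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group responses into semantic clusters.

    A response joins the first existing cluster whose representative
    (its first member) it mutually entails; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            representative = cluster[0]
            # Mutual entailment: each text must entail the other.
            if entails(representative, response) and entails(response, representative):
                cluster.append(response)
                break
        else:
            clusters.append([response])
    return clusters


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: case- and punctuation-insensitive match."""

    def normalize(text: str) -> str:
        return "".join(ch for ch in text.lower() if ch.isalnum())

    return normalize(premise) == normalize(hypothesis)


sampled = ["Paris.", "paris", "Lyon", "Paris!", "Marseille"]
clusters = cluster_by_mutual_entailment(sampled, toy_entails)
n_c, m = len(clusters), len(sampled)
ssc = (m - n_c) / (m - 1)  # assumed normalization (m - N_C) / (m - 1)
print(clusters)
print(n_c, ssc)
```

Here the five responses fall into three clusters ("Paris" variants, "Lyon", "Marseille"), giving \(N_C = 3\) and a score of 0.5. Replacing `toy_entails` with a real NLI classifier changes only the entailment check; the clustering and counting logic stay the same.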
Parameters
When using BlackBoxUQ, include "semantic_sets_confidence" in the scorers list. The NLI model used to cluster responses can be set with nli_model_name.
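Example
from uqlm import BlackBoxUQ
# Assumes `llm` is a LangChain chat model and `prompts` is a list of prompt strings
# Initialize with the semantic_sets_confidence scorer
bbuq = BlackBoxUQ(
    llm=llm,
    scorers=["semantic_sets_confidence"],
    nli_model_name="microsoft/deberta-large-mnli",  # NLI model used for clustering
)
# Generate 5 sampled responses per prompt and compute scores
# (top-level await works in a notebook; elsewhere, call this inside an async function)
results = await bbuq.generate_and_score(prompts=prompts, num_responses=5)
# Access the semantic_sets_confidence scores
print(results.to_df()["semantic_sets_confidence"])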
References
Lin, Z., et al. (2024). Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models. arXiv.
Vashurin, R., et al. (2025). Benchmarking LLM Uncertainty Quantification Methods for Agentic AI. arXiv.
Kuhn, L., et al. (2023). Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. arXiv.
See Also
BlackBoxUQ - Main class for black-box uncertainty quantification
Normalized Semantic Negentropy - Related scorer based on semantic entropy