Semantic Sets Confidence#

semantic_sets_confidence

Semantic Sets Confidence (SSC) counts the number of unique response sets (clusters) obtained during the computation of semantic entropy and normalizes this count to obtain a confidence score.

Definition#

Let \(N_C\) denote the number of unique semantic clusters and \(m\) denote the number of sampled responses. We normalize this count to obtain a confidence score in \([0,1]\) as follows:

\[SSC(y_i; \tilde{\mathbf{y}}_i) = \frac{m - N_C}{m - 1}\]

Interpretation:

When \(N_C = 1\): All sampled responses are semantically equivalent, so the confidence score is 1
When \(N_C = m\): All responses are semantically distinct, so the confidence score is 0

How It Works#

Generate multiple responses \(\tilde{\mathbf{y}}_i\) from the same prompt
Use an NLI model to cluster semantically equivalent responses based on mutual entailment
Count the number of unique semantic clusters \(N_C\)
Normalize using the formula above to get a score in \([0,1]\)

Fewer semantic clusters indicate higher consistency among responses, which typically correlates with higher confidence in the response accuracy.

Parameters#

When using BlackBoxUQ, specify "semantic_sets_confidence" in the scorers list.

Example#

from uqlm import BlackBoxUQ

# Initialize with semantic_sets_confidence scorer
bbuq = BlackBoxUQ(
    llm=llm,
    scorers=["semantic_sets_confidence"],
    nli_model_name="microsoft/deberta-large-mnli"
)

# Generate responses and compute scores
results = await bbuq.generate_and_score(prompts=prompts, num_responses=5)

# Access the semantic_sets_confidence scores
print(results.to_df()["semantic_sets_confidence"])

References#

Lin, Z., et al. (2024). Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models. arXiv.
Vashurin, R., et al. (2025). Benchmarking LLM Uncertainty Quantification Methods for Agentic AI. arXiv.
Kuhn, L., et al. (2023). Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. arXiv.