Normalized Semantic Negentropy

Scorer key: "semantic_negentropy"

Normalized Semantic Negentropy (NSN) normalizes discrete semantic entropy so that it increases with higher confidence and has \([0,1]\) support.

Definition

In contrast to exact match and non-contradiction scorers, semantic entropy does not distinguish between an original response and candidate responses. Instead, this approach computes a single metric value on a list of responses generated from the same prompt.

Under this approach, responses are clustered using an NLI model based on mutual entailment. The discrete version of Semantic Entropy (SE) is defined as:

\[SE(y_i; \tilde{\mathbf{y}}_i) = - \sum_{C \in \mathcal{C}} P(C|y_i, \tilde{\mathbf{y}}_i)\log P(C|y_i, \tilde{\mathbf{y}}_i)\]

where \(P(C|y_i, \tilde{\mathbf{y}}_i)\) is calculated as the probability a randomly selected response \(y \in \{y_i\} \cup \tilde{\mathbf{y}}_i\) belongs to cluster \(C\), and \(\mathcal{C}\) denotes the full set of clusters of \(\{y_i\} \cup \tilde{\mathbf{y}}_i\).

To ensure that we have a normalized confidence score with \([0,1]\) support and with higher values corresponding to higher confidence, we implement the following normalization to arrive at Normalized Semantic Negentropy (NSN):

\[NSN(y_i; \tilde{\mathbf{y}}_i) = 1 - \frac{SE(y_i; \tilde{\mathbf{y}}_i)}{\log m}\]

where \(m = |\{y_i\} \cup \tilde{\mathbf{y}}_i|\) is the total number of responses and \(\log m\) is the maximum attainable value of \(SE\) (reached when every response forms its own cluster), so dividing by it normalizes the support to \([0,1]\).
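As a worked illustration, both quantities can be computed directly from cluster assignments. The helper below is a minimal sketch, not the uqlm implementation; with five responses split into clusters of sizes 3 and 2, \(SE \approx 0.673\) and \(NSN = 1 - 0.673/\log 5 \approx 0.582\).

```python
import math
from collections import Counter

def normalized_semantic_negentropy(cluster_labels):
    """Compute discrete SE and NSN from per-response cluster labels."""
    m = len(cluster_labels)
    counts = Counter(cluster_labels)
    # P(C) is the fraction of responses falling in cluster C
    probs = [c / m for c in counts.values()]
    se = -sum(p * math.log(p) for p in probs)
    # Divide by log(m), the maximum attainable SE, to land in [0, 1]
    nsn = 1 - se / math.log(m)
    return se, nsn

# Five responses: three in one semantic cluster, two in another
se, nsn = normalized_semantic_negentropy([0, 0, 0, 1, 1])
print(round(se, 3), round(nsn, 3))  # 0.673 0.582
```

If all five responses land in a single cluster, \(SE = 0\) and \(NSN = 1\), the maximum-confidence case.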

How It Works

  1. Generate multiple responses \(\tilde{\mathbf{y}}_i\) from the same prompt

  2. Use an NLI model to cluster semantically equivalent responses based on mutual entailment

  3. Compute the entropy of the cluster distribution

  4. Normalize the entropy to obtain a confidence score in \([0,1]\)

Higher NSN values indicate that responses are more semantically consistent (fewer clusters), suggesting higher confidence in the response.
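The clustering step above can be sketched as follows. The greedy single-pass strategy and the toy string-equality "NLI" function are illustrative assumptions only; uqlm's actual implementation uses a trained NLI model (e.g. microsoft/deberta-large-mnli) to judge mutual entailment.

```python
def mutually_entail(nli, a, b):
    # nli(premise, hypothesis) -> True if premise entails hypothesis;
    # two responses are semantically equivalent if entailment holds both ways
    return nli(a, b) and nli(b, a)

def cluster_responses(responses, nli):
    """Greedy clustering: a response joins the first cluster whose
    representative it mutually entails, else it starts a new cluster."""
    clusters = []
    for r in responses:
        for c in clusters:
            if mutually_entail(nli, c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

# Toy "NLI" treating case-insensitive equality as entailment (assumption)
toy_nli = lambda a, b: a.lower() == b.lower()
clusters = cluster_responses(["Paris", "paris", "Lyon"], toy_nli)
print([len(c) for c in clusters])  # [2, 1]
```

The resulting cluster sizes feed directly into the entropy computation in the Definition section.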

Parameters

When using BlackBoxUQ, specify "semantic_negentropy" in the scorers list.

Example

from uqlm import BlackBoxUQ

# Initialize with the semantic_negentropy scorer
# (llm is a LangChain-compatible chat model instance)
bbuq = BlackBoxUQ(
    llm=llm,
    scorers=["semantic_negentropy"],
    nli_model_name="microsoft/deberta-large-mnli"
)

# Generate responses and compute scores (await requires an async context)
results = await bbuq.generate_and_score(prompts=prompts, num_responses=5)

# Access the semantic_negentropy scores
print(results.to_df()["semantic_negentropy"])

See Also

  • BlackBoxUQ - Main class for black-box uncertainty quantification

  • SemanticEntropy - Dedicated class for semantic entropy computation

  • Semantic Sets Confidence - Related scorer based on number of semantic clusters