Consistency and Confidence (CoCoA)#
consistency_and_confidence
The Consistency and Confidence Approach (CoCoA) combines two distinct signals: (1) the similarity between an original response and a set of sampled responses, and (2) the token probabilities of the original response.
Definition#
Let \(y_0\) be the original response and \(y_1, ..., y_m\) be \(m\) sampled responses.
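Step 1: Compute Length-Normalized Token Probability
The geometric mean of the token probabilities of the original response, where \(L_0\) is the number of tokens in \(y_0\) and \(p_t\) is the probability of its \(t\)-th token:
\[ \mathrm{LNTP}(y_0) = \left( \prod_{t=1}^{L_0} p_t \right)^{1/L_0} \]
Step 2: Compute Normalized Cosine Similarity
Average cosine similarity across pairings of the original response with all sampled responses, normalized to \([0,1]\), where \(\cos(y_0, y_i)\) is the cosine similarity between the sentence embeddings of \(y_0\) and \(y_i\):
\[ \mathrm{NCS}(y_0) = \frac{1}{m} \sum_{i=1}^{m} \frac{\cos(y_0, y_i) + 1}{2} \]
Step 3: Compute CoCoA Score
CoCoA is the product of these two terms:
\[ \mathrm{CoCoA}(y_0) = \mathrm{LNTP}(y_0) \cdot \mathrm{NCS}(y_0) \]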
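The three steps can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names and input values are hypothetical, and mapping each cosine to \([0,1]\) via `(c + 1) / 2` is an assumption about how the normalization is done.

```python
import math


def length_normalized_probability(token_logprobs):
    """Geometric mean of token probabilities, computed in log space."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def normalized_cosine_similarity(cosine_sims):
    """Map each cosine from [-1, 1] to [0, 1] (assumed mapping), then average."""
    if not cosine_sims:
        raise ValueError("cosine_sims must be non-empty")
    return sum((c + 1.0) / 2.0 for c in cosine_sims) / len(cosine_sims)


def cocoa_score(token_logprobs, cosine_sims):
    """Product of the probability term and the similarity term."""
    return (length_normalized_probability(token_logprobs)
            * normalized_cosine_similarity(cosine_sims))


# Hypothetical inputs: log-probabilities of the original response's tokens,
# and cosine similarities between the original and each sampled response.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
cosines = [0.9, 0.7, 0.8]
score = cocoa_score(logprobs, cosines)
print(round(score, 3))
```

Working in log space avoids numerical underflow when the original response has many tokens.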
Key Properties:
Combines token-level confidence with response-level consistency
Multiplicative combination means the score is high only when both signals are high
Score range: \([0, 1]\)
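As a hypothetical numeric illustration of the multiplicative property (the values are invented for the example), compare a confident but inconsistent response with one that is both confident and consistent:

\[ \underbrace{0.9}_{\text{probability}} \times \underbrace{0.2}_{\text{similarity}} = 0.18, \qquad \underbrace{0.9}_{\text{probability}} \times \underbrace{0.9}_{\text{similarity}} = 0.81 \]

High token probability alone does not yield a high score; low agreement with the sampled responses pulls the product down.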
How It Works#
Generate an original response with logprobs enabled
Generate multiple sampled responses from the same prompt
Compute the length-normalized probability of the original response
Encode all responses using a sentence transformer and compute cosine similarities
Multiply the length-normalized probability by the average normalized similarity to obtain the CoCoA score
This approach is particularly effective because it requires both:
The model to be confident in its token predictions (high probability)
The responses to be semantically consistent (high similarity)
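The similarity and multiplication steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the toy 2-D vectors stand in for sentence-transformer embeddings, the helper names are hypothetical, and the `(c + 1) / 2` mapping to \([0,1]\) is assumed rather than taken from the library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def average_normalized_similarity(original_vec, sampled_vecs):
    """Average of the original-vs-sample cosines, each mapped to [0, 1]."""
    if not sampled_vecs:
        raise ValueError("at least one sampled response is required")
    normalized = [(cosine(original_vec, v) + 1.0) / 2.0 for v in sampled_vecs]
    return sum(normalized) / len(normalized)


# Toy stand-ins for embeddings of the original and three sampled responses.
original = [1.0, 0.0]
samples = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # cosines: 1, 0, -1

similarity = average_normalized_similarity(original, samples)  # (1 + 0.5 + 0) / 3
probability = 0.5  # hypothetical length-normalized probability
score = probability * similarity
print(score)
```

Only pairings that include the original response are compared; the sampled responses are not compared with each other.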
Parameters#
When using WhiteBoxUQ, specify "consistency_and_confidence" in the scorers list.
Example#
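from uqlm import WhiteBoxUQ

# Assumed to be defined elsewhere: `llm` (a LangChain chat model that can
# return logprobs) and `prompts` (a list of prompt strings).

# Initialize with consistency_and_confidence scorer
wbuq = WhiteBoxUQ(
    llm=llm,
    scorers=["consistency_and_confidence"],
    sampling_temperature=1.0,
)

# Generate responses and compute scores
# (top-level `await` needs an async context, such as a notebook)
results = await wbuq.generate_and_score(prompts=prompts, num_responses=5)

# Access the consistency_and_confidence scores
print(results.to_df()["consistency_and_confidence"])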
References#
Vashurin, R., et al. (2025). CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs. arXiv.
See Also#
WhiteBoxUQ - Main class for white-box uncertainty quantification
Monte Carlo Sequence Probability - Alternative multi-generation scorer
Normalized Cosine Similarity - Black-box cosine similarity scorer