Normalized Cosine Similarity#

cosine_sim

Normalized Cosine Similarity (NCS) uses a sentence transformer to map LLM outputs to an embedding space and measures similarity between the resulting sentence embeddings.

Definition#

Let \(\mathbf{V}: \mathcal{Y} \rightarrow \mathbb{R}^d\) denote the sentence transformer, where \(d\) is the dimension of the embedding space.

For an original response \(y_i\) and \(m\) candidate responses \(\tilde{\mathbf{y}}_i = \{\tilde{y}_{i1}, \ldots, \tilde{y}_{im}\}\), the average cosine similarity between the original response and each candidate is:

\[CS(y_i; \tilde{\mathbf{y}}_i) = \frac{1}{m} \sum_{j=1}^m \frac{\mathbf{V}(y_i) \cdot \mathbf{V}(\tilde{y}_{ij})}{\|\mathbf{V}(y_i)\| \|\mathbf{V}(\tilde{y}_{ij})\|}\]

Cosine similarity takes values in \([-1, 1]\). To obtain confidence scores with a standardized support of \([0, 1]\), we shift and rescale it as follows:

\[NCS(y_i; \tilde{\mathbf{y}}_i) = \frac{CS(y_i; \tilde{\mathbf{y}}_i) + 1}{2}\]
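
As an illustrative worked example (the numbers are hypothetical, not taken from any model), suppose \(m = 2\) candidates have cosine similarities of 0.9 and 0.5 with the original response. Then:

\[CS(y_i; \tilde{\mathbf{y}}_i) = \frac{0.9 + 0.5}{2} = 0.7, \qquad NCS(y_i; \tilde{\mathbf{y}}_i) = \frac{0.7 + 1}{2} = 0.85\]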

Key Properties:

  • Uses sentence-level embeddings rather than token-level

  • Cheaper to compute than token-level methods such as BERTScore, since each response is encoded once into a single vector

  • Normalized to \([0, 1]\) range where 1 indicates perfect semantic similarity

How It Works#

  1. Generate multiple candidate responses \(\tilde{\mathbf{y}}_i\) from the same prompt

  2. Encode the original response and all candidates using a sentence transformer

  3. Compute cosine similarity between the original response embedding and each candidate embedding

  4. Average the similarities and normalize to \([0, 1]\)
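
Steps 2 to 4 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the function names and the `toy_encode` bag-of-words encoder are hypothetical stand-ins so the example runs without downloading a model. In practice, `encode` could wrap a real sentence transformer, for example the `encode` method of a `SentenceTransformer` from the sentence-transformers package.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def normalized_cosine_similarity(
    original: str,
    candidates: List[str],
    encode: Callable[[str], Vector],
) -> float:
    """Embed all responses, average the similarities, and rescale to [0, 1]."""
    if not candidates:
        raise ValueError("at least one candidate response is required")
    v_orig = encode(original)                                   # step 2
    sims = [cosine_similarity(v_orig, encode(c)) for c in candidates]  # step 3
    cs = sum(sims) / len(sims)                                  # step 4: average
    return (cs + 1.0) / 2.0                                     # step 4: normalize


def toy_encode(text: str) -> List[float]:
    """Hypothetical stand-in encoder: counts over a tiny fixed vocabulary."""
    vocab = ["paris", "france", "capital", "london"]
    tokens = text.lower().replace(".", "").split()
    return [float(tokens.count(word)) for word in vocab]


original = "Paris is the capital of France."
candidates = ["Paris is the capital of France.", "London"]
score = normalized_cosine_similarity(original, candidates, toy_encode)
print(round(score, 4))
```

With the toy encoder, the first candidate matches the original exactly (similarity 1) and the second shares no vocabulary with it (similarity 0), so the average is 0.5 and the normalized score is 0.75. A real sentence transformer would produce dense embeddings and typically less extreme similarities.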

The default sentence transformer is all-MiniLM-L6-v2, which provides a good balance between speed and quality for semantic similarity tasks.

Parameters#

When using BlackBoxUQ, specify "cosine_sim" in the scorers list.

You can also specify a custom sentence transformer using the sentence_transformer parameter.

Example#

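from uqlm import BlackBoxUQ

# `llm` is assumed to be an already-configured chat model instance and
# `prompts` a list of prompt strings; neither is defined in this snippet.

# Initialize with cosine_sim scorer
bbuq = BlackBoxUQ(
    llm=llm,
    scorers=["cosine_sim"],
    sentence_transformer="all-MiniLM-L6-v2",  # Default sentence transformer
)

# Generate responses and compute scores.
# Top-level `await` works in a notebook; in a script, run this inside an async function.
results = await bbuq.generate_and_score(prompts=prompts, num_responses=5)

# Access the cosine_sim scores
print(results.to_df()["cosine_sim"])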

See Also#

  • BlackBoxUQ - Main class for black-box uncertainty quantification

  • BERTScore - Alternative similarity measure using BERT token embeddings