Normalized Cosine Similarity ============================ .. currentmodule:: uqlm.scorers ``cosine_sim`` Normalized Cosine Similarity (NCS) leverages a sentence transformer to map LLM outputs to an embedding space and measure similarity using those sentence embeddings. Definition ---------- Let :math:`V: \mathcal{Y} \rightarrow \mathbb{R}^d` denote the sentence transformer, where :math:`d` is the dimension of the embedding space. The average cosine similarity across pairings of the original response with all candidate responses is given as follows: .. math:: CS(y_i; \tilde{\mathbf{y}}_i) = \frac{1}{m} \sum_{j=1}^m \frac{\mathbf{V}(y_i) \cdot \mathbf{V}(\tilde{y}_{ij})}{\|\mathbf{V}(y_i)\| \|\mathbf{V}(\tilde{y}_{ij})\|} To ensure a standardized support of :math:`[0, 1]`, we normalize cosine similarity to obtain confidence scores as follows: .. math:: NCS(y_i; \tilde{\mathbf{y}}_i) = \frac{CS(y_i; \tilde{\mathbf{y}}_i) + 1}{2} **Key Properties:** - Uses sentence-level embeddings rather than token-level - Efficient computation compared to token-level methods like BERTScore - Normalized to :math:`[0, 1]` range where 1 indicates perfect semantic similarity How It Works ------------ 1. Generate multiple candidate responses :math:`\tilde{\mathbf{y}}_i` from the same prompt 2. Encode the original response and all candidates using a sentence transformer 3. Compute cosine similarity between the original response embedding and each candidate embedding 4. Average the similarities and normalize to :math:`[0, 1]` The default sentence transformer is ``all-MiniLM-L6-v2``, which provides a good balance between speed and quality for semantic similarity tasks. Parameters ---------- When using :class:`BlackBoxUQ`, specify ``"cosine_sim"`` in the ``scorers`` list. You can also specify a custom sentence transformer using the ``sentence_transformer`` parameter. Example ------- .. code-block:: python from uqlm import BlackBoxUQ # Initialize with cosine_sim scorer bbuq = BlackBoxUQ( llm=llm, scorers=["cosine_sim"], sentence_transformer="all-MiniLM-L6-v2" # Default sentence transformer ) # Generate responses and compute scores results = await bbuq.generate_and_score(prompts=prompts, num_responses=5) # Access the cosine_sim scores print(results.to_df()["cosine_sim"]) References ---------- - Shorinwa, O., et al. (2024). `A Survey of Confidence Estimation and Calibration in Large Language Models `_. *arXiv*. - `Sentence Transformers - all-MiniLM-L6-v2 `_. *HuggingFace*. See Also -------- - :class:`BlackBoxUQ` - Main class for black-box uncertainty quantification - :doc:`bert_score` - Alternative similarity measure using BERT token embeddings