Code Similarity Scorers
=======================

.. currentmodule:: uqlm.scorers

Definition
----------

Code similarity scorers sample several code responses to the same prompt and compare each sampled response with the original response. Higher average similarity means the model answers more consistently, which is taken as a sign of higher confidence.

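``cosine_sim`` embeds the original response :math:`y` and the :math:`m` sampled responses :math:`\tilde{y}_1, \dots, \tilde{y}_m` with a code embedding model :math:`V`, averages the cosine similarities, and rescales the result from :math:`[-1, 1]` to :math:`[0, 1]`: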

.. math::

    NCS(y; \tilde{\mathbf{y}}) = \frac{1}{2} + \frac{1}{2m} \sum_{j=1}^{m} \frac{V(y) \cdot V(\tilde{y}_j)}{\|V(y)\| \cdot \|V(\tilde{y}_j)\|}
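
The formula above can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the function names ``cosine`` and ``normalized_cosine_similarity`` are hypothetical, and the toy 2-D vectors stand in for embeddings that a real code embedding model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def normalized_cosine_similarity(original_vec, sampled_vecs):
    """Average cosine similarity to the original, rescaled from [-1, 1] to [0, 1]."""
    m = len(sampled_vecs)
    mean_cos = sum(cosine(original_vec, v) for v in sampled_vecs) / m
    return 0.5 + 0.5 * mean_cos


# Toy vectors standing in for V(y) and V(y~_j); real embeddings come from a model.
original = [1.0, 0.0]
sampled = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ncs = normalized_cosine_similarity(original, sampled)  # cosines are 1, 0, -1
```

With these toy inputs the three cosine similarities average to zero, so the score lands at the midpoint of the range. Samples whose embeddings all point in the same direction as the original would score 1.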

``code_bleu`` computes average CodeBLEU similarity between the original code response and sampled responses:

.. math::

    CBC(y; \tilde{\mathbf{y}}) = \frac{1}{m} \sum_{j=1}^{m} \text{CodeBLEU}(y, \tilde{y}_j)
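
The averaging step in this formula can be sketched as follows. To keep the example self-contained, it does not compute CodeBLEU: ``token_overlap`` is a hypothetical stand-in built on ``difflib`` that only compares whitespace tokens, whereas CodeBLEU also accounts for syntax-tree and data-flow matches. The name ``average_similarity`` is likewise an assumption for illustration.

```python
from difflib import SequenceMatcher


def average_similarity(original, sampled, similarity):
    """Mean of similarity(original, sample) over all sampled responses."""
    return sum(similarity(original, s) for s in sampled) / len(sampled)


def token_overlap(a, b):
    """Stand-in for CodeBLEU: difflib ratio over whitespace tokens, in [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


original = "def add(a, b): return a + b"
sampled = [
    "def add(a, b): return a + b",  # identical to the original
    "def add(x, y): return x + y",  # same structure, renamed variables
]
score = average_similarity(original, sampled, token_overlap)
```

Passing the similarity function as an argument keeps the averaging logic separate from the metric, so a real CodeBLEU implementation could be swapped in without changing ``average_similarity``. The renamed-variable sample scores well below 1 under the token-based stand-in, which is the kind of case a structure-aware metric is meant to handle better.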

**Key Properties:**

- Code-adapted black-box consistency scoring
- Uses structural or embedding-based similarity rather than natural-language entailment
- Score range: :math:`[0, 1]`

Parameters
----------

When using :class:`CodeGenUQ`, include ``"cosine_sim"``, ``"code_bleu"``, or both in the ``scorers`` list. Two optional settings apply to these scorers: ``sentence_transformer`` selects the embedding model for ``cosine_sim``, and ``language`` sets the programming language for ``code_bleu``.

Example
-------

.. code-block:: python

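    from uqlm import CodeGenUQ

    # `llm` (the model under test) and `prompts` (a list of prompt strings)
    # are assumed to be defined earlier.
    code_uq = CodeGenUQ(
        llm=llm,
        scorers=["cosine_sim", "code_bleu"],
        language="python",  # language setting for the code_bleu scorer
    )

    # Top-level `await` needs an async context, such as a Jupyter notebook.
    results = await code_uq.generate_and_score(prompts=prompts, num_responses=5)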

See Also
--------

- :class:`CodeGenUQ` - Class for code-generation uncertainty quantification