Code Similarity Scorers#

Definition#

Code similarity scorers generate sampled code responses from the same prompt and compare each sampled response with the original response. Higher average similarity indicates higher confidence.

cosine_sim embeds the original response \(y\) and each of the \(m\) sampled responses \(\tilde{y}_j\) with a code embedding model \(V\), then computes the average cosine similarity, rescaled from \([-1, 1]\) to \([0, 1]\):

\[NCS(y; \tilde{\mathbf{y}}) = \frac{1}{2} + \frac{1}{2m} \sum_{j=1}^{m} \frac{V(y) \cdot V(\tilde{y}_j)}{\|V(y)\| \cdot \|V(\tilde{y}_j)\|}\]
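
The formula above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper names `cosine` and `normalized_cosine_similarity` are hypothetical, and the toy 2-dimensional vectors stand in for embeddings that a real code embedding model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalized_cosine_similarity(original_vec, sampled_vecs):
    """NCS = 1/2 + (1 / (2m)) * sum_j cos(V(y), V(y_j)).

    Hypothetical helper: inputs are precomputed embedding vectors.
    """
    m = len(sampled_vecs)
    total = sum(cosine(original_vec, s) for s in sampled_vecs)
    return 0.5 + total / (2 * m)


# Toy embeddings (assumed values): one identical, one orthogonal,
# and one opposite to the original vector.
original = [1.0, 0.0]
samples = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

score = normalized_cosine_similarity(original, samples)
print(score)
```

With these toy vectors the three cosines are 1, 0, and -1, so their average is 0 and the rescaled score sits at the midpoint of the range. When every sample matches the original exactly, the score is 1.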

code_bleu computes the average CodeBLEU similarity between the original code response \(y\) and each of the \(m\) sampled responses \(\tilde{y}_j\):

\[CBC(y; \tilde{\mathbf{y}}) = \frac{1}{m} \sum_{j=1}^{m} \text{CodeBLEU}(y, \tilde{y}_j)\]
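
The averaging step in this formula can be sketched as follows. To keep the example self-contained, it does not compute CodeBLEU: a token-level `difflib.SequenceMatcher` ratio is swapped in as a stand-in pairwise metric, and `average_similarity` and `token_ratio` are hypothetical names. In practice the `similarity` argument would be a real CodeBLEU implementation.

```python
from difflib import SequenceMatcher


def average_similarity(original, sampled, similarity):
    """Mean of similarity(original, sample) over all sampled responses."""
    return sum(similarity(original, s) for s in sampled) / len(sampled)


def token_ratio(a, b):
    """Stand-in metric in [0, 1]; this is NOT CodeBLEU."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Hypothetical original response and two sampled responses.
original = "def add(a, b): return a + b"
sampled = [
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",
]

score = average_similarity(original, sampled, token_ratio)
print(score)
```

Passing the metric as a function keeps the averaging logic separate from the choice of similarity measure, so the same sketch applies to any pairwise score bounded in \([0, 1]\).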

Key Properties:

  • Code-adapted black-box consistency scoring

  • Uses structural or embedding-based similarity rather than natural-language entailment

  • Score range: \([0, 1]\)

Parameters#

When using CodeGenUQ, specify "cosine_sim" or "code_bleu" in the scorers list. You can also set sentence_transformer for cosine_sim and language for code_bleu.

Example#

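from uqlm import CodeGenUQ

# `llm` and `prompts` are assumed to be defined beforehand:
# an LLM client object and a list of prompt strings.
code_uq = CodeGenUQ(
    llm=llm,
    scorers=["cosine_sim", "code_bleu"],
    language="python",  # used by code_bleu
)

# Top-level `await` works in a notebook; in a script, call this inside an async function.
results = await code_uq.generate_and_score(prompts=prompts, num_responses=5)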

See Also#

  • CodeGenUQ - Class for code-generation uncertainty quantification