Code Similarity Scorers#
Definition#
Code similarity scorers generate sampled code responses from the same prompt and compare each sampled response with the original response. Higher average similarity indicates higher confidence.
cosine_sim embeds the original and sampled code responses with a code embedding model, then computes normalized average cosine similarity:
code_bleu computes average CodeBLEU similarity between the original code response and sampled responses:
Key Properties:
Code-adapted black-box consistency scoring
Uses structural or embedding-based similarity rather than natural-language entailment
Score range: \([0, 1]\)
Parameters#
When using CodeGenUQ, specify "cosine_sim" or "code_bleu" in the scorers list. You can also set sentence_transformer for cosine_sim and language for code_bleu.
Example#
from uqlm import CodeGenUQ
code_uq = CodeGenUQ(
llm=llm,
scorers=["cosine_sim", "code_bleu"],
language="python",
)
results = await code_uq.generate_and_score(prompts=prompts, num_responses=5)
See Also#
CodeGenUQ- Class for code-generation uncertainty quantification