Code-Generation Scorers
CodeGenUQ quantifies uncertainty for generated code. Its scorers either reuse existing short-form UQ methods or adapt black-box consistency scoring to code by comparing structural similarity or functional equivalence across sampled generations.
Key Characteristics:
White-box compatibility: Token-probability scorers are identical to the corresponding white-box scorers.
Code-aware consistency: Black-box scorers compare sampled code generations using code embeddings, CodeBLEU, or LLM-judged functional equivalence.
Score range: \([0, 1]\), where higher values indicate higher confidence.
Trade-offs:
Dependency requirements: Some code-aware scorers require code-specific models or language tooling.
Higher cost: Functional equivalence scorers require additional LLM calls.
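The consistency idea behind the black-box scorers can be illustrated with a minimal sketch. This is not UQLM's API: the function names `text_similarity` and `consistency_score` are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used only as a dependency-free stand-in for the code embeddings, CodeBLEU, or LLM-judged equivalence mentioned above. The sketch assumes the score is the mean similarity between the original generation and each sampled generation, which keeps it in \([0, 1]\).

```python
from difflib import SequenceMatcher
from typing import Callable, List


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] based on matching character runs.

    Assumption: a real code-aware scorer would replace this with code
    embeddings, CodeBLEU, or an LLM judge of functional equivalence.
    """
    return SequenceMatcher(None, a, b).ratio()


def consistency_score(
    original: str,
    samples: List[str],
    similarity: Callable[[str, str], float] = text_similarity,
) -> float:
    """Mean similarity between the original generation and each sample.

    Higher values mean the sampled generations agree more with the
    original, which is read as higher confidence.
    """
    if not samples:
        raise ValueError("at least one sampled generation is required")
    scores = [similarity(original, sample) for sample in samples]
    return sum(scores) / len(scores)


# Hypothetical generations for the same prompt.
original = "def add(a, b):\n    return a + b"
agreeing = ["def add(a, b):\n    return a + b", "def add(x, y):\n    return x + y"]
diverging = ["print('hello')", "import os"]

high = consistency_score(original, agreeing)
low = consistency_score(original, diverging)
```

Passing `similarity` as a parameter keeps the aggregation separate from the comparison, so a heavier code-aware metric can be swapped in without changing how the score is averaged.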
Code-Generation Scoring Methods
UQLM offers three main categories of code-generation scoring methods: