Code-Generation Scorers

Code-generation uncertainty quantification uses CodeGenUQ to score generated code. Its scorers either reuse existing short-form UQ methods or adapt black-box consistency scoring to code by comparing sampled generations for structural similarity or functional equivalence.

Key Characteristics:

  • White-box compatibility: Token-probability scorers are identical to the corresponding white-box scorers.

  • Code-aware consistency: Black-box scorers compare sampled code generations using code embeddings, CodeBLEU, or LLM-judged functional equivalence.

  • Score range: \([0, 1]\), where higher values indicate higher confidence.
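As a rough illustration of code-aware consistency, the sketch below averages a pairwise similarity over sampled generations to produce a score in \([0, 1]\). It is not UQLM's implementation: `token_similarity` and `consistency_score` are hypothetical names, the mean-over-all-pairs aggregation is an assumption, and the standard library's `difflib.SequenceMatcher` stands in for a real structural metric such as CodeBLEU or code-embedding similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Sequence


def token_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in metric: similarity in [0, 1] over whitespace-split tokens.

    A real code-aware scorer would use CodeBLEU or code embeddings instead.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def consistency_score(
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_similarity,
) -> float:
    """Mean pairwise similarity across sampled generations (assumed aggregation).

    Higher values mean the samples agree more, i.e. higher confidence.
    """
    if len(samples) < 2:
        raise ValueError("need at least two sampled generations")
    pairs = list(combinations(samples, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    generations = [
        "def add(a, b): return a + b",
        "def add(a, b): return a + b",
        "def add(x, y): return x + y",
    ]
    print(consistency_score(generations))
```

Passing a different `similarity` callable swaps the metric without changing the aggregation, which is one way to keep the scoring logic independent of any code-specific model.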

Trade-offs:

  • Dependency requirements: Some code-aware scorers require code-specific models or language tooling.

  • Higher cost: Functional equivalence scorers require additional LLM calls.
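To make the cost trade-off concrete, here is a minimal sketch of judge-based functional-equivalence scoring, assuming the original response is compared against each sampled generation with one judge call per sample. The names `equivalence_score` and `CountingStubJudge` are hypothetical, and the stub judge only compares code with whitespace removed; in practice the judge would be an LLM call, which is where the extra cost comes from.

```python
from typing import Callable, Sequence


def equivalence_score(
    response: str,
    samples: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Fraction of sampled generations the judge deems equivalent to `response`.

    Assumed design: one judge call per sample, so cost grows linearly
    with the number of sampled generations.
    """
    if not samples:
        raise ValueError("need at least one sampled generation")
    verdicts = [judge(response, sample) for sample in samples]
    return sum(verdicts) / len(verdicts)


class CountingStubJudge:
    """Hypothetical stand-in for an LLM judge that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: str, b: str) -> bool:
        self.calls += 1
        # Placeholder check: equal once all whitespace is removed.
        # An LLM judge would assess functional behaviour instead.
        return "".join(a.split()) == "".join(b.split())


if __name__ == "__main__":
    judge = CountingStubJudge()
    score = equivalence_score(
        "def f(x): return x+1",
        ["def f(x): return x + 1", "def f(x): return x-1"],
        judge,
    )
    print(score, judge.calls)
```

With two sampled generations the judge is invoked twice, so under this assumed design the number of additional calls equals the number of samples.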

Code-Generation Scoring Methods

UQLM offers three main categories of code-generation scoring methods: