uqlm.scorers.codegen.CodeGenUQ

class uqlm.scorers.codegen.CodeGenUQ(llm=None, scorers=None, equivalence_llm=None, system_prompt=None, max_calls_per_min=None, sampling_temperature=1.0, top_k_logprobs=15, length_normalize=True, max_length=2000, sentence_transformer='jinaai/jina-embeddings-v2-base-code', language='python', retries=5)

Bases: ShortFormUQ

__init__(llm=None, scorers=None, equivalence_llm=None, system_prompt=None, max_calls_per_min=None, sampling_temperature=1.0, top_k_logprobs=15, length_normalize=True, max_length=2000, sentence_transformer='jinaai/jina-embeddings-v2-base-code', language='python', retries=5)

Class for computing confidence scores for code generation use cases.

Parameters:
  • llm (BaseChatModel) – A LangChain LLM object that is passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters in the constructor of the llm object.

  • scorers (List[str], default=None) – Specifies which scorers to include. Must be a subset of [“sequence_probability”, “min_probability”, “mean_token_negentropy”, “min_token_negentropy”, “probability_margin”, “p_true”, “consistency_and_confidence”, “monte_carlo_probability”, “code_bleu”, “functional_equivalence_rate”, “verbalized_confidence”, “functional_negentropy”, “functional_sets_confidence”, “cosine_sim”]. If None, defaults to [“functional_equivalence_rate”, “cosine_sim”].

  • equivalence_llm (BaseChatModel, default=None) – A LangChain LLM object that is passed to the chain constructor. It is used by the CodeEquivalence and FunctionalEntropy scorers. The user is responsible for specifying temperature and other relevant parameters in the constructor of the equivalence_llm object.

  • system_prompt (str, default=None) – Optional custom system prompt. If prompts are a list of strings and system_prompt is None, defaults to “You are a helpful assistant.”

  • max_calls_per_min (int, default=None) – Specifies how many API calls to make per minute to avoid a rate limit error. By default, no limit is applied.

  • sampling_temperature (float, default=1.0) – The temperature the LLM uses when generating sampled responses. Must be greater than 0.

  • top_k_logprobs (int, default=15) – Specifies the number of logprobs to return for each response.

  • length_normalize (bool, default=True) – Specifies whether to length normalize the logprobs.

  • max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value are truncated to avoid an OutOfMemoryError.

  • sentence_transformer (str, default="jinaai/jina-embeddings-v2-base-code") – Specifies which Hugging Face sentence transformer to use when computing cosine similarity for consistency_and_confidence. See https://huggingface.co/jinaai?sort_models=likes#models for more information. The recommended sentence transformer is ‘jinaai/jina-embeddings-v2-base-code’.

  • language (str, default="python") – Specifies the language of the code. This is used when computing CodeBleu and CodeEquivalence scores (if “code_bleu” or “functional_equivalence_rate” is in scorers) and may require the user to install additional dependencies. Must be one of [“python”, “java”, “sql”].

  • retries (int, default=5) – Specifies the number of retries to make if the equivalence score is not found.
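
The constraints stated in the parameter list (allowed scorer names, supported languages, a positive sampling temperature) can be checked before a CodeGenUQ object is constructed. The following is a minimal sketch under stated assumptions: `build_codegen_kwargs` is a hypothetical helper that is not part of uqlm, and the allowed values are simply copied from the descriptions above.

```python
from typing import Any, Dict, List, Optional

# Scorer names and languages copied from the parameter descriptions above.
ALLOWED_SCORERS = {
    "sequence_probability", "min_probability", "mean_token_negentropy",
    "min_token_negentropy", "probability_margin", "p_true",
    "consistency_and_confidence", "monte_carlo_probability", "code_bleu",
    "functional_equivalence_rate", "verbalized_confidence",
    "functional_negentropy", "functional_sets_confidence", "cosine_sim",
}
DEFAULT_SCORERS = ["functional_equivalence_rate", "cosine_sim"]
ALLOWED_LANGUAGES = {"python", "java", "sql"}


def build_codegen_kwargs(  # hypothetical helper, not part of uqlm
    scorers: Optional[List[str]] = None,
    language: str = "python",
    sampling_temperature: float = 1.0,
    max_length: int = 2000,
) -> Dict[str, Any]:
    """Check a few documented constraints and return constructor kwargs."""
    # Mirror the documented default when no scorers are given.
    chosen = list(DEFAULT_SCORERS) if scorers is None else list(scorers)
    unknown = sorted(set(chosen) - ALLOWED_SCORERS)
    if unknown:
        raise ValueError(f"Unknown scorers: {unknown}")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if sampling_temperature <= 0:
        raise ValueError("sampling_temperature must be greater than 0")
    return {
        "scorers": chosen,
        "language": language,
        "sampling_temperature": sampling_temperature,
        "max_length": max_length,
    }


kwargs = build_codegen_kwargs(scorers=["code_bleu", "cosine_sim"], language="java")
```

The resulting dictionary could then be passed along with a LangChain chat model you construct yourself, for example `CodeGenUQ(llm=my_llm, **kwargs)`, where `my_llm` is a placeholder name.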

Methods

__init__([llm, scorers, equivalence_llm, ...])

Class for computing confidence scores for code generation use cases.

generate_and_score(prompts[, num_responses, ...])

generate_candidate_responses(prompts[, ...])

This method generates multiple responses for uncertainty estimation.

generate_original_responses(prompts[, ...])

This method generates original responses for uncertainty estimation.

score(prompts, responses, sampled_responses, ...)

async generate_candidate_responses(prompts, num_responses=5, progress_bar=None)

This method generates multiple responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in the list may be strings or lists of BaseMessage. If providing input of type List[List[BaseMessage]], see https://python.langchain.com/docs/concepts/messages/#langchain-messages for details.

  • num_responses (int, default=5) – The number of sampled responses used to compute consistency.

  • progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.

Returns:

A list of sampled responses for each prompt.

Return type:

list of list of str
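
To make the return shape concrete, here is a minimal sketch of sampling `num_responses` candidates per prompt concurrently. It is not the library's implementation: `fake_llm_call` and `sample_candidates` are hypothetical stand-ins, and only the nested-list shape (one inner list per prompt) is taken from the description above.

```python
import asyncio
from typing import List


async def fake_llm_call(prompt: str, sample_id: int) -> str:
    """Hypothetical stand-in for one sampled LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"# sample {sample_id} for: {prompt}"


async def sample_candidates(prompts: List[str], num_responses: int = 5) -> List[List[str]]:
    """Return one inner list of num_responses samples per prompt."""

    async def sample_one(prompt: str) -> List[str]:
        calls = [fake_llm_call(prompt, i) for i in range(num_responses)]
        return list(await asyncio.gather(*calls))

    # gather preserves input order, so sampled[i] belongs to prompts[i].
    return list(await asyncio.gather(*(sample_one(p) for p in prompts)))


prompts = [
    "Write a function that reverses a string.",
    "Write a SQL query that counts rows.",
]
sampled = asyncio.run(sample_candidates(prompts, num_responses=3))
```

Here `sampled` has one entry per prompt, and each entry holds three strings.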

async generate_original_responses(prompts, top_k_logprobs=None, progress_bar=None)

This method generates original responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in the list may be strings or lists of BaseMessage. If providing input of type List[List[BaseMessage]], see https://python.langchain.com/docs/concepts/messages/#langchain-messages for details.

  • progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.

Returns:

A list of original responses for each prompt.

Return type:

list of str
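
The sketch below illustrates the one-response-per-prompt shape and the optional postprocessing step mentioned in the description. It rests on assumptions: `fake_llm_call`, `original_responses`, and the `postprocessor` argument are hypothetical, since this page does not specify how a child class registers its callable.

```python
import asyncio
from typing import Callable, List, Optional


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a single LLM call; pads output with whitespace."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"\n  # answer to: {prompt}\n\n"


async def original_responses(
    prompts: List[str],
    postprocessor: Optional[Callable[[str], str]] = None,  # hypothetical argument
) -> List[str]:
    """Return exactly one response per prompt, optionally postprocessed."""
    raw = await asyncio.gather(*(fake_llm_call(p) for p in prompts))
    if postprocessor is None:
        return list(raw)
    # Apply the user-defined callable to every response.
    return [postprocessor(r) for r in raw]


prompts = ["Write a function that adds two numbers.", "Write a loop over a list."]
responses = asyncio.run(original_responses(prompts, postprocessor=str.strip))
```

Each element of `responses` corresponds to the prompt at the same index, with surrounding whitespace removed by the example postprocessor.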

References