uqlm.scorers.white_box.WhiteBoxUQ#

class uqlm.scorers.white_box.WhiteBoxUQ(llm=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, scorers=None)#

Bases: UncertaintyQuantifier

__init__(llm=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, scorers=None)#

Class for computing white-box UQ confidence scores. This class offers two confidence scores: normalized probability [1] and minimum probability [2].

Parameters:
  • llm (BaseChatModel) – A langchain llm object to be passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters in the constructor of their llm object.

  • max_calls_per_min (int, default=None) – Used to control rate limiting.

  • system_prompt (str or None, default="You are a helpful assistant.") – Optional argument allowing the user to provide a custom system prompt.

  • scorers (subset of {"imperplexity", "geometric_mean_probability", "min_probability", "max_probability"}, default=None) – Specifies which white-box scorers to include. If None, all scorers are used. See the construction sketch below.

Methods

__init__([llm, system_prompt, ...]) – Class for computing white-box UQ confidence scores.

avg_logprob(logprobs) – Compute average log probability.

generate_and_score(prompts) – Generate responses and compute white-box confidence scores based on extracted token probabilities.

generate_candidate_responses(prompts) – Generate multiple responses per prompt for uncertainty estimation.

generate_original_responses(prompts) – Generate original responses for uncertainty estimation.

get_logprobs(logprobs) – Extract token log probabilities.

score(logprobs_results[, prompts, responses]) – Compute white-box confidence scores from provided logprobs.

avg_logprob(logprobs)#

Compute the average log probability.

Return type:

float
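For context, the white-box scorers are simple functions of the per-token probabilities. The following standard definitions (illustrative values, not taken from the source) show how the average log probability relates to the geometric_mean_probability and min_probability scorers:

import math

token_probs = [0.9, 0.8, 0.95]                      # hypothetical token probabilities
logprobs = [math.log(p) for p in token_probs]

avg_logprob = sum(logprobs) / len(logprobs)         # what avg_logprob computes
geometric_mean_probability = math.exp(avg_logprob)  # exp of the average logprob, ~0.881
min_probability = min(token_probs)                  # smallest token probability, 0.8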

async generate_and_score(prompts)#

Generate responses and compute white-box confidence scores based on extracted token probabilities.

Parameters:

prompts (list of str) – A list of input prompts for the model.

Returns:

UQResult containing prompts, responses, logprobs, and white-box UQ scores

Return type:

UQResult
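A usage sketch for generate_and_score, assuming the wbuq instance constructed above; the to_df() call on the returned UQResult is an assumption made here for inspection purposes:

import asyncio

prompts = ["What is the capital of France?", "Who wrote Hamlet?"]

async def main():
    # Generates one response per prompt and scores it from the token logprobs.
    result = await wbuq.generate_and_score(prompts=prompts)
    print(result.to_df())   # assumed convenience method for tabular inspection

asyncio.run(main())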

async generate_candidate_responses(prompts)#

This method generates multiple responses per prompt for uncertainty estimation. If a postprocessor is specified in the child class, all responses are postprocessed using the user-defined callable.

Return type:

List[List[str]]

async generate_original_responses(prompts)#

This method generates the original responses for uncertainty estimation. If a postprocessor is specified in the child class, all responses are postprocessed using the user-defined callable.

Return type:

List[str]
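A brief sketch of the inherited generation helpers, again assuming the wbuq instance from the construction example; for white-box scoring only the original responses (and their logprobs) are strictly needed, while candidate responses are mainly relevant to sampling-based consistency scorers:

async def generate_only(prompts):
    originals = await wbuq.generate_original_responses(prompts)    # List[str], one per prompt
    candidates = await wbuq.generate_candidate_responses(prompts)  # List[List[str]], several per prompt
    return originals, candidates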

static get_logprobs(logprobs)#

Extract token log probabilities.

score(logprobs_results, prompts=None, responses=None)#

Compute white-box confidence scores from provided logprobs.

Parameters:
  • logprobs_results (list of logprobs_result) – List of dictionaries, each returned by BaseChatModel.agenerate.

  • prompts (list of str, default=None) – A list of input prompts for the model.

  • responses (list of str, default=None) – A list of model responses for the prompts.

Returns:

UQResult containing prompts, responses, logprobs, and white-box UQ scores

Return type:

UQResult
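A sketch of re-scoring previously extracted logprobs without calling the LLM again; the data keys used to pull logprobs and responses out of the UQResult are assumptions for illustration only:

import asyncio

async def rescore(prompts):
    # Generate once to obtain token logprobs, then re-score them directly.
    result = await wbuq.generate_and_score(prompts=prompts)
    logprobs_results = result.data["logprobs"]    # assumed key name
    responses = result.data["responses"]          # assumed key name

    # score() recomputes white-box confidence scores from the provided logprobs.
    return wbuq.score(
        logprobs_results=logprobs_results,
        prompts=prompts,
        responses=responses,
    )

rescored = asyncio.run(rescore(["What is the capital of France?"]))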

References