uqlm.scorers.white_box.WhiteBoxUQ#
- class uqlm.scorers.white_box.WhiteBoxUQ(llm=None, system_prompt=None, max_calls_per_min=None, scorers=None, sampling_temperature=1.0, top_k_logprobs=15, use_n_param=False, length_normalize=True, prompts_in_nli=True, device=None)#
Bases: UncertaintyQuantifier

- __init__(llm=None, system_prompt=None, max_calls_per_min=None, scorers=None, sampling_temperature=1.0, top_k_logprobs=15, use_n_param=False, length_normalize=True, prompts_in_nli=True, device=None)#
Class for computing white-box UQ confidence scores. This class offers two confidence scores, normalized probability [1] and minimum probability [2].
- Parameters:
llm (BaseChatModel) – A langchain llm object to get passed to chain constructor. User is responsible for specifying temperature and other relevant parameters to the constructor of their llm object.
max_calls_per_min (int, default=None) – Used to control rate limiting.
system_prompt (str, default=None) – Optional argument for user to provide custom system prompt. If prompts are list of strings and system_prompt is None, defaults to “You are a helpful assistant.”
scorers (List[str], default=None) – Specifies which white-box UQ scorers to include. Must be subset of [“normalized_probability”, “min_probability”, “sequence_probability”, “max_token_negentropy”, “mean_token_negentropy”, “probability_margin”, “monte_carlo_probability”, “consistency_and_confidence”, “semantic_negentropy”, “semantic_density”, “p_true”]. If None, defaults to [“normalized_probability”, “min_probability”].
sampling_temperature (float, default=1.0) – The ‘temperature’ parameter for llm model to generate sampled LLM responses. Must be greater than 0.
use_n_param (bool, default=False) – Specifies whether to use n parameter for BaseChatModel. Not compatible with all BaseChatModel classes. If used, it speeds up the generation process substantially when num_responses > 1.
prompts_in_nli (bool, default=True) – Specifies whether to use the prompts in the NLI inputs for semantic entropy and semantic density scorers.
length_normalize (bool, default=True) – Specifies whether to length-normalize the logprobs. This attribute affects the response probability computation for four scorers: “semantic_negentropy”, “semantic_density”, “monte_carlo_probability”, and “consistency_and_confidence”.
device (str or torch.device, default="cpu") – Specifies the device on which the NLI model runs. Only applies to the ‘semantic_negentropy’ and ‘semantic_density’ scorers. Pass a torch.device to leverage a GPU.
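The two default scorers can be understood directly in terms of token logprobs: normalized probability is the geometric mean of the token probabilities (the exponential of the mean logprob), and minimum probability is the probability of the least-confident token. The sketch below illustrates that arithmetic with hypothetical helper names; it is not uqlm’s internal implementation, whose exact formulas may differ.

```python
import math

def normalized_probability(logprobs):
    """Length-normalized sequence probability: exp of the mean token logprob."""
    return math.exp(sum(logprobs) / len(logprobs))

def min_probability(logprobs):
    """Probability of the least-confident token in the response."""
    return math.exp(min(logprobs))

# Token logprobs for a hypothetical 3-token response
logprobs = [-0.1, -0.5, -2.0]
normalized_probability(logprobs)  # geometric mean of token probabilities
min_probability(logprobs)         # probability of the weakest token
```

Both scores fall in (0, 1], with higher values indicating higher confidence; a single low-probability token drags min_probability down even when the rest of the response is confident.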
Methods
__init__([llm, system_prompt, ...]) – Class for computing white-box UQ confidence scores.
generate_and_score(prompts[, num_responses, ...]) – Generate responses and compute white-box confidence scores based on extracted token probabilities.
generate_candidate_responses(prompts[, ...]) – This method generates multiple responses for uncertainty estimation.
generate_original_responses(prompts[, ...]) – This method generates original responses for uncertainty estimation.
score(logprobs_results[, prompts, ...]) – Compute white-box confidence scores from provided logprobs.
- async generate_and_score(prompts, num_responses=5, show_progress_bars=True)#
Generate responses and compute white-box confidence scores based on extracted token probabilities.
- Parameters:
prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.
num_responses (int, default=5) – The number of sampled responses used by the multi-generation white-box scorers. Only applies to the “monte_carlo_probability”, “consistency_and_confidence”, “semantic_negentropy”, and “semantic_density” scorers.
show_progress_bars (bool, default=True) – If True, displays a progress bar while generating and scoring responses.
- Returns:
UQResult containing prompts, responses, logprobs, and white-box UQ scores
- Return type:
UQResult
- async generate_candidate_responses(prompts, num_responses=5, progress_bar=None)#
This method generates multiple responses for uncertainty estimation. If a postprocessor is specified in the child class, all responses are postprocessed using the user-defined callable.
- Return type:
List[List[str]]
- Parameters:
prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.
num_responses (int, default=5) – The number of sampled responses used to compute consistency.
progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.
- Returns:
A list of sampled responses for each prompt.
- Return type:
list of list of str
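The sampled responses produced by this method feed the multi-generation scorers. For “monte_carlo_probability”, one plausible reading (an assumption here, not necessarily uqlm’s exact formula) is to average the probability of each sampled response, with each response probability optionally length-normalized as controlled by the length_normalize attribute:

```python
import math

def sequence_probability(logprobs, length_normalize=True):
    # Joint probability of one response; the length-normalized form is the
    # geometric mean of its token probabilities.
    total = sum(logprobs)
    return math.exp(total / len(logprobs)) if length_normalize else math.exp(total)

def monte_carlo_probability(sampled_logprobs, length_normalize=True):
    # Average the (optionally length-normalized) probability of each
    # sampled response generated for a single prompt.
    probs = [sequence_probability(lp, length_normalize) for lp in sampled_logprobs]
    return sum(probs) / len(probs)

# Logprobs for three hypothetical sampled responses to one prompt
samples = [[-0.2, -0.3], [-1.0, -0.5], [-0.1, -0.1]]
monte_carlo_probability(samples)
```

Higher sampling_temperature spreads the sampled responses out, which typically lowers this average for prompts the model is unsure about.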
- async generate_original_responses(prompts, top_k_logprobs=None, progress_bar=None)#
This method generates original responses for uncertainty estimation. If a postprocessor is specified in the child class, all responses are postprocessed using the user-defined callable.
- Return type:
List[str]
- Parameters:
prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.
progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.
- Returns:
A list of original responses for each prompt.
- Return type:
list of str
- async score(logprobs_results, prompts=None, responses=None, sampled_responses=None, sampled_logprobs_results=None, show_progress_bars=True, _display_header=True)#
Compute white-box confidence scores from provided logprobs.
- Parameters:
logprobs_results (list of logprobs_result) – List of dictionaries, each returned by BaseChatModel.agenerate.
prompts (list of str, default=None) – A list of input prompts for the model. Required only for “p_true” scorer.
responses (list of str, default=None) – A list of model responses for the prompts. Required for “monte_carlo_probability”, “consistency_and_confidence”, “semantic_negentropy”, “semantic_density”, “p_true” scorers.
sampled_responses (list of list of str, default=None) – A list of lists of sampled LLM responses for each prompt. These will be used to compute consistency scores by comparing to the corresponding response from responses. Required for “monte_carlo_probability”, “consistency_and_confidence”, “semantic_negentropy”, “semantic_density” scorers.
sampled_logprobs_results (list of lists of logprobs_result) – List of list of dictionaries, each returned by BaseChatModel.agenerate corresponding to sampled_responses. Required only for “monte_carlo_probability”, “semantic_negentropy”, “semantic_density” scorers.
show_progress_bars (bool, default=True) – If True, displays a progress bar while scoring responses.
- Returns:
UQResult containing prompts, responses, logprobs, and white-box UQ scores
- Return type:
UQResult
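The entropy-based scorers operate on the top-k logprob distribution recorded for each token (controlled by top_k_logprobs). As an illustration, one common convention (an assumption here, not necessarily uqlm’s exact formula) renormalizes the top-k probabilities, computes Shannon entropy, scales by log(k), and takes the complement, so 1.0 means fully confident and 0.0 means uniform over the top-k candidates:

```python
import math

def token_negentropy(top_logprobs):
    # Renormalize the top-k probabilities, compute Shannon entropy, scale by
    # log(k), and take the complement: 1.0 = fully confident, 0.0 = uniform.
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))

def mean_token_negentropy(per_token_top_logprobs):
    # Average token-level confidence across the response
    scores = [token_negentropy(t) for t in per_token_top_logprobs]
    return sum(scores) / len(scores)

def max_token_negentropy(per_token_top_logprobs):
    # Confidence of the single most-confident token in the response
    return max(token_negentropy(t) for t in per_token_top_logprobs)
```

With a uniform top-2 distribution the score is 0.0, while a sharply peaked distribution pushes it toward 1.0.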
References