uqlm.scorers.black_box.BlackBoxUQ
- class uqlm.scorers.black_box.BlackBoxUQ(llm=None, scorers=None, device=None, use_best=True, nli_model_name='microsoft/deberta-large-mnli', postprocessor=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, sampling_temperature=1.0, use_n_param=False, max_length=2000, verbose=False)
Bases: UncertaintyQuantifier
- __init__(llm=None, scorers=None, device=None, use_best=True, nli_model_name='microsoft/deberta-large-mnli', postprocessor=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, sampling_temperature=1.0, use_n_param=False, max_length=2000, verbose=False)
Class for black box uncertainty quantification. Leverages multiple responses to the same prompt to evaluate consistency as an indicator of hallucination likelihood.
- Parameters:
llm (langchain BaseChatModel, default=None) – A langchain llm BaseChatModel. User is responsible for specifying temperature and other relevant parameters to the constructor of their llm object.
scorers (subset of {‘semantic_negentropy’, ‘noncontradiction’, ‘exact_match’, ‘bert_score’, ‘bleurt’, ‘cosine_sim’}, default=None) – Specifies which black box (consistency) scorers to include. If None, defaults to [“semantic_negentropy”, “noncontradiction”, “exact_match”, “cosine_sim”].
device (str or torch.device, default="cpu") – Specifies the device that the NLI model uses for prediction. Only applies to the ‘semantic_negentropy’ and ‘noncontradiction’ scorers. Pass a torch.device to leverage GPU.
use_best (bool, default=True) – Specifies whether to swap the original response for the uncertainty-minimized response based on semantic entropy clusters. Only used if scorers includes ‘semantic_negentropy’ or ‘noncontradiction’.
nli_model_name (str, default="microsoft/deberta-large-mnli") – Specifies which NLI model to use. Must be an acceptable input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained().
postprocessor (callable, default=None) – A user-defined function that takes a string input and returns a string. Used for postprocessing outputs.
system_prompt (str or None, default="You are a helpful assistant.") – Optional custom system prompt.
max_calls_per_min (int, default=None) – Specifies how many API calls to make per minute to avoid a rate limit error. By default, no limit is specified.
sampling_temperature (float, default=1.0) – The ‘temperature’ parameter the LLM uses when generating sampled responses. Must be greater than 0.
use_n_param (bool, default=False) – Specifies whether to use the n parameter of BaseChatModel. Not compatible with all BaseChatModel classes. If used, it speeds up generation substantially when num_responses > 1.
max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value will be truncated to avoid OutOfMemoryError.
verbose (bool, default=False) – Specifies whether to print the index of response currently being scored.
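As an illustration of the postprocessor parameter described above, here is a minimal sketch of a callable that takes a string and returns a string. The function name `strip_and_lowercase` and the normalization it applies are assumptions chosen for this example; they are not part of uqlm.

```python
def strip_and_lowercase(response: str) -> str:
    """Hypothetical postprocessor: collapse whitespace runs and lowercase the text."""
    # str.split() with no arguments splits on any whitespace run and drops
    # empty pieces, so joining with single spaces normalizes spacing.
    return " ".join(response.split()).lower()


cleaned = strip_and_lowercase("  The answer\n is  42. ")
print(cleaned)
```

A function like this would be supplied as `postprocessor=strip_and_lowercase` when constructing BlackBoxUQ. Normalizing responses this way may matter most for string-based scorers such as ‘exact_match’, where small formatting differences would otherwise count as disagreement.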
Methods
__init__([llm, scorers, device, use_best, ...]) – Class for black box uncertainty quantification.
generate_and_score(prompts[, num_responses]) – Generate LLM responses, sampled LLM (candidate) responses, and compute confidence scores with specified scorers for the provided prompts.
generate_candidate_responses(prompts) – This method generates multiple responses for uncertainty estimation.
generate_original_responses(prompts) – This method generates original responses for uncertainty estimation.
score(responses, sampled_responses) – Compute confidence scores with specified scorers on provided LLM responses.
- async generate_and_score(prompts, num_responses=5)
Generate LLM responses, sampled LLM (candidate) responses, and compute confidence scores with specified scorers for the provided prompts.
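Because generate_and_score is a coroutine, it has to be awaited. The following is a minimal calling sketch based only on the signatures shown on this page. The wrapper name `run_black_box_uq` is hypothetical, the choice of the ‘exact_match’ scorer is arbitrary, and `llm` is assumed to be a LangChain BaseChatModel that the caller has already configured.

```python
import asyncio


async def run_black_box_uq(llm, prompts, num_responses=5):
    """Hypothetical helper: build a BlackBoxUQ scorer and score the given prompts."""
    # Imported lazily so this sketch can be loaded without uqlm installed.
    from uqlm.scorers.black_box import BlackBoxUQ

    # 'exact_match' is one of the scorer names listed in the parameters above.
    bbuq = BlackBoxUQ(llm=llm, scorers=["exact_match"])

    # One original response plus num_responses sampled responses per prompt.
    return await bbuq.generate_and_score(prompts=prompts, num_responses=num_responses)
```

From synchronous code this could be driven with `asyncio.run(run_black_box_uq(llm, ["What is the capital of France?"]))`; inside a notebook or another coroutine, `await run_black_box_uq(...)` would be used instead.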
- async generate_candidate_responses(prompts)
This method generates multiple responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.
- Return type:
List[List[str]]
- async generate_original_responses(prompts)
This method generates original responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.
- Return type:
List[str]
- score(responses, sampled_responses)
Compute confidence scores with specified scorers on provided LLM responses. Should only be used if responses and sampled responses are already generated. Otherwise, use generate_and_score.
- Parameters:
responses (list of str, default=None) – A list of model responses for the prompts.
sampled_responses (list of list of str, default=None) – A list of lists of sampled LLM responses for each prompt. These will be used to compute consistency scores by comparing to the corresponding response from responses.
- Returns:
UQResult containing data (prompts, responses, and scores) and metadata
- Return type:
UQResult
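To make the shapes of responses and sampled_responses concrete, here is a minimal sketch of one way a consistency score could be computed: the fraction of sampled responses that exactly match the original response. This illustrates the idea only and is not uqlm's implementation; the function names `exact_match_rate` and `score_exact_match` are hypothetical.

```python
from typing import List


def exact_match_rate(response: str, sampled: List[str]) -> float:
    """Fraction of sampled responses identical to the original response."""
    if not sampled:
        raise ValueError("at least one sampled response is required")
    matches = sum(1 for s in sampled if s == response)
    return matches / len(sampled)


def score_exact_match(
    responses: List[str], sampled_responses: List[List[str]]
) -> List[float]:
    """Score each original response against its own list of sampled responses."""
    if len(responses) != len(sampled_responses):
        raise ValueError("responses and sampled_responses must have equal length")
    return [exact_match_rate(r, s) for r, s in zip(responses, sampled_responses)]


# One original response per prompt, and one list of samples per prompt.
responses = ["Paris", "4"]
sampled_responses = [
    ["Paris", "Paris", "Lyon", "Paris"],
    ["4", "5", "four", "4"],
]
scores = score_exact_match(responses, sampled_responses)
print(scores)
```

Here "four" does not count as a match for "4", which shows why a purely string-based score can understate agreement and why the semantic scorers listed above exist.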
References