uqlm.scorers.density.SemanticDensity#

class uqlm.scorers.density.SemanticDensity(llm=None, postprocessor=None, device=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, use_n_param=False, sampling_temperature=1.0, verbose=False, nli_model_name='microsoft/deberta-large-mnli', max_length=2000, return_responses='all', length_normalize=True)#

Bases: UncertaintyQuantifier

__init__(llm=None, postprocessor=None, device=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None, use_n_param=False, sampling_temperature=1.0, verbose=False, nli_model_name='microsoft/deberta-large-mnli', max_length=2000, return_responses='all', length_normalize=True)#

Class for computing semantic density and associated confidence scores. For more on semantic density, refer to Qiu et al. (2024) [1].

Parameters:
  • llm (langchain BaseChatModel, default=None) – A langchain llm BaseChatModel. User is responsible for specifying temperature and other relevant parameters to the constructor of their llm object.

  • postprocessor (callable, default=None) – A user-defined function that takes a string input and returns a string. Used for postprocessing outputs before black-box comparisons.

  • device (str or torch.device, default="cpu") – Specifies the device that the NLI model uses for prediction. Pass a torch.device to leverage GPU.

  • system_prompt (str or None, default="You are a helpful assistant.") – Optional argument allowing the user to provide a custom system prompt.

  • max_calls_per_min (int, default=None) – Specifies how many API calls to make per minute to avoid rate-limit errors. By default, no limit is imposed.

  • sampling_temperature (float, default=1.0) – The temperature used when generating sampled LLM responses. Must be greater than 0.

  • use_n_param (bool, default=False) – Specifies whether to use the n parameter of BaseChatModel. Not compatible with all BaseChatModel classes. When supported, it substantially speeds up generation when num_responses > 1.

  • verbose (bool, default=False) – Specifies whether to print the index of response currently being scored.

  • return_responses (str, default="all") – If a postprocessor is used, specifies whether to return only postprocessed responses, only raw responses, or both. Specified with ‘postprocessed’, ‘raw’, or ‘all’, respectively.

  • nli_model_name (str, default="microsoft/deberta-large-mnli") – Specifies which NLI model to use. Must be an acceptable input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained().

  • max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value will be truncated to avoid OutOfMemoryError.

  • length_normalize (bool, default=True) – Determines whether response probabilities are length-normalized. Setting this to True is recommended when longer responses are expected.
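
As an illustration of the length_normalize option, the sketch below (plain Python, not uqlm's internal code) contrasts a raw joint sequence probability with a length-normalized one, computed as the geometric mean of the token probabilities:

```python
import math

def sequence_probability(token_logprobs, length_normalize=True):
    # Sum of token logprobs = log of the joint sequence probability.
    total = sum(token_logprobs)
    if length_normalize:
        # Averaging instead of summing: exp(mean logprob) is the
        # geometric mean of token probabilities, so a response is no
        # longer penalized merely for being longer.
        total /= len(token_logprobs)
    return math.exp(total)

# Two responses with identical per-token quality but different lengths.
short = [-0.5] * 2
longer = [-0.5] * 10
```

Without normalization the longer response scores exp(-5) versus exp(-1) for the shorter one; with normalization both score exp(-0.5). This is why length_normalize=True is recommended when response lengths vary.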

Methods

__init__([llm, postprocessor, device, ...])

Class for computing semantic density and associated confidence scores.

generate_and_score(prompts[, num_responses, ...])

Evaluate semantic density score on LLM responses for the provided prompts.

generate_candidate_responses(prompts[, ...])

This method generates multiple responses for uncertainty estimation.

generate_original_responses(prompts[, ...])

This method generates original responses for uncertainty estimation.

score([prompts, responses, ...])

Evaluate semantic density score on LLM responses for the provided prompts.

async generate_and_score(prompts, num_responses=5, show_progress_bars=True)#

Evaluate semantic density score on LLM responses for the provided prompts.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.

  • num_responses (int, default=5) – The number of sampled responses used to compute consistency.

  • show_progress_bars (bool, default=True) – If True, displays a progress bar while generating and scoring responses.

Returns:

UQResult, containing data (prompts, responses, and semantic density score) and metadata

Return type:

UQResult
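
To build intuition for the score this method returns, the toy function below is a schematic, pure-Python illustration of the idea behind semantic density: probability mass from the sampled responses, weighted by how strongly each sample semantically supports the target response. The entailment values here are hypothetical stand-ins for NLI model outputs; the library's actual computation follows Qiu et al. (2024) and is more involved.

```python
def toy_semantic_density(samples):
    """samples: list of (probability, entailment) pairs, where
    entailment in [0, 1] is a stand-in for an NLI model's judgment of
    how strongly the sample supports the target response."""
    weighted = sum(p * e for p, e in samples)
    total = sum(p for p, _ in samples)
    # Probability mass concentrated on semantically consistent samples
    # yields a score near 1; contradictory samples pull it toward 0.
    return weighted / total if total > 0 else 0.0

# Hypothetical values: three samples agreeing with the target
# response, one contradicting it.
samples = [(0.30, 0.95), (0.25, 0.90), (0.25, 0.85), (0.20, 0.05)]
score = toy_semantic_density(samples)
```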

async generate_candidate_responses(prompts, num_responses=5, progress_bar=None)#

This method generates multiple responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.

  • num_responses (int, default=5) – The number of sampled responses used to compute consistency.

  • progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.

Returns:

A list of sampled responses for each prompt.

Return type:

list of list of str

async generate_original_responses(prompts, top_k_logprobs=None, progress_bar=None)#

This method generates original responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.

  • progress_bar (rich.progress.Progress, default=None) – A progress bar object to display progress.

Returns:

A list of original responses for each prompt.

Return type:

list of str

score(prompts=None, responses=None, sampled_responses=None, logprobs_results=None, sampled_logprobs_results=None, show_progress_bars=True, _display_header=True)#

Evaluate semantic density score on LLM responses for the provided prompts.

Parameters:
  • prompts (List[Union[str, List[BaseMessage]]]) – List of prompts from which LLM responses will be generated. Prompts in list may be strings or lists of BaseMessage. If providing input type List[List[BaseMessage]], refer to https://python.langchain.com/docs/concepts/messages/#langchain-messages for support.

  • responses (list of str, default=None) – A list of model responses for the prompts. If not provided, responses will be generated with the provided LLM.

  • sampled_responses (list of list of str, default=None) – A list of lists of sampled model responses for each prompt. These will be used to compute consistency scores by comparing to the corresponding response from responses. If not provided, sampled_responses will be generated with the provided LLM.

  • logprobs_results (list of list of dict, default=None) – A list of lists of logprobs results for each prompt. If not provided, logprobs will be generated with the provided LLM.

  • sampled_logprobs_results (list of list of list of dict, default=None) – A list of lists of lists of logprobs results for each prompt. If not provided, sampled_logprobs will be generated with the provided LLM.

  • show_progress_bars (bool, default=True) – If True, displays a progress bar while scoring responses.

Returns:

UQResult, containing data (responses, sampled responses, and semantic density score) and metadata

Return type:

UQResult
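
When responses and logprobs are precomputed, score() expects the nested structures described above. The snippet below sketches one plausible shape for these arguments; the token/logprob dictionaries mirror what chat-model providers typically return, but the exact keys depend on the provider and are an assumption here:

```python
# One original response per prompt, plus sampled alternatives.
responses = ["The capital of France is Paris."]
sampled_responses = [["Paris.", "It's Paris.", "Paris is the capital."]]

# logprobs_results: one list of token dicts per original response.
logprobs_results = [[{"token": "Paris", "logprob": -0.02},
                     {"token": ".", "logprob": -0.10}]]

# sampled_logprobs_results: per prompt, one token-dict list per
# sampled response (so its inner length matches sampled_responses).
sampled_logprobs_results = [[[{"token": "Paris", "logprob": -0.05}],
                             [{"token": "It's", "logprob": -0.40},
                              {"token": "Paris", "logprob": -0.03}],
                             [{"token": "Paris", "logprob": -0.04}]]]

# Per-token logprobs for the first prompt's original answer.
token_logprobs = [d["logprob"] for d in logprobs_results[0]]
```

With these in hand, a call of the form score(prompts=..., responses=responses, sampled_responses=sampled_responses, logprobs_results=logprobs_results, sampled_logprobs_results=sampled_logprobs_results) would skip generation entirely; treat this call shape as a sketch against the signature above rather than a verified invocation.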

References