uqlm.white_box.sampled_logprobs.SampledLogprobsScorer#

class uqlm.white_box.sampled_logprobs.SampledLogprobsScorer(scorers=['semantic_negentropy', 'semantic_density', 'monte_carlo_probability', 'consistency_and_confidence'], llm=None, nli_model_name='microsoft/deberta-large-mnli', max_length=2000, prompts_in_nli=True, length_normalize=True, device=None)#

Bases: LogprobsScorer

__init__(scorers=['semantic_negentropy', 'semantic_density', 'monte_carlo_probability', 'consistency_and_confidence'], llm=None, nli_model_name='microsoft/deberta-large-mnli', max_length=2000, prompts_in_nli=True, length_normalize=True, device=None)#

Initialize the SampledLogprobsScorer.

Parameters:
  • scorers (List[str], default=SAMPLED_LOGPROBS_SCORER_NAMES) – Specifies which scorers to compute. Must be a subset of ["semantic_negentropy", "semantic_density", "monte_carlo_probability", "consistency_and_confidence"].

  • llm (BaseChatModel, default=None) – Specifies the LLM to use. Must be a BaseChatModel.

  • nli_model_name (str, default="microsoft/deberta-large-mnli") – Specifies which NLI model to use. Must be acceptable input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained().

  • max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value will be truncated to avoid OutOfMemoryError.

  • prompts_in_nli (bool, default=True) – Specifies whether to use the prompts in the NLI inputs for semantic entropy and semantic density scorers.

  • length_normalize (bool, default=True) – Specifies whether to length-normalize the logprobs. This setting affects the response probability computation for three scorers: semantic_negentropy, semantic_density, and monte_carlo_probability.

  • device (str or torch.device, default="cpu") – Specifies the device the NLI model uses for prediction. Only applies to the 'semantic_negentropy' and 'semantic_density' scorers. Pass a torch.device to leverage GPU.
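For intuition, the effect of length_normalize on the response probability used by the semantic_negentropy, semantic_density, and monte_carlo_probability scorers can be sketched as follows. This is an illustrative sketch of the standard technique, not uqlm's exact implementation:

```python
import math

def response_probability(logprobs, length_normalize=True):
    """Aggregate per-token log probabilities into one response probability.

    Illustrative sketch only; not uqlm's exact implementation.
    """
    if length_normalize:
        # Mean of logprobs = log of the geometric mean of token probabilities,
        # so longer responses are not penalized simply for having more tokens.
        return math.exp(sum(logprobs) / len(logprobs))
    # Unnormalized: joint probability of the whole token sequence.
    return math.exp(sum(logprobs))

# Two tokens, each with probability 0.5:
tokens = [math.log(0.5), math.log(0.5)]
normalized = response_probability(tokens, length_normalize=True)     # ≈ 0.5
unnormalized = response_probability(tokens, length_normalize=False)  # ≈ 0.25
```

With normalization, a long fluent response and a short one are compared on a per-token basis, which is why it is the default.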

Methods

__init__([scorers, llm, nli_model_name, ...])
    Initialize the SampledLogprobsScorer.

compute_consistency_confidence(responses, ...)

compute_semantic_density(responses, ...[, ...])

compute_semantic_negentropy(responses, ...)

evaluate(responses, sampled_responses, ...)

extract_logprobs(single_response_logprobs)
    Extract log probabilities from token data.

extract_probs(single_response_logprobs)
    Extract probabilities from token data.

extract_top_logprobs(single_response_logprobs)
    Extract top log probabilities for each token.

monte_carlo_probability(responses, ...)

static extract_logprobs(single_response_logprobs)#

Extract log probabilities from token data.

Return type:

ndarray
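A minimal sketch of this kind of extraction, assuming OpenAI-style token data in which each per-token entry is a dict with a "logprob" key (the real method's input format may differ):

```python
import numpy as np

def extract_logprobs_sketch(single_response_logprobs):
    # Assumed token-data shape: a list of per-token dicts, each carrying a
    # "logprob" entry (OpenAI-style). This format is an assumption, not
    # necessarily what uqlm receives internally.
    return np.array([token["logprob"] for token in single_response_logprobs])

token_data = [
    {"token": "Paris", "logprob": -0.1},
    {"token": ".", "logprob": -0.4},
]
logprobs = extract_logprobs_sketch(token_data)  # array([-0.1, -0.4])
```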

extract_probs(single_response_logprobs)#

Extract probabilities from token data.

Return type:

ndarray
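The probabilities are simply the exponentiated log probabilities. A sketch under the same assumed OpenAI-style token-data shape (per-token dicts with a "logprob" key; this is an illustrative assumption):

```python
import numpy as np

def extract_probs_sketch(single_response_logprobs):
    # Assumed shape: per-token dicts with a "logprob" key.
    logprobs = np.array([token["logprob"] for token in single_response_logprobs])
    # Token probabilities are recovered by exponentiating the log probabilities.
    return np.exp(logprobs)

probs = extract_probs_sketch([{"logprob": 0.0}, {"logprob": -0.4}])
# probs[0] == 1.0, since log(1.0) == 0.0
```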

static extract_top_logprobs(single_response_logprobs)#

Extract top log probabilities for each token.

Return type:

List[ndarray]
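Since the return type is a list of arrays (one array of candidate log probabilities per generated token), the extraction can be sketched as below, again assuming an OpenAI-style shape in which each per-token dict carries a "top_logprobs" list of candidate-token dicts (an assumption for illustration):

```python
import numpy as np

def extract_top_logprobs_sketch(single_response_logprobs):
    # Assumed OpenAI-style shape: each per-token dict has a "top_logprobs"
    # list of candidate-token dicts, each with its own "logprob" entry.
    return [
        np.array([alt["logprob"] for alt in token["top_logprobs"]])
        for token in single_response_logprobs
    ]

token_data = [
    {"token": "Paris",
     "top_logprobs": [{"token": "Paris", "logprob": -0.1},
                      {"token": "London", "logprob": -2.5}]},
]
top = extract_top_logprobs_sketch(token_data)  # one array per token
```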
