uqlm.black_box.consistency.ConsistencyScorer

class uqlm.black_box.consistency.ConsistencyScorer(nli_model_name='microsoft/deberta-large-mnli', max_length=2000, use_best=False, scorers=['noncontradiction', 'entailment'])

Bases: SimilarityScorer

__init__(nli_model_name='microsoft/deberta-large-mnli', max_length=2000, use_best=False, scorers=['noncontradiction', 'entailment'])

Initialize the ConsistencyScorer.

Parameters:

use_best (bool, default=False) – Specifies whether to swap the original response for the uncertainty-minimized response based on semantic entropy clusters.

Methods

__init__([nli_model_name, max_length, ...])

Initialize the ConsistencyScorer.

evaluate(responses, sampled_responses[, ...])

Evaluate confidence scores on LLM responses.

evaluate(responses, sampled_responses, available_nli_scores={}, progress_bar=None)

Evaluate confidence scores on LLM responses.

Return type:

Dict[str, Any]

Parameters:
  • responses (list of strings) – Original LLM responses, one per prompt

  • sampled_responses (list of list of strings) – Sampled candidate responses to be compared to the original response

  • progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
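The parameter types above suggest that responses and sampled_responses are parallel lists, with sampled_responses[i] holding the candidates sampled for responses[i]. A minimal sketch of a pre-flight check under that assumption follows. The helper name check_inputs is hypothetical and is not part of uqlm.

```python
from typing import List


def check_inputs(responses: List[str], sampled_responses: List[List[str]]) -> int:
    """Hypothetical helper (not part of uqlm).

    Assumes sampled_responses[i] contains the candidates for responses[i],
    so both outer lists must have the same length.
    Returns the number of original responses.
    """
    if len(responses) != len(sampled_responses):
        raise ValueError(
            f"expected one list of samples per response, got "
            f"{len(responses)} responses and {len(sampled_responses)} sample lists"
        )
    for i, samples in enumerate(sampled_responses):
        # Each entry should itself be a list of strings, not a bare string.
        if isinstance(samples, str):
            raise ValueError(f"sampled_responses[{i}] must be a list of strings")
    return len(responses)


# Illustrative data only: two original responses, three samples each.
responses = ["Paris", "4"]
sampled_responses = [
    ["Paris", "Paris, France", "Lyon"],
    ["4", "four", "4"],
]
n_prompts = check_inputs(responses, sampled_responses)
```

Running a check like this before scoring can surface a flat list passed by mistake in place of a list of lists.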

Returns:

Dictionary containing mean NLI scores and, optionally, semantic entropy scores. The dictionary also contains the original and sampled responses, which are updated if use_best is True.

Return type:

Dict
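As a rough usage sketch, the constructor and evaluate call documented above might be combined as follows. It uses only the arguments shown in the signatures on this page, spelled out with their documented defaults. The wrapper name score_consistency is hypothetical, and the keys of the returned dictionary are not listed here, so the sketch does not assume any. The import is done lazily because it requires uqlm to be installed, and constructing the scorer presumably loads the NLI model.

```python
from typing import Any, Dict, List


def score_consistency(
    responses: List[str], sampled_responses: List[List[str]]
) -> Dict[str, Any]:
    """Hypothetical wrapper around ConsistencyScorer.evaluate."""
    # Lazy import: requires the uqlm package to be installed.
    from uqlm.black_box.consistency import ConsistencyScorer

    # Arguments below are the documented defaults, written out for clarity.
    scorer = ConsistencyScorer(
        nli_model_name="microsoft/deberta-large-mnli",
        max_length=2000,
        use_best=False,
        scorers=["noncontradiction", "entailment"],
    )
    return scorer.evaluate(
        responses=responses,
        sampled_responses=sampled_responses,
    )
```

A call such as score_consistency(["Paris"], [["Paris", "Lyon"]]) would return the dictionary described under Returns. Inspecting its keys with sorted(result) is a safe first step, since they are not enumerated on this page.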

References