uqlm.longform.luq.matched_unit.MatchedUnitScorer

class uqlm.longform.luq.matched_unit.MatchedUnitScorer(consistency_functions=['nli', 'bert_score', 'cosine_sim'], device=None, transformer='all-MiniLM-L6-v2', nli_model_name='microsoft/deberta-large-mnli', max_length=2000)

Bases: ClaimScorer

__init__(consistency_functions=['nli', 'bert_score', 'cosine_sim'], device=None, transformer='all-MiniLM-L6-v2', nli_model_name='microsoft/deberta-large-mnli', max_length=2000)

MatchedUnitScorer calculates variations of the LUQ, LUQ-Atomic, or LUQ-Pair scores.

Parameters:
  • consistency_functions (List[str], default=["nli", "bert_score", "cosine_sim"]) – Specifies which semantic consistency functions to use for scoring. Must be a subset of ["nli", "bert_score", "cosine_sim"].

  • device (torch.device input or torch.device object, default=None) – Specifies the device that the classifiers use for prediction. Set to "cuda" to let the classifiers use the GPU.

  • nli_model_name (str, default="microsoft/deberta-large-mnli") – Specifies which NLI model to use. Must be an acceptable input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained().

  • max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value are truncated to avoid an OutOfMemoryError.
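The two constraints described above (the allowed set of consistency functions and the length cap) can be sketched in plain Python. This is only an illustration: `check_consistency_functions` and `truncate` are hypothetical helpers, not part of uqlm, and truncating by character count is an assumption about how "string length" is measured.

```python
from typing import List

# Allowed names, taken from the parameter description above.
ALLOWED_FUNCTIONS = {"nli", "bert_score", "cosine_sim"}


def check_consistency_functions(names: List[str]) -> List[str]:
    """Hypothetical helper: reject names outside the documented set."""
    unknown = sorted(set(names) - ALLOWED_FUNCTIONS)
    if unknown:
        raise ValueError(f"Unsupported consistency functions: {unknown}")
    return list(names)


def truncate(text: str, max_length: int = 2000) -> str:
    """Hypothetical helper: keep at most max_length characters (assumed unit)."""
    return text[:max_length]


selected = check_consistency_functions(["nli", "cosine_sim"])
shortened = truncate("x" * 2500)
```

Passing a name outside the documented set, such as `"rouge"`, raises `ValueError` in this sketch; how the library itself reacts to an invalid name is not stated here.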

Methods

__init__([consistency_functions, device, ...])

MatchedUnitScorer calculates variations of the LUQ, LUQ-Atomic, or LUQ-Pair scores.

evaluate(claim_sets[, sampled_claim_sets, ...])

Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.

evaluate(claim_sets, sampled_claim_sets=None, progress_bar=None)

Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.

Parameters:
  • claim_sets (list of list of strings) – List of original responses, each decomposed into a list of claims or sentences.

  • sampled_claim_sets (list of list of list of strings) – Decomposed sampled responses to be compared with the decomposed original responses: for each original response, a list of sampled responses, each itself a list of claims or sentences.

  • progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses.
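The nesting of the two inputs is easier to see with a small example. The sketch below uses made-up claims and a hypothetical `shapes_match` helper to check the structure. The `score` function uses only the constructor and `evaluate` signatures documented on this page; it is defined but not called, because running it requires uqlm to be installed and loads the NLI model.

```python
from typing import List

# One entry per original response, each decomposed into claims (illustrative data).
claim_sets: List[List[str]] = [
    ["Paris is the capital of France.", "Paris lies on the Seine."],
    ["Water boils at 100 degrees Celsius at sea level."],
]

# For each original response: several sampled responses, each decomposed the same way.
sampled_claim_sets: List[List[List[str]]] = [
    [
        ["The capital of France is Paris."],
        ["Paris is in France.", "The Seine flows through Paris."],
    ],
    [
        ["At sea level, water boils at 100 C."],
        ["Water boils at 212 degrees Fahrenheit."],
    ],
]


def shapes_match(claims: List[List[str]], samples: List[List[List[str]]]) -> bool:
    """Hypothetical check: one group of samples per original response, all leaves are str."""
    if len(claims) != len(samples):
        return False
    claims_ok = all(isinstance(c, str) for response in claims for c in response)
    samples_ok = all(
        isinstance(c, str) for group in samples for response in group for c in response
    )
    return claims_ok and samples_ok


def score(claims: List[List[str]], samples: List[List[List[str]]]):
    """Not called here: needs uqlm installed and loads the NLI model."""
    from uqlm.longform.luq.matched_unit import MatchedUnitScorer

    scorer = MatchedUnitScorer(consistency_functions=["nli"])
    return scorer.evaluate(claim_sets=claims, sampled_claim_sets=samples)


inputs_ok = shapes_match(claim_sets, sampled_claim_sets)
```

Calling `score(claim_sets, sampled_claim_sets)` in an environment with uqlm available should return the `ClaimScores` object described under Returns.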

Returns:

A ClaimScores instance containing claim-level entailment, non-contradiction, and contrasted entailment scores, each averaged across candidate responses.

Return type:

Instance of ClaimScores
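To make "averaged across candidate responses" concrete, the sketch below averages per-sample scores for each claim. The numbers are made up, and `average_across_samples` is a hypothetical helper; how the library derives each per-sample score from the NLI model is not covered here.

```python
from statistics import mean
from typing import List

# Illustrative numbers only: per_sample_scores[i][j] is the score of
# claim i measured against sampled response j.
per_sample_scores: List[List[float]] = [
    [1.0, 0.5, 0.0],
    [0.25, 0.75, 0.5],
]


def average_across_samples(scores: List[List[float]]) -> List[float]:
    """Hypothetical helper: one averaged score per claim."""
    return [mean(row) for row in scores]


claim_level = average_across_samples(per_sample_scores)
```

Each entry of `claim_level` corresponds to one claim of the original response, so the result has the same length as that response's claim list.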

References