uqlm.longform.luq.unit_response.UnitResponseScorer#
- class uqlm.longform.luq.unit_response.UnitResponseScorer(nli_model_name='microsoft/deberta-large-mnli', device=None, max_length=2000, nli_llm=None)#
Bases: ClaimScorer

- __init__(nli_model_name='microsoft/deberta-large-mnli', device=None, max_length=2000, nli_llm=None)#
Calculates variations of the LUQ and LUQ-Atomic scorers: https://arxiv.org/abs/2403.20279
- Parameters:
nli_model_name (str, default="microsoft/deberta-large-mnli") – Specifies which NLI model to use. Must be acceptable input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained()
device (str or torch.device, default=None) – Specifies the device that the NLI classifier uses for prediction. Set to "cuda" to leverage the GPU.
max_length (int, default=2000) – Specifies the maximum allowed string length. Responses longer than this value will be truncated to avoid OutOfMemoryError
nli_llm (BaseChatModel, default=None) – A LangChain chat model for LLM-based NLI inference. If provided, takes precedence over nli_model_name.
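A minimal construction sketch under assumed settings; the device choice and keeping the default NLI model are illustrative, not required:

from uqlm.longform.luq.unit_response import UnitResponseScorer

# Default DeBERTa MNLI model; device="cuda" assumes a GPU is available.
scorer = UnitResponseScorer(
    nli_model_name="microsoft/deberta-large-mnli",
    device="cuda",      # omit (None) to run on CPU
    max_length=2000,    # longer responses are truncated to avoid OutOfMemoryError
)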
Methods
__init__([nli_model_name, device, ...])
    Calculates variations of the LUQ and LUQ-Atomic scorers: https://arxiv.org/abs/2403.20279

evaluate(claim_sets, sampled_responses[, ...])
    Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.

evaluate_with_llm(claim_sets, sampled_responses)
    Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.
- evaluate(claim_sets, sampled_responses, progress_bar=None)#
Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.
- Parameters:
claim_sets (list of list of strings) – List of original responses decomposed into lists of either claims or sentences
sampled_responses (list of list of strings) – Candidate responses to be compared to the decomposed original responses
progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
- Returns:
Contains claim-level entailment, non-contradiction, and contrasted entailment scores averaged across candidate responses.
- Return type:
Instance of ClaimScores
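A hedged usage sketch for evaluate; the claim texts and sampled responses below are illustrative assumptions, not outputs of the library:

from uqlm.longform.luq.unit_response import UnitResponseScorer

scorer = UnitResponseScorer()

# One original response decomposed into two claims (illustrative text).
claim_sets = [["Paris is the capital of France.", "It has a population of about 2.1 million."]]

# Sampled candidate responses for the same prompt (illustrative text).
sampled_responses = [[
    "Paris, France's capital, is home to roughly 2 million people.",
    "The capital of France is Paris.",
]]

claim_scores = scorer.evaluate(claim_sets=claim_sets, sampled_responses=sampled_responses)
# claim_scores is a ClaimScores instance holding claim-level entailment,
# non-contradiction, and contrasted entailment scores averaged across the
# sampled responses.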
- async evaluate_with_llm(claim_sets, sampled_responses, progress_bar=None)#
Evaluate the LUQ score and claim scores for a list of claims from each original response and sampled responses.
- Parameters:
claim_sets (list of list of strings) – List of original responses decomposed into lists of either claims or sentences
sampled_responses (list of list of strings) – Candidate responses to be compared to the decomposed original responses
progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
- Returns:
Contains claim-level entailment, non-contradiction, and contrasted entailment scores averaged across candidate responses.
- Return type:
Instance of ClaimScores
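Because evaluate_with_llm is a coroutine and relies on an LLM-based NLI judge, a sketch might look like the following; the ChatOpenAI class and model name are assumptions (any LangChain BaseChatModel passed as nli_llm should work), not part of this API:

import asyncio
from langchain_openai import ChatOpenAI  # assumption: an OpenAI-backed chat model is available
from uqlm.longform.luq.unit_response import UnitResponseScorer

# Illustrative claim decomposition and sampled responses.
claim_sets = [["The Eiffel Tower is in Paris.", "It was completed in 1889."]]
sampled_responses = [["The Eiffel Tower, finished in 1889, stands in Paris."]]

async def main():
    # nli_llm takes precedence over nli_model_name; the model choice is hypothetical.
    scorer = UnitResponseScorer(nli_llm=ChatOpenAI(model="gpt-4o-mini"))
    return await scorer.evaluate_with_llm(
        claim_sets=claim_sets, sampled_responses=sampled_responses
    )

claim_scores = asyncio.run(main())  # ClaimScores instance, as described above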
References

Zhang, Caiqi, Fangyu Liu, Marco Basaldella, and Nigel Collier. "LUQ: Long-text Uncertainty Quantification for LLMs." arXiv:2403.20279 (2024). https://arxiv.org/abs/2403.20279