uqlm.longform.graph.graph_scorer.GraphScorer

class uqlm.longform.graph.graph_scorer.GraphScorer(nli_model_name='microsoft/deberta-large-mnli', device=None, max_length=2000, nli_llm=None)

Bases: ClaimScorer

__init__(nli_model_name='microsoft/deberta-large-mnli', device=None, max_length=2000, nli_llm=None)

Calculates variations of the graph-based uncertainty metrics by Jiang et al., 2024: https://arxiv.org/abs/2410.20783

Parameters:

nli_model_name (str, default="microsoft/deberta-large-mnli") – The NLI model to use. Must be a valid input to AutoTokenizer.from_pretrained() and AutoModelForSequenceClassification.from_pretrained().
device (torch.device input or torch.device object, default=None) – The device the classifiers use for prediction. Set to “cuda” to let the classifiers run on the GPU.
max_length (int, default=2000) – The maximum allowed string length. Responses longer than this value are truncated to avoid an OutOfMemoryError.
nli_llm (BaseChatModel, default=None) – A LangChain chat model for LLM-based NLI inference. If provided, it takes precedence over nli_model_name.
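
A minimal construction sketch, assuming the uqlm package is installed and that the import path matches the module name shown above. The keyword arguments mirror the documented signature. The `make_scorer` helper and the `scorer_kwargs` dictionary are hypothetical names introduced only for illustration. The import is deferred into the helper, so nothing is loaded until it is called.

```python
# Keyword arguments taken from the documented signature, with their defaults.
scorer_kwargs = {
    "nli_model_name": "microsoft/deberta-large-mnli",
    "device": None,      # e.g. "cuda" to run the NLI classifier on a GPU
    "max_length": 2000,  # longer responses are truncated
    "nli_llm": None,     # a LangChain chat model here would take precedence
}


def make_scorer(**overrides):
    """Hypothetical helper: build a GraphScorer from the defaults above."""
    # Deferred import: assumes uqlm is installed. Constructing the scorer
    # may download the NLI model on first use.
    from uqlm.longform.graph.graph_scorer import GraphScorer

    return GraphScorer(**{**scorer_kwargs, **overrides})
```

Calling `make_scorer(device="cuda")` would, under these assumptions, override only the device and keep the other defaults.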

Methods

`__init__`([nli_model_name, device, ...])	Calculates variations of the graph-based uncertainty metrics by Jiang et al., 2024: https://arxiv.org/abs/2410.20783
`evaluate`(original_claim_sets, ...[, ...])	Evaluate the graph-based scores over response sets and corresponding claim sets

async evaluate(original_claim_sets, master_claim_sets, response_sets, binary_edge_threshold=0.5, progress_bar=None)

Evaluates the graph-based scores over response sets and their corresponding claim sets.

Return type:

List[List[ClaimScore]]

Parameters:

original_claim_sets (list of list of strings) – List of original claim sets
master_claim_sets (list of list of strings) – List of master claim sets
response_sets (list of list of strings) – Candidate responses to be compared to the decomposed original responses
progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
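
A usage sketch for `evaluate`, assuming each argument holds one inner list per prompt and that `scorer` is an already constructed GraphScorer. The claim and response strings are invented toy data, and `check_alignment`, `score`, and `run` are hypothetical helper names, not part of the library. Because `evaluate` is a coroutine, it has to be awaited or driven by an event loop.

```python
import asyncio

# Toy inputs, one outer entry per prompt (all strings are made up).
original_claim_sets = [["Paris is the capital of France.", "Paris lies on the Seine."]]
master_claim_sets = [[
    "Paris is the capital of France.",
    "Paris lies on the Seine.",
    "Paris has about two million residents.",
]]
response_sets = [[
    "Paris, on the Seine, is France's capital.",
    "The capital of France is Paris, home to about two million people.",
]]


def check_alignment(*nested):
    """Hypothetical sanity check: all arguments have the same outer length."""
    return len({len(x) for x in nested}) == 1


async def score(scorer):
    # `scorer` is assumed to be a GraphScorer instance.
    return await scorer.evaluate(
        original_claim_sets=original_claim_sets,
        master_claim_sets=master_claim_sets,
        response_sets=response_sets,
        binary_edge_threshold=0.5,
    )


def run(scorer):
    # Synchronous entry point for scripts; inside a notebook or another
    # coroutine, use `await score(scorer)` instead.
    return asyncio.run(score(scorer))
```

Per the documented return type, the result would be a List[List[ClaimScore]].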
