uqlm.longform.benchmark.factscore_grader.FactScoreGrader
- class uqlm.longform.benchmark.factscore_grader.FactScoreGrader(llm, max_calls_per_min=None)
Bases: object
- __init__(llm, max_calls_per_min=None)
Class for grading LLM responses to questions from the FactScore dataset: https://arxiv.org/abs/2305.14251
- Parameters:
llm (langchain BaseChatModel) – A LangChain BaseChatModel instance used to grade claims against the FactScore answer key. The user is responsible for specifying temperature and other relevant parameters in the constructor of the llm object.
max_calls_per_min (int, default=None) – Maximum number of API calls to make per minute, used to avoid rate limit errors. By default, no limit is applied.
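To make the effect of a per-minute cap concrete, here is a minimal sketch of one way such a limit can be honored: run the calls in batches of at most max_calls_per_min and wait out the rest of each window. This is an illustration only, not uqlm's implementation; the helper name run_rate_limited and the window_seconds parameter are assumptions introduced for the example.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


async def run_rate_limited(
    calls: List[Callable[[], Awaitable[T]]],
    max_calls_per_min: Optional[int] = None,
    window_seconds: float = 60.0,  # hypothetical knob, shortened in tests
) -> List[T]:
    """Hypothetical helper: run async calls with at most N calls per window."""
    if max_calls_per_min is None:
        # No cap: start every call concurrently.
        return list(await asyncio.gather(*(call() for call in calls)))
    if max_calls_per_min < 1:
        raise ValueError("max_calls_per_min must be a positive integer")

    results: List[T] = []
    for start in range(0, len(calls), max_calls_per_min):
        batch = calls[start:start + max_calls_per_min]
        began = time.monotonic()
        results.extend(await asyncio.gather(*(call() for call in batch)))
        # Wait out the remainder of the window before the next batch.
        is_last_batch = start + max_calls_per_min >= len(calls)
        remaining = window_seconds - (time.monotonic() - began)
        if not is_last_batch and remaining > 0:
            await asyncio.sleep(remaining)
    return results
```

Results come back in the same order as the input calls, so they can be regrouped per question afterwards.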
Methods
__init__(llm[, max_calls_per_min]) – Class for grading LLM responses to questions from the FactScore dataset: https://arxiv.org/abs/2305.14251
construct_entailment_prompt(claim, answer) – Construct entailment prompt from claim and answer
construct_subjective_prompt(claim) – Construct prompt to evaluate whether claim is objective or subjective
evaluate_claim_objectivity(claim_sets[, ...]) – Evaluate whether claims are objective or subjective
grade_claims(claim_sets, answers[, progress_bar]) – Grade claims against FactScore answers
- construct_entailment_prompt(claim, answer)
Construct entailment prompt from claim and answer
- Return type:
str
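The prompt wording used by FactScoreGrader is not shown on this page. As a rough illustration of the (claim, answer) -> str contract, the sketch below builds an entailment-style prompt and maps a yes/no reply to a boolean. The template text and the names build_entailment_prompt and parse_yes_no are hypothetical and are not part of uqlm.

```python
def build_entailment_prompt(claim: str, answer: str) -> str:
    """Hypothetical template; not the wording used by FactScoreGrader."""
    return (
        "Context:\n"
        f"{answer.strip()}\n\n"
        "Claim:\n"
        f"{claim.strip()}\n\n"
        "Is the claim supported by the context? "
        "Answer with exactly one word: yes or no."
    )


def parse_yes_no(reply: str) -> bool:
    """Treat any reply that does not start with 'yes' as not supported."""
    return reply.strip().lower().startswith("yes")
```

Defaulting unparseable replies to False is a conservative choice for this sketch: an ambiguous reply never counts as support.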
- construct_subjective_prompt(claim)
Construct prompt to evaluate whether claim is objective or subjective
- Return type:
str
- async evaluate_claim_objectivity(claim_sets, progress_bar=None)
Evaluate whether claims are objective or subjective
- Return type:
List[List[bool]]
- Parameters:
claim_sets (List[List[str]]) – List of lists of claims to be evaluated as objective or subjective
progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
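Since the return value mirrors the nested shape of claim_sets, it can be used as a mask. The sketch below keeps only the claims flagged as objective, for example before grading them. It assumes that True marks an objective claim, which this page does not state explicitly; keep_objective_claims is a hypothetical helper, not a uqlm function.

```python
from typing import List


def keep_objective_claims(
    claim_sets: List[List[str]],
    objective_flags: List[List[bool]],
) -> List[List[str]]:
    """Hypothetical helper: drop claims whose flag is False.

    Assumption: True means "objective" in the flags returned by
    evaluate_claim_objectivity.
    """
    if len(claim_sets) != len(objective_flags):
        raise ValueError("need one flag list per claim set")
    filtered: List[List[str]] = []
    for claims, flags in zip(claim_sets, objective_flags):
        if len(claims) != len(flags):
            raise ValueError("each flag list must match its claim list in length")
        filtered.append([claim for claim, keep in zip(claims, flags) if keep])
    return filtered
```

In practice the flags would come from `await grader.evaluate_claim_objectivity(claim_sets)`, where grader is a FactScoreGrader instance.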
- async grade_claims(claim_sets, answers, progress_bar=None)
Grade claims against FactScore answers
- Return type:
List[List[bool]]
- Parameters:
claim_sets (List[List[str]]) – List of lists of claims (one list per FactScore question) to be graded
answers (List[str]) – FactScore answers to grade against (typically Wikipedia texts)
progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
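The nested booleans returned here can be reduced to one score per question. The sketch below computes the fraction of claims graded True in each claim set, in the spirit of the FactScore metric. It assumes that True means the claim is supported by the corresponding answer; support_rates is a hypothetical helper, not part of uqlm.

```python
from typing import List, Optional


def support_rates(grades: List[List[bool]]) -> List[Optional[float]]:
    """Hypothetical helper: fraction of True grades per claim set.

    Assumption: True means the claim is supported by the answer text.
    A claim set with no claims yields None instead of dividing by zero.
    """
    return [sum(g) / len(g) if g else None for g in grades]
```

In practice grades would come from `await grader.grade_claims(claim_sets, answers)`, with one entry in answers per inner list of claim_sets.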