uqlm.longform.benchmark.factscore_grader.FactScoreGrader

class uqlm.longform.benchmark.factscore_grader.FactScoreGrader(llm, max_calls_per_min=None)

Bases: object

__init__(llm, max_calls_per_min=None)

Class for grading LLM responses to questions from the FactScore dataset: https://arxiv.org/abs/2305.14251

Parameters:
  • llm (langchain BaseChatModel) – A LangChain chat model (a BaseChatModel instance) used to grade claims against the FactScore answer key. The user is responsible for passing temperature and other relevant parameters to the constructor of the llm object.

  • max_calls_per_min (int, default=None) – Maximum number of API calls to make per minute, used to avoid rate-limit errors. By default, no limit is applied.
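As a rough illustration of the two constructor parameters, the sketch below builds a grader around a chat model. The helper name `build_factscore_grader`, the choice of `ChatOpenAI` from `langchain_openai`, and the model name are assumptions made for this example; any LangChain BaseChatModel should fit the documented `llm` parameter.

```python
def build_factscore_grader(max_calls_per_min=None):
    """Hypothetical helper: create a FactScoreGrader backed by a chat model."""
    # Imports are deferred so the sketch can be loaded without uqlm or
    # langchain-openai installed; they run only when the helper is called.
    from langchain_openai import ChatOpenAI  # assumed provider, not required by uqlm
    from uqlm.longform.benchmark.factscore_grader import FactScoreGrader

    # Temperature and other generation settings belong on the LLM object,
    # as the llm parameter description notes. The model name is illustrative.
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    # max_calls_per_min=None leaves API calls unthrottled (the documented default).
    return FactScoreGrader(llm=llm, max_calls_per_min=max_calls_per_min)
```

Calling `build_factscore_grader(max_calls_per_min=60)` would cap grading at 60 API calls per minute; the value 60 is only an example and should match the provider's actual rate limit.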

Methods

__init__(llm[, max_calls_per_min])

Class for grading LLM responses to questions from the FactScore dataset: https://arxiv.org/abs/2305.14251

construct_entailment_prompt(claim, answer)

Construct entailment prompt from claim and answer

construct_subjective_prompt(claim)

Construct prompt to evaluate whether claim is objective or subjective

evaluate_claim_objectivity(claim_sets[, ...])

Evaluate whether claims are objective or subjective

grade_claims(claim_sets, answers[, progress_bar])

Grade claims against FactScore answers

construct_entailment_prompt(claim, answer)

Construct entailment prompt from claim and answer

Return type:

str

construct_subjective_prompt(claim)

Construct prompt to evaluate whether claim is objective or subjective

Return type:

str

async evaluate_claim_objectivity(claim_sets, progress_bar=None)

Evaluate whether claims are objective or subjective

Return type:

List[List[bool]]

Parameters:
  • claim_sets (List[List[str]]) – List of lists of claims to be evaluated as objective or subjective

  • progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
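Because the method returns one list of booleans per claim set, a common follow-up is to filter the claims with those flags. The sketch below is a minimal illustration: `objectivity_flags` and `keep_flagged` are hypothetical helper names, and it assumes that True marks an objective claim, which this reference does not state explicitly.

```python
from typing import List


async def objectivity_flags(grader, claim_sets: List[List[str]]) -> List[List[bool]]:
    """Await the documented coroutine on an existing FactScoreGrader instance."""
    return await grader.evaluate_claim_objectivity(claim_sets=claim_sets)


def keep_flagged(claim_sets: List[List[str]], flags: List[List[bool]]) -> List[List[str]]:
    """Keep only the claims whose flag is True (assumed to mean 'objective')."""
    if len(claim_sets) != len(flags):
        raise ValueError("claim_sets and flags must have the same outer length")
    kept = []
    for claims, claim_flags in zip(claim_sets, flags):
        if len(claims) != len(claim_flags):
            raise ValueError("each claim list needs exactly one flag per claim")
        kept.append([claim for claim, flag in zip(claims, claim_flags) if flag])
    return kept
```

In a script, the coroutine can be driven with `asyncio.run(objectivity_flags(grader, claim_sets))`; in a notebook, a plain `await` is usually enough.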

async grade_claims(claim_sets, answers, progress_bar=None)

Grade claims against FactScore answers

Return type:

List[List[bool]]

Parameters:
  • claim_sets (List[List[str]]) – List of lists of claims (one list per FactScore question) to be graded

  • answers (List[str]) – FactScore answers to grade against (typically Wikipedia texts)

  • progress_bar (rich.progress.Progress, default=None) – If provided, displays a progress bar while scoring responses
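The nested boolean output lends itself to a per-question summary. The sketch below awaits the documented coroutine and then computes the fraction of claims graded True for each question. The helper names are hypothetical, and two assumptions are made: that True means the claim is supported by the answer text, and that `claim_sets` and `answers` are aligned one-to-one by question.

```python
from typing import List, Optional


async def grade_question_claims(grader, claim_sets: List[List[str]], answers: List[str]) -> List[List[bool]]:
    """Grade each question's claims against its answer text."""
    # Assumed alignment: claim_sets[i] is graded against answers[i].
    if len(claim_sets) != len(answers):
        raise ValueError("expected one answer per claim set")
    return await grader.grade_claims(claim_sets=claim_sets, answers=answers)


def supported_fraction(grades: List[List[bool]]) -> List[Optional[float]]:
    """Fraction of claims graded True per question; None when a question has no claims."""
    return [sum(g) / len(g) if g else None for g in grades]
```

For example, `supported_fraction([[True, False, True, True], [], [False]])` gives `[0.75, None, 0.0]`. Returning None for an empty claim list avoids a division by zero and keeps the result aligned with the input questions.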

References