uqlm.scorers.panel.LLMPanel#

class uqlm.scorers.panel.LLMPanel(judges, llm=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None)#

Bases: UncertaintyQuantifier

__init__(judges, llm=None, system_prompt='You are a helpful assistant.', max_calls_per_min=None)#

Class for aggregating multiple instances of LLMJudge using min, max, or majority voting

Parameters:
  • judges (list of LLMJudge or BaseChatModel) – Judges to use. If a BaseChatModel is provided, an LLMJudge is instantiated from it using default parameters.

  • llm (BaseChatModel) – A langchain llm object to be passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters in the constructor of their llm object.

  • max_calls_per_min (int, default=None) – Used to control rate limiting. Applies to the original llm and to any judges constructed from BaseChatModel instances in judges.

  • system_prompt (str or None, default="You are a helpful assistant.") – Optional argument for the user to provide a custom system prompt.
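
Example (a minimal sketch; ChatOpenAI from langchain_openai is used here only for illustration, and the model names are hypothetical — any BaseChatModel instances work):

    from langchain_openai import ChatOpenAI
    from uqlm.scorers.panel import LLMPanel

    # Illustrative judge and generation models (any BaseChatModel works).
    judge_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    original_llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)

    # Each BaseChatModel in judges is wrapped in an LLMJudge with default parameters.
    panel = LLMPanel(
        judges=[judge_llm, judge_llm],
        llm=original_llm,
        max_calls_per_min=60,
    )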

Methods

__init__(judges[, llm, system_prompt, ...])

Class for aggregating multiple instances of LLMJudge using min, max, or majority voting

generate_and_score(prompts)

Generate LLM responses to provided prompts and use panel of judges to score responses for correctness.

generate_candidate_responses(prompts)

This method generates multiple responses for uncertainty estimation.

generate_original_responses(prompts)

This method generates original responses for uncertainty estimation.

score(prompts[, responses])

Use panel of judges to score provided responses for correctness.

async generate_and_score(prompts)#

Generate LLM responses to provided prompts and use panel of judges to score responses for correctness.

Parameters:

prompts (list of str) – A list of input prompts for the model.

Returns:

UQResult containing prompts, responses, Q/A concatenations, judge responses, and judge scores

Return type:

UQResult
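
Example (a minimal sketch, continuing the panel instance constructed above; the prompt is illustrative):

    import asyncio

    async def main():
        # Generates original responses and has each judge score them.
        return await panel.generate_and_score(
            prompts=["What is the capital of France?"]
        )

    uq_result = asyncio.run(main())  # UQResult with prompts, responses, and judge scores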

async generate_candidate_responses(prompts)#

This method generates multiple responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Return type:

List[List[str]]
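
For illustration (a sketch, continuing the panel above, inside an async context):

    # Returns a list of candidate-response lists, one inner list per prompt.
    candidates = await panel.generate_candidate_responses(
        prompts=["What is the capital of France?"]
    )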

async generate_original_responses(prompts)#

This method generates original responses for uncertainty estimation. If specified in the child class, all responses are postprocessed using the callable function defined by the user.

Return type:

List[str]
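
For illustration (a sketch, continuing the panel above, inside an async context):

    # Returns one original response per prompt.
    responses = await panel.generate_original_responses(
        prompts=["What is the capital of France?"]
    )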

async score(prompts, responses=None)#

Use panel of judges to score provided responses for correctness. Use if responses are already generated. Otherwise, use generate_and_score.

Parameters:
  • prompts (list of str) – A list of input prompts for the model.

  • responses (list of str, default=None) – A list of LLM responses corresponding to the provided prompts.

Returns:

UQResult containing prompts, responses, Q/A concatenations, judge responses, and judge scores

Return type:

UQResult
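
Example (a minimal sketch, continuing the panel above, inside an async context; the prompt and response are illustrative):

    result = await panel.score(
        prompts=["What is the capital of France?"],
        responses=["The capital of France is Paris."],
    )
    # result is a UQResult with Q/A concatenations, judge responses, and judge scores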
