Likert Scale Judge#
Template name: likert
The Likert scale judge template instructs an LLM to score a question-response pair on a 5-point scale, which is then normalized to the \([0, 1]\) range.
Definition#
The judge is asked to score on a 5-point Likert scale, and the raw rating \(s \in \{1, 2, 3, 4, 5\}\) is converted to a normalized score:

\[\text{score} = \frac{s - 1}{4} \in \{0, 0.25, 0.5, 0.75, 1\}\]
Key Properties:
Structured 5-point scale familiar from survey research
Balanced granularity between binary and continuous scoring
Normalized to \([0, 1]\) for consistency with other scorers
How It Works#
Present the judge LLM with the original question and response
Ask the judge to rate on a 5-point scale:
1: Completely incorrect
2: Mostly incorrect
3: Partially correct
4: Mostly correct
5: Completely correct
Normalize the score to \([0, 1]\) by mapping 1→0, 2→0.25, 3→0.5, 4→0.75, 5→1
The Likert scale provides more structure than continuous scoring while offering more granularity than ternary classification.
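For illustration, here is a minimal sketch of the normalization step. The standalone normalize_likert helper is hypothetical; uqlm applies this conversion internally when the likert template is used.

def normalize_likert(rating: int) -> float:
    # Map a raw 1-5 Likert rating onto the [0, 1] range
    if rating not in {1, 2, 3, 4, 5}:
        raise ValueError(f"Expected a rating between 1 and 5, got {rating}")
    # 1 -> 0.0, 2 -> 0.25, 3 -> 0.5, 4 -> 0.75, 5 -> 1.0
    return (rating - 1) / 4

print([normalize_likert(r) for r in range(1, 6)])  # [0.0, 0.25, 0.5, 0.75, 1.0]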
Parameters#
When using LLMJudge or LLMPanel, specify scoring_template="likert".
Example#
from uqlm.judges import LLMJudge

# Initialize with likert template
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="likert"
)

# Score responses
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses
)

# Scores will be one of: 0, 0.25, 0.5, 0.75, 1
print(result["scores"])
Using with LLMPanel:
from uqlm import LLMPanel

# Create a panel with likert scoring
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2],
    scoring_templates=["likert"] * 2
)

results = await panel.generate_and_score(prompts=prompts)
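If you want a single panel-level score per response, one simple option is to average the judges' normalized Likert scores. The snippet below is a plain-Python illustration under that assumption and does not call the uqlm API; per_judge_scores is a hypothetical stand-in for the per-judge scores you extract from the panel results.

# Hypothetical normalized Likert scores from two judges over three responses
per_judge_scores = [
    [1.0, 0.75, 0.5],    # judge 1
    [0.75, 0.75, 0.25],  # judge 2
]

# Average across judges to obtain one ensemble score per response
ensemble_scores = [sum(scores) / len(scores) for scores in zip(*per_judge_scores)]
print(ensemble_scores)  # [0.875, 0.75, 0.375]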
References#
Bai, Y., et al. (2023). Benchmarking ChatGPT for Retrieving and Recommending Medical Information. arXiv.
See Also#
LLMJudge - Single LLM judge class
LLMPanel - Panel of multiple judges
Continuous Judge - Continuous scoring alternative
Ternary Judge (True/False/Uncertain) - Simpler 3-point classification