Likert Scale Judge

likert

The Likert scale judge template instructs an LLM to score a question-response pair on a 5-point scale. The rating is then normalized to the \([0, 1]\) range.

Definition

The judge rates each response on a 5-point Likert scale, and the rating is converted to a normalized score:

\[\begin{split}J(y_i) = \begin{cases} 0 & \text{LLM states response is completely incorrect} \\ 0.25 & \text{LLM states response is mostly incorrect} \\ 0.5 & \text{LLM states response is partially correct} \\ 0.75 & \text{LLM states response is mostly correct} \\ 1 & \text{LLM states response is completely correct} \end{cases}\end{split}\]

Key Properties:

- Structured 5-point scale familiar from survey research
- Balanced granularity between binary and continuous scoring
- Normalized to \([0, 1]\) for consistency with other scorers

How It Works

1. Present the judge LLM with the original question and response
2. Ask the judge to rate on a 5-point scale:
   - 1: Completely incorrect
   - 2: Mostly incorrect
   - 3: Partially correct
   - 4: Mostly correct
   - 5: Completely correct
3. Normalize the score to \([0, 1]\) by mapping 1→0, 2→0.25, 3→0.5, 4→0.75, 5→1
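
The normalization step is equivalent to the linear map (rating - 1) / 4. The following is a minimal sketch of that arithmetic, assuming the judge's rating has already been parsed into an integer. `normalize_likert` is a hypothetical helper written for illustration and is not part of uqlm.

```python
def normalize_likert(rating: int) -> float:
    """Map a 1-5 Likert rating to [0, 1] (hypothetical helper, not a uqlm function)."""
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError(f"expected an integer rating in 1..5, got {rating!r}")
    return (rating - 1) / 4


# Normalized scores for all five ratings, in order 1 through 5
normalized = [normalize_likert(r) for r in range(1, 6)]
```

Each step on the scale is worth 0.25, so adjacent ratings are always the same distance apart after normalization.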

The Likert scale provides more structure than continuous scoring while offering more granularity than ternary classification.

Parameters

When using LLMJudge, specify scoring_template="likert". When using LLMPanel, pass "likert" in the scoring_templates list, with one entry per judge.

Example

from uqlm.judges import LLMJudge

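# judge_llm is assumed to be an already-configured chat model;
# prompts and responses are assumed to be equal-length lists of strings.
# Initialize with likert template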
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="likert"
)

# Score responses (top-level await works in a notebook;
# in a script, call this from inside an async function)
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses
)

# Scores will be one of: 0, 0.25, 0.5, 0.75, 1
print(result["scores"])
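
To read normalized scores back as ratings, the mapping from the definition can be inverted. The sketch below uses a hard-coded list of example scores in place of `result["scores"]`. `LIKERT_LABELS` and `describe_score` are hypothetical names introduced here for illustration, not uqlm APIs.

```python
# Labels from the scale definition, keyed by normalized score.
LIKERT_LABELS = {
    0.0: "Completely incorrect",
    0.25: "Mostly incorrect",
    0.5: "Partially correct",
    0.75: "Mostly correct",
    1.0: "Completely correct",
}


def describe_score(score: float) -> str:
    """Return the Likert label for a normalized score (hypothetical helper)."""
    try:
        return LIKERT_LABELS[float(score)]
    except KeyError:
        raise ValueError(f"not a valid normalized Likert score: {score!r}") from None


example_scores = [1, 0.75, 0.25]  # illustrative values, not real judge output
labels = [describe_score(s) for s in example_scores]
```

Exact dictionary lookup is safe here because all five normalized values are exactly representable as floats.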

Using with LLMPanel:

from uqlm import LLMPanel

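# Create a panel with likert scoring: original_llm generates the responses,
# and each judge in judges scores them with its own template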
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2],
    scoring_templates=["likert"] * 2
)

results = await panel.generate_and_score(prompts=prompts)
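
With several judges, each response receives one Likert score per judge, and these are usually combined into a single value. The sketch below shows a few common aggregations on made-up scores. It illustrates the idea only and does not reproduce LLMPanel's output format. `aggregate_judge_scores` is a hypothetical helper, not a uqlm function.

```python
from statistics import mean, median


def aggregate_judge_scores(scores_by_judge: list[list[float]]) -> list[dict[str, float]]:
    """Combine per-judge scores for each response (hypothetical helper).

    scores_by_judge[j][i] is judge j's normalized score for response i.
    """
    aggregated = []
    # zip(*...) regroups the scores so each tuple holds one response's scores.
    for per_response in zip(*scores_by_judge):
        aggregated.append(
            {
                "avg": mean(per_response),
                "min": min(per_response),
                "max": max(per_response),
                "median": median(per_response),
            }
        )
    return aggregated


# Two judges scoring three responses (illustrative values only)
judge_1 = [1.0, 0.5, 0.25]
judge_2 = [0.75, 0.5, 0.0]
summary = aggregate_judge_scores([judge_1, judge_2])
```

Taking the minimum is the most conservative choice, since a response only scores highly when every judge rates it highly. The average smooths over disagreement between judges.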

References

Bai, Y., et al. (2023). Benchmarking ChatGPT for Retrieving and Recommending Medical Information. arXiv.