Binary Judge (True/False)

true_false

The binary judge template instructs an LLM to classify each question-response pair as either correct or incorrect.

Definition

This template reduces the ternary approach to two categories by dropping the uncertain option:

\[\begin{split}J(y_i) = \begin{cases} 0 & \text{LLM states response is incorrect} \\ 1 & \text{LLM states response is correct} \end{cases}\end{split}\]

The judge function \(J: \mathcal{Y} \rightarrow \{0, 1\}\) maps responses to binary scores.

Key Properties:

  • Simpler binary classification without an uncertain category

  • Forces the judge to make a definitive decision

  • Useful when you want clear-cut correct/incorrect labels
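As an illustration of the judge function J defined above, here is a minimal sketch of the verdict-to-score mapping in Python. The function name `binary_score` and the decision to raise an error on any verdict other than "correct" or "incorrect" are assumptions made for this example; they are not taken from the library and may differ from how it handles unparseable judge output.

```python
def binary_score(verdict: str) -> int:
    """Map a judge's textual verdict to J(y_i) in {0, 1}.

    Hypothetical helper for illustration only.
    """
    # Normalize: trim whitespace, lowercase, drop trailing punctuation
    text = verdict.strip().lower().rstrip(".!")
    if text == "incorrect":
        return 0
    if text == "correct":
        return 1
    # Assumption: with no uncertain category, any other text is a parsing failure
    raise ValueError(f"Unrecognized verdict: {verdict!r}")


print(binary_score("Correct"))       # prints 1
print(binary_score(" incorrect. "))  # prints 0
```

Because there is no third category, the sketch has to decide what to do with hedged answers such as "uncertain"; raising an error is one option, and mapping them to 0 is another.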

How It Works

  1. Present the judge LLM with the original question and response

  2. Ask the judge to classify the response as “correct” or “incorrect”

  3. Map the classification to a numerical score (1 or 0)

Use this template when you prefer binary decisions without an intermediate uncertainty category.
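The three steps under How It Works can be sketched end to end in plain Python. Everything here is hypothetical: `build_judge_prompt`, `score_with_judge`, and `toy_judge` are names invented for this example, the prompt wording is not the template's actual text, and `toy_judge` is a stand-in for a real judge LLM. Treating any verdict other than "correct" as 0 is also an assumption of the sketch.

```python
from typing import Callable, List


def build_judge_prompt(question: str, response: str) -> str:
    # Step 1: present the judge with the original question and the response
    return (
        f"Question: {question}\n"
        f"Proposed answer: {response}\n"
        'Is the proposed answer correct? Reply with one word: "correct" or "incorrect".'
    )


def score_with_judge(
    judge: Callable[[str], str], prompts: List[str], responses: List[str]
) -> List[int]:
    scores = []
    for question, response in zip(prompts, responses):
        # Step 2: ask the judge for a "correct" / "incorrect" verdict
        verdict = judge(build_judge_prompt(question, response)).strip().lower()
        # Step 3: map the verdict to 1 or 0 (assumption: anything else counts as 0)
        scores.append(1 if verdict == "correct" else 0)
    return scores


def toy_judge(prompt: str) -> str:
    # Stand-in for a judge LLM: accepts only the answer "4"
    return "correct" if "Proposed answer: 4\n" in prompt else "incorrect"


scores = score_with_judge(
    toy_judge,
    prompts=["What is 2 + 2?", "What is 2 + 2?"],
    responses=["4", "5"],
)
print(scores)  # prints [1, 0]
```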

Parameters

When using LLMJudge, specify scoring_template="true_false". When using LLMPanel, include "true_false" in the scoring_templates list, with one entry per judge.

Example

from uqlm.judges import LLMJudge

# Initialize with the binary template (judge_llm is an LLM object defined elsewhere)
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="true_false"
)

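# Score responses (prompts and responses are equal-length lists of strings;
# top-level await assumes an async context such as a Jupyter notebook)
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses
)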

Using with LLMPanel:

from uqlm import LLMPanel

# Create a panel with binary scoring
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2],
    scoring_templates=["true_false"] * 2
)

results = await panel.generate_and_score(prompts=prompts)

References

See Also