Binary Judge (True/False)#
true_false
The binary judge template instructs an LLM to classify a question-response pair as either correct or incorrect.
Definition#
This template reduces the ternary approach to two categories, correct and incorrect, with no intermediate uncertain category.
The judge function \(J: \mathcal{Y} \rightarrow \{0, 1\}\) maps responses to binary scores.
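Written out piecewise, and assuming that "correct" maps to 1 as the score mapping under How It Works suggests, the judge function is:

\[
J(y) =
\begin{cases}
1 & \text{if the judge labels } y \text{ correct} \\
0 & \text{if the judge labels } y \text{ incorrect}
\end{cases}
\]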
Key Properties:
Simpler binary classification without an uncertain category
Forces the judge to make a definitive decision
Useful when you want clear-cut correct/incorrect labels
How It Works#
Present the judge LLM with the original question and response
Ask the judge to classify the response as “correct” or “incorrect”
Map the classification to a numerical score (1 or 0)
Use this template when you prefer binary decisions without an intermediate uncertainty category.
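The three steps above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the function names `build_judge_prompt` and `parse_binary_verdict` and the prompt wording are hypothetical, and the actual template text may differ.

```python
from typing import Optional


def build_judge_prompt(question: str, response: str) -> str:
    """Step 1: show the judge the original question and the response.

    The wording here is an assumption made for illustration.
    """
    return (
        f"Question: {question}\n"
        f"Proposed answer: {response}\n"
        'Reply with exactly one word: "correct" or "incorrect".'
    )


def parse_binary_verdict(judge_reply: str) -> Optional[int]:
    """Steps 2 and 3: read the judge's label and map it to 1 or 0.

    Returns None when neither label can be recognised in the reply.
    """
    text = judge_reply.strip().lower()
    # Check "incorrect" first because it contains "correct" as a substring.
    if "incorrect" in text:
        return 0
    if "correct" in text:
        return 1
    return None
```

Returning `None` for an unrecognised reply is a design choice of this sketch; it keeps parsing failures distinguishable from a genuine "incorrect" verdict.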
Parameters#
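When using LLMJudge, specify scoring_template="true_false". When using LLMPanel, pass scoring_templates as a list with one entry per judge, for example ["true_false"] * 2 for a two-judge panel.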
Example#
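from uqlm.judges import LLMJudge

# Initialize with binary template.
# judge_llm is assumed to be a LangChain chat model created elsewhere.
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="true_false",
)

# Score responses (top-level await works in a notebook;
# in a script, run this inside an async function)
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses,
)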
Using with LLMPanel:
from uqlm import LLMPanel

# Create a panel with binary scoring: one template per judge
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2],
    scoring_templates=["true_false"] * 2,
)

# Generate responses with original_llm, then score them with each judge
results = await panel.generate_and_score(prompts=prompts)
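With binary scoring, each judge contributes a 0 or 1 per response, so simple summaries across judges are easy to interpret. The sketch below shows one way such votes could be combined; `aggregate_binary_votes` is a hypothetical helper written for illustration and is not part of the LLMPanel API, whose returned fields may differ.

```python
from statistics import mean
from typing import Dict, List


def aggregate_binary_votes(votes: List[int]) -> Dict[str, float]:
    """Summarise 0/1 scores from several judges for one response."""
    if not votes:
        raise ValueError("need at least one judge score")
    return {
        "avg": mean(votes),  # fraction of judges that said "correct"
        "min": min(votes),   # 1 only if every judge said "correct"
        "max": max(votes),   # 1 if at least one judge said "correct"
    }


# Hypothetical scores from a two-judge panel for a single response
summary = aggregate_binary_votes([1, 0])
```

Here `min` acts as a strict unanimity check and `max` as a lenient one, while `avg` gives the share of judges that accepted the response.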
References#
Chen, J. & Mueller, J. (2023). Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness. arXiv.
Luo, H., et al. (2023). ChatGPT as a Factual Inconsistency Evaluator for Text Summarization. arXiv.
See Also#
LLMJudge - Single LLM judge class
LLMPanel - Panel of multiple judges
Ternary Judge (True/False/Uncertain) - Ternary scoring template with uncertainty