Ternary Judge (True/False/Uncertain)#
true_false_uncertain
The ternary judge template instructs an LLM to score a question-response concatenation as either incorrect, uncertain, or correct using a carefully constructed prompt.
Definition#
We follow the approach proposed by Chen & Mueller (2023), in which an LLM is instructed to classify a question-response pair into one of three categories. Each category is mapped to a numerical score: “incorrect” to 0, “uncertain” to 0.5, and “correct” to 1.
The judge function \(J: \mathcal{Y} \rightarrow \{0, 0.5, 1\}\) therefore maps each response to one of these three confidence scores.
Key Properties:
Three-way classification allows expressing uncertainty
Intermediate score (0.5) useful for ambiguous cases
Can be used with self-judging or external judge LLMs
How It Works#
Present the judge LLM with the original question and response
Ask the judge to classify the response as “incorrect”, “uncertain”, or “correct”
Map the classification to a numerical score (0, 0.5, or 1)
The ternary format is the default template in UQLM and is recommended for most use cases where distinguishing between definitely wrong, uncertain, and definitely correct responses is valuable.
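The three steps above can be sketched in plain Python. This is a minimal illustration rather than UQLM's implementation: the prompt wording, `parse_label`, `ternary_score`, and the `ask_judge` callable are all hypothetical names introduced here, and the actual template text used by the library may differ.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording (assumption); not the library's actual template.
JUDGE_INSTRUCTION = (
    "Question: {question}\n"
    "Proposed answer: {response}\n"
    "Reply with exactly one word: correct, incorrect, or uncertain."
)

# Category-to-score mapping described in the text.
SCORE_MAP = {"incorrect": 0.0, "uncertain": 0.5, "correct": 1.0}

# Word boundaries keep "correct" from matching inside "incorrect".
LABEL_PATTERN = re.compile(r"\b(incorrect|uncertain|correct)\b")


def parse_label(judge_output: str) -> Optional[str]:
    """Return the first category word found in the judge's reply, or None."""
    match = LABEL_PATTERN.search(judge_output.lower())
    return match.group(1) if match else None


def ternary_score(
    question: str, response: str, ask_judge: Callable[[str], str]
) -> Optional[float]:
    """Steps 1-3: build the prompt, ask the judge, map the label to a score."""
    prompt = JUDGE_INSTRUCTION.format(question=question, response=response)
    label = parse_label(ask_judge(prompt))
    return SCORE_MAP[label] if label is not None else None


def stub_judge(prompt: str) -> str:
    # Stand-in for a real judge LLM call; always answers "uncertain".
    return "Uncertain."


score = ternary_score("What is the capital of France?", "Paris", stub_judge)
```

Returning `None` for an unparseable reply, rather than defaulting to 0.5, keeps a judge failure distinguishable from a genuine "uncertain" verdict; whether the library makes the same choice is not stated here.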
Parameters#
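When using LLMJudge, specify scoring_template="true_false_uncertain". When using LLMPanel, pass the template name once per judge through the scoring_templates list, as in the panel example below.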
Example#
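from uqlm.judges import LLMJudge

# judge_llm, prompts, and responses are assumed to be defined earlier.
# Initialize with ternary template (default)
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="true_false_uncertain",
)

# Score responses (await requires an async context, e.g. a notebook cell)
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses,
)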
Using with LLMPanel for multiple judges:
from uqlm import LLMPanel

# original_llm, the judge LLMs, and prompts are assumed to be defined earlier.
# Create a panel with multiple judges using ternary template
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2, judge_llm3],
    scoring_templates=["true_false_uncertain"] * 3,
)

# Generate and score (await requires an async context, e.g. a notebook cell)
results = await panel.generate_and_score(prompts=prompts)
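With several judges, each response receives one ternary score per judge, and these are typically combined into summary statistics. The sketch below shows one way to do that by hand; `aggregate_panel` and the dictionary layout are hypothetical and do not reflect LLMPanel's actual return structure or internal aggregation.

```python
from statistics import mean, median
from typing import Dict, List


def aggregate_panel(scores_by_judge: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Combine per-judge ternary scores into per-response summary statistics."""
    summaries = []
    # zip(*...) regroups the scores so each tuple holds one response's
    # scores across all judges.
    for response_scores in zip(*scores_by_judge.values()):
        summaries.append(
            {
                "avg": mean(response_scores),
                "min": min(response_scores),
                "max": max(response_scores),
                "median": median(response_scores),
            }
        )
    return summaries


# Illustrative scores for two responses from three judges (made-up values).
example_scores = {
    "judge_1": [1.0, 0.5],
    "judge_2": [1.0, 0.0],
    "judge_3": [0.5, 0.0],
}
summary = aggregate_panel(example_scores)
```

The minimum is the most conservative aggregate, since a single dissenting judge lowers it, while the average and median smooth over disagreement between judges.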
References#
Chen, J. & Mueller, J. (2023). Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness. arXiv.
Manakul, P., et al. (2023). SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. arXiv.
See Also#
LLMJudge - Single LLM judge class
LLMPanel - Panel of multiple judges
Binary Judge (True/False) - Binary (simpler) scoring template