Continuous Judge
The continuous judge template instructs an LLM to directly score the correctness of a question-response concatenation on a continuous scale from 0 to 1.
Definition
For the continuous template, the LLM is asked to provide a numerical score \(s \in [0, 1]\), where 0 indicates completely incorrect and 1 indicates completely correct.
Key Properties:
Fine-grained scoring without discrete categories
Allows nuanced assessment of partial correctness
Score range: continuous over \([0, 1]\)
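To make the definition concrete, the instruction sent to the judge might look roughly like the sketch below. This wording is a hypothetical paraphrase for illustration only; the exact prompt text shipped with uqlm may differ.

# Hypothetical paraphrase of a continuous-scoring instruction (illustrative only;
# the actual template text used by uqlm may differ)
CONTINUOUS_INSTRUCTION = (
    "You will be shown a question and a candidate answer. "
    "Rate the correctness of the answer on a continuous scale from 0 to 1, "
    "where 0 means completely incorrect and 1 means completely correct. "
    "Respond with a single number and nothing else."
)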
How It Works
1. Present the judge LLM with the original question and response
2. Ask the judge to assign a correctness score between 0 and 1
3. Parse and return the numerical score
This template is useful when you want more granular assessments than binary or ternary classifications, allowing the judge to express partial correctness (e.g., 0.7 for mostly correct responses).
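The flow above can be sketched in a few lines of plain Python. Here call_llm is a hypothetical stand-in for whatever client queries the judge model, and the prompt wording is illustrative; uqlm handles the prompting and parsing internally.

import re
from typing import Callable

def continuous_judge(question: str, response: str, call_llm: Callable[[str], str]) -> float:
    """Illustrative sketch of the continuous-judge flow; not uqlm's internals."""
    # 1. Present the judge with the original question and response
    prompt = (
        "Rate the correctness of the answer on a scale from 0 to 1, where 0 is "
        "completely incorrect and 1 is completely correct. Reply with only the number.\n\n"
        f"Question: {question}\nAnswer: {response}"
    )
    # 2. Ask the judge to assign a correctness score between 0 and 1
    reply = call_llm(prompt)
    # 3. Parse and return the numerical score, clamped to [0, 1]
    match = re.search(r"\d*\.?\d+", reply)
    score = float(match.group()) if match else 0.0  # fall back to 0.0 if unparseable
    return min(max(score, 0.0), 1.0)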
Parameters
When using LLMJudge or LLMPanel, specify
scoring_template="continuous".
Example
from uqlm.judges import LLMJudge

# Initialize with continuous template
judge = LLMJudge(
    llm=judge_llm,
    scoring_template="continuous"
)

# Score responses
result = await judge.judge_responses(
    prompts=prompts,
    responses=responses
)

# Scores will be continuous values between 0 and 1
print(result["scores"])
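Because the scores are continuous, they can be post-processed to suit your application. As a minimal illustration (the 0.5 threshold below is an arbitrary choice, not a uqlm default), low-scoring responses could be flagged for human review:

# Illustrative only: flag responses whose judge score falls below a chosen threshold
THRESHOLD = 0.5  # application-specific; tune on validation data
flagged = [
    (p, r)
    for p, r, s in zip(prompts, responses, result["scores"])
    if s < THRESHOLD
]
print(f"{len(flagged)} responses flagged for review")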
Using with LLMPanel:
from uqlm import LLMPanel

# Create a panel with continuous scoring
panel = LLMPanel(
    llm=original_llm,
    judges=[judge_llm1, judge_llm2],
    scoring_templates=["continuous"] * 2
)

results = await panel.generate_and_score(prompts=prompts)
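Each judge in the panel scores the responses independently, so the result typically exposes per-judge scores alongside an ensemble aggregate (an average across judges by default); see the LLMPanel documentation for the available aggregation options.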
References
Xiong, M., et al. (2024). Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. arXiv:2306.13063.
See Also
LLMJudge - Single LLM judge class
LLMPanel - Panel of multiple judges
Likert Scale Judge - Structured 5-point scale alternative