LLM-as-a-Judge Scorers
======================

LLM-as-a-Judge scorers use one or more LLMs to evaluate the reliability of the original LLM's response. They offer high customizability through prompt engineering and the choice of judge LLM(s).

**Key Characteristics:**

- **Universal Compatibility:** Works with any LLM
- **Highly Customizable:** Use any LLM as a judge and tailor instruction prompts to specific use cases
- **Self-Reflection Capable:** Can use the same LLM as both generator and judge

**Trade-offs:**

- **Added Cost:** Requires additional LLM calls for the judge LLM(s)
- **Added Latency:** Judge evaluations add to the total response time

**Overview:**

Under the LLM-as-a-Judge approach, a pre-generated response is judged either by the same LLM that generated it or by a different LLM. Several scoring templates are available to accommodate different use cases.

.. toctree::
   :maxdepth: 1
   :caption: Scoring Templates

   true_false_uncertain
   true_false
   continuous
   likert

Panel of Judges
---------------

For improved robustness, you can use the :class:`~uqlm.scorers.LLMPanel` class to aggregate scores from multiple LLM judges using various aggregation methods (average, min, max, median).

.. toctree::
   :maxdepth: 1

   panel
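As a rough illustration, a panel of judges can be assembled and applied along the following lines. This is a minimal sketch rather than a verbatim recipe: it assumes LangChain-compatible chat models, an async context, and that the constructor and method names (``LLMPanel(llm=..., judges=...)``, ``generate_and_score(prompts=...)``, ``to_df()``) match the installed version of ``uqlm``.

.. code-block:: python

   import asyncio

   from langchain_openai import ChatOpenAI
   from uqlm import LLMPanel


   async def main():
       # LLM that generates the original responses
       llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)

       # Judge LLMs; these may be the same model (self-reflection) or different models
       judges = [
           ChatOpenAI(model="gpt-4o-mini", temperature=0),
           ChatOpenAI(model="gpt-4o", temperature=0),
       ]

       # Generate responses, have each judge score them, and aggregate the scores
       panel = LLMPanel(llm=llm, judges=judges)
       results = await panel.generate_and_score(
           prompts=["What is the boiling point of water at sea level in Celsius?"]
       )

       # Inspect per-judge scores alongside the aggregated confidence score
       print(results.to_df())


   asyncio.run(main())

See :doc:`panel` for the full set of constructor options, including the choice of aggregation method.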