P(True) ======= .. currentmodule:: uqlm.scorers ``p_true`` P(True) is a self-reflection method that presents an LLM with its own previous response and asks it to classify the statement as "True" or "False". The confidence score is derived from the token probability for the "True" answer. Definition ---------- Given a prompt :math:`x` and the LLM's response :math:`y`, the P(True) scorer: 1. Constructs a self-reflection prompt asking the LLM to evaluate whether :math:`y` correctly answers :math:`x` 2. Requests the LLM to respond with "True" or "False" 3. Returns the token probability for "True" as the confidence score .. math:: P(True) = p(\text{"True"} | x, y, \text{reflection\_prompt}) If the model answers "False", the score is computed as: .. math:: P(True) = 1 - p(\text{"False"} | x, y, \text{reflection\_prompt}) **Key Properties:** - Self-reflection approach - uses the same LLM to evaluate its own response - Requires one additional LLM generation per response - Score range: :math:`[0, 1]` How It Works ------------ 1. Generate an original response to the prompt 2. Construct a self-reflection prompt that presents: - The original question/prompt - The LLM's response - A request to classify whether the response is correct 3. Generate the classification with logprobs enabled 4. Extract the probability of "True" (or 1 - probability of "False") This scorer leverages the model's own ability to assess the quality of its responses, providing a form of self-consistency check. Parameters ---------- When using :class:`WhiteBoxUQ`, specify ``"p_true"`` in the ``scorers`` list. Example ------- .. code-block:: python from uqlm import WhiteBoxUQ # Initialize with p_true scorer wbuq = WhiteBoxUQ( llm=llm, scorers=["p_true"] ) # Generate responses and compute scores # Note: p_true generates one additional call per prompt results = await wbuq.generate_and_score(prompts=prompts) # Access the p_true scores print(results.to_df()["p_true"]) References ---------- - Kadavath, S., et al. (2022). `Language Models (Mostly) Know What They Know `_. *arXiv*. See Also -------- - :class:`WhiteBoxUQ` - Main class for white-box uncertainty quantification - :doc:`/scorer_definitions/llm_judges/index` - LLM-as-a-Judge scorers for external evaluation