Non-Contradiction Probability ============================= .. currentmodule:: uqlm.scorers ``noncontradiction`` Non-Contradiction Probability (NCP) computes the mean non-contradiction probability estimated by a natural language inference (NLI) model. Definition ---------- This score is formally defined as follows: .. math:: NCP(y_i; \tilde{\mathbf{y}}_i) = 1 - \frac{1}{m} \sum_{j=1}^m\frac{p_{\text{contra}}(y_i, \tilde{y}_{ij}) + p_{\text{contra}}(\tilde{y}_{ij}, y_i)}{2} where :math:`p_{\text{contra}}(y_i, \tilde{y}_{ij})` denotes the (asymmetric) contradiction probability estimated by the NLI model for response :math:`y_i` and candidate :math:`\tilde{y}_{ij}`. **Key Properties:** - The bidirectional averaging :math:`(p_{\text{contra}}(a, b) + p_{\text{contra}}(b, a))/2` accounts for the asymmetric nature of NLI - Higher NCP values indicate that the original response is less likely to contradict the sampled responses - Score range: :math:`[0, 1]` where 1 indicates no contradictions How It Works ------------ 1. Generate multiple candidate responses :math:`\tilde{\mathbf{y}}_i` from the same prompt 2. For each pair of original response :math:`y_i` and candidate :math:`\tilde{y}_{ij}`: - Compute contradiction probability in both directions using an NLI model - Average the bidirectional contradiction probabilities 3. Average across all candidates and subtract from 1 to get non-contradiction probability Parameters ---------- When using :class:`BlackBoxUQ`, specify ``"noncontradiction"`` in the ``scorers`` list. Example ------- .. code-block:: python from uqlm import BlackBoxUQ # Initialize with noncontradiction scorer bbuq = BlackBoxUQ( llm=llm, scorers=["noncontradiction"], nli_model_name="microsoft/deberta-large-mnli" ) # Generate responses and compute scores results = await bbuq.generate_and_score(prompts=prompts, num_responses=5) # Access the noncontradiction scores print(results.to_df()["noncontradiction"]) References ---------- - Chen, J. & Mueller, J. (2023). `Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness `_. *arXiv*. - Lin, Z., et al. (2024). `Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models `_. *arXiv*. - Manakul, P., et al. (2023). `SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models `_. *arXiv*. See Also -------- - :class:`BlackBoxUQ` - Main class for black-box uncertainty quantification - :doc:`entailment` - Related scorer measuring entailment probability