{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 🎯 White-Box Uncertainty Quantification\n", "\n", "
\n",
" White-box Uncertainty Quantification (UQ) methods leverage token probabilities to estimate uncertainty. They are significantly faster and cheaper than black-box methods, but require access to the LLM's internal probabilities, meaning they are not necessarily compatible with all LLMs/APIs. This demo provides an illustration of how to use state-of-the-art white-box UQ methods with uqlm
. The following scorers are available:\n",
"
Set up LLM instance and load example data prompts.
\n", "Generate LLM Responses and Confidence Scores
\n", "Generate and score LLM responses to the example questions using the WhiteBoxUQ()
class.
Evaluate Hallucination Detection Performance
\n", "Visualize model accuracy at different thresholds of the various white-box UQ confidence scores. Compute precision, recall, and F1-score of hallucination detection.

|   | question | answer |
|---|---|---|
| 0 | Natalia sold clips to 48 of her friends in Apr... | 72 |
| 1 | Weng earns $12 an hour for babysitting. Yester... | 10 |
| 2 | Betty is saving money for a new wallet which c... | 5 |
| 3 | Julie is reading a 120-page book. Yesterday, s... | 42 |
| 4 | James writes a 3-page letter to 2 different fr... | 624 |


| Parameter | Type & Default | Description |
|---|---|---|
| llm | BaseChatModel (default=None) | A LangChain `BaseChatModel`. The user is responsible for passing temperature and other relevant parameters to the constructor of the `llm` object. |
| scorers | List[str] (default=None) | Specifies which white-box (token-probability-based) scorers to include. Must be a subset of {"normalized_probability", "min_probability"}. If None, all scorers are used. |
| system_prompt | str or None (default="You are a helpful assistant.") | Optional custom system prompt for the LLM. |
| max_calls_per_min | int (default=None) | Maximum number of API calls per minute, used to avoid rate limit errors. By default, no limit is applied. |

🧠 Model-Specific: `llm`, `system_prompt`

📊 Confidence Scores: `scorers`

⚡ Performance: `max_calls_per_min`
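
To make the parameters above concrete, here is a minimal sketch of how they might be passed to `WhiteBoxUQ` and combined with `generate_and_score`. Several details are assumptions rather than facts taken from this notebook: the import path `from uqlm import WhiteBoxUQ`, the keyword name `prompts`, the value `max_calls_per_min=60`, and the helper name `score_prompts` (hypothetical). The `llm` argument is expected to be a LangChain `BaseChatModel` that the caller has already configured.

```python
import asyncio


async def score_prompts(llm, prompts):
    """Generate responses for `prompts` and score them with white-box scorers.

    `llm` is assumed to be an already configured LangChain BaseChatModel
    whose provider exposes token log-probabilities.
    """
    # Imported lazily so this sketch can be loaded without uqlm installed.
    # The import path is an assumption; check the uqlm documentation.
    from uqlm import WhiteBoxUQ

    wbuq = WhiteBoxUQ(
        llm=llm,
        scorers=["normalized_probability", "min_probability"],
        max_calls_per_min=60,  # illustrative value; the default is None (no limit)
    )
    # The keyword name `prompts` is an assumption about the method signature.
    return await wbuq.generate_and_score(prompts=prompts)


# Usage (outside a notebook event loop), assuming `my_llm` and `my_prompts` exist:
# results = asyncio.run(score_prompts(my_llm, my_prompts))
```

Inside a notebook, which already runs an event loop, the coroutine would typically be awaited directly instead of wrapped in `asyncio.run`.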

| Method | Description & Parameters |
|---|---|
| WhiteBoxUQ.generate_and_score | Generate LLM responses and compute confidence scores for the provided prompts. 💡 Best For: Complete end-to-end uncertainty quantification when starting with prompts. |


|   | prompt | response | logprob | normalized_probability | min_probability |
|---|---|---|---|---|---|
| 0 | When you solve this math problem only return t... | 72 | `[{'token': '72', 'bytes': [55, 50], 'logprob':...` | 0.999949 | 0.999949 |
| 1 | When you solve this math problem only return t... | $10 | `[{'token': '$', 'bytes': [36], 'logprob': -0.0...` | 0.999398 | 0.998797 |
| 2 | When you solve this math problem only return t... | $20 | `[{'token': '$', 'bytes': [36], 'logprob': -0.0...` | 0.945383 | 0.900076 |
| 3 | When you solve this math problem only return t... | 48 | `[{'token': '48', 'bytes': [52, 56], 'logprob':...` | 0.996684 | 0.996684 |
| 4 | When you solve this math problem only return t... | 624 | `[{'token': '624', 'bytes': [54, 50, 52], 'logp...` | 0.999926 | 0.999926 |

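
The two score columns can be related to the `logprob` column with a short sketch. It assumes that `min_probability` is the probability of the least likely token and that `normalized_probability` is the length-normalized joint probability, i.e. the geometric mean of the token probabilities. That reading is consistent with the table (single-token responses have identical values in both columns), but the library's exact implementation may differ. The function `token_logprobs` and the example values are hypothetical.

```python
import math
from typing import Dict, List


def token_logprobs(logprob_entries: List[Dict]) -> List[float]:
    """Extract per-token log-probabilities from a list of logprob records."""
    return [entry["logprob"] for entry in logprob_entries]


def min_probability(logprobs: List[float]) -> float:
    """Probability of the least likely token in the response."""
    return math.exp(min(logprobs))


def normalized_probability(logprobs: List[float]) -> float:
    """Geometric mean of the token probabilities (length-normalized joint probability)."""
    return math.exp(sum(logprobs) / len(logprobs))


# Hypothetical two-token response; these log-probabilities are made up for illustration.
example = [
    {"token": "$", "logprob": math.log(0.9)},
    {"token": "20", "logprob": math.log(0.99)},
]
lps = token_logprobs(example)
print("min_probability:", min_probability(lps))
print("normalized_probability:", normalized_probability(lps))
```

Because the geometric mean can never fall below the smallest token probability, `normalized_probability` is always at least as large as `min_probability`, which matches every row of the table.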

|   | prompt | response | logprob | normalized_probability | min_probability | answer | response_correct |
|---|---|---|---|---|---|---|---|
| 0 | When you solve this math problem only return t... | 72 | `[{'token': '72', 'bytes': [55, 50], 'logprob':...` | 0.999949 | 0.999949 | 72 | True |
| 1 | When you solve this math problem only return t... | $10 | `[{'token': '$', 'bytes': [36], 'logprob': -0.0...` | 0.999398 | 0.998797 | 10 | True |
| 2 | When you solve this math problem only return t... | $20 | `[{'token': '$', 'bytes': [36], 'logprob': -0.0...` | 0.945383 | 0.900076 | 5 | False |
| 3 | When you solve this math problem only return t... | 48 | `[{'token': '48', 'bytes': [52, 56], 'logprob':...` | 0.996684 | 0.996684 | 42 | False |
| 4 | When you solve this math problem only return t... | 624 | `[{'token': '624', 'bytes': [54, 50, 52], 'logp...` | 0.999926 | 0.999926 | 624 | True |

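
The evaluation step can be illustrated with the five rows above. This is a sketch under stated assumptions, not the notebook's own evaluation code: a response is flagged as a likely hallucination when its confidence score falls below a threshold, the positive class is "response is incorrect", and the threshold of 0.95 is arbitrary. The helper `detection_metrics` is hypothetical, and five rows are far too few for meaningful metrics.

```python
from typing import List, Tuple


def detection_metrics(
    scores: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for flagging incorrect responses.

    A response is flagged when its confidence score is below `threshold`.
    Metrics with an empty denominator are reported as 0.0.
    """
    flagged = [s < threshold for s in scores]
    tp = sum(f and not c for f, c in zip(flagged, correct))  # flagged, truly wrong
    fp = sum(f and c for f, c in zip(flagged, correct))  # flagged, actually right
    fn = sum((not f) and (not c) for f, c in zip(flagged, correct))  # missed errors
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# min_probability and response_correct columns copied from the table above.
min_prob = [0.999949, 0.998797, 0.900076, 0.996684, 0.999926]
response_correct = [True, True, False, False, True]

precision, recall, f1 = detection_metrics(min_prob, response_correct, threshold=0.95)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Sweeping `threshold` over a grid and plotting the resulting metrics gives the kind of threshold analysis described in the evaluation step.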