{ "cells": [ { "cell_type": "markdown", "id": "f09d41c2-7158-49fb-b42d-6b8d85d0e200", "metadata": {}, "source": [ "# 🎯 Multimodal Uncertainty Quantification\n", "\n", "
\n",
" The UQLM library offers Uncertainty Quantification (UQ) methods for multimodal inputs, provided that outputs are text-based. This demo provides a minimal illustration \n",
" of how to use uqlm scorers for multimodal inputs. The following scorers offer multimodal compatibility:\n",
"
1. Set up LLM instance and load example image-based data prompts.\n",
"2. Generate LLM Responses and Confidence Scores: generate and score LLM responses to the example image-based questions using the `BlackBoxUQ()` class.\n",
"3. Evaluate Hallucination Detection Performance: inspect which responses were correct or incorrect and compare them to the computed confidence scores.
\n", "/home/jupyter/uqlm/uqlm/utils/response_generator.py:105: UQLMBetaWarning: Use of BaseMessage in prompts argument is\n",
"in beta. Please use it with caution as it may change in future releases.\n",
" beta_warning(\"Use of BaseMessage in prompts argument is in beta. Please use it with caution as it may change in \n",
"future releases.\")\n",
"\n"
],
"text/plain": [
"/home/jupyter/uqlm/uqlm/utils/response_generator.py:105: UQLMBetaWarning: Use of BaseMessage in prompts argument is\n",
"in beta. Please use it with caution as it may change in future releases.\n",
" beta_warning(\"Use of BaseMessage in prompts argument is in beta. Please use it with caution as it may change in \n",
"future releases.\")\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from uqlm import BlackBoxUQ\n",
"\n",
"bbuq = BlackBoxUQ(llm=llm, scorers=[\"noncontradiction\"], system_prompt=\"Answer as concisely as possible.\", device=device, use_best=False)\n",
"result = await bbuq.generate_and_score(prompts=prompts, num_responses=5)"
]
},
{
"cell_type": "markdown",
"id": "429af11e-0353-43d8-8131-8cb68438bcd9",
"metadata": {},
"source": [
"\n",
"## 3. Evaluate Hallucination Detection Performance"
]
},
{
"cell_type": "markdown",
"id": "19db1925-f7ad-44a8-aa4c-ee94ad128f2b",
"metadata": {},
"source": [
"Finally, we can check which questions the LLM answered correctly and compare the results to UQLM's black-box confidence scores."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "eebce78c-cadf-4eca-a1f7-a4916dc2c850",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
| | question | response | sampled_responses | noncontradiction |
|---|---|---|---|---|
| 0 | How many berries appear in the image | 3 | [3, 3, 3, 3, 3] | 1.000000 |
| 1 | What color are the berries? | The berries are red. | [Red., The berries are dark red., Red., The berries are a deep red color., The berries are dark ... | 0.998269 |
| 2 | How many times does the letter R appear in this image? | 3 | [3, 3, 3, 3, 3] | 1.000000 |
| 3 | How many times does the letter P appear in this image | 1 | [2, 2, 2, 2, 1] | 0.221604 |
| 4 | How many words appear in this image? | 2 | [2, 2, 2, 2, 2] | 1.000000 |
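
One way to act on these scores is to flag responses whose confidence falls below a cutoff for manual review. The snippet below is a minimal sketch of that idea, not part of UQLM: the `flag_low_confidence` helper and the `0.5` threshold are hypothetical choices for illustration, and the scores are copied by hand from the table above rather than read from `result`.

```python
# Noncontradiction scores copied from the table above (rows 0-4).
rows = [
    {"question": "How many berries appear in the image", "response": "3", "noncontradiction": 1.000000},
    {"question": "What color are the berries?", "response": "The berries are red.", "noncontradiction": 0.998269},
    {"question": "How many times does the letter R appear in this image?", "response": "3", "noncontradiction": 1.000000},
    {"question": "How many times does the letter P appear in this image", "response": "1", "noncontradiction": 0.221604},
    {"question": "How many words appear in this image?", "response": "2", "noncontradiction": 1.000000},
]


def flag_low_confidence(rows, threshold=0.5):
    """Return the indices of rows whose score is below the threshold.

    Hypothetical helper: the 0.5 default is an illustrative cutoff,
    not a value recommended by UQLM.
    """
    return [i for i, row in enumerate(rows) if row["noncontradiction"] < threshold]


# Print the flagged questions so they can be checked by hand.
for i in flag_low_confidence(rows):
    row = rows[i]
    print(f"row {i}: {row['question']!r} -> {row['response']!r} (score={row['noncontradiction']:.6f})")
```

With these scores, only the letter-P question falls below the cutoff, which matches the disagreement visible in its sampled responses. In practice the threshold would need to be tuned on labeled data.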