{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 🎯 Long-Text Uncertainty Quantification\n", "\n", "
\n",
" Long-Text Uncertainty Quantification (LUQ) is a long-form adaptation of black-box uncertainty quantification. This approach generates multiple responses to the same prompt, decomposes those responses into granular units (sentences or claims), and scores those units by measuring whether sampled responses entail each unit. This demo provides an illustration \n",
" of how to use the LUQ methods with uqlm. The available scorers and papers from which they are adapted are below:\n",
"
1. Set up LLM instance and load example data prompts.
2. Generate LLM Responses and Confidence Scores: generate responses and compute claim-level confidence scores using the `LongTextUQ()` class.
3. Evaluate Hallucination Detection Performance: grade claims with the `FactScoreGrader` class and evaluate claim-level hallucination detection.
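The unit-level scoring idea described in the introduction can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the `uqlm` implementation: `toy_entailment` is a hypothetical word-overlap stand-in for an NLI model, and `score_units` is a hypothetical helper name.

```python
from statistics import mean
from typing import Callable, List


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model (assumption for illustration):
    the fraction of hypothesis words that also appear in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    return sum(word in premise_words for word in hypothesis_words) / len(hypothesis_words)


def score_units(
    units: List[str],
    sampled_responses: List[str],
    entail: Callable[[str, str], float] = toy_entailment,
) -> List[float]:
    """Score each unit by its average entailment across the sampled responses."""
    return [mean(entail(sample, unit) for sample in sampled_responses) for unit in units]


# Units decomposed from an original response, and two sampled responses.
units = ["paris is the capital of france", "paris has ten million residents"]
samples = ["paris is the capital of france", "the capital of france is paris"]
unit_scores = score_units(units, samples)
```

A unit that the sampled responses consistently support receives a score near 1, while a unit that rarely appears in the samples receives a low score and is a candidate hallucination.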
|   | prompt | wikipedia_text |
|---|---|---|
| 0 | Tell me a bio of Suthida.\n | Suthida Bajrasudhabimalalakshana (Thai: สมเด็จ... |
| 1 | Tell me a bio of Miguel Ángel Félix Gallardo.\n | Miguel Ángel Félix Gallardo (born January 8, 1... |
| 2 | Tell me a bio of Iggy Azalea.\n | Amethyst Amelia Kelly (born 7 June 1990), know... |
| 3 | Tell me a bio of Fernando da Costa Novaes.\n | Fernando da Costa Novaes (April 6, 1927 – Marc... |
| 4 | Tell me a bio of Jan Zamoyski.\n | Jan Sariusz Zamoyski (Latin: Ioannes Zamoyski ... |
| Parameter | Type & Default | Description |
|---|---|---|
| llm | BaseChatModel, default=None | A LangChain `BaseChatModel`. The user is responsible for specifying temperature and other relevant parameters in the constructor of the provided `llm` object. |
| granularity | str, default="claim" | Specifies whether to decompose and score at claim-level or sentence-level granularity. Must be either "claim" or "sentence". |
| mode | str, default="unit_response" | Specifies whether to implement unit-response (LUQ-style) scoring or matched-unit (LUQ-pair-style) scoring. Must be either "unit_response" (recommended) or "matched_unit". |
| scorers | List[str], default=None | Specifies which black-box (consistency) scorers to include. Must be a subset of {"entailment", "noncontradiction", "contrasted_entailment", "bert_score", "cosine_sim"}. If None, defaults to ["entailment"]. |
| aggregation | str, default="mean" | Specifies how to aggregate claim/sentence-level scores to response-level scores. Must be either "min" or "mean". |
| response_refinement | bool, default=False | Specifies whether to refine responses with uncertainty-aware decoding. This approach removes claims with confidence scores below the `response_refinement_threshold` and uses the `claim_decomposition_llm` to reconstruct the response from the retained claims. For more details, refer to Jiang et al., 2024: https://arxiv.org/abs/2410.20783 |
| claim_filtering_scorer | Optional[str], default=None | Specifies which scorer to use to filter claims if `response_refinement` is True. If not provided, defaults to the first element of `self.scorers`. |
| claim_decomposition_llm | BaseChatModel, default=None | A LangChain `BaseChatModel` used for decomposing responses into individual claims. Also used for claim refinement. If granularity="claim" and `claim_decomposition_llm` is None, the provided `llm` is used for claim decomposition. |
| device | str or torch.device, default=None | Specifies the device that the NLI model uses for prediction. If None, the best available PyTorch device is detected and used, prioritizing CUDA (NVIDIA GPU), then MPS (macOS), then CPU. |
| system_prompt | str or None, default="You are a helpful assistant." | Optional custom system prompt for the LLM. |
| max_calls_per_min | int, default=None | Specifies how many API calls to make per minute to avoid rate limit errors. By default, no limit is specified. |
| use_n_param | bool, default=False | Specifies whether to use the `n` parameter of the `BaseChatModel`. Not compatible with all `BaseChatModel` classes. If used, it speeds up generation substantially when `num_responses` is large. |
| sampling_temperature | float, default=1 | The `temperature` parameter the LLM uses when generating sampled responses. Must be greater than 0. |
| nli_model_name | str, default="microsoft/deberta-large-mnli" | Specifies which NLI model to use. Must be an acceptable input to `AutoTokenizer.from_pretrained()` and `AutoModelForSequenceClassification.from_pretrained()`. |
| max_length | int, default=2000 | Specifies the maximum allowed string length of LLM responses for NLI computation. Longer responses are truncated in NLI computations to avoid `OutOfMemoryError`. |
- 🧠 LLM-Specific: `llm`, `system_prompt`, `sampling_temperature`
- 📊 Confidence Scores: `granularity`, `scorers`, `mode`, `aggregation`, `response_refinement`, `response_refinement_threshold`
- 🖥️ Hardware: `device`
- ⚡ Performance: `max_calls_per_min`, `use_n_param`

| Method | Description & Parameters |
|---|---|
| BlackBoxUQ.generate_and_score | Generate LLM responses, sampled LLM (candidate) responses, and compute confidence scores for the provided prompts. 💡 Best For: Complete end-to-end uncertainty quantification when starting with prompts. |
| BlackBoxUQ.score | Compute confidence scores on provided LLM responses. Should only be used if responses and sampled responses are already generated. 💡 Best For: Computing uncertainty scores when responses are already generated elsewhere. |
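To make the `aggregation` parameter concrete, here is a minimal sketch of how unit-level scores might be rolled up into one response-level score. The helper `aggregate_scores` and the example values are hypothetical and written for illustration; they are not taken from `uqlm`.

```python
from statistics import mean
from typing import List


def aggregate_scores(unit_scores: List[float], aggregation: str = "mean") -> float:
    """Combine claim/sentence-level scores into one response-level score.

    Hypothetical helper: 'min' reports the weakest unit,
    'mean' reports the average over all units.
    """
    if aggregation not in ("min", "mean"):
        raise ValueError("aggregation must be 'min' or 'mean'")
    if not unit_scores:
        raise ValueError("unit_scores must not be empty")
    return min(unit_scores) if aggregation == "min" else mean(unit_scores)


# Made-up claim-level scores for a single response.
claim_scores = [0.9, 0.8, 0.1]
mean_score = aggregate_scores(claim_scores, "mean")
min_score = aggregate_scores(claim_scores, "min")
```

With `"min"`, a single poorly supported claim pulls the whole response down, which is the more conservative choice; `"mean"` reflects the typical claim.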
|   | prompt | response | sampled_responses | entailment | claims_data | refined_response | refined_entailment |
|---|---|---|---|---|---|---|---|
| 0 | Tell me a bio of Suthida.\n | Suthida Bajrasudhabimalalakshana, born on June... | [Suthida Bajrasudhabimalalakshana, commonly kn... | 0.378289 | [{'claim': 'Suthida Bajrasudhabimalalakshana w... | Suthida Bajrasudhabimalalakshana, born on June... | 0.786631 |
| 1 | Tell me a bio of Miguel Ángel Félix Gallardo.\n | Miguel Ángel Félix Gallardo, often referred to... | [Miguel Ángel Félix Gallardo, often referred t... | 0.472164 | [{'claim': 'Miguel Ángel Félix Gallardo is oft... | Miguel Ángel Félix Gallardo, often referred to... | 0.654868 |
| 2 | Tell me a bio of Iggy Azalea.\n | Iggy Azalea, born Amethyst Amelia Kelly on Jun... | [Iggy Azalea, born Amethyst Amelia Kelly on Ju... | 0.468483 | [{'claim': 'Iggy Azalea was born Amethyst Amel... | Iggy Azalea, born Amethyst Amelia Kelly on Jun... | 0.650532 |
| 3 | Tell me a bio of Fernando da Costa Novaes.\n | Fernando da Costa Novaes was a prominent Brazi... | [Fernando da Costa Novaes was a Brazilian orni... | 0.494438 | [{'claim': 'Fernando da Costa Novaes was a pro... | Fernando da Costa Novaes was a highly respecte... | 0.705326 |
| 4 | Tell me a bio of Jan Zamoyski.\n | Jan Zamoyski (1542–1605) was a prominent Polis... | [Jan Zamoyski (1542–1605) was a prominent Poli... | 0.573752 | [{'claim': 'Jan Zamoyski was a Polish nobleman... | Jan Zamoyski, born in 1542 and deceased in 160... | 0.733766 |
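The filtering step of response refinement can be sketched as follows. This is a rough illustration under stated assumptions: each `claims_data` entry is assumed to be a dict with a `'claim'` key and a score stored under the scorer's name (here `'entailment'`); the helper `split_claims`, the threshold of 0.5, and the scores are all hypothetical. The subsequent step, in which an LLM rewrites the response from the retained claims, is not shown.

```python
from typing import Dict, List, Tuple


def split_claims(
    claims_data: List[Dict], scorer: str, threshold: float
) -> Tuple[List[str], List[str]]:
    """Separate claims into retained and removed lists.

    Claims scoring below the threshold are removed; all others are retained.
    """
    retained: List[str] = []
    removed: List[str] = []
    for item in claims_data:
        target = retained if item[scorer] >= threshold else removed
        target.append(item["claim"])
    return retained, removed


# Made-up scores for illustration only.
example_claims = [
    {"claim": "Suthida is the Queen of Thailand.", "entailment": 0.95},
    {"claim": "Suthida was appointed Commander in 2013.", "entailment": 0.12},
    {"claim": "Suthida rose to the rank of General.", "entailment": 0.71},
]
retained, removed = split_claims(example_claims, scorer="entailment", threshold=0.5)
```

Because only higher-confidence claims survive, a response rebuilt from `retained` would be expected to score higher on average, which is consistent with `refined_entailment` exceeding `entailment` in the table above.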
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
"\n"
],
"text/plain": [
"\u001b[1;30mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" Response Refinement Example \n",
"\n"
],
"text/plain": [
" \u001b[1;30mResponse Refinement Example\u001b[0m \n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\n",
"\n"
],
"text/plain": [
"\u001b[1;30mโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
╭─────────────────────────────────────────────── Original Response ───────────────────────────────────────────────╮\n", "│ Suthida Bajrasudhabimalalakshana, born on June 3, 1978, is the Queen of Thailand. She became queen following │\n", "│ her marriage to King Maha Vajiralongkorn (Rama X) on May 1, 2019. │\n", "│ │\n", "│ Before becoming queen, Suthida was known for her service in the Thai military and royal security. She joined │\n", "│ the Thai military, where she eventually rose to the rank of General. Her notable role was as the Deputy │\n", "│ Commander of the King’s Own Bodyguard Battalion. Suthida was also appointed as the Commander of the Special │\n", "│ Operations Unit of the King’s Guard in 2013, and later, she was made the Commander of the Royal Thai │\n", "│ Aide-de-Camp Department. │\n", "│ │\n", "│ Her service to the royal family and her close association with King Vajiralongkorn began during his time as │\n", "│ Crown Prince. Suthida was appointed as a General in the Royal Thai Army in December 2016, shortly after │\n", "│ Vajiralongkorn ascended to the throne. │\n", "│ │\n", "│ Queen Suthida's royal name, bestowed upon her after marriage, is Her Majesty Queen Suthida │\n", "│ Bajrasudhabimalalakshana. The marriage and her subsequent coronation as queen were part of the elaborate royal │\n", "│ ceremonies that solidified her position as the consort of the reigning monarch. │\n", "│ │\n", "│ Queen Suthida is known for her dignity and dedication to her roles both in royal duties and her previous │\n", "│ military service. Her work and public engagements often highlight charitable activities and support for various │\n", "│ social causes within Thailand. 
│\n", "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", "\n" ], "text/plain": [ "\u001b[33m╭─\u001b[0m\u001b[33m──────────────────────────────────────────────\u001b[0m\u001b[33m \u001b[0m\u001b[1;33mOriginal Response\u001b[0m\u001b[33m \u001b[0m\u001b[33m──────────────────────────────────────────────\u001b[0m\u001b[33m─╮\u001b[0m\n", "\u001b[33m│\u001b[0m Suthida Bajrasudhabimalalakshana, born on June 3, 1978, is the Queen of Thailand. She became queen following \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m her marriage to King Maha Vajiralongkorn (Rama X) on May 1, 2019. \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Before becoming queen, Suthida was known for her service in the Thai military and royal security. She joined \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m the Thai military, where she eventually rose to the rank of General. Her notable role was as the Deputy \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Commander of the King’s Own Bodyguard Battalion. Suthida was also appointed as the Commander of the Special \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Operations Unit of the King’s Guard in 2013, and later, she was made the Commander of the Royal Thai \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Aide-de-Camp Department. \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Her service to the royal family and her close association with King Vajiralongkorn began during his time as \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Crown Prince. Suthida was appointed as a General in the Royal Thai Army in December 2016, shortly after \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Vajiralongkorn ascended to the throne. 
\u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Queen Suthida's royal name, bestowed upon her after marriage, is Her Majesty Queen Suthida \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Bajrasudhabimalalakshana. The marriage and her subsequent coronation as queen were part of the elaborate royal \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m ceremonies that solidified her position as the consort of the reigning monarch. \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m Queen Suthida is known for her dignity and dedication to her roles both in royal duties and her previous \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m military service. Her work and public engagements often highlight charitable activities and support for various \u001b[33m│\u001b[0m\n", "\u001b[33m│\u001b[0m social causes within Thailand. \u001b[33m│\u001b[0m\n", "\u001b[33m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
╭────────────────────────────────────── Low-Confidence Claims to be Removed ──────────────────────────────────────╮\n", "│ • Suthida Bajrasudhabimalalakshana's notable role was as the Deputy Commander of the King’s Own Bodyguard │\n", "│ Battalion. │\n", "│ • Suthida Bajrasudhabimalalakshana was appointed as the Commander of the Special Operations Unit of the King’s │\n", "│ Guard in 2013. │\n", "│ • Suthida Bajrasudhabimalalakshana was made the Commander of the Royal Thai Aide-de-Camp Department. │\n", "│ • Suthida Bajrasudhabimalalakshana's service to the royal family began during Vajiralongkorn's time as Crown │\n", "│ Prince. │\n", "│ • Suthida Bajrasudhabimalalakshana was appointed as a General in the Royal Thai Army in December 2016. │\n", "│ • Vajiralongkorn ascended to the throne shortly before December 2016. │\n", "│ • The marriage and Queen Suthida's subsequent coronation were part of the elaborate royal ceremonies. │\n", "│ • The elaborate royal ceremonies solidified Queen Suthida's position as the consort of the reigning monarch. │\n", "│ • Queen Suthida is known for her dignity. │\n", "│ • Queen Suthida is known for her dedication to her roles in royal duties. │\n", "│ • Queen Suthida is known for her dedication to her previous military service. │\n", "│ • Queen Suthida's work and public engagements often highlight charitable activities in Thailand. │\n", "│ • Queen Suthida's work and public engagements often support various social causes within Thailand. 
│\n", "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", "\n" ], "text/plain": [ "\u001b[31m╭─\u001b[0m\u001b[31m─────────────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mLow-Confidence Claims to be Removed\u001b[0m\u001b[31m \u001b[0m\u001b[31m─────────────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n", "\u001b[31m│\u001b[0m • Suthida Bajrasudhabimalalakshana's notable role was as the Deputy Commander of the King’s Own Bodyguard \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m Battalion. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Suthida Bajrasudhabimalalakshana was appointed as the Commander of the Special Operations Unit of the King’s \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m Guard in 2013. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Suthida Bajrasudhabimalalakshana was made the Commander of the Royal Thai Aide-de-Camp Department. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Suthida Bajrasudhabimalalakshana's service to the royal family began during Vajiralongkorn's time as Crown \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m Prince. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Suthida Bajrasudhabimalalakshana was appointed as a General in the Royal Thai Army in December 2016. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Vajiralongkorn ascended to the throne shortly before December 2016. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • The marriage and Queen Suthida's subsequent coronation were part of the elaborate royal ceremonies. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • The elaborate royal ceremonies solidified Queen Suthida's position as the consort of the reigning monarch. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Queen Suthida is known for her dignity. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Queen Suthida is known for her dedication to her roles in royal duties. 
\u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Queen Suthida is known for her dedication to her previous military service. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Queen Suthida's work and public engagements often highlight charitable activities in Thailand. \u001b[31m│\u001b[0m\n", "\u001b[31m│\u001b[0m • Queen Suthida's work and public engagements often support various social causes within Thailand. \u001b[31m│\u001b[0m\n", "\u001b[31m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
╭─────────────────────────────────────────────── Refined Response ────────────────────────────────────────────────╮\n", "│ Suthida Bajrasudhabimalalakshana, born on June 3, 1978, is the Queen of Thailand. She became queen following │\n", "│ her marriage to King Maha Vajiralongkorn on May 1, 2019, and upon marriage, she was bestowed with the royal │\n", "│ name Her Majesty Queen Suthida Bajrasudhabimalalakshana. Before her ascension to the throne, Queen Suthida was │\n", "│ recognized for her dedicated service in the Thai military, where she rose to the rank of General, and in royal │\n", "│ security. Her military career and commitment to royal security played a significant role in her rise to │\n", "│ prominence, ultimately leading to her role as queen. │\n", "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n", "\n" ], "text/plain": [ "\u001b[32m╭─\u001b[0m\u001b[32m──────────────────────────────────────────────\u001b[0m\u001b[32m \u001b[0m\u001b[1;32mRefined Response\u001b[0m\u001b[32m \u001b[0m\u001b[32m───────────────────────────────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\n", "\u001b[32m│\u001b[0m Suthida Bajrasudhabimalalakshana, born on June 3, 1978, is the Queen of Thailand. She became queen following \u001b[32m│\u001b[0m\n", "\u001b[32m│\u001b[0m her marriage to King Maha Vajiralongkorn on May 1, 2019, and upon marriage, she was bestowed with the royal \u001b[32m│\u001b[0m\n", "\u001b[32m│\u001b[0m name Her Majesty Queen Suthida Bajrasudhabimalalakshana. Before her ascension to the throne, Queen Suthida was \u001b[32m│\u001b[0m\n", "\u001b[32m│\u001b[0m recognized for her dedicated service in the Thai military, where she rose to the rank of General, and in royal \u001b[32m│\u001b[0m\n", "\u001b[32m│\u001b[0m security. 
Her military career and commitment to royal security played a significant role in her rise to \u001b[32m│\u001b[0m\n", "\u001b[32m│\u001b[0m prominence, ultimately leading to her role as queen. \u001b[32m│\u001b[0m\n", "\u001b[32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_response_refinement(original_text=result_df.response[0], claims_data=result_df.claims_data[0], refined_text=result_df.refined_response[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Evaluate Hallucination Detection Performance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To evaluate hallucination detection performance, we 'grade' the atomic claims in the responses against an answer key. Here, we use UQLM's out-of-the-box `FactScoreGrader`, which can be used with any [LangChain Chat Model](https://js.langchain.com/docs/integrations/chat/). **If you are using your own prompts/questions, be sure to update the grading method accordingly**." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [], "source": [ "# set up the LLM grader\n", "from langchain_google_vertexai import ChatVertexAI\n", "\n", "gemini_flash = ChatVertexAI(model=\"gemini-2.5-flash\")\n", "grader = FactScoreGrader(llm=gemini_flash)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before grading, the claims must be formatted as a list of lists, where each inner list corresponds to one generated response." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Convert claims to list of lists\n", "claims_data_lists = claims_dicts_to_lists(result_df.claims_data.tolist())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
|   | prompt | response | sampled_responses | entailment | claims_data | refined_response | refined_entailment | claim_grades | answer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Tell me a bio of Suthida.\n | Suthida Bajrasudhabimalalakshana, born on June... | [Suthida Bajrasudhabimalalakshana, commonly kn... | 0.378289 | [{'claim': 'Suthida Bajrasudhabimalalakshana w... | Suthida Bajrasudhabimalalakshana, born on June... | 0.786631 | [True, True, False, True, True, True, True, Fa... | Suthida Bajrasudhabimalalakshana (Thai: สมเด็จ... |
| 1 | Tell me a bio of Miguel Ángel Félix Gallardo.\n | Miguel Ángel Félix Gallardo, often referred to... | [Miguel Ángel Félix Gallardo, often referred t... | 0.472164 | [{'claim': 'Miguel Ángel Félix Gallardo is oft... | Miguel Ángel Félix Gallardo, often referred to... | 0.654868 | [True, True, True, True, False, True, True, Tr... | Miguel Ángel Félix Gallardo (born January 8, 1... |
| 2 | Tell me a bio of Iggy Azalea.\n | Iggy Azalea, born Amethyst Amelia Kelly on Jun... | [Iggy Azalea, born Amethyst Amelia Kelly on Ju... | 0.468483 | [{'claim': 'Iggy Azalea was born Amethyst Amel... | Iggy Azalea, born Amethyst Amelia Kelly on Jun... | 0.650532 | [True, True, True, True, False, True, True, Fa... | Amethyst Amelia Kelly (born 7 June 1990), know... |
| 3 | Tell me a bio of Fernando da Costa Novaes.\n | Fernando da Costa Novaes was a prominent Brazi... | [Fernando da Costa Novaes was a Brazilian orni... | 0.494438 | [{'claim': 'Fernando da Costa Novaes was a pro... | Fernando da Costa Novaes was a highly respecte... | 0.705326 | [True, True, True, True, False, True, True, Tr... | Fernando da Costa Novaes (April 6, 1927 – Marc... |
| 4 | Tell me a bio of Jan Zamoyski.\n | Jan Zamoyski (1542–1605) was a prominent Polis... | [Jan Zamoyski (1542–1605) was a prominent Poli... | 0.573752 | [{'claim': 'Jan Zamoyski was a Polish nobleman... | Jan Zamoyski, born in 1542 and deceased in 160... | 0.733766 | [True, True, True, True, True, True, True, Tru... | Jan Sariusz Zamoyski (Latin: Ioannes Zamoyski ... |
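Once each claim has both a confidence score and a True/False grade, claim-level hallucination detection can be summarized with a ranking metric. The sketch below assumes AUROC as that metric, which is one common choice and not necessarily the one this notebook uses; the helper names and the nested example values are hypothetical.

```python
from typing import List, Sequence


def flatten(nested: Sequence[Sequence]) -> List:
    """Flatten per-response lists into one claim-level list."""
    return [item for inner in nested for item in inner]


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a correct claim outranks an incorrect one (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one correct and one incorrect claim")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up claim scores and grades for two responses, one inner list per response.
claim_scores = [[0.9, 0.2, 0.3], [0.4, 0.8]]
claim_grades = [[True, False, True], [False, True]]
auc = auroc(flatten(claim_scores), flatten(claim_grades))
```

A value near 1 means the confidence scores rank correct claims above incorrect ones almost always, while 0.5 corresponds to random ordering.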