
uqlm: Uncertainty Quantification for Language Models#

A Python library for LLM hallucination detection using state-of-the-art uncertainty quantification techniques. Each scorer returns a confidence score between 0 and 1, where higher scores indicate lower hallucination likelihood.

Scorer Types#

UQLM provides five categories of scorers. Click a card to explore the options.

🌐 Black-Box Scorers

Measure consistency across multiple LLM generations. Compatible with any LLM; no access to model internals is required.

1. Black-Box Scorers (Consistency-Based)
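The consistency idea behind black-box scorers can be sketched without the library: sample several responses to the same prompt and measure how often they agree. Below is a minimal, hypothetical exact-match version (the function name and normalization are illustrative; the library's actual scorers use richer similarity measures than exact matching):

```python
from itertools import combinations

def exact_match_consistency(responses: list[str]) -> float:
    """Toy black-box confidence score: the fraction of response pairs
    that match exactly after light normalization. 1.0 = fully consistent."""
    normalized = [r.strip().lower() for r in responses]
    pairs = list(combinations(normalized, 2))
    if not pairs:  # a single response has no pairs to compare
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

# A model that answers the same way across samples scores high:
print(exact_match_consistency(["Paris", "paris", "Paris"]))  # 1.0
# Divergent answers suggest higher hallucination risk:
print(round(exact_match_consistency(["Paris", "Lyon", "Paris"]), 3))  # 0.333
```

Because the score only compares generated strings, it works with any model that can be sampled repeatedly, which is what makes black-box scorers model-agnostic.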
⚡ White-Box Scorers

Leverage token probabilities for fast, free single-generation scoring. No extra LLM calls required.

2. White-Box Scorers (Token-Probability-Based)
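White-box scoring draws on the token log-probabilities the model already emits during a single generation, so no extra calls are needed. A self-contained sketch of two common statistics (the function names and log-prob values here are hypothetical, for illustration only):

```python
import math

def min_token_probability(logprobs: list[float]) -> float:
    """Toy white-box confidence score: the probability of the least
    likely generated token, recovered from the model's log-probs."""
    return math.exp(min(logprobs))

def mean_token_probability(logprobs: list[float]) -> float:
    """Length-normalized alternative: the geometric mean of the
    token probabilities, i.e. exp of the average log-prob."""
    return math.exp(sum(logprobs) / len(logprobs))

# Hypothetical log-probs for a 4-token response; one token was a long shot:
lp = [-0.05, -0.10, -2.30, -0.20]
print(round(min_token_probability(lp), 3))   # 0.1
print(round(mean_token_probability(lp), 3))  # 0.516
```

A single low-probability token can flag a shaky span that an average would smooth over, which is why both statistics are useful.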
βš–οΈ LLM-as-a-Judge

Use one or more LLMs to evaluate response reliability. Highly customizable via prompt engineering.

3. LLM-as-a-Judge Scorers
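The prompt-engineering angle of judge scorers can be illustrated with a toy verdict-to-score mapping. The prompt template and verdict vocabulary below are hypothetical, not the library's actual prompts; they only show how a judge LLM's categorical answer becomes a numeric confidence score:

```python
# Hypothetical judge prompt: the judge answers with a single word.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Is the proposed answer correct? Reply with exactly one word: "
    "correct, incorrect, or unsure."
)

# Map the judge's categorical verdict onto a [0, 1] confidence score.
VERDICT_TO_SCORE = {"correct": 1.0, "unsure": 0.5, "incorrect": 0.0}

def judge_score(verdict: str) -> float:
    """Parse a one-word judge verdict into a confidence score."""
    return VERDICT_TO_SCORE[verdict.strip().lower().rstrip(".")]

print(judge_score("Correct."))  # 1.0
print(judge_score("unsure"))    # 0.5
```

Customization happens in the template and the verdict scale: a finer-grained rubric (e.g. a 1-5 rating) yields a finer-grained score.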
🔀 Ensemble Scorers

Combine multiple scorers via weighted averaging for more robust confidence estimates. Tunable for advanced users.

4. Ensemble Scorers
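Weighted averaging of component scorers is straightforward to sketch. The component names and weights below are made up for illustration; in practice the weights are what an advanced user would tune, e.g. against a labeled validation set:

```python
def ensemble_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Toy ensemble: weighted average of component confidence scores,
    with weights normalized so they sum to 1."""
    total = sum(weights.values())
    return sum(scores[name] * w / total for name, w in weights.items())

# Hypothetical component scores for one response:
component_scores = {"noncontradiction": 0.9, "exact_match": 0.6, "judge": 0.8}
# Hypothetical tuned weights favoring the consistency-based scorer:
weights = {"noncontradiction": 2.0, "exact_match": 1.0, "judge": 1.0}

print(round(ensemble_score(component_scores, weights), 3))  # 0.8
```

Averaging dampens the failure modes of any single scorer, which is why ensembles tend to give more robust confidence estimates.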
πŸ“ Long-Text Scorers

Score uncertainty at the claim level for long-form responses, with support for uncertainty-aware response refinement.

5. Long-Text Scorers (Claim-Level)
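The claim-level idea can be sketched independently of the library: split a long response into individual claims, score each one, and flag the low-confidence claims as candidates for refinement or removal. The function name, claims, and scores below are hypothetical:

```python
def flag_uncertain_claims(
    claims: list[str], scores: list[float], threshold: float = 0.5
) -> list[str]:
    """Toy claim-level filter: return the claims whose confidence score
    falls below the threshold, i.e. candidates for refinement."""
    return [claim for claim, score in zip(claims, scores) if score < threshold]

claims = [
    "Paris is the capital of France.",
    "The city was founded in 1492.",  # dubious claim
]
scores = [0.95, 0.20]  # hypothetical per-claim confidence scores

print(flag_uncertain_claims(claims, scores))
# ['The city was founded in 1492.']
```

Scoring at the claim level localizes uncertainty within a long answer, so an uncertainty-aware refinement step can rewrite or drop only the shaky claims rather than regenerating the whole response.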

Contents#