# Choosing Metrics

## Choosing Bias and Fairness Metrics for an LLM Use Case

Selecting the appropriate bias and fairness metrics is essential for accurately assessing the performance of large language models (LLMs) in specific use cases. Instead of attempting to compute all possible metrics, practitioners should focus on a relevant subset that aligns with their specific goals and the context of their application.

Our decision framework for selecting appropriate evaluation metrics is illustrated in the diagram below. For more details, refer to our technical playbook.

*Figure: Use case decision framework for selecting bias and fairness metrics.*

> **Note:** Fairness through unawareness (FTU) means that none of the prompts for an LLM use case include any mention of protected attribute words.
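To make the FTU condition concrete, below is a minimal sketch of such a check. The word list and helper function are illustrative stand-ins, not part of LangFair's API; a real check would use a curated, comprehensive list of protected attribute words.

```python
import re

# Illustrative sample only; a real FTU check needs a comprehensive word list.
PROTECTED_ATTRIBUTE_WORDS = ["male", "female", "muslim", "christian", "black", "white"]

def mentions_protected_attribute(prompt: str) -> bool:
    """Return True if the prompt contains any protected attribute word."""
    tokens = re.findall(r"[a-z']+", prompt.lower())
    return any(word in tokens for word in PROTECTED_ATTRIBUTE_WORDS)

prompts = [
    "Summarize the customer's billing history.",
    "Draft a response to a complaint from a female customer.",
]

# FTU holds only if no prompt in the use case mentions a protected attribute word.
ftu_satisfied = not any(mentions_protected_attribute(p) for p in prompts)
print(f"Fairness through unawareness satisfied: {ftu_satisfied}")
```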

## Supported Bias and Fairness Metrics

Bias and fairness metrics offered by LangFair are grouped into several categories. The full suite of metrics is listed below, followed by a short usage sketch.

- Toxicity Metrics
- Counterfactual Fairness Metrics
- Stereotype Metrics
- Recommendation (Counterfactual) Fairness Metrics
- Classification Fairness Metrics
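As an example of computing one of these metric groups, here is a minimal sketch using LangFair's toxicity metrics. The import and `evaluate` call follow LangFair's documented usage; the prompts, responses, and the assumed `"metrics"` result key are placeholders standing in for real LLM generations and outputs.

```python
from langfair.metrics.toxicity import ToxicityMetrics

# Placeholder prompt/response pairs; in practice these come from your LLM.
prompts = ["Tell me about my account options.", "Why was my claim denied?"]
responses = [
    "You have three account tiers to choose from, each with different benefits.",
    "Your claim was denied because the policy lapsed before the incident date.",
]

# Uses LangFair's default toxicity classifier under the hood.
tm = ToxicityMetrics()
result = tm.evaluate(prompts=prompts, responses=responses)

# The "metrics" key (assumed from LangFair's README) holds the computed values,
# e.g., expected maximum toxicity and toxicity probability.
print(result["metrics"])
```

The other metric classes (e.g., `CounterfactualMetrics`, `StereotypeMetrics`) follow a similar evaluate-style pattern; consult the LangFair documentation for the exact inputs each category requires.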