Choosing Metrics

Choosing Bias and Fairness Metrics for an LLM Use Case

Selecting appropriate bias and fairness metrics is essential for accurately assessing a large language model (LLM) in a given use case. Rather than computing every available metric, practitioners should focus on the subset that matches their goals and the context of their application.

Our decision framework for selecting appropriate evaluation metrics is illustrated in the diagram below. For more details, refer to our technical playbook.

Note

Fairness through unawareness means none of the prompts for an LLM use case include any mention of protected attribute words.
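As a rough illustration of the note above, the following sketch checks a set of prompts for protected attribute words. The word list and the function names `prompts_mentioning_protected_words` and `satisfies_ftu` are hypothetical and chosen only for this example; a real check would need a curated lexicon for each protected attribute of interest.

```python
import re

# Hypothetical, deliberately tiny word list used only for illustration.
PROTECTED_WORDS = {"he", "she", "his", "her", "man", "woman"}


def prompts_mentioning_protected_words(prompts, words=PROTECTED_WORDS):
    """Return the indices of prompts containing at least one protected attribute word."""
    word_set = {w.lower() for w in words}
    flagged = []
    for i, prompt in enumerate(prompts):
        # Match whole tokens so that "weather" is not flagged because it contains "her".
        tokens = set(re.findall(r"[a-z']+", prompt.lower()))
        if tokens & word_set:
            flagged.append(i)
    return flagged


def satisfies_ftu(prompts, words=PROTECTED_WORDS):
    """True when no prompt mentions any word in the list."""
    return not prompts_mentioning_protected_words(prompts, words)
```

Under this reading, a use case whose prompts all pass the check satisfies fairness through unawareness with respect to the chosen word list only.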

Supported Bias and Fairness Metrics

Bias and fairness metrics offered by LangFair are grouped into several categories. The full suite of metrics is displayed below.

Toxicity Metrics

Expected Maximum Toxicity [Gehman et al., 2020]
Toxicity Probability [Gehman et al., 2020]
Toxic Fraction [Liang et al., 2023]
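To make the three toxicity metrics concrete, here is a minimal sketch that assumes toxicity scores in [0, 1] have already been produced by some classifier, with several generations per prompt. The function name `toxicity_metrics`, the input layout, and the 0.5 threshold with a `>=` comparison are assumptions for illustration, not LangFair's API.

```python
def toxicity_metrics(scores_by_prompt, threshold=0.5):
    """Compute three toxicity summaries from precomputed scores.

    scores_by_prompt: list of lists, one inner list of toxicity scores
    (one score per generated response) for each prompt.
    """
    if not scores_by_prompt or any(not s for s in scores_by_prompt):
        raise ValueError("every prompt needs at least one scored response")

    max_per_prompt = [max(scores) for scores in scores_by_prompt]
    all_scores = [x for scores in scores_by_prompt for x in scores]

    return {
        # Mean, over prompts, of the worst (maximum) score among that prompt's responses.
        "expected_maximum_toxicity": sum(max_per_prompt) / len(max_per_prompt),
        # Share of prompts with at least one response at or above the threshold.
        "toxicity_probability": sum(m >= threshold for m in max_per_prompt) / len(max_per_prompt),
        # Share of all responses at or above the threshold.
        "toxic_fraction": sum(x >= threshold for x in all_scores) / len(all_scores),
    }
```

The first two metrics summarize worst-case behavior per prompt, while the third summarizes average behavior across all responses, which is why they can diverge on the same data.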

Counterfactual Fairness Metrics

Strict Counterfactual Sentiment Parity [Huang et al., 2020]
Weak Counterfactual Sentiment Parity [Bouchard, 2024]
Counterfactual Cosine Similarity Score [Bouchard, 2024]
Counterfactual BLEU [Bouchard, 2024]
Counterfactual ROUGE-L [Bouchard, 2024]
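The sentiment parity metrics compare responses to counterfactual prompt pairs, meaning prompts that differ only in the protected attribute mentioned. The sketch below shows one plausible reading of the strict and weak variants, given sentiment scores in [0, 1] that were computed elsewhere. The function names, the paired-list input, and the 0.5 threshold with a strict `>` comparison are assumptions; consult the cited papers for the exact definitions.

```python
def _check_pairs(scores_a, scores_b):
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")


def strict_sentiment_parity(scores_a, scores_b):
    """Mean absolute difference in sentiment across counterfactual response pairs."""
    _check_pairs(scores_a, scores_b)
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


def weak_sentiment_parity(scores_a, scores_b, threshold=0.5):
    """Absolute difference between the two groups' shares of responses above a threshold."""
    _check_pairs(scores_a, scores_b)
    share_a = sum(s > threshold for s in scores_a) / len(scores_a)
    share_b = sum(s > threshold for s in scores_b) / len(scores_b)
    return abs(share_a - share_b)
```

In this reading, the strict variant penalizes any pairwise difference, whereas the weak variant only compares group-level rates, so it can be zero even when individual pairs differ.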

Stereotype Metrics

Stereotypical Associations [Liang et al., 2023]
Co-occurrence Bias Score [Bordia & Bowman, 2019]
Stereotype Classifier Metrics [Zekun et al., 2023], [Bouchard, 2024]

Recommendation (Counterfactual) Fairness Metrics

Jaccard Similarity [Zhang et al., 2023]
Search Result Page Misinformation Score [Zhang et al., 2023]
Pairwise Ranking Accuracy Gap [Zhang et al., 2023]
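Of the recommendation metrics, Jaccard similarity is the simplest to illustrate: it measures the overlap between two recommendation lists generated from counterfactual prompts, ignoring order. The helper name `jaccard_similarity` and the convention of returning 1.0 for two empty lists are assumptions made for this sketch.

```python
def jaccard_similarity(list_a, list_b):
    """Size of the intersection divided by size of the union of two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)
```

Because Jaccard similarity ignores ranking, it says nothing about whether shared items appear in the same positions; the rank-aware metrics in the list address that.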

Classification Fairness Metrics

Predicted Prevalence Rate Disparity [Feldman et al., 2015], [Bellamy et al., 2018], [Saleiro et al., 2019]
False Negative Rate Disparity [Bellamy et al., 2018], [Saleiro et al., 2019]
False Omission Rate Disparity [Bellamy et al., 2018], [Saleiro et al., 2019]
False Positive Rate Disparity [Bellamy et al., 2018], [Saleiro et al., 2019]
False Discovery Rate Disparity [Bellamy et al., 2018], [Saleiro et al., 2019]
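Each classification fairness metric compares a confusion-matrix rate between two groups. The sketch below computes the five underlying rates per group and reports their difference. Reporting a difference rather than a ratio is an assumption here, as are the function names `group_rates` and `rate_disparities` and the binary 0/1 label encoding.

```python
def group_rates(y_true, y_pred):
    """Confusion-matrix rates for one group, with binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Undefined rates (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "predicted_prevalence_rate": ratio(tp + fp, len(pairs)),
        "false_negative_rate": ratio(fn, fn + tp),
        "false_omission_rate": ratio(fn, fn + tn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_discovery_rate": ratio(fp, fp + tp),
    }


def rate_disparities(y_true, y_pred, groups, group_a, group_b):
    """Difference in each rate between group_a and group_b (group_a minus group_b)."""
    def subset(g):
        rows = [(t, p) for t, p, grp in zip(y_true, y_pred, groups) if grp == g]
        return [t for t, _ in rows], [p for _, p in rows]

    rates_a = group_rates(*subset(group_a))
    rates_b = group_rates(*subset(group_b))
    return {name: rates_a[name] - rates_b[name] for name in rates_a}
```

A value near zero for a given rate suggests the classifier treats the two groups similarly on that criterion; which rates matter most depends on the cost of each error type in the use case.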
