langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias
- class langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', custom_classifier=None)
Bases: Metric
- __init__(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', custom_classifier=None)
Compute a counterfactual sentiment bias score leveraging a third-party sentiment classifier. Code adapted from the helm package (stanford-crfm/helm). For more information on this bias metric, refer to https://arxiv.org/pdf/1911.03064.pdf
- Parameters:
classifier ({'vader','natural_language_api'}, default='vader') – The sentiment classifier used to calculate counterfactual sentiment bias.
sentiment ({'neg','pos'}, default='neg') – Specifies the target category of the sentiment classifier. One of “neg” or “pos”.
parity ({'strong','weak'}, default='strong') – Indicates whether to calculate strong demographic parity, using the Wasserstein-1 distance between score distributions, or weak demographic parity, using binarized sentiment predictions. The latter relies on a binarization threshold that can be customized with the threshold parameter.
threshold (float between 0 and 1, default=0.5) – The threshold for binarizing predicted sentiment scores. Only applicable if parity is set to 'weak'.
how ({'mean','pairwise'}, default='mean') – Specifies whether to return the mean sentiment bias score over all counterfactual pairs or a list containing the score for each pair.
custom_classifier (class object having predict method) – A user-defined class for sentiment classification that contains a predict method. The predict method must accept a list of strings as an input and output a list of floats of equal length. If provided, this takes precedence over classifier.
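The custom_classifier contract described above (a predict method that takes a list of strings and returns a list of floats of equal length) can be illustrated with a minimal sketch. The class name KeywordSentimentClassifier, its lexicon, and its scoring rule are all hypothetical and are not part of langfair; any object with a compatible predict method should serve the same purpose.

```python
from typing import Iterable, List, Optional


class KeywordSentimentClassifier:
    """Toy classifier (hypothetical): scores each text by the share of
    its words that appear in a small negative-word lexicon."""

    def __init__(self, negative_words: Optional[Iterable[str]] = None):
        # Illustrative lexicon only; a real classifier would be a trained model.
        default_words = {"bad", "terrible", "awful", "hate"}
        self.negative_words = set(negative_words) if negative_words else default_words

    def predict(self, texts: List[str]) -> List[float]:
        """Return one score in [0, 1] per input text, in the same order."""
        scores = []
        for text in texts:
            # Lowercase and strip simple punctuation from each token.
            words = [w.strip(".,!?").lower() for w in text.split()]
            if not words:
                scores.append(0.0)
                continue
            hits = sum(w in self.negative_words for w in words)
            scores.append(hits / len(words))
        return scores


clf = KeywordSentimentClassifier()
scores = clf.predict(["This is terrible.", "What a lovely day"])
```

Going by the signature documented above, such an instance would presumably be supplied as custom_classifier=clf when constructing SentimentBias, in which case it takes precedence over classifier.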
Methods
__init__([classifier, sentiment, parity, ...]) – Compute a counterfactual sentiment bias score leveraging a third-party sentiment classifier.
evaluate(texts1, texts2) – Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.
- evaluate(texts1, texts2)
Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.
- Parameters:
texts1 (list of strings) – A list of generated outputs from a language model each containing mention of the same protected attribute group.
texts2 (list of strings) – A list, analogous to texts1, of counterfactually generated outputs from a language model, each containing mention of the same protected attribute group. This group must be a different group, within the same protected attribute, from the one mentioned in texts1.
- Returns:
Weak or strong counterfactual sentiment bias score for the provided lists of texts.
- Return type:
float
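To make the two parity notions concrete, here is a minimal sketch of how a strong and a weak score could be computed from two lists of sentiment scores. This is not langfair's implementation: the function names are hypothetical, the Wasserstein-1 shortcut assumes equal-length lists, and binarizing with `>=` is an assumption about how the threshold is applied.

```python
from statistics import mean
from typing import List


def strong_parity_gap(scores1: List[float], scores2: List[float]) -> float:
    """Wasserstein-1 distance between two equal-size empirical samples.

    For equal-size samples with uniform weights, this equals the mean
    absolute difference between the sorted values.
    """
    if len(scores1) != len(scores2):
        raise ValueError("this sketch assumes equal-length score lists")
    return mean(abs(a - b) for a, b in zip(sorted(scores1), sorted(scores2)))


def weak_parity_gap(
    scores1: List[float], scores2: List[float], threshold: float = 0.5
) -> float:
    """Absolute difference in the share of scores at or above the threshold."""
    rate1 = mean(1.0 if s >= threshold else 0.0 for s in scores1)
    rate2 = mean(1.0 if s >= threshold else 0.0 for s in scores2)
    return abs(rate1 - rate2)


# Hypothetical sentiment scores for two counterfactual groups of outputs.
group1_scores = [0.1, 0.6, 0.8]
group2_scores = [0.2, 0.3, 0.9]

strong_gap = strong_parity_gap(group1_scores, group2_scores)
weak_gap = weak_parity_gap(group1_scores, group2_scores, threshold=0.5)
```

In both cases a value near 0 indicates similar sentiment across the two groups. The strong variant compares the full score distributions, while the weak variant only compares how often scores cross the threshold.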