langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias

class langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', device='cpu', custom_classifier=None)

Bases: Metric

__init__(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', device='cpu', custom_classifier=None)

Compute a counterfactual sentiment bias score using a third-party sentiment classifier. Code adapted from the helm package (stanford-crfm/helm). For more information on these metrics, see Huang et al. (2020) [1] and Bouchard (2024) [2].

Parameters:

classifier ({'vader','roberta'}, default='vader') – The sentiment classifier used to calculate counterfactual sentiment bias.
sentiment ({'neg','pos'}, default='neg') – Specifies the target category of the sentiment classifier. One of “neg” or “pos”.
parity ({'strong','weak'}, default='strong') – Indicates whether to calculate strong demographic parity using Wasserstein-1 distance on score distributions or weak demographic parity using binarized sentiment predictions. The latter relies on a binarization threshold, which can be customized with the threshold parameter.
threshold (float between 0 and 1, default=0.5) – Specifies the threshold for binarizing predicted sentiment scores. Only applicable if parity is set to ‘weak’.
how ({'mean','pairwise'}, default='mean') – Specifies whether to return the aggregate sentiment bias over all counterfactual pairs or a list containing the difference in sentiment scores for each pair.
device (str or torch.device input or torch.device object, default="cpu") – Specifies the device that classifiers use for prediction. Set to “cuda” to let classifiers use the GPU. Currently, only the ‘roberta’ classifier uses this parameter.
custom_classifier (class object having predict method) – A user-defined class for sentiment classification that contains a predict method. The predict method must accept a list of strings as an input and output a list of floats of equal length. If provided, this takes precedence over classifier.
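
The custom_classifier contract described above (a predict method that takes a list of strings and returns a list of floats of equal length) can be illustrated with a minimal sketch. The class name KeywordSentimentClassifier and its word list are hypothetical and exist only for illustration; a real classifier would normally wrap a trained model.

```python
from typing import List


class KeywordSentimentClassifier:
    """Hypothetical toy classifier: scores each text by the share of its
    words that appear in a small negative-word list."""

    NEGATIVE_WORDS = {"bad", "terrible", "awful", "poor"}

    def predict(self, texts: List[str]) -> List[float]:
        # One float per input text, in the same order as the input.
        scores = []
        for text in texts:
            words = text.lower().split()
            if not words:
                scores.append(0.0)
                continue
            hits = sum(word.strip(".,!?") in self.NEGATIVE_WORDS for word in words)
            scores.append(hits / len(words))
        return scores


clf = KeywordSentimentClassifier()
scores = clf.predict(["The service was terrible.", "A pleasant visit."])

# With langfair installed, the instance could then be supplied through the
# documented parameter, e.g. SentimentBias(custom_classifier=clf).
```

Because the metric only requires the predict method, any object that satisfies this interface should be usable, whatever model sits behind it.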

Methods

`__init__`([classifier, sentiment, parity, ...])	Compute a counterfactual sentiment bias score leveraging a third-party sentiment classifier.
`evaluate`(texts1, texts2)	Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.

evaluate(texts1, texts2)

Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.

Parameters:

texts1 (list of strings) – A list of generated outputs from a language model, each mentioning the same protected attribute group.
texts2 (list of strings) – A list, analogous to texts1, of counterfactually generated outputs from a language model, each mentioning the same protected attribute group. This group must be a different group within the same protected attribute as the one mentioned in texts1.

Returns:

Weak or strong counterfactual sentiment bias score for the provided lists of texts.

Return type:

float
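
To make the two parity options concrete, the sketch below computes both quantities from two lists of sentiment scores. It is an illustration under stated assumptions, not langfair's implementation: the helper names wasserstein_1 and weak_parity_gap are hypothetical, the score lists are made-up values, the lists are assumed to be of equal length, and scores at or above the threshold are assumed to count as positive predictions.

```python
from typing import List


def wasserstein_1(scores1: List[float], scores2: List[float]) -> float:
    """Wasserstein-1 distance between two equal-size empirical samples.

    For one-dimensional samples of equal size, this equals the mean
    absolute difference between the sorted values.
    """
    if len(scores1) != len(scores2):
        raise ValueError("this sketch assumes equal-length score lists")
    a, b = sorted(scores1), sorted(scores2)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def weak_parity_gap(
    scores1: List[float], scores2: List[float], threshold: float = 0.5
) -> float:
    """Absolute difference in the share of scores at or above the threshold.

    Assumption: whether the boundary is inclusive is a choice made for this
    sketch only.
    """
    rate1 = sum(s >= threshold for s in scores1) / len(scores1)
    rate2 = sum(s >= threshold for s in scores2) / len(scores2)
    return abs(rate1 - rate2)


# Made-up sentiment scores for two counterfactual sets of outputs.
scores1 = [0.1, 0.6, 0.8, 0.3]
scores2 = [0.2, 0.6, 0.9, 0.7]

strong = wasserstein_1(scores1, scores2)                 # compares whole distributions
weak = weak_parity_gap(scores1, scores2, threshold=0.5)  # compares binarized rates
```

The strong variant is sensitive to any shift between the two score distributions, while the weak variant only registers differences that cross the chosen threshold.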

References
