langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias

class langfair.metrics.counterfactual.metrics.sentimentbias.SentimentBias(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', custom_classifier=None)

Bases: Metric

__init__(classifier='vader', sentiment='neg', parity='strong', threshold=0.5, how='mean', custom_classifier=None)

Compute a counterfactual sentiment bias score leveraging a third-party sentiment classifier. Code adapted from the helm package (stanford-crfm/helm). For more information on this bias metric, refer to: https://arxiv.org/pdf/1911.03064.pdf

Parameters:
  • classifier ({'vader','natural_language_api'}, default='vader') – The sentiment classifier used to calculate counterfactual sentiment bias.

  • sentiment ({'neg','pos'}, default='neg') – Specifies the target category of the sentiment classifier. One of “neg” or “pos”.

  • parity ({'strong','weak'}, default='strong') – Indicates whether to calculate strong demographic parity, using the Wasserstein-1 distance between sentiment score distributions, or weak demographic parity, using binarized sentiment predictions. Weak parity requires a binarization threshold, which the user can customize with the threshold parameter.

  • threshold (float between 0 and 1, default=0.5) – The threshold for binarizing predicted sentiment scores. Only applicable if parity is set to ‘weak’.

  • how ({'mean','pairwise'}, default='mean') – Specifies whether to return a single sentiment bias score aggregated over all counterfactual pairs ('mean') or a list containing a score for each counterfactual pair ('pairwise').

  • custom_classifier (class object having a predict method, default=None) – A user-defined class for sentiment classification that contains a predict method. The predict method must accept a list of strings as input and return a list of floats of equal length. If provided, this takes precedence over classifier.
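As a minimal sketch of the custom_classifier contract described above, the hypothetical class below (KeywordNegativityClassifier, not part of langfair) implements a predict method that maps a list of strings to a list of floats of the same length. Its keyword-lexicon scoring is a toy stand-in chosen only to keep the example self-contained; a real classifier would return model probabilities.

```python
from typing import Iterable, List, Optional


class KeywordNegativityClassifier:
    """Hypothetical toy classifier following the custom_classifier contract:
    predict() takes a list of strings and returns a list of floats of equal length.
    Each score is the fraction of tokens found in a small negative-word lexicon,
    so it lies in [0, 1], like a 'neg' sentiment probability would.
    """

    def __init__(self, negative_words: Optional[Iterable[str]] = None):
        # Illustrative lexicon only; a real classifier would be a trained model.
        self.negative_words = set(negative_words or {"bad", "awful", "terrible", "hate", "poor"})

    def predict(self, texts: List[str]) -> List[float]:
        scores = []
        for text in texts:
            # Lowercase and strip trailing punctuation so "awful." matches "awful".
            tokens = [t.strip(".,!?;:").lower() for t in text.split()]
            tokens = [t for t in tokens if t]
            if not tokens:
                scores.append(0.0)  # empty text: no negative evidence
                continue
            hits = sum(t in self.negative_words for t in tokens)
            scores.append(hits / len(tokens))
        return scores


clf = KeywordNegativityClassifier()
print(clf.predict(["The service was awful.", "The service was great."]))
# → [0.25, 0.0]
```

Under the contract stated above, an instance would be passed as SentimentBias(custom_classifier=KeywordNegativityClassifier()), in which case it takes precedence over classifier.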

Methods

__init__([classifier, sentiment, parity, ...])

Compute a counterfactual sentiment bias score leveraging a third-party sentiment classifier.

evaluate(texts1, texts2)

Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.

evaluate(texts1, texts2)

Returns counterfactual sentiment bias between two counterfactually generated lists of LLM outputs by leveraging a third-party sentiment classifier.

Parameters:
  • texts1 (list of strings) – A list of outputs generated by a language model, each mentioning the same protected attribute group.

  • texts2 (list of strings) – A list, analogous to texts1, of counterfactually generated outputs from a language model, each mentioning the same protected attribute group. This group must be a different group within the same protected attribute as the one mentioned in texts1.

Returns:

Weak or strong counterfactual sentiment bias score for the provided lists of texts.

Return type:

float
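The two parity options can be sketched in plain Python, given sentiment scores already computed for texts1 and texts2. The helper names wasserstein_1 and weak_parity_gap are hypothetical, and the weak-parity formula used here (absolute difference in the fraction of scores above threshold) is an assumption based on the parameter descriptions, not a verified copy of langfair's implementation.

```python
from bisect import bisect_right
from typing import Sequence


def wasserstein_1(u: Sequence[float], v: Sequence[float]) -> float:
    """Strong-parity sketch: Wasserstein-1 distance between two empirical
    1-D score distributions, i.e. the integral of |F_u(x) - F_v(x)| over x."""
    if not u or not v:
        raise ValueError("both score lists must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_u = bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


def weak_parity_gap(u: Sequence[float], v: Sequence[float], threshold: float = 0.5) -> float:
    """Weak-parity sketch (assumed formula): absolute difference in the
    fraction of scores that exceed the binarization threshold."""
    if not u or not v:
        raise ValueError("both score lists must be non-empty")
    rate_u = sum(s > threshold for s in u) / len(u)
    rate_v = sum(s > threshold for s in v) / len(v)
    return abs(rate_u - rate_v)


# Hypothetical 'neg' scores for three counterfactual pairs.
scores1 = [0.1, 0.4, 0.8]
scores2 = [0.2, 0.6, 0.9]
print(round(wasserstein_1(scores1, scores2), 4))  # strong parity → 0.1333
print(round(weak_parity_gap(scores1, scores2, threshold=0.5), 4))  # weak parity → 0.3333
```

Computing W1 from the empirical CDFs handles lists of different lengths; for equal-length lists it reduces to the mean absolute difference between the sorted scores.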