langfair.metrics.counterfactual.counterfactual.CounterfactualMetrics

class langfair.metrics.counterfactual.counterfactual.CounterfactualMetrics(metrics=['Cosine', 'Rougel', 'Bleu', 'Sentiment Bias'], neutralize_tokens=True)

Bases: object

__init__(metrics=['Cosine', 'Rougel', 'Bleu', 'Sentiment Bias'], neutralize_tokens=True)

This class computes some or all of the counterfactual metrics supported by LangFair. For more information on these metrics, see Huang et al. (2020) [1] and Bouchard (2024) [2].

Parameters:
  • metrics (list of strings/objects, default=["Cosine", "Rougel", "Bleu", "Sentiment Bias"]) – A list containing the names or class objects of the metrics to compute.

  • neutralize_tokens (boolean, default=True) – Indicates whether to apply masking when computing the Bleu and RougeL metrics. If True, counterfactual responses are masked using the CounterfactualGenerator.neutralize_tokens method before these two metrics are computed.
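To illustrate why masking matters for token-overlap metrics, here is a minimal pure-Python sketch. It is not LangFair's implementation: the word list `GENDER_WORDS`, the `[MASK]` placeholder, and the helpers `neutralize` and `token_overlap` are hypothetical names invented for this example, and Jaccard overlap is only a crude stand-in for Bleu and RougeL.

```python
import re

# Hypothetical word list and placeholder, chosen only for illustration.
# LangFair's CounterfactualGenerator.neutralize_tokens uses its own logic.
GENDER_WORDS = {"he", "she", "him", "her", "his", "hers", "man", "woman"}
MASK = "[MASK]"


def neutralize(text):
    """Tokenize the text and replace each listed group word with MASK."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [MASK if tok in GENDER_WORDS else tok for tok in tokens]


def token_overlap(tokens_a, tokens_b):
    """Jaccard overlap of two token collections (stand-in for Bleu/RougeL)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    return len(set_a & set_b) / len(set_a | set_b)


text_male = "He said his work was finished."
text_female = "She said her work was finished."

# Without masking, the differing group words lower the overlap score.
raw = token_overlap(text_male.lower().split(), text_female.lower().split())

# With masking, only differences beyond the group words affect the score.
masked = token_overlap(neutralize(text_male), neutralize(text_female))
```

In this toy pair the two responses differ only in their gendered words, so the masked overlap is higher than the raw overlap. That is the effect masking is meant to achieve: the metric reflects differences in content rather than the swapped group tokens themselves.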

Methods

__init__([metrics, neutralize_tokens])

This class computes some or all of the counterfactual metrics supported by LangFair.

evaluate(texts1, texts2[, attribute, ...])

This method evaluates the counterfactual metric values for the provided pair of text lists.

evaluate(texts1, texts2, attribute=None, return_data=False)

This method evaluates the counterfactual metric values for the provided pair of text lists.

Parameters:
  • texts1 (list of strings) – A list of generated outputs from a language model, each containing a mention of the same protected attribute group.

  • texts2 (list of strings) – A list, analogous to texts1, of counterfactually generated outputs from a language model, each containing a mention of the same protected attribute group. This group must be a different group, within the same protected attribute, from the one mentioned in texts1.

  • attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for neutralization.

  • return_data (bool, default=False) – Indicates whether to include response-level counterfactual scores in the results dictionary returned by this method.

Returns:

A dictionary containing the values of the counterfactual metrics.

Return type:

dict
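A minimal usage sketch, based only on the signatures documented on this page. The example responses and the wrapper function `run_counterfactual_example` are illustrative assumptions. The import is placed inside the function so the snippet can be loaded without LangFair installed; calling the function requires the package and whatever models its metrics depend on.

```python
# Illustrative responses only. texts1 mentions one gender group and texts2
# holds the counterfactual responses, paired with texts1 by position.
texts1 = [
    "He is a dedicated nurse who cares for his patients.",
    "The man asked whether he could reschedule the meeting.",
]
texts2 = [
    "She is a dedicated nurse who cares for her patients.",
    "The woman asked whether she could reschedule the meeting.",
]


def run_counterfactual_example():
    """Compute counterfactual metrics for the two paired response lists."""
    # Deferred import: only needed when the function is actually called.
    from langfair.metrics.counterfactual.counterfactual import (
        CounterfactualMetrics,
    )

    # Use the documented defaults explicitly for clarity.
    cm = CounterfactualMetrics(
        metrics=["Cosine", "Rougel", "Bleu", "Sentiment Bias"],
        neutralize_tokens=True,
    )

    # attribute selects which group words are neutralized; return_data=True
    # asks for response-level scores in the returned dictionary.
    return cm.evaluate(texts1, texts2, attribute="gender", return_data=True)
```

The two lists should have the same length, since each entry in texts2 is treated as the counterfactual counterpart of the entry at the same position in texts1.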

References