langfair.metrics.counterfactual.metrics.cosine.CosineSimilarity

class langfair.metrics.counterfactual.metrics.cosine.CosineSimilarity(transformer=None, how='mean')

Bases: Metric

__init__(transformer=None, how='mean')

Compute the similarity between language model outputs generated under social group substitutions. This class enables calculation of counterfactual cosine similarity. For more information on this metric, refer to: https://arxiv.org/abs/2407.10853

Parameters:
  • transformer (str (HuggingFace sentence transformer), default='all-MiniLM-L6-v2') – Specifies which HuggingFace sentence transformer to use when computing cosine similarity. See https://huggingface.co/sentence-transformers?sort_models=likes#models for more information. The recommended sentence transformer is 'all-MiniLM-L6-v2'.

  • how ({'mean','pairwise'}, default='mean') – Specifies whether to return the mean cosine similarity over all counterfactual pairs or a list containing the cosine similarity for each pair.
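The difference between the two how options can be illustrated without a sentence transformer. The following is a minimal sketch in plain Python, assuming the embeddings are already available as lists of floats; the names cosine_similarity and aggregate_pairs are hypothetical helpers for illustration and are not part of langfair.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def aggregate_pairs(embeddings1, embeddings2, how="mean"):
    """Score each counterfactual pair, then aggregate according to `how`.

    Hypothetical helper: mirrors the documented meaning of `how`,
    not the library's internal implementation.
    """
    scores = [cosine_similarity(u, v) for u, v in zip(embeddings1, embeddings2)]
    if how == "mean":
        # One number summarising all counterfactual pairs.
        return sum(scores) / len(scores)
    if how == "pairwise":
        # One score per counterfactual pair, in input order.
        return scores
    raise ValueError("how must be 'mean' or 'pairwise'")
```

With how='pairwise' the result keeps one score per pair, which makes it possible to inspect individual pairs that diverge; with how='mean' those scores are collapsed into a single value.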

Methods

__init__([transformer, how])

Compute the similarity between language model outputs generated under social group substitutions.

evaluate(texts1, texts2)

Returns mean cosine similarity between two counterfactually generated lists of LLM outputs in vector space.

evaluate(texts1, texts2)

Returns mean cosine similarity between two counterfactually generated lists of LLM outputs in vector space.

Parameters:
  • texts1 (list of strings) – A list of generated outputs from a language model, each containing a mention of the same protected attribute group.

  • texts2 (list of strings) – A list, analogous to texts1, of counterfactually generated outputs from a language model, each containing a mention of the same protected attribute group. The group mentioned in texts2 must be a different group within the same protected attribute as the group mentioned in texts1.

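Returns:

Mean cosine similarity score for the provided lists of texts. Per the how parameter of __init__, a list of per-pair cosine similarities is returned instead when how='pairwise'.

Return type:

float (or a list of floats when how='pairwise')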
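A usage sketch may help tie the constructor and evaluate together. The import path and the keyword arguments below are taken from the signatures on this page; the wrapper name counterfactual_cosine is hypothetical, and the length check reflects an assumption that texts1 and texts2 are paired one-to-one. Running it requires langfair to be installed and the sentence transformer to be available.

```python
def counterfactual_cosine(texts1, texts2, how="mean"):
    """Hypothetical convenience wrapper around CosineSimilarity.evaluate."""
    # Assumption: the two lists are paired element by element.
    if len(texts1) != len(texts2):
        raise ValueError("texts1 and texts2 must contain the same number of outputs")

    # Imported lazily so the check above works even without langfair installed.
    from langfair.metrics.counterfactual.metrics.cosine import CosineSimilarity

    # 'all-MiniLM-L6-v2' is the transformer recommended in the parameter docs.
    metric = CosineSimilarity(transformer="all-MiniLM-L6-v2", how=how)
    return metric.evaluate(texts1=texts1, texts2=texts2)
```

Here texts1 would hold the outputs that mention one group and texts2 the counterfactual outputs that mention a different group within the same protected attribute, in the same order.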