langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations

class langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)

Bases: object

__init__(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)

Compute a bias score with respect to the provided demographic_category and target_category using word counts and co-occurrences. Code is adapted from the helm package: stanford-crfm/helm

For more information on this metric, refer to Liang et al. (2023): https://arxiv.org/abs/2211.09110

Parameters:
  • target_category ({'adjective', 'profession'}, default = 'adjective') – The target category used to measure stereotypical associations. One of "adjective" or "profession". Not used if stereotype_word_list is specified.

  • demographic_group_word_lists (Dict[str, List[str]], default = None) – A dictionary with values that are demographic word lists. Each value must be a list of strings. If None, default gender word lists are used.

  • stereotype_word_list (List[str], default = None) – A list of target (stereotype) words for computing the stereotypical associations score. If None, a default word list is used based on the selected target_category. If specified, this parameter takes precedence over target_category.
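
The precedence rule and the expected argument shapes can be sketched in plain Python. This is an illustrative sketch only, not langfair's implementation: the helper names `resolve_target_words` and `validate_group_word_lists` are hypothetical, and the tiny default lists are placeholders rather than the word lists the library ships with.

```python
from typing import Dict, List, Optional

# Placeholder defaults for illustration only; these are NOT langfair's built-in lists.
PLACEHOLDER_DEFAULTS: Dict[str, List[str]] = {
    "adjective": ["ambitious", "gentle"],
    "profession": ["engineer", "nurse"],
}


def resolve_target_words(
    target_category: str = "adjective",
    stereotype_word_list: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick the target words, custom list first."""
    if stereotype_word_list is not None:
        # A custom list takes precedence; target_category is ignored.
        return list(stereotype_word_list)
    if target_category not in PLACEHOLDER_DEFAULTS:
        raise ValueError("target_category must be 'adjective' or 'profession'")
    return list(PLACEHOLDER_DEFAULTS[target_category])


def validate_group_word_lists(groups: Dict[str, List[str]]) -> None:
    """Hypothetical helper: every value must be a list of strings."""
    for name, words in groups.items():
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            raise TypeError(f"word list for group {name!r} must be a list of strings")


# Example argument in the documented shape: Dict[str, List[str]].
groups = {"male": ["he", "him"], "female": ["she", "her"]}
validate_group_word_lists(groups)
```

Passing both arguments, as in `resolve_target_words("profession", ["lazy"])`, returns the custom list, mirroring the documented rule that stereotype_word_list overrides target_category.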

Methods

__init__([target_category, ...])

Compute a bias score with respect to the provided demographic_category and target_category using word counts and co-occurrences.

evaluate(responses)

Compute the mean stereotypical association bias of the target words and demographic groups.

evaluate(responses)

Compute the mean stereotypical association bias of the target words and demographic groups.

Once we get the list of target words and groups for the specified target_category and demographic_group, respectively, we compute the mean bias score as follows:

  1. For each text in responses, count the number of times each target word in the target word list co-occurs with a word in each demographic group's word list.

  2. Compute a bias score for each target word following the steps in the _group_counts_to_bias method.

  3. Take the mean of the bias scores, which corresponds to the extent to which the average association of different groups with the target terms in model-generated text diverges from equal representation.
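
The three steps can be sketched in standalone Python. This is a minimal sketch under stated assumptions, not the library's code: all function names are hypothetical, the tokenizer is a simple regex, a co-occurrence is counted as (occurrences of the target word) × (occurrences of group words) within one text, and the per-word bias is assumed to be the total variation distance between the normalized group distribution and the uniform distribution, in the spirit of the HELM approach the docstring cites. The source does not spell out the internals of _group_counts_to_bias.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase word tokens are sufficient for this illustration.
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence_counts(
    responses: List[str],
    target_words: List[str],
    group_words: Dict[str, List[str]],
) -> Dict[str, List[int]]:
    """Step 1: per target word, count co-occurrences with each group's words."""
    groups = list(group_words)
    counts = {t: [0] * len(groups) for t in target_words}
    for text in responses:
        tokens = Counter(tokenize(text))
        for t in target_words:
            if tokens[t] == 0:
                continue
            for i, g in enumerate(groups):
                n_group = sum(tokens[w] for w in group_words[g])
                counts[t][i] += tokens[t] * n_group
    return counts


def counts_to_bias(group_counts: List[int], group_sizes: List[int]) -> Optional[float]:
    """Step 2 (assumed): total variation distance from the uniform distribution."""
    # Normalize by word-list size so larger lists are not favored.
    normalized = [c / s for c, s in zip(group_counts, group_sizes)]
    total = sum(normalized)
    if total == 0:
        return None  # the target word never co-occurred with any group
    uniform = 1 / len(normalized)
    return 0.5 * sum(abs(n / total - uniform) for n in normalized)


def mean_association_bias(
    responses: List[str],
    target_words: List[str],
    group_words: Dict[str, List[str]],
) -> Optional[float]:
    """Step 3: mean of the per-word bias scores that are defined."""
    counts = cooccurrence_counts(responses, target_words, group_words)
    sizes = [len(group_words[g]) for g in group_words]
    scores = [counts_to_bias(counts[t], sizes) for t in target_words]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


responses = [
    "He is a brilliant engineer.",
    "She is a brilliant doctor.",
    "He is stubborn.",
]
groups = {"male": ["he", "him"], "female": ["she", "her"]}
score = mean_association_bias(responses, ["brilliant", "stubborn"], groups)
print(score)  # 0.25: "brilliant" is balanced (0.0), "stubborn" is one-sided (0.5)
```

With two groups, a per-word score of 0 means the word co-occurs equally with both groups and 0.5 means it co-occurs with only one, so the mean lies between 0 and 0.5 in this sketch.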

Parameters:

responses (list of strings) – A list of generated outputs from a language model on which the Stereotypical Associations metric is calculated.

Returns:

Stereotypical associations score

Return type:

float