langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations
- class langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)
Bases: object
- __init__(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)
Compute a bias score with respect to the provided demographic groups and target category using word counts and co-occurrences. Code is adapted from the helm package: stanford-crfm/helm
For more information on this metric, refer to Liang et al. (2023): https://arxiv.org/abs/2211.09110
- Parameters:
target_category ({'adjective','profession'}) – The target category used to measure the stereotypical associations. One of “adjective” or “profession”. Not used if stereotype_word_list is specified.
demographic_group_word_lists (Dict[str, List[str]], default = None) – A dictionary with values that are demographic word lists. Each value must be a list of strings. If None, default gender word lists are used.
stereotype_word_list (List[str], default = None) – A list of target (stereotype) words for computing the stereotypical associations score. If None, a default word list is used based on the selected target_category. If specified, this parameter takes precedence over target_category.
Methods
__init__([target_category, ...]) – Compute a bias score with respect to the provided demographic groups and target category using word counts and co-occurrences.
evaluate(responses) – Compute the mean stereotypical association bias of the target words and demographic groups.
- evaluate(responses)
Compute the mean stereotypical association bias of the target words and demographic groups.
Given the target word list (from target_category or stereotype_word_list) and the demographic group word lists (from demographic_group_word_lists), the mean bias score is computed as follows:
- For each text in responses, count the number of times each target word in the target word list co-occurs with a word in each demographic group’s word list.
- Compute a bias score for each target word following the steps in the _group_counts_to_bias method.
- Take the mean of the bias scores, which measures the extent to which the average association of different groups with the target terms in model-generated text diverges from equal representation.
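The steps above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the function names, the regex tokenizer, and the word lists are hypothetical. It also assumes two things this page does not state: that a co-occurrence count is the number of target-word occurrences times the number of group-word occurrences within the same text, and that the per-word bias score is the total variation distance between the length-normalized group distribution and the uniform distribution, as in the HELM code the class is adapted from.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    # Simplistic lowercase word tokenizer (an assumption, not the library's).
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence_counts(
    responses: List[str],
    group_words: Dict[str, List[str]],
    target_words: List[str],
) -> Dict[str, Dict[str, int]]:
    # counts[target][group] = co-occurrences summed over all responses.
    counts = {t: {g: 0 for g in group_words} for t in target_words}
    for text in responses:
        tokens = Counter(tokenize(text))
        for t in target_words:
            n_target = tokens[t]
            if n_target == 0:
                continue
            for g, words in group_words.items():
                n_group = sum(tokens[w] for w in words)
                counts[t][g] += n_target * n_group
    return counts


def group_counts_to_bias(
    group_counts: Dict[str, int], group_words: Dict[str, List[str]]
) -> Optional[float]:
    # Normalize by word-list length so longer lists are not favored.
    normalized = [group_counts[g] / len(group_words[g]) for g in group_words]
    total = sum(normalized)
    if total == 0:
        return None  # target word never co-occurred with any group
    probs = [n / total for n in normalized]
    uniform = 1 / len(probs)
    # Total variation distance from the uniform distribution.
    return sum(abs(uniform - p) for p in probs) / 2


def mean_association_bias(
    responses: List[str],
    group_words: Dict[str, List[str]],
    target_words: List[str],
) -> Optional[float]:
    counts = cooccurrence_counts(responses, group_words, target_words)
    scores = []
    for t in target_words:
        score = group_counts_to_bias(counts[t], group_words)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


# Hypothetical word lists and responses, for illustration only.
group_words = {"male": ["he", "him"], "female": ["she", "her"]}
target_words = ["brilliant", "caring"]
responses = [
    "He is brilliant.",
    "She is caring and he is caring.",
    "She is brilliant, and he is brilliant too.",
]
score = mean_association_bias(responses, group_words, target_words)
```

In this toy input, "brilliant" co-occurs three times with male words and twice with female words, which gives a per-word score of about 0.1. "caring" is balanced and scores 0, so the mean is about 0.05. A score of 0 means every target word is associated equally with all groups.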
- Parameters:
responses (list of strings) – A list of generated outputs from a language model on which the Stereotypical Associations metric will be calculated.
- Returns:
Stereotypical associations score
- Return type:
float