langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations

class langfair.metrics.stereotype.metrics.associations.StereotypicalAssociations(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)

Bases: object

__init__(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None)

Compute a bias score with respect to the provided demographic groups and target category, using word counts and co-occurrences. The code is adapted from the helm package (stanford-crfm/helm).

For more information on this metric, refer to Liang et al. (2023): https://arxiv.org/abs/2211.09110

Parameters:

target_category ({'adjective', 'profession'}, default = 'adjective') – The target category used to measure stereotypical associations. One of “adjective” or “profession”. Not used if stereotype_word_list is specified.
demographic_group_word_lists (Dict[str, List[str]], default = None) – A dictionary whose values are demographic word lists. Each value must be a list of strings. If None, default gender word lists are used.
stereotype_word_list (List[str], default = None) – A list of target (stereotype) words for computing the stereotypical associations score. If None, a default word list is used based on the selected target_category. If specified, this parameter takes precedence over target_category.
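
As an illustration of the argument shapes described above, the following sketch builds a custom dictionary of demographic word lists and a custom stereotype word list. The word lists and the `check_word_lists` helper are hypothetical and are not part of langfair; the non-empty requirement is an added assumption, not something this page states.

```python
from typing import Dict, List

# Hypothetical word lists for illustration only; these are NOT langfair's defaults.
demographic_group_word_lists: Dict[str, List[str]] = {
    "male": ["he", "him", "his"],
    "female": ["she", "her", "hers"],
}
stereotype_word_list: List[str] = ["doctor", "nurse", "engineer"]


def check_word_lists(groups: Dict[str, List[str]]) -> bool:
    """Hypothetical helper: True if every value is a non-empty list of strings.

    The documentation only requires "a list of strings"; the non-empty
    check is an extra assumption made for this sketch.
    """
    return all(
        isinstance(words, list)
        and len(words) > 0
        and all(isinstance(word, str) for word in words)
        for words in groups.values()
    )


print(check_word_lists(demographic_group_word_lists))
```

Lists shaped like these would be passed through the `demographic_group_word_lists` and `stereotype_word_list` keyword arguments shown in the signature; when `stereotype_word_list` is given, `target_category` is not used.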

Methods

`__init__`([target_category, ...])	Compute a bias score with respect to the provided demographic groups and target category, using word counts and co-occurrences.
`evaluate`(responses)	Compute the mean stereotypical association bias of the target words and demographic groups.

evaluate(responses)

Compute the mean stereotypical association bias of the target words and demographic groups.

Once we get the list of target words and the demographic groups for the specified target_category and demographic word lists, respectively, we compute the mean bias score as follows:

1. For each response in responses, count the number of times each target word in the target word list co-occurs with a word in each demographic group's word list.
2. Compute a bias score for each target word following the steps in the _group_counts_to_bias method.
3. Take the mean of the bias scores, which corresponds to the extent to which the average association of different groups with the target terms in model-generated text diverges from equal representation.

Parameters: responses (list of strings) – A list of generated outputs from a language model on which the Stereotypical Associations metric will be calculated.
Returns: Stereotypical associations score
Return type: float
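
The computation described for evaluate can be sketched in plain Python. This is a minimal illustration, not langfair's implementation. It assumes the bias score follows the HELM approach this class is adapted from: counts are normalized by word list size, converted to a probability distribution, and compared with the uniform distribution via total variation distance. The function names, the regex tokenizer, and the pairwise co-occurrence counting are all assumptions made for this sketch.

```python
import re
from typing import Dict, List, Optional


def group_counts_to_bias(group_counts: List[int], group_sizes: List[int]) -> Optional[float]:
    """Total variation distance between normalized group counts and uniform."""
    # Normalize each count by the size of the group's word list.
    normalized = [count / size for count, size in zip(group_counts, group_sizes)]
    total = sum(normalized)
    if total == 0:
        return None  # The target word never co-occurred with any group.
    probabilities = [value / total for value in normalized]
    uniform = 1 / len(probabilities)
    # L1 distance from the uniform distribution, halved to give TV distance.
    return sum(abs(uniform - p) for p in probabilities) / 2


def stereotypical_association_sketch(
    responses: List[str],
    group_word_lists: Dict[str, List[str]],
    target_words: List[str],
) -> Optional[float]:
    """Mean bias score over target words (hypothetical stand-in for evaluate)."""
    groups = list(group_word_lists)
    counts = {target: [0] * len(groups) for target in target_words}
    for text in responses:
        # Assumed tokenizer: lowercase alphabetic tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        for target in target_words:
            n_target = tokens.count(target)
            if n_target == 0:
                continue
            for i, group in enumerate(groups):
                n_group = sum(tokens.count(word) for word in group_word_lists[group])
                # Assumed counting rule: every (group word, target word) pair
                # within the same response is one co-occurrence.
                counts[target][i] += n_target * n_group
    sizes = [len(group_word_lists[group]) for group in groups]
    scores = [group_counts_to_bias(c, sizes) for c in counts.values()]
    valid = [score for score in scores if score is not None]
    return sum(valid) / len(valid) if valid else None


responses = ["He is a doctor.", "She is a nurse.", "She is a doctor."]
groups = {"male": ["he", "him"], "female": ["she", "her"]}
score = stereotypical_association_sketch(responses, groups, ["doctor", "nurse"])
print(score)  # 0.25
```

In this toy input, "doctor" co-occurs equally with both groups (score 0), while "nurse" co-occurs only with the female list (score 0.5), so the mean is 0.25. A score of 0 indicates equal representation across groups under these assumptions.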
