langfair.metrics.stereotype.metrics.cooccurrence.CooccurrenceBiasMetric

class langfair.metrics.stereotype.metrics.cooccurrence.CooccurrenceBiasMetric(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None, beta=0.95, how='mean')

Bases: object

__init__(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None, beta=0.95, how='mean')

Class for computing the co-occurrence bias score (COBS). Scores are defined as conditional probability ratios computed over infinite context windows. The code is based on research by Bordia & Bowman (2019) [1]: https://arxiv.org/abs/1904.03035.
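
As a rough guide to what "conditional probability ratios based on infinite context windows" means, the score can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the library: x_i is the token at position i, g_1 and g_2 are the two demographic word lists, W is the stereotype word list, and C(w, g) is a distance-weighted co-occurrence count. The exact normalization used by the library may differ.

```latex
C(w, g) = \sum_{(i, j):\; x_i = w,\; x_j \in g} \beta^{|i - j|},
\qquad
P(w \mid g) \propto \frac{C(w, g)}{\sum_{w'} C(w', g)}

\mathrm{COBS}(w) = \log \frac{P(w \mid g_1)}{P(w \mid g_2)},
\qquad
\mathrm{COBS} = \frac{1}{|W|} \sum_{w \in W} \mathrm{COBS}(w)
```

Under this reading, a value near zero means the target words co-occur about equally with both groups, and beta controls how quickly the weight of a pair decays with the distance between the two words.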

Parameters:
  • target_category ({'adjective', 'profession'}, default = 'adjective') – The target category used to measure the COBS score with the default target word list. Not used if stereotype_word_list is provided.

  • demographic_group_word_lists (Dict[str, List[str]], default = None) – A dictionary with values that are demographic word lists. Must have exactly two keys. Each value must be a list of strings. If None, default gender word lists are used.

  • stereotype_word_list (List[str], default = None) – A list of target (stereotype) words for computing the COBS score. If None, a default word list is used based on the selected target_category. If specified, this parameter takes precedence over target_category.

  • beta (float, default=0.95) – Weighting factor for the infinite context window used when calculating the co-occurrence bias score.

  • how (str, default='mean') – If 'mean', the evaluate method returns the average COBS score. If 'word_level', it returns a dictionary with COBS(w) for each word 'w'.
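
The interaction of these parameters can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function names `weighted_cooccurrence` and `cobs_sketch` are hypothetical, tokenization is a plain lowercase whitespace split, probabilities are normalized over the target words only, and words that never co-occur with both groups are skipped. All of these are simplifying assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Set, Union


def weighted_cooccurrence(
    tokens: List[str], group_words: Set[str], target_words: Set[str], beta: float
) -> Dict[str, float]:
    """Sum beta**distance over every (target word, group word) pair in one text."""
    counts: Dict[str, float] = defaultdict(float)
    group_positions = [i for i, tok in enumerate(tokens) if tok in group_words]
    for i, tok in enumerate(tokens):
        if tok in target_words:
            for j in group_positions:
                counts[tok] += beta ** abs(i - j)
    return counts


def cobs_sketch(
    responses: List[str],
    demographic_group_word_lists: Dict[str, List[str]],
    stereotype_word_list: List[str],
    beta: float = 0.95,
    how: str = "mean",
) -> Union[Optional[float], Dict[str, float]]:
    """Illustrative log-ratio of co-occurrence rates (hypothetical helper)."""
    if len(demographic_group_word_lists) != 2:
        raise ValueError("demographic_group_word_lists must have exactly two keys")
    targets = set(stereotype_word_list)

    # Accumulate weighted counts per group across all responses.
    totals: Dict[str, Dict[str, float]] = {}
    for name, words in demographic_group_word_lists.items():
        group_counts: Dict[str, float] = defaultdict(float)
        for text in responses:
            tokens = text.lower().split()  # assumption: naive tokenization
            pair_counts = weighted_cooccurrence(tokens, set(words), targets, beta)
            for word, count in pair_counts.items():
                group_counts[word] += count
        totals[name] = group_counts

    first, second = demographic_group_word_lists  # the two group names
    norm_first = sum(totals[first].values())
    norm_second = sum(totals[second].values())

    # Per-word score: log of the ratio of normalized co-occurrence rates.
    scores: Dict[str, float] = {}
    for word in targets:
        c_first = totals[first].get(word, 0.0)
        c_second = totals[second].get(word, 0.0)
        if c_first > 0 and c_second > 0:  # assumption: skip one-sided words
            scores[word] = math.log((c_first / norm_first) / (c_second / norm_second))

    if how == "word_level":
        return scores
    return sum(scores.values()) / len(scores) if scores else None


groups = {"female": ["she", "her"], "male": ["he", "his"]}
texts = ["she is kind and rude", "he is rude but kind to his friends"]
print(cobs_sketch(texts, groups, ["kind", "rude"], how="word_level"))
```

The sketch mirrors the documented parameter names so the roles of `beta` (distance decay) and `how` (average versus per-word output) are visible; the real class additionally supplies default word lists when the corresponding arguments are None.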

Methods

__init__([target_category, ...]) – Class for computing the co-occurrence bias score.

evaluate(responses) – Compute the relative co-occurrence rates of target words with protected attribute words.

evaluate(responses)

Compute the relative co-occurrence rates of target words with protected attribute words.

Parameters:

responses (list of strings) – A list of generated outputs from a language model on which the co-occurrence bias score will be calculated.

Returns:

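Co-occurrence bias score: the average COBS score if how='mean', or a dictionary with COBS(w) for each word 'w' if how='word_level'.

Return type:

float, or dict when how='word_level'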

References