langfair.metrics.stereotype.metrics.cooccurrence.CooccurrenceBiasMetric
- class langfair.metrics.stereotype.metrics.cooccurrence.CooccurrenceBiasMetric(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None, beta=0.95, how='mean')
Bases:
object
- __init__(target_category='adjective', demographic_group_word_lists=None, stereotype_word_list=None, beta=0.95, how='mean')
Class for computing the co-occurrence bias score (COBS). Scores are defined by conditional probability ratios computed over infinite context windows. The code is based on research by Bordia & Bowman (2019): https://arxiv.org/abs/1904.03035. For more information on these metrics, see Bordia & Bowman (2019) [1].
- Parameters:
target_category ({'adjective', 'profession'}, default = 'adjective') – The target category used to measure the COBS score with the default target word list. Not used if stereotype_word_list is provided.
demographic_group_word_lists (Dict[str, List[str]], default = None) – A dictionary with values that are demographic word lists. Must have exactly two keys. Each value must be a list of strings. If None, default gender word lists are used.
stereotype_word_list (List[str], default = None) – A list of target (stereotype) words for computing the COBS score. If None, a default word list is used based on the selected target_category. If specified, this parameter takes precedence over target_category.
beta (float, default=0.95) – Specifies the weighting factor for the infinite context window used when calculating the co-occurrence bias score.
how (str, default='mean') – If ‘mean’, the evaluate method returns the average COBS score. If ‘word_level’, the method returns a dictionary with COBS(w) for each word ‘w’.
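To make the roles of beta and how concrete, the following is a minimal, self-contained sketch of an exponentially weighted co-occurrence count and a per-word log-ratio score. It is an illustration under simplifying assumptions, not langfair's implementation: the function names `weighted_cooccurrence` and `cobs_sketch` are hypothetical, tokenization is a plain whitespace split, and the normalization is reduced to the target words only, so the numbers will not match the library's output.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Union


def weighted_cooccurrence(
    tokens: List[str], targets: List[str], group_words: List[str], beta: float = 0.95
) -> Dict[str, float]:
    """Sum beta**distance over every (target word, group word) pair in one text."""
    target_set, group_set = set(targets), set(group_words)
    group_positions = [j for j, tok in enumerate(tokens) if tok in group_set]
    counts: Dict[str, float] = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok in target_set:
            for j in group_positions:
                # No hard window cutoff: distant pairs still count, but less.
                counts[tok] += beta ** abs(i - j)
    return dict(counts)


def cobs_sketch(
    texts: List[str],
    targets: List[str],
    groups: Dict[str, List[str]],
    beta: float = 0.95,
    how: str = "mean",
) -> Union[float, Dict[str, float], None]:
    """Simplified log-ratio of normalized co-occurrence between two groups."""
    if len(groups) != 2:
        raise ValueError("exactly two demographic groups are required")
    words_a, words_b = groups.values()
    totals_a: Dict[str, float] = defaultdict(float)
    totals_b: Dict[str, float] = defaultdict(float)
    for text in texts:
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        for word, c in weighted_cooccurrence(tokens, targets, words_a, beta).items():
            totals_a[word] += c
        for word, c in weighted_cooccurrence(tokens, targets, words_b, beta).items():
            totals_b[word] += c
    sum_a, sum_b = sum(totals_a.values()), sum(totals_b.values())
    scores: Dict[str, float] = {}
    for word in targets:
        # Only words seen with both groups get a finite log-ratio.
        if totals_a[word] > 0 and totals_b[word] > 0:
            scores[word] = math.log((totals_a[word] / sum_a) / (totals_b[word] / sum_b))
    if how == "word_level":
        return scores
    # how == "mean": average over the words that could be scored.
    return sum(scores.values()) / len(scores) if scores else None


groups = {"male": ["he"], "female": ["she"]}
texts = ["he is strong", "he is strong", "she is strong",
         "she is gentle", "she is gentle", "he is gentle"]
print(cobs_sketch(texts, ["strong", "gentle"], groups, how="word_level"))
```

In this sketch a positive score means the word co-occurs relatively more with the first group in the dictionary, and a negative score means the second. A smaller beta discounts distant word pairs more strongly.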
Methods
__init__([target_category, ...]) – Class for computing Co-occurrence bias score.
evaluate(responses) – Compute the relative co-occurrence rates of target words with protected attribute words.
- evaluate(responses)
Compute the relative co-occurrence rates of target words with protected attribute words.
- Parameters:
responses (list of strings) – A list of generated outputs from a language model on which co-occurrence bias score metric will be calculated.
- Returns:
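Co-occurrence bias score metric: the average COBS score if how is ‘mean’, or a dictionary with COBS(w) for each word ‘w’ if how is ‘word_level’
- Return type:
float, or dict if how is ‘word_level’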
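Because the value returned by evaluate depends on the how setting, calling code may need to handle more than one shape. The sketch below assumes the float-or-dictionary behavior described for the how parameter. `summarize_cobs` is a hypothetical helper, not part of langfair, and its None branch is a defensive assumption for the case where no score can be computed. Stand-in values replace the real call so the example runs on its own.

```python
from typing import Dict, Optional, Union

CobsResult = Union[float, Dict[str, float], None]


def summarize_cobs(result: CobsResult) -> Optional[float]:
    """Collapse either return shape to one number (mean of per-word scores)."""
    if result is None:
        return None
    if isinstance(result, dict):
        return sum(result.values()) / len(result) if result else None
    return float(result)


# With langfair installed, `result` would come from something like:
#   metric = CooccurrenceBiasMetric(how="word_level")
#   result = metric.evaluate(responses)
# The values below are made up for illustration only.
word_level_result = {"strong": 0.4, "gentle": -0.2}
mean_result = 0.1

print(summarize_cobs(word_level_result))
print(summarize_cobs(mean_result))
```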
References