langfair.generator.counterfactual.CounterfactualGenerator

class langfair.generator.counterfactual.CounterfactualGenerator(langchain_llm=None, suppressed_exceptions=None, use_n_param=False, max_calls_per_min=None)

Bases: ResponseGenerator

__init__(langchain_llm=None, suppressed_exceptions=None, use_n_param=False, max_calls_per_min=None)

Class for parsing and replacing protected attribute words.

For the full list of gender and race words, refer to pages/cvs-health

Parameters:

langchain_llm (langchain BaseChatModel, default=None) – A langchain llm BaseChatModel. User is responsible for specifying temperature and other relevant parameters to the constructor of their langchain_llm object.
suppressed_exceptions (tuple or dict, default=None) – If a tuple, specifies which exceptions to handle as ‘Unable to get response’ rather than raising the exception. If a dict, enables users to specify exception-specific failure messages, where keys are subclasses of BaseException and values are the corresponding failure messages.
use_n_param (bool, default=False) – Specifies whether to use the n parameter of BaseChatModel. Not compatible with all BaseChatModel classes. If used, it substantially speeds up generation when count > 1.
max_calls_per_min (int, default=None) – [Deprecated] Use LangChain’s InMemoryRateLimiter instead.
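The two accepted forms of suppressed_exceptions can be illustrated with a small, library-independent sketch. The helper call_with_suppression and the constant DEFAULT_FAILURE are hypothetical names, and this is not LangFair's internal code; it only mirrors the behavior described above: a tuple maps every listed exception to the generic ‘Unable to get response’ message, while a dict maps each exception class to its own message.

```python
from typing import Callable, Dict, Optional, Tuple, Type, Union

# Hypothetical generic message, taken from the parameter description above.
DEFAULT_FAILURE = "Unable to get response"

Suppressed = Union[Tuple[Type[BaseException], ...], Dict[Type[BaseException], str]]


def call_with_suppression(fn: Callable[[], str], suppressed: Optional[Suppressed]) -> str:
    """Call fn, turning suppressed exceptions into failure messages (hypothetical helper)."""
    if not suppressed:
        # Nothing suppressed: any exception propagates to the caller.
        return fn()
    try:
        return fn()
    except tuple(suppressed) as exc:  # tuple() of a dict yields its keys
        if isinstance(suppressed, dict):
            # Return the message registered for the matching exception class.
            for exc_type, message in suppressed.items():
                if isinstance(exc, exc_type):
                    return message
        return DEFAULT_FAILURE


# Usage: a ZeroDivisionError stands in for an API error such as a timeout.
print(call_with_suppression(lambda: str(1 / 0), (ZeroDivisionError,)))
print(call_with_suppression(lambda: str(1 / 0), {ZeroDivisionError: "Division failed"}))
```

Exceptions that are not listed still propagate, which keeps genuine bugs visible instead of silently counting them as non-completions.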

Methods

`__init__`([langchain_llm, ...])	Class for parsing and replacing protected attribute words.
`check_ftu`(prompts[, attribute, custom_list, ...])	Checks for fairness through unawareness (FTU) based on a list of prompts and a specified protected attribute
`create_prompts`(prompts[, attribute, custom_dict])	Creates prompts by counterfactual substitution
`estimate_token_cost`(tiktoken_model_name, ...)	Estimates the token cost for a given list of prompts and (optionally) example responses.
`generate_responses`(prompts[, attribute, ...])	Creates prompts by counterfactual substitution and generates responses asynchronously
`neutralize_tokens`(texts[, attribute])	Neutralize gender and race words contained in a list of texts.
`parse_texts`(texts[, attribute, custom_list])	Parses a list of texts for protected attribute words

check_ftu(prompts, attribute=None, custom_list=None, subset_prompts=True)

Checks for fairness through unawareness (FTU) based on a list of prompts and a specified protected attribute.

Parameters:

prompts (list of strings) – A list of prompts to be parsed for protected attribute words
attribute ({'race','gender'}, default=None) – Specifies what to parse for among race words and gender words. Must be specified if custom_list is None.
custom_list (List[str], default=None) – Custom list of tokens to use for parsing prompts. Must be provided if attribute is None.
subset_prompts (bool, default=True) – Indicates whether to return all prompts or only those containing attribute words

Returns:

A dictionary with two keys: 'data' and 'metadata'.

'data' (dict)

A dictionary containing the prompts and the attribute words they contain.

'prompt' (list): A list of prompts.
'attribute_words' (list): A list of the attribute words in each prompt.

'metadata' (dict)

A dictionary containing metadata related to FTU.

'ftu_satisfied' (bool): Boolean indicator of whether or not the prompts satisfy FTU.
'filtered_prompt_count' (int): The number of prompts that satisfy FTU.

Return type:

dict
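Fairness through unawareness holds for a prompt set when none of the prompts mention a protected attribute word. The sketch below, which is not LangFair's implementation, shows how per-prompt attribute words (the list-of-lists shape that parse_texts returns) could be summarized into a result shaped like the one documented above, including the subset_prompts switch. The function name summarize_ftu and the metadata key n_prompts_with_attribute_words are hypothetical.

```python
from typing import Dict, List


def summarize_ftu(
    prompts: List[str],
    attribute_words: List[List[str]],
    subset_prompts: bool = True,
) -> Dict[str, dict]:
    """Summarize FTU from per-prompt attribute words (hypothetical helper)."""
    # Indices of prompts that mention at least one protected attribute word.
    flagged = [i for i, words in enumerate(attribute_words) if words]
    kept = flagged if subset_prompts else list(range(len(prompts)))
    return {
        "data": {
            "prompt": [prompts[i] for i in kept],
            "attribute_words": [attribute_words[i] for i in kept],
        },
        "metadata": {
            # FTU is satisfied only when no prompt contains an attribute word.
            "ftu_satisfied": len(flagged) == 0,
            "n_prompts_with_attribute_words": len(flagged),
        },
    }


result = summarize_ftu(["Describe her career.", "Describe the weather."], [["her"], []])
print(result["metadata"])  # {'ftu_satisfied': False, 'n_prompts_with_attribute_words': 1}
```

With subset_prompts=True only the offending prompts are returned, which is usually what you want when auditing a prompt set before generation.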

create_prompts(prompts, attribute=None, custom_dict=None)

Creates prompts by counterfactual substitution

Parameters:

prompts (List[str]) – A list of prompts on which counterfactual substitution will be done
attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None.
custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}

Returns:

Dictionary containing counterfactual prompts

Return type:

dict
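Counterfactual substitution with a custom_dict can be sketched as index-aligned token swapping: for each target group, the token at position i in any group's list is replaced by the target group's token at position i. This is an illustrative approach with a hypothetical function name (swap_tokens) and output keys equal to the group names; LangFair's own matching rules and output keys may differ.

```python
import re
from typing import Dict, List


def swap_tokens(prompts: List[str], custom_dict: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Create one counterfactual copy of each prompt per group (hypothetical helper)."""
    if len({len(tokens) for tokens in custom_dict.values()}) != 1:
        raise ValueError("all token lists must have the same length")

    result: Dict[str, List[str]] = {}
    for target, target_tokens in custom_dict.items():
        # Map every group's i-th token to the target group's i-th token.
        mapping = {
            src.lower(): dst
            for tokens in custom_dict.values()
            for src, dst in zip(tokens, target_tokens)
        }
        # Longest tokens first; \b keeps "man" from matching inside "woman".
        alternatives = sorted(mapping, key=len, reverse=True)
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
        )

        def replace(match: re.Match) -> str:
            word = match.group(0)
            new = mapping[word.lower()]
            # Preserve a leading capital, e.g. "He" -> "She".
            return new.capitalize() if word[0].isupper() else new

        result[target] = [pattern.sub(replace, prompt) for prompt in prompts]
    return result


groups = {"male": ["he", "him", "man"], "female": ["she", "her", "woman"]}
print(swap_tokens(["He thanked the man."], groups))
# {'male': ['He thanked the man.'], 'female': ['She thanked the woman.']}
```

Because the lists are aligned by position, a misplaced token (for example, putting 'woman' in the male list) would silently produce wrong counterfactuals, so keeping the lists parallel matters.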

async estimate_token_cost(tiktoken_model_name, prompts, attribute, example_responses=None, response_sample_size=30, system_prompt='You are a helpful assistant', count=25)

Estimates the token cost for a given list of prompts and (optionally) example responses. Note: This method is only compatible with GPT models.

Parameters:

tiktoken_model_name (str) – The name of the OpenAI model to use for token counting.
prompts (list of strings) – A list of prompts.
attribute ({'gender', 'race'}) – Specifies the attribute to be used for counterfactual generation.
example_responses (list of strings, default=None) – A list of example responses. If provided, the function estimates the response tokens based on these examples.
response_sample_size (int, default=30) – The number of responses to generate for cost estimation if example_responses is not provided.
system_prompt (str, default="You are a helpful assistant") – The system prompt to use.
count (int, default=25) – The number of generations per prompt used when estimating cost.

Returns:

A dictionary containing the estimated token costs, including prompt token cost, completion token cost, and total token cost.

Return type:

dict
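The estimate reduces to token counts multiplied by per-token prices. The sketch below assumes that every counterfactual prompt (one per group) is sent count times and that completion length can be approximated by the average example response; it also substitutes a whitespace word count for tiktoken and takes hypothetical per-1,000-token prices as arguments. It illustrates the arithmetic only and is not LangFair's formula; the function name rough_cost is hypothetical.

```python
from typing import Dict, List


def rough_cost(
    prompts: List[str],
    example_responses: List[str],
    n_groups: int,
    count: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> Dict[str, float]:
    """Back-of-the-envelope cost estimate (hypothetical helper)."""

    def n_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer such as tiktoken.
        return len(text.split())

    # Assumption: each prompt yields n_groups counterfactuals, each sent `count` times.
    calls_per_prompt = n_groups * count
    prompt_tokens = sum(n_tokens(p) for p in prompts) * calls_per_prompt
    avg_completion = sum(n_tokens(r) for r in example_responses) / len(example_responses)
    completion_tokens = avg_completion * len(prompts) * calls_per_prompt

    prompt_cost = prompt_tokens / 1000 * usd_per_1k_prompt
    completion_cost = completion_tokens / 1000 * usd_per_1k_completion
    return {
        "prompt_cost": prompt_cost,
        "completion_cost": completion_cost,
        "total_cost": prompt_cost + completion_cost,
    }


estimate = rough_cost(["Tell me about the nurse."], ["She is kind."], n_groups=2, count=25,
                      usd_per_1k_prompt=0.5, usd_per_1k_completion=1.5)
print(estimate["total_cost"])
```

The cost scales linearly with count, so halving count roughly halves the estimate.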

async generate_responses(prompts, attribute=None, system_prompt='You are a helpful assistant.', count=25, custom_dict=None)

Creates prompts by counterfactual substitution and generates responses asynchronously

Parameters:

prompts (list of strings) – A list of prompts on which counterfactual substitution and response generation will be done
attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None.
custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}
system_prompt (str, default="You are a helpful assistant.") – Specifies system prompt for generation
count (int, default=25) – Specifies number of responses to generate for each prompt.

Returns:

A dictionary with two keys: 'data' and 'metadata'.

'data' (dict)

A dictionary containing the prompts and responses.

'prompt' (list): A list of prompts.
'response' (list): A list of responses corresponding to the prompts.

'metadata' (dict)

A dictionary containing metadata about the generation process.

'non_completion_rate' (float): The proportion of generation attempts that did not return a response.
'temperature' (float): The temperature parameter used in the generation process.
'count' (int): The number of responses generated for each prompt.
'system_prompt' (str): The system prompt used for generating responses.

Return type:

dict
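The asynchronous generation step can be pictured with asyncio.gather: each counterfactual prompt is submitted count times concurrently, suppressed failures are recorded as ‘Unable to get response’, and non_completion_rate is the share of such failures. The stub model fake_llm, the helper generate_counterfactuals, and the group-prefixed keys such as "male_prompt" are hypothetical; this is a sketch of the pattern that calls no real LLM, not LangFair's code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

FAILURE = "Unable to get response"


async def generate_counterfactuals(
    prompts_by_group: Dict[str, List[str]],
    llm: Callable[[str], Awaitable[str]],
    count: int = 2,
    suppressed: tuple = (TimeoutError,),
) -> dict:
    """Generate `count` responses per counterfactual prompt (hypothetical helper)."""

    async def one_call(prompt: str) -> str:
        try:
            return await llm(prompt)
        except suppressed:
            return FAILURE

    data: Dict[str, List[str]] = {}
    all_responses: List[str] = []
    for group, prompts in prompts_by_group.items():
        repeated = [p for p in prompts for _ in range(count)]
        # All calls for this group run concurrently.
        responses = await asyncio.gather(*(one_call(p) for p in repeated))
        data[f"{group}_prompt"] = repeated
        data[f"{group}_response"] = list(responses)
        all_responses.extend(responses)

    failures = sum(r == FAILURE for r in all_responses)
    return {
        "data": data,
        "metadata": {"non_completion_rate": failures / len(all_responses), "count": count},
    }


async def fake_llm(prompt: str) -> str:
    """Stub model: echoes the prompt, or times out if the prompt contains 'fail'."""
    await asyncio.sleep(0)
    if "fail" in prompt:
        raise TimeoutError
    return f"Echo: {prompt}"


result = asyncio.run(
    generate_counterfactuals(
        {"male": ["he ran", "he failed"], "female": ["she ran", "she failed"]}, fake_llm
    )
)
print(result["metadata"])  # {'non_completion_rate': 0.5, 'count': 2}
```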

neutralize_tokens(texts, attribute='gender')

Neutralize gender and race words contained in a list of texts. Replaces gender words with a gender-neutral equivalent and race words with “[MASK]”.

Parameters:

texts (List[str]) – A list of texts on which gender or race neutralization will occur
attribute ({'gender', 'race'}, default='gender') – Specifies whether to use race or gender for neutralization

Returns:

List of texts neutralized for race or gender

Return type:

list
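Neutralization can be sketched as a single regular-expression pass over each text. The word lists GENDER_NEUTRAL and RACE_WORDS are tiny hypothetical stand-ins for LangFair's much larger lists, and the function neutralize is illustrative only. Note that naive pronoun replacement in this sketch can break subject-verb agreement (for example, "he runs" becomes "they runs").

```python
import re
from typing import Dict, List

# Hypothetical, deliberately small word lists.
GENDER_NEUTRAL: Dict[str, str] = {
    "he": "they", "she": "they", "him": "them",
    "man": "person", "woman": "person",
}
RACE_WORDS: List[str] = ["asian", "black", "hispanic", "white"]


def neutralize(texts: List[str], attribute: str = "gender") -> List[str]:
    """Replace gender words with neutral terms or race words with [MASK] (hypothetical)."""
    if attribute == "gender":
        mapping = GENDER_NEUTRAL
    elif attribute == "race":
        mapping = {word: "[MASK]" for word in RACE_WORDS}
    else:
        raise ValueError("attribute must be 'gender' or 'race'")

    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b", re.IGNORECASE)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        new = mapping[word.lower()]
        # Keep sentence-initial capitals; "[MASK]" is returned unchanged.
        return new.capitalize() if word[0].isupper() and new != "[MASK]" else new

    return [pattern.sub(replace, text) for text in texts]


print(neutralize(["He thanked the woman."]))  # ['They thanked the person.']
print(neutralize(["The Asian and Hispanic applicants."], attribute="race"))
# ['The [MASK] and [MASK] applicants.']
```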

parse_texts(texts, attribute=None, custom_list=None)

Parses a list of texts for protected attribute words

Parameters:

texts (list of strings) – A list of texts to be parsed for protected attribute words
attribute ({'race','gender'}, default=None) – Specifies what to parse for among race words and gender words. Must be specified if custom_list is None.
custom_list (List[str], default=None) – Custom list of tokens to use for parsing texts. Must be provided if attribute is None.

Returns:

List of length len(texts) with each element being a list of identified protected attribute words in provided text

Return type:

list
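Parsing with a custom_list can be sketched with a word-boundary regular expression that tries longer terms first, so a multi-word term such as "african american" is reported once rather than also matching "american", and "her" does not match inside "here". The function find_attribute_words is hypothetical and is not LangFair's parser; it returns one list per input text, matching the documented return shape.

```python
import re
from typing import List


def find_attribute_words(texts: List[str], custom_list: List[str]) -> List[List[str]]:
    """Return, for each text, the listed words it contains (hypothetical helper)."""
    # Longest terms first so multi-word phrases win over their sub-words.
    terms = sorted({t.lower() for t in custom_list}, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return [[m.group(0).lower() for m in pattern.finditer(text)] for text in texts]


print(find_attribute_words(
    ["The African American nurse thanked her.", "Come here."],
    ["her", "american", "african american"],
))
# [['african american', 'her'], []]
```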
