langfair.generator.counterfactual.CounterfactualGenerator#
- class langfair.generator.counterfactual.CounterfactualGenerator(langchain_llm=None, suppressed_exceptions=None, use_n_param=False, max_calls_per_min=None)#
- Bases: ResponseGenerator
- __init__(langchain_llm=None, suppressed_exceptions=None, use_n_param=False, max_calls_per_min=None)#
- Class for parsing and replacing protected attribute words.
- For the full list of gender and race words, refer to pages/cvs-health
- Parameters:
- langchain_llm (langchain BaseChatModel, default=None) – A langchain llm BaseChatModel. User is responsible for specifying temperature and other relevant parameters to the constructor of their langchain_llm object. 
- suppressed_exceptions (tuple or dict, default=None) – If a tuple, specifies which exceptions to handle as ‘Unable to get response’ rather than raising the exception. If a dict, lets users specify exception-specific failure messages, with keys being subclasses of BaseException.
- use_n_param (bool, default=False) – Specifies whether to use n parameter for BaseChatModel. Not compatible with all BaseChatModel classes. If used, it speeds up the generation process substantially when count > 1. 
- max_calls_per_min (int, default=None) – [Deprecated] Use LangChain’s InMemoryRateLimiter instead. 
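As a rough illustration of the two accepted forms of suppressed_exceptions described above, the sketch below resolves a failure message for a caught exception. The helper name resolve_failure_message is hypothetical, and this is not LangFair's internal code; it only mirrors the documented behavior, under the assumption that the dict values hold the failure messages.

```python
from typing import Dict, Optional, Tuple, Type, Union

SuppressedExceptions = Optional[
    Union[Tuple[Type[BaseException], ...], Dict[Type[BaseException], str]]
]


def resolve_failure_message(
    exc: BaseException, suppressed_exceptions: SuppressedExceptions
) -> str:
    """Return the failure message for a suppressed exception, else re-raise it.

    Hypothetical helper: a sketch of the documented semantics only.
    """
    if isinstance(suppressed_exceptions, tuple):
        # Tuple form: every listed exception type maps to one generic message.
        if isinstance(exc, suppressed_exceptions):
            return "Unable to get response"
    elif isinstance(suppressed_exceptions, dict):
        # Dict form: each exception type carries its own message (assumed to be the value).
        for exc_type, message in suppressed_exceptions.items():
            if isinstance(exc, exc_type):
                return message
    # Anything not listed (or suppressed_exceptions=None) is raised as usual.
    raise exc


print(resolve_failure_message(ValueError("bad input"), (ValueError, KeyError)))
print(resolve_failure_message(TimeoutError("slow"), {TimeoutError: "Request timed out"}))
```

In this sketch an unlisted exception propagates unchanged, which matches the description of the tuple form as an alternative to raising.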
 
 
 - Methods
 - __init__([langchain_llm, ...]) – Class for parsing and replacing protected attribute words.
 - check_ftu(prompts[, attribute, custom_list, ...]) – Checks for fairness through unawareness (FTU) based on a list of prompts and a specified protected attribute.
 - create_prompts(prompts[, attribute, custom_dict]) – Creates prompts by counterfactual substitution.
 - estimate_token_cost(tiktoken_model_name, ...) – Estimates the token cost for a given list of prompts and (optionally) example responses.
 - generate_responses(prompts[, attribute, ...]) – Creates prompts by counterfactual substitution and generates responses asynchronously.
 - neutralize_tokens(texts[, attribute]) – Neutralizes gender and race words contained in a list of texts.
 - parse_texts(texts[, attribute, custom_list]) – Parses a list of texts for protected attribute words.
 - check_ftu(prompts, attribute=None, custom_list=None, subset_prompts=True)#
- Checks for fairness through unawareness (FTU) based on a list of prompts and a specified protected attribute.
- Parameters:
- prompts (list of strings) – A list of prompts to be parsed for protected attribute words 
- attribute ({'race','gender'}, default=None) – Specifies what to parse for among race words and gender words. Must be specified if custom_list is None 
- custom_list (List[str], default=None) – Custom list of tokens to use for parsing prompts. Must be provided if attribute is None. 
- subset_prompts (bool, default=True) – Indicates whether to return all prompts or only those containing attribute words 
 
- Returns:
- A dictionary with two keys: ‘data’ and ‘metadata’.
- ‘data’ (dict) – A dictionary containing the prompts and the attribute words they contain.
  - ‘prompt’ (list) – A list of prompts.
  - ‘attribute_words’ (list) – A list of attribute words in each prompt.
- ‘metadata’ (dict) – A dictionary containing metadata related to FTU.
  - ‘ftu_satisfied’ (bool) – Boolean indicator of whether or not prompts satisfy FTU.
  - ‘filtered_prompt_count’ (int) – The number of prompts that satisfy FTU.
 
 
- Return type:
- dict 
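To make the FTU idea concrete: a set of prompts satisfies fairness through unawareness when none of them mentions a protected attribute word. The following is a minimal sketch of that check in plain Python, not LangFair's implementation. The function names find_attribute_words and check_ftu_sketch are hypothetical, the tokenization is a simple assumption, and only a subset of the documented return keys is reproduced.

```python
import re
from typing import Dict, List


def find_attribute_words(prompt: str, custom_list: List[str]) -> List[str]:
    """Return the words from custom_list found in prompt (case-insensitive)."""
    vocabulary = {word.lower() for word in custom_list}
    tokens = re.findall(r"[a-z']+", prompt.lower())  # naive tokenizer (assumption)
    return [token for token in tokens if token in vocabulary]


def check_ftu_sketch(prompts: List[str], custom_list: List[str]) -> Dict[str, dict]:
    """Keep only prompts that contain attribute words and report whether FTU holds."""
    found = [find_attribute_words(prompt, custom_list) for prompt in prompts]
    flagged = [(prompt, words) for prompt, words in zip(prompts, found) if words]
    return {
        "data": {
            "prompt": [prompt for prompt, _ in flagged],
            "attribute_words": [words for _, words in flagged],
        },
        # FTU is satisfied only when no prompt contains an attribute word.
        "metadata": {"ftu_satisfied": not flagged},
    }


result = check_ftu_sketch(["She is a doctor", "The weather is nice"], ["he", "she"])
print(result["data"]["attribute_words"], result["metadata"]["ftu_satisfied"])
```

The sketch mirrors the behavior described for subset_prompts=True by returning only the prompts that contain attribute words.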
 
 - create_prompts(prompts, attribute=None, custom_dict=None)#
- Creates prompts by counterfactual substitution.
- Parameters:
- prompts (List[str]) – A list of prompts on which counterfactual substitution and response generation will be done 
- attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None. 
- custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}
 
- Returns:
- Dictionary containing counterfactual prompts 
- Return type:
- dict 
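Counterfactual substitution with a custom_dict can be pictured as follows: every token found in any group's list is replaced by the token at the same position in the target group's list, which yields one prompt variant per group. This is a simplified sketch under stated assumptions, not LangFair's code. The function name substitute_counterfactuals and the "<group>_prompt" output keys are hypothetical, and replacements are emitted in lowercase.

```python
import re
from typing import Dict, List


def substitute_counterfactuals(
    prompts: List[str], custom_dict: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Build one counterfactual variant of each prompt per group in custom_dict."""
    # Map every token to its position within its group's list.
    position = {
        token.lower(): index
        for tokens in custom_dict.values()
        for index, token in enumerate(tokens)
    }
    # Whole-word, case-insensitive match on any known token.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(token) for token in position) + r")\b",
        re.IGNORECASE,
    )
    output: Dict[str, List[str]] = {f"{group}_prompt": [] for group in custom_dict}
    for prompt in prompts:
        for group, tokens in custom_dict.items():
            # Swap each matched token for the target group's token at the same position.
            variant = pattern.sub(
                lambda match, tokens=tokens: tokens[position[match.group(0).lower()]],
                prompt,
            )
            output[f"{group}_prompt"].append(variant)
    return output


groups = {"male": ["he", "him", "man"], "female": ["she", "her", "woman"]}
print(substitute_counterfactuals(["Ask him if the man is ready"], groups))
```

Because the lists are matched by position, each group's list in this sketch must have the same length and ordering.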
 
 - async estimate_token_cost(tiktoken_model_name, prompts, attribute, example_responses=None, response_sample_size=30, system_prompt='You are a helpful assistant', count=25)#
- Estimates the token cost for a given list of prompts and (optionally) example responses. Note: This method is only compatible with GPT models.
- Parameters:
- prompts (list of strings) – A list of prompts 
- tiktoken_model_name (str) – The name of the OpenAI model to use for token counting. 
- attribute (str, either 'gender' or 'race') – Specifies attribute to be used for counterfactual generation 
- example_responses (list of strings, default=None) – A list of example responses. If provided, the function will estimate the response tokens based on these examples 
- response_sample_size (int, default=30) – The number of responses to generate for cost estimation if example_responses is not provided.
- system_prompt (str, default="You are a helpful assistant.") – The system prompt to use. 
- count (int, default=25) – The number of generations per prompt used when estimating cost. 
 
- Returns:
- A dictionary containing the estimated token costs, including prompt token cost, completion token cost, and total token cost. 
- Return type:
- dict 
 
 - async generate_responses(prompts, attribute=None, system_prompt='You are a helpful assistant.', count=25, custom_dict=None)#
- Creates prompts by counterfactual substitution and generates responses asynchronously.
- Parameters:
- prompts (list of strings) – A list of prompts on which counterfactual substitution and response generation will be done 
- attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None. 
- custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}
- system_prompt (str, default="You are a helpful assistant.") – Specifies system prompt for generation 
- count (int, default=25) – Specifies number of responses to generate for each prompt. 
 
- Returns:
- A dictionary with two keys: ‘data’ and ‘metadata’.
- ‘data’ (dict) – A dictionary containing the prompts and responses.
  - ‘prompt’ (list) – A list of prompts.
  - ‘response’ (list) – A list of responses corresponding to the prompts.
- ‘metadata’ (dict) – A dictionary containing metadata about the generation process.
  - ‘non_completion_rate’ (float) – The rate at which the generation process did not complete.
  - ‘temperature’ (float) – The temperature parameter used in the generation process.
  - ‘count’ (int) – The number of responses generated per prompt (the count argument).
  - ‘system_prompt’ (str) – The system prompt used for generating responses.
 
 
- Return type:
- dict 
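A possible way to call generate_responses is sketched below. It also shows LangChain's InMemoryRateLimiter attached to the chat model, as suggested in place of the deprecated max_calls_per_min. The choice of ChatOpenAI, the model name, the rate-limit values, and the wrapper name run_counterfactual_generation are assumptions made for illustration; any LangChain BaseChatModel should fit the same pattern.

```python
import asyncio


async def run_counterfactual_generation(prompts):
    """Generate counterfactual responses for prompts (illustrative wrapper)."""
    # Third-party imports live inside the function so that this module can be
    # imported even where langfair / langchain are not installed.
    from langchain_core.rate_limiters import InMemoryRateLimiter
    from langchain_openai import ChatOpenAI  # assumption: one possible BaseChatModel
    from langfair.generator.counterfactual import CounterfactualGenerator

    # Rate limiting is configured on the LLM rather than via max_calls_per_min.
    rate_limiter = InMemoryRateLimiter(
        requests_per_second=5, check_every_n_seconds=0.1, max_bucket_size=10
    )
    # Temperature and similar settings belong to the LLM constructor.
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0, rate_limiter=rate_limiter)

    generator = CounterfactualGenerator(langchain_llm=llm)
    result = await generator.generate_responses(
        prompts=prompts, attribute="gender", count=2
    )
    return result["data"], result["metadata"]


# Example invocation (requires API credentials, so it is left commented out):
# data, metadata = asyncio.run(run_counterfactual_generation(["He is a nurse."]))
```

Since generate_responses is a coroutine, it has to be awaited or driven through asyncio.run outside an existing event loop.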
 
 - neutralize_tokens(texts, attribute='gender')#
- Neutralizes gender and race words contained in a list of texts. Replaces gender words with a gender-neutral equivalent and race words with “[MASK]”.
- Parameters:
- texts (List[str]) – A list of texts on which gender or race neutralization will occur 
- attribute ({'gender', 'race'}, default='gender') – Specifies whether to use race or gender for neutralization 
 
- Returns:
- List of texts neutralized for race or gender 
- Return type:
- list 
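The neutralization described above can be approximated as follows: gender words map to a neutral equivalent, while race words are replaced by the literal "[MASK]". The word lists here are tiny hypothetical stand-ins, not LangFair's actual lists, and neutralize_sketch is an illustrative name rather than the library's implementation.

```python
import re
from typing import List

# Hypothetical, deliberately small word lists for illustration only.
GENDER_NEUTRAL = {"he": "they", "she": "they", "man": "person", "woman": "person"}
RACE_WORDS = ["asian", "black", "hispanic", "white"]


def neutralize_sketch(texts: List[str], attribute: str = "gender") -> List[str]:
    """Replace gender words with neutral words, or race words with [MASK]."""
    if attribute == "gender":
        mapping = GENDER_NEUTRAL
    elif attribute == "race":
        mapping = {word: "[MASK]" for word in RACE_WORDS}
    else:
        raise ValueError("attribute must be 'gender' or 'race'")
    # Whole-word, case-insensitive match on any word in the mapping.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in mapping) + r")\b",
        re.IGNORECASE,
    )
    return [
        pattern.sub(lambda match: mapping[match.group(0).lower()], text)
        for text in texts
    ]


print(neutralize_sketch(["She met the man"]))
print(neutralize_sketch(["an Asian doctor"], attribute="race"))
```

The sketch does not preserve capitalization or handle multi-word phrases, both of which a production word list would need to address.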
 
 - parse_texts(texts, attribute=None, custom_list=None)#
- Parses a list of texts for protected attribute words.
- Parameters:
- texts (list of strings) – A list of texts to be parsed for protected attribute words 
- attribute ({'race','gender'}, default=None) – Specifies what to parse for among race words and gender words. Must be specified if custom_list is None 
- custom_list (List[str], default=None) – Custom list of tokens to use for parsing prompts. Must be provided if attribute is None. 
 
- Returns:
- List of length len(texts) with each element being a list of identified protected attribute words in provided text 
- Return type:
- list 
 
 
 
    