langfair.generator.counterfactual.CounterfactualGenerator
- class langfair.generator.counterfactual.CounterfactualGenerator(langchain_llm=None, suppressed_exceptions=None, max_calls_per_min=None)
Bases: ResponseGenerator
- __init__(langchain_llm=None, suppressed_exceptions=None, max_calls_per_min=None)
Class for parsing and replacing protected attribute words.
For the full list of gender and race words, refer to pages/cvs-health
- Parameters:
langchain_llm (langchain llm object, default=None) – A LangChain LLM object that is passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters in the constructor of their langchain_llm object.
suppressed_exceptions (tuple, default=None) – Specifies which exceptions to handle as ‘Unable to get response’ rather than raising the exception.
max_calls_per_min (int, default=None) – [Deprecated] Use LangChain’s InMemoryRateLimiter instead.
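Because max_calls_per_min is deprecated in favour of LangChain's InMemoryRateLimiter, a per-minute limit has to be converted to the per-second rate that the limiter expects. The following is a minimal sketch of that conversion; the helper names rate_limiter_kwargs and build_rate_limiter are hypothetical and not part of langfair.

```python
def rate_limiter_kwargs(max_calls_per_min: int) -> dict:
    """Convert a per-minute call budget into InMemoryRateLimiter keyword arguments."""
    # InMemoryRateLimiter is configured in requests per second.
    return {"requests_per_second": max_calls_per_min / 60.0}


def build_rate_limiter(max_calls_per_min: int):
    """Hypothetical helper: build a LangChain rate limiter from a per-minute budget."""
    # Imported lazily so the conversion helper above works without LangChain installed.
    from langchain_core.rate_limiters import InMemoryRateLimiter

    return InMemoryRateLimiter(**rate_limiter_kwargs(max_calls_per_min))


kwargs = rate_limiter_kwargs(120)  # 120 calls per minute
```

The resulting limiter would typically be passed to the LangChain chat model through its rate_limiter argument, and that model is then supplied as langchain_llm.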
Methods
__init__([langchain_llm, ...]) – Class for parsing and replacing protected attribute words.
create_prompts(prompts[, attribute, custom_dict]) – Creates prompts by counterfactual substitution
estimate_token_cost(tiktoken_model_name, ...) – Estimates the token cost for a given list of prompts and (optionally) example responses.
generate_responses(prompts[, attribute, ...]) – Creates prompts by counterfactual substitution and generates responses asynchronously
neutralize_tokens(texts[, attribute]) – Neutralize gender and race words contained in a list of texts.
parse_texts(texts[, attribute, custom_list]) – Parses a list of texts for protected attribute words
- create_prompts(prompts, attribute=None, custom_dict=None)
Creates prompts by counterfactual substitution
- Parameters:
prompts (List[str]) – A list of prompts on which counterfactual substitution will be done
attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None.
custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}
- Returns:
Dictionary containing counterfactual prompts
- Return type:
dict
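To illustrate what counterfactual substitution with a custom_dict means, here is a minimal sketch. It is not langfair's implementation: the function name substitute_counterfactuals and the output keys of the form "<group>_prompt" are assumptions made for this example, and the sketch does not preserve capitalization.

```python
import re
from typing import Dict, List


def substitute_counterfactuals(
    prompts: List[str], custom_dict: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Build one variant of each prompt per group by swapping aligned tokens."""
    # Map each token to its index in the aligned lists, e.g. "her" -> 1.
    position = {
        tok.lower(): i for toks in custom_dict.values() for i, tok in enumerate(toks)
    }
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, position)) + r")\b", re.IGNORECASE
    )
    # Output keys are hypothetical; the real method may name them differently.
    out: Dict[str, List[str]] = {f"{group}_prompt": [] for group in custom_dict}
    for prompt in prompts:
        for group, tokens in custom_dict.items():
            # Replace every matched token with the same-index token of this group.
            swapped = pattern.sub(
                lambda m, toks=tokens: toks[position[m.group(0).lower()]], prompt
            )
            out[f"{group}_prompt"].append(swapped)
    return out


variants = substitute_counterfactuals(
    ["she told him the man was late"],
    {"male": ["he", "him", "man"], "female": ["she", "her", "woman"]},
)
```

The token lists must be aligned by position: the first male token corresponds to the first female token, and so on, which is why each group's list should have the same length.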
- async estimate_token_cost(tiktoken_model_name, prompts, attribute, example_responses=None, response_sample_size=30, system_prompt='You are a helpful assistant', count=25)
Estimates the token cost for a given list of prompts and (optionally) example responses. Note: This method is only compatible with GPT models.
- Parameters:
tiktoken_model_name (str) – The name of the OpenAI model to use for token counting.
prompts (list of strings) – A list of prompts
attribute ({'gender', 'race'}) – Specifies the attribute to be used for counterfactual generation
example_responses (list of strings, default=None) – A list of example responses. If provided, the function will estimate the response tokens based on these examples
response_sample_size (int, default=30) – The number of responses to generate for cost estimation if example_responses is not provided.
system_prompt (str, default="You are a helpful assistant") – The system prompt to use.
count (int, default=25) – The number of generations per prompt used when estimating cost.
- Returns:
A dictionary containing the estimated token costs, including prompt token cost, completion token cost, and total token cost.
- Return type:
dict
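The arithmetic behind an estimate of this kind can be sketched as follows. This is a back-of-the-envelope model, not the library's formula: it assumes one call per prompt, per counterfactual group, per generation, uses a whitespace split as a stand-in for a tiktoken encoding, and returns token counts rather than prices. All names here (count_tokens, rough_token_estimate, n_groups) are hypothetical.

```python
from typing import Dict, List


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace split. A real estimate would use tiktoken.
    return len(text.split())


def rough_token_estimate(
    prompts: List[str],
    example_responses: List[str],
    system_prompt: str = "You are a helpful assistant",
    count: int = 25,
    n_groups: int = 2,
) -> Dict[str, float]:
    """Estimate prompt and completion tokens under the stated assumptions."""
    # Assumption: each prompt is sent `count` times for each counterfactual group.
    calls_per_prompt = count * n_groups
    system_tokens = count_tokens(system_prompt)
    prompt_tokens = sum(system_tokens + count_tokens(p) for p in prompts) * calls_per_prompt
    # Average response length taken from the example responses.
    avg_response = sum(count_tokens(r) for r in example_responses) / len(example_responses)
    completion_tokens = avg_response * len(prompts) * calls_per_prompt
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


estimate = rough_token_estimate(
    prompts=["a b c", "d e"],
    example_responses=["r s t u", "v w"],
    system_prompt="x y",
    count=2,
    n_groups=2,
)
```

Multiplying the prompt and completion token counts by the model's per-token prices would turn these counts into a cost figure.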
- async generate_responses(prompts, attribute=None, system_prompt='You are a helpful assistant.', count=25, custom_dict=None)
Creates prompts by counterfactual substitution and generates responses asynchronously
- Parameters:
prompts (list of strings) – A list of prompts on which counterfactual substitution and response generation will be done
attribute ({'gender', 'race'}, default=None) – Specifies whether to use race or gender for counterfactual substitution. Must be provided if custom_dict is None.
custom_dict (Dict[str, List[str]], default=None) – A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'man'], 'female': ['she', 'her', 'woman']}
system_prompt (str, default="You are a helpful assistant.") – Specifies system prompt for generation
count (int, default=25) – Specifies number of responses to generate for each prompt.
- Returns:
A dictionary with two keys: ‘data’ and ‘metadata’.
- ‘data’ (dict) – A dictionary containing the prompts and responses.
- ‘metadata’ (dict) – A dictionary containing metadata about the generation process, with the following keys:
  - ‘non_completion_rate’ (float) – The rate at which the generation process did not complete.
  - ‘temperature’ (float) – The temperature parameter used in the generation process.
  - ‘count’ (int) – The number of responses generated for each prompt.
  - ‘system_prompt’ (str) – The system prompt used for generating responses.
- Return type:
dict
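Since generate_responses is a coroutine, it has to be awaited or driven with asyncio.run. The sketch below shows the calling pattern and how the documented ‘metadata’ keys might be read. StubGenerator is a hypothetical stand-in that only mimics the documented return shape so the example runs without an LLM; in practice a CounterfactualGenerator instance would take its place.

```python
import asyncio
from typing import Any, Dict, List


class StubGenerator:
    """Hypothetical stand-in that mimics the documented return shape."""

    async def generate_responses(
        self,
        prompts: List[str],
        attribute: str = None,
        system_prompt: str = "You are a helpful assistant.",
        count: int = 25,
        custom_dict: Dict[str, List[str]] = None,
    ) -> Dict[str, Any]:
        return {
            "data": {},  # the real method places the prompts and responses here
            "metadata": {
                "non_completion_rate": 0.0,
                "temperature": 1.0,
                "count": count,
                "system_prompt": system_prompt,
            },
        }


async def run_generation(generator: Any, prompts: List[str]) -> Dict[str, Any]:
    # Either attribute or custom_dict must be supplied; here attribute is used.
    result = await generator.generate_responses(
        prompts=prompts, attribute="gender", count=2
    )
    return result["metadata"]


metadata = asyncio.run(run_generation(StubGenerator(), ["She is a doctor."]))
```

Inside a notebook or another running event loop, `await run_generation(...)` would be used in place of asyncio.run.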
- neutralize_tokens(texts, attribute='gender')
Neutralize gender and race words contained in a list of texts. Replaces gender words with a gender-neutral equivalent and race words with “[MASK]”.
- Parameters:
texts (List[str]) – A list of texts on which gender or race neutralization will occur
attribute ({'gender', 'race'}, default='gender') – Specifies whether to use race or gender for neutralization
- Returns:
List of texts neutralized for race or gender
- Return type:
list
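The behaviour described above (gender words mapped to a neutral equivalent, race words replaced with “[MASK]”) can be sketched as follows. The word lists are tiny hypothetical samples rather than the library's lists, and the function name neutralize is an assumption for this example.

```python
import re
from typing import List

# Hypothetical sample mappings; the library ships its own, much longer lists.
GENDER_NEUTRAL = {"he": "they", "she": "they", "man": "person", "woman": "person"}
RACE_WORDS = ["asian", "hispanic"]


def _word_pattern(words) -> "re.Pattern":
    # Match whole words only, ignoring case.
    return re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)


def neutralize(texts: List[str], attribute: str = "gender") -> List[str]:
    """Replace gender words with neutral equivalents, or race words with [MASK]."""
    if attribute == "gender":
        pattern = _word_pattern(GENDER_NEUTRAL)
        return [
            pattern.sub(lambda m: GENDER_NEUTRAL[m.group(0).lower()], text)
            for text in texts
        ]
    if attribute == "race":
        pattern = _word_pattern(RACE_WORDS)
        return [pattern.sub("[MASK]", text) for text in texts]
    raise ValueError("attribute must be 'gender' or 'race'")


gender_neutral = neutralize(["the woman said he left"])
race_neutral = neutralize(["an asian man"], attribute="race")
```

Note that only the selected attribute is neutralized: with attribute="race", the gender word in the second text is left unchanged.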
- parse_texts(texts, attribute=None, custom_list=None)
Parses a list of texts for protected attribute words
- Parameters:
texts (list of strings) – A list of texts to be parsed for protected attribute words
attribute ({'gender', 'race'}, default=None) – Specifies whether to parse for race words or gender words. Must be provided if custom_list is None.
custom_list (List[str], default=None) – Custom list of tokens to use for parsing prompts. Must be provided if attribute is None.
- Returns:
List of length len(texts), where each element is a list of the protected attribute words identified in the corresponding text
- Return type:
list
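The shape of the return value, one inner list per input text, can be illustrated with a minimal sketch of parsing against a custom_list. The function name parse_for_tokens and the simple regex tokenization are assumptions for this example and may differ from how the library matches words.

```python
import re
from typing import List


def parse_for_tokens(texts: List[str], custom_list: List[str]) -> List[List[str]]:
    """Return, for each text, the words from custom_list found in it (in order)."""
    wanted = {token.lower() for token in custom_list}
    return [
        # Lowercase, split into alphabetic words, keep only the wanted ones.
        [word for word in re.findall(r"[a-z']+", text.lower()) if word in wanted]
        for text in texts
    ]


found = parse_for_tokens(["She met him.", "No match here."], ["she", "him"])
```

A text with no matches yields an empty inner list, so the outer list always has the same length as texts.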