langfair.auto.auto.AutoEval#
- class langfair.auto.auto.AutoEval(prompts, responses=None, langchain_llm=None, suppressed_exceptions=None, metrics=None, toxicity_device='cpu', neutralize_tokens=True, max_calls_per_min=None)#
Bases:
object
- __init__(prompts, responses=None, langchain_llm=None, suppressed_exceptions=None, metrics=None, toxicity_device='cpu', neutralize_tokens=True, max_calls_per_min=None)#
This class calculates all toxicity, stereotype, and counterfactual metrics supported by langfair.
- Parameters:
prompts (list of strings or DataFrame of strings) – A list of input prompts for the model.
responses (list of strings or DataFrame of strings, default=None) – A list of generated outputs from an LLM. If not provided, responses are generated using the model.
langchain_llm (langchain llm object, default=None) – A LangChain LLM object passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters in the constructor of their langchain_llm object.
suppressed_exceptions (tuple, default=None) – Specifies which exceptions to handle as 'Unable to get response' rather than raising the exception.
metrics (dict or list of str, default=None) – Specifies which metrics to evaluate. If None, all supported metrics are computed.
toxicity_device (str or torch.device, default="cpu") – Specifies the device that toxicity classifiers use for prediction. Set to "cuda" so the classifiers can leverage the GPU. Currently, 'detoxify_unbiased' and 'detoxify_original' use this parameter.
neutralize_tokens (bool, default=True) – An indicator specifying whether to use masking in the computation of BLEU and ROUGE-L metrics. If True, counterfactual responses are masked using the CounterfactualGenerator.neutralize_tokens method before computing the aforementioned metrics.
max_calls_per_min (int, default=None) – [Deprecated] Use LangChain’s InMemoryRateLimiter instead.
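A minimal construction sketch (not taken from the langfair docs): it assumes langchain_openai.ChatOpenAI as the LLM and a small hypothetical prompt list; any LangChain-compatible chat model can be substituted.

    from langchain_openai import ChatOpenAI  # assumed model choice; any LangChain LLM works
    from langfair.auto.auto import AutoEval

    # Hypothetical prompts; in practice these come from your own use case.
    prompts = [
        "Describe the ideal candidate for a nursing position.",
        "Describe the ideal candidate for a software engineering position.",
    ]

    # User is responsible for temperature and other LLM parameters.
    llm = ChatOpenAI(temperature=1.0)

    ae = AutoEval(
        prompts=prompts,
        langchain_llm=llm,       # responses are generated by the model since responses=None
        toxicity_device="cpu",   # set to "cuda" to run the detoxify classifiers on GPU
        neutralize_tokens=True,  # mask tokens before BLEU / ROUGE-L counterfactual metrics
    )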
Methods

__init__(prompts[, responses, ...])
    This class calculates all toxicity, stereotype, and counterfactual metrics supported by langfair.
evaluate([metrics])
    Compute all the metrics based on the provided data.
export_results([file_name])
    Export the evaluated metric values to a text file.
print_results()
    Print the evaluated metric values in the desired format.
- async evaluate(metrics=None)#
Compute all the metrics based on the provided data.
- Parameters:
metrics (dict or list of str, optional) – Specifies which metrics to evaluate. If None, computes all supported metrics.
- Returns:
A dictionary containing values of toxicity, stereotype, and counterfactual metrics.
- Return type:
dict
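A hedged usage sketch, continuing the construction example above: evaluate is a coroutine, so it must be awaited (for example via asyncio.run in a script, or awaited directly in a notebook). The metric-group names in the comment are illustrative assumptions, not a definitive list.

    import asyncio

    # Compute every supported metric (metrics=None).
    results = asyncio.run(ae.evaluate())

    # To restrict computation, pass a subset, e.g. (assumed group names):
    # results = asyncio.run(ae.evaluate(metrics=["toxicity", "stereotype"]))

    print(results)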
- export_results(file_name='results.txt')#
Export the evaluated metric values to a text file.
- Parameters:
file_name (str, default="results.txt") – Name of the .txt file.
- Return type:
None
- print_results()#
Print the evaluated metric values in the desired format.
- Return type:
None
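Continuing the sketch, the computed results can be written to disk and echoed to the console; the file name below is an arbitrary choice.

    ae.export_results(file_name="autoeval_results.txt")  # write metric values to a .txt file
    ae.print_results()                                    # print the same metric values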