langfair.auto.auto.AutoEval#

class langfair.auto.auto.AutoEval(prompts, responses=None, langchain_llm=None, suppressed_exceptions=None, metrics=None, toxicity_device='cpu', neutralize_tokens=True, max_calls_per_min=None)#

Bases: object

__init__(prompts, responses=None, langchain_llm=None, suppressed_exceptions=None, metrics=None, toxicity_device='cpu', neutralize_tokens=True, max_calls_per_min=None)#

This class calculates all toxicity, stereotype, and counterfactual metrics supported by LangFair. A minimal usage sketch follows the parameter list below.

Parameters:
  • prompts (list of strings or DataFrame of strings) – A list of input prompts for the model.

  • responses (list of strings or DataFrame of strings, default=None) – A list of generated outputs from an LLM. If not provided, responses are generated using the model.

  • langchain_llm (langchain llm object, default=None) – A LangChain LLM object to be passed to the chain constructor. The user is responsible for specifying temperature and other relevant parameters when constructing their langchain_llm object.

  • suppressed_exceptions (tuple, default=None) – Specifies which exceptions to handle as ‘Unable to get response’ rather than raising the exception.

  • metrics (dict or list of str, default=None) – Specifies which metrics to evaluate. If None, all supported metrics are computed.

  • toxicity_device (str or torch.device, default="cpu") – Specifies the device that toxicity classifiers use for prediction. Set to “cuda” for the classifiers to be able to leverage the GPU. Currently, ‘detoxify_unbiased’ and ‘detoxify_original’ use this parameter.

  • neutralize_tokens (boolean, default=True) – An indicator attribute specifying whether to use masking for the computation of BLEU and ROUGE-L metrics. If True, counterfactual responses are masked using the CounterfactualGenerator.neutralize_tokens method before computing the aforementioned metrics.

  • max_calls_per_min (int, default=None) – [Deprecated] Use LangChain’s InMemoryRateLimiter instead.
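
Example (a minimal construction sketch; the LLM provider, model name, and prompts below are illustrative assumptions, and any LangChain chat model may be substituted):

  from langchain_core.rate_limiters import InMemoryRateLimiter
  from langchain_openai import ChatOpenAI  # assumed provider; any LangChain chat model works

  from langfair.auto.auto import AutoEval

  # Hypothetical prompts; in practice these come from your own use case.
  prompts = [
      "Write a short note to a customer about their loan application.",
      "Summarize the following meeting notes for the team.",
  ]

  # InMemoryRateLimiter replaces the deprecated max_calls_per_min parameter.
  rate_limiter = InMemoryRateLimiter(requests_per_second=5)
  llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0, rate_limiter=rate_limiter)

  auto = AutoEval(
      prompts=prompts,
      langchain_llm=llm,      # responses=None, so responses are generated by the LLM
      toxicity_device="cpu",  # set to "cuda" to run the detoxify classifiers on GPU
      neutralize_tokens=True,
  )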

Methods

__init__(prompts[, responses, ...])

This class calculates all toxicity, stereotype, and counterfactual metrics supported by LangFair.

evaluate([metrics])

Compute all the metrics based on the provided data.

export_results([file_name])

Export the evaluated metric values to a text file.

print_results()

Print the evaluated metric values in the desired format.

async evaluate(metrics=None)#

Compute all the metrics based on the provided data.

Parameters:

metrics (dict or list of str, optional) – Specifies which metrics to evaluate. If None, computes all supported metrics.

Returns:

A dictionary containing values of toxicity, stereotype, and counterfactual metrics.

Return type:

dict
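
Example (continuing the construction sketch above; evaluate() is a coroutine and must be awaited or run in an event loop):

  import asyncio

  # metrics=None computes all supported metrics.
  results = asyncio.run(auto.evaluate())

  # The returned dictionary groups toxicity, stereotype, and counterfactual
  # metric values; the exact key layout is not reproduced here.
  print(results)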

export_results(file_name='results.txt')#

Export the evaluated metric values to a text file.

Parameters:

file_name (str, default="results.txt") – Name of the .txt file.

Return type:

None
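
Example (assumes evaluate() has already been run on the AutoEval instance from the sketch above):

  auto.export_results(file_name="autoeval_results.txt")  # writes the metric values to a .txt file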

print_results()#

Print the evaluated metric values in the desired format.

Return type:

None
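
Example (assumes evaluate() has already been run):

  auto.print_results()  # prints the evaluated metric values to stdout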