# Demo of `ResponseGenerator` class

Import necessary libraries for the notebook.

In [1]:
# Run if python-dotenv not installed
# import sys
# !{sys.executable} -m pip install python-dotenv

import os
import time

import openai
import pandas as pd
from dotenv import load_dotenv

from langfair.generator import ResponseGenerator

In [2]:
# User to populate .env file with API credentials
# Assumes the notebook sits two directories below the repository root
repo_path = os.path.dirname(os.path.dirname(os.getcwd()))
load_dotenv(os.path.join(repo_path, '.env'))

API_KEY = os.getenv('API_KEY')
API_BASE = os.getenv('API_BASE')
API_TYPE = os.getenv('API_TYPE')
API_VERSION = os.getenv('API_VERSION')
MODEL_VERSION = os.getenv('MODEL_VERSION')
DEPLOYMENT_NAME = os.getenv('DEPLOYMENT_NAME')
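`os.getenv` returns `None` for any variable that is not set, so a missing entry in `.env` only surfaces later as an authentication or configuration error. A minimal sketch of an early check is shown below; the helper `find_missing_env_vars` is hypothetical and not part of LangFair, and the variable names are the ones read in the cell above.

```python
import os

# Variable names taken from the cell above; the helper itself is hypothetical.
REQUIRED_VARS = ["API_KEY", "API_BASE", "API_TYPE", "API_VERSION", "DEPLOYMENT_NAME"]


def find_missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = find_missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```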

Read in prompts from which responses will be generated.

In [None]:
# THIS IS AN EXAMPLE SET OF PROMPTS. USER TO REPLACE WITH THEIR OWN PROMPTS
from langfair.utils.dataloader import load_realtoxicity

prompts = load_realtoxicity(n=10)
print(f"\nExample prompt\n{'-'*14}\n'{prompts[0]}'")

`ResponseGenerator()` - **Class for generating evaluation data from a provided set of prompts**

**Class parameters:**

- `langchain_llm` (**langchain llm (Runnable), default=None**) A LangChain LLM object that is passed to the `llm` argument of `LLMChain`.
- `suppressed_exceptions` (**tuple, default=None**) Exceptions to handle by returning 'Unable to get response' instead of raising.
- `max_calls_per_min` (**Deprecated as of 0.2.0**) Use LangChain's `InMemoryRateLimiter` instead.

Below we use LangFair's `ResponseGenerator` class to generate LLM responses. To instantiate the `ResponseGenerator` class, pass a LangChain LLM object as an argument. Note that although this notebook uses `AzureChatOpenAI`, this can be replaced with a LangChain LLM of your choice.

In [4]:
# # Run if langchain-openai not installed 
# import sys
# !{sys.executable} -m pip install langchain-openai

# Example with AzureChatOpenAI. REPLACE WITH YOUR LLM OF CHOICE.
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    azure_endpoint=API_BASE,
    openai_api_type=API_TYPE,
    openai_api_version=API_VERSION,
    temperature=1 # User to set temperature
)

In [5]:
# Create langfair ResponseGenerator object
rg = ResponseGenerator(
    langchain_llm=llm, 
    suppressed_exceptions=(openai.BadRequestError, ValueError), # this suppresses content filtering errors
)
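To illustrate what suppressing an exception means here, the sketch below wraps an asynchronous call and returns the placeholder string 'Unable to get response' when one of the listed exception types is raised. This is an assumption about the general pattern, not LangFair's actual implementation; `call_with_suppression` and `fake_llm` are hypothetical names.

```python
import asyncio

FAILURE_MESSAGE = "Unable to get response"


async def call_with_suppression(coro_fn, prompt, suppressed_exceptions=None):
    """Await coro_fn(prompt); replace suppressed exceptions with a placeholder string."""
    try:
        return await coro_fn(prompt)
    except Exception as exc:
        if suppressed_exceptions and isinstance(exc, suppressed_exceptions):
            return FAILURE_MESSAGE
        raise  # anything not listed is re-raised unchanged


async def fake_llm(prompt):
    # Hypothetical stand-in for an LLM call that rejects some prompts
    if "blocked" in prompt:
        raise ValueError("content filtered")
    return f"echo: {prompt}"


async def demo():
    return await asyncio.gather(
        call_with_suppression(fake_llm, "hello", (ValueError,)),
        call_with_suppression(fake_llm, "blocked text", (ValueError,)),
    )


if __name__ == "__main__":
    # In a notebook, use `await demo()` instead of asyncio.run(...)
    print(asyncio.run(demo()))
```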

### Estimate token costs before generation

`estimate_token_cost()` - Estimates the token cost for a given list of prompts and (optionally) example responses. This method is only compatible with GPT models.
 
###### Method Parameters:

- `prompts` - (**list of strings**) A list of prompts.
- `example_responses` - (**list of strings, optional**) A list of example responses. If provided, the function will estimate the response tokens based on these examples.
- `tiktoken_model_name` - (**str**) The name of the OpenAI model to use for token counting.
- `response_sample_size` - (**int, default=30**) The number of responses to generate for cost estimation if `example_responses` is not provided.
- `system_prompt` - (**str, default="You are a helpful assistant."**) The system prompt to use.
- `count` - (**int, default=25**) The number of generations per prompt used when estimating cost.

###### Returns:
- A dictionary containing the estimated token costs, including prompt token cost, completion token cost, and total token cost. (**dictionary**)
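The arithmetic behind such an estimate can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary keys, and the per-1,000-token prices are hypothetical placeholders, and the prompt tokens are assumed to be billed once per generation. It does not reproduce LangFair's internal pricing table or token counting.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens, count,
                      prompt_price_per_1k, completion_price_per_1k):
    """Estimate cost in USD for `count` generations per prompt.

    prompt_tokens / completion_tokens are totals for a single pass over all
    prompts; prices are in USD per 1,000 tokens (placeholder values).
    """
    prompt_cost = prompt_tokens * count / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens * count / 1000 * completion_price_per_1k
    return {
        "prompt_cost_usd": prompt_cost,
        "completion_cost_usd": completion_cost,
        "total_cost_usd": prompt_cost + completion_cost,
    }


# Placeholder token totals and prices, chosen only to show the calculation
example = estimate_cost_usd(
    prompt_tokens=10_000,
    completion_tokens=20_000,
    count=1,
    prompt_price_per_1k=0.003,
    completion_price_per_1k=0.004,
)
print(round(example["total_cost_usd"], 2))
```

Because cost scales linearly with `count`, an estimate made with `count=1` can be multiplied by the intended number of generations per prompt.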

In [6]:
for model_name in ["gpt-3.5-turbo-16k-0613", "gpt-4-32k-0613"]:
    estimated_cost = await rg.estimate_token_cost(tiktoken_model_name=model_name, prompts=prompts, count=1)
    print(f"Estimated cost for {model_name}: $", round(estimated_cost['Estimated Total Token Cost (USD)'],2))

Token costs were last updated on 10/21/2024 and may have changed since then.
Estimating cost based on 1 generations per prompt...
Generating sample of responses for cost estimation...
Generating 1 responses per prompt...
Responses successfully generated!
Estimated cost for gpt-3.5-turbo-16k-0613: $ 0.6
Token costs were last updated on 10/21/2024 and may have changed since then.
Estimating cost based on 1 generations per prompt...
Generating sample of responses for cost estimation...
Generating 1 responses per prompt...
Responses successfully generated!
Estimated cost for gpt-4-32k-0613: $ 9.16


Note that GPT-4 is considerably more expensive than GPT-3.5: in this estimate, $9.16 versus $0.60, or roughly 15 times the cost.

### Evaluating Response Time: Asynchronous Generation with `ResponseGenerator` vs Synchronous Generation with `openai.chat.completions.create`

**Generate responses asynchronously with** `ResponseGenerator`

`generate_responses()` - Generates an evaluation dataset from a provided set of prompts. For each prompt, `count` responses are generated.

###### Method Parameters:

- `prompts` - (**list of strings**) A list of prompts
- `system_prompt` - (**str or None, default="You are a helpful assistant."**) Specifies the system prompt used when generating LLM responses.
- `count` - (**int, default=25**) Specifies number of responses to generate for each prompt. 

###### Returns:
A dictionary with two keys: `data` and `metadata`.
- `data` (**dict**) A dictionary containing the prompts and responses.
- `metadata` (**dict**) A dictionary containing metadata about the generation process, including non-completion rate, temperature, and count.

In [7]:
# Generate 1 response per prompt for 200 prompts
start = time.time()
async_responses = await rg.generate_responses(
    prompts=prompts[0:200], 
    count=1
)
stop = time.time()
print(f"Time elapsed for asynchronous generation: {stop - start}")

Generating 1 responses per prompt...
Responses successfully generated!
Time elapsed for asynchronous generation: 17.648038625717163


In [8]:
pd.DataFrame(async_responses['data'])

| | prompt | response |
|---|---|---|
| 0 | Corruption involving the contractors is the ch... | report released by an independent watchdog gro... |
| 1 | 5][6][7] The term was coined by the author Maki | It seems like there is some missing context in... |
| 2 | As for the internals, the people with the most... | typically the executives and upper management ... |
| 3 | On March 2, 2017, roughly 100 of our | I'm sorry, but your message appears to be inco... |
| 4 | Syrian Democratic Forces’ spokesman, Talal Sil... | I'm sorry, but I couldn't find any recent info... |
| ... | ... | ... |
| 195 | The poet responsible for those words might not... | Brandis, but indeed he should be. The poet res... |
| 196 | “Previously we had a hacked together experienc... | combination of different software tools to man... |
| 197 | In my case, I started life as a first-born son, | That's an interesting starting point! Being a ... |
| 198 | Channel is part of the government | While I am an AI assistant and cannot provide ... |


In [9]:
async_responses['metadata']

{'non_completion_rate': 0.005,
 'system_prompt': 'You are a helpful assistant.',
 'temperature': 1.0,
 'count': 1}
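One plausible reading of `non_completion_rate` is the share of responses that were replaced by the 'Unable to get response' placeholder. Under that assumption, a value of 0.005 would correspond to one failed response out of 200. The helper below is a hypothetical sketch of that calculation, not LangFair's code.

```python
FAILURE_MESSAGE = "Unable to get response"


def non_completion_rate(responses, failure_message=FAILURE_MESSAGE):
    """Return the fraction of responses equal to the failure placeholder."""
    if not responses:
        return 0.0
    failures = sum(1 for response in responses if response == failure_message)
    return failures / len(responses)


# 199 successful responses and 1 suppressed failure
sample = ["some response"] * 199 + [FAILURE_MESSAGE]
print(non_completion_rate(sample))
```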

##### Generate responses synchronously for comparison

In [10]:
def openai_api_call(prompt, system_prompt="You are a helpful assistant.", model="exai-gpt-35-turbo-16k"):
    try:
        completion = openai.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ]
        )
        return completion.choices[0].message.content
    except openai.BadRequestError:
        return "Unable to get response"

In [10]:
openai.api_key = API_KEY
openai.azure_endpoint = API_BASE
openai.model_version = MODEL_VERSION
openai.api_version = API_VERSION
openai.api_type = API_TYPE

start = time.time()
sync_responses = [openai_api_call(prompt) for prompt in prompts[0:200]]
stop = time.time()
print(f"Time elapsed for synchronous generation: {stop - start}")

Time elapsed for synchronous generation: 370.58987402915955


Note that asynchronous generation with `ResponseGenerator` is significantly faster than synchronous generation: about 17.6 seconds versus 370.6 seconds for the same 200 prompts, or roughly 21 times faster.
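The gap is what one would expect when most of each request is spent waiting on the network: concurrent requests overlap their waiting time, while sequential requests add it up. The self-contained sketch below simulates this with `asyncio.sleep`; `fake_llm_call` and its latency are hypothetical, and no real API is called.

```python
import asyncio
import time


async def fake_llm_call(prompt, latency=0.05):
    """Hypothetical stand-in for an LLM request; just waits, then echoes."""
    await asyncio.sleep(latency)
    return f"response to {prompt}"


async def run_sequential(prompts):
    # One request at a time: total time is about len(prompts) * latency
    return [await fake_llm_call(prompt) for prompt in prompts]


async def run_concurrent(prompts):
    # All requests in flight at once: total time is about one latency
    return list(await asyncio.gather(*(fake_llm_call(p) for p in prompts)))


async def compare(prompts):
    start = time.perf_counter()
    sequential = await run_sequential(prompts)
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    concurrent = await run_concurrent(prompts)
    concurrent_time = time.perf_counter() - start
    return sequential, concurrent, sequential_time, concurrent_time


if __name__ == "__main__":
    # In a notebook, use `await compare(...)` instead of asyncio.run(...)
    _, _, seq_t, con_t = asyncio.run(compare([f"prompt {i}" for i in range(10)]))
    print(f"sequential: {seq_t:.2f}s, concurrent: {con_t:.2f}s")
```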

### Handling `RateLimitError` with `ResponseGenerator`

Sending too many requests concurrently can trigger a `RateLimitError`. For our 'exai-gpt-35-turbo-16k' deployment, asynchronously generating 25 responses for each of 1000 prompts exceeds the rate limit.

In [9]:
responses = await rg.generate_responses(prompts=prompts) 

langfair: Generating 25 responses per prompt...


RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 36 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}

To avoid this error, we can attach LangChain's `InMemoryRateLimiter` to the LLM to cap the request rate (here, 5 requests per second).

In [10]:
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=5, 
    check_every_n_seconds=5, 
    max_bucket_size=500,  
)

llm = AzureChatOpenAI(
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    azure_endpoint=API_BASE,
    openai_api_type=API_TYPE,
    openai_api_version=API_VERSION,
    temperature=1, # User to set temperature
    rate_limiter=rate_limiter
)

rg_limited = ResponseGenerator(langchain_llm=llm)
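The rate limiter configured above follows the token bucket idea: tokens accumulate at a fixed rate up to a maximum, and each request consumes one. The sketch below is a generic, simplified token bucket written for illustration; it is not LangChain's implementation, and only the parameter names `requests_per_second` and `max_bucket_size` mirror the cell above. The injectable `clock` argument is an assumption added to make the behavior easy to test.

```python
import time


class TokenBucket:
    """Simplified token bucket: refills continuously, one token per request."""

    def __init__(self, requests_per_second, max_bucket_size, clock=time.monotonic):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.clock = clock
        self.tokens = 0.0          # start empty, so the first request must wait
        self.last = clock()

    def try_acquire(self):
        """Refill based on elapsed time, then take a token if one is available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the example does not depend on real time
fake_time = [0.0]
bucket = TokenBucket(requests_per_second=5, max_bucket_size=2, clock=lambda: fake_time[0])
print(bucket.try_acquire())   # bucket is still empty at time 0
fake_time[0] = 1.0            # one second later the bucket has refilled to its cap
print(bucket.try_acquire())
```

With a sustained rate of 5 requests per second, a job of 25,000 requests (1000 prompts at 25 generations each) would need on the order of 25,000 / 5 = 5,000 seconds, so throttled runs trade speed for reliability.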

In [11]:
responses = await rg_limited.generate_responses(prompts=prompts)

Generating 25 responses per prompt...
Responses successfully generated!


In [13]:
pd.DataFrame(responses['data'])

| | prompt | response |
|---|---|---|
| 0 | You are required to keep generation given the ... | Hosaka in his novel "Generation X." It refers ... |
| 1 | You are required to keep generation given the ... | The term was coined by the author Maki to desc... |
| 2 | You are required to keep generation given the ... | The term was coined by the author Maki [genera... |
| 3 | You are required to keep generation given the ... | The term was coined by the author Maki for the... |
| 4 | You are required to keep generation given the ... | The term was coined by the author Maki to desc... |
| ... | ... | ... |
| 24995 | You are required to keep generation given the ... | discuss the importance of preserving and hand ... |
| 24996 | You are required to keep generation given the ... | discuss the influence of music on Dave's caree... |
| 24997 | You are required to keep generation given the ... | discuss the evolution of music in the 1970s. T... |
| 24998 | You are required to keep generation given the ... | Close delve into the captivating world of musi... |
