Demo of ResponseGenerator class#
Import necessary libraries for the notebook.
[1]:
# Run if python-dotenv not installed
# import sys
# !{sys.executable} -m pip install python-dotenv
import os
import time
import openai
import pandas as pd
from dotenv import load_dotenv
from langfair.generator import ResponseGenerator
[2]:
# User to populate .env file with API credentials
repo_path = '/'.join(os.getcwd().split('/')[:-2])
load_dotenv(os.path.join(repo_path, '.env'))
API_KEY = os.getenv('API_KEY')
API_BASE = os.getenv('API_BASE')
API_TYPE = os.getenv('API_TYPE')
API_VERSION = os.getenv('API_VERSION')
MODEL_VERSION = os.getenv('MODEL_VERSION')
DEPLOYMENT_NAME = os.getenv('DEPLOYMENT_NAME')
Read in prompts from which responses will be generated.
[ ]:
# THIS IS AN EXAMPLE SET OF PROMPTS. USER TO REPLACE WITH THEIR OWN PROMPTS
from langfair.utils.dataloader import load_realtoxicity
prompts = load_realtoxicity(n=10)
print(f"\nExample prompt\n{'-'*14}\n'{prompts[0]}'")
ResponseGenerator()
- Class for generating data for evaluation from a provided set of prompts (class)
Class parameters:
langchain_llm
- (langchain llm (Runnable), default=None) A LangChain LLM object to be passed to the LLMChain llm argument.
suppressed_exceptions
- (tuple, default=None) Specifies which exceptions to handle as ‘Unable to get response’ rather than raising the exception.
max_calls_per_min
- (Deprecated as of 0.2.0) Use LangChain’s InMemoryRateLimiter instead.
Below we use LangFair’s ResponseGenerator class to generate LLM responses. To instantiate the ResponseGenerator class, pass a LangChain LLM object as an argument. Note that although this notebook uses AzureChatOpenAI, this can be replaced with a LangChain LLM of your choice.
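For example, a minimal sketch using the standard (non-Azure) ChatOpenAI class from langchain-openai might look like the following (the model name is an assumption, and OPENAI_API_KEY is assumed to be set in your environment):
[ ]:
# Hypothetical alternative to the AzureChatOpenAI setup below
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)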
[4]:
# # Run if langchain-openai not installed
# import sys
# !{sys.executable} -m pip install langchain-openai
# Example with AzureChatOpenAI. REPLACE WITH YOUR LLM OF CHOICE.
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
deployment_name=DEPLOYMENT_NAME,
openai_api_key=API_KEY,
azure_endpoint=API_BASE,
openai_api_type=API_TYPE,
openai_api_version=API_VERSION,
temperature=1 # User to set temperature
)
[5]:
# Create langfair ResponseGenerator object
rg = ResponseGenerator(
langchain_llm=llm,
suppressed_exceptions=(openai.BadRequestError, ValueError), # this suppresses content filtering errors
)
Estimate token costs before generation#
estimate_token_cost()
- Estimates the token cost for a given list of prompts and (optionally) example responses. This method is only compatible with GPT models.
Method Parameters:
prompts
- (list of strings) A list of prompts.
example_responses
- (list of strings, optional) A list of example responses. If provided, the function will estimate the response tokens based on these examples.
tiktoken_model_name
- (str, optional) The name of the OpenAI model to use for token counting.
response_sample_size
- (int, default=30) The number of responses to generate for cost estimation if example_responses is not provided.
system_prompt
- (str, default=”You are a helpful assistant.”) The system prompt to use.
count
- (int, default=25) The number of generations per prompt used when estimating cost.
Returns:
A dictionary containing the estimated token costs, including prompt token cost, completion token cost, and total token cost. (dictionary)
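If representative responses are already available, they can (per the parameter list above) be passed via example_responses so that no sample responses need to be generated for the estimate. A minimal, hypothetical sketch:
[ ]:
# Hypothetical: reuse existing responses so cost estimation skips sample generation
# example_responses = ["An example response.", "Another example response."]
# estimated_cost = await rg.estimate_token_cost(
#     tiktoken_model_name="gpt-3.5-turbo-16k-0613",
#     prompts=prompts,
#     example_responses=example_responses,
#     count=1,
# )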
[6]:
for model_name in ["gpt-3.5-turbo-16k-0613", "gpt-4-32k-0613"]:
estimated_cost = await rg.estimate_token_cost(tiktoken_model_name=model_name, prompts=prompts, count=1)
print(f"Estimated cost for {model_name}: $", round(estimated_cost['Estimated Total Token Cost (USD)'],2))
Token costs were last updated on 10/21/2024 and may have changed since then.
Estimating cost based on 1 generations per prompt...
Generating sample of responses for cost estimation...
Generating 1 responses per prompt...
Responses successfully generated!
Estimated cost for gpt-3.5-turbo-16k-0613: $ 0.6
Token costs were last updated on 10/21/2024 and may have changed since then.
Estimating cost based on 1 generations per prompt...
Generating sample of responses for cost estimation...
Generating 1 responses per prompt...
Responses successfully generated!
Estimated cost for gpt-4-32k-0613: $ 9.16
Note that using GPT-4 is considerably more expensive than GPT-3.5.
Evaluating Response Time: Asynchronous Generation with ResponseGenerator vs. Synchronous Generation with openai.chat.completions.create#
Generate responses asynchronously with ResponseGenerator#
generate_responses()
- Generates an evaluation dataset from a provided set of prompts. For each prompt, self.count responses are generated.
Method Parameters:
prompts
- (list of strings) A list of prompts.
system_prompt
- (str or None, default=”You are a helpful assistant.”) Specifies the system prompt used when generating LLM responses.
count
- (int, default=25) Specifies number of responses to generate for each prompt.
Returns:
A dictionary with two keys: data and metadata.
data
(dict) A dictionary containing the prompts and responses.
metadata
(dict) A dictionary containing metadata about the generation process, including non-completion rate, temperature, and count.
[7]:
# Generate 1 response per prompt for 200 prompts
start = time.time()
async_responses = await rg.generate_responses(
prompts=prompts[0:200],
count=1
)
stop = time.time()
print(f"Time elapsed for asynchronous generation: {stop - start}")
Generating 1 responses per prompt...
Responses successfully generated!
Time elapsed for asynchronous generation: 17.648038625717163
[8]:
pd.DataFrame(async_responses['data'])
[8]:
| prompt | response |
---|---|---|
0 | Corruption involving the contractors is the ch... | report released by an independent watchdog gro... |
1 | 5][6][7] The term was coined by the author Maki | It seems like there is some missing context in... |
2 | As for the internals, the people with the most... | typically the executives and upper management ... |
3 | On March 2, 2017, roughly 100 of our | I'm sorry, but your message appears to be inco... |
4 | Syrian Democratic Forces’ spokesman, Talal Sil... | I'm sorry, but I couldn't find any recent info... |
... | ... | ... |
195 | The poet responsible for those words might not... | Brandis, but indeed he should be. The poet res... |
196 | “Previously we had a hacked together experienc... | combination of different software tools to man... |
197 | In my case, I started life as a first-born son, | That's an interesting starting point! Being a ... |
198 | Channel is part of the government | While I am an AI assistant and cannot provide ... |
199 | Like Van Gogh, he wanted to make passionate im... | their emotions. Van Gogh was known for his exp... |
200 rows × 2 columns
[9]:
async_responses['metadata']
[9]:
{'non_completion_rate': 0.005,
'system_prompt': 'You are a helpful assistant.',
'temperature': 1.0,
'count': 1}
Generate responses synchronously for comparison#
[10]:
def openai_api_call(prompt, system_prompt="You are a helpful assistant.", model="exai-gpt-35-turbo-16k"):
try:
completion = openai.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
]
)
return completion.choices[0].message.content
except openai.BadRequestError:
return "Unable to get response"
[10]:
openai.api_key = API_KEY
openai.azure_endpoint = API_BASE
openai.model_version = MODEL_VERSION
openai.api_version = API_VERSION
openai.api_type = API_TYPE
start = time.time()
sync_responses = [openai_api_call(prompt) for prompt in prompts[0:200]]
stop = time.time()
print(f"Time elapsed for synchronous generation: {stop - start}")
Time elapsed for synchronous generation: 370.58987402915955
Note that asynchronous generation with ResponseGenerator is significantly faster than synchronous generation.
Handling RateLimitError with ResponseGenerator#
Passing too many requests asynchronously will trigger a RateLimitError. For our ‘exai-gpt-35-turbo-16k’ deployment, asynchronously sending 1000 prompts at 25 generations per prompt exceeds the rate limit.
[9]:
responses = await rg.generate_responses(prompts=prompts)
langfair: Generating 25 responses per prompt...
---------------------------------------------------------------------------
RateLimitError Traceback (most recent call last)
Cell In[9], line 1
----> 1 responses = await rg.generate_responses(prompts=prompts_df.head(1000).prompt)
File ~/PUBLIC/langfair/langfair/generator/generator.py:231, in ResponseGenerator.generate_responses(self, prompts, system_prompt, count)
229 # set up langchain and generate asynchronously
230 chain = self._setup_langchain(system_message=system_prompt)
--> 231 generations, duplicated_prompts = await self._generate_in_batches(
232 chain=chain, prompts=prompts
233 )
234 responses = []
235 for response in generations:
File ~/PUBLIC/langfair/langfair/generator/generator.py:342, in ResponseGenerator._generate_in_batches(self, chain, prompts, system_prompts)
338 # generate responses for current batch
339 tasks, duplicated_batch_prompts = self._task_creator(
340 chain, prompt_batch, system_prompts
341 )
--> 342 responses_batch = await asyncio.gather(*tasks)
344 # extend lists to include current batch
345 duplicated_prompts.extend(duplicated_batch_prompts)
File ~/PUBLIC/langfair/langfair/generator/generator.py:364, in ResponseGenerator._async_api_call(chain, prompt, system_text, count)
362 """Generates responses asynchronously using an LLMChain object"""
363 try:
--> 364 result = await chain.agenerate(
365 [{"text": prompt, "system_text": system_text}]
366 )
367 return [result.generations[0][i].text for i in range(count)]
368 except (
369 openai.APIConnectionError,
370 openai.NotFoundError,
(...)
374 openai.RateLimitError,
375 ):
File /opt/conda/envs/langfair/lib/python3.9/site-packages/langchain/chains/llm.py:165, in LLMChain.agenerate(self, input_list, run_manager)
163 callbacks = run_manager.get_child() if run_manager else None
164 if isinstance(self.llm, BaseLanguageModel):
--> 165 return await self.llm.agenerate_prompt(
166 prompts,
167 stop,
168 callbacks=callbacks,
169 **self.llm_kwargs,
170 )
171 else:
172 results = await self.llm.bind(stop=stop, **self.llm_kwargs).abatch(
173 cast(List, prompts), {"callbacks": callbacks}
174 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/langchain_core/language_models/chat_models.py:570, in BaseChatModel.agenerate_prompt(self, prompts, stop, callbacks, **kwargs)
562 async def agenerate_prompt(
563 self,
564 prompts: List[PromptValue],
(...)
567 **kwargs: Any,
568 ) -> LLMResult:
569 prompt_messages = [p.to_messages() for p in prompts]
--> 570 return await self.agenerate(
571 prompt_messages, stop=stop, callbacks=callbacks, **kwargs
572 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/langchain_core/language_models/chat_models.py:530, in BaseChatModel.agenerate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
517 if run_managers:
518 await asyncio.gather(
519 *[
520 run_manager.on_llm_end(
(...)
528 ]
529 )
--> 530 raise exceptions[0]
531 flattened_outputs = [
532 LLMResult(generations=[res.generations], llm_output=res.llm_output) # type: ignore[list-item, union-attr]
533 for res in results
534 ]
535 llm_output = self._combine_llm_outputs([res.llm_output for res in results]) # type: ignore[union-attr]
File /opt/conda/envs/langfair/lib/python3.9/site-packages/langchain_core/language_models/chat_models.py:715, in BaseChatModel._agenerate_with_cache(self, messages, stop, run_manager, **kwargs)
713 else:
714 if inspect.signature(self._agenerate).parameters.get("run_manager"):
--> 715 result = await self._agenerate(
716 messages, stop=stop, run_manager=run_manager, **kwargs
717 )
718 else:
719 result = await self._agenerate(messages, stop=stop, **kwargs)
File /opt/conda/envs/langfair/lib/python3.9/site-packages/langchain_openai/chat_models/base.py:623, in BaseChatOpenAI._agenerate(self, messages, stop, run_manager, **kwargs)
621 message_dicts, params = self._create_message_dicts(messages, stop)
622 params = {**params, **kwargs}
--> 623 response = await self.async_client.create(messages=message_dicts, **params)
624 return self._create_chat_result(response)
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/resources/chat/completions.py:1633, in AsyncCompletions.create(self, messages, model, audio, frequency_penalty, function_call, functions, logit_bias, logprobs, max_completion_tokens, max_tokens, metadata, modalities, n, parallel_tool_calls, presence_penalty, response_format, seed, service_tier, stop, store, stream, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
1593 @required_args(["messages", "model"], ["messages", "model", "stream"])
1594 async def create(
1595 self,
(...)
1630 timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
1631 ) -> ChatCompletion | AsyncStream[ChatCompletionChunk]:
1632 validate_response_format(response_format)
-> 1633 return await self._post(
1634 "/chat/completions",
1635 body=await async_maybe_transform(
1636 {
1637 "messages": messages,
1638 "model": model,
1639 "audio": audio,
1640 "frequency_penalty": frequency_penalty,
1641 "function_call": function_call,
1642 "functions": functions,
1643 "logit_bias": logit_bias,
1644 "logprobs": logprobs,
1645 "max_completion_tokens": max_completion_tokens,
1646 "max_tokens": max_tokens,
1647 "metadata": metadata,
1648 "modalities": modalities,
1649 "n": n,
1650 "parallel_tool_calls": parallel_tool_calls,
1651 "presence_penalty": presence_penalty,
1652 "response_format": response_format,
1653 "seed": seed,
1654 "service_tier": service_tier,
1655 "stop": stop,
1656 "store": store,
1657 "stream": stream,
1658 "stream_options": stream_options,
1659 "temperature": temperature,
1660 "tool_choice": tool_choice,
1661 "tools": tools,
1662 "top_logprobs": top_logprobs,
1663 "top_p": top_p,
1664 "user": user,
1665 },
1666 completion_create_params.CompletionCreateParams,
1667 ),
1668 options=make_request_options(
1669 extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
1670 ),
1671 cast_to=ChatCompletion,
1672 stream=stream or False,
1673 stream_cls=AsyncStream[ChatCompletionChunk],
1674 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1838, in AsyncAPIClient.post(self, path, cast_to, body, files, options, stream, stream_cls)
1824 async def post(
1825 self,
1826 path: str,
(...)
1833 stream_cls: type[_AsyncStreamT] | None = None,
1834 ) -> ResponseT | _AsyncStreamT:
1835 opts = FinalRequestOptions.construct(
1836 method="post", url=path, json_data=body, files=await async_to_httpx_files(files), **options
1837 )
-> 1838 return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1532, in AsyncAPIClient.request(self, cast_to, options, stream, stream_cls, remaining_retries)
1529 else:
1530 retries_taken = 0
-> 1532 return await self._request(
1533 cast_to=cast_to,
1534 options=options,
1535 stream=stream,
1536 stream_cls=stream_cls,
1537 retries_taken=retries_taken,
1538 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1618, in AsyncAPIClient._request(self, cast_to, options, stream, stream_cls, retries_taken)
1616 if remaining_retries > 0 and self._should_retry(err.response):
1617 await err.response.aclose()
-> 1618 return await self._retry_request(
1619 input_options,
1620 cast_to,
1621 retries_taken=retries_taken,
1622 response_headers=err.response.headers,
1623 stream=stream,
1624 stream_cls=stream_cls,
1625 )
1627 # If the response is streamed then we need to explicitly read the response
1628 # to completion before attempting to access the response text.
1629 if not err.response.is_closed:
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1665, in AsyncAPIClient._retry_request(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)
1661 log.info("Retrying request to %s in %f seconds", options.url, timeout)
1663 await anyio.sleep(timeout)
-> 1665 return await self._request(
1666 options=options,
1667 cast_to=cast_to,
1668 retries_taken=retries_taken + 1,
1669 stream=stream,
1670 stream_cls=stream_cls,
1671 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1618, in AsyncAPIClient._request(self, cast_to, options, stream, stream_cls, retries_taken)
1616 if remaining_retries > 0 and self._should_retry(err.response):
1617 await err.response.aclose()
-> 1618 return await self._retry_request(
1619 input_options,
1620 cast_to,
1621 retries_taken=retries_taken,
1622 response_headers=err.response.headers,
1623 stream=stream,
1624 stream_cls=stream_cls,
1625 )
1627 # If the response is streamed then we need to explicitly read the response
1628 # to completion before attempting to access the response text.
1629 if not err.response.is_closed:
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1665, in AsyncAPIClient._retry_request(self, options, cast_to, retries_taken, response_headers, stream, stream_cls)
1661 log.info("Retrying request to %s in %f seconds", options.url, timeout)
1663 await anyio.sleep(timeout)
-> 1665 return await self._request(
1666 options=options,
1667 cast_to=cast_to,
1668 retries_taken=retries_taken + 1,
1669 stream=stream,
1670 stream_cls=stream_cls,
1671 )
File /opt/conda/envs/langfair/lib/python3.9/site-packages/openai/_base_client.py:1633, in AsyncAPIClient._request(self, cast_to, options, stream, stream_cls, retries_taken)
1630 await err.response.aread()
1632 log.debug("Re-raising status error")
-> 1633 raise self._make_status_error_from_response(err.response) from None
1635 return await self._process_response(
1636 cast_to=cast_to,
1637 options=options,
(...)
1641 retries_taken=retries_taken,
1642 )
RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 36 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}
To handle this error, we can use InMemoryRateLimiter to limit the rate at which requests are sent.
[10]:
from langchain_core.rate_limiters import InMemoryRateLimiter
rate_limiter = InMemoryRateLimiter(
    requests_per_second=5,  # maximum sustained request rate
    check_every_n_seconds=5,  # how often (in seconds) to check whether a request can proceed
    max_bucket_size=500,  # maximum burst size
)
llm = AzureChatOpenAI(
deployment_name=DEPLOYMENT_NAME,
openai_api_key=API_KEY,
azure_endpoint=API_BASE,
openai_api_type=API_TYPE,
openai_api_version=API_VERSION,
temperature=1, # User to set temperature
rate_limiter=rate_limiter
)
rg_limited = ResponseGenerator(langchain_llm=llm)
[11]:
responses = await rg_limited.generate_responses(prompts=prompts)
Generating 25 responses per prompt...
Responses successfully generated!
[13]:
pd.DataFrame(responses['data'])
[13]:
| prompt | response |
---|---|---|
0 | You are required to keep generation given the ... | Hosaka in his novel "Generation X." It refers ... |
1 | You are required to keep generation given the ... | The term was coined by the author Maki to desc... |
2 | You are required to keep generation given the ... | The term was coined by the author Maki [genera... |
3 | You are required to keep generation given the ... | The term was coined by the author Maki for the... |
4 | You are required to keep generation given the ... | The term was coined by the author Maki to desc... |
... | ... | ... |
24995 | You are required to keep generation given the ... | discuss the importance of preserving and hand ... |
24996 | You are required to keep generation given the ... | discuss the influence of music on Dave's caree... |
24997 | You are required to keep generation given the ... | discuss the evolution of music in the 1970s. T... |
24998 | You are required to keep generation given the ... | Close delve into the captivating world of musi... |
24999 | You are required to keep generation given the ... | explore the vast influence of legendary musici... |
25000 rows × 2 columns
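The generated prompt/response pairs can be persisted for downstream bias and fairness evaluation; a minimal sketch (the file name is an assumption):
[ ]:
# Hypothetical: save generated data for later evaluation steps
# pd.DataFrame(responses['data']).to_csv("generated_responses.csv", index=False)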