This article explains how to implement GPT streaming responses using the OpenAI API in Python.
Introduction
When you call GPT using the OpenAI API, the default behavior is to return the response only after all text generation is complete.
This article demonstrates how to return responses progressively via streaming, similar to how ChatGPT works in the browser.
Working environment:

```python
# openai version 0.28.0
```
The code in this article is available in the following GitHub repository:
Note: This article was translated from my original post.
Implementing GPT Stream Responses with OpenAI API
Here's how to implement streaming:
```python
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]


def main() -> None:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{'role': 'user', 'content': 'Hello?'}],
        stream=True,
    )

    collected_chunks = []
    collected_messages = []
    for chunk in response:
        collected_chunks.append(chunk)
        chunk_message = chunk['choices'][0]['delta'].get('content', '')
        collected_messages.append(chunk_message)
        print(f"Message received: {chunk_message}")

    full_reply_content = ''.join(collected_messages)
    print(f"Full conversation received: {full_reply_content}")


if __name__ == "__main__":
    main()
```
Ref. python-examples/openai_stream/main.py at main · bioerrorlog/python-examples · GitHub
First, enable streaming responses by passing the stream=True option to ChatCompletion.create.
Then, process the chunks from response using a for loop:
```python
collected_chunks = []
collected_messages = []
for chunk in response:
    collected_chunks.append(chunk)
    chunk_message = chunk['choices'][0]['delta'].get('content', '')
    collected_messages.append(chunk_message)
    print(f"Message received: {chunk_message}")
```
Each chunk is returned in the following format, so we extract the message content from delta using chunk['choices'][0]['delta'].get('content', ''):
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "finish_reason": "stop"
    }
  ]
}
```
Ref. The chat completion chunk object - OpenAI API Reference
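Note that not every chunk carries a `content` key: the first chunk typically contains only the assistant role, and the final chunk has an empty delta. That is why the code uses `.get('content', '')` instead of indexing `delta['content']` directly. A minimal sketch with hypothetical sample chunks (these dicts imitate the shapes the streaming API returns, they are not real API output):

```python
# Hypothetical sample chunks mimicking a streamed response:
# the first carries only the role, the last has an empty delta.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

# .get('content', '') returns '' when the content key is absent,
# so role-only and final chunks don't raise a KeyError.
messages = [chunk['choices'][0]['delta'].get('content', '') for chunk in chunks]
print(''.join(messages))  # Hello!
```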
Here's what the output looks like when running the code above:
```
Message received: 
Message received: Hello
Message received: !
Message received:  How
Message received:  can
Message received:  I
Message received:  assist
Message received:  you
Message received:  today
Message received: ?
Message received: 
Full conversation received: Hello! How can I assist you today?
```
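To get the ChatGPT-like typing effect in a terminal, you can print each delta without a newline and flush stdout instead of printing one line per chunk. The `stream_to_stdout` helper below is an illustrative sketch, not part of the openai library, and the simulated chunks stand in for a real API response:

```python
def stream_to_stdout(chunks) -> str:
    """Print each streamed delta immediately and return the full reply."""
    pieces = []
    for chunk in chunks:
        content = chunk['choices'][0]['delta'].get('content', '')
        # end='' suppresses the newline; flush=True shows tokens as they arrive.
        print(content, end='', flush=True)
        pieces.append(content)
    print()  # final newline once the stream ends
    return ''.join(pieces)


# In real use, pass the iterator returned by
# openai.ChatCompletion.create(..., stream=True).
# Here, simulated chunks stand in for the API response:
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}}]},
]
full_reply = stream_to_stdout(fake_stream)
```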
OpenAI also provides official sample code that you can refer to:
openai-cookbook/examples/How_to_stream_completions.ipynb at main · openai/openai-cookbook · GitHub
Conclusion
This article covered how to implement GPT streaming responses using the OpenAI API in Python.
Streaming greatly reduces the perceived wait time, since users see output as soon as the first tokens arrive rather than after the full response is generated.
It's definitely a feature worth implementing.