This article explains how to implement GPT streaming responses using the OpenAI API in Python.
Introduction
When you call GPT using the OpenAI API, the default behavior is to return the response only after all text generation is complete.
This article demonstrates how to return responses progressively via streaming, similar to how ChatGPT works in the browser.
Working environment:

```python
# openai version 0.28.0
```
The code in this article is available in the following GitHub repository:
Note: This article was translated from my original post.
Implementing GPT Stream Responses with OpenAI API
Here's how to implement streaming:
```python
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]


def main() -> None:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{'role': 'user', 'content': 'Hello?'}],
        stream=True,
    )

    collected_chunks = []
    collected_messages = []
    for chunk in response:
        collected_chunks.append(chunk)
        chunk_message = chunk['choices'][0]['delta'].get('content', '')
        collected_messages.append(chunk_message)
        print(f"Message received: {chunk_message}")

    full_reply_content = ''.join(collected_messages)
    print(f"Full conversation received: {full_reply_content}")


if __name__ == "__main__":
    main()
```
Ref. python-examples/openai_stream/main.py at main · bioerrorlog/python-examples · GitHub
First, enable streaming responses by passing the stream=True option to ChatCompletion.create.
Then, process the chunks from response using a for loop:
```python
collected_chunks = []
collected_messages = []
for chunk in response:
    collected_chunks.append(chunk)
    chunk_message = chunk['choices'][0]['delta'].get('content', '')
    collected_messages.append(chunk_message)
    print(f"Message received: {chunk_message}")
```
Each chunk is returned in the following format, so we extract the message content from delta using chunk['choices'][0]['delta'].get('content', ''):
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "finish_reason": "stop"
    }
  ]
}
```
Ref. The chat completion chunk object - OpenAI API Reference
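Note that not every chunk carries a `content` key: the first chunk typically contains only the assistant role, and the final chunk has an empty delta. That is why the code uses `.get('content', '')` instead of indexing `delta['content']` directly. A minimal sketch with hypothetical sample chunks (these dicts imitate the shapes the streaming API returns, they are not real API output):

```python
# Hypothetical sample chunks mimicking a streamed response:
# the first carries only the role, the last has an empty delta.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

# .get('content', '') returns '' when the content key is absent,
# so role-only and final chunks don't raise a KeyError.
messages = [chunk['choices'][0]['delta'].get('content', '') for chunk in chunks]
print(''.join(messages))  # Hello!
```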
Here's what the output looks like when running the code above:
```
Message received: 
Message received: Hello
Message received: !
Message received:  How
Message received:  can
Message received:  I
Message received:  assist
Message received:  you
Message received:  today
Message received: ?
Message received: 
Full conversation received: Hello! How can I assist you today?
```
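To get the ChatGPT-like typing effect in a terminal, you can print each delta without a newline and flush stdout instead of printing one line per chunk. The `stream_to_stdout` helper below is an illustrative sketch, not part of the openai library, and the simulated chunks stand in for a real API response:

```python
def stream_to_stdout(chunks) -> str:
    """Print each streamed delta immediately and return the full reply."""
    pieces = []
    for chunk in chunks:
        content = chunk['choices'][0]['delta'].get('content', '')
        # end='' suppresses the newline; flush=True shows tokens as they arrive.
        print(content, end='', flush=True)
        pieces.append(content)
    print()  # final newline once the stream ends
    return ''.join(pieces)


# In real use, pass the iterator returned by
# openai.ChatCompletion.create(..., stream=True).
# Here, simulated chunks stand in for the API response:
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}}]},
]
full_reply = stream_to_stdout(fake_stream)
```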
OpenAI also provides official sample code that you can refer to:
openai-cookbook/examples/How_to_stream_completions.ipynb at main · openai/openai-cookbook · GitHub
Conclusion
This article covered how to implement GPT streaming responses using the OpenAI API in Python.
Streaming greatly reduces the perceived wait time, since users see output as soon as the first tokens arrive rather than after the full response is generated.
It's definitely a feature worth implementing.