
Streaming Responses

Get real-time responses as the model generates tokens.

Enabling Streaming

Add stream: true to your request to enable streaming responses:

request.json
{
  "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "stream": true
}
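
The same request can be sent with plain fetch from TypeScript. A minimal sketch follows; the /v1/chat/completions path and the Bearer authorization header are assumptions based on the OpenAI-compatible base URL and placeholder API key used in the Python example below.

request.ts
// Send a streaming chat completion request.
// With stream: true the response body is an SSE stream, not a single JSON object.
const response = await fetch("https://api.gonkagate.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer gp-your-api-key", // placeholder key
  },
  body: JSON.stringify({
    model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  }),
});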

SSE Response Format

Streaming responses use Server-Sent Events (SSE). Each chunk is a JSON object prefixed with 'data: ':

sse-response.txt
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15,"base_cost_usd":0.000075,"platform_fee_usd":0.0000075,"total_cost_usd":0.0000825}}

data: [DONE]
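
If you are not using an SDK, the SSE stream can be parsed by hand. The sketch below assumes a fetch Response like the one above; it buffers partial lines across network reads, strips the data: prefix, and stops at the [DONE] sentinel. It covers only the format shown here, not the full SSE spec (multi-line data fields, comments).

parse-sse.ts
// Read an SSE body and invoke onChunk for each parsed JSON chunk.
async function readSSE(response: Response, onChunk: (chunk: any) => void) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Keep any trailing partial line buffered until the next read.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;   // skip blank separator lines
      const data = line.slice("data: ".length);
      if (data === "[DONE]") return;              // end-of-stream sentinel
      onChunk(JSON.parse(data));                  // one chunk per data line
    }
  }
}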

Chunk Structure

Each streaming chunk has the following structure:

chunk-types.ts
interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: [{
    index: number;
    delta: {
      role?: "assistant";
      content?: string;
    };
    finish_reason: "stop" | null;
  }];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    base_cost_usd: number;
    platform_fee_usd: number;
    total_cost_usd: number;
  };  // Only in final chunk
}
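
To rebuild the full assistant message, concatenate delta.content across chunks. A small sketch against the interface above (the role-only first chunk and the final chunk contribute no content):

accumulate.ts
function accumulateContent(chunks: ChatCompletionChunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    // delta.content is optional; missing on the first and final chunks
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}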

Code Examples

Examples for streaming in different environments:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="gp-your-api-key"
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

    # Final chunk contains usage info
    if chunk.usage:
        print(f"\n\nCost: ${chunk.usage.total_cost_usd:.6f}")

Cost Tracking in Streams

The final chunk includes full usage and cost breakdown:

Check the final chunk: usage with total_cost_usd is only available in the last chunk before [DONE].

cost-tracking.ts
// Track costs from streaming response
let totalCost = 0;

for await (const chunk of stream) {
  // ... handle content ...

  if (chunk.usage) {
    totalCost = chunk.usage.total_cost_usd;
    console.log(`Request cost: $${totalCost.toFixed(6)}`);
    console.log(`Breakdown: base $${chunk.usage.base_cost_usd.toFixed(6)} + fee $${chunk.usage.platform_fee_usd.toFixed(6)}`);
  }
}

Error Handling in Streams

Handle connection issues gracefully when streaming:

error-handling.ts
async function streamWithErrorHandling(messages: Message[]) {
  try {
    const stream = await client.chat.completions.create({
      model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      messages,
      stream: true,
    });

    for await (const chunk of stream) {
      // Process chunk...
    }
  } catch (error: any) {
    if (error.code === "ECONNRESET" || error.code === "ETIMEDOUT") {
      // Connection was dropped
      console.error("Connection lost. Consider retrying.");
    } else if (error.status === 402) {
      // Insufficient balance
      showDepositModal();
    } else {
      throw error;
    }
  }
}
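
If you decide to retry after a dropped connection, a bounded retry loop is usually enough. The sketch below is a generic wrapper, not a platform recommendation: runStream is a hypothetical function that creates and fully consumes one stream, and the attempt count and backoff are arbitrary. Note that retrying restarts the stream from the beginning, so any tokens already shown will be regenerated.

retry.ts
async function withRetry(runStream: () => Promise<void>, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await runStream();
      return;
    } catch (error: any) {
      const retryable = error.code === "ECONNRESET" || error.code === "ETIMEDOUT";
      if (!retryable || attempt === maxAttempts) throw error;
      // Simple linear backoff before restarting the stream
      await new Promise((resolve) => setTimeout(resolve, attempt * 1000));
    }
  }
}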