Streaming Responses
Get real-time responses as the model generates tokens.
Enabling Streaming
Add stream: true to your request to enable streaming responses:
request.json
{
"model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
"messages": [{ "role": "user", "content": "Hello!" }],
"stream": true
}SSE Response Format
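If you are calling the HTTP API directly, the request can be sent with fetch. This is a minimal sketch, assuming Node 18+ or a browser and the standard OpenAI-style Bearer authorization header:
typescript
// Sketch: send a streaming chat completion request with fetch.
const response = await fetch("https://api.gonkagate.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer gp-your-api-key",
  },
  body: JSON.stringify({
    model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  }),
});
// With stream: true, the response body is a Server-Sent Events stream
// rather than a single JSON object.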
SSE Response Format
Streaming responses use Server-Sent Events (SSE). Each chunk is a JSON object prefixed with 'data: ':
sse-response.txt
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15,"base_cost_usd":0.000075,"platform_fee_usd":0.0000075,"total_cost_usd":0.0000825}}
data: [DONE]
Usage in Final Chunk
Cost and usage information is included in the final chunk before [DONE].
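If you consume the raw SSE stream without an SDK, you can read the usage off that final chunk yourself. A minimal parsing sketch, assuming the fetch response from the earlier example, Node 18+, and the single-line "data: " framing shown above (production parsers should be more defensive):
typescript
// Sketch: parse the raw SSE stream and read usage from the final chunk.
// `response` is the fetch Response from the earlier streaming request.
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
let done = false;
while (!done) {
  const { done: streamDone, value } = await reader.read();
  if (streamDone) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any incomplete trailing line
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload);
    process.stdout.write(chunk.choices[0].delta.content ?? "");
    if (chunk.usage) {
      console.log(`\nTotal cost: $${chunk.usage.total_cost_usd}`);
    }
  }
}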
Chunk Structure
Each streaming chunk has the following structure:
chunk-types.ts
interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: [{
    index: number;
    delta: {
      role?: "assistant";
      content?: string;
    };
    finish_reason: "stop" | null;
  }];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    base_cost_usd: number;
    platform_fee_usd: number;
    total_cost_usd: number;
  }; // Only in final chunk
}
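For example, the full assistant message can be rebuilt by concatenating the content deltas. A small sketch against the interface above, where `chunks` is a placeholder for however you obtain parsed chunk objects:
typescript
// Sketch: rebuild the complete message from streamed deltas.
function assembleMessage(chunks: ChatCompletionChunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    text += chunk.choices[0].delta.content ?? "";
    if (chunk.choices[0].finish_reason === "stop") break;
  }
  return text;
}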
Code Examples
Examples for streaming in different environments:
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="gp-your-api-key"
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # Final chunk contains usage info
    if chunk.usage:
        print(f"\n\nCost: ${chunk.usage.total_cost_usd:.6f}")
Cost Tracking in Streams
The final chunk includes the full usage and cost breakdown:
Check the final chunk — Usage with total_cost_usd is only available in the last chunk before [DONE].
cost-tracking.ts
// Track costs from a streaming response
// (`stream` is created with stream: true, as in the examples above)
let totalCost = 0;
for await (const chunk of stream) {
  // ... handle content ...
  if (chunk.usage) {
    totalCost = chunk.usage.total_cost_usd;
    console.log(`Request cost: $${totalCost.toFixed(6)}`);
    console.log(`Breakdown: base $${chunk.usage.base_cost_usd.toFixed(6)} + fee $${chunk.usage.platform_fee_usd.toFixed(6)}`);
  }
}
Error Handling in Streams
Handle connection issues gracefully when streaming:
Connection Drops
Network issues can interrupt streams. Implement reconnection logic for production use.
error-handling.ts
async function streamWithErrorHandling(messages: Message[]) {
  try {
    const stream = await client.chat.completions.create({
      model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      messages,
      stream: true,
    });
    for await (const chunk of stream) {
      // Process chunk...
    }
  } catch (error: any) {
    if (error.code === "ECONNRESET" || error.code === "ETIMEDOUT") {
      // Connection was dropped mid-stream
      console.error("Connection lost. Consider retrying.");
    } else if (error.status === 402) {
      // Insufficient balance
      showDepositModal();
    } else {
      throw error;
    }
  }
}
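One way to implement that reconnection logic is a retry loop with exponential backoff. This is a sketch, not part of the API: maxRetries is an illustrative parameter, and a retry restarts generation from scratch rather than resuming the interrupted stream:
typescript
// Sketch: retry a dropped stream with exponential backoff.
async function streamWithRetry(messages: Message[], maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      const stream = await client.chat.completions.create({
        model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
        messages,
        stream: true,
      });
      for await (const chunk of stream) {
        // Process chunk...
      }
      return;
    } catch (error: any) {
      const dropped = error.code === "ECONNRESET" || error.code === "ETIMEDOUT";
      if (!dropped || attempt >= maxRetries) throw error;
      // Back off 1s, 2s, 4s, ... before restarting the request
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
}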