Gonka API Reference
Complete API reference for GonkaGate. OpenAI-compatible endpoints with full parameter documentation.
Base URL
All API requests should be made to:
https://api.gonkagate.com/v1

Authentication
Authenticate your API requests using a Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY

Endpoints
Create Chat Completion
POST /chat/completions

Creates a completion for a chat conversation.
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID (e.g., Qwen/Qwen3-32B-FP8) |
| messages | array | Yes | — | Array of message objects with role and content |
| stream | boolean | No | false | Enable streaming responses |
| temperature | number | No | 1.0 | Sampling temperature (0-2) |
| max_tokens | integer | No | 4096 | Maximum tokens in response |
| top_p | number | No | 1.0 | Nucleus sampling threshold |
| stop | string \| array | No | null | Stop sequences |
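The optional sampling parameters can be combined in a single request body. A sketch of such a payload (the prompt and stop sequence here are illustrative, not prescribed by the API):

```python
import json

# Request body combining optional sampling parameters.
# The prompt and stop sequence are illustrative.
payload = {
    "model": "Qwen/Qwen3-32B-FP8",
    "messages": [
        {"role": "user", "content": "List three colors, one per line."}
    ],
    "temperature": 0.7,
    "top_p": 0.9,       # nucleus sampling threshold
    "stop": ["\n\n"],   # generation halts before emitting this sequence
    "max_tokens": 64,
}
print(json.dumps(payload, indent=2))
```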
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}
```

Code Examples
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-api-key"
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")
```

List Models
GET /models

Lists the available models.
Response
```json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}
```

Code Examples
```python
import requests

response = requests.get(
    "https://api.gonkagate.com/v1/models",
    headers={"Authorization": "Bearer your-gonkagate-api-key"}
)

models = response.json()["data"]
for model in models:
    price_per_m = float(model["pricing"]["prompt"]) * 1_000_000
    print(f"{model['name']}: ${price_per_m:.2f}/1M tokens")
```

Parameters
Complete reference for all request parameters.
OpenAI Compatibility
model (string, required)
Model ID to use for completion. The model ID specifies which Gonka Network model to use. Example: `Qwen/Qwen3-32B-FP8`.

messages (array, required)
A list of message objects comprising the conversation. Each message has a `role` (system, user, assistant, or tool) and `content`. See Messages Array Schema below for the full structure.

stream (boolean, optional, default: false)
Enable streaming responses. If true, partial message deltas are sent as Server-Sent Events (SSE); tokens are sent as they become available.

max_tokens (integer, optional, default: 4096)
The maximum number of tokens to generate; the model stops when this limit is reached. Constraints: 1 to the context window size.
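When the limit is reached, the response reports a `finish_reason` of "length" (see Response Schemas). A quick truncation check, using an illustrative response fragment:

```python
# Illustrative response fragment: finish_reason is "length" when the
# completion was cut off at max_tokens rather than stopping naturally.
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Once upon a"},
            "finish_reason": "length",
        }
    ]
}

choice = response["choices"][0]
if choice["finish_reason"] == "length":
    print("Truncated at max_tokens; consider raising the limit.")
```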
Messages Array Schema
Structure of the messages array parameter. Each message is an object with role and content.
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | The role of the message author. Valid values: system, user, assistant, developer, tool |
| content | string \| null | Yes | The message content. Can be null for assistant messages with tool_calls |
| name | string | No | Optional name for the participant. Useful for multi-agent conversations |
| tool_calls | array | No | Tool calls made by the assistant. Only present in assistant messages |
| tool_call_id | string | No | ID of the tool call this message is responding to. Required for the tool role |
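A tool-call round trip ties these fields together: the assistant message carries `tool_calls` with null `content`, and the tool message answers via `tool_call_id`. A sketch in the OpenAI-compatible shape (the tool name and arguments are hypothetical):

```python
import json

# Hypothetical tool-call exchange; get_weather is not a real GonkaGate tool.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,  # null content when tool_calls are present
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # links this reply to the call above
        "content": json.dumps({"temp_c": 18}),
    },
]
```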
Examples
```json
[
  {
    "role": "user",
    "content": "Hello!"
  }
]
```

Response Schemas
Structure of API responses.
Chat Completion Response
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp when created |
| model | string | Model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of this choice |
| choices[].message | object | The generated message |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string \| null | The generated text |
| choices[].message.tool_calls | array | Tool calls (if any) |
| choices[].finish_reason | string | "stop", "length", "tool_calls", or "content_filter" |
| usage | object | Token usage and cost statistics |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
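The cost fields relate as total = base + fee, with the fee at 10% of the base cost. A quick check against the figures in the example response:

```python
# total_cost_usd = base_cost_usd + platform_fee_usd, fee = 10% of base.
base_cost = 0.000175            # base_cost_usd from the example response
platform_fee = base_cost * 0.10
total_cost = base_cost + platform_fee

print(f"fee={platform_fee:.7f}, total={total_cost:.7f}")
# fee=0.0000175, total=0.0001925
```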
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}
```

Streaming Response
When `stream` is `true`, responses are sent as Server-Sent Events (SSE).
Chunk Format
| Field | Type | Description |
|---|---|---|
| id | string | Same ID for all chunks in a stream |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array with incremental content |
| choices[].index | integer | Choice index |
| choices[].delta | object | Incremental content |
| choices[].delta.role | string | Role (first chunk only) |
| choices[].delta.content | string | Text content to append |
| choices[].finish_reason | string \| null | null until the final chunk, then "stop" |
| usage | object \| undefined | Token usage and cost (only in the final chunk before [DONE]) |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
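The delta chunks can be reassembled with a few lines of parsing. A sketch fed with sample strings rather than a live stream (the chunk JSON here is abbreviated to the fields the parser reads):

```python
import json

# Minimal SSE "data:" line parser; accumulates delta.content until [DONE].
sample_lines = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

text = ""
for line in sample_lines:
    body = line[len("data: "):]
    if body == "[DONE]":
        break
    chunk = json.loads(body)
    # role arrives in the first chunk; content may be absent in a delta
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # Hello!
```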
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15,"base_cost_usd":0.000075,"platform_fee_usd":0.0000075,"total_cost_usd":0.0000825}}

data: [DONE]
```

Models List Response
| Field | Type | Description |
|---|---|---|
| data | array | Array of model objects |
| data[].id | string | Model identifier for API requests |
| data[].name | string | Human-readable model name |
| data[].description | string \| null | Model description (from HuggingFace) |
| data[].context_length | number \| null | Maximum context window in tokens |
| data[].pricing | object | Cost per token in USD |
| data[].pricing.prompt | string | Cost per input token in USD |
| data[].pricing.completion | string | Cost per output token in USD |
```json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}
```

Error Codes
GonkaGate uses standard HTTP status codes and OpenAI-compatible error responses.
HTTP Status Codes
All possible API error responses with their meanings.
Error Handling
Implement retry logic with exponential backoff for rate limits and server errors. Client errors (4xx except 429) should not be retried.
```python
import time

import openai

client = openai.OpenAI(
    api_key="your-gonkagate-api-key",
    base_url="https://api.gonkagate.com/v1"
)

def chat_with_retry(messages, max_retries=3):
    """Make an API request with exponential backoff retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="Qwen/Qwen3-32B-FP8",
                messages=messages
            )
            return response.choices[0].message.content
        except openai.RateLimitError:
            # 429: wait and retry with exponential backoff
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except openai.AuthenticationError as e:
            # 401: invalid API key - don't retry
            print(f"Auth error: {e.message}")
            raise
        except openai.BadRequestError as e:
            # 400: invalid request - don't retry
            print(f"Bad request: {e.message}")
            raise
        except openai.APIStatusError as e:
            # 5xx: server error - retry with backoff
            if e.status_code >= 500:
                wait_time = 2 ** attempt
                print(f"Server error {e.status_code}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")

# Usage
result = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(result)
```

Rate Limits
API rate limits are applied per IP address to ensure fair usage.
| Limit | Value | Scope |
|---|---|---|
| Requests per minute | 100 | Per IP |
| Tokens per minute | 100,000 | Per IP |
| Requests per day | 10,000 | Per IP |
When rate limited, check the Retry-After header for wait time. Implement exponential backoff for retries.
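A sketch of honoring the Retry-After header with a fallback to exponential backoff (the helper name and fallback policy are choices made here, not part of the API):

```python
import time

import requests

def post_with_backoff(url, headers, payload, max_retries=3):
    """POST with retries on 429, honoring the Retry-After header."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            return resp
        # Prefer the server's hint; fall back to exponential backoff.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```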
Low Balance Warning
When your balance falls below $5, you will see a warning in the Dashboard. Consider adding funds to avoid service interruption.