
Gonka API Reference

Complete API reference for GonkaGate. OpenAI-compatible endpoints with full parameter documentation.


Base URL

All API requests should be made to:

https://api.gonkagate.com/v1

Authentication

Authenticate your API requests using a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer YOUR_API_KEY

Endpoints

Create Chat Completion

POST/chat/completions

Creates a completion for a chat conversation.

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID (e.g., `Qwen/Qwen3-32B-FP8`) |
| messages | array | Yes | — | Array of message objects with `role` and `content` |
| stream | boolean | No | false | Enable streaming responses |
| temperature | number | No | 1.0 | Sampling temperature (0-2) |
| max_tokens | integer | No | 4096 | Maximum tokens in response |
| top_p | number | No | 1.0 | Nucleus sampling threshold |
| stop | string \| array | No | null | Stop sequences |

Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}

Code Examples

chat_completions.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-api-key"
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")

List Models

GET/models

Lists the available models.

Response

json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}

Code Examples

list_models.py
import requests

response = requests.get(
    "https://api.gonkagate.com/v1/models",
    headers={"Authorization": "Bearer your-gonkagate-api-key"}
)

models = response.json()["data"]

for model in models:
    price_per_m = float(model["pricing"]["prompt"]) * 1_000_000
    print(f"{model['name']}: ${price_per_m:.2f}/1M tokens")


Parameters

Complete reference for all request parameters.

OpenAI Compatibility

GonkaGate supports all standard OpenAI Chat Completions parameters; see the OpenAI API Reference for details.
model (string, required)

Model ID to use for completion. The model ID specifies which Gonka Network model to use. Example: `Qwen/Qwen3-32B-FP8`.

messages (array, required)

Array of message objects with `role` and `content`. A list of messages comprising the conversation. Each message has a `role` (system, user, assistant, or tool) and `content`. See Messages Array Schema below for the full structure.

stream (boolean, optional, default: false)

Enable streaming responses. If true, partial message deltas are sent as Server-Sent Events (SSE). Tokens are sent as they become available.

max_tokens (integer, optional, default: 4096)

Maximum tokens in response. The maximum number of tokens to generate; the model stops when this limit is reached. Constraints: 1 to context window size.

Messages Array Schema

Structure of the messages array parameter. Each message is an object with `role` and `content`.

role (string, required)

The role of the message author. Valid values: system, user, assistant, developer, tool.

content (string | null, required)

The message content. Can be null for assistant messages with `tool_calls`.

name (string, optional)

Optional name for the participant. Useful for multi-agent conversations.

tool_calls (array, optional)

Tool calls made by the assistant. Only present in assistant messages.

tool_call_id (string, optional)

ID of the tool call this message is responding to. Required for messages with the tool role.

Examples
json
[
  {
    "role": "user",
    "content": "Hello!"
  }
]
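In a multi-turn conversation, the full history is resent on every request: each assistant reply is appended to the array before the next user turn. A minimal sketch (the reply string is illustrative):

```python
# Sketch: maintaining conversation history across turns.
# The model only sees what is in this array, so each assistant
# reply must be appended before the next user message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# After receiving a reply (e.g., response.choices[0].message.content):
assistant_reply = "Hello! How can I help you today?"
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What models do you offer?"})

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```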

Response Schemas

Structure of API responses.

Chat Completion Response

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always `"chat.completion"` |
| created | integer | Unix timestamp when created |
| model | string | Model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of this choice |
| choices[].message | object | The generated message |
| choices[].message.role | string | Always `"assistant"` |
| choices[].message.content | string \| null | The generated text |
| choices[].message.tool_calls | array | Tool calls (if any) |
| choices[].finish_reason | string | `"stop"`, `"length"`, `"tool_calls"`, or `"content_filter"` |
| usage | object | Token usage and cost statistics |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}
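The three cost fields are related by a fixed rule: the platform fee is 10% of the base cost, and the total is their sum. Checking the arithmetic against the example response above:

```python
# Cost breakdown for the example response above.
# platform_fee_usd is 10% of base_cost_usd; total_cost_usd is their sum.
base_cost = 0.000175
platform_fee = base_cost * 0.10
total_cost = base_cost + platform_fee

print(f"fee:   ${platform_fee:.7f}")  # $0.0000175
print(f"total: ${total_cost:.7f}")    # $0.0001925
```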

Streaming Response

When stream: true, responses are sent as Server-Sent Events (SSE).

Chunk Format

| Field | Type | Description |
|---|---|---|
| id | string | Same ID for all chunks in a stream |
| object | string | Always `"chat.completion.chunk"` |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array with incremental content |
| choices[].index | integer | Choice index |
| choices[].delta | object | Incremental content |
| choices[].delta.role | string | Role (first chunk only) |
| choices[].delta.content | string | Text content to append |
| choices[].finish_reason | string \| null | `null` until final chunk, then `"stop"` |
| usage | object \| undefined | Token usage and cost (only in final chunk before `[DONE]`) |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
json
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15,"base_cost_usd":0.000075,"platform_fee_usd":0.0000075,"total_cost_usd":0.0000825}}

data: [DONE]
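A stream like the one above is assembled client-side by parsing each `data:` line as JSON and appending `delta.content` until the `[DONE]` sentinel. A minimal sketch (SSE lines abbreviated from the example above):

```python
import json

# Raw SSE lines as they arrive from a stream like the one above.
sse_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: [DONE]',
]

text = ""
for line in sse_lines:
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content", "")  # first chunk carries only the role

print(text)  # Hello!
```

With the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` handles this parsing for you and yields the chunks as objects.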

Models List Response

| Field | Type | Description |
|---|---|---|
| data | array | Array of model objects |
| data[].id | string | Model identifier for API requests |
| data[].name | string | Human-readable model name |
| data[].description | string \| null | Model description (from HuggingFace) |
| data[].context_length | number \| null | Maximum context window in tokens |
| data[].pricing | object | Cost per token in USD |
| data[].pricing.prompt | string | Cost per input token in USD |
| data[].pricing.completion | string | Cost per output token in USD |
json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}
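Because pricing is returned as a per-token string, estimating a request's base cost is simple arithmetic. A sketch using the pricing from the first model above (token counts are illustrative):

```python
# Estimate base cost from the per-token pricing strings above.
prompt_price = float("0.000000350")      # USD per input token
completion_price = float("0.000000350")  # USD per output token

prompt_tokens = 1_000
completion_tokens = 500

base_cost = prompt_tokens * prompt_price + completion_tokens * completion_price
print(f"${base_cost:.6f}")  # $0.000525
```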

Error Codes

GonkaGate uses standard HTTP status codes and OpenAI-compatible error responses.

HTTP Status Codes

| Status | Meaning | Retry? |
|---|---|---|
| 400 | Bad request (invalid parameters) | No |
| 401 | Authentication error (invalid API key) | No |
| 429 | Rate limit exceeded | Yes, with backoff |
| 5xx | Server error | Yes, with backoff |

Error Handling

Implement retry logic with exponential backoff for rate limits and server errors. Client errors (4xx except 429) should not be retried.

error_handling.py
import openai
import time

client = openai.OpenAI(
    api_key="your-gonkagate-api-key",
    base_url="https://api.gonkagate.com/v1"
)

def chat_with_retry(messages, max_retries=3):
    """Make API request with exponential backoff retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="Qwen/Qwen3-32B-FP8",
                messages=messages
            )
            return response.choices[0].message.content

        except openai.RateLimitError as e:
            # 429: Wait and retry with exponential backoff
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except openai.AuthenticationError as e:
            # 401: Invalid API key - don't retry
            print(f"Auth error: {e.message}")
            raise

        except openai.BadRequestError as e:
            # 400: Invalid request - don't retry
            print(f"Bad request: {e.message}")
            raise

        except openai.APIStatusError as e:
            # 5xx: Server error - retry with backoff
            if e.status_code >= 500:
                wait_time = 2 ** attempt
                print(f"Server error {e.status_code}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

    raise Exception("Max retries exceeded")

# Usage
result = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(result)

Rate Limits

API rate limits are applied per IP address to ensure fair usage.

| Limit | Value | Scope |
|---|---|---|
| Requests per minute | 100 | Per IP |
| Tokens per minute | 100,000 | Per IP |
| Requests per day | 10,000 | Per IP |

When rate limited, check the Retry-After header for wait time. Implement exponential backoff for retries.
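A sketch of honoring the `Retry-After` header with the `requests` library (URL, headers, and payload are placeholders; the fallback wait of 1 second is an assumption):

```python
import time
import requests

def post_with_retry_after(url, headers, payload, max_retries=3):
    """POST, waiting out the Retry-After header on HTTP 429."""
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # The server states how many seconds to wait before retrying.
        wait = int(resp.headers.get("Retry-After", 1))
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```

Combine this with the exponential-backoff pattern shown earlier when the header is absent.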

Low Balance Warning

When your balance falls below $5, you will see a warning in the Dashboard. Consider adding funds to avoid service interruption.
