
Gonka API Reference

Complete API reference for GonkaGate. OpenAI-compatible endpoints with full parameter documentation.


Base URL

All API requests should be made to:

https://api.gonkagate.com/v1

Authentication

Authenticate your API requests using a Bearer token in the Authorization header.

HTTP Header
Authorization: Bearer YOUR_API_KEY

Endpoints

Create Chat Completion

POST/chat/completions

Creates a completion for a chat conversation.

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID (e.g., `Qwen/Qwen3-32B-FP8`) |
| messages | array | Yes | — | Array of message objects with `role` and `content` |
| stream | boolean | No | false | Enable streaming responses |
| temperature | number | No | 1.0 | Sampling temperature (0-2) |
| max_tokens | integer | No | 4096 | Maximum tokens in response |
| top_p | number | No | 1.0 | Nucleus sampling threshold |
| stop | string \| array | No | null | Stop sequences |

Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}

Code Examples

chat_completions.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-api-key"
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")

List Models

GET/models

Lists the available models.

Response

json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}

Code Examples

list_models.py
import requests

response = requests.get(
    "https://api.gonkagate.com/v1/models",
    headers={"Authorization": "Bearer your-gonkagate-api-key"}
)

models = response.json()["data"]

for model in models:
    price_per_m = float(model["pricing"]["prompt"]) * 1_000_000
    print(f"{model['name']}: ${price_per_m:.2f}/1M tokens")


Parameters

Complete reference for all request parameters.

OpenAI Compatibility

GonkaGate supports all standard OpenAI Chat Completions parameters; see the OpenAI API Reference for details.
model (string, required)

Model ID to use for completion. The model ID specifies which Gonka Network model to use. Example: `Qwen/Qwen3-32B-FP8`.

messages (array, required)

Array of message objects with `role` and `content`. A list of messages comprising the conversation. Each message has a `role` (system, user, assistant, or tool) and `content`. See Messages Array Schema below for the full structure.

stream (boolean, optional, default: false)

Enable streaming responses. If true, partial message deltas are sent as Server-Sent Events (SSE). Tokens are sent as they become available.

max_tokens (integer, optional, default: 4096)

Maximum tokens in response. The maximum number of tokens to generate; the model stops when this limit is reached. Constraints: 1 to context window size.

Messages Array Schema

Structure of the messages array parameter. Each message is an object with `role` and `content`.

role (string, required)

The role of the message author. Valid values: system, user, assistant, developer, tool.

content (string | null, required)

The message content. Can be null for assistant messages with `tool_calls`.

name (string, optional)

Optional name for the participant. Useful for multi-agent conversations.

tool_calls (array, optional)

Tool calls made by the assistant. Only present in assistant messages.

tool_call_id (string, optional)

ID of the tool call this message is responding to. Required for messages with the tool role.

Examples
json
[
  {
    "role": "user",
    "content": "Hello!"
  }
]
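In a multi-turn conversation, the full history is resent on every request: each assistant reply is appended to the array before the next user turn. A minimal sketch (the reply string is illustrative):

```python
# Sketch: maintaining conversation history across turns.
# The model only sees what is in this array, so each assistant
# reply must be appended before the next user message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# After receiving a reply (e.g., response.choices[0].message.content):
assistant_reply = "Hello! How can I help you today?"
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What models do you offer?"})

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```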

Response Schemas

Structure of API responses.

Chat Completion Response

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always `"chat.completion"` |
| created | integer | Unix timestamp when created |
| model | string | Model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of this choice |
| choices[].message | object | The generated message |
| choices[].message.role | string | Always `"assistant"` |
| choices[].message.content | string \| null | The generated text |
| choices[].message.tool_calls | array | Tool calls (if any) |
| choices[].finish_reason | string | `"stop"`, `"length"`, `"tool_calls"`, or `"content_filter"` |
| usage | object | Token usage and cost statistics |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735500000,
  "model": "Qwen/Qwen3-32B-FP8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35,
    "base_cost_usd": 0.000175,
    "platform_fee_usd": 0.0000175,
    "total_cost_usd": 0.0001925
  }
}
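The three cost fields are related by a fixed rule: the platform fee is 10% of the base cost, and the total is their sum. Checking the arithmetic against the example response above:

```python
# Cost breakdown for the example response above.
# platform_fee_usd is 10% of base_cost_usd; total_cost_usd is their sum.
base_cost = 0.000175
platform_fee = base_cost * 0.10
total_cost = base_cost + platform_fee

print(f"fee:   ${platform_fee:.7f}")  # $0.0000175
print(f"total: ${total_cost:.7f}")    # $0.0001925
```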

Streaming Response

When stream: true, responses are sent as Server-Sent Events (SSE).

Chunk Format

| Field | Type | Description |
|---|---|---|
| id | string | Same ID for all chunks in a stream |
| object | string | Always `"chat.completion.chunk"` |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array with incremental content |
| choices[].index | integer | Choice index |
| choices[].delta | object | Incremental content |
| choices[].delta.role | string | Role (first chunk only) |
| choices[].delta.content | string | Text content to append |
| choices[].finish_reason | string \| null | `null` until final chunk, then `"stop"` |
| usage | object \| undefined | Token usage and cost (only in final chunk before `[DONE]`) |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the response |
| usage.total_tokens | integer | Total tokens used |
| usage.base_cost_usd | number | Base cost before platform fee (USD) |
| usage.platform_fee_usd | number | Platform fee, 10% of base cost (USD) |
| usage.total_cost_usd | number | Total cost deducted from balance (USD) |
json
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1735500000,"model":"Qwen/Qwen3-32B-FP8","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15,"base_cost_usd":0.000075,"platform_fee_usd":0.0000075,"total_cost_usd":0.0000825}}

data: [DONE]
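A stream like the one above is assembled client-side by parsing each `data:` line as JSON and appending `delta.content` until the `[DONE]` sentinel. A minimal sketch (SSE lines abbreviated from the example above):

```python
import json

# Raw SSE lines as they arrive from a stream like the one above.
sse_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: [DONE]',
]

text = ""
for line in sse_lines:
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content", "")  # first chunk carries only the role

print(text)  # Hello!
```

With the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` handles this parsing for you and yields the chunks as objects.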

Models List Response

| Field | Type | Description |
|---|---|---|
| data | array | Array of model objects |
| data[].id | string | Model identifier for API requests |
| data[].name | string | Human-readable model name |
| data[].description | string \| null | Model description (from HuggingFace) |
| data[].context_length | number \| null | Maximum context window in tokens |
| data[].pricing | object | Cost per token in USD |
| data[].pricing.prompt | string | Cost per input token in USD |
| data[].pricing.completion | string | Cost per output token in USD |
json
{
  "data": [
    {
      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
      "name": "Qwen3 235B A22B Instruct 2507 FP8",
      "description": "A powerful 235B parameter model for complex reasoning tasks.",
      "context_length": 131072,
      "pricing": {
        "prompt": "0.000000350",
        "completion": "0.000000350"
      }
    },
    {
      "id": "Qwen/QwQ-32B",
      "name": "QwQ 32B",
      "description": "Compact reasoning model with strong logic capabilities.",
      "context_length": 32768,
      "pricing": {
        "prompt": "0.000000580",
        "completion": "0.000000580"
      }
    }
  ]
}
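Because pricing is returned as a per-token string, estimating a request's base cost is simple arithmetic. A sketch using the pricing from the first model above (token counts are illustrative):

```python
# Estimate base cost from the per-token pricing strings above.
prompt_price = float("0.000000350")      # USD per input token
completion_price = float("0.000000350")  # USD per output token

prompt_tokens = 1_000
completion_tokens = 500

base_cost = prompt_tokens * prompt_price + completion_tokens * completion_price
print(f"${base_cost:.6f}")  # $0.000525
```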

Error Codes

GonkaGate uses standard HTTP status codes and OpenAI-compatible error responses.

HTTP Status Codes

| Status | Meaning | Retry? |
|---|---|---|
| 400 | Bad request (invalid parameters) | No |
| 401 | Authentication error (invalid API key) | No |
| 429 | Rate limit exceeded | Yes, with backoff |
| 5xx | Server error | Yes, with backoff |

Error Handling

Implement retry logic with exponential backoff for rate limits and server errors. Client errors (4xx except 429) should not be retried.

error_handling.py
import openai
import time

client = openai.OpenAI(
    api_key="your-gonkagate-api-key",
    base_url="https://api.gonkagate.com/v1"
)

def chat_with_retry(messages, max_retries=3):
    """Make API request with exponential backoff retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="Qwen/Qwen3-32B-FP8",
                messages=messages
            )
            return response.choices[0].message.content

        except openai.RateLimitError as e:
            # 429: Wait and retry with exponential backoff
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)

        except openai.AuthenticationError as e:
            # 401: Invalid API key - don't retry
            print(f"Auth error: {e.message}")
            raise

        except openai.BadRequestError as e:
            # 400: Invalid request - don't retry
            print(f"Bad request: {e.message}")
            raise

        except openai.APIStatusError as e:
            # 5xx: Server error - retry with backoff
            if e.status_code >= 500:
                wait_time = 2 ** attempt
                print(f"Server error {e.status_code}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

    raise Exception("Max retries exceeded")

# Usage
result = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(result)

Rate Limits

API rate limits are applied per IP address to ensure fair usage.

| Limit | Value | Scope |
|---|---|---|
| Requests per minute | 100 | Per IP |
| Tokens per minute | 100,000 | Per IP |
| Requests per day | 10,000 | Per IP |

When rate limited, check the Retry-After header for wait time. Implement exponential backoff for retries.
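A sketch of honoring the `Retry-After` header with the `requests` library (URL, headers, and payload are placeholders; the fallback wait of 1 second is an assumption):

```python
import time
import requests

def post_with_retry_after(url, headers, payload, max_retries=3):
    """POST, waiting out the Retry-After header on HTTP 429."""
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # The server states how many seconds to wait before retrying.
        wait = int(resp.headers.get("Retry-After", 1))
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
```

Combine this with the exponential-backoff pattern shown earlier when the header is absent.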

Low Balance Warning

When your balance falls below $5, you will see a warning in the Dashboard. Consider adding funds to avoid service interruption.
