EARLY ACCESS: Features, prices, and parameters may change in the future.
Powered by an optimized Blibs-based architecture, our OpenAI-compatible API delivers high-performance large language model inference — without requiring dedicated GPU resources on your end. Hosted exclusively in EU data centers, it ensures seamless integration with your applications while adhering to stringent GDPR compliance. Leveraging Trooper.AI’s scalable infrastructure, this solution eliminates hardware constraints, enabling cost-efficient deployment for startups and enterprises alike. Whether you’re building chatbots, fine-tuning models, or deploying generative AI workflows, our API provides low-latency responses with enterprise-grade reliability.
Designed for flexibility, the service integrates effortlessly into existing pipelines via straightforward endpoints, supporting batch processing and real-time queries. Backed by persistent storage and advanced security protocols—including firewalls and DDoS protection—the platform guarantees uninterrupted uptime for mission-critical AI tasks. Explore our tailored configurations below to match your project’s demands.
Our base pricing is as follows. Additional pricing per country may apply when ordering an API Blib. You can also benefit from our regular Extra Credits promotions!
| Route | Model | Context | Input/1M | Output/1M | Strengths |
|---|---|---|---|---|---|
| liv | Google Gemma 4 2.3B + 0.15B (vision) + 0.3B (audio) | 20,380 | €0.029 | €0.049 | Text, Vision, Audio, Reasoning, Tools, JSON |
| clara | Mistral Ministral 3 13.5B + 0.4B (vision) | 9,248 | €0.139 | €0.249 | EU Model, Vision, Tools, Text, JSON |
| nikola | NVIDIA Nemotron 3 Nano 30B MoE (128 experts, 6 active @ 3.5B) | 20,480 | €0.159 | €0.319 | Text, Reasoning, Tools, Coding, JSON |
* The current promotion gives you +33% Extra Credits on payments from €150 this month. Pay €150 and get €200 in your account!
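As a quick sanity check, a per-request cost under the table above can be computed from token counts. The sketch below uses the base prices from the table (EUR per million tokens); country surcharges and promotions are not included:

```python
# Per-million-token base prices (EUR) from the pricing table above.
PRICES = {
    # route: (input €/1M tokens, output €/1M tokens)
    "liv": (0.029, 0.049),
    "clara": (0.139, 0.249),
    "nikola": (0.159, 0.319),
}

def estimate_cost(route: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated base cost in EUR for one request on the given route."""
    input_price, output_price = PRICES[route]
    return (prompt_tokens / 1_000_000) * input_price + \
           (completion_tokens / 1_000_000) * output_price

# 2M input tokens + 1M output tokens on liv:
print(round(estimate_cost("liv", 2_000_000, 1_000_000), 6))  # 0.107
```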
Before using our API, you need to order your API Blib on our website: Order API Blib
You can choose your Model and your Region; after deploying your desired model, you should see something like this in your Management Dashboard:
We offer a free system prompt of up to 1,024 characters. The free system prompt is only available via the dashboard! Dynamic system messages sent via the API are charged as usual. Click Actions to open the system prompt edit dialog:
Let's go, order your API Blib now:
If you want to benchmark our LLM endpoints against other LLM platforms, you can use our free LLM Quality Benchmark tool here: Free LLM Quality Benchmark. This is what it looks like:
Test any OpenAI-compatible LLM endpoint with 25 automated quality checks — reasoning, coding, multilingual, structured output, tool calling and more.
https://eu.router.trooper.ai/v1
Regional endpoints for country-level data residency:
| Domain | Region |
|---|---|
| eu.router.trooper.ai | EU (all EU data centers) |
| de.router.trooper.ai | Germany only |
| nl.router.trooper.ai | Netherlands only |
All requests require a Bearer token. Get your API key from the management dashboard.
Authorization: Bearer YOUR_TROOPER_KEY
Activate routes at trooper.ai/order-apiblib. Each route gives you a model name to use in API calls.
| Route | Base Model | Strengths |
|---|---|---|
| liv | Google Gemma 4 | Cheapest. Text, images, audio, reasoning. High-volume workloads. |
| clara | Mistral Ministral 3 | Vision-first. Fast throughput, strong EU language support. |
| nikola | NVIDIA Nemotron 3 Nano | Reasoning powerhouse. Code generation, function calling, agentic workflows. |
/v1/chat/completions: Standard OpenAI-compatible chat completions endpoint.
/v1/models: Lists your activated models. Requires authentication. Returns only models matching the region of the domain you’re calling.
Each model object includes:
| Field | Type | Description |
|---|---|---|
| id | string | Your route name (used as model in requests). |
| object | string | Always "model". |
| owned_by | string | Owner identifier. |
| created | integer | Unix timestamp of creation. |
| base_models | string[] | Underlying model name(s). |
| context_length | integer | Maximum context window (tokens). |
| max_tokens | integer | Maximum output tokens. |
| capabilities | object | Feature flags for this model (see below). |
| supported_parameters | string[] | Parameters accepted by this model. |
Capabilities object:
| Flag | Type | Description |
|---|---|---|
| thinking | boolean | Supports reasoning_effort and chain-of-thought reasoning. |
| tools | boolean | Supports function calling / tools. |
| vision | boolean | Supports image and PDF inputs. |
| audio | boolean | Supports audio inputs. |
| json_mode | boolean | Supports response_format (JSON mode / structured outputs). |
| token_budget | boolean | Supports explicit thinking token budget control. |
Use capabilities.thinking to determine whether a model accepts reasoning parameters before sending them.
Example response:
{
"object": "list",
"data": [
{
"id": "clara",
"object": "model",
"owned_by": "trooper_42",
"created": 1700000000,
"base_models": ["Ministral-3"],
"context_length": 131072,
"max_tokens": 131072,
"capabilities": {
"tools": true,
"vision": true,
"audio": false,
"thinking": false,
"json_mode": true,
"token_budget": false
},
"supported_parameters": [
"temperature", "top_p", "max_tokens", "stream",
"response_format", "tools", "tool_choice"
]
}
]
}
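The capability check recommended above can be done before each request: read the model's entry from /v1/models and only attach reasoning parameters when capabilities.thinking is set. A minimal sketch, assuming a payload shaped like the example response above:

```python
def build_extra_params(model_obj: dict, effort: str = "medium") -> dict:
    """Return extra request parameters that are safe for this model.

    Only adds reasoning_effort when the model's /v1/models entry
    advertises the 'thinking' capability.
    """
    params = {}
    if model_obj.get("capabilities", {}).get("thinking"):
        params["reasoning_effort"] = effort
    return params

# Using the example entry for "clara" (thinking: false) from above:
clara = {"id": "clara", "capabilities": {"thinking": False, "tools": True}}
print(build_extra_params(clara))  # {}
```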
/health: Returns endpoint availability and region info.
All parameters follow the OpenAI Chat Completions API format.
| Parameter | Type | Description |
|---|---|---|
| model | string | Your route name (e.g. "clara", "nikola", "liv") |
| messages | array | Array of message objects (role + content) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | auto (32–4096) | Maximum output tokens. Auto-clamped to route’s context window. |
| max_completion_tokens | integer | — | Alias for max_tokens. |
| stream | boolean | false | Enable SSE streaming. |
| temperature | number | model default | Sampling temperature (0–2). |
| top_p | number | model default | Nucleus sampling. |
| response_format | object | — | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
| tools | array | — | Function calling tool definitions (OpenAI format). |
| tool_choice | string/object | — | Controls tool selection ("auto", "none", or specific tool). |
| reasoning | object | — | Reasoning options: {"effort": "none" \| "medium" \| "high", "exclude": boolean} |
| reasoning_effort | string | — | Shorthand: "none", "medium", "high". |
| reasoning.exclude | boolean | false | Strip reasoning content from the response. |
Standard SSE streaming, fully compatible with the OpenAI SDK.
{ "stream": true }
Response format: data: {...}\n\n lines, terminated by data: [DONE]\n\n.
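If you are not using an SDK, those data: lines can be parsed by hand. A minimal sketch of that parsing logic, shown on already-decoded text lines (real code would read the HTTP response body incrementally):

```python
import json

def parse_sse(lines):
    """Yield parsed chunk objects from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content", "") for c in parse_sse(sample))
print(text)  # Hello
```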
Request structured JSON output. Note that asking for JSON in the message text alone is not enough: you also need to set response_format to at least json_object!
If the model fails to produce valid JSON, you are not charged.
{ "response_format": { "type": "json_object" } }
With a schema:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "my_schema",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name", "age"]
}
}
}
}
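Even with json_schema set, it is worth verifying the reply before using it. A light check against the required fields of the example schema above (not a full JSON Schema validator):

```python
import json

# The example schema from above.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def check_required(reply_text: str, schema: dict) -> dict:
    """Parse the model reply and verify the schema's required keys are present."""
    data = json.loads(reply_text)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

print(check_required('{"name": "Ada", "age": 36}', SCHEMA))  # {'name': 'Ada', 'age': 36}
```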
Send images via URL or base64. PDFs are auto-converted to page images server-side.
{
"messages": [{
"role": "user",
"content": [
{ "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } },
{ "type": "text", "text": "Describe this image." }
]
}]
}
Base64:
{
"type": "image_url",
"image_url": { "url": "data:image/png;base64,iVBOR..." }
}
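For local files, the data: URL in the base64 variant can be built with the standard library. A small helper, with the MIME type guessed from the file name (the tiny byte string below is just a placeholder, not a real image):

```python
import base64
import mimetypes

def to_data_url(filename: str, raw: bytes) -> str:
    """Encode raw file bytes as a data: URL for an image_url content part."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"

part = {
    "type": "image_url",
    "image_url": {"url": to_data_url("photo.png", b"\x89PNG")},
}
print(part["image_url"]["url"][:22])  # data:image/png;base64,
```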
Send audio files in multimodal messages. Supported on liv.
Control whether and how deeply the model reasons (chain-of-thought) before answering.
| Value | Effect |
|---|---|
| "none" | Thinking off — fastest responses, lowest token usage. |
| "low" | Thinking off — same as none. |
| "medium" | Thinking on — the model reasons step-by-step before answering. Good balance of quality and speed. |
| "high" | Thinking on + deep — the model is instructed to think very carefully and in great detail. Best for complex math, logic, and code. If max_tokens is set, it must be at least 4,096 or the request will be rejected. |
Enable standard thinking:
{ "reasoning_effort": "medium" }
Enable deep thinking for maximum quality:
{ "reasoning_effort": "high" }
Or via the reasoning object:
{ "reasoning": { "effort": "high" } }
Disable thinking explicitly:
{ "reasoning_effort": "none" }
To strip reasoning from the response (thinking still happens, but the reasoning tokens are not returned):
{ "reasoning": { "effort": "high", "exclude": true } }
Thinking behaviour can also be configured per route in the management dashboard. The dashboard setting controls the default behaviour and how reasoning is returned:

- Default reasoning effort: "reasoning_effort": "medium" or "high".
- Whether reasoning is returned in a separate reasoning_content field or wrapped in <think> tags inside the content.

Thinking rescue: if the model enters a reasoning loop, the router auto-recovers and returns a usable response.
Standard OpenAI tools format. Works with all routes.
{
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}
If your input exceeds the context window, the router automatically compresses the middle of the conversation to fit — no manual truncation needed. You always get a response.
A free system prompt can be configured per route in the management dashboard. It’s prepended to every request automatically and not billed.
Standard OpenAI response envelope:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "clara",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}
With reasoning enabled, the response may include reasoning_content alongside content.
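When reasoning is enabled, the assistant message may carry both fields, so it is convenient to read them defensively. A small accessor, treating the message as a plain dict:

```python
def split_reasoning(message: dict) -> tuple:
    """Return (reasoning, answer) from an assistant message dict.

    reasoning is None when the model did not think or when
    reasoning was stripped via reasoning.exclude.
    """
    return message.get("reasoning_content"), message.get("content", "")

msg = {"role": "assistant", "reasoning_content": "Step 1...", "content": "42"}
print(split_reasoning(msg))  # ('Step 1...', '42')
```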
Every response includes an x-transaction-id header for billing reference and debugging.
Errors follow the OpenAI error envelope format:
{
"error": {
"message": "The model 'nonexistent' does not exist.",
"type": "invalid_request_error",
"param": "model",
"code": "model_not_found"
}
}
| HTTP Status | Code | Description |
|---|---|---|
| 400 | invalid_value | Missing model, API key, input too short, or invalid max_tokens. |
| 403 | invalid_api_key | Invalid API key or insufficient budget. |
| 404 | model_not_found | Model doesn’t exist or not activated. |
| 404 | region_mismatch | Model not available in the requested region. |
| 500 | — | Internal router error. |
| 503 | — | No endpoints available in the requested region. |
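Since errors follow the OpenAI envelope, a client can key its retry/abort decision off the status and the code field. A sketch under the error table above (the three category names are arbitrary labels, not part of the API):

```python
RETRYABLE_STATUSES = {500, 503}  # transient router / region-availability errors

def classify_error(status: int, body: dict) -> str:
    """Map an error response to 'retry', 'fix_account', or 'fix_request'."""
    if status in RETRYABLE_STATUSES:
        return "retry"
    code = body.get("error", {}).get("code")
    if code == "invalid_api_key":
        return "fix_account"  # key or budget problem
    return "fix_request"  # invalid_value, model_not_found, region_mismatch, ...

body = {"error": {"message": "The model 'nonexistent' does not exist.",
                  "type": "invalid_request_error",
                  "code": "model_not_found"}}
print(classify_error(404, body))  # fix_request
```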
See popular code examples of how to use an OpenAI-compatible API for LLM inference. Replace router.trooper.ai with the endpoint URL shown in your API Blib order!
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{"role": "user", "content": "Summarize this document."}],
max_tokens=1024
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://router.trooper.ai/v1",
apiKey: "YOUR_TROOPER_KEY",
});
const response = await client.chat.completions.create({
model: "nikola",
messages: [{ role: "user", content: "Write a unit test for this function." }],
max_tokens: 2048,
});
console.log(response.choices[0].message.content);
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
stream = client.chat.completions.create(
model="liv",
messages=[{"role": "user", "content": "Explain quantum computing."}],
max_tokens=2048,
stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
import json
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{"role": "user", "content": "List the 3 largest EU countries as JSON with name and population."}],
max_tokens=512,
response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
print(data)
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
{"type": "text", "text": "Extract all line items from this invoice as JSON."}
]
}],
max_tokens=2048,
response_format={"type": "json_object"}
)
print(response.choices[0].message.content)
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="nikola",
messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
tools=tools,
tool_choice="auto",
max_tokens=512
)
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY",
model="clara",
max_tokens=1024
)
response = llm.invoke("Extract all dates from the following text: ...")
print(response.content)
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(
api_base="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY",
model="nikola",
max_tokens=2048
)
response = llm.complete("Explain the EU AI Act in simple terms.")
print(response.text)
curl https://router.trooper.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_TROOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "clara",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 512
}'
One-line change — update the base URL and API key:
# Before (OpenAI)
client = OpenAI(api_key="sk-...")
# After (Trooper.AI)
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
Everything else stays the same: request format, response schema, streaming, tools, JSON mode.