EARLY ACCESS: Features, prices, and parameters may change in the future.
Powered by an optimized Blibs-based architecture, our OpenAI-compatible API delivers high-performance large language model inference — without requiring dedicated GPU resources on your end. Hosted exclusively in EU data centers, it ensures seamless integration with your applications while adhering to stringent GDPR compliance. Leveraging Trooper.AI’s scalable infrastructure, this solution eliminates hardware constraints, enabling cost-efficient deployment for startups and enterprises alike. Whether you’re building chatbots, fine-tuning models, or deploying generative AI workflows, our API provides low-latency responses with enterprise-grade reliability.
Designed for flexibility, the service integrates effortlessly into existing pipelines via straightforward endpoints, supporting batch processing and real-time queries. Backed by persistent storage and advanced security protocols—including firewalls and DDoS protection—the platform guarantees uninterrupted uptime for mission-critical AI tasks. Explore our tailored configurations below to match your project’s demands.
Our base pricing is as follows. Additional pricing per country may apply when ordering an API Blib. You can also benefit from our regular Extra Credits promotions!
| Route | Model | Context | Input/1M | Output/1M | Strengths |
|---|---|---|---|---|---|
| liv | Google Gemma 4 2.3B + 0.15B (vision) + 0.3B (audio) | 20,380 | €0.029 | €0.049 | Text, Vision, Audio, Reasoning, Tools, JSON |
| clara | Mistral Ministral 3 13.5B + 0.4B (vision) | 9,248 | €0.139 | €0.249 | EU Model, Vision, Tools, Text, JSON |
| nikola | NVIDIA Nemotron 3 Nano 30B MoE (128 experts, 6 active @ 3.5B) | 20,480 | €0.159 | €0.319 | Text, Reasoning, Tools, Coding, JSON |
* The current promotion gives you +33% Extra Credits on payments from €150 this month. Pay €150 and get €200 in your account!
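As a quick sanity check, a per-request cost under the table above can be computed from token counts. The sketch below uses the base prices from the table (EUR per million tokens); country surcharges and promotions are not included:

```python
# Per-million-token base prices (EUR) from the pricing table above.
PRICES = {
    # route: (input €/1M tokens, output €/1M tokens)
    "liv": (0.029, 0.049),
    "clara": (0.139, 0.249),
    "nikola": (0.159, 0.319),
}

def estimate_cost(route: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated base cost in EUR for one request on the given route."""
    input_price, output_price = PRICES[route]
    return (prompt_tokens / 1_000_000) * input_price + \
           (completion_tokens / 1_000_000) * output_price

# 2M input tokens + 1M output tokens on liv:
print(round(estimate_cost("liv", 2_000_000, 1_000_000), 6))  # 0.107
```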
Before using our API, you need to order your API Blib on our website: Order API Blib
You can choose your Model and your Region; after deploying your desired model, you should see something like this in your Management Dashboard:
We offer a free system prompt of up to 1,024 characters. The free system prompt is only available via the dashboard! Dynamic system messages sent via the API are charged as usual. Click Actions to open the system prompt edit dialog:
Let's go, order your API Blib now:
If you want to benchmark our LLM endpoints against other LLM platforms, you can use our free LLM Quality Benchmark tool here: Free LLM Quality Benchmark. This is what it looks like:
Test any OpenAI-compatible LLM endpoint with 25 automated quality checks — reasoning, coding, multilingual, structured output, tool calling and more.
https://eu.router.trooper.ai/v1
Regional endpoints for country-level data residency:
| Domain | Region |
|---|---|
| eu.router.trooper.ai | EU (all EU data centers) |
| de.router.trooper.ai | Germany only |
| nl.router.trooper.ai | Netherlands only |
All requests require a Bearer token. Get your API key from the management dashboard.
Authorization: Bearer YOUR_TROOPER_KEY
Activate routes at trooper.ai/order-apiblib. Each route gives you a model name to use in API calls.
| Route | Base Model | Strengths |
|---|---|---|
| liv | Google Gemma 4 | Cheapest. Text, images, audio, reasoning. High-volume workloads. |
| clara | Mistral Ministral 3 | Vision-first. Fast throughput, strong EU language support. |
| nikola | NVIDIA Nemotron 3 Nano | Reasoning powerhouse. Code generation, function calling, agentic workflows. |
/v1/chat/completions: Standard OpenAI-compatible chat completions endpoint.
/v1/models: Lists your activated models. Requires authentication. Returns only models matching the region of the domain you’re calling.
Each model object includes:
| Field | Type | Description |
|---|---|---|
| id | string | Your route name (used as model in requests). |
| object | string | Always "model". |
| owned_by | string | Owner identifier. |
| created | integer | Unix timestamp of creation. |
| base_models | string[] | Underlying model name(s). |
| context_length | integer | Maximum context window (tokens). |
| max_tokens | integer | Maximum output tokens. |
| capabilities | object | Feature flags for this model (see below). |
| supported_parameters | string[] | Parameters accepted by this model. |
Capabilities object:
| Flag | Type | Description |
|---|---|---|
| thinking | boolean | Supports reasoning_effort and chain-of-thought reasoning. |
| tools | boolean | Supports function calling / tools. |
| vision | boolean | Supports image and PDF inputs. |
| audio | boolean | Supports audio inputs. |
| json_mode | boolean | Supports response_format (JSON mode / structured outputs). |
| token_budget | boolean | Supports explicit thinking token budget control. |
Use capabilities.thinking to determine whether a model accepts reasoning parameters before sending them.
Example response:
{
"object": "list",
"data": [
{
"id": "clara",
"object": "model",
"owned_by": "trooper_42",
"created": 1700000000,
"base_models": ["Ministral-3"],
"context_length": 131072,
"max_tokens": 131072,
"capabilities": {
"tools": true,
"vision": true,
"audio": false,
"thinking": false,
"json_mode": true,
"token_budget": false
},
"supported_parameters": [
"temperature", "top_p", "max_tokens", "stream",
"response_format", "tools", "tool_choice"
]
}
]
}
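The capability check recommended above can be done before each request: read the model's entry from /v1/models and only attach reasoning parameters when capabilities.thinking is set. A minimal sketch, assuming a payload shaped like the example response above:

```python
def build_extra_params(model_obj: dict, effort: str = "medium") -> dict:
    """Return extra request parameters that are safe for this model.

    Only adds reasoning_effort when the model's /v1/models entry
    advertises the 'thinking' capability.
    """
    params = {}
    if model_obj.get("capabilities", {}).get("thinking"):
        params["reasoning_effort"] = effort
    return params

# Using the example entry for "clara" (thinking: false) from above:
clara = {"id": "clara", "capabilities": {"thinking": False, "tools": True}}
print(build_extra_params(clara))  # {}
```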
/health: Returns endpoint availability and region info.
All parameters follow the OpenAI Chat Completions API format.
| Parameter | Type | Description |
|---|---|---|
| model | string | Your route name (e.g. "clara", "nikola", "liv") |
| messages | array | Array of message objects (role + content) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | auto (32–4096) | Maximum output tokens. Auto-clamped to route’s context window. |
| max_completion_tokens | integer | — | Alias for max_tokens. |
| stream | boolean | false | Enable SSE streaming. |
| temperature | number | model default | Sampling temperature (0–2). |
| top_p | number | model default | Nucleus sampling. |
| response_format | object | — | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
| tools | array | — | Function calling tool definitions (OpenAI format). |
| tool_choice | string/object | — | Controls tool selection ("auto", "none", or specific tool). |
| reasoning | object | — | Reasoning options: {"effort": "none" \| "medium" \| "high", "exclude": boolean} |
| reasoning_effort | string | — | Shorthand: "none", "medium", "high". |
| reasoning.exclude | boolean | false | Strip reasoning content from the response. |
Standard SSE streaming, fully compatible with the OpenAI SDK.
{ "stream": true }
Response format: data: {...}\n\n lines, terminated by data: [DONE]\n\n.
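If you are not using an SDK, those data: lines can be parsed by hand. A minimal sketch of that parsing logic, shown on already-decoded text lines (real code would read the HTTP response body incrementally):

```python
import json

def parse_sse(lines):
    """Yield parsed chunk objects from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content", "") for c in parse_sse(sample))
print(text)  # Hello
```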
Request structured JSON output. Note that asking for JSON in the message text alone is not enough: you also need to set response_format to at least json_object!
If the model fails to produce valid JSON, you are not charged.
{ "response_format": { "type": "json_object" } }
With a schema:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "my_schema",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name", "age"]
}
}
}
}
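Even with json_schema set, it is worth verifying the reply before using it. A light check against the required fields of the example schema above (not a full JSON Schema validator):

```python
import json

# The example schema from above.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def check_required(reply_text: str, schema: dict) -> dict:
    """Parse the model reply and verify the schema's required keys are present."""
    data = json.loads(reply_text)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

print(check_required('{"name": "Ada", "age": 36}', SCHEMA))  # {'name': 'Ada', 'age': 36}
```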
Send images via URL or base64. PDFs are auto-converted to page images server-side.
{
"messages": [{
"role": "user",
"content": [
{ "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } },
{ "type": "text", "text": "Describe this image." }
]
}]
}
Base64:
{
"type": "image_url",
"image_url": { "url": "data:image/png;base64,iVBOR..." }
}
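For local files, the data: URL in the base64 variant can be built with the standard library. A small helper, with the MIME type guessed from the file name (the tiny byte string below is just a placeholder, not a real image):

```python
import base64
import mimetypes

def to_data_url(filename: str, raw: bytes) -> str:
    """Encode raw file bytes as a data: URL for an image_url content part."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"

part = {
    "type": "image_url",
    "image_url": {"url": to_data_url("photo.png", b"\x89PNG")},
}
print(part["image_url"]["url"][:22])  # data:image/png;base64,
```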
Send audio files in multimodal messages. Supported on liv.
Control whether and how deeply the model reasons (chain-of-thought) before answering.
| Value | Effect |
|---|---|
| "none" | Thinking off — fastest responses, lowest token usage. |
| "low" | Thinking off — same as none. |
| "medium" | Thinking on — the model reasons step-by-step before answering. Good balance of quality and speed. |
| "high" | Thinking on + deep — the model is instructed to think very carefully and in great detail. Best for complex math, logic, and code. If max_tokens is set, it must be at least 4,096 or the request will be rejected. |
Enable standard thinking:
{ "reasoning_effort": "medium" }
Enable deep thinking for maximum quality:
{ "reasoning_effort": "high" }
Or via the reasoning object:
{ "reasoning": { "effort": "high" } }
Disable thinking explicitly:
{ "reasoning_effort": "none" }
To strip reasoning from the response (thinking still happens, but the reasoning tokens are not returned):
{ "reasoning": { "effort": "high", "exclude": true } }
Thinking behaviour can also be configured per route in the management dashboard. The dashboard setting controls the default behaviour and how reasoning is returned:

- Default reasoning effort: "reasoning_effort": "medium" or "high".
- Whether reasoning is returned in a separate reasoning_content field or wrapped in <think> tags inside the content.

Thinking rescue: if the model enters a reasoning loop, the router auto-recovers and returns a usable response.
Standard OpenAI tools format. Works with all routes.
{
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}
If your input exceeds the context window, the router automatically compresses the middle of the conversation to fit — no manual truncation needed. You always get a response.
A free system prompt can be configured per route in the management dashboard. It’s prepended to every request automatically and not billed.
Standard OpenAI response envelope:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "clara",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}
With reasoning enabled, the response may include reasoning_content alongside content.
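When reasoning is enabled, the assistant message may carry both fields, so it is convenient to read them defensively. A small accessor, treating the message as a plain dict:

```python
def split_reasoning(message: dict) -> tuple:
    """Return (reasoning, answer) from an assistant message dict.

    reasoning is None when the model did not think or when
    reasoning was stripped via reasoning.exclude.
    """
    return message.get("reasoning_content"), message.get("content", "")

msg = {"role": "assistant", "reasoning_content": "Step 1...", "content": "42"}
print(split_reasoning(msg))  # ('Step 1...', '42')
```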
Every response includes an x-transaction-id header for billing reference and debugging.
Errors follow the OpenAI error envelope format:
{
"error": {
"message": "The model 'nonexistent' does not exist.",
"type": "invalid_request_error",
"param": "model",
"code": "model_not_found"
}
}
| HTTP Status | Code | Description |
|---|---|---|
| 400 | invalid_value | Missing model, API key, input too short, or invalid max_tokens. |
| 403 | invalid_api_key | Invalid API key or insufficient budget. |
| 404 | model_not_found | Model doesn’t exist or not activated. |
| 404 | region_mismatch | Model not available in the requested region. |
| 500 | — | Internal router error. |
| 503 | — | No endpoints available in the requested region. |
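Since errors follow the OpenAI envelope, a client can key its retry/abort decision off the status and the code field. A sketch under the error table above (the three category names are arbitrary labels, not part of the API):

```python
RETRYABLE_STATUSES = {500, 503}  # transient router / region-availability errors

def classify_error(status: int, body: dict) -> str:
    """Map an error response to 'retry', 'fix_account', or 'fix_request'."""
    if status in RETRYABLE_STATUSES:
        return "retry"
    code = body.get("error", {}).get("code")
    if code == "invalid_api_key":
        return "fix_account"  # key or budget problem
    return "fix_request"  # invalid_value, model_not_found, region_mismatch, ...

body = {"error": {"message": "The model 'nonexistent' does not exist.",
                  "type": "invalid_request_error",
                  "code": "model_not_found"}}
print(classify_error(404, body))  # fix_request
```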
See popular code examples of how to use an OpenAI-compatible API for LLM inference. Replace router.trooper.ai with the endpoint URL shown in your API Blib order!
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{"role": "user", "content": "Summarize this document."}],
max_tokens=1024
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://router.trooper.ai/v1",
apiKey: "YOUR_TROOPER_KEY",
});
const response = await client.chat.completions.create({
model: "nikola",
messages: [{ role: "user", content: "Write a unit test for this function." }],
max_tokens: 2048,
});
console.log(response.choices[0].message.content);
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
stream = client.chat.completions.create(
model="liv",
messages=[{"role": "user", "content": "Explain quantum computing."}],
max_tokens=2048,
stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
import json
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{"role": "user", "content": "List the 3 largest EU countries as JSON with name and population."}],
max_tokens=512,
response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
print(data)
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
response = client.chat.completions.create(
model="clara",
messages=[{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
{"type": "text", "text": "Extract all line items from this invoice as JSON."}
]
}],
max_tokens=2048,
response_format={"type": "json_object"}
)
print(response.choices[0].message.content)
from openai import OpenAI
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="nikola",
messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
tools=tools,
tool_choice="auto",
max_tokens=512
)
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY",
model="clara",
max_tokens=1024
)
response = llm.invoke("Extract all dates from the following text: ...")
print(response.content)
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(
api_base="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY",
model="nikola",
max_tokens=2048
)
response = llm.complete("Explain the EU AI Act in simple terms.")
print(response.text)
curl https://router.trooper.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_TROOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "clara",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 512
}'
One-line change — update the base URL and API key:
# Before (OpenAI)
client = OpenAI(api_key="sk-...")
# After (Trooper.AI)
client = OpenAI(
base_url="https://router.trooper.ai/v1",
api_key="YOUR_TROOPER_KEY"
)
Everything else stays the same: request format, response schema, streaming, tools, JSON mode.