πŸš€ API Blibs – LLM Inference API

GPU-less AI inference. No config headaches, no security worries. Just pure speed with 🇪🇺 EU region control.

100% OpenAI-compatible API · Full GDPR compliance · Free system prompt

API Routes

Choose your AI inference route – pay only for what you use (per token).


What's included in an API Blib?

Infrastructure & compliance – fully managed, secure, and regulation-ready from day one.

  • No GPU needed – pure API, no hardware management
  • No OS & no security issues – fully managed infrastructure
  • Full region control – choose EU, DE or specific country endpoints
  • πŸ‡ͺπŸ‡Ί EU-hosted, GDPR compliant infrastructure
  • ISO/IEC 27001 certified πŸ‡©πŸ‡ͺ data centers
  • No data logging – fully stateless, RAM-only inference, in → out → forget
  • OpenAI-compatible API – drop-in replacement, use any SDK
  • Pay-per-token pricing – no idle costs, no minimum commitments

Smart inference & media – built-in intelligence that handles edge cases so you don't have to.

  • High-speed inference – optimized vLLM backends with load balancing
  • Free system prompt – up to 1,024 tokens, set via the management dashboard
  • Guaranteed JSON mode – valid JSON or no charge
  • Reasoning + JSON mode – automatic two-call strategy when a model can't do both at once
  • Thinking rescue – model stuck in reasoning? Auto-detected and recovered
  • Auto context compression – input auto-summarized when it exceeds the context window, no hard rejects
  • Audio & vision support on multimodal models
  • PDF vision support – PDFs auto-converted to page images, zero pre-processing
  • Image auto-optimization – metadata stripped, auto-resized, safety-validated

Security & resilience – hardened, self-healing, always on.

  • Hardened API surface – dangerous params blocked, injection vectors eliminated
  • SSRF-safe image fetching – server-side validation, HTTPS-only, no private IP leaks
  • Automatic failover & multi-endpoint redundancy
  • Self-healing endpoints – auto-detected failures, health-verified before re-entry

Quick Start

Use any OpenAI-compatible SDK. Just point it to your Trooper.AI route endpoint:

cURL
curl https://router.trooper.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TROOPER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "clara",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 512
  }'

Why GPU-less LLM Inference Beats Self-Hosting

Running large language models on your own infrastructure means managing GPUs, driver updates, CUDA versions, model weights, scaling, and security patches β€” all before a single token is generated. With API Blibs you skip every layer of that stack. Our fully managed LLM inference endpoints give you access to state-of-the-art open-source models β€” Google Gemma 4, Mistral Ministral 3, and NVIDIA Nemotron 3 Nano β€” without provisioning a single GPU. Requests are processed on optimized vLLM backends with automatic load balancing, delivering consistent low-latency responses even under heavy traffic. No idle GPU costs when you're not calling the API, no ops burden, no surprise bills β€” just pure inference on demand.

For teams evaluating self-hosted LLM deployments versus managed AI inference, the math is straightforward: API Blibs remove the entire GPU procurement and MLOps layer while giving you the same models, the same quality, and faster time-to-production.

Markus and Jaimie working on an A100 GPU cluster for inference servers

Reliable Hardware, Built by Experts

Behind every API Blib is enterprise-grade, upcycled hardware maintained by our own team. Here, Markus and Jaimie are racking an NVIDIA A100 cluster in one of our German data centers β€” the same GPU servers that power your inference requests. We upcycle high-performance components into optimized inference rigs, extending hardware lifecycles while reducing e-waste. We don't resell third-party capacity; we build, own, and operate our infrastructure end-to-end so we can guarantee performance, security, and data residency at every layer of the stack.

OpenAI-Compatible API β€” Migrate Your AI Stack in Minutes

API Blibs are 100% compatible with the OpenAI chat completions format. If your application already uses the OpenAI SDK β€” Python, Node.js, or any HTTP client β€” switching to Trooper.AI is a one-line change: update the base URL and API key. You get the same /v1/chat/completions endpoint, the same request and response schema, and full support for streaming, JSON mode, function calling, and multimodal inputs. No code rewrite, no new abstractions, no vendor lock-in β€” your integration stays portable and you stay in control.

Looking for an OpenAI API alternative hosted in Europe? API Blibs give you equivalent functionality with EU data residency, transparent per-token pricing, and no rate-limit surprises.

GDPR-Compliant AI Inference Hosted in the EU

Every API Blib route runs exclusively on ISO/IEC 27001-certified data centers in Germany and the European Union. Your prompts and completions are processed in RAM only β€” fully stateless, no logging, no storage, no model training on your data. This zero-retention architecture makes API Blibs a strong choice for regulated industries including healthcare, legal tech, fintech, and public sector, as well as any business where data residency and GDPR compliance are non-negotiable.

Need country-level routing? Choose a specific jurisdiction β€” Germany, the Netherlands, or broader EU β€” and your requests will never leave that region. Combined with our hardened API surface and SSRF-safe image fetching, you get an AI inference layer that meets enterprise security requirements out of the box.

Predictable Per-Token Pricing Without Hidden Costs

With API Blibs you pay only for the tokens you consume β€” input and output, billed per million tokens. No setup fees, no monthly minimums, no charges for idle time. Prepay credits at your own pace and your budget is drawn down only when you make actual API calls. On top of that, every monthly campaign adds bonus credits to your top-up β€” the exact percentage depends on the current promotion. This makes it simple to forecast costs whether you're running a customer-facing chatbot, a document extraction pipeline, or large-scale batch classification.
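
For a quick back-of-the-envelope estimate (the rates below are purely hypothetical — actual per-token prices are shown on each API route):

Python
# Hypothetical example rates -- check your route's pricing for real numbers.
PRICE_IN_EUR_PER_M = 0.20    # € per million input tokens (hypothetical)
PRICE_OUT_EUR_PER_M = 0.60   # € per million output tokens (hypothetical)

input_tokens = 120_000_000   # e.g. one month of chatbot traffic
output_tokens = 30_000_000

cost = (
    (input_tokens / 1e6) * PRICE_IN_EUR_PER_M
    + (output_tokens / 1e6) * PRICE_OUT_EUR_PER_M
)
print(f"Estimated monthly spend: €{cost:.2f}")  # €42.00 at these example rates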

Compare that to GPU rental where you pay by the hour regardless of utilization, or proprietary API providers with opaque rate limits and unpredictable overage charges. API Blibs give you cost transparency from the first token to the last.


API Blibs vs. OpenAI, Azure OpenAI & AWS Bedrock

Choosing a managed LLM inference provider in Europe means balancing price, data residency, and operational simplicity. Here is how API Blibs compare to the three major alternatives.

EU data residency
  • Trooper.AI API Blibs: ✅ Default – every request processed in 🇪🇺 EU / 🇩🇪 DE
  • OpenAI API: ⚠️ EU endpoints available, but only for "eligible" enterprise customers; 10% price uplift on EU-resident endpoints
  • Azure OpenAI: ⚠️ EU Data Zones available; regional deployment limited to select models; requires Azure subscription
  • AWS Bedrock: ⚠️ EU regions (Frankfurt, Ireland, etc.) available; cross-region inference may route outside EU

Data retention
  • Trooper.AI API Blibs: ✅ Zero – stateless RAM-only inference, in → out → forget
  • OpenAI API: ⚠️ Zero data retention on EU-resident projects; standard API retains data up to 30 days
  • Azure OpenAI: ⚠️ Configurable; default 30-day retention for abuse monitoring
  • AWS Bedrock: ⚠️ Configurable; default logging to CloudWatch

Country-level routing
  • Trooper.AI API Blibs: ✅ Yes – choose DE, NL, or broader EU
  • OpenAI API: ❌ No country-level control on regular plans
  • Azure OpenAI: ⚠️ Regional deployment (e.g. Germany) available, but with limited model selection
  • AWS Bedrock: ⚠️ Regional deployment possible, but not all models are available in every region

Pricing model
  • Trooper.AI API Blibs: ✅ Per-token in €, no minimums, prepaid credits + promo credits on top
  • OpenAI API: ⚠️ Per-token in $, prepaid credits, 50% batch discount
  • Azure OpenAI: ⚠️ Per-token or Provisioned Throughput Units (PTUs); complex pricing tiers
  • AWS Bedrock: ⚠️ Per-token; priority tier at 75% premium; provisioned throughput available

Hidden costs
  • Trooper.AI API Blibs: ✅ None – no infrastructure, no setup fees
  • OpenAI API: ⚠️ Web search tool calls charged extra; fine-tuned model hosting from ~$1,800/mo
  • Azure OpenAI: ⚠️ Key Vault and Cognitive Services overhead; fine-tuned model hosting costs
  • AWS Bedrock: ⚠️ Knowledge Bases, Guardrails, and Agents all add separate charges

API compatibility
  • Trooper.AI API Blibs: ✅ 100% OpenAI-compatible, one-line migration
  • OpenAI API: ✅ Native
  • Azure OpenAI: ⚠️ OpenAI-compatible via Azure endpoints
  • AWS Bedrock: ❌ Proprietary Converse API; not OpenAI-compatible

Setup complexity
  • Trooper.AI API Blibs: ✅ API key + base URL, done
  • OpenAI API: ⚠️ API key + project setup; EU residency requires "eligible" approval
  • Azure OpenAI: ❌ Azure subscription + resource group + deployment + IAM
  • AWS Bedrock: ❌ AWS account + IAM + Bedrock console model access requests

Vendor lock-in
  • Trooper.AI API Blibs: ✅ None – OpenAI-compatible, switch anytime
  • OpenAI API: ⚠️ Low (standard API)
  • Azure OpenAI: ⚠️ Medium (Azure ecosystem)
  • AWS Bedrock: ❌ High (Bedrock-specific APIs, IAM, CloudTrail integration)

Built-in features
  • Trooper.AI API Blibs: Auto context compression, PDF vision, thinking rescue, guaranteed JSON, SSRF-safe image fetch
  • OpenAI API: Batch API, prompt caching
  • Azure OpenAI: Prompt caching, Guardrails, RAG Knowledge Bases
  • AWS Bedrock: Agents, Guardrails, Knowledge Bases, RAG, evaluations

Certifications
  • Trooper.AI API Blibs: ISO/IEC 27001 🇩🇪 data centers
  • OpenAI API: SOC 2 Type 2, CSA STAR, ISO 27001
  • Azure OpenAI: Azure compliance portfolio (SOC, ISO, C5, etc.)
  • AWS Bedrock: AWS compliance portfolio (SOC, ISO, C5, etc.)

Best for
  • Trooper.AI API Blibs: EU-first teams that want zero-config, GDPR-compliant inference at transparent prices
  • OpenAI API: Global teams already on OpenAI wanting EU residency (enterprise tier)
  • Azure OpenAI: Organizations deeply invested in the Microsoft/Azure ecosystem
  • AWS Bedrock: AWS-native organizations needing IAM, CloudTrail, and multi-model access

Bottom line: OpenAI, Azure, and Bedrock all offer EU data residency β€” but it comes with eligibility requirements, price uplifts, or ecosystem lock-in. API Blibs give you EU-hosted, GDPR-compliant inference out of the box, with zero setup friction and no hidden costs.


Supported Models β€” Open-Source LLMs Optimized for Production

API Blibs give you access to carefully selected open-source models, optimized for production workloads on our vLLM inference backends. Each model is chosen for its price-to-performance ratio, EU language coverage, and license clarity.

liv β€” Google Gemma 4

The most affordable route β€” a compact multimodal model that handles text, images, audio, and reasoning in a single call. Ideal for high-volume workloads where cost per token matters most, from classification and summarization to image captioning and audio transcription.

clara β€” Mistral Ministral 3

A fast vision-first model built for throughput. Strong EU language performance, multi-image analysis, and structured extraction at a mid-range price point β€” perfect for document processing, OCR pipelines, and customer-facing chatbots that need to see.

nikola β€” NVIDIA Nemotron 3 Nano

The reasoning powerhouse. A mixture-of-experts architecture that delivers deep reasoning and strong coding ability at efficient inference cost. Best suited for code generation, complex reasoning chains, function calling, and agentic workflows.

All models are served via OpenAI-compatible endpoints. Switch between routes by changing the model parameter β€” no code changes required.
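
For example, the same request can hit any route just by swapping the model name — a minimal sketch using the Python SDK:

Python
from openai import OpenAI

client = OpenAI(base_url="https://router.trooper.ai/v1", api_key="YOUR_TROOPER_KEY")

# Identical request body -- only the model (route) name changes.
for model in ("liv", "clara", "nikola"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize GDPR in one sentence."}],
        max_tokens=64,
    )
    print(f"{model}: {response.choices[0].message.content}")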


LLM API Use Cases for European Businesses

Document Extraction & RAG Pipelines

Feed PDFs, images, and scanned documents into vision-enabled routes like clara or liv. API Blibs auto-convert PDFs to page images and normalize image inputs β€” your RAG pipeline receives clean, structured data without pre-processing steps. Combined with guaranteed JSON mode, you get reliable structured output for downstream indexing.
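
A minimal extraction sketch — note the assumption that a PDF URL can be passed in an image_url content part the same way an image URL is (PDFs are converted to page images server-side); check the route documentation for the exact file-input format:

Python
from openai import OpenAI

client = OpenAI(base_url="https://router.trooper.ai/v1", api_key="YOUR_TROOPER_KEY")

# Assumption: PDF URLs are accepted like image URLs, since PDFs are
# auto-converted to page images on the server.
response = client.chat.completions.create(
    model="clara",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/contract.pdf"}},
            {"type": "text", "text": "Extract parties, dates, and amounts as JSON."},
        ],
    }],
    response_format={"type": "json_object"},  # guaranteed JSON mode
    max_tokens=2048,
)
print(response.choices[0].message.content)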

Customer-Facing Chatbots & Virtual Assistants

Deploy AI-powered chat with sub-second latency and full GDPR compliance. Set a free system prompt via the management dashboard, use function calling for backend integration, and let auto context compression handle long conversations without hitting context limits. Zero data retention means your customers' conversations are never stored.

Code Generation & Developer Tools

Route complex coding tasks to nikola for deep reasoning and accurate function calling. The OpenAI-compatible API integrates directly with developer toolchains β€” VS Code extensions, CI/CD pipelines, code review bots β€” with a single base URL change.

Multimodal Workflows β€” Vision, Audio & PDF

Process images, audio files, and PDFs in a single API call. liv handles all three modalities; clara specializes in high-resolution vision tasks. Images are auto-optimized (metadata stripped, resized, SSRF-validated), and PDFs are converted to page images server-side. No client-side pre-processing needed.

Batch Classification & Data Enrichment

Run high-volume classification, tagging, sentiment analysis, or entity extraction at scale. Per-token pricing with no idle costs means you pay only when processing. Combine with guaranteed JSON mode for machine-readable output that feeds directly into your data pipeline.
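
A sketch of such a pipeline (the labels and inputs are illustrative):

Python
import json
from openai import OpenAI

client = OpenAI(base_url="https://router.trooper.ai/v1", api_key="YOUR_TROOPER_KEY")

texts = ["Invoice overdue since March", "Loved the onboarding flow!", "Server down again"]

for text in texts:
    response = client.chat.completions.create(
        model="liv",  # most affordable route for high-volume work
        messages=[{
            "role": "user",
            "content": f'Classify the sentiment of this text as JSON {{"sentiment": "..."}}: {text}',
        }],
        response_format={"type": "json_object"},  # valid JSON or no charge
        max_tokens=32,
    )
    print(json.loads(response.choices[0].message.content))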


Frequently Asked Questions About API Blibs

Is my data stored or used for training?

No. API Blibs use a fully stateless, RAM-only architecture. Your prompts and completions are processed in memory and discarded immediately after the response is returned. No logging, no storage, no model training on your data. Ever.

Can I use function calling and tool use?

Yes. All API Blib routes support OpenAI-compatible function calling. Define your tools in the standard tools parameter and the model will return structured tool calls in the response. Works with all routes.
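
A minimal sketch (the get_weather tool is purely illustrative):

Python
from openai import OpenAI

client = OpenAI(base_url="https://router.trooper.ai/v1", api_key="YOUR_TROOPER_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="nikola",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)  # structured tool call(s)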

What happens if my input exceeds the context window?

Instead of rejecting your request, API Blibs automatically compress the middle of the conversation to fit within the model's context window. You get a complete response without losing the beginning or end of your conversation thread.

Do you support streaming?

Yes. Standard SSE streaming via the stream: true parameter, fully compatible with the OpenAI SDK streaming interface.
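
For example, with the Python SDK:

Python
from openai import OpenAI

client = OpenAI(base_url="https://router.trooper.ai/v1", api_key="YOUR_TROOPER_KEY")

stream = client.chat.completions.create(
    model="clara",
    messages=[{"role": "user", "content": "Explain vLLM in two sentences."}],
    stream=True,  # standard SSE streaming
)

# Print tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)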

How do I switch from OpenAI to Trooper.AI?

One-line change. Update your base_url to https://router.trooper.ai/v1 and replace your API key. The request format, response schema, and streaming behavior are identical.
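
In the Python SDK, for example:

Python
from openai import OpenAI

# Before: client = OpenAI(api_key="sk-...")
client = OpenAI(
    base_url="https://router.trooper.ai/v1",  # the only change, plus the key
    api_key="YOUR_TROOPER_KEY",
)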

Which EU regions are available?

You can route requests to Germany (DE), the Netherlands (NL), or broader EU endpoints. Select your preferred region in the management dashboard or via the API.

What if the model gets stuck in a reasoning loop?

API Blibs include thinking rescue β€” we detect when a model enters a reasoning loop and auto-recover, ensuring you always receive a usable response instead of a timeout or empty reply.

Is guaranteed JSON mode really guaranteed?

Yes. When you request JSON output, we validate the response structure. If the model fails to produce valid JSON, you are not charged for that request.

Do I need to pre-process images or PDFs before sending them?

No. Images are automatically normalized (metadata stripped, resized to the model's maximum resolution, validated for security). PDFs are converted to page images server-side. You send raw files; we handle the rest.

What certifications do your data centers have?

All infrastructure runs in ISO/IEC 27001-certified data centers in Germany and the EU. Combined with GDPR compliance, zero data retention, and a hardened API surface, API Blibs meet enterprise security requirements out of the box.


Integration Guides β€” Connect Your Stack to API Blibs

Python (OpenAI SDK)

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://router.trooper.ai/v1",
    api_key="YOUR_TROOPER_KEY"
)

response = client.chat.completions.create(
    model="clara",
    messages=[{"role": "user", "content": "Summarize this document."}],
    max_tokens=1024
)

print(response.choices[0].message.content)

Node.js (OpenAI SDK)

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.trooper.ai/v1",
  apiKey: "YOUR_TROOPER_KEY",
});

const response = await client.chat.completions.create({
  model: "nikola",
  messages: [{ role: "user", content: "Write a unit test for this function." }],
  max_tokens: 2048,
});

console.log(response.choices[0].message.content);

LangChain (Python)

LangChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://router.trooper.ai/v1",
    api_key="YOUR_TROOPER_KEY",
    model="clara",
    max_tokens=1024
)

response = llm.invoke("Extract all dates from the following text: ...")
print(response.content)

LlamaIndex

LlamaIndex
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://router.trooper.ai/v1",
    api_key="YOUR_TROOPER_KEY",
    model="nikola",
    max_tokens=2048
)

response = llm.complete("Explain the EU AI Act in simple terms.")
print(response.text)

cURL with Vision (Image Input)

cURL + Vision
curl https://router.trooper.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TROOPER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "clara",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
        {"type": "text", "text": "Extract all line items from this invoice as JSON."}
      ]
    }],
    "max_tokens": 2048,
    "response_format": {"type": "json_object"}
  }'

AI Compliance for German & EU Companies

The EU AI Act β€” What It Means for Your AI Infrastructure

The EU AI Act (Regulation 2024/1689) becomes broadly applicable on 2 August 2026, introducing the world's first comprehensive legal framework for artificial intelligence. For companies operating in Germany and the EU, this means new obligations around transparency, documentation, and risk management β€” with penalties of up to €35 million or 7 % of global annual turnover.

While the Act primarily targets providers and deployers of high-risk AI systems (such as AI used in recruitment, credit scoring, or critical infrastructure), every company using AI should understand where their systems sit on the risk pyramid β€” and ensure their inference infrastructure supports compliance.

Why Your Inference Provider Matters

Even for minimal- and limited-risk AI use cases, the EU AI Act emphasizes transparency and data governance. Choosing an inference provider that operates within the EU, retains no data, and provides clear documentation simplifies your compliance posture:

  • Data residency: The Act encourages processing within the EU. API Blibs run exclusively on ISO/IEC 27001-certified data centers in Germany and the EU β€” no data leaves the region.
  • Zero data retention: API Blibs use stateless, RAM-only inference. Prompts and completions are never stored, eliminating concerns around data logging, retention periods, and subject access requests under GDPR.
  • Transparency: Clear per-token pricing, documented model specifications, and a hardened API surface make it straightforward to document your AI supply chain — a key requirement for Auftragsverarbeitung (commissioned data processing) under GDPR and the upcoming AI Act transparency obligations.
  • No model training on your data: Your inputs are never used to train or fine-tune models. Full data separation by design.

GDPR + AI Act: Dual Compliance

German companies face a dual compliance burden: GDPR (in effect since 2018) and the AI Act (phased enforcement through 2027). Both frameworks require you to demonstrate that personal data is processed lawfully, transparently, and with appropriate safeguards. Using a US-based inference provider without EU data residency creates unnecessary regulatory surface area β€” you must rely on Standard Contractual Clauses, assess adequacy decisions, and document cross-border data flows.

API Blibs eliminate this complexity: all processing happens within the EU, with zero retention and ISO-certified infrastructure. Your Datenschutzbeauftragter (data protection officer) can document a clean, EU-only data flow with no third-country transfers.

BaFin, Healthcare & Regulated Industries

For companies in regulated sectors β€” fintech (BaFin-regulated), healthtech, legal tech, public sector β€” the bar is even higher. Auditors expect:

  • Demonstrable data residency within the EU or specific member states
  • No data leakage to third-party systems or training pipelines
  • Clear documentation of the AI supply chain and subprocessors
  • Incident response and failover procedures

API Blibs address all four: country-level routing (DE, NL), zero-retention architecture, published model specifications, and automatic failover with self-healing endpoints.

Getting Started with Compliant AI Inference

You don't need a lengthy procurement cycle to deploy GDPR- and AI Act-ready LLM inference. Create a Trooper.AI account, top up prepaid credits, and start making API calls β€” all infrastructure is already certified, all data stays in the EU, and there is nothing to configure on the compliance side.

For Auftragsverarbeitungsvertrag (AVV / DPA) requests or questions about your specific compliance requirements, contact us at sales@trooper.ai or call +49 6126 9289991.


PAYMENT – GOOD TO KNOW: You are billed per token used, charged against your prepaid budget. No idle costs – you only pay when you make API calls.
An official invoice follows the next day; VAT is included where applicable.
NO REFUNDS! Read the full payment docs.
