Open Source Private AI MIT Licence

Supercharge Your Native Ollama.

Ollama Gateway: Supercharge your native Ollama with enterprise-grade API authentication, request auditing, rate limiting, and virtual model management—your secure, private AI gateway.

Go to Dashboard Deployment Guide View on GitHub

Enterprise Ready

Security and Auditing by Design.

Bearer Token Auth

Create multiple API keys for different users and applications, each with its own fine-grained permissions.

Clickhouse Auditing

Detailed auditing for every request and response, stored in high-performance Clickhouse storage for compliance.

Team Management

A built-in Role-Based Access Control (RBAC) system for managing your team's access to AI models.

Model Keep-alive

Periodically pings underlying models to ensure they stay loaded in memory for instant response times.

Default Model Support

Automatically redirect requests to a default virtual model if no model is specified in the API call.

Native Passthrough

Seamless support for native Ollama features including MCP tools, embedding, images, and stream mode.

Smart Proxy

Intelligent
Virtual Models

Create multiple aliases for your models with custom system prompts and parameter overrides.

Chat Models

Create virtual chat models with persistent system prompts.

Embedding Models

Dedicated management for embedding models for RAG applications.

Parameter Overrides

Override temperature, top_k, and other Ollama options per virtual model.

Smart Mapping

Alias -> Real

Validation

Auto Sync

Agent & Ecosystem Ready

Perfect for Agent Deployment

OllamaGateway has been officially tested and is fully compatible with popular ecosystem tools including Claude Code, Open-WebUI, Opencode, and Roocode. Its robust API translation makes it the ideal choice for deploying autonomous AI agents.

Recommended Models

qwen3.6:27b-q8_0
qwen3.6:35b-a3b-q4_K_M

Native Power, AI Development

Supercharge AI Development with Claude Code, Opencode & Qwen Code

OllamaGateway seamlessly connects your local Ollama capabilities to professional AI development tools. Whether you are using Claude Code, Opencode or Qwen Code, you can now leverage your own hardware for automated coding and development tasks—all verified and officially supported by our team.

Verified compatibility with Claude Code agent
Optimized for Opencode & Qwen Code ecosystems
Secure local inference with zero data leakage

Opencode and Qwen Code working with OllamaGateway

Universal API Compatibility

Mix Any Client with Any Backend

OllamaGateway supports two inbound API formats (OpenAI & Ollama) and two backend provider types — in any combination. Your clients and physical models don't have to speak the same language.

①

OpenAI Client
→ OpenAI Backend

/v1/chat/completions
→ OpenAI-compatible provider

②

Ollama Client
→ OpenAI Backend

/api/chat
→ OpenAI-compatible provider

③

OpenAI Client
→ Ollama Backend

/v1/chat/completions
→ Native Ollama

④

Ollama Client
→ Ollama Backend

/api/chat
→ Native Ollama

⑤

Anthropic Client
→ OpenAI Backend

/v1/messages
→ OpenAI-compatible provider

⑥

Anthropic Client
→ Ollama Backend

/v1/messages
→ Native Ollama

Parameter	①OpenAI → OpenAI	②Ollama → OpenAI	③OpenAI → Ollama	④Ollama → Ollama	⑤Anthropic → OpenAI	⑥Anthropic → Ollama
Temperature	Full support	Full support	Full support	Full support	Full support	Full support
Top P	Full support	Full support	Full support	Full support	Full support	Full support
Top K	Not supported	DB override discarded	DB override	Full support	Not supported	DB override
Num Ctx	Not supported	DB override discarded	DB override	Full support	Not supported	DB override
Thinking	DB override	DB override	DB override	Full support	DB override	DB override
Images / Multimodal	Passthrough	Converted	Converted	Passthrough	Not supported	Not supported
Repeat Penalty	Not supported	DB override discarded	DB override	Full support	Not supported	DB override

'DB override' means a hard assignment (=), not a null-coalescing assignment (??=). When the virtual model has a value configured in the database, the client-supplied value is always discarded.

High Availability

Load Balancing, Tiered Fallback & Mixed Providers

OllamaGateway goes beyond simple proxying. It intelligently distributes requests across multiple underlying models, automatically falls back when a provider is unavailable, and lets you combine models from different providers into a single virtual model.

Smart Load Balancing

Distribute API requests across multiple underlying Ollama instances or OpenAI-compatible providers. Configure weights and priorities for each backend to optimize throughput and resource utilization.

Tiered Fallback

Define fallback chains for your models. If the primary provider fails or times out, OllamaGateway automatically steps down to the next available backend — ensuring maximum uptime without manual intervention.

Multi-Provider Virtual Models

Blend multiple models from entirely different providers into a single virtual model. Mix Ollama backends with OpenAI-compatible services — your clients see one unified endpoint while the gateway handles all routing and translation behind the scenes.

Native vs. Gateway

Why choose OllamaGateway?

Native Ollama is great for personal use, but it lacks the enterprise features required for team collaboration and production deployment. OllamaGateway fills those gaps without changing your workflow.

Feature Comparison	Native Ollama	OllamaGateway
Model Hosting & Inference
Multimodal
MCP
Function call
Streaming
OpenAI API Translation
Anthropic API Translation
API Authentication (Bearer)
Multiple API Keys Management
Request & Response Auditing
API Rate Limiting
Virtual Model Overrides
Multi-backend Support
Load Balancing
Tiered Fallback
Default Model Support
Model Keep-alive (Ping)
Admin Management GUI
Chat/Embedding Segregation

Ready to secure your AI gateway?