Open Source Private AI MIT Licence

Supercharge Your Native Ollama.

Ollama Gateway: Supercharge your native Ollama with enterprise-grade API authentication, request auditing, rate limiting, and virtual model management—your secure, private AI gateway.

OllamaGateway Main screenshot
Enterprise Ready

Security and Auditing by Design.

Bearer Token Auth

Create multiple API keys for different users and applications, each with its own fine-grained permissions.

Clickhouse Auditing

Detailed auditing for every request and response, stored in high-performance Clickhouse storage for compliance.

Team Management

A built-in Role-Based Access Control (RBAC) system for managing your team's access to AI models.

Model Keep-alive

Periodically pings underlying models to ensure they stay loaded in memory for instant response times.

Default Model Support

Automatically redirect requests to a default virtual model if no model is specified in the API call.

Native Passthrough

Seamless support for native Ollama features including MCP tools, embedding, images, and stream mode.

OllamaGateway in Docker
Smart Proxy

Intelligent
Virtual Models

Create multiple aliases for your models with custom system prompts and parameter overrides.

Chat Models

Create virtual chat models with persistent system prompts.

Embedding Models

Dedicated management for embedding models for RAG applications.

Parameter Overrides

Override temperature, top_k, and other Ollama options per virtual model.

Smart Mapping
Alias -> Real
Validation
Auto Sync
Agent & Ecosystem Ready

Perfect for Agent Deployment

OllamaGateway has been officially tested and is fully compatible with popular ecosystem tools including Claude Code, Open-WebUI, Opencode, and Roocode. Its robust API translation makes it the ideal choice for deploying autonomous AI agents.

Recommended Models
  • qwen3.6:27b-q8_0
  • qwen3.6:35b-a3b-q4_K_M
Native Power, AI Development

Supercharge AI Development with Claude Code, Opencode & Qwen Code

OllamaGateway seamlessly connects your local Ollama capabilities to professional AI development tools. Whether you are using Claude Code, Opencode or Qwen Code, you can now leverage your own hardware for automated coding and development tasks—all verified and officially supported by our team.

  • Verified compatibility with Claude Code agent
  • Optimized for Opencode & Qwen Code ecosystems
  • Secure local inference with zero data leakage
Opencode and Qwen Code working with OllamaGateway
Universal API Compatibility

Mix Any Client with Any Backend

OllamaGateway supports two inbound API formats (OpenAI & Ollama) and two backend provider types — in any combination. Your clients and physical models don't have to speak the same language.

OpenAI Client
→ OpenAI Backend

/v1/chat/completions
→ OpenAI-compatible provider

Ollama Client
→ OpenAI Backend

/api/chat
→ OpenAI-compatible provider

OpenAI Client
→ Ollama Backend

/v1/chat/completions
→ Native Ollama

Ollama Client
→ Ollama Backend

/api/chat
→ Native Ollama

Anthropic Client
→ OpenAI Backend

/v1/messages
→ OpenAI-compatible provider

Anthropic Client
→ Ollama Backend

/v1/messages
→ Native Ollama

Parameter OpenAI → OpenAI Ollama → OpenAI OpenAI → Ollama Ollama → Ollama Anthropic → OpenAI Anthropic → Ollama
Temperature Full support Full support Full support Full support Full support Full support
Top P Full support Full support Full support Full support Full support Full support
Top K Not supported DB override discarded DB override Full support Not supported DB override
Num Ctx Not supported DB override discarded DB override Full support Not supported DB override
Thinking DB override DB override DB override Full support DB override DB override
Images / Multimodal Passthrough Converted Converted Passthrough Not supported Not supported
Repeat Penalty Not supported DB override discarded DB override Full support Not supported DB override

'DB override' means a hard assignment (=), not a null-coalescing assignment (??=). When the virtual model has a value configured in the database, the client-supplied value is always discarded.

High Availability

Load Balancing, Tiered Fallback & Mixed Providers

OllamaGateway goes beyond simple proxying. It intelligently distributes requests across multiple underlying models, automatically falls back when a provider is unavailable, and lets you combine models from different providers into a single virtual model.

Smart Load Balancing

Distribute API requests across multiple underlying Ollama instances or OpenAI-compatible providers. Configure weights and priorities for each backend to optimize throughput and resource utilization.

Tiered Fallback

Define fallback chains for your models. If the primary provider fails or times out, OllamaGateway automatically steps down to the next available backend — ensuring maximum uptime without manual intervention.

Multi-Provider Virtual Models

Blend multiple models from entirely different providers into a single virtual model. Mix Ollama backends with OpenAI-compatible services — your clients see one unified endpoint while the gateway handles all routing and translation behind the scenes.

OllamaGateway High Availability Routing
Native vs. Gateway

Why choose OllamaGateway?

Native Ollama is great for personal use, but it lacks the enterprise features required for team collaboration and production deployment. OllamaGateway fills those gaps without changing your workflow.

Feature Comparison Native Ollama OllamaGateway
Model Hosting & Inference
Multimodal
MCP
Function call
Streaming
OpenAI API Translation
Anthropic API Translation
API Authentication (Bearer)
Multiple API Keys Management
Request & Response Auditing
API Rate Limiting
Virtual Model Overrides
Multi-backend Support
Load Balancing
Tiered Fallback
Default Model Support
Model Keep-alive (Ping)
Admin Management GUI
Chat/Embedding Segregation