Enterprise-Grade Performance

pLLM
High-Performance LLM Gateway

Drop-in OpenAI replacement built in Go. Handle thousands of concurrent requests with adaptive routing, multi-provider support, and enterprise-grade reliability.

High Performance
Cost Efficient
Low Latency
100% OpenAI Compatible
Clients → pLLM Gateway (Router • Auth • Cache) → Intelligent Load Balancer (Round Robin • Least Busy • Weighted) → AI Providers
OpenAI
Claude
Azure
Bedrock
Vertex

Enterprise-Grade Features

Built from the ground up for production workloads with performance, reliability, and developer experience in mind.

Zero Migration
🔌

100% OpenAI Compatible

Drop-in replacement for OpenAI API. No code changes needed - just update your base URL and you're ready to go.

7+ Providers
🌐

Multi-Provider Support

Support for OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Llama, and Cohere through a unified interface.

Zero Downtime
🎯

Adaptive Routing

Intelligent request routing with automatic failover, circuit breakers, and health-based load balancing.

Native Go
🚀

High Performance

Built in Go for maximum performance. Handle thousands of concurrent requests with minimal latency overhead.

Production Ready
🛡️

Enterprise Security

JWT authentication, RBAC, audit logging, and comprehensive monitoring with Prometheus metrics.

Save Money
💰

Cost Optimization

Budget management, intelligent caching, and multi-key load balancing to minimize API costs.
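
Cost controls are typically expressed in the gateway configuration. The sketch below is illustrative only: the field names are assumptions, not the exact pLLM schema, so check the documentation for the real option names.

yaml
# Hypothetical configuration sketch - field names are illustrative,
# not the authoritative pLLM schema.
budgets:
  default-team:
    monthly_limit_usd: 500      # hard cap on monthly spend
    alert_threshold: 0.8        # warn at 80% of the budget
cache:
  enabled: true
  backend: redis                # served from the Redis layer
  ttl: 300s                     # reuse identical completions for 5 minutes
providers:
  - name: openai-primary
    api_keys:                   # multiple keys, load-balanced to spread quota
      - sk-key-one
      - sk-key-two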

Technical Excellence

Deep technical capabilities designed for mission-critical production environments.

Performance

  • Sub-millisecond routing overhead
  • Thousands of concurrent connections
  • Native compilation with Go
  • Efficient memory management

Reliability

  • Circuit breaker protection
  • Automatic health monitoring
  • Graceful degradation
  • Zero-downtime deployments

Scalability

  • Horizontal scaling ready
  • Kubernetes native
  • Redis-backed caching
  • Distributed rate limiting

Observability

  • Prometheus metrics
  • Grafana dashboards
  • Distributed tracing
  • Comprehensive logging

Performance Advantage

See how pLLM compares to typical interpreted gateway solutions.

Metric                 | pLLM (Go) | Typical Gateway | Advantage
Concurrent Connections | Thousands | Limited         | 🚀 Superior
Memory Usage           | 50-80MB   | 150-300MB+      | 💾 3-6x Less
Startup Time           | <100ms    | 2-5s            | ⚡ 20-50x Faster
CPU Efficiency         | All cores | GIL-limited     | 🔥 True Parallel

Enterprise Authentication

Seamless integration with your existing identity infrastructure through OAuth/OIDC support powered by Dex.

Zero Configuration

Connect to your existing identity providers without complex setup. Dex handles the OAuth/OIDC protocols while pLLM manages authorization.

Free with all plans

Enterprise Security

Industry-standard OAuth 2.0 and OpenID Connect protocols with support for SAML, LDAP, and multi-factor authentication.

Single Sign-On (SSO)
Multi-Factor Auth
Role-Based Access
Audit Logging

Supported Identity Providers

Connect with popular identity providers and enterprise systems

  • Google: Google Workspace & Gmail
  • Microsoft: Azure AD & Office 365
  • GitHub: GitHub Organizations
  • Active Directory: Windows Active Directory
  • LDAP: LDAP Directory Services
  • SAML: SAML 2.0 Identity Providers
  • AWS: AWS IAM & Cognito
  • Okta: Okta Identity Cloud
  • Auth0: Auth0 Identity Platform

9+ Providers • SSO Ready • Free Included

And many more through standard protocols:

SAML 2.0 • OAuth 2.0 • OpenID Connect • LDAP/AD

Simple Configuration

Get started with OAuth/OIDC in minutes with a simple YAML configuration

auth:
  dex:
    issuer: https://dex.yourcompany.com
    connectors:
      - type: oidc
        name: Google
        config:
          issuer: https://accounts.google.com
          clientID: your-google-client-id
          clientSecret: your-google-client-secret
      - type: ldap
        name: Corporate Directory
        config:
          host: ldap.yourcompany.com:636
          insecureNoSSL: false
          bindDN: cn=admin,dc=company,dc=com

System Architecture

Enterprise-grade architecture designed for high availability, scalability, and performance

Client Layer

Applications & Services

  • 🌐 Web Applications: React, Vue, Angular
  • 📱 Mobile Apps: iOS, Android, React Native
  • ⚙️ Backend Services: Node.js, Python, Go
  • 🤖 AI Platforms: LangChain, AutoGPT

pLLM Gateway

Intelligent Routing Engine

Core Gateway: high-performance Go runtime

  • Router: Chi HTTP Router
  • Auth: JWT & RBAC
  • Cache: Redis Layer
  • Monitor: Metrics & Logs

Intelligent Load Balancer: Round Robin • Least Busy • Weighted

Provider Layer

LLM Service Providers

  • OpenAI: healthy, 99.9% uptime
  • Anthropic: healthy, 99.9% uptime
  • Azure OpenAI: degraded, 85.2% uptime
  • AWS Bedrock: healthy, 99.9% uptime
  • Google Vertex: healthy, 99.9% uptime
  • Llama: failed, 0% uptime

Data Flow & Features

Real-time monitoring and intelligent routing

  • Circuit Breaker: automatic failover protection
  • 💓 Health Checks: continuous monitoring
  • 🚦 Rate Limiting: traffic control & quotas
  • 📊 Analytics: performance insights
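
As a rough sketch, these protections are usually tuned per provider in the gateway configuration. The field names below are illustrative assumptions rather than the exact pLLM options.

yaml
# Hypothetical resilience settings - names are illustrative only.
resilience:
  health_checks:
    interval: 10s              # probe each provider every 10 seconds
    unhealthy_threshold: 3     # mark degraded after 3 consecutive failures
  circuit_breaker:
    error_rate: 0.5            # open the circuit above 50% errors
    cooldown: 30s              # retry the provider after 30 seconds
  rate_limits:
    requests_per_minute: 600   # per-key traffic cap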

Live Performance Metrics

1000+ Requests/sec • <1ms Latency • 99.9% Uptime • 65MB Memory

Intelligent Load Balancing

Choose from six routing strategies, each optimized for a different use case and workload pattern.

Round Robin

Even distribution across all providers

Best for: Balanced load scenarios

Least Busy

Routes to least loaded provider

Best for: Variable workloads

Weighted

Custom weight distribution

Best for: Tiered provider setups

Priority

Prefers high-priority providers

Best for: Cost optimization

Latency-Based

Routes to fastest responding provider

Best for: Performance critical

Usage-Based

Respects rate limits and quotas

Best for: Token management
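
Picking a strategy is typically a single routing setting. The snippet below is a sketch under that assumption, using weighted routing across two providers; the field names are illustrative, not the exact pLLM schema.

yaml
# Hypothetical routing configuration - field names are illustrative.
router:
  strategy: weighted            # e.g. round_robin, least_busy, weighted, priority, latency, usage
  targets:
    - provider: openai-primary
      weight: 70                # receives roughly 70% of traffic
    - provider: azure-fallback
      weight: 30                # receives roughly 30% of traffic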

Production-Ready Stack

Built with battle-tested technologies and modern best practices for enterprise reliability.

Chi Router

Lightning-fast HTTP routing and middleware

PostgreSQL

Reliable data persistence with GORM ORM

Redis

High-speed caching and rate limiting

Prometheus

Enterprise monitoring and metrics

Adaptive Request Flow

Interactive visualization of intelligent routing with automatic failover and circuit breaker protection.


Product Roadmap

Exciting features coming soon to make pLLM even more powerful and enterprise-ready.

Key Rotation & Secret Management

In Progress • Q1 2025

Automated key rotation and integration with external secret managers for enhanced security.

Key Features:

  • Automated API key rotation
  • AWS Secrets Manager integration
  • Azure Key Vault support
  • HashiCorp Vault connector
  • Secret versioning & rollback

Advanced Guardrails

Planned • Q1 2025

Content filtering, rate limiting, and safety mechanisms to ensure responsible AI usage.

Key Features:

  • Content filtering & moderation
  • PII detection & redaction
  • Toxicity & bias detection
  • Custom prompt validation
  • Usage policy enforcement

Enhanced Audit & Logging

Planned • Q2 2025

Comprehensive audit trails with retention policies and compliance reporting.

Key Features:

  • Detailed audit logs
  • Log retention policies
  • Compliance reporting
  • Real-time log streaming
  • Custom log formats

Shape Our Roadmap

Have a feature request or want to influence our development priorities? We'd love to hear from you.

Enterprise Support

While pLLM is open source and free, we offer professional support and custom solutions for enterprises with mission-critical requirements.

Community

Perfect for developers and small teams getting started with pLLM.

Free

What's included:

  • GitHub Issues & Discussions
  • Community Discord support
  • Documentation and guides
  • Best effort response time
  • Open source under MIT license

Limitations:

  • No SLA guarantees
  • Community-driven support
  • No priority bug fixes
Most Popular

Professional

For growing businesses that need reliable support and faster issue resolution.

Custom • Contact for pricing

What's included:

  • Priority email support
  • Guaranteed 24-hour response time
  • Deployment assistance
  • Configuration review
  • Priority bug fixes
  • Access to beta features

Limitations:

  • Business hours support only
  • Email-based communication

Enterprise

Comprehensive support for mission-critical deployments with custom requirements.

Custom • Contact for pricing

What's included:

  • Dedicated support engineer
  • Custom SLA (down to 2-hour response)
  • Phone & video call support
  • Custom feature development
  • Architecture consulting
  • On-site deployment assistance
  • Training and workshops
  • Priority feature requests

Ready for Enterprise Deployment?

Contact our team for custom integrations, dedicated support, and enterprise-grade deployment assistance.

Professional Consultation

Architecture review, deployment planning, and best practices guidance from our core team.

Custom Development

Tailored features, custom integrations, and specialized deployment configurations for your use case.

Dedicated Support

Priority support channels, SLA guarantees, and direct access to our engineering team.

Schedule a Consultation

Book a 30-minute call to discuss your requirements and explore how pLLM can fit your enterprise needs.

Book Consultation

Enterprise Inquiry

Submit a detailed inquiry for custom features, deployment assistance, or partnership opportunities.

Submit Inquiry Form

Frequently Asked Questions

Everything you need to know about pLLM, from technical details to enterprise support options.

Is pLLM really free and open source?

Yes, pLLM is completely free and open source under the MIT license. This means you can use it in commercial applications, modify the code, and deploy it anywhere without licensing fees. The only costs you'll incur are your infrastructure expenses (servers, cloud resources) and API costs from the LLM providers themselves (OpenAI, Anthropic, etc.).

Why choose pLLM over other LLM gateways?

pLLM is built in Go for superior performance and lower resource usage compared to Python-based solutions. Key advantages include: sub-millisecond routing overhead, native compilation for better performance, 3-6x lower memory usage, 20-50x faster startup times, and true parallel processing without GIL limitations. Plus, it's 100% OpenAI API compatible, requiring zero code changes to integrate.

Which LLM providers are supported?

pLLM supports all major LLM providers including OpenAI (GPT-3.5, GPT-4, GPT-4 Turbo), Anthropic Claude, Azure OpenAI, AWS Bedrock, Google Vertex AI, Groq, and Cohere. The unified API interface means you can switch between providers or use multiple providers simultaneously with intelligent routing and automatic failover.

Do you offer enterprise support?

Yes, we provide comprehensive enterprise support including dedicated support engineers, custom SLA agreements (down to 2-hour response times), priority bug fixes, custom feature development, architecture consulting, on-site deployment assistance, and training workshops. Enterprise support is available through custom pricing based on your specific requirements.

Can pLLM be deployed on-premise or in air-gapped environments?

Absolutely. pLLM is designed for flexible deployment scenarios including on-premise installations, air-gapped environments, and hybrid cloud setups. We provide Kubernetes manifests, Docker containers, and can assist with custom deployment configurations. The gateway can run entirely within your infrastructure while connecting to external LLM APIs or internal models.

What security features are included?

pLLM includes comprehensive security features: JWT-based authentication, Role-Based Access Control (RBAC), audit logging for compliance, OAuth/OIDC integration through Dex (supporting Google, Microsoft, LDAP, Active Directory), API key management, rate limiting, and request monitoring. All communications use TLS encryption, and we support enterprise identity providers.

How does pLLM perform at scale?

pLLM is optimized for high-performance scenarios: it handles thousands of concurrent connections with sub-millisecond routing overhead, efficient memory usage (50-80MB typical), fast startup times (<100ms), and intelligent caching to reduce API costs. The Go-based architecture provides significant performance advantages over interpreted language solutions.

Is there documentation and community support?

Yes! We have comprehensive documentation, GitHub Discussions for community support, a Discord server for real-time help, and regular updates on our roadmap. The open-source community actively contributes features and bug fixes. For enterprise customers, we provide dedicated documentation, training materials, and direct access to our engineering team.

Performance Benchmarks

Real-world performance data showing why Go-based pLLM outperforms interpreted gateway solutions.

  • 🚀 Requests/sec: pLLM 12,000+ vs. typical 2,500 (4.8x higher throughput)
  • P99 Latency: pLLM 0.8ms vs. typical 15ms (18.7x lower)
  • 💾 Memory Efficiency: pLLM 50-80MB vs. typical 200-400MB (4-8x less memory)
  • 🏃 Cold Start: pLLM <100ms vs. typical 2-5s (20-50x faster)

Performance Comparison

pLLM vs Typical Interpreted Gateway

  • Concurrent Connections: 900% more with pLLM (higher is better)
  • Memory Usage: 71% less with pLLM (lower is better)
  • Startup Time: 97% less with pLLM (lower is better)
  • Response Time (gateway overhead): 76% less with pLLM (lower is better)

Load Testing Results

Stress tested with 10,000 concurrent users making chat completion requests.

Test Configuration

Concurrent Users: 10,000
Request Type: Chat Completions
Test Duration: 10 minutes
Infrastructure: Single 4-core instance
Memory Limit: 1GB

Results

Success Rate
99.97%

Only 0.03% of requests failed over the full test run

Average Response Time
1.2ms

Gateway overhead only, excluding LLM processing

Memory Usage
78MB

Peak memory during 10K concurrent connections

Enterprise Scalability

Built-in scalability features that make pLLM ideal for high-volume production workloads.

True Parallelism

No GIL limitations - utilize all CPU cores effectively

Handle thousands of concurrent requests on a single instance

Memory Efficient

Native compilation with optimized memory management

3-6x less memory usage compared to interpreted alternatives

Instant Scaling

Sub-100ms startup enables aggressive auto-scaling

Scale from 0 to production load in milliseconds

Network Optimized

Efficient connection pooling and keep-alive management

Minimal network overhead with connection reuse
⚠️ Enterprise Performance Scaling

At high throughput and ultra-low latency targets, the bottleneck is often the LLM providers themselves, not the gateway. To achieve true enterprise scale:

  • Multiple LLM Deployments: Deploy several instances of the same model (e.g., 5-10 GPT-4 Azure OpenAI deployments)
  • Multi-Provider Redundancy: Use multiple AWS Bedrock accounts, Azure regions, or provider accounts
  • Geographic Distribution: Deploy models across regions for latency optimization

Why This Matters: A single LLM deployment typically handles 60-100 RPM. For 10,000+ concurrent users, you need multiple deployments of the same model to prevent provider-side bottlenecks. pLLM's adaptive routing automatically distributes load across all deployments.
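
In practice that means mapping one public model name to several upstream deployments so the router can spread and fail over traffic between them. The configuration below is a sketch with assumed field names, not the exact pLLM schema.

yaml
# Hypothetical model-to-deployment mapping - field names are illustrative.
models:
  - name: gpt-4                  # the single model name clients request
    deployments:
      - provider: azure-openai
        deployment: gpt-4-eastus # Azure deployment #1
      - provider: azure-openai
        deployment: gpt-4-westeu # Azure deployment #2
      - provider: openai
        model: gpt-4             # direct OpenAI as overflow capacity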

Get Started in Minutes

Choose your deployment method and get pLLM running in your environment quickly.

Deployment Options

Kubernetes with Helm

Production-ready deployment with auto-scaling

Production
High Availability
Auto-scaling
Monitoring
Production Ready
bash
# Add the Helm repository
helm repo add pllm https://andreimerfu.github.io/pllm
helm repo update

# Install with your configuration
helm install pllm pllm/pllm \
  --set pllm.secrets.jwtSecret="your-jwt-secret" \
  --set pllm.secrets.masterKey="sk-master-key" \
  --set pllm.secrets.openaiApiKey="sk-your-openai-key"

# Check status
kubectl get pods -l app.kubernetes.io/name=pllm

Docker Compose

Perfect for development and testing

Development
Quick Setup
Local Development
Easy Testing
Full Stack
bash
# Clone and setup
git clone https://github.com/andreimerfu/pllm.git
cd pllm

# Configure environment
cp .env.example .env
echo "OPENAI_API_KEY=sk-your-key-here" >> .env

# Launch pLLM
docker compose up -d

# Test it works
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}'

Binary Installation

Lightweight deployment for simple setups

Minimal
No Dependencies
Single Binary
Fast Startup
Cross Platform
bash
# Download latest release
wget https://github.com/andreimerfu/pllm/releases/latest/download/pllm-linux-amd64

# Make executable
chmod +x pllm-linux-amd64

# Set environment variables
export OPENAI_API_KEY=sk-your-key-here
export JWT_SECRET=your-jwt-secret
export MASTER_KEY=sk-master-key

# Run pLLM
./pllm-linux-amd64 server

Drop-in Integration

pLLM is 100% OpenAI compatible. Just change your base URL and you're ready to go.

Python

python
from openai import OpenAI

# Just change the base_url - that's it!
client = OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8080/v1"  # ← Point to pLLM
)

# Use exactly like OpenAI
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Node.js

javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'http://localhost:8080/v1'  // ← Point to pLLM
});

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{role: "user", content: "Hello!"}]
});

cURL

bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'