The gateway for LLMs, MCP & Agents
Governed by the AD groups you already have.
# Drop-in OpenAI replacement — just change the base URL
$ curl http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer $PLLM_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hi"}]}'
{
"provider": "openai",
"model": "gpt-4o",
"latency_ms": 142,
"route": "least-latency"
}
Five primitives. One control plane.
Every capability is a first-class registry managed by pLLM. Click any pillar to dive deeper.
MCP Gateway
One gateway for every MCP server.
Register MCP servers once, govern them everywhere. Health, versioning, per-tool policy, observability.
Registry
Agents, skills & prompts as real artifacts.
Semver-versioned. Team-owned. AD-scoped. Canaries, evals, and rollback — like you'd expect for code.
Governance
Your AD groups are your AI access policy.
No new permission system. pLLM reads Entra / AD / Okta and turns membership into fine-grained rules.
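To make the group-to-policy idea concrete, here is a minimal sketch of how identity-provider group membership could translate into model-access rules. The group names, policy fields, and functions below are illustrative assumptions, not pLLM's actual policy engine.

```python
# Hypothetical mapping from AD / Entra / Okta groups to access rules.
# Names and fields are illustrative only.
GROUP_POLICIES = {
    "ai-platform-admins": {"models": ["*"], "max_budget_usd": None},
    "data-science":       {"models": ["gpt-4o", "smart"], "max_budget_usd": 500},
    "engineering":        {"models": ["smart"], "max_budget_usd": 100},
}

def allowed_models(user_groups: list[str]) -> set[str]:
    """Union of model grants across all of a user's groups."""
    models: set[str] = set()
    for group in user_groups:
        models |= set(GROUP_POLICIES.get(group, {}).get("models", []))
    return models

def can_call(user_groups: list[str], model: str) -> bool:
    """A request is allowed if any group grants the model (or a wildcard)."""
    grants = allowed_models(user_groups)
    return "*" in grants or model in grants
```

The point is that the rules key off groups you already maintain in your directory, so granting AI access is a normal group-membership change.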
Guardrails
Protect every request and response.
Pluggable filters at four stages — pre, during, post, and logging. PII masking, prompt-injection detection, moderation.
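A pluggable filter stage can be pictured as a chain of functions applied to the text at each point in the request lifecycle. The registry and the PII filter below are a hand-rolled sketch, not pLLM's real guardrail API.

```python
import re
from typing import Callable

# Illustrative stage registry; pLLM's actual guardrail interface may differ.
Filter = Callable[[str], str]
STAGES: dict[str, list[Filter]] = {"pre": [], "during": [], "post": [], "logging": []}

def register(stage: str):
    """Decorator that attaches a filter to one of the four stages."""
    def wrap(fn: Filter) -> Filter:
        STAGES[stage].append(fn)
        return fn
    return wrap

@register("pre")
def mask_email_pii(text: str) -> str:
    """Mask email addresses before the prompt leaves your network."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)

def run_stage(stage: str, text: str) -> str:
    """Apply every registered filter for a stage, in order."""
    for fn in STAGES[stage]:
        text = fn(text)
    return text
```

Because filters are just ordered functions per stage, adding a new guardrail is registration, not a gateway change.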
Routing
Your app calls one slug. pLLM picks the model.
Real-time latency-aware routing, three-layer failover, self-healing circuit breakers. No retry code in your app.
A boring, fast, single binary.
pLLM is one Go service — no sidecars, no Python runtime, no surprise dependencies. Boring infrastructure so your AI platform can stop being the interesting thing.

Request flow
Auth → policy → router → guardrails → provider. Toggle a simulation mode to see failover in action.
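The five-stage order is the key invariant. As a toy illustration only (the stage bodies are stand-ins, not pLLM's code), the flow composes like this:

```python
# Sketch of the documented pipeline order:
# auth → policy → router → guardrails → provider.
def handle(request: dict) -> dict:
    trace: list[str] = []

    def auth(req):     trace.append("auth");       return req
    def policy(req):   trace.append("policy");     return req
    def route(req):    trace.append("router");     req["upstream"] = "openai"; return req
    def guard(req):    trace.append("guardrails"); return req
    def provider(req): trace.append("provider");   return {"ok": True, "trace": trace}

    for stage in (auth, policy, route, guard, provider):
        request = stage(request)
    return request
```

Every request passes through the same ordered stages, which is what makes the single audit trail possible.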
Running in two steps.
$ deploy → $ point your SDK → ship
# Clone and configure
git clone https://github.com/andreimerfu/pllm.git
cd pllm && cp .env.example .env
# Drop in your keys
echo "OPENAI_API_KEY=sk-..." >> .env
# Bring it up
docker compose up -d
# Smoke test
curl http://localhost:8080/v1/models

from openai import OpenAI
# Same SDK. Just flip the base_url.
client = OpenAI(
api_key="sk-...",
base_url="https://pllm.company.com/v1"
)
response = client.chat.completions.create(
model="smart", # pLLM route — picks the best model
messages=[{"role": "user", "content": "Hello"}],
)

Start shipping AI your security team can approve.
One gateway. One audit trail. Policies that live in the identity system you already trust.