Building a Provider-Agnostic LLM Gateway — Learn

01Why route every call through one gateway

When every service calls LLM providers directly, you get scattered API keys, no shared cost visibility, and painful model swaps. A single gateway fixes all three: one place for secrets, one tracing and cost surface, and the freedom to change models without touching the services that use them.

02A pure-data codec per provider

The trick is a two-layer abstraction. A generic HTTP wrapper handles transport; a per-provider codec — pure data in, pure data out — encodes the request and decodes the response. Because the codec does no I/O, it is trivially unit-testable offline.

type ProviderCodec interface {
    EncodeRequest(req ChatRequest) ([]byte, error)
    DecodeResponse(raw []byte) (ChatResponse, error)
    Endpoint(model string) string
    AuthHeader(key string) (string, string)
}

// One generic caller for every provider.
func (g *Gateway) Chat(ctx context.Context, p Provider, req ChatRequest) (ChatResponse, error) {
    codec := g.codecs[p]
    body, err := codec.EncodeRequest(req)
    if err != nil { return ChatResponse{}, err }

    raw, err := g.post(ctx, codec.Endpoint(req.Model), codec.AuthHeader(g.keys[p]), body)
    if err != nil { return ChatResponse{}, err }

    return codec.DecodeResponse(raw)
}

03Normalizing the fiddly part: tool calls

The genuinely hard part is that providers disagree on the shapes that matter most. Tool calls are the worst offender — Anthropic returns tool_use blocks, OpenAI returns tool_calls, Gemini returns functionCall. The codec collapses them into one internal type so callers never branch on provider.

// Canonical shape the rest of the platform sees.
type ToolCall struct {
    ID    string
    Name  string
    Input map[string]any
}

// Anthropic: content blocks of type "tool_use".
func decodeAnthropicTools(blocks []anthropicBlock) []ToolCall {
    var out []ToolCall
    for _, b := range blocks {
        if b.Type == "tool_use" {
            out = append(out, ToolCall{ID: b.ID, Name: b.Name, Input: b.Input})
        }
    }
    return out
}

// OpenAI: tool_calls with JSON-string arguments.
func decodeOpenAITools(calls []openAIToolCall) []ToolCall {
    var out []ToolCall
    for _, c := range calls {
        var input map[string]any
        _ = json.Unmarshal([]byte(c.Function.Arguments), &input)
        out = append(out, ToolCall{ID: c.ID, Name: c.Function.Name, Input: input})
    }
    return out
}

04Cost and observability for free

Because every call funnels through one place, logging usage and cost becomes a single middleware — no per-service work. Track input and output tokens, apply per-model (and cache-aware) rates, and attribute spend to the calling service. Now “which feature is burning tokens?” is a query, not a guess.

func (g *Gateway) record(p Provider, model, service string, u Usage) {
    rate := g.rates[model] // per-model pricing, cache-aware
    cost := float64(u.InputTokens)*rate.In + float64(u.OutputTokens)*rate.Out
    g.metrics.Observe(service, model, cost, u)
    _ = g.db.InsertUsage(service, string(p), model, u, cost)
}

05The payoff

Swapping a model for a cheaper or faster one becomes a config change, not a deploy. A/B testing models is trivial. Secrets live in one service. And every token of spend is attributable. The abstraction cost — a codec per provider — is paid once and amortized across every feature that touches an LLM.