01 Why route every call through one gateway
When every service calls LLM providers directly, you get scattered API keys, no shared cost visibility, and painful model swaps. A single gateway fixes all three: one place for secrets, one tracing and cost surface, and the freedom to change models without touching the services that use them.
02 A pure-data codec per provider
The trick is a two-layer abstraction. A generic HTTP wrapper handles transport; a per-provider codec — pure data in, pure data out — encodes the request and decodes the response. Because the codec does no I/O, it is trivially unit-testable offline.
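type ProviderCodec interface {
	// EncodeRequest turns the canonical request into the provider's wire format.
	EncodeRequest(req ChatRequest) ([]byte, error)
	// DecodeResponse parses the provider's raw response body into the canonical response.
	DecodeResponse(raw []byte) (ChatResponse, error)
	// Endpoint returns the URL to call for the given model.
	Endpoint(model string) string
	// AuthHeader returns the header name and value that carry the API key.
	AuthHeader(key string) (string, string)
}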
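To make the interface concrete, here is a minimal sketch of one codec, written against OpenAI's chat completions wire format. The fields of ChatRequest and ChatResponse (Messages, Text), the Message type, and the name openAICodec are assumptions for illustration; the article itself only shows ChatRequest.Model. Nothing here touches the network, so both directions can be exercised with plain byte slices.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Message, ChatRequest and ChatResponse are assumed shapes for this sketch.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string
	Messages []Message
}

type ChatResponse struct {
	Text string
}

// ProviderCodec repeats the interface from the article so the sketch compiles alone.
type ProviderCodec interface {
	EncodeRequest(req ChatRequest) ([]byte, error)
	DecodeResponse(raw []byte) (ChatResponse, error)
	Endpoint(model string) string
	AuthHeader(key string) (string, string)
}

// openAICodec is a hypothetical codec for an OpenAI-style chat endpoint.
// It performs no I/O: structs in, bytes out, and the reverse.
type openAICodec struct{}

// Compile-time check that openAICodec satisfies the interface.
var _ ProviderCodec = openAICodec{}

func (openAICodec) EncodeRequest(req ChatRequest) ([]byte, error) {
	if req.Model == "" {
		return nil, errors.New("model is required")
	}
	return json.Marshal(map[string]any{
		"model":    req.Model,
		"messages": req.Messages,
	})
}

func (openAICodec) DecodeResponse(raw []byte) (ChatResponse, error) {
	// Only the fields this sketch needs are declared; the rest are ignored.
	var wire struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.Unmarshal(raw, &wire); err != nil {
		return ChatResponse{}, err
	}
	if len(wire.Choices) == 0 {
		return ChatResponse{}, errors.New("response has no choices")
	}
	return ChatResponse{Text: wire.Choices[0].Message.Content}, nil
}

// Endpoint ignores the model because this API selects it in the request body.
func (openAICodec) Endpoint(model string) string {
	return "https://api.openai.com/v1/chat/completions"
}

func (openAICodec) AuthHeader(key string) (string, string) {
	return "Authorization", "Bearer " + key
}
```

A unit test can feed a recorded response body straight into DecodeResponse and compare the result, with no HTTP server or API key involved.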
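// One generic caller for every provider.
func (g *Gateway) Chat(ctx context.Context, p Provider, req ChatRequest) (ChatResponse, error) {
	codec, ok := g.codecs[p]
	if !ok {
		return ChatResponse{}, fmt.Errorf("no codec registered for provider %q", p)
	}
	body, err := codec.EncodeRequest(req)
	if err != nil { return ChatResponse{}, err }
	// A two-value call cannot be passed inline next to other arguments, so unpack it first.
	// g.post is assumed to take the header name and value as separate parameters.
	header, value := codec.AuthHeader(g.keys[p])
	raw, err := g.post(ctx, codec.Endpoint(req.Model), header, value, body)
	if err != nil { return ChatResponse{}, err }
	return codec.DecodeResponse(raw)
}
03 Normalizing the fiddly part: tool calls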
The genuinely hard part is that providers disagree on the shapes that matter most. Tool calls are the worst offender — Anthropic returns tool_use blocks, OpenAI returns tool_calls, Gemini returns functionCall. The codec collapses them into one internal type so callers never branch on provider.
// Canonical shape the rest of the platform sees.
type ToolCall struct {
	ID    string
	Name  string
	Input map[string]any
}
// Anthropic: content blocks of type "tool_use".
func decodeAnthropicTools(blocks []anthropicBlock) []ToolCall {
	var out []ToolCall
	for _, b := range blocks {
		if b.Type == "tool_use" {
			out = append(out, ToolCall{ID: b.ID, Name: b.Name, Input: b.Input})
		}
	}
	return out
}
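Gemini, the third shape named above, could be handled the same way. The sketch below assumes a trimmed-down view of the parts in a Gemini response, where a function call carries a name and an args object; the type names geminiPart and geminiFunctionCall are hypothetical. It also assumes the response carries no call ID, so one is synthesized from the part index. A real codec might prefer a provider-supplied ID when one is present.

```go
package main

import "fmt"

// ToolCall mirrors the canonical type from the article.
type ToolCall struct {
	ID    string
	Name  string
	Input map[string]any
}

// geminiPart is an assumed, minimal view of one entry in a response's parts list.
// A part holds either text or a function call.
type geminiPart struct {
	Text         string              `json:"text,omitempty"`
	FunctionCall *geminiFunctionCall `json:"functionCall,omitempty"`
}

type geminiFunctionCall struct {
	Name string         `json:"name"`
	Args map[string]any `json:"args"`
}

// decodeGeminiTools keeps only the parts that carry a function call.
func decodeGeminiTools(parts []geminiPart) []ToolCall {
	var out []ToolCall
	for i, p := range parts {
		if p.FunctionCall == nil {
			continue // plain text part
		}
		out = append(out, ToolCall{
			// Assumption: no ID in the response, so derive a stable one from the part index.
			ID:    fmt.Sprintf("gemini-call-%d", i),
			Name:  p.FunctionCall.Name,
			Input: p.FunctionCall.Args,
		})
	}
	return out
}
```

Unlike the OpenAI shape, the arguments here arrive as a JSON object rather than a JSON string, so no second decoding step is needed.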
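// OpenAI: tool_calls with JSON-string arguments.
func decodeOpenAITools(calls []openAIToolCall) ([]ToolCall, error) {
	var out []ToolCall
	for _, c := range calls {
		var input map[string]any
		// Arguments is model-generated JSON text and can be malformed, so surface the error
		// instead of silently passing a nil Input downstream.
		if err := json.Unmarshal([]byte(c.Function.Arguments), &input); err != nil {
			return nil, fmt.Errorf("tool call %s: decode arguments: %w", c.ID, err)
		}
		out = append(out, ToolCall{ID: c.ID, Name: c.Function.Name, Input: input})
	}
	return out, nil
}
04 Cost and observability for free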
Because every call funnels through one place, logging usage and cost becomes a single middleware — no per-service work. Track input and output tokens, apply per-model (and cache-aware) rates, and attribute spend to the calling service. Now “which feature is burning tokens?” is a query, not a guess.
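Cache-aware pricing can be sketched as a pure function over usage and a rate table. The CachedIn and CachedInputTokens fields and the usageCost helper are assumptions for illustration, as is the convention that cached tokens are counted inside InputTokens; providers report this differently, so a codec would need to normalize it first.

```go
package main

import "fmt"

// Usage is an assumed shape. CachedInputTokens is treated as a subset of InputTokens.
type Usage struct {
	InputTokens       int
	CachedInputTokens int
	OutputTokens      int
}

// Rate holds per-token prices. CachedIn is the discounted price for cached input.
type Rate struct {
	In       float64
	CachedIn float64
	Out      float64
}

// usageCost prices one call. It returns an error for an unknown model rather
// than recording a silent zero.
func usageCost(rates map[string]Rate, model string, u Usage) (float64, error) {
	rate, ok := rates[model]
	if !ok {
		return 0, fmt.Errorf("no rate configured for model %q", model)
	}
	if u.CachedInputTokens > u.InputTokens {
		return 0, fmt.Errorf("cached tokens (%d) exceed input tokens (%d)",
			u.CachedInputTokens, u.InputTokens)
	}
	uncached := u.InputTokens - u.CachedInputTokens
	cost := float64(uncached)*rate.In +
		float64(u.CachedInputTokens)*rate.CachedIn +
		float64(u.OutputTokens)*rate.Out
	return cost, nil
}
```

Keeping the arithmetic in a pure function means pricing changes can be tested without a database or metrics backend.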
func (g *Gateway) record(p Provider, model, service string, u Usage) {
	rate := g.rates[model] // per-model pricing, cache-aware
	cost := float64(u.InputTokens)*rate.In + float64(u.OutputTokens)*rate.Out
	g.metrics.Observe(service, model, cost, u)
	// The insert error is ignored here; log it if lost usage rows matter.
	_ = g.db.InsertUsage(service, string(p), model, u, cost)
}
05 The payoff
Swapping a model for a cheaper or faster one becomes a config change, not a deploy. A/B testing models is trivial. Secrets live in one service. And every token of spend is attributable. The abstraction cost — a codec per provider — is paid once and amortized across every feature that touches an LLM.
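One way to make a model swap a config change is a route table that maps each feature to a provider and model, loaded at startup or on reload. The names Route, Routes, ParseRoutes, and Resolve, along with the JSON layout, are assumptions for illustration and not part of the gateway described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Route names the provider and model behind one logical feature.
type Route struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
}

// Routes maps a feature name (for example "summarize") to its current route.
type Routes map[string]Route

// ParseRoutes decodes and validates a route table from JSON config.
func ParseRoutes(raw []byte) (Routes, error) {
	var r Routes
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, fmt.Errorf("parse routes: %w", err)
	}
	for feature, route := range r {
		if route.Provider == "" || route.Model == "" {
			return nil, fmt.Errorf("route %q: provider and model are required", feature)
		}
	}
	return r, nil
}

// Resolve looks up the route for a feature, failing loudly on unknown names.
func (r Routes) Resolve(feature string) (Route, error) {
	route, ok := r[feature]
	if !ok {
		return Route{}, fmt.Errorf("no route for feature %q", feature)
	}
	return route, nil
}
```

Callers ask for a feature by name, so changing the model behind it means editing one config entry. An A/B test could extend the same table with a second route and a traffic weight per feature.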