Routing

Every request to
the right model.

Explane evaluates cost, latency, and quality in real time — dispatching each AI call to the optimal provider. No config files. No code changes. Just results.

Live Routing Decisions · < 2ms decision time
Request | Type | Tokens | → Model | Reason | Cost | ms
4,218 req / min · 99.99% uptime · 0 drops
Policy Engine

Routing rules that
think in conditions.

Write routing policies once. Explane evaluates them against every request in under 2ms — matching task type, cost budget, quality requirement, or any custom metadata you pass.

Condition               | Volume   | Routes to        | Avg cost
task_type = "summarize" | 1,248/hr | claude-haiku     | $0.00006
task_type = "code"      | 892/hr   | gpt-4o-mini      | $0.00024
quality 0.90            | 2,104/hr | gpt-4o           | $0.0042
cost < $0.001           | 486/hr   | gemini-2.0-flash | $0.00019
DEFAULT                 | 1,766/hr | gpt-4o           | $0.0038
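
The policy table above can be read as an ordered, first-match rule list. The following is a minimal sketch of that idea, not Explane's actual engine: the `Rule` type, the request fields `quality_min` and `max_cost`, and the reading of the quality row as "0.90 or higher" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a predicate over request metadata plus a target model."""
    label: str
    matches: Callable[[dict], bool]
    model: str


# Rules mirror the rows of the table; order matters because the first match wins.
RULES = [
    Rule('task_type = "summarize"', lambda r: r.get("task_type") == "summarize", "claude-haiku"),
    Rule('task_type = "code"', lambda r: r.get("task_type") == "code", "gpt-4o-mini"),
    # Assumption: the quality row is a floor of 0.90 requested by the caller.
    Rule("quality floor", lambda r: r.get("quality_min", 0.0) >= 0.90, "gpt-4o"),
    # Assumption: the cost row is a per-request budget supplied by the caller.
    Rule("cost ceiling", lambda r: r.get("max_cost", float("inf")) < 0.001, "gemini-2.0-flash"),
]
DEFAULT_MODEL = "gpt-4o"


def route(request: dict) -> str:
    """Return the model of the first matching rule, or the default."""
    for rule in RULES:
        if rule.matches(request):
            return rule.model
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(route({"task_type": "summarize"}))
    print(route({"max_cost": 0.0005}))
    print(route({}))
```

A first-match list keeps evaluation cheap and predictable: each request walks at most a handful of predicates, and the DEFAULT row guarantees every request gets a model.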
Cost Optimization

Route by cost, latency,
and quality requirements.

A simple summarization task routes to a fast, affordable model. Complex reasoning goes to your flagship. Explane evaluates each request in real time against the cost, latency, and quality targets you set, so there are no per-request rules to hand-code and no dashboards to watch.

  • Optimize for cost, latency, or quality per request type
  • Set per-request cost ceilings and quality floors
  • Weighted multi-objective routing for mixed workloads
routing.policy.yaml
route:
  optimize: cost
  constraints:
    max_latency_ms: 500
    quality_min: 0.85
  prefer: anthropic
  fallback:
    - gemini-2.0-flash
    - gpt-4o-mini
claude-3-haiku-20240307 $0.00006 94ms
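
To make the policy above concrete, here is a minimal sketch of what `optimize: cost` under `max_latency_ms` and `quality_min` constraints could mean: filter out candidates that violate a constraint, then take the cheapest survivor. The `Candidate` type, the model names, and every number in the usage example are illustrative assumptions, and the `prefer` and `fallback` keys are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-model estimates used for one routing decision."""
    model: str
    cost_usd: float
    latency_ms: float
    quality: float


def pick_model(
    candidates: Sequence[Candidate],
    max_latency_ms: float,
    quality_min: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that satisfies both constraints, or None."""
    eligible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.quality >= quality_min
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_usd)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements of any real model.
    pool = [
        Candidate("small", cost_usd=0.0001, latency_ms=100, quality=0.80),
        Candidate("mid", cost_usd=0.0005, latency_ms=300, quality=0.88),
        Candidate("large", cost_usd=0.0040, latency_ms=450, quality=0.95),
    ]
    choice = pick_model(pool, max_latency_ms=500, quality_min=0.85)
    print(choice.model if choice else "no eligible model")
```

Treating latency and quality as hard constraints and cost as the objective keeps the decision easy to explain. A weighted multi-objective variant would instead score each candidate on all three dimensions and pick the best score.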
High Availability

Never drop a request.
Ever.

When a provider rate-limits or goes down, Explane detects it within seconds and instantly reroutes to your fallback chain. Users see zero errors. Engineers sleep at night.

  • Automatic failover in under 50ms — no webhook needed
  • Configurable fallback chains per route, team, or model
  • Exponential backoff with jitter on retries
provider-failover · live incident triggered 2m ago
Timeline: Normal (−4:20) → Degraded (−2:41) → Switch (−2:07) → Stable (Now)
Primary: OpenAI GPT-4o · Degraded
Fallback 1: Anthropic Claude · Active ← routing here
Fallback 2: Google Gemini 2.0 · Healthy
Auto-switched 127s ago · 0 requests dropped · 0 user errors
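
The failover behavior described in this section (a fallback chain plus exponential backoff with jitter on retries) can be sketched as follows. This is an illustration under stated assumptions rather than Explane's implementation: `ProviderError`, `backoff_delay`, `call_with_failover`, and the retry defaults are hypothetical names and values, and the jitter shown is the common "full jitter" variant.

```python
import random
import time
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider rate-limits or is down."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    request: str,
    retries_per_provider: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each provider in order; retry with backoff before moving down the chain."""
    last_error = None
    for name, send in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, send(request)
            except ProviderError as exc:
                last_error = exc
                # Only wait if another attempt on this provider remains.
                if attempt < retries_per_provider:
                    sleep(backoff_delay(attempt))
    raise ProviderError("all providers in the fallback chain failed") from last_error


if __name__ == "__main__":
    def degraded(_request: str) -> str:
        raise ProviderError("429 rate limited")

    def healthy(request: str) -> str:
        return "ok: " + request

    print(call_with_failover([("primary", degraded), ("fallback-1", healthy)], "hello"))
```

Jitter spreads retries out in time so that many clients hitting the same degraded provider do not retry in lockstep. A production router would typically also track provider health so that a known-degraded primary is skipped without retrying it on every request.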
Scale Without Limits

Distribute load across
providers and API keys.

Stay under rate limits by spreading requests across multiple API keys and provider accounts. Scale to millions of requests per hour without engineering custom solutions.

  • Multi-key load balancing with rate-limit awareness
  • Weighted distribution — give preferred models more traffic
  • Priority queue: never let background jobs starve your users
Traffic distribution · last 60s · live
OpenAI GPT-4o · 48% · 2,024 req / min
Anthropic Claude 3.5 · 33% · 1,392 req / min
Google Gemini 2.0 · 19% · 802 req / min
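
One way to picture weighted distribution with rate-limit awareness is a weighted random choice over the keys that still have headroom in the current window. The sketch below assumes a hypothetical `KeySlot` record and a simple per-minute counter; resetting the counter at the end of each window is left out, and none of this is taken from Explane's internals.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeySlot:
    """Hypothetical bookkeeping for one API key or provider account."""
    key_id: str
    weight: float          # share of traffic this key should receive
    limit_per_min: int     # provider rate limit for this key
    used_this_min: int = 0

    @property
    def has_capacity(self) -> bool:
        return self.used_this_min < self.limit_per_min


def pick_key(slots: List[KeySlot], rng=random) -> Optional[KeySlot]:
    """Pick a key by weight among those under their rate limit; None if all are full."""
    available = [s for s in slots if s.has_capacity]
    if not available:
        return None
    chosen = rng.choices(available, weights=[s.weight for s in available], k=1)[0]
    chosen.used_this_min += 1
    return chosen


if __name__ == "__main__":
    # Weights echo the 48 / 33 / 19 split shown above; the limits are made up.
    slots = [
        KeySlot("openai-key-1", weight=48, limit_per_min=3000),
        KeySlot("anthropic-key-1", weight=33, limit_per_min=2000),
        KeySlot("google-key-1", weight=19, limit_per_min=1000),
    ]
    picked = pick_key(slots)
    print(picked.key_id if picked else "all keys at their limit")
```

Because exhausted keys drop out of the candidate list, traffic shifts to the remaining keys automatically instead of producing rate-limit errors.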
Instant Integration

One endpoint.
All your providers.

Point your existing OpenAI SDK at api.explane.ai/v1 and you're done. No new SDKs. No refactoring. Explane is fully OpenAI API compatible.

  • OpenAI-compatible — works with any existing client
  • Two-line migration: swap api_key and base_url
  • Node.js, Python, Go, REST — all covered
Before
# Your existing setup, unchanged
from openai import OpenAI

client = OpenAI(
    api_key="sk-proj-..."
)
After — that's it
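from openai import OpenAI

client = OpenAI(
    api_key="ex_live_...",                 # Explane key replaces the provider key
    base_url="https://api.explane.ai/v1"   # all requests now go through Explane
)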
# All 50+ providers, automatic routing,
# failover, and observability — unlocked.

Ready to start routing?

Join thousands of teams shipping AI products with Explane.