How Model Routing Works

Inside GreatRouter's intent detection, model selection, and cost optimization engine.

Intent Detection

Every request to GreatRouter starts with intent detection. The router reads your prompt and classifies the task: chat, image generation, transcription, embeddings, translation, code generation, and more. This classification happens in milliseconds and determines which category of models the request is routed to. For example, a photorealistic image prompt goes to image models, while a code review request goes to reasoning models with function-calling capabilities.
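To make the classification step concrete, here is a minimal sketch in Python. The post does not describe how GreatRouter's classifier works internally, so this uses simple keyword matching as a stand-in. The `Intent` enum, the `KEYWORD_RULES` table, and the `detect_intent` function are hypothetical names for illustration, not part of GreatRouter's API.

```python
from enum import Enum


class Intent(Enum):
    CHAT = "chat"
    IMAGE_GENERATION = "image_generation"
    TRANSCRIPTION = "transcription"
    EMBEDDINGS = "embeddings"
    TRANSLATION = "translation"
    CODE_GENERATION = "code_generation"


# Hypothetical keyword rules (an assumption, not GreatRouter's real logic).
# Rules are checked in order and the first match wins.
KEYWORD_RULES = [
    (Intent.IMAGE_GENERATION, ("photorealistic", "illustration", "image of")),
    (Intent.TRANSCRIPTION, ("transcribe", "transcript")),
    (Intent.TRANSLATION, ("translate",)),
    (Intent.EMBEDDINGS, ("embedding",)),
    (Intent.CODE_GENERATION, ("code review", "refactor", "stack trace")),
]


def detect_intent(prompt: str) -> Intent:
    """Return the first intent whose keywords appear in the prompt."""
    text = prompt.lower()
    for intent, keywords in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    # Anything that matches no rule is treated as plain chat.
    return Intent.CHAT
```

A production classifier would more likely be a trained model than a keyword table. The shape of the step is the same either way: a prompt goes in, and one task category comes out to drive the rest of the routing.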

Model Selection

Once intent is classified, GreatRouter evaluates the available models in that category. It considers capability tags (vision, reasoning, web search, streaming), latency profiles, cost per token or per image, and current provider availability. The result is a recommendation of the best model for your specific request: not simply the most popular one, but the one that matches your quality expectations and budget.
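The selection criteria above can be sketched as a filter followed by a score. This is an illustrative sketch only: the `Model` fields, the example catalog, and the equal weighting of latency and cost are assumptions, since the post does not say how GreatRouter combines these signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    category: str
    tags: frozenset
    latency_ms: float
    cost_per_1k_tokens: float
    available: bool = True


def select_model(models, category, required_tags,
                 latency_weight=0.5, cost_weight=0.5) -> Optional[Model]:
    """Pick the available model in `category` with all required tags
    and the lowest weighted latency/cost score (weights are assumptions)."""
    candidates = [
        m for m in models
        if m.category == category and m.available and required_tags <= m.tags
    ]
    if not candidates:
        return None
    # Normalize each signal by the worst candidate so the two are comparable.
    max_latency = max(m.latency_ms for m in candidates)
    max_cost = max(m.cost_per_1k_tokens for m in candidates)

    def score(m: Model) -> float:
        latency = m.latency_ms / max_latency if max_latency else 0.0
        cost = m.cost_per_1k_tokens / max_cost if max_cost else 0.0
        return latency_weight * latency + cost_weight * cost

    return min(candidates, key=score)


# Hypothetical catalog; names and numbers are made up for illustration.
CATALOG = [
    Model("fast-small", "chat", frozenset({"streaming"}), 200, 0.5),
    Model("reasoner-large", "chat", frozenset({"reasoning", "streaming"}), 900, 6.0),
    Model("reasoner-mid", "chat", frozenset({"reasoning", "streaming", "vision"}), 600, 2.0),
    Model("reasoner-down", "chat", frozenset({"reasoning"}), 100, 0.1, available=False),
]

best = select_model(CATALOG, "chat", frozenset({"reasoning"}))
```

Here `best` is the `reasoner-mid` entry. The cheapest and fastest reasoning model is skipped because its provider is marked unavailable, and `reasoner-large` loses on both latency and cost.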

Cost Optimization

GreatRouter supports two cost optimization modes. In price-optimized mode, the router automatically selects the cheapest capable model for each request. With a budget_dollars cap, you set a maximum cost per request and the router keeps each request within that ceiling: if the preferred model would exceed the cap, the router falls back to a more affordable alternative automatically.
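The two modes can be sketched as one small function over a ranked list of capable models. Only the `budget_dollars` name comes from the text above. The function name, the `price_optimized` flag, the cost figures, and the choice to return `None` when nothing fits the cap are assumptions made for this example.

```python
def pick_within_budget(ranked_costs, budget_dollars=None, price_optimized=False):
    """Choose a model name from `ranked_costs`, a list of
    (model_name, estimated_cost_dollars) pairs ordered best-first.

    - budget_dollars: optional per-request ceiling; pricier models are skipped.
    - price_optimized: if True, take the cheapest remaining model instead of
      the highest-ranked one.
    Returns None when no model fits (an assumption; the real behavior
    in that case is not documented in this post).
    """
    if budget_dollars is not None:
        affordable = [(n, c) for n, c in ranked_costs if c <= budget_dollars]
    else:
        affordable = list(ranked_costs)
    if not affordable:
        return None
    if price_optimized:
        return min(affordable, key=lambda item: item[1])[0]
    # Otherwise keep the preference order: the first affordable model wins.
    return affordable[0][0]


# Hypothetical per-request cost estimates, best model first.
RANKED = [("premium", 0.040), ("standard", 0.012), ("economy", 0.003)]

preferred = pick_within_budget(RANKED)                        # "premium"
capped = pick_within_budget(RANKED, budget_dollars=0.02)      # "standard"
cheapest = pick_within_budget(RANKED, price_optimized=True)   # "economy"
```

With a cap of 0.02 dollars, the preferred model is over budget, so the sketch falls back to the next model in the ranking that fits.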

Observability and Fallback

Every routed request returns metadata: the model used, latency, cost, and capability tags. You can inspect these in the dashboard or stream them to your observability stack. When a provider goes down, GreatRouter automatically fails over to the next best model, with no configuration required. The result is a single API that can stay available even when an individual provider does not.
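The failover and metadata behavior can be sketched as a loop over a ranked list of models. This is a simplified illustration under stated assumptions: `ProviderUnavailable`, `RoutedResponse`, `route_with_failover`, and the stub providers are hypothetical names, and the metadata fields simply mirror the four listed above.

```python
import time
from dataclasses import dataclass


class ProviderUnavailable(Exception):
    """Raised by a (hypothetical) provider call when the provider is down."""


@dataclass
class RoutedResponse:
    output: str
    model: str
    latency_ms: float
    cost_dollars: float
    capability_tags: tuple


def route_with_failover(prompt, ranked_models):
    """Try each model best-first and return the first successful response.

    `ranked_models` is a list of dicts with keys 'name', 'call',
    'cost_dollars', and 'tags' (a made-up shape for this sketch).
    """
    errors = []
    for model in ranked_models:
        start = time.perf_counter()
        try:
            output = model["call"](prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next best model.
            errors.append((model["name"], str(exc)))
            continue
        latency_ms = (time.perf_counter() - start) * 1000.0
        return RoutedResponse(output, model["name"], latency_ms,
                              model["cost_dollars"], tuple(model["tags"]))
    raise RuntimeError(f"all providers failed: {errors}")


def _primary(prompt):
    # Stub that simulates an outage.
    raise ProviderUnavailable("primary provider is down")


def _backup(prompt):
    # Stub that simulates a healthy provider.
    return f"echo: {prompt}"


response = route_with_failover("hello", [
    {"name": "primary-model", "call": _primary, "cost_dollars": 0.004, "tags": ["reasoning"]},
    {"name": "backup-model", "call": _backup, "cost_dollars": 0.002, "tags": ["streaming"]},
])
```

In this run the primary stub raises, so `response.model` is `backup-model`, and the returned object carries the latency, cost, and tags of the model that actually served the request.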
