Intelligent Routing (Stable)

Route requests across models with a ModelArray, which combines a load-balancing strategy, minimal health checks, and routing metrics.

Basic usage (routing_mvp)

use ai_lib::{AiClientBuilder, ChatCompletionRequest, Message, Provider, Role};
use ai_lib::types::common::Content;
use ai_lib::provider::models::{ModelArray, ModelEndpoint, LoadBalancingStrategy};

// Assumes a Tokio runtime: `?` and `.await` need an async function that returns a Result.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a named array and choose how requests are spread across its endpoints.
    let mut array = ModelArray::new("prod").with_strategy(LoadBalancingStrategy::RoundRobin);
    array.add_endpoint(ModelEndpoint {
        name: "groq-70b".to_string(),
        model_name: "llama-3.3-70b-versatile".to_string(),
        url: "https://api.groq.com".to_string(),
        weight: 1.0,
        healthy: true,
        connection_count: 0,
    });

    // Attach the array to the client so it can route requests.
    let client = AiClientBuilder::new(Provider::Groq)
        .with_routing_array(array)
        .build()?;

    // Pass the sentinel model name "__route__" to let the router choose the model.
    let req = ChatCompletionRequest::new(
        "__route__".to_string(),
        vec![Message { role: Role::User, content: Content::new_text("Say hi"), function_call: None }]
    );
    let resp = client.chat_completion(req).await?;
    // The response reports which model was actually selected.
    println!("selected model: {}", resp.model);
    Ok(())
}

Health checks & metrics

  • Minimal health check: before selecting an endpoint, the router probes {base_url} (or {base_url}/models for OpenAI-compatible endpoints).
  • Routing metrics (emitted when the routing_mvp feature is enabled):
    • routing_mvp.request
    • routing_mvp.selected
    • routing_mvp.health_fail
    • routing_mvp.fallback_default
    • routing_mvp.no_endpoint
    • routing_mvp.missing_array
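
The interaction between health checks, selection, and counters can be pictured with a small standalone sketch. It does not use ai-lib: the Endpoint and Router types are hypothetical, the healthy flag stands in for the HTTP probe, and the points at which each counter is incremented are assumptions based on the metric names. The fallback_default and missing_array cases are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an endpoint; this is not ai-lib's `ModelEndpoint`.
#[derive(Debug, Clone)]
struct Endpoint {
    name: String,
    /// Stands in for the result of probing the endpoint's base URL.
    healthy: bool,
}

/// Hypothetical router: a round-robin cursor plus named counters.
struct Router {
    endpoints: Vec<Endpoint>,
    cursor: usize,
    counters: HashMap<&'static str, u64>,
}

impl Router {
    fn new(endpoints: Vec<Endpoint>) -> Self {
        Router { endpoints, cursor: 0, counters: HashMap::new() }
    }

    /// Adds one to the named counter.
    fn incr(&mut self, key: &'static str) {
        *self.counters.entry(key).or_insert(0) += 1;
    }

    /// Reads a counter, treating a missing entry as zero.
    fn count(&self, key: &str) -> u64 {
        self.counters.get(key).copied().unwrap_or(0)
    }

    /// Walks the endpoints once in round-robin order and returns the name
    /// of the first healthy one, or `None` if none passes the check.
    fn select(&mut self) -> Option<String> {
        self.incr("routing_mvp.request");
        let n = self.endpoints.len();
        for _ in 0..n {
            let idx = self.cursor % n;
            self.cursor = (self.cursor + 1) % n;
            if self.endpoints[idx].healthy {
                self.incr("routing_mvp.selected");
                return Some(self.endpoints[idx].name.clone());
            }
            // Assumed meaning: one count per endpoint skipped as unhealthy.
            self.incr("routing_mvp.health_fail");
        }
        // Assumed meaning: no endpoint was usable for this request.
        self.incr("routing_mvp.no_endpoint");
        None
    }
}
```

With an unhealthy endpoint "a" followed by a healthy endpoint "b", each call to select skips "a", records one health_fail, and returns "b". An empty router records no_endpoint and returns None.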

Notes

  • This is an MVP: it covers round-robin and weighted selection with minimal health checks. Adaptive feedback loops may evolve in PRO.
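
As an illustration of what weighted selection means here, the sketch below picks an index in proportion to per-endpoint weights. It is one common approach (a cumulative-weight pick) and not necessarily how ai-lib implements its weighted strategy. The function name pick_weighted and the caller-supplied point parameter are hypothetical; in practice point would come from a random number generator.

```rust
/// Picks an index in proportion to `weights` (hypothetical helper, not an ai-lib API).
///
/// `point` is a caller-supplied value in [0, 1); supplying it from outside
/// keeps the sketch dependency-free and deterministic.
/// Entries whose weight is not strictly positive are never chosen.
fn pick_weighted(weights: &[f64], point: f64) -> Option<usize> {
    let total: f64 = weights.iter().filter(|w| **w > 0.0).sum();
    if !(total > 0.0) {
        return None; // empty slice or no usable weights
    }
    // Map the point onto the [0, total) range of cumulative weights.
    let target = point.clamp(0.0, 1.0) * total;
    let mut acc = 0.0;
    let mut last = None;
    for (i, &w) in weights.iter().enumerate() {
        if !(w > 0.0) {
            continue;
        }
        acc += w;
        last = Some(i);
        if target < acc {
            return Some(i);
        }
    }
    // Reached only when `point` is 1.0 or rounding pushes `target` to `total`.
    last
}
```

With weights [1.0, 3.0], points below 0.25 map to index 0 and the rest map to index 1, so the second endpoint receives roughly three times as many requests when point is drawn uniformly.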
Build: 3de64ed · 2025-09-09T12:50:59.664Z · v0.21