# Observability
ai-lib provides comprehensive observability features for monitoring and debugging AI applications in production.
## Metrics System
The `metrics` module exposes traits (`Metrics`, `Timer`) for collecting performance and usage metrics. A no-op implementation ships by default; plug in your own collector by implementing these traits.
### Core Metrics
- `request_count`: Total number of AI requests
- `latency`: Request duration (histogram; p50/p95/p99)
- `token_usage`: Tokens consumed per request
- `error_count`: Errors by class and type
- `provider_success_rate`: Success rate per provider
### Built-in Metrics

```rust
use ai_lib::metrics::{Metrics, Timer};

// Request counting
metrics.incr_counter("ai_requests_total", 1).await;

// Latency histogram: start_timer returns Option<Box<dyn Timer + Send>>,
// so handle the None case before stopping the timer
if let Some(timer) = metrics.start_timer("ai_request_duration").await {
    // ... execute request ...
    timer.stop();
}

// Error tracking
metrics.incr_counter("ai_errors_total", 1).await;

// Token usage
metrics.incr_counter("ai_tokens_used", token_count).await;
```
## Error Monitoring

Built-in error monitoring with configurable thresholds and alerting:

```rust
use ai_lib::error_handling::monitoring::{ErrorMonitor, ErrorThresholds};
use std::time::Duration;

let thresholds = ErrorThresholds {
    error_rate_threshold: 0.1, // 10% error rate
    consecutive_errors: 5,
    time_window: Duration::from_secs(60),
};

let monitor = ErrorMonitor::new(metrics, thresholds);

// `context` carries the request metadata (provider, model, etc.)
// that the monitor uses to classify the error
monitor.record_error(&error, &context).await;
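```

In practice, errors are recorded at the call site as requests fail. A minimal sketch of that wiring (the shape of `context` depends on what your `ErrorMonitor` expects; treat it here as placeholder request metadata):

```rust
// Record a failure as soon as the request returns an error.
match client.chat_completion(request).await {
    Ok(response) => { /* handle response */ }
    Err(error) => monitor.record_error(&error, &context).await,
}
```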
## Client Integration

Create clients with custom metrics:

```rust
use ai_lib::{AiClient, Provider};
use std::sync::Arc;

// MyCustomMetrics is any type implementing the Metrics trait
let metrics = Arc::new(MyCustomMetrics::new());
let client = AiClient::new_with_metrics(Provider::OpenAI, metrics)?;
```
## Custom Metrics Implementation

```rust
use ai_lib::metrics::{Metrics, Timer};

struct CustomMetrics {
    // Your metrics storage (atomics, a registry handle, etc.)
}

impl Metrics for CustomMetrics {
    async fn incr_counter(&self, name: &str, value: u64) {
        // Implement counter logic
    }

    async fn start_timer(&self, name: &str) -> Option<Box<dyn Timer + Send>> {
        // Hand back a timer; see the CustomTimer sketch below
        Some(Box::new(CustomTimer::new(name)))
    }

    async fn record_error(&self, name: &str, error_type: &str) {
        // Record error metrics
    }

    async fn record_success(&self, name: &str, success: bool) {
        // Record success/failure metrics
    }
}
```
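The `CustomTimer` referenced above is your `Timer` implementation. The exact `Timer` trait surface isn't reproduced here, so the signature below is an assumption inferred from the `timer.stop()` usage in the earlier examples:

```rust
use std::time::Instant;

// Sketch of a Timer implementation; the `fn stop(&self)` signature is
// assumed from the `timer.stop()` usage above, not taken from ai-lib.
struct CustomTimer {
    name: String,
    started_at: Instant,
}

impl CustomTimer {
    fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            started_at: Instant::now(),
        }
    }
}

impl Timer for CustomTimer {
    fn stop(&self) {
        // Forward the observed duration to your histogram backend.
        let elapsed = self.started_at.elapsed();
        println!("{} took {:?}", self.name, elapsed);
    }
}
```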
## Advanced Features
### Tagged Metrics
```rust
// Counters with tags
metrics.incr_counter_with_tags("ai_requests_total", 1, &[
    ("provider", "openai"),
    ("model", "gpt-4"),
]).await;

// Histograms with tags
metrics.record_histogram_with_tags("ai_request_duration", 1.5, &[
    ("provider", "openai"),
    ("success", "true"),
]).await;
```
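If your backend has no native label support, a common fallback is to flatten tags into the metric name. A minimal sketch (the `flatten` helper below is illustrative, not part of ai-lib):

```rust
// Render ("provider", "openai") pairs into a name suffix, e.g.
// "ai_requests_total{provider=openai,model=gpt-4}".
fn flatten(name: &str, tags: &[(&str, &str)]) -> String {
    let rendered: Vec<String> = tags
        .iter()
        .map(|(key, value)| format!("{key}={value}"))
        .collect();
    format!("{name}{{{}}}", rendered.join(","))
}
```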
### Request Tracking

```rust
use ai_lib::metrics::MetricsExt;

// Record complete request with timing and success
metrics.record_request("ai_request", timer, success).await;

// With additional tags
metrics.record_request_with_tags(
    "ai_request",
    timer,
    success,
    &[("provider", "openai"), ("model", "gpt-4")],
).await;
```
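Putting these pieces together, a request path might look like the following sketch (assuming `record_request` accepts the `Option` timer returned by `start_timer`):

```rust
// Time the request, then record duration and outcome together.
let timer = metrics.start_timer("ai_request").await;
let result = client.chat_completion(request).await;
let success = result.is_ok();
metrics.record_request("ai_request", timer, success).await;
```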
## Integration Examples
### Prometheus Integration
```rust
use ai_lib::metrics::{Metrics, Timer};
use prometheus::{Counter, Histogram};

struct PrometheusMetrics {
    request_counter: Counter,
    request_duration: Histogram,
}

impl Metrics for PrometheusMetrics {
    async fn incr_counter(&self, name: &str, value: u64) {
        if name == "ai_requests_total" {
            self.request_counter.inc_by(value as f64);
        }
    }

    async fn start_timer(&self, name: &str) -> Option<Box<dyn Timer + Send>> {
        if name == "ai_request_duration" {
            // PrometheusTimer is your Timer impl that observes the
            // elapsed time on the histogram when stopped
            Some(Box::new(PrometheusTimer::new(self.request_duration.clone())))
        } else {
            None
        }
    }

    // record_error / record_success elided for brevity; implement them
    // against counters the same way as incr_counter above
}
```
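Wiring this into a client might look like the sketch below (metric names and help strings are illustrative, and `PrometheusTimer` is assumed to be defined as noted above):

```rust
use ai_lib::{AiClient, Provider};
use prometheus::{Counter, Histogram, HistogramOpts, Opts, Registry};
use std::sync::Arc;

// Create and register the collectors so they appear in scrapes.
let registry = Registry::new();
let request_counter =
    Counter::with_opts(Opts::new("ai_requests_total", "Total AI requests"))?;
let request_duration = Histogram::with_opts(HistogramOpts::new(
    "ai_request_duration",
    "Request latency in seconds",
))?;
registry.register(Box::new(request_counter.clone()))?;
registry.register(Box::new(request_duration.clone()))?;

// Hand the collectors to the Metrics impl and build the client.
let metrics = Arc::new(PrometheusMetrics { request_counter, request_duration });
let client = AiClient::new_with_metrics(Provider::OpenAI, metrics)?;
```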
### OpenTelemetry Integration

```rust
use ai_lib::{AiClient, AiLibError, ChatCompletionRequest, ChatCompletionResponse};
use opentelemetry::{global, trace::{Span, Tracer}, KeyValue};

async fn traced_request(
    client: &AiClient,
    request: ChatCompletionRequest,
) -> Result<ChatCompletionResponse, AiLibError> {
    let tracer = global::tracer("ai-lib");
    let mut span = tracer.start("ai.chat_completion");
    span.set_attribute(KeyValue::new("provider", client.current_provider().to_string()));
    span.set_attribute(KeyValue::new("model", request.model.clone()));

    let result = client.chat_completion(request).await;
    match &result {
        Ok(response) => {
            span.set_attribute(KeyValue::new("success", true));
            span.set_attribute(KeyValue::new(
                "tokens_used",
                response.usage.total_tokens as i64,
            ));
        }
        Err(error) => {
            span.set_attribute(KeyValue::new("success", false));
            span.set_attribute(KeyValue::new("error", error.to_string()));
        }
    }
    span.end();
    result
}
```
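For spans to be exported, a tracer provider must be installed before `global::tracer` is called. A minimal sketch using the stdout exporter (exact builder names vary between opentelemetry-sdk versions; this follows the 0.2x-era API):

```rust
use opentelemetry_sdk::trace::TracerProvider;
use opentelemetry_stdout::SpanExporter;

// Install a provider that prints spans to stdout; swap in an OTLP
// exporter for production use.
let provider = TracerProvider::builder()
    .with_simple_exporter(SpanExporter::default())
    .build();
opentelemetry::global::set_tracer_provider(provider);
```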
## Best Practices
- Consistent Naming: Use consistent metric names across your application
- Tag Usage: Use tags for dimensional analysis
- Performance: Implement metrics asynchronously so collection never blocks the request path (see the sketch after this list)
- Error Handling: Always handle metric collection errors gracefully
- Resource Management: Use appropriate data structures for metric storage
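One way to keep collection off the hot path is to push events into a bounded channel and drop them when it is full. A sketch using tokio (the `MetricEvent` type is ours, not part of ai-lib):

```rust
use tokio::sync::mpsc;

// Hypothetical event type for decoupled metric collection.
enum MetricEvent {
    Counter { name: &'static str, value: u64 },
}

let (tx, mut rx) = mpsc::channel::<MetricEvent>(1024);

// A background task drains events so callers never wait on the backend.
tokio::spawn(async move {
    while let Some(MetricEvent::Counter { name, value }) = rx.recv().await {
        // Forward to your metrics backend here.
        let _ = (name, value);
    }
});

// On the hot path, try_send drops the event rather than blocking.
let _ = tx.try_send(MetricEvent::Counter { name: "ai_requests_total", value: 1 });
```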
## Next Steps
- Learn about Reliability Features for production deployments
- Check Advanced Examples for practical patterns
- Explore Extension Guide for custom implementations