Model Registry
The model registry (`v1/models/*.yaml`) maps model identifiers to provider configurations, recording capabilities, context windows, and pricing for each model.
Model File Structure
Models are organized by family (GPT, Claude, Gemini, etc.):
```
v1/models/
├── gpt.yaml      # OpenAI GPT models
├── claude.yaml   # Anthropic Claude models
├── gemini.yaml   # Google Gemini models
├── deepseek.yaml # DeepSeek models
├── qwen.yaml     # Alibaba Qwen models
├── mistral.yaml  # Mistral models
├── llama.yaml    # Meta Llama models
└── ...           # 28+ model files
```

Model Definition
Each model entry includes:
```yaml
models:
  gpt-4o:
    provider: openai
    model_id: "gpt-4o"
    context_window: 128000
    max_output_tokens: 16384
    capabilities:
      - chat
      - streaming
      - tools
      - vision
      - json_mode
    pricing:
      input_per_token: 0.0000025
      output_per_token: 0.00001
    release_date: "2024-05-13"
```

Model Identifiers
Runtimes use a `provider/model` format to identify models:
```
anthropic/claude-3-5-sonnet
openai/gpt-4o
deepseek/deepseek-chat
gemini/gemini-2.0-flash
qwen/qwen-plus
```

The runtime splits this into:
- Provider ID (`anthropic`) → loads provider manifest
- Model name (`claude-3-5-sonnet`) → looks up in model registry
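The split-and-lookup step can be sketched as below; the in-memory registry dict is a stand-in for the entries parsed from `v1/models/*.yaml`, and the function name is hypothetical, not the runtimes' actual API:

```python
# Stand-in for entries loaded from v1/models/*.yaml (names are illustrative).
MODEL_REGISTRY = {
    ("openai", "gpt-4o"): {
        "context_window": 128000,
        "capabilities": ["chat", "streaming", "tools", "vision", "json_mode"],
    },
}

def resolve(identifier: str) -> dict:
    """Split a 'provider/model' identifier and look the model up in the registry."""
    provider, _, model = identifier.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    return MODEL_REGISTRY[(provider, model)]

info = resolve("openai/gpt-4o")
print(info["context_window"])  # → 128000
```

Splitting on the first `/` only keeps the scheme open to model names that themselves contain slashes.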
Capabilities
Standard capability flags:
| Capability | Description |
|---|---|
| `chat` | Basic chat completions |
| `streaming` | Streaming responses |
| `tools` | Function/tool calling |
| `vision` | Image understanding |
| `audio` | Audio input/output |
| `reasoning` | Extended thinking (CoT) |
| `agentic` | Multi-step agent workflows |
| `json_mode` | Structured JSON output |
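A runtime can use these flags to filter the registry at model-selection time. A minimal sketch, assuming the capability lists have been loaded into sets (the data and function name here are hypothetical):

```python
# Capability sets mirroring the YAML capability lists (illustrative data).
MODELS = {
    "gpt-4o": {"chat", "streaming", "tools", "vision", "json_mode"},
    "deepseek-chat": {"chat", "streaming", "tools", "json_mode"},
}

def models_with(*required: str) -> list[str]:
    """Return model names whose capability set covers all required flags."""
    need = set(required)
    return sorted(name for name, caps in MODELS.items() if need <= caps)

print(models_with("tools", "vision"))  # → ['gpt-4o']
```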
Pricing
Per-token pricing enables cost estimation in runtimes:
```yaml
pricing:
  input_per_token: 0.000003          # $3 per 1M input tokens
  output_per_token: 0.000015         # $15 per 1M output tokens
  cached_input_per_token: 0.0000003  # Cached prompt discount
```

Both the Rust and Python runtimes use this data for `CostEstimate` calculations.
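The arithmetic behind such an estimate is simple per-token multiplication; the sketch below uses the prices shown above, but the function name and call shape are illustrative, not the runtimes' `CostEstimate` API:

```python
# Per-token prices, matching the example pricing block (illustrative).
PRICING = {
    "input_per_token": 0.000003,
    "output_per_token": 0.000015,
    "cached_input_per_token": 0.0000003,
}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost: uncached input + output + cached input, each priced per token."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICING["input_per_token"]
        + output_tokens * PRICING["output_per_token"]
        + cached_tokens * PRICING["cached_input_per_token"]
    )

# 10k input tokens (2k of them cached) and 1k output tokens:
print(round(estimate_cost(10_000, 1_000, cached_tokens=2_000), 6))  # → 0.0396
```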
Verification
Models can include verification status for production deployments:
```yaml
verification:
  status: "verified"
  last_checked: "2025-01-15"
  verified_capabilities:
    - chat
    - streaming
    - tools
```

Next Steps
- Contributing Providers — Add new providers and models
- Quick Start — Start using models with the runtimes