Specification Details

The core specification (v1/spec.yaml) defines the standard vocabulary that all provider manifests and runtimes share.

These parameters have consistent meaning across all providers:

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | float | Randomness control (0.0 – 2.0) |
| max_tokens | integer | Maximum response tokens |
| top_p | float | Nucleus sampling threshold |
| stream | boolean | Enable streaming response |
| stop | string[] | Stop sequences |
| tools | object[] | Tool/function definitions |
| tool_choice | string or object | Tool selection mode |
| response_format | object | Structured output format |

Provider manifests map these standard names to provider-specific parameter names. For example, OpenAI uses max_completion_tokens while Anthropic uses max_tokens.
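As a rough sketch of how a runtime might apply such a mapping, the dictionaries below rename standard parameter keys to provider-specific ones. The mapping tables and the `translate_params` helper are illustrative, not part of the actual spec:

```python
# Hypothetical per-provider parameter maps; only renamed keys need entries.
OPENAI_PARAM_MAP = {
    "max_tokens": "max_completion_tokens",  # OpenAI's newer name for the limit
}
ANTHROPIC_PARAM_MAP = {}  # Anthropic already uses the standard name

def translate_params(params: dict, param_map: dict) -> dict:
    """Rename standard parameter keys to provider-specific ones."""
    return {param_map.get(name, name): value for name, value in params.items()}

request = {"temperature": 0.7, "max_tokens": 1024, "stream": True}
print(translate_params(request, OPENAI_PARAM_MAP))
# {'temperature': 0.7, 'max_completion_tokens': 1024, 'stream': True}
```

Keys absent from the map pass through unchanged, so a provider manifest only has to declare the parameters it renames.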

The specification defines unified streaming event types that runtimes emit:

| Event | Description |
| --- | --- |
| PartialContentDelta | Text content fragment |
| ThinkingDelta | Reasoning/thinking block (extended thinking models) |
| ToolCallStarted | Function/tool invocation begins |
| PartialToolCall | Tool call argument streaming |
| ToolCallEnded | Tool invocation complete |
| StreamEnd | Response stream complete |
| StreamError | Stream-level error |
| Metadata | Usage statistics, model info |

Provider manifests declare JSONPath-based rules that map provider-specific events to these standard types.
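To make the idea concrete, here is a minimal sketch of path-based rule matching. The rule format, the chunk shapes, and the dotted-path `lookup` helper (standing in for a real JSONPath engine) are all assumptions for illustration, not the actual manifest schema:

```python
def lookup(obj, path):
    """Resolve a dotted path like '$.delta.text' against nested dicts."""
    for key in path.lstrip("$.").split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

# Hypothetical rules: the first rule whose path resolves wins.
RULES = [
    {"event": "PartialContentDelta", "match": "$.delta.text"},
    {"event": "ThinkingDelta", "match": "$.delta.thinking"},
    {"event": "StreamEnd", "match": "$.done"},
]

def classify(chunk, rules):
    """Map a provider-specific stream chunk to a standard event type."""
    for rule in rules:
        value = lookup(chunk, rule["match"])
        if value is not None:
            return (rule["event"], value)
    return ("StreamError", None)

print(classify({"delta": {"text": "Hello"}}, RULES))
# ('PartialContentDelta', 'Hello')
```

A real implementation would use a full JSONPath evaluator (paths with array indices, filters, and wildcards), but the first-matching-rule dispatch shown here is the core shape.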

The spec defines 13 standard error classes that normalize provider-specific error responses:

| Error Class | Typical HTTP Status | Description |
| --- | --- | --- |
| authentication | 401 | Invalid or missing API key |
| permission | 403 | Insufficient permissions |
| not_found | 404 | Model or endpoint not found |
| rate_limited | 429 | Rate limit exceeded |
| quota_exhausted | 402 | Billing/quota limit reached |
| invalid_request | 400 | Malformed request |
| context_length | 400 | Context window exceeded |
| content_filter | 400 | Content policy violation |
| overloaded | 503/529 | Server overloaded |
| server_error | 500 | Internal server error |
| timeout | 408/504 | Request timeout |
| network | — | Network connectivity issue |
| unknown | — | Unclassified error |
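A runtime's classifier can follow the table almost mechanically; the only wrinkle is that three classes share HTTP 400 and must be told apart by the response body. The sketch below is illustrative (the function name and the substring heuristics are assumptions, not spec-mandated):

```python
# Status-to-class table mirroring the error taxonomy above.
STATUS_TO_ERROR = {
    401: "authentication", 403: "permission", 404: "not_found",
    429: "rate_limited", 402: "quota_exhausted", 400: "invalid_request",
    503: "overloaded", 529: "overloaded", 500: "server_error",
    408: "timeout", 504: "timeout",
}

def classify_error(status, message=""):
    """Map an HTTP status (or lack of one) to a standard error class."""
    if status is None:
        return "network"  # request never reached the server
    base = STATUS_TO_ERROR.get(status, "unknown")
    if base == "invalid_request":
        # context_length and content_filter also arrive as HTTP 400;
        # disambiguate with (hypothetical) message-text heuristics.
        lowered = message.lower()
        if "context" in lowered:
            return "context_length"
        if "policy" in lowered or "filter" in lowered:
            return "content_filter"
    return base
```

In practice a provider manifest would declare its own disambiguation rules (error codes or body fields) rather than rely on message substrings.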

The spec defines standard retry strategies:

```yaml
retry_policy:
  strategy: "exponential_backoff"
  max_retries: 3
  initial_delay_ms: 1000
  max_delay_ms: 30000
  backoff_multiplier: 2.0
  retryable_errors:
    - "rate_limited"
    - "overloaded"
    - "server_error"
    - "timeout"
```

Normalized finish reasons for response completion:

| Reason | Description |
| --- | --- |
| end_turn | Natural completion |
| max_tokens | Token limit reached |
| tool_use | Model wants to call a tool |
| stop_sequence | Stop sequence encountered |
| content_filter | Filtered by content policy |
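A provider manifest would declare a small lookup from its native finish reasons to this normalized set. As an illustration (the map below uses OpenAI's well-known finish-reason strings, but its shape and the fallback choice are assumptions, not spec data):

```python
# Hypothetical manifest entry for an OpenAI-family provider.
OPENAI_FINISH_MAP = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
    "content_filter": "content_filter",
}

def normalize_finish_reason(raw, finish_map):
    """Translate a provider-native finish reason; fall back to 'unknown'
    rather than guess when the provider sends something unmapped."""
    return finish_map.get(raw, "unknown")
```

Anthropic's native reasons (end_turn, max_tokens, tool_use, stop_sequence) already match the normalized names, so its map would be mostly identity entries.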

Providers are categorized into API families to prevent request/response format confusion:

  • openai — OpenAI-compatible APIs (also used by Groq, Together, DeepSeek, etc.)
  • anthropic — Anthropic Messages API
  • gemini — Google Gemini API
  • custom — Provider-specific format