
Ecosystem Architecture

The AI-Lib ecosystem is built on a clean three-layer architecture where each layer has a distinct responsibility. Current versions: AI-Protocol v0.8.3, ai-lib-rust v0.9.3, ai-lib-python v0.8.3, ai-lib-ts v0.5.3, ai-lib-go v0.0.1, ai-protocol-mock v0.1.11.

1. Protocol Layer — AI-Protocol

The specification layer. YAML manifests define:

  • Provider manifests (v1/providers/ + v2/providers/) — Endpoint, auth, parameter mappings, streaming decoder, error classification, and multimodal capability contracts for P0 providers (OpenAI/Anthropic/Google/DeepSeek/Qwen/Doubao)
  • Model registry (models/*.yaml) — Model instances with context windows, capabilities, pricing
  • Core specification (spec.yaml, v2-alpha/spec.yaml) — Standard parameters, events, error types, retry policies
  • V2 Schemas (schemas/v2/) — JSON Schema for provider, MCP, Computer Use, multimodal (including video generation output contract), context policy, and ProviderContract
  • V2 ProviderContract — API style declaration, capability matrix, action mapping, degradation strategy
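
To make the Ring 1 idea concrete, here is a minimal sketch of what a provider manifest's required fields might look like, expressed as a Python dict. The field names (`endpoint`, `auth`, `parameters`, `models`, `api_style`) are assumptions modeled on the descriptions above, not the actual AI-Protocol schema:

```python
# Illustrative only: field names are assumptions based on the Ring 1
# description (endpoint, auth, parameter mappings, model list),
# not the real AI-Protocol YAML schema.
MINIMAL_MANIFEST = {
    "api_style": "OpenAiCompatible",
    "endpoint": {"base_url": "https://api.example.com/v1",
                 "chat_path": "/chat/completions"},
    "auth": {"type": "bearer", "env_var": "EXAMPLE_API_KEY"},
    "parameters": {"max_tokens": "max_tokens", "temperature": "temperature"},
    "models": ["example-model-small"],
}

REQUIRED_RING1_FIELDS = ("endpoint", "auth", "parameters", "models")

def validate_ring1(manifest: dict) -> list[str]:
    """Return the hypothetical Ring 1 fields missing from a manifest."""
    return [f for f in REQUIRED_RING1_FIELDS if f not in manifest]
```

A runtime's `ProtocolLoader` would reject a manifest for which `validate_ring1` returns a non-empty list before any request is compiled.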

The protocol layer is language-agnostic. It’s consumed by any runtime in any language.

2. Runtime Layer — Rust, Python, TypeScript, and Go SDKs

The execution layer. Runtimes implement:

  • Protocol loading — Read and validate manifests from local files, env vars, or GitHub
  • Request compilation — Convert unified requests to provider-specific HTTP calls
  • Streaming pipeline — Decode, select, accumulate, and map provider responses to unified events
  • Resilience — Circuit breaker, rate limiting, retry, fallback
  • Extensions — Embeddings, caching, batching, plugins

All runtimes share the same protocol-driven architecture with cross-runtime parity:

| Concept | Rust | Python | TypeScript | Go |
|---|---|---|---|---|
| Client | AiClient | AiClient | AiClient | AiClient |
| Builder | AiClientBuilder | AiClientBuilder | AiClientBuilder | AiClientBuilder |
| Request | ChatRequestBuilder | ChatRequestBuilder | ChatBuilder | ChatRequestBuilder |
| Events | StreamingEvent enum | StreamingEvent class | unified streaming events | StreamingEvent struct |
| Transport | reqwest (tokio) | httpx (asyncio) | fetch (Node.js) | net/http |
| Types | Rust structs | Pydantic v2 models | TypeScript interfaces | Go structs |
| V2 Driver | Box&lt;dyn ProviderDriver&gt; | ProviderDriver ABC | manifest-driven parser/loader | ProviderDriver interface |
| Registry | CapabilityRegistry (feature-gate) | CapabilityRegistry (pip extras) | capability modules | CapabilityRegistry |
| MCP Bridge | McpToolBridge | McpToolBridge | McpToolBridge | To be implemented |
| Multimodal | MultimodalCapabilities | MultimodalCapabilities | STT/TTS/Rerank + multimodal types | MultimodalCapabilities |

3. Application Layer

Applications use the unified runtime API. A single AiClient interface works across all providers:

Your App → AiClient → Protocol Manifest → Provider API

Switch providers by changing one model identifier. No code changes.

Here’s what happens when you call client.chat().user("Hello").stream():

  1. AiClient receives the request
  2. ProtocolLoader provides the provider manifest
  3. Request compiler maps standard params to provider-specific JSON
  4. Transport sends the HTTP request with correct auth/headers
  5. Pipeline processes the streaming response:
    • Decoder converts bytes → JSON frames (SSE or NDJSON)
    • Selector filters relevant frames using JSONPath
    • Accumulator assembles partial tool calls
    • EventMapper converts frames → unified StreamingEvent
  6. Application iterates over unified events
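
The decoder-to-event-mapper stages above can be sketched in a few lines of Python. This is an illustrative toy, not the runtime's implementation: the frame shape assumes an OpenAI-style SSE body, the selector is a simple key check rather than JSONPath, and the tool-call accumulator is omitted:

```python
import json

def sse_decode(raw: bytes):
    """Decoder: split an SSE body into JSON frames (skips the [DONE] sentinel)."""
    for line in raw.decode("utf-8").splitlines():
        if line.startswith("data: ") and line != "data: [DONE]":
            yield json.loads(line[len("data: "):])

def map_to_events(frames):
    """Selector + EventMapper: keep frames carrying text deltas,
    emit unified events."""
    for frame in frames:
        delta = frame.get("choices", [{}])[0].get("delta", {})
        if "content" in delta:  # Selector: only frames with text content
            yield {"type": "ContentDelta", "text": delta["content"]}

raw = (
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    b'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    b'data: [DONE]\n'
)
events = list(map_to_events(sse_decode(raw)))
# events -> [{'type': 'ContentDelta', 'text': 'Hel'},
#            {'type': 'ContentDelta', 'text': 'lo'}]
```

The application only ever sees the final unified events, so the same loop works whether the provider speaks SSE or NDJSON.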

All runtimes search for protocol manifests in this order:

  1. Custom path — Explicitly set in builder
  2. Environment variable — AI_PROTOCOL_DIR or AI_PROTOCOL_PATH
  3. Relative paths — ai-protocol/ or ../ai-protocol/ from working directory
  4. GitHub fallback — Downloads from hiddenpath/ai-protocol repository

This means you can start developing without any local setup — the runtimes will fetch manifests from GitHub automatically.

The V2 protocol baseline (upgraded through v0.8.2 governance closure) delivers a complete three-layer pyramid with extended execution governance:

  • L1 Core Protocol — Message format, standard error codes (E1001–E9999), version declaration
  • L2 Capability Extensions — Streaming, vision, tools, MCP, Computer Use, multimodal — each controlled by feature flags
  • L3 Environment Profile — API keys, endpoints, retry policies — environment-specific configuration

V2 manifests are organized in three rings:

  • Ring 1 Core Skeleton (required) — Minimal fields: endpoint, auth, parameter mappings, model list
  • Ring 2 Capability Mapping (conditional) — Streaming config, tool mapping, MCP integration, Computer Use actions
  • Ring 3 Advanced Extensions (optional) — Custom headers, rate limit headers, context management policies

The runtime layer implements a ProviderDriver abstraction that normalizes three distinct API styles:

| API Style | Provider | Request Format | Streaming Format |
|---|---|---|---|
| OpenAiCompatible | OpenAI, DeepSeek, Moonshot | messages array | SSE data: {...} |
| AnthropicMessages | Anthropic | messages + system separate | SSE with typed events |
| GeminiGenerate | Google Gemini | contents array | SSE generateContent |

The runtime automatically selects the correct driver based on the manifest’s api_style declaration.
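
That selection step can be sketched as a simple registry keyed by the manifest's api_style string. The class names here are placeholders for the real ProviderDriver implementations:

```python
# Hypothetical sketch: pick a driver class from the manifest's
# api_style declaration, mirroring the table above.
class OpenAiCompatibleDriver: ...
class AnthropicMessagesDriver: ...
class GeminiGenerateDriver: ...

DRIVERS = {
    "OpenAiCompatible": OpenAiCompatibleDriver,
    "AnthropicMessages": AnthropicMessagesDriver,
    "GeminiGenerate": GeminiGenerateDriver,
}

def driver_for(manifest: dict):
    """Select a driver implementation from the manifest's api_style."""
    style = manifest["api_style"]
    try:
        return DRIVERS[style]()
    except KeyError:
        raise ValueError(f"unknown api_style: {style}") from None
```

Registering a new API style then means adding one driver and one manifest declaration, with no changes to application code.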

AI-Protocol includes a built-in MCP (Model Context Protocol) tool bridge. Rather than operating at a separate layer, MCP tools are first-class citizens:

  • McpToolBridge converts MCP server tools to AI-Protocol ToolDefinition format
  • Tools are namespaced as mcp__{server}__{tool_name} to prevent collisions
  • Allow/deny filters control which MCP tools are exposed
  • Provider-specific MCP configuration (tool_parameter vs sdk_config) is handled automatically
  • Supports stdio, SSE, and streamable HTTP transports

A unified Computer Use capability normalizes GUI automation across providers:

  • ComputerAction enum covers all action types: screenshot, mouse click, keyboard type, browser navigate, file read/write
  • SafetyPolicy enforces mandatory safety constraints loaded from the manifest:
    • Confirmation required for destructive actions
    • Domain allowlist for browser navigation
    • Sensitive path protection
    • Maximum actions per turn limit
    • Sandbox mode support
  • Supports both screen_based (Anthropic, OpenAI) and tool_based (Google) implementation styles
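
A minimal sketch of how those safety checks might compose, assuming hypothetical policy field names (the real SafetyPolicy schema is loaded from the manifest):

```python
from urllib.parse import urlparse

# Hypothetical field names; the real SafetyPolicy comes from the manifest.
POLICY = {
    "confirm_destructive": True,
    "allowed_domains": ["docs.example.com"],
    "protected_paths": ["/etc", "~/.ssh"],
    "max_actions_per_turn": 20,
}

def check_action(action: dict, actions_so_far: int, policy: dict = POLICY) -> str:
    """Return 'allow', 'confirm', or 'deny' for a proposed ComputerAction."""
    if actions_so_far >= policy["max_actions_per_turn"]:
        return "deny"  # per-turn action budget exhausted
    if action["type"] == "browser_navigate":
        host = urlparse(action["url"]).hostname or ""
        return "allow" if host in policy["allowed_domains"] else "deny"
    if action["type"] in ("file_write", "file_delete"):
        if any(action["path"].startswith(p) for p in policy["protected_paths"]):
            return "deny"  # sensitive path protection
        return "confirm" if policy["confirm_destructive"] else "allow"
    return "allow"  # e.g. screenshot, mouse click
```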

V2 extends multimodal support beyond vision to include audio, video, and omni-mode:

| Modality | Input | Output | Providers |
|---|---|---|---|
| Text | ✅ | ✅ | All |
| Image | ✅ | ✅ (select) | OpenAI, Anthropic, Gemini, Qwen |
| Audio | ✅ | ✅ (select) | OpenAI (STT/TTS), Gemini, Qwen (omni) |
| Video | ✅ | — | Gemini |
| Rerank | ✅ | — | Cohere, Jina |

Latest expansion notes:

  • Added V2 provider manifests for Qwen and Doubao in the P0 release train.
  • Added V2 multimodal schema support for multimodal.output.video to standardize video generation declarations.
  • ai-protocol-mock now includes Gemini generateContent and streamGenerateContent routes for cross-runtime verification.
  • ai-protocol-mock now also supports video generation async-polling (POST /v1/video/generations + GET /v1/video/generations/{job_id}) for transport lifecycle testing.
  • ai-protocol now ships full execution governance gate scripts:
    • npm run drift:check
    • npm run gate:manifest-consumption
    • npm run gate:compliance-matrix
    • npm run gate:fullchain
    • npm run release:gate
  • Governance scripts support staged adoption with --report-only mode for advisory rollout.
  • ai-protocol-mock video async lifecycle supports deterministic terminal states:
    • succeeded, failed, cancelled
    • control via X-Mock-Video-Terminal or terminal_state

The MultimodalCapabilities module validates content modalities against provider declarations before sending requests.
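
A toy sketch of that pre-flight check, with made-up provider declarations mirroring the table above (the real declarations live in the YAML manifests):

```python
# Illustrative declarations only, keyed by provider.
DECLARED = {
    "gemini": {"input": {"text", "image", "audio", "video"}, "output": {"text"}},
    "openai": {"input": {"text", "image", "audio"}, "output": {"text", "audio"}},
}

def validate_modalities(provider: str, inputs: set[str], outputs: set[str]) -> list[str]:
    """Return human-readable errors for modalities the provider doesn't declare."""
    decl = DECLARED[provider]
    errors = [f"unsupported input modality: {m}"
              for m in sorted(inputs - decl["input"])]
    errors += [f"unsupported output modality: {m}"
               for m in sorted(outputs - decl["output"])]
    return errors
```

Failing this check locally avoids burning a network round trip on a request the provider would reject anyway.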

The ai-protocol-cli tool provides developer utilities:

```sh
ai-protocol-cli validate <path>          # Validate manifests against schemas
ai-protocol-cli info <provider>          # Show provider capabilities
ai-protocol-cli list                     # List all providers (37 total)
ai-protocol-cli check-compat <manifest>  # Check runtime compatibility
```

The compliance suite is executed across Rust, Python, and TypeScript, covering protocol loading, error classification, retry decisions, message building, stream decoding, event mapping, and tool accumulation for fullchain consistency.