
Ecosystem Architecture

The AI-Lib ecosystem is built on a clean three-layer architecture where each layer has a distinct responsibility. Current versions: AI-Protocol v0.8.3, ai-lib-rust v0.9.3, ai-lib-python v0.8.3, ai-lib-ts v0.5.3, ai-lib-go v0.0.1, ai-protocol-mock v0.1.11.

1. Protocol Layer — AI-Protocol

The specification layer. YAML manifests define:

  • Provider manifests (v1/providers/ + v2/providers/) — Endpoint, auth, parameter mappings, streaming decoder, error classification, and multimodal capability contracts for P0 providers (OpenAI/Anthropic/Google/DeepSeek/Qwen/Doubao)
  • Model registry (models/*.yaml) — Model instances with context windows, capabilities, pricing
  • Core specification (spec.yaml, v2-alpha/spec.yaml) — Standard parameters, events, error types, retry policies
  • V2 Schemas (schemas/v2/) — JSON Schema for provider, MCP, Computer Use, multimodal (including video generation output contract), context policy, and ProviderContract
  • V2 ProviderContract — API style declaration, capability matrix, action mapping, degradation strategy
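
To make the Ring 1 idea concrete, here is a minimal sketch of what a provider manifest's required fields might look like, expressed as a Python dict. The field names (`endpoint`, `auth`, `parameters`, `models`, `api_style`) are assumptions modeled on the descriptions above, not the actual AI-Protocol schema:

```python
# Illustrative only: field names are assumptions based on the Ring 1
# description (endpoint, auth, parameter mappings, model list),
# not the real AI-Protocol YAML schema.
MINIMAL_MANIFEST = {
    "api_style": "OpenAiCompatible",
    "endpoint": {"base_url": "https://api.example.com/v1",
                 "chat_path": "/chat/completions"},
    "auth": {"type": "bearer", "env_var": "EXAMPLE_API_KEY"},
    "parameters": {"max_tokens": "max_tokens", "temperature": "temperature"},
    "models": ["example-model-small"],
}

REQUIRED_RING1_FIELDS = ("endpoint", "auth", "parameters", "models")

def validate_ring1(manifest: dict) -> list[str]:
    """Return the hypothetical Ring 1 fields missing from a manifest."""
    return [f for f in REQUIRED_RING1_FIELDS if f not in manifest]
```

A runtime's `ProtocolLoader` would reject a manifest for which `validate_ring1` returns a non-empty list before any request is compiled.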

The protocol layer is language-agnostic. It’s consumed by any runtime in any language.

2. Runtime Layer — Rust, Python, TypeScript, and Go SDKs

The execution layer. Runtimes implement:

  • Protocol loading — Read and validate manifests from local files, env vars, or GitHub
  • Request compilation — Convert unified requests to provider-specific HTTP calls
  • Streaming pipeline — Decode, select, accumulate, and map provider responses to unified events
  • Resilience — Circuit breaker, rate limiting, retry, fallback
  • Extensions — Embeddings, caching, batching, plugins

All runtimes share the same protocol-driven architecture with cross-runtime parity:

| Concept | Rust | Python | TypeScript | Go |
|---|---|---|---|---|
| Client | AiClient | AiClient | AiClient | AiClient |
| Builder | AiClientBuilder | AiClientBuilder | AiClientBuilder | AiClientBuilder |
| Request | ChatRequestBuilder | ChatRequestBuilder | ChatBuilder | ChatRequestBuilder |
| Events | StreamingEvent enum | StreamingEvent class | unified streaming events | StreamingEvent struct |
| Transport | reqwest (tokio) | httpx (asyncio) | fetch (Node.js) | net/http |
| Types | Rust structs | Pydantic v2 models | TypeScript interfaces | Go structs |
| V2 Driver | Box&lt;dyn ProviderDriver&gt; | ProviderDriver ABC | manifest-driven parser/loader | ProviderDriver interface |
| Registry | CapabilityRegistry (feature-gate) | CapabilityRegistry (pip extras) | capability modules | CapabilityRegistry |
| MCP Bridge | McpToolBridge | McpToolBridge | McpToolBridge | To be implemented |
| Multimodal | MultimodalCapabilities | MultimodalCapabilities | STT/TTS/Rerank + multimodal types | MultimodalCapabilities |

3. Application Layer

Applications use the unified runtime API. A single AiClient interface works across all providers:

Your App → AiClient → Protocol Manifest → Provider API

Switch providers by changing one model identifier. No code changes.

Here’s what happens when you call client.chat().user("Hello").stream():

  1. AiClient receives the request
  2. ProtocolLoader provides the provider manifest
  3. Request compiler maps standard params to provider-specific JSON
  4. Transport sends the HTTP request with correct auth/headers
  5. Pipeline processes the streaming response:
    • Decoder converts bytes → JSON frames (SSE or NDJSON)
    • Selector filters relevant frames using JSONPath
    • Accumulator assembles partial tool calls
    • EventMapper converts frames → unified StreamingEvent
  6. Application iterates over unified events
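
The decoder-to-event-mapper stages above can be sketched in a few lines of Python. This is an illustrative toy, not the runtime's implementation: the frame shape assumes an OpenAI-style SSE body, the selector is a simple key check rather than JSONPath, and the tool-call accumulator is omitted:

```python
import json

def sse_decode(raw: bytes):
    """Decoder: split an SSE body into JSON frames (skips the [DONE] sentinel)."""
    for line in raw.decode("utf-8").splitlines():
        if line.startswith("data: ") and line != "data: [DONE]":
            yield json.loads(line[len("data: "):])

def map_to_events(frames):
    """Selector + EventMapper: keep frames carrying text deltas,
    emit unified events."""
    for frame in frames:
        delta = frame.get("choices", [{}])[0].get("delta", {})
        if "content" in delta:  # Selector: only frames with text content
            yield {"type": "ContentDelta", "text": delta["content"]}

raw = (
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    b'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    b'data: [DONE]\n'
)
events = list(map_to_events(sse_decode(raw)))
# events -> [{'type': 'ContentDelta', 'text': 'Hel'},
#            {'type': 'ContentDelta', 'text': 'lo'}]
```

The application only ever sees the final unified events, so the same loop works whether the provider speaks SSE or NDJSON.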

All runtimes search for protocol manifests in this order:

  1. Custom path — Explicitly set in builder
  2. Environment variable — AI_PROTOCOL_DIR or AI_PROTOCOL_PATH
  3. Relative paths — ai-protocol/ or ../ai-protocol/ from working directory
  4. GitHub fallback — Downloads from hiddenpath/ai-protocol repository

This means you can start developing without any local setup — the runtimes will fetch manifests from GitHub automatically.

The V2 protocol baseline (upgraded through v0.8.2 governance closure) delivers a complete three-layer pyramid with extended execution governance:

  • L1 Core Protocol — Message format, standard error codes (E1001–E9999), version declaration
  • L2 Capability Extensions — Streaming, vision, tools, MCP, Computer Use, multimodal — each controlled by feature flags
  • L3 Environment Profile — API keys, endpoints, retry policies — environment-specific configuration

V2 manifests are organized in three rings:

  • Ring 1 Core Skeleton (required) — Minimal fields: endpoint, auth, parameter mappings, model list
  • Ring 2 Capability Mapping (conditional) — Streaming config, tool mapping, MCP integration, Computer Use actions
  • Ring 3 Advanced Extensions (optional) — Custom headers, rate limit headers, context management policies

The runtime layer implements a ProviderDriver abstraction that normalizes three distinct API styles:

| API Style | Provider | Request Format | Streaming Format |
|---|---|---|---|
| OpenAiCompatible | OpenAI, DeepSeek, Moonshot | messages array | SSE data: {...} |
| AnthropicMessages | Anthropic | messages + system separate | SSE with typed events |
| GeminiGenerate | Google Gemini | contents array | SSE generateContent |

The runtime automatically selects the correct driver based on the manifest’s api_style declaration.
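
That selection step can be sketched as a simple registry keyed by the manifest's api_style string. The class names here are placeholders for the real ProviderDriver implementations:

```python
# Hypothetical sketch: pick a driver class from the manifest's
# api_style declaration, mirroring the table above.
class OpenAiCompatibleDriver: ...
class AnthropicMessagesDriver: ...
class GeminiGenerateDriver: ...

DRIVERS = {
    "OpenAiCompatible": OpenAiCompatibleDriver,
    "AnthropicMessages": AnthropicMessagesDriver,
    "GeminiGenerate": GeminiGenerateDriver,
}

def driver_for(manifest: dict):
    """Select a driver implementation from the manifest's api_style."""
    style = manifest["api_style"]
    try:
        return DRIVERS[style]()
    except KeyError:
        raise ValueError(f"unknown api_style: {style}") from None
```

Registering a new API style then means adding one driver and one manifest declaration, with no changes to application code.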

AI-Protocol includes a built-in MCP (Model Context Protocol) tool bridge. Rather than operating at a separate layer, MCP tools are first-class citizens:

  • McpToolBridge converts MCP server tools to AI-Protocol ToolDefinition format
  • Tools are namespaced as mcp__{server}__{tool_name} to prevent collisions
  • Allow/deny filters control which MCP tools are exposed
  • Provider-specific MCP configuration (tool_parameter vs sdk_config) is handled automatically
  • Supports stdio, SSE, and streamable HTTP transports

A unified Computer Use capability normalizes GUI automation across providers:

  • ComputerAction enum covers all action types: screenshot, mouse click, keyboard type, browser navigate, file read/write
  • SafetyPolicy enforces mandatory safety constraints loaded from the manifest:
    • Confirmation required for destructive actions
    • Domain allowlist for browser navigation
    • Sensitive path protection
    • Maximum actions per turn limit
    • Sandbox mode support
  • Supports both screen_based (Anthropic, OpenAI) and tool_based (Google) implementation styles
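
A minimal sketch of how those safety checks might compose, assuming hypothetical policy field names (the real SafetyPolicy schema is loaded from the manifest):

```python
from urllib.parse import urlparse

# Hypothetical field names; the real SafetyPolicy comes from the manifest.
POLICY = {
    "confirm_destructive": True,
    "allowed_domains": ["docs.example.com"],
    "protected_paths": ["/etc", "~/.ssh"],
    "max_actions_per_turn": 20,
}

def check_action(action: dict, actions_so_far: int, policy: dict = POLICY) -> str:
    """Return 'allow', 'confirm', or 'deny' for a proposed ComputerAction."""
    if actions_so_far >= policy["max_actions_per_turn"]:
        return "deny"  # per-turn action budget exhausted
    if action["type"] == "browser_navigate":
        host = urlparse(action["url"]).hostname or ""
        return "allow" if host in policy["allowed_domains"] else "deny"
    if action["type"] in ("file_write", "file_delete"):
        if any(action["path"].startswith(p) for p in policy["protected_paths"]):
            return "deny"  # sensitive path protection
        return "confirm" if policy["confirm_destructive"] else "allow"
    return "allow"  # e.g. screenshot, mouse click
```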

V2 extends multimodal support beyond vision to include audio, video, and omni-mode:

| Modality | Input | Output | Providers |
|---|---|---|---|
| Text | ✅ | ✅ | All |
| Image | ✅ | ✅ (select) | OpenAI, Anthropic, Gemini, Qwen |
| Audio | ✅ | ✅ (select) | OpenAI (STT/TTS), Gemini, Qwen (omni) |
| Video | ✅ | — | Gemini |
| Rerank | ✅ | — | Cohere, Jina |

Latest expansion notes:

  • Added V2 provider manifests for Qwen and Doubao in the P0 release train.
  • Added V2 multimodal schema support for multimodal.output.video to standardize video generation declarations.
  • ai-protocol-mock now includes Gemini generateContent and streamGenerateContent routes for cross-runtime verification.
  • ai-protocol-mock now also supports video generation async-polling (POST /v1/video/generations + GET /v1/video/generations/{job_id}) for transport lifecycle testing.
  • ai-protocol now ships full execution governance gate scripts:
    • npm run drift:check
    • npm run gate:manifest-consumption
    • npm run gate:compliance-matrix
    • npm run gate:fullchain
    • npm run release:gate
  • Governance scripts support staged adoption with --report-only mode for advisory rollout.
  • ai-protocol-mock video async lifecycle supports deterministic terminal states:
    • succeeded, failed, cancelled
    • control via X-Mock-Video-Terminal or terminal_state

The MultimodalCapabilities module validates content modalities against provider declarations before sending requests.
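
A toy sketch of that pre-flight check, with made-up provider declarations mirroring the table above (the real declarations live in the YAML manifests):

```python
# Illustrative declarations only, keyed by provider.
DECLARED = {
    "gemini": {"input": {"text", "image", "audio", "video"}, "output": {"text"}},
    "openai": {"input": {"text", "image", "audio"}, "output": {"text", "audio"}},
}

def validate_modalities(provider: str, inputs: set[str], outputs: set[str]) -> list[str]:
    """Return human-readable errors for modalities the provider doesn't declare."""
    decl = DECLARED[provider]
    errors = [f"unsupported input modality: {m}"
              for m in sorted(inputs - decl["input"])]
    errors += [f"unsupported output modality: {m}"
               for m in sorted(outputs - decl["output"])]
    return errors
```

Failing this check locally avoids burning a network round trip on a request the provider would reject anyway.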

The ai-protocol-cli tool provides developer utilities:

```sh
ai-protocol-cli validate <path>          # Validate manifests against schemas
ai-protocol-cli info <provider>          # Show provider capabilities
ai-protocol-cli list                     # List all providers (37 total)
ai-protocol-cli check-compat <manifest>  # Check runtime compatibility
```

The compliance suite is executed across Rust, Python, and TypeScript, covering protocol loading, error classification, retry decisions, message building, stream decoding, event mapping, and tool accumulation for fullchain consistency.