Request Flow
Applications send stateless inference requests to Suiri, which routes each request to the optimal edge datacenter hosting the requested model. Responses are returned synchronously over HTTPS.
OpenAI-Compatible API
Suiri implements the OpenAI Chat Completions API schema, allowing existing applications to integrate with minimal or no code changes.
- Endpoint: POST /v1/chat/completions
- Request and response formats follow OpenAI conventions
- Supports system, user, and assistant roles
- Token usage is returned with every response
This compatibility means you can:
- Reuse existing OpenAI SDKs and clients
- Swap API keys without refactoring application logic
- Gradually migrate workloads to Suiri
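As a minimal sketch of that compatibility, the request body below follows the OpenAI Chat Completions schema described above. The endpoint host and model name are placeholders, not real Suiri values:

```python
import json

# Placeholder host; only the path is taken from the doc above.
SUIRI_URL = "https://api.suiri.example/v1/chat/completions"

def build_chat_request(model: str, messages: list) -> str:
    """Serialize an OpenAI-style Chat Completions request body."""
    body = {
        "model": model,
        "messages": messages,  # system / user / assistant roles
    }
    return json.dumps(body)

payload = build_chat_request(
    "example-model",  # hypothetical model name
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(payload)
```

Because the body matches OpenAI conventions, an existing client can be pointed at a Suiri endpoint by changing only the base URL and API key.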
Stateless Inference Model
Each inference request to Suiri is fully stateless:
- Conversation history must be provided by the client
- No server-side session memory is maintained
- No prompts or responses are persisted after inference completes
This design provides:
- Predictable behavior
- Strong privacy guarantees
- Horizontal scalability across regions
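A sketch of what statelessness means for the client: since no server-side session memory exists, the application keeps the conversation locally and resends the full history with every request. Here `fake_inference` is a stand-in for a real call to the API, not part of Suiri:

```python
def fake_inference(messages):
    """Placeholder for a real Suiri call; echoes the last user message."""
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    return {"role": "assistant", "content": f"You said: {last_user['content']}"}

class Conversation:
    """Client-side history; the server never stores any of it."""
    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = fake_inference(self.messages)  # full history sent each time
        self.messages.append(reply)            # remember the reply locally
        return reply["content"]

chat = Conversation("You are concise.")
print(chat.ask("Hi"))
print(len(chat.messages))  # system + user + assistant = 3
```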
Request Lifecycle
A typical inference request follows this path:
- Your application sends a request to the Suiri API
- Suiri authenticates the request using your API key
- The global control plane selects the optimal region based on:
a. Model availability
b. Proximity and latency
c. Capacity and health
- The request is executed on the selected edge inference cluster
- The model response is returned to your application
- Token usage is recorded for billing and observability
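The region-selection step can be sketched as a simple filter-and-rank over candidate regions. The fields and tie-breaking rule here are illustrative assumptions, not Suiri's actual routing policy:

```python
# Assumed region records: model set, health flag, latency, spare capacity.
def pick_region(regions, model):
    candidates = [
        r for r in regions
        if model in r["models"] and r["healthy"] and r["capacity"] > 0
    ]
    if not candidates:
        raise RuntimeError(f"no healthy region serves {model!r}")
    # Prefer low latency; break ties with the most spare capacity.
    return min(candidates, key=lambda r: (r["latency_ms"], -r["capacity"]))

regions = [
    {"name": "eu-west", "models": {"m1"}, "healthy": True,  "latency_ms": 18, "capacity": 4},
    {"name": "us-east", "models": {"m1"}, "healthy": True,  "latency_ms": 42, "capacity": 9},
    {"name": "ap-east", "models": {"m1"}, "healthy": False, "latency_ms": 9,  "capacity": 7},
]
print(pick_region(regions, "m1")["name"])  # eu-west: healthy and lowest latency
```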
Token Accounting
Suiri uses token-based billing, similar to OpenAI and other inference platforms:
- Input tokens: tokens in your prompt and messages
- Output tokens: tokens generated by the model
- Pricing is specified per 1,000 tokens
- Usage details are included in every API response
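The arithmetic behind per-request cost follows directly from the rules above. The prices below are made-up placeholders; only the per-1,000-token structure comes from the doc:

```python
# Assumed USD rates per 1,000 tokens (illustrative only).
PRICE_PER_1K = {"input": 0.50, "output": 1.50}

def request_cost(usage):
    """Compute cost from an OpenAI-style `usage` object in a response."""
    cost = (usage["prompt_tokens"] / 1000) * PRICE_PER_1K["input"]
    cost += (usage["completion_tokens"] / 1000) * PRICE_PER_1K["output"]
    return round(cost, 6)

usage = {"prompt_tokens": 1200, "completion_tokens": 400, "total_tokens": 1600}
print(request_cost(usage))  # 1.2k * 0.50/1k + 0.4k * 1.50/1k = 0.60 + 0.60 = 1.20
```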
When to Use Suiri
Suiri is optimized for:
- Latency-sensitive inference workloads
- Stateless AI services and APIs
- Mobile, web, and edge applications
- Cost-conscious production deployments
Suiri is not designed for:
- Long-lived conversational state
- Model training or fine-tuning
- Data storage or prompt retention