Available Models
To retrieve the current list of available models, query the models endpoint.

How to Read Model Specifications
- Context Window: Maximum number of tokens (input + output) the model can process in a single request
- Weights Precision: Quantization level (Q4, Q8, FP8) — lower precision generally improves inference efficiency and reduces memory usage, but may affect output quality depending on the workload.
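As a sketch of how a client might use these specifications, the snippet below filters a models listing by context window. The endpoint's actual URL, field names, and response schema are not given here, so they are assumptions; a hard-coded sample payload stands in for a live response.

```python
import json

# Hypothetical payload shaped like a models-endpoint response.
# Field names ("models", "id", "context_window", "precision") are
# assumptions; check the actual API reference for the real schema.
sample_response = json.dumps({
    "models": [
        {"id": "model-a", "context_window": 8192, "precision": "Q4"},
        {"id": "model-b", "context_window": 32768, "precision": "FP8"},
    ]
})

def models_with_min_context(raw: str, min_tokens: int) -> list[str]:
    """Return IDs of models whose context window is at least min_tokens."""
    payload = json.loads(raw)
    return [m["id"] for m in payload["models"]
            if m["context_window"] >= min_tokens]

print(models_with_min_context(sample_response, 16384))  # → ['model-b']
```

Filtering on the context window up front avoids sending requests that would exceed a model's token limit.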