Suiri offers a selection of on-demand models optimized for different use cases. For the most up-to-date models and pricing, please visit suiri.ai/models.

Available models

The table below is for illustrative purposes:
| Model | Type | Context Window (tokens) | Weights Precision | Price (per 1K tokens) |
| --- | --- | --- | --- | --- |
| Google Gemma-3n-E4B-it | Serverless | 16,000 | Q4 | See Pricing |
| Alibaba Qwen3-14B | Serverless | 12,000 | Q8 | See Pricing |
| OpenAI GPT-oss-20b | Serverless | 12,000 | Q8 | See Pricing |
| Meta llama-v3p3-70b-instruct | Serverless | 8,000 | FP8/Q8 | See Pricing |
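
To put the table in context, the snippet below sketches how one of the serverless models listed above might be called. It assumes an OpenAI-compatible chat completions endpoint at a hypothetical base URL (https://api.suiri.ai/v1), a hypothetical SUIRI_API_KEY environment variable, and an illustrative model identifier; none of these are specified in this document, so consult suiri.ai/models for the actual API details.

```python
import os

from openai import OpenAI

# Hypothetical endpoint and credentials; replace with the values documented at suiri.ai.
client = OpenAI(
    base_url="https://api.suiri.ai/v1",   # assumption, not stated in this document
    api_key=os.environ["SUIRI_API_KEY"],  # assumption, not stated in this document
)

response = client.chat.completions.create(
    model="alibaba/qwen3-14b",  # illustrative ID for the Qwen3-14B row above
    messages=[{"role": "user", "content": "Explain weight quantization in two sentences."}],
    max_tokens=256,  # output tokens count toward the model's 12,000-token context window
)

print(response.choices[0].message.content)
```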

How to read the model table

  • Context Window: Maximum number of tokens (input + output) the model can process in a single request.
  • Weights Precision: Quantization level (Q4, Q8, FP8). Lower precision generally means faster inference with minimal quality tradeoff.
  • Price: Pricing is per 1,000 tokens, with separate rates for input (your prompt) and output (model’s response); see the cost sketch after this list.
  • Google Gemma-3n-E4B-it: Best for lightweight, cost-sensitive applications requiring fast responses.
  • Alibaba Qwen3-14B or OpenAI GPT-oss-20b: Strong general-purpose models balancing cost and capability for most production workloads.
  • Meta llama-v3p3-70b-instruct: Higher-capacity model for complex reasoning tasks, though with the shortest context window in the table (8,000 tokens).
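
As a worked example of the context-window and pricing bullets above, the helper below estimates the cost of a single request and checks that the token count fits within a model's context window. The rates and limits in the example are placeholders, not published Suiri prices; substitute the figures from suiri.ai/models.

```python
def estimate_request(input_tokens: int, output_tokens: int,
                     context_window: int,
                     input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the estimated cost in dollars, raising if the request exceeds the context window."""
    total = input_tokens + output_tokens
    if total > context_window:
        raise ValueError(
            f"{total} tokens exceeds the {context_window}-token context window "
            "(input + output must fit in a single request)"
        )
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k

# Example: a 9,000-token prompt with a 2,000-token reply against a 12,000-token model,
# using placeholder rates of $0.0002 (input) and $0.0006 (output) per 1K tokens.
cost = estimate_request(9_000, 2_000, context_window=12_000,
                        input_rate_per_1k=0.0002, output_rate_per_1k=0.0006)
print(f"Fits in the context window; estimated cost: ${cost:.4f}")  # $0.0030
```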