Available models
The table below is for illustrative purposes:

| Model | Type | Context Window | Weights Precision | Price (per 1K tokens) |
|---|---|---|---|---|
| Google Gemma-3n-E4B-it | Serverless | 16,000 | Q4 | See Pricing |
| Alibaba Qwen3-14B | Serverless | 12,000 | Q8 | See Pricing |
| OpenAI GPT-oss-20b | Serverless | 12,000 | Q8 | See Pricing |
| Meta llama-v3p3-70b-instruct | Serverless | 8,000 | FP8/Q8 | See Pricing |
How to read the model table
- Context Window: Maximum number of tokens (input + output) the model can process in a single request.
- Weights Precision: Quantization level (Q4, Q8, FP8). Lower precision generally means faster inference with minimal quality tradeoff.
- Price: Pricing is per 1,000 tokens, with separate rates for input (your prompt) and output (model’s response).
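The two numbers above feed directly into request budgeting: a request only succeeds if its input plus requested output fits the context window, and its cost is the sum of input and output tokens billed at their separate per-1K rates. Below is a minimal sketch of both checks; the helper names and the dollar rates are placeholders for illustration, not actual pricing.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_1k, output_rate_per_1k):
    """Cost in dollars for one request, with separate input/output rates per 1K tokens."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

def fits_context(input_tokens, max_output_tokens, context_window):
    """True if prompt plus requested output fits within the model's context window."""
    return input_tokens + max_output_tokens <= context_window

# Example: a 2,000-token prompt with up to 500 output tokens on a
# 12,000-token-window model, using placeholder rates of $0.20 (input)
# and $0.80 (output) per 1K tokens.
print(fits_context(2000, 500, 12000))                  # fits comfortably
print(estimate_cost(2000, 500, 0.20, 0.80))           # input + output cost
```

Note that `max_output_tokens` counts against the window even if the model stops early, so leave headroom when sizing prompts for the smaller-window models above.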
Recommended models by use case
- Google Gemma-3n-E4B-it: Best for lightweight, cost-sensitive applications requiring fast responses with smaller context needs.
- Alibaba Qwen3-14B or OpenAI GPT-oss-20b: Strong general-purpose models balancing cost and capability for most production workloads.
- Meta llama-v3p3-70b-instruct: Higher-capacity model for complex reasoning tasks, though with the shortest context window of the models listed.