What is quantization?
Quantization refers to the numerical precision used to represent model weights. While different quantization techniques exist, Suiri selects and manages these settings internally for each model. Developers do not need to configure or tune quantization parameters.
Common quantization labels
Q4 (e.g., q4_0)
- 4-bit weight representation
- Lower memory footprint
- Optimized for cost-efficient, low-latency inference
Q8
- 8-bit weight representation
- Higher numerical precision than Q4
- Used for models requiring greater fidelity
FP8
- 8-bit floating-point representation
- Higher numerical fidelity than integer quantization
- Typically used for larger or more capable models
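To make the memory difference between these labels concrete, here is a rough sketch that estimates the size of a model's weights from its parameter count. It assumes nominal bit widths (4 bits for Q4, 8 bits for Q8 and FP8); real formats often store extra scaling data, and the estimate ignores activations and caches. The function name and the 7-billion-parameter example are illustrative and do not come from Suiri.

```python
# Nominal bits per weight for each label (an assumption for illustration;
# actual on-disk formats usually add a small per-block overhead).
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP8": 8}


def weight_memory_gib(num_params: int, label: str) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    bits = BITS_PER_WEIGHT[label]
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model.
    for label in BITS_PER_WEIGHT:
        print(f"{label}: {weight_memory_gib(7_000_000_000, label):.2f} GiB")
```

Under these assumptions, a Q4 model needs about half the weight memory of the same model at Q8 or FP8, which is where the lower footprint comes from.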
What this means for developers
- Quantization is handled automatically by Suiri.
- There are no user-configurable quantization parameters.
- Model behavior is defined by the selected model ID.
- Changing quantization always results in a different model ID.
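The relationship between model IDs and quantization can be pictured as a fixed lookup: each ID maps to exactly one quantization, so choosing a different precision means choosing a different ID. The sketch below uses an invented catalog; the model IDs, the `CATALOG` name, and both helper functions are hypothetical and are not part of any Suiri API.

```python
# Hypothetical catalog: these model IDs are invented for illustration
# and are not real Suiri identifiers.
CATALOG = {
    "example-model-q4_0": "Q4",
    "example-model-q8": "Q8",
    "example-large-fp8": "FP8",
}


def quantization_for(model_id: str) -> str:
    """Return the fixed quantization label tied to a model ID."""
    try:
        return CATALOG[model_id]
    except KeyError:
        raise ValueError(f"unknown model ID: {model_id}") from None


def ids_with(label: str) -> list[str]:
    """List the model IDs that use a given quantization label."""
    return sorted(mid for mid, q in CATALOG.items() if q == label)


if __name__ == "__main__":
    print(quantization_for("example-model-q4_0"))
    print(ids_with("Q8"))
```

There is deliberately no function that changes the quantization of an existing ID: in this model, the only way to get a different precision is to request a different ID.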