Suiri applies pre-selected default quantization settings to each model, balancing inference efficiency, cost, and output quality.

What is quantization?

Quantization refers to the numerical precision used to represent model weights. Lower-precision formats reduce memory and compute requirements at the cost of some numerical fidelity. While different quantization techniques exist, Suiri selects and manages these settings internally for each model; developers do not need to configure or tune quantization parameters.
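To make the precision trade-off concrete, here is a minimal sketch of symmetric integer quantization: weights are mapped onto a signed integer grid of a given bit width and then mapped back. This is illustrative only and is not Suiri's internal scheme; the function names and the toy weight values are invented for the example.

```python
import numpy as np

def quantize(weights, bits):
    """Symmetric integer quantization: map float weights onto a signed
    integer grid with `bits` bits of precision. Illustrative sketch only."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    scale = np.abs(weights).max() / qmax    # per-tensor scale factor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer grid."""
    return q.astype(np.float32) * scale

# Toy weights: the round-trip error shrinks as precision increases.
w = np.array([0.42, -1.37, 0.05, 0.91], dtype=np.float32)
for bits in (4, 8):
    q, s = quantize(w, bits)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"{bits}-bit max round-trip error: {err:.4f}")
```

Running the loop shows the 8-bit round-trip error is smaller than the 4-bit error, which is the fidelity difference the Q4 and Q8 labels below refer to.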

Common quantization labels

Q4 (e.g., q4_0)

  • 4-bit weight representation
  • Lower memory footprint
  • Optimized for cost-efficient, low-latency inference

Q8

  • 8-bit weight representation
  • Higher numerical precision than Q4
  • Used for models requiring greater fidelity

FP8

  • 8-bit floating-point representation
  • Wider dynamic range than integer formats of the same bit width
  • Typically used for larger or more capable models

What this means for developers

  • Quantization is handled automatically by Suiri.
  • There are no user-configurable quantization parameters.
  • Model behavior is defined by the selected model ID.
  • Changing quantization always results in a different model ID.

Suiri does not guarantee identical outputs across different models or quantization schemes. Developers should evaluate model behavior using their own workloads.
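Because quantization surfaces only through the model ID, model selection in application code reduces to choosing an ID rather than setting a precision knob. The sketch below illustrates that pattern; the model IDs and helper function are entirely hypothetical and do not reflect Suiri's actual ID scheme:

```python
# Hypothetical model IDs -- Suiri's real naming scheme may differ.
MODEL_FAST = "example-model-13b-q4_0"    # 4-bit variant: lower cost/latency
MODEL_PRECISE = "example-model-13b-q8"   # 8-bit variant: a *different* model ID

def pick_model(latency_sensitive: bool) -> str:
    """Choose between quantization variants by model ID only.
    There is no separate quantization parameter to set."""
    return MODEL_FAST if latency_sensitive else MODEL_PRECISE
```

Evaluating each candidate ID against your own workload, as recommended above, is the only reliable way to compare variants.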