Available Models
To retrieve the current list of available models, query the models endpoint.

How to Read Model Specifications
- Context Window: Maximum number of tokens (input + output) the model can process in a single request
- Weights Precision: Quantization level (Q4, Q8, FP8) — lower precision generally improves inference efficiency and reduces memory usage, but may affect output quality depending on the workload.
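As a sketch of how a client might use these specifications, the snippet below filters a models listing by context window. The endpoint's actual URL, field names, and response schema are not given here, so they are assumptions; a hard-coded sample payload stands in for a live response.

```python
import json

# Hypothetical payload shaped like a models-endpoint response.
# Field names ("models", "id", "context_window", "precision") are
# assumptions; check the actual API reference for the real schema.
sample_response = json.dumps({
    "models": [
        {"id": "model-a", "context_window": 8192, "precision": "Q4"},
        {"id": "model-b", "context_window": 32768, "precision": "FP8"},
    ]
})

def models_with_min_context(raw: str, min_tokens: int) -> list[str]:
    """Return IDs of models whose context window is at least min_tokens."""
    payload = json.loads(raw)
    return [m["id"] for m in payload["models"]
            if m["context_window"] >= min_tokens]

print(models_with_min_context(sample_response, 16384))  # → ['model-b']
```

Filtering on the context window up front avoids sending requests that would exceed a model's token limit.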