ModelKit vs VisionModelKit initialization paths¶
In mlx-engine, the loader selects between two distinct initialization paths based on the model_type field in the model's config.json. This split lets the engine apply advanced optimizations to supported architectures while remaining compatible with a wider range of vision models.^[001-TODO__mlx-engine.md]
ModelKit Initialization Path¶
The ModelKit path is the default, high-performance route used for standard text-based Large Language Models (LLMs) and specific Vision-Language Models (VLMs) that have dedicated, optimized add-ons within the engine.^[001-TODO__mlx-engine.md]
A model is routed to this path if its model_type is present in the VISION_ADD_ON_MAP.^[001-TODO__mlx-engine.md]
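The routing decision can be sketched as a simple lookup. Note that the map contents and the `vision_config` heuristic below are illustrative assumptions, not the engine's actual code:

```python
import json

# Illustrative stand-in for mlx-engine's VISION_ADD_ON_MAP; the real map
# keys on model_type strings and points at modules in vision_add_ons/.
VISION_ADD_ON_MAP = {
    "gemma3": "vision_add_ons.gemma3",
    "gemma3n": "vision_add_ons.gemma3n",
    "lfm2-vl": "vision_add_ons.lfm2_vl",
    "mistral3": "vision_add_ons.mistral3",
    "pixtral": "vision_add_ons.pixtral",
}

def select_kit(config_json: str) -> str:
    """Pick an initialization path from a model's config.json contents."""
    cfg = json.loads(config_json)
    # Hypothetical heuristic: treat a config with a vision_config section
    # as a vision model. Text-only models always take the ModelKit path.
    is_vision = "vision_config" in cfg
    if not is_vision or cfg.get("model_type") in VISION_ADD_ON_MAP:
        return "ModelKit"
    return "VisionModelKit"
```

A text model like Llama takes the ModelKit path regardless of the map; a vision model takes it only if its model_type has a registered add-on.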
Supported Models¶
This category includes pure text models (like Llama) as well as vision models that have received specific integration support. Supported vision model types include:^[001-TODO__mlx-engine.md]
* gemma3 / gemma3n
* lfm2-vl
* mistral3
* pixtral
Features and Optimizations¶
Models initialized via ModelKit support the full suite of mlx-engine performance features:^[001-TODO__mlx-engine.md]
* KV Cache Quantization: Reduces memory usage via kv_bits and kv_group_size parameters.
* Cross-Prompt Caching: Reuses computation from previous prompts to speed up new requests.
* Speculative Decoding: Accelerates generation by having a smaller draft model propose tokens that the main model verifies (text models only; not applicable to vision models).
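To illustrate what KV cache quantization buys, here is a back-of-the-envelope memory estimate. The layer/head counts and the per-group fp16 scale-and-bias overhead model are illustrative assumptions, not mlx-engine's exact accounting:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits: int, group_size: int = 64) -> int:
    """Rough KV cache size: key + value tensors across all layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim * seq_len
    total = elements * bits // 8
    if bits < 16:
        # Assume each quantization group of group_size elements also
        # stores an fp16 scale and an fp16 bias (2 bytes each).
        total += (elements // group_size) * 2 * 2
    return total

# Example: 32 layers, 8 KV heads, head_dim 128, 4096-token context.
fp16 = kv_cache_bytes(32, 8, 128, 4096, bits=16)  # 512 MiB
q4 = kv_cache_bytes(32, 8, 128, 4096, bits=4)     # 144 MiB
```

Under these assumptions, dropping kv_bits from 16 to 4 shrinks the cache by roughly 3.6x, which is why the feature matters for long contexts.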
Architecture¶
This implementation relies on the model_kit/model_kit.py file and dynamically loads specific vision processing logic from the vision_add_ons/ directory (e.g., pixtral.py, gemma3.py) when necessary.^[001-TODO__mlx-engine.md]
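Dynamic loading of an add-on module might look like the following sketch (the helper name and error handling are hypothetical; the real engine instantiates add-on classes rather than returning bare modules):

```python
import importlib

def load_vision_add_on(model_type: str, add_on_map: dict):
    """Import the vision add-on module registered for model_type.

    Illustrative only: in mlx-engine the registered targets are modules
    under vision_add_ons/, such as pixtral.py and gemma3.py.
    """
    module_path = add_on_map.get(model_type)
    if module_path is None:
        raise KeyError(f"no vision add-on registered for {model_type!r}")
    return importlib.import_module(module_path)
```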
VisionModelKit Initialization Path¶
The VisionModelKit path acts as a generic compatibility wrapper for vision models that do not have dedicated optimizations in mlx-engine.^[001-TODO__mlx-engine.md]
A model is routed to this path if its model_type is not found in the VISION_ADD_ON_MAP.^[001-TODO__mlx-engine.md]
Implementation¶
This path encapsulates the third-party mlx-vlm library to provide inference capabilities.^[001-TODO__mlx-engine.md] It is implemented in vision_model_kit/vision_model_kit.py.
Limitations¶
Because this path relies on a generic wrapper rather than custom-built optimizations, it lacks several advanced features available in the ModelKit path. Specifically, the VisionModelKit path does not support:^[001-TODO__mlx-engine.md]
* KV Cache Quantization
* Cross-Prompt Caching
* Speculative Decoding
Additionally, there is currently no incremental reset mechanism for VisionModelKit models; they must be reloaded for every new prediction session.^[001-TODO__mlx-engine.md]
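A minimal sketch of this wrapper pattern, with the backend library (mlx-vlm in the real engine) abstracted behind a stub interface and the per-session reload made explicit. The class and method names here are illustrative, not mlx-engine's actual API:

```python
class VisionModelKitSketch:
    """Generic wrapper that delegates loading and inference to a backend
    vision library (mlx-vlm in mlx-engine; here, any object exposing
    load() and generate() works)."""

    def __init__(self, backend, model_path: str):
        self.backend = backend
        self.model_path = model_path
        self.model = None
        self.processor = None

    def _load(self):
        # Full reload: with no incremental reset mechanism, any state from
        # a previous prompt is discarded along with the old model objects.
        self.model, self.processor = self.backend.load(self.model_path)

    def predict(self, prompt: str, images=()):
        self._load()  # reload for every new prediction session
        return self.backend.generate(self.model, self.processor,
                                     prompt, images)
```

The reload-per-prediction call in `predict` is what makes this path slower across sessions than ModelKit's cached paths.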
Known Exceptions¶
Some vision models may be temporarily excluded from the optimized ModelKit path because of bugs. For example, the Qwen VL series (qwen2_vl, qwen2_5_vl, qwen3_vl) is currently commented out of the add-on map due to a port bug (Issue #237).^[001-TODO__mlx-engine.md]
Related Concepts¶
- [[mlx-engine]]: The overarching inference engine.
- [[Speculative Decoding]]: An optimization feature available only to the ModelKit path.
- [[KV Cache]]: Memory management mechanisms that differ between the two paths.
Sources¶
001-TODO__mlx-engine.md