ModelKit vs VisionModelKit initialization paths¶
In mlx-engine, the loader selects between two distinct initialization paths based on the model_type field in the model's config.json. This split lets the engine apply advanced optimizations to supported architectures while remaining compatible with a wider range of vision models.^[001-TODO__mlx-engine.md]
ModelKit Initialization Path¶
The ModelKit path is the default, high-performance route used for standard text-based Large Language Models (LLMs) and specific Vision-Language Models (VLMs) that have dedicated, optimized add-ons within the engine.^[001-TODO__mlx-engine.md]
A model is routed to this path if its model_type is present in the VISION_ADD_ON_MAP.^[001-TODO__mlx-engine.md]
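The routing decision can be sketched as a simple lookup. Note that the map contents and the `vision_config` heuristic below are illustrative assumptions, not the engine's actual code:

```python
import json

# Illustrative stand-in for mlx-engine's VISION_ADD_ON_MAP; the real map
# keys on model_type strings and points at modules in vision_add_ons/.
VISION_ADD_ON_MAP = {
    "gemma3": "vision_add_ons.gemma3",
    "gemma3n": "vision_add_ons.gemma3n",
    "lfm2-vl": "vision_add_ons.lfm2_vl",
    "mistral3": "vision_add_ons.mistral3",
    "pixtral": "vision_add_ons.pixtral",
}

def select_kit(config_json: str) -> str:
    """Pick an initialization path from a model's config.json contents."""
    cfg = json.loads(config_json)
    # Hypothetical heuristic: treat a config with a vision_config section
    # as a vision model. Text-only models always take the ModelKit path.
    is_vision = "vision_config" in cfg
    if not is_vision or cfg.get("model_type") in VISION_ADD_ON_MAP:
        return "ModelKit"
    return "VisionModelKit"
```

A text model like Llama takes the ModelKit path regardless of the map; a vision model takes it only if its model_type has a registered add-on.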
Supported Models¶
This category includes pure text models (like Llama) as well as vision models that have received specific integration support. Supported vision model types include:^[001-TODO__mlx-engine.md]
* gemma3 / gemma3n
* lfm2-vl
* mistral3
* pixtral
Features and Optimizations¶
Models initialized via ModelKit support the full suite of mlx-engine performance features:^[001-TODO__mlx-engine.md]
* KV Cache Quantization: Reduces memory usage via kv_bits and kv_group_size parameters.
* Cross-Prompt Caching: Reuses computation from previous prompts to speed up new requests.
* Speculative Decoding: Accelerates generation by having a smaller draft model propose tokens that the main model verifies (text models only; not applicable to vision models).
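To illustrate what KV cache quantization buys, here is a back-of-the-envelope memory estimate. The layer/head counts and the per-group fp16 scale-and-bias overhead model are illustrative assumptions, not mlx-engine's exact accounting:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits: int, group_size: int = 64) -> int:
    """Rough KV cache size: key + value tensors across all layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim * seq_len
    total = elements * bits // 8
    if bits < 16:
        # Assume each quantization group of group_size elements also
        # stores an fp16 scale and an fp16 bias (2 bytes each).
        total += (elements // group_size) * 2 * 2
    return total

# Example: 32 layers, 8 KV heads, head_dim 128, 4096-token context.
fp16 = kv_cache_bytes(32, 8, 128, 4096, bits=16)  # 512 MiB
q4 = kv_cache_bytes(32, 8, 128, 4096, bits=4)     # 144 MiB
```

Under these assumptions, dropping kv_bits from 16 to 4 shrinks the cache by roughly 3.6x, which is why the feature matters for long contexts.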
Architecture¶
This implementation relies on the model_kit/model_kit.py file and dynamically loads specific vision processing logic from the vision_add_ons/ directory (e.g., pixtral.py, gemma3.py) when necessary.^[001-TODO__mlx-engine.md]
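Dynamic loading of an add-on module might look like the following sketch (the helper name and error handling are hypothetical; the real engine instantiates add-on classes rather than returning bare modules):

```python
import importlib

def load_vision_add_on(model_type: str, add_on_map: dict):
    """Import the vision add-on module registered for model_type.

    Illustrative only: in mlx-engine the registered targets are modules
    under vision_add_ons/, such as pixtral.py and gemma3.py.
    """
    module_path = add_on_map.get(model_type)
    if module_path is None:
        raise KeyError(f"no vision add-on registered for {model_type!r}")
    return importlib.import_module(module_path)
```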
VisionModelKit Initialization Path¶
The VisionModelKit path acts as a generic compatibility wrapper for vision models that do not have dedicated optimizations in mlx-engine.^[001-TODO__mlx-engine.md]
A model is routed to this path if its model_type is not found in the VISION_ADD_ON_MAP.^[001-TODO__mlx-engine.md]
Implementation¶
This path encapsulates the third-party mlx-vlm library to provide inference capabilities.^[001-TODO__mlx-engine.md] It is implemented in vision_model_kit/vision_model_kit.py.
Limitations¶
Because this path relies on a generic wrapper rather than custom-built optimizations, it lacks several advanced features available in the ModelKit path. Specifically, the VisionModelKit path does not support:^[001-TODO__mlx-engine.md]
* KV Cache Quantization
* Cross-Prompt Caching
* Speculative Decoding
Additionally, there is currently no incremental reset mechanism for VisionModelKit models; they must be reloaded for every new prediction session.^[001-TODO__mlx-engine.md]
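A minimal sketch of this wrapper pattern, with the backend library (mlx-vlm in the real engine) abstracted behind a stub interface and the per-session reload made explicit. The class and method names here are illustrative, not mlx-engine's actual API:

```python
class VisionModelKitSketch:
    """Generic wrapper that delegates loading and inference to a backend
    vision library (mlx-vlm in mlx-engine; here, any object exposing
    load() and generate() works)."""

    def __init__(self, backend, model_path: str):
        self.backend = backend
        self.model_path = model_path
        self.model = None
        self.processor = None

    def _load(self):
        # Full reload: with no incremental reset mechanism, any state from
        # a previous prompt is discarded along with the old model objects.
        self.model, self.processor = self.backend.load(self.model_path)

    def predict(self, prompt: str, images=()):
        self._load()  # reload for every new prediction session
        return self.backend.generate(self.model, self.processor,
                                     prompt, images)
```

The reload-per-prediction call in `predict` is what makes this path slower across sessions than ModelKit's cached paths.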
Known Exceptions¶
Some vision models may be temporarily excluded from the optimized ModelKit path because of bugs. For example, the Qwen VL series (qwen2_vl, qwen2_5_vl, qwen3_vl) is currently commented out of the add-on map due to a port bug (Issue #237).^[001-TODO__mlx-engine.md]
Related Concepts¶
- [[mlx-engine]]: The overarching inference engine.
- [[Speculative Decoding]]: An optimization feature available only to the ModelKit path.
- [[KV Cache]]: Memory management mechanisms that differ between the two paths.
Sources¶
001-TODO__mlx-engine.md