Qwen 3.5 35B Local Deployment¶
Qwen 3.5 35B is a large language model (LLM) that can be deployed locally on Apple Silicon hardware using the Ollama framework. By leveraging the MLX inference engine, this setup allows the 35-billion-parameter model to run efficiently on consumer-grade hardware, providing a high-performance local alternative to cloud-hosted AI services^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
Performance Benchmarks¶
The utilization of the MLX engine on Apple Silicon results in significant performance improvements over previous versions^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
- Text Generation Speed: Approximately 65–66 tokens per second (tok/s) during output generation^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
- Prompt Processing: Approximately 5.3 tok/s^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
- Hardware Acceleration: The setup achieves near 100% GPU utilization, fully leveraging the unified memory architecture^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
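As a rough sanity check, the throughput figures above can be turned into an end-to-end latency estimate. A minimal sketch, using the 5.3 tok/s prompt-processing and ~65 tok/s generation rates from the benchmarks; the token counts in the example are illustrative:

```python
# Estimate end-to-end latency from the measured throughput figures above.
PROMPT_TOKS_PER_S = 5.3   # prompt-processing throughput (benchmark figure)
GEN_TOKS_PER_S = 65.0     # text-generation throughput (benchmark figure)

def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Total seconds to ingest a prompt and generate a reply."""
    return prompt_tokens / PROMPT_TOKS_PER_S + output_tokens / GEN_TOKS_PER_S

# Illustrative: a 200-token prompt with a 300-token reply.
# 200/5.3 ≈ 37.7 s of prompt processing + 300/65 ≈ 4.6 s of generation.
print(f"{estimated_latency_s(200, 300):.1f} s")  # → 42.4 s
```

Note how the slow prompt-processing rate dominates: long prompts, not long replies, are the main cost with this configuration.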
System Requirements¶
To run the Qwen 3.5 35B model locally, specific hardware specifications are necessary due to the model's size and memory bandwidth requirements^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
- Memory: A minimum of 32 GB of unified memory is recommended.
- Storage: The model requires approximately 21 GB of disk space (quantized to NVFP4 format).
- Architecture: Designed for Apple Silicon (e.g., M1/M2/M3 series) to take advantage of the MLX backend.
In environments with limited RAM (e.g., 16 GB), the system can fall back to swap space on disk to run the model, though this incurs a significant performance penalty^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
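The ~21 GB figure is consistent with a back-of-envelope calculation. A minimal sketch, assuming roughly 4 bits per parameter for the NVFP4-quantized weights (the 8 GB headroom figure is an illustrative assumption, not a measured requirement):

```python
# Back-of-envelope footprint for a 35B-parameter model,
# assuming ~4 bits per parameter (4-bit NVFP4-style quantization).
PARAMS = 35e9
BITS_PER_PARAM = 4  # assumption: 4-bit quantized weights

weights_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"Raw weights: {weights_gb:.1f} GB")  # ≈ 17.5 GB; quantization scales and
                                            # runtime overhead push the real
                                            # footprint toward the ~21 GB on disk

def fits_in_ram(unified_ram_gb: float, model_gb: float = 21.0) -> bool:
    """Crude check: does the model leave headroom for the OS and KV cache?"""
    return unified_ram_gb - model_gb >= 8.0  # assumed ~8 GB headroom

print(fits_in_ram(32.0))  # True  -> runs fully in unified memory
print(fits_in_ram(16.0))  # False -> spills to swap, with a speed penalty
```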
Deployment Workflow¶
Deploying the model on a MacBook involves installing the Ollama runtime and pulling the specific model weights^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
- Install Ollama: Download the macOS application (version 0.9 or later) and move it to the Applications folder.
- Initialize: Open the terminal and run the `ollama run` command to start the service.
- Launch Model: Select the Qwen 3.5 35B model from the Ollama interface or run it via the command line.
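Once the service is running, the model can also be queried programmatically through Ollama's local HTTP API (by default on port 11434). A minimal sketch: the `qwen3.5:35b` tag is an assumption — check `ollama list` for the exact model name on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint
MODEL_TAG = "qwen3.5:35b"  # assumed tag; verify with `ollama list`

def build_request(prompt: str, model: str = MODEL_TAG) -> bytes:
    """JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt: str) -> str:
    """Send a prompt to the locally running Ollama service."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the Ollama service to be running locally):
#   print(generate("Explain unified memory in one sentence."))
```

Setting `"stream": False` returns the whole completion in one JSON object; the default streaming mode instead emits one JSON line per token.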
Ollama provides a ChatGPT-style Web UI that supports model switching and parameter adjustment, facilitating interaction with the locally hosted model^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
Related Concepts¶
- [[MLX]]
- [[Ollama]]
- [[Local LLM]]
- [[Quantization]]
Sources¶
001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md