Qwen 3.5 35B Local Deployment

Qwen 3.5 35B is a large language model (LLM) that can be deployed locally on Apple Silicon hardware using the Ollama framework. By leveraging the MLX inference engine, this setup allows the 35-billion-parameter model to run efficiently on consumer-grade hardware, providing a high-performance local alternative to cloud-hosted AI services^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

Performance Benchmarks

Running the model through the MLX engine on Apple Silicon yields significant performance improvements over previous versions^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md]. The sketch after the list shows one way to observe these figures locally.

  • Text Generation Speed: Approximately 65–66 tokens per second (tok/s) during output generation^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
  • Prompt Processing: Approximately 5.3 tok/s^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
  • Hardware Acceleration: The setup achieves near 100% GPU utilization, fully leveraging the unified memory architecture^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
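
These figures can be spot-checked from the terminal: Ollama's CLI prints per-response timing statistics when invoked with the `--verbose` flag, and macOS ships `powermetrics` for watching GPU load. This is a minimal sketch; the model tag `qwen3.5:35b` is an assumed placeholder, so substitute the tag actually published in the Ollama model library.

```sh
# Interactive session; after each response, --verbose prints timing stats,
# including prompt eval rate and eval (generation) rate in tok/s.
# NOTE: the model tag is hypothetical; use the published tag.
ollama run qwen3.5:35b --verbose

# Optional, in a second terminal: sample GPU utilization once per second
# to confirm the near-100% figure reported above.
sudo powermetrics --samplers gpu_power -i 1000
```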

System Requirements

Because of the model's size and memory bandwidth demands, running Qwen 3.5 35B locally calls for specific hardware^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

  • Memory: A minimum of 32 GB of unified RAM is recommended.
  • Storage: The quantized weights (NVFP4 format) occupy approximately 21 GB of disk space.
  • Architecture: Designed for Apple Silicon (e.g., M1/M2/M3 series) to take advantage of the MLX backend.

In environments with limited RAM (e.g., 16 GB), the system can fall back to swap space on disk to run the model, though this incurs a performance penalty^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
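
Before pulling the weights, it is worth confirming the machine clears these thresholds. A minimal pre-flight check using standard macOS tools (the 32 GB and 21 GB figures come from the list above):

```sh
# Installed unified memory in GiB (32+ recommended for this model).
echo "RAM: $(( $(sysctl -n hw.memsize) / 1073741824 )) GiB"

# Free disk space; the quantized weights need roughly 21 GB.
df -h ~

# Current swap usage; heavy swapping signals the slow 16 GB fallback path.
sysctl vm.swapusage
```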

Deployment Workflow

Deploying the model on a MacBook involves installing the Ollama runtime and pulling the specific model weights^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

  1. Install Ollama: Download the macOS application (version 0.9 or later) and move it to the Applications folder.
  2. Initialize: Launch the app so the background service starts, then open a terminal to confirm the `ollama` CLI is available.
  3. Launch Model: Select the Qwen 3.5 35B model from the Ollama interface, or pull and run it from the command line, as sketched below.
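
The steps above map onto a handful of terminal commands. A sketch, again assuming the hypothetical `qwen3.5:35b` tag (check the Ollama model library for the published name):

```sh
# Confirm the CLI bundled with the macOS app is on the PATH.
ollama --version

# Download the quantized weights (~21 GB) into the local model store.
ollama pull qwen3.5:35b   # hypothetical tag; substitute the published one

# Start an interactive chat session with the model.
ollama run qwen3.5:35b

# Housekeeping: list installed models and their on-disk sizes.
ollama list
```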

Ollama provides a ChatGPT-style Web UI that supports model switching and parameter adjustment, facilitating interaction with the locally hosted model^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
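
Alongside the Web UI, the Ollama service exposes a REST API on localhost (port 11434 by default), which makes the locally hosted model scriptable. A single-shot example; the model tag remains an assumed placeholder:

```sh
# One-off generation against the local Ollama API; "stream": false
# returns a single JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:35b",
  "prompt": "Explain unified memory in two sentences.",
  "stream": false
}'
```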

  • [[MLX]]
  • [[Ollama]]
  • [[Local LLM]]
  • [[Quantization]]

Sources

  • 001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md