Qwen 3.5 35B Local Deployment

Qwen 3.5 35B is a large language model (LLM) that can be deployed locally on Apple Silicon hardware using the Ollama framework. By leveraging the MLX inference engine, this setup allows the 35-billion-parameter model to run efficiently on consumer-grade hardware, providing a high-performance local alternative to cloud-hosted AI services^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

Performance Benchmarks

Running the model through the MLX engine on Apple Silicon yields significant performance improvements over previous versions^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md]. The sketch after the list shows one way to observe these figures locally.

  • Text Generation Speed: Approximately 65–66 tokens per second (tok/s) during output generation^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
  • Prompt Processing: Approximately 5.3 tok/s^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
  • Hardware Acceleration: The setup achieves near 100% GPU utilization, fully leveraging the unified memory architecture^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
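
These figures can be spot-checked from the terminal: Ollama's CLI prints per-response timing statistics when invoked with the `--verbose` flag, and macOS ships `powermetrics` for watching GPU load. This is a minimal sketch; the model tag `qwen3.5:35b` is an assumed placeholder, so substitute the tag actually published in the Ollama model library.

```sh
# Interactive session; after each response, --verbose prints timing stats,
# including prompt eval rate and eval (generation) rate in tok/s.
# NOTE: the model tag is hypothetical; use the published tag.
ollama run qwen3.5:35b --verbose

# Optional, in a second terminal: sample GPU utilization once per second
# to confirm the near-100% figure reported above.
sudo powermetrics --samplers gpu_power -i 1000
```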

System Requirements

Because of the model's size and memory bandwidth demands, running Qwen 3.5 35B locally calls for specific hardware^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

  • Memory: A minimum of 32 GB of unified RAM is recommended.
  • Storage: The quantized weights (NVFP4 format) occupy approximately 21 GB of disk space.
  • Architecture: Designed for Apple Silicon (e.g., M1/M2/M3 series) to take advantage of the MLX backend.

In environments with limited RAM (e.g., 16 GB), the system can fall back to swap space on disk to run the model, though this incurs a performance penalty^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
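
Before pulling the weights, it is worth confirming the machine clears these thresholds. A minimal pre-flight check using standard macOS tools (the 32 GB and 21 GB figures come from the list above):

```sh
# Installed unified memory in GiB (32+ recommended for this model).
echo "RAM: $(( $(sysctl -n hw.memsize) / 1073741824 )) GiB"

# Free disk space; the quantized weights need roughly 21 GB.
df -h ~

# Current swap usage; heavy swapping signals the slow 16 GB fallback path.
sysctl vm.swapusage
```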

Deployment Workflow

Deploying the model on a MacBook involves installing the Ollama runtime and pulling the specific model weights^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].

  1. Install Ollama: Download the macOS application (version 0.9 or later) and move it to the Applications folder.
  2. Initialize: Launch the app so the background service starts, then open a terminal to confirm the `ollama` CLI is available.
  3. Launch Model: Select the Qwen 3.5 35B model from the Ollama interface, or pull and run it from the command line, as sketched below.
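
The steps above map onto a handful of terminal commands. A sketch, again assuming the hypothetical `qwen3.5:35b` tag (check the Ollama model library for the published name):

```sh
# Confirm the CLI bundled with the macOS app is on the PATH.
ollama --version

# Download the quantized weights (~21 GB) into the local model store.
ollama pull qwen3.5:35b   # hypothetical tag; substitute the published one

# Start an interactive chat session with the model.
ollama run qwen3.5:35b

# Housekeeping: list installed models and their on-disk sizes.
ollama list
```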

Ollama provides a ChatGPT-style Web UI that supports model switching and parameter adjustment, facilitating interaction with the locally hosted model^[001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md].
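
Alongside the Web UI, the Ollama service exposes a REST API on localhost (port 11434 by default), which makes the locally hosted model scriptable. A single-shot example; the model tag remains an assumed placeholder:

```sh
# One-off generation against the local Ollama API; "stream": false
# returns a single JSON object instead of a token stream.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:35b",
  "prompt": "Explain unified memory in two sentences.",
  "stream": false
}'
```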

  • [[MLX]]
  • [[Ollama]]
  • [[Local LLM]]
  • [[Quantization]]

Sources

  • 001-TODO__Ollama_MLX_Support_MacBook_Local_LLM.md