
Qwen 3.6 27B for Agentic Coding

Qwen 3.6 27B is an open-source large language model developed by Alibaba's Tongyi team, specifically optimized for Agentic Coding workflows^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Unlike general-purpose dialogue models, Qwen 3.6 27B prioritizes performance in code reasoning, repository-level understanding, and long-context maintenance. It is designed to function not just as a conversational assistant, but as a core engine for autonomous agents, supporting native tool use and maintaining coherent task threads over extended interactions^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Core Characteristics

The model is designed to address common failure points in coding agent workflows, such as over-explaining actions instead of performing them, or losing context during long sessions^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

  • Agentic Coding: Optimized for real-world coding workflows rather than just achieving high benchmark scores^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
  • Thinking Preservation: Capable of maintaining context and "task threads" over long conversations without getting lost or changing goals midway^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
  • Repository-level Reasoning: Able to understand the structure and dependencies of an entire codebase^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
  • Native Tool Use: Intrinsically supports tool calling, allowing it to execute functions rather than merely describing them^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
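The native tool-use loop works like this: the agent sends tool definitions with its request, the model responds with a structured tool call instead of prose, and the agent executes it and appends the result to the conversation. A minimal, framework-agnostic sketch of the agent side, using the OpenAI-compatible tool-call message format that vLLM serves (the read_file tool and dispatch table are hypothetical, for illustration only):

```python
import json

# Local implementation the agent dispatches to (hypothetical example).
def read_file(path: str) -> str:
    return f"<contents of {path}>"

DISPATCH = {"read_file": read_file}

def handle_tool_call(tool_call: dict) -> dict:
    """Execute one structured tool call and build the 'tool' role
    message that gets appended back into the conversation."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = DISPATCH[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}

# What a native tool-calling model emits instead of describing the action:
assistant_tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "read_file", "arguments": '{"path": "src/main.py"}'},
}
print(handle_tool_call(assistant_tool_call))
```

This is the distinction the note draws: a model without native tool use would answer with a sentence like "I would read src/main.py", while a tool-calling model emits the structured call above, which the agent can actually execute.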

Deployment with vLLM

For running Agentic Coding workflows, particularly with the 27B parameter version, vLLM is the recommended deployment engine^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]. Alternatives like Ollama do not currently support the specific 27B version required for these workflows^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Startup Command

To deploy the model effectively, the following vllm serve command parameters are critical^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]:

vllm serve Qwen/Qwen3-27B \
  --port 8000 \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3

Key Parameters

  • --max-model-len: Should be set to the maximum the hardware allows (e.g., 32768). A long context window is a core advantage of this model^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
  • --enable-auto-tool-choice: Must be enabled. Without this flag, the model may describe its intention to use a tool rather than actually calling it^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
  • --tool-call-parser: Must be set to qwen3 to match the specific model version's formatting^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
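With these flags, the endpoint accepts standard OpenAI-style chat completion requests carrying a tools array. A minimal request body matching the serve command above; this sketch only builds and prints the payload, with no network call, and the run_tests tool and user message are hypothetical:

```python
import json

# Request body for POST http://localhost:8000/v1/chat/completions.
# The model name must match the served model, and tool_choice "auto"
# lets the model decide when to emit a tool call (the server-side
# counterpart is the --enable-auto-tool-choice flag above).
request_body = {
    "model": "Qwen/Qwen3-27B",
    "messages": [
        {"role": "user", "content": "List the failing tests in this repo."}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the test suite and return the report",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    "tool_choice": "auto",
}
print(json.dumps(request_body, indent=2))
```

If --enable-auto-tool-choice is missing on the server, the same request tends to come back as a prose description of the intended call rather than a structured tool_calls entry in the response.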

Integration with Agent Frameworks

Hermes Agent

Hermes Agent provides a comprehensive integration for local deployment, supporting memory management and message orchestration^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Configuration typically involves setting a custom endpoint in ~/.hermes/config.yaml^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]:

provider:
  base_url: http://localhost:8000/v1
  default_model: Qwen/Qwen3-27B
  max_context_tokens: 32768

It is important to explicitly set max_context_tokens to prevent the system from defaulting to a smaller window size^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Kilo CLI

For integration within [[VS Code]], the Kilo CLI can be configured to use the local vLLM endpoint^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

  1. Install via npm install -g @kilocode/cli.
  2. Select "OpenAI-compatible" provider.
  3. Set Base URL to http://localhost:8000/v1.

Related Notes

  • [[Hermes Agent 架构笔记]]
  • [[本地 LLM 部署方案对比]]
  • [[Agent Skills_-_结构化AI编码工作流框架]]

Sources

  • 001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md