Hermes Agent integration with custom endpoints¶
Hermes Agent supports integration with custom endpoints, enabling users to connect locally hosted or self-deployed Large Language Models (LLMs) via OpenAI-compatible APIs.^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md] This capability allows for greater control over the underlying model, improved privacy, and the use of open-source models such as Qwen 3.6 27B within the Hermes workflow^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
Configuration Methods¶
There are two primary methods to configure Hermes Agent to use a custom endpoint: interactive configuration and direct file editing.
Interactive Configuration¶
Users can initiate the configuration process via the command line interface.^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]
```shell
hermes model
```
When prompted, select "custom endpoint". You will then need to provide the following details:
* Base URL: The address of your local server (e.g., http://localhost:8000/v1).
* Model: The specific model identifier (e.g., Qwen/Qwen3-27B).
* API Key: For local deployments, this field can usually be left empty^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
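Before pointing Hermes at the endpoint, it can help to confirm the server is reachable and see which model ids it actually serves, since the Model value entered above must match one of them. A minimal sketch using the standard `/v1/models` route of the OpenAI-compatible API (the helper name and the injectable `fetch` parameter are my own, added for testability):

```python
import json
from urllib.request import urlopen

def list_served_models(base_url, fetch=urlopen):
    """Return the model ids reported by an OpenAI-compatible server.

    `base_url` is the same value you would give Hermes,
    e.g. "http://localhost:8000/v1".
    """
    with fetch(base_url.rstrip("/") + "/models") as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]
```

If the id you plan to configure (e.g. `Qwen/Qwen3-27B`) is not in the returned list, requests from Hermes will fail with an unknown-model error.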
Configuration File¶
For persistent settings, you can edit the Hermes configuration file directly at `~/.hermes/config.yaml`^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
```yaml
provider:
  base_url: http://localhost:8000/v1
  default_model: Qwen/Qwen3-27B
  # Explicitly set the context limit to avoid a restrictive default
  max_context_tokens: 32768
```
Key Configuration Parameters¶
When integrating with custom endpoints, especially locally hosted models, certain parameters are critical for optimal performance^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]:
* base_url: This must point to the OpenAI-compatible endpoint of your inference server (e.g., vLLM).
* default_model: The model name must match the one loaded in your inference server.
* max_context_tokens: It is recommended to explicitly set this value. If left unset, Hermes may default to a smaller window size, which would waste the capabilities of models optimized for long contexts (like 32k tokens)^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
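These three parameters can be sanity-checked together before starting a session. A hypothetical helper (not part of Hermes) that takes the parsed `provider` section as a dict plus the model ids the server reports:

```python
def check_provider_config(provider, served_models):
    """Flag common misconfigurations in a provider section.

    `provider` mirrors the keys from ~/.hermes/config.yaml;
    `served_models` is the list of ids the inference server reports.
    """
    problems = []
    if not provider.get("base_url", "").rstrip("/").endswith("/v1"):
        problems.append("base_url should end in /v1 (the OpenAI-compatible root)")
    if provider.get("default_model") not in served_models:
        problems.append("default_model does not match any served model id")
    if not provider.get("max_context_tokens"):
        problems.append("max_context_tokens unset; a small default may apply")
    return problems
```

An empty result means the three critical settings are at least mutually consistent with what the endpoint serves.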
Agent Inheritance¶
In Hermes Agent, the model configuration established for a parent Agent is automatically inherited by any child Agents^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]. This means that configuring the custom endpoint at the root level ensures consistent behavior across the entire agent hierarchy without the need for repetitive setup.
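The lookup pattern behind this kind of inheritance can be illustrated with Python's `collections.ChainMap` (this is a sketch of the observable behavior, not Hermes's actual implementation; the override value is hypothetical):

```python
from collections import ChainMap

# Root-level provider settings, as in ~/.hermes/config.yaml.
root = {
    "base_url": "http://localhost:8000/v1",
    "default_model": "Qwen/Qwen3-27B",
    "max_context_tokens": 32768,
}

# A child agent with no overrides sees exactly the parent's settings...
child = ChainMap({}, root)

# ...while an override shadows only the key it changes; everything
# else still falls through to the root. ("Qwen/Qwen3-14B" here is a
# hypothetical override, purely for illustration.)
specialized = ChainMap({"default_model": "Qwen/Qwen3-14B"}, root)
```

Because lookups fall through to the root, setting the custom endpoint once at the top of the hierarchy is enough.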
Integration Example: vLLM and Qwen 3.6 27B¶
A common use case for custom endpoints is integrating with vLLM, a high-throughput serving engine, using a model like Qwen 3.6 27B^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
1. Start the vLLM server: Ensure the server is started with the correct parameters, particularly `--enable-auto-tool-choice` and the correct `--tool-call-parser` (e.g., `qwen3`), to ensure the model executes tools rather than just describing them^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

   ```shell
   vllm serve Qwen/Qwen3-27B \
     --port 8000 \
     --tensor-parallel-size 1 \
     --max-model-len 32768 \
     --enable-auto-tool-choice \
     --tool-call-parser qwen3
   ```

2. Configure Hermes: Update `~/.hermes/config.yaml` to point to the vLLM server, setting `max_context_tokens` to `32768` to utilize the model's full capacity^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
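To verify that tool calling is actually wired through, one quick smoke test is to POST a chat completion that defines a trivial tool and check that the response carries `tool_calls` rather than a prose description of the call. A sketch of the request body using the standard OpenAI chat-completions schema (the `get_weather` tool is hypothetical, invented for this check):

```python
import json

# OpenAI-style /v1/chat/completions payload with one tool defined.
payload = {
    "model": "Qwen/Qwen3-27B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for the smoke test
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}

# POST this body to http://localhost:8000/v1/chat/completions; with
# --enable-auto-tool-choice and --tool-call-parser qwen3 active, the
# assistant message should contain a tool_calls entry.
body = json.dumps(payload).encode("utf-8")
```

If the assistant instead answers in plain text that it "would call" a function, the tool-call parser flags are likely missing or wrong on the vLLM side.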
Related Concepts¶
- [[Hermes Agent 架构笔记]]
- 20/80 Learning Principle
- [[本地 LLM 部署方案对比]]
Sources¶
001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md