
Hermes Agent integration with custom endpoints

Hermes Agent supports integration with custom endpoints, enabling users to connect locally hosted or self-deployed Large Language Models (LLMs) via OpenAI-compatible APIs.^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md] This capability gives users greater control over the underlying model, improves privacy, and enables the use of open-source models such as Qwen 3.6 27B within the Hermes workflow^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Configuration Methods

There are two primary methods to configure Hermes Agent to use a custom endpoint: interactive configuration and direct file editing.

Interactive Configuration

Users can initiate the configuration process via the command line interface.^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]

hermes model

When prompted, select "custom endpoint". You will then need to provide the following details:

  • Base URL: The address of your local server (e.g., http://localhost:8000/v1).
  • Model: The specific model identifier (e.g., Qwen/Qwen3-27B).
  • API Key: For local deployments, this field can usually be left empty^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

Configuration File

For persistent settings, you can edit the Hermes configuration file directly at ~/.hermes/config.yaml^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

provider:
  base_url: http://localhost:8000/v1
  default_model: Qwen/Qwen3-27B
  # Explicitly set context limit to avoid default value restrictions
  max_context_tokens: 32768

Key Configuration Parameters

When integrating with custom endpoints, especially locally hosted models, certain parameters are critical for optimal performance^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]:

  • base_url: This must point to the OpenAI-compatible endpoint of your inference server (e.g., vLLM); a quick way to verify it is shown after this list.
  • default_model: The model name must match the one loaded in your inference server.
  • max_context_tokens: It is recommended to set this value explicitly. If left unset, Hermes may fall back to a smaller default window, underutilizing models optimized for long contexts (e.g., 32k tokens)^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
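
Before wiring Hermes to the endpoint, it can help to confirm that the server actually responds at the configured base_url. A minimal check, assuming an OpenAI-compatible server such as vLLM running on port 8000 as in the examples in this article, is to query the standard model listing route:

curl http://localhost:8000/v1/models

The response should be a JSON object whose data array contains the loaded model identifier (e.g., Qwen/Qwen3-27B); that same identifier is what belongs in default_model.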

Agent Inheritance

In Hermes Agent, the model configuration established for a parent Agent is automatically inherited by any child Agents^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md]. This means that configuring the custom endpoint at the root level ensures consistent behavior across the entire agent hierarchy without the need for repetitive setup.

Integration Example: vLLM and Qwen 3.6 27B

A common use case for custom endpoints is integrating with vLLM, a high-throughput serving engine, using a model like Qwen 3.6 27B^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

  1. Start the vLLM Server: Launch the server with the correct parameters, particularly --enable-auto-tool-choice and the appropriate --tool-call-parser (e.g., qwen3), so that the model actually executes tools rather than merely describing them^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].

    vllm serve Qwen/Qwen3-27B \
      --port 8000 \
      --tensor-parallel-size 1 \
      --max-model-len 32768 \
      --enable-auto-tool-choice \
      --tool-call-parser qwen3
    
  2. Configure Hermes: Update the ~/.hermes/config.yaml to point to the vLLM server, setting max_context_tokens to 32768 to utilize the model's full capacity^[001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md].
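
Putting both steps together, the resulting ~/.hermes/config.yaml mirrors the example from the Configuration File section, with the port and model name matching the vllm serve command above:

provider:
  base_url: http://localhost:8000/v1
  default_model: Qwen/Qwen3-27B
  max_context_tokens: 32768

Note that max_context_tokens matches the --max-model-len value passed to vLLM, so Hermes will not request more context than the server can actually provide.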

Sources

  • 001-TODO__Qwen_3.6_27B_—_面向_Coding_Agent_的开源模型.md