AI VTuber technology stack¶
The AI VTuber technology stack enables the creation of virtual characters capable of real-time interaction, gaming, and streaming. Unlike traditional static avatars, an AI VTuber integrates a Large Language Model (LLM) as a "brain" to drive conversation and decision-making, coordinated by a backend runtime^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Modern implementations typically use a Monorepo architecture (e.g., using pnpm workspaces) to manage the complexity of supporting multiple deployment targets, including Web browsers, desktop apps (Electron), and mobile devices^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Core Components¶
Avatar Rendering¶
Visual representation is handled through specialized rendering engines, supporting both 2D and 3D models^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]:
- Live2D: Used for 2D models; libraries manage animations, automatic blinking, and gaze tracking^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- VRM: Used for 3D models; typically rendered via [[Three.js]] to create spatially aware avatars^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Cross-Platform UI: The visual interface ("Stage UI") is often built with web frameworks like [[Vue.js]] and [[Vite]], allowing it to run natively on the web or be wrapped in desktop/mobile containers^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
AI Brain & LLM Integration¶
The "brain" of the VTuber connects to various LLM providers to process text and generate responses^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]. Key considerations include:
- Provider Abstraction: Support for 30+ providers (e.g., OpenAI, Anthropic, DeepSeek, Qwen, local models via Ollama) through unified SDKs (similar to [[Vercel AI SDK]])^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Self-Hosting: Full self-hosting capabilities are standard, allowing users to keep API keys and data local^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Audio & Voice Pipeline¶
Real-time voice interaction requires a processing pipeline for speech-to-text and text-to-speech^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]:
- Input (ASR): Audio is captured via the [[WebAudio API]].
- Output (TTS): Text responses are converted to speech using services like ElevenLabs^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Memory System¶
To maintain context over long sessions or multiple streams, a dedicated memory layer is required^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]:
- Client-Side: In-browser databases like DuckDB WASM or pglite allow for local data storage without a backend^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Server-Side: For services like Discord or Telegram bots, traditional databases like [[PostgreSQL]] combined with vector extensions (e.g.,
pgvector) are used for semantic search and long-term memory^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Gaming & Agent Capabilities¶
A defining feature of advanced AI VTubers (inspired by Neuro-sama) is the ability to play games^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]. This requires bridging the LLM with game environments:
- Minecraft: Implemented using frameworks like Mineflayer to connect the AI to MC servers^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Other Games: Support exists for simulation games like Factorio and Kerbal Space Program via custom plugins^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Logic Control: The AI interprets game state and executes commands, transforming the VTuber from a chatbot into an autonomous agent^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Ecosystem & Platform Integration¶
AI VTubers are often integrated into broader social platforms via specific services^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md]:
- Chat Platforms: Bots for Discord and Telegram allow the AI character to interact directly in community channels^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
- Plugin Architecture: Extensibility is achieved through plugin systems (e.g., for Bilibili, HomeAssistant, or Claude Code), enabling the character to perform tasks outside of simple chatting^[001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md].
Related Concepts¶
- [[Neuro-sama]]: The pioneering AI VTuber that popularized this technology stack.
- [[Three.js]]: The graphics library often used for rendering 3D VTuber models.
- [[Live2D]]: The technology behind 2D interactive avatars.
- [[Local LLM]]: Running language models locally for privacy and low latency.
Sources¶
001-TODO__Project_AIRI_-_开源_AI_VTuber_赛博伴侣.md