Xiaozhi ESP32 is an open-source project that enables users to build their own AI friends using various AI and IoT components, integrating language models and hardware like ESP32 to facilitate AI conversation capabilities.

Features

Offline Voice Activation

This feature allows the device to be activated and interacted with through voice commands without needing an internet connection, using the ESP-SR framework.

Multilingual Recognition

Supports voice recognition for languages like Chinese, Cantonese, English, Japanese, and Korean using the SenseVoice technology, enabling more inclusive communication.

Voice Streaming Communication

Enables real-time voice dialogue through WebSocket or UDP protocols, ensuring seamless and efficient voice interactions.

Speaker Recognition

Utilizes 3D Speaker technology to identify who is calling the AI by recognizing unique voice prints.

LLM Integration

Incorporates large language models like Qwen, DeepSeek, and Doubao to enhance the conversational abilities and responses of the AI.

Customizable Roles

Allows users to set up specific prompts and voice tones to create customized interactive experiences that can mimic different characters or personalities.