Powered by MuseTalk V15 + LiveKit WebRTC

AI Avatars That See, Hear &
Respond in Real Time

Deploy hyper-realistic AI avatar assistants that recognize faces, detect emotions, understand gestures, and hold intelligent conversations — all streamed live to any screen via WebRTC.

visionai.tech/demo
LIVE
👁 Face Detected
😊 Happy 94%
👋 Wave Gesture
AI Avatar: Xin chào! Tôi có thể giúp gì cho bạn hôm nay? ("Hello! How can I help you today?")
↓ 2.1 Mbps
WebRTC p2p
142 ms
E2E latency
<200ms
Avatar Response Latency
99.9%
Stream Uptime
10+
Emotion States Detected
Concurrent Sessions
Platform Capabilities

Everything Your AI Avatar Needs to Shine

A vertically integrated stack — from GPU inference to WebRTC delivery — so you deploy in days, not months.

Custom Avatar Creation

Upload photos or videos to generate your own lifelike AI avatar powered by MuseTalk V15. Each avatar is stored securely in S3 and cached on GPU workers for zero-cold-start rendering.

MuseTalk V15 inference

Real-Time Face Recognition

Identify registered members the moment they appear on camera. The avatar greets them by name and personalizes the conversation using their stored profile context.

Instant identification

Emotion Detection

Detect happiness, sadness, surprise, anger, fear, and more with >90% accuracy. The AI adapts its tone, pacing, and response content to match the visitor's emotional state.

10+ emotion states

Gesture Detection

Wave to greet, point to direct, or use custom hand signals to trigger specific AI responses and workflows — all processed in real time from the camera feed.

Custom gesture triggers

RAG Knowledge Base

Upload your business documents, FAQs, product catalogs, and policies. The avatar retrieves relevant context using vector embeddings (OpenAI text-embedding-3-small) and answers accurately.

Vector-powered retrieval
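The retrieval step can be sketched as a cosine-similarity ranking over pre-computed chunk embeddings (e.g. from text-embedding-3-small). The `Chunk` shape and `topK` default below are illustrative, not the platform's real schema:

```typescript
// Minimal retrieval sketch: rank document chunks by cosine similarity
// to the query embedding and keep the top k as LLM context.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], chunks: Chunk[], topK = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, topK);
}
```

The retrieved chunk texts are then concatenated into the LLM prompt so the avatar answers from your documents rather than from general knowledge.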

Multilingual Support

Switch between Vietnamese and English mid-conversation. Language is detected automatically according to each session's setup. Built-in prompting ensures culturally appropriate responses.

Auto language detection
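As a rough illustration of automatic language detection — the platform's actual detector is not documented here — Vietnamese text almost always contains diacritics absent from English, so a character-class check makes a cheap first pass:

```typescript
// Illustrative heuristic only: Vietnamese uses diacritic letters
// (ă, â, đ, ô, ơ, ư, tonal vowels) that never appear in English text.
const VI_CHARS =
  /[ăâđêôơưàáảãạằắẳẵặầấẩẫậèéẻẽẹềếểễệìíỉĩịòóỏõọồốổỗộờớởỡợùúủũụừứửữựỳýỷỹỵ]/i;

function detectLanguage(utterance: string): "vi" | "en" {
  return VI_CHARS.test(utterance) ? "vi" : "en";
}
```

A production detector would also handle diacritic-free Vietnamese and other languages, typically via an n-gram or model-based classifier.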

Low-Latency WebRTC Streaming

LiveKit-powered peer-to-peer video delivery ensures avatar frames and audio reach your users in under 200ms globally. No plugins, no downloads — pure browser WebRTC.

<200ms end-to-end

Enterprise Configuration

Per-setup feature toggles let each client independently enable or disable face recognition, gesture detection, and emotion detection. Redis hot-reload pushes config changes without restarting sessions.

Hot-reload config
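The hot-reload path can be sketched as a shallow merge of a published update over the session's current flags — a worker subscribed to the Redis channel would call `applyConfig` on each update without restarting. The field names mirror the `PATCH /api/setup` example in this document; everything else is an assumption:

```typescript
// Sketch of applying a hot-reloaded feature config pushed via Redis.
interface FeatureFlags {
  face_recognition: boolean;
  emotion_detection: boolean;
  gesture_detection: boolean;
}

const DEFAULT_FLAGS: FeatureFlags = {
  face_recognition: false,
  emotion_detection: false,
  gesture_detection: false,
};

function applyConfig(
  current: FeatureFlags,
  update: Partial<FeatureFlags>
): FeatureFlags {
  // Shallow-merge the update over the current flags; untouched
  // flags keep their previous values, and the input is not mutated.
  return { ...current, ...update };
}
```

Returning a fresh object (rather than mutating in place) lets the worker swap the active config atomically between frames.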
How It Works

Go Live in 4 Steps

From upload to live stream — VisionAI handles the infrastructure so you focus on the experience.

01

Create Your Avatar

Upload a reference photo or video. Our MuseTalk V15 pipeline processes it on GPU and stores the avatar assets in S3. A thumbnail is generated and cached for instant display.

Step 01
POST /api/avatar/upload
{ name: "John", file: <mp4> }
// MuseTalk processes → S3 stored
// avatarId: "a3f9c2d1-..."
02

Configure Your Assistant

Choose an assistant type (receptionist, family, sales), write system & behavior prompts, upload your knowledge base documents, and enable the features your use case needs.

Step 02
PATCH /api/setup/12
{
  avatarId: 7,
  language: "vi",
  features: {
    face_recognition: true,
    emotion_detection: true
  }
}
03

Start a Live Session

Call initSession() — a LiveKit room is created, your config is pushed to Redis, and the worker joins with the avatar pre-loaded. The WebRTC stream starts in under 2 seconds.

Step 03
POST /api/avatar/init-session
→ Redis: session:{roomName}:config
→ Worker joins LiveKit room
→ Avatar idle loop starts
→ roomName + token returned
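Before handing the init-session response to the WebRTC client, it is worth validating the payload. The `roomName` + `token` fields follow the step description above; the exact JSON field names are assumptions:

```typescript
// Validate the (assumed) init-session response shape before connecting,
// so a malformed token fails fast instead of inside the WebRTC handshake.
interface InitSessionResponse {
  roomName: string;
  token: string;
}

function parseInitSession(payload: unknown): InitSessionResponse {
  const p = payload as Record<string, unknown>;
  if (typeof p?.roomName !== "string" || p.roomName.length === 0) {
    throw new Error("init-session: missing roomName");
  }
  if (typeof p?.token !== "string" || p.token.length === 0) {
    throw new Error("init-session: missing token");
  }
  return { roomName: p.roomName, token: p.token };
}
```

The parsed `roomName` and `token` would then be passed to the browser's LiveKit client to join the room.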
04

Real-Time Interaction

Your customer sees the avatar live. Face recognition fires on entry, emotions are tracked frame-by-frame, gestures trigger custom workflows, and the avatar speaks via an STT → LLM → TTS pipeline.

Step 04
// On presence_detected:
avatar.greet("Xin chào, Minh!") // "Hello, Minh!"
// On gesture:wave → trigger FAQ
// On emotion:sad → switch tone
// LLM → TTS → lip-sync frames
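The event routing in the snippet above can be sketched as a single dispatcher. The event names (`presence_detected`, `gesture:wave`, `emotion:sad`) come from the example; the `Avatar` interface and handler wiring are illustrative:

```typescript
// Sketch of routing vision events to avatar actions.
interface Avatar {
  greet(text: string): void;
  triggerWorkflow(name: string): void;
  setTone(tone: string): void;
}

function handleVisionEvent(avatar: Avatar, event: string, data?: string): void {
  switch (event) {
    case "presence_detected":
      // Greet recognized visitors by name ("Xin chào" = "Hello").
      avatar.greet(data ? `Xin chào, ${data}!` : "Xin chào!");
      break;
    case "gesture:wave":
      avatar.triggerWorkflow("faq");
      break;
    case "emotion:sad":
      avatar.setTone("gentle");
      break;
    default:
      // Unknown events are ignored so new detectors can ship safely.
      break;
  }
}
```

Keeping the routing in one place makes per-setup feature toggles easy: a disabled detector simply never emits its events.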

System Architecture

Browser
WebRTC Client
LiveKit
SFU / Signaling
Worker
Python / FastAPI
MuseTalk
GPU Inference
NestJS API
REST + Auth
Redis
Session Config + Queue
PostgreSQL
Users + Setups + Avatars
S3
Avatar Files + Contexts
Use Cases

One Platform, Every Industry

From retail kiosks to hospital check-ins — VisionAI adapts to your vertical with zero code changes.

🏢Smart Office

AI Receptionist

Greet visitors by name the moment they walk in. The avatar checks appointments, issues badges, and directs guests — all without human intervention.

85%
Wait time reduced
Always on
24/7 coverage
Face Recognition
🏪Retail & Hospitality

In-Store Assistant

Customers approach a kiosk, the avatar detects their emotion, adapts its pitch, answers product questions from your RAG catalog, and upsells — in Vietnamese or English.

+32%
Conversion lift
VI / EN
Languages
Emotion Adaptive
🏥Healthcare

Patient Check-In

Guide patients through intake forms via gesture and voice. Face recognition pre-fills known patient data. Emotion detection flags distressed patients for immediate human escalation.

-60%
Check-in time
4.8★
Patient satisfaction
Gesture Control
🎓Education

Virtual Tutor

An avatar teacher that adapts to student engagement in real time — if emotion detection shows frustration, it simplifies the explanation and encourages with a smile.

+47%
Engagement rate
91%
Completion rate
Adaptive Learning
🏦Banking & Finance

Branch Advisor

Customers walk up to a branch kiosk and interact with an AI advisor that knows their account history, answers product questions via RAG, and escalates complex issues.

78%
Queries handled
22%
Escalation rate
RAG Knowledge
🎭Events & Exhibitions

Brand Ambassador

A unique branded avatar at your booth or event interacts with hundreds of visitors simultaneously, collects leads, shares promotional content, and logs all interactions.

3× more
Leads captured
Unlimited
Concurrent users
Multi-Session
Pricing

Simple, Transparent Pricing

No per-minute charges. No seat limits. Just flat monthly plans that scale with your usage.

Starter

$99/month

Perfect for piloting AI avatars in a single location.

  • 1 Custom Avatar
  • Up to 5 concurrent sessions
  • Face Recognition
  • Multilingual (VI + EN)
  • 1 Knowledge Base (50 docs)
  • LiveKit WebRTC streaming
  • Email support
  Not included: Emotion Detection, Gesture Control, Multi-location
Start Free Trial
Most Popular

Growth

$349/month

For teams deploying across multiple touchpoints.

  • 5 Custom Avatars
  • Up to 50 concurrent sessions
  • Face Recognition
  • Emotion Detection
  • Gesture Detection
  • Multilingual (VI + EN)
  • 5 Knowledge Bases (500 docs each)
  • Redis hot-reload config
  • Priority support + SLA
  Not included: Custom LLM fine-tuning, On-premise deployment
Start Free Trial

Enterprise

Custom

Full-stack deployment with dedicated infrastructure.

  • Unlimited Avatars
  • Unlimited concurrent sessions
  • All Growth features
  • Custom LLM fine-tuning
  • On-premise / private cloud
  • Custom integrations & APIs
  • Dedicated GPU cluster
  • 99.99% SLA
  • Dedicated CSM
Contact Sales

All plans include a 14-day free trial. No credit card required. Talk to us about custom requirements →

14-day free trial · No credit card

Ready to Deploy Your AI Avatar?

Join forward-thinking companies using VisionAI to deliver engaging, intelligent avatar experiences at scale. Setup takes less than an hour.

By submitting, you agree to our Terms of Service and Privacy Policy.

🔒 SOC2 Type II (in progress) · 🌏 Hosted in Asia-Pacific · ⚡ 99.9% uptime SLA · 🛡 GDPR compliant