Platform

The full stack behind every SoulBox.

Cloud personality AI, voice, avatars, firmware, and fleet ops — built as one integrated platform. Use it as a service or partner with us to ship your own.

Personality AI Cloud

A managed AI runtime for bots with their own voice, mood, and memory. Streaming chat, tool-calling, and persistent personalities.

Multi-provider model routing (OpenAI, Anthropic, local)
Persistent bot memory + mood
Streaming token/audio responses
Per-tenant isolation and quotas

Voice Stack

XTTS voice cloning, faster-whisper STT, and a managed TTS fleet — the same voices on the web, on phones, and on firmware devices.

XTTS v2 voice cloning + library
faster-whisper streaming STT
TTS fleet control (XTTS, Polly, OpenAI)
Avatar-driven talking-head playback

Firmware Runtime

A SoulBox runtime for ESP32-S3 boards. On-device wake words, low-latency audio, and OTA fleet updates from the cloud.

Custom microWakeWord (Modal-trained)
Streaming I²S audio in/out
OTA updates with staged rollout
Drop-in support for AiPi Lite, DFR1221, and custom boards
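
One common way to stage an OTA rollout is to hash each device ID into a stable bucket from 0 to 99 and update only devices whose bucket falls below the current rollout percentage. The sketch below assumes that approach; the source does not say how SoulBox stages its rollouts, and `rollout_bucket` and `should_update` are hypothetical helpers.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_update(device_id: str, release: str, percent: int) -> bool:
    """True if this device is inside the first `percent` of the rollout."""
    return rollout_bucket(device_id, release) < percent


# Hypothetical fleet and release name, used only for the demo.
fleet = [f"device-{n:04d}" for n in range(1000)]
wave_10 = {d for d in fleet if should_update(d, "fw-1.2.0", 10)}
wave_50 = {d for d in fleet if should_update(d, "fw-1.2.0", 50)}
```

Because the bucket depends only on the release and the device ID, raising the percentage adds devices to the rollout without dropping any that were already included.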

Avatar & Image Fleet

Generate avatars, talking-head video, and AI imagery on managed GPU fleets, including per-bot avatars, on-demand portraits, and general image generation.

Talking-head video (SadTalker / EchoMimic)
Avatar pack management
Image generation fleet (Z-Image, custom)
Backend-routed asset storage in MinIO/S3

Fleet Operations

A control plane for multi-tenant device, voice, image, and inference fleets. Observe latency, scale instances, and manage providers.

TTS / STT / image / inference fleets
AWS EC2 + Modal scale-to-zero workers
Per-fleet metrics + health
Encrypted provider API keys

Auth & Billing

OAuth + SSO, multi-tenant accounts, Stripe billing, and roles. Everything you need to ship a real multi-tenant product.

Google + Apple SSO, email login
Multi-tenant servers, roles, invites
Stripe plans + subscriptions
reCAPTCHA + rate limiting

End to end

From wake word to spoken reply.

Every layer of the SoulBox platform is tuned for sub-second voice round trips — cloud, codec, and firmware moving as one.

Step 01: Wake
The device wakes on a custom on-device wake word, with no server round-trip required.

Step 02: Listen
Audio streams to the SoulBox cloud and is transcribed live with faster-whisper.

Step 03: Think
The request is routed to the right model, with persistent personality, mood, and memory.

Step 04: Speak
Streaming XTTS playback with avatar lip-sync returns to your device in under a second.
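
The cloud side of this flow (listen, think, speak) can be pictured as a chain of stages with a timer around each one, which is one way to see where a voice round trip spends its time. The sketch below is a toy under stated assumptions: the three stage functions are hypothetical stand-ins, not calls into faster-whisper, a language model, or XTTS, and the wake step is left out because it runs on the device.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical stand-ins for the real stages.
def listen(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend transcription


def think(text: str) -> str:
    return f"You said: {text}"  # pretend model reply


def speak(reply: str) -> bytes:
    return reply.encode("utf-8")  # pretend synthesized audio


def run_turn(
    audio: bytes, stages: List[Tuple[str, Callable[[Any], Any]]]
) -> Tuple[Any, Dict[str, float]]:
    """Pass the payload through each stage and record seconds spent per stage."""
    timings: Dict[str, float] = {}
    payload: Any = audio
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


STAGES = [("listen", listen), ("think", think), ("speak", speak)]
reply_audio, timings = run_turn(b"hello", STAGES)
```

Summing `timings.values()` gives the cloud portion of one round trip; a production pipeline would stream between stages instead of waiting for each to finish.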

Ship on the SoulBox platform.

Bring your own hardware, your own voices, or your own brand. We'll handle the cloud, the firmware, and the fleet underneath.

Talk with the team
See supported devices