Platform

The full stack behind every SoulBox.

Cloud personality AI, voice, avatars, firmware, and fleet ops — built as one integrated platform. Use it as a service or partner with us to ship your own.

Personality AI Cloud

A managed AI runtime for bots with their own voice, mood, and memory. Streaming chat, tool-calling, and persistent personalities.

  • Multi-provider model routing (OpenAI, Anthropic, local)
  • Persistent bot memory + mood
  • Streaming token/audio responses
  • Per-tenant isolation and quotas
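
As an illustration of what multi-provider routing can look like, here is a minimal sketch that tries providers in a preference order and falls back when one fails. It is not the SoulBox implementation: the `route` function, the `Provider` signature, and the stub providers are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signature: takes a prompt, returns reply text.
Provider = Callable[[str], str]


def route(prompt: str, providers: Dict[str, Provider], order: List[str]) -> Tuple[str, str]:
    """Try each provider in preference order; return (provider_name, reply)."""
    errors: List[str] = []
    for name in order:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # broad catch for illustration only
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for a hosted API and a local model.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def local_provider(prompt: str) -> str:
    return f"echo: {prompt}"


providers = {"hosted": flaky_provider, "local": local_provider}
chosen, reply = route("hi", providers, ["hosted", "local"])
print(chosen, reply)
```

In this sketch the hosted stub always fails, so the request falls through to the local stub. A production router would likely also weigh cost, latency, and per-tenant quotas, which are left out here.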

Voice Stack

XTTS voice cloning, faster-whisper STT, and a managed TTS fleet — the same voices on the web, on phones, and on firmware devices.

  • XTTS v2 voice cloning + library
  • faster-whisper streaming STT
  • TTS fleet control (XTTS, Polly, OpenAI)
  • Avatar-driven talking-head playback

Firmware Runtime

A SoulBox runtime for ESP32-S3 boards. On-device wake words, low-latency audio, and OTA fleet updates from the cloud.

  • Custom microWakeWord (Modal-trained)
  • Streaming I²S audio in/out
  • OTA updates with staged rollout
  • Drop-in for AiPi Lite, DFR1221, custom boards
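
One common way to stage an OTA rollout is to hash each device ID into a stable bucket from 0 to 99 and update only devices whose bucket falls below the current rollout percentage. The sketch below assumes that approach; it is not confirmed as the SoulBox mechanism, and `rollout_bucket`, `should_update`, and the device and release identifiers are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_update(device_id: str, release: str, percent: int) -> bool:
    """True if this device is inside the current rollout percentage."""
    return rollout_bucket(device_id, release) < percent


# Hypothetical fleet: widen the rollout from 10% to 50% to 100%.
fleet = [f"soulbox-{n:04d}" for n in range(1000)]
for percent in (10, 50, 100):
    eligible = sum(should_update(d, "1.2.0", percent) for d in fleet)
    print(f"{percent}% stage: {eligible} of {len(fleet)} devices eligible")
```

Because the bucket depends only on the release and device ID, a device that is eligible at 10% stays eligible as the percentage grows, so widening a rollout never drops devices that were already included.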

Avatar & Image Fleet

Generate avatars, talking-head video, and AI imagery on managed GPU fleets. Per-bot avatars, on-demand portraits, and image gen.

  • Talking-head video (SadTalker / EchoMimic)
  • Avatar pack management
  • Image generation fleet (Z-Image, custom)
  • Backend-routed asset storage in MinIO/S3

Fleet Operations

A control plane for multi-tenant device, voice, image, and inference fleets. Observe latency, scale instances, and manage providers.

  • TTS / STT / image / inference fleets
  • AWS EC2 + Modal scale-to-zero workers
  • Per-fleet metrics + health
  • Encrypted provider API keys

Auth & Billing

OAuth + SSO, multi-tenant accounts, Stripe billing, and roles. Everything you need to ship a real multi-tenant product.

  • Google + Apple SSO, email login
  • Multi-tenant servers, roles, invites
  • Stripe plans + subscriptions
  • reCAPTCHA + rate limiting

End to end

From wake word to spoken reply.

Every layer of the SoulBox platform is tuned for sub-second voice round trips — cloud, codec, and firmware moving as one.

  1. Wake

    The device wakes on a custom on-device wake word, with no server round trip required.

  2. Listen

    Audio streams to the SoulBox cloud and is transcribed live with faster-whisper.

  3. Think

    The transcript is routed to the right model with persistent personality, mood, and memory.

  4. Speak

    XTTS streams the reply with avatar lip-sync, back to your device in under a second.
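
The four steps above can be pictured as a chain of stages, each timed so the whole turn can be checked against a latency budget. The sketch below is illustrative only: the stage functions are stubs, and `run_turn` and every other name in it are hypothetical, not SoulBox APIs.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes the previous output.
Stage = Tuple[str, Callable[[object], object]]


def run_turn(stages: List[Stage], payload: object) -> Tuple[object, Dict[str, float]]:
    """Run stages in order and record how long each one takes, in seconds."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


# Stub stages standing in for the real components.
def wake(frames: bytes) -> bytes:
    return frames  # on-device wake word check; passes the audio through


def listen(frames: bytes) -> str:
    return "what time is it"  # placeholder for streaming transcription


def think(transcript: str) -> str:
    return f"You asked: {transcript}"  # placeholder for the model reply


def speak(reply: str) -> bytes:
    return reply.encode("utf-8")  # placeholder for synthesized audio


stages: List[Stage] = [("wake", wake), ("listen", listen), ("think", think), ("speak", speak)]
audio_out, timings = run_turn(stages, b"\x00\x01")
total = sum(timings.values())
print(f"total turn time: {total:.6f}s, within 1s budget: {total < 1.0}")
```

In a real deployment the stages would stream into each other rather than run strictly one after another, so per-stage timings like these are only a starting point for measuring the round trip.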

Ship on the SoulBox platform.

Bring your own hardware, your own voices, or your own brand. We'll handle the cloud, the firmware, and the fleet underneath.