TTS Backend Architecture¶

The TTS system follows a plugin architecture with a Protocol-based contract. Each backend is a self-contained module with its own detection, startup, and audio generation logic.

The Protocol¶

Every backend implements the TTSBackend Protocol defined in hooks/tts/_protocol.py:

@runtime_checkable
class TTSBackend(Protocol):
    name: str       # "kokoro", "fish-speech", "chatterbox", "qwen3-tts", "pocket-tts"
    priority: int   # lower = tried first in auto mode

    def is_available(self) -> bool: ...
    def ensure_running(self) -> bool: ...
    def generate(self, text: str, voice: str, speed: float) -> bytes: ...

Method	Purpose
`is_available()`	Check if the service is reachable right now
`ensure_running()`	Start the service if possible, then check availability
`generate()`	Produce WAV audio bytes from text

Conventions:

voice is always passed in canonical Kokoro form — backends map internally
speed may be ignored by backends that don't support it
generate() returns raw WAV bytes
GPU checks are internal to the backend (not in generic selection logic)

The Registry¶

hooks/tts/__init__.py maintains the backend registry:

def _registry() -> dict[str, type[TTSBackend]]:
    from .chatterbox import ChatterboxBackend
    from .fish_speech import FishSpeechBackend
    from .kokoro import KokoroBackend
    from .pocket_tts import PocketTTSBackend
    from .qwen3_tts import Qwen3TTSBackend

    return {
        "kokoro": KokoroBackend,
        "fish-speech": FishSpeechBackend,
        "pocket-tts": PocketTTSBackend,
        "chatterbox": ChatterboxBackend,
        "qwen3-tts": Qwen3TTSBackend,
    }

`select_backend(backend_pref, fallback)`¶

If backend_pref is a specific backend name, try it
If unavailable and fallback=True, fall through to auto
In auto mode, sort all backends by priority (ascending) and return the first that ensure_running() returns True
Returns None if nothing is reachable

Backend Implementations¶

KokoroBackend (`kokoro.py`)¶

Property	Value
Priority	20
API	OpenAI-compatible `/v1/audio/speech`
Health check	`GET /v1/models`
Voice support	Full catalog via `to_kokoro()`
Speed support	Yes (payload field)

FishSpeechBackend (`fish_speech.py`)¶

Property	Value
Priority	10
API	Gradio SSE (`/gradio_api/call/partial`)
Health check	`GET /config` + GPU utilization < threshold
Voice support	Ignored (uses own model)
Speed support	Ignored

The is_available() method checks both service reachability and GPU utilization via nvidia-smi.

ChatterboxBackend (`chatterbox.py`)¶

Property	Value
Priority	12
API	OpenAI-compatible `/v1/audio/speech`
Health check	`GET /voices`
Voice support	Ignored (uses voice cloning with default voice)
Speed support	Ignored

Chatterbox uses a voice cloning model — it always sends "voice": "default" regardless of the configured voice name.

Qwen3TTSBackend (`qwen3_tts.py`)¶

Property	Value
Priority	14
API	FastAPI REST `GET /base_tts/`
Health check	`GET /openapi.json`
Voice support	Ignored (uses default English voice)
Speed support	Yes (query parameter)

Qwen3-TTS uses the default English voice from the model and passes the speed parameter via query string.

PocketTTSBackend (`pocket_tts.py`)¶

Property	Value
Priority	30
API	Multipart form-data `/tts`
Health check	`GET /health`
Voice support	Aliased via `to_alias()`
Speed support	Ignored
Auto-start	Yes — spawns `uvx pocket-tts serve`

ensure_running() calls _start() which launches the server via uvx and polls /health for up to 60 seconds.

Playback System¶

`play_audio(audio_data: bytes)`¶

Audio player priority:

ffplay — streaming via stdin pipe (preferred, lowest latency)
afplay — macOS native player (temp file)
aplay — ALSA player on Linux (temp file)
paplay — PulseAudio player on Linux (temp file)

`PlaybackLock`¶

File-based mutex at /tmp/voice-playback.lock using fcntl.flock:

Prevents overlapping audio from concurrent responses
30-second acquire timeout
Supports context manager (with PlaybackLock(): ...)
Writes PID for diagnostics

`SessionState`¶

Sentinel files in /tmp/ for stop-hook integration:

File	Meaning
`/tmp/voice-{id}-running`	TTS is generating/playing
`/tmp/voice-{id}-done`	TTS completed successfully
`/tmp/voice-{id}-failed`	TTS failed

TTS Backend Architecture¶

The Protocol¶

The Registry¶

select_backend(backend_pref, fallback)¶

Backend Implementations¶

KokoroBackend (kokoro.py)¶

FishSpeechBackend (fish_speech.py)¶

ChatterboxBackend (chatterbox.py)¶

Qwen3TTSBackend (qwen3_tts.py)¶

PocketTTSBackend (pocket_tts.py)¶