Adding a New TTS Backend¶
cc-vox is designed so that adding a new backend requires one new file and one registry line. No existing code needs modification.
Step 1: Create the Backend File¶
Create hooks/tts/my_backend.py:
"""My Custom TTS backend — description of what it is."""
from __future__ import annotations
import json
import os
import urllib.error
import urllib.request
from .voices import to_kokoro # if your backend uses Kokoro voice names
MY_PORT = int(os.environ.get("MY_BACKEND_PORT", "9999"))
MY_URL = f"http://localhost:{MY_PORT}"
class MyBackend:
name = "my-backend"
priority = 25 # between kokoro (20) and pocket-tts (30)
def is_available(self) -> bool:
"""Check if the service is reachable."""
try:
req = urllib.request.Request(f"{MY_URL}/health", method="GET")
resp = urllib.request.urlopen(req, timeout=1.0)
return 200 <= resp.status < 300
except (urllib.error.URLError, OSError, ValueError):
return False
def ensure_running(self) -> bool:
"""Start the service if possible, then check availability.
If your backend can't auto-start, just return is_available().
"""
return self.is_available()
def generate(self, text: str, voice: str, speed: float) -> bytes:
"""Generate WAV audio from text.
Args:
text: The text to speak
voice: Voice name in canonical Kokoro form (e.g., "af_heart")
speed: Speech speed (1.0 = normal). Ignore if unsupported.
Returns:
Raw WAV audio bytes.
"""
voice = to_kokoro(voice)
payload = json.dumps({"text": text, "voice": voice}).encode()
req = urllib.request.Request(
f"{MY_URL}/synthesize",
data=payload,
headers={"Content-Type": "application/json"},
)
resp = urllib.request.urlopen(req, timeout=60)
return resp.read()
Key Points¶
name: Must be a unique, lowercase, hyphenated identifierpriority: Lower values are tried first in auto mode. Current priorities: Fish Speech=10, Kokoro=20, pocket-tts=30is_available(): Should be fast (<1s timeout). ReturnFalseon any errorensure_running(): For auto-startable backends, start the service then poll for readinessgenerate(): Must return raw WAV bytes. Thevoiceargument is always in Kokoro form — map internally if needed
Step 2: Register the Backend¶
Edit hooks/tts/__init__.py — add one import and one dict entry:
def _registry() -> dict[str, type[TTSBackend]]:
from .fish_speech import FishSpeechBackend
from .kokoro import KokoroBackend
from .my_backend import MyBackend # (1)!
from .pocket_tts import PocketTTSBackend
return {
"kokoro": KokoroBackend,
"fish-speech": FishSpeechBackend,
"my-backend": MyBackend, # (2)!
"pocket-tts": PocketTTSBackend,
}
- Import your backend class
- Add a registry entry — the key is the name used in config and slash commands
Step 3: Done¶
That's it. Your backend is now:
- Available in auto mode (sorted by priority)
- Selectable via
/voice:speak backend my-backend - Selectable via config
backend = "my-backend" - Selectable via env
TTS_BACKEND=my-backend - Listed in
available_backend_names() - Validated in config parsing
Checklist¶
- Backend file implements
name,priority,is_available(),ensure_running(),generate() -
generate()returns raw WAV bytes -
is_available()has a short timeout (1s) - Port is configurable via environment variable
- Registry entry added in
__init__.py - Tested with
./scripts/say --voice af_heart "test"after forcing your backend