TTS API — Text-to-Speech Providers
    import "github.com/lookatitude/beluga-ai/voice/tts"

Package tts provides the text-to-speech (TTS) interface and provider registry
for the Beluga AI voice pipeline. Providers implement the TTS interface and
register themselves via init() for discovery.
Core Interface
The TTS interface supports both batch and streaming synthesis:

    type TTS interface {
        Synthesize(ctx context.Context, text string, opts ...Option) ([]byte, error)
        SynthesizeStream(ctx context.Context, textStream iter.Seq2[string, error], opts ...Option) iter.Seq2[[]byte, error]
    }

Audio Formats
Supported output formats are defined as AudioFormat constants:
FormatPCM, FormatOpus, FormatMP3, and FormatWAV.
Registry Pattern
Providers register via Register in their init() function and are created
with New. Use List to discover available providers.
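The registry mechanics described above can be illustrated with a minimal, self-contained sketch. It is not the tts package's actual implementation: the Factory type, the map-backed store, the reduced Config and Engine types, and the fakeEngine provider are all assumptions made for illustration. Only the names Register, New, and List come from the text.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Config is a stand-in for tts.Config (hypothetical, reduced to one field).
type Config struct {
	Voice string
}

// Engine is a stand-in for the TTS interface (hypothetical, reduced to one method).
type Engine interface {
	Name() string
}

// Factory builds an Engine from a Config. The type name is an assumption.
type Factory func(cfg Config) (Engine, error)

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register stores a provider factory under a name.
// A provider package would call this from its init() function.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
}

// New looks up a factory by name and uses it to build an engine.
func New(name string, cfg Config) (Engine, error) {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("tts: unknown provider %q", name)
	}
	return f(cfg)
}

// List returns the registered provider names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// fakeEngine is a hypothetical provider used only in this sketch.
type fakeEngine struct{ voice string }

func (e fakeEngine) Name() string { return "fake/" + e.voice }

func init() {
	Register("fake", func(cfg Config) (Engine, error) {
		return fakeEngine{voice: cfg.Voice}, nil
	})
}

func main() {
	fmt.Println(List())
	engine, err := New("fake", Config{Voice: "rachel"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(engine.Name())
	if _, err := New("missing", Config{}); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because registration happens in init(), a blank import of a provider package is enough to make its name available to New, which is why the usage example imports the provider with a blank identifier.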
    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs"

    engine, err := tts.New("elevenlabs", tts.Config{Voice: "rachel"})
    audio, err := engine.Synthesize(ctx, "Hello, world!")

    // Streaming:
    for chunk, err := range engine.SynthesizeStream(ctx, textStream) {
        if err != nil {
            break
        }
        transport.Send(chunk)
    }

Frame Processor Integration
Use AsFrameProcessor to wrap a TTS engine as a voice.FrameProcessor for
integration with the cascading pipeline:
    processor := tts.AsFrameProcessor(engine, 24000)

Configuration
The Config struct supports voice, model, sample rate, format, speed, pitch,
and provider-specific extras. Use functional options like WithVoice,
WithModel, WithSampleRate, WithFormat, WithSpeed, and WithPitch
to configure individual operations.
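As a rough illustration of how functional options can override a base configuration for a single call, here is a self-contained sketch. The option names WithVoice, WithSampleRate, and WithSpeed come from the text, but the Option shape (a function that mutates a Config), the field names and types, and the applyOptions helper are assumptions rather than the package's actual definitions.

```go
package main

import "fmt"

// Config mirrors a few of the settings listed above.
// Field names and types are assumptions for this sketch.
type Config struct {
	Voice      string
	SampleRate int
	Speed      float64
}

// Option mutates a Config. This shape is assumed, not taken from the package.
type Option func(*Config)

// WithVoice overrides the voice for one operation.
func WithVoice(v string) Option { return func(c *Config) { c.Voice = v } }

// WithSampleRate overrides the sample rate (in Hz) for one operation.
func WithSampleRate(hz int) Option { return func(c *Config) { c.SampleRate = hz } }

// WithSpeed overrides the speaking speed for one operation.
func WithSpeed(s float64) Option { return func(c *Config) { c.Speed = s } }

// applyOptions (hypothetical helper) applies opts in order to a copy of base.
// base is passed by value, so the caller's Config is left unchanged.
func applyOptions(base Config, opts ...Option) Config {
	for _, opt := range opts {
		opt(&base)
	}
	return base
}

func main() {
	base := Config{Voice: "rachel", SampleRate: 24000, Speed: 1.0}
	perCall := applyOptions(base, WithVoice("lily"), WithSpeed(1.25))
	fmt.Printf("base:     %+v\n", base)
	fmt.Printf("per call: %+v\n", perCall)
}
```

Applying options to a copy keeps engine-level defaults stable while letting each Synthesize call adjust only the settings it needs.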
The Hooks struct provides callbacks: BeforeSynthesize, OnAudioChunk, and
OnError. Use ComposeHooks to merge multiple hooks.
Available Providers
- elevenlabs — ElevenLabs (voice/tts/providers/elevenlabs)
- cartesia — Cartesia Sonic (voice/tts/providers/cartesia)
- playht — PlayHT (voice/tts/providers/playht)
- lmnt — LMNT (voice/tts/providers/lmnt)
- fish — Fish Audio (voice/tts/providers/fish)
- groq — Groq TTS (voice/tts/providers/groq)
- smallest — Smallest.ai (voice/tts/providers/smallest)
cartesia
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/cartesia"

Package cartesia provides the Cartesia TTS provider for the Beluga AI voice pipeline. It uses the Cartesia Text-to-Speech API via direct HTTP for batch synthesis and streaming.
Registration
This package registers itself as “cartesia” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/cartesia"

    engine, err := tts.New("cartesia", tts.Config{
        Voice: "a0e99841-438c-4a64-b679-ae501e7d6091",
        Extra: map[string]any{"api_key": "sk-..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello, world!")

Configuration
Configuration keys in Config.Extra:
- api_key — Cartesia API key (required)
- base_url — Custom API base URL (optional)
The default model is “sonic-2” with raw PCM output at 24000 Hz. Voice is specified as a Cartesia voice ID.
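A provider constructor presumably checks these Extra keys before it makes any request. The following self-contained sketch shows one way such a check could look for a required api_key and an optional base_url. The settings type, the parseExtra helper, and the placeholder default URL are hypothetical and are not taken from the cartesia package.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultBaseURL is a placeholder for this sketch, not a real provider endpoint.
const defaultBaseURL = "https://api.example.invalid"

// settings holds the values read from Config.Extra (hypothetical type).
type settings struct {
	APIKey  string
	BaseURL string
}

// parseExtra (hypothetical helper) reads the keys described above:
// api_key must be a non-empty string, base_url is optional and
// falls back to defaultBaseURL when absent or empty.
func parseExtra(extra map[string]any) (settings, error) {
	key, _ := extra["api_key"].(string)
	if key == "" {
		return settings{}, errors.New("api_key is required")
	}
	s := settings{APIKey: key, BaseURL: defaultBaseURL}
	if u, ok := extra["base_url"].(string); ok && u != "" {
		s.BaseURL = u
	}
	return s, nil
}

func main() {
	// Missing api_key is rejected.
	if _, err := parseExtra(map[string]any{}); err != nil {
		fmt.Println("error:", err)
	}

	// A custom base_url overrides the default.
	s, err := parseExtra(map[string]any{
		"api_key":  "sk-...",
		"base_url": "http://localhost:8080",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("base URL:", s.BaseURL)
}
```

The same required/optional split applies to the other providers in this document, with PlayHT additionally requiring user_id.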
Exported Types
- [Engine] — implements tts.TTS using Cartesia
- [New] — constructor accepting tts.Config
elevenlabs
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs"

Package elevenlabs provides the ElevenLabs TTS provider for the Beluga AI voice pipeline. It uses the ElevenLabs Text-to-Speech API for high-quality voice synthesis.
Registration
This package registers itself as “elevenlabs” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs"

    engine, err := tts.New("elevenlabs", tts.Config{
        Voice: "rachel",
        Extra: map[string]any{"api_key": "xi-..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello, world!")

Configuration
Configuration keys in Config.Extra:
- api_key — ElevenLabs API key (required)
- base_url — Custom API base URL (optional)
The default voice is “Rachel” (21m00Tcm4TlvDq8ikWAM) and the default model is “eleven_monolingual_v1”. Output format defaults to audio/mpeg.
Exported Types
- [Engine] — implements tts.TTS using ElevenLabs
- [New] — constructor accepting tts.Config
fish
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/fish"

Package fish provides the Fish Audio TTS provider for the Beluga AI voice pipeline. It uses the Fish Audio Text-to-Speech API for voice synthesis.
Registration
This package registers itself as “fish” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/fish"

    engine, err := tts.New("fish", tts.Config{
        Voice: "default",
        Extra: map[string]any{"api_key": "..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello!")

Configuration
Configuration keys in Config.Extra:
- api_key — Fish Audio API key (required)
- base_url — Custom API base URL (optional)
The default voice is “default”. Voice is used as the reference_id in the Fish Audio API.
Exported Types
- [Engine] — implements tts.TTS using Fish Audio
- [New] — constructor accepting tts.Config
groq
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"

Package groq provides the Groq TTS provider for the Beluga AI voice pipeline. It uses the Groq TTS endpoint (OpenAI-compatible API) for voice synthesis.
Registration
This package registers itself as “groq” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"

    engine, err := tts.New("groq", tts.Config{
        Voice: "aura-asteria-en",
        Extra: map[string]any{"api_key": "gsk-..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello!")

Configuration
Configuration keys in Config.Extra:
- api_key — Groq API key (required)
- base_url — Custom API base URL (optional)
The default voice is “aura-asteria-en” and the default model is “playai-tts”. Speed and output format are configurable through [tts.Config].
Exported Types
- [Engine] — implements tts.TTS using Groq
- [New] — constructor accepting tts.Config
lmnt
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/lmnt"

Package lmnt provides the LMNT TTS provider for the Beluga AI voice pipeline. It uses the LMNT Text-to-Speech API for low-latency voice synthesis.
Registration
This package registers itself as “lmnt” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/lmnt"

    engine, err := tts.New("lmnt", tts.Config{
        Voice: "lily",
        Extra: map[string]any{"api_key": "..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello!")

Configuration
Configuration keys in Config.Extra:
- api_key — LMNT API key (required)
- base_url — Custom API base URL (optional)
The default voice is “lily”. Speed and output format are configurable through [tts.Config].
Exported Types
- [Engine] — implements tts.TTS using LMNT
- [New] — constructor accepting tts.Config
playht
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/playht"

Package playht provides the PlayHT TTS provider for the Beluga AI voice pipeline. It uses the PlayHT Text-to-Speech API for voice synthesis.
Registration
This package registers itself as “playht” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/playht"

    engine, err := tts.New("playht", tts.Config{
        Voice: "s3://voice-cloning-zero-shot/...",
        Extra: map[string]any{"api_key": "...", "user_id": "..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello!")

Configuration
Configuration keys in Config.Extra:
- api_key — PlayHT API key (required)
- user_id — PlayHT user ID (required)
- base_url — Custom API base URL (optional)
Speed and output format are configurable through [tts.Config]. Default output format is MP3.
Exported Types
- [Engine] — implements tts.TTS using PlayHT
- [New] — constructor accepting tts.Config
smallest
    import "github.com/lookatitude/beluga-ai/voice/tts/providers/smallest"

Package smallest provides the Smallest.ai TTS provider for the Beluga AI voice pipeline. It uses the Smallest.ai Text-to-Speech API for low-latency voice synthesis.
Registration
This package registers itself as “smallest” with the tts registry. Import it with a blank identifier to enable it:

    import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/smallest"

    engine, err := tts.New("smallest", tts.Config{
        Voice: "emily",
        Extra: map[string]any{"api_key": "..."},
    })
    audio, err := engine.Synthesize(ctx, "Hello!")

Configuration
Configuration keys in Config.Extra:
- api_key — Smallest.ai API key (required)
- base_url — Custom API base URL (optional)
The default voice is “emily” and the default model is “lightning”. Speed is configurable through [tts.Config].
Exported Types
- [Engine] — implements tts.TTS using Smallest.ai
- [New] — constructor accepting tts.Config