ElevenLabs Voice Provider
ElevenLabs provides both speech-to-text (via the Scribe engine) and text-to-speech (with voice cloning and premium synthesis). Beluga AI registers two separate providers: "elevenlabs" in the STT registry and "elevenlabs" in the TTS registry.
Choose ElevenLabs when voice quality is the top priority — particularly for applications requiring natural-sounding speech with voice cloning, custom voices, or multilingual synthesis. The TTS output is widely regarded as among the most expressive available. For lower-latency TTS in real-time pipelines, also evaluate Cartesia or LMNT.
Installation
Section titled “Installation”// STT (Scribe)import _ "github.com/lookatitude/beluga-ai/voice/stt/providers/elevenlabs"
// TTSimport _ "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs"STT (Scribe)
Section titled “STT (Scribe)”Configuration
Section titled “Configuration”| Field | Type | Default | Description |
|---|---|---|---|
Language | string | — | Language code for transcription |
Model | string | "scribe_v1" | Scribe model identifier |
Extra | — | — | See below |
Extra Fields
Section titled “Extra Fields”| Key | Type | Required | Description |
|---|---|---|---|
api_key | string | Yes | ElevenLabs API key (xi-...) |
base_url | string | No | Override base URL |
Basic Usage
Section titled “Basic Usage”package main
import ( "context" "fmt" "log" "os"
"github.com/lookatitude/beluga-ai/voice/stt" _ "github.com/lookatitude/beluga-ai/voice/stt/providers/elevenlabs")
func main() { ctx := context.Background()
engine, err := stt.New("elevenlabs", stt.Config{ Extra: map[string]any{"api_key": os.Getenv("ELEVENLABS_API_KEY")}, }) if err != nil { log.Fatal(err) }
audio, err := os.ReadFile("recording.wav") if err != nil { log.Fatal(err) }
text, err := engine.Transcribe(ctx, audio) if err != nil { log.Fatal(err) }
fmt.Println("Transcript:", text)}Streaming
Section titled “Streaming”ElevenLabs Scribe does not support native streaming. The TranscribeStream method transcribes each audio chunk independently as a batch request:
for event, err := range engine.TranscribeStream(ctx, audioStream) { if err != nil { log.Printf("error: %v", err) break } fmt.Printf("[FINAL] %s\n", event.Text)}For real-time streaming with interim results, consider Deepgram or AssemblyAI.
Configuration
Section titled “Configuration”| Field | Type | Default | Description |
|---|---|---|---|
Voice | string | "21m00Tcm4TlvDq8ikWAM" | Voice ID (Rachel by default) |
Model | string | "eleven_monolingual_v1" | TTS model identifier |
Format | AudioFormat | — | Output audio format |
Extra | — | — | See below |
Extra Fields
Section titled “Extra Fields”| Key | Type | Required | Description |
|---|---|---|---|
api_key | string | Yes | ElevenLabs API key (xi-...) |
base_url | string | No | Override base URL |
Basic Usage
Section titled “Basic Usage”package main
import ( "context" "log" "os"
"github.com/lookatitude/beluga-ai/voice/tts" _ "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs")
func main() { ctx := context.Background()
engine, err := tts.New("elevenlabs", tts.Config{ Voice: "rachel", Extra: map[string]any{"api_key": os.Getenv("ELEVENLABS_API_KEY")}, }) if err != nil { log.Fatal(err) }
audio, err := engine.Synthesize(ctx, "Hello, welcome to Beluga AI.") if err != nil { log.Fatal(err) }
if err := os.WriteFile("output.mp3", audio, 0644); err != nil { log.Fatal(err) }}Direct Construction
Section titled “Direct Construction”import "github.com/lookatitude/beluga-ai/voice/tts/providers/elevenlabs"
engine, err := elevenlabs.New(tts.Config{ Voice: "rachel", Model: "eleven_multilingual_v2", Extra: map[string]any{"api_key": os.Getenv("ELEVENLABS_API_KEY")},})Streaming
Section titled “Streaming”The TTS streaming interface synthesizes each text chunk from the input stream independently:
for chunk, err := range engine.SynthesizeStream(ctx, textStream) { if err != nil { log.Printf("error: %v", err) break } transport.Send(chunk)}FrameProcessor Integration
Section titled “FrameProcessor Integration”processor := tts.AsFrameProcessor(engine, 24000, tts.WithVoice("rachel"))pipeline := voice.Chain(sttProcessor, llmProcessor, processor)Advanced Features
Section titled “Advanced Features”Voice Settings
Section titled “Voice Settings”The provider sends stability and similarity boost parameters with each synthesis request. The default values (stability: 0.5, similarity_boost: 0.75) provide a balanced output. For production use, you can adjust these through the ElevenLabs dashboard per voice.
Per-Request Options
Section titled “Per-Request Options”// STTtext, err := sttEngine.Transcribe(ctx, audio, stt.WithLanguage("fr"), stt.WithModel("scribe_v1"),)
// TTSaudio, err := ttsEngine.Synthesize(ctx, "Bonjour!", tts.WithVoice("custom-voice-id"), tts.WithModel("eleven_multilingual_v2"),)