PlayHT Voice Provider

PlayHT provides AI-powered text-to-speech with voice cloning and multiple output formats. The Beluga AI provider uses the PlayHT v2 API for synthesis, supporting configurable voice selection, output format, and speech speed.

Choose PlayHT when you need voice cloning with flexible output formats (MP3, WAV, PCM, Opus) and fine-grained speed control. PlayHT’s zero-shot voice cloning lets you create custom voices from short audio samples. For the lowest synthesis latency, consider Cartesia or LMNT.

import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/playht"

The blank import registers the "playht" provider with the TTS registry.

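| Field  | Type           | Default | Description                                          |
|--------|----------------|---------|------------------------------------------------------|
| Voice  | string         |         | Voice URL (e.g., s3://voice-cloning-zero-shot/...)   |
| Format | AudioFormat    | "mp3"   | Output format (mp3, wav, pcm, opus)                  |
| Speed  | float64        |         | Speech rate multiplier (1.0 = normal)                |
| Extra  | map[string]any |         | See below                                            |

Keys recognized in Extra:

| Key      | Type   | Required | Description       |
|----------|--------|----------|-------------------|
| api_key  | string | Yes      | PlayHT API key    |
| user_id  | string | Yes      | PlayHT user ID    |
| base_url | string | No       | Override base URL |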

package main

import (
	"context"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/tts"
	_ "github.com/lookatitude/beluga-ai/voice/tts/providers/playht"
)

func main() {
	ctx := context.Background()

	engine, err := tts.New("playht", tts.Config{
		Voice: "s3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d3571/jennifersaad/manifest.json",
		Extra: map[string]any{
			"api_key": os.Getenv("PLAYHT_API_KEY"),
			"user_id": os.Getenv("PLAYHT_USER_ID"),
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	audio, err := engine.Synthesize(ctx, "Hello, welcome to Beluga AI.")
	if err != nil {
		log.Fatal(err)
	}

	if err := os.WriteFile("output.mp3", audio, 0644); err != nil {
		log.Fatal(err)
	}
}
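
Both api_key and user_id are required, and os.Getenv returns an empty string when a variable is unset, so it can help to check the Extra map before building the engine. The following is a minimal sketch of such a check. The helpers requiredString and validateExtra are hypothetical names introduced here and are not part of Beluga AI; whether the provider performs an equivalent check itself is not stated in this page.

```go
package main

import (
	"fmt"
	"os"
)

// requiredString returns the non-empty string stored under key in extra.
// Hypothetical helper for this sketch; not part of the Beluga AI API.
func requiredString(extra map[string]any, key string) (string, error) {
	v, ok := extra[key].(string)
	if !ok || v == "" {
		return "", fmt.Errorf("playht: missing required extra key %q", key)
	}
	return v, nil
}

// validateExtra checks the two keys that the Extra table marks as required.
func validateExtra(extra map[string]any) error {
	for _, key := range []string{"api_key", "user_id"} {
		if _, err := requiredString(extra, key); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	extra := map[string]any{
		"api_key": os.Getenv("PLAYHT_API_KEY"),
		"user_id": os.Getenv("PLAYHT_USER_ID"),
	}
	if err := validateExtra(extra); err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("configuration looks complete")
}
```

Running the check before tts.New would surface a missing environment variable as a configuration error rather than as a failed API call later on.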

To construct the engine directly, without going through the registry:

import (
	"os"

	"github.com/lookatitude/beluga-ai/voice/tts"
	"github.com/lookatitude/beluga-ai/voice/tts/providers/playht"
)

engine, err := playht.New(tts.Config{
	Voice: "s3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d3571/jennifersaad/manifest.json",
	Extra: map[string]any{
		"api_key": os.Getenv("PLAYHT_API_KEY"),
		"user_id": os.Getenv("PLAYHT_USER_ID"),
	},
})

The streaming interface synthesizes each text chunk independently:

for chunk, err := range engine.SynthesizeStream(ctx, textStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	transport.Send(chunk)
}
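
To make "each text chunk independently" concrete, here is a rough sketch of that behavior using a stand-in synthesizer. Everything in it (audioSeq, synthesizeEach, fakeSynth) is hypothetical and written for illustration only; it is not the provider's implementation. The audioSeq type has the same shape as a Go 1.23 iter.Seq2[[]byte, error], but it is invoked with an explicit callback so the sketch does not depend on range-over-func support.

```go
package main

import (
	"errors"
	"fmt"
)

// audioSeq has the same shape as iter.Seq2[[]byte, error]: it calls yield
// once per result and stops as soon as yield returns false.
type audioSeq func(yield func(chunk []byte, err error) bool)

// synthesizeEach makes one independent synth call per text chunk, in order.
// Hypothetical stand-in for a streaming engine; stops at the first error.
func synthesizeEach(texts []string, synth func(string) ([]byte, error)) audioSeq {
	return func(yield func([]byte, error) bool) {
		for _, text := range texts {
			audio, err := synth(text)
			if err != nil {
				yield(nil, err)
				return
			}
			if !yield(audio, nil) {
				return // consumer asked to stop early
			}
		}
	}
}

// fakeSynth pretends to synthesize audio so the sketch runs without a network.
func fakeSynth(text string) ([]byte, error) {
	if text == "" {
		return nil, errors.New("empty text chunk")
	}
	return []byte("audio:" + text), nil
}

func main() {
	seq := synthesizeEach([]string{"Hello,", "welcome to Beluga AI."}, fakeSynth)
	seq(func(chunk []byte, err error) bool {
		if err != nil {
			fmt.Println("error:", err)
			return false
		}
		fmt.Printf("received %d bytes\n", len(chunk))
		return true
	})
}
```

Because each chunk is a separate call in this model, how the input text is split determines what each synthesis request sees.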

To run the engine as a stage in a frame-based voice pipeline:

processor := tts.AsFrameProcessor(engine, 24000)
pipeline := voice.Chain(sttProcessor, llmProcessor, processor)

Options passed to Synthesize apply to that call:

audio, err := engine.Synthesize(ctx, "Hello!",
	tts.WithVoice("different-voice-url"),
	tts.WithFormat(tts.FormatWAV),
	tts.WithSpeed(1.2),
)
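
Call-level options such as tts.WithVoice, tts.WithFormat, and tts.WithSpeed follow the shape of Go's functional options pattern. The sketch below shows one way such options can layer over engine-level defaults without mutating them. All types and functions in it (settings, option, withVoice, withFormat, withSpeed, resolve) are hypothetical and are not Beluga AI's actual definitions.

```go
package main

import "fmt"

// settings holds the values a single synthesis call would use.
// Hypothetical type for this sketch.
type settings struct {
	Voice  string
	Format string
	Speed  float64
}

// option adjusts the settings for one call.
type option func(*settings)

func withVoice(v string) option   { return func(s *settings) { s.Voice = v } }
func withFormat(f string) option  { return func(s *settings) { s.Format = f } }
func withSpeed(x float64) option  { return func(s *settings) { s.Speed = x } }

// resolve copies the defaults and applies the options in order, so later
// options win and the defaults themselves are left untouched.
func resolve(defaults settings, opts ...option) settings {
	s := defaults // struct value copy
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	defaults := settings{Voice: "default-voice-url", Format: "mp3", Speed: 1.0}

	perCall := resolve(defaults, withFormat("wav"), withSpeed(1.2))
	fmt.Printf("per call: %+v\n", perCall)
	fmt.Printf("defaults: %+v\n", defaults)
}
```

Under this pattern, a field with no matching option keeps the engine-level value, which is why only the settings that differ need to be passed on each call.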

PlayHT requires both an API key and a user ID. These are sent as Authorization: Bearer <api_key> and X-USER-ID: <user_id> headers respectively.
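
As an illustration of the two headers, the sketch below builds a request with Go's standard net/http package and attaches both credentials. The helper name newAuthedRequest and the URL are placeholders chosen for this example; the provider's internal request code is not shown in this page and may differ.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// newAuthedRequest builds a POST request carrying both PlayHT credentials.
// Hypothetical helper; the URL passed in is a placeholder, not a documented endpoint.
func newAuthedRequest(url, apiKey, userID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	// The API key goes in a Bearer token, the user ID in its own header.
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("X-USER-ID", userID)
	return req, nil
}

func main() {
	req, err := newAuthedRequest("https://example.invalid/api/v2/tts", "my-api-key", "my-user-id")
	if err != nil {
		log.Fatal(err)
	}
	// Header lookups are case-insensitive, so the original spelling works here.
	fmt.Println("Authorization:", req.Header.Get("Authorization"))
	fmt.Println("X-USER-ID:", req.Header.Get("X-USER-ID"))
}
```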

To send requests to a different endpoint, set base_url in Extra:

engine, err := tts.New("playht", tts.Config{
	Extra: map[string]any{
		"api_key":  os.Getenv("PLAYHT_API_KEY"),
		"user_id":  os.Getenv("PLAYHT_USER_ID"),
		"base_url": "https://playht.internal.corp/api/v2",
	},
})