# Groq Whisper Voice Provider
Groq provides both speech-to-text (via Whisper models) and text-to-speech through its OpenAI-compatible API, running on specialized LPU hardware for ultra-fast inference. Beluga AI registers "groq" in both the STT and TTS registries.
Choose Groq when you need the fastest possible batch transcription — Groq’s LPU hardware delivers Whisper inference significantly faster than CPU-based alternatives. The provider does not support real-time streaming, so it works best for batch or buffered transcription where all audio is available upfront. For real-time streaming, use Deepgram or AssemblyAI instead.
## Installation

```go
// STT (Whisper)
import _ "github.com/lookatitude/beluga-ai/voice/stt/providers/groq"

// TTS
import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"
```

## STT (Whisper on Groq)
### Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| Language | string | — | ISO-639 language code |
| Model | string | "whisper-large-v3" | Whisper model identifier |
| Extra | map[string]any | — | See below |
### Extra Fields

| Key | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Groq API key (`gsk-...`) |
| base_url | string | No | Override the default API base URL |
### Basic Usage

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/stt"
	_ "github.com/lookatitude/beluga-ai/voice/stt/providers/groq"
)

func main() {
	ctx := context.Background()

	engine, err := stt.New("groq", stt.Config{
		Model: "whisper-large-v3",
		Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
	})
	if err != nil {
		log.Fatal(err)
	}

	audio, err := os.ReadFile("recording.wav")
	if err != nil {
		log.Fatal(err)
	}

	text, err := engine.Transcribe(ctx, audio)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Transcript:", text)
}
```

### Streaming
Groq Whisper does not support native streaming. The `TranscribeStream` method buffers all audio chunks and performs a single batch transcription when the stream ends:

```go
for event, err := range engine.TranscribeStream(ctx, audioStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	fmt.Printf("[FINAL] %s\n", event.Text)
}
```

For real-time streaming with interim results, consider Deepgram or AssemblyAI.
## TTS

### Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| Voice | string | "aura-asteria-en" | Voice identifier |
| Model | string | "playai-tts" | TTS model identifier |
| Format | AudioFormat | — | Output format (mp3, wav, pcm) |
| Speed | float64 | — | Speech rate multiplier (1.0 = normal) |
| Extra | map[string]any | — | See below |
### Extra Fields

| Key | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Groq API key (`gsk-...`) |
| base_url | string | No | Override the default API base URL |
### Basic Usage

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/tts"
	_ "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"
)

func main() {
	ctx := context.Background()

	engine, err := tts.New("groq", tts.Config{
		Voice: "aura-asteria-en",
		Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
	})
	if err != nil {
		log.Fatal(err)
	}

	audio, err := engine.Synthesize(ctx, "Hello, welcome to Beluga AI.")
	if err != nil {
		log.Fatal(err)
	}

	if err := os.WriteFile("output.mp3", audio, 0644); err != nil {
		log.Fatal(err)
	}
}
```

### Direct Construction
```go
import "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"

engine, err := groq.New(tts.Config{
	Voice: "aura-asteria-en",
	Model: "playai-tts",
	Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
})
```

### Streaming
The streaming interface synthesizes each text chunk independently:

```go
for chunk, err := range engine.SynthesizeStream(ctx, textStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	transport.Send(chunk)
}
```
## FrameProcessor Integration

```go
// STT
sttProcessor := stt.AsFrameProcessor(sttEngine, stt.WithLanguage("en"))

// TTS
ttsProcessor := tts.AsFrameProcessor(ttsEngine, 24000, tts.WithVoice("aura-asteria-en"))

pipeline := voice.Chain(sttProcessor, llmProcessor, ttsProcessor)
```
## Advanced Features

### OpenAI-Compatible API
Both the STT and TTS providers use Groq's OpenAI-compatible API endpoints (`/audio/transcriptions` for STT, `/audio/speech` for TTS), making it straightforward to swap in other OpenAI-compatible services.
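Combined with the `base_url` extra field documented earlier, this means the same provider can target any endpoint that speaks the same protocol. A configuration sketch — the URL is purely illustrative, not a real endpoint:

```go
engine, err := tts.New("groq", tts.Config{
	Voice: "aura-asteria-en",
	Extra: map[string]any{
		"api_key":  os.Getenv("GROQ_API_KEY"),
		"base_url": "https://example.com/openai/v1", // hypothetical OpenAI-compatible endpoint
	},
})
```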
### Per-Request Options

```go
// STT
text, err := sttEngine.Transcribe(ctx, audio,
	stt.WithLanguage("fr"),
	stt.WithModel("whisper-large-v3-turbo"),
)

// TTS
audio, err := ttsEngine.Synthesize(ctx, "Hello!",
	tts.WithVoice("aura-luna-en"),
	tts.WithFormat(tts.FormatWAV),
	tts.WithSpeed(1.2),
)
```