Groq Whisper Voice Provider

Groq provides both speech-to-text (via Whisper models) and text-to-speech through its OpenAI-compatible API, running on specialized LPU hardware for ultra-fast inference. Beluga AI registers "groq" in both the STT and TTS registries.

Choose Groq when transcription speed matters most for batch workloads: Groq’s LPU hardware delivers Whisper inference significantly faster than CPU-based alternatives. The provider does not support real-time streaming, so it works best for batch or buffered transcription where all audio is available upfront. For real-time streaming, use Deepgram or AssemblyAI instead.

// STT (Whisper)
import _ "github.com/lookatitude/beluga-ai/voice/stt/providers/groq"
// TTS
import _ "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"
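
STT configuration (stt.Config):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Language | string | | ISO-639 language code |
| Model | string | "whisper-large-v3" | Whisper model identifier |
| Extra | | | See below |

Extra keys:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Groq API key (gsk-...) |
| base_url | string | No | Override base URL |
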
// Batch transcription of a local audio file.
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/stt"
	_ "github.com/lookatitude/beluga-ai/voice/stt/providers/groq"
)

func main() {
	ctx := context.Background()

	// Look up the "groq" provider in the STT registry.
	engine, err := stt.New("groq", stt.Config{
		Model: "whisper-large-v3",
		Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read the whole recording into memory; the provider transcribes in one batch.
	audio, err := os.ReadFile("recording.wav")
	if err != nil {
		log.Fatal(err)
	}

	text, err := engine.Transcribe(ctx, audio)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Transcript:", text)
}

Groq Whisper does not support native streaming. The TranscribeStream method buffers all audio chunks and performs a single batch transcription when the stream ends:

for event, err := range engine.TranscribeStream(ctx, audioStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	fmt.Printf("[FINAL] %s\n", event.Text)
}
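
The buffering behavior described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the provider's actual implementation: `audioSeq`, `batchFn`, `bufferAndTranscribe`, and `sliceSeq` are hypothetical names, and the fake transcriber stands in for the real Groq call.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// audioSeq mirrors the shape of iter.Seq2[[]byte, error].
// Hypothetical stand-in for the provider's audio stream type.
type audioSeq func(yield func(chunk []byte, err error) bool)

// batchFn is a hypothetical stand-in for one batch transcription call.
type batchFn func(audio []byte) (string, error)

// bufferAndTranscribe drains the whole stream into memory and only then
// calls transcribe, exactly once.
func bufferAndTranscribe(chunks audioSeq, transcribe batchFn) (string, error) {
	var buf bytes.Buffer
	var streamErr error
	chunks(func(chunk []byte, err error) bool {
		if err != nil {
			streamErr = err
			return false // stop consuming on the first stream error
		}
		buf.Write(chunk)
		return true
	})
	if streamErr != nil {
		return "", streamErr
	}
	if buf.Len() == 0 {
		return "", errors.New("no audio received")
	}
	return transcribe(buf.Bytes())
}

// sliceSeq turns a fixed list of chunks into an audioSeq (demo helper).
func sliceSeq(parts ...[]byte) audioSeq {
	return func(yield func([]byte, error) bool) {
		for _, p := range parts {
			if !yield(p, nil) {
				return
			}
		}
	}
}

func main() {
	calls := 0
	fake := func(audio []byte) (string, error) {
		calls++
		return fmt.Sprintf("%d bytes transcribed", len(audio)), nil
	}
	text, err := bufferAndTranscribe(sliceSeq([]byte("ab"), []byte("cde")), fake)
	fmt.Println(text, err, calls)
}
```

Two practical consequences follow from this shape: memory use grows with the length of the stream, and no transcript is produced until the stream has ended.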

For real-time streaming with interim results, consider Deepgram or AssemblyAI.


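TTS configuration (tts.Config):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Voice | string | "aura-asteria-en" | Voice identifier |
| Model | string | "playai-tts" | TTS model identifier |
| Format | AudioFormat | | Output format (mp3, wav, pcm) |
| Speed | float64 | | Speech rate multiplier (1.0 = normal) |
| Extra | | | See below |

Extra keys:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Groq API key (gsk-...) |
| base_url | string | No | Override base URL |
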
// Synthesize a sentence and write the audio to disk.
package main

import (
	"context"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/tts"
	_ "github.com/lookatitude/beluga-ai/voice/tts/providers/groq"
)

func main() {
	ctx := context.Background()

	// Look up the "groq" provider in the TTS registry.
	engine, err := tts.New("groq", tts.Config{
		Voice: "aura-asteria-en",
		Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
	})
	if err != nil {
		log.Fatal(err)
	}

	audio, err := engine.Synthesize(ctx, "Hello, welcome to Beluga AI.")
	if err != nil {
		log.Fatal(err)
	}

	if err := os.WriteFile("output.mp3", audio, 0644); err != nil {
		log.Fatal(err)
	}
}
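
The provider can also be constructed directly by importing the package by name instead of going through the registry:

import (
	"os"

	"github.com/lookatitude/beluga-ai/voice/tts"
	"github.com/lookatitude/beluga-ai/voice/tts/providers/groq"
)

engine, err := groq.New(tts.Config{
	Voice: "aura-asteria-en",
	Model: "playai-tts",
	Extra: map[string]any{"api_key": os.Getenv("GROQ_API_KEY")},
})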

The streaming interface synthesizes each text chunk independently:

for chunk, err := range engine.SynthesizeStream(ctx, textStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	transport.Send(chunk)
}
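
Because each text chunk is synthesized on its own, it may help to hand the stream whole sentences rather than arbitrary fragments, so that no chunk ends mid-phrase. The helper below is a rough sketch of one way to do that. `splitSentences` is a hypothetical function, not part of Beluga AI, and its boundary rule (a `.`, `!`, or `?` followed by a space or the end of the text) is a deliberately simple assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences cuts text after '.', '!' or '?' when the terminator is
// followed by a space or ends the text. Terminators inside a token, such
// as the dot in "3.5", do not split.
func splitSentences(text string) []string {
	var out []string
	start := 0
	for i, r := range text {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		end := i + 1 // all three terminators are one byte in UTF-8
		if end < len(text) && text[end] != ' ' {
			continue
		}
		if s := strings.TrimSpace(text[start:end]); s != "" {
			out = append(out, s)
		}
		start = end
	}
	// Keep any trailing text that has no terminator.
	if s := strings.TrimSpace(text[start:]); s != "" {
		out = append(out, s)
	}
	return out
}

func main() {
	for _, s := range splitSentences("Hello there. Version 3.5 is out! Ready") {
		fmt.Printf("%q\n", s)
	}
}
```

Each returned sentence could then be yielded as one element of the text stream passed to SynthesizeStream. Abbreviations such as "e.g. " would still split under this rule, so real input may need a more careful segmenter.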

Both engines can be wrapped as frame processors and chained into a voice pipeline:

// STT
sttProcessor := stt.AsFrameProcessor(sttEngine, stt.WithLanguage("en"))

// TTS
ttsProcessor := tts.AsFrameProcessor(ttsEngine, 24000, tts.WithVoice("aura-asteria-en"))

pipeline := voice.Chain(sttProcessor, llmProcessor, ttsProcessor)

Both the STT and TTS providers use Groq’s OpenAI-compatible API endpoints (/audio/transcriptions for STT, /audio/speech for TTS), making it straightforward to swap with other OpenAI-compatible endpoints.
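
To make the OpenAI-compatible shape concrete, here is a minimal sketch of a raw request to the /audio/transcriptions endpoint using only the standard library. It is not how the Beluga provider is necessarily implemented. The `transcribe` and `runDemo` functions are hypothetical, the multipart field names `file` and `model` and the `{"text": ...}` response follow the OpenAI transcription API, and the demo talks to a local stub server so it runs without an API key. Against Groq itself the base URL would be https://api.groq.com/openai/v1.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// transcribe posts audio as multipart form data to baseURL+"/audio/transcriptions"
// and returns the "text" field of the JSON response.
func transcribe(ctx context.Context, baseURL, apiKey, model string, audio []byte) (string, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("model", model); err != nil {
		return "", err
	}
	part, err := w.CreateFormFile("file", "audio.wav")
	if err != nil {
		return "", err
	}
	if _, err := part.Write(audio); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/audio/transcriptions", &body)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", w.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
		return "", fmt.Errorf("transcription failed: %s: %s", resp.Status, msg)
	}
	var out struct {
		Text string `json:"text"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Text, nil
}

// runDemo exercises transcribe against a local stub that mimics the endpoint.
func runDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/audio/transcriptions" || r.FormValue("model") == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		f, _, err := r.FormFile("file")
		if err != nil {
			http.Error(w, "missing file", http.StatusBadRequest)
			return
		}
		defer f.Close()
		n, _ := io.Copy(io.Discard, f)
		json.NewEncoder(w).Encode(map[string]string{"text": fmt.Sprintf("stub transcript (%d bytes)", n)})
	}))
	defer srv.Close()
	return transcribe(context.Background(), srv.URL, "test-key", "whisper-large-v3", []byte("RIFF"))
}

func main() {
	text, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

Since only the base URL and key change between compatible services, the `base_url` key in Extra is the natural place to point the provider at a different endpoint.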

Options can also be passed per call:

// STT
text, err := sttEngine.Transcribe(ctx, audio,
	stt.WithLanguage("fr"),
	stt.WithModel("whisper-large-v3-turbo"),
)

// TTS (named speech to avoid clashing with the audio input above)
speech, err := ttsEngine.Synthesize(ctx, "Hello!",
	tts.WithVoice("aura-luna-en"),
	tts.WithFormat(tts.FormatWAV),
	tts.WithSpeed(1.2),
)
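
The `With...` arguments above look like Go's functional options pattern. The sketch below shows how such options are commonly resolved against engine defaults. It is an illustration of the pattern only: `config`, `option`, `withLanguage`, `withModel`, and `resolve` are hypothetical names, and whether Beluga AI resolves options exactly this way is an assumption.

```go
package main

import "fmt"

// config is a hypothetical, trimmed-down stand-in for stt.Config.
type config struct {
	Language string
	Model    string
}

// option adjusts the configuration used for a single call.
type option func(*config)

func withLanguage(lang string) option { return func(c *config) { c.Language = lang } }
func withModel(model string) option   { return func(c *config) { c.Model = model } }

// resolve applies opts to a copy of defaults, so the engine-level
// configuration is left untouched for later calls.
func resolve(defaults config, opts ...option) config {
	cfg := defaults
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	defaults := config{Model: "whisper-large-v3"}
	perCall := resolve(defaults, withLanguage("fr"), withModel("whisper-large-v3-turbo"))
	fmt.Printf("defaults: %+v\nper call: %+v\n", defaults, perCall)
}
```

Under this pattern, a call made without options simply uses the values from the Config given at construction time.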