
Deepgram Voice Provider

Deepgram provides high-accuracy speech-to-text transcription with native WebSocket streaming support. The Beluga AI provider uses Deepgram’s REST API for batch transcription and WebSocket API for real-time streaming, delivering word-level timing, speaker diarization, and automatic punctuation.

Choose Deepgram when you need real-time streaming STT with low latency and high accuracy. Its native WebSocket support makes it a strong default for production voice pipelines where interim results matter. For batch-only workloads where cost is the priority, consider Groq Whisper instead.

```go
import _ "github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"
```

The blank import registers the "deepgram" provider with the STT registry, making it available by name to `stt.New`.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| Language | string | | BCP-47 language code (e.g., "en", "es") |
| Model | string | "nova-2" | Deepgram model (nova-2, nova, enhanced) |
| Punctuation | bool | false | Enable automatic punctuation |
| Diarization | bool | false | Enable speaker identification |
| SampleRate | int | | Audio sample rate in Hz |
| Encoding | string | | Audio encoding ("linear16", "opus") |
| Extra | map[string]any | | Provider-specific keys (see below) |

Keys recognized in `Extra`:

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Deepgram API key |
| base_url | string | No | Override REST base URL |
| ws_url | string | No | Override WebSocket base URL |
```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/voice/stt"
	_ "github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"
)

func main() {
	ctx := context.Background()

	engine, err := stt.New("deepgram", stt.Config{
		Language: "en",
		Model:    "nova-2",
		Extra:    map[string]any{"api_key": os.Getenv("DEEPGRAM_API_KEY")},
	})
	if err != nil {
		log.Fatal(err)
	}

	audio, err := os.ReadFile("recording.wav")
	if err != nil {
		log.Fatal(err)
	}

	text, err := engine.Transcribe(ctx, audio)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Transcript:", text)
}
```

For compile-time type safety, use the provider package directly:

```go
import "github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"

engine, err := deepgram.New(stt.Config{
	Language:    "en",
	Model:       "nova-2",
	Punctuation: true,
	Diarization: true,
	Extra:       map[string]any{"api_key": os.Getenv("DEEPGRAM_API_KEY")},
})
```

Deepgram supports native real-time streaming via WebSocket. Audio chunks are sent over the socket and transcript events are emitted as they become available, with both interim (partial) and final results.

```go
func transcribeStream(ctx context.Context, engine stt.STT, audioStream iter.Seq2[[]byte, error]) {
	for event, err := range engine.TranscribeStream(ctx, audioStream) {
		if err != nil {
			log.Printf("stream error: %v", err)
			break
		}
		if event.IsFinal {
			fmt.Printf("[FINAL] %s (confidence=%.2f)\n", event.Text, event.Confidence)
		} else {
			fmt.Printf("[PARTIAL] %s\n", event.Text)
		}
	}
}
```

Transcript events include word-level timing when available:

```go
for event, err := range engine.TranscribeStream(ctx, audioStream) {
	if err != nil {
		log.Printf("error: %v", err)
		break
	}
	for _, word := range event.Words {
		fmt.Printf("  %s [%.2fs - %.2fs] (%.2f)\n",
			word.Text, word.Start.Seconds(), word.End.Seconds(), word.Confidence)
	}
}
```

Wrap the engine as a FrameProcessor for use in a voice pipeline:

```go
import "github.com/lookatitude/beluga-ai/voice/stt"

processor := stt.AsFrameProcessor(engine, stt.WithLanguage("en"))

// Use in a pipeline
pipeline := voice.Chain(vadProcessor, processor, llmProcessor, ttsProcessor)
```

Override configuration on individual calls:

```go
text, err := engine.Transcribe(ctx, audio,
	stt.WithLanguage("es"),
	stt.WithModel("nova-2"),
	stt.WithPunctuation(true),
	stt.WithDiarization(true),
	stt.WithEncoding("linear16"),
	stt.WithSampleRate(16000),
)
```

For self-hosted or on-premise Deepgram deployments:

```go
engine, err := stt.New("deepgram", stt.Config{
	Model: "nova-2",
	Extra: map[string]any{
		"api_key":  os.Getenv("DEEPGRAM_API_KEY"),
		"base_url": "https://deepgram.internal.corp/v1",
		"ws_url":   "wss://deepgram.internal.corp/v1",
	},
})
```