Skip to content
Docs

Deepgram Live Streaming STT

Deepgram is the default STT choice for real-time voice applications that prioritize low latency. Its Nova-2 model consistently delivers sub-300ms transcription with high accuracy, and interim results let your agent begin processing before the speaker finishes. Deepgram also supports punctuation, speaker diarization, and 36+ languages out of the box. This guide covers integrating the Deepgram STT provider with Beluga AI for live transcription.

The Deepgram integration uses the voice/stt package to connect to Deepgram’s streaming API. It supports multiple models (Nova-2 for accuracy, Base for speed), punctuation, speaker diarization, and interim results.

Terminal window
go get github.com/lookatitude/beluga-ai

Set your API key:

Terminal window
export DEEPGRAM_API_KEY="your-api-key"

Create a Deepgram STT provider:

package main
import (
"context"
"fmt"
"log"
"os"
"github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"
)
func main() {
ctx := context.Background()
config := deepgram.Config{
APIKey: os.Getenv("DEEPGRAM_API_KEY"),
Model: "nova-2",
}
provider, err := deepgram.NewDeepgramSTT(ctx, config)
if err != nil {
log.Fatalf("Failed to create provider: %v", err)
}
audio := loadAudioData()
transcript, err := provider.Transcribe(ctx, audio)
if err != nil {
log.Fatalf("Transcription failed: %v", err)
}
fmt.Printf("Transcript: %s\n", transcript)
}

Stream audio chunks for real-time transcription:

func streamTranscription(ctx context.Context, provider *deepgram.DeepgramSTT, audioStream <-chan []byte) (<-chan string, error) {
transcriptChan := make(chan string, 10)
go func() {
defer close(transcriptChan)
for audioChunk := range audioStream {
transcript, err := provider.Transcribe(ctx, audioChunk)
if err != nil {
log.Printf("Stream error: %v", err)
continue
}
transcriptChan <- transcript
}
}()
return transcriptChan, nil
}

Use the Deepgram provider in a voice session:

import "github.com/lookatitude/beluga-ai/voice/session"
sttProvider, err := deepgram.NewDeepgramSTT(ctx, deepgram.Config{
APIKey: os.Getenv("DEEPGRAM_API_KEY"),
Model: "nova-2",
})
if err != nil {
log.Fatalf("Failed to create STT provider: %v", err)
}
sess, err := session.NewVoiceSession(ctx,
session.WithSTTProvider(sttProvider),
session.WithConfig(session.DefaultConfig()),
)

Add OpenTelemetry tracing:

import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
)
tracer := otel.Tracer("beluga.voice.stt.deepgram")
ctx, span := tracer.Start(ctx, "deepgram.transcribe",
trace.WithAttributes(
attribute.String("model", config.Model),
),
)
defer span.End()
OptionDescriptionDefaultRequired
APIKeyDeepgram API key-Yes
ModelDeepgram modelnova-2No
LanguageLanguage codeenNo
PunctuateAdd punctuationtrueNo
DiarizeSpeaker diarizationfalseNo

Verify your API key is correct and active in the Deepgram dashboard.

Check network connectivity and firewall rules. Deepgram uses WebSocket connections that may be blocked by corporate firewalls.

  • Use WebSocket streaming for all real-time use cases
  • Select the appropriate model: nova-2 for accuracy, base for speed
  • Monitor API usage through the Deepgram dashboard
  • Implement reconnection logic for connection drops
  • Optimize chunk size and frequency for your latency requirements