AssemblyAI Voice Provider
AssemblyAI provides speech-to-text transcription with both asynchronous batch processing and real-time WebSocket streaming. The Beluga AI provider uses AssemblyAI’s upload-and-poll workflow for batch transcription and the real-time WebSocket API for streaming, delivering word-level timing, speaker labels, and automatic punctuation.
Choose AssemblyAI when you need both real-time streaming and high-quality async batch transcription from a single provider. Its upload-and-poll batch workflow is well-suited for pre-recorded audio processing, while WebSocket streaming handles live audio. For the lowest streaming latency, also evaluate Deepgram.
Installation
Section titled “Installation”import _ "github.com/lookatitude/beluga-ai/voice/stt/providers/assemblyai"The blank import registers the "assemblyai" provider with the STT registry.
Configuration
Section titled “Configuration”| Field | Type | Default | Description |
|---|---|---|---|
Language | string | — | BCP-47 language code (e.g., "en") |
Punctuation | bool | false | Enable automatic punctuation |
Diarization | bool | false | Enable speaker labels |
SampleRate | int | 16000 | Audio sample rate in Hz (for streaming) |
Extra | — | — | See below |
Extra Fields
Section titled “Extra Fields”| Key | Type | Required | Description |
|---|---|---|---|
api_key | string | Yes | AssemblyAI API key |
base_url | string | No | Override REST base URL |
ws_url | string | No | Override WebSocket URL |
Basic Usage
Section titled “Basic Usage”package main
import ( "context" "fmt" "log" "os"
"github.com/lookatitude/beluga-ai/voice/stt" _ "github.com/lookatitude/beluga-ai/voice/stt/providers/assemblyai")
func main() { ctx := context.Background()
engine, err := stt.New("assemblyai", stt.Config{ Language: "en", Extra: map[string]any{"api_key": os.Getenv("ASSEMBLYAI_API_KEY")}, }) if err != nil { log.Fatal(err) }
audio, err := os.ReadFile("recording.wav") if err != nil { log.Fatal(err) }
text, err := engine.Transcribe(ctx, audio) if err != nil { log.Fatal(err) }
fmt.Println("Transcript:", text)}Direct Construction
Section titled “Direct Construction”import "github.com/lookatitude/beluga-ai/voice/stt/providers/assemblyai"
engine, err := assemblyai.New(stt.Config{ Language: "en", Punctuation: true, Diarization: true, Extra: map[string]any{"api_key": os.Getenv("ASSEMBLYAI_API_KEY")},})Batch Transcription
Section titled “Batch Transcription”The batch workflow uploads audio to AssemblyAI, creates a transcript job, and polls for completion. This is handled automatically by Transcribe:
- Audio bytes are uploaded to AssemblyAI’s upload endpoint.
- A transcript job is created with the uploaded audio URL.
- The provider polls the transcript status every 500ms until completion.
This makes batch transcription best suited for pre-recorded audio rather than real-time use cases.
Streaming
Section titled “Streaming”AssemblyAI supports native real-time streaming via WebSocket. The provider emits both partial (PartialTranscript) and final (FinalTranscript) events:
func transcribeStream(ctx context.Context, engine stt.STT, audioStream iter.Seq2[[]byte, error]) { for event, err := range engine.TranscribeStream(ctx, audioStream, stt.WithSampleRate(16000), ) { if err != nil { log.Printf("stream error: %v", err) break } if event.IsFinal { fmt.Printf("[FINAL] %s\n", event.Text) } else { fmt.Printf("[PARTIAL] %s\n", event.Text) } }}Word-level timing is available on transcript events:
for _, word := range event.Words { fmt.Printf(" %s [%v - %v] (%.2f)\n", word.Text, word.Start, word.End, word.Confidence)}FrameProcessor Integration
Section titled “FrameProcessor Integration”processor := stt.AsFrameProcessor(engine, stt.WithLanguage("en"))pipeline := voice.Chain(vadProcessor, processor, llmProcessor, ttsProcessor)Advanced Features
Section titled “Advanced Features”Per-Request Options
Section titled “Per-Request Options”text, err := engine.Transcribe(ctx, audio, stt.WithLanguage("es"), stt.WithPunctuation(true), stt.WithDiarization(true), stt.WithSampleRate(16000),)Custom Endpoint
Section titled “Custom Endpoint”engine, err := stt.New("assemblyai", stt.Config{ Language: "en", Extra: map[string]any{ "api_key": os.Getenv("ASSEMBLYAI_API_KEY"), "base_url": "https://assemblyai.internal.corp/v2", "ws_url": "wss://assemblyai.internal.corp/v2/realtime/ws", },})