
AWS Transcribe WebSocket STT

For organizations already operating within the AWS ecosystem, AWS Transcribe provides real-time speech transcription that integrates natively with IAM, CloudWatch, and other AWS services. Its automatic language detection and speaker diarization capabilities make it well suited for multi-speaker, multi-language scenarios such as contact center transcription. This guide covers integrating AWS Transcribe as an STT provider within Beluga AI.

The Amazon Transcribe integration wraps the AWS Transcribe Streaming API to implement the Beluga AI STT provider interface. Audio is streamed over WebSocket for low-latency transcription with support for multiple languages and speaker diarization.

You will need:

  • Go 1.23 or later
  • An AWS account with Transcribe access
  • AWS credentials configured (IAM role or access keys)
Install the required packages:
go get github.com/lookatitude/beluga-ai
go get github.com/aws/aws-sdk-go-v2/service/transcribestreaming

Configure AWS credentials:

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"

Create an AWS Transcribe STT provider:

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/transcribestreaming"
)

func main() {
	ctx := context.Background()

	// Load credentials and region from the environment or shared config.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion(os.Getenv("AWS_REGION")),
	)
	if err != nil {
		log.Fatalf("Failed to load AWS config: %v", err)
	}

	client := transcribestreaming.NewFromConfig(cfg)
	provider := NewAWSTranscribeSTT(client, "en-US")

	// loadAudioData is a placeholder for reading PCM audio bytes
	// from your capture pipeline.
	audio := loadAudioData()

	transcript, err := provider.Transcribe(ctx, audio)
	if err != nil {
		log.Fatalf("Transcription failed: %v", err)
	}
	fmt.Printf("Transcript: %s\n", transcript)
}

Wrap the AWS Transcribe client to implement the Beluga AI STT interface:

type AWSTranscribeSTT struct {
	client   *transcribestreaming.Client
	language string
}

func NewAWSTranscribeSTT(client *transcribestreaming.Client, language string) *AWSTranscribeSTT {
	return &AWSTranscribeSTT{
		client:   client,
		language: language,
	}
}

func (t *AWSTranscribeSTT) Transcribe(ctx context.Context, audio []byte) (string, error) {
	// Implementation handles WebSocket connection,
	// audio chunking, and result parsing via the
	// AWS Transcribe Streaming API.
	return "", nil
}

Use the Transcribe provider within a Beluga AI voice session:

import "github.com/lookatitude/beluga-ai/voice/session"

sess, err := session.NewVoiceSession(ctx,
	session.WithSTTProvider(provider),
	session.WithConfig(session.DefaultConfig()),
)
if err != nil {
	log.Fatalf("Failed to create session: %v", err)
}
Option       Description         Default     Required
------       -----------         -------     --------
Region       AWS region          us-east-1   No
Language     Language code       en-US       No
SampleRate   Audio sample rate   16000       No
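As a sketch, the options table above could map onto a functional-options config like the following. The type and option names here are illustrative, not Beluga AI's actual API:

```go
package main

import "fmt"

// STTConfig mirrors the options table; field names are illustrative.
type STTConfig struct {
	Region     string
	Language   string
	SampleRate int
}

type Option func(*STTConfig)

func WithRegion(r string) Option   { return func(c *STTConfig) { c.Region = r } }
func WithLanguage(l string) Option { return func(c *STTConfig) { c.Language = l } }
func WithSampleRate(hz int) Option { return func(c *STTConfig) { c.SampleRate = hz } }

// NewConfig applies options over the documented defaults.
func NewConfig(opts ...Option) STTConfig {
	cfg := STTConfig{Region: "us-east-1", Language: "en-US", SampleRate: 16000}
	for _, o := range opts {
		o(&cfg)
	}
	return cfg
}

func main() {
	cfg := NewConfig(WithRegion("eu-west-1"))
	fmt.Printf("%s %s %d\n", cfg.Region, cfg.Language, cfg.SampleRate)
}
```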

Ensure AWS credentials are configured via environment variables, shared credentials file, or IAM role. Run aws configure to set up the default profile.

AWS Transcribe Streaming is not available in all regions. Use a supported region such as us-east-1, us-west-2, or eu-west-1.
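A small guard can fail fast when configuration points at an unsupported region. The region set below is an illustrative subset, not an authoritative list; check the AWS documentation for current Transcribe Streaming availability:

```go
package main

import "fmt"

// streamingRegions is an illustrative subset of regions where Transcribe
// Streaming is available; consult AWS docs for the authoritative list.
var streamingRegions = map[string]bool{
	"us-east-1": true,
	"us-west-2": true,
	"eu-west-1": true,
}

// checkRegion returns an error if the region is not in the known set.
func checkRegion(region string) error {
	if !streamingRegions[region] {
		return fmt.Errorf("transcribe streaming not verified in region %q", region)
	}
	return nil
}

func main() {
	fmt.Println(checkRegion("us-east-1") == nil) // supported
	fmt.Println(checkRegion("mars-1") == nil)    // unknown region
}
```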

  • Use IAM roles instead of access keys for authentication
  • Monitor usage through AWS CloudWatch and Cost Explorer
  • Choose an AWS region close to your users for lower latency
  • Use WebSocket streaming for all real-time transcription scenarios
  • Implement reconnection logic for WebSocket failures