VAD Providers — Voice Detection

Beluga AI provides a unified voice.VAD interface for voice activity detection (VAD): detecting speech in audio streams. VAD providers analyze audio frames and report whether speech is present, along with a confidence score and an event type (speech start, speech end, or silence).

All VAD providers implement the same interface:

type VAD interface {
	DetectActivity(ctx context.Context, audio []byte) (ActivityResult, error)
}

The ActivityResult contains:

type ActivityResult struct {
	IsSpeech   bool         // true if speech was detected
	EventType  VADEventType // speech_start, speech_end, or silence
	Confidence float64      // detection confidence (0.0 to 1.0)
}

You can instantiate any provider in two ways:

Via the registry (recommended for dynamic configuration):

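import (
	"github.com/lookatitude/beluga-ai/voice"
	// Blank import: pulled in only for its side effect of making the
	// "silero" provider available to the registry.
	_ "github.com/lookatitude/beluga-ai/voice/vad/providers/silero"
)

vad, err := voice.NewVAD("silero", map[string]any{
	"threshold": 0.5,
})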

Via direct construction (for compile-time type safety):

import "github.com/lookatitude/beluga-ai/voice/vad/providers/silero"

vad, err := silero.New(silero.Config{
	Threshold:  0.5,
	SampleRate: 16000,
})

VAD providers track transitions between speech and silence and report one of the following event types:

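| Event Type   | Constant       | Description                       |
|--------------|----------------|-----------------------------------|
| Speech Start | VADSpeechStart | Transition from silence to speech |
| Speech End   | VADSpeechEnd   | Transition from speech to silence |
| Silence      | VADSilence     | No speech detected (ongoing)      |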
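
One way to picture how such events could be derived is a two-state tracker that compares each frame's speech decision with the previous one. The sketch below is an illustration only, not the library's implementation: the `tracker` type and its `update` method are hypothetical, and because the table does not say what is reported for frames in the middle of speech, the sketch introduces a made-up `speechOngoing` value for that case.

```go
package main

import "fmt"

type event string

const (
	speechStart event = "speech_start"
	speechEnd   event = "speech_end"
	silence     event = "silence"
	// speechOngoing is hypothetical: the event table lists no value for
	// frames that continue an utterance already in progress.
	speechOngoing event = "speech"
)

// tracker remembers whether the previous frame was classified as speech.
type tracker struct{ inSpeech bool }

// update maps a per-frame speech decision to an event by comparing it with
// the previous frame's state.
func (t *tracker) update(isSpeech bool) event {
	switch {
	case isSpeech && !t.inSpeech:
		t.inSpeech = true
		return speechStart
	case !isSpeech && t.inSpeech:
		t.inSpeech = false
		return speechEnd
	case isSpeech:
		return speechOngoing
	default:
		return silence
	}
}

func main() {
	var t tracker
	for _, isSpeech := range []bool{false, true, true, false, false} {
		fmt.Println(t.update(isSpeech))
	}
}
```

A tracker like this is stateful, so under this design each audio stream would need its own instance.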

The voice package includes a built-in energy-threshold VAD that requires no external dependencies. It computes the RMS energy of 16-bit PCM audio and compares it against a configurable threshold:

import "github.com/lookatitude/beluga-ai/voice"

vad, err := voice.NewVAD("energy", map[string]any{
	"threshold": 1000.0,
})
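
The RMS comparison described above can be sketched in a few lines. This is a minimal illustration rather than the built-in detector's code: it assumes mono, little-endian samples and a strict greater-than comparison, and the names `rmsEnergy`, `isSpeech`, and `encodePCM` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rmsEnergy returns the root-mean-square amplitude of 16-bit PCM audio.
// Assumption: samples are mono and little-endian. A trailing odd byte is ignored.
func rmsEnergy(pcm []byte) float64 {
	n := len(pcm) / 2
	if n == 0 {
		return 0
	}
	var sumSquares float64
	for i := 0; i < n; i++ {
		s := int16(binary.LittleEndian.Uint16(pcm[2*i:]))
		sumSquares += float64(s) * float64(s)
	}
	return math.Sqrt(sumSquares / float64(n))
}

// isSpeech applies the threshold test. Whether the real detector uses > or >=
// is not stated in this document; > is an assumption.
func isSpeech(pcm []byte, threshold float64) bool {
	return rmsEnergy(pcm) > threshold
}

// encodePCM packs samples into little-endian bytes for the demo.
func encodePCM(samples []int16) []byte {
	buf := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(s))
	}
	return buf
}

func main() {
	loud := encodePCM([]int16{3000, -3000, 3000, -3000})
	quiet := encodePCM([]int16{10, -10, 10, -10})
	fmt.Println(rmsEnergy(loud), isSpeech(loud, 1000.0))   // prints 3000 true
	fmt.Println(rmsEnergy(quiet), isSpeech(quiet, 1000.0)) // prints 10 false
}
```

Under these assumptions the threshold is expressed in raw sample amplitude, so a value such as 1000.0 sits well above typical background hiss but far below full scale (32767).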
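
| Provider | Registry Name | Description                                     |
|----------|---------------|-------------------------------------------------|
| Energy   | energy        | Built-in RMS energy threshold detector          |
| Silero   | silero        | ONNX model-based detection with energy fallback |
| WebRTC   | webrtc        | Energy + zero-crossing rate analysis            |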

List all registered VAD providers at runtime:

for _, name := range voice.ListVAD() {
	fmt.Println(name)
}
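
For readers unfamiliar with the pattern, the sketch below shows one common way a name-based provider registry can work in Go: each provider registers a factory from an `init` function, which is why a blank import is enough to make it available. It is a simplified, hypothetical model and not Beluga's implementation; `register`, `newDetector`, `list`, and the `detector` type are invented names, and the sketch omits any locking a real registry might use.

```go
package main

import (
	"fmt"
	"sort"
)

// detector is a placeholder for a VAD implementation.
type detector interface{ Name() string }

// factory builds a detector from a loosely typed config map.
type factory func(cfg map[string]any) (detector, error)

var registry = map[string]factory{}

// register associates a provider name with its factory.
func register(name string, f factory) { registry[name] = f }

// newDetector looks up a factory by name and invokes it.
func newDetector(name string, cfg map[string]any) (detector, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown VAD provider %q", name)
	}
	return f(cfg)
}

// list returns the registered names in sorted order.
func list() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

type energyDetector struct{ threshold float64 }

func (energyDetector) Name() string { return "energy" }

// init runs when the package is loaded, which is what lets a blank import
// register a provider as a side effect.
func init() {
	register("energy", func(cfg map[string]any) (detector, error) {
		threshold, ok := cfg["threshold"].(float64)
		if !ok {
			threshold = 1000.0 // fallback default chosen for this sketch
		}
		return energyDetector{threshold: threshold}, nil
	})
}

func main() {
	fmt.Println(list())
	d, err := newDetector("energy", map[string]any{"threshold": 500.0})
	fmt.Println(d.Name(), err)
	_, err = newDetector("missing", nil)
	fmt.Println(err)
}
```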
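
| Use Case            | Recommended Provider | Reason                           |
|---------------------|----------------------|----------------------------------|
| Quick prototyping   | energy               | Zero dependencies, built-in      |
| Production accuracy | silero               | Neural network-based detection   |
| Low-latency         | webrtc               | Lightweight dual-metric analysis |
| Noise filtering     | webrtc               | Zero-crossing rate rejects noise |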
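
To illustrate why a zero-crossing rate can help reject noise, the sketch below combines it with an RMS energy check. Voiced speech tends to change sign relatively rarely within a frame, while broadband hiss changes sign very often, so a frame that is loud but has a high zero-crossing rate can be treated as noise. This is a generic illustration and not the webrtc provider's actual logic; the function names and both thresholds are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of the samples.
func rms(samples []int16) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sumSquares float64
	for _, s := range samples {
		sumSquares += float64(s) * float64(s)
	}
	return math.Sqrt(sumSquares / float64(len(samples)))
}

// zeroCrossingRate returns the fraction of adjacent sample pairs whose signs
// differ, in the range 0.0 to 1.0. Zero is treated as non-negative.
func zeroCrossingRate(samples []int16) float64 {
	if len(samples) < 2 {
		return 0
	}
	crossings := 0
	for i := 1; i < len(samples); i++ {
		if (samples[i-1] >= 0) != (samples[i] >= 0) {
			crossings++
		}
	}
	return float64(crossings) / float64(len(samples)-1)
}

// looksLikeSpeech requires enough energy AND a low enough crossing rate.
// Both thresholds are hypothetical tuning parameters.
func looksLikeSpeech(samples []int16, minRMS, maxZCR float64) bool {
	return rms(samples) > minRMS && zeroCrossingRate(samples) < maxZCR
}

func main() {
	// Low-frequency square wave: loud, with a single sign change.
	voiced := make([]int16, 16)
	for i := range voiced {
		if i < 8 {
			voiced[i] = 2000
		} else {
			voiced[i] = -2000
		}
	}
	// Sample-by-sample alternation: equally loud, but the sign flips every time.
	hiss := make([]int16, 16)
	for i := range hiss {
		if i%2 == 0 {
			hiss[i] = 2000
		} else {
			hiss[i] = -2000
		}
	}
	fmt.Println(looksLikeSpeech(voiced, 500, 0.5)) // prints true
	fmt.Println(looksLikeSpeech(hiss, 500, 0.5))   // prints false
}
```

An energy-only detector would accept both frames, since their RMS is identical; the second metric is what separates them here.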