Heuristic Turn Detection Tuning

Accurate turn detection is critical to conversational voice UX. If the system cuts in too early, users feel interrupted; if it waits too long, conversations feel sluggish. The heuristic turn detection provider uses configurable silence duration, punctuation rules, and turn-length limits to determine when a speaker has finished their turn. This approach is ideal when you need a lightweight, zero-dependency solution that runs anywhere without deploying an ONNX model, and when your environment has reasonably clean audio.

The provider lives in the voice/turndetection package and is driven entirely by configuration. By adjusting MinSilenceDuration, SentenceEndMarkers, MinTurnLength, and MaxTurnLength, you can trade responsiveness against accuracy to suit your scenario.

Requires Go 1.23 or later. Install the module with go get:
go get github.com/lookatitude/beluga-ai

Create a heuristic detector with tuned parameters:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/lookatitude/beluga-ai/voice/turndetection"
)

func main() {
	ctx := context.Background()

	// Start from the defaults, then override individual settings with options.
	cfg := turndetection.DefaultConfig()
	detector, err := turndetection.NewProvider(ctx, "heuristic", cfg,
		turndetection.WithMinSilenceDuration(500*time.Millisecond),
		turndetection.WithSentenceEndMarkers(".!?"),
		turndetection.WithMinTurnLength(10),
		turndetection.WithMaxTurnLength(5000),
	)
	if err != nil {
		log.Fatalf("Failed to create detector: %v", err)
	}

	// Placeholder buffer; in a real pipeline this is a chunk of captured audio.
	audio := make([]byte, 1024)
	done, err := detector.DetectTurn(ctx, audio)
	if err != nil {
		log.Fatalf("Detection failed: %v", err)
	}
	fmt.Printf("Turn detected: %v\n", done)
}

When you have silence duration from VAD or STT, use DetectTurnWithSilence for more accurate detection:

silence := 550 * time.Millisecond
done, err := detector.DetectTurnWithSilence(ctx, audio, silence)
if err != nil {
	log.Fatalf("Detection failed: %v", err)
}
// done == true when silence >= MinSilenceDuration (500 ms)
fmt.Printf("Turn detected: %v\n", done)

Test with silence values just below and above MinSilenceDuration to verify the threshold behavior.

| Goal | Adjustment |
| --- | --- |
| Faster response | Decrease MinSilenceDuration (300-400 ms). Risk: more false turn ends. |
| Fewer false turn ends | Increase MinSilenceDuration (600-800 ms). Risk: slower response. |
| Stricter sentence end | Use WithSentenceEndMarkers(".!?"), or extend the set with further markers such as ; |
| Longer turns | Increase the value passed to WithMaxTurnLength (8000-10000). |

Example for cautious detection with longer silence and longer maximum turn length:

detector, err := turndetection.NewProvider(ctx, "heuristic", cfg,
	turndetection.WithMinSilenceDuration(700*time.Millisecond),
	turndetection.WithMaxTurnLength(10000),
)

| Option | Description | Default |
| --- | --- | --- |
| MinSilenceDuration | Minimum silence required to trigger a turn end | 500 ms |
| SentenceEndMarkers | Sentence-end characters | .!? |
| MinTurnLength | Minimum turn length | 10 |
| MaxTurnLength | Maximum turn length | 5000 |
| Timeout | Operation timeout | 1 s |

If turn ends are detected too late and responses feel sluggish, lower MinSilenceDuration (350-450 ms). Also make sure you call DetectTurnWithSilence with accurate silence values from VAD/STT.

Turn detected too early / user gets cut off

Increase MinSilenceDuration (600-700 ms). Optionally increase MinTurnLength so very short segments are not considered complete turns.

If NewProvider reports that the "heuristic" provider is unknown, check your imports. The heuristic provider registers itself via init(), so the voice/turndetection package must be imported:

import "github.com/lookatitude/beluga-ai/voice/turndetection"
  • Call turndetection.InitMetrics(meter, tracer) at startup for OpenTelemetry monitoring of turn-end rate and latency
  • Use cfg.Validate() when building Config manually instead of using DefaultConfig() with options
  • Pass a context.Context with timeout or cancellation to NewProvider and detection methods