
Noisy Environment Turn Detection

Default turn detection parameters assume clean audio. In real-world deployments — contact centers with hold music, retail floors with announcements, or factory environments with machinery — background noise causes frequent false turn-end triggers that interrupt users. This guide covers configuring both heuristic and ONNX turn detection providers for noise-resistant operation, ensuring your voice agent waits for genuine speech completion before responding.

The voice/turndetection package provides heuristic and ONNX-based turn detection. With default parameters, ambient noise produces too many false positives, meaning the detector declares end-of-turn while the user is still speaking. Raising MinSilenceDuration and Threshold and tuning the turn-length limits reduces these premature triggers.

  • Go 1.23 or later
  • A voice pipeline with VAD providing silence duration
  • (Optional) ONNX turn-detection model for the onnx provider
go get github.com/lookatitude/beluga-ai

Increase MinSilenceDuration so brief noise gaps are not treated as end-of-turn. Use WithMinTurnLength to filter out very short spurious turns:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/lookatitude/beluga-ai/voice/turndetection"
)

func main() {
	ctx := context.Background()
	cfg := turndetection.DefaultConfig()

	// Heuristic provider tuned for noise: longer silence window,
	// higher minimum turn length, and a larger maximum turn length.
	detector, err := turndetection.NewProvider(ctx, "heuristic", cfg,
		turndetection.WithMinSilenceDuration(700*time.Millisecond),
		turndetection.WithMinTurnLength(20),
		turndetection.WithMaxTurnLength(8000),
		turndetection.WithSentenceEndMarkers(".!?"),
	)
	if err != nil {
		log.Fatalf("Failed to create detector: %v", err)
	}

	// Placeholder audio chunk and a silence duration as reported by the VAD.
	audio := make([]byte, 2048)
	silence := 800 * time.Millisecond

	done, err := detector.DetectTurnWithSilence(ctx, audio, silence)
	if err != nil {
		log.Fatalf("Detection failed: %v", err)
	}
	fmt.Printf("Turn detected: %v\n", done)
}
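
To make the interaction between these options concrete, here is a minimal sketch of how a silence-and-length gate could behave. It is not the library's implementation: the `gate` type, the `shouldEndTurn` method, and the treatment of turn length as a plain integer are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// gate holds hypothetical thresholds mirroring the options above.
// It is an illustration only, not a type from the turndetection package.
type gate struct {
	minSilence time.Duration // silence required before a turn may end
	minTurnLen int           // turns shorter than this are treated as spurious
	maxTurnLen int           // turns at or above this length are force-ended
}

// shouldEndTurn reports whether a turn of length turnLen, followed by the
// given trailing silence, would be considered complete under this gate.
func (g gate) shouldEndTurn(turnLen int, silence time.Duration) bool {
	if turnLen >= g.maxTurnLen {
		return true // cap runaway turns regardless of silence
	}
	if turnLen < g.minTurnLen {
		return false // too short: likely a noise burst, keep listening
	}
	return silence >= g.minSilence
}

func main() {
	g := gate{minSilence: 700 * time.Millisecond, minTurnLen: 20, maxTurnLen: 8000}
	fmt.Println(g.shouldEndTurn(5, 900*time.Millisecond))   // false: turn too short
	fmt.Println(g.shouldEndTurn(120, 400*time.Millisecond)) // false: silence too brief
	fmt.Println(g.shouldEndTurn(120, 800*time.Millisecond)) // true
}
```

Under this model, a brief gap in background noise (400 ms) no longer ends the turn once the silence requirement is 700 ms, which is the effect the settings above aim for.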

Raise Threshold to require stronger model confidence before declaring end-of-turn, reducing false positives in noise:

// Fragment: add "os" to the import block of the previous example,
// then replace the heuristic detector with the ONNX one.
import "os"

detector, err := turndetection.NewProvider(ctx, "onnx", cfg,
	turndetection.WithModelPath(os.Getenv("TURN_MODEL_PATH")),
	turndetection.WithThreshold(0.6), // above the 0.5 default
	turndetection.WithMinSilenceDuration(600*time.Millisecond),
	turndetection.WithMinTurnLength(15),
)
if err != nil {
	log.Fatalf("Failed to create ONNX detector: %v", err)
}

| Option | Description | Default | Noisy Environment |
| --- | --- | --- | --- |
| MinSilenceDuration | Minimum silence to trigger turn end | 500 ms | 600-800 ms |
| Threshold | ONNX detection threshold (0-1) | 0.5 | 0.55-0.65 |
| MinTurnLength | Minimum turn length | 10 | 15-25 |
| MaxTurnLength | Maximum turn length | 5000 | 8000+ |
| SentenceEndMarkers | Heuristic sentence-end characters | .!? | Keep or extend |

Increase MinSilenceDuration (600-800 ms) and, for ONNX, Threshold (0.55-0.65). Use DetectTurnWithSilence fed by a robust VAD so silence is computed from actual speech absence rather than raw audio energy.
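
As a rough illustration of deriving silence from speech absence, the sketch below counts trailing non-speech frames from per-frame VAD decisions. The `trailingSilence` helper, the boolean frame representation, and the 20 ms frame size are assumptions for this example and are not part of the turndetection package.

```go
package main

import (
	"fmt"
	"time"
)

// trailingSilence returns how long the stream has been silent, given
// per-frame VAD decisions (true = speech) ordered oldest to newest and a
// fixed frame duration. Any frame flagged as speech resets the count.
func trailingSilence(speech []bool, frameDur time.Duration) time.Duration {
	n := 0
	for i := len(speech) - 1; i >= 0 && !speech[i]; i-- {
		n++
	}
	return time.Duration(n) * frameDur
}

func main() {
	// Hypothetical VAD output: speech, a short gap, more speech, then a pause.
	frames := []bool{true, true, false, true, false, false, false}
	fmt.Println(trailingSilence(frames, 20*time.Millisecond)) // 60ms
}
```

The resulting duration is the kind of value that would be passed as the silence argument to DetectTurnWithSilence. Because it depends on the VAD's speech decisions, steady background noise that the VAD classifies as non-speech still counts as silence.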

Ensure the VAD identifies silence correctly, and avoid raising MinSilenceDuration more than necessary, since a longer silence requirement delays every end-of-turn decision. Consider switching to the ONNX provider if heuristic detection is insufficient. Verify that the audio format (sample rate, chunk size) matches what the provider expects.

Import the ONNX provider to trigger registration:

import _ "github.com/lookatitude/beluga-ai/voice/turndetection/providers/onnx"

Set TURN_MODEL_PATH to a valid ONNX model file path.
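
A startup check along the following lines can surface a missing or misconfigured model path before the provider is created. This is a sketch using only the Go standard library; `checkModelPath` is a hypothetical helper, and it confirms only that the file exists, not that it is a valid ONNX model.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkModelPath verifies that path is non-empty and points to a regular file.
// It does not inspect the file contents.
func checkModelPath(path string) error {
	if path == "" {
		return errors.New("TURN_MODEL_PATH is not set")
	}
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("model file not accessible: %w", err)
	}
	if !info.Mode().IsRegular() {
		return fmt.Errorf("model path %q is not a regular file", path)
	}
	return nil
}

func main() {
	if err := checkModelPath(os.Getenv("TURN_MODEL_PATH")); err != nil {
		fmt.Println("model check failed:", err)
		return
	}
	fmt.Println("model path looks valid")
}
```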

  • Use turndetection.IsRetryableError(err) and retry where appropriate
  • Call turndetection.InitMetrics(meter, tracer) at startup for OpenTelemetry monitoring
  • A/B test heuristic vs ONNX providers with different thresholds using metrics before rollout
  • Run with real or synthetic noisy audio to validate settings against ground truth
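
For the validation step, one possible approach is to sweep candidate thresholds over labeled samples and compare error rates. The sketch below assumes that a score at or above the threshold counts as a predicted end-of-turn. The `sample` type, the `rates` function, and the scores are all made up for illustration and should be replaced with measurements from your own noisy recordings.

```go
package main

import "fmt"

// sample is one labeled evaluation point: a model end-of-turn score and
// whether the turn had really ended (ground truth).
type sample struct {
	score float64
	ended bool
}

// rates returns the false-positive and false-negative rates at a threshold,
// treating score >= threshold as a predicted end-of-turn (an assumption).
func rates(samples []sample, threshold float64) (fpr, fnr float64) {
	var fp, fn, neg, pos int
	for _, s := range samples {
		predicted := s.score >= threshold
		switch {
		case s.ended:
			pos++
			if !predicted {
				fn++ // real turn end that was missed
			}
		default:
			neg++
			if predicted {
				fp++ // premature turn end, i.e. an interruption
			}
		}
	}
	if neg > 0 {
		fpr = float64(fp) / float64(neg)
	}
	if pos > 0 {
		fnr = float64(fn) / float64(pos)
	}
	return fpr, fnr
}

func main() {
	// Illustrative data only.
	data := []sample{
		{0.52, false}, {0.58, false}, {0.40, false},
		{0.62, true}, {0.70, true}, {0.57, true},
	}
	for _, t := range []float64{0.50, 0.55, 0.60, 0.65} {
		fpr, fnr := rates(data, t)
		fmt.Printf("threshold=%.2f fpr=%.2f fnr=%.2f\n", t, fpr, fnr)
	}
}
```

Raising the threshold trades false positives (interruptions) for false negatives (delayed responses), so the sweep helps pick a value within the 0.55-0.65 range that suits the deployment.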