Sentence-Boundary Turn Detection
Heuristic turn detection uses simple, predictable rules to determine when a user has completed their utterance. Unlike ML-based approaches that require model files and inference overhead, heuristic detection combines sentence-end markers (., !, ?), minimum silence duration, and utterance length constraints into a fast, deterministic detector. The tradeoff is that heuristics cannot detect conversational nuances like trailing speech or intentional pauses, but for structured interactions (commands, short queries, form-filling) they provide reliable detection with zero dependencies.
What You Will Build
A heuristic turn detector that identifies user turn completion based on sentence-end punctuation in transcripts, configurable silence thresholds, and minimum/maximum turn length constraints.
Prerequisites
- Go 1.23+
- Basic familiarity with voice session concepts
When to Use Heuristic Detection
Heuristic turn detection is appropriate when:
- You want deterministic, predictable behavior that is easy to debug and explain
- Your application processes primarily structured speech (commands, queries, form responses)
- You need minimal compute overhead and no external model files
- You are building a prototype before investing in ML-based detection
For more nuanced detection that handles trailing speech, overlapping turns, and non-verbal cues, see ML-Based Turn Prediction.
Step 1: Create a Heuristic Turn Detector
The heuristic provider follows Beluga’s standard registry pattern. The configuration options define the rules the detector applies to each audio frame and transcript pair. MinSilenceDuration is the most important parameter: it controls how long the detector waits after the last detected speech before concluding that the user is done.
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/lookatitude/beluga-ai/voice/turndetection"
	_ "github.com/lookatitude/beluga-ai/voice/turndetection/providers/heuristic"
)

func main() {
	ctx := context.Background()

	detector, err := turndetection.NewProvider(ctx, "heuristic", turndetection.DefaultConfig(),
		turndetection.WithMinSilenceDuration(400*time.Millisecond),
		turndetection.WithSentenceEndMarkers(".!?"),
		turndetection.WithMinTurnLength(10),
		turndetection.WithMaxTurnLength(5000),
	)
	if err != nil {
		log.Fatalf("create turn detector: %v", err)
	}

	// Test with a simulated audio frame
	audio := make([]byte, 1024)
	done, err := detector.DetectTurn(ctx, audio)
	if err != nil {
		log.Fatalf("detect turn: %v", err)
	}

	fmt.Printf("Turn detected: %v\n", done)
}
```

Configuration Options
| Option | Default | Description |
|---|---|---|
| MinSilenceDuration | 500ms | Silence required after last speech to trigger turn end |
| SentenceEndMarkers | .!? | Characters that indicate potential sentence completion |
| MinTurnLength | 10 | Minimum transcript length (characters) for a valid turn |
| MaxTurnLength | 5000 | Maximum transcript length (characters) before forcing a turn end |
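
The defaults in the table and the With... options from Step 1 follow the common Go functional-options pattern. The sketch below shows how such options usually layer over defaults. The `config` type, `defaultConfig`, `newConfig`, and the lowercase option helpers are hypothetical stand-ins written for this illustration; they are not Beluga's actual types.

```go
package main

import (
	"fmt"
	"time"
)

// config is a hypothetical stand-in that mirrors the options table.
type config struct {
	MinSilenceDuration time.Duration
	SentenceEndMarkers string
	MinTurnLength      int
	MaxTurnLength      int
}

// option mutates a config; each helper below overrides one field.
type option func(*config)

// defaultConfig returns the defaults listed in the table.
func defaultConfig() config {
	return config{
		MinSilenceDuration: 500 * time.Millisecond,
		SentenceEndMarkers: ".!?",
		MinTurnLength:      10,
		MaxTurnLength:      5000,
	}
}

func withMinSilenceDuration(d time.Duration) option {
	return func(c *config) { c.MinSilenceDuration = d }
}

func withMinTurnLength(n int) option {
	return func(c *config) { c.MinTurnLength = n }
}

// newConfig starts from the defaults and applies overrides in order.
func newConfig(opts ...option) config {
	c := defaultConfig()
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	// Only the silence threshold is overridden; the rest keep their defaults.
	c := newConfig(withMinSilenceDuration(400 * time.Millisecond))
	fmt.Printf("%+v\n", c)
}
```

Any option you do not pass keeps its default, which is why the Step 1 example could omit the marker and length options without changing behavior.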
Step 2: Detect Turns with Measured Silence
For real-time pipelines where you track silence duration externally (typically from your VAD component), use DetectTurnWithSilence to combine your measured silence with the heuristic rules. This separation of concerns is intentional: VAD measures when speech stops, and the turn detector decides whether that stop means the user is done.
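
The silence value has to come from your own pipeline. As a rough sketch of one way to measure it, the code below counts consecutive non-speech frames and converts the count to a duration. Everything in it is an assumption made for illustration: the 20 ms frame length, the energy threshold, the 16-bit little-endian PCM format, and the `isSpeech` check, which stands in for a real VAD.

```go
package main

import (
	"fmt"
	"time"
)

// frameDuration is the assumed length of one audio frame (illustrative value).
const frameDuration = 20 * time.Millisecond

// energyThreshold is an illustrative cutoff for mean absolute amplitude.
// A real VAD would be calibrated or model-based.
const energyThreshold = 500

// isSpeech is a stand-in for a VAD. It reads the frame as 16-bit
// little-endian PCM and compares the mean absolute amplitude to the threshold.
func isSpeech(frame []byte) bool {
	samples := len(frame) / 2
	if samples == 0 {
		return false
	}
	var sum int64
	for i := 0; i+1 < len(frame); i += 2 {
		s := int64(int16(uint16(frame[i]) | uint16(frame[i+1])<<8))
		if s < 0 {
			s = -s
		}
		sum += s
	}
	return sum/int64(samples) > energyThreshold
}

// silenceTracker counts consecutive non-speech frames.
type silenceTracker struct {
	silentFrames int
}

// Observe feeds one frame to the tracker and returns the silence
// accumulated since the last speech frame.
func (t *silenceTracker) Observe(frame []byte) time.Duration {
	if isSpeech(frame) {
		t.silentFrames = 0
	} else {
		t.silentFrames++
	}
	return time.Duration(t.silentFrames) * frameDuration
}

// loudFrame builds a synthetic constant-amplitude frame for the demo.
func loudFrame(samples int, amp int16) []byte {
	frame := make([]byte, samples*2)
	for i := 0; i < samples; i++ {
		frame[2*i] = byte(uint16(amp))
		frame[2*i+1] = byte(uint16(amp) >> 8)
	}
	return frame
}

func main() {
	var tracker silenceTracker
	speech := loudFrame(160, 2000)
	quiet := make([]byte, 320)

	// One speech frame followed by three silent frames.
	for _, frame := range [][]byte{speech, quiet, quiet, quiet} {
		fmt.Println(tracker.Observe(frame))
	}
}
```

The duration returned by `Observe` is the kind of value you would pass as the silence argument to DetectTurnWithSilence.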
```go
// Continues from Step 1: ctx, detector, and audio are defined there.
// In your audio processing loop, measure silence duration since last speech
silenceDuration := 500 * time.Millisecond

done, err := detector.DetectTurnWithSilence(ctx, audio, silenceDuration)
if err != nil {
	log.Fatalf("detect turn: %v", err)
}

if done {
	fmt.Println("User finished speaking; proceed to LLM/TTS.")
}
```

The detector combines the measured silence with its internal heuristic rules in a three-step evaluation:
- Has the silence exceeded MinSilenceDuration?
- Does the current transcript end with a sentence-end marker?
- Is the transcript length between MinTurnLength and MaxTurnLength?
All three conditions must be satisfied for a turn to be detected, except when MaxTurnLength is reached, which forces a turn end regardless of the other conditions to prevent unbounded accumulation.
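
The rule described above can be sketched as a small pure function. This is an illustration of the stated logic, not the provider's implementation. The `rules` type and `shouldEndTurn` are hypothetical names, and the sketch makes three assumptions: length is counted in Unicode characters, trailing whitespace is ignored, and silence equal to the threshold counts as sufficient.

```go
package main

import (
	"fmt"
	"strings"
	"time"
	"unicode/utf8"
)

// rules is a hypothetical type mirroring the configuration options.
type rules struct {
	MinSilence time.Duration
	EndMarkers string
	MinTurnLen int
	MaxTurnLen int
}

// defaultRules returns the documented default values.
func defaultRules() rules {
	return rules{
		MinSilence: 500 * time.Millisecond,
		EndMarkers: ".!?",
		MinTurnLen: 10,
		MaxTurnLen: 5000,
	}
}

// shouldEndTurn applies the three checks, with the MaxTurnLen override first.
func shouldEndTurn(r rules, transcript string, silence time.Duration) bool {
	text := strings.TrimSpace(transcript)
	n := utf8.RuneCountInString(text)

	// Reaching the maximum length forces a turn end regardless of the rest.
	if n >= r.MaxTurnLen {
		return true
	}
	// Check 1: enough silence (assumption: equal to the threshold passes).
	if silence < r.MinSilence {
		return false
	}
	// Check 3: the transcript is long enough to be a valid turn.
	if n < r.MinTurnLen {
		return false
	}
	// Check 2: the last character is one of the sentence-end markers.
	last, _ := utf8.DecodeLastRuneInString(text)
	return strings.ContainsRune(r.EndMarkers, last)
}

func main() {
	r := defaultRules()
	enough := 600 * time.Millisecond

	fmt.Println(shouldEndTurn(r, "Turn on the lights.", enough))                // all checks pass
	fmt.Println(shouldEndTurn(r, "Turn on the", enough))                        // no end marker
	fmt.Println(shouldEndTurn(r, "Turn on the lights.", 200*time.Millisecond)) // silence too short
	fmt.Println(shouldEndTurn(r, "Yes.", enough))                               // below MinTurnLen
}
```

The last case shows a practical consequence of the defaults: a short answer such as "Yes." is under the 10-character minimum, so form-filling flows that expect one-word replies may need a lower MinTurnLength.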
Step 3: Integration with a Voice Session
Combine the heuristic turn detector with a voice session for a complete pipeline. The turn detector is passed as a functional option to the session, which uses it internally to decide when to stop collecting user audio and begin generating a response. This integration means you do not need to manage the turn detection loop yourself; the session handles it.
```go
import (
	"context"
	"log"
	"time"

	"github.com/lookatitude/beluga-ai/voice/session"
	"github.com/lookatitude/beluga-ai/voice/stt"
	"github.com/lookatitude/beluga-ai/voice/tts"
	"github.com/lookatitude/beluga-ai/voice/turndetection"
)

func createSessionWithTurnDetection(ctx context.Context) {
	sttProvider, err := stt.NewProvider(ctx, "deepgram", stt.DefaultConfig(),
		stt.WithAPIKey("your-key"),
	)
	if err != nil {
		log.Fatalf("create STT: %v", err)
	}

	ttsProvider, err := tts.NewProvider(ctx, "openai", tts.DefaultConfig(),
		tts.WithAPIKey("your-key"),
	)
	if err != nil {
		log.Fatalf("create TTS: %v", err)
	}

	turnDetector, err := turndetection.NewProvider(ctx, "heuristic", turndetection.DefaultConfig(),
		turndetection.WithMinSilenceDuration(400*time.Millisecond),
		turndetection.WithSentenceEndMarkers(".!?"),
	)
	if err != nil {
		log.Fatalf("create turn detector: %v", err)
	}

	sess, err := session.NewVoiceSession(ctx,
		session.WithSTTProvider(sttProvider),
		session.WithTTSProvider(ttsProvider),
		session.WithTurnDetector(turnDetector),
		session.WithConfig(session.DefaultConfig()),
	)
	if err != nil {
		log.Fatalf("create session: %v", err)
	}
	// The session is stopped when this function returns, so run the
	// session's work before returning.
	defer sess.Stop(ctx)
}
```

Verification
- Run the application and verify DetectTurn returns false for short, incomplete utterances.
- Use DetectTurnWithSilence with a silence duration exceeding MinSilenceDuration and verify the turn is detected.
- Test with transcripts shorter than MinTurnLength and verify no premature turn detection.
Next Steps
- ML-Based Turn Prediction: Use ONNX models for neural turn detection
- Sensitivity Tuning: Tune VAD and turn detection together
- Voice Session Interruptions: Handle barge-in during agent speech