
Live Meeting Minutes Generator

Manual meeting minutes take 1-2 hours to create, miss 20-30% of key points, and have 4-6 hour delays before sharing. The person taking notes must simultaneously listen, decide what matters, and write — a cognitive multitasking problem that inevitably drops information. Critical decisions, action item assignments, and dissenting opinions are the most likely casualties because they require contextual judgment to capture accurately.

The delay compounds the problem: by the time minutes circulate, participants have already moved on, and corrections require re-engaging everyone’s attention. Incorrect or missing action items lead to dropped work and accountability gaps.

An automated system using Beluga AI’s STT pipeline with speaker diarization and LLM-based summarization generates structured minutes in real time with 96% completeness and immediate availability.

graph TB
    A[Meeting Audio Stream] --> B[STT Provider]
    B --> C[Real-time Transcription]
    C --> D[Speaker Diarization]
    D --> E[Transcript Buffer]
    E --> F[LLM Summarizer]
    F --> G[Action Item Extractor]
    G --> H[Structured Minutes]

The STT provider transcribes meeting audio in real time with speaker diarization. The transcript buffer accumulates segments, and at meeting end (or at intervals), the LLM summarizer generates structured minutes with key discussion points, decisions, and action items.

The pipeline is split into two distinct phases — transcription and summarization — rather than attempting real-time summarization of partial transcripts. This separation exists because LLM summarization quality degrades significantly on fragmented input. Complete transcript segments with speaker attribution produce dramatically better structured output than incremental updates. The buffered approach also means the STT and LLM components can be optimized independently: STT for latency and accuracy, LLM for output structure and completeness.
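The buffered approach can be sketched as a small accumulator that flushes either when a segment-count threshold is reached (for interim summaries) or at meeting end. The TranscriptBuffer type and its flush threshold below are illustrative, not part of the Beluga AI API:

```go
package main

import (
	"fmt"
	"strings"
)

// TranscriptBuffer accumulates confirmed transcript segments and reports
// when enough material has arrived to justify an interim summary.
// The type and its threshold are illustrative, not library API.
type TranscriptBuffer struct {
	segments   []string
	flushEvery int // summarize after this many segments (0 = only at meeting end)
}

// Add appends a confirmed segment and returns true when the buffer
// has reached its interim-summary threshold.
func (b *TranscriptBuffer) Add(segment string) bool {
	b.segments = append(b.segments, segment)
	return b.flushEvery > 0 && len(b.segments)%b.flushEvery == 0
}

// Drain returns the accumulated transcript and resets the buffer.
func (b *TranscriptBuffer) Drain() string {
	out := strings.Join(b.segments, "\n")
	b.segments = b.segments[:0]
	return out
}

func main() {
	buf := &TranscriptBuffer{flushEvery: 2}
	fmt.Println(buf.Add("Alice: kickoff")) // below threshold: false
	fmt.Println(buf.Add("Bob: agenda"))    // threshold reached: true
	fmt.Println(buf.Drain())
}
```

Keeping the flush decision in the buffer (rather than in the STT loop) preserves the separation of phases: the transcription side only appends, and the summarization side only drains.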

The transcription function consumes a streaming audio source (iter.Seq2[[]byte, error]) and produces a complete transcript. Deepgram is used here for its strong speaker diarization, which is essential for meeting minutes: knowing who said what turns a raw transcript into an attributable record. The IsFinal flag filters out interim results, ensuring only confirmed transcriptions enter the buffer.

package main

import (
	"context"
	"fmt"
	"iter"
	"strings"

	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"
	"github.com/lookatitude/beluga-ai/voice/stt"

	_ "github.com/lookatitude/beluga-ai/llm/providers/openai"
	_ "github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"
)

func transcribeMeeting(ctx context.Context, audioStream iter.Seq2[[]byte, error]) (string, error) {
	engine, err := stt.New("deepgram", nil)
	if err != nil {
		return "", fmt.Errorf("create stt engine: %w", err)
	}
	transcripts := engine.TranscribeStream(ctx, audioStream,
		stt.WithLanguage("en"),
		stt.WithPunctuation(true),
		stt.WithDiarization(true),
	)
	var fullTranscript strings.Builder
	for event, err := range transcripts {
		if err != nil {
			return "", fmt.Errorf("transcription error: %w", err)
		}
		// Skip interim results; only confirmed transcriptions enter the buffer.
		if event.IsFinal {
			fullTranscript.WriteString(event.Text + "\n")
			fmt.Printf("[%s] %s\n", event.Timestamp, event.Text)
		}
	}
	return fullTranscript.String(), nil
}
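Diarization only pays off if speaker labels make it into the buffered text. A minimal attribution helper might look like the following; TranscriptSegment is a stand-in for whatever event shape the STT provider emits, and the real Beluga AI event fields may differ:

```go
package main

import "fmt"

// TranscriptSegment is a stand-in for a diarized STT event; the actual
// provider event type and field names may differ.
type TranscriptSegment struct {
	Speaker int    // diarization speaker index (0-based)
	Text    string // confirmed (final) transcription text
}

// AttributeLine renders a segment as an attributable transcript line,
// e.g. "Speaker 2: let's ship on Friday".
func AttributeLine(seg TranscriptSegment) string {
	return fmt.Sprintf("Speaker %d: %s", seg.Speaker+1, seg.Text)
}

func main() {
	fmt.Println(AttributeLine(TranscriptSegment{Speaker: 0, Text: "let's ship on Friday"}))
}
```

Writing attributed lines (rather than bare text) into the buffer is what lets the LLM later assign action items to owners.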

The summarization step sends the complete transcript to an LLM with a structured prompt that explicitly requests attendees, key discussion points, decisions, and action items with owners. This structured prompting is critical — without explicit output guidance, the LLM tends to produce narrative summaries that are harder to act on than itemized lists.

func generateMinutes(ctx context.Context, transcript string) (string, error) {
	model, err := llm.New("openai", nil)
	if err != nil {
		return "", fmt.Errorf("create model: %w", err)
	}
	msgs := []schema.Message{
		&schema.SystemMessage{Parts: []schema.ContentPart{
			schema.TextPart{Text: "Generate structured meeting minutes from this transcript. " +
				"Include: attendees, key discussion points, decisions made, and action items with owners."},
		}},
		&schema.HumanMessage{Parts: []schema.ContentPart{
			schema.TextPart{Text: transcript},
		}},
	}
	resp, err := model.Generate(ctx, msgs)
	if err != nil {
		return "", fmt.Errorf("generate: %w", err)
	}
	// Guard the type assertion so an unexpected part type returns an error
	// instead of panicking.
	text, ok := resp.Parts[0].(schema.TextPart)
	if !ok {
		return "", fmt.Errorf("unexpected response part type %T", resp.Parts[0])
	}
	return text.Text, nil
}

func processMeeting(ctx context.Context, audioStream iter.Seq2[[]byte, error]) error {
	transcript, err := transcribeMeeting(ctx, audioStream)
	if err != nil {
		return fmt.Errorf("transcription: %w", err)
	}
	minutes, err := generateMinutes(ctx, transcript)
	if err != nil {
		return fmt.Errorf("generate minutes: %w", err)
	}
	fmt.Println(minutes)
	return nil
}
  • Streaming STT: Use streaming transcription for real-time display during the meeting
  • Speaker diarization: Enable diarization to attribute statements to specific speakers
  • Transcript buffering: Buffer transcripts before summarization to improve minute quality
  • Multiple languages: Configure STT language per meeting for global teams
  • Parallel tracks: For multi-speaker meetings, process audio in parallel tracks per speaker
  • Observability: Track transcription accuracy, summarization quality, and end-to-end latency
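The parallel-tracks extension can be sketched as one goroutine per speaker track feeding a shared channel, with a timestamp sort restoring chronological order afterward. The trackLine type and mergeTracks function are illustrative; a real implementation would run per-track STT where the comment indicates:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// trackLine is an illustrative per-speaker transcript line with a
// timestamp used to restore chronological order after merging.
type trackLine struct {
	AtMs    int
	Speaker string
	Text    string
}

// mergeTracks processes each speaker track concurrently and merges the
// results back into a single chronological transcript.
func mergeTracks(tracks map[string][]trackLine) []trackLine {
	out := make(chan trackLine)
	var wg sync.WaitGroup
	for _, lines := range tracks {
		wg.Add(1)
		go func(lines []trackLine) {
			defer wg.Done()
			for _, l := range lines {
				out <- l // in a real pipeline, per-track STT runs here
			}
		}(lines)
	}
	go func() { wg.Wait(); close(out) }()

	var merged []trackLine
	for l := range out {
		merged = append(merged, l)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].AtMs < merged[j].AtMs })
	return merged
}

func main() {
	tracks := map[string][]trackLine{
		"alice": {{AtMs: 0, Speaker: "alice", Text: "kickoff"}, {AtMs: 2000, Speaker: "alice", Text: "decision"}},
		"bob":   {{AtMs: 1000, Speaker: "bob", Text: "question"}},
	}
	for _, l := range mergeTracks(tracks) {
		fmt.Printf("%dms %s: %s\n", l.AtMs, l.Speaker, l.Text)
	}
}
```

Per-speaker tracks also sidestep diarization errors entirely, since speaker identity is known from the audio source rather than inferred.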
Metric                 Before      After       Improvement
Minute creation time   1-2 hours   12 minutes  90-95% reduction
Completeness           70-80%      96%         20-37% improvement
Time to availability   4-6 hours   5 minutes   98-99% reduction
Quality score          6.5/10      9.1/10      40% improvement
  • Enable speaker diarization early: Speaker-agnostic transcripts are significantly less useful for minutes
  • Buffering improves quality: Processing complete transcript segments produces better summaries than immediate processing
  • LLM summarization is critical: Prompt engineering for structured output (attendees, decisions, action items) drives quality