Beluga AI Guides

Beluga AI v2 is a Go-native framework for building agentic AI systems with streaming-first design, protocol interoperability, and pluggable providers. These guides cover everything from your first LLM interaction to production deployment with observability, safety, and resilience. Each guide is self-contained with complete code examples, but they are organized into a progressive learning path so that concepts build naturally from one to the next.

How to Use These Guides

The guides are organized into three categories that form a recommended learning path:

Foundations: Start here. These guides introduce the core abstractions that every Beluga application uses: the ChatModel interface for LLM interaction, the Agent runtime for autonomous reasoning, the Tool system for extending agent capabilities, and the PromptManager for template management. They also establish the recurring patterns (registries, middleware, hooks, and iter.Seq2 streaming) that the rest of the framework follows, so it pays to learn them first.
Capabilities: Explore these as your needs grow. Each guide covers a major subsystem (retrieval-augmented generation, persistent memory, voice processing, or multimodal AI) that extends the foundation with domain-specific features. You can read them in any order based on what your application requires.
Production: Read these when preparing for real-world deployment. They cover orchestration patterns for coordinating multiple agents, safety pipelines for content filtering and PII protection, OpenTelemetry instrumentation for observability, and resilience patterns for fault-tolerant operation.

Guide Categories

Foundations

The building blocks every Beluga application uses. These guides establish the core patterns and abstractions that the rest of the framework builds on.

Guide	What You’ll Learn
Building Your First Agent	Create a complete AI agent from scratch — wire up tools, stream responses with `iter.Seq2`, implement the ReAct reasoning loop, and hand off between specialized agents
Working with LLMs	Configure any language model through the unified `ChatModel` interface — set up providers, compose middleware for logging and retries, attach hooks for lifecycle events, and route requests across multiple models
Structured Output	Extract typed Go structs from LLM responses — generate JSON schemas automatically, validate and retry on parse failures, and build classification pipelines for routing and labeling
Prompt Engineering	Manage prompts as versioned, testable assets — use `PromptManager` for template resolution, `Builder` for cache-optimal token ordering, few-shot example selection, and A/B testing across prompt variants
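
The middleware composition mentioned for Working with LLMs can be pictured with a small standard-library sketch. Everything here is an assumption made for illustration: `Model`, `ModelFunc`, `Middleware`, `Chain`, `WithRetry`, `WithLog`, `flaky`, and `runDemo` are hypothetical names, and the real `ChatModel` interface is likely richer than a single `Generate` method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Model is a hypothetical, simplified stand-in for a chat model interface.
type Model interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// ModelFunc adapts a plain function to Model.
type ModelFunc func(ctx context.Context, prompt string) (string, error)

func (f ModelFunc) Generate(ctx context.Context, prompt string) (string, error) {
	return f(ctx, prompt)
}

// Middleware wraps a Model with extra behavior.
type Middleware func(Model) Model

// Chain applies middleware so that the first one listed is outermost.
func Chain(m Model, mws ...Middleware) Model {
	for i := len(mws) - 1; i >= 0; i-- {
		m = mws[i](m)
	}
	return m
}

// WithRetry calls the wrapped model up to attempts times in total.
func WithRetry(attempts int) Middleware {
	if attempts < 1 {
		attempts = 1
	}
	return func(next Model) Model {
		return ModelFunc(func(ctx context.Context, prompt string) (string, error) {
			var err error
			for i := 0; i < attempts; i++ {
				var out string
				if out, err = next.Generate(ctx, prompt); err == nil {
					return out, nil
				}
			}
			return "", err
		})
	}
}

// WithLog records each call that reaches it in the supplied slice.
func WithLog(log *[]string) Middleware {
	return func(next Model) Model {
		return ModelFunc(func(ctx context.Context, prompt string) (string, error) {
			*log = append(*log, "call: "+prompt)
			return next.Generate(ctx, prompt)
		})
	}
}

// flaky returns a model that fails its first n calls, then echoes the prompt.
func flaky(n int) Model {
	calls := 0
	return ModelFunc(func(ctx context.Context, prompt string) (string, error) {
		calls++
		if calls <= n {
			return "", errors.New("temporary failure")
		}
		return "echo: " + prompt, nil
	})
}

// runDemo puts a flaky model behind logging (outer) and retry (inner)
// middleware and reports the reply and how many calls the logger saw.
func runDemo(failures, attempts int) (reply string, logged int, err error) {
	var log []string
	m := Chain(flaky(failures), WithLog(&log), WithRetry(attempts))
	reply, err = m.Generate(context.Background(), "hi")
	return reply, len(log), err
}

func main() {
	fmt.Println(runDemo(2, 3))
}
```

Ordering matters in this sketch: with logging outermost, one logical request is logged once no matter how many retries happen underneath. Swapping the two would log every attempt instead.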

Capabilities

Domain-specific subsystems that extend the foundation. Each capability follows the same extensibility model — small interfaces, registry-based providers, and composable middleware — so patterns you learn in one transfer directly to the others.

Guide	What You’ll Learn
RAG Pipeline	Give agents access to your data — build retrieval pipelines with embeddings, vector stores, and advanced strategies like hybrid search (BM25 + vector + RRF fusion), CRAG for self-correcting retrieval, and HyDE for hypothetical document generation
Document Processing	Prepare data for retrieval — load documents from files, URLs, and databases, split them into semantically meaningful chunks, and ingest them into vector stores for downstream search
Memory System	Give agents persistent memory across conversations — implement the MemGPT-inspired 3-tier model with Core memory (always in context), Recall memory (searchable conversation history), and Archival memory (vector-searchable long-term storage)
Tools & MCP	Extend what agents can do — create typed Go functions as tools, organize them in registries with middleware, and connect to remote MCP servers for runtime tool discovery and cross-framework interoperability
Voice AI Pipeline	Build real-time voice applications — process audio through a frame-based pipeline with STT, TTS, and speech-to-speech models, handle voice activity detection, and stream over WebSocket or WebRTC transports
Multimodal AI	Process images, audio, and video alongside text — send mixed-content messages to multimodal models for document intelligence, visual question answering, audio transcription, and content analysis
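
To make the RRF fusion step named in the RAG Pipeline row concrete, here is a minimal sketch of reciprocal rank fusion over ranked lists of document IDs. The function name `fuseRRF` and the sample IDs are hypothetical, and this is not Beluga's retriever API. The constant k = 60 is a commonly used default, and the sketch assumes each input list contains a document at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked lists with reciprocal rank fusion: each document
// scores the sum of 1/(k + rank) over the lists that contain it, where
// rank starts at 1 for the top result.
func fuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break by ID
	})
	return ids
}

func main() {
	bm25 := []string{"doc-a", "doc-b", "doc-c"}   // keyword ranking
	vector := []string{"doc-b", "doc-d", "doc-a"} // embedding ranking
	fmt.Println(fuseRRF(60, bm25, vector))
	// prints [doc-b doc-a doc-d doc-c]
}
```

RRF uses only rank positions, so BM25 scores and vector similarities never need to be normalized onto a common scale before the lists are combined.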

Production

Patterns and practices for operating Beluga applications under real-world demands — coordinating agent teams, enforcing safety constraints, instrumenting for observability, and deploying with resilience.

Guide	What You’ll Learn
Orchestration & Workflows	Coordinate complex agent pipelines — use Sequential, Parallel, and Loop workflow agents, build DAG execution graphs, and implement durable workflows that survive process restarts
Multi-Agent Systems	Design systems where specialized agents collaborate — implement handoffs for agent-to-agent transfers, supervisor patterns for centralized coordination, and event-driven communication for decoupled architectures
Safety & Guards	Protect your application and users — implement the three-stage guard pipeline (input, output, tool) with PII redaction, content filtering, prompt injection detection, and human-in-the-loop approval workflows
Observability	Understand what your agents are doing — instrument with OpenTelemetry using GenAI semantic conventions, collect metrics on token usage and latency, stream structured logs, and integrate health checks
Deploying to Production	Ship with confidence — serve agents as REST APIs using HTTP framework adapters (Gin, Fiber, Echo, Chi), apply circuit breakers and rate limiters for resilience, and configure container orchestration for scaling
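
The circuit breaker idea from the Deploying to Production row can be sketched in a few lines of standard-library Go. `Breaker`, `NewBreaker`, `ErrOpen`, and `errBackend` are hypothetical names chosen for this example, not Beluga types. The sketch is deliberately simplified: after the cooldown it lets any number of concurrent trial calls through, where a production breaker would usually admit only one.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is rejecting calls.
var ErrOpen = errors.New("circuit breaker is open")

// errBackend simulates a failing dependency in the demo below.
var errBackend = errors.New("backend failure")

// Breaker opens after threshold consecutive failures and allows
// trial calls again once cooldown has elapsed.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	openedAt  time.Time
}

func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// Call runs fn unless the breaker is open and still cooling down.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.threshold && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast without touching the dependency
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openedAt = time.Now() // open, or re-open after a failed trial
		}
		return err
	}
	b.failures = 0 // any success closes the breaker
	return nil
}

func main() {
	b := NewBreaker(2, time.Minute)
	fail := func() error { return errBackend }
	for i := 0; i < 3; i++ {
		fmt.Println(b.Call(fail))
	}
}
```

In the demo, the first two calls reach the failing function and return its error; the third is rejected with `ErrOpen` before the function runs, which is the fail-fast behavior that protects a struggling dependency.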

Where to Go Next

Tutorials — Step-by-step walkthroughs that build complete, working applications from start to finish. Good for hands-on learning when you want to see all the pieces come together.
Cookbook — Focused recipes that solve specific problems in isolation. Use these when you know what you need to accomplish and want a concise, copy-paste-ready solution.
API Reference — Complete interface documentation for every exported type, function, and constant. The definitive reference when you need exact method signatures, option fields, or error codes.