
Package Layout — Beluga AI

Beluga AI v2 is organized into layered packages with strict dependency rules. This document describes every package, its interfaces, how packages depend on each other, and how they work together to build agentic AI systems. The package layout follows a flat structure with no pkg/ prefix — packages live at the repository root, which keeps import paths short and aligns with Go conventions for library modules.

Module: github.com/lookatitude/beluga-ai
Go: 1.23+ (uses iter.Seq2 for streaming)
```mermaid
graph TB
    subgraph foundation [Foundation — zero external deps]
        core
        schema
        config
        o11y
    end

    subgraph capability [Capability — AI primitives]
        llm
        tool
        memory
        rag
        agent
        voice
    end

    subgraph orchestration [Orchestration]
        orch["orchestration"]
        workflow
    end

    subgraph infrastructure [Infrastructure — cross-cutting]
        guard
        resilience
        cache
        auth
        hitl
        eval
        state
        prompt
    end

    subgraph protocols [Protocols & Servers]
        protocol
        server
    end

    core --> schema
    llm --> core & schema & config
    tool --> schema
    memory --> schema & config & rag
    rag --> schema & config
    agent --> llm & tool & memory & schema
    voice --> llm & schema
    orch --> agent & llm
    workflow --> core & schema
    guard --> schema
    resilience --> core
    cache --> schema
    auth --> core
    hitl --> schema
    eval --> schema & llm & agent
    state --> core
    prompt --> schema
    protocol --> agent & tool & schema
    server --> protocol
```
```text
beluga-ai/
├── core/ # Stream, Runnable, Lifecycle, Errors, Tenant, Option
├── schema/ # Message, ContentPart, ToolCall, Document, Event, Session
├── config/ # Load[T], Validate, Watcher, ProviderConfig
├── o11y/ # Tracer, Meter, Logger, Health, Exporters
│ └── providers/ # langsmith, langfuse, phoenix, opik
├── llm/ # ChatModel, Router, StructuredOutput, ContextManager
│ └── providers/ # openai, anthropic, google, ollama, bedrock, + 18 more
├── tool/ # Tool, FuncTool, Registry, MCP Client, Middleware, Hooks
├── memory/ # Memory, Core, Recall, Archival, Graph, Composite
│ └── stores/ # inmemory, redis, postgres, mongodb, sqlite, neo4j, memgraph, dragonfly
├── rag/
│ ├── embedding/ # Embedder + providers (openai, cohere, jina, ollama, + 5 more)
│ ├── vectorstore/ # VectorStore + providers (pgvector, pinecone, qdrant, + 10 more)
│ ├── retriever/ # Retriever + strategies (hybrid, CRAG, HyDE, ensemble, adaptive, rerank)
│ ├── loader/ # DocumentLoader + providers (text, pdf, html, web, csv, + 7 more)
│ └── splitter/ # TextSplitter (recursive, markdown, token)
├── agent/ # Agent, BaseAgent, Planner, Executor, Handoffs, Bus
│ └── workflow/ # SequentialAgent, ParallelAgent, LoopAgent
├── voice/
│ ├── stt/providers/ # whisper, deepgram, assemblyai, gladia, groq, elevenlabs
│ ├── tts/providers/ # elevenlabs, cartesia, openai, playht, lmnt, fish, smallest
│ ├── s2s/providers/ # openai_realtime, gemini_live, nova
│ ├── transport/providers/ # livekit, daily, pipecat
│ └── vad/providers/ # webrtc, silero
├── orchestration/ # Chain, Graph, Router, Scatter, Supervisor, Blackboard
├── workflow/ # DurableExecutor, Activities, State
│ └── providers/ # inmemory, temporal, kafka, nats, dapr, inngest
├── guard/ # Guard interface + providers (guardrailsai, lakera, llmguard, + 2)
├── resilience/ # Retry, CircuitBreaker, Hedge, RateLimit
├── cache/ # Cache interface + providers (inmemory)
├── hitl/ # Manager, Notifier, Approval, Feedback
├── auth/ # Policy, RBAC, ABAC, Composite, Middleware
├── eval/ # Metric, Runner, Dataset + providers (braintrust, deepeval, ragas)
├── state/ # Store interface + providers (inmemory)
├── prompt/ # Template, Manager, Builder + providers (file)
├── protocol/
│ ├── mcp/ # MCP Server, Client, Registry
│ ├── a2a/ # A2A Server, Client, AgentCard
│ ├── openai_agents/ # OpenAI Agents SDK compatibility
│ └── rest/ # REST/SSE API
├── server/ # ServerAdapter + adapters (gin, chi, echo, fiber, huma, grpc, connect)
└── internal/ # testutil (mocks), httpclient, jsonutil, syncutil, openaicompat
```

The foundation layer provides the types and utilities that every other package depends on. It has zero external dependencies beyond the Go standard library and OpenTelemetry, which ensures that these shared types never introduce transitive dependency conflicts.

The core package is the foundation of the framework. It has zero external dependencies beyond stdlib + OTel, and defines the primitive abstractions — streams, runnables, lifecycle management, and error types — that all other packages build upon.

Key types:

| Type | Purpose |
|---|---|
| Event[T] | Generic event with Type, Payload, Err, Meta |
| Stream[T] | Type alias for iter.Seq2[Event[T], error] |
| Runnable | Universal execution interface (Invoke + Stream) |
| Lifecycle | Component lifecycle (Start, Stop, Health) |
| App | Manages lifecycle components with ordered start/shutdown |
| Error | Typed error with Op, Code, Message, Err |
| ErrorCode | Categorizes errors: rate_limit, timeout, auth_error, etc. |
| Option | Functional option interface |

Core interfaces:

```go
// Runnable — universal execution contract
type Runnable interface {
	Invoke(ctx context.Context, input any, opts ...Option) (any, error)
	Stream(ctx context.Context, input any, opts ...Option) iter.Seq2[any, error]
}

// Lifecycle — component management
type Lifecycle interface {
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
	Health() HealthStatus
}
```

Stream utilities: CollectStream, MapStream, FilterStream, MergeStreams, FanOut, BufferedStream with backpressure via FlowController.

Composition: Pipe(a, b) chains runnables sequentially. Parallel(a, b, c) fans out concurrently.

The schema package holds the shared types used by all packages, with zero external dependencies. These types form the common vocabulary of the framework — when an LLM produces a message, a tool returns a result, or an agent emits an event, they all use types from this package. Keeping schema dependency-free ensures these types can be imported by any package without pulling in unwanted transitive dependencies.

| Type | Purpose |
|---|---|
| Message | Interface for conversation messages (System, Human, AI, Tool) |
| ContentPart | Interface for multimodal content (Text, Image, Audio, Video) |
| ToolDefinition | Tool name + description + JSON Schema for LLM binding |
| ToolCall | LLM's request to invoke a tool |
| ToolResult | Result from tool execution |
| Document | Document with content and metadata for RAG |
| StreamChunk | Chunk of streaming LLM response |
| Event | Generic event type |
| Session | Conversation session state |

The config package handles configuration loading and validation.

| Type | Purpose |
|---|---|
| ProviderConfig | Base configuration for all providers (APIKey, Model, BaseURL, Options) |
| Load | Generic config loader (env, file, struct tags) |
| Validate | Config validation |
| Watcher | Interface for hot-reload watching |
| FileWatcher | File-based config watcher |

The o11y package provides observability using OpenTelemetry gen_ai.* semantic conventions.

| Type | Purpose |
|---|---|
| Span | OTel span interface |
| TraceExporter | Interface for trace exporters |
| HealthChecker | Interface for health checks |
| HealthRegistry | Aggregates health checks |
| Logger | Structured logging via slog |

Providers: langsmith, langfuse, phoenix, opik.

The capability layer provides the core AI primitives: LLM inference, tool execution, memory, retrieval, agents, and voice. Each package defines a small interface, provides a registry for provider discovery, and supports middleware and hooks for extensibility. These packages import from the foundation layer, and dependencies among them are strictly acyclic.

The llm package is the LLM abstraction layer, with 23 providers. This is the most heavily used package in the framework — nearly every other capability depends on it.

ChatModel interface (the most important interface in the framework):

```go
type ChatModel interface {
	Generate(ctx context.Context, msgs []schema.Message, opts ...GenerateOption) (*schema.AIMessage, error)
	Stream(ctx context.Context, msgs []schema.Message, opts ...GenerateOption) iter.Seq2[schema.StreamChunk, error]
	BindTools(tools []schema.ToolDefinition) ChatModel
	ModelID() string
}
```

Supporting types:

| Type | Purpose |
|---|---|
| Factory | func(cfg config.ProviderConfig) (ChatModel, error) |
| ModelSelector | Interface for routing across models (RoundRobin, Failover, CostOptimized) |
| ContextManager | Interface for fitting messages within token budgets (Truncate, Sliding, Summarize) |
| Tokenizer | Interface for token counting |
| Middleware | func(ChatModel) ChatModel |
| Hooks | BeforeGenerate, AfterGenerate, OnStream, OnError |

Registry: Register("openai", factory), New("openai", cfg), List().

Providers: openai, anthropic, google, azure, bedrock, ollama, groq, mistral, cohere, deepseek, fireworks, huggingface, litellm, llama, openrouter, perplexity, qwen, sambanova, together, xai, cerebras, bifrost.
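The Register/New/List registry pattern can be sketched in a few lines. This is a self-contained illustration with hypothetical stand-ins for ChatModel and ProviderConfig — the real package's registry is richer, but the factory-map shape is the core idea.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Stand-ins for llm.ChatModel and config.ProviderConfig (illustrative only).
type ProviderConfig struct{ APIKey, Model string }
type ChatModel interface{ ModelID() string }

// Factory builds a ChatModel from provider configuration.
type Factory func(cfg ProviderConfig) (ChatModel, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register installs a named provider factory (called from provider packages).
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// New constructs a model by provider name, failing on unknown names.
func New(name string, cfg ProviderConfig) (ChatModel, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("llm: unknown provider %q", name)
	}
	return f(cfg)
}

// List returns the registered provider names, sorted for stable output.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

type stubModel struct{ id string }

func (s stubModel) ModelID() string { return s.id }

func main() {
	Register("stub", func(cfg ProviderConfig) (ChatModel, error) {
		return stubModel{id: cfg.Model}, nil
	})
	m, _ := New("stub", ProviderConfig{Model: "stub-1"})
	fmt.Println(m.ModelID(), List()) // stub-1 [stub]
}
```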

The tool package provides the tool system. Tools are instances, not factories. This distinction from the provider registries reflects how tools are used: providers are created from configuration at startup, while tools often carry runtime state (database connections, API clients) and may be added or removed dynamically during agent execution.

Tool interface:

```go
type Tool interface {
	Name() string
	Description() string
	InputSchema() map[string]any
	Execute(ctx context.Context, input map[string]any) (*Result, error)
}
```

Key types:

| Type | Purpose |
|---|---|
| Result | Multimodal result: []schema.ContentPart + IsError |
| FuncTool | Wraps Go functions as tools with auto JSON Schema from struct tags |
| Registry | Instance-based: Add, Get, List, Remove (not global factory) |
| Middleware | func(Tool) Tool — WithTimeout, WithRetry |
| Hooks | BeforeExecute, AfterExecute, OnError |
| MCPRegistry | Interface for MCP server discovery |

MCP integration: FromMCP() connects to an MCP server via Streamable HTTP and wraps remote tools as native Tool instances.

The memory package implements a tiered memory system inspired by MemGPT. The tiered design reflects the reality that different types of information have different access patterns: persona context must always be in the prompt (core), recent conversation history benefits from fast keyword search (recall), and long-term knowledge requires vector similarity (archival) or structured graph traversal (graph).

Memory interface:

```go
type Memory interface {
	Save(ctx context.Context, input, output schema.Message) error
	Load(ctx context.Context, query string) ([]schema.Message, error)
	Search(ctx context.Context, query string, k int) ([]schema.Document, error)
	Clear(ctx context.Context) error
}
```

Tiers:

| Tier | Type | Purpose |
|---|---|---|
| Core | Core | Always in context — persona text, user preferences |
| Recall | Recall | Searchable conversation history via MessageStore |
| Archival | Archival | Long-term knowledge via vectorstore + embedder |
| Graph | GraphStore | Structured knowledge relationships |
| Composite | CompositeMemory | Combines all tiers transparently |

Registry: Register("inmemory", factory), New("inmemory", cfg), List().

Stores: inmemory, redis, postgres, mongodb, sqlite, neo4j, memgraph, dragonfly.

The rag package implements the RAG pipeline across 5 subpackages with independent registries. The pipeline is decomposed into discrete stages — loading, splitting, embedding, storing, and retrieving — each with its own interface and provider ecosystem. This decomposition means you can swap any stage independently: use a different embedding provider without changing your vector store, or add a new retrieval strategy without modifying how documents are loaded.

```mermaid
graph LR
    query["Query"] --> retriever["Retriever"]
    retriever --> bm25["BM25<br/>~200 results"]
    retriever --> dense["Dense/Vector<br/>~100 results"]
    bm25 --> fusion["RRF Fusion<br/>k=60"]
    dense --> fusion
    fusion --> rerank["Cross-Encoder<br/>Reranker"]
    rerank --> top10["Top 10<br/>Documents"]

    subgraph ingest [Ingestion Pipeline]
        loader["DocumentLoader"] --> splitter["TextSplitter"]
        splitter --> embedder["Embedder"]
        embedder --> store["VectorStore"]
    end
```

Subpackages:

| Package | Interface | Methods | Providers |
|---|---|---|---|
| embedding/ | Embedder | Embed, EmbedBatch | openai, cohere, mistral, jina, ollama, google, voyage, sentence_transformers, inmemory |
| vectorstore/ | VectorStore | Add, Search, Delete | pgvector, pinecone, qdrant, weaviate, chroma, milvus, elasticsearch, redis, mongodb, vespa, turbopuffer, sqlitevec, inmemory |
| retriever/ | Retriever | Retrieve | vector, hybrid, multiquery, rerank, ensemble, HyDE, CRAG, adaptive |
| loader/ | DocumentLoader | Load | text, json, csv, markdown, firecrawl, github, confluence, notion, gdrive, unstructured, cloudstorage, docling |
| splitter/ | TextSplitter | Split | recursive, markdown, token |

Each subpackage has its own Register(), New(), List().
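The hybrid retrieval path merges BM25 and dense result lists with Reciprocal Rank Fusion. A toy sketch of RRF — each document scores the sum of 1/(k + rank) over every list it appears in, with the conventional damping constant k=60:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges ranked result lists with Reciprocal Rank Fusion:
// score(d) = Σ 1/(k + rank_i(d)) over every list containing d,
// where rank is 1-based. Higher scores rank first.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	bm25 := []string{"doc3", "doc1", "doc7"}
	dense := []string{"doc1", "doc5", "doc3"}
	// doc1 ranks first: it appears high in both lists.
	fmt.Println(rrfFuse(60, bm25, dense))
}
```

The appeal of RRF is that it needs only ranks, not scores, so BM25 and cosine-similarity results can be fused without score normalization.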

The agent package is the agent runtime: a planner-agnostic executor with pluggable reasoning strategies. The separation of planner (decides what to do) from executor (does it) is the key architectural decision: it allows different reasoning approaches to be swapped without changing how tools are called, events are streamed, or handoffs are managed.

Agent interface:

```go
type Agent interface {
	ID() string
	Persona() Persona
	Tools() []tool.Tool
	Children() []Agent
	Invoke(ctx context.Context, input string, opts ...Option) (string, error)
	Stream(ctx context.Context, input string, opts ...Option) iter.Seq2[Event, error]
}
```

Key types:

| Type | Purpose |
|---|---|
| BaseAgent | Embeddable struct with default implementations |
| Persona | Role, Goal, Backstory (RGB framework), Traits |
| Planner | Interface: Plan(state) and Replan(state) |
| PlannerState | Input, Messages, Tools, Observations, Iteration, Metadata |
| Action | What the planner wants: tool, respond, finish, handoff |
| Observation | Result of an executed action |
| Handoff | Agent transfer with InputFilter, OnHandoff, IsEnabled |
| EventBus | Interface for agent-to-agent async messaging |
| Hooks | 14 hook points covering the entire execution lifecycle |
| Middleware | func(Agent) Agent |

Executor loop:

```mermaid
graph TD
    start["Receive Input"] --> onStart["OnStart Hook"]
    onStart --> plan["Planner.Plan(state)"]
    plan --> beforeAct["BeforeAct Hook"]
    beforeAct --> actionType{"Action Type?"}
    actionType -->|tool| execTool["Execute Tool"]
    actionType -->|handoff| handoff["Transfer to Agent"]
    actionType -->|respond| emit["Emit to Stream"]
    actionType -->|finish| done["Return Result"]
    execTool --> afterAct["AfterAct Hook"]
    handoff --> afterAct
    emit --> afterAct
    afterAct --> onIter["OnIteration Hook"]
    onIter --> replan["Planner.Replan(state)"]
    replan --> beforeAct
    done --> onEnd["OnEnd Hook"]
```

Planners (registered via RegisterPlanner()): ReAct, Reflexion, Plan-and-Execute, Structured, Conversational.

Workflow agents (deterministic, no LLM): SequentialAgent, ParallelAgent, LoopAgent.

The voice package provides a frame-based voice pipeline with cascading, S2S, and hybrid modes.

| Interface | Methods | Providers |
|---|---|---|
| FrameProcessor | Process(ctx, in chan, out chan) | All voice components |
| VAD | Detect(ctx, frame) ActivityResult | webrtc, silero |
| STT | Transcribe(ctx, audio) TranscriptEvent | whisper, deepgram, assemblyai, gladia, groq, elevenlabs |
| TTS | Synthesize(ctx, text) audio | elevenlabs, cartesia, openai, playht, lmnt, fish, smallest |
| S2S | NewSession(ctx) Session | openai_realtime, gemini_live, nova |
| AudioTransport | Connect, Send, Receive | livekit, daily, pipecat |

Pipeline modes:

```mermaid
graph TB
    subgraph cascade [Cascading Mode]
        c_transport["Transport"] --> c_vad["VAD"] --> c_stt["STT"] --> c_llm["LLM"] --> c_tts["TTS"] --> c_out["Transport"]
    end
    subgraph s2s [S2S Mode]
        s_transport["Transport"] --> s_s2s["S2S Provider<br/>OpenAI Realtime / Gemini Live"] --> s_out["Transport"]
    end
    subgraph hybrid [Hybrid Mode]
        h_default["S2S (default)"] -.->|"tool overload"| h_fallback["Cascade (fallback)"]
    end
```

Infrastructure packages provide cross-cutting concerns that apply to multiple capability packages: safety guards, resilience patterns, caching, authentication, human-in-the-loop approval, evaluation, shared state, and prompt management. These packages import from both the foundation and capability layers, and they are applied via middleware or hooks rather than being embedded in capability code.

The guard package implements a three-stage safety pipeline. Guards are applied at the agent level, so they protect all LLM interactions and tool executions uniformly.

```go
type Guard interface {
	Validate(ctx context.Context, input GuardInput) (GuardResult, error)
}
```

Providers: guardrailsai, lakera, llmguard, azuresafety, nemo.

The resilience package provides composable resilience patterns. Each wraps operations via middleware.

| Pattern | File | Purpose |
|---|---|---|
| Retry | retry.go | Exponential backoff + jitter |
| CircuitBreaker | circuitbreaker.go | Closed → Open → Half-Open |
| Hedge | hedge.go | Parallel redundant requests |
| RateLimit | ratelimit.go | RPM, TPM, MaxConcurrent |

The workflow package is the durable execution engine.

```go
type DurableExecutor interface {
	Execute(ctx context.Context, workflowFn any, args ...any) (WorkflowHandle, error)
	Signal(ctx context.Context, id string, signal string, data any) error
	Query(ctx context.Context, id string, query string) (any, error)
	Cancel(ctx context.Context, id string) error
}
```

Providers: inmemory (dev), temporal, kafka, nats, dapr, inngest.

The auth package provides authorization with default-deny semantics.

```go
type Policy interface {
	Authorize(ctx context.Context, request AuthRequest) (AuthResult, error)
}
```

Implementations: RBAC, ABAC, Composite. OPA integration.

| Package | Interface | Purpose |
|---|---|---|
| cache/ | Cache | Exact + semantic caching |
| hitl/ | Manager, Notifier | Human-in-the-loop with confidence-based routing |
| eval/ | Metric | Evaluation metrics (faithfulness, relevance, hallucination, toxicity) |
| state/ | Store | Shared agent state with Watch |
| prompt/ | PromptManager | Template management, versioning, cache-optimal ordering |

Protocol packages handle communication with external systems: exposing Beluga agents as network services and consuming remote tools and agents. These packages sit at the top of the dependency hierarchy, importing from capability and infrastructure layers.

The protocol/mcp package provides the MCP (Model Context Protocol) server and client. MCP is the standard protocol for discovering and calling tools across process boundaries.

  • Server: Expose Beluga tools as MCP resources via Streamable HTTP
  • Client: Connect to external MCP servers, wrap remote tools as native tool.Tool
  • Registry: Discover and search MCP servers

Transport: Streamable HTTP (2025-06-18 spec). OAuth authorization.

The protocol/a2a package implements the A2A (Agent-to-Agent) protocol.

  • Server: Expose Beluga agents as A2A remote agents
  • Client: Call remote A2A agents as sub-agents
  • AgentCard: JSON metadata at /.well-known/agent.json

Protocol: Protobuf-first (a2a.proto). JSON-RPC + gRPC bindings. Task lifecycle: submitted → working → completed/failed/canceled.

The protocol/rest package provides a REST/SSE API for exposing agents.

The protocol/openai_agents package is an OpenAI Agents SDK compatibility layer.

The server package provides HTTP framework adapters implementing the ServerAdapter interface.

```go
type ServerAdapter interface {
	Mount(pattern string, handler http.Handler)
	Start(ctx context.Context, addr string) error
	Stop(ctx context.Context) error
}
```

Adapters: gin, chi, echo, fiber, huma, grpc, connect.

Internal packages provide shared infrastructure that is not part of the public API. Placing these in internal/ ensures they cannot be imported by external code, giving the framework freedom to change their APIs without breaking compatibility.

internal/testutil provides mock implementations for testing. Every public interface in the framework has a corresponding mock here, ensuring consistent test patterns across all packages:

  • MockChatModel — configurable LLM responses
  • MockEmbedder — fixed vector outputs
  • MockTool — configurable tool execution
  • MockMemory — in-memory message storage
  • MockVectorStore — in-memory vector operations
  • MockWorkflowStore — workflow state storage

internal/httpclient is a generic HTTP client with retry logic.

internal/jsonutil handles JSON Schema generation from Go struct tags (json, description, required, default, enum).

internal/syncutil provides sync utilities (pools, concurrent helpers).

internal/openaicompat is an OpenAI API compatibility layer for providers that use OpenAI-compatible APIs. Many LLM providers (Groq, Together, Fireworks, OpenRouter, and others) expose OpenAI-compatible endpoints. Rather than duplicating HTTP request/response handling across 12+ providers, this shared layer handles the common protocol, and each provider package adds only its registration and provider-specific error mapping.

The following sequence diagrams show how packages collaborate at runtime. These interactions are the practical manifestation of the layered architecture: foundation types flow through the entire stack, capability packages call each other through interfaces, and the agent runtime orchestrates everything.

```mermaid
sequenceDiagram
    participant Agent as agent.BaseAgent
    participant Executor as agent.Executor
    participant Planner as agent.Planner
    participant LLM as llm.ChatModel
    participant TR as tool.Registry
    participant Tool as tool.Tool
    participant Memory as memory.Memory

    Agent->>Memory: Load(ctx, input)
    Agent->>Executor: Run(ctx, input, planner, tools)
    Executor->>Planner: Plan(ctx, PlannerState)
    Planner->>LLM: Generate(ctx, msgs + tool_definitions)
    LLM-->>Planner: AIMessage{ToolCalls}
    Planner-->>Executor: []Action{type: tool}
    Executor->>TR: Get(toolName)
    TR-->>Executor: Tool
    Executor->>Tool: Execute(ctx, input)
    Tool-->>Executor: Result
    Executor->>Planner: Replan(ctx, state + observations)
    Planner->>LLM: Generate(ctx, msgs + tool_results)
    LLM-->>Planner: AIMessage{finish}
    Planner-->>Executor: []Action{type: finish}
    Executor->>Memory: Save(ctx, input, output)
```

```mermaid
sequenceDiagram
    participant Agent
    participant Retriever as rag.Retriever
    participant VS as rag.VectorStore
    participant Embedder as rag.Embedder
    participant LLM as llm.ChatModel

    Agent->>Retriever: Retrieve(ctx, query)
    Retriever->>Embedder: Embed(ctx, query)
    Embedder-->>Retriever: []float32
    Retriever->>VS: Search(ctx, vector, k)
    VS-->>Retriever: []Document
    Retriever-->>Agent: []Document
    Agent->>LLM: Generate(ctx, system + docs + query)
    LLM-->>Agent: Answer grounded in documents
```

| Category | Count |
|---|---|
| Top-level packages | 23 |
| Interfaces | 50+ |
| Registries | 19 |
| Total providers | 100+ |
| LLM providers | 23 |
| Vector store providers | 13 |
| Embedding providers | 9 |
| Voice providers (STT+TTS+S2S+Transport+VAD) | 21 |
| Memory stores | 8 |
| Workflow providers | 6 |
| Packages with middleware | 6 |
| Packages with hooks | 11 |