PRODUCT

What Beluga can do.

A complete agent stack organized around three concerns. Build is where agents come from. Know is what agents remember and retrieve. Ship is how agents behave in production.

BUILD

Agents that reason, act, and recover.

The runtime runs a Plan → Act → Observe → Replan loop on every turn. Eight planning strategies share one Planner interface — swap them with a config change. LLM calls route across 22 providers with a unified ChatModel. Tools are any Go function wrapped with a schema. Handoffs between agents are auto-generated tools.

ReAct planner with two tools, streamed to the caller. Swap WithPlanner("react") for "lats", "tot", or "moa" without touching the rest.
examples/research-agent.go
import (
    "github.com/lookatitude/beluga-ai/agent"
    "github.com/lookatitude/beluga-ai/llm"
    _ "github.com/lookatitude/beluga-ai/llm/providers/anthropic"
    "github.com/lookatitude/beluga-ai/tool"
)

model, _ := llm.New("anthropic", llm.Config{Model: "claude-sonnet-4-6"})

researchAgent, _ := agent.New(ctx,
    agent.WithLLM(model),
    agent.WithPersona("senior researcher, cites sources"),
    agent.WithPlanner("react"),
    agent.WithTools(tool.Must(tool.HTTPFetch()), tool.Must(tool.MarkdownParser())),
)

stream, _ := researchAgent.Stream(ctx, "summarise streaming-first patterns in Beluga")
for ev, err := range stream.Range {
    if err != nil { break }
    ev.Render()
}
The eight reasoning strategies
KNOW

Memory that persists. Retrieval that finds the right thing.

Three-tier memory — working, recall, archival — with graph-store support. RAG uses hybrid retrieval: BM25, dense vector, and graph traversal, fused with Reciprocal Rank Fusion. Strategies include CRAG, Adaptive RAG, HyDE, and GraphRAG. Thirteen vector-store backends, nine embedding providers, eight memory stores.

BM25 + dense vector retrieval fused with RRF — the recommended default for most knowledge bases.
examples/hybrid-retrieval.go
import (
    "github.com/lookatitude/beluga-ai/rag/retriever"
    "github.com/lookatitude/beluga-ai/rag/vectorstore"
    _ "github.com/lookatitude/beluga-ai/rag/vectorstore/providers/pgvector"
    _ "github.com/lookatitude/beluga-ai/rag/embedding/providers/openai"
)

store, _ := vectorstore.New("pgvector", vectorstore.Config{
    DSN:       "postgres://beluga@db:5432/kb",
    Dimension: 1536,
})

// Hybrid retrieval: BM25 + dense vector, fused with RRF.
hybrid := retriever.Hybrid(
    retriever.BM25(store, retriever.WithK(40)),
    retriever.Dense(store, retriever.WithK(40)),
    retriever.RRF(60),
)

docs, _ := hybrid.Retrieve(ctx, "how does crash-durable workflow replay work?")
SHIP

Production defaults, not production afterthoughts.

The guard pipeline runs three stages — Input, Output, Tool — around every LLM interaction. Circuit breakers, rate limits, and retry are middleware on the same interface as your LLM call. OTel GenAI spans emit from 17 packages at every boundary. Durable workflows replay from an event log. Cost tracking attributes every token to a team.

Five lines of middleware around base is the difference between a demo and a system you page on.
main.go · wrapping a model for production
// Resilience + observability + safety compose on the same interface.
// Read outside-in: guardrails wrap tracing wraps rate-limit wraps retry.
safeModel := llm.ApplyMiddleware(base,
    llm.WithGuardrails(pipeline),     // input + output + tool guards
    llm.WithTracing(),                // gen_ai.* OTel spans at every boundary
    llm.WithRateLimit(60, 150000),    // 60 req/min, 150k tok/min
    llm.WithRetry(3),                 // respects core.IsRetryable()
    llm.WithCostTracking(costCenter),
)
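"Replay from an event log" is the event-sourcing pattern: every completed step appends its result to a log, and after a crash the workflow re-executes from the top, returning recorded results instead of repeating side effects. A minimal, library-free sketch of the idea — Log and Step are hypothetical names, not Beluga's workflow API:

```go
package main

import "fmt"

// Log records each completed step's result so a crashed workflow
// can be replayed without repeating side effects.
type Log struct {
    entries []string
    cursor  int
}

// Step returns the recorded result when replaying; otherwise it
// executes fn once and appends the result to the log.
func (l *Log) Step(fn func() string) string {
    if l.cursor < len(l.entries) { // replaying: skip the side effect
        out := l.entries[l.cursor]
        l.cursor++
        return out
    }
    out := fn()
    l.entries = append(l.entries, out)
    l.cursor++
    return out
}

func runWorkflow(l *Log) string {
    a := l.Step(func() string { return "charged card" })
    b := l.Step(func() string { return "sent receipt" })
    return a + ", " + b
}

func main() {
    log := &Log{}
    runWorkflow(log) // first run: both steps execute and are logged

    log.cursor = 0 // simulate a restart: replay against the same log
    fmt.Println(runWorkflow(log)) // same output, zero repeated side effects
}
```

The durable-workflow engine layers determinism checks and persistence on top, but this is the core invariant: a step either runs exactly once or is answered from the log.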
VOICE

Frame-based voice, built in.

STT → LLM → TTS as a typed pipeline. Six STT providers, seven TTS providers, three speech-to-speech providers. LiveKit, Daily, and Pipecat transports. Silero and WebRTC VAD. No other Go agent framework includes this.

examples/voice-agent.go
import (
    "github.com/lookatitude/beluga-ai/voice"
    _ "github.com/lookatitude/beluga-ai/voice/stt/providers/deepgram"
    _ "github.com/lookatitude/beluga-ai/voice/tts/providers/cartesia"
    _ "github.com/lookatitude/beluga-ai/voice/vad/providers/silero"
    _ "github.com/lookatitude/beluga-ai/voice/transport/providers/livekit"
)

pipeline, _ := voice.NewPipeline(voice.Config{
    Transport: "livekit",
    VAD:       "silero",
    STT:       "deepgram",
    LLM:       model,
    TTS:       "cartesia",
})

// Frame-based — pipeline.Run handles barge-in, turn detection,
// and sub-800ms glass-to-glass latency.
pipeline.Run(ctx)
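Barge-in, concretely, is a small piece of control flow: if the VAD flags speech on an incoming frame while TTS audio is still playing, cancel the playback and hand the turn back to the user. A toy sketch of that rule — Playback and handleFrame are illustrative names, not the voice package's types:

```go
package main

import "fmt"

// Playback represents an in-flight TTS utterance that can be cut short.
type Playback struct{ cancelled bool }

func (p *Playback) Cancel() { p.cancelled = true }

// handleFrame is the heart of barge-in: any speech frame arriving
// while the agent is talking cancels the current playback, so the
// user is never talked over.
func handleFrame(speech bool, current *Playback) {
    if speech && current != nil && !current.cancelled {
        current.Cancel()
    }
}

func main() {
    p := &Playback{}
    for _, speech := range []bool{false, false, true} { // VAD output per frame
        handleFrame(speech, p)
    }
    fmt.Println("playback cancelled:", p.cancelled) // true after the speech frame
}
```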
PROTOCOLS

Agents meet the network.

Expose the same Agent over MCP for tools, A2A for inter-agent discovery, REST/SSE for streaming clients, gRPC for low-latency internal calls, or WebSocket for bidirectional voice. Pick one or ship them all — the Runner is the deployment boundary, not the agent.

examples/runner.go
import (
    "github.com/lookatitude/beluga-ai/protocol/mcp"
    "github.com/lookatitude/beluga-ai/runtime"
)

runner, _ := runtime.NewRunner(runtime.Config{
    Agent: myAgent,
    Expose: runtime.ExposeAll{
        MCP:   mcp.Server(":8080"),    // Model Context Protocol
        A2A:   true,                   // /.well-known/agent.json
        REST:  ":3000",                // REST + SSE streaming
        GRPC:  ":50051",               // protobuf contracts
    },
})
runner.Serve(ctx)

Everything you just read is open source.

MIT licensed. 110 providers. Zero paid tiers.