The Go Framework for
Production AI Agents
Build, orchestrate, and deploy agentic AI systems with streaming-first design,
22+ LLM providers, and enterprise-grade infrastructure.
One go get away.
model, _ := llm.New("openai", llm.ProviderConfig{Model: "gpt-4o"})
agent := agent.New("researcher",
agent.WithLLM(model),
agent.WithTools(webSearch, calculator),
agent.WithMemory(memory.NewSemantic(embedder, store)),
)
for event, err := range agent.Stream(ctx, "Analyze Q4 earnings") {
fmt.Print(event.Text())
}

AI agent frameworks were built for Python. Your production stack runs Go.
Go teams building AI products are forced to wrap Python services, use incomplete Go libraries, or build from scratch. The Go AI ecosystem is fragmented — no single framework covers the full stack from LLM abstraction to production deployment.
One framework. Every building block. Pure Go.
Beluga AI provides streaming LLM abstraction, agent runtimes with pluggable reasoning, RAG pipelines, voice processing, guardrails, durable workflows, and protocol interoperability (MCP + A2A) — all idiomatic Go, all composable, all streaming-first.
Core capabilities
Everything you need to build, deploy, and operate agentic AI systems.
Agent Runtime & Reasoning
Build agents with pluggable reasoning — ReAct, Tree-of-Thought, LATS, Reflexion, and more. Handoffs-as-tools enable multi-agent collaboration with zero boilerplate.
LLM Abstraction & Routing
Unified ChatModel interface across 22+ providers with intelligent routing, structured output, context window management, and provider-aware rate limiting.
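The routing idea is easy to picture in plain Go. A minimal sketch — stdlib only, with a hypothetical `ChatModel` interface and mock providers, not Beluga's actual API — tries providers in priority order and falls through when one is rate limited:

```go
package main

import (
	"errors"
	"fmt"
)

// ChatModel is a minimal stand-in for a unified provider interface.
type ChatModel interface {
	Name() string
	Generate(prompt string) (string, error)
}

// mockModel simulates a provider that either answers or fails.
type mockModel struct {
	name string
	fail bool
}

func (m mockModel) Name() string { return m.name }
func (m mockModel) Generate(prompt string) (string, error) {
	if m.fail {
		return "", errors.New(m.name + ": rate limited")
	}
	return m.name + " says: pong", nil
}

// route tries each provider in priority order and falls through on error.
func route(models []ChatModel, prompt string) (string, error) {
	var lastErr error
	for _, m := range models {
		out, err := m.Generate(prompt)
		if err == nil {
			return out, nil
		}
		lastErr = err
	}
	return "", lastErr
}

func main() {
	models := []ChatModel{
		mockModel{name: "openai", fail: true}, // simulate a rate-limited primary
		mockModel{name: "anthropic"},          // healthy fallback
	}
	out, err := route(models, "ping")
	fmt.Println(out, err) // anthropic says: pong <nil>
}
```

A production router would also weigh cost, latency, and context-window fit per provider; the fall-through loop above is just the skeleton.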
RAG Pipeline
Hybrid retrieval combining dense vectors, BM25, and graph traversal with RRF. Advanced strategies: CRAG, Adaptive RAG, HyDE, GraphRAG.
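Reciprocal Rank Fusion itself is simple: each ranked list contributes 1/(k + rank) to a document's score, conventionally with k = 60. A stdlib-only sketch, independent of any Beluga API:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges ranked result lists with Reciprocal Rank Fusion:
// score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, doc := range list {
			scores[doc] += 1.0 / (k + float64(rank+1))
		}
	}
	docs := make([]string, 0, len(scores))
	for doc := range scores {
		docs = append(docs, doc)
	}
	// Highest fused score first.
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}

func main() {
	dense := []string{"docA", "docB", "docC"} // dense-vector ranking
	bm25 := []string{"docB", "docD", "docA"}  // keyword (BM25) ranking
	fmt.Println(rrfFuse(60, dense, bm25))     // → [docB docA docD docC]
}
```

Because RRF only uses ranks, not raw scores, it fuses vector similarity, BM25, and graph-traversal results without any score normalization — which is why it pairs well with hybrid retrieval.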
Voice Pipeline
Frame-based STT to LLM to TTS processing with sub-800ms latency. Speech-to-speech modes, Silero VAD, semantic turn detection, WebRTC/LiveKit transports.
Guardrails & Safety
Three-stage guard pipeline with prompt injection detection, PII filtering, and capability-based sandboxing with default-deny policies.
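The shape of such a pipeline can be sketched in plain Go. The guard and allowlist below are toy illustrations (stdlib only, not Beluga's API) — real detectors are far more involved — but they show the two key ideas: guards that can veto content at each stage, and default-deny tool policies:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Guard inspects content at one stage of the pipeline and may reject it.
type Guard func(content string) error

// runGuards applies each guard in order; the first error blocks the request.
func runGuards(guards []Guard, content string) error {
	for _, g := range guards {
		if err := g(content); err != nil {
			return err
		}
	}
	return nil
}

// piiGuard is a toy PII filter: block anything that mentions an SSN.
func piiGuard(content string) error {
	if strings.Contains(content, "SSN") {
		return errors.New("blocked: possible PII")
	}
	return nil
}

// toolAllowlist implements a default-deny policy for tool calls.
func toolAllowlist(allowed ...string) func(tool string) error {
	set := map[string]bool{}
	for _, t := range allowed {
		set[t] = true
	}
	return func(tool string) error {
		if !set[tool] { // default-deny: anything not explicitly allowed is blocked
			return fmt.Errorf("tool %q denied", tool)
		}
		return nil
	}
}

func main() {
	inputGuards := []Guard{piiGuard}
	fmt.Println(runGuards(inputGuards, "My SSN is 123-45-6789")) // blocked: possible PII
	check := toolAllowlist("web_search")
	fmt.Println(check("shell_exec")) // denied: not on the allowlist
}
```

In a three-stage design the same `Guard` shape runs over the user input, the model output, and each tool invocation, so one veto anywhere stops the request.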
Protocols & Interop
First-class MCP server/client for tool ecosystems and A2A for agent-to-agent communication. REST, SSE, gRPC, and WebSocket transports.
Layered architecture. Clean separation of concerns.
Dependencies flow downward. Every layer is independently extensible.
Foundation Core primitives and shared types
Core primitives, shared schema types, configuration loading with hot-reload, and OpenTelemetry observability using GenAI semantic conventions. No external dependencies beyond the standard library and OTel.
Capability LLM, Agents, RAG, Voice, and more
LLM abstraction with router and structured output, agent runtime with planners and handoffs, tool system, MemGPT 3-tier memory, RAG pipeline with hybrid retrieval, and frame-based voice processing.
Infrastructure Safety, resilience, and operations
Three-stage guard pipeline for input/output/tool safety, circuit breaker and retry resilience, exact and semantic caching, RBAC/ABAC auth, human-in-the-loop approval, and durable workflow execution.
Protocol Interoperability and transport
MCP server and client for tool interoperability, A2A for agent-to-agent communication, REST/SSE endpoints, gRPC services, and HTTP framework adapters for Gin, Fiber, Echo, and Chi.
See it in action
From simple agents to multi-agent systems and voice pipelines, all in a few lines of Go.

Streaming agent
model, _ := llm.New("openai", llm.ProviderConfig{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "gpt-4o",
})
agent := agent.New("assistant",
agent.WithLLM(model),
agent.WithTools(webSearch, calculator),
)
for event, err := range agent.Stream(ctx, "Research GPU trends") {
fmt.Print(event.Text())
}

RAG pipeline

embedder, _ := embedding.New("openai", embedding.Config{Model: "text-embedding-3-small"})
store, _ := vectorstore.New("pgvector", vectorstore.Config{DSN: pgDSN})
retriever := retriever.NewHybrid(store, retriever.WithBM25(), retriever.WithRRF(60))
docs := loader.LoadDir("./knowledge-base/")
splitter := splitter.NewRecursive(splitter.WithChunkSize(512))
chunks := splitter.Split(docs)
store.Add(ctx, embedder.EmbedDocuments(ctx, chunks))
agent := agent.New("researcher", agent.WithLLM(model), agent.WithRetriever(retriever))
for event, err := range agent.Stream(ctx, "What are our Q4 results?") {
fmt.Print(event.Text())
}

Multi-agent handoffs

billing := agent.New("billing-specialist",
agent.WithLLM(model),
agent.WithTools(lookupInvoice, processRefund),
)
shipping := agent.New("shipping-specialist",
agent.WithLLM(model),
agent.WithTools(trackPackage, updateAddress),
)
triage := agent.New("triage",
agent.WithLLM(model),
agent.WithHandoffs(billing, shipping),
agent.WithInstructions("Route customer requests to the right specialist."),
)
for event, err := range triage.Stream(ctx, "Where is my order #12345?") {
fmt.Print(event.Text())
}

Voice pipeline

stt, _ := stt.New("deepgram", stt.Config{Model: "nova-3"})
tts, _ := tts.New("elevenlabs", tts.Config{Voice: "aria"})
vad := vad.NewSilero(vad.WithThreshold(0.5))
pipeline := voice.NewPipeline(
voice.WithSTT(stt),
voice.WithLLM(model),
voice.WithTTS(tts),
voice.WithVAD(vad),
)
transport := transport.NewWebSocket(":8080")
pipeline.Start(ctx, transport)

Extensive provider ecosystem
108 integrations across 12 categories. All pluggable via the registry pattern.
Go is the production language for AI infrastructure
15,000+ RPS
with p95 < 30ms. Compiled to machine code. No GIL.
Single binary
No runtime dependencies to manage. Minimal container images.
Born for K8s
Kubernetes, Docker, and Terraform are all written in Go. Your AI agents should be too.
Start building AI agents in 5 minutes
go get github.com/lookatitude/beluga-ai
export OPENAI_API_KEY=sk-...
go run main.go