RAG Pipeline
Hybrid retrieval combining dense vectors, BM25 sparse search, and graph traversal with Reciprocal Rank Fusion. 12+ vector stores, 8+ embedding providers, and advanced strategies like CRAG, Adaptive RAG, HyDE, and GraphRAG.
Overview
Retrieval-Augmented Generation (RAG) is how agents ground their responses in real data instead of relying solely on training knowledge. Beluga AI's RAG pipeline is built around a hybrid search default that combines dense vector similarity, BM25 sparse keyword matching, and optional graph traversal — fused with Reciprocal Rank Fusion (k=60). This three-signal approach consistently outperforms any single retrieval method alone.
The pipeline is modular at every stage. Choose from 8+ embedding providers (OpenAI, Google, Cohere, Voyage, Jina, and more), 12+ vector stores (pgvector, Qdrant, Pinecone, Milvus, and others), and 8+ document loaders for ingesting content from web pages, PDFs, APIs, and cloud storage. Each component implements a clean interface and is swappable via the registry pattern.
Beyond basic retrieval, Beluga AI includes 5 advanced strategies for production RAG systems: CRAG for relevance-aware fallback, Adaptive RAG for query-complexity routing, HyDE for zero-shot retrieval, SEAL-RAG for self-aligned generation, and GraphRAG for knowledge-graph-enhanced answers. These strategies compose with the base pipeline, letting you start simple and add sophistication as your requirements evolve.
Capabilities
Hybrid Search Default
Every retrieval query runs through a three-stage pipeline by default. First, BM25 sparse search returns approximately 200 keyword-matched candidates. Second, dense vector search narrows to the top 100 by semantic similarity. Finally, cross-encoder reranking selects the top 10 most relevant chunks. Results from sparse and dense stages are combined with Reciprocal Rank Fusion (k=60).
retriever := rag.NewHybridRetriever(
    rag.WithSparse(bm25Index),            // BM25 keyword matching
    rag.WithDense(vectorStore, embedder), // Dense vector similarity
    rag.WithReranker(crossEncoder),       // Cross-encoder precision
    rag.WithRRF(60),                      // Reciprocal Rank Fusion
    rag.WithTopK(10),                     // Final result count
)
Embedding Providers
Eight embedding providers covering proprietary and open-source models. Each implements the Embedder interface with batch embedding support and automatic dimension handling.
- OpenAI — text-embedding-3-small/large, ada-002
- Google — text-embedding-004, Gecko
- Ollama — Local embedding models (nomic-embed, mxbai)
- Cohere — embed-v3, multilingual
- Voyage — voyage-3, code-optimized embeddings
- Jina — jina-embeddings-v3, multilingual and cross-lingual
- Mistral — mistral-embed
- Sentence Transformers — Local ONNX-based inference
embedder, _ := embedding.New("openai", embedding.Config{
    Model:      "text-embedding-3-large",
    Dimensions: 1536,
})
vectors, err := embedder.EmbedBatch(ctx, documents)
Vector Store Providers
Twelve vector store backends ranging from lightweight embedded options to distributed cloud-scale systems. All implement the same VectorStore interface with support for metadata filtering, namespace isolation, and batch operations.
- pgvector — PostgreSQL extension, HNSW/IVFFlat indexes
- Qdrant — Purpose-built, advanced filtering, hybrid search
- Pinecone — Managed cloud, serverless option
- ChromaDB — Developer-friendly, embedded or client-server
- Weaviate — Graph + vector, hybrid BM25
- Milvus — Distributed, billion-scale
- Turbopuffer — Serverless, cost-optimized
- Redis — In-memory speed, RediSearch integration
- Elasticsearch — Full-text + vector, existing infrastructure
- SQLite-vec — Embedded, zero-dependency local
- MongoDB — Atlas Vector Search, document store integration
- Vespa — Hybrid serving, real-time indexing
store, _ := vectorstore.New("pgvector", vectorstore.Config{
    ConnectionString: "postgres://localhost/beluga",
    Collection:       "documents",
    Dimensions:       1536,
})
Advanced Retrieval Strategies
Five strategies for production RAG systems that go beyond basic retrieve-and-generate:
- CRAG (Corrective RAG) — Evaluates retrieved document relevance; falls back to web search when confidence is below threshold.
- Adaptive RAG — Routes by query complexity: no retrieval for simple factual questions, single-step for straightforward lookups, multi-step for complex reasoning chains.
- HyDE (Hypothetical Document Embeddings) — Generates a hypothetical answer first, then uses its embedding for retrieval. Enables zero-shot retrieval without training data.
- SEAL-RAG — Self-Aligned RAG that iteratively refines retrieval and generation.
- GraphRAG — Builds a knowledge graph with community summaries (Microsoft approach) for complex multi-hop questions.
retriever := rag.NewAdaptiveRetriever(
    rag.WithSimpleHandler(directLLM),      // No retrieval needed
    rag.WithSingleStep(hybridRetriever),   // Standard RAG
    rag.WithMultiStep(iterativeRetriever), // Multi-hop reasoning
    rag.WithComplexityClassifier(classifier),
)
Contextual Retrieval Ingestion
During document ingestion, each chunk is enriched with document-level context before embedding. An LLM prepends a brief summary describing how the chunk fits within the larger document, significantly improving retrieval accuracy for chunks that would otherwise lack sufficient context on their own.
pipeline := rag.NewIngestionPipeline(
    rag.WithLoader(loader),
    rag.WithSplitter(splitter),
    rag.WithContextualRetrieval(model), // Prepend doc-level context
    rag.WithEmbedder(embedder),
    rag.WithStore(vectorStore),
)
Document Loaders
Eight document loaders for ingesting content from diverse sources. Each returns a stream of Document objects with metadata preserved for downstream filtering.
- Firecrawl — Web scraping with JavaScript rendering
- Unstructured.io — PDF, DOCX, PPTX, HTML parsing
- Docling — Advanced document understanding
- Confluence — Atlassian wiki pages and spaces
- Notion — Notion pages and databases
- GitHub — Repository files and README content
- Google Drive — Docs, Sheets, and file content
- S3/GCS — Cloud object storage files
loader, _ := loader.New("firecrawl", loader.Config{
    APIKey: os.Getenv("FIRECRAWL_API_KEY"),
})
docs, err := loader.Load(ctx, "https://example.com/docs")
Text Splitters
Three splitting strategies to divide documents into chunks optimized for embedding and retrieval. Recursive character splits by hierarchy (paragraph, sentence, word) with configurable overlap. Semantic splits at topic boundaries detected by embedding similarity. Token-based splits by exact token count for precise context budget control.
splitter := splitter.NewRecursive(
    splitter.WithChunkSize(512),
    splitter.WithChunkOverlap(50),
    splitter.WithSeparators([]string{"\n\n", "\n", ". ", " "}),
)
chunks := splitter.Split(documents)
Cross-Encoder Reranking
The final stage of the retrieval pipeline uses cross-encoder models for maximum precision. Unlike bi-encoder embeddings that encode query and document independently, cross-encoders process the query-document pair together, capturing fine-grained relevance signals that dramatically improve top-k accuracy.
reranker, _ := reranker.New("cross-encoder", reranker.Config{
    Model: "cross-encoder/ms-marco-MiniLM-L-12-v2",
    TopK:  10,
})
reranked, err := reranker.Rerank(ctx, query, candidates)
Architecture
Providers & Implementations
Embedding Providers
| Provider | Priority | Key Differentiator |
|---|---|---|
| OpenAI | Core | text-embedding-3-small/large, industry standard |
| Google | Core | text-embedding-004, Gecko, multimodal |
| Ollama | Core | Local inference, nomic-embed, mxbai-embed |
| Cohere | Extended | embed-v3, multilingual, search-optimized |
| Voyage | Extended | voyage-3, code-optimized, high-quality retrieval |
| Jina | Extended | jina-embeddings-v3, multilingual, cross-lingual |
| Mistral | Extended | mistral-embed, EU-hosted |
| Sentence Transformers | Community | Local ONNX inference, no API dependency |
Vector Stores
| Provider | Priority | Key Differentiator |
|---|---|---|
| pgvector | Core | PostgreSQL extension, HNSW/IVFFlat, use existing Postgres |
| Qdrant | Core | Purpose-built, advanced filtering, hybrid search native |
| Pinecone | Core | Managed cloud, serverless option, zero ops |
| ChromaDB | Extended | Developer-friendly, embedded or client-server mode |
| Weaviate | Extended | Graph + vector hybrid, built-in BM25 |
| Milvus | Extended | Distributed, billion-scale, GPU-accelerated |
| Turbopuffer | Extended | Serverless, cost-optimized storage |
| Redis | Extended | In-memory speed, RediSearch integration |
| Elasticsearch | Extended | Full-text + vector, leverage existing infrastructure |
| SQLite-vec | Community | Embedded, zero-dependency, local development |
| MongoDB | Community | Atlas Vector Search, document store integration |
| Vespa | Community | Hybrid serving engine, real-time indexing |
Document Loaders
| Loader | Priority | Key Differentiator |
|---|---|---|
| Firecrawl | Core | Web scraping with JavaScript rendering and crawling |
| Unstructured.io | Core | PDF, DOCX, PPTX, HTML, images — multi-format parsing |
| Docling | Extended | Advanced document understanding and layout analysis |
| Confluence | Extended | Atlassian wiki pages, spaces, and attachments |
| Notion | Extended | Pages, databases, and rich content blocks |
| GitHub | Extended | Repository files, READMEs, issues, and PRs |
| Google Drive | Community | Docs, Sheets, Slides, and file storage |
| S3/GCS | Community | Cloud object storage with prefix filtering |
Full Example
A complete RAG pipeline: load documents, split, embed, store, retrieve, and stream an answer:
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/lookatitude/beluga-ai/llm"
    "github.com/lookatitude/beluga-ai/rag/embedding"
    "github.com/lookatitude/beluga-ai/rag/loader"
    "github.com/lookatitude/beluga-ai/rag/retriever"
    "github.com/lookatitude/beluga-ai/rag/splitter"
    "github.com/lookatitude/beluga-ai/rag/vectorstore"
    "github.com/lookatitude/beluga-ai/schema"
)

func main() {
    ctx := context.Background()

    // 1. Load documents from a website
    webLoader, _ := loader.New("firecrawl", loader.Config{
        APIKey: os.Getenv("FIRECRAWL_API_KEY"),
    })
    docs, _ := webLoader.Load(ctx, "https://docs.example.com")

    // 2. Split into chunks with overlap
    chunks := splitter.NewRecursive(
        splitter.WithChunkSize(512),
        splitter.WithChunkOverlap(50),
    ).Split(docs)

    // 3. Embed and store in pgvector
    embedder, _ := embedding.New("openai", embedding.Config{
        Model: "text-embedding-3-large",
    })
    store, _ := vectorstore.New("pgvector", vectorstore.Config{
        ConnectionString: os.Getenv("DATABASE_URL"),
        Collection:       "docs",
    })
    store.AddDocuments(ctx, chunks, embedder)

    // 4. Build a hybrid retriever
    ret := retriever.NewHybrid(
        retriever.WithDense(store, embedder),
        retriever.WithReranker(retriever.CrossEncoder("ms-marco-MiniLM")),
        retriever.WithTopK(5),
    )

    // 5. Retrieve and generate a streaming answer
    model, _ := llm.New("openai", llm.ProviderConfig{Model: "gpt-4o"})
    results, _ := ret.Retrieve(ctx, "How do I configure authentication?")

    // Build context from retrieved chunks
    contextText := ""
    for _, doc := range results {
        contextText += doc.Content + "\n---\n"
    }

    for event, err := range model.Stream(ctx, []schema.Message{
        {Role: "system", Content: "Answer using the provided context:\n" + contextText},
        {Role: "user", Content: "How do I configure authentication?"},
    }) {
        if err != nil {
            break
        }
        fmt.Print(event.Text())
    }
}