
RAG Pipeline

Hybrid retrieval combining dense vectors, BM25 sparse search, and graph traversal with Reciprocal Rank Fusion. 12+ vector stores, 8+ embedding providers, and advanced strategies like CRAG, Adaptive RAG, HyDE, and GraphRAG.

12+ Vector Stores · 8+ Embeddings · 5 Advanced Strategies · Hybrid Search Default

Overview

Retrieval-Augmented Generation (RAG) is how agents ground their responses in real data instead of relying solely on training knowledge. Beluga AI's RAG pipeline is built around a hybrid search default that combines dense vector similarity, BM25 sparse keyword matching, and optional graph traversal — fused with Reciprocal Rank Fusion (k=60). This three-signal approach consistently outperforms any single retrieval method alone.

The pipeline is modular at every stage. Choose from 8+ embedding providers (OpenAI, Google, Cohere, Voyage, Jina, and more), 12+ vector stores (pgvector, Qdrant, Pinecone, Milvus, and others), and 8+ document loaders for ingesting content from web pages, PDFs, APIs, and cloud storage. Each component implements a clean interface and is swappable via the registry pattern.

Beyond basic retrieval, Beluga AI includes 5 advanced strategies for production RAG systems: CRAG for relevance-aware fallback, Adaptive RAG for query-complexity routing, HyDE for zero-shot retrieval, SEAL-RAG for self-aligned generation, and GraphRAG for knowledge-graph-enhanced answers. These strategies compose with the base pipeline, letting you start simple and add sophistication as your requirements evolve.

Capabilities

Hybrid Search Default

Every retrieval query runs through a three-stage pipeline by default. BM25 sparse search returns approximately 200 keyword-matched candidates, while dense vector search independently returns the top 100 by semantic similarity. The two ranked lists are combined with Reciprocal Rank Fusion (k=60), and a cross-encoder reranker then selects the 10 most relevant chunks from the fused set.

retriever := rag.NewHybridRetriever(
    rag.WithSparse(bm25Index),          // BM25 keyword matching
    rag.WithDense(vectorStore, embedder), // Dense vector similarity
    rag.WithReranker(crossEncoder),      // Cross-encoder precision
    rag.WithRRF(60),                     // Reciprocal Rank Fusion
    rag.WithTopK(10),                    // Final result count
)
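The fusion step itself is small enough to show in full. Below is a minimal, self-contained sketch of Reciprocal Rank Fusion in plain Go (independent of any Beluga API; the function name and signature are illustrative): each document's fused score is the sum of 1/(k + rank) over every list it appears in, and k=60 damps the influence of any single top-heavy list.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges any number of ranked result lists into one ranking.
// A document appearing near the top of several lists accumulates a
// higher fused score than one that tops only a single list.
func rrfFuse(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for rank, docID := range ranking {
			scores[docID] += 1.0 / (k + float64(rank+1)) // ranks are 1-based
		}
	}
	fused := make([]string, 0, len(scores))
	for docID := range scores {
		fused = append(fused, docID)
	}
	sort.Slice(fused, func(i, j int) bool {
		return scores[fused[i]] > scores[fused[j]]
	})
	return fused
}

func main() {
	sparse := []string{"doc3", "doc1", "doc7"} // BM25 order
	dense := []string{"doc1", "doc5", "doc3"}  // dense vector order
	fmt.Println(rrfFuse(60, sparse, dense))
}
```

Note that doc1 wins here despite never being ranked first by BM25: consistent mid-to-high placement across both signals beats a single top rank, which is exactly why RRF is robust to one retriever's scoring quirks.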

Embedding Providers

Eight embedding providers covering proprietary and open-source models. Each implements the Embedder interface with batch embedding support and automatic dimension handling.

  • OpenAI — text-embedding-3-small/large, ada-002
  • Google — text-embedding-004, Gecko
  • Ollama — Local embedding models (nomic-embed, mxbai)
  • Cohere — embed-v3, multilingual
  • Voyage — voyage-3, code-optimized embeddings
  • Jina — jina-embeddings-v3, multilingual and cross-lingual
  • Mistral — mistral-embed
  • Sentence Transformers — Local ONNX-based inference
embedder, _ := embedding.New("openai", embedding.Config{
    Model: "text-embedding-3-large",
    Dimensions: 1536,
})
vectors, err := embedder.EmbedBatch(ctx, documents)
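Downstream, dense retrieval compares these vectors by cosine similarity. The comparison is worth seeing concretely; this is a self-contained sketch, not Beluga code (stores like pgvector compute it index-side):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors:
// ~1.0 for vectors pointing the same direction, 0.0 for orthogonal ones.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := []float32{1, 0, 1}
	doc := []float32{1, 0, 1}
	fmt.Printf("%.2f\n", cosine(query, doc)) // identical directions
}
```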

Vector Store Providers

Twelve vector store backends ranging from lightweight embedded options to distributed cloud-scale systems. All implement the same VectorStore interface with support for metadata filtering, namespace isolation, and batch operations.

  • pgvector — PostgreSQL extension, HNSW/IVFFlat indexes
  • Qdrant — Purpose-built, advanced filtering, hybrid search
  • Pinecone — Managed cloud, serverless option
  • ChromaDB — Developer-friendly, embedded or client-server
  • Weaviate — Graph + vector, hybrid BM25
  • Milvus — Distributed, billion-scale
  • Turbopuffer — Serverless, cost-optimized
  • Redis — In-memory speed, RediSearch integration
  • Elasticsearch — Full-text + vector, existing infrastructure
  • SQLite-vec — Embedded, zero-dependency local
  • MongoDB — Atlas Vector Search, document store integration
  • Vespa — Hybrid serving, real-time indexing
store, _ := vectorstore.New("pgvector", vectorstore.Config{
    ConnectionString: "postgres://localhost/beluga",
    Collection:       "documents",
    Dimensions:       1536,
})
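Metadata filtering is the interface feature worth understanding first, since every backend exposes it with slightly different option names. The behavior itself can be sketched independently of any store; the Document type and helper below are illustrative stand-ins, not Beluga's actual schema:

```go
package main

import "fmt"

// Document mirrors the shape most stores return: content plus
// arbitrary metadata preserved from ingestion. (Illustrative type.)
type Document struct {
	ID       string
	Content  string
	Metadata map[string]string
}

// filterByMetadata keeps only documents whose metadata matches every
// key/value pair in the filter — the semantics of an equality filter
// pushed down to backends like pgvector or Qdrant.
func filterByMetadata(docs []Document, filter map[string]string) []Document {
	var out []Document
	for _, d := range docs {
		match := true
		for k, v := range filter {
			if d.Metadata[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	docs := []Document{
		{ID: "a", Metadata: map[string]string{"source": "wiki", "lang": "en"}},
		{ID: "b", Metadata: map[string]string{"source": "pdf", "lang": "en"}},
	}
	for _, d := range filterByMetadata(docs, map[string]string{"source": "wiki"}) {
		fmt.Println(d.ID)
	}
}
```

In production the filter is applied inside the store's index (pre- or post-ANN search depending on the backend), not in application code; the sketch only fixes the semantics.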

Advanced Retrieval Strategies

Five strategies for production RAG systems that go beyond basic retrieve-and-generate:

  • CRAG (Corrective RAG) — Evaluates retrieved document relevance; falls back to web search when confidence is below threshold.
  • Adaptive RAG — Routes by query complexity: no retrieval for simple factual questions, single-step for straightforward lookups, multi-step for complex reasoning chains.
  • HyDE (Hypothetical Document Embeddings) — Generates a hypothetical answer first, then uses its embedding for retrieval. Enables zero-shot retrieval without training data.
  • SEAL-RAG — Self-Aligned RAG that iteratively refines retrieval and generation.
  • GraphRAG — Builds a knowledge graph with community summaries (Microsoft approach) for complex multi-hop questions.
retriever := rag.NewAdaptiveRetriever(
    rag.WithSimpleHandler(directLLM),        // No retrieval needed
    rag.WithSingleStep(hybridRetriever),     // Standard RAG
    rag.WithMultiStep(iterativeRetriever),   // Multi-hop reasoning
    rag.WithComplexityClassifier(classifier),
)
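HyDE's core mechanic fits in a few lines and is worth seeing in isolation. The sketch below uses placeholder interfaces and toy stubs rather than Beluga's actual types: instead of embedding the raw question, it generates a hypothetical answer and embeds that, so the query vector lands in "answer space" near real answer passages.

```go
package main

import "fmt"

// Generator and Embedder are placeholder interfaces standing in for
// the LLM and embedding clients; they are not Beluga's actual types.
type Generator interface{ Generate(prompt string) string }
type Embedder interface{ Embed(text string) []float32 }

// hydeQuery implements the HyDE trick: generate a hypothetical answer,
// then use its embedding (not the question's) as the retrieval vector.
func hydeQuery(llm Generator, emb Embedder, question string) []float32 {
	hypothetical := llm.Generate(
		"Write a short passage that answers this question:\n" + question)
	return emb.Embed(hypothetical)
}

// --- toy stubs so the sketch runs without network calls ---
type echoLLM struct{}

func (echoLLM) Generate(p string) string { return "Rotate keys via the auth config." }

type lenEmb struct{}

func (lenEmb) Embed(t string) []float32 { return []float32{float32(len(t))} }

func main() {
	vec := hydeQuery(echoLLM{}, lenEmb{}, "How do I rotate API keys?")
	fmt.Println(len(vec)) // dimensionality of the hypothetical-answer embedding
}
```

The resulting vector is then handed to the ordinary dense retriever, which is why HyDE needs no training data: the only new component is one extra generation call per query.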

Contextual Retrieval Ingestion

During document ingestion, each chunk is enriched with document-level context before embedding. An LLM prepends a brief summary describing how the chunk fits within the larger document, significantly improving retrieval accuracy for chunks that would otherwise lack sufficient context on their own.

pipeline := rag.NewIngestionPipeline(
    rag.WithLoader(loader),
    rag.WithSplitter(splitter),
    rag.WithContextualRetrieval(model),  // Prepend doc-level context
    rag.WithEmbedder(embedder),
    rag.WithStore(vectorStore),
)
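The enrichment step reduces to one extra transformation per chunk before embedding. A minimal sketch of the idea follows; the prompt wording, function name, and string format are illustrative, not Beluga's internals (in the real pipeline an LLM produces the situating summary):

```go
package main

import "fmt"

// contextualize prepends a document-level summary to a chunk before it
// is embedded, so the chunk carries enough context to be retrieved on
// its own even when its text is ambiguous in isolation.
func contextualize(docTitle, docSummary, chunk string) string {
	// The combined string is what gets embedded and stored; the raw
	// chunk alone would often lack the terms a query actually uses.
	return fmt.Sprintf("From %q: %s\n\n%s", docTitle, docSummary, chunk)
}

func main() {
	enriched := contextualize(
		"Auth Guide",
		"This section covers rotating API keys.",
		"Set ROTATE_INTERVAL to the desired number of days.")
	fmt.Println(enriched)
}
```

The chunk on its own never mentions keys or authentication; after enrichment, a query like "how do I rotate API keys" can match it.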

Document Loaders

Eight document loaders for ingesting content from diverse sources. Each returns a stream of Document objects with metadata preserved for downstream filtering.

  • Firecrawl — Web scraping with JavaScript rendering
  • Unstructured.io — PDF, DOCX, PPTX, HTML parsing
  • Docling — Advanced document understanding
  • Confluence — Atlassian wiki pages and spaces
  • Notion — Notion pages and databases
  • GitHub — Repository files and README content
  • Google Drive — Docs, Sheets, and file content
  • S3/GCS — Cloud object storage files
webLoader, _ := loader.New("firecrawl", loader.Config{
    APIKey: os.Getenv("FIRECRAWL_API_KEY"),
})
docs, err := webLoader.Load(ctx, "https://example.com/docs")

Text Splitters

Three splitting strategies to divide documents into chunks optimized for embedding and retrieval. Recursive character splits by hierarchy (paragraph, sentence, word) with configurable overlap. Semantic splits at topic boundaries detected by embedding similarity. Token-based splits by exact token count for precise context budget control.

textSplitter := splitter.NewRecursive(
    splitter.WithChunkSize(512),
    splitter.WithChunkOverlap(50),
    splitter.WithSeparators([]string{"\n\n", "\n", ". ", " "}),
)
chunks := textSplitter.Split(documents)
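Token-based splitting is the simplest of the three strategies to sketch. The self-contained version below uses whitespace-delimited words as a stand-in for real tokenizer tokens (a production splitter would count tokens with the target model's tokenizer) and repeats an overlap window between chunks:

```go
package main

import (
	"fmt"
	"strings"
)

// splitByTokens divides text into chunks of at most chunkSize tokens,
// repeating overlap tokens between consecutive chunks so sentences
// straddling a boundary remain retrievable from either side.
func splitByTokens(text string, chunkSize, overlap int) []string {
	tokens := strings.Fields(text)
	var chunks []string
	step := chunkSize - overlap // advance per chunk; must be positive
	for start := 0; start < len(tokens); start += step {
		end := start + chunkSize
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, strings.Join(tokens[start:end], " "))
		if end == len(tokens) {
			break
		}
	}
	return chunks
}

func main() {
	fmt.Println(splitByTokens("a b c d e f g h", 4, 1))
}
```

With chunk size 4 and overlap 1, each chunk's last token reappears as the next chunk's first, mirroring what WithChunkOverlap does at character granularity above.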

Cross-Encoder Reranking

The final stage of the retrieval pipeline uses cross-encoder models for maximum precision. Unlike bi-encoder embeddings that encode query and document independently, cross-encoders process the query-document pair together, capturing fine-grained relevance signals that dramatically improve top-k accuracy.

ce, _ := reranker.New("cross-encoder", reranker.Config{
    Model: "cross-encoder/ms-marco-MiniLM-L-12-v2",
    TopK:  10,
})
reranked, err := ce.Rerank(ctx, query, candidates)

Architecture

Documents (Firecrawl | Unstructured | Notion | S3 | ...)
  ↓
Split (Recursive | Semantic)
  ↓
Embed (8+ Providers)
  ↓
Store (12+ Vector Stores)
  ↓
Hybrid Retrieval (BM25 ~200 + Dense ~100 + RRF Fusion)
  ↓
Cross-Encoder Rerank (Top 10 precision-optimized results)
  ↓
LLM Generation (Context-grounded streaming response)

Providers & Implementations

Embedding Providers

Provider              | Priority  | Key Differentiator
OpenAI                | Core      | text-embedding-3-small/large, industry standard
Google                | Core      | text-embedding-004, Gecko, multimodal
Ollama                | Core      | Local inference, nomic-embed, mxbai-embed
Cohere                | Extended  | embed-v3, multilingual, search-optimized
Voyage                | Extended  | voyage-3, code-optimized, high-quality retrieval
Jina                  | Extended  | jina-embeddings-v3, multilingual, cross-lingual
Mistral               | Extended  | mistral-embed, EU-hosted
Sentence Transformers | Community | Local ONNX inference, no API dependency

Vector Stores

Provider      | Priority  | Key Differentiator
pgvector      | Core      | PostgreSQL extension, HNSW/IVFFlat, use existing Postgres
Qdrant        | Core      | Purpose-built, advanced filtering, hybrid search native
Pinecone      | Core      | Managed cloud, serverless option, zero ops
ChromaDB      | Extended  | Developer-friendly, embedded or client-server mode
Weaviate      | Extended  | Graph + vector hybrid, built-in BM25
Milvus        | Extended  | Distributed, billion-scale, GPU-accelerated
Turbopuffer   | Extended  | Serverless, cost-optimized storage
Redis         | Extended  | In-memory speed, RediSearch integration
Elasticsearch | Extended  | Full-text + vector, leverage existing infrastructure
SQLite-vec    | Community | Embedded, zero-dependency, local development
MongoDB       | Community | Atlas Vector Search, document store integration
Vespa         | Community | Hybrid serving engine, real-time indexing

Document Loaders

Loader          | Priority  | Key Differentiator
Firecrawl       | Core      | Web scraping with JavaScript rendering and crawling
Unstructured.io | Core      | PDF, DOCX, PPTX, HTML, images — multi-format parsing
Docling         | Extended  | Advanced document understanding and layout analysis
Confluence      | Extended  | Atlassian wiki pages, spaces, and attachments
Notion          | Extended  | Pages, databases, and rich content blocks
GitHub          | Extended  | Repository files, READMEs, issues, and PRs
Google Drive    | Community | Docs, Sheets, Slides, and file storage
S3/GCS          | Community | Cloud object storage with prefix filtering

Full Example

A complete RAG pipeline: load documents, split, embed, store, retrieve, and stream an answer:

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/lookatitude/beluga-ai/llm"
    "github.com/lookatitude/beluga-ai/rag/embedding"
    "github.com/lookatitude/beluga-ai/rag/loader"
    "github.com/lookatitude/beluga-ai/rag/retriever"
    "github.com/lookatitude/beluga-ai/rag/splitter"
    "github.com/lookatitude/beluga-ai/rag/vectorstore"
    "github.com/lookatitude/beluga-ai/schema"
)

func main() {
    ctx := context.Background()

    // 1. Load documents from a website
    webLoader, _ := loader.New("firecrawl", loader.Config{
        APIKey: os.Getenv("FIRECRAWL_API_KEY"),
    })
    docs, _ := webLoader.Load(ctx, "https://docs.example.com")

    // 2. Split into chunks with overlap
    chunks := splitter.NewRecursive(
        splitter.WithChunkSize(512),
        splitter.WithChunkOverlap(50),
    ).Split(docs)

    // 3. Embed and store in pgvector
    embedder, _ := embedding.New("openai", embedding.Config{
        Model: "text-embedding-3-large",
    })
    store, _ := vectorstore.New("pgvector", vectorstore.Config{
        ConnectionString: os.Getenv("DATABASE_URL"),
        Collection:       "docs",
    })
    store.AddDocuments(ctx, chunks, embedder)

    // 4. Build a hybrid retriever
    ret := retriever.NewHybrid(
        retriever.WithDense(store, embedder),
        retriever.WithReranker(retriever.CrossEncoder("ms-marco-MiniLM")),
        retriever.WithTopK(5),
    )

    // 5. Retrieve and generate a streaming answer
    model, _ := llm.New("openai", llm.ProviderConfig{Model: "gpt-4o"})
    results, _ := ret.Retrieve(ctx, "How do I configure authentication?")

    // Build context from retrieved chunks
    contextText := ""
    for _, doc := range results {
        contextText += doc.Content + "\n---\n"
    }

    for event, err := range model.Stream(ctx, []schema.Message{
        {Role: "system", Content: "Answer using the provided context:\n" + contextText},
        {Role: "user", Content: "How do I configure authentication?"},
    }) {
        if err != nil { break }
        fmt.Print(event.Text())
    }
}

Related Features