In-Memory Vector Store Provider
The in-memory vector store provider implements the vectorstore.VectorStore interface using a thread-safe map with linear-scan search. It supports all three distance strategies (cosine, dot-product, Euclidean) and requires no external dependencies.
Use the in-memory store for unit tests and development. It requires no setup, no external services, and provides deterministic behavior for test assertions. Pair it with the in-memory embedder for a fully self-contained test setup with no network dependencies.
Installation

    go get github.com/lookatitude/beluga-ai/rag/vectorstore/providers/inmemory

Quick Start

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/lookatitude/beluga-ai/config"
        "github.com/lookatitude/beluga-ai/rag/vectorstore"
        "github.com/lookatitude/beluga-ai/schema"
        _ "github.com/lookatitude/beluga-ai/rag/vectorstore/providers/inmemory"
    )

    func main() {
        store, err := vectorstore.New("inmemory", config.ProviderConfig{})
        if err != nil {
            log.Fatal(err)
        }

        ctx := context.Background()

        docs := []schema.Document{
            {ID: "doc1", Content: "Go is a statically typed language", Metadata: map[string]any{"lang": "en"}},
            {ID: "doc2", Content: "Python is a dynamically typed language", Metadata: map[string]any{"lang": "en"}},
        }
        embeddings := [][]float32{
            {0.1, 0.2, 0.3, 0.4},
            {0.5, 0.6, 0.7, 0.8},
        }

        err = store.Add(ctx, docs, embeddings)
        if err != nil {
            log.Fatal(err)
        }

        results, err := store.Search(ctx, []float32{0.1, 0.2, 0.3, 0.4}, 2)
        if err != nil {
            log.Fatal(err)
        }

        for _, doc := range results {
            fmt.Printf("ID: %s, Score: %.4f, Content: %s\n", doc.ID, doc.Score, doc.Content)
        }
    }

Configuration
No configuration parameters are required. The provider is instantiated with no options:

    store, err := vectorstore.New("inmemory", config.ProviderConfig{})

Direct Construction

    import (
        memstore "github.com/lookatitude/beluga-ai/rag/vectorstore/providers/inmemory"
    )

    store := memstore.New()

Distance Strategies
The in-memory store supports all three distance strategies:

    // Cosine similarity (default)
    results, err := store.Search(ctx, queryVec, 10)

    // Dot product
    results, err := store.Search(ctx, queryVec, 10,
        vectorstore.WithStrategy(vectorstore.DotProduct),
    )

    // Euclidean (negated so higher = more similar)
    results, err := store.Search(ctx, queryVec, 10,
        vectorstore.WithStrategy(vectorstore.Euclidean),
    )

Metadata Filtering
The store supports full metadata filtering through vectorstore.WithFilter:

    results, err := store.Search(ctx, queryVec, 10,
        vectorstore.WithFilter(map[string]any{
            "lang":     "en",
            "category": "technical",
        }),
    )

Threshold Filtering

    results, err := store.Search(ctx, queryVec, 10,
        vectorstore.WithThreshold(0.8),
    )

Thread Safety
The in-memory store uses sync.RWMutex for safe concurrent access. Reads (Search) can execute in parallel, while writes (Add, Delete) hold an exclusive lock.
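The locking pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the provider's actual source: the `memStore` and `entry` types and their methods are hypothetical names invented for this example. Readers take the shared lock with `RLock`, and writers take the exclusive lock with `Lock`.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical record type for this sketch.
type entry struct {
	Content string
	Vector  []float32
}

// memStore is a hypothetical map-backed store guarded by a sync.RWMutex.
// It only illustrates the locking pattern, not the real provider.
type memStore struct {
	mu      sync.RWMutex
	entries map[string]entry
}

func newMemStore() *memStore {
	return &memStore{entries: make(map[string]entry)}
}

// Add takes the exclusive lock. Writing to an existing ID replaces the
// previous entry, because a map assignment overwrites the old value.
func (s *memStore) Add(id string, e entry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[id] = e
}

// Delete also takes the exclusive lock.
func (s *memStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.entries, id)
}

// Get takes the shared lock, so many readers can run at the same time.
func (s *memStore) Get(id string) (entry, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.entries[id]
	return e, ok
}

// Len takes the shared lock as well.
func (s *memStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.entries)
}

func main() {
	s := newMemStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Concurrent writers and readers are serialized by the mutex.
			s.Add(fmt.Sprintf("doc%d", i), entry{Content: "text", Vector: []float32{float32(i)}})
			_ = s.Len()
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Len()) // prints 8
}
```

Running a sketch like this with `go run -race` is a quick way to check that no access bypasses the lock.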
Upsert Behavior
Adding a document with an existing ID overwrites the previous entry:

    err = store.Add(ctx, []schema.Document{
        {ID: "doc1", Content: "Updated content"},
    }, [][]float32{newEmbedding})

Testing Usage
The in-memory store is the recommended choice for unit tests:

    import (
        "context"
        "testing"

        "github.com/lookatitude/beluga-ai/schema"
        memstore "github.com/lookatitude/beluga-ai/rag/vectorstore/providers/inmemory"
    )

    func TestSearchPipeline(t *testing.T) {
        store := memstore.New()

        ctx := context.Background()

        // Seed test data
        docs := []schema.Document{
            {ID: "1", Content: "test document", Metadata: map[string]any{"type": "test"}},
        }
        embeddings := [][]float32{{0.1, 0.2, 0.3}}
        err := store.Add(ctx, docs, embeddings)
        if err != nil {
            t.Fatal(err)
        }

        // Search
        results, err := store.Search(ctx, []float32{0.1, 0.2, 0.3}, 1)
        if err != nil {
            t.Fatal(err)
        }
        if len(results) != 1 {
            t.Fatalf("expected 1 result, got %d", len(results))
        }
        if results[0].ID != "1" {
            t.Fatalf("expected ID 1, got %s", results[0].ID)
        }
    }

Limitations
- Linear scan: Search time is O(n), where n is the number of documents. Not suitable for large-scale production use.
- No persistence: Data is lost when the process exits.
- No indexing: No approximate nearest neighbor (ANN) index structures.
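To make the linear-scan cost concrete, the following sketch scores every stored vector against the query and keeps the top k. It is an illustration under stated assumptions rather than the provider's code: `linearSearch`, `scored`, `scoreFunc`, and the three scoring helpers are hypothetical names, and the negated Euclidean distance follows the "higher = more similar" convention described earlier in this document.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scoreFunc is a hypothetical scoring signature; higher means more similar.
type scoreFunc func(a, b []float32) float64

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// cosine returns the cosine similarity, or 0 if either vector has zero norm.
func cosine(a, b []float32) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

// negEuclidean returns the negated Euclidean distance, so that closer
// vectors receive higher scores.
func negEuclidean(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return -math.Sqrt(s)
}

// scored pairs a document ID with its score (hypothetical type).
type scored struct {
	ID    string
	Score float64
}

// linearSearch visits every stored vector once, which is the O(n) scan,
// then sorts by descending score and truncates to k results.
func linearSearch(vectors map[string][]float32, query []float32, k int, score scoreFunc) []scored {
	out := make([]scored, 0, len(vectors))
	for id, v := range vectors {
		if len(v) != len(query) {
			continue // skip vectors with a mismatched dimension
		}
		out = append(out, scored{ID: id, Score: score(query, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID // stable order for equal scores
	})
	if k >= 0 && k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	vectors := map[string][]float32{
		"doc1": {0.1, 0.2, 0.3, 0.4},
		"doc2": {0.5, 0.6, 0.7, 0.8},
	}
	query := []float32{0.1, 0.2, 0.3, 0.4}
	for _, r := range linearSearch(vectors, query, 2, cosine) {
		fmt.Printf("%s %.4f\n", r.ID, r.Score)
	}
}
```

Every query touches all n vectors, so latency grows in step with the collection size. An ANN index trades exact results for sublinear lookups, which is why this approach is best kept to tests and small datasets.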