Production pgvector Sharding
As vector datasets grow beyond 1 million records, single-table scans become too slow and HNSW indexes may not fit in RAM. Table partitioning (sharding) combined with proper indexing provides the foundation for production-scale vector search. This approach leverages PostgreSQL’s built-in partitioning, which means you retain full SQL capabilities (joins, transactions, aggregations) alongside vector search — a significant advantage over purpose-built vector databases that sacrifice query flexibility.
What You Will Build
Section titled “What You Will Build”A production-grade pgvector setup with HNSW indexing, table partitioning by tenant, and query performance optimization.
Prerequisites
Section titled “Prerequisites”- PostgreSQL with the
pgvectorextension installed - Understanding of In-memory Vector Store
Step 1: Basic pgvector Setup
Section titled “Step 1: Basic pgvector Setup”Connect Beluga AI to PostgreSQL with pgvector. The connection uses the same vectorstore.New() registry pattern as all other vector store providers, so switching from in-memory to pgvector requires only a provider name and configuration change.
package main
import ( "context" "fmt" "os"
"github.com/lookatitude/beluga-ai/config" "github.com/lookatitude/beluga-ai/rag/vectorstore" _ "github.com/lookatitude/beluga-ai/rag/vectorstore/providers/pgvector")
func main() { ctx := context.Background()
store, err := vectorstore.New("pgvector", config.ProviderConfig{ "connection_string": os.Getenv("DATABASE_URL"), "table_name": "documents", }) if err != nil { fmt.Printf("Error: %v\n", err) return }
fmt.Println("Connected to pgvector") _ = ctx _ = store}Step 2: Create the Schema
Section titled “Step 2: Create the Schema”Set up the table and HNSW index. HNSW (Hierarchical Navigable Small World) is preferred over IVFFlat for most workloads because it provides better recall with comparable latency and does not require a separate training step. The vector_cosine_ops operator class matches the cosine similarity metric used by most embedding models.
-- Enable the vector extensionCREATE EXTENSION IF NOT EXISTS vector;
-- Create the documents tableCREATE TABLE documents ( id bigserial PRIMARY KEY, content text NOT NULL, metadata jsonb DEFAULT '{}', embedding vector(1536) -- Match your embedder's dimension);
-- Create HNSW index for cosine similarity-- m: max connections per layer (16-64, higher = better recall, more memory)-- ef_construction: build-time list size (64-512, higher = better recall, slower build)CREATE INDEX idx_documents_embedding ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);Step 3: HNSW Parameter Tuning
Section titled “Step 3: HNSW Parameter Tuning”HNSW parameters control the trade-off between recall, latency, and memory. The defaults (m=16, ef_construction=64) provide a good balance for most workloads. Increase these values when recall is more important than build time or memory usage — for example, in applications where returning the wrong document has high cost (medical, legal, financial).
| Parameter | Range | Effect |
|---|---|---|
m | 16-64 | Higher = better recall, more memory per node |
ef_construction | 64-512 | Higher = better index quality, slower build |
ef_search | 40-500 | Higher = better recall at query time, slower queries |
Set ef_search at query time:
-- Increase recall for important queriesSET hnsw.ef_search = 200;
SELECT id, content, embedding <=> $1 AS distanceFROM documentsORDER BY embedding <=> $1LIMIT 10;Step 4: Table Partitioning (Sharding)
Section titled “Step 4: Table Partitioning (Sharding)”For multi-tenant applications, partition by tenant to reduce search scope. Partitioning confines each query to a single partition’s index rather than scanning the entire table, which reduces both latency and resource consumption. Each partition has its own HNSW index, so adding a new tenant creates a small, fast index rather than expanding a single large one.
-- Create partitioned tableCREATE TABLE documents_partitioned ( id bigserial, tenant_id int NOT NULL, content text NOT NULL, metadata jsonb DEFAULT '{}', embedding vector(1536), PRIMARY KEY (id, tenant_id)) PARTITION BY LIST (tenant_id);
-- Create partitions for each tenantCREATE TABLE documents_tenant_1 PARTITION OF documents_partitioned FOR VALUES IN (1);
CREATE TABLE documents_tenant_2 PARTITION OF documents_partitioned FOR VALUES IN (2);
-- Create HNSW index on each partitionCREATE INDEX ON documents_tenant_1 USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
CREATE INDEX ON documents_tenant_2 USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);Step 5: Route Queries to Partitions
Section titled “Step 5: Route Queries to Partitions”In Beluga AI, route queries to specific partitions using dynamic table names or metadata filtering. Creating a per-tenant store instance is the simplest approach — it maps directly to the partitioned table and ensures complete data isolation between tenants.
// Per-tenant storetenantStore, err := vectorstore.New("pgvector", config.ProviderConfig{ "connection_string": os.Getenv("DATABASE_URL"), "table_name": fmt.Sprintf("documents_tenant_%d", tenantID),})Step 6: Performance Optimization
Section titled “Step 6: Performance Optimization”Pre-warm the index
Section titled “Pre-warm the index”Load the HNSW index into shared buffers at startup. This eliminates cold-start latency for the first queries after a database restart, which is important in auto-scaling environments where new database instances may serve traffic immediately.
-- Requires pg_prewarm extensionCREATE EXTENSION IF NOT EXISTS pg_prewarm;SELECT pg_prewarm('idx_documents_embedding');Maintenance
Section titled “Maintenance”Run regular maintenance on vector tables. VACUUM ANALYZE updates the statistics that the query planner uses to choose efficient execution plans, and REINDEX rebuilds the HNSW graph after large batch inserts that may have degraded its structure.
-- Update statistics for the query plannerVACUUM ANALYZE documents;
-- Reindex periodically after large batch insertsREINDEX INDEX CONCURRENTLY idx_documents_embedding;Connection pooling
Section titled “Connection pooling”Use a connection pooler (PgBouncer, pgcat) in front of PostgreSQL. Vector similarity queries can be CPU-intensive — limit concurrent queries to avoid overloading the database.
Scaling Guidelines
Section titled “Scaling Guidelines”| Dataset Size | Recommendation |
|---|---|
| < 100k vectors | Single table, HNSW index |
| 100k - 1M | HNSW with tuned parameters |
| 1M - 10M | Table partitioning by tenant or category |
| > 10M | Horizontal sharding across PostgreSQL instances |
Verification
Section titled “Verification”- Load 10,000 vectors into a table without an index — measure query time.
- Create the HNSW index — measure query time again (should be < 10ms).
- Create tenant partitions — verify queries only scan the relevant partition.
Next Steps
Section titled “Next Steps”- Hybrid Search — Combine vector and keyword search
- In-memory Vector Store — Zero-dependency development