Vector Databases and AI-Ready Data Layers
Vector databases are what make semantic search work at scale. They store high-dimensional numerical representations (embeddings) of your data – text, images, audio, code – and retrieve the items most similar to a query vector through approximate nearest-neighbour (ANN) search. In practice, a user can search for “how do I reset my password” and find your “account recovery guide” even though none of those exact words appear in the document title. That's a qualitative improvement over keyword search, and your users will notice it. For retrieval-augmented generation (RAG) systems in particular, vector retrieval is the foundational layer: without it, there's nothing relevant to put into the LLM's context window.
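The sketch below shows the retrieval step in its simplest form. It assumes cosine similarity as the metric and uses exact (brute-force) nearest-neighbour search, which scores every stored vector against the query. The three-dimensional vectors are hand-picked stand-ins, not output from a real embedding model, and the names `cosine_similarity` and `exact_top_k` are illustrative. ANN indexes exist because this linear scan becomes too slow at millions of vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, corpus, k=2):
    """Brute-force k-nearest-neighbour search: score every stored vector, keep the best k.
    This O(N) scan is the baseline that ANN indexes such as HNSW or IVF-PQ approximate."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" chosen by hand for illustration;
# real models produce hundreds or thousands of dimensions.
corpus = {
    "account recovery guide": [0.9, 0.1, 0.0],
    "billing FAQ": [0.1, 0.9, 0.1],
    "API rate limits": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "how do I reset my password"

for doc_id, score in exact_top_k(query, corpus):
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors the account recovery guide ranks first even though it shares no words with the query. In a real system that closeness comes from the embedding model, not from hand-tuning.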
Figure: “Vector Embedding and Retrieval Pipeline”. Hybrid search combining dense vectors and sparse BM25 consistently outperforms either approach alone on enterprise retrieval tasks.
The core operation is embedding-based search. Your embedding model converts a query into a vector (typically 384 to 3072 dimensions, depending on the model). The database then searches its index with an ANN algorithm such as HNSW (Hierarchical Navigable Small World graphs) or IVF-PQ (an inverted-file index with product quantisation) and returns the closest matches in milliseconds, even across millions of entries. Pinecone is a leading managed option: it handles index management, scaling, and metadata filtering without database administration overhead. Weaviate and Qdrant are popular open-source choices with strong multi-tenancy support, and Chroma is popular for local development and prototyping. For teams that already run PostgreSQL, the pgvector extension adds vector search without introducing a separate database to operate. That is genuinely appealing from an operations perspective, and it's adequate for most teams until they reach tens of millions of vectors with tight latency requirements.
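The IVF half of IVF-PQ can be sketched in a few dozen lines of plain Python. This is a simplified illustration, not how any particular database implements it. A k-means step picks `n_lists` centroids, each vector is stored in the list of its nearest centroid, and a query scans only the `nprobe` closest lists. The product-quantisation step, which compresses the stored vectors, is omitted. `ToyIVFIndex` and its parameters are hypothetical names.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance: same ranking as Euclidean distance, no sqrt needed."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Hypothetical inverted-file index: a k-means coarse quantiser plus per-centroid lists.
    Vectors are stored uncompressed (no product quantisation)."""

    def __init__(self, n_lists, iterations=10, seed=0):
        self.n_lists = n_lists
        self.iterations = iterations
        self.rng = random.Random(seed)
        self.centroids = []
        self.lists = []

    def _nearest_centroid(self, vec):
        return min(range(self.n_lists), key=lambda c: squared_l2(vec, self.centroids[c]))

    def train(self, vectors):
        # Lloyd's k-means, initialised from a random sample of the training vectors.
        self.centroids = [list(v) for v in self.rng.sample(vectors, self.n_lists)]
        for _ in range(self.iterations):
            buckets = [[] for _ in range(self.n_lists)]
            for v in vectors:
                buckets[self._nearest_centroid(v)].append(v)
            for c, members in enumerate(buckets):
                if members:  # keep the previous centroid if a cluster empties out
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        self.lists = [[] for _ in range(self.n_lists)]

    def add(self, item_id, vec):
        self.lists[self._nearest_centroid(vec)].append((item_id, vec))

    def search(self, query, k=5, nprobe=1):
        # Visit only the nprobe lists whose centroids are closest to the query.
        probe = sorted(range(self.n_lists), key=lambda c: squared_l2(query, self.centroids[c]))[:nprobe]
        candidates = [entry for c in probe for entry in self.lists[c]]
        candidates.sort(key=lambda entry: squared_l2(query, entry[1]))
        return [item_id for item_id, _ in candidates[:k]]


# Random 8-dimensional vectors stand in for real embeddings.
rng = random.Random(42)
data = {f"doc{i}": [rng.gauss(0, 1) for _ in range(8)] for i in range(1000)}

index = ToyIVFIndex(n_lists=16)
index.train(list(data.values()))
for item_id, vec in data.items():
    index.add(item_id, vec)

query = [rng.gauss(0, 1) for _ in range(8)]
exact = sorted(data, key=lambda i: squared_l2(query, data[i]))[:5]
approx = index.search(query, k=5, nprobe=4)
print(f"recall@5 at nprobe=4: {len(set(exact) & set(approx)) / 5:.1f}")
```

Raising `nprobe` trades latency for recall. With `nprobe` equal to `n_lists` the search becomes an exact scan. HNSW exposes a similar trade-off through the size of its search candidate list (often called `ef` or `ef_search`) rather than through cluster probing.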
Figure: “Vector Similarity Metrics: When to Use Each”. OpenAI text-embedding models are optimised for cosine similarity; always check your embedding model's documentation before choosing a metric.
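It helps to see how the three common metrics relate before choosing one. The sketch below uses hypothetical two-dimensional vectors. On raw vectors, dot product, cosine similarity, and Euclidean distance can rank results differently. Once vectors are normalised to unit length, dot product equals cosine similarity and squared Euclidean distance equals 2 - 2 * cosine, so all three rank results identically.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Angle-based similarity: ignores vector length entirely.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    # Straight-line distance: smaller means more similar, and length matters.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalise(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def rankings(q, vecs):
    """Rank candidate names by each metric (highest similarity / smallest distance first)."""
    by_dot = sorted(vecs, key=lambda n: dot(q, vecs[n]), reverse=True)
    by_cos = sorted(vecs, key=lambda n: cosine(q, vecs[n]), reverse=True)
    by_l2 = sorted(vecs, key=lambda n: euclidean(q, vecs[n]))
    return by_dot, by_cos, by_l2


query = [3.0, 4.0]
# "c" points the same way as the query but is twice as long; "b" is rotated slightly.
candidates = {"b": [4.0, 3.0], "c": [6.0, 8.0]}

raw = rankings(query, candidates)
unit = rankings(normalise(query), {n: normalise(v) for n, v in candidates.items()})
print("raw vectors:", raw)    # Euclidean disagrees with dot and cosine
print("unit vectors:", unit)  # all three metrics agree
```

This is why a model that returns unit-length vectors (check its documentation) can be served with a cheaper dot-product index without changing the results.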
The interesting evolution is happening beyond pure ANN retrieval. Hybrid search, which combines dense vector similarity with sparse BM25 keyword scores, consistently outperforms pure vector search on enterprise document benchmarks, typically by 10–20% on recall. It does so because it catches exact matches on technical terms that embedding models sometimes miss. Multi-vector representations improve recall for long-form content by indexing a long document as several segment-level vectors rather than one whole-document embedding. Metadata filtering covers cases such as restricting a search to the documents a specific user is authorised to see. It sounds like an afterthought, but in a multi-tenant system, getting it wrong is a data breach waiting to happen. As AI data estates grow, the ability to serve hundreds of millions of vectors with sub-100 ms p95 latency at reasonable cost is becoming a genuine competitive differentiator between platforms.
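Two of the ideas above, multi-vector representations and metadata filtering, fit together in one small sketch. Everything here is illustrative: the `Document` class, the `tenant_id` field, and `filtered_search` are hypothetical names. Scoring a document by its best-matching segment is one simple aggregation choice among several. The key detail is the order of operations: the tenant filter runs before ranking, so another tenant's document can never appear in the results, however similar it is.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    tenant_id: str                      # hypothetical access-control metadata field
    segment_vectors: list[list[float]]  # one embedding per segment (multi-vector)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query, docs, tenant_id, k=3):
    # 1. Metadata filter first: only documents the caller's tenant may see are ranked.
    visible = [d for d in docs if d.tenant_id == tenant_id]
    # 2. Multi-vector scoring: a document scores as well as its best-matching segment.
    scored = [(d.doc_id, max(cosine(query, seg) for seg in d.segment_vectors)) for d in visible]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [
    Document("handbook", "acme", [[0.9, 0.1], [0.1, 0.9]]),  # long doc split into two segments
    Document("pricing", "acme", [[0.2, 0.8]]),
    Document("rival-roadmap", "globex", [[1.0, 0.0]]),  # perfect match, but another tenant's
]
results = filtered_search([1.0, 0.0], docs, tenant_id="acme")
print(results)
```

Many vector databases apply such filters during the index search itself rather than afterwards. Filtering only after taking the top k can silently return fewer than k results, and skipping the filter altogether is exactly the breach described above.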
Frequently Asked Questions
What is a vector embedding?
A vector embedding is a numerical array (typically 384 to 3072 numbers) representing the semantic meaning of a piece of data – text, an image, audio, or code. Similar items end up with similar embeddings (close in vector space), which is what makes it possible to find “semantically related” content through mathematical distance rather than exact keyword matching.
Do you need a specialised vector database or can PostgreSQL handle it?
For small to medium workloads (up to a few million vectors with moderate query rates), pgvector in PostgreSQL is often good enough and avoids the operational complexity of a separate database. For large-scale production RAG with tens of millions of vectors and tight latency requirements, a specialised database like Pinecone, Weaviate, or Qdrant will deliver better performance and manageability.
What is hybrid search and why does it perform better than pure vector search?
Hybrid search runs a dense vector search and a sparse keyword search (BM25) side by side and merges the two result lists, commonly with reciprocal rank fusion (RRF). Pure vector search can miss exact keyword matches when the embedding model doesn't represent a specific technical term well, such as an error code or a product identifier. Hybrid search catches both semantic similarity and exact matches, which is why it consistently outperforms either approach alone on enterprise document retrieval tasks.
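Here is a compact sketch of hybrid search in plain Python, assuming whitespace tokenisation. Okapi BM25 with the common defaults `k1 = 1.2` and `b = 0.75` produces the sparse ranking. Reciprocal rank fusion gives each document 1 / (k + rank) from every list it appears in, with k = 60 as in the original RRF paper. The dense ranking is hard-coded to stand in for a vector search result, and all function names are illustrative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 over whitespace-tokenised docs, using the non-negative IDF
    ln(1 + (N - n + 0.5) / (n + 0.5)) as in Lucene's BM25Similarity."""
    tokenised = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised.values()) / n_docs
    doc_freq = Counter(term for toks in tokenised.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenised.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) from every list it appears in."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = {
    "kb-1": "configure ERR_TLS_CERT_EXPIRED handling in the gateway",
    "kb-2": "certificate problems and secure connection troubleshooting",
    "kb-3": "billing and invoices overview",
}
query = "ERR_TLS_CERT_EXPIRED"

sparse = bm25_scores(query, docs)
sparse_ranking = sorted((d for d in docs if sparse[d] > 0), key=sparse.get, reverse=True)
# Hypothetical dense ranking, standing in for the output of a vector search.
dense_ranking = ["kb-2", "kb-1", "kb-3"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

The document containing the exact error code ranks second in the hypothetical dense list but first after fusion, because RRF rewards documents that rank well in either list. RRF uses only ranks, not raw scores, so the two systems' incompatible score scales never have to be normalised against each other.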
How do you choose the right vector database for a production RAG system?
Key criteria: scale (how many vectors?), latency requirements (acceptable p95 query time?), filtering needs (metadata-based access control?), and operational preference (managed vs. self-hosted). Start with pgvector if you already run PostgreSQL and your scale is modest. Move to Pinecone, Weaviate, or Qdrant when you need dedicated scaling, advanced filtering, or billion-scale index support.
