Getting Started
Run Qdrant in Docker, generate embeddings, then store and query them: hello-world semantic search
This page boots Qdrant locally, generates embeddings, stores them, and runs a semantic-search query end to end. The same patterns transfer to pgvector, Pinecone, Weaviate, and Chroma.
Run Qdrant
```yaml
# docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.10.1
    ports:
      - "6333:6333" # REST
      - "6334:6334" # gRPC
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:
```
```shell
docker compose up -d
curl http://localhost:6333/healthz
# healthz check passed
open http://localhost:6333/dashboard # web UI (macOS; on Linux use xdg-open or paste the URL into a browser)
```
Create a Collection
A collection is roughly like a table: it holds vectors that share one dimension (size) and one distance metric.
```shell
curl -X PUT http://localhost:6333/collections/products \
  -H 'Content-Type: application/json' \
  --data '{
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }'
```
| Distance metric | When to use |
|---|---|
| Cosine | Most text embeddings (OpenAI, BGE, Voyage); compares angle, not magnitude |
| Dot | Vectors are already normalized to unit length, where dot product equals cosine similarity |
| Euclid (Euclidean) | Magnitude matters (rare for text) |
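For OpenAI and most other text embedding models, use Cosine. OpenAI embeddings are normalized to length 1, so Cosine and Dot give the same ranking. Also make `size` match the model's output dimension: 1536 for `text-embedding-3-small` by default.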
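To make the table concrete, here is a plain-JavaScript sketch of the three metrics on tiny hand-made vectors (not real embeddings). It shows that Cosine ignores magnitude while Euclidean distance does not, and that Dot gives the same value as Cosine once both vectors are normalized to unit length. The function names are illustrative only; Qdrant computes these metrics internally.

```javascript
// Plain-JS versions of the distance metrics from the table.
// Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.

function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function norm(a) {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: compares direction only, ignores length.
function cosine(a, b) {
  return dot(a, b) / (norm(a) * norm(b));
}

// Euclidean distance: lower means closer; sensitive to magnitude.
function euclid(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Scale a vector to unit length.
function normalize(a) {
  const n = norm(a);
  return a.map(x => x / n);
}

const a = [1, 2, 2];
const b = [2, 4, 4]; // same direction as a, twice as long

console.log(cosine(a, b).toFixed(3)); // 1.000: identical direction
console.log(euclid(a, b).toFixed(3)); // 3.000: different magnitude
// After normalization, the plain dot product equals cosine similarity.
console.log(dot(normalize(a), normalize(b)).toFixed(3)); // 1.000
```

Note the opposite orientation: for a distance such as Euclidean, lower means closer, while for Cosine and Dot higher means more similar.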
Generate Embeddings and Upsert
```shell
npm install @qdrant/js-client-rest openai
```
```javascript
// Save as an ES module (e.g. index.mjs) so top-level await works.
import { QdrantClient } from '@qdrant/js-client-rest';
import OpenAI from 'openai';

const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const products = [
  { id: 1, name: 'Espresso Maker', description: 'Dual-boiler espresso machine for coffee enthusiasts' },
  { id: 2, name: 'French Press', description: 'Classic 8-cup press pot for rich coffee' },
  { id: 3, name: 'Coffee Grinder', description: 'Burr grinder for consistent particle size' },
  { id: 4, name: 'Reading Lamp', description: 'Warm-light LED desk lamp for evening reading' },
  { id: 5, name: 'Tea Kettle', description: 'Stovetop kettle with whistle' },
];

// Generate embeddings for all products in one request
const embeddings = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: products.map(p => `${p.name}: ${p.description}`),
});

// Upsert into Qdrant; embeddings.data[i] corresponds to products[i]
await qdrant.upsert('products', {
  wait: true, // wait until the write is applied so the points are searchable right away
  points: products.map((p, i) => ({
    id: p.id,
    vector: embeddings.data[i].embedding,
    payload: { name: p.name, description: p.description },
  })),
});
```
`payload` is arbitrary JSON metadata stored with each point. Qdrant returns it with search results when you ask for it (`with_payload: true`), much like `_source` in Elasticsearch.
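The example embeds all five products in a single request. For a real catalog you would batch: the OpenAI embeddings endpoint caps the number of inputs per request (2048 at the time of writing), and very large Qdrant upserts are easier to retry in smaller pieces. A minimal sketch of a batching helper; `toBatches` is a hypothetical name, not part of either SDK, and the commented loop assumes `openai`, `qdrant` and `products` from the example above.

```javascript
// Split an array into consecutive batches of at most `size` items.
// Hypothetical helper; not part of the Qdrant or OpenAI SDKs.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch (inside the ES module above):
// for (const batch of toBatches(products, 100)) {
//   const res = await openai.embeddings.create({
//     model: 'text-embedding-3-small',
//     input: batch.map(p => `${p.name}: ${p.description}`),
//   });
//   await qdrant.upsert('products', {
//     wait: true,
//     points: batch.map((p, i) => ({
//       id: p.id,
//       vector: res.data[i].embedding,
//       payload: { name: p.name, description: p.description },
//     })),
//   });
// }

console.log(toBatches([1, 2, 3, 4, 5], 2)); // three batches: [1, 2], [3, 4], [5]
```

The batch size of 100 in the comment is an arbitrary illustration; tune it to your rate limits and document lengths.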
Query
```javascript
// Embed the query with the same model used for the documents
const queryEmb = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'something to make coffee in the morning',
});

// Find the 3 most similar products
const results = await qdrant.search('products', {
  vector: queryEmb.data[0].embedding,
  limit: 3,
  with_payload: true, // include each point's payload in the hits
});

for (const hit of results) {
  console.log(`${hit.score.toFixed(3)} ${hit.payload.name}`);
}
```
Expected output (rough; exact scores will vary):
```text
0.523 Espresso Maker
0.487 French Press
0.412 Coffee Grinder
```
No product description contains the word "morning"; vector search found these by meaning. The tea kettle and reading lamp fall outside the top 3 because they are semantically further from the query.
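Conceptually, `search` returns the `limit` points whose vectors score highest against the query vector. Qdrant gets there approximately, via an HNSW index. The sketch below does the same thing exactly, by brute force over an in-memory array, which is workable for a few thousand vectors and handy for sanity-checking results. The 2-dimensional vectors are made up for illustration, and `searchExact` is a hypothetical helper that only mimics the `{ id, score, payload }` hit shape.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact (brute-force) top-k search: score every point, sort descending, keep `limit`.
function searchExact(points, query, limit) {
  return points
    .map(p => ({ id: p.id, score: cosine(p.vector, query), payload: p.payload }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy 2-D "embeddings": the first axis loosely means "coffee", the second "light".
const points = [
  { id: 1, vector: [0.9, 0.1], payload: { name: 'Espresso Maker' } },
  { id: 2, vector: [0.8, 0.2], payload: { name: 'French Press' } },
  { id: 4, vector: [0.1, 0.9], payload: { name: 'Reading Lamp' } },
];

// Prints Espresso Maker first, then French Press; the lamp is cut by the limit.
for (const hit of searchExact(points, [1, 0], 2)) {
  console.log(`${hit.score.toFixed(3)} ${hit.payload.name}`);
}
```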
Add Metadata Filtering
Filter on metadata while doing vector search:
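```javascript
// Index the payload fields you filter on. This works on the existing
// collection; there is no need to recreate it.
await qdrant.createPayloadIndex('products', { field_name: 'category', field_schema: 'keyword' });
await qdrant.createPayloadIndex('products', { field_name: 'price', field_schema: 'float' });

// Re-upsert the points (same code as above) with the new fields in each
// payload, e.g. payload: { name, description, category, price }

// Query with a filter: every condition under `must` has to match
const filtered = await qdrant.search('products', {
  vector: queryEmb.data[0].embedding,
  filter: {
    must: [
      { key: 'category', match: { value: 'kitchen' } },
      { key: 'price', range: { lte: 100 } },
    ],
  },
  limit: 5,
  with_payload: true,
});
```
Qdrant applies the filter during the vector search itself (its HNSW index is filter-aware), so you get the nearest points among those that match. Retrieving, say, 1000 results and filtering them in your own code is slower and can still return fewer matches than you asked for.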
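Why filtering inside the search matters: if you fetch the top N by similarity and filter afterwards, a selective filter can leave you with fewer than `limit` results, or none at all. The sketch below contrasts the two orders on toy data using brute-force search; it illustrates the principle only, not Qdrant's filterable-HNSW internals. `matches`, `searchThenFilter` and `filterThenSearch` are hypothetical names, and `matches` supports only the `match.value` and `range` (`gte`/`lte`) condition shapes used above.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Evaluate a Qdrant-style { must: [...] } filter against one payload.
// Only match.value and range (gte/lte) conditions are supported here.
function matches(payload, filter) {
  return filter.must.every(cond => {
    const v = payload[cond.key];
    if (cond.match) return v === cond.match.value;
    if (cond.range) {
      const { gte = -Infinity, lte = Infinity } = cond.range;
      return typeof v === 'number' && v >= gte && v <= lte;
    }
    return false; // unknown condition type: treat as non-matching
  });
}

// Score every point against the query, best first.
function rank(points, query) {
  return points
    .map(p => ({ ...p, score: cosine(p.vector, query) }))
    .sort((x, y) => y.score - x.score);
}

// Anti-pattern: take the top `candidates` by similarity, then filter in app code.
function searchThenFilter(points, query, filter, candidates, limit) {
  return rank(points, query)
    .slice(0, candidates)
    .filter(p => matches(p.payload, filter))
    .slice(0, limit);
}

// Filter first, then rank only the matching points.
function filterThenSearch(points, query, filter, limit) {
  return rank(points.filter(p => matches(p.payload, filter)), query).slice(0, limit);
}

// Toy data: the only kitchen item is the least similar to the query.
const points = [
  { id: 1, vector: [0.9, 0.1], payload: { category: 'electronics', price: 500 } },
  { id: 2, vector: [0.8, 0.2], payload: { category: 'electronics', price: 300 } },
  { id: 3, vector: [0.5, 0.5], payload: { category: 'kitchen', price: 40 } },
];
const filter = {
  must: [
    { key: 'category', match: { value: 'kitchen' } },
    { key: 'price', range: { lte: 100 } },
  ],
};

console.log(searchThenFilter(points, [1, 0], filter, 2, 5).length); // 0
console.log(filterThenSearch(points, [1, 0], filter, 5).map(p => p.id)); // id 3 only
```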
pgvector Equivalent
If you'd rather use Postgres:
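```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  description TEXT,
  category TEXT,
  embedding vector(1536)
);

-- HNSW index for approximate cosine search
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

-- Insert (a real vector literal must contain exactly 1536 numbers)
INSERT INTO products (name, description, category, embedding)
VALUES ('Espresso Maker', '...', 'kitchen', '[0.21, -0.43, ...]'::vector);

-- Query: 1 - cosine distance = cosine similarity
SELECT id, name, 1 - (embedding <=> '[0.19, -0.41, ...]'::vector) AS similarity
FROM products
WHERE category = 'kitchen'
ORDER BY embedding <=> '[0.19, -0.41, ...]'::vector
LIMIT 5;
```
`<=>` is pgvector's cosine distance operator (1 minus cosine similarity), which is why the query reports `1 - (embedding <=> ...)` as similarity. The `hnsw` index type is the usual approximate-nearest-neighbor choice; `ivfflat` is an alternative that builds faster and uses less memory but has a worse speed/recall tradeoff. Postgres handles filtering in the same query, which is convenient if your data already lives there. One caveat: with an approximate index, the `WHERE` clause is applied after the index scan, so a selective filter can return fewer than `LIMIT` rows.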
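From application code, pass the embedding as a query parameter rather than pasting a literal into the SQL. pgvector accepts a vector in text form such as `[0.19,-0.41,0.07]`. The sketch below builds the query text and parameter list using the `$1`-style placeholders of node-postgres (`pg`), which you would then hand to `client.query(text, values)`. `toVectorLiteral` and `buildSimilarityQuery` are hypothetical helpers, not part of `pg` or pgvector.

```javascript
// Format a JS number array as a pgvector text literal, e.g. [0.1,-0.2].
// Hypothetical helper; checks the length against the column's dimension.
function toVectorLiteral(embedding, dims = 1536) {
  if (embedding.length !== dims) {
    throw new RangeError(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every(Number.isFinite)) {
    throw new TypeError('embedding must contain only finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Build a parameterized cosine-similarity query with $1, $2, $3 placeholders.
// The vector parameter $1 is reused in both the SELECT list and ORDER BY.
function buildSimilarityQuery(embedding, category, limit = 5, dims = 1536) {
  return {
    text:
      'SELECT id, name, 1 - (embedding <=> $1::vector) AS similarity ' +
      'FROM products WHERE category = $2 ' +
      'ORDER BY embedding <=> $1::vector LIMIT $3',
    values: [toVectorLiteral(embedding, dims), category, limit],
  };
}

// Usage sketch with a toy 3-dimensional vector (the table above uses 1536).
const q = buildSimilarityQuery([0.19, -0.41, 0.07], 'kitchen', 5, 3);
console.log(q.values[0]); // [0.19,-0.41,0.07]
```

Keeping the vector out of the SQL string also avoids re-sending a 1536-number literal twice in the statement text.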
Pinecone Equivalent
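```javascript
import { Pinecone } from '@pinecone-database/pinecone';

// Assumes a Pinecone index named 'products' with dimension 1536 and cosine metric
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index('products');

// Reuse the OpenAI responses from the Qdrant example above
const embedding = embeddings.data[0].embedding; // Espresso Maker
const queryEmbedding = queryEmb.data[0].embedding;

// Upsert (Pinecone IDs are strings)
await index.upsert([
  { id: '1', values: embedding, metadata: { name: 'Espresso Maker', category: 'kitchen' } },
]);

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  filter: { category: { '$eq': 'kitchen' } },
  includeMetadata: true,
});
```
Different SDK shape, same conceptual operations: upsert vectors with metadata, then query by vector with an optional metadata filter.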
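One place the shapes differ is filter syntax: Qdrant lists conditions under `must`, while Pinecone uses MongoDB-style operators such as `$eq`, `$lte` and `$and`. A small sketch of a translator for the two condition types used on this page; `qdrantToPineconeFilter` is a hypothetical helper, not part of either SDK, and it handles only `match.value` and `range` (`gt`/`gte`/`lt`/`lte`) conditions under `must`.

```javascript
// Map Qdrant range keys to Pinecone comparison operators.
const RANGE_OPS = { gt: '$gt', gte: '$gte', lt: '$lt', lte: '$lte' };

// Translate a Qdrant { must: [...] } filter into a Pinecone metadata filter.
// Hypothetical helper: supports only match.value and range conditions.
function qdrantToPineconeFilter(filter) {
  const clauses = filter.must.map(cond => {
    if (cond.match) {
      return { [cond.key]: { $eq: cond.match.value } };
    }
    if (cond.range) {
      const ops = {};
      for (const [k, v] of Object.entries(cond.range)) {
        if (!Object.prototype.hasOwnProperty.call(RANGE_OPS, k)) {
          throw new Error(`unsupported range key: ${k}`);
        }
        ops[RANGE_OPS[k]] = v;
      }
      return { [cond.key]: ops };
    }
    throw new Error(`unsupported condition on ${cond.key}`);
  });
  // A single clause needs no $and wrapper.
  return clauses.length === 1 ? clauses[0] : { $and: clauses };
}

const pineconeFilter = qdrantToPineconeFilter({
  must: [
    { key: 'category', match: { value: 'kitchen' } },
    { key: 'price', range: { lte: 100 } },
  ],
});
console.log(JSON.stringify(pineconeFilter));
// {"$and":[{"category":{"$eq":"kitchen"}},{"price":{"$lte":100}}]}
```

Wrapping multiple clauses in an explicit `$and` keeps the intent obvious; Qdrant's `should` and `must_not` would need `$or` and negated operators and are left out of this sketch.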
A Complete RAG Loop
```javascript
// Assumes a 'knowledge' collection whose points store chunk text in payload.text
async function ragQuery(question) {
  // 1. Embed the question (same model as the indexed documents)
  const qEmb = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });
  // 2. Retrieve the top 5 relevant chunks
  const docs = await qdrant.search('knowledge', {
    vector: qEmb.data[0].embedding,
    limit: 5,
    with_payload: true,
  });
  // 3. Build the context from the retrieved text
  const context = docs.map(d => d.payload.text).join('\n\n---\n\n');
  // 4. Ask the LLM, grounding it in the context
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'Answer using only the provided context.' },
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });
  return completion.choices[0].message.content;
}
```
That's RAG in about 20 lines. Production RAG adds a chunking strategy, hybrid retrieval, reranking, citation tracking, and evaluation, but the core loop stays the same.
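Step 3 joins every retrieved chunk, which can overflow the model's context window once documents get long. One common refinement, sketched here under stated assumptions, is to number each chunk (so the model can cite `[1]`, `[2]`, ...) and stop adding chunks once a size budget is reached. `buildContext` is a hypothetical helper; it assumes each hit carries its text in `payload.text`, as `ragQuery` does, and it budgets in characters as a crude stand-in for tokens.

```javascript
// Build a numbered context string from search hits, staying within maxChars.
// Hypothetical helper: assumes hit.payload.text holds the chunk text.
// Characters are a crude proxy for tokens; use a tokenizer for real budgets.
function buildContext(hits, maxChars = 8000) {
  const separator = '\n\n---\n\n';
  const parts = [];
  let used = 0;
  for (const hit of hits) {
    const text = hit.payload?.text;
    if (!text) continue; // skip hits without text
    const part = `[${parts.length + 1}] ${text}`;
    const cost = part.length + (parts.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) break; // hits arrive best-first, so stop here
    parts.push(part);
    used += cost;
  }
  return parts.join(separator);
}

const hits = [
  { score: 0.82, payload: { text: 'Qdrant stores vectors in collections.' } },
  { score: 0.71, payload: { text: 'Payload is metadata returned with hits.' } },
];
console.log(buildContext(hits));
```

Inside `ragQuery`, `buildContext(docs)` would replace the plain `join`, and the system prompt could then ask the model to cite chunk numbers.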
Tear Down
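```shell
docker compose down -v # -v also deletes the qdrant-data volume and every stored vector
```
What's Next
You can now store and search vectors. In real applications, retrieval quality usually improves when you combine vector search with keyword search: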
- Hybrid Search — combining keyword and vector for better results, chunking, reranking
- Best Practices — choosing embedding models, dimensions, indexes, cost