Run Qdrant in Docker, generate embeddings, store and query - hello-world semantic search

Getting Started

This page boots Qdrant locally, generates embeddings, stores them, and runs a semantic-search query end-to-end. The same patterns transfer to pgvector, Pinecone, Weaviate, and Chroma.

Run Qdrant

# docker-compose.yml
services:
  qdrant:
    image: qdrant/qdrant:v1.10.1
    ports:
      - "6333:6333"   # REST
      - "6334:6334"   # gRPC
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:

docker compose up -d
curl http://localhost:6333/healthz
# healthz check passed

open http://localhost:6333/dashboard   # web UI (open is macOS; use xdg-open on Linux)
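
The container can take a moment to accept connections after docker compose up -d, so a script that runs right after startup may want to poll the health endpoint first. The following is a minimal sketch of that idea, not part of any Qdrant client: the helper name waitForQdrant and its retries/delayMs options are hypothetical, and the fetch-based check assumes Node 18+ (global fetch).

```javascript
// Hypothetical helper: calls `check` until it returns true or retries run out.
// `check` is injected so the retry loop can be exercised without a running server.
async function waitForQdrant(check, { retries = 20, delayMs = 500 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      if (await check()) return attempt; // how many attempts it took
    } catch {
      // Connection refused while the container is still starting; retry below.
    }
    if (attempt < retries) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Qdrant not ready after ${retries} attempts`);
}

// Health check against the endpoint used above (assumes Node 18+ global fetch).
const qdrantHealthy = async () =>
  (await fetch('http://localhost:6333/healthz')).ok;

// Usage: await waitForQdrant(qdrantHealthy);
```

Injecting the check keeps the retry logic independent of the HTTP call, so the same loop could wrap any other readiness probe.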

Create a Collection

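A collection is like a table: it holds vectors that share a dimension count and a distance metric. The size must match the embedding model's output; text-embedding-3-small, used below, produces 1536-dimensional vectors by default.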

curl -X PUT http://localhost:6333/collections/products \
  -H 'Content-Type: application/json' \
  --data '{
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }'
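
The same call can be made from the JavaScript client that is installed later on this page. The sketch below creates the collection only when it is missing, so a setup script can be re-run safely. ensureCollection is a hypothetical helper name; getCollections and createCollection are the client methods it relies on, and client is assumed to be a QdrantClient instance.

```javascript
// Hypothetical helper: create the collection only if it does not exist yet.
// `client` is assumed to be a QdrantClient from @qdrant/js-client-rest.
async function ensureCollection(client, name, size, distance = 'Cosine') {
  const { collections } = await client.getCollections();
  if (collections.some(c => c.name === name)) return false; // already present
  await client.createCollection(name, { vectors: { size, distance } });
  return true; // newly created
}

// Usage:
//   const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
//   await ensureCollection(qdrant, 'products', 1536);
```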

Distance metric	When
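Cosine	Most text embeddings (OpenAI, BGE, Voyage); compares angle, not magnitude
Dot	When vectors are already normalized to unit length; the ranking then matches cosine without a normalization step
Euclidean	When magnitude matters (rare for text); the value in the Qdrant API is Euclid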

For OpenAI / most embedding models: Cosine.
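
To make the difference between the metrics concrete, here is a small sketch in plain JavaScript. The two vectors are made up for illustration: they point in the same direction but have different lengths, so cosine treats them as identical while Euclidean distance does not. It also shows why dot product is a fair substitute for cosine once vectors are normalized.

```javascript
// Plain-JS versions of the three metrics, for illustration only.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = a => Math.sqrt(dot(a, a));
const cosine = (a, b) => dot(a, b) / (norm(a) * norm(b));
const euclidean = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
const normalize = a => {
  const n = norm(a);
  return a.map(x => x / n);
};

// Made-up vectors: same direction, b is twice as long as a.
const a = [3, 4];
const b = [6, 8];

console.log(cosine(a, b));                     // 1: direction is identical
console.log(euclidean(a, b));                  // 5: the lengths differ
console.log(dot(normalize(a), normalize(b)));  // about 1: dot on unit vectors behaves like cosine
```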

Generate Embeddings and Upsert

npm install @qdrant/js-client-rest openai

import { QdrantClient } from '@qdrant/js-client-rest';
import OpenAI from 'openai';

const qdrant = new QdrantClient({ url: 'http://localhost:6333' });
const openai = new OpenAI();   // requires OPENAI_API_KEY

const products = [
  { id: 1, name: 'Espresso Maker', description: 'Dual-boiler espresso machine for coffee enthusiasts' },
  { id: 2, name: 'French Press',   description: 'Classic 8-cup press pot for rich coffee' },
  { id: 3, name: 'Coffee Grinder', description: 'Burr grinder for consistent particle size' },
  { id: 4, name: 'Reading Lamp',   description: 'Warm-light LED desk lamp for evening reading' },
  { id: 5, name: 'Tea Kettle',     description: 'Stovetop kettle with whistle' },
];

// Generate embeddings for all products
const embeddings = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: products.map(p => `${p.name}: ${p.description}`),
});

// Upsert into Qdrant
await qdrant.upsert('products', {
  points: products.map((p, i) => ({
    id: p.id,
    vector: embeddings.data[i].embedding,
    payload: { name: p.name, description: p.description },
  })),
});

payload is arbitrary JSON metadata that Qdrant stores with each point and returns with results, similar to _source in Elasticsearch.
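
The upsert above pairs products with embeddings by array position. A slightly more defensive variant, sketched below, pairs them by the index field that each item in the OpenAI embeddings response carries, and fails early if a vector's length does not match the collection size. toPoints is a hypothetical helper name, and the expectedSize parameter is an assumption of this sketch.

```javascript
// Hypothetical helper: turns products plus embedding results into Qdrant points.
// `embeddingData` is the `data` array of an embeddings response: [{ index, embedding }, ...].
function toPoints(products, embeddingData, expectedSize) {
  if (products.length !== embeddingData.length) {
    throw new Error(`got ${embeddingData.length} embeddings for ${products.length} products`);
  }
  // Sort a copy by `index` so pairing does not depend on response order.
  const ordered = [...embeddingData].sort((x, y) => x.index - y.index);
  return products.map((product, i) => {
    const vector = ordered[i].embedding;
    if (vector.length !== expectedSize) {
      throw new Error(`product ${product.id}: vector has ${vector.length} dims, expected ${expectedSize}`);
    }
    const { id, ...payload } = product; // everything except id becomes payload
    return { id, vector, payload };
  });
}

// Usage with the names from this page:
//   await qdrant.upsert('products', { points: toPoints(products, embeddings.data, 1536) });
```

A dimension mismatch is otherwise only reported by the server at upsert time, so checking it locally can make the error easier to trace.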

Query

// Embed the query the SAME way
const queryEmb = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'something to make coffee in the morning',
});

// Find top 3 most similar
const results = await qdrant.search('products', {
  vector: queryEmb.data[0].embedding,
  limit: 3,
  with_payload: true,
});

for (const hit of results) {
  console.log(`${hit.score.toFixed(3)}  ${hit.payload.name}`);
}

Expected output (rough):

0.523  Espresso Maker
0.487  French Press
0.412  Coffee Grinder

No product description contains the word "morning"; vector search matched on meaning. The tea kettle and reading lamp are missing from the top 3 because their embeddings are further from the query.
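
Conceptually, the search scores every stored vector against the query vector and keeps the best limit hits; the ANN index exists to approximate that ranking without scanning everything. The sketch below is an exact brute-force version of the idea, not how Qdrant is implemented. The 2-dimensional vectors and the names A, B, C are made up so the ranking is easy to follow by eye.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSim(a, b) {
  let dotAB = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotAB += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dotAB / Math.sqrt(normA * normB);
}

// Exact (brute-force) search: score every point, sort descending, keep `limit`.
function bruteForceSearch(queryVector, points, limit) {
  return points
    .map(p => ({ id: p.id, payload: p.payload, score: cosineSim(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Made-up toy data: A is nearly parallel to the query, C is orthogonal to it.
const toyPoints = [
  { id: 1, vector: [0.0, 1.0], payload: { name: 'C' } },
  { id: 2, vector: [0.9, 0.1], payload: { name: 'A' } },
  { id: 3, vector: [0.5, 0.5], payload: { name: 'B' } },
];

const toyHits = bruteForceSearch([1, 0], toyPoints, 2);
console.log(toyHits.map(h => h.payload.name)); // A first, then B
```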

Add Metadata Filtering

Filter on metadata while doing vector search:

// Index the payload fields used in filters so filtering stays fast
await qdrant.createPayloadIndex('products', { field_name: 'category', field_schema: 'keyword' });
await qdrant.createPayloadIndex('products', { field_name: 'price', field_schema: 'float' });

// Upsert again so each point's payload also carries category and price
// products[i].payload.category = ...

// Query with filter (named `filtered` because `results` is already declared above)
const filtered = await qdrant.search('products', {
  vector: queryEmb.data[0].embedding,
  filter: {
    must: [
      { key: 'category', match: { value: 'kitchen' } },
      { key: 'price', range: { lte: 100 } },
    ],
  },
  limit: 5,
});

Qdrant applies the filter as part of the ANN search instead of afterwards. That is much faster than retrieving 1000 results and filtering them in your code, and the search still returns up to limit matching points.
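
Besides speed, filtering after the fact has a correctness cost: if you take the top k hits first and filter them in your code, you can end up with fewer than k results even though more matches exist. The sketch below shows this with made-up ids, categories, and scores; topKThenFilter and filterThenTopK are hypothetical names for the two orderings.

```javascript
// Made-up hits, already sorted by score (best first).
const ranked = [
  { id: 1, category: 'office',  score: 0.9 },
  { id: 2, category: 'kitchen', score: 0.8 },
  { id: 3, category: 'office',  score: 0.7 },
  { id: 4, category: 'kitchen', score: 0.6 },
  { id: 5, category: 'kitchen', score: 0.5 },
];

const isKitchen = hit => hit.category === 'kitchen';

// Post-filtering: take the top k, then drop what does not match.
const topKThenFilter = (hits, k, keep) => hits.slice(0, k).filter(keep);

// Filtering as part of the search: only matching hits compete for the k slots.
const filterThenTopK = (hits, k, keep) => hits.filter(keep).slice(0, k);

console.log(topKThenFilter(ranked, 3, isKitchen).map(h => h.id)); // only id 2 survives
console.log(filterThenTopK(ranked, 3, isKitchen).map(h => h.id)); // ids 2, 4, 5
```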

pgvector Equivalent

If you'd rather use Postgres:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  description TEXT,
  category TEXT,
  embedding vector(1536)
);

CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

-- Insert (category is set so the filtered query below can match this row)
INSERT INTO products (name, description, category, embedding)
VALUES ('Espresso Maker', '...', 'kitchen', '[0.21, -0.43, ...]'::vector);

-- Query
SELECT id, name, 1 - (embedding <=> '[0.19, -0.41, ...]'::vector) AS similarity
FROM products
WHERE category = 'kitchen'
ORDER BY embedding <=> '[0.19, -0.41, ...]'::vector
LIMIT 5;

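The <=> operator is cosine distance, so 1 minus it gives cosine similarity. hnsw is the graph-based ANN index with the better speed/recall trade-off at query time; ivfflat is an alternative that builds faster and uses less memory. Postgres handles filtering in the same query, which is convenient if your data already lives there.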
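
From Node, the embedding has to be sent as a pgvector text literal such as '[0.1,0.2,0.3]'. The sketch below builds that literal and a parameterized version of the query above in the { text, values } shape that node-postgres accepts. toVectorLiteral and similarProductsQuery are hypothetical helper names, and running the query assumes a pg Pool that is not shown here.

```javascript
// Hypothetical helper: formats a number[] as a pgvector text literal.
function toVectorLiteral(embedding) {
  if (!embedding.every(x => Number.isFinite(x))) {
    throw new Error('embedding must contain only finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Hypothetical helper: parameterized version of the similarity query above.
// Parameters avoid pasting a 1536-number literal (or user input) into the SQL string.
function similarProductsQuery(embedding, category, limit = 5) {
  return {
    text: [
      'SELECT id, name, 1 - (embedding <=> $1::vector) AS similarity',
      'FROM products',
      'WHERE category = $2',
      'ORDER BY embedding <=> $1::vector',
      'LIMIT $3',
    ].join('\n'),
    values: [toVectorLiteral(embedding), category, limit],
  };
}

// Usage (assumes `pool` is a pg.Pool and `queryVector` is a number[]):
//   const { rows } = await pool.query(similarProductsQuery(queryVector, 'kitchen'));
```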

Pinecone Equivalent

import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index('products');

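// Upsert (`embedding` is a number[] from the embeddings call shown earlier;
// category is included so the filtered query below can match this record)
await index.upsert([
  { id: '1', values: embedding, metadata: { name: 'Espresso Maker', category: 'kitchen' } },
]);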

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  filter: { category: { '$eq': 'kitchen' } },
  includeMetadata: true,
});

Different SDK shape; same conceptual operations.
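
To illustrate how close the two filter shapes are, the sketch below converts one neutral description of a filter into both the Qdrant form used earlier and the Pinecone form used here. The neutral spec ({ equals, max }) and both function names are made up for this example; only the output shapes (must/match/range for Qdrant, $eq/$lte for Pinecone) come from the respective APIs.

```javascript
// Made-up neutral spec: { equals: { field: value }, max: { field: number } }

// Qdrant shape: a list of conditions under `must`.
function toQdrantFilter({ equals = {}, max = {} }) {
  return {
    must: [
      ...Object.entries(equals).map(([key, value]) => ({ key, match: { value } })),
      ...Object.entries(max).map(([key, lte]) => ({ key, range: { lte } })),
    ],
  };
}

// Pinecone shape: one object keyed by field, with $-prefixed operators.
function toPineconeFilter({ equals = {}, max = {} }) {
  const filter = {};
  for (const [key, value] of Object.entries(equals)) {
    filter[key] = { $eq: value };
  }
  for (const [key, lte] of Object.entries(max)) {
    filter[key] = { ...filter[key], $lte: lte };
  }
  return filter;
}

const spec = { equals: { category: 'kitchen' }, max: { price: 100 } };
console.log(JSON.stringify(toQdrantFilter(spec)));
console.log(JSON.stringify(toPineconeFilter(spec)));
```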

A Complete RAG Loop

async function ragQuery(question) {
  // 1. Embed the question
  const qEmb = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });

  // 2. Retrieve top 5 relevant docs
  const docs = await qdrant.search('knowledge', {
    vector: qEmb.data[0].embedding,
    limit: 5,
    with_payload: true,
  });

  // 3. Build context
  const context = docs.map(d => d.payload.text).join('\n\n---\n\n');

  // 4. Ask the LLM with context
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'Answer using only the context below.' },
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  return completion.choices[0].message.content;
}

That's RAG in under 30 lines. Production RAG adds a chunking strategy, hybrid retrieval, reranking, citation tracking, and evals, but the core loop is this.
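
Step 3 is the part that most often needs a small upgrade first: retrieved chunks can be long, and the prompt has a size limit. The sketch below is one possible replacement for the join in step 3. It numbers each chunk so the model can refer to sources and stops adding chunks once a character budget is used up. buildContext and maxChars are hypothetical names, the budget counts chunk text only (not separators), and characters are only a rough stand-in for tokens.

```javascript
// Hypothetical helper: joins retrieved chunks into one context string.
// `docs` are search hits with the text in `payload.text`, sorted best-first.
function buildContext(docs, maxChars = 6000) {
  const parts = [];
  let used = 0;
  for (const [i, doc] of docs.entries()) {
    const part = `[${i + 1}] ${doc.payload.text}`;
    // Hits arrive best-first, so once the budget is hit the tail is dropped.
    if (used + part.length > maxChars) break;
    parts.push(part);
    used += part.length;
  }
  return parts.join('\n\n---\n\n');
}

// Usage inside ragQuery, replacing the join in step 3:
//   const context = buildContext(docs);
```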

Tear Down

docker compose down -v   # -v also deletes the qdrant-data volume and everything stored in it

What's Next

You can now store and search vectors. Real-world result quality usually improves when vector search is combined with keyword search:

Hybrid Search — combining keyword and vector for better results, chunking, reranking
Best Practices — choosing embedding models, dimensions, indexes, cost
