06.C

Microservices · Docker

A Dockerized embedding pipeline. One service calls the embedding model and emits vectors; another writes them into the vector DB. Scale horizontally with one flag.
$ docker compose up -d --scale embedder=1
[Interactive pipeline diagram: ingest → text chunks queue → embedder × 1 (docker, cpu-bound) → vector queue (redis, pub/sub) → vector-writer (docker, i/o-bound) → vector db (opensearch, shards: 4, p95 upsert: 38ms). Initial state: idle, throughput 0 chunks/s, 0 indexed vectors.]
// the docker-compose for this
# docker-compose.yml — embedding pipeline
version: "3.9"
services:
  embedder:
    image: jun/embedder-svc:1.0
    deploy:
      replicas: ${EMBEDDER_REPLICAS:-1}
    environment:
      - EMBED_API_URL=https://embed.example.com
      - QUEUE_OUT=redis://queue:6379/vectors
    depends_on: [queue]

  vector-writer:
    image: jun/vector-writer-svc:1.0
    environment:
      - QUEUE_IN=redis://queue:6379/vectors
      - VECTOR_DB_URL=https://opensearch:9200
    depends_on: [queue, opensearch]

  queue:
    image: redis:7-alpine

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment: [discovery.type=single-node]

The two services are split deliberately. The embedder carries the heavy per-chunk work (a model API call plus tensor serialisation), so we scale it horizontally. The writer is I/O-bound (it talks to the vector DB), so a single instance with batching is usually enough.
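To make "a single instance with batching" concrete, here is a minimal sketch of how a writer might group vectors before each upsert. The names `batched`, `write_all` and `upsert_batch` are hypothetical and are not taken from the vector-writer service; the real bulk call to the vector DB is passed in as a plain callable.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Vector = Sequence[float]


def batched(items: Iterable[Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Group a stream of vectors into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[Vector] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def write_all(
    vectors: Iterable[Vector],
    upsert_batch: Callable[[List[Vector]], None],
    batch_size: int = 100,
) -> int:
    """Send vectors to the DB in batches; return the number of upsert calls.

    upsert_batch is a stand-in (assumption) for whatever bulk-write call
    the real writer makes against the vector DB.
    """
    calls = 0
    for batch in batched(vectors, batch_size):
        upsert_batch(batch)
        calls += 1
    return calls
```

With 250 vectors and a batch size of 100, the writer makes three round trips instead of 250, which is why one batching instance can often keep up with several embedders.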

A small queue in the middle decouples them — writers don't block embedders, and a brief writer outage just backs the queue up instead of taking the pipeline down.
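The decoupling can be sketched with an in-process queue standing in for Redis. This is an illustration under stated assumptions, not the services' actual code: `embed_into` and `drain` are hypothetical helpers, and the "vectors" are placeholders rather than real embeddings.

```python
import queue


def embed_into(q: "queue.Queue", chunks: list) -> None:
    """Hypothetical embedder step: push one placeholder vector per chunk."""
    for chunk in chunks:
        q.put([float(len(chunk))])  # placeholder value, not a real embedding


def drain(q: "queue.Queue") -> list:
    """Hypothetical writer step: pull everything currently queued."""
    out = []
    while True:
        try:
            out.append(q.get_nowait())
        except queue.Empty:
            return out


vector_queue: "queue.Queue" = queue.Queue()

# Writer is down: the embedder keeps producing and the queue backs up.
embed_into(vector_queue, ["a", "bb", "ccc"])
backlog = vector_queue.qsize()

# Writer comes back and catches up on the backlog.
written = drain(vector_queue)
```

One caveat on the Redis side: this backlog behaviour needs a buffering structure such as a list or a stream. Plain Redis pub/sub delivers only to subscribers that are connected at publish time, so messages sent during a writer outage would be lost rather than queued.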

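Bumping the replica count (--scale embedder=N, or EMBEDDER_REPLICAS) is the entire change needed to scale embedder throughput roughly ×N, at least until the model API or the writer becomes the limit. Same image, same env, no code edit. That's the part people undersell about Docker: scale and deploy live in one config.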
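As a rough back-of-the-envelope model of what the replica count buys, pipeline throughput can be approximated as the smaller of total embedder output and writer capacity. The function name and all rates below are hypothetical illustration values, not measurements from this pipeline.

```python
def estimated_throughput(
    replicas: int, per_replica: float, writer_capacity: float
) -> float:
    """Approximate chunks/s: embedders scale linearly until the writer saturates.

    per_replica and writer_capacity are assumed rates in chunks/s.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return min(replicas * per_replica, writer_capacity)


# Assumed rates: 20 chunks/s per embedder, 100 chunks/s for the single writer.
for n in (1, 4, 8):
    print(n, estimated_throughput(n, 20.0, 100.0))
```

Under these assumed rates, going from 1 to 4 replicas quadruples throughput, while going from 4 to 8 gains little because the single writer becomes the limit. That is the point at which writer batching, or a second writer, starts to matter.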

docker compose up -d