06.C

Microservices · Docker

A Dockerized embedding pipeline. One service calls the embedding model and emits vectors; another writes them into the vector DB. Scale horizontally with one flag.

$ docker compose up -d --scale embedder=1

[Pipeline diagram: ingest (text chunks queue) → embedder × N replicas (Docker, CPU-bound) → vector queue (Redis pub/sub) → vector-writer (Docker, I/O-bound) → vector DB (OpenSearch; indexed vectors, shards: 4, p95 upsert: 38 ms)]

// the docker-compose for this


# docker-compose.yml — embedding pipeline
services:
  embedder:
    image: jun/embedder-svc:1.0
    deploy:
      replicas: ${EMBEDDER_REPLICAS:-1}
    environment:
      - EMBED_API_URL=https://embed.example.com
      - QUEUE_OUT=redis://queue:6379/vectors
    depends_on: [queue]

  vector-writer:
    image: jun/vector-writer-svc:1.0
    environment:
      - QUEUE_IN=redis://queue:6379/vectors
      - VECTOR_DB_URL=https://opensearch:9200
    depends_on: [queue, opensearch]

  queue:
    image: redis:7-alpine

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment: [discovery.type=single-node]

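The two services are split deliberately. The embedder is the slow stage: each chunk waits on the model API and then its tensors are serialised, so we scale it horizontally to keep more requests in flight. (Strictly, waiting on a remote API is latency-bound rather than CPU-bound; the serialisation is the CPU-heavy part.) The writer is I/O-bound (it talks to the vector DB), so a single instance that batches its upserts is usually enough.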
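A minimal sketch of the batching loop the writer could run, assuming an in-process `queue.Queue` in place of the Redis queue and a hypothetical `bulk_upsert` callback in place of the OpenSearch bulk request. The batch size and timeout are illustrative values, not taken from the actual service.

```python
import queue
from typing import Callable, List, Sequence

Vector = Sequence[float]

def drain_batch(q: "queue.Queue[Vector]", max_batch: int, timeout: float) -> List[Vector]:
    """Wait up to `timeout` s for one item, then grab whatever else is already
    queued, stopping at `max_batch` items. Returns [] if nothing arrived."""
    batch: List[Vector] = []
    try:
        batch.append(q.get(timeout=timeout))
    except queue.Empty:
        return batch
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # queue drained: flush a partial batch rather than wait
    return batch

def run_writer(q: "queue.Queue[Vector]",
               bulk_upsert: Callable[[List[Vector]], None],
               max_batch: int = 500, timeout: float = 0.5) -> int:
    """Flush batches until the queue stays empty for `timeout` s.
    Returns the number of bulk calls made (hypothetical demo behaviour)."""
    calls = 0
    while True:
        batch = drain_batch(q, max_batch, timeout)
        if not batch:
            return calls
        bulk_upsert(batch)  # one round trip per batch, not per vector
        calls += 1
```

With 1,200 queued vectors and `max_batch=500`, the writer makes three round trips (500, 500, 200) instead of 1,200. Unlike a long-running writer, this loop returns once the queue stays empty for `timeout` seconds so it can be run as a standalone demo.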

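A small queue in the middle decouples them: embedders don't block on the writer, and a brief writer outage just backs up the queue instead of taking the pipeline down. That buffering needs a structure that holds messages, such as a Redis list or stream. Plain Redis pub/sub only delivers to subscribers connected at publish time, so anything published during a writer outage would be lost.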
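A toy, tick-based simulation of that backlog behaviour, assuming an unbounded in-memory deque standing in for the Redis queue (a real one is bounded by memory) and made-up per-tick rates. `simulate_backlog` is a hypothetical helper, not part of either service.

```python
from collections import deque
from typing import Deque, List, Tuple

def simulate_backlog(ticks: int, produce_per_tick: int, drain_per_tick: int,
                     writer_down: range) -> Tuple[List[int], int]:
    """Embedders enqueue `produce_per_tick` items every tick; the writer dequeues
    up to `drain_per_tick` items, except during ticks in `writer_down`.
    Returns (queue depth after each tick, total items written)."""
    q: Deque[int] = deque()
    depths: List[int] = []
    written = 0
    next_id = 0
    for t in range(ticks):
        for _ in range(produce_per_tick):  # producers never wait on the writer
            q.append(next_id)
            next_id += 1
        if t not in writer_down:
            n = min(drain_per_tick, len(q))
            for _ in range(n):
                q.popleft()
            written += n
        depths.append(len(q))
    return depths, written
```

With `simulate_backlog(8, 10, 20, range(2, 4))`, the depth climbs to 20 while the writer is down for ticks 2 and 3, then drains back to zero, and all 80 produced items are written. Recovery only happens because the writer's drain rate (20 per tick) exceeds the production rate (10 per tick); with equal rates the backlog would never shrink.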

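Changing the replica count, either with --scale embedder=N in the command at the top or by setting EMBEDDER_REPLICAS=N, is the entire change needed to scale the embedding stage ×N: same image, same env, no code edit. Throughput grows roughly linearly with N until the model API's rate limit or the single writer becomes the bottleneck. That's the part people undersell about Docker: scaling and deployment live in one config.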
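A back-of-the-envelope model of what ×N means in practice, assuming each stage has a fixed steady-state rate and the slowest stage sets overall throughput. The per-replica rate, API limit and writer limit used below are hypothetical numbers, not measurements from this pipeline.

```python
import math
from typing import Optional

def pipeline_throughput(replicas: int, per_replica: float,
                        api_limit: float, writer_limit: float) -> float:
    """Steady-state chunks/s: embedders scale linearly, but the slowest stage wins."""
    return min(replicas * per_replica, api_limit, writer_limit)

def replicas_needed(target: float, per_replica: float,
                    api_limit: float, writer_limit: float) -> Optional[int]:
    """Smallest replica count that reaches `target` chunks/s, or None if the
    API or writer caps throughput below it (more replicas would not help)."""
    if target > min(api_limit, writer_limit):
        return None
    return math.ceil(target / per_replica)
```

With 40 chunks/s per replica, a 300 chunks/s API limit and a writer that handles 1,000 chunks/s, going from 1 to 4 replicas quadruples throughput (40 to 160), but 8 replicas yield 300 rather than 320 because the API limit is hit first. Past that point the fix is a higher API quota, not another replica.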