06.C
Microservices · Docker
A Dockerized embedding pipeline. One service calls the embedding model and emits vectors; another writes them into the vector DB. Scale horizontally with one flag.
$ docker compose up -d --scale embedder=1
[Pipeline diagram: ingest → text chunks queue → embedder × 1 (Docker, CPU-bound) → vector queue (Redis pub/sub) → vector-writer (Docker, I/O-bound) → vector DB (OpenSearch, 4 shards, p95 upsert 38 ms). Initial state: idle, throughput 0 chunks/s, 0 completed, 0 indexed vectors.]
// the docker-compose for this
# docker-compose.yml — embedding pipeline
version: "3.9"
services:
  embedder:
    image: jun/embedder-svc:1.0
    deploy:
      # replica count; override with EMBEDDER_REPLICAS or --scale embedder=N
      replicas: ${EMBEDDER_REPLICAS:-1}
    environment:
      - EMBED_API_URL=https://embed.example.com
      - QUEUE_OUT=redis://queue:6379/vectors
    depends_on: [queue]
  vector-writer:
    image: jun/vector-writer-svc:1.0
    environment:
      - QUEUE_IN=redis://queue:6379/vectors
      - VECTOR_DB_URL=https://opensearch:9200
    depends_on: [queue, opensearch]
  queue:
    image: redis:7-alpine
  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment: [discovery.type=single-node]

The two services are split deliberately. The embedder is the throughput bottleneck: each replica spends its time waiting on the model API and serialising tensors (the CPU-bound part), so we scale it horizontally. The writer is I/O-bound (it talks to the vector DB), so a single instance with batching is usually enough.
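To make the "single instance with batching" point concrete, here is a minimal sketch of how a writer might group vectors before each upsert. The names `batched`, `write_all`, and `upsert_batch`, the `(chunk_id, embedding)` tuple shape, and the batch size of 100 are all assumptions for illustration; the actual vector-writer-svc code is not shown in this section.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (chunk_id, embedding)
Vector = Tuple[str, List[float]]


def batched(items: Iterable[Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Group a stream of vectors into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[Vector] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def write_all(
    items: Iterable[Vector],
    upsert_batch: Callable[[List[Vector]], None],
    batch_size: int = 100,
) -> int:
    """Send vectors in batches; return the number of upsert calls made."""
    calls = 0
    for batch in batched(items, batch_size):
        upsert_batch(batch)  # stand-in for one bulk request to the vector DB
        calls += 1
    return calls
```

With 250 vectors and a batch size of 100, the writer makes three round trips instead of 250, which is why one I/O-bound instance can often keep pace with several embedders.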
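A small queue in the middle decouples them: writers don't block embedders, and a brief writer outage just backs the queue up instead of taking the pipeline down. That buffering assumes the vectors sit in a Redis list or stream; plain pub/sub delivers only to subscribers connected at the time, so messages published during an outage would be lost.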
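The decoupling can be sketched in-process with Python's standard `queue.Queue` standing in for Redis. This is an illustration under assumptions, not the services' real code: `embed_all`, `drain`, and the placeholder one-number embedding are hypothetical.

```python
import queue
from typing import List, Tuple

Item = Tuple[str, List[float]]


def embed_all(chunks: List[str], out_q: "queue.Queue[Item]") -> None:
    """Producer: push one (chunk, vector) pair per chunk onto the queue."""
    for chunk in chunks:
        # Placeholder embedding; a real embedder would call the model API here.
        out_q.put((chunk, [float(len(chunk))]))


def drain(in_q: "queue.Queue[Item]", sink: List[Item]) -> int:
    """Consumer: move everything currently queued into sink; return the count."""
    moved = 0
    while True:
        try:
            item = in_q.get_nowait()
        except queue.Empty:
            return moved
        sink.append(item)
        moved += 1


# The embedder keeps producing while the writer is "down"...
q: "queue.Queue[Item]" = queue.Queue()
embed_all(["a", "bb", "ccc"], q)
backlog = q.qsize()

# ...and the writer catches up once it is back.
stored: List[Item] = []
written = drain(q, stored)
```

The producer never waits on the consumer; the backlog simply grows until the writer returns and drains it.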
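Bumping the replica count (the --scale embedder=N flag at the top, or EMBEDDER_REPLICAS in the compose file) is the entire change needed to scale embedding throughput by roughly ×N, as long as the model API and the writer keep up. Same image, same env, no code edit. That's the part people undersell about Docker: scale and deploy live in one config.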
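As a rough back-of-the-envelope model of the ×N claim, the pipeline sustains whichever stage is slower. The function name and the rates below (20 chunks/s per embedder, 100 chunks/s writer ceiling) are made-up numbers for illustration, not measurements from this pipeline.

```python
def pipeline_throughput(replicas: int, per_embedder: float, writer_cap: float) -> float:
    """Sustained chunks/s: embedders scale linearly until the writer saturates."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return min(replicas * per_embedder, writer_cap)


# Hypothetical rates: each embedder handles 20 chunks/s, the writer tops out at 100.
one = pipeline_throughput(1, 20.0, 100.0)    # embedder-limited
four = pipeline_throughput(4, 20.0, 100.0)   # still embedder-limited, 4x the rate
eight = pipeline_throughput(8, 20.0, 100.0)  # writer-limited; extra replicas add nothing
```

Under these assumed numbers, scaling is linear up to five replicas; beyond that the single writer becomes the limit.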