We process video, images, and documents through 20+ ML models simultaneously at Mixpeek. A single 10-minute video triggers transcription, visual embeddings, scene descriptions, face detection, object detection, brand safety classification, and more — all in parallel with different compute requirements.
We wrote up the full Ray architecture we use in production on KubeRay/GKE. Not a tutorial — more of a "here's what we actually run and what bit us."
Some highlights:
- *Custom resource isolation* — We use a synthetic `{"batch": 1}` resource to prevent batch pipeline tasks from starving Ray Serve inference replicas. Same cluster, zero interference, no runtime overhead.
- *Flexible actor pools* — Fixed-size `ActorPoolStrategy(size=8)` deadlocks when concurrent jobs compete for workers. `min_size=1, max_size=N` guarantees every job can make progress.
- *Shared preprocessing* — Naive approach runs S3 download + format normalization once per extractor. With 10 extractors on 1,000 files, that's 10,000 redundant reads. We preprocess once and fan out via Ray Dataset.
- *Distributed Qdrant writes* — Ray Data's `Datasink` API distributes vector DB writes across all workers with backpressure, instead of collecting everything on one node.
- *Fire-and-forget progress tracking* — A Ray actor as a shared counter lets workers report progress without blocking the pipeline.
- *Zero-CPU head node* — Learned this one the hard way when a runaway batch job took down our scheduler.
The post includes the KubeRay YAML, Ray Serve autoscaling configs, pipeline code, and the LocalStack parquet workaround that saved us hours of debugging silent hangs.
llm coordination is just one feature - the core reason (and why i built amux) was so that i could quickly delegate from my phone, see outputs, monitor, etc. without raw ssh.
A couple implementation details for anyone curious: CAD previews and exploded views are rendered client-side using Replicad (WASM) + WebGL, so there’s no server-side geometry rendering.
I also recorded a short walkthrough showing a build from prompt → parts → enclosure → validation:
a distributed compute framework for unstructured data that treats retrieval as a first-class citizen - it feels like we're rebuilding the modern data warehouse using all ai-native primitives. joins, clustering, retrieval, all using distributed compute/inference primitives.
you have no idea what you're talking about - every single country that experiences domestic terrorism relies on israeli intelligence for counter terrorism. almost all of europe, us, much of the middle east all have very active intelligence partnerships.
if you think it's one-sided you're either severely misinformed or bigoted.
Obviously I must be an anti-Semite if I don’t 150% support the politics of Israel and their brutality in the West Bank.
In reality though, I have completed 5 CENTCOM US military deployments. There are few people on HN more qualified to speak to the nature of US alliances in the region.
Israel is a terrorist state. I don't know what else to call it. They have a state policy of terrorizing their neighbors in order to get them to leave their land and have for decades. The fact that they also help spy on our citizens for our government should not be a reason to support them.
in what universe is that happening? you think the world is safer with even a 10% likelihood of the world's largest terror network getting access to WMD? you're off your rocker.
The US over the past few decades and Russia over the past five years and Israel over the past year have inflicted quite a bit of terror, and they all have nuclear weapons.
None of them would have done it if their victims had them.
Iran's contribution to inflicting misery, death, and indiscriminate destruction on the world is a rounding error in comparison, and it's bound by the same logic of MAD as anyone else is.
If it wasn't suicide and I was the big boss, I would get some nuclear subs for my irrelevant South American nation ASAP. The "rules based order" is just wet toilet paper, who's to say that in 50 years we or our neighbors aren't next?
Gringos have always been crazy, but now y'all are getting extra spicy. Qaddafi, Ukraine and now Iran. Get nukes or bust is the name of the game now.
Are you suggesting that states may bomb each other when they don't want to "take the risk" of the other state possibly carrying out a dangerous attack on them in the future?
Plus, the nuclear issue is the excuse, not the reason. Palestine, Lebanon, Syria (+ regime change, sorta), Iraq (+ regime change), Afghanistan and now Iran. All attacked repeatedly and extensively over the past two decades.
https://mixpeek.com/blog/ray-distributed-ml-pipeline-archite...
Happy to answer questions about any of the patterns or trade-offs.