I personally want to run linux and feel like I'll get a better price/GB offering that way. But, it is confusing to know how local models will actually work on those and the drawbacks of iGPU.
iGPUs are typically weak, and/or aren't capable of running the LLM so the CPU is used instead. You can run things this way, but it's not fast, and it gets slower as the models go up in size.
If you want things to run quickly, then aside from Macs, there's the 2025 ASUS ROG Flow Z13 which (afaik) is the only laptop with AMD's new Ryzen AI Max+ 395 processor. This is powerful and has up to 128GB of RAM that can be shared with the GPU, but they're very rare (and Mac-expensive) at the moment.
The other variable for running LLMs quickly is memory bandwidth; the Max+ 395 has 256GB/s, which is similar to the M4 Pro; the M4 Max chips are considerably higher. Apple landed on their feet on this one.
LLM evaluation on GPU and CPU is memory bandwidth constrained. The highest-end Apple machines are good for this because they have high memory bandwidth (~500GB/s) and up to ~128GB of it, not just because they can share that memory with the GPU (which any iGPU does). Most consumer machines are limited to two DDR5 channels (under ~100GB/s).
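A rough back-of-the-envelope for why bandwidth dominates: generating one token requires streaming (roughly) the entire model's weights through memory once, so decode speed is upper-bounded by bandwidth divided by model size. A minimal sketch, with illustrative model size and bandwidth numbers:

```python
# Rough decode-speed upper bound for a dense model:
# each token reads ~all weights once, so tok/s ≈ bandwidth / model size.
# Numbers below are illustrative, not benchmarks.

def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate for dense-model token generation speed."""
    return bandwidth_gb_s / model_size_gb

# A ~70B-parameter model at 4-bit quantization is roughly 40GB of weights.
model_gb = 40.0

for name, bw in [("dual-channel DDR5", 90.0),
                 ("Ryzen AI Max+ 395", 256.0),
                 ("M4 Max", 546.0)]:
    print(f"{name}: ~{est_tokens_per_sec(bw, model_gb):.1f} tok/s")
```

Real-world numbers land below this bound (compute overhead, cache behavior), but it explains the gap between a standard desktop and an M4 Max at the same RAM capacity.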
> So a home workstation with 64GB+ of RAM could get similar results?
Similar in quality, but CPU generation will be slower than what Macs can do.
What you can do with MoEs (GLMs and Qwens) is to run some experts (the shared ones usually) on a GPU (even a 12GB/16GB will do) and the rest from RAM on CPU. That will speed things up considerably (especially prompt processing). If you're interested in this, look up llama.cpp and especially ik_llama, which is a fork dedicated to this kind of selective offloading of experts.
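As a sketch of what that offloading looks like in practice: llama.cpp's `--override-tensor` (`-ot`) flag maps tensor-name regexes to devices, so you can offload all layers to the GPU and then pin the large per-expert FFN tensors back to CPU RAM. The model filename below is a placeholder, and the expert tensor names follow the common GGUF convention, so verify them against your actual model:

```shell
# Offload everything to GPU (-ngl 99), then override the routed-expert
# FFN tensors (the bulk of a MoE's weights) to stay in CPU RAM.
# Attention and shared tensors remain on the GPU, which is what speeds
# up prompt processing.
./llama-server \
  -m your-moe-model-Q4_K_M.gguf \
  -ngl 99 \
  -ot ".ffn_(up|down|gate)_exps.=CPU" \
  --ctx-size 16384
```

ik_llama exposes finer-grained variants of this, but the idea is the same: keep the dense, always-active tensors on the fast device and let the sparsely-activated experts live in system RAM.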
You can run it, it will just run on the CPU and will be pretty slow. Macs, like everyone in this thread said, use unified memory, so the 64GB is shared between CPU and GPU, while for you it's just 64GB for the CPU.