Is it possible that in a few years time, only Mac silicon and PCs with high-end GPUs will be required to run "In-home LLMs" affordably?

If we get closer to either "AGI" (whatever the hell that is) or at least a reasonably useful AutoGen/BabyAGI-like system that becomes popular for home use, those machines will be the only ones capable of running advanced LLMs without paying OpenAI, Microsoft, Amazon/AWS, etc. inordinate sums of money for what consumers will someday deem a utility.



> Is it possible that in a few years time, only Mac silicon and PCs with high-end GPUs will be required to run "In-home LLMs" affordably?

No. LLMs work well on Apple chips thanks to the unified memory, which lets the large models fit. I know of no reason why an x86 chip could not be designed in a similar way if desired. IANAChipDesigner, but I have worked for one of them.


FWIW, while Apple silicon can _run_ huge models thanks to the unified memory (not to be confused with shared memory), inference is pretty slow compared to dedicated GPUs, so it's a tradeoff. The significance of this PR is that inference speed can, at least in certain applications, be sped up using parallel decoding.
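For anyone unfamiliar with the idea: "parallel decoding" here usually means speculative decoding, where a cheap draft model guesses several tokens ahead and the big model verifies them all in one batched forward pass, so you pay the big model's latency once per batch instead of once per token. A minimal sketch of the accept/reject loop, with hypothetical deterministic stand-ins (`draft_next`, `target_next`) in place of real models:

```python
# Toy sketch of speculative (parallel) decoding.
# `draft_next` and `target_next` are hypothetical stand-ins for a cheap
# draft model and an expensive target model; real implementations verify
# the draft tokens with one batched forward pass of the target model.

def draft_next(context):
    # Cheap draft model: guesses the next token (made-up rule).
    return (context[-1] + 1) % 10

def target_next(context):
    # Expensive target model: disagrees with the draft after a 5.
    return 0 if context[-1] == 5 else (context[-1] + 1) % 10

def speculative_step(context, k=4):
    # 1) Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) Verify the draft left to right against the target model
    #    (in a real system, all k positions are scored in one pass).
    accepted, ctx = [], list(context)
    for t in draft:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # 3) Emit one guaranteed-correct target token at the mismatch point
    #    (or after the draft, if everything was accepted).
    accepted.append(target_next(ctx))
    return accepted

def generate(context, n_tokens, k=4):
    out = list(context)
    while len(out) < len(context) + n_tokens:
        out.extend(speculative_step(out, k))
    return out[:len(context) + n_tokens]
```

The output is identical to plain token-by-token decoding with the target model; the win is that each step emits up to k+1 tokens for a single target-model verification pass, which is exactly the latency trade this PR exploits.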



