
I have not found this to be true at all in my field (natural language generation).

We have a seven-figure GPU setup running 24/7 at 100% utilization just to handle inference.



Also true of self-driving. You train a perception model for a week and then log millions of vehicle-hours on inference.


How do you train new models if your GPUs are being used for inference? I guess the training happens significantly less frequently?

Forgive my ignorance.


We have different servers for each, but the split is usually 80%/20% for inference/training. As our product grows in usage, the 80% number keeps increasing.

That isn't because we train infrequently - we are almost always training new models. It's just that inference is so computationally expensive!
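
A rough back-of-envelope sketch of why inference ends up dominating (all numbers below are made up for illustration, not figures from our setup; the usual rule of thumb is ~2 x params FLOPs per token for a forward pass and ~6 x params per token for training):

    # Rough rule of thumb: a transformer forward pass costs ~2 * params
    # FLOPs per token; training (forward + backward) costs ~6 * params
    # FLOPs per token. All numbers here are hypothetical.
    params = 7e9                 # hypothetical 7B-parameter model
    train_tokens = 100e9         # hypothetical training run
    served_tokens_per_day = 5e9  # hypothetical production traffic

    train_flops = 6 * params * train_tokens
    inference_flops_per_day = 2 * params * served_tokens_per_day

    # With these numbers, cumulative inference compute overtakes the
    # entire training run after about 60 days of serving.
    days_to_match = train_flops / inference_flops_per_day
    print(f"inference matches training compute after ~{days_to_match:.0f} days")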


Are you training new models from scratch or just fine-tuning LLMs? I'm from the CV side, and we tend to train from scratch because we're still highly focused on finding new architectures and how to scale. The NLP people I know tend to start from existing LLM checkpoints, so their experiments are usually a lot cheaper.

Not that anyone should think either aspect (training or inference) is cheap.
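
To be concrete about the two options I mean, here's a minimal sketch using Hugging Face Transformers ("gpt2" is just a placeholder checkpoint):

    from transformers import AutoConfig, AutoModelForCausalLM

    # Fine-tuning: start from pretrained weights, so the training run only
    # has to adapt the model to your domain data.
    finetune_model = AutoModelForCausalLM.from_pretrained("gpt2")

    # From scratch: same architecture, but randomly initialized weights;
    # every parameter has to be learned, which is far more expensive.
    config = AutoConfig.from_pretrained("gpt2")
    scratch_model = AutoModelForCausalLM.from_config(config)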


Typically a different set of hardware for model training.



