
The $5.5m in compute wasn't for R1, it was for DeepSeek v3.

The R1 trick looks like it may be a whole lot cheaper than that: R1 apparently used just 800,000 samples. I don't fully understand the processing needed on top of those samples, but I get the impression it took a whole lot less compute than the $5.5m used to train v3.
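A rough back-of-envelope supports that impression. The per-sample token count below is an illustrative assumption (not from the source); the ~14.8T-token pretraining corpus is the figure DeepSeek reported for v3:

```python
# Back-of-envelope: fine-tuning data volume vs. pretraining data volume.
# TOKENS_PER_SAMPLE is an illustrative assumption, not a reported figure.

PRETRAIN_TOKENS = 14.8e12    # DeepSeek-V3's reported pretraining corpus (~14.8T tokens)
SAMPLES = 800_000            # samples reportedly used in R1's fine-tuning stage
TOKENS_PER_SAMPLE = 1_000    # assumed average sample length (illustrative)

finetune_tokens = SAMPLES * TOKENS_PER_SAMPLE
fraction = finetune_tokens / PRETRAIN_TOKENS

print(f"fine-tuning tokens: {finetune_tokens:.1e}")
print(f"fraction of pretraining corpus: {fraction:.1e}")
```

Even if each sample were ten times that length, the fine-tuning corpus would still be a tiny fraction of a percent of the pretraining data, so the compute on top of v3 should be far below the $5.5m figure.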


