
> In the case of gpt-oss 120B that would mean sqrt(5*120) ≈ 24B.

That's actually in line with what I had (unscientifically) expected. Claude Sonnet 4 seems to agree:

> The most accurate approach for your specific 120B MoE (5.1B active) would be to test it empirically against dense models in the 10-30B range.
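The geometric-mean rule of thumb being discussed can be sketched in a few lines (function name and the 5.1B active-parameter figure are taken from the thread; the heuristic itself is just sqrt(active × total)):

```python
import math

def moe_effective_params(total_b: float, active_b: float) -> float:
    """Geometric-mean heuristic for the dense-equivalent size of an
    MoE model: sqrt(active_params * total_params), in billions."""
    return math.sqrt(active_b * total_b)

# gpt-oss 120B with ~5.1B active parameters
print(round(moe_effective_params(120, 5.1), 1))  # ~24.7B, in the 10-30B dense range
```

This is only a rough estimate; as noted above, empirical comparison against dense models in that range is the more reliable test.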
