
DALL-E, Stable Diffusion, GPT-3, Whisper, CLIP, etc. are all trained on "hot garbage" and all of them are SOTA. Whisper is a great example, since it shows that this broader use of imperfect training data makes models more robust and general than their "perfectly" trained counterparts. The trick behind all of these is to build mechanisms on smaller-scale, human-labelled data that can then be used to filter and label the broader dataset, or to use training methods that are more robust to imperfect data, like contrastive learning à la CLIP. (Rough sketches of both ideas below.)
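To make the first idea concrete, here's a minimal sketch of the "train a small filter on human-labelled data, then score the web-scale dump" pattern. This is not any lab's actual pipeline; `human_labelled` and `web_scrape` are hypothetical placeholders, and the classifier is deliberately cheap:

    # Sketch: small human-labelled seed set -> quality filter for the big noisy corpus.
    # Dataset names and the 0.9 threshold are illustrative, not from any real system.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Small, carefully human-labelled seed set: (text, is_good_quality)
    human_labelled = [
        ("a clean, well-written caption", 1),
        ("asdf kjh 404 page not found", 0),
        # ... a few thousand examples, not millions
    ]
    texts, labels = zip(*human_labelled)

    # Cheap classifier trained only on the seed set
    vec = TfidfVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)

    # Score the web-scale "hot garbage" and keep only what looks usable
    web_scrape = ["some scraped text", "more scraped text"]  # billions in practice
    scores = clf.predict_proba(vec.transform(web_scrape))[:, 1]
    kept = [t for t, s in zip(web_scrape, scores) if s > 0.9]

The point isn't the specific model: anything cheap enough to run over billions of documents works, because the seed labels are the expensive part.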
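And for the second idea, a toy PyTorch version of the symmetric contrastive (InfoNCE) loss CLIP uses. Real CLIP learns the temperature and uses very large batches; shapes here are toy:

    # Sketch: CLIP-style symmetric contrastive loss over a batch of
    # (image, text) pairs. Matched pairs sit on the diagonal of the
    # similarity matrix; everything off-diagonal is a negative.
    import torch
    import torch.nn.functional as F

    def clip_loss(image_emb, text_emb, temperature=0.07):
        # Normalize so dot products are cosine similarities
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature  # (batch, batch)
        targets = torch.arange(logits.size(0))           # diagonal = correct pair
        # Average the image->text and text->image cross-entropies. A few
        # mislabelled pairs just add off-diagonal noise, which is part of
        # why this objective tolerates imperfect data.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))  # toy usage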

