
And you think that it would be impossible to train a model to avoid outputs that are substantially similar to training data?


I certainly don't think it's impossible, but I think it is a hard problem that won't be solved in the immediate future, and creators of data used for training are right to try to stop the wide availability of LLMs that regurgitate information they worked hard to produce.


I think it will be a bit easier than you believe. The reason it hasn't been done yet is that there hasn't been a compelling economic incentive to do so.
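To make the disagreement concrete: one simple (and far from complete) way to suppress near-verbatim regurgitation is to filter candidate outputs against an index of training-data n-grams and reject ones with too much overlap. The sketch below is purely illustrative; the names, corpus, and thresholds (TRAINING_DOCS, NGRAM_SIZE, MAX_OVERLAP) are made up for the example and don't refer to any real system.

    def ngrams(tokens, n):
        """Yield successive n-grams from a token list."""
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

    NGRAM_SIZE = 8        # window length treated as "substantial similarity"
    MAX_OVERLAP = 0.2     # max fraction of output n-grams allowed to appear in training data

    TRAINING_DOCS = [
        "the quick brown fox jumps over the lazy dog near the river bank",
    ]

    # Index every training n-gram once, up front.
    training_index = set()
    for doc in TRAINING_DOCS:
        training_index.update(ngrams(doc.split(), NGRAM_SIZE))

    def looks_regurgitated(output_text):
        """Return True if too many of the output's n-grams appear verbatim in training data."""
        grams = list(ngrams(output_text.split(), NGRAM_SIZE))
        if not grams:
            return False
        hits = sum(1 for g in grams if g in training_index)
        return hits / len(grams) > MAX_OVERLAP

    print(looks_regurgitated("the quick brown fox jumps over the lazy dog near the river bank today"))  # True
    print(looks_regurgitated("a completely novel sentence with no verbatim overlap at all"))            # False

A filter like this only catches exact token-level copying, which is part of why the harder versions of the problem (paraphrase, structure, facts) are still open; it's meant as a sketch of the direction, not a solution.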



