
I think of large models such as Copilot as lossy compression with content-addressable retrieval. If you type the first few parts of some content the model has stored, it will retrieve the rest for you.

The blocks retrieved are very small, and many of them occur frequently: “if” followed by “(”, for example. Blocks like that are hardly worthy of copyright, but we also know that they were literally taken from copyrighted material.

(As far as I know, the model doesn’t start out with any built-in knowledge of the syntax or grammar of, say, Python.)
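To make the analogy concrete, here is a toy sketch in Python. It is not how Copilot or any real large model works; it is a plain n-gram lookup table, and the names `train`, `complete`, and the three-line corpus are made up for illustration. It does show the two properties described above: retrieval is keyed by the content you type (the last few tokens), and the stored blocks are tiny, so everything it "knows" about syntax comes from the text it was fed. It is also lossy, because only follow-up counts are kept, not the original documents.

```python
from collections import Counter, defaultdict


def train(corpus, order=2):
    """Map each `order`-token context to counts of the token that followed it."""
    table = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            table[context][tokens[i + order]] += 1
    return table


def complete(table, prompt, order=2, max_tokens=10):
    """Extend `prompt` by repeatedly looking up the most common next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        context = tuple(tokens[-order:])
        if context not in table:
            break  # nothing stored under this key, so nothing to retrieve
        tokens.append(table[context].most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical "training data": the only source of syntax the table ever sees.
corpus = [
    "if ( x > 0 ) : return x",
    "if ( y > 0 ) : return y",
    "while ( x > 0 ) : x -= 1",
]
table = train(corpus)

print(complete(table, "if ("))          # continues with tokens copied from the corpus
print(complete(table, "unseen prompt")) # unchanged: no stored context matches
```

Every token the sketch emits was copied out of one of the corpus lines, even though each individual lookup returns only a single, very common token.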

Even if some of that material was public domain, a lot of it wasn’t, and reusing it requires attribution at best and compliance with the full licensing conditions at worst.

To put it another way: it doesn’t matter how many vegetable ingredients they throw into the sausage or how elaborate the sausage-making machine is. If they put pork in, the stuff that comes out ain’t vegetarian.


