While that is cool in principle, I'm not sure how well it'd actually work in reality. First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses. That would cover a lot of the fundamental knowledge and processes. Then you could fine-tune on copyrighted data. That might actually make it easier to see how much influence that content has on the final weights, but it also would probably be a lot less influence. There's a big difference between a painting of an apple being the main contribution to the concept of "apple" in an image model, vs a mention of that painting corresponding to a few weights that just reference a bunch of other concepts that were learned via open data.
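For what it's worth, there is existing research on exactly this attribution question: influence functions (Koh & Liang) and TracIn (Pruthi et al.) estimate how much one training example affected a prediction by correlating gradients across saved checkpoints. Here's a toy sketch of the TracIn idea on a linear model; the model, learning rate, and checkpoint schedule are all made up for illustration:

```python
import numpy as np

# Toy linear model: loss(w; x, y) = 0.5 * (w @ x - y)^2
def grad(w, x, y):
    return (w @ x - y) * x

def tracin_influence(checkpoints, lr, train_example, test_example):
    """TracIn heuristic: influence of a training example on a test example
    is approximated as sum over checkpoints of lr * grad_train . grad_test.
    Real runs use many checkpoints saved during actual training."""
    xt, yt = train_example
    xs, ys = test_example
    return sum(lr * grad(w, xt, yt) @ grad(w, xs, ys) for w in checkpoints)

# Tiny demo: train with SGD, saving a checkpoint per epoch.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(3)
lr = 0.05
checkpoints = []
for epoch in range(30):
    for xi, yi in zip(X, y):
        w -= lr * grad(w, xi, yi)
    checkpoints.append(w.copy())

test_pt = (X[0], y[0])
# Self-influence is a sum of squared gradient norms, so it's non-negative.
infl_self = tracin_influence(checkpoints, lr, (X[0], y[0]), test_pt)
infl_other = tracin_influence(checkpoints, lr, (X[7], y[7]), test_pt)
print(infl_self, infl_other)
```

The catch is that this only scores examples seen during runs you instrumented, which is exactly why the "launder the fundamentals through open data, fine-tune on the copyrighted bit" strategy above would skew the accounting.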
> First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Well, Bing AI already knows where it drew the information from and cites sources; so it would be a matter of making the deal.
How to enforce it? That's the main question, I reckon.
> Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses.
There's been some work on memory lately like Transformer² and Titans. But that may not be necessary for decent agents. Even the "context in a loop" approach is getting better as general reliability on tasks increases, and as people find better ways to manage context. Like Cline has the Memory Bank prompt, where there's a folder of markdown files at different levels of abstraction: info on the project, the current feature, current task progress, etc. And so it can update those files and read from them as a consistent memory.
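The Memory Bank pattern is simple enough to sketch: a folder of markdown files the agent re-reads at the start of each session and appends to as it works. The file names below are placeholders loosely modeled on Cline's prompt, not its exact spec:

```python
import tempfile
from pathlib import Path

# Hypothetical file names, roughly mirroring the Memory Bank layers
# (project-level brief, current working context, task progress).
MEMORY_FILES = ["projectBrief.md", "activeContext.md", "progress.md"]

class MemoryBank:
    """Folder of markdown files the agent re-reads each session and
    updates as it works, so state survives across context windows."""
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        for name in MEMORY_FILES:
            (self.root / name).touch()

    def load(self) -> str:
        # Concatenate all files into one string to prepend to the prompt.
        return "\n\n".join(
            f"## {name}\n{(self.root / name).read_text()}"
            for name in MEMORY_FILES
        )

    def update(self, name: str, text: str):
        # Append a note; a real agent would rewrite whole sections.
        with open(self.root / name, "a") as f:
            f.write(text + "\n")

bank = MemoryBank(tempfile.mkdtemp())
bank.update("progress.md", "- implemented login endpoint")
context = bank.load()
print("login endpoint" in context)
```

The point is that the "memory" is just text the model can read and write with ordinary file tools, which is why it works without any architectural change.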
We've already seen Qwen's new QwQ 32B (not distilled) model doing impressive things on benchmarks. It'll definitely be interesting to see just how good small models can get. When combined with RAG and a large context window for expanded knowledge, they might be able to get pretty far.
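The RAG half of that combo is conceptually tiny: embed the query, rank stored documents by similarity, and stuff the top hits into the prompt. Here's a deliberately minimal sketch using bag-of-words cosine similarity as a stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. Real RAG uses a trained
    # dense embedder; this just makes the ranking step concrete.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "QwQ-32B is a 32B-parameter reasoning model from the Qwen team.",
    "Bananas are rich in potassium.",
    "Retrieval-augmented generation injects documents into the prompt.",
]
hits = retrieve("what is the QwQ reasoning model", docs, k=1)
prompt = "Context:\n" + "\n".join(hits) + "\n\nQuestion: what is QwQ?"
print(hits[0])
```

A small model only has to reason over the retrieved snippets plus its context window, rather than memorize everything, which is the whole appeal of the combination.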
Well, not sure if that part matters as much (from first principles). But the more important part being that RL lets a model figure out which methods are effective for it. Most of the time it probably has the tools already from pre-training, but doesn't "make the connection" to use them (or at least not often enough).
While it is nice to have more options, it still definitely isn't at a human level yet for hard to read text. Still haven't seen anything that can deal with something like this very well: https://i.imgur.com/n2sBFdJ.jpeg
If I remember right, Gemini actually was the closest in terms of accuracy on the parts where it "behaved", but it'd start to go off the rails and reword things at the end of larger paragraphs. Maybe it would do better if the image was broken up into smaller chunks. In comparison, Mistral for the most part (besides one particular line, for some reason) sticks to the same number of words, but gets a lot wrong on the specifics.
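That "same word count, wrong specifics" vs "rewords entire passages" distinction is exactly what word error rate captures: it's the word-level Levenshtein distance (substitutions + insertions + deletions) divided by reference length. A quick self-contained implementation, with made-up sentences standing in for the OCR outputs:

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A model that substitutes words keeps the length; one that rewords drifts.
ref = "the quick brown fox jumps over the lazy dog"
substituted = "the quick brown fix jumps over the hazy dog"  # 2 substitutions
reworded = "a speedy fox leapt over a sleeping dog"
print(word_error_rate(ref, substituted))  # 2/9
print(word_error_rate(ref, reworded))
```

Running both transcriptions through something like this against a hand-made ground truth would make the "Gemini vs Mistral" comparison above quantitative instead of vibes.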
The first couple of sections are for PDFs and you need to skip all that (search for "And Image files...") to find the image extraction portion. Basically it needs ImageURLChunk instead of DocumentURLChunk.
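The distinction boils down to which document type you put in the request body. A sketch of the two payload shapes, mirroring the SDK's DocumentURLChunk vs ImageURLChunk; the field names (`document_url`, `image_url`, `mistral-ocr-latest`) are from my reading of the Mistral docs and may drift, so verify against the current API reference:

```python
import json

def build_ocr_payload(url: str, is_image: bool) -> dict:
    """Build the JSON body for Mistral's OCR endpoint. The chunk type
    switches between document_url (for PDFs) and image_url (for images),
    which is what ImageURLChunk vs DocumentURLChunk wrap in the SDK.
    Field names are assumptions based on the docs, not verified here."""
    if is_image:
        document = {"type": "image_url", "image_url": url}
    else:
        document = {"type": "document_url", "document_url": url}
    return {"model": "mistral-ocr-latest", "document": document}

payload = build_ocr_payload("https://i.imgur.com/n2sBFdJ.jpeg", is_image=True)
print(json.dumps(payload, indent=2))
# To actually send it, POST with your API key, e.g.:
# requests.post("https://api.mistral.ai/v1/ocr", json=payload,
#               headers={"Authorization": f"Bearer {api_key}"})
```

So if you're getting PDF-shaped errors on an image, the fix is usually just this one field, not anything in the extraction logic.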