While that is cool in principle, I'm not sure how well it'd actually work in reality. First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses. That would cover a lot of the fundamental knowledge and processes. Then you could fine-tune on copyrighted data. That might actually make it easier to see how much influence that content has on the final weights, but it also would probably be a lot less influence. There's a big difference between a painting of an apple being the main contribution to the concept of "apple" in an image model, vs a mention of that painting corresponding to a few weights that just reference a bunch of other concepts that were learned via open data.
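For what it's worth, there is existing research on exactly this attribution question: influence functions (Koh & Liang) and TracIn (Pruthi et al.) estimate how much one training example affected a prediction by correlating gradients across saved checkpoints. Here's a toy sketch of the TracIn idea on a linear model; the model, learning rate, and checkpoint schedule are all made up for illustration:

```python
import numpy as np

# Toy linear model: loss(w; x, y) = 0.5 * (w @ x - y)^2
def grad(w, x, y):
    return (w @ x - y) * x

def tracin_influence(checkpoints, lr, train_example, test_example):
    """TracIn heuristic: influence of a training example on a test example
    is approximated as sum over checkpoints of lr * grad_train . grad_test.
    Real runs use many checkpoints saved during actual training."""
    xt, yt = train_example
    xs, ys = test_example
    return sum(lr * grad(w, xt, yt) @ grad(w, xs, ys) for w in checkpoints)

# Tiny demo: train with SGD, saving a checkpoint per epoch.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(3)
lr = 0.05
checkpoints = []
for epoch in range(30):
    for xi, yi in zip(X, y):
        w -= lr * grad(w, xi, yi)
    checkpoints.append(w.copy())

test_pt = (X[0], y[0])
# Self-influence is a sum of squared gradient norms, so it's non-negative.
infl_self = tracin_influence(checkpoints, lr, (X[0], y[0]), test_pt)
infl_other = tracin_influence(checkpoints, lr, (X[7], y[7]), test_pt)
print(infl_self, infl_other)
```

The catch is that this only scores examples seen during runs you instrumented, which is exactly why the "launder the fundamentals through open data, fine-tune on the copyrighted bit" strategy above would skew the accounting.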
> First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Well, Bing AI already knows where it drew the information from and cites sources; so it would be a matter of making the deal.
How to enforce it? That's the main question, I reckon.
> Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses.
There's been some work on memory lately like Transformer² and Titans. But that may not be necessary for decent agents. Even the "context in a loop" approach is getting better as general reliability on tasks increases, and as people find better ways to manage context. Like Cline has the Memory Bank prompt, where there's a folder of markdown files at different levels of abstraction: info on the project, the current feature, current task progress, etc. And so it can update those files and read from them as a consistent memory.
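The Memory Bank pattern is simple enough to sketch: a folder of markdown files the agent re-reads at the start of each session and appends to as it works. The file names below are placeholders loosely modeled on Cline's prompt, not its exact spec:

```python
import tempfile
from pathlib import Path

# Hypothetical file names, roughly mirroring the Memory Bank layers
# (project-level brief, current working context, task progress).
MEMORY_FILES = ["projectBrief.md", "activeContext.md", "progress.md"]

class MemoryBank:
    """Folder of markdown files the agent re-reads each session and
    updates as it works, so state survives across context windows."""
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        for name in MEMORY_FILES:
            (self.root / name).touch()

    def load(self) -> str:
        # Concatenate all files into one string to prepend to the prompt.
        return "\n\n".join(
            f"## {name}\n{(self.root / name).read_text()}"
            for name in MEMORY_FILES
        )

    def update(self, name: str, text: str):
        # Append a note; a real agent would rewrite whole sections.
        with open(self.root / name, "a") as f:
            f.write(text + "\n")

bank = MemoryBank(tempfile.mkdtemp())
bank.update("progress.md", "- implemented login endpoint")
context = bank.load()
print("login endpoint" in context)
```

The point is that the "memory" is just text the model can read and write with ordinary file tools, which is why it works without any architectural change.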
We've already seen Qwen's new QwQ 32B (not distilled) model doing impressive things on benchmarks. It'll definitely be interesting to see just how good small models can get. When combined with RAG and a large context window for expanded knowledge, they might be able to get pretty far.
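The RAG half of that combo is conceptually tiny: embed the query, rank stored documents by similarity, and stuff the top hits into the prompt. Here's a deliberately minimal sketch using bag-of-words cosine similarity as a stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. Real RAG uses a trained
    # dense embedder; this just makes the ranking step concrete.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "QwQ-32B is a 32B-parameter reasoning model from the Qwen team.",
    "Bananas are rich in potassium.",
    "Retrieval-augmented generation injects documents into the prompt.",
]
hits = retrieve("what is the QwQ reasoning model", docs, k=1)
prompt = "Context:\n" + "\n".join(hits) + "\n\nQuestion: what is QwQ?"
print(hits[0])
```

A small model only has to reason over the retrieved snippets plus its context window, rather than memorize everything, which is the whole appeal of the combination.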
Well, not sure if that part matters as much (from first principles). But the more important part being that RL lets a model figure out which methods are effective for it. Most of the time it probably has the tools already from pre-training, but doesn't "make the connection" to use them (or at least not often enough).
While it is nice to have more options, it still definitely isn't at a human level yet for hard to read text. Still haven't seen anything that can deal with something like this very well: https://i.imgur.com/n2sBFdJ.jpeg
If I remember right, Gemini actually was the closest in terms of accuracy on the parts where it "behaved", but it'd start to go off the rails and reword things at the end of larger paragraphs. Maybe it would do better if the image was broken up into smaller chunks. In comparison, Mistral for the most part (besides one particular line, for some reason) sticks to the same number of words, but gets a lot wrong on the specifics.
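That "same word count, wrong specifics" vs "rewords entire passages" distinction is exactly what word error rate captures: it's the word-level Levenshtein distance (substitutions + insertions + deletions) divided by reference length. A quick self-contained implementation, with made-up sentences standing in for the OCR outputs:

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A model that substitutes words keeps the length; one that rewords drifts.
ref = "the quick brown fox jumps over the lazy dog"
substituted = "the quick brown fix jumps over the hazy dog"  # 2 substitutions
reworded = "a speedy fox leapt over a sleeping dog"
print(word_error_rate(ref, substituted))  # 2/9
print(word_error_rate(ref, reworded))
```

Running both transcriptions through something like this against a hand-made ground truth would make the "Gemini vs Mistral" comparison above quantitative instead of vibes.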
The first couple of sections are for PDFs and you need to skip all that (search for "And Image files...") to find the image extraction portion. Basically it needs ImageURLChunk instead of DocumentURLChunk.
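The distinction boils down to which document type you put in the request body. A sketch of the two payload shapes, mirroring the SDK's DocumentURLChunk vs ImageURLChunk; the field names (`document_url`, `image_url`, `mistral-ocr-latest`) are from my reading of the Mistral docs and may drift, so verify against the current API reference:

```python
import json

def build_ocr_payload(url: str, is_image: bool) -> dict:
    """Build the JSON body for Mistral's OCR endpoint. The chunk type
    switches between document_url (for PDFs) and image_url (for images),
    which is what ImageURLChunk vs DocumentURLChunk wrap in the SDK.
    Field names are assumptions based on the docs, not verified here."""
    if is_image:
        document = {"type": "image_url", "image_url": url}
    else:
        document = {"type": "document_url", "document_url": url}
    return {"model": "mistral-ocr-latest", "document": document}

payload = build_ocr_payload("https://i.imgur.com/n2sBFdJ.jpeg", is_image=True)
print(json.dumps(payload, indent=2))
# To actually send it, POST with your API key, e.g.:
# requests.post("https://api.mistral.ai/v1/ocr", json=payload,
#               headers={"Authorization": f"Bearer {api_key}"})
```

So if you're getting PDF-shaped errors on an image, the fix is usually just this one field, not anything in the extraction logic.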