
When LLMs are based on stolen work and violate GPL terms, which should already be illegal, it's very much okay to be furious about the fact that they additionally ruin the respective business models of open source, thanks to which they are possible in the first place.




> the fact that they additionally ruin the respective business models of open source

The what now? Open source doesn't have a business model, it's all about the licensing.

FOSS is about making code available to others, for any purpose, and that still works the same as 20 years ago when I got started. Some seem to be waking up to what "for any purpose" actually means, but for many of us that's quite the point: that we don't make choices for others.


If something is not technically illegal, that does not mean it cannot be bad.

Like I said, there is a part that should be illegal, and then a part where that's used to additionally harm one of the ways that OSS can be sustainable. The second part on its own is not illegal, but it adds to the damage and is perfectly okay to condemn.

Open source software can have business models; it's one of the ways it can be sustainable. It can work, for example, like this: the code is made available (for any purpose) and the core maintainer company provides services around it, like with Nginx (BSD). Or there is open-source software, and companies create paid products and services on top while respecting the terms of that software and contributing back, like with Linux (GPL) and SUSE/Red Hat.


> If something is not technically illegal, that does not mean it cannot be bad.

Ok? I agree, but I'm unsure how exactly that's relevant to our discussion here.

> Open source software can have business models

I believe "businesses" are the ones who have "business models", and some of those chose to use open source as part of their business model. But "open source" the ecosystem has nothing to do with that; it's for-profit companies trying to use and leverage open source, rather than the open source community suddenly wanting to do something completely different from what it's been doing since inception.


> unsure how exactly that's relevant to our discussion here.

I'll remind you, then. Our discussion follows the top statement, "It seems open source loses the most from AI". As far as I understand, nobody narrowed the context to "what is currently legal". Something can be technically legal and still harmful to open source. Also, laws are never perfect, and sometimes they need to be updated.

(For example, I know that a number of people would say the US abducting and detaining citizens and brutally deporting immigrants is not illegal, but if it's technically legal, does that make it OK?)

> what it's been doing since inception.

At inception, open source was mostly personal side projects for funsies (like Linux), sponsored by the maintainer having a day job. The big leap happened when copyleft licenses made it so that the success of a big commercial company building products on open-source projects would directly improve those open-source projects. And it's nothing new; it happened a long time ago. The desire for volunteer contributions to a codebase to remain for public benefit in perpetuity is exactly the point of strong copyleft, and it's exactly what's being circumvented by LLM washing. The fact that these LLMs subsequently also harm open source communities adds insult to injury.


>“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software.

https://www.gnu.org/philosophy/free-sw.html

Being able to learn from the code is a core part of the ideology embedded into the GPL. Not only that, but LLMs learning from code is fair use.


> Being able to learn from the code is a core part of the ideology embedded into the GPL.

I have to imagine this ideology was developed with humans in mind.

> but LLMs learning from code is fair use

If by “fair use” you mean the legal term of art, that question is still very much up in the air. If by “fair use” you mean “I think it is fair” then sure, that’s an opinion you’re entitled to have.


> I have to imagine this ideology was developed with humans in mind.

Actually, you don't have to. You just want to.

N=1 but to me, LLMs are a perfect example of where the "ideology embedded into the GPL" benefits the world.

The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.

The GPL, through its clauses, particularly the viral/forced-reciprocity ones, prevents software itself from becoming an asset that can be rented, but it doesn't prevent business around software. RMS/FSF didn't make the common (among fans of OSS and Free Software) but dumb assumption that everyone wants to or should be a developer; the license is structured to allow anyone to learn from and modify software, including by paying a specialist to do it for them. Small-scale specialization and local markets are key to robust and healthy communities, and this is what Free Software ultimately encourages.

LLMs becoming a cheap tool for modifying or writing software, even by non-specialists (or at least people who aren't domain experts), furthers those same goals, by increasing individual and communal self-sufficiency and self-reliance.

(INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant, much in the same way that cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)


Something can be illegal, and something can be technically legal yet still pretty damn bad. There is the spirit and the letter of the law. They can never be in perfect agreement, because as time goes by, bad guys tend to find new workarounds.

So either the community behaves, or the letter becomes more and more complicated, trying to be more specific about what should be illegal. Now that the GPL is trivially washed by asking a black box trained on GPLed code to reproduce the same thing, the latter might be inevitable, I suppose.

> They're still tools ~anyone can use

Of course, technology itself is not evil, just like crypto or nuclear fission. In this case, when we are discussing harm, we are almost always talking about commercial LLM operators. However, when the technology is mostly represented by those, it doesn't seem necessary to add that caveat every time LLMs are mentioned.

There's hardly a good, truly fully open LLM that one can actually run on their own hardware. Part of the reason is that hardly anyone, in the grand scheme of things, even has the hardware required.

(Even if someone is a techie, has the money, and knows how to set up a rig, which is almost nobody in the grand scheme of things, the big LLM operators are now making sure there are no chips left for them.)

So you can buy and own (and sell) a car, but ~anyone cannot buy and run an independent LLM (and obviously not train one). ~everyone ends up using a commercial LLM powered by some megacorp's infinite compute and scraping resources and paying that megacorp one way or another, ultimately helping them do more of the stuff that they do, like harming OSS.


That car analogy seems really weak. It might make sense, but only if we replaced Ford, Chevy, et al. with Enterprise, Hertz, etc.

> Actually, you don't have to. You just want to.

Fair.

> The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.

… that are all reliant on gatekeepers, who also decide the model ethics unilaterally, among other things.

> (INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant, much in the same way that cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)

You’re not wrong. But wouldn’t the spirit of Free Software also apply to model weights? Or do the large corps get a pass?

FWIW I don’t have a problem with LLMs per se. Just models that are either proprietary or effectively proprietary. Oligarchy ain’t freedom :)


> > Actually, you don't have to. You just want to.

> Fair.

I don't think it's fair. That ideology was unquestionably developed with humans in mind. It happened in the 80s, and back then I don't think anyone entertained the crazy idea that software could think for itself, so that the terms "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)

One can suggest that the free software ideology should be expanded to include software itself among the beneficiaries of the license, not just human society. That's a big call, and it needs a lot of proof that software can decide things on its own, and not just do what humans tell it.


> It happened in the 80s, and back then I don't think anyone entertained the crazy idea that software could think for itself, so that the terms "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)

Sure they did. It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with a sci-fi obsession actually had a good basis in reality.

Also, those ideas aren't crazy; they're obvious, and they were already obvious back then.


> It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with a sci-fi obsession actually had a good basis in reality.

At worst, you are trying to disparage the entire idea of open source by painting the people who championed it as idiots who cannot tell fiction from reality. At best, you are making a fool of yourself. If you claim that the free software philosophy means "also, potentially sentient software that may become a reality in 100 years" everywhere it mentions "users" and "people", you'd better quote some sources.

> Also, those ideas aren't crazy; they're obvious, and they were already obvious back then.

Fire-breathing dragons. Little green extraterrestrial humanoids. Telepathy. All of these ideas are obvious, and have been obvious for ages. None of these things exist. Sorry to break it to you, but an idea being obvious doesn't make it real.

(I'll skip over the part where, if you really think chatbots are sentient like humans, you might be defending an industry built on mass-scale abuse of sentient beings.)


> I have to imagine this ideology was developed with humans in mind.

Given what a big deal RMS made over not discriminating by purpose (https://www.gnu.org/philosophy/free-sw.html#run-the-program), I think that is far from clear.


> question is still very much up in the air

It is not up in the air at all. It's completely transformative.


1. It's decided by courts in the US, and US courts are currently very friendly to big tech. At this point, if they deny this and say something that undermines the industry, it's going to be a big economic blow; the country is way over-invested in this tech and its infrastructure.

2. "Transformative means fair" is the old idea from pre-LLM world. That's a different world. Now those IP laws are obsolete and need to be significantly updated.


Last time I checked, there are still undecided cases wrt fair use. Sure, it’s looking favorable for LLM training, but it’s definitely still up in the air.

> it’s completely transformative

IANAL, but apparently it hinges on how the training material is acquired


> IANAL, but apparently it hinges on how the training material is acquired

That doesn't make sense. You are either transforming something or you are not. There might be other legal considerations based on how you acquired it, but that doesn't affect whether something is transformative.


That freedom, under many free licenses, comes with the caveat that you provide basic attribution and pass the same freedom on to your users.

LLMs don't (cannot, by design) provide attribution, nor do LLM users have the freedom to run most of these models themselves.


I think LLMs could provide attribution, either by running a second hidden prompt (like, "who said this?") or by doing a reverse query on the training dataset. Say, if they did it with even 98% accuracy, it would probably be good enough, especially for bits of info where there are very few sources, or even just one.

Of course it would be more expensive to get them to do it.
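
To make the reverse-query idea concrete, here's a minimal sketch: embed the generated text and do a nearest-neighbor lookup against a precomputed index over the training corpus, with a similarity cutoff. Every name here (embed, corpus, index, attribute, the 0.98 threshold) is hypothetical, not any vendor's actual API, and the dummy embedding only exists so the sketch runs standalone.

    # Hypothetical sketch of a reverse query: embed the generated text and
    # find the closest training document. `embed` is a stand-in for a real
    # embedding model; the corpus contents are made up for illustration.
    import hashlib
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Dummy deterministic embedding so the sketch runs standalone;
        # a real system would call an actual embedding model here.
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        v = np.random.default_rng(seed).normal(size=384)
        return v / np.linalg.norm(v)

    # Index built once over the training corpus: source -> embedding.
    corpus = {
        "nginx/src/core/ngx_hash.c": "ngx_hash_find(ngx_hash_t *hash, ...)",
        "linux/kernel/sched/core.c": "static void __schedule(...)",
    }
    index = {src: embed(text) for src, text in corpus.items()}

    def attribute(generated: str, threshold: float = 0.98):
        # Vectors are unit-length, so the dot product is cosine similarity.
        scores = {src: float(embed(generated) @ v) for src, v in index.items()}
        src, best = max(scores.items(), key=lambda kv: kv[1])
        return src if best >= threshold else None  # None = no confident match

The expense is in building and querying such an index over the entire training set, and an exact-match demo like this sidesteps the hard part: paraphrased output, which is where that 98% figure would actually be tested.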

But if it were required to provide attribution with some % accuracy, and we identified and addressed the other problems, like GPL washing/piracy of our intellectual property, people going insane with chatbots, and opinion manipulation and hidden advertisement, then at some point commercial LLMs could become actually not bad for us.


That is, if you redistribute or make a derivative work. Applying learnings you made from such software does not require such attribution.

Here we are talking about derivative works, not "learnings".

In the first sentence, "you" actually refers to you, a person; in the second, you're intentionally cheating and applying it to a machine doing a mechanical transformation. One so mechanical that different LLMs trained on the same material produce output that closely resembles each other's.

The only indispensable part is the resource you're pirating. A resource that was given to you under the most generous of terms, which you ignored, deciding instead to be guided by a purpose you've assigned to those terms yourself, one that embodies an intention that has been specifically denied. You do this because it allows you to do what you want to do. It's motivated "reasoning."

Without this "FOSS is for learning" thing you think overrules the license, you are no more justified in training off of it without complying with the terms than training on pirated Microsoft code without complying with their terms. People who work at Microsoft learn on Microsoft code, too, but you don't feel entitled to that.


I'm not sure it's always bad intent. People often don't get that "machine learning" is a compound industrial term, where "learning" is not literally "learning", just like "machine" is not literally "machine".

So it's sort of sentient when it comes to training and generating derivative works, but when you ask, "if it's actually sentient, then are you in the business of abusing sentient beings?", it's suddenly just a tool.



