
Given the dude spent half a million dollars on training SD, I wouldn't be surprised if, even though he chose to open-source the trained model, he has strong opinions on whether people should have the right to choose to open-source such things vs. having third-party crackers breach their systems and publish for them.


Perhaps, but the model weights themselves are currently understood to be uncopyrightable, and it's pretty inconceivable that the model could become copyrightable without becoming a derivative work of the training data.

Unless these AI companies want Google and Facebook to be literally the only companies in the world that can train large-scale machine learning models (by using their TOS to get licenses from their users), they should tread carefully.

In this particular case the leaked code apparently exposed that the proprietary codebase was also using the OSS developer's work without attribution.


If anything, in the absence of copyrightability, the only protection is trade secrecy and I'd expect Emad to be even deeper in the opinion space of "We cut off the oxygen (systematically speaking) of those who would steal trained ML data."

There's an interesting anecdote around how stand-up comedians protect jokes against theft, given how weak copyright is on jokes: it's keying cars, poisoning drinks (generally non-fatally, but it's hard to have a good night on stage when your lower GI tract wants to be elsewhere), and never-work-in-this-town-again agreements. We put these protections into the law because the alternative isn't no protection; it's people-take-it-into-their-own-hands protection.


This method only works when the people you are trying to attack and/or blackball from an industry actually want to be in that industry. Lots of people will just use these ML models without trying to participate in the "ML community" the same way my wife and I tell each other jokes from comedians without trying to be comedians ourselves.

The users of these models and the developers are fundamentally different.


> [...] it's pretty inconceivable that the model could become copyrightable without becoming a derivative work of the training data.

Another perspective that may become important is the fact that not all cultures share the same interpretation of copyright. In Japan there was a case in which a court ruled that selling a memory card with preloaded save data for a video game was a breach of the original work's integrity.[1]

This I think will get greater attention in the near future, because a large portion of the interest in SD stems from generating new art derived from the styles of art on Pixiv, a Japanese website. The data for many popular forks of SD, like Waifu Diffusion and the proprietary NovelAI model, was sourced from Western sites like Danbooru, which has long been known for violating copyright and ignoring artist takedown requests by reposting art without the creator's permission. With the sheer popularity of SD, and the fact that so much of the innovation came off the backs of thousands of artists who weren't so much as asked for consent, it remains to be seen whether attitudes towards those sites and this process of mass-scale data collection will remain the same in the near future.

I also have to wonder what the implications would have been if NovelAI ended up launching what is now the leaked model as a paid service, given the unresolved question of consent that surrounds the original data.

HN and the people who support SD can have their own opinions about copyright not applying in this specific case. They can delve into the technicalities of why they think the models are not copyrightable. But even beyond legal means, the artists can still ask the programmers to take everything down, and potentially be refused. The insistence that "it's different in this case" can break the hearts of people that see the world differently.

I think this will be a debate that transcends arguing over the technicalities of copyright, involving fundamentally differing cultural values of how the acts of creation and reproduction should be treated with respect. It will not end with "how will this fit into the existing (Western-centric) framework of copyright," but "what is the right thing to do."

[1] https://ja.m.wikipedia.org/wiki/%E3%81%A8%E3%81%8D%E3%82%81%...


The law is not ethics. The law is the bare minimum.

Anyone who sets their ethics based on the law is probably acting like a big jerk.

:)


>also have to wonder what the implications would have been if NovelAI ended up launching what is now the leaked model as a paid service

huh? It was launched as a paid (subscription only) service before even getting leaked.


Yes, you're right, sorry. I'm not sure what I was thinking when I wrote that.

Looking back, it seems many in the Japanese community on Twitter aren't happy to discover their art is being used as training data. In the week since the NovelAI announcement, the number of banned artists on Danbooru has doubled: roughly as many artists were registered as banned in that one week as in the site's entire prior history.

Beyond the training issue, there is the fact that the art was reposted on another site at all, which is what many take note of. Danbooru's initial statement to Japanese users made it sound like NovelAI was at fault for things blowing up, without mentioning the reposting that was the root issue, and it didn't come across as an effective apology as a result.


Aside from the reposting on Danbooru, training usage without the original creator's permission is explicitly allowed under Japanese law. https://storialaw.jp/en/service/bigdata/bigdata-12

Possibly that's why artists are taking their work down from Danbooru; it's their right.


> it's pretty inconceivable that the model could become copyrightable

If it costs $10 million to find information/weightings/etc., our current legal system would consider that intellectual property: it might not be copyrightable, but taking it could still be considered IP theft.


Yes, it could be a trade secret, but whether trade secret protection would still apply is extraordinarily fact-specific.

If they were negligent in handling it, e.g. left it on a publicly accessible share and some member of the public stumbled onto it, then trade secret protection would likely be lost.

If some employee violated their NDA and snuck it out, well, that would be a different matter. Etc.



