I see the exact opposite - any open source model is going to become prohibitively expensive to train if quality data costs billions of dollars. We're going to be left with the OpenAIs and Googles of the world as the only players in the space until someone solves synthetic data.
Exactly this. I work at a small web scraping company (so I might be a bit biased), and today any small business can collect a fairly capable dataset of public data for model training, sentiment analysis, or whatever else. If using public data is blocked by copyright, as this lawsuit implies, that would just mean only giant corporations and pirates could afford to do this.
This would be a huge blow to open-source and research developers, and I'd even argue it could help OpenAI build a bit of a moat, à la regulatory capture.
Research is fair use, and providing something amazing like Wikipedia is arguably educational (again, fair use). Reselling NYT articles on demand via an API is by itself neither, so likely not fair use.
Fair use is irrelevant here, as no small business would ever risk being dragged through court, even when they're in the right. Especially since breaking ToS and "business damage" are the easiest claims to attach to any lawsuit involving the digital space.
You may remember the Google Books lawsuit, where Google was digitally copying entire books and making them searchable online.
Google won that suit under fair use: the massive searchable database was found to be transformative, along with the non-commercial nature of the use.
So, if your web scraping company's goal is to let people bypass a paywall, I suspect you'll have trouble in the future. If instead your company, say, lets people do market analysis on how many people need a piano tuner in NYC, and it doesn't do that by copying the NYT article that did the original research, I think you'll be fine.