Don’t get me wrong, but what’s the problem with scrapers? People invest in SEO to become more visible, yet at the same time they fight against “scraper bots.” I’ve always thought the whole point of publicly available information is to be visible. If you want to make money, just put it behind a paywall. Isn’t that the idea?
There's a difference between, on the one hand, putting information online where your customers or people in general (e.g. as a hobby) can easily get at it, and cooperating with scraping for greater visibility via search, and, on the other hand, giving that work away, possibly at a cost to you, to companies who at best don't care and at worst may be competitors, see themselves as replacing you, or are otherwise adversarial.
The line is between "I am technically able to do this" and "I am engaging with a system in good faith".
Public parks are just there, and technically I can drive up and dump rubbish in one; by that logic, if they didn't want me to, they should have installed a gate and sold tickets.
Many scrapers these days are, in that analogy, the equivalent of people starting entire fleets of waste disposal vehicles that all drive to parks to unload, putting strain on park operations and making parks less tenable as a service in general.
I think the counterargument is that a while ago ads became super annoying. They move, they grow in size, they feature nsfw things, they have weird js that annoys you when you try to leave. Perhaps some of this has toned down in recent years, but the damage is done. The ads are not good actors. It’s not as black and white as subverting or not subverting the will of the site owner.
You can also argue that advertisers have abused their position with opaque and illegal uses of personal data, security hazards, and general scummishness, and that they are equally guilty of doing whatever they can technically get away with rather than what they're "supposed" to do.
Not that two wrongs make a right, and it's definitely a bit of an argument of convenience for people who find adverts annoying. But I think most people are less opposed to the idea of advertising as popularly imagined (i.e. paper newspaper-style, where you just see an advert) to support their favourite blog than they are to the current web advertising model (where just by viewing the advert you get an unspecified amount of information instantly stolen and sent off to a bunch of shady companies who process it and sell it on, and you don't get any way to veto it before loading a website and having the damage done).
To stretch the park analogy, it might be that the park sells a licence to a company to make some cash from advertising to its visitors, which it kind of expects to be things like adverts on the benches and so on. That company then starts photographing people from the bushes, recording conversations and putting AirTags in visitors' pockets to boost the profits it makes itself. Visitors then start wearing masks, stop talking and wear clothes with zipped pockets. You can say the visitors are wrong to violate the implicit park usage agreement that they submit to the surveillance to fund the park (and advertising company), or you can say that the company is wrong to expand the original licence to advertise into an invasion of privacy without even telling the visitors what they were going to do before they entered, or, indeed, during or after.
Sand is the world's second most used natural resource, and sand usable for concrete is even being removed illegally all over the world nowadays.
So to continue your analogy, I made my part of the beach accessible for visitors to enjoy, but certain people think they can carry it away for their own purpose ...
That grain of sand used to bring traffic; now it doesn't. It's pretty much an economic catastrophe for those who relied on it. And it's not free to provide the data to those who will replace you: they abuse your servers while doing it.
You are correct, and the hard reality is that content producers don't get to pick and choose who gets to index their public content, because the bad bots don't play by the rules of robots.txt or user-agent strings. In my experience, bad bots do everything they can to identify as regular users: fake IPs, fake user-agent strings... so it's hard to sort them from regular traffic.
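To make the robots.txt point concrete, here is a rough sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` name and the example.com URLs are all made up for illustration. The thing to notice is that the check happens on the crawler's side and is keyed on whatever user-agent string the crawler chooses to report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one named bot entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the voluntary check a polite crawler performs on itself;
    nothing on the server enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    # The bot identifies itself honestly and is told to stay away.
    print(is_allowed("ExampleBot/1.0", page))
    # The same bot claiming to be a browser falls under the "*" rules.
    print(is_allowed("Mozilla/5.0", page))
```

Under these assumptions the honest `ExampleBot/1.0` is refused, while the same crawler reporting a browser-like string is treated as ordinary traffic, and a crawler that never calls a check like this is not constrained at all.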