
> I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

Guess it wasn't so endless after all.

The author is assuming malice, but honestly, bots clicking links is just what happens to every public site on the internet. Not to mention that going down the link-clicking rabbit hole is common among Wikipedia readers.

All that said, I don’t really see the point. Wikipedia’s human controls are what make it exciting.



It’s a poetic end, considering that the very same kind of scraping, with no regard for the cost to site operators, is how these models are trained to begin with.


New page generation has been re-enabled, with a rate limit and "using openai/gpt-oss-120b instead of Kimi-K2".


> but honestly bots clicking links is just what happens to every public site on the internet.

As a CS student ~20 years ago I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app, but the logs indicated it was just a scraping bot someone was running.

It was a low-stakes lesson in why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I didn’t bother with auth since it was only accessible from within the department network. But I didn’t plan for the bots.
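
For anyone curious what the fix looks like, here’s a rough sketch (Flask, with a made-up in-memory todo store, not my actual app): put every mutation behind POST, so a crawler following plain links can’t change anything with a GET.

    from flask import Flask, abort

    app = Flask(__name__)
    todos = {1: "write report", 2: "buy milk"}  # made-up in-memory store

    # Bad: routes default to GET, so any crawler fetching the URL deletes the item.
    # @app.route("/delete/<int:item_id>")
    # def delete_item(item_id): ...

    # Better: only accept POST; a crawler following <a href> links won't send it.
    @app.route("/delete/<int:item_id>", methods=["POST"])
    def delete_item(item_id):
        if item_id not in todos:
            abort(404)
        del todos[item_id]
        return "", 204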


Reminds me of the Spider of Doom, a similar issue where GET-based delete links were hidden by simple JavaScript that checked whether the user was logged in. All of a sudden, pages and content on the website began to mysteriously vanish.

You know what doesn’t care about JavaScript and tries to click every link on your page? A search engine’s web crawler.

https://thedailywtf.com/articles/The_Spider_of_Doom
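
The lesson being that hiding the link in the browser protects nothing; the server has to verify the session (and require POST) before doing anything destructive. A rough sketch of that check (Flask, with a hypothetical page store, not the actual site from the article):

    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # placeholder, needed for sessions

    pages = {"home": "...", "about": "..."}  # hypothetical content store

    @app.route("/admin/delete/<page>", methods=["POST"])
    def delete_page(page):
        if not session.get("logged_in"):  # checked on the server, not in JS
            abort(403)
        pages.pop(page, None)
        return "", 204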


Google and all the other search engines will crawl any public site too.


More clicks means a bigger wiki, which I guess should be the point, unless the generated articles lead to nonsensical strings, which would suck but should be reasonably easy to prevent.


It would have been ironic if it had been the crawler from OpenAI… :)


You should always have per-IP rate limiting.
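
As a rough sketch of what that can look like (an in-memory token bucket keyed on client IP, single-process only; the rate and burst numbers are purely illustrative, and a real deployment would more likely use Redis or the web server’s own limiter, e.g. nginx limit_req):

    import time
    from collections import defaultdict

    RATE = 1.0    # tokens refilled per second, per IP (illustrative)
    BURST = 10.0  # bucket capacity, i.e. max burst size (illustrative)

    _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(ip: str) -> bool:
        """Return True if this request from `ip` is within the rate limit."""
        b = _buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
        b["ts"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False  # caller should answer with HTTP 429

It won’t stop a distributed scraper, but it does catch the single runaway script that racks up a $70 bill overnight.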



