
> I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

Guess it wasn't so endless after all.

The author is assuming malice, but honestly, bots clicking links is just what happens to every public site on the internet. Not to mention that going down the link-clicking rabbit hole is common among Wikipedia readers.

All that said, I don’t really see the point. Wikipedia’s human controls are what make it exciting.



It’s a poetic end, considering that the very same kind of scraping, with no regard for the cost to site operators, is how these models are trained to begin with.


New page generation has been re-enabled, with a rate limit and "using openai/gpt-oss-120b instead of Kimi-K2".


> but honestly bots clicking links is just what happens to every public site on the internet.

As a CS student ~20 years ago I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app, but the logs indicated it was just a scraping bot someone was running.

It was a low-stakes lesson in why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I didn’t bother with auth since it was only accessible from within the department network. But I didn’t plan for the bots.
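
For anyone curious what the fix looks like, here’s a rough sketch (Flask, with a made-up in-memory todo store, not my actual app): put every mutation behind POST, so a crawler following plain links can’t change anything with a GET.

    from flask import Flask, abort

    app = Flask(__name__)
    todos = {1: "write report", 2: "buy milk"}  # made-up in-memory store

    # Bad: routes default to GET, so any crawler fetching the URL deletes the item.
    # @app.route("/delete/<int:item_id>")
    # def delete_item(item_id): ...

    # Better: only accept POST; a crawler following <a href> links won't send it.
    @app.route("/delete/<int:item_id>", methods=["POST"])
    def delete_item(item_id):
        if item_id not in todos:
            abort(404)
        del todos[item_id]
        return "", 204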


Reminds me of the Spider of Doom, a similar issue where GET-based delete links were hidden by simple JavaScript that checked whether the user was logged in. All of a sudden, pages and content on the website began to mysteriously vanish.

You know what doesn’t care about JavaScript and tries to click every link on your page? A search engine’s web crawler.

https://thedailywtf.com/articles/The_Spider_of_Doom
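
The lesson being that hiding the link in the browser protects nothing; the server has to verify the session (and require POST) before doing anything destructive. A rough sketch of that check (Flask, with a hypothetical page store, not the actual site from the article):

    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # placeholder, needed for sessions

    pages = {"home": "...", "about": "..."}  # hypothetical content store

    @app.route("/admin/delete/<page>", methods=["POST"])
    def delete_page(page):
        if not session.get("logged_in"):  # checked on the server, not in JS
            abort(403)
        pages.pop(page, None)
        return "", 204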


Google and all the other search engines will crawl any public site too.


More clicks means a bigger wiki, which I guess should be the point, unless the generated articles lead to nonsensical strings, which would suck but should be reasonably easy to prevent.


It would have been ironic if it had been the crawler from OpenAI… :)


You should always have per-IP rate limiting.
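
As a rough sketch of what that can look like (an in-memory token bucket keyed on client IP, single-process only; the rate and burst numbers are purely illustrative, and a real deployment would more likely use Redis or the web server’s own limiter, e.g. nginx limit_req):

    import time
    from collections import defaultdict

    RATE = 1.0    # tokens refilled per second, per IP (illustrative)
    BURST = 10.0  # bucket capacity, i.e. max burst size (illustrative)

    _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(ip: str) -> bool:
        """Return True if this request from `ip` is within the rate limit."""
        b = _buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
        b["ts"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False  # caller should answer with HTTP 429

It won’t stop a distributed scraper, but it does catch the single runaway script that racks up a $70 bill overnight.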



