
ArchiveTeam is trying to brute-force the entire goo.gl URL space before it's too late. You can run a VirtualBox VM or Docker image (ArchiveTeam Warrior) to help; unique IPs are needed. I've been running it for a couple of months and have found a million URLs.

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior
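To give a sense of what "brute-forcing the URL space" means, here is a minimal sketch of how short codes can be enumerated from integer indices. It assumes a base62 alphabet (digits plus ASCII letters) and fixed code lengths. Both are illustrative assumptions, not goo.gl's documented scheme, and `codes_in_range` is a hypothetical helper, not how ArchiveTeam's tracker actually splits work.

```python
import string

# ASSUMPTION: base62 alphabet (0-9, a-z, A-Z); goo.gl's real scheme may differ.
ALPHABET = string.digits + string.ascii_letters


def keyspace_size(max_len: int, alphabet: str = ALPHABET) -> int:
    """Count every possible code of length 1..max_len."""
    n = len(alphabet)
    return sum(n ** k for k in range(1, max_len + 1))


def index_to_code(index: int, length: int, alphabet: str = ALPHABET) -> str:
    """Map an integer in [0, base**length) to a fixed-length code."""
    base = len(alphabet)
    if not 0 <= index < base ** length:
        raise ValueError("index out of range for this code length")
    chars = []
    for _ in range(length):
        index, rem = divmod(index, base)  # peel off the least significant digit
        chars.append(alphabet[rem])
    return "".join(reversed(chars))  # most significant digit first


def codes_in_range(start: int, stop: int, length: int):
    """Hypothetical work item: yield codes for a contiguous block of indices."""
    for i in range(start, stop):
        yield index_to_code(i, length)


if __name__ == "__main__":
    print(keyspace_size(6))           # total codes up to 6 characters
    print(list(codes_in_range(0, 3, 2)))
```

Because every index maps to exactly one code, a coordinator could hand out disjoint index ranges to many clients without overlap. That is also why unique IPs matter: each client can only probe so many codes before being rate-limited.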



Looks like they have saved 8000+ volumes of data to the Internet Archive so far [0]. The project page for this effort is here [1].

0: https://archive.org/details/archiveteam_googl

1: https://wiki.archiveteam.org/index.php/Goo.gl


Docker container FTW. Thanks for the heads-up - this is a project I will happily throw a Hetzner server at.


I'm about to go set up my spare N100 just for this project. If all it uses is a little bandwidth, that's perfect for my 10 Gbps fiber and N100.


Doing the same, even though I'm worried Google will now throw even more CAPTCHAs at me than before.


Same here. I'm genuinely asking myself what it's for, though. I mean, they'll get a list of the linked domains, but what will they do with that?


It's not only goo.gl links they are actively archiving. Take a look at their current tasks.

https://tracker.archiveteam.org/


They are downloading and archiving the pages that the links point to.


Save it, forever*.

* As long as humanly possible, as is archive.org's mission.


After a while I started getting "Google asks for a login" errors. Should I just keep going? The ArchiveTeam wiki doesn't say what to do in that case.


Thanks for sharing this. I've often felt that the ease with which we can erase digital content leaves our era susceptible to a digital dark age, as seen by archaeologists studying history a few thousand years from now.

Preserving digital archives is a good step. I guess making hard copies would be the next one.


Just started. Super easy to set up.


Why wouldn’t Google just publish a database of URLs? Even just a CSV file? Infuriating.


I suspect there are links to some really bad shit in there. Google is probably in damage control mode.



