Hacker News | someuser18's comments

At least Ex supports torrents, and it also has custom p2p software you can run (it serves content) from which data can be siphoned off.

And what is served through their website is resized, so web scraping is an inferior approach.


You seem to be assuming that:

1. I'm scraping the resized galleries.

2. I don't have the Hath perk that makes the galleries full sized.

3. I don't have a phash-based fuzzy image deduplication system on top of all this (see https://github.com/fake-name/IntraArchiveDeduplicator). Its main purpose is to deduplicate manga (https://github.com/fake-name/MangaCMS).
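For readers unfamiliar with perceptual hashing: the idea is to hash image *content* rather than bytes, so near-identical images (recompressed, slightly resized) land at small Hamming distances from each other. Below is a minimal self-contained sketch using average hash, a simpler cousin of the pHash that IntraArchiveDeduplicator actually uses; the 8x8 pixel grids stand in for downscaled grayscale images, and all names here are illustrative, not from the project.

```python
# Sketch of perceptual-hash deduplication (average hash, a simplified
# stand-in for pHash). An "image" here is an 8x8 grid of grayscale values.

def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    return sum(1 << i for i, p in enumerate(flat) if p > avg)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def near_duplicates(h1, h2, threshold=5):
    # Small Hamming distance => perceptually similar images.
    return hamming(h1, h2) <= threshold

# A gradient image, a slightly brightened copy, and an inverted image.
img = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
noisy = [[min(255, v + 2) for v in row] for row in img]
other = [[255 - (r * 8 + c) for c in range(8)] for r in range(8)]

h = average_hash(img)
print(near_duplicates(h, average_hash(noisy)))  # brightened copy: duplicate
print(near_duplicates(h, average_hash(other)))  # inverted image: distinct
```

The real project hashes full-size images (pHash uses a DCT of the downscaled image rather than a plain mean) and stores the hashes in a BK-tree so near-duplicate lookups don't require comparing against every known hash.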


Jesus, your projects are massive. Does your job involve working on these or are these just side things?


They're all entirely hobby things.

