
This is a great starting point. Can anyone recommend any resources for how to best set up a remote scraping box on AWS or another similar provider? Pitfalls, best tools to help manage/automate scripts, etc. I've found a few "getting started" tutorials like this one, but I haven't been able to find anything good that discusses scraping beyond running basic scripts on your local machine.


http://scrapy.org/ is a more systematic approach than these ad-hoc solutions, and http://scrapyd.readthedocs.org/en/latest/ is the daemon into which you can deploy your spiders if you want a little more structure.
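To give a sense of how little the scrapyd side takes: a sketch of a `scrapy.cfg` for a hypothetical project named `myproject` (the project name, settings module, and localhost URL are all placeholders for your own setup).

```
# Hypothetical scrapy.cfg: the [deploy] section tells
# scrapyd-deploy which scrapyd instance to push the
# packaged spiders to.
[settings]
default = myproject.settings

[deploy]
url = http://localhost:6800/
project = myproject
```

With that in place you deploy with `scrapyd-deploy` and kick off runs through scrapyd's HTTP API, e.g. `curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider`.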

The plugin architecture alone makes Scrapy a hands-down winner over whatever you might dream up on your own. As with any good plugin architecture, it ships with several optional toys, you can always add your own, and there is a pretty good community around it (e.g. https://github.com/darkrho/scrapy-redis).

http://crawlera.com/ will enter the discussion unless you have a low-volume crawler, and http://scrapinghub.com/ are the folks behind Crawlera who (AFAIK) sponsor (or actually do) the development of Scrapy.
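Crawlera plugs into Scrapy through a downloader middleware, so switching a crawler onto it is mostly a settings change. A hypothetical `settings.py` fragment, assuming the `scrapy-crawlera` package is installed; the API key is a placeholder.

```python
# Hypothetical settings.py fragment enabling Crawlera via the
# scrapy-crawlera downloader middleware (pip install scrapy-crawlera).
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-api-key>"  # placeholder, not a real key
```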



