Hacker News

Hey, article author here.

I've done extensive scraping in both Python and Ruby, as I wrote most of the scraping/crawling code at http://serpiq.com, so I can chip in.

Overall, I prefer Python. That comes down pretty much solely to the requests library, though; it makes everything so simple and quick. I haven't covered it in the article yet, but you can easily extend the Response class, so you can, for example, add methods like images(), links(nofollow=True), and so on. Overall, I just think the requests library is much more polished than anything available in Ruby.
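The comment doesn't show how the extension works, so here is one minimal sketch of the pattern. The class name, helper names, and the regex-based extraction are all my own illustrative choices (a real scraper would parse with lxml rather than regexes); the `__class__` swap is a common, if hacky, way to retrofit methods onto the Response objects requests hands back:

```python
import re
import requests

class ScrapeResponse(requests.Response):
    """requests.Response with extra scraping helpers (illustrative names)."""

    def images(self):
        # Naive regex extraction of <img ... src="..."> values.
        return re.findall(r'<img[^>]+src="([^"]+)"', self.text)

    def links(self, nofollow=True):
        # href values from anchor tags; nofollow=False drops
        # links marked rel="nofollow".
        out = []
        for a in re.findall(r'<a[^>]*>', self.text):
            m = re.search(r'href="([^"]+)"', a)
            if m is None:
                continue
            if not nofollow and 'nofollow' in a:
                continue
            out.append(m.group(1))
        return out

def as_scrape_response(resp):
    # Swap the instance's class in place so an ordinary Response
    # picks up the helper methods above.
    resp.__class__ = ScrapeResponse
    return resp
```

After fetching, `r = as_scrape_response(requests.get(url))` gives you `r.images()` and `r.links()` alongside the normal Response API.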

grequests (Python) means I can make things concurrent in a matter of minutes. In Ruby, however, the only library for concurrent HTTP requests that I liked was Typhoeus. It just wasn't up to the same standard, and I ran into issues when using proxies.
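The grequests pattern is essentially "build a batch of requests, then map them concurrently". The same fan-out can be sketched with the standard library alone; here `fetch` is a stub of my own so the example runs offline, and the URLs are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real HTTP call like requests.get(url); returns
    # (url, status) so the example needs no network access.
    return (url, 200)

urls = ["http://example.com/page/%d" % i for i in range(5)]

# Fan the requests out over a small worker pool; map() preserves
# the input order of the URLs.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
```

With grequests itself the shape is similar: build lazy requests with grequests.get() and dispatch them with grequests.map(), with gevent handling the concurrency instead of threads.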

As far as HTML parsing goes, I don't really have a preference. Nokogiri and lxml are both equally capable.
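For a sense of what that parsing looks like on the Python side, here is a small lxml sketch (the fragment and attribute queries are my own example; near-identical XPath works in Nokogiri on the Ruby side):

```python
from lxml import html

# Parse a small HTML fragment and pull attributes out with XPath.
doc = html.fromstring(
    '<html><body>'
    '<a href="/about">About</a>'
    '<img src="/logo.png" alt="logo">'
    '</body></html>'
)

hrefs = doc.xpath('//a/@href')   # all anchor href values
srcs = doc.xpath('//img/@src')   # all image src values
```

The Nokogiri equivalent is `Nokogiri::HTML(fragment).xpath('//a/@href')`, which is part of why switching between the two feels painless.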

I think they're both perfectly capable languages, though; stick with whichever you prefer. I've been experimenting with Go lately.


