
"That said, if your site grows in some way you didn't originally anticipate and you get to a point where you need to shard, but can only do so by changing data stores, then it's sad."

I think you're being too absolute. For instance, Instagram sharded Postgres, and they didn't have to throw anything away or dedicate a huge engineering team to the problem.



They had to put engineering effort into it. With Mongo, you don't.


With Mongo, you don't.

Bullshit.

The sharding impl in MongoDB still[1] crumbles pitifully[2] under load. Regardless of sharding, MongoDB still halts the world[3] under write-load.

Their map/reduce impl is a joke[4][5].

If you had done the slightest research you'd know that every single aspect that you need to scale out Mongo is either broken by design or so immature that you can't rely on it.

MongoDB may be fine as long as your working set fits into RAM on a single server. If you plan to go beyond that then you'd better start with a more stable foundation - or brace yourself for some serious pain.

[1] https://groups.google.com/group/mongodb-user/browse_thread/t...

[2] http://highscalability.com/blog/2010/10/15/troubles-with-sha...

[3] http://2.bp.blogspot.com/_VHQJkYQ5-dY/TUO3RAn8SNI/AAAAAAAABq...

[4] http://stackoverflow.com/a/3951871

[5] http://steveeichert.com/2010/03/31/data-analysis-using-mongo...


[1] looks like an example where the data didn't fit in RAM. Mongo works best when data fits in RAM or if you use SSDs. Yes, it's sub-optimal.

[2] is from a year and a half ago. It doesn't belong in a sentence that includes the word "still." I work at foursquare, btw. Those outages happened on my first and second days at the company. I wasn't so keen on mongo then either. We've gotten much better at administering it. Basically all our data is in mongo, and it has its flaws, but I'm still glad we use it.

[3] is also from a year and a half ago. Mongo 2.2 will have a per-database write lock, which is at least progress, even though it's obviously not enough. Since 2.0 (or 1.8?) it's also gotten better at yielding during long writes.

I have no experience with their mapreduce impl and can't speak to it.


[2] When you look at [1] you'll notice that these exact problems are still prevalent.

I'd in fact be curious how exactly you worked around the sharding issues at 4square?

Remember I replied to someone who claimed it takes "no engineering effort" to scale MongoDB. That's not only obviously false, but last time I tried the sharding was so brittle that recommending it as a scaling path would border on malice.

I ran a few rather simple tests for common scenarios; high write-load, flapping mongod, kill -9/rejoin, temporary network partition, deliberate memory starvation. MongoDB failed terribly in every single one of them. The behavior ranged from the cluster becoming unresponsive (temporarily or terminally), to data corruption (a collection disappearing or becoming inaccessible with an error), to silent data corruption (inconsistent query results), to severe cluster imbalance, to outright crashes (segfault, "shard version not ok" and a whole range of other messages).

I didn't try very hard; it was terribly easy to trigger pathological behavior.

My take-home was that I most certainly don't want to be around when a large MongoDB deployment fails in production.

As such I'm a little disconcerted every time the Mongo scalability myth is restated on HN, usually by people who haven't even tried it beyond a single instance on their laptop.


"I ran a few rather simple tests for common scenarios; high write-load, flapping mongod, kill -9/rejoin, temporary network partition, deliberate memory starvation. MongoDB failed terribly in every single one of them. "

What other databases did you go to the same lengths to make fail which handled them gracefully?


I don't think what I said was bullshit. So you wrote tests to make mongo fail, and you've seen cases where people run into problems with it. That still doesn't disprove my point. With postgres, you roll your own sharding. With mongo, you don't have to.
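For what "roll your own sharding" means in practice: most of the Postgres setups I've seen just put a routing function in front of a pool of databases. A minimal sketch (the names `SHARDS` and `shard_for` are made up for illustration; a real setup also needs migrations, resharding, and cross-shard queries):

```python
# Minimal application-level shard routing for Postgres.
# Hypothetical sketch: real deployments also handle resharding,
# replication, and queries that span shards.
import zlib

# One connection string per shard; a logical->physical mapping
# lets you start with many logical shards on a few machines.
SHARDS = [
    "dbname=app_shard0 host=db0",
    "dbname=app_shard1 host=db1",
]

def shard_for(user_id: int) -> str:
    """Route a key to a shard via a stable hash, so the same
    user always lands on the same database."""
    h = zlib.crc32(str(user_id).encode("ascii"))
    return SHARDS[h % len(SHARDS)]
```

The point being: this is real engineering effort, but it's a routing function plus operational discipline, not a research project.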


Sure, makes sense. If you're happy with your deployment randomly failing.


/troll.


It seems you fall squarely into the bucket of 'people who haven't even tried' (and some other unfavorable buckets, but I'll leave that to your older self to judge).


Wow, I just saw your awesome response! FWIW, I too work at foursquare and sit next to Neil (nsanch). Feel free to verify by asking him!

I reiterate: /troll


Well, that's just factually incorrect. MongoDB now has a per-database write lock and will have a per-collection write lock in the next version. So your halt under write-load statement is incorrect.

The map/reduce implementation is quite new, sure. But it is getting better, and you can always link it up with Hadoop.

At the very least provide links that aren't nearly 2 years old.


MongoDB now has a per-database write lock and will have a per-collection write lock in the next version.

That doesn't help when you need to make a bulk update on a busy collection, and busy collections tend to need bulk updates occasionally.
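The usual mitigation is to split the bulk update into small batches and yield between them, so the database-wide write lock is released periodically instead of being held for the whole pass. A rough sketch of that pattern (the `apply_batch` callback, batch size, and pause are all hypothetical knobs, not anything MongoDB ships):

```python
import time

def chunked(ids, size):
    """Yield the id list in fixed-size batches."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def bulk_update(ids, apply_batch, batch_size=500, pause=0.01):
    """Apply an update batch by batch, sleeping between batches
    so other writers can grab the (database-wide) write lock."""
    done = 0
    for batch in chunked(ids, batch_size):
        apply_batch(batch)   # e.g. one multi-document update per batch
        done += len(batch)
        time.sleep(pause)    # release the lock; let queued writes through
    return done
```

This keeps the lock hold time bounded, but it turns one atomic update into many, which is exactly the kind of trade-off the parent is complaining about.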

At the very least provide links that aren't nearly 2 years old.

The first link is 1 month old.

The other links also still describe the state of the art. Feel free to correct them factually if that is not true.


I attended a talk by Instagram post-buyout. That's where I got the impression that sharding was not a huge obstacle for them (though it was significant). Keep in mind their entire data management team was 2 people, I think.

My point was that sharding is not an absolute "have it or not". Some features require major engineering efforts to get anywhere at all, but sharding is not one of them.
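For a sense of the scale of that effort: the core of the scheme Instagram described publicly is packing a logical shard id into every generated ID, so any row's home shard can be read straight out of its primary key. A rough Python sketch of that bit layout (the 41/13/10-bit split follows their published write-up, but treat the constants here as illustrative, not authoritative):

```python
# Sketch of an Instagram-style 64-bit ID: 41 bits of milliseconds
# since a custom epoch, 13 bits of logical shard id, 10 bits of
# per-shard sequence. EPOCH_MS is an illustrative constant.
EPOCH_MS = 1314220021721

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack time, shard, and sequence into one sortable integer."""
    return ((now_ms - EPOCH_MS) << 23) | ((shard_id % 8192) << 10) | (seq % 1024)

def shard_of(packed_id: int) -> int:
    """Recover the logical shard id from a packed ID."""
    return (packed_id >> 10) & 0x1FFF
```

IDs generated this way sort roughly by time, and routing never needs a lookup table — which is about the level of engineering a two-person team can carry.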

But if you think the overall effort is less with MongoDB then go for it.



