Reddit's May 2010 "State of the Servers" report

jbellis · on May 11, 2010

Incidentally, on EC2 the extra CPU in an XL or HMXL makes all the difference between "oh shit, we ran out of capacity and bootstrapping more slows things to a crawl" and "we ran out of capacity but we can limp along during the bootstrap." YMMV.

Reddit was originally on L instances.

(HMXL is generally the sweet spot for Cassandra price/performance on EC2, IMO.)

jfager · on May 12, 2010

Is it normal for people to run Cassandra on only 3 nodes in production environments? I realize that one of the selling points is starting small and then scaling out without any headaches, but 3 nodes seems extreme in that regard.

jbellis · on May 12, 2010

Well, roughly speaking, you can group Cassandra deployments into two categories+: new products that are hoping they need Cassandra's scaling ability someday, and existing products moving to Cassandra from something else because the pain of scaling something that wasn't designed for it forces them to.

The first category will start small. The smallest production deployment I know of is a single 256MB VM, but usually even starting small you should not have fewer than 2 servers (why tempt fate with a non-redundant setup when Cassandra makes it so easy to be safe?).

The second group is where you see larger deployments out of the gate. 3 machines does seem small for reddit; I guess they made up the difference with memcached. Unfortunately they deployed just before the 0.6.0 final release was out, which is where we added the row cache feature that could have made memcached unnecessary.

+There are people using Cassandra for non-scale-related reasons, though. Most of these people are motivated by multi-datacenter replication.

jedberg · on May 13, 2010

That's not quite correct. Our original Cassandra nodes were on XLs. Only the new ones were on Ls, and that was because we couldn't get XLs during the outage. Later in the outage Amazon helped us get the XLs we needed.

jbellis · on May 13, 2010

Thanks for the correction.

puredemo · on May 11, 2010

I can't believe they were originally on L instances. I assumed they chose HMXL considering their pageviews..

jedberg · on May 13, 2010

That's not quite correct. Our original Cassandra nodes were on XLs. Only the new ones were on Ls, and only temporarily.

apike · on May 11, 2010

I'm not surprised that Cassandra doesn't perform ideally with only three nodes, considering the scales it's intended for. Does anybody know how many nodes are required for its resiliency safeguards to work properly?

stephenjudkins · on May 11, 2010

If by "to work properly" you mean that a cluster can suffer the loss of one node with no data loss, you only need two nodes.

The consistency guarantees Cassandra gives you are based upon configuration values (ReplicationFactor) and a ConsistencyLevel provided at runtime when performing an operation. A perfectly available, partition-tolerant and perfectly consistent system is impossible so adjusting these settings lets you specify the tradeoffs you desire.

If for all operations you specify a consistency level of ALL, you are guaranteed consistency. However, you lose availability if one server (for a given part of a keyspace) goes down.

Conversely, you can also use Cassandra in a way that offers few consistency guarantees by specifying a ConsistencyLevel of ANY, or ONE. Further, you can perform reads with ConsistencyLevel ONE that don't give you a guarantee that the information you read is consistent. As long as no servers go down, you have consistency, and as long as you have one server left, you have complete availability.

The QUORUM consistency level guarantees at least (ReplicationFactor/2 + 1) nodes agree on a write and gets the latest timestamp from a majority of servers on read. This offers both consistency and availability. For three servers with a ReplicationFactor of 2 this is the same as ALL; if your ReplicationFactor is 1 you lose effective consistency guarantees.

So, to be able to use the full range of options in the consistency/availability tradeoff space one needs a cluster of at least four servers. Someone let me know if my reasoning is incorrect.

See http://wiki.apache.org/cassandra/API for more info.
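The replica arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration of the (ReplicationFactor/2 + 1) rule described above, not Cassandra's API: the function names `required_replicas` and `tolerated_failures` are made up for the example, and ANY is left out because it does not map to a fixed replica count.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Replicas that must respond for an operation at the given level.

    Hypothetical helper for illustration; not part of any Cassandra client.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Integer division, matching ReplicationFactor/2 + 1 from the comment.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def tolerated_failures(level: str, replication_factor: int) -> int:
    """Replicas (for one key range) that can be down while the operation still succeeds."""
    return replication_factor - required_replicas(level, replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        counts = {lvl: required_replicas(lvl, rf) for lvl in ("ONE", "QUORUM", "ALL")}
        print(f"ReplicationFactor={rf}: {counts}")
```

With a ReplicationFactor of 2, QUORUM and ALL both need 2 replicas, which is the "same as ALL" case mentioned above. A ReplicationFactor of 3 is the smallest value at which ONE, QUORUM and ALL all require different replica counts.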

jbellis · on May 11, 2010

It wasn't that the resiliency safeguards didn't work, so much as reddit was simply underprovisioned.

I have no idea what TFA meant by "at only three nodes we weren't able to take advantage of most of Cassandra's safeguards for ending up in this situation," except in the sense that adding an extra 20k ops/s on a 30 node cluster will be 10% of capacity instead of 100%. (Just picking reasonable ballpark numbers, I don't know what reddit's are.)
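The 10% vs. 100% figure is just a ratio of extra load to total cluster capacity. A quick sketch of that arithmetic, using the same ballpark numbers: the per-node capacity below is an assumption chosen so that 20k ops/s saturates three nodes, not a measured reddit or Cassandra figure, and `share_of_capacity` is a made-up helper name.

```python
from fractions import Fraction


def share_of_capacity(extra_ops_per_s, nodes, per_node_ops_per_s):
    """Fraction of total cluster capacity consumed by the extra load."""
    return Fraction(extra_ops_per_s) / (nodes * Fraction(per_node_ops_per_s))


# Assumed per-node capacity: picked so 20k ops/s equals the capacity of 3 nodes.
PER_NODE_OPS = Fraction(20_000, 3)

three_nodes = share_of_capacity(20_000, 3, PER_NODE_OPS)
thirty_nodes = share_of_capacity(20_000, 30, PER_NODE_OPS)

if __name__ == "__main__":
    print(f"3 nodes:  {float(three_nodes):.0%} of capacity")
    print(f"30 nodes: {float(thirty_nodes):.0%} of capacity")
```

The same burst that consumes an entire three-node cluster is a tenth of a thirty-node one, which is the sense in which a larger cluster has more headroom.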

megablast · on May 11, 2010

It seems odd that it is configured to look up a key-pair, when that key-pair is no longer needed. Surely it would be better to have it no longer cache queries that are no longer needed.

jbellis · on May 12, 2010

It's like how in postgresql, if a client runs "select * from lots", and you kill the client, the query keeps going even though there's nobody to hand the answer to.

That said, there are ways we can mitigate this, primarily in https://issues.apache.org/jira/browse/CASSANDRA-685

hello_moto · on May 12, 2010

I'm in the middle of reading materials about system analysis and design, in particular about Enterprise Application Integration.

I took a break and checked out Reddit and HN. Stumbled upon this article and the GrooveShark AMA. I realized that these days some of the big websites are moving toward a situation similar to that of typical enterprise apps, where there are different components/sub-systems written in different technologies. Is my assumption wrong?

Seems like between Enterprise Apps (whatever Enterprise means) and these big Web 2.0 apps, the only difference is the users; the former is geared toward businesses, whereas the latter leans toward customers/end-users. The technology obstacles are rather similar.

trin_ · on May 12, 2010

"We've written to Trend Micro explaining that we're actually neither a spammer nor an individual end user, but rather an honest website that's kind of a big deal, and they sent us a form letter explaining how to configure Outlook Express and encouraging us to ask our ISP for further information."

ahh the joy of robot-email-responders.

AdamN · on May 12, 2010

Even for a site the size of Reddit, an outsourced emailer is the way to go. I use AuthSMTP.net but there are probably better ones for higher volume.

antirez · on May 12, 2010

Rule #1: when you can't scale, it is always your fault.