
There are various non-FB fallback measures, including IRC as a last-ditch method. The IRC fallback is usually tested once a year for each engineer.


I just heard from a contact that the fallback/backup IRC is also down.


Bet it was located at irc.facebook.com ;)

Joking aside, I can see how an IRC network has the potential to be used in these situations. Maybe FAMANG should work together to set something like this up. The problem is that a single IRC server is not fail-safe, but a network of multiple servers would just see a netsplit, in which case users would switch servers.

Also, I remember back in the IRCnet days simply using telnet to connect to IRCnet just for fun and sending messages, so it's a very easy protocol that can be understood in a global disaster scenario (just the PING replies were annoying in telnet).
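To illustrate how little there is to it, here's a rough Python sketch of what you'd otherwise type by hand in telnet: register with NICK/USER, then answer each server PING with a matching PONG. The function names and the host/port/nick arguments are made up for illustration, and it assumes a plain-text server with no password or TLS.

```python
import socket
from typing import List, Optional


def pong_reply(line: str) -> Optional[str]:
    """Return the PONG line answering a server PING, or None for anything else."""
    command, _, rest = line.partition(" ")
    if command != "PING":
        return None
    # Echo back whatever token the server sent after PING.
    return f"PONG {rest.strip()}\r\n"


def registration_lines(nick: str) -> List[str]:
    """The two lines a client sends right after connecting."""
    return [f"NICK {nick}\r\n", f"USER {nick} 0 * :{nick}\r\n"]


def run_client(host: str, port: int, nick: str) -> None:
    """Connect, register, print everything received, and keep the link alive."""
    with socket.create_connection((host, port), timeout=300) as sock:
        for line in registration_lines(nick):
            sock.sendall(line.encode("utf-8"))
        buffer = b""
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closed the connection
            buffer += data
            # IRC messages are terminated by CRLF; handle each complete line.
            while b"\r\n" in buffer:
                raw, buffer = buffer.split(b"\r\n", 1)
                text = raw.decode("utf-8", errors="replace")
                print(text)
                reply = pong_reply(text)
                if reply is not None:
                    sock.sendall(reply.encode("utf-8"))
```

Calling something like run_client("192.0.2.10", 6667, "oncall") (a placeholder address) would do the job; the PONG handling is the part that gets tedious when done manually.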


I heard the same thing from my old coworker who is at FB currently. All of their internal DNS/logins are broken atm so nobody can reach the IRC server. I bet this will spur some internal changes at FB in terms of how to separate their DR systems in the case of an actual disaster.


Good planning! Now, where does the IRC server live, and is it currently routable from the internet?

While normally I know the advice is "Don't plan for mistakes not to happen, it's impossible, Murphy's law, plan for efficient recovery from mistakes"... when it comes to "literally our entire infrastructure is no longer routable from the internet", I'm not sure there's a great alternative to "don't let that happen. ever." And yet, here Facebook is.


Also, are the users able to reach the server without DNS (i.e. are the IP address(es) involved static and communicated beforehand) and is the server itself able to function without DNS?

Routing is one thing which you can't do without (then you need to fall back to phone communications), but DNS is something that's quite likely not to work well in a major disaster.
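One cheap way to check the client side of that question is to verify that the fallback endpoints people have written down are IP literals rather than hostnames. A minimal sketch using Python's standard ipaddress module; the function name and the sample endpoints are hypothetical, not anything FB actually uses.

```python
import ipaddress
from typing import List


def needs_dns(host: str) -> bool:
    """True if connecting to `host` would require a DNS lookup."""
    try:
        ipaddress.ip_address(host)  # accepts IPv4 and IPv6 literals
    except ValueError:
        return True  # not an IP literal, so it has to be resolved
    return False


def dns_dependent(endpoints: List[str]) -> List[str]:
    """Return the endpoints that would be unreachable if DNS is down."""
    return [host for host in endpoints if needs_dns(host)]


# Hypothetical fallback list, using documentation-only addresses.
fallbacks = ["192.0.2.10", "2001:db8::1", "irc.example.net"]
print(dns_dependent(fallbacks))
```

This only covers the client's configuration; whether the server itself needs DNS (for reverse lookups, auth, and so on) has to be checked separately.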


A lot of the core 'ops like' teams at FB use IRC on a daily basis.

When I worked there, I wasn't aware of any 'test once per year' concept or directive.

Of course, FB is a really big place, so things are different in different areas.



