Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Wondering how Facebook communicates now internally - most of their work streams likely depend on Facebooks systems which are all down.

Can engineers and security teams even access prod systems anymore? Like, would "Bastion" hosts be reachable?

Wonder if they use Signal and Slack now?



There are various non-FB fallback measures, including IRC as a last-ditch method. The IRC fallback is usually tested once a year for each engineer.


I just heard from a contact that the fallback/backup IRC is also down.


Bet it was located at irc.facebook.com ;)

Joking aside, I can see how an IRC network has potential to be used in these situations. Maybe FAMANG should work together to set something like this up. The problem is, a single IRC server is not fail safe, but a network of multiple servers would just see a netsplit, in which case users would switch servers.

Also, I remember back in the IRCnet days using simply telnet to connect to IRCnet just for fun and sending messages, so its a very easy protocol that can be understood in a global desaster scenario (just the PING replys where annoying in telnet).


I heard the same thing from my old coworker who is at FB currently. All of their internal DNS/logins are broken atm so nobody can reach the IRC server. I bet this will spur some internal changes at FB in terms of how to separate their DR systems in the case of an actual disaster.


Good planning! Now, where does the IRC server live, and is it currently routable from the internet?

While normally I know the advice is "Don't plan for mistakes not to happen, it's impossible, murphy's law, plan for efficient recovery for mistakes"... when it comes to "literally our entire infrastructure is no longer routable from the internet", I'm not sure there's a great alternative to "don't let that happen. ever." And yet, here facebook is.


Also, are the users able to reach the server without DNS (i.e. are the IP addresse(s) involved static and communicated beforehand) and is the server itself able to function without DNS?

Routing is one thing which you can't do without (then you need to fallback to phone communications), but DNS is something that's quite probable to not work well in a major disaster.


A lot of the core 'ops like' teams at FB use IRC on a daily basis.

When I worked there, I wasn't aware of any 'test once per year' concept or directive.

Of course, FB is a really big place, so things are different in different areas.


FB uses a separate IRC instance for these kinds of issues, at least when I used to work there


I would think that their internal network would correctly resolve facebook.com even though they've borked DNS for the external world, or if not they could immediately fix that. So at least they'd be able to talk to each other.


To the communication angle, I've worked at two different BigCo's in my career, and both times there was a fallback system of last resort to use when our primary systems were unavailable.


I haven't worked for a FAANG but it would be unthinkable that FB does not have backup measures in place for communications entirely outside of Facebook.

Hmm well I mean for key people, ops and so on. Not for every employee.

Only a few people need that type of access, and they should have it ready. They need to bring more people there should be an easy way to do it.

Maybe the internal FB Messenger app has a slide button to switch to the backup network for those in need.


> Maybe the internal FB Messenger app has a slide button to switch to the backup network for those in need.

Having worked for 2 FAANG companies, I can tell you most core services like which FB Messenger would be using internal database services and relying on those which would be ineffective in a case like this as it would not work and the engineering cost to design them to support an external database would be a lot more than just paying for like 5 different external backup products for your SRE team.


Facebook does use IRC and Zoom as a fallback.


Actually, in this situation: Discord.


If they planned ahead, they should have had their oncalls practice on the backup systems (like Signal/Slack/Zoom) before now.


My team set up a discord lol


Don't they have a separate instance for internal communications?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: