Facebook Inc (NASDAQ:FB) aims to be online all the time, and mostly succeeds, despite constantly rolling out changes (many invisible to users) and coding according to the maxim “Move fast and break things.” In response to a question on Quora, later published on Forbes, former Facebook software engineer Justin Mitchell explained how the company channels that manic programming style into a website that can’t afford downtime.
Facebook attempts to smooth out programming warts
Probably the most important technique is dividing each release into four phases, called latest, p1, p2, and p3, so that new ideas don’t just show up in the wild, warts and all. The first phase, latest, is exactly what it sounds like: the latest code that developers are working on shows up here, completely separate from the live web servers, where it is free to wreak havoc without touching users. This is basically the testing ground where engineers can try out whatever they like.
Once a new feature is more or less in working order, it moves to p1, where code can run for longer periods while engineers watch the logs for obvious warnings or flaws. At this point, the code is still very much considered to be in development.
Once code moves into p2, it is running on a section of the actual web servers, as much as 5 percent of them. “This offered several opportunities, including catching long tail fatals and monitoring CPU/memory/memcache fetches/DB queries/external service use along with key user metrics on the servers for any anomalies,” explains Mitchell. Bottom line: real people are using the code at this point, but not so many of them that Facebook as a network is in danger of crashing.
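The kind of p2 monitoring Mitchell describes can be pictured as comparing metrics from the small slice of servers running new code against the rest of the fleet. The sketch below is only an illustration: the metric names, the sample values, and the 20 percent tolerance are assumptions made for the example, not details of Facebook's actual tooling.

```python
# Illustrative only: metric names, values, and the tolerance are assumptions.
def find_anomalies(baseline, canary, tolerance=0.20):
    """Return metrics where the canary exceeds the baseline by more than tolerance.

    The result maps each flagged metric name to its relative increase.
    """
    anomalies = {}
    for name, base_value in baseline.items():
        canary_value = canary.get(name)
        # Skip metrics that are missing or have no meaningful baseline.
        if canary_value is None or base_value == 0:
            continue
        relative_change = (canary_value - base_value) / base_value
        if relative_change > tolerance:
            anomalies[name] = relative_change
    return anomalies


# Hypothetical per-server averages for the main fleet and the p2 slice.
baseline = {"cpu_pct": 40.0, "memory_mb": 1000.0, "db_queries_per_req": 10.0}
canary = {"cpu_pct": 42.0, "memory_mb": 1500.0, "db_queries_per_req": 10.0}

print(find_anomalies(baseline, canary))
```

In this made-up data only memory use jumps well past the tolerance, which is the sort of signal that would presumably hold a change back from the full web tier.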
Finally, once everyone is confident that the code is working well, it goes live in p3, which is shorthand for the entire web tier. At that point Facebook has completed that particular launch. The advantage of going through all these phases is that multiple new features and products can be rolled out in parallel without having to coordinate their schedules, and without bringing the service down for maintenance.
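The four phases can be modeled as a simple ordered pipeline. The following is a rough sketch rather than Facebook's implementation: the phase names, the 5 percent figure for p2, and the full web tier for p3 come from Mitchell's account, while the zero production share for latest and p1, the `Phase` class, and both helper functions are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    web_tier_fraction: float  # share of production web servers running the code


# Phase names are from the article. The 0.0 values for latest and p1 are
# assumptions; the article only gives "as much as 5 percent" for p2 and
# the entire web tier for p3.
PHASES = [
    Phase("latest", 0.0),
    Phase("p1", 0.0),
    Phase("p2", 0.05),
    Phase("p3", 1.0),
]


def next_phase(current, healthy):
    """Advance one phase if the code looks healthy; otherwise stay put.

    p3 is the final phase, so code already there stays there.
    """
    names = [phase.name for phase in PHASES]
    index = names.index(current)
    if not healthy or index == len(PHASES) - 1:
        return current
    return names[index + 1]


def servers_in_phase(phase_name, total_servers):
    """Number of production web servers that would run code in this phase."""
    fraction = next(p.web_tier_fraction for p in PHASES if p.name == phase_name)
    return round(total_servers * fraction)


print(next_phase("p1", healthy=True))
print(servers_in_phase("p2", total_servers=1000))
```

Because each feature carries its own position in the pipeline, several can sit in different phases at once, which is consistent with the parallel rollouts described above.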
“Facebook evolved from the beginning with the idea of zero down time,” says Mitchell. “In my 4.5 years there, I can only remember a handful of experiences (one caused by me) where there was a widespread site disruption.”