StackState Blog

"Something is technically wrong" #TwitterDown

Posted by Joey Compeer on Jan 21, 2016 1:53:43 PM

‘Something is technically wrong’. That’s what Twitter said on the morning of Tuesday, January 19, 2016, as millions of Twitter users all over the world were blocked from the social network. How could this outage happen?

According to DownDetector, a site that tracks outages of websites and mobile apps in real time, users reported the most trouble with Twitter’s website, smartphone app and tablet apps. Third-party services such as TweetDeck were also intermittently unavailable. Twitter later attributed the lengthy outage to an issue ‘related to an internal code change’. On Tuesday afternoon, Twitter said it had reverted the change, which fixed the issue.

[Image: Twitter outage]

This application downtime had a huge impact on Twitter’s business. The average hourly cost of a critical application failure is $500,000 to $1 million, and Tuesday’s outage lasted more than six hours. Meanwhile, the stock price hit a new low, dropping 7% and losing almost $700 million in market value.
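
As a back-of-the-envelope check, multiplying the quoted hourly range by the six-hour duration puts the direct cost of the outage at roughly $3 million to $6 million, before counting the drop in market value. The sketch below assumes that cost scales linearly with outage duration, which is a simplification; the constant and function names are illustrative only.

```python
# Rough downtime-cost estimate using the figures quoted in the post.
# Assumption: cost scales linearly with outage duration; real costs
# (lost ad revenue, churn, reputation) are rarely that simple.
HOURLY_COST_LOW = 500_000     # USD per hour, lower bound quoted above
HOURLY_COST_HIGH = 1_000_000  # USD per hour, upper bound quoted above
OUTAGE_HOURS = 6              # "more than six hours", so this is a lower bound


def downtime_cost_range(hours: float) -> tuple[float, float]:
    """Return the (low, high) estimated cost in USD for an outage of `hours`."""
    return hours * HOURLY_COST_LOW, hours * HOURLY_COST_HIGH


low, high = downtime_cost_range(OUTAGE_HOURS)
print(f"Estimated direct cost: ${low:,.0f} to ${high:,.0f}")
# prints: Estimated direct cost: $3,000,000 to $6,000,000
```

Because the outage lasted more than six hours, both numbers should be read as minimums.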

More importantly, how could this outage happen, and why did it take so long to fix? Probably someone wrote or edited some code, deployed it, and as a result everything went down. Finding the problem seems to have been a hell of a job at Twitter: they didn’t know who changed the code, what was changed or how the change affected critical business services, so DevOps teams had to start a time-consuming joint investigation to find and resolve the problem.

A better way to deal with outages is to fully automate the problem-finding process across teams. Every DevOps team should be aware of what’s happening across the full IT stack, because delivering business services is, and always will be, a multi-team effort. To prevent future outages, Twitter has to step up its game and take a proactive, visual approach to IT operations. It can’t wait for the next big incident to happen.
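
What automating the problem-finding process looks like depends heavily on tooling, and this post does not describe a specific implementation. As a minimal sketch, assuming every deployment is recorded with an author, a component and a timestamp, and that a map of which components depend on which is available, one first step is to list the changes that landed on an affected service, or on anything it depends on, shortly before an incident began. This answers "who changed what" without a manual investigation. All class names, fields, component names, timestamps and the two-hour lookback window are hypothetical; this is an illustration, not how StackState or Twitter actually does it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A hypothetical deployment record: who changed which component, and when."""
    change_id: str
    author: str
    component: str
    deployed_at: datetime


def upstream_components(service: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return the service plus every component it transitively depends on."""
    seen: set[str] = set()
    stack = [service]
    while stack:  # iterative depth-first walk; `seen` also guards against cycles
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


def suspect_changes(
    changes: list[Change],
    incident_service: str,
    incident_start: datetime,
    depends_on: dict[str, list[str]],
    lookback: timedelta = timedelta(hours=2),  # assumed window; tune per team
) -> list[Change]:
    """Changes to the affected service or its dependencies that were deployed
    within `lookback` before the incident started, newest first."""
    relevant = upstream_components(incident_service, depends_on)
    window_start = incident_start - lookback
    suspects = [
        c for c in changes
        if c.component in relevant and window_start <= c.deployed_at <= incident_start
    ]
    return sorted(suspects, key=lambda c: c.deployed_at, reverse=True)


# Usage with made-up components and timestamps (not Twitter's real architecture).
deps = {
    "web-timeline": ["timeline-api"],
    "timeline-api": ["tweet-store"],
}
changes = [
    Change("c1", "alice", "tweet-store", datetime(2020, 1, 1, 8, 5)),
    Change("c2", "bob", "ads-service", datetime(2020, 1, 1, 8, 20)),       # unrelated component
    Change("c3", "carol", "timeline-api", datetime(2019, 12, 31, 15, 0)),  # outside the window
]
result = suspect_changes(changes, "web-timeline", datetime(2020, 1, 1, 8, 30), deps)
for c in result:
    print(c.change_id, c.author, c.component)  # prints: c1 alice tweet-store
```

Walking the dependency map is what links a low-level change (here to `tweet-store`) to the user-facing service that broke; a change to an unrelated component is filtered out even though it was deployed closer to the incident.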

Let’s hope Twitter learns from these outages. In the end, you and I, the customers, suffer the most: we can’t tweet, so we have to log in to Facebook to complain about our problems ;-)

Topics: Dev/Ops, ITSM

Our Mission

Creating an Error-Free IT Environment

To accomplish this, we created the world’s first Algorithmic IT Operations platform, which analyzes large volumes of IT topology, telemetry and time-series data from disparate sources and applies various forms of algorithmic analysis to the data in real time. StackState captures your entire IT stack in one data model.

