Out with the old and in with the new! So what are you getting rid of as we head into the New Year? Now is the time to reevaluate which triumphs you need to build on and which shortcomings you need to cut loose. These are common year-end practices, but sometimes we need to turn our lens outward and study the successes and blunders of the industry as a whole. We decided to take a look back at the biggest internet outages of 2015, examine what went wrong, and consider how a repeat of these events can be avoided.
Apple
Where: Worldwide
Duration: Over 11 hours
Affected: App Store, iTunes, the Mac App Store, and even Apple's retail storefronts
Why: Blamed on an internal DNS error
This was one of the most memorable outages of the year, not least because it was the most expensive. Experts estimate that the 11-hour outage cost over a million dollars per hour, bringing the grand total to more than $13 million. After investigation, Apple released a statement blaming the outage on an internal DNS error.
Even though Apple is a massive corporation with its own internal DNS infrastructure, we can still learn a great deal from this event. You can never have too many layers of redundancy or too many failover protocols to keep your site from going down completely. The Apple outage of 2015 was a worst-case scenario that affects too many companies who overlook the fundamental connection that keeps their business online.
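If you want a quick way to audit that first layer of redundancy, you can check how many name servers your domain actually delegates to. Here's a minimal sketch, assuming the dnspython package is installed and using example.com as a stand-in for your own domain:

```python
# Quick redundancy audit: how many name servers does the domain delegate to?
# Assumes `pip install dnspython`; example.com is a placeholder domain.
import dns.resolver

DOMAIN = "example.com"

# Look up the NS records published for the domain.
answer = dns.resolver.resolve(DOMAIN, "NS")
nameservers = sorted(str(rdata.target).rstrip(".") for rdata in answer)

print(f"{DOMAIN} delegates to {len(nameservers)} name server(s):")
for ns in nameservers:
    print(f"  {ns}")

if len(nameservers) < 2:
    print("WARNING: a single name server is a single point of failure")
```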
Comcast
Where: US West
Duration: 5 hours
Affected: Millions of customers from Seattle to San Francisco
Why: Hardware failure
At first, Comcast claimed a hardware malfunction brought down nearly the whole West Coast. However, it was their attempt to redirect traffic that ultimately caused the outage: Comcast rerouted traffic away from the failed hardware, but their backup locations were unable to handle the load and slowed to a halt.
This is a scenario we often see among large national (or international) companies that suffer a local attack or outage and try to fix it by redirecting traffic to other nearby servers. However, if those backup facilities aren't large enough to handle the doubled (or even tripled) traffic, they too will fail. Constellix DNS has an airtight global infrastructure that spans 16 geographically unique PoPs. Our network is strong enough to absorb an entire data center outage and continue to run effortlessly.
Dyn
Where: Worldwide
Duration: 22 minutes
Affected: Cloudflare and GitHub (among many other clients)
Why: Name server addresses were unresolvable
DNS provider Dyn suffered an outage this past summer due to unresolvable name server addresses. It sounds like a mouthful, but it was simply an internal error in which name server address records were removed from the dynect.net zone. This little mistake had a ripple effect that left companies like GitHub and Cloudflare suffering widespread site outages.
Usually, we’d advocate using a Secondary DNS provider that would take over your domain’s query load if your primary provider were to suffer an outage. However, since name server addresses were deleted, this wouldn’t have been a viable solution. The lesson we can learn from this is to simply do your due diligence when selecting a provider, and always have your own backup plan in case your provider does go down.
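Part of that due diligence can be automated. The sketch below (again assuming dnspython is installed, with example.com as a placeholder) checks the exact failure mode that bit Dyn's customers: it confirms that every name server your domain delegates to still resolves to an address and still answers queries for your zone.

```python
# Verify that each delegated name server still resolves and still answers.
# Assumes `pip install dnspython`; example.com is a placeholder domain.
import dns.message
import dns.query
import dns.resolver

DOMAIN = "example.com"

for ns_record in dns.resolver.resolve(DOMAIN, "NS"):
    ns_name = str(ns_record.target).rstrip(".")
    try:
        # Does the name server's own hostname still resolve to an address?
        addresses = [a.to_text() for a in dns.resolver.resolve(ns_name, "A")]
        ns_ip = addresses[0]

        # Does it actually answer a direct query for the domain?
        query = dns.message.make_query(DOMAIN, "A")
        response = dns.query.udp(query, ns_ip, timeout=3)
        print(f"OK    {ns_name} ({ns_ip}) returned {len(response.answer)} answer set(s)")
    except Exception as exc:
        print(f"FAIL  {ns_name}: {exc}")
```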
Amazon
Where: US East
Duration: 5 hours
Affected: Reddit, Netflix, Tinder, Heroku, and GitHub
Why: Glitches at the Ashburn, VA data center
Amazon actually suffered two major outages this year, once in August and then again in September. Both seemed to be the result of server glitches at their Ashburn, VA point of presence. The August outage made headlines when clients like Netflix and Tinder were unable to provide services to their US East customers.
A localized outage like Amazon’s could potentially be mitigated through traffic redirection, rerouting traffic away from problem areas. While this may add latency, it’s better than being down altogether.
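DNS-level traffic redirection is normally handled by your provider, but the underlying idea is easy to illustrate. The sketch below shows a simple health-check-and-fall-back pattern in Python, using hypothetical regional hostnames (us-east.example.com and us-west.example.com) purely as placeholders:

```python
# Rough sketch of regional failover: try the primary region first,
# and fall back to a backup region if it doesn't respond in time.
# The hostnames are hypothetical placeholders, not real endpoints.
import socket

ENDPOINTS = [
    ("us-east.example.com", 443),   # primary region
    ("us-west.example.com", 443),   # backup region
]

def pick_endpoint(endpoints, timeout=2.0):
    """Return the first endpoint that accepts a TCP connection."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            print(f"{host}:{port} unreachable, trying next region...")
    raise RuntimeError("no healthy endpoints available")

host, port = pick_endpoint(ENDPOINTS)
print(f"routing traffic to {host}:{port}")
```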
UltraDNS
Where: Worldwide
Duration: 90 minutes
Affected: Netflix, Pornhub, Bloomberg, and many other clients
Why: Technical malfunction
Netflix suffered yet another outage when its DNS provider, UltraDNS, was brought down by a technical malfunction. The UltraDNS outage also took down popular domains like Bloomberg and Pornhub. August’s outage was very reminiscent of a similar event a year prior, when UltraDNS was downed by a 100 Gbps DDoS attack. Engineers at UltraDNS originally thought August’s outage was the result of another DDoS attack, but after further investigation they concluded it was a technical malfunction.
When your primary DNS provider goes down, the best way to keep your site online with minimal impact on end users is to use a Secondary DNS provider. Constellix Secondary DNS creates a clone of the zone information held at the primary DNS provider and duplicates it on the Constellix IP Anycast+ network.
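To give a sense of the mechanics, this is roughly what pulling a copy of a zone from a primary looks like from the outside: a standard AXFR zone transfer. The sketch assumes dnspython is installed, that the primary server (192.0.2.1 here, a documentation placeholder) allows transfers from your address, and that example.com is the zone being cloned; it isn't a description of any particular provider's internals.

```python
# Minimal sketch of cloning a zone from a primary via an AXFR zone transfer.
# Assumes `pip install dnspython`; the IP and zone name are placeholders,
# and the primary must permit transfers from your address.
import dns.query
import dns.zone

PRIMARY_IP = "192.0.2.1"
ZONE_NAME = "example.com"

# Request the transfer and build an in-memory copy of the zone.
zone = dns.zone.from_xfr(dns.query.xfr(PRIMARY_IP, ZONE_NAME))

# Print every record we received -- the data a secondary would keep serving
# if the primary went dark.
for name, node in zone.nodes.items():
    for rdataset in node.rdatasets:
        print(name, rdataset)
```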
Plusnet
Where: United Kingdom
Duration: 7 hours
Affected: Millions of broadband customers
Why: Internal DNS error
Historically, some of the largest and most widespread outages stem from internet and broadband providers. This year, telecommunications giant Plusnet went dark due to an internal DNS error, knocking a huge portion of the United Kingdom offline.
Customers took to Twitter to rant about the inconvenience, while more resourceful users shared a way to bypass the outage using public DNS. Many posted step-by-step instructions on how to use public DNS servers like 8.8.8.8 (Google) to get reconnected to the web.
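In code, the same workaround looks like this: point your lookups at a public resolver instead of your ISP's. The sketch below assumes dnspython is installed and uses example.com as a placeholder domain.

```python
# Bypass a failing ISP resolver by querying a public resolver directly.
# Assumes `pip install dnspython`; example.com is a placeholder domain.
import dns.resolver

# configure=False skips the system's (failing) resolver configuration.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["8.8.8.8", "8.8.4.4"]   # Google Public DNS

answer = resolver.resolve("example.com", "A")
for rdata in answer:
    print(rdata.to_text())
```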
Target
Where: Worldwide
Duration: Intermittent throughout Black Friday
Affected: Millions of customers attempting to access the site
Why: Failure to redirect traffic
Every year a slew of brand-name retailers is brought down by unprecedented amounts of traffic during Thanksgiving weekend. This year, big names like Target, Neiman Marcus, Victoria’s Secret, and PSN all took a hit when they were knocked offline by the influx of traffic. However, Target was able to recover quickly by using virtual “shopping lines,” or queues, to let traffic onto the site and through checkout in increments. This wasn’t a quick fix; rather, it was the result of months of preparation and investment in development.
Target’s extensive preparation and successful backup plan (although inconvenient to some customers) serve as a great example of how to handle inordinate amounts of traffic. While your company or organization may never see numbers like Target’s, the most important best practice of running an online storefront is to be prepared for anything. That can mean many things in practice, from using monitoring tools to implementing failover protocols and having a backup plan in case everything else fails.
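As a starting point for monitoring, even a tiny script that polls your storefront and flags slow or failed responses is better than finding out from angry customers on Twitter. Here's a bare-bones sketch using only the Python standard library; the URL is a placeholder, and in practice you'd wire the alert into email, Slack, or your paging system.

```python
# Bare-bones availability monitor: poll the site on an interval and flag
# anything other than a fast, successful response. The URL is a placeholder.
import time
import urllib.error
import urllib.request

URL = "https://www.example.com/"
CHECK_INTERVAL = 60   # seconds between checks
TIMEOUT = 5           # seconds before the site is considered unresponsive

while True:
    start = time.time()
    try:
        with urllib.request.urlopen(URL, timeout=TIMEOUT) as response:
            elapsed = time.time() - start
            print(f"OK   {response.status} in {elapsed:.2f}s")
    except (urllib.error.URLError, OSError) as exc:
        # In a real setup, this is where you'd trigger an alert or failover.
        print(f"DOWN {URL}: {exc}")
    time.sleep(CHECK_INTERVAL)
```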