Does it still hold: "Never keep anything important on AWS in US-EAST-1"?

jpluimers

Jan 31

Reminder to self to check if this still holds: [Archive] Varun Krishnan on Twitter: "Never keep anything important on AWS in US-EAST-1" / Twitter

Slightly more than a year ago, the Amawon Web Services region US-EAST-1 collapsed with world-wide downtime consequences for many AWS services. It took some 8 hours to recover most of the services.

Before that, it was plagued with outages, maybe because it was their first ever region:

The outage was covered many times. I have included this El Reg link, as I like their tone of voice: [Wayback/Archive] AWS technical woes in US East region cause widespread outage • The Register.

Basically, any cloud stack is founded on these three layers:

Storage (S3 or Simple Storage Service in AWS speak)
Compute (EC2 or Elastic Compute Cloud in AWS speak)
Authentication and Authorisation (IAM or Identity and Access Management in AWS speak)

On top of that, any other services are implemented. And for Amazon Web Services, many of these have become available over the last two decades.

Indeed Anders Borum was right in his tweet: US-EAST-1 is the first ever AWS EC2 region and started in 2006, more than 15 years ago. It is also the region with the largest capacity. Likely both play a role in US-EAST-1 being part or initiating factor in many of the major AWS outages. If you look in all AWS outages, US-EAST-1 plays a role in most if not all outages since 2017,

So for now, if hosting at AWS, I would host outside of US-EAST-1.

Depending on the kind of application and money involved, I would consider hosting in multiple regions, and if a truckload of money was involved: hosting on multiple clouds.

I fully agree with [Archive] Gergely Orosz on Twitter: "If you were impacted by the recent AWS outage, the decision to invest in multi-cloud / multi-datacenter is simple: How much did this outage cost you vs the cost of adding a (lot) more complexity & maintenance with multi-cloud/DC? If outage cost >> this, only then do it." / Twitter

Some more insight on multi-cloud hosting is via [Archive] Redmond on Twitter: "New feature from @jdanton: A full post-mortem from AWS is still to come, but in the meantime, IT pros should start bolstering their cloud disaster recovery strategies now -- before the next outage. https://t.co/ios5Re5ZCs" / Twitter at [Wayback/Archive] AWS Outage Fallout: What Lessons You Should Learn -- Redmondmag.com

Is It Time to Go Multicloud?

No. Well...if you are running a major property with a big customer-facing presence, it can be a good strategy to have static Web and app content hosted in a second cloud. In the case of an outage like yesterday's, you'd have the option to direct traffic to the static presence, which can supply some level of experience for your users.

A good example of how this approach can be useful is an outage dashboard. Whenever a cloud provider has an outage, they are notoriously bad at properly reporting ongoing status. This is because they have hosted their dashboards in their own clouds using their own APIs -- and when these APIs go down, they take the monitoring with them. Using DNS, you can quickly redirect traffic to this static site, where your engineers can update the page with status updates.