Failover or Fail Altogether

If I have a service that I want to make highly available, I have the option to deploy that service to multiple datacenters, for example US-East-1 and US-West-1. My application can then be configured for different scenarios:

Active/Active Load Balancing (Geo-based)
Active/Passive Failover

More datacenters can be added as needed, to make the application or service that much more reliable and resilient. Controlling this would be some kind of load balancer. In the old days, we'd use something like an F5 Global Traffic Manager (GTM) to point traffic at specific places. That place could be the currently active datacenter, or traffic could be load balanced and sent to the datacenter nearest the requester. It can be configured many ways.
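As a rough illustration of the "nearest the requester" idea, here is a minimal sketch in Python. It is not how a GTM or any AWS service is actually implemented. The `Datacenter` class, the `route_nearest` function, and the coordinates are all assumptions made up for the example, and real products usually route on measured latency or DNS resolver location rather than straight-line distance.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Datacenter:
    """Hypothetical record for one datacenter behind the load balancer."""
    name: str
    lat: float
    lon: float
    healthy: bool = True


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def route_nearest(client_lat, client_lon, datacenters):
    """Pick the closest healthy datacenter for a client location."""
    candidates = [dc for dc in datacenters if dc.healthy]
    if not candidates:
        raise RuntimeError("no healthy datacenter available")
    return min(
        candidates,
        key=lambda dc: haversine_km(client_lat, client_lon, dc.lat, dc.lon),
    )


# Approximate, illustrative coordinates only (assumptions, not official data).
DATACENTERS = [
    Datacenter("us-east-1", 39.0, -77.5),
    Datacenter("us-west-1", 37.4, -122.0),
]

if __name__ == "__main__":
    # A client near Seattle, then one near New York.
    print(route_nearest(47.6, -122.3, DATACENTERS).name)
    print(route_nearest(40.7, -74.0, DATACENTERS).name)
```

Because unhealthy datacenters are filtered out before the distance comparison, the same function covers the case where the nearest site is down: the request simply goes to the next closest one.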

Let's take a simple application, like Notepad. Now I want Notepad to be highly available, so I am going to place an instance of Notepad in US-East-1 and US-West-1. I'll use AWS's load balancing solution to mark the US-West-1 instance as active, while the other is passive. In this configuration, all traffic goes to the US-West-1 datacenter, where Notepad happily chugs away, servicing requests.

Then, kaboom! There's an earthquake, or a tsunami, or some other disaster that knocks the US-West-1 datacenter offline. The load balancer will (if configured correctly) see that the US-West-1 DC is offline and flip the configuration so that US-East-1 becomes the active datacenter, routing all traffic there. All automatically.
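The flip described above can be sketched in a few lines of Python. This is only a toy model of active/passive failover, not AWS's or F5's actual logic. The `FailoverRouter` class, its `failure_threshold` parameter, and the choice to fail back as soon as the primary passes a health check are all assumptions made for illustration.

```python
class FailoverRouter:
    """Toy active/passive router driven by health checks on the primary."""

    def __init__(self, primary, secondary, failure_threshold=3):
        self.primary = primary
        self.secondary = secondary
        # Hypothetical setting: consecutive failed checks before failing over.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = primary

    def record_health_check(self, primary_healthy):
        """Record one health check result and return the active datacenter."""
        if primary_healthy:
            # Primary is reachable: reset the counter and fail back.
            self.consecutive_failures = 0
            self.active = self.primary
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Enough failures in a row: send all traffic to the passive site.
                self.active = self.secondary
        return self.active


if __name__ == "__main__":
    router = FailoverRouter("us-west-1", "us-east-1", failure_threshold=3)
    for result in (True, False, False, False, True):
        print(result, "->", router.record_health_check(result))
```

Requiring several failed checks in a row keeps a single dropped probe from flipping all traffic to the other coast. Failing back immediately is the simplest choice here; a real setup might wait for the primary to stay healthy for a while first.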

Customers of AWS need to pay attention to this as well. On this day, October 20, 2025, I came in to work to discover that many of the applications that I use several times per day were offline or otherwise inaccessible: time tracking, mail, instant messaging (yes, we use it internally), and more. It seems to me, and this is just an outside view looking in, that the customers of AWS who build applications, services, and even whole organizations on the back of AWS should really take a look at some IT architecture design best practices documentation. I may be way off base, because I really don't know how their apps and services are configured. But it seems to me that if they had a reasonably redundant configuration across at least two geographically dispersed datacenters, the loss of a single datacenter (AWS has about 125 datacenters, as of this writing) would not cause a significant loss of service to the people who depend on it.

This brings to mind the old adage:

Prior planning prevents piss poor performance.

About this post

Posted: 2025-10-20
By: dwirch
