Build Instant Failover with CloudFront Origin Groups
Automatic multi-region failover in seconds, not minutes. The same pattern DAZN uses to stream sports to 200+ countries.
Difficulty: Mildly spicy
Time to complete: 45 minutes
Availability: Free
BUILD
What you'll build
Build automatic failover that switches regions in seconds when outages hit. Learn the edge-based failover pattern that eliminated DNS propagation delays for Twitch, HBO Max, and DAZN.
1. Configure CloudFront Origin Groups
Create a CloudFront distribution with primary and secondary origins, and set failover criteria for automatic routing.
2. Simulate a Regional Outage
Pause your primary App Runner service to trigger CloudFront failover, and watch traffic switch in seconds.
3. Verify Automatic Failback
Resume the primary service and confirm CloudFront automatically returns traffic when the origin recovers.
4. Build Production Alerting
Set up CloudWatch alarms and SNS notifications to alert you instantly when failovers occur.
5. Clean Up Resources
Delete the CloudFront distribution, App Runner services, and monitoring resources to avoid charges.
Your portfolio builds as you work.
Every project documents itself as you go. Finish the work, and your proof is ready to share.
PROJECT
Real world application
Skills you'll learn
- Edge Failover: Configure CloudFront origin groups for automatic failover at edge locations worldwide
- Multi-Region Architecture: Deploy services across AWS regions for high availability and fault tolerance
- Disaster Recovery: Build resilient systems that maintain uptime during regional outages
- Traffic Routing: Control request flow between origins using health-based routing policies
- Health Monitoring: Implement passive health checking to detect and respond to origin failures
- CloudWatch Alerting: Configure alarms and notifications for instant failover awareness
Tech stack
- CloudFront: AWS content delivery network with origin groups enabling automatic multi-region failover at edge locations
- App Runner: Fully managed container service providing regional deployment targets for failover testing
The moment I watched CloudFront automatically switch traffic when I paused my primary service was incredible. No DNS delays, no manual intervention. This is how production systems actually work.
Sarah Martinez
DevOps Engineer
OUTCOME
Where this leads.
Relevant Jobs
Roles where these skills matter:
- Site Reliability Engineer
- DevOps Engineer
- Cloud Architect
- Platform Engineer
Disaster Recovery
Continue the Disaster Recovery series with Pulumi infrastructure-as-code and more high-availability patterns.
FAQs
Everything you need to know
This is Part 2 of the 3-part Disaster Recovery series. Part 1 (Multi-Region Deployment) teaches you to deploy your app across multiple AWS regions. This project (Part 2) builds on that foundation with edge-based failover using CloudFront origin groups for instant traffic switching. Part 3 completes your DR toolkit with infrastructure-as-code using Pulumi for automated disaster recovery deployment. Together, these three projects give you a complete disaster recovery strategy for production systems.
DNS-based failover using Route 53 health checks requires DNS records to change and propagate globally, which can take minutes. CloudFront origin failover happens at the edge location level. When an edge server gets an error from the primary origin, it immediately retries the same request against the secondary. Users experience a slightly slower response for that single request, but there is no propagation delay. This is why Twitch and DAZN use this pattern for live streaming, where minutes of downtime cost millions.
Yes, this project requires two App Runner services running in different AWS regions (us-east-1 and us-west-2). The Deploy Multi-Region project teaches you how to deploy the same application to multiple regions. Complete that project first, then return here to add automatic failover between those regions.
An origin group bundles two origins together, a primary and a secondary. CloudFront always tries the primary first. If the primary returns one of the status codes you selected as failover criteria (in this project, 404 and the 5xx codes 500, 502, 503, and 504), or the connection fails or times out, CloudFront immediately retries the same request against the secondary origin. Failover applies only to GET, HEAD, and OPTIONS requests. Each edge location does this independently, so failover is nearly instantaneous, with no DNS changes or manual intervention required.
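As a rough sketch of how that looks in configuration, the fragment below builds the `OriginGroups` portion of a distribution config as a plain Python dictionary, in the shape the CloudFront API uses. The origin IDs and the group ID are hypothetical placeholders, and the status-code list is an assumption based on the criteria described above.

```python
# Hypothetical IDs; they must match the Ids of origins defined elsewhere
# in the same distribution config.
PRIMARY_ORIGIN_ID = "primary-us-east-1"
SECONDARY_ORIGIN_ID = "secondary-us-west-2"

# Assumed failover criteria: 404 plus the selectable 5xx codes.
FAILOVER_STATUS_CODES = [404, 500, 502, 503, 504]


def build_origin_groups(group_id, primary_id, secondary_id, status_codes):
    """Return the OriginGroups fragment of a CloudFront DistributionConfig."""
    return {
        "Quantity": 1,
        "Items": [
            {
                "Id": group_id,
                "FailoverCriteria": {
                    "StatusCodes": {
                        "Quantity": len(status_codes),
                        "Items": list(status_codes),
                    }
                },
                # The first member is treated as the primary origin.
                "Members": {
                    "Quantity": 2,
                    "Items": [
                        {"OriginId": primary_id},
                        {"OriginId": secondary_id},
                    ],
                },
            }
        ],
    }


origin_groups = build_origin_groups(
    "failover-group", PRIMARY_ORIGIN_ID, SECONDARY_ORIGIN_ID, FAILOVER_STATUS_CODES
)
```

A fragment like this would sit alongside `Origins` in the `DistributionConfig` passed to a call such as boto3's `create_distribution`, with the cache behavior's `TargetOriginId` set to the group ID rather than to a single origin.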
When you pause an App Runner service, it returns 404 (not a 5xx error). If you only selected 5xx errors for failover criteria, CloudFront would keep trying the paused primary and never fail over. Including 404 ensures CloudFront switches to the secondary when the primary service is unavailable for any reason. This is a subtle but critical configuration detail.
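The difference between the two criteria sets can be illustrated with a small model. This is only a sketch of the decision described above, not CloudFront's implementation; the function and constant names are made up for the example.

```python
# Two candidate sets of failover status codes.
FIVE_XX_ONLY = {500, 502, 503, 504}
WITH_404 = FIVE_XX_ONLY | {404}

# Status a paused App Runner service returns, as described above.
PAUSED_APP_RUNNER_STATUS = 404


def should_fail_over(primary_status, failover_codes):
    """Return True if the primary's status matches the failover criteria."""
    return primary_status in failover_codes


# With 5xx only, the paused primary's 404 goes straight back to the viewer.
print(should_fail_over(PAUSED_APP_RUNNER_STATUS, FIVE_XX_ONLY))
# With 404 included, the same response triggers a retry on the secondary.
print(should_fail_over(PAUSED_APP_RUNNER_STATUS, WITH_404))
```

One trade-off worth keeping in mind: with 404 in the criteria, a genuinely missing page on a healthy primary also causes a retry against the secondary before the 404 reaches the viewer.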
CloudFront pricing is based on data transfer and requests. For this project with minimal test traffic, costs are typically under $1. CloudFront charges approximately $0.085 per GB for data transfer out to North America and $0.0100 per 10,000 HTTPS requests (about $0.0075 per 10,000 for plain HTTP). Remember to delete the distribution after completing the project to avoid ongoing charges.
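To put numbers on "minimal test traffic", here is a back-of-the-envelope estimate. The traffic volumes are invented for illustration, and the rates passed in are assumptions; check the current CloudFront pricing page before relying on them.

```python
def estimate_cost(gb_out, https_requests, rate_per_gb, rate_per_10k_requests):
    """Estimate CloudFront cost in USD from data transfer and request counts."""
    transfer_cost = gb_out * rate_per_gb
    request_cost = (https_requests / 10_000) * rate_per_10k_requests
    return transfer_cost + request_cost


# Hypothetical test session: 1 GB transferred, 10,000 HTTPS requests.
cost = estimate_cost(
    gb_out=1,
    https_requests=10_000,
    rate_per_gb=0.085,           # assumed rate, USD per GB
    rate_per_10k_requests=0.01,  # assumed rate, USD per 10,000 HTTPS requests
)
print(f"Estimated cost: ${cost:.3f}")
```

Even this generous test workload comes to roughly ten cents, well under the $1 figure quoted above.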
CloudFront does not send active health probes to origins. Instead, failover is driven by the responses to actual user requests, which this project calls passive health checking. CloudFront keeps no lasting "unhealthy" state for an origin: every request goes to the primary first and falls back to the secondary only when that attempt fails. As soon as the primary returns successful responses again, it serves traffic again, with no separate failback step. This approach avoids health-check overhead while still providing effective failover.
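The outage-and-recovery behavior can be sketched as a timeline. This toy model assumes each request is evaluated independently with the primary tried first; the status sequence and names are hypothetical, chosen to mimic pausing and resuming the primary service.

```python
# Assumed failover criteria for the origin group.
FAILOVER_CODES = {404, 500, 502, 503, 504}


def serving_origin(primary_status):
    """Pick the origin that ends up answering a single request.

    The primary is always tried first; the secondary answers only when the
    primary's status matches the failover criteria.
    """
    return "secondary" if primary_status in FAILOVER_CODES else "primary"


# Hypothetical primary responses over five consecutive requests:
# healthy, healthy, paused, paused, resumed.
primary_statuses = [200, 200, 404, 404, 200]
served_by = [serving_origin(status) for status in primary_statuses]

for status, origin in zip(primary_statuses, served_by):
    print(f"primary returned {status} -> served by {origin}")
```

Because no state carries over between requests in this model, failback needs no extra logic: the first request after the primary recovers is simply served by the primary again.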
One Project. Real Skills.
45 minutes from now, you'll have completed Build Instant Failover with CloudFront Origin Groups. No prior experience needed. Just step-by-step guidance and a real project for your portfolio.