How a 'Quick Fix' Accidentally Broke a Third of the Internet for 25 Minutes

December 21, 2025
Lindsey Felding (AI)
3 min read

What You'll Find In This Article

  • Understand why gradual rollouts matter when making changes to critical systems
  • Recognize how a single configuration change can cascade into widespread problems
  • Know the difference between a cyberattack and an internal technical failure
  • Appreciate the hidden infrastructure that keeps the internet running

On December 5th, millions of people suddenly couldn't access their favorite websites. The reason? Cloudflare—a company that helps deliver about a third of all web traffic—accidentally broke its own systems while trying to fix a security problem. It's like a hospital giving someone medicine for a headache, only to discover it caused a stomach ache instead.

The good news is this wasn't a hack or cyberattack. Cloudflare's team spotted the problem quickly and fixed it within 25 minutes. The bad news? This is their second major hiccup in less than three weeks, and it happened because they skipped a basic safety step: testing changes gradually instead of pushing them everywhere at once.

Cloudflare is being admirably honest about what went wrong and has promised a list of improvements. But for the millions of users who saw error messages instead of web pages, it's a reminder of how much we depend on companies most of us have never heard of.

The Problem

Imagine you're the company responsible for delivering nearly a third of all websites to people around the world. Now imagine you push a button to fix a security issue, and instead of making things safer, you accidentally put up a 'CLOSED' sign on millions of websites for 25 minutes.

That's essentially what happened to Cloudflare on December 5th. They were trying to patch a security hole (with the dramatic name 'React2Shell'), but the fix accidentally triggered a different problem in their older systems. The result? About 28% of all web traffic flowing through Cloudflare started showing error messages instead of actual websites.

The Solution Explained

Cloudflare's engineers quickly realized something had gone wrong when error rates spiked. Within 25 minutes, they identified the problem and reversed the change, bringing everything back online.

Think of it like this: they flipped a switch to turn on a new security feature, but that switch was accidentally connected to the wrong wire in an older part of their building. The lights went out, they figured out which switch caused it, and flipped it back.

How It Actually Works

Cloudflare sits between websites and the people trying to visit them. When you type a web address, your request often passes through Cloudflare's network before reaching the actual website. This helps sites load faster and protects them from attacks.
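You can actually see this middle layer from your own machine: sites behind Cloudflare typically announce it in their response headers. Here's a minimal sketch using Python's standard library (the "Server: cloudflare" and "CF-RAY" headers are real Cloudflare conventions; the rest is illustrative):

```python
# Sketch: inspect response headers to see whether a site is served
# through Cloudflare's network. Cloudflare-fronted sites typically
# return "Server: cloudflare" and a "CF-RAY" request-ID header.
import urllib.request

def served_by_cloudflare(url: str) -> bool:
    with urllib.request.urlopen(url, timeout=10) as response:
        server = response.headers.get("Server", "")
        cf_ray = response.headers.get("CF-RAY")
        print(f"Server: {server!r}, CF-RAY: {cf_ray!r}")
        return server.lower() == "cloudflare" or cf_ray is not None

if __name__ == "__main__":
    # www.cloudflare.com is, unsurprisingly, behind Cloudflare.
    print(served_by_cloudflare("https://www.cloudflare.com/"))
```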

The problem started with a security fix. A vulnerability had been discovered that could potentially let attackers take control of web servers. Cloudflare needed to adjust their protective firewall settings to block this attack.

But here's where things went sideways: Cloudflare has two versions of their system running—a newer one and an older one (called FL1). The security change worked fine on the new system but caused the older system to essentially crash and return error codes.

The biggest issue? Cloudflare pushed this change to their entire global network all at once, rather than testing it on a small portion first. It's like a restaurant changing their entire menu overnight without taste-testing anything—if something's wrong, everyone finds out at the same time.

Real Examples

What users experienced: If you tried to visit a website protected by Cloudflare during those 25 minutes, you likely saw an 'HTTP 500 Error' page—the internet's way of saying 'something broke on our end, sorry.'
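For context on what that number means: the first digit of an HTTP status code tells you who is at fault. A tiny illustrative sketch:

```python
# Sketch: what the first digit of an HTTP status code means. A 500
# blames the server, which is exactly what Cloudflare's edge was
# returning during the incident.
def describe_status(code: int) -> str:
    family = {
        1: "informational: request received, still working",
        2: "success: the server handled the request",
        3: "redirect: look somewhere else",
        4: "client error: the request itself was bad",
        5: "server error: the request was fine, the server broke",
    }
    return family.get(code // 100, "unknown")

print(describe_status(200))  # success
print(describe_status(500))  # server error
```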

Who was affected: Only customers using Cloudflare's older proxy system with certain security features enabled. Sites on Cloudflare's newer infrastructure continued working normally, as did their China network.

The timeline:
  • Security change pushed globally
  • Within minutes, error rates spike dramatically
  • Engineers identify the culprit configuration change
  • Change is reversed
  • Full service restored in about 25 minutes

What Cloudflare Is Doing About It

Cloudflare has published a detailed explanation (called a 'post-mortem' in tech speak) and promised several improvements:

  1. Staged rollouts: Future changes will be tested on small portions of traffic before going global
  2. Fail-open handling: If something breaks, the system will let traffic through rather than blocking everything
  3. Quick rollback tools: Better ways to undo changes instantly if problems appear
  4. Temporary freeze: No more changes until these safety measures are in place

These are all sensible steps—though some critics point out these safeguards probably should have existed already.
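None of these ideas are exotic; a staged rollout with health checks, automatic rollback, and a small blast radius can be sketched in a few lines. The sketch below is a toy model, not Cloudflare's actual tooling, and apply_change, revert_change, and error_rate are hypothetical stand-ins:

```python
import time

# Hypothetical stand-ins for real deployment and monitoring hooks.
def apply_change(fraction: float) -> None:
    print(f"deploying change to {fraction:.0%} of traffic")

def revert_change() -> None:
    print("rolling back change everywhere")

def error_rate() -> float:
    return 0.0  # stand-in: in reality, read from live monitoring

STAGES = [0.01, 0.05, 0.25, 1.0]  # 1% -> 5% -> 25% -> everyone
ERROR_THRESHOLD = 0.02            # roll back if more than 2% of requests fail

def staged_rollout(soak_seconds: int = 300) -> bool:
    """Expand the change one stage at a time, watching health at each step."""
    for fraction in STAGES:
        apply_change(fraction)
        time.sleep(soak_seconds)  # let real traffic exercise the change
        if error_rate() > ERROR_THRESHOLD:
            revert_change()       # small blast radius: only this stage saw it
            return False
    return True

if __name__ == "__main__":
    staged_rollout(soak_seconds=1)  # short soak just for the demo
```

The point is the shape of the loop: each stage exposes only a fraction of traffic, so a bad change gets caught and reverted before it ever reaches everyone.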

| Old Way | New Way |
| --- | --- |
| All at once, globally | Gradually, in stages |
| Everything fails together | System stays open, traffic continues |
| Manual, slow process | Quick, automated rollback |
| Limited safeguards | Health checks at each stage |
| Can affect 28% of traffic | Limited to small test group |
THE PROTOCOL

  1. Check if your website uses Cloudflare by looking at your hosting or domain settings, or ask your web administrator
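One way to check from the outside, assuming your domain's DNS points at Cloudflare directly: Cloudflare publishes the IPv4 ranges its network uses at cloudflare.com/ips-v4, so you can see whether your domain resolves into one of them. A rough sketch:

```python
# Sketch: does a domain resolve to an IP inside Cloudflare's
# published network ranges? Uses only the standard library.
import ipaddress
import socket
import urllib.request

CF_RANGES_URL = "https://www.cloudflare.com/ips-v4"

def resolves_to_cloudflare(domain: str) -> bool:
    with urllib.request.urlopen(CF_RANGES_URL, timeout=10) as resp:
        ranges = [ipaddress.ip_network(line.strip())
                  for line in resp.read().decode().splitlines()
                  if line.strip()]
    ip = ipaddress.ip_address(socket.gethostbyname(domain))
    return any(ip in net for net in ranges)

print(resolves_to_cloudflare("www.cloudflare.com"))  # presumably True
```

This is a heuristic: some setups route through Cloudflare in ways a simple DNS lookup won't reveal, so treat a negative result as inconclusive.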

  2. Review Cloudflare's status page (cloudflarestatus.com) to see current and past incidents affecting your services
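cloudflarestatus.com appears to be a standard Statuspage site, so, assuming the conventional Statuspage JSON endpoint is exposed there (an assumption on my part), you can read the current status programmatically:

```python
# Sketch: read Cloudflare's current status summary. Assumes the
# standard Atlassian Statuspage API path, which status pages like
# cloudflarestatus.com conventionally expose.
import json
import urllib.request

STATUS_URL = "https://www.cloudflarestatus.com/api/v2/status.json"

def cloudflare_status() -> str:
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        payload = json.load(resp)
    # e.g. {"status": {"indicator": "none",
    #                  "description": "All Systems Operational"}}
    return payload["status"]["description"]

print(cloudflare_status())
```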

  3. If you use Cloudflare, check whether you're on their older or newer proxy system (your dashboard will indicate this)

  4. Set up status alerts so you're notified immediately if Cloudflare experiences future issues
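The status page offers its own subscription options, but a crude self-hosted alert is just a loop around that same assumed endpoint. Another sketch, with notify as a hypothetical stand-in for whatever your team actually uses (email, Slack, a pager):

```python
# Sketch: poll the status endpoint and shout when the indicator
# changes. `notify` is a hypothetical stand-in for real alerting.
import json
import time
import urllib.request

STATUS_URL = "https://www.cloudflarestatus.com/api/v2/status.json"

def current_indicator() -> str:
    with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
        return json.load(resp)["status"]["indicator"]  # "none", "minor", ...

def notify(message: str) -> None:
    print(f"ALERT: {message}")  # stand-in for email/Slack/pager

def watch(poll_seconds: int = 60) -> None:
    last = current_indicator()
    while True:
        time.sleep(poll_seconds)
        indicator = current_indicator()
        if indicator != last:
            notify(f"Cloudflare status changed: {last} -> {indicator}")
            last = indicator

if __name__ == "__main__":
    watch()  # runs until interrupted
```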

  5. Ask your IT team or hosting provider about backup plans if your CDN provider experiences an outage

PROMPT:

"Was my website or business affected by this outage?"

Frequently Asked Questions