Should critical infrastructure be running on the cloud? Definitely not!
This isn’t even a black swan event. It’s simply a fragile architecture.
We’ve seen it again: a single configuration change at a third-party provider took down 28% of global internet traffic. Not an attack, just a bug in a protection mechanism that was meant to improve security.
And yet, the pressure to move critical systems into the cloud continues — often pushed by IT departments or OEMs chasing their own agenda.
Let me be clear: we must protect operational technology with different rules. OT must be built for continuity, not convenience. For resilience, not speed. For safety, not speculation.
The cloud might be fine for email, collaboration, even business tools. But not for the systems that keep our turbines spinning and our grids alive.
It’s time to ask the uncomfortable question — and be brave enough to answer it honestly.
Read more below about why fragility in cloud-connected systems isn’t a risk worth taking for critical infrastructure.
Root cause
Cloudflare was rolling out a change to its Web Application Firewall (WAF) body‑parsing logic to better detect exploit attempts against a critical React vulnerability, CVE‑2025‑55182; the change increased the request‑body buffer from 128 KB to 1 MB. As part of the rollout, they used a “killswitch” to disable an internal test ruleset whose action type was “execute”. A long‑standing bug in the older FL1 proxy’s Lua rules engine produced a runtime error whenever such an “execute” rule was skipped, returning HTTP 500 for affected traffic.
Impact and scope
The error affected only a subset of customers: those served by the legacy FL1 proxy that also had the Cloudflare Managed Ruleset enabled, and it impacted about 28% of total HTTP traffic through Cloudflare globally between 08:47 and 09:12 UTC. Traffic through their newer FL2 proxy (written in Rust) and their China network was not affected, so some regions and sites saw no issues while others were completely down with 500 errors.[1]
Timeline
- 08:47 UTC – configuration change deployed; failures start on part of the network.
- 08:48 UTC – full impact as the config propagates through the global configuration system.
- 08:50 UTC – incident formally declared after automated alerts.
- 09:11 UTC – change reverted and rollback propagation begins.
- 09:12 UTC – all traffic restored and the incident closed, with monitoring ongoing afterward.
Why the bug existed
The bug was a simple nil‑dereference in Lua: the code assumed that if a rule had an “execute” action, the corresponding object would always exist, which was not true once the killswitch skipped evaluation of that rule. Cloudflare notes that this error had been latent “for many years” in FL1 and that their newer FL2 code path’s stronger type system prevented the same failure there.
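The failure mode described above can be sketched in a few lines. This is an illustrative Python stand‑in, not Cloudflare’s actual Lua code; the `Rule` class, field names, and return values are all assumptions made for the example. The point is the pattern: code that assumes an “execute” rule always carries its target object, which stops being true once a killswitch skips that rule’s evaluation.

```python
# Illustrative stand-in for the FL1 bug pattern (hypothetical names, not
# Cloudflare's code). A rule with action "execute" normally carries the
# ruleset it should run; a killswitch can leave that target unset.

class Rule:
    def __init__(self, action, execute_target=None):
        self.action = action
        # Ruleset to run when action == "execute"; may be absent.
        self.execute_target = execute_target

def evaluate_buggy(rule, killswitched):
    if rule.action == "execute":
        if killswitched:
            # Skipped evaluation leaves no target behind.
            rule.execute_target = None
        # Bug: assumes the target always exists. When it is None this
        # raises at runtime, which the proxy surfaces as an HTTP 500.
        return rule.execute_target.upper()
    return "pass"

def evaluate_fixed(rule, killswitched):
    if rule.action == "execute":
        target = None if killswitched else rule.execute_target
        if target is None:
            return "skipped"  # handle the missing case instead of crashing
        return target.upper()
    return "pass"
```

In a language like Rust, as used in the FL2 proxy, the target would typically be an `Option<T>`, and the compiler would force the “missing” case to be handled, which is consistent with Cloudflare’s note that FL2’s stronger type system prevented the same failure.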
Follow‑up and future changes
Cloudflare links this to a broader resiliency effort after a larger November 18, 2025 incident and says the safeguards planned then were not fully deployed yet, which is why a single config push again had wide impact. They describe upcoming changes such as safer rollout and configuration versioning, stronger “fail‑open” behavior when configs are bad, and better “break glass” mechanisms, and they have temporarily locked down network changes while these are implemented.
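The “fail‑open on bad configs” idea mentioned above can be sketched simply. This is a minimal illustration of the general technique, with hypothetical names; Cloudflare has not published this implementation. The idea: validate a candidate configuration before serving with it, and on any failure keep the last known good version rather than dropping traffic.

```python
# Minimal sketch of fail-open configuration handling (hypothetical names).
# If the candidate config fails validation, keep serving with the previous
# good config and report the bad push, instead of returning 500s.

def apply_config(candidate, last_known_good, validate):
    """Return (config_to_serve, status); fall back on any validation error."""
    try:
        validate(candidate)  # e.g. parse rules, type-check action targets
        return candidate, "applied"
    except Exception:
        # Fail open: the previous good config stays live; operators are
        # alerted out-of-band rather than users seeing errors.
        return last_known_good, "rolled_back"
```

Configuration versioning makes this cheap: if every push is an immutable version, `last_known_good` is just a pointer to the previous one, and “break glass” rollback is a pointer swap.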