We've tied our civilization to a few flimsy networks, but what happens when they break?
In the late 1990s, IT experts feared a theoretical “Y2K bug” would trigger widespread technology failures as the calendar changed from 1999 to 2000. With so many systems linked around the globe, many feared one coding error could end it all. Thankfully, that tech-pocalypse never arrived, but a similar cascading failure finally did last month.
On July 19, cybersecurity vendor CrowdStrike pushed a small update to systems using its wildly popular Falcon platform. The company realized it contained a coding error and sent a corrected update just 79 minutes later. By then, it was too late. The result wasn’t exactly Y2K, but it created what many consider the largest IT outage in history.
CrowdStrike Falcon is widely used by organizations large and small across many industries. The company’s reputation was excellent, thanks to a decade of identifying sophisticated cybercriminals from countries such as China, North Korea, and Russia. This made its platform nearly ubiquitous, which only heightened the damage caused by its faulty update, as did its tight integration with the Microsoft Windows operating system.
The company had inadvertently introduced a logic error, crashing not only Falcon but entire Windows systems. Although CrowdStrike quickly corrected the problem, many of the systems had shut down for good. You can’t update an offline computer.
Microsoft estimated that fewer than 1% of Windows devices were directly affected, but those were the systems running critical operations elsewhere and heavily interconnected with other Windows devices.
The cascade of real-world failures that followed mirrored the interconnected structure of the internet itself. The impact was shocking.
Delta, United, and American Airlines had to cancel hundreds of flights, as did many other airlines and airports around the globe. Public transit in New York City, Washington, D.C., and other cities was shut down. Banks, hospitals, and 911 emergency services also failed. British broadcaster Sky News was knocked off the air.
According to one insurer’s analysis, one bad line of code will cost Fortune 500 companies more than $5 billion in direct losses. Lawsuits are already being drafted.
A failure of this magnitude should have been one of the most covered stories of the decade. Wedged between the failed Trump assassination attempt and Kamala’s quiet coup of President Biden, however, it made only a small splash in the media. I read more coverage from friends and family sleeping on airport floors than I did from legacy media.
While the CrowdStrike disaster took many by surprise, the most surprising thing was that it hadn’t happened earlier. DHS’ Office of Cyber and Infrastructure Analysis warned in 2016 of how susceptible our digital dependencies made us. Analysts had contemplated an attack from a hostile regime, but even that assessment wasn’t as dire as what resulted from the July 19 software update. According to the OCIA report, the most vulnerable systems are:
Cyber-physical technology allows physical objects to communicate with a computer network. One example of its vulnerability is the Colonial Pipeline ransomware attack of 2021, which triggered a run on gasoline because of a perceived shortage that never came to pass.
The Global Positioning System is regularly interrupted in war zones such as Ukraine and the Middle East. There is a growing reliance on this technology for autonomous vehicles, mapping, and military targeting.
Smart cities integrate tech and infrastructure to improve environmental and economic efficiency. These include features such as interconnected power grids, traffic management, water delivery, waste services — even governmental tasks.
The internet of things connects various devices, such as home appliances, cars, and production lines, to larger networks. After recent home upgrades, my thermostat texted me and my dishwasher emailed me. Thanks, Silicon Valley.
Cloud technology poses significant security challenges through its endless entry points. The OCIA singled out airlines as particularly vulnerable since the industry “relies on cloud systems for scheduling passengers, flights, and cargo.” That prediction came true last month.
Although the incident was a disaster for travelers and companies’ bottom lines, it was an overdue wake-up call for the private and public sectors. Our tech needs to be more resilient, with far more backups and redundancies. Large organizations must diversify their software portfolios so that one company pushing one update doesn’t cripple entire sectors.
The CrowdStrike outage revealed the inherent danger of over-reliance on single sources of technology. Putting all your eggs in one basket has never been a good idea.
Our adversaries are undoubtedly studying what happened on July 19 and planning accordingly. Any attack against our critical infrastructure would unfold much like the CrowdStrike outage: One failure triggers a cascade of system failures. Together, those overwhelm our ability to respond, causing even more damage.
A Y2K outage might have delayed emails or access to an ATM, but an outage today affects everything from medical care to our food supply to our power grid. And even though July's was an accidental failure, malicious actors quickly leveraged the incident for cybercrime.
CrowdStrike prioritized speed over safety and quality assurance. The global release of a single update brought much of the digital world to its knees. Big businesses and governments around the world should take advantage of this opportunity to see how dependent we are on technology, accept the risks that brings, and prepare for the next cascading failure — because there will be another.
Jon Gabriel