Breaking Down the Worst IT Outage in History – The Cause, The Impact, and The Remediation

On July 19th, 2024, CrowdStrike, a global cybersecurity technology leader operating in over 170 countries, pushed a routine update that crashed an estimated 8.5 million Windows computers, causing an outage felt around the world. Banks, healthcare providers, airlines, and businesses across many other industries were functionally shut down as IT professionals reacted to what is being called “the worst IT outage in history.” Some disruptions persisted even after a fix was available, affecting travel and business operations into the following week.
What Caused the Outage?
With an outage of this scale and level of media attention, the immediate question is, “What caused it?” The good news – this was not a cyberattack.
The cause of this extensive service disruption has been traced to an update issued to CrowdStrike’s Falcon Sensor. The RCA (root cause analysis) published by CrowdStrike points to a logic error triggered by the “Channel Files” that are part of the behavioral protection mechanisms used by the Falcon Sensor. Updates to these Channel Files can occur several times a day, and they have been part of the Falcon architecture since its creation. It is not uncommon for security software to require frequent minor updates to keep pace with the ever-evolving world of malware.
ivision’s Response
ivision began to receive support cases on the CrowdStrike incident shortly after the update started rolling out. During this outage, our case volume increased by more than 800% compared to a typical day. Cases were quickly escalated to our engineers, and when CrowdStrike issued a fix early Friday morning, remediation began soon thereafter. We had many of our clients back up and running before midday Friday, and all clients were restored to production capacity by the end of Saturday, July 20th.
Throughout the outage, the collaboration between ivision’s Global Service Center, Engineering team, and Delivery Management was pivotal in managing the crisis. Our teams worked tirelessly to establish numerous communication bridges, ensuring seamless coordination with clients who were experiencing disruptions across hundreds of servers. Their combined efforts facilitated rapid information sharing and problem-solving, minimizing downtime and mitigating the impact on client operations. This unified approach showcased not only their technical expertise but also their commitment to maintaining strong client relationships and delivering exceptional service under pressure.
How ivision Managed Services Can Help
In the event of another major outage, what can businesses do to set themselves up for minimal disruption? By partnering with a managed services provider that understands your tools, business operations, and endpoints, your business can reach remediation with greater speed and confidence. ivision’s consistency in going above and beyond on a daily basis translates to smoother sailing in the face of an emergency. Our Managed Services team is here to offer around-the-clock support to give your team peace of mind. Don’t just take our word for it, though. Hear what our clients have to say about our service.