How one software update brought world to its knees

 Air Asia passengers queue at counters inside Don Mueang International Airport Terminal 1 amid system outages disrupting the airline’s operations, in Bangkok, Thailand on July 19, 2024. PHOTO | REUTERS

A widespread IT crisis gripped the globe after a software update from cybersecurity firm CrowdStrike caused system failures worldwide. CrowdStrike is a leading cybersecurity company whose software is widely used to protect against cyber-attacks.

The incident, which occurred on July 19, 2024, was caused by a malfunctioning update that was automatically pushed to all users. This disrupted businesses, government agencies, and critical infrastructure across multiple countries, with millions of computers rendered inoperable after encountering the infamous ‘Blue Screen of Death’ error.

The fallout from this incident was immense: several airlines grounded flights, banks faced long queues and online banking outages, hospitals saw critical systems disrupted, and government services became unavailable. At some airports in India, handwritten boarding passes were seen, reminiscent of pre-digital boarding days.

This incident also affected Uganda, with two leading financial institutions reporting failures on their systems, both at branches and online banking platforms. There were notable delays in operations, affecting productivity and revenue, as many leading banks in Uganda utilise CrowdStrike software.

A leading marketplace and logistics service provider was also impacted by the incident. Several other players might have been affected but were able to contain this internally without disclosure.

This disruption not only inconvenienced everyday banking activities but also affected businesses that rely on digital financial services for their operations. 

Although CrowdStrike apologised for the incident and issued an update to resolve the issue, the full extent of the damage is still being assessed. This incident highlights the vulnerability of our increasingly interconnected world and the potentially catastrophic consequences of even minor software glitches.

It also highlights the significant impact of supply chain risk, where an incident with a critical service provider could hinder your business’s ability to deliver on its commitments.

Although a hack has always been viewed as a potential avenue for a systemic risk event, this outage has demonstrated that a service disruption with no attacker behind it can present an equally significant systemic risk.

This incident also presented an avenue for opportunistic scammers to run phishing campaigns targeting uninformed and unsuspecting clients. The scammers masqueraded as representatives from CrowdStrike and directed users to malicious websites to download malware that could further damage their systems.

Some went as far as impersonating CrowdStrike representatives in phone calls and developing scripts that purportedly automated the recovery of disrupted machines. However, social engineering attacks like these can be countered with a strong cybersecurity culture in which employees are trained to recognise and report potential threats.

After such an incident, one begins to wonder how this could have been better managed to limit the impact.

First, this happened because CrowdStrike was automatically updating the sensors on user devices. Had an organisation turned off auto-updates, this update would not have been applied, and the incident would not have affected it.

It is established best practice that updates should be given a period of testing before they are rolled out into production. This is commonly ignored for small updates, yet, as seen in this incident, even these can have a significant impact.

Questions also need to be raised about the rigour of the testing procedures in place at CrowdStrike, which should have caught this fault before the update was rolled out to customers.
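
To illustrate what testing before production can look like in practice, here is a minimal sketch of a staged rollout, in which an update reaches a small test group first and only proceeds to wider groups if the machines remain healthy. The ring names, device names and the `apply_update` and `is_healthy` functions are hypothetical stand-ins for whatever deployment tooling an organisation uses; this is not a description of how CrowdStrike or any specific product distributes updates.

```python
def staged_rollout(rings, apply_update, is_healthy, max_failure_rate=0.0):
    """Apply an update one ring at a time and halt once a ring fails its health check."""
    updated = []
    for ring_name, devices in rings:
        if not devices:
            continue  # nothing to update or measure in an empty ring
        for device in devices:
            apply_update(device)
        updated.extend(devices)
        failed = [d for d in devices if not is_healthy(d)]
        if len(failed) / len(devices) > max_failure_rate:
            # Stop here so later rings never receive the update.
            return {"halted_at": ring_name, "updated": updated, "failed": failed}
    return {"halted_at": None, "updated": updated, "failed": []}


# Hypothetical rings, ordered from least to most critical.
rings = [
    ("test-lab", ["lab-01", "lab-02"]),
    ("pilot", ["branch-01", "branch-02", "branch-03"]),
    ("production", ["core-01", "core-02", "atm-01", "atm-02"]),
]

patched = set()


def apply_update(device):
    # Stand-in for the real deployment step.
    patched.add(device)


def is_healthy(device):
    # Simulates a faulty update: every machine that receives it stops working.
    return device not in patched


result = staged_rollout(rings, apply_update, is_healthy)
print(result["halted_at"])  # test-lab
print(sorted(patched))      # ['lab-01', 'lab-02']
```

In this simulated run the faulty update is stopped at the two test machines, and the pilot and production groups are never touched. The trade-off is speed: security updates reach the most critical machines later than they would with a simultaneous push.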

Secondly, over time, organisations have spent a lot of money on purchasing defence systems to protect their infrastructure from attack.

However, the cyber threat landscape changes as frequently as daily, and finances cannot allow protection to be bought for every new threat vector that surfaces.

In the realm of cybersecurity, it is no longer a question of whether you will be attacked but rather when, because a breach should be treated as inevitable.

Security expenditure needs to begin shifting towards building robust incident response teams and capabilities to ensure that the “Mean Time to Recover” after an incident is shortened so that services are restored in the shortest time possible. 
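
For readers who want to track this metric, "Mean Time to Recover" is commonly calculated as the average time between detecting an incident and restoring service. The sketch below shows that arithmetic; the incident timestamps are invented for illustration and are not real data from any organisation.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average of (restored - detected) across a list of incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total_downtime = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),
    )
    return total_downtime / len(incidents)


# Hypothetical incident log: (detected, restored) pairs.
incidents = [
    (datetime(2024, 7, 19, 8, 0), datetime(2024, 7, 19, 14, 0)),   # 6 hours
    (datetime(2024, 8, 2, 9, 30), datetime(2024, 8, 2, 11, 30)),   # 2 hours
    (datetime(2024, 9, 10, 22, 0), datetime(2024, 9, 11, 2, 0)),   # 4 hours
]

mttr = mean_time_to_recover(incidents)
print(mttr)  # 4:00:00
```

Measured over time, a falling figure is one simple indication that investment in incident response is paying off.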

Thirdly, continuous monitoring of systems also plays a role in reducing the impact of incidents. For example, to date, some organisations still do not know how many of their devices installed the “bad” update and how many have successfully applied the mitigation patch from CrowdStrike.

Without this visibility, one cannot adequately prepare for and combat the effects of this incident, as affected machines cannot be identified.
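
As a rough illustration of the visibility described above, the sketch below tallies a device inventory into machines that never received the bad update, machines that have been fixed, and machines that still need attention. The `Device` record, its field names and the hostnames are assumptions made for this example; in practice this information would come from an asset inventory or endpoint management tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Device:
    hostname: str
    got_bad_update: bool
    fix_applied: bool


def status(device):
    """Classify one device by its exposure to the bad update."""
    if not device.got_bad_update:
        return "unaffected"
    return "remediated" if device.fix_applied else "needs_attention"


def summarise(devices):
    """Count devices in each status."""
    return Counter(status(d) for d in devices)


def outstanding(devices):
    """Hostnames that received the bad update and have not yet been fixed."""
    return sorted(d.hostname for d in devices if status(d) == "needs_attention")


# Hypothetical fleet for illustration only.
fleet = [
    Device("teller-01", got_bad_update=True, fix_applied=True),
    Device("teller-02", got_bad_update=True, fix_applied=False),
    Device("atm-gw-01", got_bad_update=True, fix_applied=False),
    Device("hr-laptop-07", got_bad_update=False, fix_applied=False),
]

print(dict(summarise(fleet)))
print(outstanding(fleet))  # ['atm-gw-01', 'teller-02']
```

Even a simple tally like this tells a response team where to send people first.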

Cyber insurance plays a critical role in helping businesses recover from the effects of attacks by compensating for business losses due to outages such as this. Policyholders can file claims for disruption of service during attacks and receive compensation for business losses.

Furthermore, insurers can provide incident response services and help policyholders navigate this challenging time.

However, this significantly depends on the coverage that has been agreed upon with your insurer.

Many of the cyber insurers in Uganda do not have these capabilities in-house and do not offer these services. However, many global insurers can provide a full suite of services to their clients. 

To prevent a single point of failure, it is also important to diversify software vendors to reduce reliance on a single provider. However, this carries its own risks, as it is easier to manage one software architecture across the entire organisation.

As such, the delicate balance between the convenience of one unified provider and the risk of a single point of failure needs to be carefully studied.

Nonetheless, the impact of a vulnerability in a third-party software provider can be catastrophic, and businesses must prioritise supply chain security and conduct rigorous vendor assessments. 

This incident underscores the critical need for robust cybersecurity measures and contingency planning. It highlights the risks associated with heavy reliance on single points of failure in digital infrastructure and the importance of having backup systems and recovery plans in place.

As organisations continue to advance their digital transformation agenda, this incident serves as a stark reminder of the vulnerabilities that come with increased digitisation. 

However, it also presents an opportunity for stakeholders to reassess and strengthen their cybersecurity risk management frameworks to safeguard against future disruptions. As the world works to recover and bolster defences, the lessons learned from this event will undoubtedly shape the future of digitisation, cybersecurity practices, and cloud services.

Mr Rodney Hood Adriko is a technology, blockchain, cybersecurity and privacy expert, and a PhD researcher at the Institute of Cyber Security for Society (iCSS) at the University of Kent, UK.