Calls for technical support started flooding the University of Maryland Medical System (UMMS) help desk around 1:30 a.m. Employees found themselves locked out of their computers, unable to boot the devices or access patient data. All they could see was the infamous "blue screen of death."
For the next 40 minutes, Joel Klein, MD, senior vice president and chief information officer for the Baltimore-based health system, thought the organization had been the target of ransomware. It wasn't a giant leap, considering the Change Healthcare cyberattack five months earlier had disrupted health information (HI) processes nationwide.
He realized the problem was more widespread when reports of similar issues came in—first from Baltimore County's 911 system and, later, airlines and other healthcare facilities. "Once we heard that, we actually breathed a sigh of relief," recalls Klein.
Last summer, CrowdStrike, a cybersecurity company offering threat detection and breach remediation services, inadvertently triggered the crash when it released a corrupted software file update for its popular Falcon platform, which is designed to prevent cyberattacks.
The update froze less than one percent of Windows devices worldwide, and most health systems recovered within a few days. To prevent a recurrence, CrowdStrike says it added deployment checks and updated content configuration system test procedures, among other actions.
The event underscores the importance of preparedness and how quickly one small error can upend critical medical services in the interconnected ecosystem.
Here's how two HI professionals navigated the debacle and how they recommend you prepare if a similar incident happens again.
Rapid Deployment
While it was a relief that the botched file hadn't come from a bad actor and no data was compromised, health systems were still thrown into crisis mode. Mass General Brigham in Boston canceled surgeries and visits on the day the issue began. Downed medical record systems at California's Kaiser Permanente San Jose Medical Center meant that patient vitals and data had to be manually measured and recorded.
The event brought nearly every aspect of revenue cycle management (RCM) to a standstill, from patient access to accounts receivable and insurance verification, says Thomas Thatapudi, chief information officer at the RCM firm AGS Health in Washington, DC. Although the internal crisis management team activated within 20 minutes of detecting the problem, about half of AGS Health's Windows machines—roughly 5,000—were affected. Facilities moved to downtime procedures, reverting to paper records and alternative communication methods, such as phones and walkie-talkies, to coordinate care and share essential information.
"Within eight hours, a workaround restored 75 percent of operations, and full operations within 20 hours," Thatapudi says. Clients experienced only minimal disruptions.
Meanwhile, UMMS had approximately 20,000 laptops, workstations, and servers that became inaccessible within 90 minutes of the first device receiving the corrupted file. "The computers could not boot until somebody came and fixed them. There wasn't any department that was uniformly spared," says Klein. The outage also temporarily took down two dozen of its third-party vendors.
Over the next three days, information systems and technology (IS&T) staff spent 3,000 person-hours deploying to more than 50 sites, including 11 system hospitals, to remediate the most critical machines. That meant some remote coders and HI professionals were idled for a prolonged period before their computer access was restored the following week.
Although the damage occurred swiftly, the fix took longer. As IS&T staff visited each locked device, they had to manually enter a unique 48-digit BitLocker recovery key. Multiplied across tens of thousands of machines, this verification method proved painstakingly slow, Klein says. The staff eventually moved to a simpler method using physical USB drives.
Preparing for the Next Time
Thankfully, significant incidents related to simple software updates are rare, says Thatapudi. Still, the CrowdStrike event is a warning that HI professionals should heed.
"The outage emphasized the importance of infrastructure resilience and a rapid incident response and recovery process," he says, adding that HI teams should evaluate recovery protocols to ensure they can be physically and remotely applied.
Ongoing training for HI professionals and other staff can help them recognize signs of common cyber threats so they can alert the proper team members, says Thatapudi.
"Facilities need to emphasize the importance of maintaining security best practices in daily operations and running regular cybersecurity awareness campaigns to keep security top-of-mind for all employees," he says.
HI departments should also be involved in reviewing and bolstering an organization's data backup strategy. Thatapudi says this plan must include regular, automated backups of all critical systems and data, taking special care to confirm backups are securely stored. Information can then be quickly restored if a cyber incident forces the health system to start fresh.
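Backup verification lends itself to automation. As a minimal sketch of the "confirm backups" step described above, the Python below copies a file into a backup location and checks that the copy is byte-identical using a SHA-256 checksum. The function names and paths are hypothetical, and a production backup plan would add encryption, offsite replication, and scheduled runs.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir, then confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / source.name
    shutil.copy2(source, dest)  # copy2 preserves file metadata (timestamps)
    return sha256_of(source) == sha256_of(dest)
```

A nightly job could run a check like this over each critical export and alert the team on any mismatch, so a corrupted backup is discovered before it is needed rather than during a recovery.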
That wasn't necessary this time, and CrowdStrike's assistance in identifying the issue facilitated a faster return to normal operations, says Klein. His team, along with Thatapudi's, worked closely with CrowdStrike and Microsoft personnel to bring downed devices back online.
"CrowdStrike figured out the fix and was actually on the phone with us somewhere around 2 a.m.," says Klein. "If they had not done that, we would have had to decide whether or not to reimage the machines and boot them without CrowdStrike."
Reimaging is time-consuming, as it involves clearing the hard drive and installing a new operating system. The worry of essentially creating a brand-new machine many times over was unsettling enough that Klein plans to refine and strengthen the organization's reimaging protocol. He also thinks it's wise for HI departments to quantify how many computers need to be operational before staff can confidently shift from downtime processes back to the electronic health record (EHR).
Besides the vendor's assistance, Klein credits the minimal downtime to the multidisciplinary response team UMMS founded a few years ago. He describes MITEC, or Major IT Event Command, as a "hyper-structured" way to manage logistics, operations, and communications for complex incidents like the CrowdStrike outage. The team runs drills three times a year, varying the focus each time.
He encourages HI professionals to participate in similar multidisciplinary teams at their facilities, so they can have a voice in developing comprehensive disaster responses. "Everyone knows you need a good downtime process, but it's not until you get punched in the face with it that you take it out and use it. That's why drills are so important to familiarize people with managing these situations in real life."
MITEC continues to run drills for scenarios like nuclear disasters and systemwide ransomware attacks. However, as cybersecurity threats evolve, he says HI departments can benefit from incorporating more targeted drills.
"What we've moved to now are drills that target a particular system, like laboratory, registration, talent acquisition, pharmacy, and ambulatory sites," he says. "[These are] really interesting exercises. We know from other incidents that patient movement and knowing in a 500-bed hospital where Mrs. Jones is right now is a huge problem when you get to day seven of an incident."
Thatapudi agrees that robust downtime procedures and regular drills to practice them are nonnegotiable for HI professionals in today's data-driven healthcare landscape. Frequent drills can assist in identifying gaps in downtime or data backup processes, which can be refined and tested again during future drills.
Once operations resume, he says HI staff should have a protocol in place detailing how to prioritize tasks and staffing to minimize impacts on cash flow and patient care. For example, efforts shifted to addressing claims submission backlogs once the Change Healthcare clearinghouse breach was resolved.
After the CrowdStrike outage, UMMS had nurses transfer the most clinically important details and information recorded during downtime, like medications and vitals, into the EHR. That took some stress off HI teams trying to recover, but Thatapudi says that RCM departments might consider temporarily extending their hours or bringing in additional staff to help clear downtime backlogs.
Steph Weber is a Midwest-based freelance journalist specializing in healthcare and law.