The FAA Outage as a Resiliency Test

                Cyber resilience engineering has been a recurring theme over the last few years, and the resilience gong sounded again with the Federal Aviation Administration (FAA) network outage, which impacted all airline service and caused thousands of flight delays. Between DevOps, Site Reliability Engineering (SRE), and Cyber Resiliency Engineering, there are many methodologies that could have helped the FAA reduce its risk.

                It is worth noting that the FAA is most likely drowning in technical debt in the form of outdated technology, fragile infrastructure, complexity, and lack of documentation. Geoff Freeman, president and CEO of the U.S. Travel Association, said, “Today’s FAA catastrophic system failure is a clear sign that America’s transportation network desperately needs significant upgrades,” a statement that underscores how badly the FAA’s technical debt needed attention (Muntean & Wallace, 2023).

                The MITRE Cyber Resiliency Engineering Framework (MITRE CREF) focuses on four primary goals: Anticipate, Withstand, Recover, and Adapt. The FAA did not anticipate the outage and did not have resilience mechanisms (redundancy, automation, etc.) in place to withstand it. Recovery has been slow and painful, which suggests that there was probably a lack of documentation, damage assessment, monitoring, root cause analysis capabilities, simplicity, and protected backup and restore capabilities. The question is: How will the FAA adapt and improve after this outage? Will the FAA designate significant budget toward reducing technical debt and improving resilience, with the goal of preventing future infrastructure failures?
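
                To make the "withstand" goal a little more concrete, here is a minimal Python sketch of redundancy with failover: a caller tries a primary source and falls back to a backup when the primary fails. The names used here (read_with_failover, primary, backup, AllSourcesFailed) are hypothetical and chosen only for illustration. This is not a description of how the FAA's systems work, only one simple way the idea could be expressed in code.

```python
from typing import Callable, List, Sequence


class AllSourcesFailed(Exception):
    """Raised when every redundant source has failed (hypothetical name)."""


def read_with_failover(sources: Sequence[Callable[[], str]]) -> str:
    """Try each redundant source in order and return the first success."""
    errors: List[Exception] = []
    for source in sources:
        try:
            return source()
        except Exception as exc:
            # Keep the failure so recovery and root cause analysis have evidence.
            errors.append(exc)
    raise AllSourcesFailed(errors)


def primary() -> str:
    # Illustrative stand-in for a primary system that is currently down.
    raise ConnectionError("primary system unavailable")


def backup() -> str:
    # Illustrative stand-in for a redundant replica that is still healthy.
    return "data served from backup"


result = read_with_failover([primary, backup])
print(result)
```

                Even a sketch this small touches two of the framework's goals: the fallback lets the service withstand a failure, and the collected errors give responders something to work with during recovery.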

                The FAA is not alone in this predicament. Incident response has taught us over the last several years that most companies, from large enterprises down to small businesses, carry enormous technical debt and lack documentation, redundancy, and recoverability. Often, this is due to budget restrictions and the pressure to drive down costs and limit investments. Sometimes, the technical debt and lack of resiliency stem from a lack of attention and prioritization by the business. IT is now the backbone of most businesses, so it deserves more priority and more investment at most organizations. How should a business start to prioritize resilience? Consider a resilience framework like the MITRE Cyber Resiliency Engineering Framework (MITRE CREF) as a starting point.

References:

Muntean, P., & Wallace, G. (2023, January 11). FAA system outage causes thousands of flight delays and cancellations across the US. CNN. Retrieved from https://www.cnn.com/travel/article/faa-computer-outage-flights-grounded/index.html

Published by Art Ocain

I am a DevOps advocate, not because I am a developer (I’m not), but because of the cultural shift it represents and the agility it gains. I am also a fan of the theory of constraints and applying constraint management to all areas of business: sales, finance, planning, billing, and all areas of operations.

My speaking: I have done a lot of public speaking in my various roles over the years, including presentations at SBDC (Small Business Development Center) and Central PA Chamber of Commerce events as well as events that I have organized at MePush.

My writing: I write a lot. Blog articles on the MePush site, press-releases for upcoming events to media contracts, posts on LinkedIn (https://www.linkedin.com/in/artocain/), presentations on Slideshare (https://www.slideshare.net/ArtOcain), posts on the Microsoft Tech Community, articles on Medium (https://medium.com/@artocain/), and posts on Quora (https://www.quora.com/profile/Art-Ocain-1). I am always looking for new places to write, as well.

My certifications: ISACA Certified Information Security Manager (CISM), Certified Web Application Security Professional (CWASP), Certified Data Privacy Practitioner (CDPP), Cisco Certified Network Associate (CCNA), VMware Certified Professional (VCP-DCV), Microsoft Certified System Engineer (MCSE), Veeam Certified Engineer (VMCE), Microsoft 365 Security Administrator, Microsoft 365 Enterprise Administrator, Azure Administrator, Azure Security Administrator, Azure Architect, CompTIA Network+, CompTIA Security+, ITIL v4 Foundations, Certified ScrumMaster, Certified Scrum Product Owner, AWS Certified Cloud Practitioner. See certification badges on Acclaim here: https://www.youracclaim.com/users/art-ocain/badges

My experience: I have a lot of experience from developing a great company with great people and culture to spinning up an impressive DevOps practice and designing impressive solutions. I have been a project manager, a President, a COO, a CTO, and an incident response coordinator. From architecting cloud solutions down to the nitty-gritty of replacing hardware, I have done it all. When it comes to technical leadership, I am the go-to for many companies. I have grown businesses and built brands. I have been a coach and a mentor, developing the skills and careers of those in my company. I have formed and managed teams, and developed strong leaders and replaced myself within the company time and again as I evolved. See my experience on LinkedIn here: https://www.linkedin.com/in/artocain/
