Public Policy Recommendations from the Control Room of the Three Mile Island Nuclear Reactor to the Cockpit of the Boeing 737 Max
Just as the March 28, 1979, Three Mile Island (TMI) accident became a watershed event for the U.S. nuclear power industry and its regulator, the Nuclear Regulatory Commission (NRC), the recent back-to-back crashes of two Boeing 737 Max jets in Indonesia and Ethiopia could have long-lasting implications for both the U.S. aviation industry and its regulator, the Federal Aviation Administration (FAA).
What is particularly striking is how both events mirror one another forty years apart, and what this tells us about our abilities—and inabilities—to learn from catastrophic accidents and regulate increasingly complex technologies.
In these seemingly unrelated accidents, complex technological systems suddenly transitioned from a routine to a non-routine state. In both cases, no annunciator or indicator on the control panels helped operators identify the cause of the malfunction. (In the case of the 737 Max, a crucial indicator was sold at a premium as an optional safety extra.) Operators tried to figure out what was going on, searching for a match between what they could observe and the emergency procedures on which they had been trained. As the situation continued to deteriorate, confusion grew, and disaster loomed.
The operators' inadequate responses to unfamiliar events, compounded by "design-induced errors," eventually led to catastrophic system failures. At TMI, the nuclear reactor core partially melted down; luckily, the radioactive contamination was contained. In Indonesia and Ethiopia, two 737 Max jets crashed, killing 346 people.
In both cases, there were eerily clear early warnings. On September 24, 1977, the Davis-Besse Nuclear Power Plant near Toledo, Ohio, experienced an event remarkably similar to the TMI accident, sharing its main root cause: a stuck-open valve on the primary cooling loop. The operators were able to regain control of the reactor, averting a disaster. But neither the NRC, nor the vendor of both the Davis-Besse and TMI reactors, Babcock & Wilcox (B&W), nor the utility that owned Davis-Besse shared this information at the time with other B&W plants or alerted the rest of the nuclear power industry.
In the case of the first Boeing 737 Max crash, a previous crew had faced a similar malfunction on the very same jet but was able to regain control of the aircraft. At the time, the FAA did not notify other 737 Max operators. Only after two planes had crashed, and under tremendous international pressure, did Boeing and the FAA decide to ground all 737 Max jets.
So why do we find ourselves in such similar situations four decades apart? And where can we begin to address the problem?
First, it seems that in both the TMI and Boeing accidents, a cardinal rule of human-systems integration was broken: matching a machine's requirements with its human operators' physical and cognitive capabilities. Safe control of any complex machine requires designers to give human operators the information they need to attain full comprehension of the system, including the ability to maintain a complete mental model of the system's status at any point in time.
An undeniable, and convenient, truth is that human operators always constitute a technologically complex system's first and last layer of defense, as well as society's last barrier, against disaster. There is ample evidence of this principle in action: Capt. Chesley B. "Sully" Sullenberger's 2009 "Miracle on the Hudson" water landing of US Airways Flight 1549, and the actions of Superintendent Naohiro Masuda and 200 dedicated personnel who brought the four operating nuclear reactors of the Fukushima Daini Nuclear Power Station (the sister plant of the ill-fated Fukushima Daiichi plant) to cold shutdown after the 2011 Tōhoku earthquake and tsunami, despite an almost total "station blackout." Whatever the causes of a major safety-critical system failure (flawed safety analysis, an inadequate certification process, failed oversight, faulty hardware or software), a human being in the loop with operational control and a full understanding of the system will significantly reduce the risk of catastrophe.
Second, it is clear that the rapid growth and increasing complexity of new technologies have been challenging the oversight capabilities of regulatory agencies, which are often understaffed and operating with limited resources. This led the FAA, for example, to delegate significant authority to Boeing for the certification of the 737 Max. Providing adequate resources and power to regulators is critical to avoiding catastrophic accidents due to failed oversight.
Third, regulating technological safety and security in critical areas of modern society will require a new regulatory paradigm based on trust, transparency, and accountability, an effort that cannot be conducted on a piecemeal basis.
Forty years after TMI, there is still no common Congress-enacted set of principles or standards guiding the work of a patchwork of siloed federal regulators on crosscutting issues, such as certification processes, human performance, and safety culture. Such standards, cutting through regulatory silos and applicable across industries and technologies, are badly needed. Their implementation would provide confidence that regulatory agencies are not only held to the same standard, but also have the resources, competence, and independence they need to develop healthy, proactive, and smart oversight in all safety-critical industries, such as transportation, energy, defense, and health care.
This is even more urgent as we deploy increasingly autonomous systems and relegate human operators to the backseat, where there is little they can do to save the day.
Najmedin Meshkati is a Research Fellow with the Project on Managing the Atom, and Sébastien Philippe a Stanton Nuclear Security Postdoctoral Fellow with both the Project on Managing the Atom and the International Security Program, at the Harvard Kennedy School's Belfer Center for Science and International Affairs. Meshkati is also a professor of engineering and aviation safety at the University of Southern California. Philippe was a safety engineer for the French strategic nuclear forces.
Statements and views expressed in this commentary are solely those of the authors and do not imply endorsement by Harvard University, the Harvard Kennedy School, or the Belfer Center for Science and International Affairs.
Meshkati, Najmedin, and Sébastien Philippe. "How to Deal with Increasingly Complex Safety-Critical Technologies." Belfer Center for Science and International Affairs, Harvard Kennedy School, March 28, 2019.