Note: this note originally appeared in Peter Neumann's "RISKS Digest," volume 23, number 63, 26 Dec. 2004. I added some material afterwards.
In RISKS 26.30, Peter Neumann recommended a paper by Scott Sagan entitled "The Problem of Redundancy Problem: Why More Nuclear Security Forces May Produce Less Nuclear Security." (See the references at the end of this note.)
I want both to second the recommendation and to expand upon it. Many attempts by both experts and amateurs to improve security and safety actually weaken their systems.
Sagan offered three major reasons why this might be so; I add a fourth. Sagan's three reasons were:
1. Common-mode problems.
Adding redundancy makes things more secure or safe only if the new items are truly independent of the existing ones. They seldom are, and accident after accident demonstrates the common-mode problem, where a single event takes out all of the supposedly redundant systems.
(Classic example: the redundant hydraulic lines in a DC-10, where an accident destroyed the part of the fuselage that held all three lines. Poof. No more hydraulics.)
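To make the independence point concrete, here is a minimal Python sketch under stated assumptions: each of n redundant units fails independently with the same probability, and a single hypothetical common-mode event (a shared weak point, such as one section of fuselage carrying every line) disables all of them at once. The function name and the probabilities are illustrative only, not DC-10 data.

```python
def prob_all_fail(p_single: float, n: int, p_common: float = 0.0) -> float:
    """Probability that all n redundant units fail.

    Illustrative model (an assumption, not from the source): each unit
    fails independently with probability p_single, and one common-mode
    event with probability p_common disables every unit at once.
    """
    independent = p_single ** n
    # Either the common-mode event occurs (everything fails), or it does
    # not and all n units must still fail on their own.
    return p_common + (1.0 - p_common) * independent


if __name__ == "__main__":
    p = 1e-3  # hypothetical per-unit failure probability
    print(prob_all_fail(p, 1))                 # a single unit
    print(prob_all_fail(p, 3))                 # three truly independent units
    print(prob_all_fail(p, 3, p_common=1e-5))  # three units sharing one weak point
```

With these made-up numbers, three independent units fail together about once in a billion trials, but adding a one-in-100,000 shared weak point makes the trio roughly ten thousand times more likely to fail: the common-mode term swamps the benefit of redundancy.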
2. The "shirking" problem (also known to psychologists as "bystander apathy").
The more people who are asked to check a system, the less thorough any individual is apt to be. Think about it: will you take extra steps to check something if you know that "n" people have already vetted it and "m" more will do so after you? But if everyone shirks their duty, the reliability goes to zilch. In social psychology, "bystander apathy" refers to the experimentally validated observation that the more people who witness a crime, the less likely it is to be reported.
Consider NASA's Genesis spacecraft, which suffered an embarrassing crash, apparently because switches had been installed upside-down. One scientist pointed out that even though the spacecraft had undergone reviews by more than 100 people, "this somehow got through despite the normal reviews and the additional reviews" (N.Y. Times, Oct. 16, 2004). Well, I suspect the problem went undetected in part because so many people were involved in the checks: use fewer people, and the chance of catching problems increases.
3. The overcompensation problem.
This can be phrased as the "the system is now safer, so I can take more risks" problem. Make a system safer or more secure, and people learn they can take chances. Add seat belts to automobiles, and people drive faster. Add a secondary limit detector to a mechanical system, and people are willing to go beyond the first limit ("because the backup will catch any problem").
I want to emphasize the importance of these problems, while adding an equally important fourth one:
4. The Dedicated Worker problem.
If the security or safety requirements get in the way of doing the work, then the most dedicated workers will defeat them. Put in locked doors, and they will prop them open with wastebaskets. Require long, hard-to-guess passwords that must be changed frequently, and they will write them down and post them in easy-to-reach places. After all, security and safety threats are risks, not realities (and usually low-probability ones at that; see note *). Getting the work done on time is a reality, and these extra steps invariably make it harder to do the work. Hence, the most dedicated workers will remove whatever tends to block getting the work done.
Note (*): In 1992 I named this the "one in a million" problem (Norman, 1992; see the reference at the end of this essay). Low-probability events are often judged to be nonexistent, or at least to be something that happens only to others. I named it after the pilot who decided that all three of his engines could not be failing because "the chance of this happening is one in a million." My observation is, "Yes, you are correct, and you are that one." Actually, with some 7 million flights a year, one in a million is not nearly good enough, but that is a different argument.
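The arithmetic behind that last remark can be sketched in a few lines of Python. This assumes, purely for illustration, that "one in a million" is a per-flight probability and that flights are independent; the function names are hypothetical.

```python
import math


def expected_events(trials: int, p: float) -> float:
    """Expected number of occurrences of an event with per-trial probability p."""
    return trials * p


def prob_at_least_one(trials: int, p: float) -> float:
    """Probability that the event happens at least once in independent trials.

    Equivalent to 1 - (1 - p) ** trials, computed with log1p/expm1,
    which is numerically safer when p is very small.
    """
    return -math.expm1(trials * math.log1p(-p))


if __name__ == "__main__":
    flights_per_year = 7_000_000  # figure from the note above
    p_per_flight = 1e-6           # "one in a million", read as per-flight (assumption)
    print(f"expected per year: {expected_events(flights_per_year, p_per_flight):.2f}")
    print(f"P(at least one):   {prob_at_least_one(flights_per_year, p_per_flight):.4f}")
```

With 7 million flights and a one-in-a-million chance per flight, the expected count is about seven such events a year, and the probability that at least one occurs is roughly 0.999. In other words, across the whole system the "one in a million" event is close to a certainty.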
Item one of these four is a technical issue; the other three are psychological. When attempting to increase the security and safety of systems, it is essential to treat the psychology of the people involved as being of equal or greater importance than the purely technical analysis. Note that the most obvious response of security and safety people is "more training is necessary." Yes, proper training is always useful, but don't count on it to solve these problems. These issues arise despite training. They often appear in the best, most motivated, most effective people in the organization. Indeed, professionals in the security and safety industry have succumbed to exactly these issues ("I know my home computer isn't secure, but it was absolutely essential that I finish this report ..."). The correct solution lies in ensuring that security and safety measures take into account both the technical and the psychological factors.
Norman, D. A. (1992). It's a one in a million chance. Chapter 15 of Turn signals are the facial expressions of automobiles. Cambridge, MA: Perseus Publishing.
Sagan, S. (2004). The problem of redundancy problem: Why more nuclear security forces may produce less nuclear security. Risk Analysis, 24 (4), 935-946.