Chapter 16: Coffee Cups in the Cockpit

November 17, 2008

CHAPTER 16 OF TURN SIGNALS ARE THE FACIAL EXPRESSION OF AUTOMOBILES

Chapter Note: Please do not think that, because most of my examples are from aviation, air travel is unsafe or that the same thing doesn’t happen elsewhere. Aviation is a very safe activity because of these careful investigations of each accident. They are done with great care and thoroughness, and the results are taken very seriously by the aviation community. The voluntary aviation safety reporting system is a major source of safety lessons. It is essential that this service be maintained and I recommend it to other industries. Other industrial areas don’t receive the same care and analysis as does aviation, so I can’t turn to them as readily for examples. But the problems are there, I assure you. Moreover, the problems are often worse. Just look at all those railroad accidents, industrial manufacturing plant explosions, toxic chemical leaks, nuclear power disasters, ship leaks and collisions: the same problems recur all over the place. In fact, aviation is extremely safe compared to some of these other areas. Their problems are fundamental to the industry: I refer you to Perrow’s study Normal accidents (1984). An excellent review of aviation issues is in Human Factors in Aviation (Wiener & Nagel, 1988).

In 1979, a commuter aircraft crashed while landing at an airport on Cape Cod, Massachusetts (USA) (NTSB report, 1980). The captain (the pilot) died and the first officer (the co-pilot) and six passengers were seriously injured. As the plane was landing, the first officer noted that they seemed to be too low, and he told the captain. However, the captain did not respond. The captain, who was also president of the airline and who had just hired the first officer, hardly ever responded: he was the strong, silent type. He was in charge, and that was that. United States airline regulations require pilots to respond to one another, but what was the co-pilot to do? He was new to the company and the captain was his boss. Moreover, the captain often flew low. There were obvious social pressures upon the first officer.

What the first officer failed to notice was that the captain was “incapacitated.” That’s technical jargon. What it means is that the captain was unconscious and probably dead from a heart attack. After their investigation of the resulting accident, the U.S. National Transportation Safety Board (NTSB) rather dryly described the incident this way:

“The first officer testified that he made all the required callouts except the ‘no contact’ call and that the captain did not acknowledge any of his calls. Because the captain rarely acknowledged calls, even calls such as one dot low (about 50 ft below the 3° glide slope) this lack of response probably would not have alerted the first officer to any physiologic incapacitation of the captain. However, the first officer should have been concerned by the aircraft’s steep glidepath, the excessive descent rate, and the high airspeed.”

Seems strange, doesn’t it: there they are, flying along, and the captain dies. You’d think the co-pilot would notice. Nope, it isn’t as obvious as you might think. After all, reconsider. During landing, the two pilots are sitting side by side in a noisy airplane with lots of things to do. They hardly ever look at each other. Assuming the captain dies a quiet, polite death, what is there to attract the copilot’s attention? Nothing. In fact, United Airlines had tried it out in their simulators: they told the captain to make believe he had died, only to do it very quietly. Then they watched to see how long it took for anyone else to notice. The NTSB reviewed that study in their typical dry fashion:

“In the United simulator study, when the captain feigned subtle incapacitation while flying the aircraft during an approach, 25 percent of the aircraft hit the ‘ground.’ The study also showed a significant reluctance of the first officer to take control of the aircraft. It required between 30 sec and 4 min for the other crewmember to recognize that the captain was incapacitated and to correct the situation.”

So there: the pilot dies and it takes some people as long as four minutes to notice. Even the quick ones took 30 seconds. And one quarter of them crashed, or as the NTSB says, “hit the ‘ground’” (“ground” is in quotes because, fortunately, this was a simulator: embarrassment yes, but no injury).

Commercial aviation is a strange and wondrous place where perceived images are somewhat in conflict with the reality. The image is of a heroic, skilled adventurer, successfully navigating a crippled aircraft through storms, fires, and unexpected assaults. Images of Lindbergh and Earhart flying alone over the ocean, or World War I fighter pilots in open cockpits with helmet, goggles, and scarf still come to mind.

The reality is that the commercial aviation pilot of today is a manager and supervisor, not the daredevil pilot of yore. Today’s flight crew must be well schooled in the rules, regulations, and procedures of modern aviation. They are not permitted to deviate from assigned boundaries, and on the whole, if they do their job properly, they will lead a routine and uneventful life. The flight crew is in charge of a large, expensive vehicle carrying hundreds of passengers. The modern flight deck is heavily automated, and multiple color computer screens show maps, instrument readings, and even checklists of the tasks they are to do. The flight crew must act as a team, coordinating their actions with each other, with the air traffic control system, and in accordance with company and federal policies. Pilots spend much of their time studying the vast array of regulations and procedures and being tested and observed in the classroom, in the simulator, and in actual flight. Economics and reliability dominate.

The old-fashioned image of the cockpit crew is more like a military hierarchy: Captain in charge, first and second officers serving subsidiary roles. As a result of numerous studies of aircraft crews performed by scientists from the National Aeronautics and Space Administration (NASA) and universities, we have learned a lot about the need for cooperative work and interaction. The lessons actually apply to almost any work situation. It is unwise to rely on an authoritative figure. Individual people can become overloaded and fail to notice critical events. People also have a tendency to focus on a single explanation for events, thereby overlooking other possibilities. It really helps to have several people looking over things, especially if they feel that their contributions are appreciated and encouraged. In fact, it isn’t a bad idea to consider alternative courses of action. This has to be done with some sensitivity. The goal is not to disobey suggestions from the person in charge, but rather to explore possible alternatives and implications of the actions now being done.

Once upon a time it was assumed that when an airplane got into trouble, it was the captain’s responsibility to fix things. No longer. Another dramatic example of why this philosophy fails comes from an accident over the Everglades, a swampy, jungle-like region in southern Florida. This particular accident has become famous among students of aviation safety and human error. As the plane was coming in for a landing in Miami, Florida, the crew lowered the lever that lowers the landing gear. However, the lights that indicate a fully lowered and locked gear did not come on. So the crew received permission to circle for a while over the Everglades while they figured out what the problem was.

Chapter Note: NTSB report (1973). It turned out that the landing gear had been lowered properly, but the light bulb that would have informed the cockpit crew was burned out.

Now imagine a crowded cockpit with everyone examining the lights, reading through the “abnormal procedures” manual, trying to analyze the situation. Maybe the gear was down and the lights weren’t working? How can you tell? Simple, look at the gear. Alas, you can’t normally see the landing gear from inside the plane, and anyway, it was night. But airplane manufacturers have thought of a solution: you can remove a panel from the floor and lower a little periscope, complete with light, so you can look. Now imagine the entire crew trying to peek. Someone is on hands and knees taking off the plate. Someone is trying to lower the periscope. Someone is reading from the manual. Oops, the light in the periscope broke. How to fix that? Everyone was so busy solving the various problems that arose that nobody was flying the airplane.

Nobody flying the airplane? That’s not quite as bad as it sounds. It was being flown by the automatic flight controls, automatically flying in a circle, keeping constant airspeed and altitude. Except that someone, probably the captain, must have bumped against the control wheel (which looks something like an automobile’s steering wheel) while leaning over to watch the other people in the cockpit. Moving the control wheel disconnected the altitude control, the part of the automatic-pilot that keeps the airplane at a constant height above ground. When the controls disconnected they also made a beep to notify the crew. You can hear the beep on the tape recording that was automatically made of all sounds in the cockpit. The recorder heard the beep. The accident investigators who listened to the tape heard the beep. Evidently, nobody in the cockpit did.

“Controlled flight into terrain.” That’s the official jargon for what happens when a plane is flying along with everyone thinking things are perfectly normal, and then “boom,” suddenly they have crashed into the ground. No obvious mechanical problem. No diving through the skies, just one second flying along peacefully, the next second dead. That’s what happened to the flight over the Everglades: controlled flight into terrain. As nobody watched, the plane slowly, relentlessly, got lower and lower until it flew into the ground. There was a time when this was one of the most common causes of accidents. Today, I am pleased to report, such cases are rare.

Today pilots are taught that when there is trouble, the first thing to do is to fly the plane. I have watched pilots in the simulator distribute the workload effectively. One example I observed provides an excellent demonstration of how you are supposed to do it. In this case, there was a three-person crew flying NASA’s Boeing 727 simulator. An electrical generator failed (one of many problems that were about to happen to that flight). The flight engineer (the second officer) noticed the problem and told the captain, who at that moment was flying the airplane. The captain turned to the first officer, explained the situation, reviewed the flight plan, and then turned over the flying task. Then, and only then did the captain turn to face the second officer to review the problem and take appropriate action. Newer aircraft no longer have a flight engineer, so there are only two people in the cockpit, but the philosophy of how trouble should be handled is the same: one person is assigned primary responsibility for flying the aircraft, the other primary responsibility for fixing the problem. It is a cooperative effort, but the priorities are set so that someone — usually the captain — is making sure the most important things are being taken care of.

Pilots now are trained to think of themselves as a team of equals. It is everyone’s job to review and question what is going on. And it is the captain’s job to make sure that everyone takes this seriously. Proper crew resource management probably would have saved the crew and passengers in both the Everglades crash and that 1979 crash on Cape Cod. And it has been credited with several successes in more recent years.

Chapter Note: “Pilots now are trained to think of themselves as a team of equals”: See Foushee and Helmreich (1988), Group interaction and flight crew performance.

The Strong Silent Type Recurs in Automation

Chapter Note: “The Strong Silent Type Recurs in Automation”: Some of the social issues that affect the manner by which automation interacts with workers and changes the nature of jobs are discussed in Zuboff’s In the age of the smart machine: The future of work and power (1988). Her distinction between “automating” and “informating” is particularly relevant.

Alas, the lessons about crew resource management have not been fully learned. There is a new breed of strong, silent types now flying our airplanes. For that matter, they are taking over in other industries as well. They are at work in chemical plants, ships, nuclear power plants, factories, even in the family automobile. Strong silent types that take over control and then never tell you what is happening until, sometimes, it is too late. In this case, however, I am not referring to people, I am referring to machines.

You would think that the 1979 crash and a large bunch of other ones would have taught designers a lesson. Nope. Those crashes were blamed on “human error.” After all, it was the pilots who were responsible for the crashes, right? The lesson was not attended to by the folks who designed the mechanical and electronic equipment. They thought it had nothing to do with them.

The real lesson of crew resource management is that it is very important for a group of people doing a task together to communicate effectively and see themselves as a team. As soon as anyone feels superior to the others and takes over control, especially if that one doesn’t bother to talk to the others and explain what is happening, then there is apt to be trouble if unexpected events arise.

The Case of the Loss of Engine Power

Chapter Note: Incident reported in NTSB report, 1986. Also Wiener (1988).

In 1985, a China Airlines 747 suffered a slow loss of power from its outer right engine. When an engine on a wing goes slower than the others, the plane starts to turn, in this case to the right (technically, this kind of a turn is called “yawing”). But the plane — like most commercial aviation flights — was being controlled by its automatic equipment, in this case, the autopilot, which efficiently compensated for the turn. The autopilot had no way of knowing that there was a problem with the engine, but it did note the tendency to turn to the right, so it simply kept the plane pointed straight ahead. Negative feedback (remember Chapter 15?). Eventually, however, the autopilot reached the limit of how much it could control the turn, so it could no longer keep the plane stable. So what did it do? Basically, it gave up.

Imagine the flight crew. Here they were, flying along quietly and peacefully. They noticed a problem with that right engine, but while they were taking the first preliminary steps to identify the problem and cope with it, suddenly the autopilot gave up on them. They didn’t have enough time to determine the cause of the problem and to take action: the plane rolled and went into a vertical dive of 31,500 feet before it could be recovered. That’s quite a dive: almost six miles! And in a 747. Ten kilometers! The pilots managed to save the plane, but it was severely damaged. The recovery was much in doubt.

In my opinion, the blame is on the automation: why was it so quiet? Why couldn’t the autopilot indicate that something was wrong? It wouldn’t have had to know what, but just say, “Hey folks, this plane keeps wanting to yaw right more than normal.” A nice informal, casual comment that something seems to be happening, but it may not be serious. Then, as the problem persisted, why didn’t the autopilot say that it was reaching the end of its limit, thus giving the crew time to prepare? This time perhaps in a more formal manner because, after all, the problem is getting more serious: “Bing-bong,” “Captain, sir, I am nearing the limit of my control authority. I will soon be unable to compensate anymore for the increasing tendency to yaw to the right.”
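To make the idea concrete, here is a minimal sketch of an autopilot that speaks up in stages as its compensation grows. It is an illustration only: the function names, the thresholds (50 and 85 percent of control authority), and the sample data are all assumptions introduced here, not a description of any real autopilot.

```python
def advisory_level(correction, authority_limit, note_at=0.5, warn_at=0.85):
    """Classify how much of the available control authority is in use.

    The thresholds are illustrative assumptions, not values from a real system.
    """
    if authority_limit <= 0:
        raise ValueError("authority_limit must be positive")
    used = abs(correction) / authority_limit
    if used >= 1.0:
        return "alert"    # limit reached: the autopilot can no longer compensate
    if used >= warn_at:
        return "warning"  # formal: "I am nearing the limit of my control authority"
    if used >= note_at:
        return "note"     # casual: "this plane keeps wanting to yaw right"
    return "routine"      # ordinary compensation, nothing worth mentioning


def announce_changes(corrections, authority_limit):
    """Return (step, level) each time the advisory level changes."""
    events = []
    previous = "routine"
    for step, correction in enumerate(corrections):
        level = advisory_level(correction, authority_limit)
        if level != previous:
            events.append((step, level))
            previous = level
    return events


# A slowly growing correction, in arbitrary units, against a limit of 10.
for step, level in announce_changes(range(11), authority_limit=10):
    print(step, level)
```

With this hypothetical data the autopilot speaks three times: a casual note when half its authority is in use, a formal warning as it nears the limit, and an alert when the limit is reached. The point of the design is that the crew hears about the trend well before the moment of giving up.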

The Case of the Fuel Leak

Let me tell you another story, yet again where there was trouble developing, yet the automatic equipment didn’t say anything. This story comes from a report filed with the NASA Aviation Safety Reporting System. These are voluntary reports, filed by people in the aviation community whenever an incident occurs that is potentially a safety problem, but that doesn’t actually turn into an accident (so there would not ever have been any record of the event, otherwise). These are wonderful reports, for they allow safety researchers to get at the precursors to accidents, thereby correcting problems before they occur. In fact, let me tell you about them for a minute.

One accident researcher, James Reason of the University of Manchester, calls these reports and other similar signs of possible problems, the early signs of “resident pathogens.” The term comes from analogy to medical conditions. That is, there is some disease-causing agent in the body, a pathogen, and if you can discover it and kill it before it causes the disease, you are far ahead in your aim to keep the patient healthy. “All man-made systems,” says Reason, “contain potentially destructive agencies, like the pathogens within the human body.” They are mostly tolerated, says Reason, kept in check by various defense mechanisms. But every so often external circumstances arise to combine with the pathogens “in subtle and often unlikely ways to thwart the system’s defenses and to bring about its catastrophic breakdown.” (p. 197).

Chapter Note: The term “resident pathogens” comes from the book Human error, by James Reason (1990). A symposium on these issues for “High Risk Industries” was presented in 1990 at the Royal Society, London (I presented a very early version of this essay: See Broadbent, Baddeley & Reason, Human factors in hazardous situations, 1990).

This is where those voluntary aviation safety reports come in. They give hints about those pathogens before they turn into full-fledged accidents. If we can prevent accidents by discovering the “resident pathogens” that give rise to them before they cause trouble, we are far ahead in our business of keeping the world a safe place. You would be surprised how difficult this is, however. Most industries would never allow voluntary reports of difficulties or errors of their employees. Why, it might look bad. As for fixing the pathogens that we bring to their attention? “What are you talking about?” I can hear them say, “We’ve been doing things this way for years. Never caused a problem.” “Damn,” I can imagine them mumbling as we leave, “those damn university scientists. That’s a million to one chance of a problem.” Right, I reply: a million to one isn’t good enough.
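Why isn’t a million to one good enough? A rough calculation shows it, under two loudly stated assumptions: that each operation carries the same independent one-in-a-million chance of trouble, and that the operation counts below are purely hypothetical round numbers, not industry statistics.

```python
import math


def chance_of_at_least_one(p_per_operation, operations):
    """Probability of one or more events in `operations` independent tries."""
    if not 0.0 <= p_per_operation < 1.0:
        raise ValueError("p_per_operation must be in [0, 1)")
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p.
    return -math.expm1(operations * math.log1p(-p_per_operation))


def expected_events(p_per_operation, operations):
    """Average number of events over `operations` tries."""
    return p_per_operation * operations


# Hypothetical operation counts, chosen only to show the trend.
for n in (100_000, 1_000_000, 10_000_000):
    chance = chance_of_at_least_one(1e-6, n)
    print(f"{n:>10} operations: {chance:.1%} chance of at least one event, "
          f"{expected_events(1e-6, n):.1f} expected")
```

Under these assumptions, a problem that is a million to one on any single operation becomes more likely than not once the operation has been repeated about a million times, and close to certain after ten million.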

The aircraft in the incident we are about to discuss is a three-engined commercial plane with a crew of three: captain, first officer (who was the person actually flying at the time the incident was discovered), and second officer (the flight engineer). The airplane has three fuel tanks: tank number 1 in the left wing, tank 2 in the aircraft body, and tank 3 in the right wing. It is very important that fuel be used equally from tanks 1 and 3 so that the aircraft stays in balance. Here is the excerpt from the aviation safety report:

“Shortly after level off at 35,000 ft. the second officer brought to my attention that he was feeding fuel to all 3 engines from the number 2 tank, but was showing a drop in the number 3 tank. I sent the second officer to the cabin to check that side from the window. While he was gone, I noticed that the wheel was cocked to the right and told the first officer who was flying the plane to take the autopilot off and check. When the autopilot was disengaged, the aircraft showed a roll tendency confirming that we actually had an out of balance condition. The second officer returned and said we were losing a large amount of fuel with a swirl pattern of fuel running about mid-wing to the tip, as well as a vapor pattern covering the entire portion of the wing from mid-wing to the fuselage. At this point we were about 2000 lbs. out of balance.”

Chapter Note: “Shortly after level off at 35,000 ft.”: The voluntary reporting incident was “Data Report 64441, dated Feb, 1987.” (The records are anonymous, so except for the date and the fact that this was a “large” commercial aircraft, no other information is available. Properly, in my opinion, for the anonymity is essential to maintain the cooperation of the aviation community.)

In this example, the second officer provided the valuable feedback that something seemed wrong with the fuel balance, but not until they were quite far out of balance. The automatic pilot had quietly and efficiently compensated for the resulting weight imbalance, and had the second officer not noted the fuel discrepancy, the situation would not have been noted until much later, perhaps too late.

The problem was very serious, by the way. The plane became very difficult to control. The captain reported that “the aircraft was flying so badly at this time that I actually felt that we might lose it.” They had to dump fuel overboard from tank 1 to keep the plane balanced and the airplane made an emergency landing at the closest airport. As the captain said at the end of his long, detailed report, “We were very fortunate that all three fuel gauges were functioning and this unbalance condition was caught early. It is likely that extreme out-of-balance would result in loss of the aircraft. On extended over-water flight, you would most certainly end up with ... a possible ditching.”

Why didn’t the autopilot signal the crew that it was starting to compensate the balance more than was usual? Technically, this information was available to the crew, because the autopilot flies the airplane by physically moving the real instruments and controls (in this situation, by rotating the control wheel to maintain balance). In theory, you could see this. In practice, it’s not so easy. I know because I tried it.

We replicated the flight in the NASA 727 simulator. I was sitting in the observer’s “jump seat,” located behind the pilots. The second officer was the NASA scientist who had set up the experiment, so he deliberately did not notify the pilots when he saw the fuel leak develop. I watched the autopilot compensate by turning the wheel more and more to the right. Neither pilot noticed. This was particularly interesting because the second officer had clipped a flight chart to the wheel, so as the wheel tipped to the right, he had to turn his head so that he could read the chart. But the cockpit was its usual noisy, vibrating self. The flight was simulating turbulence (“light chop” — as described elsewhere in the incident report) so the control wheel kept turning small amounts to the left and right to compensate for the chop. The slow drift of the wheel to the right was visible to me, the observer, but I was looking for it and had nothing else to do. It was too subtle for the pilots who were busy doing the normal flight activities.

Chapter Note: “We replicated the flight in the NASA 727 simulator”: The simulated flights that I refer to were done at the Boeing 727 simulator facilities at NASA-Ames in studies conducted by Everett Palmer of NASA, who is also the person who is the project manager for our research grant.
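A slow drift buried in chop is hard for a busy human to see, but it is exactly the kind of thing a machine can pick out. The sketch below is one simple way to do it, offered as an assumption-laden illustration: it smooths the wheel angle with an exponential moving average and flags the moment the smoothed value passes a threshold. The function name, the smoothing constant, the 3-degree threshold, and the synthetic wheel data are all hypothetical.

```python
def detect_drift(wheel_angles, smoothing=0.1, threshold_deg=3.0):
    """Return the index at which the smoothed wheel angle first exceeds
    the threshold, or None if it never does.

    `smoothing` and `threshold_deg` are illustrative values only.
    """
    average = 0.0
    for index, angle in enumerate(wheel_angles):
        # Exponential moving average: rapid left-right chop largely cancels,
        # while a persistent offset accumulates.
        average += smoothing * (angle - average)
        if abs(average) > threshold_deg:
            return index
    return None


def chop(index, amplitude_deg=4.0):
    """Synthetic turbulence: the wheel flicks right, then left, each sample."""
    return amplitude_deg if index % 2 == 0 else -amplitude_deg


# Turbulence alone, and turbulence plus a slow drift to the right.
chop_only = [chop(i) for i in range(200)]
chop_with_drift = [chop(i) + 0.05 * i for i in range(200)]

print("chop only:", detect_drift(chop_only))
print("chop with drift:", detect_drift(chop_with_drift))
```

In this synthetic data the individual wheel movements are larger than the threshold, so watching raw movements would be useless. The smoothed value stays quiet for turbulence alone and crosses the threshold only when the drift is real, which is the cue a communicative autopilot could pass on to the crew.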

Why didn’t the autopilot act like the second officer? Look, suppose that there wasn’t any autopilot. The first officer would be flying, controlling the wheel by hand. The first officer would note that something seemed wrong and would probably mention it to the captain. The first officer wouldn’t really know what was happening, so it would be a casual comment, perhaps something like this: “Hey, this is peculiar. We seem to be banking more and more to the left, and I have to keep turning the wheel to the right to compensate.” The captain would probably take a look around to see if there were some obvious source of trouble. Maybe it is nothing, just the changing winds. But maybe it is a weight unbalance, maybe a leaky fuel tank. The second officer would also have heard the comment and so would have certainly been alerted to scan all the instruments to make sure they were all normal. The weight problem would have been caught much earlier.

Communication makes a big difference: you shouldn’t have to wait for the full emergency to occur. This is just as true when a machine is working with a person as when two people are working together. When people perform actions, feedback is essential for the appropriate monitoring of those actions, to allow for the detection and correction of errors, and to keep alert. This is hardly a novel point. All automatic controlling equipment has lots of internal feedback to itself. And we all know how important feedback is when we talk to one another. But adequate feedback between machines and people is absent far more than it is present, whether the system be a computer operating system, an autopilot, or a telephone system. In fact, it is rather amazing how such an essential source of information could be skipped. Without appropriate feedback, people may not know if their requests have been received, if the actions are being performed properly, or if problems are occurring. Feedback is also essential for learning, both of tasks and of the way that the system responds to the wide variety of situations it will encounter. The technical jargon for what happens when people are not given proper feedback is that they are “out of the loop.”

If the automatic equipment were people, I would say that it was way past time that they learned some social skills. I would advise sending them to class to learn those skills, to learn crew resource management. Obviously it wouldn’t do any good to send the equipment, but we certainly ought to send the designers.

Cognitive Science in the Cockpit

The pilots of modern aircraft have lots of different things to do. They have to follow numerous regulations, consult charts, and communicate with one another, their company, and air traffic control. And then, during the major portion of the flight, they have to sit in the cockpit with very little to do, but keep alert for unexpected problems. If the trip is long, say across an ocean, there can be hours of boredom.

Where are the cognitive aids for these tasks? Even the comfort of the flight crew is ignored. Only recently have decent places to hold coffee cups emerged. In older planes the flight engineer has a small desk for writing and for holding manuals, but the pilots don’t. In modern planes there are still no places for the pilots to put their charts, their maps, or in some planes, their coffee cups. Where can the crew stretch their legs or do the equivalent of putting the feet up on the desk? And when it is mealtime, how does one eat without risking spilling food and liquids over the cockpit? The lighting and design of the panels seem like an afterthought, so much so that a standard item of equipment for a flight crew is a flashlight. If comfort is ignored, think how badly mental functioning must be treated.

The same is true all over, I might add. I have seen power plants where the operators have to stand on the instruments in order to change burned-out light bulbs, plants where there are no facilities for eating snacks and drinking coffee. It’s as if the designers assumed that no equipment would ever burn out, that the workers would go for an entire eight-hour shift without nourishment.

Intuition and anecdote determine the training methods and the development of instrumentation, procedures and regulations. There has been almost no systematic attempt to provide support for cognitive activities. The crew has to fend for themselves, and because they sometimes recognize their own limitations, many have informally developed routines and “rules of thumb” to help them cope. Consider that many flight crews carry as standard equipment such wonderful cognitive tools as a roll of tape and an empty coffee cup to be used as reminders on the instruments and controls. The empty coffee cup is actually quite effective when placed upside down over the throttle or flap handles to remind the pilots that some special condition applies to future use of these controls.

Why Is an Empty Coffee Cup Such a Powerful Cognitive Aid?

One of the most widely studied areas within cognitive science is that of human memory and attention. Although the final scientific theories have yet to be developed, we do know a lot. There is a considerable body of well-understood phenomena and several approximate theories that can be used to good effect in design. Among the simple lessons are that:

We can only keep a very limited number of things consciously in mind at any one time in what is called “Working Memory.” How few? Perhaps only five.
Conscious attention is limited, so much so that it is best to think of a person as being able to focus on only one task at a time. Disruptions of attention, especially those caused by interrupting activities, lead to problems: people forget what was in working memory prior to the interruption and the interruption interferes with performance of the task they were trying to focus on.
Internal information, “knowledge in the head,” is subject to the limits posed by memory and attention. External information, “knowledge in the world,” plays important roles in reminding people of the current state of things and of the tasks left to be done. Good design practice, therefore, will provide external aids to memory.

In general, people are not very good at tasks that require great precision and accuracy or precise memorization. People are very good at perceptual tasks, tasks that involve finding similarities (analogies) between one situation and another, and novel or unexpected tasks that involve creative problem solving. Unfortunately, more and more of the tasks in the cockpit force people to do just those tasks they are bad at and detract from the ability to do the things they are so good at.

How does the flight crew guard against problems? There are surprisingly few aids. Most of the aids in the cockpit are casual, informal, or invented by the crew in response to their own experiences with error. The most common (and effective) cognitive aids in the cockpit are:

Speed bugs. Speed bugs are plastic or metal tabs that can be moved over the airspeed indicator to mark critical settings. These are very valuable cognitive aids, for they transform the task performed by the pilot from memorization of critical air speeds to perceptual analysis. The pilots only have to glance at the airspeed and instead of doing a numerical comparison of the airspeed value with a figure in memory, they simply look to see whether the speed indicator is above or below the bug position. The speed bug is an excellent example of a cockpit aid.

The speed bug is an example of something that started out as an informal aid. Some pilots used to carry grease pencils or tape and make marks on the dials. Today, the speed bugs are built into the equipment and setting them is part of standard procedures. Unfortunately, instrument designers have now gotten so carried away by the device that what used to be a single, easy-to-use tool has now been transformed into as many as five or more bugs set all around the dial. As a result, what was once a memory aid has now become a memory burden. I foresee speed bug errors as pilots confuse one bug with another. And, again, because of the lack of system knowledge, newer computer-displayed airspeed indicators sometimes neglect to include speed bugs or other memory aids for critical airspeed settings, sending us back to the dark ages of memory overload.

Crew-provided devices. Pilots and crew recognize their own memory deficiencies, especially when subject to interruptions. As a result, they use makeshift reminders in the cockpit. In particular, they rely heavily on physical marks. You know, want to remember something, tie a string around your finger. Want to remember to take your briefcase, prop it against the door so you stumble over it when you go out. Want to remember to turn off the Air Conditioning Units before lowering the flaps, place an empty coffee cup over the flap handle. Crude, but effective.

Pilots need numerous reminders as they fly. They have to remember the flight number, radio frequencies, and speed and altitude clearances. They need to know and remember the weather conditions. They may have special procedures to follow, or Air Traffic Control may have given them some special information.

But why hasn’t this need been recognized? The need for mental, cognitive assistance should be recognized during the design of the cockpit. Why don’t we build in devices to help the crew? Instead, we force them to improvise, to tape notes here and there, or even to wedge pieces of paper at the desired locations, all to act as memory aids. The crew needs external information, knowledge in the world, to aid them in their tasks. Surely we can develop better aids than empty coffee cups?

Cognitive aids. The need for memory aids applies to a wide range of human activities, in all professions. Just think about your own activities. How many notes do you write to yourself? Why have Post-It notes become so popular? How many times have you forgotten something because you weren’t reminded at the critical time? In everyday life, these issues are seldom of great importance, but in many industrial settings, they can be critical. There is a great need for cognitive aids in many aspects of life, aids that are designed with knowledge and understanding of human psychology.

Perhaps the one place where these problems are officially recognized is in the checklist, but even this appears to be an anomaly, designed more for the convenience of the training staff than for the needs of the crew.

Checklists

Chapter Note: My analyses of the checklists were done jointly with my colleague, Edwin Hutchins. Some of the information and analyses come from the excellent review by Degani & Wiener (1990).

One form of cognitive aid is in widespread use: the checklist, a list of tasks to be performed. Personally, I see the checklist as an admission of failure, an attempt to correct for people’s errors after the fact rather than to design systems that minimize error in the first place.

How are checklists used? In many different ways. In American commercial aviation, the “normal” checklists used in the cockpit are designed solely to serve as checks of actions already done. That is, the cockpit crew first sets up the cockpit for the flight, then uses the checklist to confirm that they did everything properly. In other situations (e.g., the “abnormal” checklists used in aviation when there are problems) and in other industries, checklists are often used as reminders, guiding and suggesting actions to be performed. In these cases, they are read just before or as the action is done.

The very fact that checklists exist is an admission that not all human behavior is perfect, that errors occur, and so, for safety and thoroughness, some items need to be especially “checked” to ensure that they are done.

Aviation checklists serve multiple functions:

As checks: to make sure that everything that was supposed to have been done was in fact done. This is actually the primary function of most aviation checklists. This is what the term “check list” really means.
As “triggers”: to remind what needs to be done. This is how the “abnormal checklists” are used in aviation: when problems arise, the crew pull out the appropriate checklist for the problem and then go through it, doing each action as it is triggered by the list.
For Crew Communication. In most airlines, the entire flight crew takes part in going through the checklist: one person reads the items, one by one, while the others do the specified operations or check the specified state. This procedure ensures that the entire crew knows the state of the aircraft, especially anything unusual. This role of the checklist is often overlooked, but it may be one of its most important functions.
To Satisfy the Legal Department. Some items on the checklist are really not needed for anything except to guard against lawsuits. After an accident, during the court case, the legal experts want to make sure that the cockpit voice recording will distinctly show the pilots doing certain actions, even if they are not critically needed for flight safety. However, the longer the checklist, the more chance that there will be error in using it. Safety demands as short a list as possible: legal concerns demand longer lists.

But why do we need checklists at all? Checklists are not only a sign of human fallibility, they are also a sign that the procedures or equipment design is inappropriate. There are ways to design things to minimize the chance for skipping critical actions. A properly designed system might not even require a checklist.

Alas, each new requirement to aid the crew seems to result in new tasks for them to do, new procedures to be followed. I can see it now: my goal to help the pilots through my analysis of the valuable memory aid provided by empty coffee cups is misinterpreted so as to add yet one more item to the procedures and checklists pilots must do. I can imagine a new checklist item: “Coffee cup supply?” Proper response: “Filled.”

Blaming the Person — A Way to Avoid the Real Issues

One last point: the tendency to blame incidents on human error. In the past few years, human error has become the dominant cause cited for industrial accidents. Thus, in the period 1982-1986, the pilot was blamed in 75% of fatal accidents.

Chapter Note: From the United States National Transportation Safety Board’s Annual review of aircraft accident data. U.S. air carrier operations calendar year 1987. (NTSB, 1990b).

Human error. How horrible! What’s the matter with those pilots, anyway? Clearly they aren’t being trained right. Fire them. Or at least send them back for more training. Change the training. Add some more flight regulations. Change the law. Add some more items to the checklists. This is what I call the “blame and train” philosophy.

Whenever I see such a high percentage of accidents blamed on individuals, I get very suspicious. When I am told that more than half of the world’s accidents, home and industrial, are blamed on the people involved, I get very, very suspicious indeed. One way of thinking about the issue is this. If people are only rarely thought to be the culprit for some problem or accident, then maybe there is some reason to think that, in the exceptional case, the person did do something wrong. But if people often seem to be at fault, especially different people over long periods of time, then the first place to look for the explanation is in the situation itself.

Look, suppose it really is something in people that gives rise to accidents: shouldn’t any sensible designer learn about those things and design the system so as to be resistant to that behavior or, better yet, to avoid those situations? Alas, most engineers and designers are not well educated about human psychology. The psychological, cultural, and social knowledge relevant to human behavior is not part of the normal design or engineering training and education. Moreover, many designers fall prey to the “one chance in a million” syndrome (remember Chapter 15).

Until designers take seriously the usability of their designs and realize that inappropriate design is responsible for many accidents and casualties, we will never minimize such incidents. Let me give an example.

In 1988, the Soviet Union’s Phobos 1 satellite was lost on its way to Mars. Why? According to Science magazine, “not long after the launch, a ground controller omitted a single letter in a series of digital commands sent to the spacecraft. And by malignant bad luck, that omission caused the code to be mistranslated in such a way as to trigger the test sequence” (the test sequence was stored in the computer memory of the satellite, but was to be used only during checkout of the spacecraft while on the ground). Phobos went into a tumble from which it never recovered.

Chapter Note: The report came from an editorial in Science magazine, (Waldrop, 1989), and formed the basis of an opinion piece by me in the Communications of the ACM, the primary journal for American computer scientists (ACM stands for Association for Computing Machinery). I argued that computer scientists need to pay a lot more attention to the design of their programs, else they too will cause Phobos-like accidents (Norman, 1990).

The American journal Science wrote its report as if the incompetence of the human controller had caused the problem. Science interviewed Roald Kremnev, director of the Soviet Union’s spacecraft manufacturing plant. Here is how Science reported the discussion: “what happened to the controller who made the error? Well, Kremnev told Science with a dour expression, he did not go to jail or to Siberia. In fact, it was he who eventually tracked down the error in the code. Nonetheless, said Kremnev, he was not able to participate in the later operation of Phobos.” Both the reporter’s question and the answer presuppose the notion of blame. Even though the operator tracked down the error, he was still punished (but at least not exiled). But what about the designers of the language and software or the methods they use? Not mentioned. The problem with this attitude is that it prevents us from learning from the incident, and allows the same problem to be repeated.

This is a typical reaction to the problem — blame the controller for the error and “malignant bad luck” for the result. Why bad luck — why not bad design? Wasn’t the problem the design of the command language that allowed such a simple deviant event to have such serious consequences?

The crazy thing is that normal engineering design practices would never allow such a state of affairs to exist with the non-human part of the equipment. All electrical signals are noisy. That is, if the signal is supposed to be 1.00 volts, well, sometimes it will be .94, sometimes 1.18. In a really bad environment, where there are lots of radio transmitters sending energy all about and large electrical cables and motors turning on and off, there might be occasional “spikes” that make the 1 signal jump to 5 volts or down to zero or even less, perhaps -5 volts, for a few thousandths of a second. This noise can really wreck the operation of a computer. Fortunately, there are numerous ways to protect against these problems.

The spacecraft designers, who work in exactly this kind of noisy environment, work hard to minimize the effects of noise bursts. If they didn’t, a burst of noise could turn a signal like 1 volt, which usually would encode the digital code for a “one,” into just its opposite meaning, the digital code for a “zero.” This could cause untold errors in the operation of the computer. But, fortunately, there are many techniques to avoid problems. Designers are expected to use error-detecting and correcting codes.
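
To make the idea of an error-correcting code concrete, here is a minimal sketch. It assumes a Hamming(7,4) code, a standard textbook scheme chosen purely for illustration; nothing in the reports says which codes any particular spacecraft used, and the function names are hypothetical. Four data bits are sent with three extra parity bits, and the receiver can locate and repair any single flipped bit.

```python
# Illustrative sketch only: Hamming(7,4), a textbook single-error-correcting code.
# The function names are hypothetical and not taken from any spacecraft system.

def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    # Recompute each parity check; together the results spell out the
    # 1-based position of a single flipped bit (0 means all checks passed).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1  # repair the damaged bit
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # a noise spike flips one bit in transit
    data, position = hamming74_decode(received)
    print(data, position)  # prints [1, 0, 1, 1] 5
```

The point of the sketch is the attitude it embodies: the design assumes that bits will sometimes be corrupted and arranges for the corruption to be harmless. This particular code repairs only single-bit errors per codeword; real links would need stronger codes.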

Suppose, just suppose, that some sort of electrical noise had corrupted the signal sent from the ground to the Phobos satellite, causing it to be destroyed. Who would be at fault then? It certainly wouldn’t be the ground controllers. No, the official verdict would probably state that the system designers did not follow standard engineering practice. The next time around that design would be redone so as to protect against future occurrences. Well, the same lesson applies to all situations: there is no excuse for equipment and procedures that are so sensitive to human error, and certainly no excuse for those so badly designed that they lead humans to err. People err, just as equipment does. Worse, most equipment seems designed so as to lead a person to err: all those tiny little switches, neatly lined up in an array of identical-looking switches and readouts. Computer codes that are long, meaningless strings of letters and digits. Is it any wonder that sometimes the wrong switch gets pushed, the wrong gauge gets read, or a critical character isn’t typed? Why would anyone design a computer control language so that a single typing error could lead to catastrophe? And why is a procedure meant to be used only on the ground still possible to activate once in space?
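
The same protective thinking can be applied to commands typed by people. The sketch below is hypothetical throughout: the Science report does not describe the actual Phobos command language, so the command names, the framing format, and the functions `frame` and `accept` are all invented for illustration. It shows two guards suggested by the questions above: a checksum, so that a damaged command is rejected instead of being reinterpreted, and a mode check, so that a ground-only procedure cannot run in flight.

```python
import zlib

# Hypothetical command set, invented for illustration; the Science report
# does not describe the real command language.
KNOWN_COMMANDS = {"POINT_ANTENNA", "FIRE_THRUSTER", "RUN_TEST_SEQUENCE"}
GROUND_ONLY = {"RUN_TEST_SEQUENCE"}


def frame(command):
    """Append a CRC-32 of the command text as 8 hex digits after a '*'."""
    return "%s*%08X" % (command, zlib.crc32(command.encode("utf-8")))


def accept(framed, in_flight):
    """Return (accepted, reason). Anything doubtful is rejected, never guessed at."""
    command, separator, checksum = framed.rpartition("*")
    if not separator:
        return False, "malformed: no checksum"
    expected = "%08X" % zlib.crc32(command.encode("utf-8"))
    if checksum.upper() != expected:
        return False, "checksum mismatch"
    if command not in KNOWN_COMMANDS:
        return False, "unknown command"
    if in_flight and command in GROUND_ONLY:
        return False, "ground-only command refused in flight"
    return True, "ok"


if __name__ == "__main__":
    good = frame("POINT_ANTENNA")
    print(accept(good, in_flight=True))
    # One wrong character after framing no longer matches the checksum.
    print(accept(good.replace("POINT", "POINS", 1), in_flight=True))
    # A letter omitted before framing yields a name that is not a command.
    print(accept(frame("POIT_ANTENNA"), in_flight=True))
    # The test sequence is valid on the ground but refused in flight.
    print(accept(frame("RUN_TEST_SEQUENCE"), in_flight=True))
```

Under these assumptions, a slip by the controller produces a rejected command and a request to retype it, not a tumbling spacecraft. The design choice is to fail safe: when the message is in doubt, do nothing and ask again.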

The fault lies in a design that completely failed to take into account the needs of the people who had to use it. Don’t punish the controllers; change the design philosophy.

As automation increasingly takes its place in industry, it is often blamed for causing harm and increasing the chance of human error when failures occur. I propose that the problem is not the presence of automation, but rather its inappropriate design. The problem is that the operations under normal operating conditions are performed appropriately, but there is inadequate feedback and interaction with the humans who must control the overall conduct of the task. When the situations exceed the capabilities of the automatic equipment, then the inadequate feedback leads to difficulties for the human controllers.

The problem, I suggest, is that the automation is at an intermediate level of intelligence, powerful enough to take over control that used to be done by people, but not powerful enough to handle all unusual conditions. Moreover, its level of intelligence is insufficient to provide the continual, appropriate feedback that occurs naturally among human operators. This is the source of the current difficulties. To solve this problem, the automation should either be made less intelligent or more so, but the current level is quite inappropriate. Either the person should be in control all the time or in continual communication with the automatic tools. The intermediate state where the automatic equipment is in control, but with no external feedback, no communication, is probably the worst of all possibilities, especially when trouble strikes.

Complex systems involve a mixture of automatic and human control. Alas, there is too much tendency to let the automatic controls do whatever activities they are capable of performing, giving the leftovers to people. This is poor system design. It does not take into account the proper mix of activities, and it completely ignores the needs and talents of people. The price we pay for such disregard for the total system performance comes when things go wrong, when unexpected conditions arise or the machinery breaks down. The total reliability and safety of our systems could be improved if only we understood and treated people with the same respect and dignity that we give to electronic signals and to machines.
