Computer managers at Bank of America did not run the new system and the old system in parallel to correct faults. Instead, the bank's executives switched to the new programs without any backup. The new system and the old one were different enough that mistakes went uncorrected. It turned out that on any given day only a small percentage of the securities trades was improper, but nobody could tell for sure which transactions were good and which were bad. Meanwhile, trading continued at an accelerated pace, accumulating an enormous number of doubtful records. All in all, billions of dollars of securities transactions were unaccounted for. The failure of the software came close to destroying the bank.
Computer system blunders have also caused havoc at the United Education & Software Company. United Education, originally a trade-school operator, began handling repayments for student loans in 1983. It grew rapidly, developing a portfolio of more than $1 billion in loans. Its data-processing service division received management contracts from banks and the State of California to provide accounting and computer billing services. The computer problems apparently stemmed from a switch to a new system in 1987, in which United Education's programmers introduced major software errors and failed to test the system before proceeding with conversion.
Instead of reverting to the old system, managers at United Education spent eight months trying to fix programming mistakes while processing regular business transactions. As a result, delinquency notices went to students who owed nothing, while those who lagged in their payments did not find out about their delinquencies. The computer system rejected payments from overdue borrowers and posted payments intended to repay loan principal as interest. The system also logged fictional telephone calls and failed to account for actual phone calls. All told, United Education's computer failure may end up costing banks $650 million in unaccounted-for loans.
Similar computer foul-ups occur often, but such events usually remain a company secret in order to maintain public confidence.
The experience with automatic teller machines (ATMs) is a useful lesson about the economics of computer systems. When Citibank pioneered the installation of ATMs in New York, for example, it did so to gain a competitive advantage. This advantage disappeared when other banks offered the identical technology and equivalent services. Citibank's hope of lowering its costs by means of ATMs also has not been realized. The bank has recently begun to charge fees to ATM users who do not maintain sufficiently high balances, which suggests that the volume of ATM transactions is insufficient to recover their substantial costs. When everyone offers the identical technology, it ceases to be an advantage for anyone. (Though, of course, not investing in the technology is a certain disadvantage.)
Michael Hammer, a professor at Massachusetts Institute of Technology, says that "Citibank so successfully terrorized its New York City competitors with its advanced ATMs, that, herding together for warmth, they formed the New York City Cash Exchange. This shared network involves the participation of virtually all New York banks, except for Citibank. Such twists and turns make a mockery of attempts to build a competitive advantage from a proprietary technology."
It seems that the harder you try to extract increased profitability through strategic deployment of information technology, the more elusive it becomes. Michael Scott Morton, an M.I.T. professor, says that there is no correlation between the amount spent on information technology and the payback. He and other researchers at the M.I.T. Center for Information Systems Research have looked carefully at the evidence. However they examined the data, they reached the same conclusion: the intensely competitive marketplace drives the innovation, constantly demanding new services because everyone else is offering them.
The idea that the escalation in the use of computers stems from a marketing-driven mania is interesting. To test this proposition, I asked the editors of Inc. magazine to survey the CEOs of the 100 fastest-growing small public companies. Each CEO was asked: "In your opinion, has the use of computers played a significant role in explaining the success of your company since 1984?"
Out of 87 responses, 77 percent replied that computers indeed made a significant contribution to the growth of their businesses. The remainder said that the help they were getting from computerization was immaterial to their success. The CEOs mentioned "overhead cost reduction" most frequently as their principal benefit. "Unique competitive advantage" and "offering innovative new services" were mentioned only rarely. The Inc. survey confirmed that information technology makes sense only when it solves a company's specific problems, such as overhead cost control, production management, or support of customer services. Imitation, especially of another firm's "strategic" uses, wastes money. The next time a business speaker begins with the alleged competitive advantages of American Airlines or Citicorp, disregard most of what you hear. Your firm is not like American Airlines or Citicorp, or even like your competitor. You must fit computers into your particular environment so that they deliver profits in ways that reflect your own circumstances.
At General Foods Inc., I first experienced what I call "procedural instability" when quickly launched marketing promotions of products became an instant cure for budget shortfalls. Unfortunately, the normal inventory system could not keep up with the sudden spurts of product demand. We revised the computerized inventory system so that planners could manually override automatic warehouse replenishment rules with as little as a day's notice about a new marketing promotion.
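The override idea described above can be sketched in a few lines. This is a hypothetical illustration, not General Foods' actual system; the replenishment rule, names, and quantities are all invented.

```python
# Sketch of automatic warehouse replenishment with a planner's manual
# override, as in the General Foods example. The rule itself is invented.

def replenishment_order(on_hand, forecast_demand, safety_stock, override_qty=None):
    """Return the quantity to ship to a warehouse for the next period.

    override_qty, when given, is a planner's manual entry made on short
    notice ahead of a marketing promotion; it supersedes the automatic rule.
    """
    if override_qty is not None:
        return override_qty                  # planner knows about the promotion
    automatic = forecast_demand + safety_stock - on_hand
    return max(automatic, 0)                 # never ship a negative quantity

# Normal week: the automatic rule decides.
print(replenishment_order(on_hand=400, forecast_demand=500, safety_stock=100))  # 200
# Promotion announced with a day's notice: the planner overrides.
print(replenishment_order(400, 500, 100, override_qty=1500))  # 1500
```

The essential design choice is that the override is explicit and temporary: the automatic rule resumes as soon as no manual entry is present.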
At Xerox Corporation, the "procedural instability" manifested itself as frequent tampering with the sales staff's compensation formulas. As price competition became more intense, the sales commission application grew to a few hundred thousand lines of COBOL code. This system became one of the early examples of "structured programming." The systems organization adapted this innovative software technology to accommodate compensation schemes of labyrinthine complexity.
There is no reason to berate computer systems for poor adaptability unless you are looking for an excuse to make irresolute management appear blameless. Computer systems can change overnight if management wants it and is willing to pay for it.
This case shows that major mishaps can precipitate from a combination of errors, none of which would cause much damage in isolation. Forgetting to erase a test program is not a calamity. When combined with an operator's mistake in running the wrong application, however, the result for the Fed was a $38 billion error.
Can careful training, backup computers, and redundant commands prevent the occurrence of combined errors? Is it possible to ensure perfectly faultless results? The experience of a Soviet spacecraft casts doubt on this.
The computer on board Soviet spacecraft must calculate the position of the horizon in order to fire the reentry rockets. During daytime landings the cosmonauts simply look out the window to check its settings. For a 1988 evening landing of a Soviet manned craft, however, the crew had to rely on the computer's calculations. When the commander of the capsule fired the reentry rocket, the computer halted the firing because its program rejected inconsistent readings from the dimly lit horizon.
Seven minutes later, the computer accepted valid readings from the horizon sensors. Unfortunately, the cosmonauts forgot to clear the previous "fire" order and the computer fired the rockets improperly.
After two additional orbits, the landing sequence used a backup computer because the prime computer started executing a program that had remained from a spacecraft docking maneuver three months earlier. The engine burned for six seconds and then was shut down manually when the cosmonauts realized they were flying away from the Earth. The commander re-ignited the engine to reverse the thrust but the backup computer turned it off again because its program called for docking instead of landing. Finally, the reentry began manually, under visual control. The cosmonauts landed with only a few minutes of oxygen to spare.
As this story reveals, "computer errors" are not always the result of willful neglect. They can result from designs so poor that even superbly trained operators do not understand what the computer's messages mean. You should never automate systems with potentially catastrophic consequences without facilities for easy personal detection and manual overriding of errors, so that common sense can prevail.
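One lesson of the spacecraft story is that a pending command should not survive, unexamined, into a changed situation. The fragment below is purely illustrative (it bears no relation to the actual spacecraft software): a control loop that discards stale commands and reports every decision to the operator instead of acting silently.

```python
import time

# Illustrative sketch: reject stale commands and surface each decision to
# the crew. The validity window and message wording are assumptions.

STALE_AFTER = 60.0  # seconds a command remains valid; an invented figure

def execute_if_fresh(command, issued_at, sensors_valid, now=None):
    """Return a human-readable outcome instead of acting silently."""
    now = time.time() if now is None else now
    if now - issued_at > STALE_AFTER:
        return "REJECTED: command is stale; the operator must re-issue it"
    if not sensors_valid:
        return "HELD: sensor readings inconsistent; command NOT queued for later"
    return f"EXECUTING: {command}"
```

Note that an inconsistent sensor reading causes the command to be dropped with an explanation, never silently queued; the 1988 incident arose precisely because a rejected "fire" order lingered and executed seven minutes later.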
In a recent experiment, 117 college graduates had the task of scheduling production for a fictitious company to meet sales forecasts while minimizing inventory costs. One third of the subjects received costs too high by a factor of 10, another third received costs too high by a factor of 100, and the remaining one third received costs too high by a factor of 1,000. Only 11 out of the 117 detected the errors in the computer printouts and questioned the results. The group with the 1,000-times errors received special warnings about possible computer mistakes. Yet only three individuals questioned the computer output.
Executives should not assume that operating people will catch major computer errors and act as a safeguard against disastrous misuses of information. Defensive shields must be present in all computer applications, including the signaling of every answer that falls out of an expected range.
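Such a defensive shield can be as simple as a range check through which every computed answer must pass before anyone acts on it. The sketch below is a minimal illustration; the range limits and labels are invented.

```python
# Minimal "defensive shield": flag any answer outside its expected range.
# Limits and labels here are invented for illustration.

def shielded(value, low, high, label):
    """Return the value together with a warning when it falls out of range."""
    if not (low <= value <= high):
        return value, f"WARNING: {label} = {value} outside expected [{low}, {high}]"
    return value, None

# A cost inflated 1,000-fold, as in the experiment, trips the shield:
cost, warning = shielded(125_000, low=50, high=5_000, label="unit inventory cost")
if warning:
    print(warning)   # a person must confirm the figure before it is used
```

The point is not sophistication but coverage: the check must be applied to every output, since the experiment shows that people cannot be relied upon to notice even thousandfold errors unaided.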
Full automation of risky decisions is always perilous. Controls must be subject to exhaustive safeguards that cover all damaging acts. There are legal precedents on the basis of which the managers of information systems are liable when a defect harms a person or property.
If your company produces a potentially dangerous computer program, hire outside experts to test the system in order to locate every hazardous risk. This precaution also must apply to all modifications of software or hardware, because that is where most errors slip through quality control. In addition, you should devise a training program for your operators to sensitize them to the possibility of danger and the need for personal intervention when the system malfunctions. Disqualify operators who follow computer instructions mindlessly.
The problem occurred when the company decided to modify Sabre's software to improve the automatic allocation of discount fares. The purpose of the allocation program was to juggle supply and demand on routes to produce a ticket mix that allowed the company to maximize revenues. An undetected flaw in the software enhancement caused errors in the discount fare tables. Travel agents querying the system ended up recommending that passengers seek discounted fares on other airlines.
American Airlines discovered this mistake only after reviewing operating statistics that showed below budget revenues. American Airlines' CEO told stock analysts, "We gave away $50 million of revenue. If we had done more thorough testing we would have discovered the problem before the new software was ever brought on-line."
American Airlines uncovered the loss because of a good financial audit. All systems changes should include an independent check validating that software that passes a technical acceptance test also meets its performance goals. A system test that tests itself is insufficient: it only proves consistency, which may be consistently wrong.
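One way to make the check independent is to compare the production routine against a second, deliberately simple implementation maintained by different people. The sketch below illustrates the principle only; it is not American Airlines' actual method, and the function names and data are invented.

```python
# Sketch of independent validation: two implementations must agree.
# Names and sample data are invented; this is not the Sabre code.

def allocate_fast(capacity, full_fare_booked):
    """Stands in for the enhanced (possibly flawed) allocation code."""
    return max(capacity - full_fare_booked, 0)

def allocate_reference(capacity, full_fare_booked):
    """Slow but obviously correct reference computation."""
    seats = 0
    for seat in range(capacity):
        if seat >= full_fare_booked:       # seats beyond full-fare demand
            seats += 1
    return seats

# The two must agree on every sampled case; a self-test of allocate_fast
# alone would only prove it is consistent with itself.
for cap, booked in [(150, 90), (150, 150), (150, 200), (150, 0)]:
    assert allocate_fast(cap, booked) == allocate_reference(cap, booked)
```

A financial audit plays the same role at a higher level: it is an answer computed by a wholly different method, which is why it caught the $50 million discrepancy that the technical tests missed.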
TPF had a reputation as an exceedingly fast and expensive system built to handle thousands of transactions per second. TPF also required centralizing computer resources on a scale that was unprecedented in the banking industry. Such a concentration did not fit well with the highly decentralized traditions of Bank of America.
Hopper proceeded to install TPF, diverting money from existing systems that were already performing poorly, in the hope that the replacement systems would finally deliver much-needed improvements. The problem was how to justify the massive expenditures for TPF. Bank of America's current transaction volumes did not need that sort of computing power. To gain the expected benefits, the bank would have to reach the ambitious marketing objectives established by its business planners. If the projected volumes did not materialize, TPF was the wrong solution; yet without TPF, the new markets could not materialize on the existing systems.
Plans for a large number of new products, such as banking by phone, banking by computers, and terminals in all client stores, never materialized. Hopper resigned and the CEO of the bank retired. Afterward, the systems investments were redirected to patching up existing malfunctions. The risks of TPF creating organizational, procedural, and technical failures were too great to add to the already precarious financial position of the bank.
The Bank of America case offers an excellent study of information systems strategies that got ahead of the capacity of an organization to commit and execute a business plan. Information systems strategies cannot serve as the vanguard in attempts to reform an otherwise reluctant organization.
A programming oversight that managed to elude three stages of testing triggered the crash. One of the network computers experienced a minor failure. It sent out the usual trouble messages to other computers across the country to divert calls. The faulty computer recovered quickly but without informing other computers that it was back in operation. It then sent out a burst of backlogged calls. That burst overwhelmed the next switching computer, which shut down. The second computer again sent out trouble messages to others, thus magnifying the number of trouble messages circulating in the network. Soon, more than 100 message-routing computers could not function. They kept interrupting each other with distress calls programmed to cope with local failures. The network became filled with priority trouble signals, and only a limited capacity was available to handle customer calls. AT&T management finally cured the problem by sending out a program modification to all of the computers.
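The chain reaction described above can be captured in a toy simulation: a recovering node floods its neighbor with backlogged traffic, the overloaded neighbor shuts down and passes an even larger burst along, and the failure sweeps the whole network. Everything here is an invented caricature of the AT&T event, not its actual protocol; the thresholds are arbitrary.

```python
from collections import deque

# Toy cascade on a ring of switching nodes. A burst above a node's capacity
# shuts it down and sends a still-larger burst to its neighbor. All figures
# are invented for illustration.

def cascade(n_nodes, capacity, initial_burst):
    down = set()
    queue = deque([(0, initial_burst)])      # (node, traffic burst headed at it)
    while queue:
        node, burst = queue.popleft()
        if node in down or burst <= capacity:
            continue
        down.add(node)                       # overload: node shuts down...
        neighbor = (node + 1) % n_nodes      # ...and forwards its backlog
        queue.append((neighbor, burst + capacity))  # backlog grows as it spreads
    return len(down)

print(cascade(n_nodes=100, capacity=10, initial_burst=25))  # → 100: every node fails
```

Even this caricature shows the essential property: a single burst modestly above capacity, at one node, is enough to take down all one hundred, because each failure enlarges the disturbance it passes on.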
The chance of undetectable chain-reaction failures keeps growing as computer networks gear up to faster response times and greater interdependency. The possibility of a spontaneous collapse also grows with decentralized processing in which computers interact with each other in ways that are unpredictable and uncontrollable. As computer networks grow in complexity, organizations will have to incur the costs of dealing with interactions that no testing can anticipate. Electronic interconnections make it possible to propagate a local failure into network-wide chaos before any human can intervene. The financial losses from such a failure can amount to billions of dollars.
All complex systems will fail regardless of what safeguards you use. To prevent the loss of the system's benefits your precautions should reflect to some degree the magnitude of your financial exposures. Millions of dollars can evaporate in a few moments of failure. Therefore, you ought to design all systems not only for when they work, but also for when they fail.
Failed turnkey contracts in the private sector remain hidden from public view, since everyone is interested in forgetting the fiasco as quickly as possible. A U.S. Government agency, however, cannot always hide its aborted systems contracts and therefore offers a rare opportunity to understand how to manage turnkey contracts.
As a result of delays in its information systems overhaul, the U.S. Patent Office decided to use a turnkey vendor who could deliver the system rapidly. The new system would place all of the Patent Office documents "on-line" for search and retrieval. Because of the urgency and uniqueness of the task, the Patent Office obtained a waiver so that it did not have to follow established Federal vendor selection, vendor evaluation, and competitive bidding procedures. It proceeded to select the one vendor who seemed most responsive to its need for rapid product delivery.
Shortly after awarding the contract, the Patent Office discovered that improvements in searching for and retrieving documents did not create budget savings. As a result, it decided to change the scope of the project: the emphasis shifted from clerical efficiency to improving the quality of patent searches by the patent examiners. The contractor then spent 18 months negotiating changes in contract terms and systems specifications. Meanwhile, money continued to flow for software for which there was no agreement, and for which the hardware was not yet available because of a lack of supplemental funding. Four years and $448 million later, it is still not clear what benefits the Patent Office will be getting from its new computer system.
As the Patent Office case demonstrates, the biggest cause of systems failure is an inadequate definition of expected gains and a too hasty rush to buy technology for a speedy startup. A management disagreement about objectives, or a redirection of goals midway through the effort, will destroy all of the benefits one can get from turnkey contracting. Prior to any contract award, the desired end results must be unambiguously spelled out. Upper management must also keep the unavoidable minor modifications to the original systems specifications from accumulating into a different design. When a company hands over systems execution to a project manager, it courts disaster if everyone tries to accommodate changes without thinking about their cumulative consequences.
The MRP would turn over the detailed control of plant schedules to a central computer. The computer would specify work assignments for each shift and for each machine based on a master schedule. Such discipline required that the computer's data base contain comprehensive information about the amount of material and labor required to produce any of thousands of machine parts. The computer would also need perfect data about labor efficiency, labor skills, on-hand materials, delivery schedules of purchased components, inventories, and the status of all work-in-progress. The data base would include standard cost information and wage categories. It would then generate comparisons of actual against planned costs, by operator, by machine, by shift, by department, by product line, and by contract. Saltarelli saw the installation of an MRP system as a crusade for restructuring the business so that he and his staff could understand it in financial terms.
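The kind of variance reporting just described can be sketched in miniature. The keys, names, and figures below are invented; the point is only to show how a missing entry turns into a spurious "deviation," which is what flooded the shop floor when actual events went unreported.

```python
# Illustrative fragment of actual-vs-planned cost reporting by
# (operator, machine, shift). All identifiers and figures are invented.

def variance_report(planned, actual):
    """Return {(operator, machine, shift): actual - planned} for every plan entry."""
    return {key: actual.get(key, 0.0) - plan for key, plan in planned.items()}

planned = {("operator-17", "lathe-3", "day"): 420.0,
           ("operator-17", "lathe-3", "night"): 380.0}
actual  = {("operator-17", "lathe-3", "day"): 515.0}   # night shift never reported

report = variance_report(planned, actual)
# The unreported night shift appears as a -380.0 "deviation" -- exactly the
# kind of meaningless number the system printed when data entry lagged.
```

The fragility is structural: every comparison presupposes complete and timely reporting, so each small gap in the data produces a confidently printed but meaningless variance.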
After a rapid conversion to the MRP, the craftsmen who produced the machine tools developed a deep-seated antagonism to its disrupting effects. Everyone became flooded with paperwork to keep the MRP system informed of what was going on. Otherwise, the MRP would generate materials tickets and production orders that made no sense. If the system did not get data about what actually happened, it would print out reports showing meaningless deviations from planned performance. Every machine breakdown, each delay due to a scrapped part, a machinist's sickness, late material receipts, and reworking of parts for salvage required entry using complex identification codes. Small human errors became immediately magnified as the all-encompassing MRP detected inconsistencies or gaps.
In theory, Saltarelli's MRP concept was sound. The problem lay in its implementation. In a machine-tool manufacturing plant, you produce only a few machines of great complexity. This requires parts produced to extremely tight tolerances. Engineering modifications are frequent. Productivity in such a plant is the result of the flexibility of the individual craftsmen, their sense of workmanship, and their adherence to uncompromising quality. There are just too many variables, such as unpredictable lead times, to presume that there will be any resemblance between the master and the actual schedules.
After the installation of the MRP system, plant costs became astronomical. Production output slowed to a fraction of its previous capacity. Machine shop foremen literally begged management to release them from the MRP. Management never gave up on pushing compliance with the MRP. It simplified its product line to fit better into the MRP just when foreign competition started encroaching on its customers with more complex and less expensive products. Management also initiated payroll cutbacks in manufacturing while increasing nonproductive staffs to cope with the increased paperwork.
What were the results? The company closed down the machine-tool division after a few years of accelerating losses. A new owner ultimately sold off what remained of the parent conglomerate, Houdaille.
Inexperienced and enthusiastic managers, in search of easy solutions, will make conditions worse if they latch on to inappropriate computerization. Applications that deliver superior results under favorable conditions can destroy a company that suffers from chronic mismanagement. The conditions for extracting success from computers originate in management, not in technology.
The most practical approach to institutionalizing an uncompromising search for improved systems quality is through ongoing and post-implementation assessments. Although auditing has a role in this process insofar as you may need independent verification of facts, the purpose of assessments is organizational learning, not the conduct of audits. Guard against witch hunts. You will destroy the educational value of any assessment if you use it to find sacrificial victims. You do not get a post-implementation assessment when the project manager hands over test results to a customer and walks away with a memorandum certifying a completed technical job. Post-implementation assessment takes place a few months, and sometimes years, after systems installation, when the actual economic benefits finally show up.
In the post-implementation review, every participant should have a say in how the job could be improved next time. People responsible for delivering the expected project benefits should have a particularly prominent role in such discussions. There is, however, no gain from spending too much time going over past events. A constructive post-implementation review will focus on immediate corrective actions and not dwell on past circumstances or future eventualities.
The capacity of individuals and organizations to learn is the key to the successful use of information technologies. Executives must ensure that learning from mistakes--the best source of all educational experience--remains uninhibited.
You can alleviate the pain of learning by studying what others have done that you wish not to repeat. Instead of attending vendors' and other conferences about the merits of computers, reserve some time to study the rare report about somebody else's computerized misfortunes. A busy executive can never learn enough to become a computer expert, but he or she can certainly acquire sufficient expertise to know what not to do.