Computer managers at Bank of America did not run the new system and the old system in parallel to correct faults. Instead, the bank's executives switched to the new programs without any backup. The new system and the old one were different enough that mistakes went uncorrected. It turned out that on any given day only a small percentage of the securities trades was improper, but nobody could tell for sure which transactions were good and which were bad. Meanwhile, trading continued at an accelerated pace, accumulating an enormous number of doubtful records. All in all, billions of dollars of securities transactions were unaccounted for. The failure of the software came close to destroying the bank.
Computer system blunders have also caused havoc at the United Education & Software Company. United Education, originally a trade-school operator, began handling repayments for student loans in 1983. It grew rapidly, developing a portfolio of more than $1 billion in loans. Its data-processing service division received management contracts from banks and the State of California to provide accounting and computer billing services. The computer problems apparently stemmed from a switch to a new system in 1987, in which United Education's programmers introduced major software errors and failed to test the system before proceeding with conversion.
Instead of reverting to the old system, managers at United Education spent eight months trying to fix programming mistakes while processing regular business transactions. As a result, delinquency notices went to students who owed nothing, while those who lagged in their payments did not find out about their delinquencies. The computer system rejected payments from overdue borrowers and posted payments intended to repay loan principal as interest. The system also logged fictional telephone calls and failed to account for actual phone calls. All told, United Education's computer failure may end up costing banks $650 million in unaccounted-for loans.
Similar computer foul-ups occur often, but such events usually remain a company secret in order to maintain public confidence.
The experience with automatic teller machines (ATMs) is a useful lesson about the economics of computer systems. When Citibank pioneered the installation of ATMs in New York, for example, it did so to gain a competitive advantage. This advantage disappeared when other banks offered the identical technology and equivalent services. Citibank's hope of lowering its costs by means of ATMs also has not been realized. The bank has recently begun to charge fees to ATM users who do not maintain sufficiently high balances, which suggests that the volume of ATM transactions is insufficient to recover their substantial costs. When everyone offers the identical technology, it ceases to be an advantage for anyone. (Though, of course, not investing in the technology is a certain disadvantage.)
Michael Hammer, a professor at Massachusetts Institute of Technology, says that "Citibank so successfully terrorized its New York City competitors with its advanced ATMs, that, herding together for warmth, they formed the New York City Cash Exchange. This shared network involves the participation of virtually all New York banks, except for Citibank. Such twists and turns make a mockery of attempts to build a competitive advantage from a proprietary technology."
It seems that the harder you try to extract increased profitability through strategic deployment of information technology, the more elusive it becomes. Michael Scott Morton, an M.I.T. professor, says that there is no correlation between the amount spent on information technology and the payback. He and other researchers at the M.I.T. Center for Information Systems Research have looked carefully at the evidence. However they examined the data, they reached the same conclusion: the intensely competitive marketplace drives the innovation, constantly demanding new services because everyone else is offering them.
The idea that the escalation in the use of computers stems from a marketing-driven mania is interesting. To test this proposition, I asked the editors of Inc. magazine to survey the CEOs of the 100 fastest-growing small public companies. Each CEO was asked: "In your opinion, has the use of computers played a significant role in explaining the success of your company since 1984?"
Out of 87 responses, 77 percent replied that computers indeed made a significant contribution to the growth of their businesses. The remainder said that the help they were getting from computerization was immaterial to their success. The CEOs mentioned "overhead cost reduction" most frequently as their principal benefit. "Unique competitive advantage" and "offering innovative new services" were mentioned only rarely. The Inc. survey confirmed that information technology makes sense only when it solves a company's specific problems, such as overhead cost control, production management, or support of customer services. Imitation, especially of another firm's "strategic" uses, wastes money. The next time a business speaker begins with the alleged competitive advantages of American Airlines or Citicorp, disregard most of what you hear. Your firm is not like American Airlines or Citicorp, or even like your competitor. You must fit computers into your particular environment so that they deliver profits in ways that reflect your own circumstances.
At General Foods Inc., I first experienced what I call "procedural instability" when quickly launched marketing promotions of products became an instant cure for budget shortfalls. Unfortunately, the normal inventory system could not keep up with the sudden spurts of product demand. We revised the computerized inventory system so that planners could manually override automatic warehouse replenishment rules with as little as a day's notice about a new marketing promotion.
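The override idea described above can be sketched in a few lines. This is a hypothetical illustration, not General Foods' actual system; the replenishment rule, names, and quantities are all invented.

```python
# Sketch of automatic warehouse replenishment with a planner's manual
# override, as in the General Foods example. The rule itself is invented.

def replenishment_order(on_hand, forecast_demand, safety_stock, override_qty=None):
    """Return the quantity to ship to a warehouse for the next period.

    override_qty, when given, is a planner's manual entry made on short
    notice ahead of a marketing promotion; it supersedes the automatic rule.
    """
    if override_qty is not None:
        return override_qty                  # planner knows about the promotion
    automatic = forecast_demand + safety_stock - on_hand
    return max(automatic, 0)                 # never ship a negative quantity

# Normal week: the automatic rule decides.
print(replenishment_order(on_hand=400, forecast_demand=500, safety_stock=100))  # 200
# Promotion announced with a day's notice: the planner overrides.
print(replenishment_order(400, 500, 100, override_qty=1500))  # 1500
```

The essential design choice is that the override is explicit and temporary: the automatic rule resumes as soon as no manual entry is present.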
At Xerox Corporation, the "procedural instability" manifested itself as frequent tampering with the sales staff's compensation formulas. As price competition became more intense, the sales commission application grew to a few hundred thousand lines of COBOL code. This system became one of the early examples of "structured programming." The systems organization adapted this innovative software technology to accommodate compensation schemes of labyrinthine complexity.
There is no reason to berate computer systems for poor adaptability unless you are looking for an excuse to make irresolute management appear blameless. Computer systems can change overnight if management wants it and is willing to pay for it.
This case shows that major mishaps can precipitate from a combination of errors, none of which would cause much damage in isolation. Forgetting to erase a test program is not a calamity. When combined with an operator's mistake in running the wrong application, however, the result for the Fed was a $38 billion error.
Can careful training, backup computers, and redundant commands prevent the occurrence of combined errors? Is it possible to ensure perfectly faultless results? The experience of a Soviet spacecraft casts doubt on this.
The computer on board Soviet spacecraft must calculate the position of the horizon in order to fire the reentry rockets. During daytime landings the cosmonauts simply look out the window to check its settings. For a 1988 evening landing of a Soviet manned craft, however, the crew had to rely on the computer's calculations. When the commander of the capsule fired the reentry rocket, the computer halted the firing because its program rejected inconsistent readings from the dimly lit horizon.
Seven minutes later, the computer accepted valid readings from the horizon sensors. Unfortunately, the cosmonauts forgot to clear the previous "fire" order and the computer fired the rockets improperly.
After two additional orbits, the landing sequence used a backup computer because the prime computer started executing a program that had remained from a spacecraft docking maneuver three months earlier. The engine burned for six seconds and then was shut down manually when the cosmonauts realized they were flying away from the Earth. The commander re-ignited the engine to reverse the thrust but the backup computer turned it off again because its program called for docking instead of landing. Finally, the reentry began manually, under visual control. The cosmonauts landed with only a few minutes of oxygen to spare.
As this story reveals, "computer errors" are not always the result of willful neglect. They can result from designs so poor that even superbly trained operators do not understand what the computer's messages mean. You should never automate systems with potentially catastrophic consequences without facilities for easy personal detection and manual overriding of errors, so that common sense can prevail.
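One lesson of the spacecraft story is that a pending command should not survive, unexamined, into a changed situation. The fragment below is purely illustrative (it bears no relation to the actual spacecraft software): a control loop that discards stale commands and reports every decision to the operator instead of acting silently.

```python
import time

# Illustrative sketch: reject stale commands and surface each decision to
# the crew. The validity window and message wording are assumptions.

STALE_AFTER = 60.0  # seconds a command remains valid; an invented figure

def execute_if_fresh(command, issued_at, sensors_valid, now=None):
    """Return a human-readable outcome instead of acting silently."""
    now = time.time() if now is None else now
    if now - issued_at > STALE_AFTER:
        return "REJECTED: command is stale; the operator must re-issue it"
    if not sensors_valid:
        return "HELD: sensor readings inconsistent; command NOT queued for later"
    return f"EXECUTING: {command}"
```

Note that an inconsistent sensor reading causes the command to be dropped with an explanation, never silently queued; the 1988 incident arose precisely because a rejected "fire" order lingered and executed seven minutes later.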
In a recent experiment, 117 college graduates had the task of scheduling production for a fictitious company to meet sales forecasts while minimizing inventory costs. One third of the subjects received costs too high by a factor of 10, another third received costs too high by a factor of 100, and the remaining one third received costs too high by a factor of 1,000. Only 11 out of the 117 detected the errors in the computer printouts and questioned the results. The group with the 1,000-times errors received special warnings about possible computer mistakes. Yet only three individuals questioned the computer output.
Executives should not assume that operating people will catch major computer errors and act as a safeguard against disastrous misuses of information. Defensive shields must be present in all computer applications, including the signaling of every answer that falls out of an expected range.
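Such a defensive shield can be as simple as a range check through which every computed answer must pass before anyone acts on it. The sketch below is a minimal illustration; the range limits and labels are invented.

```python
# Minimal "defensive shield": flag any answer outside its expected range.
# Limits and labels here are invented for illustration.

def shielded(value, low, high, label):
    """Return the value together with a warning when it falls out of range."""
    if not (low <= value <= high):
        return value, f"WARNING: {label} = {value} outside expected [{low}, {high}]"
    return value, None

# A cost inflated 1,000-fold, as in the experiment, trips the shield:
cost, warning = shielded(125_000, low=50, high=5_000, label="unit inventory cost")
if warning:
    print(warning)   # a person must confirm the figure before it is used
```

The point is not sophistication but coverage: the check must be applied to every output, since the experiment shows that people cannot be relied upon to notice even thousandfold errors unaided.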
Full automation of risky decisions is always perilous. Controls must be subject to exhaustive safeguards that cover all damaging acts. There are legal precedents on the basis of which the managers of information systems are liable when a defect harms a person or property.
If your company produces a potentially dangerous computer program, hire outside experts to test the system in order to locate every hazardous risk. This precaution also must apply to all modifications of software or hardware, because that is where most errors slip through quality control. In addition, you should devise a training program for your operators to sensitize them to the possibility of danger and the need for personal intervention when the system malfunctions. Disqualify operators who follow computer instructions mindlessly.
The problem occurred when the company decided to modify Sabre's software to improve the automatic allocation of discount fares. The purpose of the allocation program was to juggle supply and demand on routes to produce a ticket mix that allowed the company to maximize revenues. An undetected flaw in the software enhancement caused errors in the discount fare tables. Travel agents querying the system ended up recommending that passengers seek discounted fares on other airlines.
American Airlines discovered this mistake only after reviewing operating statistics that showed below budget revenues. American Airlines' CEO told stock analysts, "We gave away $50 million of revenue. If we had done more thorough testing we would have discovered the problem before the new software was ever brought on-line."
American Airlines uncovered the loss because of a good financial audit. All systems changes should include an independent check validating that software that passes a technical acceptance test also meets its performance goals. A system test that tests itself is insufficient: it only proves consistency, which may be consistently wrong.
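One way to make the check independent is to compare the production routine against a second, deliberately simple implementation maintained by different people. The sketch below illustrates the principle only; it is not American Airlines' actual method, and the function names and data are invented.

```python
# Sketch of independent validation: two implementations must agree.
# Names and sample data are invented; this is not the Sabre code.

def allocate_fast(capacity, full_fare_booked):
    """Stands in for the enhanced (possibly flawed) allocation code."""
    return max(capacity - full_fare_booked, 0)

def allocate_reference(capacity, full_fare_booked):
    """Slow but obviously correct reference computation."""
    seats = 0
    for seat in range(capacity):
        if seat >= full_fare_booked:       # seats beyond full-fare demand
            seats += 1
    return seats

# The two must agree on every sampled case; a self-test of allocate_fast
# alone would only prove it is consistent with itself.
for cap, booked in [(150, 90), (150, 150), (150, 200), (150, 0)]:
    assert allocate_fast(cap, booked) == allocate_reference(cap, booked)
```

A financial audit plays the same role at a higher level: it is an answer computed by a wholly different method, which is why it caught the $50 million discrepancy that the technical tests missed.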
TPF had a reputation as an exceedingly fast and expensive system built to handle thousands of transactions per second. TPF also required centralizing computer resources on a scale that was unprecedented in the banking industry. Such a concentration did not fit well with the highly decentralized traditions of Bank of America.
Hopper proceeded to install TPF, diverting money from existing systems that were already performing poorly, in the hope that the replacement systems would finally deliver much-needed improvements. The problem was how to justify the massive expenditures for TPF. Bank of America's current transaction volumes did not need that sort of computing power. To gain the expected benefits, the bank would have to reach the ambitious marketing objectives established by its business planners. If the projected volumes did not materialize, TPF was the wrong solution; yet without TPF, the new markets could not materialize on the existing systems.
Plans for a large number of new products, such as banking by phone, banking by computers, and terminals in all client stores, never materialized. Hopper resigned and the CEO of the bank retired. Afterward, the systems investments were redirected to patching up existing malfunctions. The risks of TPF creating organizational, procedural, and technical failures were too great to add to the already precarious financial position of the bank.
The Bank of America case offers an excellent study of information systems strategies that got ahead of the capacity of an organization to commit and execute a business plan. Information systems strategies cannot serve as the vanguard in attempts to reform an otherwise reluctant organization.
A programming oversight that managed to elude three stages of testing triggered the crash. One of the network computers experienced a minor failure. It sent out the usual trouble messages to other computers across the country to divert calls. The faulty computer recovered quickly but without informing other computers that it was back in operation. It then sent out a burst of backlogged calls. That burst overwhelmed the next switching computer, which shut down. The second computer again sent out trouble messages to others, thus magnifying the number of trouble messages circulating in the network. Soon, more than 100 message-routing computers could not function. They kept interrupting each other with distress calls programmed to cope with local failures. The network became filled with priority trouble signals, and only a limited capacity was available to handle customer calls. AT&T management finally cured the problem by sending out a program modification to all of the computers.
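The chain reaction described above can be captured in a toy simulation: a recovering node floods its neighbor with backlogged traffic, the overloaded neighbor shuts down and passes an even larger burst along, and the failure sweeps the whole network. Everything here is an invented caricature of the AT&T event, not its actual protocol; the thresholds are arbitrary.

```python
from collections import deque

# Toy cascade on a ring of switching nodes. A burst above a node's capacity
# shuts it down and sends a still-larger burst to its neighbor. All figures
# are invented for illustration.

def cascade(n_nodes, capacity, initial_burst):
    down = set()
    queue = deque([(0, initial_burst)])      # (node, traffic burst headed at it)
    while queue:
        node, burst = queue.popleft()
        if node in down or burst <= capacity:
            continue
        down.add(node)                       # overload: node shuts down...
        neighbor = (node + 1) % n_nodes      # ...and forwards its backlog
        queue.append((neighbor, burst + capacity))  # backlog grows as it spreads
    return len(down)

print(cascade(n_nodes=100, capacity=10, initial_burst=25))  # → 100: every node fails
```

Even this caricature shows the essential property: a single burst modestly above capacity, at one node, is enough to take down all one hundred, because each failure enlarges the disturbance it passes on.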
The chance of undetectable chain-reaction failures keeps growing as computer networks gear up to faster response times and greater interdependency. The possibility of a spontaneous collapse also grows with decentralized processing in which computers interact with each other in ways that are unpredictable and uncontrollable. As computer networks grow in complexity, organizations will have to incur the costs of dealing with interactions that no testing can anticipate. Electronic interconnections make it possible to propagate a local failure into network-wide chaos before any human can intervene. The financial losses from such a failure can amount to billions of dollars.
All complex systems will fail regardless of what safeguards you use. To prevent the loss of the system's benefits your precautions should reflect to some degree the magnitude of your financial exposures. Millions of dollars can evaporate in a few moments of failure. Therefore, you ought to design all systems not only for when they work, but also for when they fail.
Failed turnkey contracts in the private sector remain hidden from public view, since everyone is interested in forgetting the fiasco as quickly as possible. A U.S. Government agency, however, cannot always hide its aborted systems contracts and therefore offers a rare opportunity to understand how to manage turnkey contracts.
As a result of delays in its information systems overhaul, the U.S. Patent Office decided to use a turnkey vendor who could deliver the system rapidly. The new system would place all of the Patent Office documents "on-line" for search and retrieval. Because of the urgency and uniqueness of the task, the Patent Office obtained a waiver so that it did not have to follow established Federal vendor selection, vendor evaluation, and competitive bidding procedures. It proceeded to select the one vendor who seemed most responsive to its need for rapid product delivery.
Shortly after awarding the contract, the Patent Office discovered that improvements in searching for and retrieving documents did not create budget savings. As a result, it decided to change the scope of the project: the emphasis shifted from clerical efficiency to improving the quality of patent searches by the patent examiners. The contractor then spent 18 months negotiating changes in contract terms and systems specifications. Meanwhile, money continued to flow for software for which there was no agreement, and for which the hardware was not yet available because of a lack of supplemental funding. Four years and $448 million later, it is still not clear what benefits the Patent Office will be getting from its new computer system.
As the Patent Office case demonstrates, the biggest cause of systems failure is an inadequate definition of expected gains and a too hasty rush to buy technology for a speedy startup. A management disagreement about objectives, or a redirection of goals midway through the effort, will destroy all of the benefits one can get from turnkey contracting. Prior to any contract award, the desired end results must be unambiguously spelled out. Upper management must also keep the unavoidable minor modifications to the original systems specifications from accumulating into a different design. When a company hands over systems execution to a project manager, it courts disaster if everyone tries to accommodate changes without thinking about their cumulative consequences.
The MRP would turn over the detailed control of plant schedules to a central computer. The computer would specify work assignments for each shift and for each machine based on a master schedule. Such discipline required that the computer's data base contain comprehensive information about the amount of material and labor required to produce any of thousands of machine parts. The computer would also need perfect data about labor efficiency, labor skills, on-hand materials, delivery schedules of purchased components, inventories, and the status of all work-in-progress. The data base would include standard cost information and wage categories. It would then generate comparisons of actual against planned costs, by operator, by machine, by shift, by department, by product line, and by contract. Saltarelli saw the installation of an MRP system as a crusade for restructuring the business so that he and his staff could understand it in financial terms.
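The kind of variance reporting just described can be sketched in miniature. The keys, names, and figures below are invented; the point is only to show how a missing entry turns into a spurious "deviation," which is what flooded the shop floor when actual events went unreported.

```python
# Illustrative fragment of actual-vs-planned cost reporting by
# (operator, machine, shift). All identifiers and figures are invented.

def variance_report(planned, actual):
    """Return {(operator, machine, shift): actual - planned} for every plan entry."""
    return {key: actual.get(key, 0.0) - plan for key, plan in planned.items()}

planned = {("operator-17", "lathe-3", "day"): 420.0,
           ("operator-17", "lathe-3", "night"): 380.0}
actual  = {("operator-17", "lathe-3", "day"): 515.0}   # night shift never reported

report = variance_report(planned, actual)
# The unreported night shift appears as a -380.0 "deviation" -- exactly the
# kind of meaningless number the system printed when data entry lagged.
```

The fragility is structural: every comparison presupposes complete and timely reporting, so each small gap in the data produces a confidently printed but meaningless variance.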
After a rapid conversion to the MRP, the craftsmen who produced the machine tools developed a deep-seated antagonism to its disrupting effects. Everyone became flooded with paperwork to keep the MRP system informed of what was going on. Otherwise, the MRP would generate materials tickets and production orders that made no sense. If the system did not get data about what actually happened, it would print out reports showing meaningless deviations from planned performance. Every machine breakdown, each delay due to a scrapped part, a machinist's sickness, late material receipts, and reworking of parts for salvage required entry using complex identification codes. Small human errors became immediately magnified as the all-encompassing MRP detected inconsistencies or gaps.
In theory, Saltarelli's MRP concept was sound. The problem lay in its implementation. In a machine-tool manufacturing plant, you produce only a few machines of great complexity. This requires parts produced to extremely tight tolerances. Engineering modifications are frequent. Productivity in such a plant is the result of the flexibility of the individual craftsmen, their sense of workmanship, and their adherence to uncompromising quality. There are just too many variables, such as unpredictable lead times, to presume that there will be any resemblance between the master and the actual schedules.
After the installation of the MRP system, plant costs became astronomical. Production output slowed to a fraction of its previous capacity. Machine shop foremen literally begged management to release them from the MRP. Management never gave up on pushing compliance with the MRP. It simplified its product line to fit better into the MRP just when foreign competition started encroaching on its customers with more complex and less expensive products. Management also initiated payroll cutbacks in manufacturing while increasing nonproductive staffs to cope with the increased paperwork.
What were the results? The company closed down the machine-tool division after a few years of accelerating losses. A new owner ultimately sold off what remained of the parent conglomerate, Houdaille.
Inexperienced and enthusiastic managers, in search of easy solutions, will make conditions worse if they latch on to inappropriate computerization. Applications that deliver superior results under favorable conditions can destroy a company that suffers from chronic mismanagement. The conditions for extracting success from computers originate in management, not in technology.
The most practical approach to institutionalizing an uncompromising search for improved systems quality is through ongoing and post-implementation assessments. Although auditing has a role in this process insofar as you may need independent verification of facts, the purpose of assessments is organizational learning, not the conduct of audits. Guard against witch hunts. You will destroy the educational value of any assessment if you use it to find sacrificial victims. You do not get a post-implementation assessment when the project manager hands over test results to a customer and walks away with a memorandum certifying a completed technical job. Post-implementation assessment takes place a few months, and sometimes years, after systems installation, when the actual economic benefits finally show up.
In the post-implementation review, every participant should have a say in how the job could be improved next time. People responsible for delivering the expected project benefits should have a particularly prominent role in such discussions. There is, however, no gain from spending too much time going over past events. A constructive post-implementation review will focus on immediate corrective actions and not dwell on past circumstances or future eventualities.
The capacity of individuals and organizations to learn is the key to the successful use of information technologies. Executives must ensure that learning from mistakes--the best source of all educational experience--remains uninhibited.
You can alleviate the pain of learning by studying what others have done that you wish not to repeat. Instead of attending vendors' and other conferences about the merits of computers, reserve some time to study the rare report about somebody else's computerized misfortunes. A busy executive can never learn enough to become a computer expert, but he or she can certainly acquire sufficient expertise to know what not to do.