The Mythical Man Month
The Mythical Man-Month was written by Frederick P. Brooks Jr. and deals with various aspects of software engineering problems. The book is considered a masterpiece and features on various suggested reading lists, including that of Joel Spolsky. After taking an elective course on "Managing Software Projects and Enterprises" during the 4th term of my PGP curriculum at IIM Ahmedabad, I decided to read the book and summarize the essence of its text. The following is a chapter-wise summary of the book. The work was first published in 1975 and republished as an anniversary edition in 1995 with the essay "No Silver Bullet" and commentary by the author.
1. THE TAR PIT
An interesting concept of how debugged programs can be turned into useful commercial entities. A 'program', here, means a debugged piece of code which performs some specific task. A 'programming product' is a program that can be run, tested, repaired and extended by anybody. It is usable in many operating environments and for many sets of data. A 'programming system' can be thought of as analogous to a development platform. It comprises debugged components and established interfaces which can be used to build larger programs.
A. Program | B. Programming System |
C. Programming Product | D. Programming Systems Product |
Brooks says that it is 3 times more difficult to cross a horizontal or vertical boundary in this grid. Hence, moving from A to B or to C requires 3 times the effort of building A. So, moving from A to D, which crosses both boundaries, would require 9 times the effort!
The chapter emphasizes that programming is a craft, and that the pleasures associated with it have several sources, such as cognitive freedom and the thrill of creation. The same sources give rise to the woes of the craft as well: jumbled thoughts, no physical system on which to test complications, the burden of taking the entire responsibility for failed ideas on oneself, etc.
2. THE MYTHICAL MAN MONTH
Schedule estimates go awry because they are made on the optimistic premise that everything will go right. A strong belief in our own competency, and the fact that a programming idea is purely internal, unaffected by the physical limitations of equipment or material resources, cloud the reality.
Brooks's law is stated as: "Adding manpower to a late software project makes it later". The author argues that using the 'man-month' as a unit for measuring effort is a dangerous practice, because it implies that men and months are interchangeable. The reasons behind this are:
1. Increasing number of people increases communication overheads and training time.
2. Tasks are sequential and adding people doesn't make them parallel.
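To get a feel for the first reason, here is a minimal sketch that counts pairwise communication paths. It assumes the worst case the book describes, where every person has to coordinate with every other person, giving n(n-1)/2 paths. The function name is just an illustration.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs among team_size people: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Doubling the team roughly quadruples the number of paths to keep alive.
for n in (2, 5, 10, 20):
    print(n, "people ->", communication_channels(n), "channels")
```

Going from 10 to 20 people raises the count from 45 to 190, which is why the added communication cost can swallow the extra hands.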
The rule of thumb for scheduling a software task is given as:
1/3 planning
1/6 coding
1/4 component test and early systems test
1/4 complete system test
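As a quick illustration, the fractions above can be applied to a total schedule. This is only a sketch: the 24-week total and the function name are made-up examples, not figures from the book.

```python
from fractions import Fraction

# Shares of the total schedule, as listed in the rule of thumb above.
SCHEDULE_RULE = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "component test and early system test": Fraction(1, 4),
    "complete system test": Fraction(1, 4),
}

# The four shares account for the whole schedule.
assert sum(SCHEDULE_RULE.values()) == 1


def split_schedule(total_weeks):
    """Return the number of weeks each phase gets out of total_weeks."""
    return {phase: float(share * total_weeks)
            for phase, share in SCHEDULE_RULE.items()}


for phase, weeks in split_schedule(24).items():
    print(f"{phase}: {weeks} weeks")
```

Note that coding gets only a sixth of the time, while the two testing phases together get half.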
3. THE SURGICAL TEAM
The problem addressed in this chapter is that competencies among peers are not equal. Sometimes the difference in productivity can be as large as a factor of 10! Ideally, one would like 'a small, sharp team'. A new organization structure based on a surgical team paradigm is proposed:
1. The Surgeon - The chief programmer. He personally defines the functional and performance specifications and designs the program.
2. The Copilot - Alter ego of the surgeon, he knows all the code intimately. He researches alternative design strategies and points out flaws in the Surgeon's approach. However, the Surgeon is not bound to accept his recommendations.
3. The Administrator - He handles money, people and resources, and interfaces with the rest of the administration of the organization. The Surgeon is the boss, the Administrator is the manager.
4. The Editor - Though the Surgeon generates the documentation to clarify his own thought process, the Editor criticizes it and improves it by adding references and maintaining versions.
5. Two Secretaries - One each for the Administrator and the Editor to assist in paper work.
6. The Program Clerk - He maintains all the technical records, maintains libraries and keeps the code repository in shape.
7. The Toolsmith - Responsible for providing, maintaining and upgrading programmer tools like file editing, text editing, IDEs, makefiles, debuggers, build environment etc.
8. The Tester - He writes the test cases, carries out all phases of testing and acts as a friendly adversary to the Surgeon.
9. The Language Lawyer - He has mastered the intricacies of the programming language in use and finds neat and efficient ways of using it to do difficult or tricky things. He acts as a consultant to the Surgeon and can serve more than one team.
The surgical team works because the problem is not divided among equal partners and a superior-subordinate relationship is maintained, so the team acts with one mind. However, it can't work as it is for large projects. Moreover, the issue of suppressing the creativity of members other than the Surgeon comes into play.
4. Aristocracy, Democracy and System Design
A system design should reflect conceptual integrity rather than a patchwork of good but unrelated features. Also, there should be a fine balance between the functionality and the conceptual simplicity of a programming system. An excess of either can result in loss of purpose. Conceptual integrity demands that the design be the product of a single mind or a very small number of like-minded people. However, large projects need many people for implementation. This paradox can be resolved in two ways: by separating architecture from implementation and by following the proposed Surgical Team organization. The issue of hindered creativity of developers can be resolved by recognizing that even implementation within the bounds of a defined architecture requires a lot of creativity and decision making.
5. The Second System Effect
Some important tips are given to keep the architect from getting overenthusiastic while designing the system.
1. Remember that the programmers have the creative freedom on their implementation. So don't start dictating on that front.
2. Always be prepared to suggest one way of implementing things. However, accept the developer's way whenever he differs.
3. Deal quietly and privately in such suggestions.
4. Be ready to forgo any credit for improvements.
When an architect develops a system for the first time, he will be extra cautious and careful. However, some problem or other will always creep in as the exercise proceeds. The third and later systems he designs will be informed by his earlier experiences and by the general patterns he has observed during the development cycle. However, the second system is the most dangerous, as the architect will always have a tendency to over-design it. The architect should avoid functional ornamentation and stretching functions far beyond their required purpose.
6. Passing The Word
Dissemination of information regarding design and implementation details is critical to the success of a project. Written specifications should capture all the intents of the user, even if this process has to go through several iterations to achieve a meeting of minds. The style must be precise, complete and accurately detailed. However, the specification document should not venture into the implementation aspects of the system.
Generally, when there is a confrontation between a manual and the actual implementation, the manual loses. However, this can't be afforded when a programming language or a platform is being built.
7. Why Did The Tower of Babel Fail?
The Tower of Babel is described as the second major human engineering project, and the first to fail. The first one, Noah's Ark, was a resounding success. The failure came because the groups involved could not understand one another once they spoke different languages. A software project is doomed to meet the same fate if the different stakeholders are not on the same communication plane. Teams need to communicate among themselves in as many ways as possible: informally, through meetings and by maintaining workbooks.
The maintenance of the workbook is explained in detail. For large projects, workbooks tend to grow to prohibitive sizes. In such circumstances, maintaining a change summary on top of major revisions (with the help of the capable text editors already available) should be practiced.
As the organization grows, the number of communication interfaces invariably explodes. In this respect, division of labor and specialization of function assume paramount importance. To illustrate the point, a relevant excerpt from the book "The Man Who Sold the Moon" is presented.
8. Calling The Shot
The chapter describes the inherent fallacy in extrapolating short-run project productivity and phase-wise time distribution data (i.e. the fraction of total time used for design, coding, testing etc.) to estimate the effort for large projects. It is shown that effort does not grow linearly with program size but as a power of it. The exponent estimated from sample data is 1.5. Data from various studies are given to highlight this aspect.
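To see what an exponent of 1.5 means in practice, here is a small sketch that compares the effort for two program sizes. It assumes effort is proportional to size raised to the exponent and leaves out the constant of proportionality, so only ratios are meaningful. The function name is illustrative.

```python
def relative_effort(size_ratio: float, exponent: float = 1.5) -> float:
    """How many times more effort a program needs if it is size_ratio times larger.

    Assumes effort is proportional to size ** exponent.
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return size_ratio ** exponent


for ratio in (2, 4, 10):
    print(f"{ratio}x the size -> about {relative_effort(ratio):.1f}x the effort")
```

Under this assumption, a program four times as large needs about eight times the effort, not four, which is why a straight-line extrapolation from a small project underestimates a large one.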
9. Ten Pounds in a Five-Pound Stack
Program size control and memory footprint reduction are the main theme of this chapter. The information is presented from the viewpoint of systems programming, and the emphasis is on not exhausting the already limited system memory with large programs. No specific technique is dealt with in detail. Mention is made of algorithm complexity, space vs. time trade-offs, space vs. functionality trade-offs, the overlay technique to reuse memory etc.
10. The Documentary Hypothesis
The hypothesis, copied verbatim from the book, is as follows:
"Amid a wash of paper, a small number of documents become the critical pivots around which every project's management revolves. These are the manager's chief personal tools."
The critical documents are - Objectives, Specifications, Schedule, Budget, Organization chart, Space Allocations and Cost Estimates. An illustration of the idea is made by describing the documents required for managing a University Department.
11. Plan The System for Change
The ideas presented in this chapter are quite radical. The journey from prototype to final customer deliverable product cannot be one giant leap. There is bound to be an intermediate state of system development when unforeseen problems accumulated while scaling from prototype to deliverable product make further development tedious and patchy. The system has to be redesigned from scratch keeping in mind the learning from such an exercise. Moreover, customer requests and expectations will also change as the system develops and they play with it. In such a scenario, it will be foolhardy to be headstrong about "doing it right the first time". A throw-away system should be part of the schedule planning itself and not come as an afterthought or compulsion.
Also, the organization structure should be such as to allow the architect and implementers the flexibility to have one throw-away system. The architect will tend to be defensive about his design and reluctant to commit it to writing, since a documented design is open to criticism later. Managers and technical people should be as interchangeable as their talents allow in order to achieve such flexibility.
The bug-fixing process is described as 'two steps forward, one step backward'. All repairs tend to destabilize the overall architecture of the system. This happens because bug fixing is generally done by junior programmers who don't have a complete overview of the system, so nobody knows what effects the removal of a local defect will have in some other part of the system. To quote from C. S. Lewis:
"That is the key to history. Terrific energy is expended - civilizations are built up - excellent institutions devised; but each time something goes wrong. Some fatal flaw always brings the selfish and cruel people to the top, and then it all slides back into misery and ruin. In fact, the machine conks. It seems to start up all right and runs a few yards, and then it breaks down."
12. Sharp Tools
Tools are essential to the system development process. However, each member of the team has personal preferences for a particular tool among the various available options. Also, people tend to guard their tools as they are a reflection of hard-earned personal skills. This approach is destructive for a programming team. Hence, as discussed in the Surgical Team, the Toolsmith is responsible for acquiring, distributing, maintaining and upgrading the tools. He also makes specific tools for the chief programmer as the need arises. The various tools which a manager must plan for are:
1. Computer facility - Target machines (for which the system is being developed) and vehicle machines (on which the system is developed)
2. Operating system
3. Programming language
4. Debugging aids
Note: The technical details in this chapter are quite dated. C is not even mentioned as a system programming language. RAD tools are too far beyond the horizon.
13. The Whole and The Parts
Testing philosophy and details are the focus of this chapter. Careful function definition and testing of the specification itself is the first safeguard against potential bugs. A top-down approach to specification development is proposed. One should sketch a rough task definition and a rough solution method that achieves the principal result. Then one examines the definition more closely to see how the result differs from what is wanted, and one takes the large steps of the solution and breaks them down into smaller steps. Each refinement in the definition of the task becomes a refinement in the algorithm for the solution, and each may be accompanied by a refinement in data representation.
The details of debugging techniques are dated. Structured programming and discouragement of GOTO usage is heralded as the way ahead. Debugging techniques like On-machine debugging, Memory dumps, Snapshots and Interactive debugging are discussed.
It is suggested that for system testing, one should use only debugged components. One should not use the bolt-it-together-and-try approach, as this mixes system bugs with component bugs. Whenever a component is added to the system or replaced with an improved one, the entire system test should be redone. Plenty of test support scaffolding should be built. Generating automated test code and test data may require as much as 50% of the effort needed to write the actual code. But this infrastructure is necessary for speedy testing and delivering a robust product. Also, changes made during bug fixing should be well controlled. All changes should be documented and should remain clearly identifiable until they have been tested and become an integral part of the system.
14. Hatching a Catastrophe
"How does a project get to be a year late? ... One day at a time." Thus begins the chapter on managing schedule slippage. It is argued that big catastrophes or failures in design receive immediate attention and hence get sorted out as quickly as possible. However, small slippages caused by sick leave, an executive meeting, an unplanned customer visit, a system outage etc. go unnoticed, and people take the optimistic attitude that they can be made up.
It is suggested to have concrete, definable "milestones" to keep track of project progress. Milestones should be unambiguous. Fuzzy milestones will result in slippages being overlooked by both the line managers and the project manager.
An interesting result is shown from two studies conducted on estimating behavior of government contractors on large scale projects:
1. Estimates of the length of an activity, made and revised carefully every two weeks before the activity starts, do not significantly change as the start time draws near, no matter how wrong they ultimately turn out to be.
2. During the activity, overestimates of duration come steadily down as the activity proceeds.
3. Underestimates do not change significantly during the activity until about three weeks before the scheduled completion.
There is an inherent conflict of interest between line managers and the project manager in dealing with slippages. A line manager would not want to reveal small problems to the project manager and would rather try to deal with them himself. The project manager, though, needs to know the true picture in order to plan for slippages. The situation is aggravated when the project manager converts a 'status review' meeting into a 'problem solution' exercise. The project manager should accept status reports without panic or preemption and let the line manager deal with the situation.
PERT techniques go a long way in providing the exact picture of effects of schedule slippages in the components. Because PERT is an elaborate technique, at least a critical path analysis should be done. A 'Plans and Controls' team is suggested to manage the tasks of gathering information and carrying out the PERT analysis. The team should be diplomatic and should devise inventive ways of unobtrusive but effective control methods.
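A critical path analysis can be sketched in a few lines. The version below is a minimal illustration, not the PERT procedure from the book: it takes task durations and dependencies and finds the longest chain of dependent tasks, which determines the earliest possible finish. The task names and durations are made up, and it relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_path(durations, predecessors):
    """Return (total_duration, path) for the longest chain of dependent tasks."""
    finish = {}     # earliest finish time of each task
    came_from = {}  # the predecessor that determines that finish time
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(task, ())
        latest = max(preds, key=lambda p: finish[p], default=None)
        start = finish[latest] if latest is not None else 0
        finish[task] = start + durations[task]
        came_from[task] = latest
    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = came_from[node]
    return total, path[::-1]


# Hypothetical project: durations in days, and what each task waits for.
durations = {"design": 3, "code": 5, "write_tests": 2, "integrate": 4}
predecessors = {
    "code": ["design"],
    "write_tests": ["design"],
    "integrate": ["code", "write_tests"],
}

total, path = critical_path(durations, predecessors)
print(total, path)
```

In this example the critical path is design, code, integrate, for 12 days. A slip in write_tests of up to 3 days does not move the end date, while any slip in a task on the critical path does, which is exactly the information a project manager needs from status reports.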
15. The Other Face
A program has two faces - one which is seen by the machine and another which is seen by the human user. The other face should be intelligible to a user of the program. Note that 'the user' here may be different from the programmer who wrote the program code. Thus, documenting a program is essential. However, documentation is more preached than practiced. A company should not establish elaborate documentation policies, because experience shows that earlier generations of programmers were not able to follow the practice as desired, and neither will the current generation.
There may be 3 types of program users - a casual user of program (load and execute), one who depends upon the program for functionality (use library or component) and one who must adapt the program (feature enhancement or future versions).
A simple 3-4 page prose documentation suffices for the needs of a casual user. The following information should be captured - purpose of the program, system requirements, valid domain of input and range of output, any standard algorithms used, input-output format, command line syntax and abort sequence, expected running time for a problem of specified size on a specified machine configuration and means to perform error checking of results.
A component user requires a suite of test cases to validate the library after it has been integrated in his own code. The test cases needed are:
o Functionality test for commonly encountered data
o Edge cases on the input data domain
o Barely illegitimate cases from the other side of input domain boundary to test for proper exception raising and handling.
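The three kinds of test cases can be sketched for a tiny library routine. Everything here is hypothetical: `percent_to_fraction` stands in for whatever component is being integrated, and its valid input domain is assumed to be 0 to 100 inclusive.

```python
def percent_to_fraction(percent: int) -> float:
    """Hypothetical library routine: valid input domain is 0..100 inclusive."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    return percent / 100


def raises_value_error(value: int) -> bool:
    """True if the routine rejects the given input with ValueError."""
    try:
        percent_to_fraction(value)
    except ValueError:
        return True
    return False


def run_acceptance_tests() -> None:
    # 1. Functionality test for commonly encountered data.
    assert percent_to_fraction(50) == 0.5
    # 2. Edge cases on the boundary of the input domain.
    assert percent_to_fraction(0) == 0.0
    assert percent_to_fraction(100) == 1.0
    # 3. Barely illegitimate cases just outside the boundary.
    assert raises_value_error(-1)
    assert raises_value_error(101)


run_acceptance_tests()
print("all acceptance tests passed")
```

Shipping a small suite like this with the component lets its user confirm, after integration, that it still behaves as documented.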
For future code modifiers, the source code itself should serve as documentation. Variable names, initialization labels, code block indentation, function names etc. should all help in telling the story in their own right. Comments in paragraph form should be inserted to explain the logic flow. One should not get into the complication of drawing elaborate multi-page flowcharts. A flowchart has utility only if it can be compressed into a single page to show the top-level program organization. In theory, the flowchart should drive the code. But in practice, the flowchart is generated after the code is written, to comply with company policies. Why create rules, and harp on them, when they have traditionally never been followed?
16. No Silver Bullet - Essence and Accident in Software Engineering
This chapter was added in the 1995 anniversary edition of the book. "No Silver Bullet - Essence and Accidents of Software Engineering" is a well-known paper on software engineering written by Brooks in 1986. Brooks argues that no single technology or practice will serve as a 'silver bullet' (legend says that only silver bullets can be used to kill werewolves) and deliver a tenfold improvement in programmer productivity within a decade. The entire text can be found here.
17. "No Silver Bullet" Refired
Several articles were written by Brooks's contemporaries, some rebutting and some supporting the arguments in his No Silver Bullet paper. This chapter is meant as an answer and explanation to these responses. Again, a brief summary of the arguments can be found here.
18. Propositions of The Mythical Man-Month: true or false?
This chapter presents a point-wise summary of ideas presented in the 15 chapters of the initial edition of the book. It may be a useful section to preserve as a handy reference-guide.
19. The Mythical Man-Month after 20 Years
Brooks goes on to expound on the relevance of his theories and ideas in the context of 1995, 20 years after the first edition was published. While some of the core principles have grown stronger with time, some implementation ideas have either been proven wrong or become obsolete with the microcomputer revolution and advances in programming tools and languages.
The central argument for conceptual integrity and the separation of architect from implementers appears more convincing than ever. Enterprise-level solutions pose the challenge of implementing this functional separation efficiently, yet divide-and-conquer and recursive techniques, such as maintaining architects for major function areas under a principal architect, can achieve the proposed benefits.
Shrinkwrapped software products give rise to 'Featuritis', the besetting temptation for the architect to overload the product with features of marginal utility. Also, the larger and more amorphous the user set, the more necessary it is to define it explicitly if one is to achieve conceptual integrity. A system architect should find out (or guess, if data is not available) the relative frequency of different attributes of the user set. Such an exercise is immensely helpful in defining the scope and complexity of the product.
The idea of planning for a throw-away system (Chapter 11) seems to contradict the caution given against the second system effect (Chapter 5). However, Brooks explains that the confusion is linguistic. The second system in Chapter 11 is the second try at building what should be the first system fielded.
The success of the WIMP interface (Windows, Icons, Menus and Pointing devices) is cited as a triumphant example of conceptual integrity and of a successful metaphor conforming to the mental model of arranging a desktop workspace. (WIMP is a term used for the GUI, whose ideas were first demonstrated by Doug Engelbart in his historic 1968 demo and later developed at Xerox Palo Alto Research Center. The story of Steve Jobs "coaxing" Xerox management to showcase the technology to him, its porting to 'Lisa', its triumph on the Macintosh and its 'influence' on Windows is an interesting sidetrack to follow.) A substantial critique of various aspects of WIMP is made in about 4 pages.
Attention is next drawn to the Waterfall Model of Software Engineering. This sequential method of developing software is acknowledged as implicitly assumed in suggestions such as 'build one to throw away' and the proportioning of time between design, development and testing. Brooks admits that the suggestions derived this way suffer from the basic fallacy of the waterfall model: it assumes that a project goes through the process once, that the architecture is excellent and easy to use, that the implementation design is sound and that the realization is fixable as testing proceeds.
An incremental-build model is suggested as better for progressive refinement. An end-to-end skeleton is first built according to the design decisions, but with blank function bodies. The functions are incrementally fleshed out and system testing is performed at each significant increment. This ensures that there is a debugged, tested system at every stage. Microsoft's daily-build process, incremental build and rapid prototyping are explained in this context. Further, Object Oriented programming and its properties, like information hiding, data abstraction and inheritance, are shown to give a tremendous advantage in following an incremental-build as well as a code reuse approach.
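A minimal sketch of what such an end-to-end skeleton might look like is given below. The order-pricing example and all function names are invented for illustration. Two of the three functions are still stubs, yet the whole pipeline already runs and can be system-tested.

```python
def read_orders(source=None):
    """Stub: returns a fixed sample until real input handling is written."""
    return [{"item": "widget", "quantity": 2, "unit_price": 5.0}]


def price_orders(orders):
    """Fleshed out in this increment: total value of all orders."""
    return sum(order["quantity"] * order["unit_price"] for order in orders)


def format_report(total):
    """Stub: minimal output until the real report layout is designed."""
    return f"total={total:.2f}"


def main(source=None):
    # The end-to-end path exists from day one; only the bodies change later.
    return format_report(price_orders(read_orders(source)))


print(main())
```

Each later increment replaces one stub with a real body, and the same end-to-end run serves as the system test after every change.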
Empirical, independent studies are then cited to show the validity of the Mythical Man-Month and Brooks's Law. Brooks's Law is modified on the basis of data to show that it is possible to finish a slipping project on time by adding manpower, but only if this is done before the middle of the project timeline. Any later additions will result in project delay. There will be a drop in efficiency immediately after manpower is added. However, given enough time (i.e. for new developers to become familiar with the program and for knowledge dissemination to take place), efficiency can come back on track.
'Peopleware' is given more credence now. The real assets of the project are its people. A heavy price has to be paid for moving projects between different teams or loss of knowledge due to attrition. It is also suggested that power of decision making should be transferred deep down the organization hierarchy as much as can be warranted. Example of Microsoft product development teams is given to highlight the productivity gained by such a management philosophy.
The last 10 pages of the book are devoted to the phenomenon of the explosive growth of the microcomputer and the allied industry of shrinkwrapped software. The technological improvements in hardware have overcome various accidental restrictions mentioned in earlier chapters ('accidental' is the term used by Brooks to refer to the implementation aspect of software development). The development platform choices have coalesced into a few operating systems. One can now buy software off-the-shelf rather than incurring expenses for in-house or outsourced development. Brooks makes the observation that organizations have learnt to modify their business processes to fit these available software products rather than putting effort into building software highly customized to their original processes. In the areas where this does not change the very basic nature of the organization's business model, there is a cost saving to be made by this paradigm shift. Also, accidental complexities are proposed to be only 1/10th of the overall complexity in building a software product. Hardware and development tool improvements only attack accidental complexities, and the order-of-magnitude efficiency gains obtained this way can at most eliminate 1/10th of the overall complexity. The remaining 9/10th of the complexity is due to essence, i.e. the design conceptualization of the software product. Earlier nothing significant was available to attack this complexity. But the emergence of shrinkwrapped software and the expected emergence of a components market (off-the-shelf C++ classes) have given rise to the build-on-package phenomenon, which truly attacks the essence complexity.
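The 1/10th argument can be checked with some simple arithmetic. The sketch below is an illustration in the style of Amdahl's law, not a formula from the book: it assumes total effort splits into an essential share and an accidental share, and that better tools speed up only the accidental share.

```python
def overall_speedup(accidental_share: float, accidental_speedup: float) -> float:
    """Overall gain when only the accidental share of the work gets faster.

    accidental_share: fraction of total effort that is accidental (0..1).
    accidental_speedup: factor by which tools speed up that fraction.
    """
    if not 0.0 <= accidental_share <= 1.0:
        raise ValueError("accidental_share must be between 0 and 1")
    if accidental_speedup <= 0:
        raise ValueError("accidental_speedup must be positive")
    essential_share = 1.0 - accidental_share
    return 1.0 / (essential_share + accidental_share / accidental_speedup)


# With accidental work at 1/10th of the total, even huge tool gains barely help.
for factor in (10, 100, 1_000_000):
    print(f"tools {factor}x faster -> overall {overall_speedup(0.1, factor):.3f}x")
```

Under these assumptions the overall gain never exceeds 1/0.9, roughly 1.11x, however good the tools become, so an order-of-magnitude improvement has to come from attacking the essence.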
Lastly, it is hoped that software engineering will truly attain the properties of an engineering discipline once distinctive processes that give exact results have been identified for building software.