The moral, as it were, of the McNamara fallacy is that measurement schemes tend to become enshrined and inflexible in their organisations. This means that any shortcomings in their validity will also tend to be long-lasting. In this section, the four steps11 of the McNamara are considered and mechanisms proposed.
The ‘easily measured’ part of the Yankelovich formula plays out in numerous ways. ‘Easy’, from a purely technical point of view, might mean a preference for data readily at hand, or easy to collect or process into a suitable format. As Yankelovich/Smith say, “This is okay as far as it goes” (ibid), however, how far that is may be a function of the design paradox (Frei, 2015). The paradox is that one knows less at the start of a project or design process than at the end. Therefore, if a measurement scheme is set-up early in a design or project lifecycle, unless revised, it will lack validity.
In terms of foresight, if we take Stark’s (ibid) point of view, erroneous measurement will create in actors a faulty perception of how the organisation works, and this will be reflected in their view of future possibilities.
For example, in 2014, SNCF discovered that 1,000 of its railway platforms were too narrow to allow adequate clearance for 2,000 new trains it had ordered. A quirk of the sampling methodology meant that measurement of existing platforms focussed on those built in the previous 30 years. However, older stations were built when trains were slimmer and, in consequence, with wider platforms. According to the French minister for transport, M. Frederic Cuvillier “When you separate the rail operator from the train company," he said, "this is what happens." A programme of platform modifications was put in place, an unplanned expenditure exceeding €50 million. (BBC, 2014)
M. Cuvillier’s explanation points to the involvement of stakeholders, another issue implied by the McNamara fallacy. At the start of design projects (and this may also apply to projects in ageing systems, according to Horrocks, 2010) the decision-makers involved in defining measurements might not be the same as those working elsewhere, in mature parts of the system, or who inherit the design when it comes into service. If ease does indeed define measurement, then what is easy for the project leaders may create measurement systems focussed on their short term goals or sub-systems rather than the longer-term operational functioning of the whole mature system.
In response to numerous operational problems in the running of public services by contractors, the UK Committee of Public Accounts (CPA) looked into how the civil service managed the contracts. They found there to be a disproportionate emphasis on measuring performance in the early phases of government contracts. Furthermore, not only was there a relative lack of operational performance measurement, there was also a Principal-Agent problem. The civil servants responsible for these contracts usually moved jobs before delivery of the contracted service or product. Therefore, the moral- hazard was for civil servants to set-up measurement of aspects they could control well, but not those that would create difficulties for them. In consequence, CPA recommended that civil servants should “remain accountable for spending throughout the life of contracts”. (CPA 2014).
The CPA findings suggest a number of problems for foresight. Firstly, a focus on measurements relevant in the short-term may displace timely consideration of factors that will be relevant in the longer term. Secondly, variables that are not measured provide degrees of freedom for decisions. However, if these unmeasured variables turn out to be important later in the lifecycle, decisions taken early in a project may unwittingly trade-off measured variables against those that are unmeasured. Thirdly,
moral-hazard may reinforce self-serving visions of the future that make it harder to foresee problems.
The second step of the McNamara fallacy is to “to disregard that which can't be measured or give it an arbitrary quantitative value”. The phrasing suggests that this is seen as a deliberate decision by measurement designers. However, the McNamara fallacy may rely on mechanisms that operate below the level of awareness.
The first mechanism—skilled unawareness—is fundamental. As described by Argyris (1999) skilled unawareness screens from consciousness the things the actor is doing to avoid confrontation or protect their own beliefs from dissonance. Some of those things involve subordinating their own views and beliefs, which Argyris calls skilled incompetence. Argyris has demonstrated that these defensive skilled routines are common in organisations. This would predict that actors will design measurements that make confrontation less likely or avoid dissonance, and for the same reasons and using the same psychological mechanisms, self-censor criticism of flawed measurements.
Ralph Nader gives an example of self-censorship in car safety. “There was no Human Factors engineering being the subject of the legal discussion. And then I met a retired engineer from American Motors Corporation, his name was Henry Wakeland. And in my long conversations with him, he told me about the enormous self-censorship inside the industry. And I said, this can’t be so. I’m mean, you mean they don’t speak up; even in closed doors among themselves? He says, they’re all afraid. I said, afraid of what? Afraid of being marked as trouble-makers, non-players on the team. And he gave me a lot of examples” Nader, 2016.
Avoiding confrontation is not necessarily a goal in itself, and in organisations it may serve a higher purpose: to preserve the political status quo. Lessig (2011) calls this dependence corruption. To ensure that the relationships they depend on continue, individuals make complex adjustments in their perceptions. Lessig makes the point that this is not corruption in the sense of bribery, but an aspect of the reciprocity12 that is fundamental in human relationships. However, in the organisational setting, dependence corruption means that individuals will interpret policies and data, and act in ways that protect the interests of those others who enable the individual to be effective and successful in their work. As well as the risks of measuring what is technically easy, it is likely that easily measured is defined in a political dimension. Lessig gives several examples, such as scientific studies of the harmfulness of Bisphenol-A (BPA is a constituent of some soft plastics). He reports that no industry- funded study (n=13) has found evidence of harm, whereas 86% of independent studies (n=163) have. Lessig makes the point that the scientists who found no harm were not dishonest, but consented to work in studies that sought to determine risks of exposure rather than to look for evidence of harm. The overall effect is to make it less certain
12 Lessig notes that “We all recognize the drive deep in our bones (or, more accurately, our DNA) to reciprocate. Some of it we see directly. Some of it we don’t. The subconscious is guided by interactions of reciprocity as much as the conscious. We reciprocate without thinking. We are bent to those to whom we are obliged, even when we believe, honestly, that we are not. What Robert Brooks wrote over a century ago we can repeat today: “By far the worst evil of the present system is the ease with which it enables men otherwise incorruptible to be placed tactfully, subtly, and— as time goes on— always more completely under obligations incompatible with public duty.” (Lessig, 2008; p132)
that it is safe or unsafe to use plastics containing BPA. Even better documented is the example of tetraethyl lead (TEL, an additive to petrol) in which, from the 1920s onwards, the scientific evidence for safety was overwhelmingly funded by the industry. In the mid-1960s, independent research “drew attention to the fallacy of assuming that observed (“typical”) lead in foods and bodies of Americans are natural and therefore safe and harmless. He [Clair Patterson] used a geochemical argument to estimate that the average (typical) body burden and concentrations of lead in the blood of Americans in the 1960s were at least 100 times above the background values” (Nraigu, 1998). Whether subject to unawareness or not, arbitrary measurement or disregarding data that contradicts vested interests may also reflect organised hypocrisy. This term was coined by Brunsson (2002) to describe what he found in studies of Swedish public sector organisations. There he saw that decision-makers created situations in which talk and decisions were quite separate from actions. This allowed stakeholder concern about a topic to be addressed, if not resolved, by talk that never transferred into action. Brunsson was not moralising, he was describing a rational means of controlling outcomes when objectives are incompatible. Arbitrary measurement makes it easier to talk rather than act. As Hand (1996; p.453) points out, “to be useful, the numerical assignment procedure has to be well defined. Arbitrariness in the procedure will reflect itself in ambiguity in the results.” A suitably ambiguous measurement creates a buffer between the world of talk and the world of action. In this way the decision-maker’s second step into McNamara fallacy carries the organisation across the threshold into organised hypocrisy.
Most readers will have some familiarity with statistics about the punctuality of the railways. In the UK, until June this year, figures suggested that 92% to 97% trains were on time (BBC, 2017). However, if arrival time is measured more precisely, to the nearest minute, those figures fall to 65%. However, even this does not directly measure the impact on passengers; it excludes missed onward connections, missed or cancelled appointments, and so forth. An attempt to quantify this impact arrived at a figure of 73.47 GBP (about €82) per minute of delay (NAO, 2008). Most UK train operators have a policy of compensating passengers for late arrivals at stations of at least 30 minutes, and then for a proportion of the train fare, which in the vast majority of cases will be rather less than €82 per minute.
Technical error of measurement can be hard to spot, even when skilled unawareness is not obscuring it. There is always a chain of arguments that link the conclusions we draw from evidence to the empirical reality we seek to know about and influence. This is true whether the evidence is qualitative or quantitative. Sometimes we are fully aware of this chain, but more often not, especially when some links between the ‘map and the territory’ are not accessible. As Hand (1996) makes plain, there is plenty of scope for technical errors in constructing and using measurement schemes.
It is one thing to make errors, but another thing not to detect or correct them. One can imagine a number of reasons why systematic errors of measurement, or other problems of validity, would go uncorrected. Firstly, the practical impact of poor measurement might be invisible or inconsequential (even if borne by others in the form of pollution, ill-health etc., if those people lack power or representation).
Secondly, the measurement scheme might be inscrutable. It is quite intimidating to question the validity of a measurement produced by a sophisticated analysis, perhaps even a computer generated algorithm, and based on a huge amount of data. Often the person who might be able to show the steps that connect the map to the territory is not there to explain things, even if such a person exists.
Thirdly, as predicted by Argyris, when a measurement scheme is legitimised at a senior level, it is unlikely to be challenged by subordinates. The misuse of Total Recordable Injury Rate (TRIR) as a general measure of plant safety at the BP Texas City refinery, may be an example of this. As CSB (2007) point out “TRIR and LTIR13 do not
effectively predict a facility’s risk for a catastrophic event”. Staffing cuts was a factor in that accident, but any adverse effects of this on process safety would not have been visible in occupational safety measurements such as TRIR. Concerning the lack of challenge by subordinates, the Baker Panel concluded that in BP even apparently capable individuals had “weak process safety voices” and that they did “not appear to participate substantially in the critical decision-making process with respect to BP’s U.S. refineries” (Baker, et al. 2007).
Fourthly, the simplification implicit in the models which underlie most measurement schemes, may not be recognised as an over-simplification, in the sense that some relevant aspects of the system measured are not represented in the measurement. As Aulnay et al. (2017) note, “Modelling allows a representation of a system but results in loss of information and especially for liaisons between elements and sub-systems. Nevertheless, level of understanding induced [by modelling] is more important than losses of information.” However, the implication of the McNamara fallacy is that the third step—to presume that what can't be measured easily really isn't very important— leads to the fourth step: “what can't be easily measured really doesn't exist”. Modelling is an essential, beneficial activity in safety; and imperfection is not a reason to throw the baby out with the bathwater. The problem is how to maintain vigilance and allocate sufficient resources to closing the gaps between the model and the evolving empirical reality.
3.
Conclusions
One goal of this article is to highlight the often subtle ways in which quantification may reduce the scope and acuity of foresight. However, none of the arguments presented deny or minimize the role and importance of quantification in dealing with safety.
The McNamara fallacy describes what happens to foresight if an organization lives on a ‘starvation diet’ of information filtered through imperfect measurement. There is always scope for improving measurements, but even as good as they could be made, they are filters on reality. The nature of perception, of life in Plato’s cave, is that we can never ‘peek around the back’ and see reality directly.
Foresight is integral to the creative aspect of safety practice. It is particularly evident in the form of hazard identification, which is meat and drink to risk assessment, both
in everyday evaluation of data and in more formal risk analyses. But does foresight contain all the nourishment it needs?
The danger to foresight is that measurement can define our reality, and there are numerous ways in which this can happen. This paper has tried to show that many of these ways are properties of our institutions rather than of ourselves—even scrupulous scientific conduct by individuals is not a complete answer.
The McNamara fallacy seems to be a disease of bureaucracy, not a lapse by individuals. Measurement schemes once established tend to endure, sometimes longer than their creators. And what was too hard to measure reflects not just technical limits, but the political realities of those creators. Does current practice do everything needed to continuously improve measurement schemes? Published accident investigation reports and plant ageing studies suggest not.
Our professions, both as institutions and identities, also have a role. Stopping models and measurements from becoming our masters is something we all need to do. As Feynman (1974) reminded us, we shouldn’t fool ourselves. However, we have to take some things for granted, but which things are safely left without ‘long existential epistemological consideration’, as Patton (2008) puts it?
Afterword
The McNamara fallacy has powerful intuitive appeal, but little basis in empirical research. The authors hope that this paper will sensitise practitioners to the impact of an expedient approach to measurement on the quality of foresight in their organisations. We hope also that it will stimulate researchers to see if the fallacy reflects what they find in the field, and whether the mechanisms proposed here, and others we didn’t think of, are valid.
Acknowledgements
We would like to thank Phil Parry, a retired inspector of major hazards installations, for his input to this paper.
References
Argyris, C. (1999) On organisational learning. Oxford, Blackwell.
Aulnay, R., Batiot, B. and Rogaume, T. (2017) Problématique de la gestion des configurations dans l'analyse préliminaire des risques, Rencontre inter-GTR : vers une vision systémique de la maîtrise des risques, Paris, 30 juin
Baker, James et al., 2007. “The Report of the BP U.S. Refineries Independent Safety Review Panel” http://www.safetyreviewpanel.com/, January 30, 2007.
BBC (2014) “French red faces over trains that are 'too wide'” BBC News, 21 May. http://www.bbc.co.uk/news/world-europe-27497727 (Accessed, 2/10/2017)
BBC (2017) “Train punctuality to be measured accurate 'to the minute'” BBC News, 18 July. http://www.bbc.co.uk/news/business-40635372 (Accessed, 2/10/2017)
Brunsson, N. (2002) “The Organization of Hypocrisy: Talk, Decisions and Actions in Organizations” Abrakt Forl: Copenhagen Business School Press, 2002, 2nd ed.
Columbia Accident Investigation Board (CAIB) (2003), Report Volume 1, National Aeronautics and Space Administration and the Government Printing Office
Committee of Public Accounts [UK] (2014) Transforming contract management. Twenty-third Report of Session 2014–15. HC 585.
CSB (2007) Investigation Report: Refinery Explosion and Fire. Report No. 2005-04-I- TX. U.S. Chemical Safety and Hazard Investigation Board.
Dien, Y. and Dechy, N. (2016) L’impensé est-il impensable ? Ce que nous apprennent les accidents industriels, In : M. Merad, N. Dechy, L. Dehouck, M. Lassagne (Edts), Risques majeurs, incertitudes et décisions – Approche pluridisciplinaire et multisectorielle, MA éditions, p. 69-96.
Feynman, R.P. (1974) Cargo Cult Science. Engineering and Science. Volume 37:7. Fischer, D.H. (1970) Historians’ Fallacies: Toward a Logic of Historical Thought. Harper Collins.
Frei, R., Garforth, A., Kingston, J., and Pegram, J. (2015) Using Operational Readiness to improve the Management of Risk. White Paper 2, Vol 1. Pub. Noordwijk Risk Initiative Foundation.
Hand, D. J. (1996) Statistics and the Theory of Measurement. Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 159, No. 3, p.445-492.
Horrocks, P., Mansfield, D., Thomson, J., Parker, K., and Winter, P. (2010) Plant Ageing Study – Phase 1 Report, ESR/D0010909/003/Issue2. A report prepared for the Health and Safety Executive, 27th February 2009.
Kingston, J., and Mertens, F.J.H., 2007. Future directions in accident investigation: lessons from the evaluation research literature. Proceedings of the ESReDA 33rd Seminar, Joint Research Centre, Ispra, Italy.
Lessig, L. (2011) Republic Lost. Twelve, Hachette Book Group. McNamara, R.S. (1995) “In Retrospect”. Random House.
Nader, R. (2016) Acceptance speech to the “Automotive Hall of Fame”. https://youtu.be/vSYZNImEkys. Accessed and transcribed on 3 May 2017.
National Audit Office (2008) Reducing passenger rail delays by better management of incidents. Pub. The Stationery Office.
Nriagu, J.O. (1998) Clair Patterson and Robert Kehoe’s Paradigm of “Show Me the Data’’ on Environmental Lead Poisoning. Environmental Research, 71-78.
Osman, M. (2015) Future-minded: the role of prospection in Agency, Control and other future-directed processes. Frontiers in Psychology. Vol. 6, article 154.
Patton, M.Q. (2008) Utilization-Focused Evaluation. 4th Edition. Sage.
Perrow, C. (1984) "Normal accidents: Living with high risk-technologies", Princeton UP.
Piirainen, K.A., and Gonzalez, R.A . (2015) Theory of and within foresight. Technological Forecasting & Social Change, 2015, Vol.96
Pooley, E. (1996) Nuclear Warrior, TIME magazine, March 4.
Rohrbeck, R., Battistella, C., and Huizingh, E. (2015) Corporate foresight: An emerging field with a rich tradition. Technological Forecasting & Social Change, Vol 101.
Smith, A. (1972) The Last Days of Cowboy Capitalism, The Atlantic Monthly, September, p 43-55.
Vaughan, D. (1996) The Challenger Launch Decision. Risky Technology, Culture, and Deviance at NASA, The Chicago University Press, Chicago.
Woods, D.D. (2009) Escaping failures of foresight, Safety Science 47(4), p. 498-501. Yankelovich, D. (2012) Personal correspondence with the author (Kingston).
70
Foresight for Risk Prevention and Resilience:to what Extent
do they Overlap? – Extended Abstract
Nicolas Dechy, IRSN, Institut de Radioprotection et de la Sûreté Nucléaire, Fontenay- aux-Roses, France
Myriam Merad, CNRS, UMR ESPACE, Nice Sophia Antipolis University - UMR LAMSADE, PSL*, CNRS, Université Paris Dauphine, Paris, France