The errors in assessment methodology made by the US armed forces bear striking similarities to those made by ANVUR in the VQR. «I address the proliferation of “junk arithmetic” and flawed logic within the currently used assessment processes and discuss why regional commanders and their staffs should care about these problems, by describing the damage to commanders’ credibility and decision support created by flawed processes. […] Using arithmetic on numeric metrics is optional, but the rules of arithmetic are not optional. The following examples of junk arithmetic I encountered suffice to demonstrate the broader problem. Many of the assessments processes I observed in-theater take qualitative and quantitative data, rank order them, and average the rank-order numbers. […] Averaging ordinal numbers, such as rank orders, within an assessment process is just as nonsensical, and this kind of obvious error subjects the credibility of the assessment, and the command promoting it, to justifiable suspicion. […] The rules of arithmetic—including the fact that adding or averaging rank orders is nonsense—were established over two millenniums ago by, among others, Pythagoras and are taught in every elementary school worldwide.» What you have just read is not an English translation of a Roars article denouncing the use of rank sums, the true fatal error of the VQR. The quotation is instead taken from “Operations assessment in Afghanistan is broken”, an article devoted to the errors in assessment methodology made by the US armed forces.
That summing percentile ranks has no scientific basis is well known in several disciplines, including psychometrics, education, chemistry, geography and law. Military studies must now be added to this list. In a 2011 article that won second prize among the articles published in the Naval War College Review, Stephen Downes-Martin discusses the failure of the operations assessment methodologies employed by the US armed forces in Afghanistan.
Among other things, the article puts “junk arithmetic” on trial. As an emblematic example, Downes-Martin cites the habit of averaging ordinal numbers, a procedure analogous to the bibliometric algorithm devised by ANVUR for VQR 2011-2014.
But the reasons for drawing attention to a paper on a topic so unusual for our blog do not end there. Its overall analysis, which denounces the harmful effects of the mechanical use of quantitative indicators that are problematic both in how they are collected and in how they are subsequently processed, fits the failures of the Italian evaluation agency surprisingly well.
When we read of
illusion of precision without the reality of accuracy
how can we not think of the «extremely detailed and, above all, certified snapshot of Italian research» boasted of in 2013 by Stefano Fantoni, then president of ANVUR, with reference to VQR 2004-2010?
And what about the section entitled “Distrust generated by poor assessment practice”? The following words seem to fit ANVUR and the academic leadership just as well:
the visibility of these flaws means that military assessments, and by association the military commanders, are rightfully distrusted by higher civilian authority and by other organizations within the theater.
And when the author warns commanders against lending credence to unreliable assessments, one cannot help thinking of the rectors who closed ranks around an equally unconvincing VQR:
The flaws in the operations assessment processes I observed in-theater clearly produce untrustworthy decision support; they are so manifest that commanders place their own credibility at risk when they support the resulting assessments. […] The continued use of junk arithmetic and flawed logic robs decision makers of the most essential requirements that assessment is supposed to supply—sound, verifiable, and accurate information upon which to make life-and-death decisions.
Below we reproduce some excerpts from the article; the full version can be downloaded here.
OPERATIONS ASSESSMENT IN AFGHANISTAN IS BROKEN
What Is to Be Done?
[…] when a new commander and staff take over duties as a regional command in Afghanistan, they inherit an operations assessment process riddled with highly visible flaws that emanate from the improper use of numbers and flawed logic. While no assessment process can be perfect or free of any criticism, the flaws the author observed during a six-week stint in-country are sufficiently egregious that they seriously reduce the value those assessments provide to commanders’ decision support. In addition, the visibility of these flaws means that military assessments, and by association the military commanders, are rightfully distrusted by higher civilian authority and by other organizations within the theater. It is therefore imperative that incoming commanders and staffs taking over responsibilities for regional commands address these flaws to improve decision making and to earn the trust of higher civilian authority and organizations with whom they have to work.
I address the proliferation of “junk arithmetic” and flawed logic within the currently used assessment processes and discuss why regional commanders and their staffs should care about these problems, by describing the damage to commanders’ credibility and decision support created by flawed processes. Finally, I propose an approach to operations assessment that regional commanders can immediately put into place. I do not discuss or comment on strategy, operations, or the broader arguments concerning counterinsurgency versus counter-terrorism. I focus solely on the operations assessment process.
Forecasting has a long and dubious history, full of pseudoscience, junk arithmetic, and flawed logic, practiced by witches and listened to by kings.7 Forecasting should be done using a combination of subjective professional judgment, objective logic, (social) science, and mathematics.8 However, for many people the differences between pseudoscience and real science are hard to spot. Furthermore, approaches that are valid in one context can become invalid in others, even if the differences in the contexts are not obvious. So although most officers would subscribe to the notion of using valid logic, mathematics, and science (everyone believes that they themselves are rational and logical), it is difficult for those not explicitly educated and trained in science, analysis, and critical thinking to identify whether an approach is logically or scientifically valid. This difficulty is the root cause of many of the flaws I have observed in operations assessment as practiced in Afghanistan.9
FLAWS IN ASSESSMENT AS CURRENTLY PRACTICED
Using arithmetic on numeric metrics is optional, but the rules of arithmetic are not optional.17 The following examples of junk arithmetic I encountered suffice to demonstrate the broader problem.
Many of the assessments processes I observed in-theater take qualitative and quantitative data, rank order them, and average the rank-order numbers. For example, in the RODEA process, assessors coded answers to questions on a point scale of one through five, similar to the “rating definition levels” used by ISAF and IJC. These codes are not ratio-scale numbers, and therefore, by the laws of arithmetic, functions such as “averaging” cannot be performed on them—it would be meaningless.18 To put this into a familiar context, officer pay grades are rank ordered by “O number”—that is, pay grades O-1 (second lieutenant) through O-10 (four-star general). But no one believes that a brigadier general (O-7) is the same as a major (O-4) paired with a captain (O-3) just because four plus three is seven.19 Averaging ordinal numbers, such as rank orders, within an assessment process is just as nonsensical, and this kind of obvious error subjects the credibility of the assessment, and the command promoting it, to justifiable suspicion.
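A toy computation makes the point of the RODEA example and of footnote 18 concrete. The sketch below is an illustration added here, not something taken from Downes-Martin's article, and the two units and their ratings are invented. On an ordinal 1-5 scale only the order of the levels carries information, so any order-preserving renumbering of the levels is as legitimate as 1, 2, 3, 4, 5. If averaging such codes were meaningful, the ranking of units by average code would not depend on which numbering is chosen; the sketch shows that it does.

```python
# Illustrative sketch (not from the article): averaging ordinal codes such as
# 1-5 rating levels gives results that depend on an arbitrary numbering choice.
# The units and their ratings are hypothetical, invented for this example.

# Ratings of two hypothetical units on four questions, coded 1..5 (ordinal scale).
RATINGS = {
    "unit_A": [1, 5, 5, 1],
    "unit_B": [3, 3, 4, 3],
}

# Two numberings of the same five ordered levels. Both keep the order
# 1 < 2 < 3 < 4 < 5, which is all an ordinal scale actually encodes.
CODING_1 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
CODING_2 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def is_order_preserving(coding):
    """True if higher levels always receive strictly higher numbers."""
    values = [coding[level] for level in sorted(coding)]
    return all(a < b for a, b in zip(values, values[1:]))


def average_code(levels, coding):
    """Arithmetic mean of the numbers assigned to the given levels."""
    codes = [coding[level] for level in levels]
    return sum(codes) / len(codes)


def best_unit(coding):
    """Unit with the highest average code under the given numbering."""
    return max(RATINGS, key=lambda unit: average_code(RATINGS[unit], coding))


if __name__ == "__main__":
    for name, coding in [("coding 1", CODING_1), ("coding 2", CODING_2)]:
        averages = {u: average_code(r, coding) for u, r in RATINGS.items()}
        print(name, "order-preserving:", is_order_preserving(coding))
        print("  averages:", averages, "-> best:", best_unit(coding))
```

Under the usual 1-5 numbering unit_B comes out ahead (3.25 against 3.0); coding the top level as 10 instead of 5 puts unit_A ahead (5.5 against 3.25), although not a single rating has changed. The same reversal can occur with any two ratings profiles whose averages are close, which is why the ranking produced by averaging reflects the arbitrary numbers rather than the assessed situation.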
Higher-Command Demands for Objective Assessments
ISAF and IJC (supported by higher civilian authority) demand a “set of indicators that complements the commander’s qualitative assessment of the environment.”27 Unfortunately, in practice, “indicators” are all too often interpreted as being “quantitative” (or “numeric”) and thus “objective,” whereas the “commander’s qualitative assessment” is seen as “subjective.” An example is a report from one provincial reconstruction team that referred to “overreliance on qualitative and subjective assessments” as a challenge.28
In the absence of a credible numbers-based theory of counterinsurgency in Afghanistan, there is no objective, numbers-based assessment for military operations there. Pretending otherwise gives the illusion of precision without the reality of accuracy.
The obsession with objective assessments is tactical thinking applied to strategic problems. Although purely objective (and numbers-based) predictive theories of the physical world are possible, the likelihood that the same will become true for operational- and strategic-level complex social interactions—such as insurgency and counterinsurgency, terrorism and counterterrorism, and warfare—in time to be useful in Afghanistan is extremely small. Therefore, operational/strategic counterinsurgency assessment in Afghanistan must be subjective, based on senior leaders’ subjective professional judgment of pertinent qualitative and quantitative data.33 Even if all relevant data were available and all of them accurate, numeric, and objective, assessing what they mean for success is still a professional military subjective judgment call, since there is no credible, objective numbers-based theory of counterinsurgency.
DISTRUST GENERATED BY POOR ASSESSMENT PRACTICE
In my opinion, the number of metrics demanded overwhelms the collection capacity of regional commands’ partner civilian organizations and major supporting commands. Furthermore, neither those organizations nor supporting commands appear to trust the value of collecting on those metrics or of assessments done using them. For example I was openly told by a head of planning in one civilian two-star-equivalent organization that in response to his regional command’s request for assessment metrics he makes up what he does not have and does not check the quality of what he does have. An additional reason given me for not taking metrics seriously was the absence of feedback from requesting organizations.
WHAT IS TO BE DONE?
The flaws in the operations assessment processes I observed in-theater clearly produce untrustworthy decision support; they are so manifest that commanders place their own credibility at risk when they support the resulting assessments. Regional commanders have the authority and means to fix operations assessment within their commands. However, doing so requires institutionalizing a rigorous process and separating it from the task of responding to higher-command requests for information. If the regional commander decides that this separation is unacceptable or does not have the time or staff resources to implement it, an alternative is to base the regional command’s operations assessment entirely on its commander’s subjective professional judgment combined with that of the region’s civilian provincial reconstruction team and of other regional stakeholders. The continued use of junk arithmetic and flawed logic robs decision makers of the most essential requirements that assessment is supposed to supply—sound, verifiable, and accurate information upon which to make life-and-death decisions.
7. Belief in pseudoscience and conspiracy theories and inability to use valid reasoning are disturbingly frequent in the American population—see “Science and Technology: Public Attitudes and Public Understanding,” National Science Foundation: Science and Engineering Indicators 2002, www.nsf.gov/, esp. “How Widespread Is Belief in Pseudoscience?” It would be unwise to assume that excellence in leadership is incompatible with these kinds of thinking failure. Such thinking failures are not a cause for concern when those involved are facing familiar operational and strategic situations of the kind they have successfully dealt with in the past. However, experience with past operational (or strategic) situations is only as relevant to the current situation as the past and current situations are similar.
8. See J. Scott Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners (New York: Springer, 2001), for a good overall introduction to forecasting. See also Forecasting Principles: Evidence-Based Forecasting, www.forecastingprinciples.com/.
9. I have also observed these flaws throughout both the U.S. Department of Defense and civilian commercial organizations—as would be expected, since the root causes are present throughout the Defense Department and civilian worlds.
17. The rules of arithmetic—including the fact that adding or averaging rank orders is nonsense—were established over two millenniums ago by, among others, Pythagoras and are taught in every elementary school worldwide.
18. The attempt to get around this by scoring metrics using Likert-like items (for example, the five-point rating definition level) fails, since with every point defined by a text description the numbers associated with each text item are rank-ordered ordinals that, by the rules of arithmetic, cannot be averaged (or have any other arithmetic function used on them).
19. It may be that in certain instances one can replace a brigadier general by a major paired with a captain, but I suggest that in these cases one has other problems that are beyond the scope of the assessments process.
27. IJC guidance to the ISAF Joint Command Metrics Workshop.
28. Coffey International Development, “Introduction to the Helmand Monitoring & Evaluation Programme,” Helmand Provincial Reconstruction Team, Afghanistan, October 2010.
33. A combination of diplomatic/political, informational/ideological, military, and economic leadership and expertise must be involved.