Battling Bias Across Disciplines by Cook & Bynum
With the reproducibility of studies in question in both the hard and soft sciences, there is a movement afoot to bring improved discipline to the research process. One way to counter the many overt and covert biases facing researchers is to increase the use of blind analysis approaches.
Decades ago, physicists including Richard Feynman noticed something worrying. New estimates of basic physical constants were often closer to published values than would be expected given standard errors of measurement. They realized that researchers were more likely to ‘confirm’ past results than refute them: results that did not conform to expectations were more often systematically discarded or revised. To minimize this problem, teams of particle physicists and cosmologists developed methods of blind analysis: temporarily and judiciously removing data labels and altering data values to fight bias and error. By the early 2000s, the technique had become widespread in areas of particle and nuclear physics. Since 2003, one of us has, with colleagues, been using blind analysis for measurements of supernovae that serve as a ‘cosmic yardstick’ in studies of the unexpected acceleration of the Universe’s expansion.
In several subfields of particle physics and cosmology, a new sort of analytical culture is forming: blind analysis is often considered the only way to trust many results. It is also being used in some clinical-trial protocols (the term ‘triple-blinding’ sometimes refers to this), and is increasingly used in forensic laboratories as well. But the concept is hardly known in the biological, psychological and social sciences. One of us has considerable experience conducting empirical research on legal and public-policy controversies in which concerns about bias are rampant (for example, drug legalization), but first encountered the concept when the two of us co-taught a transdisciplinary course at the University of California, Berkeley, on critical thinking and the role of science in democratic group decision-making. We came to recognize that the methods that physicists were using might improve trust and integrity in many sciences, including those with high-stakes analyses that are easily plagued by bias.
Many motivations distort what inferences we draw from data. These include the desire to support one’s theory, to refute one’s competitors, to be first to report a phenomenon, or simply to avoid publishing ‘odd’ results. Such biases can be conscious or unconscious. They can occur irrespective of whether choices are motivated by the search for truth, by the good mentor’s desire to help their student write a strong PhD thesis, or just by naked self-interest.
We argue that blind analysis should be used more broadly in empirical research. Working blind while selecting data and developing and debugging analyses offers an important way to keep scientists from fooling themselves.
* * * * *
Blind analysis ensures that all analytical decisions have been completed, and all programmes and procedures debugged, before relevant results are revealed to the experimenter. One investigator – or, more typically, a suitable computer program – methodically perturbs data values, data labels or both, often with several alternative versions of perturbation. The rest of the team then conducts as much analysis as possible ‘in the dark’. Before unblinding, investigators should agree that they are sufficiently confident of their analysis to publish whatever the result turns out to be, without further rounds of debugging or rethinking. (There is no barrier to conducting extra analyses once data are unblinded, but doing so risks bias, so researchers should label such further analyses as ‘post-blind’.)
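To make the procedure concrete, here is a minimal Python sketch of one simple way to perturb data values: adding a single hidden offset to every measurement. It is an illustration under stated assumptions, not the method used by any particular team. The function names (make_blinder, blind, unblind), the seed, the scale and the sample numbers are all hypothetical.

```python
import random


def make_blinder(seed, scale):
    """Return (blind, unblind) functions that share one hidden offset.

    Assumption for illustration: a constant additive shift is an
    acceptable perturbation for the quantity being estimated.
    """
    rng = random.Random(seed)
    offset = rng.uniform(-scale, scale)  # kept inside the closure, never displayed

    def blind(values):
        # Shift every value by the same hidden amount.
        return [v + offset for v in values]

    def unblind(blinded_estimate):
        # Remove the shift from a location estimate such as a mean.
        return blinded_estimate - offset

    return blind, unblind


def mean(values):
    return sum(values) / len(values)


# Illustrative numbers only, not real measurements.
measurements = [9.8, 10.1, 10.3, 9.9, 10.4]

blind, unblind = make_blinder(seed=2024, scale=5.0)
blinded = blind(measurements)

# The team develops and debugs its analysis on the blinded values.
blinded_mean = mean(blinded)

# Only after the analysis is frozen is the offset removed.
final_mean = unblind(blinded_mean)
```

Because the shift is constant, the spread of the blinded values matches the spread of the originals, so outliers and instrument glitches remain visible while the central value stays hidden. Any analysis run after the call to unblind would be labelled ‘post-blind’.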
There are many ways to do blind analysis. The computer need not (and probably will not) be blinded to data values; it is the display of results that masks information. Techniques must obscure meaningful results while showing enough of the data’s structure to allow researchers to find and debug measurement artefacts, irrelevant variables, spurious correlates and other problems. For example, researchers who analyse clinical-trial results without knowing which patients received a placebo should still be able to identify implausible values. The best methods for blinding depend on the properties of the data (for example, the type of statistical distribution, lower and upper bounds, whether values are discrete or continuous and whether cases were randomly assigned to experimental conditions or passively observed). Both data values and labels can be manipulated to develop a suitable strategy.
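The clinical-trial example can be sketched in the same spirit. The Python below replaces the arm names with neutral codes and keeps the mapping in a separate key, so that an analyst can still flag implausible values without knowing which group received the placebo. The record layout, the function names (blind_labels, flag_implausible), the plausibility bounds and the numbers are assumptions made up for illustration.

```python
import random


def blind_labels(records, seed):
    """Replace arm names with neutral codes.

    Returns the blinded records and the hidden key (arm name -> code),
    which would be held by someone outside the analysis team.
    """
    arms = sorted({r["arm"] for r in records})
    codes = [f"group_{i}" for i in range(len(arms))]
    random.Random(seed).shuffle(codes)  # hide which code means which arm
    key = dict(zip(arms, codes))
    blinded = [{"arm": key[r["arm"]], "value": r["value"]} for r in records]
    return blinded, key


def flag_implausible(records, low, high):
    """Return records whose value falls outside the plausible range."""
    return [r for r in records if not (low <= r["value"] <= high)]


# Illustrative records only, not real trial data.
records = [
    {"arm": "placebo", "value": 120.0},
    {"arm": "treatment", "value": 112.0},
    {"arm": "placebo", "value": 1180.0},  # probably a data-entry error
    {"arm": "treatment", "value": 109.0},
]

blinded, key = blind_labels(records, seed=7)

# Data cleaning proceeds on the blinded records; the key stays sealed.
suspect = flag_implausible(blinded, low=50.0, high=250.0)
```

Here the data values are left untouched and only the labels are masked, which suits a randomized comparison between groups. Other data types may call for a different strategy, as the paragraph above notes.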
* * * * *
Finally, blind analysis helps to socialize students into what sociologist Robert Merton called science’s culture of ‘organized skepticism’. As Feynman put it: “This long history of learning how to not fool ourselves – of having utter scientific integrity – is, I’m sorry to say, something that we haven’t specifically included in any particular course that I know of. We just hope you’ve caught on by osmosis. The first principle [of science] is that you must not fool yourself – and you are the easiest person to fool.”