3.5.1 Summary
My first three experiments produced very mixed results: in one case, weak evidence for
selective exposure, in another, evidence of the opposite tendency, and in the third, no evidence in either direction. These mixed results are not particularly surprising given
the mixed status of the selective exposure literature more generally, but what is perhaps surprising is that we obtain such mixed results from relatively subtle tweaks to the same paradigm.
Furthermore, these results seem to conflict with Taber and Lodge (2006), who found a significant selective exposure effect using a very similar experimental design. We
successfully replicate this result in an online experiment following Taber and Lodge’s design as closely as possible - prompting the question of what might account for the
difference in results between this replication and the earlier studies.
The most notable difference between these experiments seems to be how the options of
what to read are presented to participants: Taber and Lodge (2006) have participants choose between arguments from different political interest groups, whereas we simply
present people with a choice between an argument ‘for’ or ‘against’ a given issue. Our initial findings provide weak support for the hypothesis that selective exposure is less
likely to occur when choosing between arguments in this latter, more abstract, way - of the first three studies, the two in which arguments were presented more abstractly
(experiments 2 and 3) found less selective exposure than the first experiment (where people chose between arguments from sentence summaries). Of course, these differences
may be explained by some other factor, but the argument presentation does stand out as the most obvious difference between these studies.
I therefore investigated the impact of how choices are presented by re-running the Taber and Lodge replication with a manipulation that varied how arguments were
presented to people. The primary hypothesis was that we would find more selective exposure when people were choosing between arguments from known sources (political interest groups, as in Taber and Lodge, 2006) than when choosing between arguments
abstractly labelled as either ‘for’ or ‘against’ a given issue. We ran two experiments to test this hypothesis. In the first (experiment 5), bias scores are slightly higher in the political groups condition, but the overall bias score across both conditions is lower than in Taber and Lodge’s original experiment, and not significantly different from zero. However, this lack of significant findings may be because our sample was smaller and therefore underpowered (we had the same number of subjects as Taber and Lodge, but split across two conditions). We therefore ran the same experiment on a larger sample (experiment 6), but this time effect sizes are even smaller, and again not significant. Though bias scores are
still higher in the political groups condition, they are not significantly so - suggesting that if there is any meaningful effect here, it is small.
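The worry that splitting the sample left each condition underpowered can be made concrete with a rough power calculation. The sketch below is a hypothetical illustration, not the analysis used in these experiments: it assumes a two-sided one-sample test of the mean bias score against zero, uses a normal approximation instead of a t-test, and plugs in an invented standardised effect size (d = 0.3) and invented sample sizes.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample test of H0: mean = 0.

    Uses a normal approximation; effect_size is the standardised effect d
    (true mean divided by its standard deviation) and n the sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: the same small effect (d = 0.3) tested with the
# whole sample versus with half of it in a single condition.
for n in (100, 50):
    print(n, round(approx_power(0.3, n), 2))
```

With these made-up numbers, halving the sample drops power from about 0.85 to about 0.56, so a small effect that the full sample would usually detect is missed almost half the time.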
When we analyse the data using a slightly different method - looking at the correlation between initial attitudes and arguments chosen, which captures more of the variation in people’s opinions - we find broadly similar results. However, especially in the later experiments, this measure of selective exposure sometimes produces statistically significant results where the simple ‘bias score’ measure does not. In particular, it offers slightly more support for the hypothesis that reading arguments from specific sources makes people more likely to engage in selective exposure. We should be wary of drawing conclusions from this, since the correlations are weak (and not significant in all cases) and the bias score measure shows no significant effects. What this does reveal, however, is that analysis choices can have subtle but significant effects on the conclusions drawn. Had we used only the correlational analysis, we might have been more likely to conclude that our hypothesis was supported than if we had used only the bias score measure (or both measures). I discuss the implications of this in more detail in the next sections.
3.5.2 Discussion: just another failed replication?
What should we make of all this? First, our results taken together suggest that selective exposure in the domain of political attitudes is not a particularly robust phenomenon,
at least not measured in the way it has been here. Though mixed evidence for selective exposure is not a novel finding (Hart et al., 2009), we are not aware of previous research that finds such mixed results for selective exposure within the same basic paradigm.¹⁰ Second, though the way information sources are presented may affect the extent to which selective exposure occurs, we do not find convincing evidence for this in these experiments - if an effect like this exists, then it is relatively small.
¹⁰ Prior research has discussed how selective exposure effects seem to vary greatly across different topics, contexts, etc. - not how they vary across different studies of the same topics and contexts.
What might explain such confusing and mixed findings? It might be that the effect Taber and Lodge (2006) found does in fact exist, but the later studies I conducted failed
to replicate the right conditions in some subtle way. Alternatively, perhaps no such effect exists, and Taber and Lodge’s original finding was some kind of statistical fluke.
Both of these explanations seem implausible, however, given that (a) I did successfully replicate Taber and Lodge’s result once; and (b) the experiments in the series I conducted
were practically identical in design and procedure. What is perhaps more likely is that some tendency to select more supporting arguments does exist, but that many other motivations and factors also influence how people select information, and a ‘selective exposure’ effect is not strong enough to override them. This means that any
‘selective exposure’ effect is hard to detect and may easily be swamped by variations in the sample or experimental design.
It’s also worth considering these mixed results in the context of the recent ‘replication crisis’ in psychology. In the last few years, the robustness of various findings in psy-
chology has been challenged, as many have failed to replicate. Most notably, a large team of researchers as part of the ‘Open Science Collaboration’ recently attempted to
replicate 100 studies published in three well-established psychology journals. The resulting paper (Open Science Collaboration, 2015) reports that replication effect sizes were on average half the magnitude of the originals, only 36% of replication studies found a significant effect (compared to 97% of the original studies), and, in all, only 39% of the effects were ‘subjectively rated’ as successfully replicated.
The authors acknowledge that there is no single, agreed-upon standard for evaluating
the success of a replication - and use a combination of different measures, including the statistical significance of the replication effect, the difference between the original and
replication effects, directly comparing effect sizes, and combining original and replication results in a meta-analysis. These different methods have their respective advantages and
disadvantages - which has allowed others to challenge whether the failures of replication are really as severe as claimed. Gilbert et al. (2016) argue in a commentary paper that the Open Science Collaboration’s study underestimates reproducibility in social science. They suggest that, due to sampling error and noise, we should not expect all effects to replicate even if they are true.
Statistician Andrew Gelman responds to Gilbert et al., arguing that they are too quick to take the original result as solid evidence and that the burden of proof should lie the other way around (Gelman, 2016a,b). Gelman suggests using a ‘time-reversal heuristic’: suppose the replication had come first, so that someone had first run a study with a large sample that found no effect, and afterwards someone else came along with a smaller sample and did find an effect. Would we then be confident that such an effect exists? It seems we would not - and so we should not continue believing in the effect simply because the positive finding came first.¹¹
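One way to see the force of the time-reversal heuristic is that a standard way of combining studies, fixed-effect inverse-variance pooling (a technique chosen here for illustration, not one Gelman proposes in this argument), gives the same answer whichever study came first. The estimates and standard errors below are invented: a small original study with a large effect and a large replication with an effect near zero.

```python
def pool_fixed_effect(estimates):
    """Fixed-effect (inverse-variance) pooling of (effect, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se


# Hypothetical studies as (effect estimate, standard error).
original = (0.50, 0.20)     # small sample: large effect, imprecise
replication = (0.02, 0.05)  # large sample: effect near zero, precise

print(pool_fixed_effect([original, replication]))
print(pool_fixed_effect([replication, original]))
```

Swapping the order leaves the pooled estimate unchanged, and the more precise replication dominates it, pulling it close to zero.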
The fact that psychologists and statisticians cannot even agree on what it means for a finding to be successfully reproduced should itself concern us about the state of evidence in psychology. Without such agreement, can we really say we know what constitutes solid evidence for a psychological claim in the first place?
¹¹ It’s also interesting to note here that the idea that what result we hear first influences our interpretation of subsequent results is, of course, very close to a claim of confirmation bias!
Beyond this specific discussion of reproducibility in psychological science, there are various deeper reasons to be concerned about the validity of findings in social science. Going as far back as the 1960s, Meehl (1967) argued that, due to noise and variation in statistical studies, it’s possible to find statistically significant effects where no true effects exist if you look hard enough. Gelman and Loken (2013) explain how this can occur even without any deliberate attempt on the part of the researchers to ‘fish’ for a significant effect. This builds on the notion of ‘researcher degrees of freedom’ (Simmons et al., 2011): researchers have to make various decisions in collecting and analysing data - when to stop collecting more data, which observations to include, what kind of analysis to do, and so on - and often these decisions are not all made in advance. This means that even if researchers do not actually conduct multiple different analyses, there are multiple different analyses they could have done, so the chance that at least one of these analyses would yield a statistically significant result is substantially larger than 0.05. Simmons et al. (2011) show that subtle manipulations of these researcher degrees of freedom can produce statistically significant results supporting a hypothesis that is blatantly false - that listening to the Beatles’ ‘When I’m Sixty-Four’ makes people younger.
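The inflation described above can be roughed out with a simple calculation. Assuming, purely for illustration, that there are k equally plausible analyses, that each is an independent test at α = 0.05, and that there is no true effect, the chance that at least one comes out significant is 1 - (1 - 0.05)^k. Real analyses of a single dataset are correlated, which typically makes the true figure lower for a given k, so this is a crude sketch rather than an estimate for any actual study.

```python
def prob_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k analyses is 'significant' when no effect exists.

    Assumes the k analyses are independent tests at level alpha; correlated
    analyses of one dataset usually inflate the false-positive rate less.
    """
    return 1 - (1 - alpha) ** k


for k in (1, 3, 5, 10):
    print(k, round(prob_any_significant(k), 3))
```

Under these assumptions, five forking paths already give roughly a 23% chance of a ‘significant’ finding where there is nothing to find.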
John Ioannidis argues independently, in a provocatively titled paper, that in all likelihood, “most published research findings are false” (Ioannidis, 2005). Ioannidis shows, using Bayesian reasoning, that due to publishing and analytic practices, more than half of published research results are likely false. Gelman (2016a) further argues (also using the provocative term “piss-poor social science”) that it’s basically impossible for most of social science’s findings to be true, since so many claim large effects of small manipulations. If all these findings represented genuine effects, Gelman argues, we would live in a very strange world (not to mention the fact that many such results appear to directly contradict one another).
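The core of Ioannidis’s argument can be written as a short formula. If R is the prior odds that a tested relationship is real, 1 - β is the power and α is the significance threshold, then the share of significant findings that are true (the positive predictive value, PPV) is (1 - β)R / ((1 - β)R + α). The sketch below implements this basic version, leaving out the additional bias term Ioannidis also considers; the input values are illustrative, not estimates for any field.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Share of statistically significant findings that reflect true effects.

    prior_odds is R, the ratio of true to null relationships among those tested;
    power is 1 - beta. Basic formula only, without Ioannidis's bias term.
    """
    true_positives = power * prior_odds  # true relationships, weighted by R
    false_positives = alpha              # null relationships, weighted by 1
    return true_positives / (true_positives + false_positives)


# Illustrative inputs: prior odds of 1 to 10, high versus low power.
print(round(positive_predictive_value(prior_odds=0.1, power=0.8), 2))
print(round(positive_predictive_value(prior_odds=0.1, power=0.2), 2))
```

With these inputs, a power of 0.8 gives a PPV of about 0.62, but at a power of 0.2 it falls to about 0.29, so most significant findings would be false.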
Gelman argues that the solution to all of these issues is not as straightforward as simply
improving statistical practices (Gelman, 2016a,b). More replications and better statistics will help, of course, but if many of the purported findings do not exist, then better
practices will simply result in lots of negative findings and no real knowledge. What we really need, he argues, is better measurement tools and research design: we need to learn
to ask better questions, and to reward scientists for rigour and careful testing of clever hypotheses, not for findings that make good headlines. Better methodological practices
will help indirectly, in that they will shift the cost-benefit ratio - so that it is harder, and so costlier, to produce ‘sexy’ results by using sloppy practices - but are not themselves
the solution.
Against this backdrop of issues - questions about the reproducibility of psychological studies and, more broadly, about the robustness of scientific evidence - the mixed results for selective exposure are perhaps not surprising. Perhaps the most straightforward explanation for what is going on is that the ‘selective exposure’ effect is relatively small, just one factor among many influencing how people approach information. And as Gelman
(2016a) suggests, the solution to getting clearer on what is going on here may not simply be larger samples and better statistics. Instead, what may be needed is to step back
a bit and ask: why are we interested in selective exposure, really - and how might we measure what we’re interested in better?
It seems to me that it’s not selective exposure per se that’s important, but rather the broader ways in which people may be biased towards confirming their existing beliefs. Selective exposure may be one way in which this occurs, or one contributing factor - but if it is confirmation bias more broadly that we actually need to understand, we need to consider bias at all stages of reasoning. Rather than continuing to try to understand when and whether this specific kind of ‘selective exposure’ occurs, then, we might learn more by developing better ways of measuring how people form and update their beliefs more broadly.