Study 3 investigated the influence of mode of data presentation on the interpretation of results, using in part a one-factor between-subjects design with mode of presentation as the only factor. Specifically, I was interested in exploring which of three ways of displaying the same results (Figure 2.4) yields a more accurate estimation of similarities and is perceived as clearer and more informative.
Method
Participants. Two hundred and ninety-one participants (Mage = 33.75, SDage = 11.17, 46% women) remained in the analysis after excluding twenty-four participants who failed the instructional manipulation check twice (Oppenheimer, Meyvis, & Davidenko, 2009) and one participant who responded to all items with 0. Participants were recruited via a paid online platform.
Materials and Procedure. Three types of graphs were created, displaying simulated data with 1500 participants in each group (i.e., the average sample size per country of the World Values Survey data). Data were simulated from a normal distribution, with a standard deviation of 0.80 and an overall mean of 3. The three types of graphs are depicted in Figure 2.4: a graphical representation of the overlapping distributions assessed by the PCR measure, a default barplot with confidence intervals, and superimposed histograms that represent the PCS measure. For each type of graph, nine versions were created with varying effect sizes: d = 0, 0.20, 0.40, 0.50, 0.60, 0.80, 1, 1.5, and 2. Participants were randomly allocated to rate one type of graph, and its nine versions were presented in random order. The instructions for the participants were “In your opinion, to what extent do the data as depicted in this plot indicate that the two groups A and B are different or similar? Each group consists of around 1500 people.” To make the variable more concrete, I labelled it sociability. Participants responded on a slider measure, ranging from 0 (“very different”) to 100 (“very similar”). In addition, for each graph, participants rated how comprehensible they found the figure on a 5-point scale ranging from 1 (“extremely incomprehensible”) to 5 (“extremely comprehensible”).
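The simulation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that the mean difference implied by each d is split symmetrically around the overall mean of 3, and the seed and all names (`simulate_groups`, `datasets`) are hypothetical.

```python
import random
import statistics

EFFECT_SIZES = [0, 0.20, 0.40, 0.50, 0.60, 0.80, 1, 1.5, 2]  # Cohen's d values from the text
N_PER_GROUP = 1500   # simulated participants per group
SD = 0.80            # common standard deviation
OVERALL_MEAN = 3.0   # grand mean across both groups


def simulate_groups(d, rng):
    """Return two normal samples whose population means differ by d * SD.

    Assumption: the difference is split symmetrically around OVERALL_MEAN.
    """
    half_gap = d * SD / 2
    group_a = [rng.gauss(OVERALL_MEAN - half_gap, SD) for _ in range(N_PER_GROUP)]
    group_b = [rng.gauss(OVERALL_MEAN + half_gap, SD) for _ in range(N_PER_GROUP)]
    return group_a, group_b


rng = random.Random(2024)  # arbitrary seed, only for reproducibility
datasets = {d: simulate_groups(d, rng) for d in EFFECT_SIZES}

for d, (group_a, group_b) in datasets.items():
    # Observed standardized difference, using the known population SD.
    observed = (statistics.mean(group_b) - statistics.mean(group_a)) / SD
    print(f"target d = {d}, observed d = {observed:.2f}")
```

With 1500 cases per group, the observed standardized differences should fall close to the target values, which is what makes the nine graph versions comparable across presentation modes.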
Figure 2.4. Three modes of depicting the same data. See text for more details.
Next, participants ranked which out of three possibilities (Cohen’s d, PCR, and PCS) is the clearest and most informative way to describe scientific findings. The first option was “The difference between men and women was Cohen’s d = .43 with Cohen’s d being the difference in the two groups' means divided by the average of their standard deviations”, the second option was “The overlap of the responses given by men and women was 83 percent”, and the last option was “83 percent of the responses given by men were mirrored by women” (options were presented randomly). Finally, participants responded to some demographic items, including their education level and their statistical training, before being debriefed and thanked.
Results and Discussion
First, I compared participants who had a university degree and some or a lot of statistical training with the other participants. The response pattern was highly similar across all items; therefore, the results are reported collapsed across educational levels and statistical training.
Next, I tested the influence of the mode of presentation on the perceived similarity. All but one of the nine between-subjects one-way ANOVAs reached statistical significance (Fs > 5.2, ps < .006), indicating that mode of presentation had an impact on the perceived similarity of two groups. For six of the nine graphical comparisons, a very similar linear pattern was observed, with the superimposed normal distribution plots yielding the most accurate estimates, followed by the barplots, and the superimposed histograms yielding the least accurate (Figure 2.5). In other words, people were more accurate in estimating similarities with the display that represented the PCR than with either of the other displays. For example, for graphs displaying a medium effect size (i.e., d = 0.50), the mean estimated amount of similarity was 78% for the superimposed normal distributions, 65% for the barplots displaying the means and confidence intervals, and 57% for the superimposed histograms. The correct amount of overlap, using the PCR, is 80 percent. Thus, as expected, people underestimate similarities when shown results in the standard presentation format (means and confidence intervals); they do so even with the superimposed histograms, but presenting overlapping distributions attenuates this error. Participants in the superimposed normal distribution condition also rated the graph as more comprehensible (M = 3.92, SD = 0.93) than did participants in the barplot (M = 3.59, SD = 1.12, p = .03) and histogram conditions (M = 3.50, SD = 1.13, p = .006).
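The 80 percent benchmark for d = 0.50 can be reproduced with a short calculation. The sketch below assumes that the PCR corresponds to the overlapping coefficient of two normal curves with equal variance, which for a standardized difference d equals 2Φ(−|d|/2); the function name `overlap_from_d` is hypothetical.

```python
from statistics import NormalDist

SD = 0.80  # common standard deviation used for the simulated data


def overlap_from_d(d):
    """Overlapping coefficient of two equal-variance normal curves d SDs apart."""
    return 2 * NormalDist().cdf(-abs(d) / 2)


for d in [0, 0.20, 0.40, 0.50, 0.60, 0.80, 1, 1.5, 2]:
    # Cross-check the closed form against the standard library's overlap method.
    library_value = NormalDist(0, SD).overlap(NormalDist(d * SD, SD))
    assert abs(overlap_from_d(d) - library_value) < 1e-9
    print(f"d = {d:<4} overlap = {overlap_from_d(d):.0%}")
```

Under this assumption, d = 0.50 gives an overlap of about 80%, in line with the value stated above, so the mean estimate of 78% in the superimposed normal distribution condition is off by roughly two percentage points.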
Figure 2.5. Estimated similarities for plots displaying data for various Cohen's ds.
Finally, all participants were asked to rank which out of three possibilities was the clearest and most informative way to present scientific findings. A within-subject ANOVA was significant, F(2, 254) = 342.54, p < .001, ηp² = .73, with the option describing the PCS measure ranked as the clearest and most informative way to present the findings (M = 1.47, SD = 0.59), followed by the overlapping coefficient (M = 1.69, SD = 0.55), and Cohen’s d (M = 2.84, SD = 0.52). Pairwise comparisons revealed that all three options differed significantly from each other at p < .001.
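The effect size accompanying the ANOVA above can be cross-checked from the F ratio and its degrees of freedom. A minimal sketch, assuming the reported effect size of .73 is a partial eta squared, computed as (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to the effect sum of squares
    return ss_ratio / (ss_ratio + df_error)


# Values reported for the ranking ANOVA: F(2, 254) = 342.54
print(round(partial_eta_squared(342.54, 2, 254), 2))  # prints 0.73
```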
General Discussion
In all three studies, there was evidence for the view that similarities between categories should be reported, alongside differences: (1) similarities are typically stronger than differences; and (2) reporting similarities as well as differences results in more accurate interpretations of research findings. As described below, the implications of these results are wide-ranging.
Implications. My focal argument is not that differences should not be reported; rather, my point is that simultaneously reporting similarities helps readers to interpret differences more appropriately. Consider an example taken directly from the mean result of the between-country comparisons: A researcher comparing two countries could report that the mean difference between them had an effect size of d = .39, but with 84% common responses and only an 11% difference in the scale use. Reporting the results in this way avoids the tendency to over-simplify by focusing on differences. This latter focus is one way in which psychology may inadvertently steer people into regarding differences between groups as entrenched. Reporting observed differences between groups in the context of the fact that the groups in question are more similar to each other than they are different provides one way of counteracting racism and xenophobia. Emphasizing similarities may also offer a way to bridge the gap between people with different levels of education (Spruyt & Kuppens, 2014), by making it more evident that members of the outgroup are more similar to the ingroup than they are sometimes thought to be.
Furthermore, the presentation of similarity information is useful even when differences are reliable but small. Although people may more easily infer from small differences that similarity is potentially high, this inference is misleading if it is not framed concretely. Discrimination on grounds of ethnicity or gender is a case in point.
For example, if people belonging to a specific group (e.g., ethnic minorities or women) earn less than those in other groups, such differences matter a great deal, even if the similarities are large. The measures of similarities discussed here would enable researchers to frame such differences more concretely. For example, stating that the (small) gender wage difference in a specific company is d = .24 is harder to grasp than stating that 90% of men and women employees share the same salary, but that men have higher salaries in the remaining cases. Furthermore, this conceptualization fosters greater realization that the key problem is to understand when these salaries differ and why, rather than encouraging absolutist, reductionist statements.
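To illustrate how a difference and a similarity could be reported side by side from raw data, the sketch below computes Cohen's d together with the overlap of normal curves fitted to each group. It is an illustration under stated assumptions, not the procedure used in this chapter: the salary data are simulated (a true difference of d = .24 is built in), the standard deviations are pooled with equal weights, and the helper name `describe_difference` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def describe_difference(group_a, group_b):
    """Return Cohen's d and the overlap of normal curves fitted to two samples."""
    sd_a, sd_b = stdev(group_a), stdev(group_b)
    pooled_sd = sqrt((sd_a**2 + sd_b**2) / 2)  # assumption: equal-weight pooling
    d = (mean(group_b) - mean(group_a)) / pooled_sd
    overlap = NormalDist(mean(group_a), sd_a).overlap(NormalDist(mean(group_b), sd_b))
    return d, overlap


# Simulated, purely illustrative salaries (arbitrary units); not real data.
rng = random.Random(7)
women = [rng.gauss(50.0, 10.0) for _ in range(5000)]
men = [rng.gauss(52.4, 10.0) for _ in range(5000)]  # population d = 2.4 / 10 = .24

d, overlap = describe_difference(women, men)
print(f"d = {d:.2f}, overlap = {overlap:.0%}")
```

Printing both numbers together makes the point of the paragraph above concrete: a small but reliable difference coexists with a large overlap between the two distributions.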
It should be noted that my proposal is descriptive. Null hypothesis significance testing and Bayesian statistics are complementary to my approach, and could be used to answer questions such as, “Is the degree of similarity between experimental group 1 and the control group larger than that between experimental group 2 and the control group?” When comparing two groups, larger p-values and smaller Bayes factors imply, ceteris paribus, greater similarity, but neither index directly expresses similarity. In addition to reporting differences, researchers should report measures of similarity to fully reflect the nature of the effects in question, rather than encouraging absolutist, reductionist statements about the problem. Thus, I am not arguing that small effects are meaningless (Prentice & Miller, 1992), but rather that researchers should always report such differences in the context of the much larger similarities between the categories, thereby helping others to interpret the data more accurately (Study 3).
A stronger focus on similarities in published research should help to reduce the “file-drawer” problem (Rosenthal, 1979) because both statistically large and statistically small differences between groups are potentially more interesting against the backdrop of similarity information. Reporting similarities helps to complement specific tests of differences with a broader descriptive analysis. This practice would make the documentation of similarities between some groups and variables an interesting exercise in its own right, which may also increase the number of variables that are used to compare groups. At the moment, variables are often chosen on the basis of their perceived likelihood of revealing differences, potentially biasing the literature before any data are collected (Fiedler, 2011).
Conclusion. This chapter has presented alternative ways to describe quantitative data comparisons between groups. In the course of making more than 225,000 pairwise comparisons, I found that similarities between two groups generally far outweigh the differences between them. At the same time, some interesting exceptions to this pattern occur (e.g., for personal sexual-moral behaviours). Routinely reporting the extent of similarities in the presentation of results offers a more balanced way to describe research findings.