The reproducibility of published scientific studies is a pressing concern, especially given the influence these studies may have on policy, medical treatment, and other real-world problems. In a Nature survey of 1,576 researchers on reproducibility in research, nearly 90% advocated better statistics and more robust experimental design (Baker 2016a). One reason for this crisis is the misuse of null-hypothesis statistical testing methods, which can threaten statistical validity (Leek & Peng 2015). This discussion has surfaced at CHI, for example, in the form of criticism of ignored effect sizes (Kaptein & Robertson 2012). However, awareness of these issues has not necessarily translated into better practice. Switching to alternative approaches (such as Bayesian methods) has been proposed, yet the vast majority of scientific research still depends on traditional null-hypothesis statistical testing and related methods (which we will refer to as NHST throughout this article). In the meantime, it is important to ensure that NHST, when used, is both valid (applied correctly) and validatable (reported correctly) in published articles. In addition, reporting of assumptions is relevant not only in the NHST context but also in alternative analytic settings, such as non-parametric or Bayesian statistical tests (e.g., Zimmerman 1998; Gelman & Shalizi 2015).

Part II, Papers: Assessing Statistical Assumption Reporting in CHI and Other Fields

The statistical validity of an analysis depends on a large set of factors, such as power analysis, experiment design, and representativeness of the sample taken. One very prominent factor is the need to choose the appropriate method for a given analysis setting. Often, the properties of the data and the experiment require a specific approach. For example, to assess a mean difference between two conditions using a t-test (a commonly used statistical method), one first needs to ensure that the data follow a normal distribution, either directly for small samples or asymptotically (Open Science Collaboration 2015; Fiske 2016). Depending on the degree of the violation, the validity of the results may be undermined, which may reduce the reproducibility or generalizability of the research (e.g., Glass et al. 2012). Some analyses can still be correct even if their assumptions are violated, but the violation needs to be discussed and justified.

Thus, even if some methods are robust to violations of their assumptions in certain cases, it is essential to report these violations so that their impact on the analysis can be assessed. A lack of assumption reporting in an article could mean that some assumptions were never tested, which raises questions about the validity of the results obtained with these statistical methods. It could also mean that the statistical assumptions were tested but are not described. Both conditions contribute to the status quo, in which reviewers are asked to trust authors almost blindly on their data analysis, a culture that stands in contrast to the scientific method.
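To make the idea of checking and reporting an assumption concrete, the following is a minimal sketch in Python using only the standard library. It is not the procedure used in this paper. It computes sample skewness and excess kurtosis as a crude descriptive check of normality (a formal test such as Shapiro-Wilk would normally be preferred) and produces a one-line statement that could accompany a t-test. The function names, the example data, and the cutoff `limit=1.0` are illustrative assumptions.

```python
import statistics


def skewness_and_excess_kurtosis(sample):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normality_report(name, sample, limit=1.0):
    """Build a one-line report of a descriptive normality check.

    `limit` is an illustrative rule-of-thumb cutoff (an assumption of this
    sketch), not an established standard.
    """
    skew, kurt = skewness_and_excess_kurtosis(sample)
    plausible = abs(skew) <= limit and abs(kurt) <= limit
    verdict = (
        "no strong departure from normality"
        if plausible
        else "normality is doubtful"
    )
    return (
        f"{name}: n={len(sample)}, skewness={skew:.2f}, "
        f"excess kurtosis={kurt:.2f}; {verdict}"
    )


if __name__ == "__main__":
    # Hypothetical task-completion times for two conditions.
    condition_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4]
    condition_b = [10.2, 10.5, 10.1, 10.3, 18.9, 10.4, 10.0, 10.6]
    print(normality_report("Condition A", condition_a))
    print(normality_report("Condition B", condition_b))
```

The point of the sketch is the reporting step: whatever check is used, stating its outcome alongside the test result lets readers and reviewers judge the impact of a possible violation.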

A focus on assumption reporting naturally concerns itself with the products of the scientific research process (published papers). As we will show, the process currently does not emphasize assumption reporting. Assumptions are not emphasized in reviewing and authoring standards. The lack of attention to the issue of assumption reporting is perhaps further influenced by limited reviewer time and lack of a consistent, easy way of measuring whether statistical assumptions are reported.

This paper outlines a solution to these problems. Our contributions, in order of presentation, include:

• A review of standards on reporting statistical assumptions (along with a discussion of common statistical flaws and related issues). This includes both a literature review and an analysis of the materials provided to reviewers by journals in our sample.

• A scalable method for testing whether publications meet basic assumption reporting standards. The method relies on an extensible rule base to determine which assumptions are expected to be reported for the statistical methods applied, and on crowdsourcing to ascertain whether those assumptions are reported in a paper. Our approach yields an F1-score of 83%. We used this approach to analyze over 600 papers in the analyses described next.

• A comparison of statistical assumption reporting of CHI with five top journals from different fields (medicine, psychology and management). To do so, we empirically determined the most frequently used statistical methods in these journals (i.e., ANOVA, linear regression, logistic regression, Chi-square test, and t-test). Our results indicate that across the disciplines, on average, three out of four papers lack any reporting of the statistical assumptions or a discussion thereof. In addition, we find that regardless of the field, only very few assumptions (less than 6%) are reported for these frequently used methods.

• An analysis of the reporting of statistical assumptions at CHI over time, for which we sampled a total of 261 papers over 25 years (from 1989 to 2016). Our data indicate that of the CHI papers that use either the t-test or ANOVA, most (87%) do not report any assumptions. Additionally, we find evidence of only a very mild improvement over time in the fraction of papers reporting at least one assumption.

Supported by these findings, we argue that a discussion of how to develop a common ground on assumption reporting is warranted.
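The rule-base idea from the second contribution, together with the F1-score used to evaluate it, can be illustrated with a short Python sketch. This is not the paper's implementation. The assumption lists below are common textbook examples chosen for illustration, and the names `ASSUMPTION_RULES`, `expected_assumptions`, `missing_assumptions`, and `f1_score` are hypothetical.

```python
# Hypothetical rule base: maps a statistical method to the assumptions a
# paper would be expected to report. The entries are illustrative textbook
# examples, not the rule base used in the paper.
ASSUMPTION_RULES = {
    "t-test": {"normality", "homogeneity of variance", "independence"},
    "ANOVA": {"normality", "homogeneity of variance", "independence"},
    "linear regression": {
        "linearity",
        "normality of residuals",
        "homoscedasticity",
        "independence",
    },
}


def expected_assumptions(methods):
    """Union of the assumptions expected for all methods used in a paper."""
    expected = set()
    for method in methods:
        expected |= ASSUMPTION_RULES.get(method, set())
    return expected


def missing_assumptions(methods, reported):
    """Assumptions expected by the rule base but not reported in the paper."""
    return expected_assumptions(methods) - set(reported)


def f1_score(predicted, actual):
    """F1 of binary labels, where True means 'assumption is reported'."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A hypothetical paper that uses a t-test and reports only normality.
    print(sorted(missing_assumptions(["t-test"], ["normality"])))
    # Hypothetical crowd labels compared against expert labels.
    print(f1_score([True, True, False, False], [True, False, True, False]))
```

Keeping the rules in a plain mapping is what makes such a rule base extensible: supporting another method only requires adding an entry, while the comparison logic stays unchanged.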
