For both test versions, the research variables had to be created with Structured Query Language (SQL) from the raw data stored in the test applications’ relational database. Through this process, the raw data, including transactions, choices and responses, was converted from the database into a two-dimensional research data matrix of variables and values, which enabled processing with a statistical programming language (Python) and statistical software (SPSS). The preparation phase also included the analysis of missing data. Examining missing values is important for several reasons (Kwak & Kim 2017, 407): missing data reduces the available data, compromises the statistical power of the study, and undermines the reliability of its results by causing significant bias and degrading the efficiency of the data. Emphasis was put on preventing missing data in advance, and the missing values that still passed this sieve were carefully examined.
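The conversion from raw database rows to a two-dimensional data matrix can be illustrated with a minimal sketch. The table name `responses`, its columns and the item names `q1` and `q2` are assumptions made for illustration only; the actual schema of the test applications is not described here.

```python
import sqlite3

# Hypothetical long-format table: one row per participant and item.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (participant_id INTEGER, item TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, "q1", 3.0), (1, "q2", 4.0), (2, "q1", 5.0)],
)

# Conditional aggregation pivots the rows into one row per participant
# and one column per item; an unanswered item becomes NULL (None in Python).
PIVOT_SQL = """
SELECT participant_id,
       MAX(CASE WHEN item = 'q1' THEN value END) AS q1,
       MAX(CASE WHEN item = 'q2' THEN value END) AS q2
FROM responses
GROUP BY participant_id
ORDER BY participant_id
"""

cursor = conn.execute(PIVOT_SQL)
columns = [description[0] for description in cursor.description]
data_matrix = cursor.fetchall()
```

Each tuple in `data_matrix` is one participant, and each position corresponds to a name in `columns`, which is the shape that statistical tools typically expect.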

Missing data causes two types of problems: bias and error. While bias causes external validity problems, error causes defects in hypothesis testing. (Newman 2014, 377–378.) According to Roderick Little and Donald Rubin (1990, 294), it is common in the social sciences to use imputation, weighting and direct analysis of the incomplete data. Imputation may sound attractive, but it has serious pitfalls and should be used only with caution. Weighting, by contrast, is applicable only with monotone patterns of missing values, as it ignores the missing cases and gives each of the remaining cases a new weight to compensate for them. (Little & Rubin 1990, 294–296.) Newman (2014, 387) recommends that, in the case of construct-level missingness, missing values be imputed using maximum likelihood or multiple imputation if 10% or more of the sample is made up of construct-level partial respondents.

The test application was designed to prevent missing values. Therefore, the test phases, items and sub-tasks were mandatory (an empty value prevented proceeding). If a participant dropped out before the test ended, their data was not saved to the research database, which automatically blocked the collection of incomplete response sets. In some schools, however, outdated web browsers that did not support input validation were still in use. As a result, some missing values in survey answers entered the data despite the efforts made to prevent them. In this case, the missingness is a construct-level issue, as the missingness of the values is not associated with observed values. However, it does depend on other missing values, and the missingness is not random, as the missing values concentrate on certain respondents (see e.g., Newman 2014, 375). Missingness was not confined to specific schools, as the old browser versions were still in use in a large number of Finnish schools during data collection. Nonetheless, in the end, fewer than 1% of the participants had missing values. Thus, given the large sample size and the small proportion of missing data, the missing values in the usage habit survey responses were left untreated. By contrast, if background information (such as gender, age or education) was missing, the participant was excluded from the data.
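The screening rule described above can be sketched as follows. The record layout (a dictionary with background fields and a nested `survey` dictionary) and the function name `screen_missing` are assumptions for illustration; the share is computed here among retained participants, which is only one possible reading of the reported proportion.

```python
BACKGROUND_FIELDS = ("gender", "age", "education")

def screen_missing(participants):
    """Drop participants with missing background data; leave survey gaps untreated."""
    # A participant is kept only if every background field has a value.
    kept = [
        p for p in participants
        if all(p.get(field) is not None for field in BACKGROUND_FIELDS)
    ]
    # Survey answers may still contain gaps; count the participants affected.
    with_gaps = [p for p in kept if any(v is None for v in p["survey"].values())]
    share_with_gaps = len(with_gaps) / len(kept) if kept else 0.0
    return kept, share_with_gaps

# Invented example records, not study data.
sample = [
    {"id": 1, "gender": "f", "age": 15, "education": "upper secondary",
     "survey": {"s1": 2, "s2": 4}},
    {"id": 2, "gender": "m", "age": 16, "education": "vocational",
     "survey": {"s1": None, "s2": 3}},
    {"id": 3, "gender": "f", "age": None, "education": "upper secondary",
     "survey": {"s1": 1, "s2": 5}},
]
kept, share_with_gaps = screen_missing(sample)
```

In this invented sample, participant 3 is excluded because the age is missing, while participant 2 is retained with the survey gap left as it is.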

Another preparatory analysis concerned outliers. Outliers are extreme or incorrect values that lie outside the overall distribution or pattern of variables (Gordon 2015, 422). Outliers can significantly influence statistical evaluation (for example, by distorting the mean and standard deviation of a sample), resulting in either overestimation or underestimation of the values. (Kwak & Kim 2017, 407.) Traditional regression models, in particular, have been said to be sensitive to outliers (e.g., Huang & Tzeng 2008, 14). Outliers may originate in data errors caused by faults in data entry or management, or they may be correct values that suggest the need for subgroup analysis or demonstrate the inapplicability of the applied methods. (Gordon 2015, 422–424.) According to Kwak and Kim (2017, 410), there are three methods for treating outliers: trimming (i.e., excluding), winsorisation (i.e., modifying) and robust estimation.
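The difference between the three treatments can be shown with a small sketch. The cut-off values and the sample below are invented for illustration, and the median is used only as one simple example of a robust estimator; none of this is taken from the study itself.

```python
import statistics

def trim(values, lower, upper):
    """Trimming: exclude every value outside the [lower, upper] bounds."""
    return [v for v in values if lower <= v <= upper]

def winsorise(values, lower, upper):
    """Winsorisation: pull values outside the bounds back to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

raw = [2, 3, 4, 5, 40]          # 40 is an obvious outlier
trimmed = trim(raw, 2, 5)       # the outlier is removed
winsorised = winsorise(raw, 2, 5)  # the outlier is replaced by the upper bound

mean_raw = statistics.mean(raw)
mean_trimmed = statistics.mean(trimmed)
mean_winsorised = statistics.mean(winsorised)
# Robust estimation keeps all values but uses a statistic that resists outliers.
median_raw = statistics.median(raw)
```

Trimming shortens the sample, winsorisation keeps its size but alters the extreme value, and the median leaves the data unchanged while remaining largely unaffected by the outlier.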

Before the actual analyses, the outlier values in the respondents' background information were examined, and respondents who did not belong to the target group or had deliberately misused the test application were removed from the data. Two criteria led to removal. Firstly, a respondent was excluded if the time used was less than 6 minutes in the original test or less than 9 minutes in the renewed test, as such a short execution time indicated giving up or messing with the test system. Secondly, a respondent was excluded if their age was under 12 or over 22: the lower values were interpreted as mistakes or misleading actions, and the higher values were removed because of their rareness. Since the intention was to analyse the data with regression analysis, all the variables included in the regression analyses were first standardised with min-max normalisation (e.g., Suarez-Alvarez, Pham, Prostov & Prostov 2012) to range between 0 and 1. The influence of outlying values was examined during the analysis, for example, by examining regression residuals and the possible influence of rare observations on particular results.
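The two exclusion criteria and the min-max normalisation, x' = (x - min) / (max - min), can be sketched as below. The function names, the version labels `"original"` and `"renewed"`, and the handling of a constant variable are assumptions made for illustration; the thresholds are the ones stated in the text.

```python
# Minimum completion time in minutes per test version, as stated in the text.
MIN_MINUTES = {"original": 6, "renewed": 9}
# Accepted age range, inclusive at both ends.
MIN_AGE, MAX_AGE = 12, 22

def is_eligible(version, minutes, age):
    """Return True if the respondent passes both exclusion criteria."""
    fast_enough = minutes >= MIN_MINUTES[version]
    in_age_range = MIN_AGE <= age <= MAX_AGE
    return fast_enough and in_age_range

def min_max(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant variable carries no spread, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

scaled_ages = min_max([12, 17, 22])
```

Because min-max normalisation depends on the observed minimum and maximum, a single extreme value compresses all other values towards one end of the range, which is one reason to screen outliers before rescaling.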