
Of the participants, 26% had missing information for at least one variable in the dataset. Rather than removing these participants, attempts were made to impute the missing information.

The level of missing information for each variable is presented in Table 4.8. High levels of information were missing for the lung conditions (pneumonia, COPD and emphysema), as well as for asbestos exposure, ethnicity, education and BMI. Finally, there was one missing entry for age, and 37 participants did not provide ETS information.

Variable          Missing   Observed
Age Registered          1      2,743
Ethnicity             108      2,636
Education             116      2,628
BMI                   123      2,619
ETS                    37      2,707
Asbestos              214      2,530
Pneumonia             215      2,529
COPD                  249      2,495
Emphysema             240      2,504
Eczema                 95      2,649

Table 4.8: MSH-PMH Dataset: Identifying the Missing Information
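The counts in Table 4.8 and the 26% figure come from two simple tallies: missing versus observed values per variable, and the share of participants missing at least one variable. The Python sketch below illustrates both on toy records, not MSH-PMH data. The record layout and all function names are hypothetical.

The threshold check reads the 10% rule from the chapter summary as applying to the share of participants with any missing value. That reading is an assumption.

```python
from typing import Optional

# Hypothetical record layout: one dict per participant, None marks a missing value.
Record = dict[str, Optional[float]]


def missingness_table(records: list[Record], variables: list[str]) -> dict[str, tuple[int, int]]:
    """Return {variable: (n_missing, n_observed)}, as laid out in Table 4.8."""
    table = {}
    for var in variables:
        n_missing = sum(1 for r in records if r.get(var) is None)
        table[var] = (n_missing, len(records) - n_missing)
    return table


def share_with_any_missing(records: list[Record], variables: list[str]) -> float:
    """Proportion of participants missing at least one of the listed variables."""
    n_any = sum(1 for r in records if any(r.get(v) is None for v in variables))
    return n_any / len(records)


def exceeds_threshold(share: float, threshold: float = 0.10) -> bool:
    """Assumed reading of the 10% rule: multiple imputation when the share exceeds it."""
    return share > threshold


if __name__ == "__main__":
    # Toy records for illustration only; these are not MSH-PMH values.
    toy = [
        {"AgeRegistered": 61.0, "BMI": 27.1, "ETS": 1.0},
        {"AgeRegistered": 58.0, "BMI": None, "ETS": 0.0},
        {"AgeRegistered": 70.0, "BMI": 24.3, "ETS": None},
        {"AgeRegistered": 66.0, "BMI": 29.8, "ETS": 1.0},
    ]
    cols = ["AgeRegistered", "BMI", "ETS"]
    for var, (n_miss, n_obs) in missingness_table(toy, cols).items():
        print(f"{var:<15}{n_miss:>8}{n_obs:>10}")
    share = share_with_any_missing(toy, cols)
    print(f"Any missing: {share:.0%}, exceeds 10%: {exceeds_threshold(share)}")
```

In the toy data, two of four participants lack a value, so the share is 50% and the sketch flags the data for imputation.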

As the MSH-PMH study recruited participants in a hospital setting, it is unlikely that information was purposely withheld. Additionally, given the small level of missing information, the imputation should be robust whether the data are MAR or MNAR. For every variable that required imputation, except age with only one missing entry, t-tests were performed to evaluate whether the missingness of the data was dependent on the observed values of another variable. The t-test results are presented in Table 4.9. All the variables were shown to be MAR rather than MCAR, as the missingness of the data was dependent on the observed values of other variables.

Variable One (observed)   Variable Two (missing)   t-statistic
Age                       Ethnicity                       3.41
Age                       Education                      -5.01
Age                       BMI                            -4.31
CPD                       ETS                            -3.38
Age                       Asbestos                       -5.99
COPD                      Pneumonia                      -9.36
Emphysema                 COPD                           -9.44
COPD                      Emphysema                     -17.01

Table 4.9: T-Test Results by Missing Information in the MSH-PMH Dataset for Variables considered in Imputation
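Each row of Table 4.9 compares the observed values of Variable One between two groups: participants whose Variable Two is present and participants whose Variable Two is missing. A large |t| is evidence that the missingness is not completely at random.

The Python sketch below shows this comparison with a pooled-variance Student t statistic. It uses hypothetical function names and toy data. The sign depends on which group is subtracted from which; here it is the observed group minus the missing group. The exact test options used in the thesis are not stated.

```python
import math
from statistics import mean, variance
from typing import Optional


def pooled_t_statistic(group_a: list[float], group_b: list[float]) -> float:
    """Student's two-sample t statistic with pooled variance: mean(a) - mean(b)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def missingness_t_test(values: list[Optional[float]], other: list[Optional[float]]) -> float:
    """Compare `values` between participants with `other` observed vs missing.

    Only rows where `values` itself is observed are used. A large |t| suggests
    the missingness of `other` depends on `values` (evidence against MCAR).
    """
    other_observed = [v for v, o in zip(values, other) if v is not None and o is not None]
    other_missing = [v for v, o in zip(values, other) if v is not None and o is None]
    return pooled_t_statistic(other_observed, other_missing)


if __name__ == "__main__":
    # Toy data: older participants are more likely to lack an ethnicity entry.
    age = [50.0, 52.0, 54.0, 70.0, 72.0, 74.0]
    ethnicity = [1.0, 2.0, 1.0, None, None, None]
    print(round(missingness_t_test(age, ethnicity), 2))  # prints -12.25
```

With samples of around 2,700 participants, |t| above roughly 1.96 corresponds to p < 0.05 (two-sided).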

The next stage was to conduct the imputation and assess whether all missing information was successfully imputed. The following Stata code was used for the imputation:

* Set the imputation
mi set wide

* Register the variables for imputation
mi register imputed AgeRegistered EthnicityScale PLCOEducationScale BMI ETS ///
    Asbestos Pneumonia COPD Emphysema Eczema CPD Gender PMT LCC

* Impute the missing information
mi impute chained (pmm) AgeRegistered BMI CPD (logit, augment) ETS Asbestos ///
    Pneumonia COPD Emphysema Eczema Gender PMT LCC (ologit, augment) ///
    EthnicityScale PLCOEducationScale, add(11) rseed(19) dots
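The continuous variables above are imputed with predictive mean matching (pmm). The method fits a regression on the observed cases and finds donors whose predicted means are close to that of the missing case. It then copies a donor's actually observed value.

The Python sketch below is a simplified, hypothetical single-predictor version. It does not draw the regression coefficients from their posterior as Stata's pmm does for each imputation, so it yields one imputation rather than a proper multiple imputation. It also omits the logit and ologit steps. The `seed` parameter only mirrors the idea of `rseed(19)` and does not reproduce Stata's random numbers.

```python
import random
from statistics import mean
from typing import Optional


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def pmm_impute(y: list[Optional[float]], x: list[float], k: int = 1, seed: int = 19) -> list[float]:
    """Fill missing y with the observed y of a donor whose predicted mean is close."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([p[0] for p in observed], [p[1] for p in observed])
    # Each donor is (predicted mean, actually observed value).
    donors = [(intercept + slope * xi, yi) for xi, yi in observed]
    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = intercept + slope * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Borrow an observed value from one of the k nearest donors, never the prediction itself.
        completed.append(rng.choice(nearest)[1])
    return completed


if __name__ == "__main__":
    x = [1.0, 2.0, 3.4, 4.0, 5.0, 6.0]
    y = [10.0, 20.0, None, 40.0, 50.0, None]
    print(pmm_impute(y, x))  # prints [10.0, 20.0, 40.0, 40.0, 50.0, 50.0]
```

Because every imputed value is copied from an observed case, pmm cannot produce values outside the range seen in the data. This property is one reason it tends to give plausible imputations.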

The imputation ran successfully and all missing information was imputed. A review of the new values did not identify any unexpected values, so the estimated values were considered reasonable. These values will be used to apply the prediction models to the dataset.
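A review like the one described above can be partly automated: first confirm that no values remain missing, then flag imputed values that fall outside the observed range. The Python sketch below is hypothetical. The range check is only one possible definition of an "unexpected value", and the thesis does not state which checks were applied.

```python
from typing import Optional


def review_imputation(original: list[Optional[float]], completed: list[Optional[float]]) -> list[int]:
    """Return indices of imputed values lying outside the observed range.

    Raises ValueError if any value is still missing after imputation.
    """
    if len(original) != len(completed):
        raise ValueError("original and completed data differ in length")
    if any(v is None for v in completed):
        raise ValueError("missing values remain after imputation")
    observed = [v for v in original if v is not None]
    low, high = min(observed), max(observed)
    # Only positions that were missing originally are checked.
    return [
        i for i, (orig, new) in enumerate(zip(original, completed))
        if orig is None and not (low <= new <= high)
    ]


if __name__ == "__main__":
    bmi = [27.1, None, 24.3, 29.8]  # toy values, not study data
    print(review_imputation(bmi, [27.1, 26.0, 24.3, 29.8]))  # prints []
    print(review_imputation(bmi, [27.1, 61.0, 24.3, 29.8]))  # prints [1]
```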

4.12 Summary

Ten datasets were collected from ILCCO. These could be applied to multiple lung cancer prediction models, although not every model was applicable to every dataset. The datasets were prepared so that the variable information was complete for each applicable model.

MI was used to impute missing information where the level of missing information was large (exceeding 10%). This was done when the data were shown to be MAR rather than MCAR, as a complete case analysis may then lead to biased results. Attempts were made to demonstrate that the missingness of the data was MAR rather than MNAR; however, for small levels of missing information the imputations should be robust. Two imputations were conducted using MICE, in the ReSoLuCENT and MSH-PMH datasets.

The datasets are now fully prepared for the models to be applied and validated. The next stage is to review the datasets, their population demographics and their participant recruitment strategies.

CHAPTER 5

Dataset Descriptive Analysis

5.1 Introduction

This chapter presents the characteristics of the ten studies that will be used in the external validation. Each dataset was collected for a different study objective, which may result in some unusual study populations. The chapter provides a descriptive analysis that highlights the similarities, differences, strengths and limitations of the available datasets by reviewing how participants were recruited for each study and the population demographics. This will indicate any models that could perform poorly in a dataset because of distinct dataset characteristics. This follows the TRIPOD guidelines on reporting the influences and limitations of the datasets used in a validation study, so that the validation results can be interpreted with confidence [46].

5.2 Objectives

To perform a dataset descriptive analysis, the chapter will:

1. Present how participants were recruited for each study.
2. Present the population demographics for every variable required by the models in each dataset.
3. Discuss the dataset collection and population demographics to assess whether these could influence the performance of any model.