Logistic regression is typically used for classification or regression problems involving multiple categorical, binary or continuous predictor variables and a binary outcome (dependent variable) such as development of a disease. This is the model used for the original Framingham risk function presented in 1976 by Kannel, McGee and Gordon (6). In such cases, the outcome (e.g. development of CVD) is not continuous and Normally distributed, which is a requirement of linear regression analysis.

In logistic regression, the logarithm of the odds of a positive outcome, p/(1 − p) (the ‘logit’, written here as the log (odds ratio)), is modelled instead of the probability p itself. This avoids deriving meaningless probability values greater than 1.0 or less than zero (61). The other advantage of the transform is that the logit takes values from -∞ to +∞, allowing confidence intervals to be derived around an estimated value within this range. The logistic regression equation can then take a form similar to a multiple linear regression function, with the dependent variable (the logit) equal to the sum of an intercept (constant) and a number of predictor variables, each multiplied by its regression co-efficient:

log (odds ratio) = β0 + β1X1 + β2X2 + β3X3 + … [Equation 1]

where β0 is a constant and β1, β2, β3, … are the regression co-efficients for each risk factor X1, X2, X3, etc.

Fitting the equation to the data involves maximum likelihood techniques to derive the optimal intercept and co-efficient values.
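As an illustration of the maximum likelihood step, the following is a minimal sketch for a single predictor, using Newton-Raphson iteration on the log-likelihood. The function names, the synthetic data and the "true" values of -1.0 and 0.8 are assumptions made for the example; they are not taken from any Framingham publication, and statistical packages use more robust implementations.

```python
import math
import random

def sigmoid(z):
    """Inverse of the logit: maps a logit value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, n_iter=25):
    """Newton-Raphson maximum likelihood fit of logit = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Score (gradient of the log-likelihood)
            g0 += y - p
            g1 += (y - p) * x
            # Information matrix (negative Hessian)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system: information * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data generated from hypothetical values b0 = -1.0, b1 = 0.8
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1.0 if random.random() < sigmoid(-1.0 + 0.8 * x) else 0.0 for x in xs]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(b0_hat, b1_hat)
```

With a reasonably large sample the estimates should lie close to the values used to generate the data, which is the sense in which the fitted intercept and co-efficient are "optimal".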

The relationship between risk factor values and the outcome is non-linear, but the log (odds ratio) is a linear function of the co-efficient values (Equation 1). Each risk factor (X1, X2, X3, etc.) makes an independent contribution to the outcome. The proportion of overall risk attributable to each risk factor is estimable. The logit can be transformed back to produce a probability value p for a positive outcome:

p = 1 / (1 + exp(-(β0 + β1X1 + β2X2 + β3X3 + …))) [Equation 2]
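The back-transformation in Equation 2 can be sketched in a few lines. The intercept, co-efficients and risk factor values below are hypothetical placeholders chosen only to show the arithmetic; they are not published Framingham values.

```python
import math

def logit_to_probability(intercept, coefficients, risk_factors):
    """Evaluate Equation 1 for the logit, then Equation 2 for the probability p."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, risk_factors))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical example: two risk factors with made-up co-efficients
p = logit_to_probability(-5.0, [0.06, 0.02], [60.0, 140.0])
print(round(p, 3))
```

Whatever values the logit takes between -∞ and +∞, the result always lies strictly between 0 and 1, which is the property that motivates the transform.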

In survival analysis (where the outcome of interest is the time to death or development of some other end point) the Cox proportional hazards model is appropriate. This uses the hazard ratio (HR) in place of the odds ratio. The HR is the ratio of the hazard of developing the disease in the presence of one or more risk factors to the hazard in a comparator population with zero or baseline risk factor values (61). The outcome of the risk function is the log of the hazard ratio (rather than the log of the odds ratio).
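For comparison with Equation 1, the usual textbook form of the Cox model can be written as below. The symbols h(t | X) for the hazard at time t given risk factors X, and h_0(t) for the baseline hazard, are notation assumed here and do not appear in the original text.

```latex
h(t \mid X) = h_0(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots)
\quad\Longrightarrow\quad
\log\frac{h(t \mid X)}{h_0(t)} = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots
```

In this form there is no separate intercept β0, because the baseline level of risk is carried by h_0(t).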

Whilst Cox regression introduces a continuous dimension (the timescale at which the hazard ratio may be measured), the hazard still relates to binary outcome events. The Cox model includes an assumption that the hazard ratio is constant over time, even though the hazard itself may be rising or falling with time. An individual who is twice as likely to develop the disease as another individual after (say) five years remains twice as likely after ten years, even though the hazard for both may have increased. The probability distribution of the baseline survival function does not need to be specified if the constant hazard ratio assumption is valid. Cox proportional hazards was introduced into Framingham risk modelling after the original logistic regression model, to recognise the importance of the time dimension in CVD risk, and was used by Anderson et al in a paper published in Circulation in 1991 (12).
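The constant hazard ratio assumption can be illustrated with a short sketch. The baseline hazard function below is a made-up rising line used only for demonstration (the Cox model itself leaves the baseline unspecified), and the two individuals are hypothetical.

```python
import math

def baseline_hazard(t_years):
    """Hypothetical baseline hazard that rises with time (illustration only)."""
    return 0.01 + 0.002 * t_years

def hazard(t_years, log_hazard_ratio):
    """Proportional hazards: baseline hazard scaled by exp(log hazard ratio)."""
    return baseline_hazard(t_years) * math.exp(log_hazard_ratio)

log_hr_a = math.log(2.0)  # individual A: twice the baseline hazard
log_hr_b = 0.0            # individual B: baseline risk factor values

ratio_at_5 = hazard(5.0, log_hr_a) / hazard(5.0, log_hr_b)
ratio_at_10 = hazard(10.0, log_hr_a) / hazard(10.0, log_hr_b)
print(ratio_at_5, ratio_at_10)
```

Both individuals have a higher hazard at ten years than at five, but the ratio between them is unchanged because the time-dependent baseline cancels.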

The Weibull model is appropriate for degenerative processes (both in medicine and engineering) where functioning components of a system tend to ‘wear out’ over time. For those at risk of a cardiovascular event, the hazard increases over time (although the hazard ratio may still in principle remain constant). In this later paper, Anderson et al claimed superiority of the new algorithm over both the logistic regression and Cox proportional hazards precursors, and this model became the basis for the most widely used Framingham algorithm. The co-efficients from this paper were used in the programming of the e-Nudge algorithm described later in this thesis.
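A hazard that rises over time in the ‘wear out’ sense can be sketched with the standard two-parameter Weibull distribution. The shape and scale values below are hypothetical, and this is the generic textbook form rather than the specific parameterisation used by Anderson et al.

```python
import math

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def weibull_event_probability(t, shape, scale):
    """Probability of an event by time t: 1 - S(t), with S(t) = exp(-(t/scale)**shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

shape, scale = 1.5, 40.0  # hypothetical; shape > 1 gives a rising hazard

for t in (5.0, 10.0, 20.0):
    print(t, weibull_hazard(t, shape, scale), weibull_event_probability(t, shape, scale))
```

A shape parameter above 1 gives a hazard that increases with time, a value of exactly 1 gives a constant hazard, and a value below 1 gives a falling hazard.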

In the regression models described so far, interactions between risk factors are assumed to have a relatively minor influence on outcomes, but can be built in if expected to be important. For instance, in the Anderson equation (8), interactions between age and female gender, and between left ventricular hypertrophy and male gender, were built in to improve the statistical fit. These authors also introduced a quadratic term, (log (age))², as an additional risk variable, together with an interaction between this term and female gender. These additions were found to improve the performance of the standard equations.
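The way interaction and quadratic terms enter a risk function can be sketched as follows. Each derived term is simply another weighted variable in the sum. The term names and all co-efficient values are hypothetical placeholders, and using log(age) for the age-by-gender interaction is an assumption of this sketch; none of it reproduces the published Anderson co-efficients.

```python
import math

def linear_predictor(age, female, lvh, coef):
    """Sum of weighted terms, including interaction and quadratic terms.

    female and lvh are 0/1 indicators; coef maps term names to weights.
    """
    log_age = math.log(age)
    terms = {
        "intercept": 1.0,
        "log_age": log_age,
        "log_age_sq": log_age ** 2,                    # quadratic term
        "female": female,
        "log_age_x_female": log_age * female,          # age-by-gender interaction
        "log_age_sq_x_female": log_age ** 2 * female,  # quadratic-by-gender interaction
        "lvh_x_male": lvh * (1 - female),              # LVH counted for men only
    }
    return sum(coef[name] * value for name, value in terms.items())

# Hypothetical co-efficients, for illustration only
coef = {
    "intercept": -10.0,
    "log_age": 2.0,
    "log_age_sq": 0.1,
    "female": -3.0,
    "log_age_x_female": 0.5,
    "log_age_sq_x_female": 0.05,
    "lvh_x_male": 0.4,
}

male_no_lvh = linear_predictor(60.0, 0, 0, coef)
male_lvh = linear_predictor(60.0, 0, 1, coef)
female_no_lvh = linear_predictor(60.0, 1, 0, coef)
female_lvh = linear_predictor(60.0, 1, 1, coef)
print(male_lvh - male_no_lvh, female_lvh - female_no_lvh)
```

In this sketch the left ventricular hypertrophy indicator changes the result for men only, which is what an interaction with male gender means in practice.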

This discussion is intended simply to illustrate that traditional CVD risk equations, whether based on logistic regression, Cox proportional hazards, or a Weibull model, are designed to identify the independent influence of the explanatory variables and include a limited range of interaction terms. The interaction terms (and the quadratic term mentioned above used by Anderson et al) have the same status as the other weighted risk variables in the function linking predictors to outcomes (e.g. Equation 1 for logistic regression). This approach is designed to identify the most important risk factors and to measure their relative contributions to overall risk.
