Logistic regression is typically used for classification or regression problems involving
multiple categorical, binary or continuous predictor variables and a binary outcome
(dependent variable) such as development of a disease. This is the model used for the original Framingham risk function presented in 1976 by Kannel, McGee and Gordon (6). In such cases, the outcome (e.g. development of CVD) is neither continuous nor Normally distributed, as linear regression analysis requires.
In logistic regression, a logarithmic transformation of the odds ratio p/(1 - p), the ratio of the probability of a positive outcome to that of a negative outcome, is used instead of the probability p itself. This transformed quantity is the ‘logit’. It avoids deriving meaningless probability values greater than 1.0 or less than zero (61). The other advantage of the transform is that the logit takes values from -∞ to +∞, allowing confidence intervals to be derived around an estimated value within this range. The logistic regression equation can then take a form similar to a multiple linear regression function, with the dependent variable (the logit) equal to the sum of an intercept (constant) and a number of predictor variables, each multiplied by its regression co-efficient:
$$\log(\text{odds ratio}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots \qquad \text{[Equation 1]}$$
where β₀ is a constant and β₁, β₂, β₃, … are the regression co-efficients for the risk factors X₁, X₂, X₃, … respectively.
Fitting the equation to the data involves maximum likelihood techniques to derive the optimal intercept and co-efficient values.
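As a minimal sketch of what maximum likelihood fitting involves, the Python below fits a one-predictor version of the logistic model by gradient ascent on the log-likelihood. The data, the function names and the step-size settings are invented for illustration and are not taken from the Framingham work; statistical packages normally use faster Newton-type methods.

```python
import math

def sigmoid(z):
    """Map a logit value z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary outcomes ys under logit = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, rate=0.05, steps=20000):
    """Gradient ascent on the log-likelihood (illustrative, not efficient)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Each gradient component is a sum of (observed - predicted) residuals
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals)
        g1 = sum(r * x for r, x in zip(residuals, xs))
        b0 += rate * g0
        b1 += rate * g1
    return b0, b1

# Hypothetical data: x is a standardised risk factor, y = 1 if the event occurred
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("intercept:", round(b0, 3), "co-efficient:", round(b1, 3))
```

The fitted values are those at which the residuals sum to zero, which is the condition that defines the maximum of the log-likelihood for this model.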
The relationship between risk factor values and the probability of the outcome is non-linear, but the log (odds ratio) is a linear function of the risk factor values, each weighted by its co-efficient (Equation 1). Each risk factor (X1, X2, X3 etc.) makes an independent contribution to the outcome. The proportion of overall risk attributable to each risk factor is estimable. The logit can be transformed back to produce a probability value p for a positive outcome:
$$p = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots)\right)} \qquad \text{[Equation 2]}$$
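The back-transformation can be illustrated with a short Python sketch. The intercept, co-efficients and risk factor values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not Framingham values.

```python
import math

def linear_predictor(intercept, coefs, values):
    """Right-hand side of Equation 1: intercept plus weighted risk factors."""
    return intercept + sum(b * x for b, x in zip(coefs, values))

def probability_from_logit(z):
    """Equation 2: convert a logit value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of p / (1 - p), the transformation that Equation 2 inverts."""
    return math.log(p / (1.0 - p))

# Hypothetical model: age in years and smoking status (1 = smoker)
intercept = -5.0
coefs = [0.05, 0.7]
values = [60, 1]

z = linear_predictor(intercept, coefs, values)  # -5.0 + 3.0 + 0.7 = -1.3
p = probability_from_logit(z)                   # approximately 0.21
print("logit:", round(z, 2), "probability:", round(p, 2))
```

However large or small the logit becomes, the resulting probability stays strictly between zero and one, which is the property described above.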
In survival analysis (where the outcome of interest is the time to death or
development of some other end point) the Cox proportional hazards model is
appropriate. This uses the hazard ratio (HR) in place of the odds ratio. The HR is the
ratio of the hazard of developing the disease in the presence of one or more risk factors to the hazard in a comparator population with zero or baseline risk factor values (61). The outcome of the risk function is the log of the hazard ratio (rather than the log of the odds ratio).
Whilst Cox regression introduces a continuous dimension (the timescale at which the hazard ratio may be measured), the hazard still relates to binary outcome events. The Cox model includes an assumption that the hazard ratio is constant over time, even though the hazard itself may be rising or falling with time. An individual whose hazard of developing the disease is twice that of another individual at (say) five years still has twice the hazard at ten years, even though the hazard for both may have increased. The probability distribution of the baseline survival function does not need to be specified if the constant hazard ratio assumption is valid. Cox proportional hazards modelling was introduced into Framingham risk modelling after the original logistic regression model, to recognise the importance of the time dimension in CVD risk, and was used by Anderson et al in a paper published in Circulation in 1991 (12).
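The constant hazard ratio assumption can be made explicit by writing the model out. The notation below is the standard textbook form and is assumed here rather than taken from the cited papers: h₀(t) is the baseline hazard at time t, and the superscripts a and b label two individuals with different risk factor values.

$$h(t \mid X) = h_0(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots)$$

$$\log\left(\frac{h(t \mid X)}{h_0(t)}\right) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$$

$$\frac{h(t \mid X^{a})}{h(t \mid X^{b})} = \exp\left(\beta_1 (X_1^{a} - X_1^{b}) + \beta_2 (X_2^{a} - X_2^{b}) + \dots\right)$$

The baseline hazard cancels in the last line, so the ratio between the two individuals does not depend on t. This is also why the form of h₀(t) can be left unspecified when the co-efficients are estimated.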
A Weibull model is appropriate for degenerative processes (both in medicine and engineering) where functioning components of a system tend to ‘wear out’ over time. For those at risk of a cardiovascular event, the hazard increases over time (although the hazard ratio may still in principle remain constant). Anderson et al in this later paper claimed superiority of the new algorithm over both the logistic regression and Cox proportional hazards precursors, and this model became the basis for the most widely used Framingham algorithm. The co-efficients from this paper were used in the programming of the e-Nudge algorithm described later in this thesis.
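One common way to represent a hazard that rises over time is the Weibull hazard function. The Python sketch below uses the textbook form of that function with hypothetical shape and scale values; it is not the parameterisation published by Anderson et al, and the relative hazard of 2.0 is likewise invented. It shows a hazard that increases with time while the ratio between two individuals stays fixed.

```python
def weibull_hazard(t, shape, scale):
    """Textbook Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical parameters: a shape value above 1 gives a rising ('wear out') hazard
shape, scale = 1.5, 20.0
relative_hazard = 2.0  # hypothetical multiplier for a higher-risk individual

def exposed_hazard(t):
    """Hazard for the higher-risk individual under proportional hazards."""
    return relative_hazard * weibull_hazard(t, shape, scale)

for t in (5.0, 10.0, 15.0):
    baseline = weibull_hazard(t, shape, scale)
    exposed = exposed_hazard(t)
    print("t =", t, "baseline:", round(baseline, 4),
          "exposed:", round(exposed, 4), "ratio:", round(exposed / baseline, 2))
```

With a shape value of exactly 1 the hazard would be constant over time, and with a value below 1 it would fall, so the shape parameter is what encodes the ‘wear out’ behaviour in this sketch.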
In the regression models described so far, interactions between risk factors are assumed to have a relatively minor influence on outcomes, but can be built in if expected to be important. For instance, in the Anderson equation (8), interactions between age and female gender, and between left ventricular hypertrophy and male gender, were built in to improve the statistical fit. These authors also introduced a
quadratic term, (log(age))², as an additional risk variable, and also built in an
interaction between this and female gender. These were found to improve the performance of the standard equations.
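How interaction and quadratic terms enter such a function can be sketched in Python. The term names and co-efficient values below are hypothetical and are not the published Anderson co-efficients. The sketch also assumes that age enters on a log scale and that left ventricular hypertrophy (LVH) appears only through its interaction with male gender.

```python
import math

def linear_predictor(age, female, lvh, coefs):
    """Weighted sum of main effects, a quadratic term and interaction terms.

    female and lvh are 0/1 indicators; coefs maps term names to weights.
    """
    log_age = math.log(age)
    male = 1 - female
    terms = {
        "intercept": 1.0,
        "log_age": log_age,
        "log_age_sq": log_age ** 2,                    # quadratic term
        "female": female,
        "log_age_x_female": log_age * female,          # age by gender interaction
        "log_age_sq_x_female": log_age ** 2 * female,  # quadratic by gender
        "lvh_x_male": lvh * male,                      # LVH by male interaction
    }
    return sum(coefs[name] * value for name, value in terms.items())

# Hypothetical co-efficients for illustration only
coefs = {
    "intercept": -10.0,
    "log_age": 3.0,
    "log_age_sq": -0.2,
    "female": -4.0,
    "log_age_x_female": 1.5,
    "log_age_sq_x_female": -0.1,
    "lvh_x_male": 0.4,
}

print(round(linear_predictor(60, 0, 1, coefs), 3))  # man aged 60 with LVH
print(round(linear_predictor(60, 1, 1, coefs), 3))  # woman aged 60 with LVH
```

Each interaction is simply a product of two variables that receives its own co-efficient, so it is weighted and summed in the same way as any other risk variable.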
This discussion is intended simply to illustrate that traditional CVD risk equations, whether based on logistic regression, Cox proportional hazards, or a Weibull model, are designed to identify the independent influence of the explanatory variables and include a limited range of interaction terms. The interaction terms (and the quadratic term mentioned above used by Anderson et al) have the same status as the other weighted risk variables in the function linking predictors to outcomes (e.g. Equation 1 for logistic regression). This approach is designed to identify the most important risk factors and to measure their relative contributions to overall risk.