• No se han encontrado resultados

Capítulo 1. Introducción

1.3 Materiales como electrodo negativo en baterías de ión litio

1.3.3 Sistema binario Co-Sn

The odds ratio, which is defined as the ratio of the probability of the occurrence of an event to that of its non-occurrence (38), was used to understand the influence of each of the candidate variables on the severity of the crash. An event, in this study, is referred to the case where the crash-severity variable took a value of 1. Odds ratio (O) is given by the following expression:

O =

(4)

where,

p = probability that the crash severity takes a value of 1

Probabilities are generally bounded and linear functions are unbounded. Transforming the probability to odds and taking its logarithm removes the bounded nature of the dependent variable and a logistic model is obtained by setting the logarithm of odds of the dependent variable to a linear function of the explanatory variables (38).

A logistic regression model with k explanatory variables and i = 1, 2 … n individuals has a general form as follows (38):

log [

]

= α +β1xi1 +β2xi2 + β3xi3+………βkxik (5)

where,

α = value of the intercept,

β= estimates of different independent variables in the model, and

30 30

30

The expression for pi can be obtained by solving the logistic equation (5) as follows:

pi =

(6)

Since pi is the probability of the crash-severity variable taking the value of 1, the value

of pi ranges between 0 and 1 for all values of x’s and ’s. A logistic regression model predicts the

probability that the dependent variable takes a given value for a particular set of explanatory variables (19). In the case of a binary logistic regression model, the dependent variable takes the values of either 0 or 1.

The binary logistic regression model is an efficient tool to model crash severity, which has been considered as a dichotomous dependent variable (38). Crash severity, denoted as ‘Y’ in this case, is redefined as follows:

Y = 1, if the occupants involved in the truck crash sustained injury of any severity level. Y = 0, if the occupants involved in the truck crash did not sustain any injury.

A total of 46 independent variables related to vehicle, driver, road, and environmental conditions such as alcohol, light conditions, speed limit, etc. were considered for the model. The PROC LOGISTIC statement, available in SAS version 9.2 (40), was used to develop models using the three variable selection methods, which include forward selection, stepwise selection, and backward elimination methods. In the forward selection method, the model initially starts with no variable in it and then the variables enter one by one until all the variables in the model have significant p-value (40). A p-value of 0.05 was chosen as the level of significance and any variable having a p-value greater than 0.05 did not stay in the model (27). In the forward selection procedure, a variable once entered into the model will never leave the model (40). In

31 31

31

the backward elimination method, model initially starts with all variables and then each variable is removed one by one until all variables left in the model have the significant p-value of 0.05 (40). Variables once left can never enter the model again. The stepwise selection procedure is a combination of forward and backward selection methods, where the variables keep entering and leaving the model until the best possible model is obtained (40). These methods are used to identify the significant variables that are to stay in the model.

The maximum likelihood method (MLM) was used for estimating the coefficients of the explanatory variables in the model. Maximum likelihood is a general approach of estimation which is widely used in many different methods of statistical modeling. According to P. D. Allison, “The basic principle of this method is to choose those parameter values as the estimates which if true, would maximize the probability of observing what we have, in fact, observed (38).”

The value of the R2 statistic, which represents the amount of variability in the model explained by the independent variables, was used for selecting the best model with greater values of R2 corresponding to the better model. Also, MLM generates important model fit statistics such as the Akaike Information Criterion (AIC), Schwarz Criterion (SC), and the value of twice the negative of log likelihood ( -2 log L), both for the intercept only and the fitted model. AIC and SC values are calculated as follows (38):

AIC = -2 * log-likelihood + 2k (7) SC = -2 * log-likelihood + k log (n) (8) where

32 32

32

n = sample size.

These statistics can be used for making comparisons among a set of models obtained by different variable selection methods, with smaller values representing a better model (38). Other goodness-of-fit statistics include the percentage concordant, percentage discordant, percent tied, pairs, Somer' s D, Goodman – Kruskal Gamma, Tau-a, and C values which can evaluate the strength of the model developed. Descriptions of these parameters are as follows (7):

Percent concordant – A pair of observations with different observed responses is said to be concordant if the observation with the lower ordered response value has a lower predicted mean score than the observation with the higher ordered response value.

Percent discordant – If the observation with the lower ordered response value has a higher predicted mean score than the observation with the higher ordered response value, then the pair is discordant.

Percent tied – If a pair of observations with different responses is neither concordant nor discordant, it is a tie.

Pairs – This is a number of distinct ways of pairing up different observations. The concordant pairs, discordant pairs, and tied pairs altogether aggregate to give the total number of pairs. Each of the percent concordant, percent discordant and percent tied is calculated with respect to the total number of pairs.

• Somer’s D – Somer’s D is used to determine strength and direction of the relation between pairs of variables. Its values range from -1.0 (all pairs disagree) to 1.0 (all pairs

33 33

33

agree). It is defined as (nc-nd)/t, where nc is the number of pairs that are concordant, nd the

number of pairs that are discordant, and t the total number of pairs with different responses (38).

Gamma – The Goodman-Kruskal Gamma value closer to one indicates good association among the variables in the model. This method does not penalize for ties on either variable. Its values range from -1.0 (no association) to 1.0 (perfect association). It is defined as (nc - nd)/ (nc + nd), where nc is the number of pairs that are concordant and nd

is the number of pairs that are discordant (38).

Tau-a – Kendall's Tau-a is a modification of Somers’ D to take into account the difference between the number of possible paired observations and the number of paired observations with different responses. It is defined as (nc-nd)/n where nc is the number of

pairs that are concordant, nd the number of pairs that are discordant, and n the total

number of pairs (38).

c – Another measure of rank correlation of ordinal variables is ‘c’. It ranges from 0 (no association) to 1 (perfect association). It is a variant of Somers’ D index. The value of c is given as (38):

c = 0.5 * (1 + Somer’s D) (9)

34 34

34

Documento similar