Ntzoufras (2009) explains that data with a binary response are often modelled with logistic regression, which is a special case of the generalized linear model (GLM). For credit scoring data, a response $y_i = 1$ represents a default or “bad” score and a response $y_i = 0$ represents no default or “good” score. Logistic regression makes use of the canonical link function, the logit $\log\left(\frac{\pi_i}{1-\pi_i}\right)$, where $\pi_i$ is the probability of default. The logistic regression model is given below
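$$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \sum_{j=1}^{p} x_{ij}\beta_j$$
for $i = 1, \dots, N$, where $x_{ij}$ is the element in the $i$th row and $j$th column of the model matrix $X$. From this, the probability of default is given by
$$\pi_i = \frac{\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}{1+\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}.$$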
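As a small numerical illustration of the inverse logit transformation, the following sketch evaluates the probability of default for a single applicant. The coefficient values and the covariates (an intercept column, a debt ratio and a count of late payments) are hypothetical and chosen only for illustration; they do not come from the data analysed here.

```python
import math

def default_probability(x_row, beta):
    """Return pi_i = exp(eta_i) / (1 + exp(eta_i)), with eta_i = sum_j x_ij * beta_j."""
    eta = sum(x * b for x, b in zip(x_row, beta))
    # Use the algebraically equivalent form that avoids overflow in exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical values: intercept, debt ratio, number of late payments.
beta = [-2.0, 1.5, 0.8]
x_row = [1.0, 0.4, 2.0]

pi = default_probability(x_row, beta)
print(round(pi, 4))  # linear predictor is 0.2, so pi is about 0.5498
```

Applying the logit to the result recovers the linear predictor, which is a quick check that the two formulas above are inverses of one another.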
Other link functions can also be used to model binary response data, for example the probit and complementary log-log (clog-log) links.
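The likelihood function of the logistic regression model is
$$\begin{aligned}
f(y \mid \beta) &= \prod_{i=1}^{N} \binom{n_i}{y_i}\, \pi_i^{y_i} (1-\pi_i)^{n_i-y_i} \\
&= \prod_{i=1}^{N} \binom{n_i}{y_i} \left(\frac{\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}{1+\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}\right)^{y_i} \left(1-\frac{\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}{1+\exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)}\right)^{n_i-y_i},
\end{aligned} \tag{3.13}$$
where $y_i$ is the number of defaults among the $n_i$ observations sharing the $i$th covariate pattern ($n_i = 1$ when each observation is a single binary response).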
Estimation of the parameters for logistic regression can be done using the iteratively reweighted least squares (IRWLS) procedure.
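The IRWLS idea can be sketched in a few lines for binary responses. The code below is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes ungrouped binary data, the logit link, a starting value of zero for all coefficients, and a small hypothetical data set whose classes overlap so that the estimate is finite. The helper `solve_linear_system` and the function name `fit_logistic_irwls` are introduced here for illustration only.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic_irwls(X, y, n_iter=25, tol=1e-10):
    """Estimate beta for binary y by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        eta = [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [p_i * (1.0 - p_i) for p_i in pi]  # working weights
        # Working response: linear predictor plus scaled residual.
        z = [e + (y_i - p_i) / w_i for e, y_i, p_i, w_i in zip(eta, y, pi, w)]
        # Weighted least squares step: solve (X'WX) beta = X'Wz.
        xtwx = [[sum(w_i * row[j] * row[k] for w_i, row in zip(w, X))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(w_i * row[j] * z_i for w_i, row, z_i in zip(w, X, z))
                for j in range(p)]
        new_beta = solve_linear_system(xtwx, xtwz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Hypothetical toy data: intercept column plus one covariate, overlapping classes.
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_logistic_irwls(X, y)
print(beta_hat)
```

At convergence the score equations $X^\top (y - \hat\pi) = 0$ should hold, which gives a simple way to check the fit. The sketch does not guard against perfectly separated data, where the weights tend to zero and the estimates diverge.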
Parameter interpretation
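The parameters in logistic regression have an interpretation in terms of odds and odds ratios. The odds are defined as the probability of success ($\pi_i$) relative to the probability of failure ($1-\pi_i$) when the data are binomial (Ntzoufras, 2009). Thus,
$$\text{odds}_i = \frac{\pi_i}{1-\pi_i}$$
and the logistic regression model can be rewritten as
$$\log(\text{odds}_i) = \sum_{j=1}^{p} x_{ij}\beta_j, \qquad \text{odds}_i = \exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right).$$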
The odds give the number by which the probability of failure must be multiplied to obtain the probability of success. The parameter $\beta_j$ can be interpreted as follows: a unit increase in $x_j$, with all the other $x$’s held fixed, increases the log-odds of success by $\beta_j$, or equivalently multiplies the odds of success by $\exp(\beta_j)$. This interpretation is a major advantage of logistic regression, as no such simple interpretation exists for other link functions such as the probit.
In credit scoring, a success corresponds to a default or bad applicant. Thus, the log-odds of success is the log-odds of default. Therefore, $\beta_j$ can be interpreted as follows: a unit increase in $x_j$, with all the other $x$’s held fixed, increases the log-odds of default by $\beta_j$, or equivalently multiplies the odds of default by $\exp(\beta_j)$. A positive value for $\beta_j$ thus increases the odds of default as $x_j$ increases, while a negative value for $\beta_j$ decreases the odds of default as $x_j$ increases.
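The odds-ratio interpretation can be checked numerically. In the sketch below the coefficients and covariate values are hypothetical; the point is only that raising one covariate by one unit, with the others held fixed, multiplies the odds of default by the exponential of its coefficient.

```python
import math

def inverse_logit(eta):
    """Probability corresponding to a linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(pi):
    """Odds of default for a default probability pi."""
    return pi / (1.0 - pi)

def linear_predictor(x_row, beta):
    return sum(x * b for x, b in zip(x_row, beta))

# Hypothetical values: intercept, debt ratio, number of late payments.
beta = [-2.0, 1.5, 0.8]
x_before = [1.0, 0.4, 2.0]
x_after = [1.0, 0.4, 3.0]  # one additional late payment, other covariates fixed

odds_before = odds(inverse_logit(linear_predictor(x_before, beta)))
odds_after = odds(inverse_logit(linear_predictor(x_after, beta)))
odds_ratio = odds_after / odds_before

print(odds_ratio, math.exp(beta[2]))  # the two values agree up to rounding error
```

Note that the odds ratio does not depend on the starting values of the covariates, whereas the change in the probability itself does.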
Assessment of fit
Dobson and Barnett (2008) state that one way of assessing the fit of a model is to compare it with a model with the maximum number of parameters. The model with the maximum number of parameters is called the saturated model and has the same number of parameters as covariate patterns (i.e. groups of observations sharing the same values of all the explanatory variables). The saturated model tells us no more than the data themselves and is often non-informative (Faraway, 2006). However, we can use the saturated model as a benchmark against which to compare prospective models. Twice the difference between the maximized log-likelihood of the saturated (full) model and that of the model under consideration gives the likelihood ratio statistic, known as the deviance
$$D = 2\left[\,l(\hat{\pi}_{\max}; y) - l(\hat{\pi}; y)\right].$$
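The deviance for the binomial model is now derived, following Dobson and Barnett (2008). From Equation (3.13) the likelihood function is
$$L(\pi; y) = \prod_{i=1}^{N} \binom{n_i}{y_i}\, \pi_i^{y_i} (1-\pi_i)^{n_i-y_i}.$$
This in turn means the log-likelihood function is
$$l(\pi; y) = \sum_{i=1}^{N} \left[ y_i \log \pi_i - y_i \log(1-\pi_i) + n_i \log(1-\pi_i) + \log\binom{n_i}{y_i} \right]. \tag{3.14}$$
From this we find the maximum likelihood estimate of $\pi_i$ under the saturated model. Differentiating with respect to $\pi_i$ and equating to zero we have
$$\frac{\partial l}{\partial \pi_i} = \frac{y_i}{\pi_i} - \frac{n_i - y_i}{1-\pi_i} = 0,$$
so that
$$\hat{\pi}_i = \frac{y_i}{n_i}.$$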
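Now, the maximum value of the log-likelihood function in Equation (3.14) is
$$l(\hat{\pi}_{\max}; y) = \sum_{i=1}^{N} \left[ y_i \log\left(\frac{y_i}{n_i}\right) - y_i \log\left(\frac{n_i - y_i}{n_i}\right) + n_i \log\left(\frac{n_i - y_i}{n_i}\right) + \log\binom{n_i}{y_i} \right].$$
For any other model with fewer parameters than covariate patterns, let $\hat{y}_i = n_i\hat{\pi}_i$ denote the fitted values. Then, the log-likelihood evaluated at these values is
$$l(\hat{\pi}; y) = \sum_{i=1}^{N} \left[ y_i \log\left(\frac{\hat{y}_i}{n_i}\right) - y_i \log\left(\frac{n_i - \hat{y}_i}{n_i}\right) + n_i \log\left(\frac{n_i - \hat{y}_i}{n_i}\right) + \log\binom{n_i}{y_i} \right].$$
Therefore, the deviance for the binomial model is
$$\begin{aligned}
D &= 2\left[\,l(\hat{\pi}_{\max}; y) - l(\hat{\pi}; y)\right] \\
&= 2\sum_{i=1}^{N} \left[ y_i \log\left(\frac{y_i}{\hat{y}_i}\right) - y_i \log\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right) + n_i \log\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right) \right] \\
&= 2\sum_{i=1}^{N} \left[ y_i \log\left(\frac{y_i}{\hat{y}_i}\right) + (n_i - y_i) \log\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right) \right].
\end{aligned}$$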
This deviance has an approximate chi-squared distribution with degrees of freedom equal to the number of covariate patterns less the number of parameters. The deviance can, therefore, be used in a hypothesis test to assess the fit of a model. However, when the outcome is binary, i.e. when $y_i$ takes on only the values zero or one, the chi-squared approximation no longer holds and this goodness-of-fit measure is no longer useful.
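The final deviance expression is straightforward to evaluate for grouped data. The sketch below is an illustration under assumptions: the counts and fitted probabilities are made up, the function names are introduced here, and terms of the form $0 \log 0$ are taken to be zero. It also evaluates the two log-likelihoods directly so that the identity $D = 2[l(\hat{\pi}_{\max}; y) - l(\hat{\pi}; y)]$ can be checked.

```python
import math

def binomial_log_likelihood(y, n, pi):
    """Binomial log-likelihood; requires 0 < pi_i < 1 for every group."""
    return sum(
        y_i * math.log(p_i) + (n_i - y_i) * math.log(1.0 - p_i)
        + math.log(math.comb(n_i, y_i))
        for y_i, n_i, p_i in zip(y, n, pi)
    )

def binomial_deviance(y, n, pi_hat):
    """D = 2 * sum[y log(y / y_fit) + (n - y) log((n - y) / (n - y_fit))]."""
    def term(a, b):
        # Convention: a * log(a / b) is zero when a is zero.
        return 0.0 if a == 0 else a * math.log(a / b)
    total = 0.0
    for y_i, n_i, p_i in zip(y, n, pi_hat):
        y_fit = n_i * p_i  # fitted value for this covariate pattern
        total += term(y_i, y_fit) + term(n_i - y_i, n_i - y_fit)
    return 2.0 * total

# Hypothetical grouped data: defaults y_i out of n_i applicants per covariate pattern.
y = [2, 5, 9]
n = [10, 10, 10]
pi_hat = [0.25, 0.5, 0.85]              # made-up fitted probabilities
pi_sat = [a / b for a, b in zip(y, n)]  # saturated model estimates y_i / n_i

deviance = binomial_deviance(y, n, pi_hat)
l_sat = binomial_log_likelihood(y, n, pi_sat)
l_fit = binomial_log_likelihood(y, n, pi_hat)
print(deviance, 2.0 * (l_sat - l_fit))
```

The binomial coefficient terms cancel in the difference, which is why they do not appear in the deviance.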
The Hosmer-Lemeshow statistic attempts to provide a goodness-of-fit measure that remains usable for binary data (Hosmer and Lemeshow, 2000). However, its use is still questionable.
Classification
Logistic regression models a binary outcome. The objective is often to classify.
In order to perform classification, Hosmer and Lemeshow (2000) explain that a cut-off point, $c$, must be defined. The estimated probabilities from the logistic regression model are compared to this cut-off point. If the estimated probability exceeds $c$, we let the derived variable be equal to 1; if the estimated probability is less than $c$, we let the derived variable be equal to 0.
Classification tables, according to Hosmer and Lemeshow (2000), are a good way to summarize the results of a fitted logistic regression model. The outcome variable is cross-classified with a dichotomous variable whose values are derived from the estimated logistic probabilities.
The predictive performance of the logistic regression model is probably the best way to assess the fit of the model. To do this, the data set is split into a training set and a test set. The model is estimated on the training set and its performance is evaluated on the test set with a given cut-off probability, $c$. A classification table is then constructed and the error rate of the model can be used as a measure of how well the model fits (Table 3.1).
Table 3.1 Classification table of the predictive performance of the logistic regression model.

| | Predicted Good | Predicted Bad | Total |
|---|---|---|---|
| Actual Good | $n_{GG}$ | $n_{GB}$ | $n_{G\cdot}$ |
| Actual Bad | $n_{BG}$ | $n_{BB}$ | $n_{B\cdot}$ |
| Total | $n_{\cdot G}$ | $n_{\cdot B}$ | $n$ |

From Table 3.1, a number of facts can be established. In each count, the first subscript denotes the actual class and the second the predicted class.
- $n$ represents the number of applicants in the test set.
- $n_{\cdot B}$ is the number of applicants classified as bad. This is the number of applicants who were rejected in their application for credit.
- $n_{\cdot G}$ is the number of applicants classified as good. This is the number of applicants who were accepted in their application for credit.
- $n_{GG}$ is the number of applicants correctly classified as good and $n_{BB}$ is the number of applicants correctly classified as bad.
- $n_{GB}$ is the number of applicants classified as bad who are in fact good. This number represents missed profits for the financial institution.
- $n_{BG}$ is the number of applicants classified as good who are in fact bad. This number represents bad debts and losses in income for the financial institution.
- The total error probability of the classification is $(n_{GB} + n_{BG})/n$. This value should be small. It also gives an indication of the goodness-of-fit of the model.
- For the applicants that will be accepted by the financial institution, the error probability is $n_{BG}/(n_{GG} + n_{BG})$. This is the error rate realized by the bank. Thus, it is very important that $n_{BG}$ is as small as possible.
- A cut-off probability, $c$, needs to be found which minimizes the classification error.
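The quantities in the list above can be computed directly from test-set labels and estimated probabilities. The following sketch is illustrative only: the labels and probabilities are made up, the function names are introduced here, and the two-letter keys (actual class first, predicted class second, with G for good and B for bad) are labels chosen for this example. An applicant is classified as bad when the estimated probability of default exceeds the cut-off; a probability exactly equal to the cut-off is treated as good, which is an assumption.

```python
def classification_table(y_true, p_hat, cutoff):
    """Cross-classify actual class (1 = bad, 0 = good) against predicted class."""
    counts = {"GG": 0, "GB": 0, "BG": 0, "BB": 0}
    for y, p in zip(y_true, p_hat):
        actual = "B" if y == 1 else "G"
        predicted = "B" if p > cutoff else "G"
        counts[actual + predicted] += 1
    return counts

def error_rates(counts):
    """Return (total error rate, error rate among accepted applicants)."""
    n = sum(counts.values())
    total_error = (counts["GB"] + counts["BG"]) / n
    accepted = counts["GG"] + counts["BG"]  # everyone classified as good
    accepted_error = counts["BG"] / accepted if accepted else float("nan")
    return total_error, accepted_error

# Hypothetical test set: actual outcomes and estimated default probabilities.
y_test = [0, 0, 0, 0, 1, 1, 0, 1, 1, 0]
p_test = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.05, 0.7, 0.9, 0.55]

table = classification_table(y_test, p_test, cutoff=0.5)
total_error, accepted_error = error_rates(table)
print(table)                        # {'GG': 4, 'GB': 2, 'BG': 1, 'BB': 3}
print(total_error, accepted_error)  # 0.3 0.2
```

In this made-up example three of the ten applicants are misclassified, and one of the five accepted applicants is in fact bad.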
The choice of the cut-off probability, $c$, is often subjective. For lower $c$, more applicants will be classified as bad; for higher $c$, more applicants will be classified as good. A financial institution with a lower cut-off probability is therefore more risk averse than one with a higher cut-off probability. Because the error rate realized by the financial institution is greatly affected by the number of bad applicants who are classified as good, it is important that the error among the bad applicants is minimized as well as the total error.
The optimal cut-off probability can be found by using a validation set. The classification error can be determined for different cut-off probabilities. The cut-off probability which gives the lowest classification error on the validation set will be chosen and used in further analysis.
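The search over cut-off probabilities described above can be sketched as a simple grid search. The validation labels, the estimated probabilities and the candidate grid below are hypothetical, and the function names are introduced here for illustration. Ties are broken in favour of the smallest candidate, which is an arbitrary choice.

```python
def classification_error(y_true, p_hat, cutoff):
    """Share of applicants whose predicted class differs from the actual class."""
    wrong = sum((p > cutoff) != (y == 1) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def best_cutoff(y_val, p_val, candidates):
    """Return the candidate cut-off with the lowest validation error."""
    # min() keeps the first minimiser, so ties go to the earliest candidate.
    return min(candidates, key=lambda c: classification_error(y_val, p_val, c))

# Hypothetical validation set: actual outcomes (1 = bad) and estimated probabilities.
y_val = [0, 0, 0, 0, 1, 1, 0, 1, 1, 0]
p_val = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.05, 0.7, 0.9, 0.55]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

cutoff = best_cutoff(y_val, p_val, candidates)
print(cutoff, classification_error(y_val, p_val, cutoff))  # 0.6 0.1
```

If the losses from accepting a bad applicant and rejecting a good one differ, the total error in `classification_error` could be replaced by a weighted error without changing the search itself.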