$$p(\text{good loan})) = \log\left(\frac{p(Y = 1)}{p(Y = 0)}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \text{random error}$$
The parameters were estimated using the maximum likelihood estimation (MLE) method, and the estimated parameters were then interpreted accordingly. To determine attribute scores, the regression would be run against the weights of evidence (WOE) created in the initial characteristic analysis. This would serve as an alternative route to grouped-variable credit risk modelling. The normal approach was to regress credit quality against a set of predictor variables comprising numeric variables and dummy variables created for categorical data. In selecting a well-fitting model, various regression model-building techniques were employed in R, including forward selection, backward elimination and stepwise selection. This was in addition to the initial characteristic analysis, where p-values were used to assess the suitability of each suspected predictor variable.
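The regression of credit quality on WOE-coded data can be illustrated with a minimal sketch. The attribute counts below are hypothetical, the WOE sign convention (log of the share of goods over the share of bads) is an assumption, and the Newton-Raphson routine is a generic way of obtaining the MLE for a one-predictor logistic model, not the routine used in R for this study. Python is used here purely for illustration.

```python
import math

# Hypothetical counts of (good, bad) loans per attribute of one characteristic.
counts = {"A": (30, 20), "B": (40, 10), "C": (20, 30)}

total_good = sum(g for g, _ in counts.values())
total_bad = sum(b for _, b in counts.values())

# Assumed WOE convention: ln(share of goods / share of bads) for each attribute.
woe = {k: math.log((g / total_good) / (b / total_bad))
       for k, (g, b) in counts.items()}

# Expand to one row per loan: x is the WOE of the loan's attribute, y = 1 if good.
x, y = [], []
for k, (g, b) in counts.items():
    x += [woe[k]] * (g + b)
    y += [1] * g + [0] * b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for the model log-odds(y = 1) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negative Hessian (observed information matrix).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(x, y)
print(round(b0, 4), round(b1, 4))
```

With a single WOE-coded characteristic fitted on the same data that produced the WOE values, the slope converges to 1 and the intercept to the log of the overall good-to-bad odds, which makes a convenient sanity check on the coding.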
4.4.7 Designing an initial CRM model
Designing a statistical model required the use of statistical tools to decide which independent variables best explained the variability observed in the outcome variable. These included statistical measures such as the Chi-square, R-square, p-values and other relevant model adequacy tools. In deciding on the eventual initial model, business goals such as risk appetite and market share strategy, among others, were also considered in the model construction process. Given the importance of variable selection and reduction, a risk profile was first developed through initial characteristic analysis. This was built using a variety of candidate predictor variables, which included demographics, financial data, repayment patterns and time-related data, but excluded credit bureau inquiries.
No credit bureau facility existed in the Zimbabwean financial services sector, which limited the development of a robust model. The envisaged model was expected to be coherent with the decision support system of ZimSME bank. Because the intended model was meant to be the sole arbiter, emulating an experienced credit analyst, the construction of a comprehensive credit risk profile was a must. The risk profile for the ZimSME bank included characteristics of the owner and the business as well as financial data, but no external characterization, owing to the unavailability of credit bureau information.
4.4.7.1 Credit risk profile
There were originally 26 characteristic variables, which had to be strategically and statistically reduced to build a risk profile for the ZimSME bank. This profile would be used to develop the preliminary CRM model on the initial known-good-bad (KGB) ZimSME sample.
Table 4.4: ZimSME Bank original characteristic variable profile
Owner characteristics | Enterprise characteristics | Financial characteristics
Age                   | Sector                     | Liquidity ratio
Qualifications        | Local trade                | Gearing ratio
Experience            | Export trade               | Stock Turnover
Income                | New-firm                   | Debtors days
Number of Directors   | Length of relationship     | Creditors days
Income                | Annual Turnover            | Net Profit margin
                      | Number of employees        | Other loans
                      | Collateral                 | Tenure of loan
                      | Technology                 | Interest rate
                      | Asset size                 | Loan amount
                      |                            | Purpose of loan
Based on the eventual risk profile, a single regression approach would be adopted. When it was run, characteristics were entered into the regression equation in an order based on information type as well as strength. Information types were ranked from weakest to strongest, and within each information type the characteristic variables were likewise ordered from weakest to strongest by information value (IV). This was the sequence in which the single regression considered each characteristic. As an alternative, the predictor characteristic variables were ranked by IV from highest to lowest regardless of information type.
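The IV-based ordering described above can be sketched as follows. The characteristic names are taken from Table 4.4, but the good/bad counts are hypothetical, and the IV formula used (the sum over attributes of the difference between the good and bad shares multiplied by the WOE) is the common scorecard convention, assumed here rather than quoted from this thesis.

```python
import math


def information_value(attribute_counts):
    """IV of one characteristic from (good, bad) counts per attribute.

    Assumes every attribute has at least one good and one bad loan,
    otherwise the logarithm is undefined.
    """
    total_good = sum(g for g, _ in attribute_counts)
    total_bad = sum(b for _, b in attribute_counts)
    iv = 0.0
    for good, bad in attribute_counts:
        dist_good = good / total_good
        dist_bad = bad / total_bad
        iv += (dist_good - dist_bad) * math.log(dist_good / dist_bad)
    return iv


# Hypothetical (good, bad) counts per attribute for three characteristics.
characteristics = {
    "sector": [(30, 20), (40, 10), (20, 30)],
    "age": [(45, 30), (45, 30)],
    "collateral": [(50, 25), (40, 35)],
}

iv_by_characteristic = {
    name: information_value(c) for name, c in characteristics.items()
}

# Order of entry into the single regression: weakest to strongest by IV.
entry_order = sorted(iv_by_characteristic, key=iv_by_characteristic.get)
# Alternative ordering: highest to lowest IV.
alternative_order = entry_order[::-1]

print(entry_order)
print(alternative_order)
```

To order by information type first, the same sort could be applied within each group of characteristics, with the groups themselves arranged from the weakest to the strongest type.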
The eventually accepted single regression model accounted for the known performance of screened applicants. In fact, the preliminary model, built from a KGB sample, was meant for measuring the credit risk of those who had been screened as good borrowers, the “cherry-picked” ones. This was in contrast to the whole purpose of developing a decision tool to help bankers classify loan applicants from the TTD population as good or bad. Adopting the preliminary model as it stood would introduce many inaccuracies and misclassifications owing to the selectivity bias induced by the non-randomness and truncation of the known-good-bad sample on which it was constructed. The known-good-bad sample represented the population of accepted applicants only and did not account for the rejected applicants. Therefore, if the preliminary model were applied to the TTD SME loan applicant population, a great deal of misclassification would ensue.
There was a need for a method to account for the missing credit quality of the rejected applicants, since all their other characteristic variables were available. Reject inference techniques provided such a method: they impute the credit quality of the rejected applicants so that an AGB sample can be developed before building the envisaged CRM model, which classifies TTD SME loan applicants as good or bad. There are various techniques for reject inference, but the majority fall short of producing substantive results, as demonstrated by other researchers (Lin, 2007; Nguyen, 2016; Kennedy, 2013; Al Baz, 2017). Therefore, for this thesis, a model-based reject inference methodology grounded in the theory of missing data and Bayesian inference, and built on theoretically supported assumptions, was adopted.
The method accounted for the impact of the incomplete sample by imputing missing values of the response variable based on the estimated probabilities of missingness. It was a flexible approach that could also incorporate supplementary information about the rejected applicants into the modelling process. Because it rests on firm theoretical support, unlike other reject inference techniques that rest on tenuous assumptions, it has an edge over them (Chen & Astebro, 2012; Nguyen, 2016; Ditrich, 2015; Kennedy, 2013). The method, fully described in Chapter 3, is called the Bound and Collapse (BC) technique. It is a model-based imputation methodology that uses a Bayesian procedure to construct a model based on the theory of missing data developed by Rubin (1976).
4.4.8 Reject inference
The credit quality of rejected loan applicants was not observable in the ZimSME bank loan register, which contained complete data only for the presumed ‘good’ borrowers. This engendered a non-random sample owing to selectivity bias, which is quite rife in CRM modelling (Chen & Astebro, 2012; Smith & Elkan, 2004; Nguyen, 2016; Lin, 2007; Kraus, 2014). This background lent itself to an effective implementation of the Bound and Collapse (BC) methodology to impute the credit quality of the rejected applicants, thereby eventually producing an AGB development sample representative of the through-the-door population to be scored in future.
The preliminary model was applied to the known-good-bad (KGB) sample to create two (2) regions: the “accepted” and the “rejected”. To estimate the credit quality of the rejected clients, we emulated a bank's acceptance policy-making process. We defined 𝑦 = 1 to be the credit quality of an SME that had been in arrears on at least one material obligation within the two-year outcome window. This meant that enterprises with high credit measurement scores were likely to default, unlike those with low credit scores. A risk-averse bank would set a low cut-off point, whereas a risk-taking bank would set a high cut-off score, implying divergent rejection policies based on risk appetite. The bank's risk appetite thus translated into “strong” and “weak” rejection policies, which defined the respective market share strategies.
These two (2) rejection policies were simulated by setting logical thresholds to eliminate selectivity bias. We first instituted a “weak” rejection policy by choosing a high CRM cut-off score as the threshold, which created the “accepted” and “rejected” regions. The same was done for the “strong” rejection policy, where a low CRM cut-off score was chosen. All this was possible after rank-ordering the KGB sample by CRM score. The result was two (2) samples with different sizes of the respective “accepted” and “rejected” regions. To create an AGB random development sample, the credit quality of the rejected applicants had to be imputed in the respective rejected regions of the two (2) samples created through the weak and strong selection procedures. We used equation (83) to estimate the respective missing credit quality. The simulated AGB samples were labelled “weak selection” and “strong selection” respectively.
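The simulation of the two rejection policies can be sketched as below. The scores, outcomes and the two cut-off values are hypothetical. The score is assumed to be the preliminary model's estimated probability of being bad, so an applicant is rejected when the score exceeds the cut-off. The imputation step itself is not shown.

```python
def apply_rejection_policy(sample, cutoff):
    """Split a scored KGB sample into accepted and rejected regions.

    sample: list of (score, y) pairs, where y = 1 denotes a bad loan.
    Applicants with a score above the cutoff are rejected and their
    credit quality is masked (set to None), as it would be unobserved.
    """
    ranked = sorted(sample, key=lambda row: row[0])  # rank-order by CRM score
    accepted = [(s, y) for s, y in ranked if s <= cutoff]
    rejected = [(s, None) for s, _ in ranked if s > cutoff]
    return accepted, rejected


# Hypothetical KGB sample of (score, observed credit quality).
kgb_sample = [
    (0.10, 0), (0.20, 0), (0.30, 0), (0.35, 1), (0.45, 0),
    (0.50, 1), (0.60, 0), (0.75, 1), (0.80, 1), (0.90, 1),
]

WEAK_CUTOFF = 0.70    # hypothetical high cut-off: few applicants rejected
STRONG_CUTOFF = 0.40  # hypothetical low cut-off: many applicants rejected

weak_accepted, weak_rejected = apply_rejection_policy(kgb_sample, WEAK_CUTOFF)
strong_accepted, strong_rejected = apply_rejection_policy(kgb_sample, STRONG_CUTOFF)

print(len(weak_accepted), len(weak_rejected))
print(len(strong_accepted), len(strong_rejected))
```

Because the true credit quality is known for every case in the KGB sample, masking it in the simulated rejected region allows any imputed values to be compared with the actual outcomes afterwards.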
These samples provided complete random-sample information indicative of the TTD applicant pool, on which the subsequent CRM model was estimated. To infer the credit risk of the rejected loan applicants, we applied the preliminary model to the original KGB sample. From the loan process, it is logical to treat the probability of having a bad credit quality as a good proxy for the missing data mechanism (MDM) (Chen & Astebro, 2012; Kraus, 2014; Ditrich, 2015; Nguyen, 2016). This implied that the original credit risk score provided the much-needed information that paved the way for estimating the missingness mechanism. To estimate the missing data mechanism, we used linear extrapolation of bad rates against the original score. The estimated probability of a rejected loan application being bad was taken as the probability of missingness. The simple linear regression model (SLRM) is of the form: