
3.2 SYSTEM REQUIREMENTS SPECIFICATION

3.2.1 HARDWARE REQUIREMENTS

Logistic regression is a non-linear method of modeling for dichotomous dependent variables (Liou, 2008). That is, the classifying variable, usually known as a binary variable, can take only two defined outcomes. In the present study these are category y = 1 or y = 0. Logistic regression, or logit analysis, is therefore considered suitable for this study because its dependent variables are binary (Mazzarol, 1998). Besides its ability to perform binary classification, the method allows for tests of the overall fit of a model and takes all variables of all constructs into account simultaneously in assessing satisfaction with test requirements. In a review of prediction methods covering both modern data mining and traditional techniques, logistic regression (also classified as a data mining algorithm) was ranked second to the popular neural networks in terms of prediction accuracy among 32 classification cases (Liou, 2008).

The purposes of logit analysis in this study were to estimate the conditional probability that a firm belongs to either classification (successful or not successful in commercialisation), to identify significant predictors of success or the lack of it, and to test the effectiveness of the fitted models in classifying the sample of 103 firms (Liu & Lee, 1997; Liou, 2008; Kennedy, 1998; Laittinen, 2002). The logistic procedure estimates the coefficients of a probabilistic model involving a set of independent variables that best predict the value of the dependent variable. A positive coefficient increases, and a negative coefficient decreases, the predicted probability that the outcome falls in the category coded 1 (Mazzarol, 1998). Variables with larger coefficients are more useful in identifying success cases.
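The effect of a coefficient's sign can be illustrated numerically. The sketch below assumes a single-predictor model with hypothetical values (an intercept of -0.5 and coefficients of +0.8 and -0.8, none of which come from the study) and compares the predicted probability at two values of the predictor.

```python
import math

def predicted_probability(intercept, coefficient, x):
    """P(y = 1) for a single-predictor logistic model."""
    z = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of the outcome coded 1, given its probability."""
    return p / (1.0 - p)

# Hypothetical intercept and coefficients, chosen only for illustration.
for beta in (0.8, -0.8):
    p_low = predicted_probability(-0.5, beta, x=1.0)
    p_high = predicted_probability(-0.5, beta, x=2.0)
    ratio = odds(p_high) / odds(p_low)
    print(f"beta={beta:+.1f}: P(x=1)={p_low:.3f}, P(x=2)={p_high:.3f}, "
          f"odds ratio={ratio:.3f}")
```

For a one-unit increase in the predictor the odds are multiplied by exp(beta), which is above 1 for a positive coefficient and below 1 for a negative one.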

In the current study, two measures of success are investigated: the LMA and CI dependent variables, both representing the likelihood of success with commercialisation. The analysis examined the effectiveness of a list of 33 possible predictor variables (see Table 3.2) for future success in commercialisation of African MFIs. Future success in commercialisation is measured over two years and predicted from the prior, year-one (2001) data under the analysis (Laittinen, 2002; Lekkos, 2001). Fitted models were assessed through measures of goodness of fit.
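The text does not say which goodness-of-fit measures were used. As an illustration only, the sketch below computes two common choices, classification accuracy at a 0.5 cutoff and McFadden's pseudo-R², on made-up outcomes and fitted probabilities; the function names, the cutoff and the data are all assumptions.

```python
import math

def log_likelihood(y_true, p_hat):
    """Bernoulli log-likelihood of observed outcomes under fitted probabilities."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_true, p_hat))

def classification_accuracy(y_true, p_hat, cutoff=0.5):
    """Share of cases whose predicted class matches the observed class."""
    hits = sum(int(p >= cutoff) == y for y, p in zip(y_true, p_hat))
    return hits / len(y_true)

def mcfadden_r2(y_true, p_hat):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - log_likelihood(y_true, p_hat) / ll_null

# Made-up outcomes and fitted probabilities, not taken from the study.
observed = [1, 0, 1, 1, 0, 0]
fitted = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6]

print("accuracy:", classification_accuracy(observed, fitted))
print("McFadden pseudo-R2:", mcfadden_r2(observed, fitted))
```

A model that does no better than the base rate has a pseudo-R² of zero, so the value gives a rough sense of how much the predictors add.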

Liu and Lee (1997) point out that logistic regression fits well, particularly when the data are not normally distributed and when many independent variables are binary in nature. As suggested earlier, it was necessary to consider more prediction models investigating the binary-classification problem, especially to enable comparison of observed goodness-of-fit indices based on conventional prediction models. This was considered useful in obtaining robust results for the predictive ability of the explanatory variables and sub-models (Liou, 2008; Lekkos, 2001), and also to investigate whether other prediction techniques using classification trees (such as random forests) perform better, particularly when performance is low, as it is indicative that there is more room for improvement (Lariviere & Van den Poel, 2004).

Due to the small size of the sample and the need to preserve degrees of freedom, the logistic analysis applied stepwise logistic regression procedures to all the data, and also to a sub-set of the most important variables identified in the original run in random forests (Laittinen, 2002). Another motivation for performing a stepwise logistic regression analysis was to isolate the significant variables to be used in further tests, because the number of explanatory regressors was considered large (refer to Table 3.2). This makes it easier to interpret the results and assess the predictive power of significant variables (Liou, 2008). Besides investigating the binary classification problem and identifying the best predictors, a variety of statistical tests or sub-models were investigated in order to check robustness, control for the effects of associated variables that mask others, benchmark the RF results and use the results to develop a better prediction model (Konish & Yasuda, 2003; Pille & Paradi, 2002; Liou, 2008; Kolari et al., 2002).

The logistic model was estimated by the method of maximum likelihood for all regression techniques. Maximum likelihood, applied to a conditional probability model, is usually used to find the model that best distinguishes the two groups in the expected outcomes. The logic of the analysis is formulated by a linear rating rule: classify an MFI with characteristics given by the explanatory variables $x_1, \ldots, x_n$ to category y = 1 or y = 0 if the conditions are met.
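Maximum likelihood estimation and the resulting classification rule can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in the study: it maximises the log-likelihood by plain gradient ascent (statistical packages typically use Newton-type iterations), uses a 0.5 probability cutoff, and runs on a made-up single-predictor sample. The names `fit_logistic` and `classify`, the learning rate and the iteration count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, iterations=5000):
    """Maximise the log-likelihood by batch gradient ascent.

    Returns [intercept, beta_1, ..., beta_n].
    """
    n_features = len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        gradient = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
            error = target - sigmoid(z)  # derivative of the log-likelihood in z
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w + learning_rate * g / len(X)
                   for w, g in zip(weights, gradient)]
    return weights

def classify(weights, row, cutoff=0.5):
    """Assign category 1 if the fitted probability reaches the cutoff, else 0."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 if sigmoid(z) >= cutoff else 0

# Made-up sample with overlapping classes, so the estimates stay finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

weights = fit_logistic(X, y)
print("intercept and coefficient:", weights)
print("predicted classes:", [classify(weights, row) for row in X])
```

With a 0.5 cutoff the rule reduces to checking the sign of the linear combination of the explanatory variables, which is why it can be described as a linear rating rule.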

The generalised form of a logistic regression model for the case of a single dichotomous dependent variable, and multiple independent variables can be expressed as follows (Liou, 2008: 653; Mazzarol, 1998: 170; Laittinen, 2002: 880):

$$P(y = 1) = \frac{1}{1 + e^{-z}}$$

where: $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$

$y$ = the dichotomous dependent variable, successful commercialisation, measured by either LMA or CI

$P(y = 1)$ = the conditional probability of an MFI being classified as successful rather than less successful

$x_1, \ldots, x_n$ = the independent variables or predictors from 2001 (the 33 variables, see Table 3)

$\beta_0$ = an intercept term

$\beta_1, \ldots, \beta_n$ = the logistic regression coefficients for the predictor variables $x_1, \ldots, x_n$

$e$ = the quantity 2.71828..., the base of natural logarithms

For the case of a multivariate logistic regression model, the above expression can be specified as:

$$\ln\left(\frac{p}{1 - p}\right) = \alpha + \beta_1 x_1 + \ldots + \beta_n x_n + u$$

Where

$p$ = probability that the value of the dichotomous dependent variable, $y$, equals 1

$x_1, \ldots, x_n$ = independent variables

$\alpha$ = constant

$\beta_1, \ldots, \beta_n$ = coefficients

$u$ = stochastic disturbance term representing that part of $\ln\left(\frac{p}{1 - p}\right)$ which is unexplained by the independent variables.

It is noted that the left-hand side of the equation is not the dependent variable, y, itself, but the so-called 'log odds' or 'logit' of y. It is usually recommended that dichotomous independent variables are treated as if they are continuous.
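The link between the probability form and the logit form can be made explicit. The short derivation below is a sketch that ignores the disturbance term and uses assumed notation: $z$ for the linear combination of the predictors, $p$ for the probability that y equals 1, and $\beta_0, \ldots, \beta_n$ for the intercept and coefficients.

```latex
\begin{align*}
p &= \frac{1}{1 + e^{-z}},
  \qquad z = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p}{1 - p} &= \frac{1}{e^{-z}} = e^{z} \\
\ln\left(\frac{p}{1 - p}\right) &= z = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n
\end{align*}
```

For example, $z = 0$ gives $p = 0.5$, odds of 1 and a logit of 0, so a case with a zero linear score is equally likely to fall in either category.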
