j j
a a a b b b
G Y Y x G Y Y x (Hess, 2005; Train, 2003).
The simplest form of GEV model is the multinomial logit model where the correlation between the error terms across different alternatives is zero. Thus in this case, the GEV distribution is the product of independent extreme value distributions (Train, 2003).
Looking at the theoretical foundation for logit models, the next section details the workings of each of the relevant models in context of the research question.
4.3.2 Types of Choice Model
Data from the choice experiment can be analysed using a range of logit models based on the theoretical and empirical requirements. This section details the theory underpinning those choice models which have been specifically applied to address the research question. The section will begin by the most basic choice model, the multinomial logit model and proceed successively based on the level of complexity.
4.3.2.1 Multinomial Logit
The multinomial logit model (MNL) is the most basic form of logit model which states that in any particular choice situation, the probability that an alternative i will be chosen from the set of alternatives is given by the following equation:
jn n
in V
A j V
n e e
A i
P( : ) /
where,
Vinis the deterministic utility of alternative i in choice situation n Vjnis the deterministic utility of alternative j in choice situation n
The value of the deterministic utility is assumed to depend on the values of the variables which affect choice and the functional relationship is generally assumed to be linear in parameters (Louviere et al., 2000).
The binary logit (BL) model is the special case of the MNL where there are only two alternatives available to the respondent. The binary logit probability therefore takes the following form:
( ) Vin Vin Vjn
P i e e e (Ben-Akiva and Lerman, 1985)
The MNL model rests on two main assumptions –
1) The variance of the error terms is independent and identically distributed (i.i.d.) from each other and,
2) The probability of choosing an alternative is unaffected by the inclusion of another alternative in the choice set (the independence of irrelevant alternatives property - IIA)
Thus both the BL and MNL model forms do not take any form of heterogeneity or error correlation into account. In the simplest type of analysis, the binary logit
method can be used to analyse the binary choice data while the data from the one and two stage Likert scales can be analysed using the MNL model.
In case of the MNL application for the Likert scale data, differences in the preference certainty level of the alternatives can be modelled by adding an alternative specific constant. An alternative specific constant (ASC) represents the mean distribution of the unobserved effects of the error components on the alternative (Louviere et. al, 2000) and for J alternatives, only J-1 ASCs can be defined (Train, 2003). Thus, a constant added to the option say, ‘Definitely A’ can be interpreted as the average effect of the unincluded factors on utility of
‘Definitely A’ relative to ‘Probably A’.
While the value of the ASC and its associated t-ratio can be used to judge the average effect of the unobserved factors in the utility model, a MNL model does not consider the effect of the difference in the error variance structure between the relevant alternatives. Hence, though ASC can be applied to estimate the relative effect of unobserved factors, further examination of error structure is important through application of higher model forms as the MNL analysis does not provide any significant insight on the type of i.i.d. or IIA violation affecting the preference levels.
4.3.2.2 The Nested Logit
The first step to relax the IIA assumption can be carried through the nested logit (NL) model which considers the possibility of nesting alternatives with some degree of similarity between them. The rationale of the nested logit model thus lies in the fact that the commonality of characteristics between alternatives results in some parts of the random error to be correlated. Nested logit model is formed by partioning the choice set such that the alternatives that share common unobserved components are grouped together resulting in the i.i.d. assumption to be relaxed as the error component are correlated within a nest but not across the nests. Thus, NL models are applied when there are shared unobserved stochastic components associated with different choice dimensions (Ben-Akiva and Lerman, 1985). By
grouping the alternatives into subsets that allow for varying error variance, the homoskedastic assumption of the MNL model is relaxed. However, though the variance of the error differs across the alternatives, the overall variance of the error of all the alternatives remains constant (Shen, 2005). Thus, homoskedasticity is maintained with the NL model.
While the nested logit structure can imply a hierarchy of choice, the derivation of the model makes no assumption of the structure of the choice process (Koppelman and Bhat, 2006). Thus, the nesting structure is reflective of the assumptions made on the correlation between alternatives and not on the sequence adopted during the decision-making process.
The assumptions of the nested logit model can be summarised as – 1) the IIA condition holds within a nest and 2) the IIA condition does not hold in general for alternatives in different nests (Train, 2003).
In case of the NL model, the joint probability of an alternative in a nest can be given as the product of the probability of choosing a nest and the conditional probability of choosing the alternative given a particular nest is chosen.
Assuming for example that a five point Likert scale (Definitely A, Probably A, Uncertain, Probably B and Definitely B) can be nested in the following way (and noting that other nesting structures are possible):
Figure 4.1Nesting Structure for the preference levels
The composite utility of selecting Definitely A can be given as:
While the joint, marginal and conditional probabilities can be respectively given as:
, .
Dis known as the logsum parameter or the expected marginal utility (EMU) or inclusive value (IV) and is equal to D(Ortuzar and Willumsen, 2004). The degree of independence or dissimilarity in the random error across the alternatives in a nest is given by this log-sum coefficient (also called the nesting or the structural parameter) which can differ over nests. When this parameter equals unity, the nested logit model reduces to the MNL. For the model to be consistent with utility maximising behaviour, the nesting parameter has to range from 0 to 1.
When this is greater than 1, the model is consistent with utility maximising (UM) behaviour for some of the explanatory variables but not all while in the case of negative values, the model is inconsistent with UM behaviour (Train, 2003).
As data based on preference certainty level share all generic coefficients, the distinction between non-normalised nested logit (NNNL) model and utility
maximising nested logit (UMNL) needs to be taken into consideration depending on the software used for analysis.
Based on the scaling of the deterministic component of the utility, the two different types of NL model can be specified as follows:
a) The Utility Maximising Nested Logit (UMNL) model is based on McFadden’s Generalised Extreme Value (GEV) where marginal and conditional probabilities are multiplied by λ and respectively and the IV parameter is multiplied by λ
b) The Non-Normalised Nested Logit (NNNL) is the basis of the software ALOGIT and has an implicit nest specific scaling within the specification
Based on which level the specification is normalised, two forms of UMNL can be derived.
a) Normalisation on the lower level mn 1leads the model to become NNNL (RU1)
b) Normalisation on the upper level m n 1is UMNL or RU2
If a constant is added to each alternative’s deterministic utility, NNNL/RU1 does not give results consistent with theory but only RU2 satisfies the condition.
Consistency of RU1 with RUT can be achieved by imposing the upper level scale parameter to be held constant whereas consistency in NNNL can be achieved by holding the lower level scale parameter to be constant across the nests (Silberhorn et al., 2007).
When there are no generic coefficients in the nest, both the UMNL and NNNL specifications are equivalent but the coefficient estimated with NNNL have to be rescaled with the accordingly estimated IV parameter to be equivalent to that obtained from UMNL. When generic coefficients enter the model, NNNL software cannot be used to estimate the parameters and only the RU2 normalisation of the UMNL should be carried out to be consistent with RUT.
In the UMNL specification, VD and VA|D are scaled explicitly with parameters λm
and m respectively while this is automatically and implicitly done in the NNNL software. For the parameters to be consistent with RUT using NNNL, certain restrictions have to be imposed. Firstly, the coefficients in each nest have to be scaled equally and the IV parameters have to be constrained to be equal for all nests. Rescaling the parameter estimates in the restricted NNNL model with the estimated IV parameters results in the parameter estimates of RU2 (Silberhorn et al., 2007).
Another possibility to guarantee consistency with RUT without imposing restriction on the IV parameters can be achieved by introducing additional dummy nests below the lowest level and the additionally estimated scale parameter has to be defined in such a way that the products of all ratios of scale parameters between levels must be identical from the root to all elemental alternatives (Koppelman and Wen, 1998).
Using the UMNL or the modified NNNL model (such that it is consistent with RUT) to analyse the preference certainty data, the NL model allows the nesting of the different alternatives based on the preference level. Thus, alternatives with similar error structure based on similar preference certainty level (such as
‘Definitely A’ and ‘Definitely B’ or ‘Probably A’ and ‘Probably B’) can be grouped together. The nest parameter estimated in this case allows examining any patterns of error correlation between nested alternatives as well as estimating the level of error variance across the different nests. While the NL model sheds some light on the correlation of the error variance, in its pure form, it does not examine the possibility of heteroskedasticity. To examine whether a further relaxation of the i.i.d. assumption is required, the mixed logit model needs to be applied in the analysis.
4.3.2.3 The Mixed Logit Model
The mixed logit (ML) model has been developed to account for the richness in the unobserved component by considering the potential for correlation and heteroskedasticity. Based on the cause of the flexible error assumption, two forms
of the ML model can be derived – one allowing the parameter estimates to follow a non-Gumbel distribution and another allowing for heteroskedasticity of the error variance.
In the first case, the probability equation of the ML model is a weighted average of the logit probability evaluated at different values of the parameters, with the weights given by the density which is a function of the parameter. The function of the parameters can be specified to have any distribution apart from Gumbel (i.e., normal, log-normal etc.). Under this derivation of the ML model, the values of the parameter can be interpreted as representing different tastes of the individual decision makers. As the researcher cannot know each individual’s parameter estimates, the unconditional choice probability is an integral of all possible values of the parameter (Train, 2003).
Considering β to be the parameter vector which can be interpreted as representing the taste of an individual decision maker, the utility of a person n from alternative j can thus be specified as:
nj nj
nj x
U n'
The coefficients βnis allowed to vary across individuals and the choice probability is therefore conditional on this parameter. Though the decision maker is assumed to be aware of his own βn and εn and choose based on utility maximisation, the researcher is unaware of βn and therefore cannot form a probability function which is conditional on β. The unconditional choice probability is therefore given as an integral of the conditional probability over all possible variables of the parameter.
It can therefore be given as:
Where, f()is the density function
The distribution of the parameter is specified by the researcher and the parameters then estimated. The random parameters model is most suitable to capture inter-respondent taste heterogeneity.
The second type of the ML model is built by allowing for heteroskedasticity and/or correlation in the error components across the utilities of the different alternatives.
Using this Error Components Logit (ECL) model, the stochastic error portion of the utility can be decomposed into two parts – one which is correlated over alternatives depending on the specification and another which is i.i.d. Gumbel distributed. For the standard logit model, the former error component is identically zero, implying no correlation in utility over alternatives (Train, 2003).
The decomposition of the error component can be shown as follows:
Uni= Vni + (ηni +ni)
Where ηniis the random component with zero mean and distribution dependent on the underlying parameters and niis the random component with zero mean that is independent and identically distributed across alternatives for an individual n.
For any given value of ηni, the conditional choice probability is logit as thenierror component is assumed to be i.i.d. and is given by:
( ) exp( ) exp( )
ni ni ni j nj nj
L x
x Since ηniis not given, the unconditional probability is the integration of all values of η weighted by its density.
( ) ( )
ni ni
P
L f d(Hensher and Greene, 2001) Where Ω are the fixed parameters of the distribution.Through the flexibility provided by the ECL model, various forms of correlation and heteroskedasticity can be examined from the preference data. This model is especially interesting due to the larger range of error effect that can be examined.
An analogue of nested logit can be obtained by specifying a dummy variable for each nest that equals unity for each alternative in that nest and zero for each alternative outside the nest. The heteroskedastic random error of the ECL model can be entered as the error associated with each alternative in the nest thus inducing correlation among alternatives. By not entering for any of the alternatives in the other nests, no correlation is induced between alternatives across nests while it is done so for those within a nest. The variance captures the magnitude of correlation and it plays an analogous role to the inclusive value coefficient of the NL models (Train, 2003). However, while the NL model maintains homoskedasticity of the error variances, this is not the case with the ECL model which mimics correlation structure developed with the NL model (Munizaga and Alvarez-Daziano, 2001).
An analogue to the heteroskedastic extreme value (HEV) model can be obtained by specifying the error variance to have a different variance for each of the preference levels.
The HEV model is a relatively restrictive model as it relaxes the identically distributed error structure assumption without relaxing the independence assumption. The fundamental underpinning of this model lies in the heteroskedasticity of the error terms thus relaxing the IIA assumption. The alternative error terms are assumed to be type 1 Gumbel distributed with the variances allowed to be different across the alternatives (Bhat, 2000). This is achieved within the HEV framework by allowing different scale parameter for the error term across alternatives.
Let z i/ i
where, iis the error and i is the scale parameter.
The choice probability of the HEV model can then be given as:
,
By allowing for different scale parameters, the HEV model avoids the problem of the i.i.d. property. Intuitively, the random term can represent the unobserved characteristics of the alternatives and hence the scale parameter in the case of the HEV model represents the uncertainty related with expected utility of the alternative (Hensher, 1999).
4.3.2.4 Ordered Logit
The ordered logit (OL) model has been used to analyse the strength of respondents’
preferences under the rating exercise. With this method, the respondent’s preference certainty level is interpreted as an ordinal indicator of the relative size of the utility difference. Considering A and B to be two alternatives, the difference in the utilities across the alternatives can be given as:
( A, A) ( B, B)
dU U X Y U X Y dv (Swallow et al., 2001)
Where, XA, XB, YA and YB are the vector of attributes associated with the corresponding alternatives and ε is the difference in the stochastic error from the two alternatives. The ordered response implies the existence of unobserved threshold parametersj that demarcate the intervals or categories within which the utility difference may fall. The ordinal indicator defines a set of dummy variables Djthat records to which category an observation belongs. Therefore,
Dj = 1 if θ j-1 < dU < θj
= 0 otherwise; where j = 1, 2… J
The ordered response implies that the respondent’s strength of preference or preference certainty level will be given by the following probability function:
1 1
Pr(j dUj)Pr(dUj) Pr( dUj)j Pr(Dj 1) F(j dV)F(j1dV)
By normalising θ0to -∞ and θjto ∞, the following log likelihood can be obtained:
(Whelan and Tapley, 2006)
This log likelihood has been used for estimation to measure the strength of preference where respondents are asked to indicate how strongly they prefer the choice they have already made. Thus the respondent establishes his choice first and then rates his strength of preference (Swallow et al., 2001).
For each of the model, along with the parameter coefficient, a threshold coefficient is estimated. The threshold coefficient is along the scale of the utility difference and the probability of an alternative being chosen is the probability that the utility difference is within a particular range of the associated threshold coefficient (Train, 2003). Thus in case of the five point Likert choice: Definitely A, Probably A, Uncertain, Probably B and Definitely B, the distribution of the utility difference can be given in the following manner:
Figure 4.2 Preference Levels with Ordered Logit Analysis
1
1 1
[ ( ) ( )] jn
N J
D
j n j n
n j
LL F dV F dV
The probability that ‘Definitely A’ is chosen in this case is given by the probability that the utility difference is less than threshold value k1.
As the semantics of the levels of preference certainty can be different for different people, models have been developed with variable threshold coefficients using a mixed ordered logit model (Whelan and Tapley, 2006).
4.3.2.5 Other Model forms
In addition to the models outlined in the preceding sections, other model forms such as cross nested logit (CNL), ordered generalised extreme value (OGEV) and latent class models can also be used to analyse the preference uncertainty data. While this section briefly outlines the methods associated with each of these models, they are not considered in depth due to the absence of their application.
The Cross Nested logit (CNL) is an extension of the Nested Logit model allowing for ambiguous allocation of alternatives to nests. The CNL model allows nests to share some alternatives having degrees of similarity between them. Along with the nesting parameter, an additional allocation parameter is estimated in the model which reveals the degree to which an alternative belongs to a nest. An important constraint is that the sum of all allocation parameters for an alternative across all nests must be equal to unity. Thus the value of the allocation parameter lies between 0-1 (Train, 2003). Under certain specifications of the allocation parameter (which show the degree of membership of an alternative in a nest) and the nesting parameter (which indicate the degree of independence in the random error between alternatives within a nest), the CNL model reduces to NL and MNL (Hess et al., 2004).
Ordered Generalised Extreme Value (OGEV) derives from the Generalised Extreme Value class of RUM models. The OGEV model differs from ordered logit in the way that the alternatives ordered in close proximity have higher correlation
Ordered Generalised Extreme Value (OGEV) derives from the Generalised Extreme Value class of RUM models. The OGEV model differs from ordered logit in the way that the alternatives ordered in close proximity have higher correlation