For log-linear models, including graphical log-linear models, we can apply the inferential theory derived for generalised linear models. We now focus on model comparison. This is because the use of conditional independence graphs permits us to interpret model comparison and choice between log-linear models in terms of comparisons between sets of conditional independence constraints. In data mining problems the number of log-linear models to compare increases rapidly with the number of considered variables. Therefore a valid approach may be to restrict the class of models. In particular, a parsimonious and efficient way to analyse large contingency tables is to consider interaction terms in the log-linear expansion that involve at most two variables. The log-linear models in the resulting class are all graphical. Therefore we obtain an equivalence between three statements: the absence of an edge between two nodes, say $i$ and $j$; conditional independence between the corresponding variables, $X_i$ and $X_j$ (given the remaining ones); and nullity of the interaction parameter indexed by both of them.
As we saw with generalised linear models, the most important tool for comparing models is the deviance. All three sampling schemes for log-linear models lead to an equivalent expression for the deviance. Consider, for simplicity, a log-linear model to analyse three categorical variables. The deviance of a model M is
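$$G^2(M) = 2 \sum_{jkl} n_{jkl} \log\left(\frac{n_{jkl}}{\hat{m}_{jkl}}\right) = 2 \sum_{i} o_i \log\left(\frac{o_i}{e_i}\right)$$
where $\hat{m}_{jkl} = n\hat{p}_{jkl}$, the $\hat{p}_{jkl}$ are the maximum likelihood estimates of the cell
probabilities, the $o_i$ are the observed cell frequencies and the $e_i$ indicate the cell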
frequencies estimated according to the model M. Notice the similarity with the deviance expression for the logistic regression model. What changes is essentially
the way in which the cell probabilities are estimated. In the general case of a
p-dimensional table, the definition is the same but the index set changes:
$$G^2(M_0) = 2 \sum_{i \in I} n_i \log\left(\frac{n_i}{\hat{m}^0_i}\right)$$
where, for a cell $i$ belonging to the index set $I$, $n_i$ is the frequency of observations
in the $i$th cell and the $\hat{m}^0_i$ are the expected frequencies for the considered model $M_0$.
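As a minimal sketch of the deviance formula, the following Python snippet computes $G^2$ for the mutual independence model on a small three-way table. The counts are made up for illustration and the function names are hypothetical. The closed-form fitted values $\hat{m}_{jkl} = n_{j++}\, n_{+k+}\, n_{++l} / n^2$ used here apply only to that particular model; other models generally need an iterative fit.

```python
import math
from itertools import product


def independence_fit(table):
    """Fitted counts under mutual independence for a 3-way table (nested lists)."""
    J, K, L = len(table), len(table[0]), len(table[0][0])
    n = sum(table[j][k][l] for j, k, l in product(range(J), range(K), range(L)))
    # One-way marginal totals for each of the three variables
    row = [sum(table[j][k][l] for k in range(K) for l in range(L)) for j in range(J)]
    col = [sum(table[j][k][l] for j in range(J) for l in range(L)) for k in range(K)]
    lay = [sum(table[j][k][l] for j in range(J) for k in range(K)) for l in range(L)]
    return [[[row[j] * col[k] * lay[l] / n**2 for l in range(L)]
             for k in range(K)] for j in range(J)]


def deviance(observed, fitted):
    """G^2 = 2 * sum o_i * log(o_i / e_i); empty cells contribute zero."""
    total = 0.0
    for plane_o, plane_e in zip(observed, fitted):
        for row_o, row_e in zip(plane_o, plane_e):
            for o, e in zip(row_o, row_e):
                if o > 0:
                    total += o * math.log(o / e)
    return 2.0 * total


counts = [[[20, 10], [15, 5]], [[8, 12], [6, 24]]]  # hypothetical 2x2x2 table
g2 = deviance(counts, independence_fit(counts))
print(f"G2 = {g2:.3f}")
```

A table whose counts factorise exactly into the product of its margins is reproduced by the fit and gives a deviance of zero.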
For model comparison, two nested models M0 and M1 can be compared using the difference between their deviances:
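$$D = G^2_0 - G^2_1 = 2 \sum_{i \in I} n_i \log\left(\frac{n_i}{\hat{m}^0_i}\right) - 2 \sum_{i \in I} n_i \log\left(\frac{n_i}{\hat{m}^1_i}\right) = 2 \sum_{i \in I} n_i \log\left(\frac{\hat{m}^1_i}{\hat{m}^0_i}\right)$$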
As in the general case, under $H_0$, $D$ has an asymptotic chi-squared distribution whose degrees of freedom are obtained by taking the difference in the number of parameters for models $M_0$ and $M_1$.
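The comparison of two nested models can be sketched in Python as follows. This is only an illustration under stated assumptions: the 2×2 counts are invented, $M_0$ is taken to be the independence model and $M_1$ the saturated model, and `chi2_sf` is a hand-written helper (a standard series and continued-fraction evaluation of the incomplete gamma function) rather than a library routine.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularised incomplete gamma function
        k, term = a, 1.0 / a
        total = term
        for _ in range(1000):
            k += 1.0
            term *= z / k
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper tail
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def deviance_difference(counts, fitted0, fitted1):
    """D = 2 * sum n_i * log(m1_i / m0_i) over flat lists of cells."""
    return 2.0 * sum(n * math.log(m1 / m0)
                     for n, m0, m1 in zip(counts, fitted0, fitted1) if n > 0)


counts = [30.0, 10.0, 20.0, 40.0]  # hypothetical 2x2 table, flattened row by row
n = sum(counts)
rows = [counts[0] + counts[1], counts[2] + counts[3]]
cols = [counts[0] + counts[2], counts[1] + counts[3]]
fitted0 = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]  # M0: independence
fitted1 = counts[:]  # M1: the saturated model reproduces the data exactly

D = deviance_difference(counts, fitted0, fitted1)
df = 1  # M1 has one more free parameter (the interaction) than M0
p_value = chi2_sf(D, df)
print(f"D = {D:.3f}, p-value = {p_value:.2g}")
```

Because the saturated model has zero deviance, $D$ here coincides with $G^2(M_0)$; a small p-value would argue against dropping the interaction.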
The search for the best log-linear model can be carried out using a forward, backward or stepwise procedure. For graphical log-linear models we can also try adding or removing edges between variables rather than adding or removing interaction parameters. In the backward procedure we compare the deviances of models that differ by the presence of one edge, and at each step we eliminate the least significant edge; the procedure stops when no edge removal produces a p-value greater than the chosen significance level (e.g. 0.05). In the forward procedure we add the most significant edge, one at a time, until no edge addition produces a p-value lower than the chosen significance level.
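The backward procedure can be sketched as a greedy loop. The code below is a skeleton only: `removal_p_value` is a hypothetical callback that, in a real analysis, would refit the model without the candidate edge and return the p-value of the deviance difference. Here it is replaced by a stub with fixed, invented p-values so that the control flow can be followed.

```python
def backward_edge_selection(edges, removal_p_value, alpha=0.05):
    """Greedy backward elimination of edges from a graphical model.

    removal_p_value(current_edges, edge) must return the p-value of the
    deviance difference between the current model and the model without `edge`.
    """
    current = set(edges)
    while current:
        # Sorted iteration keeps tie-breaking deterministic
        p_values = {e: removal_p_value(frozenset(current), e) for e in sorted(current)}
        weakest = max(p_values, key=p_values.get)
        if p_values[weakest] <= alpha:
            break  # every remaining edge is significant: stop
        current.remove(weakest)
    return current


# Stub with invented p-values; it ignores the current model, which a real
# implementation would not do, since p-values change after each removal.
fake_p = {"AB": 0.001, "AC": 0.010, "CD": 0.600, "DE": 0.300}


def stub_p_value(current_edges, edge):
    return fake_p[edge]


selected = backward_edge_selection(fake_p, stub_p_value, alpha=0.05)
print(sorted(selected))
```

The forward procedure is the mirror image: start from a sparse graph and add the edge with the smallest p-value while that p-value stays below the significance level.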
5.5.5 Application
We can use a log-linear model to determine the associative structure among variables in a credit risk evaluation problem. The considered sample is made up of 8263 small and medium-sized Italian enterprises. The considered variables are
A, a binary qualitative variable indicating whether the considered enterprise is deemed reliable (Good) or not (Bad); B, a qualitative variable with 4 levels that describes the age of the enterprise, measured from the year of its constitution; C, a qualitative variable with 3 levels that describes the legal status of the enterprise;
D, a qualitative variable with 7 levels that describes the macroeconomic sector of activity of the enterprise; and E, a qualitative variable with 5 levels that describes the geographic area of residence of the enterprise.
Therefore the data is classified in a contingency table of five dimensions and the total number of cells is 2×4×3×7×5=840. The objective of the analysis is to determine the associative structure present among the five variables. In the absence of a clear preliminary hypothesis on the associative structure, we will use a backward procedure for model comparison. Given the small number of variables involved, we can consider all log-linear models, including non-graphical models.
The first model to be fitted is the saturated model, which contains 840 parameters, equal to the number of cells. Here is the corresponding log-linear expansion embodying the identifiability constraints described earlier:
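$$
\begin{aligned}
\log \mu^{ABCDE}_{ijklm} = {} & u \\
& + u^{A}_{i} + u^{B}_{j} + u^{C}_{k} + u^{D}_{l} + u^{E}_{m} \\
& + u^{AB}_{ij} + u^{AC}_{ik} + u^{AD}_{il} + u^{AE}_{im} + u^{BC}_{jk} + u^{BD}_{jl} + u^{BE}_{jm} + u^{CD}_{kl} + u^{CE}_{km} + u^{DE}_{lm} \\
& + u^{ABC}_{ijk} + u^{ABD}_{ijl} + u^{ABE}_{ijm} + u^{BCD}_{jkl} + u^{BCE}_{jkm} + u^{CDE}_{klm} + u^{ACD}_{ikl} + u^{ACE}_{ikm} + u^{ADE}_{ilm} + u^{BDE}_{jlm} \\
& + u^{ABCD}_{ijkl} + u^{ABCE}_{ijkm} + u^{BCDE}_{jklm} + u^{ACDE}_{iklm} + u^{ABDE}_{ijlm} + u^{ABCDE}_{ijklm}
\end{aligned}
$$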
Notice that the saturated model contains interaction terms of different order, for example, the constant (first row), terms of order 2 (third row) and one term of order 5 (fifth row).
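The parameter count of the saturated model can be checked with a short Python sketch. It assumes that, under the identifiability constraints, a u-term involving a set of variables contributes the product of (number of levels − 1) free parameters over those variables; this is the usual convention, but it is stated here as an assumption. The variable names follow the application, and `free_parameters` is a hypothetical helper.

```python
from itertools import combinations
from math import prod

levels = {"A": 2, "B": 4, "C": 3, "D": 7, "E": 5}


def free_parameters(term):
    """Free parameters of one u-term; the empty term is the constant (1 parameter)."""
    return prod(levels[v] - 1 for v in term)


by_order = {}
for order in range(len(levels) + 1):
    terms = list(combinations(sorted(levels), order))
    # Store (number of terms, total free parameters) for this order
    by_order[order] = (len(terms), sum(free_parameters(t) for t in terms))

total = sum(params for _, params in by_order.values())
for order, (n_terms, params) in by_order.items():
    print(f"order {order}: {n_terms} term(s), {params} free parameter(s)")
print("total:", total)
```

The total equals the number of cells, 2×4×3×7×5, because summing the products of (levels − 1) over all subsets of variables is the same as multiplying out the product of the numbers of levels.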
The backward strategy starts by comparing the saturated model with a simpler model that omits the interaction term of order 5. At a significance level of 5%, the simpler model is retained, as the p-value for the deviance difference is 0.9946. We then look at interaction terms of order 4, removing them one at a time to find the simpler model in each comparison. We continue through the terms and down the orders until we can achieve no more simplification at our chosen significance level of 5%. The final model contains the constant term, the main effects and the interactions AB, AC, BC, AD, AE, BE and ABC, that is, 6 interactions of order 2 and one interaction of order 3. These interaction terms can be described by the generators (AD, AE, BE, ABC). Figure 5.8 shows the conditional independence graph for the final model.
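To see how the generators summarise the final model, the following Python sketch expands a list of generators into all the terms they imply under the hierarchy principle (every non-empty subset of a generator is in the model). The function name `implied_terms` is hypothetical.

```python
from itertools import combinations


def implied_terms(generators):
    """All non-empty subsets of each generator, as sorted strings of variable names."""
    terms = set()
    for g in generators:
        for r in range(1, len(g) + 1):
            terms.update("".join(c) for c in combinations(sorted(g), r))
    return terms


terms = implied_terms(["AD", "AE", "BE", "ABC"])
main_effects = sorted(t for t in terms if len(t) == 1)
order2 = sorted(t for t in terms if len(t) == 2)
order3 = sorted(t for t in terms if len(t) == 3)
print(main_effects, order2, order3)
```

The expansion recovers the five main effects, the six interactions of order 2 and the single interaction of order 3 listed for the final model.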
[Figure 5.8: conditional independence graph for the final model; nodes A, B, C, D, E.]
Notice that Figure 5.8 contains three cliques: ABC, ABE and AD. Since the cliques of the graph do not coincide with the generators of the log-linear model, the final model is not graphical. The log-linear model without the order 3 interaction term ABC would have the generators (AB, AC, BC, AD, AE, BE) and would be graphical. But on the basis of the deviance difference, we need to include the order 3 interaction. The model could be converted into a logistic regression model for variable A with respect to all the others. Then the logistic regression model would also have to contain the explanatory variable B∗C, a multiplicative term that describes the joint effect of B and C on A.
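The comparison between cliques and generators can be reproduced with a small Python sketch. The edge list is read off the order 2 interactions of the final model; `maximal_cliques` is a hypothetical brute-force helper, adequate only for a handful of nodes.

```python
from itertools import combinations


def maximal_cliques(nodes, edges):
    """Brute-force maximal cliques of an undirected graph with few nodes."""
    adjacent = {frozenset(e) for e in edges}

    def is_clique(subset):
        return all(frozenset(pair) in adjacent for pair in combinations(subset, 2))

    cliques = [set(s) for r in range(1, len(nodes) + 1)
               for s in combinations(nodes, r) if is_clique(s)]
    # Keep only cliques that are not strictly contained in a larger clique
    return {"".join(sorted(c)) for c in cliques
            if not any(c < other for other in cliques)}


edges = ["AB", "AC", "BC", "AD", "AE", "BE"]
cliques = maximal_cliques("ABCDE", edges)
generators = {"AD", "AE", "BE", "ABC"}
is_graphical = cliques == generators
print(sorted(cliques), is_graphical)
```

The check returns the three cliques named in the text and confirms that they differ from the generators of the final model.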