4 INTRODUCCIÓN
4.5 FORMACIÓN DE CONTAMINANTES EN PROCESOS TÉRMICOS
4.5.5 Polibromodibenzo-p-dioxinas y dibenzofuranos (PBDD/Fs)
The need for a curvilinear development has been confirmed in the last section by means of LGM. This assumption will be now used in LCGA analysis, where unobserved hetero- genity in the distribution of the observed outcomes will be searched for.
As a first step, models with increasing number of classes are estimated and compared with each other by means of model-fit indices. All the estimated curves have a curvilin- ear function with respectively intercept, slope and quadratic terms. As already specified no variance is estimated for these parameters. In Table 5.6 the estimated models are reported; for comparison purposes, the first model presented is the quadratic LGM esti- mated in the section above. This model has only one trajectory and thus no LMRT and entropy Ek is reported.
Table 5.6: LCGA models comparison
Classes
Est. Par.
E
kAdj. BIC
LMRT-LRT (P )
-
16
-
15527.43
-
2
12
0.999
14421.46
2893.60 (0.029)
3
16
0.992
12555.06
1821.10(0.019)
4
20
0.993
11582.98
956.22 (0.478)
5
24
0.992
10368.67
1190.47 (0.513)
(In brackets are reported the significance values for the LMR-LRT).
The adjusted BIC can be used for comparison purposes; the smaller the BIC value the better the model fits the data. However, this is not always the case. There are some instances where the BIC decreases by every new added class, even if the interpretability of the new added classes could be questionable. We can however notice, that the larger drop in BIC occurs when three classes are estimated; here the BIC difference from the two classes model is twice bigger than any other comparison in the table, which could be a first hint of the goodness of the three-classes model. Another largely used model-fit indice is the LMR-LRT test, which tests whether the new estimated class is statistically necessary or redundant. A significant LMRT value suggests that the new estimated class is not statistically redundant. According to the results of this test in Table 5.6 the estimation of a third class is still held necessary, whereas a fourth class is deemed redundant (the LMRT value is not significant at the 5% level); this is also the case when a fifth class is introduced in the model. Thus, based on merely statistical evidence a three class model can be considered the best solution so far.
Beside the above mentioned model-fit tests, the entropy value uses the posterior class membership probabilities to sum up how well the estimated model assigns each subject to a given class. The model estimates for each person the probability to belong to a class; then, the subject is assigned to the class where it has the highest probability to belong to. A good model is one where probability values close to one are assigned to each individual. The Ek for a three class solution is close to one, and suggests that the average individual
probablity to be assigned to a specific class is close to one.
Furthermore, posterior class membership probabilities can be used for comparisons among models with different specifications. Table 5.7 reports a cross classification between the two and three class solutions. In this way it is possible to see how class composition changes when a new class is added. The values represent the stability and change between the solutions. The values on the diagonal represent subjects that remain in the same class across the two solutions. Off-diagonal values, on the contrary, represent subjects that are assigned to a different class.
5.3. Results
Table 5.7: Class comparison between the two- and three-class LCGA models
Class
1
2
3
P
1
1413
85
0
1498
2
0
5
48
53
P
1413
90
48
1551
(Class interpretation: 1=low use; 2=adolescence limited; 3=desisters).
Before the comparison all classes have been matched with the 3-class solution in order to make the interpretability of the table easier. Thus, class 1 represents the low-use group, class 2 the adolescence limited, and class 3 the desister group. The results from the table above suggest that the old second class lend near all its members to the desister class in the 3-class model. The old first class do not loose any considerable number of subjects, the few who move built the adoloscence limited group in the new model. Thus, adding a new class to the old 2-class model seems to fill the gap between the older two classes and generate a new group with a quite specific substantive meaning. This new group shows a well known trajectory which increases during adolescence and decrease thereafter. Beside the results of the model-fit tests, a third class seems to bring new important information in the model. Similarly, a table can be produced for the comparison between the three- and the four-class models.
Table 5.8: Class comparison between the three- and four-class LCGA models
Class
1
2
3
4
P
1
1404
4
5
0
1413
2
0
87
1
2
90
3
0
0
31
17
48
P
1404
91
37
19
1551
(Class interpretation: 1=low use; 2=adolescence limited; 3=desisters; 4=new class).
The picture obtained from the table above is quite reassuring. In fact, it seems to suggest that a fourth class is not necessary, confirming thus the results of the model-fit indices. Looking at the values along the diagonal it can be noticed that the low-use and the adolescence limited classes are very stable across the two solutions. They loose only a few portion of members. In fact, the new fourth class seems to be the result of the split of the old third one in two groups. Beside that this new class is both very small and shares a very similar trajectory with class 3 (compare figure with Figure 5.7), suggesting that this new trajectory might be more a mere representation of the variance on the intercept of the desister class than a real class on its own. This table confirms the reliability of the first two classes, and suggests that there is no need for a forth one, since the desister class already well represents that specific pattern of development. The three-class model remains, so far, the best representation of the observed data.
The next step in exploring the substantive meaning of a mixture model consists in the comparison of the estimated classes with the cross-sectional means of the observed vari- ables. This method, however, should be taken with a pinch of salt. There are important statistical and substantial differences between the two measures compared in the table; on the one hand are reported trajectories estimated by means of individual-longitudinal data, and on the other hand are presented cross-sectional mean values. Thus, one should not be surprised if differences are found. However, similarity in the patterns could support the substantive validity of the estimated results. On the other hand, by means of logging the observed frequencies the influence of outliers and large values have been reduced. However, when comparing the classes that have been estimated with logged frequencies with the original frequencies, outliers and large values regain their original influence on the calculation of the means. The mathematical meaning and the problems of such a comparison will not be discussed here any further. The following table has in fact the simple purpose of giving a general picture of the possible frequency distribution of the found classes. Table 5.9 shows the results for the three-class model.
Table 5.9: Posterior class probabilities and observed frequencies of drug use
in the sample
Class
n.
xt
1xt
2xt
3xt
4xt
5Cl.1 low use
1413
0.02
0.18
0.39
1.74
2.59
Cl.2 adolescence lim.
90
0.15
15.96
58.42
51.38
54.24
Cl.3 desisters
48
11.00
19.68
38.14
30.25
19.84
Sample x
1551
0.36
1.52
4.51
4.94
5.79
The low-user class seems to well represent the observed mean frequencies of drug use measured at each time point. The trend respects the pattern of the estimated curve where a small and slow increase in consumption is observed across time. This use does not go as further as twice in the last twelve months (only at the last measurement point it is slightly larger than 2), confirming a low and experimental use. A similar picture is presented for the adolescence-limited group. Here, the observed value match the expected development, even if at the last measurement point a small increase is witnessed. This should be, however, interpreted not as an upturn of the group development, but rather as a stabilization as depicted in the estimated curve. The development of this class, in fact, can be rightly defined as adolescence limited, since it starts from nearly no consumption at the beginning of the measurement (at age 12-13 the participants can be considered at the beginning of adolescence), reaches a peak around age 15, and then slowly decreases thereafter.
The last class, the desisters, shows a not perfect match with the observed means. Even if, as stated above, a perfect match should not be expected due to the different nature of the values compared (on the one side estimated longitudinal individual values, and on the other side cross-sectional aggregate means), similar patterns might confirm the validity of the estimated curves. Thus, for the case of the desister group, instead of a decreasing pattern a curvilinear development of the observed means is reported. This group is, however, very small in number and thus greater variation is expected within it than for the other two classes. In fact, as seen above, the estimation of a fourth class causes the splits of the desister group, suggesting the presence of different curves within
5.3. Results
it.
As a final step, the four- and three-class models can now be graphically compared.
Figure 5.6: LCGA model for drug use, three classes
Figure 5.7: LCGA model for drug use, four classes
As outlined above the larger difference between the two solutions consists in the split of the desister class and in the subsequent estimation of a new fourth one, which shares
a very similar developmental pattern with its original “mother-class”, and differ from it for developing at a lower frequency level. Since no variance was estimated for the growth parameters, this result seems to point out that the last fourth class is a mere representation of the variance within the decreaser group. Thus, the LCGA analysis suggests that three classes are the best representation of the observed data. Suspecting, however, that some variability in the classes exists (especially in the desister class), which is not captured by the LCGA estimation. This hypothesis can be further explored by means of GMM.