
In Table 6.2 I present P(C_t | x_t) for t = 1, ..., 12 for the model where C_1 = {v_1, v_2, {v_3, ..., v_6}, {v_7, ..., v_10}} with probability 1 and k = 0.9, ρ = 0.9 and q = 0.2. The latter two parameter values ensure that few new models will be kept in the analysis, as the high value of ρ gives a low prior probability on transitions between stagings and the high value of q makes the Occam's window set of equation (5.40) small. This speeds up the computation of the forecasts at the expense of possibly worse predictions through fewer stagings being included in the model averaging.

Time  C_t                                      P(C_t | x_t)
1     1, 2, {3,4,5,6}, {7,8,9,10}              1
2     1, 2, 3, {4,5,6}, {7,8,9,10}             0.824
      1, 2, {3,4,5,6}, {7,9,10}, 8             0.175
3     1, 2, 3, {4,5,6}, {7,8,9,10}             0.766
      1, 2, 3, {4,5,6}, {7,10}, {8,9}          0.233
4     1, 2, 3, {4,5,6}, {7,8,9,10}             0.677
      1, 2, 3, {4,5,6}, {7,10}, {8,9}          0.322
5     1, 2, 3, {4,5,6}, {7,8,9,10}             0.328
      1, 2, 3, {4,5,6}, {7,10}, {8,9}          0.671
6     1, 2, 3, {4,5,6}, {7,10}, {8,9}          1
7     1, 2, 3, {4,5,6}, {7,10}, {8,9}          0.609
      1, 2, 3, {4,5,6}, {7,10}, 8, 9           0.390
8     1, 2, 3, {4,5,6}, {7,10}, {8,9}          0.304
      1, 2, 3, {4,5,6}, {7,10}, 8, 9           0.695
9     1, 2, 3, {4,5,6}, {7,10}, 8, 9           1
10    1, 2, 3, {4,5,6}, {7,10}, 8, 9           1
11    1, 2, 3, {4,5,6}, {7,10}, 8, 9           1
12    1, 2, 3, {4,5,6}, {7,10}, 8, 9           1

Table 6.2: All possible stagings and their posterior probabilities at each time t for k = 0.9, ρ = 0.9, q = 0.2 with P(C_1 = {v_1, v_2, {v_3, ..., v_6}, {v_7, ..., v_10}}) = 1
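The effect of q on the size of the retained model set can be illustrated with a minimal sketch. Equation (5.40) is not reproduced in this section, so the rule below is an assumption: a staging is kept when its posterior probability is at least q times that of the most probable staging, and the kept probabilities are then renormalised. The function name `occams_window` and the candidate labels and probabilities are hypothetical.

```python
def occams_window(posteriors, q):
    """Keep stagings whose posterior is at least q times the largest one.

    Assumed form of the Occam's window rule, not taken from equation (5.40).
    `posteriors` maps a staging label to its (unnormalised) posterior.
    """
    best = max(posteriors.values())
    kept = {c: p for c, p in posteriors.items() if p / best >= q}
    total = sum(kept.values())
    # Renormalise so the retained stagings form a probability distribution.
    return {c: p / total for c, p in kept.items()}


# Hypothetical posterior probabilities for four candidate stagings.
candidates = {"A": 0.70, "B": 0.20, "C": 0.07, "D": 0.03}

conservative = occams_window(candidates, q=0.2)   # narrow window
permissive = occams_window(candidates, q=0.05)    # wide window

print(sorted(conservative))
print(sorted(permissive))
```

Under this assumed rule the larger value of q retains fewer stagings, which is the trade-off between computational speed and breadth of model averaging described in the text.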

An alternative way of presenting this information is to plot how P_t(v_i, v_j ∈ u | x_t), the a posteriori probability that situations v_i, v_j are in the same stage u at time t, changes over time. This can be calculated from

\[
P_t(v_i, v_j \in u \mid x_t) = \sum_{C \in \mathcal{C}} P(C_t = C \mid x_t)\, I(\exists\, u \in C : v_i, v_j \in u) \tag{6.1}
\]

Figure 6.3 shows this for the information in Table 6.2.
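As a worked illustration of equation (6.1), the following sketch computes the co-staging probability for pairs of situations from the t = 5 rows of Table 6.2. The data structure (a list of staging and probability pairs, with each stage a set of situation indices) and the function name `same_stage_probability` are choices made for this example only. The tabulated probabilities are given to three decimal places, so they sum to 0.999 rather than exactly 1.

```python
def same_stage_probability(staging_posterior, i, j):
    """Sum P(C_t = C | x_t) over stagings C in which i and j share a stage."""
    return sum(
        prob
        for staging, prob in staging_posterior
        if any(i in stage and j in stage for stage in staging)
    )


# Posterior over stagings at t = 5, copied from Table 6.2.
t5 = [
    ([{1}, {2}, {3}, {4, 5, 6}, {7, 8, 9, 10}], 0.328),
    ([{1}, {2}, {3}, {4, 5, 6}, {7, 10}, {8, 9}], 0.671),
]

p_7_8 = same_stage_probability(t5, 7, 8)    # together only in the first staging
p_7_10 = same_stage_probability(t5, 7, 10)  # together in both stagings
p_3_4 = same_stage_probability(t5, 3, 4)    # together in neither staging

print(p_7_8, p_7_10, p_3_4)
```

Repeating this calculation for every pair of situations and every time point gives the curves of the kind plotted in Figure 6.3.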

The situations at time t = 6 are either totally independent of one another or certainly in the same stage. The stages that remain by that time that are not composed of one situation are {v_4, v_5, v_6}, which are the situations concerning the availability or missingness of grades for the second module after getting a top, middling or bottom grade respectively in the first module; and {v_7, v_10}, which are the situations for the florets describing the grades gained in the second module after either getting a grade 3 or not having a grade at all in the first module. The former stage indicates that whether a mark is available for the second module is independent of the grade achieved in the first one, assuming that the first grade is itself not missing; the second stage says that the grade gained in the second module is independent of whether the student did poorly in, or just has a mark missing for, the first module. Both of these inferences would have been impossible to achieve with a Bayesian network search of the same probability model: the first one demands an asymmetric sample space (because if there is no mark available then it cannot be described), while the second is a context-specific conditional independence.

The above analysis is "quick and dirty", in that very clear signals were gained from the dynamic model quickly. To illustrate how the level of detail in the CEG distribution changes as a function of the modelling hyperparameters, allowing more subtle analyses, I ran the algorithm again with radically different values: I set k = 0.5 (so that floret distributions are flattened more quickly and past observations are therefore more heavily discounted, allowing the data to "speak for itself" more), ρ = 0.25 (so that moving between stagings is more likely), and q = 0.05 (so that stagings with poorer Bayes factors relative to the most likely staging are nonetheless kept in the analysis), with the initial degenerate staging distribution P(C_1 = {v_1, v_2, {v_3, ..., v_6}, {v_7, ..., v_10}}) = 1 still assumed for consistency. The resulting plot of P_t(v_i, v_j ∈ u | x_t) over time is as shown in Figure 6.4.
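The role of k in discounting past observations can be sketched numerically. The update rule used here is an assumption for illustration and is not taken from this section: the Dirichlet parameters of a floret are multiplied by k before each new batch of counts is added, so an observation made s steps ago carries weight k^s. The function name `discounted_update` and the counts are hypothetical.

```python
def discounted_update(prior, batches, k):
    """Assumed discounting rule: alpha <- k * alpha + counts at each step."""
    alpha = list(prior)
    for counts in batches:
        alpha = [k * a + n for a, n in zip(alpha, counts)]
    return alpha


def posterior_mean(alpha):
    """Mean of a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical two-category floret: the first five batches favour
# category 0, the last five favour category 1.
batches = [(10, 0)] * 5 + [(0, 10)] * 5
prior = (1.0, 1.0)

mean_k09 = posterior_mean(discounted_update(prior, batches, k=0.9))
mean_k05 = posterior_mean(discounted_update(prior, batches, k=0.5))

print(mean_k09[1], mean_k05[1])
```

Under this assumed rule the smaller k gives a posterior mean that follows the recent batches much more closely, which is the sense in which the data is allowed to "speak for itself" more.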

It can be seen from the latter figure that the analysis with the new hyperparameter values gives much the same qualitative description of the system as the analysis with the more conservative hyperparameters, at greater computational expense but with the pay-off of greater detail.

Some interesting characteristics of the system can be discerned from this analysis. With regard to the situations concerning the missingness of marks, θ(v_3), the probability distribution for the second module's marks being available given that the mark in the first module is itself missing, retains the appearance of being unrelated to the floret distributions at any time point. Until about t = 7, the situations v_4, v_5 and v_6, whose state spaces represent the missingness of marks for the second module after respectively gaining a high, medium or low mark in the first module, have initially high but then gradually falling probabilities of being in the same stage, implying that the support for independence of the missingness of the second module's marks from the marks gained in the first module kept decreasing. At t = 8, in contrast, these probabilities become much lower, although the probability distributions of marks being missing after gaining a medium or low mark in the first module become slightly more likely to be the same after that, while students performing well in the first module continue to have a very different probability distribution for the missingness of their second module marks. This more subtle pattern was not captured by the more conservative analysis earlier, which claimed that these situations were simply in the same stage with probability 1 throughout the process. In the next section I investigate a possible causal hypothesis that might explain what changed at t = 8.

Another notable finding is that v_7 and v_10, the situations concerning marks in the second module after getting a grade 3 or having no grade in the first module, respectively, are always strongly related, just as in the first analysis. It therefore appears that the second module marks of students who did poorly in the first module should be used to predict the second module performance of students whose first module marks are missing.

It is worth noting again that these detailed homogeneities within the system would not have been as easily identifiable if the model class were restricted to Bayesian networks.
