
In this section, we develop two quantities to examine data patterns in realizations of Markov random field models. The first quantity, which we call conflicts, assists in providing intuition for the data patterns observed in Markov random field models, especially when the dependence parameters become large. The second quantity, which we call run length distributions, provides a method of examining data patterns in real data on a regular lattice.

3.4.1 Conflicts in Realizations of MRF Models

Any binary Markov random field model specifies either positive or negative conditional dependence between each possible pair of locations that are not conditionally independent. For a realized field, we define a conflict to exist for a pair of locations if the data values at those locations differ under positive dependence (0 at one location and 1 at the other) or if the values are the same under negative dependence (both 0 or both 1). In simple models with a single dependence parameter, the proportion of location pairs that exhibit a conflict tends to decrease as the strength of the dependence increases. For example, the upper left realization displayed in Figure 3.2, with η = 0.50, contains conflicts for 78 of the 386 non-independent pairs of locations, while the upper right realization in Figure 3.2, with η = 1.0, contains 59 conflicts.

In what follows, we will count conflicts in the largest cliques allowed by a given neighborhood structure. For a four-nearest neighbor structure this is the same as counting conflicts in neighboring pairs, since the largest cliques for this configuration are of size two. In contrast, an eight-nearest neighbor structure contains cliques of size two, three and four, so we count conflicts for each clique of size four. Examples of possible conflicts are presented for cliques of size four in Figure 3.6, the four panels of which show all possible configurations of 0 and 1 values, up to their symmetric counterparts (interchanging 1s and 0s) and rotations. Each of these four panels could exist under both positive and negative dependence. The upper left and right panels of Figure 3.6 result in 4 conflicts for positive dependence and 2 conflicts for negative dependence. The lower left panel results in 3 conflicts for both positive and negative dependence. The lower right panel results in 0 conflicts for positive dependence and 6 conflicts for negative dependence. Notice that the numbers of conflicts for positive and negative dependence always sum to 6, the number of pairs of locations in the clique. Also notice that for negative dependence, it is not possible for any configuration of points in a clique of size four to have 0 conflicts; the minimum is 2 conflicts.
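The conflict counts quoted for cliques of size four can be checked by brute force. The following is a minimal sketch, not taken from the original work: the function name `clique_conflicts` and the dictionary `summary` are hypothetical, and a clique is represented simply as a tuple of four 0/1 values in which every pair of locations is treated as neighbors.

```python
from itertools import combinations, product


def clique_conflicts(values, positive=True):
    """Count conflicting pairs among the 0/1 values of a single clique.

    Under positive dependence a pair conflicts when its values differ;
    under negative dependence a pair conflicts when its values agree.
    """
    pairs = list(combinations(values, 2))
    differing = sum(a != b for a, b in pairs)
    return differing if positive else len(pairs) - differing


# Enumerate all 2^4 assignments of 0/1 to a clique of size four and group
# them by their (positive, negative) conflict counts.
summary = {}
for values in product((0, 1), repeat=4):
    pos = clique_conflicts(values, positive=True)
    neg = clique_conflicts(values, positive=False)
    summary.setdefault((pos, neg), []).append(values)

for (pos, neg), configs in sorted(summary.items()):
    print(f"positive={pos} negative={neg} configurations={len(configs)}")
```

Only three (positive, negative) combinations arise, each summing to 6, and no assignment has fewer than 2 conflicts under negative dependence, which agrees with the counts read off Figure 3.6.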

We return to the notion that the number of conflicts should decrease as the magnitude of the dependence parameter increases. This implies that as the dependence parameter is increased to extreme values, the number of conflicts in all cliques of size 4 will tend to 0 for positive dependence and 2 for negative dependence. To illustrate this, 2000 MRF realizations on a 33 × 33 lattice were generated for each combination of κ = 0.50 and η = ±4.0, ±2.5, ±1.5, ±0.5, 0.0. Table 3.3 presents the relative frequencies of cliques of size 4 that contained 2, 3 and 6 conflicts for negative dependence or 0, 3 and 4 conflicts for positive dependence. There are 1024 cliques of size 4 in a 33 × 33 lattice. In addition to verifying that the probabilities of observing realized fields with more than 2 or 0 conflicts approach 0 as the magnitude of η increases for negative and positive dependence, respectively, Table 3.3 suggests that models with positive dependence approach a state of degeneracy more quickly than do models with negative dependence as the absolute value of η grows large.
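The tabulation of clique-level conflict frequencies can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce Table 3.3: the function name `clique_conflict_frequencies` is hypothetical, and the example field is drawn from an independence model because the MRF sampler itself is not specified in this section. The sketch relies on the fact that a clique of size four containing k ones has k(4 − k) differing pairs.

```python
import numpy as np


def clique_conflict_frequencies(field, positive=True):
    """Relative frequency of each conflict count (0..6) over all 2x2 cliques."""
    f = np.asarray(field, dtype=int)
    # Number of ones in every 2x2 block (one block per clique of size four).
    ones = f[:-1, :-1] + f[:-1, 1:] + f[1:, :-1] + f[1:, 1:]
    # k ones and (4 - k) zeros give k * (4 - k) differing pairs.
    conflicts = ones * (4 - ones)
    if not positive:
        conflicts = 6 - conflicts  # agreeing pairs conflict instead
    counts = np.bincount(conflicts.ravel(), minlength=7)
    return counts / conflicts.size


# Illustrative field only: independent Bernoulli(0.5) values, not an MRF draw.
rng = np.random.default_rng(0)
field = rng.integers(0, 2, size=(33, 33))
n_cliques = (field.shape[0] - 1) * (field.shape[1] - 1)
freq = clique_conflict_frequencies(field, positive=False)
print(n_cliques, {c: round(float(p), 3) for c, p in enumerate(freq) if p > 0})
```

Averaging such frequency vectors over many simulated fields would give Monte Carlo estimates of the kind reported in Table 3.3.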

A realized field from an approximately degenerate model with η = −100 and eight nearest neighbors is presented in Figure 3.8. This realization, or a rotation of it, nearly always results for data generated from this model, and contains the minimum number of 2 conflicts in all cliques of size 4. Recall that the upper two panels of Figure 3.6 show two configurations of values that both result in the minimum number of 2 conflicts in cliques of size 4 for negative dependence. The pattern of Figure 3.8 corresponds to repetition of only one of these configurations, the one shown in the upper right panel of Figure 3.6. The reason for this is illustrated in Figure 3.7, which shows configurations and associated conflicts for sets of four concatenated cliques of size 4. The configuration in the left panel of Figure 3.7, which corresponds to repetition of the upper left panel of Figure 3.6, results in 8 conflicts. In contrast, repetition of the configuration in the upper right panel of Figure 3.6 results in only 6 conflicts, as shown in the right-hand panel of Figure 3.7. Such an argument can be generalized to any k × k field using the same concatenation technique. Thus, the total number of conflicts in a realized field from a model with eight nearest neighbors and negative dependence is minimized by the striped pattern in Figure 3.8.
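The concatenation argument can be checked numerically. The sketch below assumes, since the figures are not reproduced here, that the upper left panel of Figure 3.6 is the diagonal (checkerboard) arrangement and the upper right panel is the striped arrangement; under that assumption the counts of 8 and 6 are recovered on a 3 × 3 block. The function name `lattice_conflicts` is hypothetical, and each neighboring pair is counted once.

```python
import numpy as np


def lattice_conflicts(field, positive=False):
    """Total conflicts over distinct neighboring pairs, eight nearest neighbors."""
    f = np.asarray(field)
    pair_sets = [
        (f[:, :-1], f[:, 1:]),     # horizontal neighbors
        (f[:-1, :], f[1:, :]),     # vertical neighbors
        (f[:-1, :-1], f[1:, 1:]),  # diagonal, down-right
        (f[:-1, 1:], f[1:, :-1]),  # diagonal, down-left
    ]
    differing = sum(int(np.sum(a != b)) for a, b in pair_sets)
    total = sum(a.size for a, _ in pair_sets)
    return differing if positive else total - differing


def checkerboard(k):
    return np.indices((k, k)).sum(axis=0) % 2


def stripes(k):
    return np.repeat((np.arange(k) % 2)[:, None], k, axis=1)


# Four concatenated cliques of size 4 form a 3 x 3 block.
print(lattice_conflicts(checkerboard(3)), lattice_conflicts(stripes(3)))  # 8 6
```

For a k × k field under negative dependence, the same function gives k(k − 1) conflicts for stripes and 2(k − 1)² for the checkerboard, so the striped pattern has fewer conflicts whenever k > 2.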

3.4.2 Distribution of Run Lengths

We have seen that extending the concept of conflicts within individual cliques to entire fields results in larger-scale data patterns (e.g., Figure 3.8). These patterns appear to be related to the number and length of sequences of the same data values, either 0s or 1s. We define a run of values to be a sequence of consecutive 0s or 1s within a transect embedded in a random field. Such transects may be taken to be rows, columns, or diagonals. For example, a 33 × 33 lattice would contain a potential of 32 runs of length 1 within each row, column or the main diagonal, and a varying number of runs of length 1 (from 30 to 1) along the off diagonals. Consideration of conflicts suggested that in models with increasingly strong positive dependence, long run lengths should be expected to become more common. Similarly, in models with increasingly strong negative dependence, short run lengths should become more common. To illustrate this, a Monte Carlo experiment was conducted in which models consisting of 4 nearest neighbors with η = ±0.5, ±1.0, and 8 nearest neighbors with η = ±0.25, ±0.5 were simulated. For each situation, 2000 data sets were simulated and compared to data sets simulated from a model with no dependence (i.e., the independence model). The value of κ was held fixed at 0.50 for each model used. The number of runs of length x was counted within each row and summed to a field-level total, for x = 1, 2, . . . , 33. The same was done separately for columns. Monte Carlo approximations to the expected number of runs of length x were then computed as the average field totals across the 2000 simulated data sets. Results for x = 1, 2, 3, 4 are presented in Figure 3.9 for rows and Figure 3.10 for columns. These results verify that in the case of positive dependence there is a smaller expected number of runs of short lengths (1 and 2) than in the independence model. Conversely, the expected number of runs of short lengths is higher than in the independence model in the case of negative dependence.
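The run-counting step can be sketched as below. This is a minimal illustration rather than the code behind Figures 3.9 and 3.10: the function name `run_length_counts` is hypothetical, only the independence model is simulated (the MRF sampler is not specified in this section), and 200 fields are used instead of 2000 to keep the example quick.

```python
import numpy as np


def run_length_counts(field, axis=1):
    """Field-level totals: counts[x] = number of runs of length x.

    axis=1 scans rows, axis=0 scans columns. Index 0 is unused.
    """
    f = np.asarray(field)
    lines = f if axis == 1 else f.T
    counts = np.zeros(lines.shape[1] + 1, dtype=np.int64)
    for line in lines:
        # Positions where the value changes mark the start of a new run.
        breaks = np.flatnonzero(line[1:] != line[:-1]) + 1
        edges = np.concatenate(([0], breaks, [line.size]))
        lengths = np.diff(edges)
        counts += np.bincount(lengths, minlength=counts.size)
    return counts


# Monte Carlo approximation of the expected number of runs of each length
# in rows under the independence model with parameter 0.5.
rng = np.random.default_rng(1)
n_sims = 200  # the text uses 2000; reduced here for speed
totals = np.zeros(34)
for _ in range(n_sims):
    field = rng.binomial(1, 0.5, size=(33, 33))
    totals += run_length_counts(field, axis=1)
expected = totals / n_sims
print(np.round(expected[1:5], 1))  # expected runs of length 1, 2, 3, 4
```

Replacing the independence draws with draws from a fitted MRF sampler would give the corresponding averages for a model with dependence.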

These results suggest that examining the number of short run lengths in actual data fields can provide guidance about the type and strength of dependence expected to result from a simple Markov random field model. We develop a diagnostic for this purpose as follows. For a given observed data field, let $\hat{p}$ denote the empirical mean. Simulate M fields from an independence model with parameter $\hat{p}$. That is, simulate k × k values from a binary distribution with parameter $\hat{p}$ and arbitrary location labels. For a given transect definition (e.g., rows or columns) let $q_{r,m}$ be the total number of runs of length r in field m = 1, 2, . . . , M. Let $a_r$ be the number of runs of length r in the actual data field. A quantity to compare the actual number of runs with what is expected under an independence model is then

\[ p_r = \frac{1}{M} \sum_{m=1}^{M} I(a_r \le q_{r,m}), \tag{3.9} \]

where I(·) denotes the indicator function.
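A minimal sketch of the diagnostic in equation (3.9) for rows is given below. The function names `row_run_counts` and `run_diagnostic`, the seed handling, and the default number of simulated fields are assumptions made for illustration. The inequality is implemented in the direction printed in (3.9), so each returned value is the fraction of simulated independence fields that contain at least as many runs of length r as the observed field.

```python
import numpy as np


def row_run_counts(field, max_r=4):
    """Number of runs of length r = 1..max_r, summed over the rows of a field."""
    f = np.asarray(field)
    counts = np.zeros(f.shape[1] + 1, dtype=np.int64)
    for line in f:
        breaks = np.flatnonzero(line[1:] != line[:-1]) + 1
        edges = np.concatenate(([0], breaks, [line.size]))
        counts += np.bincount(np.diff(edges), minlength=counts.size)
    return counts[1:max_r + 1]


def run_diagnostic(observed, n_sims=1000, max_r=4, seed=0):
    """Estimate p_r of equation (3.9) for r = 1..max_r using row transects."""
    obs = np.asarray(observed)
    p_hat = obs.mean()                      # empirical mean of the observed field
    a = row_run_counts(obs, max_r)          # a_r for the observed field
    rng = np.random.default_rng(seed)
    hits = np.zeros(max_r)
    for _ in range(n_sims):
        sim = rng.binomial(1, p_hat, size=obs.shape)  # independence model
        q = row_run_counts(sim, max_r)      # q_{r,m} for simulated field m
        hits += (a <= q)                    # indicator I(a_r <= q_{r,m})
    return hits / n_sims


# Usage on an illustrative field (independent draws, not an MRF realization).
rng = np.random.default_rng(42)
observed = rng.binomial(1, 0.5, size=(33, 33))
print(run_diagnostic(observed, n_sims=200))
```

Passing the transposed field applies the same diagnostic to columns.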

We compute this quantity for small values of r (r = 1, 2, 3, 4) and restrict attention to rows and columns. Extensions to other run lengths and transect definitions are possible depending on the size and shape of an observed lattice. As an example, we generated two "actual data fields" by simulating from MRF models with η = {−0.5, 0.5} and κ = 0.5 in each case. The empirical means for these fields were $\hat{p}$ = {0.499, 0.502}. In each case, 10000 fields were simulated from independence models having parameters equal to the empirical means. Resultant values of $p_r$ for the rows are reported in Table 3.4 for r = 1, 2, 3, 4. The model with positive dependence had fewer runs of short length (small $p_r$ for r = 1, 2) while the model with negative dependence had more runs of short length (large $p_r$ for r = 1, 2). Repeating this exercise 2000 times resulted in the distribution of $p_r$ values, r = 1, 2, shown in Figure 3.11. The values for columns are nearly identical and thus omitted. The box plot displayed in Figure 3.11 illustrates a clear separation between the distributions of $p_r$ values for r = 1, 2 for each model.