Middle Ordovician succession

SARDINIA. GEOLOGICAL SETTING AND STRATIGRAPHY

4.2 Nappe Zone

4.2.2 Middle Ordovician succession

We deﬁne similarity between time series, which underlies the retrieval pro-cedure. We measure similarity on a zero-to-one scale; zero means no likeness and one means perfect likeness. We review three basic measures of similar-ity and then propose a new measure. First, we deﬁne similarsimilar-ity between two numbers,a and b:

sim(a, b) = 1 − |a − b|

|a| + |b|.

The mean similarity between two series,a₁, . . . , a_nandb₁, . . . , b_n, is the mean of their point-by-point similarity:

1 n·ⁿ

i=1

sim(a_i, b_i).

We also deﬁne the root mean square similarity:

1 n·ⁿ

i=1

sim(a_i, b_i)².

In addition, we consider the correlation coeﬃcient, which is a standard statistical measure of similarity. It ranges from−1 to 1, but we can convert

WSPC/TrimSize:9inx6inforReviewVolumeChap03

E.FinkandK.B.Pratt

Table 2. Accuracy of three compression techniques. We give the average difference between an original series and its compressed version using the three difference measures; smaller differences correspond to more accurate compression.

Mean difference Maximum difference Root mean square diff.

important fixed random important fixed random important fixed random

points points points points points points points points points

Eighty percent compression

Stock prices 0.02 0.03 0.04 0.70 1.70 1.60 0.05 0.14 0.14

Air temp. 0.01 0.03 0.03 0.33 0.77 0.72 0.03 0.10 0.10

Sea temp. 0.01 0.03 0.03 0.35 0.81 0.75 0.03 0.10 0.10

Wind speeds 0.02 0.03 0.03 0.04 1.09 1.01 0.04 0.05 0.05

EEG 0.03 0.06 0.07 0.68 1.08 1.00 0.10 0.18 0.17

Ninety percent compression

Stock prices 0.03 0.06 0.07 1.10 1.70 1.70 0.08 0.21 0.21

Air temp. 0.02 0.05 0.05 0.64 0.80 0.78 0.08 0.16 0.14

Sea temp. 0.01 0.04 0.05 0.60 0.83 0.82 0.07 0.16 0.14

Wind speeds 0.03 0.04 0.04 0.06 1.09 1.03 0.05 0.06 0.06

EEG 0.08 0.13 0.12 0.82 1.10 1.09 0.17 0.27 0.24

Ninety-ﬁve percent compression

Stock prices 0.05 0.10 0.12 1.30 1.80 1.80 0.11 0.32 0.30

Air temp. 0.03 0.09 0.08 0.74 0.83 0.83 0.12 0.23 0.21

Sea temp. 0.03 0.08 0.08 0.78 0.85 0.85 0.12 0.23 0.21

Wind speeds 0.05 0.04 0.04 0.08 1.09 1.10 0.07 0.08 0.08

EEG 0.13 0.17 0.16 0.90 1.10 1.10 0.24 0.31 0.28

–a b a psim (a,b)

Fig. 5. Peak similarity of numbers a and b, where |a| ≥ |b|.

it to the zero-to-one scale by adding one and dividing by two. For series a1, . . . , an and b1, . . . , bn, with mean values ma = (a1+· · · + an)/n and m_b= (b₁+· · · + b_n)/n, the correlation coeﬃcient is

i=1(ai− ma)· (bi− mb)

i=1(a_i− m_a)²·_n

i=1(b_i− m_b)²

We next deﬁne a new similarity measure, called the peak similarity. For numbersa and b, their peak similarity is

psim(a, b) = 1 − |a − b|

2· max(|a|, |b|).

In Figure 5, we show the meaning of this deﬁnition for|a| ≥ |b|, based on the illustrated triangle. We draw the vertical line throughb to the intersec-tion with the triangle’s side; the ordinate of the intersecintersec-tion is the similarity ofa and b. The peak similarity of two series is the mean similarity of their points (psim(a₁, b₁) +· · · + psim(a_n, b_n))/n.

We next give an empirical comparison of the four similarity measures.

For each series, we have found the five most similar series, and then deter-mined the mean difference between the given series and the other five.

In Table 3, we summarize the results and compare them with the perfect exhaustive-search selection and with random selection. The results show that the peak similarity performs better than the other measures, and that the correlation coeﬃcient is the least eﬀective.

We have also used the four similarity measures to identify close matches for each series, and compared the results with ground-truth neighborhoods.

For stocks, we have considered small neighborhoods formed by industry sub-groups, as well as large neighborhoods formed by industry sub-groups, accord-ing to Standard and Poor’s classiﬁcation. For air and sea temperatures, we have used geographic proximity to deﬁne two ground-truth neighborhoods.

The ﬁrst neighborhood is the 1× 5 rectangle in the grid of buoys, and the second is the 3× 5 rectangle. For wind speeds, we have also used geo-graphic proximity; the ﬁrst neighborhood includes all sites within 70 miles,

WSPC/TrimSize:9inx6inforReviewVolumeChap03

E.FinkandK.B.Pratt

Table 3. Differences between selected similar series. For each given series, we have selected the five most similar series, and then measured the mean difference between the given series and the other five. Smaller differences correspond to better selection. We also show the running times of selecting similar series.

Similarity Comp. Stock prices Sea temperatures Air temperatures Wind speeds EEG

measure rate

Mean Max Time Mean Max Time Mean Max Time Mean Max Time Mean Max Time diff. diff. (sec) diff. diff. (sec) diff. diff. (sec) diff. diff. (sec) diff. diff. (sec)

Perfect selection 0.094 0.437 0.016 0.072 0.024 0.121 0.021 0.136 0.038 0.170

Random selection 0.287 1.453 0.078 0.215 0.070 0.235 0.029 0.185 0.072 0.370

Peak 90% 0.103 0.429 0.024 0.018 0.068 0.021 0.029 0.103 0.022 0.023 0.138 0.016 0.052 0.241 0.015 Similarity 95% 0.110 0.534 0.022 0.019 0.073 0.019 0.030 0.136 0.020 0.023 0.148 0.016 0.063 0.306 0.015 Mean 90% 0.110 0.525 0.026 0.026 0.092 0.022 0.031 0.134 0.022 0.023 0.137 0.017 0.055 0.279 0.016 Similarity 95% 0.126 0.570 0.024 0.033 0.112 0.021 0.037 0.152 0.022 0.025 0.152 0.017 0.066 0.323 0.014 Root mean 90% 0.103 0.497 0.026 0.024 0.090 0.022 0.030 0.133 0.022 0.023 0.134 0.017 0.051 0.261 0.016 Square sim. 95% 0.115 0.588 0.024 0.031 0.106 0.021 0.035 0.147 0.022 0.023 0.153 0.017 0.064 0.317 0.014 Correlation 90% 0.206 1.019 0.048 0.054 0.162 0.044 0.051 0.214 0.046 0.024 0.138 0.042 0.056 0.281 0.030 Coeﬃcient 95% 0.210 1.101 0.045 0.063 0.179 0.042 0.051 0.224 0.043 0.024 0.154 0.033 0.068 0.349 0.028

and the second includes the sites within 140 miles. For electroencephalo-grams, the ﬁrst neighborhood is the 3×3 rectangle in the grid of electrodes, and the second is the 5× 5 rectangle.

For each series, we have found the ﬁve closest matches, and then determined the average number of matches that belong to the same neighborhood. In Table 4, we give the results and compare them with the perfect selection and random selection; larger numbers correspond to better selections.

The results show that the peak similarity is usually more eﬀective than the other three similarities. If we use the 90% compression, the peak simi-larity gives better selection than the other simisimi-larity measures for the stock prices, air and sea temperatures, and eeg; however, it gives worse results for the wind speeds. If we use the 95% compression, the peak similarity out-performs the other measures on the stocks, temperatures, eeg, and large-neighborhood selection for the wind speeds; however, it loses to the mean similarity and correlation coeﬃcient on the small-neighborhood selection for the wind speeds.

We have also checked how well the peak similarity of original series correlates with the peak similarity of their compressed versions (Figure 6).

We have observed a good linear correlation, which gracefully degrades with an increase of compression rate.

In document The Sardic Phase in the Ordovician of Southern Sardinia and Eastern Pyrenees: (página 60-64)