Facies and sedimentary features of the El Baell unit

The Sardic unconformity and the Upper Ordovician successions of the Ribes de Freser area, Eastern Pyrenees

Suppose that we have two time series $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_m)$. For $A$, let $\mathrm{Head}(A) = (a_1, a_2, \ldots, a_{n-1})$, and similarly for $B$.

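Definition 2. The LCSS between A and B is defined as follows:

$$
\mathrm{LCSS}(A, B) =
\begin{cases}
0 & \text{if } A \text{ or } B \text{ is empty} \\
1 + \mathrm{LCSS}(\mathrm{Head}(A), \mathrm{Head}(B)) & \text{if } a_n = b_m \\
\max\bigl(\mathrm{LCSS}(\mathrm{Head}(A), B),\ \mathrm{LCSS}(A, \mathrm{Head}(B))\bigr) & \text{otherwise}
\end{cases}
$$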

The above definition is recursive and, if evaluated naively, would require exponential time to compute. However, dynamic programming offers a better solution that runs in $O(m \cdot n)$ time.

Dynamic Programming Solution [11,42]

The LCSS problem can be solved in quadratic time and space. The basic idea is that the sequence-matching problem can be decomposed into smaller subproblems, whose optimal solutions can then be combined. So we first solve a smaller instance of the problem (with fewer points), and then continue by adding new points to our sequences and updating the LCSS accordingly.

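Now the solution can be found by solving the following equation using dynamic programming (Figure 4):

$$
\mathrm{LCSS}[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
1 + \mathrm{LCSS}[i-1, j-1] & \text{if } a_i = b_j \\
\max\bigl(\mathrm{LCSS}[i-1, j],\ \mathrm{LCSS}[i, j-1]\bigr) & \text{otherwise}
\end{cases}
$$

where LCSS[i, j] denotes the length of the longest common subsequence between the first i elements of sequence A and the first j elements of sequence B. Finally, LCSS[n, m] gives the length of the longest common subsequence between the two sequences A and B.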
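The table-filling procedure described above can be sketched in Python as follows. This is a minimal illustration of the standard quadratic-time dynamic program for exact-value matching; the function name `lcss_length` and the use of nested lists for the table are choices made here, not taken from the paper.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    table[i][j] holds the LCSS length of the first i elements of a
    and the first j elements of b (row 0 and column 0 are the empty cases).
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # The last elements match: extend the solution of both heads.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last element of one sequence or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

For example, `lcss_length([1, 2, 3, 4, 5], [2, 4, 5, 7])` evaluates to 3, corresponding to the common subsequence (2, 4, 5).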

The same dynamic programming technique can be employed in order to ﬁnd the Warping Distance between two sequences.

Fig. 4. Solving the LCSS problem using dynamic programming. The gray area indicates the elements that are examined if we conﬁne our search window. The solution provided is still the same.

3.2. Extending the LCSS Model

Having seen that there exists an efficient way to compute the LCSS between two sequences, we extend this notion in order to define a new, more flexible similarity measure. The LCSS model matches exact values; in our model, however, we want to allow more flexible matching between two sequences, in which two values match when they are within a certain range of each other. Moreover, in certain applications, the stretching in time that the LCSS algorithm provides needs to be confined to a certain range, too.

We assume that the measurements of the time-series are at ﬁxed and discrete time intervals. If this is not the case then we can use interpolation [23,34].

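Definition 3. Given an integer δ and a real positive number ε, we define $\mathrm{LCSS}_{\delta,\varepsilon}(A, B)$ as follows:

$$
\mathrm{LCSS}_{\delta,\varepsilon}(A, B) =
\begin{cases}
0 & \text{if } A \text{ or } B \text{ is empty} \\
1 + \mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A), \mathrm{Head}(B)) & \text{if } |a_n - b_m| < \varepsilon \text{ and } |n - m| \le \delta \\
\max\bigl(\mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A), B),\ \mathrm{LCSS}_{\delta,\varepsilon}(A, \mathrm{Head}(B))\bigr) & \text{otherwise}
\end{cases}
$$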

Fig. 5. The notion of the LCSS matching within a region of δ & ε for a sequence. The points of the two sequences within the gray region can be matched by the extended LCSS function.

The constant δ controls how far in time we can go in order to match a given point from one sequence to a point in another sequence. The constant ε is the matching threshold (see Figure 5).
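As a rough illustration of how δ and ε enter the computation, the recursive definition of the extended LCSS can be turned into a dynamic program in the same way as the exact-match case. The sketch below assumes one-dimensional sequences of numbers; the name `lcss_delta_eps` is hypothetical, and for simplicity the full table is filled rather than only the cells inside the δ window.

```python
def lcss_delta_eps(a, b, delta, eps):
    """Extended LCSS: a[i] and b[j] may match only if their values differ by
    less than eps and their (1-based) positions differ by at most delta."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_value = abs(a[i - 1] - b[j - 1]) < eps
            close_in_time = abs(i - j) <= delta
            if close_in_value and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

With `a = [1, 2, 3, 4, 5]` and `b = [9, 1, 2, 3, 4]`, the call `lcss_delta_eps(a, b, delta=1, eps=0.5)` returns 4, because each of the first four values of `a` can be matched one step later in `b`. With `delta=0` the same call returns 0, since no values at identical positions are within ε of each other.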

The ﬁrst similarity function is based on the LCSS and the idea is to allow time stretching. Then, objects that are close in space at diﬀerent time instants can be matched if the time instants are also close.

Definition 4. We define the similarity function S1 between two sequences A and B, given δ and ε, as follows:

$$
S_1(\delta, \varepsilon, A, B) = \frac{\mathrm{LCSS}_{\delta,\varepsilon}(A, B)}{\min(n, m)}
$$

Essentially, using this measure, if there is a matching point within the region ε, we increase the LCSS by one.

We use the function S1 to define another, more flexible, similarity measure.

First, we consider the set of translations. A translation simply causes a vertical shift either up or down. Let F be the family of translations. Then a function $f_c$ belongs to F if $f_c(A) = (a_{x,1} + c, \ldots, a_{x,n} + c)$. Next, we define a second notion of similarity based on the above family of functions.

Definition 5. Given δ, ε and the family F of translations, we define the similarity function S2 between two sequences A and B as follows:

$$
S_2(\delta, \varepsilon, A, B) = \max_{f_c \in F} S_1(\delta, \varepsilon, A, f_c(B))
$$
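To make the two similarity functions concrete, here is a small Python sketch. S1 follows its definition directly. For S2, the maximum over all translations is approximated by trying a finite candidate set, namely the differences $a_i - b_j$ for positions with $|i - j| \le \delta$. That candidate set is an assumption made for this illustration only: it is a simple heuristic that can miss the true optimum, and it is not the algorithm the paper develops in its later section. All function names are hypothetical.

```python
def lcss_delta_eps(a, b, delta, eps):
    """Extended LCSS length under value tolerance eps and time window delta."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def s1(a, b, delta, eps):
    """S1: extended LCSS length normalized by the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcss_delta_eps(a, b, delta, eps) / min(len(a), len(b))


def s2_heuristic(a, b, delta, eps):
    """Approximate S2 by maximizing S1 over a finite set of translations of b.

    Assumption: candidates are c = a[i] - b[j] for |i - j| <= delta, plus
    c = 0. This is a heuristic and is not guaranteed to find the optimum.
    """
    candidates = {0.0}
    for i in range(len(a)):
        for j in range(len(b)):
            if abs(i - j) <= delta:
                candidates.add(a[i] - b[j])
    return max(s1(a, [x + c for x in b], delta, eps) for c in candidates)
```

For two parallel sequences such as `a = [1.0, 2.0, 3.0, 4.0]` and `b = [11.0, 12.0, 13.0, 14.0]`, with `delta=1` and `eps=0.5`, `s1` returns 0.0 while `s2_heuristic` returns 1.0, because the translation c = -10 aligns `b` with `a` exactly.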

Fig. 6. Translation of sequence A.

So the similarity functions S1 and S2 range from 0 to 1. Therefore we can deﬁne the distance function between two sequences as follows:

Definition 6. Given δ, ε and two sequences A and B, we define the following distance functions:

$$
D_1(\delta, \varepsilon, A, B) = 1 - S_1(\delta, \varepsilon, A, B)
$$

and

$$
D_2(\delta, \varepsilon, A, B) = 1 - S_2(\delta, \varepsilon, A, B)
$$

Note that D1 and D2 are symmetric: $\mathrm{LCSS}_{\delta,\varepsilon}(A, B)$ is equal to $\mathrm{LCSS}_{\delta,\varepsilon}(B, A)$, and the transformation that we use in D2 is translation, which preserves the symmetric property.

By allowing translations, we can detect similarities between movements that are parallel but not identical. In addition, the LCSS model allows stretching and displacement in time, so we can detect similarities in movements that happen at different speeds or at different times. In Figure 6 we show an example where a sequence A matches another sequence B after a translation is applied.

The similarity function S2 is a significant improvement over S1 because (i) we can now detect parallel movements, and (ii) the use of normalization does not guarantee that we will get the best match between two time series. Usually, because of the significant amount of noise, the average value and/or the standard deviation of the time series that are used in the normalization process can be distorted, leading to improper translations.

3.3. Diﬀerences between DTW and LCSS

Time Warping and the LCSS share many similarities. Here, we argue that the LCSS is a better similarity function for correctly identifying noisy sequences, for the following reasons:

1. Taking into consideration that a large portion of the sequences may be just outliers, we need a similarity function that is robust under noisy conditions and does not match the incorrect parts. This property of the LCSS is depicted in Figure 7. Time Warping, because it matches all elements, will also try to match the outliers, which most likely distorts the real distance between the examined sequences.

In Figure 8 we can see an example of a hierarchical clustering produced by the DTW and the LCSS distances between four time series.

Fig. 7. Using the LCSS we only match the similar portions, avoiding the outliers.

Fig. 8. Hierarchical clustering of time series with a significant amount of outliers. Left: The presence of many outliers in the beginning and the end of the sequences leads to incorrect clustering. DTW is not robust under noisy conditions. Right: The LCSS, focusing on the common parts, achieves the correct clustering.

Fig. 9. Left: Two sequences and their mean values. Right: After normalization. Obviously an even better matching can be found for the two sequences.

The sequences represent data collected through a video tracking process (see Section 6). The DTW fails to distinguish the two classes of words because of the large number of outliers, especially at the beginning and at the end of the sequences. Using the Euclidean distance we obtain even worse results.

Using the LCSS similarity measure we can obtain the most intuitive clustering, as shown in the same figure. Even though the ending portion of the Boston 2 time series differs significantly from the Boston 1 sequence, the LCSS correctly focuses on the start of the sequence, therefore producing the correct grouping of the four time series.

2. Simply normalizing the time series (by subtracting the average value) does not guarantee that we will achieve the best match (Figure 9). However, we are going to show in the following section that we can try a set of translations which will provably give us the optimal matching (or close to optimal, within some user-defined error bound).
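The robustness argument in point 1 can be illustrated with a small synthetic example. The sketch below assumes the classic unconstrained DTW formulation with absolute difference as the local cost, which is an assumption of this illustration, since the text does not spell out the Warping Distance. The three sequences are invented for the example and are not data from the paper; all names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance with |x - y| as the local cost (assumed form)."""
    n, m = len(a), len(b)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Every element must be matched, so outliers always contribute.
            table[i][j] = cost + min(
                table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            )
    return table[n][m]


def lcss_delta_eps(a, b, delta, eps):
    """Extended LCSS length under value tolerance eps and time window delta."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


# Invented data: `noisy` has the same shape as `clean` apart from two
# outliers; `flat` has a different shape but no outliers.
clean = [1, 2, 3, 4, 5, 4, 3, 2, 1]
noisy = [90, 2, 3, 4, 5, 4, 3, 2, 90]
flat = [5, 5, 5, 5, 5, 5, 5, 5, 5]
```

On this data, `dtw_distance(clean, noisy)` is 178 while `dtw_distance(clean, flat)` is only 20, so DTW ranks the flat sequence as closer to `clean` than its noisy copy. With `delta=1` and `eps=0.5`, `lcss_delta_eps(clean, noisy, 1, 0.5)` is 7 and `lcss_delta_eps(clean, flat, 1, 0.5)` is 1, so the LCSS ignores the two outliers and ranks the noisy copy as the more similar sequence.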

4. Eﬃcient Algorithms to Compute the Similarity
