4.3 Threshold-Based Queries
In this section, we formally introduce the concept of threshold-based queries or threshold queries.
4.3.1 Threshold-Crossing Time Intervals
At first we show how a given threshold value $\tau$ is used to define a collection of intervals, the so-called Threshold-Crossing Time Interval Sequence.
Definition 4.1 (Threshold-Crossing Time Interval Sequence).
Let $X$ be a time series of length $N$ as defined in Definition 2.1, let $\tau \in \mathbb{R}$, and let $T$ be the time domain. Then the threshold-crossing time interval sequence of $X$ with respect to $\tau$ is a sequence $S_{\tau,X} = \langle (l_j, u_j) \in T \times T : j \in \{1, \ldots, M\}, M \le N \rangle$ of time intervals, such that
Figure 4.3: Threshold-Crossing Time Intervals (two time series $X$ and $Y$ plotted over time with their thresholds $\tau_X$ and $\tau_Y$, and the corresponding threshold-crossing time interval sequences $S_{\tau_X,X}$ and $S_{\tau_Y,Y}$).
The value $\tau$ is called the threshold.
Note that we omit the $\tau$ parameter in $S_{\tau,X}$ if the choice of $\tau$ is obvious or no specific value for the threshold is given.
The example shown in Figure 4.3 depicts two threshold-crossing time interval sequences for two time series, $X$ and $Y$, with respect to two different threshold values $\tau_X$ and $\tau_Y$.
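To make the interval representation more tangible, the following minimal sketch derives a threshold-crossing time interval sequence from a sampled time series. It rests on assumptions that are not taken from the text: an interval is assumed to cover the time span in which the series lies strictly above $\tau$, crossing times between two samples are assumed to be found by linear interpolation, and the function name `threshold_crossing_intervals` is hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound l_j, upper bound u_j)


def threshold_crossing_intervals(
    times: Sequence[float], values: Sequence[float], tau: float
) -> List[Interval]:
    """Return the time intervals in which the series lies above tau.

    Assumptions of this sketch: "above" means strictly greater than tau,
    and crossing times are located by linear interpolation between samples.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")

    intervals: List[Interval] = []
    start = None  # start time of the currently open interval, if any

    for i in range(len(values)):
        above = values[i] > tau
        if i == 0:
            if above:
                start = times[0]  # series already starts above the threshold
            continue

        prev_above = values[i - 1] > tau
        if above != prev_above:
            # The two samples lie on different sides of tau, so they differ
            # and the interpolation below cannot divide by zero.
            frac = (tau - values[i - 1]) / (values[i] - values[i - 1])
            t_cross = times[i - 1] + frac * (times[i] - times[i - 1])
            if above:
                start = t_cross  # upward crossing opens an interval
            else:
                intervals.append((start, t_cross))  # downward crossing closes it
                start = None

    if start is not None:
        intervals.append((start, times[-1]))  # series ends above the threshold
    return intervals


example = threshold_crossing_intervals(
    times=[0, 1, 2, 3, 4, 5], values=[0, 2, 0, 2, 2, 0], tau=1
)
print(example)
```

With this input the series rises above the threshold twice, so the sketch yields two intervals; a different threshold generally yields a different number of intervals for the same series.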
4.3.2 Similarity Model for Threshold-Crossing Time Interval Sequences
In order to define a threshold-based similarity function on time series, we have to define a similarity function on the interval sequence representations derived for a certain threshold. The threshold-crossing time interval sequences consist of single intervals, so the first step is to define a similarity function on single intervals. This function can afterwards be used to calculate a similarity value for entire interval sequences.
Definition 4.2 (Distance between Time Intervals).
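Let $t_1 = (t_{1l}, t_{1u}) \in T \times T$ and $t_2 = (t_{2l}, t_{2u}) \in T \times T$ be two time intervals. Then the distance function $d_{int} : (T \times T) \times (T \times T) \to \mathbb{R}$ between two time intervals is defined as:
\[ d_{int}(t_1, t_2) = \sqrt{(t_{1l} - t_{2l})^2 + (t_{1u} - t_{2u})^2} \]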
Note that intervals correspond to points in a two-dimensional space, where the starting point corresponds to the first dimension and the ending point corresponds to the second dimension. This transformation is explained in more detail in the next section (cf. Section 6.3). The above definition of a distance function on intervals thus corresponds to the Euclidean distance in this two-dimensional space. While it is also possible to use other Minkowski metrics, we only use the Euclidean distance throughout this thesis. As we will show in the experimental evaluation in Section 4.6, the differences between different Minkowski metrics are negligible.
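The correspondence between intervals and two-dimensional points can be illustrated with a short sketch. The function names `d_int` and `d_int_minkowski` are chosen here for illustration only; the Minkowski variant is included merely to show how another metric of that family could be plugged in, and is not the distance used in this thesis.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), read as a point in 2-D space


def d_int(t1: Interval, t2: Interval) -> float:
    """Euclidean distance between two intervals viewed as 2-D points."""
    return math.hypot(t1[0] - t2[0], t1[1] - t2[1])


def d_int_minkowski(t1: Interval, t2: Interval, p: float = 2.0) -> float:
    """Illustrative Minkowski variant; p = 2 reproduces the Euclidean case."""
    return (abs(t1[0] - t2[0]) ** p + abs(t1[1] - t2[1]) ** p) ** (1.0 / p)


a, b = (1.0, 4.0), (4.0, 8.0)
print(d_int(a, b))                   # start points differ by 3, end points by 4
print(d_int_minkowski(a, b, p=1.0))  # Manhattan distance for comparison
```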
Since a time series object is represented by a sequence of time intervals for a certain threshold $\tau$, we need a distance measure for sequences of intervals. As these intervals are naturally ordered by their starting points and do not overlap each other, we can consider the threshold-crossing time interval sequences as sets of intervals without loss of generality. Several distance measures for sets have been introduced in the literature [EM97]. We use the Sum of Minimum Distances (SMD). Let $S_1$ and $S_2$ be two sets. The idea of the SMD is as follows: at first, each element of $S_1$ is matched to the best suited element in $S_2$, and afterwards the same is done for each element of $S_2$. The process of matching two elements is based on a distance function defined on two elements of the sets.
In our case, when given two time series, each threshold-crossing time interval of the first time series will be mapped to its most similar counterpart of the second time series. Obviously, two threshold-crossing time interval sequences do not necessarily have the same cardinality, so we follow [KM04] and adapt the original definition of the SMD to our needs by normalizing the distance value by the cardinalities of the interval sets. Finally, we are able to define the threshold-distance.
Definition 4.3 (Threshold-Distance).
Let $X$ and $Y$ be two time series and $S_X$ and $S_Y$ be the corresponding threshold-crossing time interval sequences. Then the threshold distance $d_{TS}$ is defined as
\[ d_{TS}(S_X, S_Y) = \frac{1}{2} \cdot \left( \frac{1}{|S_X|} \cdot \sum_{s \in S_X} \min_{t \in S_Y} d_{int}(s, t) + \frac{1}{|S_Y|} \cdot \sum_{t \in S_Y} \min_{s \in S_X} d_{int}(t, s) \right) \]
For the sake of clarity, in the above definition we assumed that both interval sequences were created using the same threshold. However, this is not a necessary constraint. As already mentioned, the idea of this distance function is to map every interval from one sequence to the closest (most similar) interval of the other sequence and vice versa. This distance measure has a further advantage. Time series having similar shapes, i.e. showing a similar behavior, may be transformed into threshold-crossing time interval sequences of different cardinalities. Since the above distance measure does not require the interval sequences to have equal cardinalities, it is well suited for time interval sequences. Another advantage is that the distance measure mainly considers local similarity. This means that for each time interval only its nearest neighbor (i.e. the closest point in the two-dimensional interval space) of the other sequence is taken into account. Other intervals of the counterpart sequence have no influence on the result.
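The threshold distance can be sketched directly from its definition. The code below is a straightforward quadratic-time evaluation intended only as an illustration; the function name `threshold_distance` and the decision to raise an error for empty interval sequences (for which the normalization is undefined) are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def d_int(s: Interval, t: Interval) -> float:
    """Euclidean distance between two intervals (math.dist needs Python 3.8+)."""
    return math.dist(s, t)


def threshold_distance(sx: Sequence[Interval], sy: Sequence[Interval]) -> float:
    """Normalized sum of minimum distances between two interval sequences."""
    if not sx or not sy:
        # Assumption of this sketch: the definition divides by |S_X| and |S_Y|,
        # so empty sequences are rejected instead of being assigned a value.
        raise ValueError("both interval sequences must be non-empty")

    # Map every interval of S_X to its closest interval in S_Y, then average.
    x_to_y = sum(min(d_int(s, t) for t in sy) for s in sx) / len(sx)
    # Do the same in the opposite direction.
    y_to_x = sum(min(d_int(t, s) for s in sx) for t in sy) / len(sy)
    return 0.5 * (x_to_y + y_to_x)


s_x = [(0.0, 2.0), (5.0, 7.0)]
s_y = [(0.0, 2.0)]
print(threshold_distance(s_x, s_y))
```

In this example the two sequences have different cardinalities. The interval (5, 7) has no close counterpart in the second sequence and contributes its distance to (0, 2), while the single interval of the second sequence finds an exact match.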
4.3.3 Similarity Queries based on Threshold Similarity
Based on the new distance measure introduced in the previous sections, we can now extend the two most widely used similarity queries, the distance range query and the k-nearest-neighbor query. As specified in Definition 2.6, the distance range query retrieves all objects of a database whose distance to a given query object $Q$ is smaller than or equal to a given distance value $\varepsilon$. This definition is extended to threshold-based similarity as follows.
Definition 4.4 (Threshold-Based $\varepsilon$-Range Query).
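Let $D$ be a set of time series objects. The threshold-based $\varepsilon$-range query consists of a query time series $Q$, a query threshold $\tau \in \mathbb{R}$, and a distance parameter $\varepsilon \in \mathbb{R}^+_0$. The threshold-based $\varepsilon$-range query retrieves the set $TQ^{range}_{\varepsilon}(Q, \tau) \subseteq D$ such that
\[ \forall X \in TQ^{range}_{\varepsilon}(Q, \tau) : d_{TS}(S_Q, S_X) \le \varepsilon \]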
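A threshold-based $\varepsilon$-range query can be sketched as a plain sequential scan. The sketch assumes that the threshold-crossing time interval sequences of all database objects and of the query have already been derived for the same threshold; the dictionary layout, the object names, and the function name `threshold_range_query` are hypothetical. The compact `threshold_distance` helper is repeated only so that the block is self-contained.

```python
import math
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def threshold_distance(sx: Sequence[Interval], sy: Sequence[Interval]) -> float:
    """Normalized sum of minimum distances (both sequences must be non-empty)."""
    x_to_y = sum(min(math.dist(s, t) for t in sy) for s in sx) / len(sx)
    y_to_x = sum(min(math.dist(t, s) for s in sx) for t in sy) / len(sy)
    return 0.5 * (x_to_y + y_to_x)


def threshold_range_query(
    database: Dict[str, Sequence[Interval]],
    query_intervals: Sequence[Interval],
    eps: float,
) -> List[str]:
    """Return the names of all objects whose threshold distance to the query is <= eps."""
    return [
        name
        for name, intervals in database.items()
        if threshold_distance(query_intervals, intervals) <= eps
    ]


# Hypothetical database of precomputed interval sequences for one threshold.
database = {
    "a": [(0.0, 2.0)],
    "b": [(0.0, 2.0), (5.0, 7.0)],
    "c": [(10.0, 12.0)],
}
print(threshold_range_query(database, [(0.0, 2.0)], eps=2.0))
```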
Analogously we extend the definition of the kNN query as follows.
Definition 4.5 (Threshold-Based k-Nearest-Neighbor Query).
Let $D$ be a set of time series objects. The threshold-based $k$-nearest-neighbor query consists of a query time series $Q$, a query threshold $\tau \in \mathbb{R}$, and a parameter $k \in \mathbb{N}^+$. The threshold-based $k$-nearest-neighbor query yields the smallest set $TQ^{NN}_k(Q, \tau) \subseteq D$ that contains at least $k$ elements such that
\[ \forall X \in TQ^{NN}_k(Q, \tau), \forall Y \in D \setminus TQ^{NN}_k(Q, \tau) : d_{TS}(S_Q, S_X) < d_{TS}(S_Q, S_Y) \]
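One consequence of this definition is that ties at the $k$-th distance enlarge the result: because objects inside the result must be strictly closer than all objects outside, every object that shares the $k$-th smallest distance has to be included. The following sketch illustrates this on precomputed distances. The mapping from object names to their threshold distances to the query, the function name `knn_with_ties`, and the choice to return the whole database when it holds fewer than $k$ objects are assumptions of this sketch.

```python
from typing import Dict, List


def knn_with_ties(distances: Dict[str, float], k: int) -> List[str]:
    """Return the smallest result set with at least k objects, including ties.

    `distances` is assumed to map each database object to its threshold
    distance to the query object.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")

    ranked = sorted(distances.items(), key=lambda item: item[1])
    if len(ranked) <= k:
        # Assumption of this sketch: with at most k objects, return all of them.
        return [name for name, _ in ranked]

    kth_distance = ranked[k - 1][1]
    # Every object at or below the k-th distance must be part of the result,
    # otherwise an excluded object would not be strictly farther away.
    return [name for name, dist in ranked if dist <= kth_distance]


# Hypothetical precomputed threshold distances to a query object.
distances = {"a": 0.0, "b": 1.5, "c": 1.5, "d": 4.0}
print(knn_with_ties(distances, k=2))
```

Here the objects "b" and "c" share the second smallest distance, so the query with k = 2 returns three objects.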
Again, this definition could be adapted to different threshold values for different time series. However, as the query time series $Q$ usually stems from the same application domain as the collection of time series the query is executed on, the standard approach is to use the same threshold value for all time series. In the following, we will also refer to both query types as threshold query if it is not necessary to distinguish between the two query types. We use the abbreviation $TQ(Q, \tau)$ to denote this generalized query type.