Comparative geochemical study on Furongian-earliest Ordovician

Palaeozoic Basement of the Pyrenees

A COMPARISON OF THE TOLEDANIAN AND SARDIC VOLCANISM

6.1 Comparative geochemical study on Furongian-earliest Ordovician

In this part of our experiments we evaluated the efficiency and effective-ness of the proposed indexing scheme. We performed tests over datasets of different sizes and different number of clusters. To generate large realistic datasets, we used real sequences (from the SEALS and ASL datasets) as

“seeds” to create larger datasets that follow the same patterns. To perform tests, we used queries that do not have exact matches in the database, but on the other hand are similar to some of the existing sequences. For each experiment we run 100 diﬀerent queries and we report the averaged results.

We have tested the index performance for diﬀerent number of clusters in a dataset consisting of a total of 2000 sequences. We executed a set of K-Nearest Neighbor (K-NN) queries for K 1, 5, 10, 15 and 20 and we plot the fraction of the dataset that has to be examined in order to guarantee

Fig. 20. Performance for increasing number of Nearest Neighbors.

Fig. 21. Index performance for variable number of data clusters.

that we have found the best match for the K-NN query. Note that in this fraction we included the medoids that we check during the search since they are also part of the dataset.

In Figure 20 we show some results for K-Nearest Neighbor queries. We used datasets with 5, 8 and 10 clusters. As we can see the results indicate that the algorithm has good performance even for queries with large K.

We also performed similar experiments where we varied the number of

clusters in the datasets (Figure 21). As the number of clusters increased the performance of the algorithm improved considerably. This behavior is expected and it is similar to the behavior of recent proposed index structures for high dimensional data [6,9,22]. On the other hand if the dataset has no clusters, the performance of the algorithm degrades, since the majority of the sequences have almost the same distance to the query. This behavior follows again the same pattern of high dimensional indexing methods [6,43].

7. Conclusions

We have presented efficient techniques to accurately compute the simi-larity between time-series with significant amount of noise. Our distance measure is based on the LCSS model and performs very well for noisy sig-nals. Since the exact computation is inefficient, we presented approximate algorithms with provable performance bounds. Moreover, we presented an efficient index structure, which is based on hierarchical clustering, for sim-ilarity (nearest neighbor) queries. The distance that we use is not a metric and therefore the triangle inequality does not hold. However, we prove that a similar inequality holds (although a weaker one) that allows to prune parts of the datasets without any false dismissals.

Our experiments indicate that the approximation algorithm can be used to get an accurate and fast estimation of the distance between two time-series even under noisy conditions. Also, results from the index evaluation show that we can achieve good speed-ups for searching similar sequences comparing with the brute force linear scan.

We plan to investigate biased sampling to improve the running time of the approximation algorithms, especially when full rigid transformations (e.g. shifting, scaling and rotation) are necessary. Another approach to index time-series for similarity retrieval is to use embeddings and map the set of sequences to points in a low dimensional Euclidean space [15]. The challenge of course is to ﬁnd an embedding that approximately preserves the original pairwise distances and gives good approximate results to similarity queries.

References

1. Agarwal, P.K., Arge, L., and Erickson, J. (2000). Indexing Moving Points. In Proc. of the 19th ACM Symp. on Principles of Database Systems (PODS), pp. 175–186.

2. Agrawal, R., Faloutsos, C., and Swami, A. (1993). Eﬃcient Similarity Search in Sequence Databases. In Proc. of the 4th FODO, pp. 69–84.

3. Agrawal, R. Lin, K. Sawhney, H.S., and Shim, K. (1995). Fast Similarity Search in the Presence of Noise, Scaling and Trans-Lation in Time-Series Databases. In Proc. of (VLDB), pp. 490–501.

4. Barbara, D. (1999). Mobile Computing and Databases – A Survey. IEEE TKDE, pp. 108–117.

5. Berndt, D. and Cliﬀord, J. (1994). Using Dynamic Time Warping to Find Patterns in Time Series. In Proc. of KDD Workshop.

6. Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. (1999). When is

“Nearest Neighbor” Meaningful? In Proc. of ICDT, Jerusalem, pp. 217–235.

7. Bollobas, B., Das, G., Gunopulos, D., and Mannila, H. (1997). Time-Series Similarity Problems and Well-Separated Geometric Sets. In Proc. of the 13th SCG, Nice, France.

8. Bozkaya, T., Yazdani, N., and Ozsoyoglu, M. (1997). Matching And Indexing Sequences of Diﬀerent Lengths. In Proc. of the CIKM, Las Vegas.

9. Chakrabarti, K. and Mehrotra, S. (2000). Local Dimensionality Reduction:

A New Approach to Indexing High Dimensional Spaces. In Proc. of VLDB, pp. 89–100.

10. Chu, K. and Wong, M. (1999). Fast Time-Series Searching with Scaling and Shifting. ACM Principles of Database Systems, pp. 237–248.

11. Cormen, T.H., Leiserson, C.E., and Rivest, R.L. (1990). Introduction to Algo-rithms. The MIT Electrical Engineering and Computer Science Series. MIT Press/McGraw Hill, 1990.

12. Das, G., Gunopulos, D. and Mannila, H. (1997). Finding Similar Time Series.

In Proc. of the First PKDD Symp., pp. 88–100.

13. Duda, R. and Hart, P. (1973). Pattern Classiﬁcation and Scene Analysis.

John Wiley and Sons, Inc.

14. Faloutsos, C., Jagadish, H., Mendelzon, A. and Milo, T. (1997). Signature Technique for Similarity-Based Queries. In SEQUENCES 97.

15. Faloutsos, C. and Lin, K.-I. (May, 1995). FastMap: A Fast Algorithm for Indexing, Datamining and Visualization of Traditional and Multimedia Datasets. In Proc. ACM SIGMOD, pp. 163–174.

16. Faloutsos, C., Ranganathan, M., and Manolopoulos, I. (1994). Fast Sub-sequence Matching in Time Series Databases. In Proc. of ACM SIGMOD, pp. 419–429.

17. Gaﬀney, S. and Smyth, P. (1999). Trajectory Clustering with Mixtures of Regression Models. In Proc. of the 5th ACM SIGKDD, San Diego, CA, pp. 63–72.

18. Ge, X. and Smyth, P. (2000). Deformable Markov Model Templates for Time-Series Pattern Matching. In Proc. ACM SIGKDD.

19. Gionis, A., Indyk, P., and Motwani, R. (1999). Similarity Search in High Dimensions Via Hashing. In Proc. of 25th VLDB, pp. 518–529.

20. Gips, J., Betke, M. and Fleming, P. (July, 2000). The Camera Mouse: Pre-liminary investigation of automated visual tracking for computer access. In Proceedings of the Rehabilitation Engineering and Assistive Technology Soci-ety of North America 2000 Annual Conference, pp. 98–100, Orlando, FL.

21. Goldin, D. and Kanellakis, P. (1995). On Similarity Queries for Time-Series Data. In Proc. of CP ’95, Cassis, France.

22. Goldstein, J. and Ramakrishnan, R. (2000). Contrast Plots and p-Sphere Trees: Space vs. Time in Nearest Neighbour Searches. In Proc. of the VLDB, Cairo, pp. 429–440.

23. Grumbach, S., Rigaux, P., and Segouﬁn, L. (2000). Manipulating Interpolated Data is Easier than you Thought. In Proc. of the 26th VLDB, pp. 156–165.

24. Jagadish, H.V., Mendelzon, A.O., and Milo, T. (1995). Similarity-Based Queries. In Proc. of the 14th ACMPODS, pp. 36–45.

25. Kahveci, T. and Singh, A.K. (2001). Variable Length Queries for Time Series Data. In Proc. of IEEE ICDE, pp. 273–282.

26. Keogh, E., Chakrabarti, K., Mehrotra, S., and Pazzani, M. (2001).

Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In Proc. of ACMSIGMOD, pp. 151–162.

27. Keogh, E. and Pazzani, M. (2000). Scaling up Dynamic Time Warping for Datamining Applications. In Proc. 6th Int. Conf. on Knowledge Discovery and Data Mining, Boston, MA.

28. Kollios, G., Gunopulos, D., and Tsotras, V. (1999). On Indexing Mobile Objects. In Proc. of the 18th ACM Symp. on Principles of Database Systems (PODS), pp. 261–272.

29. Lee, S.-L., Chun, S.-J., Kim, D.-H., Lee, J.-H., and Chung, C.-W. (2000).

Similarity Search for Multidimensional Data Sequences. In Proc. of ICDE, pp. 599–608.

30. Levenshtein, V. (1966). Binary Codes Capable of Correcting Deletions, Inser-tions, and Reversals. In Soviet Physics — Doklady 10, 10, pp. 707–710.

31. Park, S., Chu, W., Yoon, J., and Hsu, C. (2000) Eﬃcient Searches for Similar Subsequences of Diﬀerent Lengths in Sequence Databases. In Proc. of ICDE, pp. 23–32.

32. Perng, S., Wang, H., Zhang, S., and Parker, D.S. (2000). Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases. In Proc. of ICDE, pp. 33–42.

33. Pfoser, D., Jensen, C., and Theodoridis, Y. (2000) Novel Approaches in Query Processing for Moving Objects. In Proc. of VLDB, Cairo Egypt.

34. Pfoser, D. and Jensen, C.S. (1999). Capturing the Uncertainty of Moving-Object Representations. Lecture Notes in Computer Science, 1651.

35. Qu, Y., Wang, C., and Wang, X. (1998). Supporting Fast Search in Time Series for Movement Patterns in Multiple Scales. In Proc. of the ACM CIKM, pp. 251–258.

36. Raﬁei, D. and Mendelzon, A. (2000). Querying Time Series Data Based on Similarity. IEEE Transactions on Knowledge and Data Engineering,12(5), 675–693.

37. Rice, S.V., Bunke, H., and Nartker, T. (1997). Classes of Cost Functions for String Edit Distance. Algorithmica,18(2), 271–280.

38. Sakoe, H. and Chiba, S. (1978). Dynamic Programming Algorithm Optimiza-tion for Spoken Word RecogniOptimiza-tion. IEEE Trans. Acoustics, Speech and Signal Processing (ASSP),26(1), 43–49.

39. Saltenis, S., Jensen, C., Leutenegger, S., and Lopez, M.A. (2000). Indexing the Positions of Continuously Moving Objects. In Proc. of the ACM SIG-MOD, pp. 331–342.

40. Vlachos, M. (2001) Eﬃcient Similarity Measures and Indexing Structures for Multidimensional Trajectories, MSc Thesis, UC Riverside.

41. Vlachos, M., Gunopulos, D., Kollios, G. (2002) Discovering Multidimensional Trajectories. In Proc. of IEEE Int. Conference in Data Engineering (ICDE).

42. Wagner, R.A. and Fisher, M.J. (1974). The String-To-String Correction Problem. Journal of ACM,21(1), 168–173.

43. Weber, R., Schek, H.-J., and Blott, S. (1998). A Quantitative Analysis and Performance Study for Similarity Search Methods in High-Dimensional Spaces. In Proc. of the VLDB, NYC, pp. 194–205.

44. Yi, B.-K. and Faloutsos, C. (2000). Fast Time Sequence Indexing for Arbi-trary Lp Norms. In Proc. of VLDB, Cairo Egypt.

45. Yi, B.-K., Jagadish, H.V., and Faloutsos, C. (1998). Eﬃcient Retrieval of Similar Time Sequences Under Time Warping. In Proc. of the 14th ICDE, pp. 201–208.

CHAPTER 5

CHANGE DETECTION IN CLASSIFICATION MODELS

In document The Sardic Phase in the Ordovician of Southern Sardinia and Eastern Pyrenees: (página 161-165)