This research was supported by the National Science Council of the Republic of China under contract NSC94-2213-E-390-005.

References

1. Au W H, Chan K C C (2004) Mining fuzzy rules for time series classification. The 2004 IEEE International Conference on Fuzzy Systems, Vol. 1, pp. 239–244

2. Agrawal R, Psaila G, Wimmers E L, Zait M (1995) Querying shapes of histories. The 21st International Conference on Very Large Databases, pp. 502–514

3. Agrawal R, Srikant R (1994) Fast algorithm for mining association rules. The International Conference on Very Large Databases, pp. 487–499

4. Chen S M, Hwang J R (2000) Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, Vol. 30, No. 2, pp. 263–275

5. Hettich S, Bay S D (1999) The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA

6. Hong T P, Kuo C S, Chi S C (1999) Mining association rules from quantitative data. Intelligent Data Analysis, Vol. 3, No. 5, pp. 363–376

7. Hong T P, Kuo C S, Chi S C (2001) Trade-off between time complexity and number of rules for fuzzy mining from quantitative data. International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, Vol. 9, No. 5, pp. 587–604

8. Indyk P, Koudas N, Muthukrishnan S (2001) Identifying representative trends in massive time series data sets using sketches. The 26th International Conference on Very Large Data Bases, pp. 363–372

9. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems, Vol. 3, No. 3, pp. 263–286

10. Lee Y C, Hong T P, Lin W Y (2004) Mining fuzzy association rules with multiple minimum supports using maximum constraints. Lecture Notes in Computer Science, Vol. 3214, pp. 1283–1290

11. Patel P, Keogh E, Lin J, Lonardi S (2002) Mining motifs in massive time series databases. The IEEE International Conference on Data Mining, pp. 370–377

12. Song Q, Chissom B S (1993) Fuzzy time series and its models. Fuzzy Sets System, Vol. 54, No. 3, pp. 269–277

13. Udechukwu A, Barker K, Alhajj R (2004) Discovering all frequent trends in time series. The 2004 Winter International Symposium on Information and Communication Technologies, pp. 1–6

14. Watanabe N (2004) A fuzzy rule based time series model. The IEEE Annual Meeting on Fuzzy Information, Vol. 2, pp. 936–940

15. Yi B K, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms. The 26th International Conference on Very Large Databases, pp. 385–394

I-Jen Chiang1,3, Tsau Young (‘T. Y.’) Lin2, Hsiang-Chun Tsai3, Jau-Min Wong3, and Xiaohua Hu4

1 Graduate Institute of Medical Informatics, Taipei Medical University, 205, Wu-Hsien Street, Taipei, Taiwan, ROC

[email protected]

2 Department of Computer Science, San Jose State University, One Washington Square, San Jose, CA, USA

[email protected]

3 Graduate Institute of Biomedical Engineering, National Taiwan University, No.1, Sec. 1, Jen-Ai Road, Taipei, Taiwan, ROC

4 College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA

[email protected]

Summary. Organizing a huge number of Web pages into topics according to their relevance is an efficient and effective method for information retrieval. A Latent Semantic Space (LSS), which naturally takes the form of a geometric structure in Combinatorial Topology, has been proposed for unstructured document clustering. Given a set of Web pages, the set of associations among frequently co-occurring terms in them naturally forms a CONCEPT, which is represented as a set of connected components of the simplicial complexes. Based on these concepts, Web pages can be clustered into meaningful categories.

1 Introduction

To handle documents adequately, a methodology to represent or reveal their latent semantics is needed. To date, no universally accepted effective methodology has been discovered. In a previous paper [15], we pictured the latent semantics geometrically and called it the Latent Semantic Space (LSS) of the given set of documents. We take the key terms as vertices and visualize the term-associations (frequently co-occurring terms) as a simplicial complex in the LSS. Our thesis has been that a maximal connected component represents a CONCEPT in the LSS of a collection of documents. However, in [15] we did not explore the full thesis; we considered only the PRIMITIVE CONCEPTs of the highest dimension. Technically, we considered only the maximal connected components of the skeleton of the highest layer. In this paper, we explore the full notion of PRIMITIVE CONCEPTs, and the results are very encouraging. These results can be obtained directly from search engines. All the returned results are automatically clustered into different topics. The authoritative web pages in each topic are ranked according to how similar the web pages are to the topic. The experimental results indicate that we have an effective way to organize the large number of results returned by a web query.

I-J. Chiang et al.: Latent Semantic Space for Web Clustering, Studies in Computational Intelligence (SCI) 118, 61–77 (2008)

The Internet is an ocean of information. How to marshal the large number of returned web pages, paragraphs, or sentences is the key issue. Roughly speaking, we decompose (triangulate, partition, granulate) the LSS of documents (e.g., returned web pages or sentences) into a simplicial complex in combinatorial topology [23], which can be viewed as a special form of hypergraph. We should note, however, that the notion of simplicial complexes predates that of hypergraphs by about half a century, even though the latter notion is more familiar to modern computer scientists.

Let us recall an example to illustrate the main intuition. The association that consists of “wall” and “street” denotes a financial notion whose meaning goes beyond the two nodes, “wall” and “street”. This is similar to the notion of an open segment (v0, v1), which represents a one-dimensional geometric object, a 1-simplex, that carries information beyond its two end points. In general, an r-association represents some semantics generated by a set of r keywords; it may carry more semantics than the individual keywords, or even have nothing to do with them. A mathematical structure that reflects such phenomena is the notion of a simplicial complex in combinatorial topology; see Sect. 3.
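The correspondence between term-associations and simplices can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the procedure used in this chapter: documents are modelled as plain sets of terms, an association counts as frequent when it co-occurs in at least `min_support` documents, and the function names (`frequent_associations`, `is_simplicial_complex`), the threshold, and the toy documents are all hypothetical. It shows that frequent associations are closed under taking subsets, which is exactly the closure property a simplicial complex requires.

```python
from itertools import combinations


def frequent_associations(docs, min_support, max_size=3):
    """Return {frozenset_of_terms: support} for every term set of size
    1..max_size that co-occurs in at least `min_support` documents.
    Brute-force enumeration, intended for toy inputs only."""
    vocabulary = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocabulary, size):
            support = sum(1 for doc in docs if set(terms) <= doc)
            if support >= min_support:
                result[frozenset(terms)] = support
    return result


def is_simplicial_complex(simplices):
    """Check the closure property: every non-empty proper face (subset)
    of each simplex is itself present in the collection."""
    return all(
        frozenset(face) in simplices
        for simplex in simplices
        for k in range(1, len(simplex))
        for face in combinations(simplex, k)
    )


# Hypothetical toy documents, each reduced to its set of key terms.
docs = [
    {"wall", "street", "stock"},
    {"wall", "street", "bank"},
    {"wall", "paint"},
    {"street", "map"},
]

complex_ = frequent_associations(docs, min_support=2)
for simplex, support in complex_.items():
    # A simplex on r vertices has dimension r - 1.
    print(sorted(simplex), "dimension", len(simplex) - 1, "support", support)
print(is_simplicial_complex(complex_))
```

In this toy input, “wall” and “street” are frequent on their own and also as a pair, so the pair appears as a 1-simplex whose two end points are the individual terms.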

The thesis of this paper is that the simplicial complex of term-associations reflects the structure of the concepts in the LSS of the documents. Based on such a conceptual structure, the documents (returned pages, paragraphs, or sentences) can be effectively clustered.
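As a rough illustration of this thesis, the following sketch groups simplices into connected components and treats each component as one concept. It simplifies deliberately: two simplices are taken to be connected whenever they share a term, whereas the chapter works with components of skeletons layer by layer. The rule that assigns a document to the concept with the largest term overlap is likewise an assumption made for the example, and the names `concept_components` and `assign` are hypothetical.

```python
def concept_components(simplices):
    """Group the terms of the given simplices into connected components
    using union-find; simplices sharing a term fall into one component."""
    parent = {}

    def find(term):
        # Follow parent links to the root, halving the path on the way.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for simplex in simplices:
        terms = list(simplex)
        for term in terms:
            parent.setdefault(term, term)
        for other in terms[1:]:
            # Merge every vertex of the simplex with its first vertex.
            parent[find(other)] = find(terms[0])

    components = {}
    for term in parent:
        components.setdefault(find(term), set()).add(term)
    # Sort for a deterministic order of concepts.
    return sorted(components.values(), key=sorted)


def assign(doc, concepts):
    """Return the index of the concept sharing the most terms with `doc`,
    or None when the document shares no term with any concept."""
    overlaps = [len(doc & concept) for concept in concepts]
    best = max(overlaps)
    return overlaps.index(best) if best > 0 else None


# Hypothetical term-associations (maximal simplices only, for brevity).
simplices = [
    {"wall", "street"},
    {"street", "stock"},
    {"apple", "pie"},
    {"pie", "recipe"},
]

concepts = concept_components(simplices)
print(concepts)
print(assign({"wall", "street", "news"}, concepts))
print(assign({"football"}, concepts))
```

Here the financial terms and the cooking terms form two separate components, so a page mentioning “wall” and “street” is assigned to the former, and a page with no matching term is left unassigned.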
