
Interesting findings emerged from a web usage survey (Pitkow et al. 1995). The survey was built on a web site architecture specially designed by the authors and was available to users on the web. To collect information about browsing peculiarities, the research relied on adaptive questions and answers. Certain conclusions about web users were drawn and later used as an example of Internet users’ habits.

Other researchers propose a powerful Web mining tool, WUM (Faulstich et al. 1999; Faulstich et al. 1999). The authors state that all items in a user’s path are equally important for discovering exact user behaviour. This means that repeated pages within the same session are also essential, and they are treated differently from the approach in (Cooley et al. 1997b). WUM implements a dedicated query language, MINT (Faulstich et al. 1998), which supports predicates on web pages and their occurrences. The WUM miner also supports navigational patterns. Later, in (Berendt et al. 2000), the authors focused on a tool that can cope with dynamic pages as well.

In (Han et al. 2000), Han et al. developed an efficient data structure, the web access pattern (WAP) tree, together with an algorithm to retrieve useful information from pieces of logs. Algorithms such as Apriori run into difficulty when patterns grow long, which is what happens with Web log data. The authors therefore developed an algorithm for efficient mining of such huge data sets containing many patterns. The algorithm scans the database twice. The first scan determines the set of frequent events with respect to a defined threshold. The second scan builds the WAP tree data structure from the frequent events. The WAP tree is then mined using conditional searches.
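The two scans described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `WapNode`, `frequent_events` and `build_wap_tree`, the sample sessions and the absolute support threshold are assumptions made for illustration, and the conditional mining step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WapNode:
    """One tree node: an event label, a count, and child nodes keyed by label."""
    label: str
    count: int = 0
    children: dict = field(default_factory=dict)


def frequent_events(sessions, min_support):
    """First scan: count in how many sessions each event occurs."""
    support = Counter()
    for session in sessions:
        support.update(set(session))  # each event counts once per session
    return {event for event, n in support.items() if n >= min_support}


def build_wap_tree(sessions, min_support):
    """Second scan: insert the frequent subsequence of every session."""
    frequent = frequent_events(sessions, min_support)
    root = WapNode(label="root")
    for session in sessions:
        node = root
        for event in session:
            if event not in frequent:
                continue  # infrequent events are dropped, order is preserved
            node = node.children.setdefault(event, WapNode(label=event))
            node.count += 1
    return root


# Hypothetical sessions (sequences of page identifiers), for illustration only.
sessions = [
    ["a", "b", "d", "a", "c"],
    ["e", "a", "e", "b", "c", "a", "c"],
    ["b", "a", "b", "f", "a", "e", "c"],
    ["a", "f", "b", "a", "c", "f", "c"],
]
tree = build_wap_tree(sessions, min_support=3)
```

Sessions that share a prefix of frequent events share a branch of the tree, so the counts on the nodes summarise how often each prefix occurs without storing every session separately.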

A systematic study of the development of data warehousing is presented in (Han et al. 1998), together with the tool WebLogMiner (Han et al. 1997). To examine patterns in web log transactions, the authors applied on-line analytical processing (OLAP). OLAP makes it possible to characterise and examine the data in the web log, view associations, predict attribute values, construct models for each given class based on the features of the web log data, and produce time-series analyses. OLAP gives the business analyst the equivalent of a million spreadsheets at a time, arranged in a logical and hierarchical structure. The analysis can move to higher or deeper levels to look at the data from different perspectives (Peterson et al. 2000). Researchers encourage the use of OLAP as a suitable analysis and visualisation tool for mining web logs (Dyreson 1997).

The authors of (Balabanovic et al. 1995; Cooley et al. 2000) presented a feedback system which adapts and produces better pages for the following day. The system learns from users’ behaviour. A number of data mining techniques were implemented in the proposed system: clustering, frequent items and association rules. The recommendation engine computes a set of recommendations for the current session, consisting of pages that the user might want to visit, based on similar user patterns.

A model that takes both the travelling and purchasing patterns of customers into consideration was described in (Yun et al. 2000) and (Yun et al. 2000; Pabarskaite 2003). The developed algorithms extracted meaningful web transaction records and determined large transaction patterns.

A method for predicting future requests from paths, each containing the ordered list of URLs accessed by a user within a specified time constraint, is described in (Schechter et al. 1998). The methodology predicted page requests with a high level of accuracy, which reduces the time servers spend on generating pages.
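The idea of predicting the next request from an ordered path can be sketched roughly as follows. This is an illustrative simplification rather than the method of the cited work: the functions `build_path_profile` and `predict_request`, the `max_len` limit and the sample sessions are all assumptions, and the time constraint used to delimit paths is ignored here.

```python
from collections import Counter, defaultdict


def build_path_profile(sessions, max_len=3):
    """Map each observed path of up to max_len URLs to counts of the URL that followed it."""
    profile = defaultdict(Counter)
    for session in sessions:
        for end in range(1, len(session)):
            for start in range(max(0, end - max_len), end):
                profile[tuple(session[start:end])][session[end]] += 1
    return profile


def predict_request(profile, current_path, max_len=3):
    """Predict the next URL from the longest suffix of current_path seen before."""
    for length in range(min(max_len, len(current_path)), 0, -1):
        followers = profile.get(tuple(current_path[-length:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no suffix of the current path was ever observed


# Hypothetical sessions, for illustration only.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b", "/c"],
    ["/x", "/b", "/d"],
    ["/y", "/b", "/d"],
    ["/z", "/b", "/d"],
]
profile = build_path_profile(sessions)
```

With these sessions, a longer matching path overrides the shorter one: after `/b` alone the most frequent follower is `/d`, but after the path `/a`, `/b` the prediction is `/c`.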

Web log caching is implemented as software integrated into a caching/proxy server (Pitkow et al. 1994b). The algorithm implemented in this software is based on psychological research on human memory retrieval. It collects past access patterns and predicts further user actions. The authors also noticed that recent access rates for a document are stronger indicators of future requests than access frequency. Web log caching has also been implemented in the work of a number of other researchers (Glassman 1994; Luotonen et al. 1994).

Perkowitz and Etzioni (Perkowitz et al. 1997a; Perkowitz et al. 1997b) challenge scientists to concentrate on creating adaptive Web sites using modern artificial intelligence (AI) techniques. This means that web sites must automatically improve their administration and management by learning from users’ access patterns. To achieve this, site developers should focus on customisation (modifying web pages to suit users’ needs) and optimisation (making navigation of the site easier). In (Perkowitz et al. 1999) and (Perkowitz et al. 1998), Perkowitz and Etzioni described the PageGather algorithm, which uses a new cluster mining technique to identify sets of URLs for adaptive Web sites. This semi-automatic algorithm improves the organisation and presentation of the web site by learning from visitors’ behaviour.

A system based on gathering usage patterns from each day’s data collection is presented in (Balabanovic et al. 1995). First, data is collected and analysed. Second, the system, called LIRA, makes recommendations for the following day. The recommendations consist of a selection of documents which the system thinks users will find interesting. Pages selected in this way produced much better results than randomly selected pages.

The author of (Sarukkai 2000) suggests using Markov chains for link prediction and path analysis. Since a Markov chain model consists of a matrix of state transition probabilities, Markov chains allow the system to model URL access patterns dynamically.
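As a minimal sketch of this idea, a first-order Markov chain can be estimated by counting page-to-page transitions in observed sessions and normalising the counts into probabilities. The function names, the sample URLs and the choice of a first-order model are assumptions for illustration, not details taken from the cited work.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from observed sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {url: n / total for url, n in followers.items()}
    return probabilities


def predict_next(probabilities, current):
    """Return the most probable next URL, or None if no transition was observed."""
    followers = probabilities.get(current)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/products/42"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
model = transition_probabilities(sessions)
```

Each row of `model` plays the role of one row of the transition matrix: the probabilities of leaving a given URL sum to one, and the link with the highest probability is the predicted next request.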

How to reduce main memory requirements when analysing web logs is shown in (Xiao et al. 2001). The authors introduced the problem that arises when mining traversal patterns containing duplicates. They present an effective suffix tree algorithm which compresses and prunes the database and reduces main memory usage.
