

in inference problems.17 SQL was used by those researchers to generate sets of attributes that were useful in identifying item clusters. Lee and Liu suggested a way to utilize agent technology to expedite Web mining that utilized SQL. Grossman et al. presented a predictive model markup language to aid data mining.18

Key measures in association rule mining include support and confidence. Support refers to the degree to which a relationship appears in the data. Confidence relates to the probability that if a precedent occurs, a consequence will occur. The rule X → Y has minimum support value minsup if minsup percent of transactions support X ∪ Y; the rule X → Y holds with minimum confidence value minconf if minconf percent of transactions that support X also support Y. For example, from the transactions kept in supermarkets, an association rule such as “Bread and Butter → Milk” could be identified through association mining.
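The two measures can be made concrete with a minimal Python sketch. The transactions below are hypothetical (they are not data from this chapter), and the function names `support` and `confidence` are chosen here purely for illustration. Support is computed as the fraction of transactions containing X ∪ Y, and confidence as the fraction of transactions containing X that also contain Y.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical supermarket transactions (illustrative only).
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk", "eggs"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "milk"}),
]

# Rule: {bread, butter} -> {milk}
rule_support = support(transactions, {"bread", "butter", "milk"})
rule_confidence = confidence(transactions, {"bread", "butter"}, {"milk"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, two of the five transactions contain bread, butter, and milk together, and two of the three transactions containing bread and butter also contain milk. A rule would be retained only if both values reach the chosen minsup and minconf thresholds.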

Market-Basket Analysis

Market-basket analysis refers to methodologies studying the composition of a shopping basket of products purchased during a single shopping event. This technique has been widely applied to grocery store operations (as well as other retailing operations, to include restaurants). Market basket data in its rawest form would be the transactional list of purchases by customer, indicating only the items purchased together (with their prices). This data is challenging because of a number of characteristics:19

• A very large number of records (often millions of transactions per day)

• Sparseness (each market basket contains only a small portion of items carried)

• Heterogeneity (those with different tastes tend to purchase a specific subset of items).

The aim of market-basket analysis is to identify what products tend to be purchased together. Analyzing transaction-level data can identify

17 S. Lopes, J.-M. Petit, L. Lakhal (2002). Functional and approximate dependency mining: Database and FCA points of view, Journal of Experimental Theoretical Artificial Intelligence 14, 93–114.

18 R.L. Grossman, M.F. Hornick, G. Meyer (2002). Data mining standards initiatives, Communications of the ACM 45:8, 59–61.

19 C. Apte, B. Liu, E.P.D. Pednault, P. Smyth (2002). Business applications of data mining, Communications of the ACM 45:8, 49–53.

56 4 Association Rules in Knowledge Discovery

purchase patterns, such as which frozen vegetables and side dishes are purchased with steak during barbecue season. This information can be used in determining where to place products in the store, as well as aid inventory management. Product presentations and staffing can be more intelligently planned for specific times of day, days of the week, or holidays. Another commercial application is electronic couponing, tailoring coupon face value and distribution timing using information obtained from market baskets.20 Data mining of market basket data has been demonstrated for a long time,21 and has been applied in a variety of applications.22

Due to the increasing size of data sets in databases and data warehouses, researchers have tried different architectural alternatives for efficient coupling of mining with database systems. Experiments with real-life datasets have been conducted to compare several of these methods: loose coupling through a SQL cursor interface, encapsulation of a mining algorithm in a stored procedure, caching the data to a file system on the fly and mining it there, tight coupling using primarily user-defined functions, and SQL implementations for processing in the DBMS.23 The focal point of research in data mining is increasing efficiency, but simplicity and portability have also been considered.24
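As an illustration of what processing in the DBMS can look like, the following minimal sketch counts item pairs with a SQL self-join, run through Python's standard `sqlite3` module. It is not the implementation evaluated in the cited experiments. The table name `purchases`, its columns, and the sample rows are all hypothetical, and the table is assumed to hold at most one row per basket and item.

```python
import sqlite3

# Hypothetical (basket_id, item) rows; one row per item in each basket.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "apples"),
    (3, "apples"), (3, "milk"),
    (4, "bread"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

min_support = 2  # minimum number of baskets containing both items

# Self-join on basket_id; a.item < b.item keeps each unordered pair once.
pair_counts = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS n
    FROM purchases AS a
    JOIN purchases AS b
      ON a.basket_id = b.basket_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY n DESC, a.item, b.item
    """,
    (min_support,),
).fetchall()
conn.close()

for first, second, n in pair_counts:
    print(first, second, n)
```

Keeping the counting inside the query means only the qualifying pairs leave the database, which is the appeal of the SQL-based alternatives. A cursor-based loose coupling would instead fetch the raw rows and count them in application code.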

Market Basket Analysis Benefits

The ultimate goal of market basket analysis is finding the products that customers frequently purchase together. Stores can use this information by placing these products close to each other and making them more visible and accessible to customers at the time of shopping. These assortments can affect customer behavior and promote sales of complementary items. Another use of this information is to decide on the layout of catalogs, placing items with strong associations together in sales catalogs. The advantage of using sales data for promotions and

20 G.J. Russell, A. Petersen (2000). Analysis of cross category dependence in market basket selection, Journal of Retailing 78:3, 367–392.

21 E. Simoudis (1996). Reality check for data mining, IEEE Expert Intelligent Systems & Their Applications 11:5, 26–33.

22 R. Reed (1995). Household ethnicity, household consumption: Commodities and the Guarani, Economic Development & Cultural Change 44:1, 129–145.

23 S. Sarawagi, S. Thomas, R. Agrawal (1998). Integrating association rule mining with relational database systems: Alternatives and implications, Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, United States, 343–354.


store layout is that the consumer behavior information determines the items with associations. This information may vary based on the area and the assortment of items available in stores, and the point-of-sale data reflects the real behavior of the group of customers that frequently shop at the same store. Catalogs designed on the basis of market basket analysis are expected to be more effective in influencing consumer behavior and promoting sales.

Market basket analysis is an undirected method. It does not involve choosing specific products and finding their associations with other items. This method can reveal associations that may be unknown to store management. The products that are most important are revealed through this analysis. The current study provides a method of querying for specific products as well. Market basket analysis can also be used to identify the items frequently sold to new customers and to profile customer baskets over a period of time by identifying customers through membership shopping cards.

Demonstration on Small Set of Data

Market basket analysis involves large-scale datasets. To demonstrate methods, we will generate a prototypical dataset consisting of 10 grocery items, with 25 market baskets. Table 4.1 shows the raw data.

Table 4.1. Prototypical data

Basket Milk Eggs Bread Beer Water Cola Apples Beans Peas Diapers

1 X X
2 X X X X X
3 X X X X X X X
4 X X X X X
5 X X X X X
6 X X X
7 X X
8 X X X X
9 X X X X
10 X X X
11 X X
12 X X X
13 X X X
14 X X X
15 X X
16 X X X
17 X X
18 X X X X X X X X X X
19 X X X X X X
20 X X X
21 X X
22 X X X X X X X
23 X X
24 X X
25 X X X X X

By sorting on the item with the most volume, counts for cross-sales can be generated as in Table 4.2.

Table 4.2. Cross-Sales

Item     Apples  Bread  Milk  Cola  Beans  Eggs  Beer  Diapers  Peas  Water
Apples       15      8    10     8      5     5     3        5     6      4
Bread         8     14    11     8      8     5     4        4     3      1
Milk         10     11    13     6      6     5     3        5     4      1
Cola          8      8     6    10      5     3     2        2     3      1
Beans         5      8     6     5     10     4     3        3     3      2
Eggs          5      5     5     3      4     8     4        4     2      2
Beer          3      4     3     2      3     4     8        3     1      1
Diapers       5      4     5     2      3     4     3        7     2      2
Peas          6      3     4     3      3     2     1        2     6      3
Water         4      1     1     1      2     2     1        2     3      4
TOTAL        15     14    13    10     10     8     8        7     6      4
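A cross-sales matrix of this kind can be produced by counting, for every pair of items, the baskets that contain both. The sketch below is one possible way to do this in Python. It uses a small hypothetical set of four baskets rather than the 25 baskets of Table 4.1, and the helper name `cell` is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (illustrative only; not the Table 4.1 data).
baskets = [
    {"apples", "milk"},
    {"apples", "bread", "milk"},
    {"bread", "cola"},
    {"apples", "bread", "milk", "cola"},
]

# Number of baskets containing each item (the diagonal of the matrix).
item_counts = Counter(item for basket in baskets for item in basket)

# Number of baskets containing each unordered pair of items.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# Order items by volume, highest first; ties are broken alphabetically.
items = sorted(item_counts, key=lambda item: (-item_counts[item], item))


def cell(a, b):
    """Co-occurrence count for items a and b (item total when a == b)."""
    if a == b:
        return item_counts[a]
    return pair_counts[tuple(sorted((a, b)))]


matrix = [[cell(a, b) for b in items] for a in items]

for name, row in zip(items, matrix):
    print(f"{name:<8}", row)
```

The resulting matrix is symmetric, and its diagonal holds the per-item totals, which is the same structure as Table 4.2.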

The diagonal contains the total number of market baskets containing each item. It can be seen that most who purchased apples also purchased milk, bread, and cola. Of those that purchased beer, few purchased water. One customer purchased everything, so there are no zeros in this matrix. This set of data appears to have inflated sales, but it is intended only for demonstration. Real groceries would have many more items, with many more zeros.

Our data can nonetheless be used to demonstrate key measures in association rules. Support is measured here as the number of cases (market baskets) containing a given pair. The support for apples and milk is 10. For beer and water it is one. Minimum support can be used to control the number of association rules obtained, with higher support requirements yielding fewer pairs. A minimum support of 1 would yield 45 pairs, the maximum for 10 items. A minimum support of 15 would yield no rules. A minimum support of 10 would yield 2 {Apples-Milk 10 cases; Bread-Milk 11 cases}. Note that association rules can go beyond pairs to triplets, sets of four, and so on. But usually pairs are of interest (and much easier to identify).
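The effect of the minimum support threshold can be checked directly against the cross-sales counts of Table 4.2. The following Python sketch copies those counts into a matrix and lists the pairs that meet a given threshold. The function name `pairs_with_min_support` is an assumption introduced for illustration.

```python
from itertools import combinations

items = ["Apples", "Bread", "Milk", "Cola", "Beans",
         "Eggs", "Beer", "Diapers", "Peas", "Water"]

# Cross-sales counts from Table 4.2 (rows and columns in the order above).
cross_sales = [
    [15,  8, 10,  8,  5, 5, 3, 5, 6, 4],
    [ 8, 14, 11,  8,  8, 5, 4, 4, 3, 1],
    [10, 11, 13,  6,  6, 5, 3, 5, 4, 1],
    [ 8,  8,  6, 10,  5, 3, 2, 2, 3, 1],
    [ 5,  8,  6,  5, 10, 4, 3, 3, 3, 2],
    [ 5,  5,  5,  3,  4, 8, 4, 4, 2, 2],
    [ 3,  4,  3,  2,  3, 4, 8, 3, 1, 1],
    [ 5,  4,  5,  2,  3, 4, 3, 7, 2, 2],
    [ 6,  3,  4,  3,  3, 2, 1, 2, 6, 3],
    [ 4,  1,  1,  1,  2, 2, 1, 2, 3, 4],
]


def pairs_with_min_support(min_support):
    """Return (item, item, count) for each pair whose count meets the threshold."""
    return [
        (items[i], items[j], cross_sales[i][j])
        for i, j in combinations(range(len(items)), 2)
        if cross_sales[i][j] >= min_support
    ]


for threshold in (1, 10, 15):
    print(threshold, len(pairs_with_min_support(threshold)))

print(pairs_with_min_support(10))
```

Only the upper triangle of the matrix is scanned, since the matrix is symmetric and the diagonal holds item totals rather than pair counts.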
