The limitations of EC can be grouped into technical and non-technical categories.
1.3.2.1 Technical Limitations of EC
• There is a lack of system security, reliability, standards, and some communication protocols.
• Insufficient telecommunication bandwidth.
• The software development tools are still evolving and changing rapidly.
• It is difficult to integrate the Internet and EC software with some existing applications and databases.
• Vendors may need special Web servers and other infrastructures, in addition to the network servers.
• Some EC software might not fit some hardware, or may be incompatible with some operating systems or other components.
1.3.2.2 Non-technical Limitations
Of the many non-technical limitations that slow the spread of EC, the following are the major ones.
• Cost and justification: The cost of developing EC in-house can be very high, and mistakes due to lack of experience may result in delays. There are many opportunities for outsourcing, but deciding where and how to outsource is not simple. Furthermore, to justify the system one must account for intangible benefits (such as improved customer service and the value of advertisement), which are difficult to quantify.
• Security and privacy: These issues are especially important in the B2C area. Security concerns in particular are perceived to be more serious than they really are when appropriate encryption is used, and privacy measures are constantly improving. Yet customers perceive these issues as very important, and the EC industry faces a long and difficult task of convincing them that online transactions are, in fact, secure and that their privacy is protected.
• Lack of trust and user resistance: Customers do not trust an unknown, faceless seller (sometimes they do not trust even known ones), paperless transactions, or electronic money, so switching from physical to virtual stores may be difficult.
• Other limiting factors: Products cannot be touched and felt online. Some customers like to touch items such as clothes and to know exactly what they are buying.
• Many legal issues are as yet unresolved, and government regulations and standards are not refined enough for many circumstances.
• Electronic commerce, as a discipline, is still evolving and changing rapidly. Many people are waiting for the field to stabilize before they enter it.
• There are not enough support services. For example, copyright clearance centres for EC transactions do not exist, and high-quality evaluators, or qualified EC tax experts, are rare.
• In most applications there are not yet enough sellers and buyers for profitable EC operations.
• Electronic commerce could result in a breakdown of human relationships.
• Accessibility to the Internet is still expensive and/or inconvenient for many potential customers. (With Web TV, cell telephone access, kiosks, and constant media attention, the critical mass will eventually develop.)
Despite these limitations, rapid progress in EC is taking place. For example, the number of people in the United States who buy and sell stocks electronically increased from 300,000 at the beginning of 1996 to about 10 million in fall 1999. As experience accumulates and technology improves, the ratio of EC benefits to costs will increase, resulting in a greater rate of EC adoption. Even so, the potential benefits may not be convincing enough reasons to start EC activities.
1.4 DATA MINING
1.4.1 Introduction to Data Mining
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions.
The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.
1.4.1.1 What is Data Mining?
Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. That is, data mining attempts to extract knowledge from data.
Data mining differs from traditional statistics in several ways. Formal statistical inference is assumption driven: a hypothesis is formed and then validated against the data. Data mining, in contrast, is discovery driven: patterns and hypotheses are extracted automatically from the data. Said another way, data mining is data driven, while statistics is human driven. The branch of statistics that data mining most resembles is exploratory data analysis, although this field, like most of the rest of statistics, has focused on data sets far smaller than most of those targeted by data mining researchers.
Data mining also differs from traditional statistics in that sometimes the goal is to extract qualitative models which can easily be translated into logical rules or visual representations; in this sense data mining is human centered and is sometimes coupled with human-computer interface research.
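The contrast between hypothesis-driven and discovery-driven analysis can be illustrated with a short Python sketch. Instead of testing one relationship chosen in advance, the hypothetical `discover_correlations` function below tests every pair of numeric columns and reports those whose Pearson correlation reaches a threshold, so the data decide which relationships surface. The function names, the 0.8 threshold and the toy columns are assumptions made for illustration; real data mining systems search far richer pattern spaces than pairwise correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column cannot correlate with anything
    return sxy / sqrt(sxx * syy)


def discover_correlations(table, threshold=0.8):
    """Test every pair of columns; no hypothesis is supplied in advance.

    table maps a column name to a list of numbers (all lists equal length).
    Returns (column_a, column_b, r) for each pair with |r| >= threshold.
    """
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical toy data, for illustration only.
data = {
    "ad_spend": [1, 2, 3, 4, 5],
    "visits": [10, 20, 31, 39, 50],
    "returns": [5, 3, 6, 2, 4],
}
print(discover_correlations(data))
```

Testing many pairs also raises the chance of spurious correlations, which is one reason discovered patterns are normally validated before they are used.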
Data mining is a step in the data mining process, an interactive, semi-automated process that begins with raw data. Results of the data mining process may be insights, rules, or predictive models.
The field of data mining draws upon several roots, including statistics, machine learning, databases, and high performance computing.
Here, we are primarily concerned with large data sets, massive data sets, and distributed data sets. By large, we mean data sets which are too large to fit into the memory of a single workstation. By massive, we mean data sets which are too large to fit onto the disks of a single workstation or a small cluster of workstations. Instead, massive clusters or tertiary storage such as tape are required. By distributed, we mean data sets which are geographically distributed.
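Data sets too large for memory force algorithms to make a single pass over the records while keeping only a small summary. As one illustration (the text does not prescribe a method), the Python sketch below uses Welford's online algorithm for the mean and variance, together with the standard formula for merging two such summaries, so that sites holding parts of a distributed data set could each summarize locally and exchange only three numbers. The class and function names are hypothetical, and in practice the input iterable might be a file read line by line rather than a `range`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Constant-size summary of a numeric stream (hypothetical helper)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's update: O(1) memory no matter how many records arrive.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two summaries computed independently, e.g. at two sites.
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 for fewer than two records.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(records: Iterable[float]) -> RunningStats:
    """Single pass over any iterable, never holding it all in memory."""
    stats = RunningStats()
    for x in records:
        stats.update(x)
    return stats


# Two "sites" each summarize their own half, then only summaries are merged.
site_a = summarize(range(0, 500))
site_b = summarize(range(500, 1000))
combined = site_a.merge(site_b)
print(combined.n, combined.mean, combined.variance)
```

Merging gives the same result as one pass over all the data, which is what makes this kind of summary useful when the records are spread across machines.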
The focus on large data sets is not just an engineering challenge; it is an essential feature of inducing expressive representations from raw data. It is only by analyzing large data sets that we can produce accurate logical descriptions that can be translated automatically into powerful predictive mechanisms. Otherwise, statistical and machine learning principles suggest the need for substantial user input (specifying the meta-knowledge necessary to acquire highly predictive models from small data sets).
1.4.2 The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
• Automated prediction of trends and behaviours. Data mining automates the process of finding predictive information in large databases. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
• Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step.
An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
• Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.
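The retail pattern-discovery example is commonly expressed as association rules measured by support (how often items occur together) and confidence (how often the right-hand item appears given the left-hand one). The Python sketch below is a simplified, pairs-only version of that idea; full algorithms such as Apriori also handle larger itemsets. The function name, the thresholds and the basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Find rules lhs -> rhs between single items in market-basket data.

    support(a, b)    = fraction of baskets containing both a and b
    confidence(a->b) = count(a and b) / count(a)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # A frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    # Strongest rules first, ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical scanner data: one set of items per checkout.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}  support={support}  confidence={confidence}")
```

Note that rules are directional: in this toy data every basket with butter also contains bread, but only half of the bread baskets contain butter, so only butter -> bread passes the confidence threshold.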
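The targeted-marketing example can be sketched as a deliberately simple predictive model: estimate each customer segment's response rate from past mailings, then rank segments by expected profit per piece mailed, computed as response rate times revenue per response minus cost per piece. The segment labels, revenue and cost figures below are hypothetical, and a per-segment rate is only a stand-in for the richer models (for example logistic regression or decision trees over many customer attributes) that data mining tools typically fit.

```python
from collections import defaultdict


def rank_segments(past_mailings, revenue_per_response, cost_per_piece):
    """Rank segments by expected profit per mailed piece (hypothetical helper).

    past_mailings: iterable of (segment, responded) pairs from earlier campaigns.
    Returns (segment, response_rate, expected_profit) sorted best first.
    """
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in past_mailings:
        mailed[segment] += 1
        responded[segment] += int(did_respond)

    ranking = []
    for segment in mailed:
        rate = responded[segment] / mailed[segment]  # estimated response probability
        profit = rate * revenue_per_response - cost_per_piece
        ranking.append((segment, round(rate, 3), round(profit, 2)))
    ranking.sort(key=lambda row: row[2], reverse=True)
    return ranking


# Hypothetical history: segment A responded 3/10, B 1/10, C 0/5.
history = (
    [("A", True)] * 3 + [("A", False)] * 7
    + [("B", True)] + [("B", False)] * 9
    + [("C", False)] * 5
)
for segment, rate, profit in rank_segments(history, revenue_per_response=20, cost_per_piece=1):
    print(segment, rate, profit)
```

Segments with a negative expected profit would simply be dropped from the next mailing.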