Data analysis is the process of examining, cleaning, converting and modelling data in order to discover hidden information that can assist with decision-making (Han et al., 2012). Diverse techniques and approaches are used to extract meaningful information; which approach is used depends on the variables involved and on the scientific discipline (Ott and Longnecker, 2015). In statistical applications, data analysis can be divided into:
• Descriptive statistics, which describe the results in terms of a statistical model;
• Exploratory data analysis, which concentrates on learning and discovering new features in the data; and
• Confirmatory data analysis, which examines and confirms existing hypotheses (Neuman and Robson, 2012).
Data mining is one family of data analysis methods. Data mining techniques are used for modelling and knowledge discovery in large data sets, with the aim of predicting rather than purely describing the data (Zhu, 2007). They can therefore also be referred to as predictive analytics techniques, since they apply statistical models (Han et al., 2012) for the purposes of forecasting and data classification. Data mining techniques can also be used to analyse text and language by extracting and classifying information from textual sources (Miner, 2012; Larose, 2014).
A sample application of data mining techniques is finding hidden information in the database of a bookstore by identifying the patterns of relationships between its variables. Every purchase or sale of items in the bookstore is recorded in a database, in which one of the tables stores transactional data containing the customer's identity, age, address and credit card, together with the purchased items. These variables are processed by data mining techniques to find the relationships between them, so that patterns emerge from the data (Raorane et al., 2012). This can be achieved by combining each item with the other purchased items, determining which items have been purchased in the same transaction, and counting how frequently each item set occurs. A frequently occurring item set is commonly referred to as a frequent pattern (Chapman and Feit, 2015).
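The item-combination step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `transactions` data, the function name `frequent_pairs` and the `min_support` threshold are all hypothetical and do not come from the cited studies, and only pairs of items are counted rather than item sets of every size.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookstore transactions (assumed data, for illustration only).
# Each set holds the items bought together in one transaction.
transactions = [
    {"novel", "bookmark", "notebook"},
    {"novel", "bookmark"},
    {"atlas", "notebook"},
    {"novel", "bookmark", "atlas"},
]


def frequent_pairs(transactions, min_support):
    """Count every pair of items bought in the same transaction and keep
    the pairs that occur in at least `min_support` transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so that
        # ("bookmark", "novel") and ("novel", "bookmark") are counted together.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


result = frequent_pairs(transactions, min_support=2)
print(result)
```

With this toy data, the only pair bought together in at least two transactions is the novel and the bookmark. Practical association-rule algorithms extend the same counting idea to larger item sets and prune candidates that cannot be frequent.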
As a further example, data mining techniques have been applied to analyse text data on social networks, such as predicting opinions in messages (Bhagat et al., 2011), analysing current topics and issues on social networks (Lee et al., 2011), and identifying the characteristics of social network users (Spiegel, 2011). Hence, data mining techniques can be used for classification purposes. A classification technique predicts the probability that a data item, in the form of either text or numbers, belongs to a given class. However, this technique requires training data in order to estimate the classification. Training data are a set of data values that is used to assign pre-defined class labels to new objects (Raschka, 2014).
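To make the role of training data concrete, the following is a minimal sketch of a text classifier. It assumes a multinomial naive Bayes model with add-one smoothing, which is one common choice and not necessarily the method used in the studies cited above. The messages, labels and function names (`train`, `predict_proba`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training data: (message, class label).
training_data = [
    ("you are so stupid and ugly", "negative"),
    ("nobody likes you loser", "negative"),
    ("great game today well played", "positive"),
    ("thanks for your kind help", "positive"),
]


def train(samples):
    """Count how often each class and each word-in-class occurs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_proba(model, text):
    """Return the estimated probability of each class label for `text`."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Convert log scores to probabilities that sum to one.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


model = train(training_data)
probabilities = predict_proba(model, "you are a loser")
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

The new message is assigned the label whose training messages share the most words with it. With only four training messages the estimate is crude; the sketch is meant to show the mechanism, not a usable model.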
In terms of functionality, data mining techniques can be categorised according to some common tasks. These tasks include finding the association rules that link objects together, clustering objects together based on some characteristics, classification of objects into predefined groups, detecting outliers as well as regression and summarization (Han et al., 2012). Several data mining techniques are described below:
• Association rules are used to discover the relationships between the variables that exist in the database (Raorane et al., 2012; Chapman and Feit, 2015).
• Clustering is applied to find groups and data structures of unknown classification in the database, where the data have similarities with each other (Jain, 2010; Shiells and Pham, 2010).
• Classification is applied to generalise a known group structure to new data. Classification involves supervised learning techniques that classify data items according to pre-defined class labels (Kotsiantis et al., 2007; Richards, 2013).
• Outlier detection is used to detect any unusual data that are very different from expectations. Any unusual data will be revealed and subjected to further in-depth investigation (Bakar et al., 2006; Angiulli, 2009).
• Regression is used to estimate the response variable value based on one or more predictor variables in which the variables are in numerical form (Witten and Frank, 2011; Han et al., 2012).
• Summarization is used to provide a report concerning data collection as calculated using the algorithm. The reports can be presented as graphs or regular reports (Kantardzic, 2011; Han et al., 2012).
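Two of the tasks listed above, regression and outlier detection, can be illustrated with a short sketch. It assumes ordinary least squares with a single predictor and a simple z-score rule for outliers; both are common textbook choices rather than the specific methods of the cited authors. The data, the function names and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations from the mean (the threshold is an assumed convention)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing is unusual
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Regression: hypothetical numerical predictor and response.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Outlier detection: hypothetical daily sales with one unusual day.
daily_sales = [10, 12, 11, 13, 12, 11, 95]
outliers = z_score_outliers(daily_sales)
print(outliers)
```

The fitted line can then be used to estimate the response for a new predictor value, while the flagged values would be the ones passed on for closer investigation.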
The types of data mining techniques listed above can be used for a variety of purposes in research, business, and government. From the academic perspective, data mining techniques can be regarded as statistical methods, since they need to be further investigated and refined according to the various types of data, which include data streams, ordered or sequenced data, graph or networked data, spatial data, text data, multimedia data, and websites (Han et al., 2012). In the business field, data mining techniques can help businesses to increase their sales and profits (Larose, 2014).
In government organisations, such as those in the United States, data mining techniques are applied to identify terrorism networks throughout the world (DeRosa, 2004; Thuraisingham, 2004). The United States government employs several data mining techniques, such as association rules and classification techniques, to find the links between people suspected of terrorism and the followers and friends in their networks (DeRosa, 2004). These techniques help to obtain information that can support decision-making to counteract terrorism in the United States.
Table 1. Implementation of data mining techniques

| No. | Use of data mining techniques | Authors |
|---|---|---|
| 1 | Studies on the development of data mining techniques | Bakar et al., 2006; Zhu, 2007; Angiulli, 2009 |
| 2 | Implementation of data mining in business | Berry and Linoff, 2004; Hsieh, 2004; Yeh and Lien, 2009 |
| 3 | Use of data mining techniques by government | DeRosa, 2004; Thuraisingham, 2004; Shaikh et al., 2007; Shacheng, 2012 |
| 4 | Analysis of social networks using data mining techniques | Barbier and Liu, 2011; Aggarwal et al., 2011; Cooper, 2012; Liu, 2012; Nahar et al., 2012; Kontostathis et al., 2013; Nahar, 2014 |
Data mining techniques can serve as tools for analysing the content on social networks (Barbier and Liu, 2011; Cooper, 2012); for instance, by detecting sentiment in website content (Aggarwal et al., 2011; Liu, 2012) and thereby identifying positive or negative content, which can help to distinguish cyberbullying from non-cyberbullying messages (Nahar et al., 2012; Kontostathis et al., 2013; Nahar, 2014).