Data, as explained earlier, is a collection of facts (numbers, words, measurements, observations, etc.) that can be stored and later translated into meaningful information. Data can also be collected and stored in very large volumes, amounting to what is referred to as “Big Data”. The term refers to data that is so large, fast and complex that it is difficult to process using traditional methods of storage and analysis. Sagiroglu (2013) and Magoulas and Lorica (2009) refer to Big Data as a collection of data that is voluminous yet growing exponentially with time. Its size and complexity are such that no traditional data management tool can store or process it efficiently. Big Data has emerged as a significant area of study for researchers in the social sciences. As a subtheme in social science, Big Data covers the various ways to methodically collect, analyse, systematically extract information from, or otherwise deal with data sets that are too large or complex to be handled by traditional data processing techniques.
3.4.1 Characteristics of Big Data
Big Data can be described by the following characteristics: Volume, Variety, Velocity, and Variability.
a) Volume: The name Big Data itself suggests an enormous size. The size of the data plays a crucial role in determining how the data will be processed, analysed and stored. Furthermore, whether a particular data set can be considered Big Data depends on its volume. Hence, 'Volume' is one characteristic that needs to be considered when dealing with Big Data (SAS Institute, 2014; Sagiroglu, 2013).
b) Variety: The next aspect of Big Data is variety. Variety here refers to the heterogeneous sources of the data and the nature of the data. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analysing data (SAS Institute, 2014; Sagiroglu, 2013).
c) Velocity: The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous (Guru99, 2020; SAS Institute, 2014).
d) Variability: This refers to the inconsistency that the data can show at times, which hampers the ability to handle and manage the data effectively (Guru99, 2020; Sagiroglu, 2013).
According to Akoka, Comyn-Wattiau, and Laoufi (2017), SAS Institute (2014), and Sagiroglu (2013), Big Data can be of the following types: Structured, Unstructured, and Semi-structured.
Structured: Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over time, computer science has achieved considerable success in developing techniques for working with this kind of data (where the format is well known in advance) and for deriving value from it. However, issues now arise when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Unstructured: Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of data available to them, but unfortunately they often do not know how to derive value from it, since the data is in its raw or unstructured form.
Semi-structured: Semi-structured data can contain both forms of data. It appears structured in form, but it is not actually defined with, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
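The three types described above can be illustrated with a minimal Python sketch. It is an illustration only, not a method drawn from the cited sources: the survey fields (respondent_id, age, region, note), the sample values, and the helper mean_age are all hypothetical and were invented for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows a fixed format known in advance
# (hypothetical survey columns: respondent_id, age, region).
structured_csv = "respondent_id,age,region\n1,34,North\n2,27,South\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))


def mean_age(records):
    """Average the 'age' column; this works because every row has that field."""
    return sum(int(r["age"]) for r in records) / len(records)


# Semi-structured: the tags give the data a visible shape, but no table
# definition forces every record to carry the same fields.
semi_structured_xml = """
<respondents>
  <respondent id="1"><age>34</age><region>North</region></respondent>
  <respondent id="2"><age>27</age><note>Declined to give region</note></respondent>
</respondents>
"""
root = ET.fromstring(semi_structured_xml)
fields_per_record = [
    sorted(child.tag for child in respondent)
    for respondent in root.findall("respondent")
]

# Unstructured: free text with no predefined fields; even a simple
# measure such as a word count has to be imposed by the analyst.
unstructured_text = "I moved to the north about ten years ago, mostly for work."
word_count = len(unstructured_text.split())

print(mean_age(rows))
print(fields_per_record)
print(word_count)
```

In this sketch the structured rows can be averaged directly because the format is fixed, the XML records have to be inspected one by one because their fields differ, and the free text yields nothing until the analyst decides what to extract from it.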
4.0 Conclusion
Data collection is an integral part of research, without which research cannot be said to be complete or meaningful. Through data collection, investigators gather relevant facts that can be processed into meaningful information. However, the data collection process follows standardized procedures and principles that ensure its reliability and validity. Social researchers rely on data to make inferences about social behaviours, traits, actions and many other events in our complex society.
5.0 Summary
This section has examined the various methods of field work and data collection in the social sciences. Field work is the crux of research in that it is the phase where data is collected to test the assumptions of the study. In social science field work, a variety of measures are employed to select samples and collect data from respondents. Some of the considerations in the selection of samples include accessibility, convenience and the representativeness of the sample in relation to the total population. Data collection also involves the use of quantitative or qualitative methods such as interviews, questionnaires, observation, and experimentation. Data can also be enormous, requiring a higher level of analysis and storage, as in Big Data.