

Social media provides users with opportunities for generating, sharing, seeking, and receiving information in the context of multiuser communication (Kaplan & Haenlein, 2010; Maness, 2006; Moorhead et al., 2013). In the context of social media, user statuses and activities are complex. Both quantitative and qualitative research methods have been applied to the


there are four types of measurement scales, namely nominal, ordinal, interval, and ratio, while in qualitative studies, verbal data, observation data, document data, and visual data are collected (F. Gravetter & Forzano, 2015; Maxwell, 2012). Social-media-related studies collect and use specific types of data, including interval data, ratio data, nominal data, ordinal data, text, and multimedia.
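To make the four measurement scales concrete, the following Python sketch tags a variable with its scale and returns only the summary statistics conventionally considered meaningful for that scale: the mode for nominal data, additionally a median for ordinal data, additionally a mean for interval data, and additionally ratios between values for ratio data. The example variables (product type, a 1-5 star rating treated as ordinal, word count) and their values are illustrative assumptions, not data from the studies cited.

```python
from enum import Enum
from statistics import mean, median_low, mode


class Scale(Enum):
    NOMINAL = "nominal"    # unordered categories (e.g., product type)
    ORDINAL = "ordinal"    # ordered categories (e.g., a 1-5 star rating)
    INTERVAL = "interval"  # equal intervals but no true zero
    RATIO = "ratio"        # equal intervals and a true zero (e.g., word count)


def summarize(scale: Scale, values: list) -> dict:
    """Return the summary statistics that are meaningful for the given scale."""
    stats = {"mode": mode(values)}  # the mode is valid on every scale
    if scale in (Scale.ORDINAL, Scale.INTERVAL, Scale.RATIO):
        # median_low always returns an observed value, so it stays valid
        # for ordinal data where averaging two categories makes no sense.
        stats["median"] = median_low(values)
    if scale in (Scale.INTERVAL, Scale.RATIO):
        stats["mean"] = mean(values)  # requires equal intervals
    if scale is Scale.RATIO:
        # Ratios between values are only meaningful when zero is a true zero.
        stats["max_to_min_ratio"] = max(values) / min(values)
    return stats
```

For example, `summarize(Scale.ORDINAL, [5, 3, 4, 5])` reports a mode and a median but deliberately no mean, reflecting the usual caution against averaging ordinal ratings.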

2.2.2.1. Data integration

In general, different types of data reflect different aspects of the objects under study. As mentioned before, different types of data, including numeric data, non-numeric data, text, and multimedia, are usually integrated in one study.

The nature of social media calls for combining different types of data, because on social media different types of data are regularly created together (e.g., online product reviews, videos on YouTube, and online maps). For instance, to detect the factors that affect the helpfulness of product reviews, Mudambi and Schuff (2010) selected six products on Amazon.com and captured all of their reviews. They collected (1) the star rating (1 to 5) that the reviewer gave the product; (2) the total number of people who voted in response to the question, “Was this review helpful to you (yes/no)?” (p. 191); (3) the number of people who voted that the review was helpful; and (4) the word count of the review (Mudambi & Schuff, 2010). Each type of data was used as a separate variable, and product type served as an additional independent variable. All of these variables were then entered into a Tobit regression. This study suggests that different types of data can be analyzed with the same approach. Sometimes, the data


users’ education background from their Twitter profiles and grouped those data into three categories. For some studies, a single data collection approach or data analysis method is not enough. For example, He, Zha, and Li (2013) manually saved data (both textual and numeric) about three pizza chains from Facebook and Twitter. Both the SPSS Clementine text mining tool and NVivo 9 were used for content analysis in the study.
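Mudambi and Schuff (2010) analyzed the review variables described above with a Tobit regression, a model suited to outcomes censored at known limits. The sketch below assumes, for illustration, that helpfulness is measured as the share of voters who found a review helpful (helpful votes divided by total votes), which cannot fall below 0 or rise above 1, and writes out the log-likelihood of a two-limit Tobit model for such an outcome. It is not the authors' code; the row layout of `X` is a hypothetical choice, and the fitting step (maximizing this function numerically, e.g., by minimizing its negative with `scipy.optimize.minimize`) is omitted.

```python
import math
from typing import Sequence


def helpfulness(helpful_votes: int, total_votes: int) -> float:
    """Share of voters who found a review helpful, bounded in [0, 1]."""
    if total_votes <= 0 or not 0 <= helpful_votes <= total_votes:
        raise ValueError("need 0 <= helpful_votes <= total_votes and total_votes > 0")
    return helpful_votes / total_votes


def _norm_cdf(z: float) -> float:
    # Standard normal CDF, written with erfc for accuracy in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _norm_logpdf(z: float) -> float:
    # Log density of the standard normal distribution.
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def tobit_loglik(beta: Sequence[float], sigma: float,
                 X: Sequence[Sequence[float]], y: Sequence[float],
                 lower: float = 0.0, upper: float = 1.0) -> float:
    """Log-likelihood of a two-limit Tobit model censored at [lower, upper].

    Latent model: y* = x'beta + e, e ~ N(0, sigma^2); the observed y equals
    lower if y* <= lower, upper if y* >= upper, and y* otherwise.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, row))  # linear predictor x'beta
        if yi <= lower:
            # Left-censored: P(y* <= lower).
            total += math.log(_norm_cdf((lower - xb) / sigma))
        elif yi >= upper:
            # Right-censored: P(y* >= upper) = 1 - Phi((upper - xb) / sigma).
            total += math.log(0.5 * math.erfc((upper - xb) / (sigma * math.sqrt(2.0))))
        else:
            # Uncensored: normal density of the observed value.
            total += _norm_logpdf((yi - xb) / sigma) - math.log(sigma)
    return total
```

A hypothetical row of `X` might hold an intercept, the star rating, and the word count in hundreds, e.g. `[1.0, 4.0, 1.2]`, with `y` computed by `helpfulness`. Reviews rated helpful by every voter (or by none) contribute through the censored branches rather than the density, which is the reason a Tobit model is used instead of ordinary least squares.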

In addition to data obtained from social media applications, data collected from participants through surveys, questionnaires, or interviews are also used. For instance, Gilbert and Karahalios (2009) collected survey data to supplement the data they obtained from Facebook. They recruited 35 participants and collected data about their Facebook friends, such as the number of messages, intimacy words, mutual friends, and groups in common. Meanwhile, the participants were asked to rate the strength of their friendships on Facebook. Based on these two data collection approaches, they identified 74 predictors of tie strength. The results showed that these predictors distinguished between strong and weak ties with over 85% accuracy on a dataset of more than 2,000 Facebook ties. Ross, Terras, Warwick, and Welsh (2011) investigated how an academic community made use of Twitter. They analyzed more than 4,000 tweets with open coding and text mining to detect user conventions. At the same time, they conducted a small qualitative survey to ascertain users’ attitudes towards using a Twitter-enabled backchannel at conferences. In this study, the data for the quantitative and qualitative analyses were integrated. These studies show that although data collected directly from social media applications reveal some characteristics of certain research objects, they usually do not reflect participants’ perceptions


allow researchers to gain insights into participants’ perceptions and motivations. Therefore, to ensure the validity and reliability of research findings, various data collection approaches are applied, and different types of data are integrated.
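The integration step in studies such as Gilbert and Karahalios (2009) pairs each participant's self-reported rating of a friendship with features collected from the platform for the same friendship. The sketch below joins the two sources on a (participant, friend) key, scores each tie with a simple weighted sum, and reports how often the predicted label matches the surveyed one. The three features, their weights, and the threshold are hypothetical placeholders chosen for illustration; the original study used 74 predictors and its own model.

```python
from dataclasses import dataclass


@dataclass
class TieFeatures:
    """Features gathered from the platform for one (participant, friend) pair."""
    messages: int
    mutual_friends: int
    groups_in_common: int


# Hypothetical weights and threshold, for illustration only.
WEIGHTS = {"messages": 0.05, "mutual_friends": 0.02, "groups_in_common": 0.3}
THRESHOLD = 1.0


def score(f: TieFeatures) -> float:
    """Weighted sum of platform features; higher means a stronger tie."""
    return (WEIGHTS["messages"] * f.messages
            + WEIGHTS["mutual_friends"] * f.mutual_friends
            + WEIGHTS["groups_in_common"] * f.groups_in_common)


def integrate_and_evaluate(platform: dict, survey: dict) -> float:
    """Join platform features with survey labels and return prediction accuracy.

    platform maps (participant, friend) -> TieFeatures;
    survey maps (participant, friend) -> True for a self-reported strong tie.
    Only pairs present in both sources are evaluated (an inner join).
    """
    keys = platform.keys() & survey.keys()
    if not keys:
        raise ValueError("no overlapping (participant, friend) pairs")
    correct = sum((score(platform[k]) >= THRESHOLD) == survey[k] for k in keys)
    return correct / len(keys)
```

Restricting the evaluation to pairs found in both sources mirrors the practical situation in which some survey answers have no matching platform record and vice versa.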

All these examples indicate that different types of data complement each other in research studies. There are a variety of approaches available for collecting different types of data from social media applications.

2.2.2.2. Data collection methods

(1) Publicly available datasets

Many datasets containing social media data are already accessible to researchers (Paltoglou, 2014). These datasets contain various types of information, such as reviews, comments, and tweets. One example is the ICWSM Spinn3r Dataset (Burton et al., 2009; Burton & Soboroff, 2011), which contains several million blog posts scraped by Spinn3r. In addition, Blitzer, Dredze, and Pereira (2007) offered a new dataset of Amazon product reviews for four types of products: books, DVDs, electronics, and kitchen appliances. Paltoglou, Thelwall, and Buckley (2010) provided two datasets for textual sentiment analysis. One dataset consists of posts from the BBC Message Boards, which contain users’ opinions about “ethical, religious and news-related issues” (Paltoglou et al., 2010). The other dataset includes information from a social network site, Digg. Both datasets have been used in research studies (Chmiel et al., 2011; Mitrović, Paltoglou, & Tadić, 2011; Thelwall, Buckley, & Paltoglou, 2011).
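Public review collections such as the one released by Blitzer et al. (2007) are organized by product type. Each release has its own file format, so the sketch below assumes a simplified, hypothetical JSON Lines layout (one review per line with `domain`, `rating`, and `text` fields) and groups the reviews by domain. It would need to be adapted to the actual format of whichever dataset is used.

```python
import json
from collections import defaultdict
from typing import Iterable


def group_by_domain(lines: Iterable[str]) -> dict:
    """Group JSON Lines review records by their 'domain' field.

    Assumes a hypothetical layout: {"domain": ..., "rating": ..., "text": ...}.
    Blank lines are skipped; malformed lines raise json.JSONDecodeError.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate trailing or separating blank lines
        record = json.loads(line)
        groups[record["domain"]].append(record)
    return dict(groups)
```

Because the function accepts any iterable of strings, it can read a file lazily, for example `with open("reviews.jsonl", encoding="utf-8") as fh: groups = group_by_domain(fh)`, where the file name is likewise hypothetical.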


Many dedicated tools have been created for collecting social media data. Some of these tools are browser extensions or plugins, such as the Firefox extension Greasemonkey and the Chrome plugin NCapture. Gilbert and Karahalios (2009) used Greasemonkey to randomly select participants’ Facebook friends. Greasemonkey also enabled the researchers to add survey questions to users’ personal Facebook homepages, which prompted participants to rate the tie strength between themselves and their friends. NCapture can capture Facebook wall posts and comments, Twitter content, and LinkedIn group discussions. However, it must be used with NVivo for content analysis and cannot work independently. In addition to extensions and plugins, a variety of software is available. NodeXL, an add-on to Excel, offers another approach to obtaining social media data. It can access and gather data from Outlook, Twitter, Facebook, Flickr, and YouTube (Hansen, Shneiderman, & Smith, 2010). For Facebook, it can collect fan lists, group discussion content, and the timeline data of certain users. For Twitter, it can capture both user networks and tweets. Apart from user networks, NodeXL can also obtain video information from YouTube and tags from Flickr. This tool is widely applied in social network analysis.
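Social network analysis of user networks such as those gathered by NodeXL usually starts from node-level metrics. As a rough illustration of one such metric, the sketch below computes normalized degree centrality from an undirected edge list, assuming the network is available as (source, target) pairs of user names; the format and names are assumptions, and the sketch does not reproduce NodeXL's own calculations.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected, simple graph.

    Duplicate edges (in either direction) and self-loops are ignored, and
    only nodes that appear in at least one remaining edge are reported.
    Each degree is divided by (n - 1), the maximum possible degree.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}
```

Storing each edge as a `frozenset` treats ("alice", "bob") and ("bob", "alice") as the same friendship, which suits reciprocal ties such as Facebook friendships but not directed relations such as Twitter follows.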

Facebook, Twitter, Yahoo! Answers, and some other social media platforms allow researchers to access and obtain data from their databases through APIs (Jansen, Zhang, Sobel, & Chowdury, 2009; Li, Lei, Khadiwala, & Chang, 2012). In addition to these social media applications, some online services also provide access to social media data. Jansen et al. (2009) used Summize⁴ to collect tweets. Summize was a service for searching tweets, and it also


APIs usually need to be used with programming tools such as Python or R (Meyer, Hornik, & Feinerer, 2008; M. A. Russell, 2013). Python and R are two of the most popular open-source tools and are frequently used to write scripts that collect data from social media. For example, Lipizzi, Iandoli, and Ramirez Marquez (2015) created a Python script to download tweets over time and store them in their own database.
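A collection script of the kind described by Lipizzi et al. (2015) repeats a request over time and stores the results locally. The sketch below shows only the storage side, using Python's built-in `sqlite3` module: `fetch_page` is a hypothetical stand-in for a real API client (no actual Twitter endpoint is called), and tweets are deduplicated by ID so that repeated polling does not store the same tweet twice. The table layout is likewise an assumption.

```python
import sqlite3
import time
from typing import Callable, Iterable, List


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with a tweets table keyed by tweet ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, created_at TEXT, author TEXT, text TEXT)"
    )
    conn.commit()
    return conn


def store(conn: sqlite3.Connection, tweets: Iterable[dict]) -> int:
    """Insert tweets, skipping IDs already stored; return how many were added."""
    before = conn.total_changes
    with conn:  # commit the whole batch as one transaction
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, author, text) "
            "VALUES (:id, :created_at, :author, :text)",
            list(tweets),
        )
    return conn.total_changes - before


def collect(conn: sqlite3.Connection, fetch_page: Callable[[], List[dict]],
            rounds: int, interval_seconds: float = 0.0) -> int:
    """Call a (hypothetical) fetch function repeatedly and store new tweets."""
    added = 0
    for i in range(rounds):
        added += store(conn, fetch_page())
        if i < rounds - 1:
            time.sleep(interval_seconds)  # wait before polling again
    return added
```

Passing a file path instead of `":memory:"` to `init_db` keeps the collected tweets between runs, which is what makes collection "over time" possible.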

Browser extensions/plugins and software such as NodeXL have more user-friendly interfaces and are easier for researchers to use than Python and R. However, they are not as flexible or powerful as the latter. APIs also have limitations. For instance, Twitter allows only one hundred API calls per hour per account. Therefore, each approach has its own strengths and weaknesses.
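A per-account limit such as one hundred calls per hour (on average one call every 36 seconds) shapes how collection scripts are written. The sketch below is a sliding-window limiter that permits at most `max_calls` requests in any rolling `window`-second period, defaulting to the figure cited above. It only decides whether a call may proceed and does not contact any API; the clock is injectable so that the behavior can be checked without waiting an hour.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of `window` seconds."""

    def __init__(self, max_calls: int = 100, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now: float) -> None:
        # Drop timestamps that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if the limit allows it, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])
```

A collection loop would call `try_acquire()` before each request and, when it returns False, sleep for `wait_time()` seconds. A sliding window is used here rather than a fixed hourly counter so that a burst at the end of one hour and the start of the next cannot exceed the limit.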