SUMMARY IN SPANISH
1. HEALTH AND HEALTH SYSTEMS.
1.2. The determinants of health.
Social media provides users with opportunities for generating, sharing, seeking, and
receiving information in the context of multiuser communication (Kaplan & Haenlein, 2010;
Maness, 2006; Moorhead et al., 2013). In the context of social media, user status and activities
are complicated. Both quantitative and qualitative research methods have been applied to the
there are four different types of measurement scales including nominal, ordinal, interval and
ratio, while in qualitative studies, verbal data, observation data, document data and visual data
are collected (F. Gravetter & Forzano, 2015; Maxwell, 2012). In the social-media-related
studies, specific types of data are collected and utilized, including interval data, ratio data,
nominal data, ordinal data, text, and multimedia.
2.2.2.1. Data integration
In general, different types of data reflect characteristics of research objects from
different aspects. As mentioned before, different types of data, including numeric data, non-
numeric data, text and multimedia, are usually integrated in one study.
The nature of social media calls for the combination of different types of data, because
in social media different types of data are created together regularly (e.g. online product
reviews, videos on YouTube, and online maps). For instance, in order to detect the factors that
impact the helpfulness of product reviews, Mudambi and Schuff (2010) selected six products on
Amazon.com and captured all the reviews. They collected: (1) the star rating (1 to 5) the
reviewer gave the product; (2) the total number of people that voted in response to the
question, “Was this review helpful to you (yes/no)?” (p. 191); (3) the number of people who
voted that the review was helpful; and (4) the word count of the review (Mudambi & Schuff,
2010). Different data were used as different variables. In addition, product type was another
independent variable. Then all these variables were utilized in the Tobit regression. This study
implies that different types of data can be analyzed by the same approach. Sometimes, the data
users’ education background from their Twitter profile and categorized those data into three
categories. For some studies, one data collection approach or one data analysis method is not
enough. For example, He, Zha, and Li (2013) manually saved the data (both textual and numeric
data) from Facebook and Twitter about three pizza chains. Both the SPSS Clementine text mining
tool and NVivo 9 were used for content analysis in the study.
In addition to data obtained from social media applications, data collected from
participants by survey, questionnaire or interview are also utilized. For instance, Gilbert and
Karahalios (2009) collected survey data to supplement the findings drawn from data
obtained from Facebook. They recruited 35 participants and collected the data of their friends on
Facebook, such as the number of messages, intimacy words, mutual friends, groups in common,
and so on. Meanwhile, the participants were asked to rate the strength of their friendships on
Facebook. Based on those two data collection approaches, they identified 74 predictors of tie
strength. The results showed that those predictors successfully predicted strong and weak ties
over 85% of the time with a certain data set containing more than 2000 Facebook posts. Ross,
Terras, Warwick, and Welsh (2011) investigated the ways in which an academic community made
use of Twitter. They analyzed more than 4000 tweets with open coding and text mining to
detect user conventions. At the same time, they undertook a small qualitative survey to
ascertain users’ attitudes towards using a Twitter-enabled backchannel at conferences. In this
study, the data for both quantitative and qualitative analyses were integrated. These studies
show that although data collected directly from social media applications reveal some
characteristics of certain research objects, they usually do not reflect participants’ perceptions
allow researchers to gain insights into participants’ perceptions and motivations. Therefore, to
ensure the validity and reliability of research findings, various data collection approaches are
applied, and different types of data are integrated.
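
As a minimal sketch of this kind of integration, the following Python fragment joins platform-derived measures with survey ratings on a shared key. The field names (`pair_id`, `messages`, `mutual_friends`, `rating`) and the sample values are hypothetical and are not taken from any of the cited studies.

```python
def integrate(platform_rows, survey_rows, key="pair_id"):
    """Join platform-derived records with survey records that share a key.

    Only pairs present in both sources are kept, so every merged record
    carries both the behavioural measures and the self-reported rating.
    """
    survey_by_key = {row[key]: row for row in survey_rows}
    merged = []
    for row in platform_rows:
        match = survey_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical sample data, for illustration only.
platform = [
    {"pair_id": "p1-f1", "messages": 42, "mutual_friends": 17},
    {"pair_id": "p1-f2", "messages": 3, "mutual_friends": 2},
    {"pair_id": "p2-f1", "messages": 11, "mutual_friends": 5},
]
survey = [
    {"pair_id": "p1-f1", "rating": 0.9},
    {"pair_id": "p1-f2", "rating": 0.2},
]

combined = integrate(platform, survey)
```

Pairs without a survey response (here `p2-f1`) are dropped rather than imputed; whether that is appropriate depends on the research design.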
All these examples indicate that different types of data complement each other in
research studies. There are a variety of approaches available for collecting different types of
data from social media applications.
2.2.2.2. Data collection methods
(1) Publicly available datasets
So far, a good number of datasets containing social media data are accessible to researchers
(Paltoglou, 2014). These datasets contain various types of information, such as reviews,
comments, tweets, and so on. A particular example is the ICWSM Spinn3r Dataset (Burton et
al., 2009; Burton & Soboroff, 2011). This dataset contains several million blog posts scraped by
Spinn3r. In addition, Blitzer, Dredze, and Pereira (2007) offered a new dataset
of Amazon product reviews for four types of products: books, DVDs,
electronics, and kitchen appliances. Paltoglou, Thelwall, and Buckley (2010)
provided two datasets for textual sentiment analysis in their paper. One dataset consisted of
information from the BBC Message Boards, which contained users’ opinions about “ethical,
religious and news-related issues” (Paltoglou et al., 2010). The other dataset included
information from a social network site, Digg. Both datasets have been used in research studies
(Chmiel et al., 2011; Mitrović, Paltoglou, & Tadić, 2011; Thelwall, Buckley, & Paltoglou, 2011).
Many tools have been created for collecting social media data. Some of these
tools are browser extensions or plugins, such as the Firefox
extension Greasemonkey and the Chrome plugin NCapture. Gilbert and Karahalios (2009)
used Greasemonkey to randomly select participants’ Facebook friends.
Greasemonkey also enabled the researchers to add survey questions to users’
personal Facebook homepages, through which participants rated the strength of the
ties between themselves and their friends. NCapture captures
Facebook wall posts and comments, Twitter content, and LinkedIn group discussions. However,
this tool cannot work independently; it must be used with NVivo for content analysis. In
addition to extensions and plugins, a variety of software is available. NodeXL, an add-on to
Excel, offers another approach to obtaining social media data. It is capable of accessing and
gathering data from Outlook, Twitter, Facebook, Flickr and YouTube (Hansen, Shneiderman, &
Smith, 2010). For Facebook, it is able to collect fan lists, group discussion content, and timeline
data of certain users. For Twitter, it can capture the data of both user networks and tweets.
Apart from user networks, video information on YouTube and tags on Flickr can be obtained by
NodeXL as well. This tool is widely applied to social network analysis.
Facebook, Twitter, Yahoo! Answers, and some other social media platforms allow researchers to
access and obtain the data in their databases through APIs (Jansen, Zhang, Sobel, & Chowdury,
2009; Li, Lei, Khadiwala, & Chang, 2012). In addition to these social media applications, some
online services also provide access to social media data. Jansen et al. (2009) used
Summize to collect tweets. Summize was a service for searching tweets, and it also
APIs must be used together with tools such as Python or R (Meyer, Hornik, & Feinerer,
2008; M. A. Russell, 2013). Python and R are two of the most popular open source tools and
are frequently used to create scripts for collecting data from social media. For example,
Lipizzi, Iandoli, and Ramirez Marquez (2015) created a Python script to download tweets over
time and store them in their own database.
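
A script of this general kind might be sketched as follows. This is not the script used in the cited study: the table layout, the field names, and the `fake_fetch` function (a stand-in for a real API request) are all assumptions made for illustration. The sketch only shows how repeated downloads can be stored in a local SQLite database without duplicating posts.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a table for collected posts and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    return conn


def count_rows(conn):
    """Return the number of posts currently stored."""
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


def store_batch(conn, batch):
    """Insert a batch of posts, skipping ids already stored; return rows added."""
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, created_at, text) VALUES (?, ?, ?)",
        [(t["id"], t["created_at"], t["text"]) for t in batch],
    )
    conn.commit()
    return count_rows(conn) - before


def fake_fetch():
    """Hypothetical stand-in for a real API request (returns fixed sample posts)."""
    return [
        {"id": "1", "created_at": "2015-01-01T10:00:00", "text": "first post"},
        {"id": "2", "created_at": "2015-01-01T10:05:00", "text": "second post"},
    ]


conn = open_store()
first_run = store_batch(conn, fake_fetch())
second_run = store_batch(conn, fake_fetch())  # same posts again: nothing new
stored = count_rows(conn)
```

Using the post id as the primary key means the collection step can be rerun on a schedule without creating duplicate rows.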
Browser extensions/plugins and software such as NodeXL have more user-friendly
interfaces and are easier for researchers to use than Python and R. However, they are not as
flexible or powerful as the latter. APIs also have limitations. For instance, Twitter allows only
one hundred API calls per hour per account. Therefore, all the approaches have their own
strengths and weaknesses.
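
A collection script can work within a call limit like the one mentioned above by counting its own requests. The class below is a minimal sketch under assumed parameters: the default of 100 calls per 3600 seconds mirrors the figure quoted in the text, and the class name and its `clock` and `sleep` arguments are hypothetical, introduced here so that the logic can be exercised without actually waiting.

```python
import time


class HourlyQuota:
    """Block until a call is allowed under a fixed calls-per-window quota."""

    def __init__(self, max_calls=100, window_seconds=3600,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.window_start = None
        self.calls = 0

    def acquire(self):
        """Reserve one call, waiting for the next window if the quota is spent."""
        now = self.clock()
        # Start a fresh window on first use or once the current one has elapsed.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.calls = 0
        # Quota spent: wait out the remainder of the window, then reset.
        if self.calls >= self.max_calls:
            self.sleep(self.window_seconds - (now - self.window_start))
            self.window_start = self.clock()
            self.calls = 0
        self.calls += 1
```

A script would call `acquire()` on an `HourlyQuota` instance immediately before each API request. Actual limits and the way they are enforced differ between services and change over time, so the numbers should be taken from the service's current documentation.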