
For this thesis I have chosen to use data from the microblogging site Twitter to represent consumer sentiment. Particular attention is paid to how this data was collected, because the collection method was developed specifically for this thesis and should therefore be transparent.

The social media platform Twitter is currently the most popular microblogging site in the world. On Twitter, users have 140 characters to express themselves to their ‘followers’ and the rest of the world. There are currently over 284 million active users each month, and over 500 million tweets are sent each day (Twitter Inc., 2014). Tweets are public by default; they are seen by a user’s followers and can be found by anyone searching for a term that the user has written about. It is also possible to ‘retweet’ what other users have written, that is, to share another user’s tweet on one’s own Twitter feed.

Twitter is known to be heavily populated by consumer opinions, and has been used to analyze both customer and consumer sentiment in several studies (Chamlertwat, Bhattarakosol, Rungkasiri, & Haruechaiyasak, 2012; He et al., 2013; Mostafa, 2013; Pak & Paroubek, 2010). Part of the reason for this is that Twitter, as opposed to other social media platforms, has given developers access to parts of its Application Programming Interface (API). This makes the data more accessible than that of other social networks such as Facebook, Instagram and Snapchat, which are also in large part picture and video based. By signing up as a third-party developer, anyone can therefore access a selection of contemporary tweets, within the confines of what Twitter has found appropriate (Twitter Inc., 2014). This has made Twitter a particularly interesting avenue of research for academia, and is much of the reason why this platform has been used to find consumer data for this thesis.6

5 It could be argued that this data can be skewed towards either the negative or the positive. If a customer has an unresolved issue, it might motivate the customer to write a very negative message despite having had a pleasant interaction with customer service or a positive impression of the company as a whole. Similarly, the data could be skewed towards the positive: if a customer has not had a problem solved but, because of a pleasant interaction with a customer service agent, believes it will be resolved, the customer might answer positively regardless of the actual outcome of the situation.

In order to archive results from Twitter, I first had to obtain a developer license to gain access to the Twitter API. Within this API I created a search string containing the key phrase “Telenor”. Further, as my focus is on Telenor Norway, I limited the search to Norwegian tweets by setting the language to “NO” (the ISO 639-1 code for Norwegian). This query creates a stream of tweets that is automatically updated every hour. However, this data is still in a data interchange format. Twitter uses an open standard called JSON, a format that uses human-readable text to transmit data objects (JSON.org, 2014). Even though JSON is one of the more readable data interchange formats, it cannot be placed directly into the text analysis software at hand. Therefore I have utilized a script that converts JSON into a standard spreadsheet format (xls/csv).7 This gives me the textual information of the tweet as well as other metadata in a format that is easy to import into the analytical software.
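To make the conversion step concrete, the following is a minimal Python sketch of how a JSON array of tweets could be flattened into CSV. It is not the script used in this thesis. The selection of columns is an assumption made for illustration, and the field names (id_str, created_at, lang, text, user.screen_name) are assumed to follow the tweet objects returned by the Twitter API at the time.

```python
import csv
import io
import json

# Assumed column selection for illustration; not taken from the thesis script.
COLUMNS = ["id_str", "created_at", "screen_name", "lang", "text"]


def tweet_to_row(tweet):
    """Flatten one tweet object (a dict) into a flat spreadsheet row."""
    user = tweet.get("user", {})
    return {
        "id_str": tweet.get("id_str", ""),
        "created_at": tweet.get("created_at", ""),
        "screen_name": user.get("screen_name", ""),
        "lang": tweet.get("lang", ""),
        # Line breaks inside a tweet are replaced so each tweet stays on one row.
        "text": tweet.get("text", "").replace("\n", " "),
    }


def tweets_json_to_csv(json_text):
    """Convert a JSON array of tweets into CSV text with a header row."""
    tweets = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet))
    return buffer.getvalue()


# Invented sample tweet, used only to exercise the functions above.
sample = json.dumps([
    {
        "id_str": "1",
        "created_at": "Tue Nov 25 10:00:00 +0000 2014",
        "lang": "no",
        "text": "Telenor har god dekning",
        "user": {"screen_name": "example_user"},
    }
])
csv_text = tweets_json_to_csv(sample)
```

The nested user object is flattened into a single screen_name column because spreadsheet formats have no notion of nested fields, which is the main structural difference between JSON and xls/csv.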

The collection of tweets began on 25 November 2014 and ended on 27 March 2015. This gathered all tweets in Norwegian that mentioned the word “Telenor” in this time period. After removing irrelevant tweets,8 the remaining dataset contains 5440 tweets on the subject of Telenor.
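The removal of irrelevant tweets can be illustrated with a short Python sketch. It assumes that each tweet is stored as a row with a screen_name column (a hypothetical column name) and that bot or spam accounts have been flagged by hand. The thesis does not specify how such accounts were identified, so the flagged list and the sample rows below are invented for illustration.

```python
# Company accounts to exclude; only @telenor_service is named in the thesis.
COMPANY_ACCOUNTS = {"telenor_service"}

# Hypothetical, manually flagged bot or spam accounts (invented example).
FLAGGED_ACCOUNTS = {"deal_bot_123"}


def is_relevant(row, company_accounts, flagged_accounts):
    """Return True if the tweet was written by neither the company nor a flagged account."""
    name = row["screen_name"].lstrip("@").lower()
    return name not in company_accounts and name not in flagged_accounts


def filter_tweets(rows, company_accounts, flagged_accounts=frozenset()):
    """Keep only the rows that may contain consumer feedback."""
    return [
        row for row in rows
        if is_relevant(row, company_accounts, flagged_accounts)
    ]


# Invented sample rows, used only to exercise the filter.
rows = [
    {"screen_name": "telenor_service", "text": "Hei! Send oss en DM."},
    {"screen_name": "ola_nordmann", "text": "Telenor er nede igjen"},
    {"screen_name": "deal_bot_123", "text": "Telenor tilbud! Klikk her"},
]
kept = filter_tweets(rows, COMPANY_ACCOUNTS, FLAGGED_ACCOUNTS)
```

Filtering on the account name rather than on the tweet text keeps the rule simple and auditable, at the cost of requiring the list of excluded accounts to be maintained manually.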

There are of course many ways to retrieve, store and analyze textual data from a platform such as Twitter. The method used here is specifically designed to create data compatible with the Provalis Research Suite, so that the tweets are not only retrieved and stored, but can also be analyzed with the same software application as used on the NPS customer feedback data.

6 Academic research has already found that the platform can fairly accurately predict the stock market (Bollen et al., 2011) and can function as a real-time earthquake detector (Sakaki, Okazaki, & Matsuo, 2010).

7 For more on the script used, visit https://tags.hawksey.info/. Note that I have also modified this script to perform a Norwegian-language search.

8 Tweets from Telenor’s own accounts (@telenor_service etc.) were deleted from the dataset, as this thesis is focused on the consumer’s sentiment, and not the company’s. Tweets that were automatically generated by Twitterbots or other spamming accounts were also removed, as they cannot be said to contain consumer feedback and are therefore irrelevant in this context.

These two sources of textual data (NPS and tweets) are in a sense complementary. Both are textual feedback on a company; they are usually fairly short and colloquial, and often contain a positive or negative sentiment. As data they provide much more detailed and vivid information than common surveys, as free-form textual data can be about anything, from customer service to the company as a whole or its services.

However, unstructured text is also difficult to handle. One central aspect is the messiness and ambiguity of written colloquial text. It can often be riddled with spelling errors, jargon and slang, or even be meant ironically, which can be difficult to pick up on. This can also make it difficult to find all cases on the same topic if they are written completely differently. However, much of this is less problematic than it used to be, due to advancements in text mining software, which can now easily build dictionaries and word categorizations that include common misspellings or slang. So despite the complexities of textual data, it can still be considered a rich source of information. Having discussed textual data as a source, I will next discuss the validity and reliability of this thesis.
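The idea of a categorization dictionary that absorbs misspellings and slang can be sketched in a few lines of Python. This is an illustration of the general technique only, not of how the Provalis Research Suite implements it. The categories and the Norwegian word variants below are invented examples and are not taken from the dictionaries used in this thesis.

```python
import re

# Hypothetical dictionary: each category lists accepted spellings,
# including common misspellings and informal variants (invented examples).
CATEGORY_DICTIONARY = {
    "coverage": {"dekning", "dekking", "dekninga"},
    "customer_service": {"kundeservice", "kundeservise", "kundesenter"},
}


def categorize(text, dictionary=CATEGORY_DICTIONARY):
    """Return the sorted list of categories whose variants occur in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(
        category
        for category, variants in dictionary.items()
        if tokens & variants
    )


# Invented sample texts, used only to exercise the function.
first = categorize("Elendig dekking hos Telenor i dag")
second = categorize("Takk til kundeservise, og bra dekning!")
```

Because the misspelled forms map to the same category as the correct spelling, differently written messages on the same topic end up grouped together, which is the property the paragraph above relies on.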
