
A dark retweet is defined as a tweet that propagates another tweet but does not use conventional retweeting mechanisms, such as a) Twitter’s proprietary retweeting mechanism, or b) common retweet markers such as “RT” and “via” within the tweet text.

4.4.1 Retweetability Confidence Factors

Differentiating between an original tweet and a dark retweet is non-trivial, because dark retweets lack all of the conventional markers that would identify them as retweets. In this thesis, two factors influence the degree of confidence that a tweet is a retweet:

Factor #1: Confidence that Tweet A is propagating Tweet B. We assume with strong confidence that Tweet A is a retweet of Tweet B if querying Tweet A via the Twitter API returns metadata related to the originating tweet, Tweet B. We assume with moderate confidence that Tweet A is a retweet of Tweet B if Tweet A is not a proprietary retweet but retweet markers such as “RT” or “via” exist within its text. However, this latter assumption is still debatable: does this manually marked, non-proprietary retweet refer to an original tweet that actually exists, i.e., does Tweet B really exist?
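The two-tier rule for Factor #1 can be sketched as a small classifier. This is a minimal sketch, assuming tweets are plain dictionaries shaped like Twitter API v1.1 tweet objects, in which a proprietary retweet carries a `retweeted_status` field; the `Confidence` scale, the `factor1_confidence` name and the marker regex are illustrative assumptions, not the thesis's implementation.

```python
import re
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical ordinal scale for 'this tweet is a retweet'."""
    LOW = 0
    MODERATE = 1
    STRONG = 2


# Conventional markers matched as whole words, case-insensitively, so that
# "RT @bob" and "via @bob" match but "trivia" or "art" do not.
MARKER_RE = re.compile(r"\b(?:rt|via)\b", re.IGNORECASE)


def factor1_confidence(tweet: dict) -> Confidence:
    """Confidence that `tweet` propagates some originating tweet (Factor #1)."""
    # Proprietary retweet: API metadata points directly at the originating tweet.
    if tweet.get("retweeted_status") is not None:
        return Confidence.STRONG
    # Non-proprietary retweet: only a textual marker hints at an originating
    # tweet, whose existence is still an assumption.
    if MARKER_RE.search(tweet.get("text", "")):
        return Confidence.MODERATE
    # No metadata and no marker: at best a dark retweet candidate.
    return Confidence.LOW
```

Matching markers as whole words avoids counting substrings such as "via" inside "trivia", at the cost of missing unusual spellings of the markers.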

Factor #2: Confidence that the originating author can be identified. We assume with strong confidence that Tweet A is a retweet of Tweet B if looking up Tweet A via the Twitter API returns metadata identifying the author of Tweet B. We assume with moderate confidence that Tweet A is a retweet of Tweet B if Tweet A is not a proprietary retweet but contains the username of the perceived author of Tweet B. However, this latter assumption is still debatable: is the manually mentioned username that of the correct author of the originating tweet?
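Factor #2 can be sketched in the same way, this time also returning the perceived author. The sketch assumes simplified tweet dictionaries in which a proprietary retweet's `retweeted_status` holds the original tweet and its `user.screen_name` names the author (as in Twitter API v1.1 tweet objects). As a further assumption, it treats the first @mention in a non-proprietary retweet as the perceived author; `factor2_confidence` and the regex are hypothetical.

```python
from __future__ import annotations

import re
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 0
    MODERATE = 1
    STRONG = 2


# An @mention not preceded by a word character (so e-mail addresses are
# skipped); \w{1,15} approximates Twitter's 15-character screen names.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")


def factor2_confidence(tweet: dict) -> tuple[Confidence, str | None]:
    """Confidence that the originating author can be identified (Factor #2),
    plus that author's screen name when one is found."""
    original = tweet.get("retweeted_status")
    if original is not None:
        # Proprietary retweet: metadata names the originating author.
        return Confidence.STRONG, original["user"]["screen_name"]
    mentions = MENTION_RE.findall(tweet.get("text", ""))
    if mentions:
        # Assumption: the first mention is the perceived originating author,
        # which may not be the real one.
        return Confidence.MODERATE, mentions[0]
    return Confidence.LOW, None
```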

For any given retweet, the less we have to assume about the validity of these two factors, the more confident we can be that the tweet is a retweet.
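One way to make "the less we have to assume" computable is to place both factors on a shared ordinal scale and let the weaker factor cap the overall confidence. Both the min rule and the `assumptions_needed` count below are illustrative assumptions, not definitions taken from the thesis.

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 0
    MODERATE = 1
    STRONG = 2


def retweet_confidence(factor1: Confidence, factor2: Confidence) -> Confidence:
    """Overall confidence that a tweet is a retweet.

    Assumed rule: evidence is only as strong as its weakest link, so the
    weaker of the two factors caps the overall level.
    """
    return min(factor1, factor2)


def assumptions_needed(factor1: Confidence, factor2: Confidence) -> int:
    """How many steps below STRONG the two factors fall in total; a larger
    number means more has to be assumed about the tweet."""
    return (Confidence.STRONG - factor1) + (Confidence.STRONG - factor2)
```

For example, a tweet with strong Factor #1 evidence but only a mentioned username for Factor #2 would come out as `MODERATE` overall, with one assumption step.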

In the following chapters of this thesis, the detection of retweets relies on evidence provided by:

• the metadata relating to a particular tweet, and

• the textual content of that tweet.

Validating each factor involves several levels of difficulty. For Factor #1, if the Twitter API automatically identifies the originating tweet, the assumption is easy to validate.

However, validation becomes harder when no metadata is available, as is the case with non-proprietary retweets. If a non-proprietary retweet mentions another user (presumably the originating author), that user’s timeline can be processed to check whether the originating tweet can be found.
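The timeline search for a non-proprietary retweet can be sketched as follows. This is a minimal sketch, assuming the mentioned user's timeline has already been fetched into a list of dictionaries with `text` and `created_at` (any comparable timestamp, e.g. epoch seconds). The marker-stripping patterns and the exact-substring test after normalization are simplifying assumptions, and `find_original` is a hypothetical name.

```python
from __future__ import annotations

import re

# Assumed marker formats: leading "RT @user:" prefixes (possibly chained)
# and a trailing "via @user" attribution.
PREFIX_RE = re.compile(r"^\s*(?:rt\s+@\w+:?\s*)+", re.IGNORECASE)
VIA_RE = re.compile(r"\s*\bvia\s+@\w+\s*$", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def find_original(retweet: dict, timeline: list[dict]) -> dict | None:
    """Search a mentioned user's timeline for the tweet that `retweet` propagates."""
    # Strip the retweet markers so only the propagated content remains.
    body = PREFIX_RE.sub("", retweet["text"])
    body = normalize(VIA_RE.sub("", body))
    if not body:
        return None
    # Only tweets posted before the retweet can be its origin; check newest first.
    earlier = [t for t in timeline if t["created_at"] < retweet["created_at"]]
    for candidate in sorted(earlier, key=lambda t: t["created_at"], reverse=True):
        if body in normalize(candidate["text"]):
            return candidate
    return None
```

Exact substring matching misses retweets whose text was truncated or edited while being passed on; catching those would require a fuzzier comparison.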

The difficulty is greatest when the tweet text contains no user mentions. Determining the provenance of the originating tweet would then require even more assumptions.

All of the above difficulty levels also apply when validating Factor #2, i.e., determining the originating author.

Copied tweets with no attributions or retweet markers might not be considered retweets, because there is no evidence of an originating tweet (Factor #1) or any identifying information about an originating author (Factor #2). However, several studies have documented tweets that propagate across Twitter without using retweet markers or giving proper attribution to the originating authors (Boyd et al., 2010; Wu et al., 2011; Nagarajan et al., 2010). In Table 4.1, propagating tweets that are not explicitly marked as retweets are considered dark retweets.
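The text does not prescribe how unattributed copies are found at this point. As one illustrative possibility (a swapped-in technique, not the author's method), an unmarked tweet can be compared against earlier tweets by other users using token-set Jaccard similarity. The tokenizer, the `threshold` of 0.8 and the dictionary fields `user`, `text` and `created_at` are all assumptions.

```python
from __future__ import annotations

import re

# Hypothetical tokenizer: lower-cased words, hashtags and mentions.
TOKEN_RE = re.compile(r"[a-z0-9_#@']+")


def tokens(text: str) -> set[str]:
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_copied_source(tweet: dict, earlier_tweets: list[dict],
                       threshold: float = 0.8) -> dict | None:
    """Return the most similar earlier tweet by a different user if its token
    overlap with `tweet` reaches `threshold`; otherwise None."""
    own = tokens(tweet["text"])
    best, best_score = None, 0.0
    for other in earlier_tweets:
        # A copy must come later than its source and from someone else.
        if other["user"] == tweet["user"] or other["created_at"] >= tweet["created_at"]:
            continue
        score = jaccard(own, tokens(other["text"]))
        if score > best_score:
            best, best_score = other, score
    return best if best_score >= threshold else None
```

A high similarity only suggests propagation; it does not by itself establish which of two near-identical tweets is the origin, which is why such matches remain in the lower-confidence dark category.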

Fake retweets may also occur, and in this research work they can only be detected if an originating tweet or author could not be found. The Orphan Rts group in Table 4.1 already encompasses some of these fake retweets, but it only captures proprietary retweets whose originating tweets have been deleted from the system. This research work does not look at fake retweets that were created intentionally. For example, User A may send a retweet of a message supposedly written by User B, when in fact User B never sent such a message. Detecting these fake retweets would require additional computational techniques (natural language processing, pattern matching) and API requests (users’ timelines). Therefore, the detection of intentional fake retweets is beyond the scope of this research.
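The Orphan Rts check described above reduces to a simple test on proprietary retweets. In this sketch, `lookup` is a hypothetical callable standing in for whatever tweet-retrieval step is used (it returns `None` for a tweet that can no longer be retrieved), and a retweet is assumed to be a dictionary whose `retweeted_status` field holds the original tweet, as in Twitter API v1.1 tweet objects.

```python
from typing import Callable, Optional


def is_orphan_rt(tweet: dict, lookup: Callable[[int], Optional[dict]]) -> bool:
    """True if `tweet` is a proprietary retweet whose originating tweet can no
    longer be retrieved.

    `lookup` is a hypothetical retrieval function: it returns the tweet with
    the given ID, or None if that tweet has been deleted.
    """
    original = tweet.get("retweeted_status")
    if original is None:
        # Not a proprietary retweet, so outside what Orphan Rts can capture.
        return False
    return lookup(original["id"]) is None
```

Passing `lookup` in as a parameter keeps the check independent of any particular API client; a dictionary's `.get` method is enough for testing against a local store.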

Replies are also considered within the context of dark retweets. A reply by itself denotes a one-way communication between two or more specific users. If a reply does not explicitly propagate another tweet, it is classified as an original reply. Conversely, if a reply passes on content from a prior tweet, it is considered an implicit retweet. The advantage of using replies is that a user can send a tweet directly to a non-follower. However, the official Twitter API does not classify a reply as a retweet, which makes Factor #1 less concrete. In this study, in the context of tweet propagation, a reply is categorized as a dark retweet if its text does not contain any conventional retweet markers such as “RT” or “via”.
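The reply rule above can be sketched as a small classifier. This sketch assumes replies are identified by a non-null `in_reply_to_status_id` field (as in Twitter API v1.1 tweet objects), and that whether the reply propagates a prior tweet has already been decided elsewhere and is passed in as `propagates`; the label strings are illustrative.

```python
from __future__ import annotations

import re

# Conventional markers matched as whole words, case-insensitively.
MARKER_RE = re.compile(r"\b(?:rt|via)\b", re.IGNORECASE)


def classify_reply(tweet: dict, propagates: bool) -> str | None:
    """Classify a reply by the rule in the text; returns None for non-replies.

    `propagates` says whether the reply passes on content from a prior tweet
    and is assumed to be decided elsewhere.
    """
    if tweet.get("in_reply_to_status_id") is None:
        return None
    if not propagates:
        return "original reply"
    if MARKER_RE.search(tweet.get("text", "")):
        # Explicitly marked, so not dark under the rule in the text.
        return "marked retweet"
    return "dark retweet"
```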

In this thesis, a tweet is considered a dark retweet when there is less confidence that it is a retweet than there is for conventionally made retweets. This confidence is derived from evidence such as a tweet’s metadata and textual content. Referring to the matrix of propagation types in Table 4.1, dark retweets comprise six tweet propagation types, namely RtF (dark), RtnF (dark), P@F (dark), P@nF (dark), @F (dark) and @nF (dark). These groups require more assumptions to be made with respect to Factors #1 and/or #2 before they can be identified as retweets.
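Given tweets already labelled with the propagation types of Table 4.1, the six dark types can be tallied as below. The label strings are copied from the text; `dark_retweet_counts` and `dark_share` are hypothetical helpers, and how each label is assigned is outside this sketch.

```python
from collections import Counter
from typing import Iterable

# The six propagation types from Table 4.1 that count as dark retweets.
DARK_TYPES = frozenset({
    "RtF (dark)", "RtnF (dark)", "P@F (dark)",
    "P@nF (dark)", "@F (dark)", "@nF (dark)",
})


def dark_retweet_counts(labels: Iterable[str]) -> Counter:
    """Tally labelled tweets per dark propagation type; other labels are ignored."""
    return Counter(label for label in labels if label in DARK_TYPES)


def dark_share(labels: Iterable[str]) -> float:
    """Fraction of labelled tweets that fall into any dark type."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(dark_retweet_counts(labels).values()) / len(labels)
```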