This section describes a range of studies that use Twitter to look for expression of the views of the public. Most studies focus on a single topic area of public discussion such as politics, earthquake detection or influenza detection. There is a growing number of studies which use Twitter as a source of information about the views of the public. Twitter can “provide access to thoughts, intentions and activities of millions of users in real-time” (Phelan, McCarthy, & Smyth, 2009, p. 385).

An important aspect of Twitter is that anyone with access to it can express their views, and although some users have more influence than others, “a trend can be initiated by anyone, and if the environment is right, it will spread” (Cha, Haddadi, Benevenuto, & Gummadi, 2010, p. 11).

One of the attributes of the Twitter environment is competition for attention, which Romero, Galuba, Asur, and Huberman (2010) describe as follows: “ideas, opinions, and products compete with all other content for the scarce attention of the user community” (Romero et al., 2010, p. 1). They go on to say that “in spite of the seemingly chaotic fashion with which all these interactions take place, certain topics manage to get an inordinate amount of attention, thus bubbling to the top in terms of popularity and contributing to new trends and to the public agenda of the community” (p. 1).

Yardi and Boyd (2010a) suggest that one factor by which topics are spread is that people enjoy spreading news, especially if it is new and interesting. “Indeed, the large spike and subsequent decays in tweets following immediately after any event breaks out on Twitter suggests that people enjoy spreading news that are novel and popular” (Yardi & Boyd, 2010a, p. 325). This supports the earlier study by D. Zhao and Rosson (2009), which showed that people use Twitter because it provides an informal way to keep up with friends and colleagues: “Real-time information posted through microblogging is considered a quick and interesting source of news. It can also provide valuable context information that may prompt catching-up conversations with distant friends and colleagues” (D. Zhao & Rosson, 2009, p. 247).

Because Twitter provides access to the individual public messages (tweets)4 that make up a conversation, it can be used not just to see what topics are popular, but to see what individuals are saying:

This type of usage has tremendous potential for social and behavioral sciences researchers, in terms of seeing both what topics are reverberating with the public as well as what Twitter users are actually saying about the topic. Twitter gives the ability to track both the subject and content of conversations (Ovadia, 2009, p. 203).

Ovadia (2009) found that Twitter can act as a filter that focuses discourse about a conference: “Whereas most conference Websites will display some representation of what was discussed without much filtering or analysis, searching tweets will connect users to content deemed interesting enough to tweet about” (Ovadia, 2009, p. 204). This was confirmed by a study of conference tweets by Letierce et al. (2010), who looked at “how researchers use it for spreading information” (Letierce et al., 2010, p. 3). Their study “showed that studying streams of scientific conferences provide means to figure out trend topics of the event” (p. 8) and that “people use Twitter as a background communication channel during conferences, focussing mainly on other people attending the conference” (p. 6). Although they found that the main focus is on other people at the conference, they also “believe that Twitter has this potential to help the erosion of boundaries between researchers and a broader audience” (Letierce et al., 2010, p. 1).

Boyd et al. (2010)5 looked at how people use retweets as a form of conversation on Twitter. They describe retweeting as “the Twitter-equivalent of email forwarding where users post messages originally posted by others” (Boyd et al., 2010, p. 1). They stress the importance of retweeting in helping to provide a sense of community on Twitter:

While retweeting can simply be seen as the act of copying and rebroadcasting, the practice contributes to a conversational ecology in which conversations are composed of a public interplay of voices that give rise to an emotional sense of shared conversational context. (Boyd et al., 2010, p. 1)

Retweeting allows people to “have a sense of being surrounded by a conversation, despite perhaps not being an active contributor” (Boyd et al., 2010, p. 1).

4 It is not possible to access any private tweets (private messages) that might also be part of the conversation.

5 Note: danah michele boyd asks that people use lower case for her name (http://www.danah.org/).

They also argue that “as more scholars begin examining Twitter, it is important to have a grounded understanding of the core practices” (Boyd et al., 2010, p. 1).

Boyd et al. (2010) collected three datasets. The first dataset was a “random sample of 720,000 tweets captured at 5-minute intervals from the public timeline over the period 1/26/09-6/13/09” (Boyd et al., 2010, p. 3). They used this to identify the proportion of different types of tweets in the dataset and found that:

• 36% of tweets mention a user in the form ‘@user’; 86% of tweets with @user begin with @user and are presumably a directed @reply

• 5% of tweets contain a hashtag (#) with 41% of these also containing a URL

• 22% of tweets include a URL (‘http:’)

• 3% of tweets are likely to be retweets in that they contain ‘RT’, ‘retweet’ and/or ‘via’ (88% include ‘RT’, 11% include ‘via’ and 5% include ‘retweet’)
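Feature counts of the kind listed above can be approximated with simple pattern matching. The following is a minimal sketch in Python and is not the procedure used by Boyd et al. (2010): the regular expressions, the function names `tweet_features` and `feature_proportions`, and the case-insensitive whole-word matching of ‘RT’, ‘retweet’ and ‘via’ are all assumptions made for illustration.

```python
import re

# Illustrative patterns only (assumptions, not the patterns used in the study).
MENTION = re.compile(r"@\w+")
DIRECTED_REPLY = re.compile(r"\s*@\w+")  # the tweet begins with @user
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"http:", re.IGNORECASE)  # the 'http:' marker quoted in the text
RETWEET = re.compile(r"\b(?:RT|retweet|via)\b", re.IGNORECASE)


def tweet_features(text):
    """Return a dict of booleans describing which features a tweet contains."""
    return {
        "mention": bool(MENTION.search(text)),
        "directed_reply": bool(DIRECTED_REPLY.match(text)),
        "hashtag": bool(HASHTAG.search(text)),
        "url": bool(URL.search(text)),
        "retweet": bool(RETWEET.search(text)),
    }


def feature_proportions(tweets):
    """Return the fraction of tweets that contain each feature."""
    tweets = list(tweets)
    if not tweets:
        return {}
    totals = {}
    for text in tweets:
        for name, present in tweet_features(text).items():
            totals[name] = totals.get(name, 0) + int(present)
    return {name: count / len(tweets) for name, count in totals.items()}


if __name__ == "__main__":
    sample = [
        "@alice thanks!",
        "RT @bob: big news http://example.com #news",
        "just had lunch",
    ]
    print(feature_proportions(sample))
```

A heuristic like this will misclassify some tweets. For example, ‘via’ used as an ordinary word is counted as a retweet marker, which is consistent with describing such tweets only as “likely to be retweets”.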

Because the proportion of retweets in the random sample was quite low, Boyd et al. (2010) created a second data set, a “random sample of 203,371 retweets captured from the Twitter public timeline using the search API over the period 4/20/09-6/13/09” (p. 4), collected using “explicit queries for retweets of the form ‘RT’ and ‘via’” (p. 4). They used this to identify the proportion of retweets with particular features and found:

• 18% of retweets contain a hashtag

• 52% of retweets contain a URL

• 11% of retweets contain an encapsulated retweet (RT @user1 RT @user2 ...message..)

• 9% of retweets contain an @reply that refers to the person retweeting the post

(Boyd et al., 2010, p. 4)

Based on these results, they concluded that “compared to the random sample of tweets, hashtag usage and linking are overrepresented in retweets” (Boyd et al., 2010, p. 4). The 9% of retweets that contain a reference to the person retweeting the post are cases in which “A retweets B when B’s message refers to A. We call these ‘ego retweets’” (Boyd et al., 2010, p. 4).

Boyd et al. (2010) studied a third data set containing “qualitative comments on Twitter practices stemming from responses we received to a series of questions on @zephoria’s public Twitter account, which has over 12,000 followers” (p. 4) asking:

• “What do you think are the different reasons for why people RT something?” [99 responses]

• “If, when RTing, you alter a tweet to fit under 140 chars, how do you decide what to alter from the original tweet?” [96 responses]

• “What kinds of content are you most likely to retweet? (Why?)” [73 responses]

(Boyd et al., 2010, p. 4)

While they acknowledge that “the responses we received from this convenience sample are not representative of all Twitter users nor do they reflect all possible answers” (Boyd et al., 2010, p. 4), they were able to obtain some interesting results. Some of the motivations for retweeting found in responses to danah boyd’s (@zephoria) questions include the following (all from Boyd et al., 2010, p. 6):

• “To amplify or spread tweets to new audiences”

• “entertain or inform”

• “to comment”

• “to make one’s presence as a listener visible”

• “to publicly agree”

• “validate others’ thoughts”

• “an act of friendship, loyalty, or homage”

• “recognize or refer to less popular people or less visible content”

• “self-gain”

• “save tweets for future personal access”

Boyd et al. (2010) found that “not all retweets are an accurate portrayal of the original message” (p. 9) and that “conversations on Twitter can sometimes take the form of a glorified game of ‘Broken Telephone’ as individuals whisper what they remember to their neighbor and the message is corrupted as it spreads” (p. 10). However, most messages are not corrupted and “retweets can knit together tweets and provide a valuable conversational infrastructure” (p. 7). On Twitter, “rather than participating in an ordered exchange of interactions, people instead loosely inhabit a multiplicity of conversational contexts at once” (Boyd et al., 2010, p. 10).

Letierce et al. (2010) found that, for each of the three Semantic Web scientific conferences they studied, the proportion of tweets that were repeated (retweets) rather than original was between 15% and 20%. They note that this is much higher than the 3% found by Boyd et al. (2010) (Letierce et al., 2010, p. 3).

Having considered these general studies of how people communicate on Twitter, I now move on to research that looks at specific topic areas, the first of which is politics.

Politics

Politics has been an active area of study of expression of the views of the public on Twitter, with researchers looking at predicting election outcomes, using Twitter as an alternative to polling, using public mood states on Twitter to predict stock market movements and investigating how public conversations about a controversial issue play out on Twitter.

In a study of the 2009 German federal election, Tumasjan et al. (2010) found that even a simple analysis of the number of tweets mentioning a political party can be almost as accurate in predicting election outcomes as traditional election polling. This result may be affected by the system of voting, and so may not apply in countries with a different system from Germany's. Analysis of the joint mentions of political parties in individual tweets was able to describe the complex relationships between the different parties in Germany: “joint mentions of two parties are in line with real world political ties and coalitions” (Tumasjan et al., 2010, p. 178). Tumasjan et al. (2010) were surprised that their Twitter sample of the German electorate still predicted the election outcome: “despite the fact that the Twittersphere is no representative sample of the German electorate, the activity prior to the election seems to validly reflect the election outcome” (Tumasjan et al., 2010, p. 183).
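The two measures described above, the share of tweets mentioning each party and the joint mentions of pairs of parties, can be illustrated with a short sketch. This is only an illustration under stated assumptions and is not the analysis carried out by Tumasjan et al. (2010): the function name `mention_shares`, the placeholder party names and the use of case-insensitive substring matching are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mention_shares(tweets, parties):
    """Count party mentions and joint mentions in a collection of tweets.

    Returns (shares, joint): shares maps each party to its fraction of all
    party mentions, and joint counts how often two parties appear in the
    same tweet. Matching is a naive case-insensitive substring test.
    """
    counts = Counter()
    joint = Counter()
    for text in tweets:
        lowered = text.lower()
        mentioned = sorted(p for p in parties if p.lower() in lowered)
        counts.update(mentioned)
        joint.update(combinations(mentioned, 2))
    total = sum(counts.values())
    shares = {p: (counts[p] / total if total else 0.0) for p in parties}
    return shares, joint


if __name__ == "__main__":
    # Placeholder party names and tweets, invented for illustration.
    parties = ["PartyA", "PartyB", "PartyC"]
    tweets = [
        "PartyA rally today",
        "PartyA and PartyB debate tonight",
        "PartyB wins the debate",
    ]
    shares, joint = mention_shares(tweets, parties)
    print(shares)
    print(joint)
```

The mention shares could then be compared with each party's share of the vote, and the joint counts with known coalitions, which is the kind of comparison the study reports.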

Tumasjan et al. (2010) found that in Germany “Twitter is indeed used extensively for political deliberation” (p. 178) and this was also reported for Australia by Grant, Moon, and Busby Grant (2010) who found that “Twitter is becoming, ever more, the political space in Australia in which ideas, issues and policies are first announced, discussed, debated and framed” (Grant et al., 2010, p. 599).

Another political study was conducted by O’Connor, Balasubramanyan, et al. (2010), who looked at whether two years of Twitter data could be mined for particular topics (consumer confidence and political opinion) and the sentiment within these topics used to replicate traditional telephone-based polling. They suggested that “mining public opinion from freely available text content could be a faster and less expensive alternative to traditional polls” (p. 122). They go on to suggest that an important advantage of Twitter analysis over polling is that the topics that can be considered are much broader than what can be covered by a traditional poll: “Such analysis would also permit us to consider a greater variety of polling questions, limited only by the scope of topics and opinions people broadcast” (p. 122).

An investigation of whether homophily was present on Twitter was conducted by Yardi and Boyd (2010a), who examined the conversation on Twitter in the first 24 hours after a polarising news event, in this case the shooting in the USA of a doctor at an abortion clinic. They defined homophily as “the principle that interactions between similar people occur more often than among dissimilar people” (p. 318). They captured approximately 30,000 tweets about the shooting by using the Twitter Search API with search terms like “#tiller, pro-life, pro-choice, abortion, and George Tiller” over the 60 days following the event (p. 319). For this study they “focus on the first 24 hours because traffic is heaviest at this point and later use is subject to anomalies among heavy users and outliers” (p. 319). This resulted in a dataset of 11,017 tweets from 6,803 Twitter accounts. They found that within this dataset there were 1,447 reply pairs where one Twitter account had replied to another. They manually coded each of these Twitter accounts as being ‘strong pro-life’, ‘pro-life’, ‘moderate/can’t tell’, ‘pro-choice’, or ‘strong pro-choice’ and looked at the number of reply pairs between each of these categories. They also used the LIWC text analysis tool6 to look at changes in emotions expressed in the tweets over the 24 hours. They found that people with pro-abortion and anti-abortion views did interact through Twitter and that “The kinds of interactions we observed suggest that Twitter is exposing people to multiple diverse points of view but that the medium is insufficient for reasoned discourse and debate, instead privileging haste and emotion” (p. 325).
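Counting reply pairs between manually coded categories amounts to a cross-tabulation. The sketch below shows one way this could be done. It is an illustration only, and the function names `reply_pair_matrix` and `like_minded_fraction`, the account names and the rule of skipping uncoded accounts are assumptions rather than details taken from Yardi and Boyd (2010a).

```python
from collections import Counter


def reply_pair_matrix(reply_pairs, stance):
    """Cross-tabulate reply pairs by the coded stance of each account.

    reply_pairs: iterable of (replier, replied_to) account names.
    stance: dict mapping account name to a category label.
    Pairs involving an uncoded account are skipped (an assumption).
    """
    matrix = Counter()
    for replier, replied_to in reply_pairs:
        if replier in stance and replied_to in stance:
            matrix[(stance[replier], stance[replied_to])] += 1
    return matrix


def like_minded_fraction(matrix):
    """Fraction of counted reply pairs in which both accounts share a category."""
    total = sum(matrix.values())
    if total == 0:
        return 0.0
    same = sum(n for (a, b), n in matrix.items() if a == b)
    return same / total


if __name__ == "__main__":
    # Hypothetical accounts and codes, invented for illustration.
    stance = {"u1": "pro-life", "u2": "pro-life", "u3": "pro-choice"}
    pairs = [("u1", "u2"), ("u1", "u3"), ("u3", "u1")]
    matrix = reply_pair_matrix(pairs, stance)
    print(matrix)
    print(like_minded_fraction(matrix))
```

A like-minded fraction well above what random pairing would produce would be consistent with homophily, while the off-diagonal cells show how often people with opposing views reply to one another.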

Earthquake detection

Another topic that researchers have studied on Twitter is earthquake detection. In this topic area researchers have been more explicit about treating tweets as sensor data rather than as part of a discourse. Real world events can be detected by using location and timing of tweets. Each user is considered to be a sensor, and their tweets are reports from that sensor (Earle, 2010; Sakaki, Okazaki, & Matsuo, 2010).

Earle (2010) used the timing and geographic location of Twitter messages about earthquakes to supplement instrument-based estimates of earthquake location and magnitude. He found that tweets about earthquakes were available very quickly, “generally within 20 seconds of widely felt events in tech-savvy regions” (Earle, 2010, p. 221). He compared this with the official U.S. Geological Survey (USGS) data: “tweets are often available before the 2 to 20 minutes it takes the USGS to publically distribute instrumentally derived estimates of location and magnitude” (Earle, 2010, p. 221).

A similar study by Sakaki et al. (2010) in Japan also found tweets to be faster than the official Japan Meteorological Agency (JMA) announcements, although by a much smaller margin:

Our system sent e-mails mostly within a minute, sometimes within 20s. The delivery time is far faster than the rapid broadcast of announcements of JMA, which are widely broadcast on TV; on average, a JMA announcement is broadcast 6 min after an earthquake occurs. (Sakaki et al., 2010, p. 858)

An interesting aspect of the study by Sakaki et al. (2010) is that they applied to tweets the statistical filtering techniques that are used to account for inaccuracies in distributed physical sensors. They considered each Twitter user to be a separate sensor and each tweet to be a sensor observation: “Each Twitter user is regarded as a sensor. A sensor detects a target event and makes a report probabilistically” (p. 853). Semantic analysis was used to classify tweets as positive or negative observations. They then tested the accuracy of results produced using a variety of algorithms for filtering distributed physical sensors, such as “Kalman filters, multihypothesis tracking, grid-based, and topological approaches, and particle filters” (p. 859). They found that “Particle filters perform well compared to other methods” (p. 859). Sakaki et al. (2010) went on to use the same techniques to detect rainbow locations.
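To make the idea of filtering noisy sensor reports concrete, the following is a minimal sketch of a generic particle filter that estimates a fixed location from noisy position reports. It is a textbook-style illustration and not the implementation of Sakaki et al. (2010). The function name `estimate_location`, the Gaussian observation model, the flat coordinate system and all parameter values are assumptions.

```python
import math
import random


def estimate_location(reports, n_particles=2000, obs_sigma=1.0, jitter=0.05,
                      bounds=((0.0, 10.0), (0.0, 10.0)), seed=0):
    """Estimate a fixed (x, y) location from noisy reports with a particle filter.

    Each report is treated as one sensor observation of the true location with
    Gaussian noise of standard deviation obs_sigma (an assumed model).
    """
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    # Start with particles spread uniformly over the search area.
    particles = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                 for _ in range(n_particles)]
    for ox, oy in reports:
        # Weight each particle by the likelihood of the report given that particle.
        weights = [math.exp(-((px - ox) ** 2 + (py - oy) ** 2)
                            / (2.0 * obs_sigma ** 2))
                   for px, py in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # degenerate case: fall back to uniform
        # Resample in proportion to the weights, then add a little noise
        # so that the particle set does not collapse onto a few points.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(px + rng.gauss(0.0, jitter), py + rng.gauss(0.0, jitter))
                     for px, py in particles]
    est_x = sum(p[0] for p in particles) / n_particles
    est_y = sum(p[1] for p in particles) / n_particles
    return est_x, est_y


if __name__ == "__main__":
    # Simulated reports scattered around a hypothetical true location (3, 7).
    rng = random.Random(42)
    reports = [(rng.gauss(3.0, 1.0), rng.gauss(7.0, 1.0)) for _ in range(40)]
    print(estimate_location(reports))
```

The design choice that matters here is that no single report is trusted: each one only reweights a population of candidate locations, so an occasional misplaced report shifts the estimate slightly rather than replacing it.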

Health information

Scanfeld, Scanfeld, and Larson (2010) studied the discussion of health information on Twitter. They used the Twitter search API to get a sample of tweets mentioning “antibiotic” and “antibiotics” and used text analysis to categorise these. One of their categories was “abuse/misuse”, which looked at the proportion of tweets that were conveying misinformation. They propose that automated responses to key phrases might be one way to address misinformation:

To disseminate information to those exhibiting confusion or sharing misinformation, online services are available to monitor and auto-respond to trigger word combinations, such as ‘flu + antibiotics.’ (Scanfeld et al., 2010, p. 187)

Scanfeld et al. (2010) conclude that “this study confirmed that Twitter is a space for the informal sharing of health information and advice” (p. 186).

As well as the discussion of health-related information on Twitter, another health topic that has been investigated is the use of Twitter to detect the spread of diseases.

Influenza detection using Twitter

Culotta (2010) conducted a study of the detection of influenza outbreaks by analysing Twitter messages. He used eight months of data from September 2009 to May 2010 containing over 570 million Twitter messages. He determined which keywords gave the best correlation with national health statistics from the US Out-patient Influenza-like Illness Surveillance Network (ILINet). Culotta (2010) found that Twitter-based detection was able to give almost real-time results at relatively low cost, while ILINet is expensive and reporting is delayed by one to two weeks.

His study provides strong support for the usefulness of simple keyword matching as a technique for assessing public interest in a topic on Twitter by showing that the level of discussion of flu on Twitter does correlate well with the incidence of flu in the community.

Culotta (2010) was surprised by the level of correlation obtained using a single keyword. He found that just by using “flu” he obtained an 84% held-out correlation and that the addition of a few other flu-related terms (cough, headache, sore throat) increased this to 95%. Culotta (2010) warned that a problem with his methodology is that it is very likely to detect false correlations: “the phrase ‘flu shot’ has a correlation greater than 90%, but certainly this is not a good term to monitor, as it may spike in frequency without a corresponding spike in influenza rates” (p. 2). He partially addresses this problem by adding a document classifier which can “reduce error rates by over half in simulated false alarm experiments” (p. 1), although its effectiveness decreased in very high noise simulations.
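The basic idea of keyword matching can be sketched as follows: compute, for each week, the fraction of messages containing a keyword, and then correlate that series with the official illness rate. This is a simplified illustration using a plain Pearson correlation and is not a reproduction of the model in Culotta (2010). The function names `keyword_fraction` and `pearson` and all of the example data are assumptions.

```python
import math


def keyword_fraction(messages, keywords):
    """Fraction of messages containing at least one keyword (substring match)."""
    messages = list(messages)
    if not messages:
        return 0.0
    keywords = [k.lower() for k in keywords]
    hits = sum(any(k in m.lower() for k in keywords) for m in messages)
    return hits / len(messages)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented weekly messages and illness rates, for illustration only.
    weeks = [
        ["nice day", "sunny", "got the flu", "lunch"],
        ["flu again", "so much flu around", "work", "rain"],
        ["flu", "flu season", "home with flu", "bus late"],
    ]
    illness_rates = [1.0, 2.1, 2.9]
    fractions = [keyword_fraction(week, ["flu"]) for week in weeks]
    print(fractions, pearson(fractions, illness_rates))
```

Because this sketch uses substring matching, a message about a ‘flu shot’ is also counted under ‘flu’, which is exactly the kind of false signal described above.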

Lampos, De Bie, and Cristianini (2010) developed a web-based automated Flu Detector for the UK. They used geolocated tweets from Twitter search across 49 urban centres in the UK (using the ability to search within a 10 km radius) to develop a regression model against the official Influenza-like Illness (ILI) rates from the Health Protection Agency (HPA).

The research discussed in this section has shown that Twitter can be used to investigate the expression of the views of the public over a range of topics including politics and health information. In contrast, the approach of treating tweets on certain topics as sensor data rather than discourse was shown to be useful in detecting events like earthquakes and influenza outbreaks. The next section describes one technique that can be used to extract the views of the public from Twitter data, topic detection.