Identifying Politically Active Users
Using a corpus of tweets² provided by Sysomos, we classify tweets from 2016 as political or not political based on the words they contain. A tweet is classified as political if it contains at least one of the words “Obama”, “Clinton”, “Trump”, “Ryan”, “McConnell”, “potus”, “teaparty”, “democrat”, “republican”, “trade”, “taxes”, “senate”, or “president”. We created this list of political words by hand after reading the content of many tweets in our corpus. We then took a random sample of 15,000 users who had at least one tweet classified as political and retrieved their 2016 tweet history through the Twitter API. From this sample we identified ‘politically active users’ as follows: a user is politically active if they produced at least 20 original tweets (non-retweets) in 2016, at least 10 of which contain at least one of the political words listed above. Note that under this definition of a political tweet we have surely not identified all political tweets, but the tweets we do identify as political are very likely to be political. This definition yields 4189 politically active users.
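A minimal Python sketch of the two filtering rules above. The keyword list is taken from the text; case-insensitive whole-word matching is our assumption (the text does not say how words were matched), so under this sketch “democrats” would not match “democrat”. The input format, a list of (text, is_retweet) pairs per user, and the function names are hypothetical.

```python
import re

# Political keywords from the text. ASSUMPTION: case-insensitive,
# whole-word matching; the original matching rule is not specified.
POLITICAL_WORDS = [
    "obama", "clinton", "trump", "ryan", "mcconnell", "potus", "teaparty",
    "democrat", "republican", "trade", "taxes", "senate", "president",
]
POLITICAL_RE = re.compile(
    r"\b(" + "|".join(POLITICAL_WORDS) + r")\b", re.IGNORECASE
)


def is_political(text: str) -> bool:
    """A tweet is political if it contains at least one keyword."""
    return POLITICAL_RE.search(text) is not None


def is_politically_active(tweets, min_original=20, min_political=10) -> bool:
    """tweets: hypothetical list of (text, is_retweet) pairs for one user's 2016 history.

    The user is politically active if they wrote at least `min_original`
    original tweets, at least `min_political` of which are political.
    """
    originals = [text for text, is_retweet in tweets if not is_retweet]
    n_political = sum(is_political(text) for text in originals)
    return len(originals) >= min_original and n_political >= min_political
```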
Identifying Political Beliefs
Since our end goal is to find a political signal in the tweets belonging to our set of politically active users, we would ideally like to know each user’s political party affiliation. We begin this process by creating a training set of users with known political affiliation, on which we train a classifier. We identify a list of political words commonly found in users’ self-provided profile descriptions on Twitter (“conservative”, “Trump”, “MAGA”, “NRA”, “Constitution”, “Republican”, “Libertarian”, “Democrat”, “liberal”, “Hillary”, “Clinton”, “Obama”, “progress*”, “Bern*”, “resist*”, “president”). If a user’s self-provided profile description contained one of these words, we hand-classified the user as belonging to one of the two major political parties in the US: Democratic or Republican. These users stated their political beliefs explicitly in their profile description, or stated which candidate they did or did not support in the 2016 presidential election. We classify self-described libertarians as Republicans and self-described socialists as Democrats. We classify Never-Trump Republicans as Republicans and Never-Hillary Democrats as Democrats. This gives a training set of 170 Democrats and 393 Republicans.
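The keyword screen that selects profiles for hand classification could be sketched as follows. The party labels themselves were assigned by hand; this sketch only flags candidate profiles for review. Treating “*” as a prefix wildcard, matching the other terms as whole words, and ignoring case are all assumptions, and `needs_hand_review` is a hypothetical name.

```python
import re

# Profile terms from the text; a trailing "*" is treated as a prefix wildcard.
PROFILE_TERMS = [
    "conservative", "Trump", "MAGA", "NRA", "Constitution", "Republican",
    "Libertarian", "Democrat", "liberal", "Hillary", "Clinton", "Obama",
    "progress*", "Bern*", "resist*", "president",
]


def term_to_regex(term: str) -> str:
    # "progress*" -> r"\bprogress\w*"; plain terms must match a whole word.
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


# ASSUMPTION: case-insensitive matching.
PROFILE_RE = re.compile(
    "|".join(term_to_regex(t) for t in PROFILE_TERMS), re.IGNORECASE
)


def needs_hand_review(description) -> bool:
    """Flag a profile for manual party labeling if it contains any term."""
    return PROFILE_RE.search(description or "") is not None
```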
The classifier for predicting political party is built from the list of Twitter accounts that the users with known political party follow. As predictor variables we use the Twitter accounts that are followed by at least 30 of the users with known political party. There are 3040 such accounts, so we have 3040 binary variables (following or not) with which to predict political party. A random forest is used as the classifier. Table 2.8 gives the classification error rates for the random forest. Of the 170 users hand-classified as Democrats, 160 were correctly identified as Democrats, and 388 of the 393 users hand-classified as Republicans were correctly identified as Republicans. Overall, only 2.66% of users with known political party were misclassified by the random forest. Investigating the known Republican users who were misclassified revealed that they were either self-described libertarians or outspoken anti-Trump Republicans. This is because there were relatively few of these users in the training set and they tended to follow both liberal and conservative accounts. Figure 2.12 gives the variable importance of the Twitter accounts used to classify. The most important accounts for classification are politicians (e.g. BarackObama, realDonaldTrump, SenWarren, HillaryClinton, newtgingrich), political commentators (e.g. seanhannity, IngrahamAngle, maddow, TheDailyShow), or family members of politicians (e.g. DonaldJTrumpJr, EricTrump, MichelleObama). The random forest appears to find a true political signal within the 3040 accounts used for classification.
                      Predicted
                      Democrat   Republican   Classification error
Actual   Democrat        160          10             0.059
         Republican        5         388             0.013
Table 2.8: Random forest confusion matrix. Actual party affiliation corresponds to the hand classification; predicted party affiliation corresponds to the random forest's out-of-bag prediction.
Figure 2.12: Variable importance of following accounts used in classifying users as Democrat or Republican.
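A stdlib-only Python sketch of the feature construction and of the error rates reported in Table 2.8. It assumes each labeled user's followed accounts are available as a set of account names (a hypothetical input format) and does not fit the random forest itself; any random forest implementation could be trained on the resulting 0/1 rows. The per-class error is the share of each actual class that was misclassified, and the overall error is the share of all labeled users that were misclassified. Function names are hypothetical.

```python
from collections import Counter


def select_accounts(follows, min_followers=30):
    """follows: dict mapping user -> set of accounts that user follows.

    Keep accounts followed by at least `min_followers` labeled users.
    """
    counts = Counter(acct for accts in follows.values() for acct in accts)
    return sorted(a for a, c in counts.items() if c >= min_followers)


def binary_features(follows, accounts):
    """One 0/1 row per user: 1 if the user follows that account, else 0."""
    return {user: [int(a in accts) for a in accounts]
            for user, accts in follows.items()}


def error_rates(confusion):
    """confusion[actual][predicted] -> count.

    Returns (per-class error dict, overall error).
    """
    per_class = {}
    wrong = total = 0
    for actual, row in confusion.items():
        n = sum(row.values())
        miss = n - row.get(actual, 0)  # users of this class predicted otherwise
        per_class[actual] = miss / n
        wrong += miss
        total += n
    return per_class, wrong / total
```

Applied to the counts in Table 2.8 (160 and 10 for Democrats, 5 and 388 for Republicans), `error_rates` gives per-class errors of 10/170 ≈ 0.059 and 5/393 ≈ 0.013, and an overall error of 15/563 ≈ 2.66%.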
The trained random forest is then used to predict the political party of the remaining users, whose party is unknown. Because the users in the training set make their political opinions explicit in their self-provided descriptions, they may hold stronger political opinions, or their political opinions may be more closely tied to their personal identity, than users with unknown political party. It is therefore possible that users with known political affiliation are fundamentally different from users with unknown political affiliation. When predicting the political affiliation of the remaining users, we want both to be fairly confident that users predicted to belong to a party actually belong to it and to keep enough users in each party to detect a potentially small political signal. We choose an 80% cutoff to balance these two goals: a user is considered a Democrat if at least 80% of the trees predict the user to be a Democrat, and similarly for Republicans. This gives 489 Democrats and 996 Republicans in total, which we use going forward.
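The 80% cutoff rule can be sketched as below, assuming the number of trees voting for each party is available for every user (for example, from a two-class forest's vote tallies). The function name and its inputs are hypothetical.

```python
def assign_party(dem_votes: int, rep_votes: int, cutoff: float = 0.8):
    """Label a user only when at least `cutoff` of the trees agree.

    dem_votes / rep_votes: number of trees voting Democrat / Republican.
    Returns "Democrat", "Republican", or None (user is dropped).
    """
    n_trees = dem_votes + rep_votes
    if dem_votes / n_trees >= cutoff:
        return "Democrat"
    if rep_votes / n_trees >= cutoff:
        return "Republican"
    return None  # too uncertain to use going forward
```

Computing each party's share directly from its own vote count, rather than as `1 - dem_share`, avoids a floating-point edge case in which a user with exactly 80% Republican votes would fall just below the cutoff.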
Bots
The set of politically active users was created in mid-2017. Twitter has since deleted many bot accounts whose goal was to influence other users’ political opinions. We want to ensure that our set of politically active users does not contain bot accounts, since we want the opinions of real people.
Of the 1485 politically active users identified in mid-2017, 99 accounts could not be scraped again in May 2018. These are split fairly evenly across parties: the tweets of 7% of Republicans and 5% of Democrats could not be retrieved through the Twitter API in May 2018. However, an inaccessible account is not necessarily a bot: users can delete their account or make it private at any time, and Twitter can suspend accounts, and any of these makes the account inaccessible through the Twitter API.
NBC published a list of 453 bot accounts, along with tweets from those bots (Popken 2018). None of these known bots appear in our set of Democrats and Republicans.
Metrics
The politically active users identified above do not tweet exclusively about politics; some of their tweets concern their personal lives and other interests (sports, entertainment, etc.). We found it difficult to hand-classify these users’ tweets as political or not political, let alone build an algorithm to do so, since we do not know the user’s intention or the context in which the tweet was sent. Additionally, when a user retweets, we do not know whether they agree with the sentiment of the original tweet or are mocking it or retweeting it sarcastically. It has been found that users tend to retweet users who share either very similar or very antagonistic views (Guerra et al. 2017). We therefore consider only original tweets and ignore retweets.
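One way to drop retweets, assuming tweets are stored as Twitter API v1.1-style JSON dicts, in which a native retweet carries a `retweeted_status` field. Also dropping old-style manual retweets whose text begins with “RT @” is our assumption, not a rule stated in the text, and quote tweets are kept as original tweets in this sketch. The function name is hypothetical.

```python
def original_tweets(tweets):
    """Keep only original tweets from a list of tweet JSON dicts.

    ASSUMPTIONS: v1.1-style objects, where a native retweet has a
    "retweeted_status" key; manual "RT @user" retweets are also dropped.
    """
    return [
        t for t in tweets
        if "retweeted_status" not in t
        and not t.get("text", "").startswith("RT @")
    ]
```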
We consider two metrics to demonstrate that a political signal exists in tweets from 2016: tweet frequency and tweet sentiment. Frequency indicates whether our set of users tweets about political events, and sentiment tells us their reaction to those events. For frequency, we look at the number of original tweets sent per user per day. For sentiment, we continue to use VADER to calculate the sentiment of original tweets (Hutto and Gilbert 2014).
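A minimal sketch of the frequency metric, assuming each original tweet has been reduced to a (user, day) pair and each user has a party label; the function name and input shapes are hypothetical. Dividing each party's daily count by the number of users in that party, including users who did not tweet that day, is our reading of “per user per day”.

```python
from collections import defaultdict


def tweets_per_user_per_day(tweets, party_of):
    """tweets: iterable of (user, day) pairs for original tweets;
    day can be any hashable value, e.g. a datetime.date.
    party_of: dict mapping user -> party label.

    Returns {party: {day: original tweets per user in that party}}.
    Days on which a party's users sent no tweets are absent.
    """
    group_size = defaultdict(int)
    for party in party_of.values():
        group_size[party] += 1

    counts = defaultdict(lambda: defaultdict(int))
    for user, day in tweets:
        party = party_of.get(user)
        if party is not None:  # ignore users without a party label
            counts[party][day] += 1

    return {
        party: {day: n / group_size[party] for day, n in by_day.items()}
        for party, by_day in counts.items()
    }
```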