
Prediction of the User's political trends with Twitter


Academic year: 2023


The first one is based on analyzing the users' tweets: it determines the sentiment of each tweet and classifies the tweets according to the political party they support. In this state-of-the-art chapter, we introduce the solutions proposed in the literature for predicting the results of political elections through the analysis of social network data.

Tweet classification

Therefore, the Big Data Tools section covers the tools used for managing and processing large data sets efficiently. The Twitter API section covers the tools used to extract Twitter data and to store it, as is the case of MongoDB.

Big data tools

We have chosen Hadoop as the MapReduce engine because it offers excellent performance when parallelizing operations and optimizing resources. One of the strategies presented in this project is based on representing data as a graph and performing operations on it. GraphFrames provides graph algorithms that are optimized for execution in a cluster.

Figure 3.2: MapReduce Work flow

Twitter API

In relation to data extraction, it provides the ability to connect to various data sources. Figure 3.6 shows an example of the different states of a single REST API call.

Figure 3.6: Twitter REST APIs Flow

3.2.2 Twitter streaming APIs

Data storage

Sentiment Analysis

This is required because the "surface form" of a named entity may actually refer to some real thing in the world. Reference resolution aims to group together all mentions in the text of a Named Entity.

Visualization tool

Meaning detection is the identification of meanings in a text at the expression level. The URL Dispatcher is the bridge between the views and the browser. Its responsibility is to call the view associated with the requested URL.

Figure 3.9: Django Framework Architecture

Strategies

This chapter presents the development steps of the two strategies proposed in this project: the Tweet Analysis Strategy (TAS), described in Section 4.3.2, and the Cluster Analysis Strategy (CAS), described in Section 4.4.2. All development for this project took place in a Hadoop cluster installed on virtual machines managed by OpenNebula [39].

Definition

An example of a P1 user could be @marianorajoy, the Twitter username of the president of the Partido Popular, or @Sorayapp, a member of the Congress of Deputies and deputy of the Partido Popular.

Tweets Analysis Strategy (TAS)

It allows us to retrieve tweets that match any word in the keyword set K. The final result of the sentiment analysis is the difference between the number of positive and negative words/emotional symbols, which can be positive (difference > 0), negative (difference < 0) or neutral (difference == 0). Figure 4.9 shows an example of extracting words from three different tweets.
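The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the word lists `POSITIVE` and `NEGATIVE` and the helper names are hypothetical, and the sketch assumes that any difference above zero counts as positive.

```python
# Hypothetical word/emoticon lists; the lexicon actually used in the
# project is not reproduced here.
POSITIVE = {"bueno", "excelente", "apoyo", ":)"}
NEGATIVE = {"corrupción", "malo", "vergüenza", ":("}


def tokenize(text):
    """Lowercase the tweet, split on whitespace and trim common punctuation."""
    return [tok.strip(".,;!?\"'") for tok in text.lower().split()]


def sentiment(text):
    """Classify a tweet by the difference between positive and negative hits."""
    tokens = tokenize(text)
    positives = sum(tok in POSITIVE for tok in tokens)
    negatives = sum(tok in NEGATIVE for tok in tokens)
    difference = positives - negatives
    if difference > 0:
        return "positive"
    if difference < 0:
        return "negative"
    return "neutral"
```

With these assumed lists, a tweet containing one positive and one negative word is classified as neutral, which illustrates why a small lexicon tends to produce many neutral results.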

The result of the overall process is a collection of positive and negative words/word pairs. The final result of this process is a tweet together with the political parties associated with it and the sentiment (positive, negative or neutral) of the tweet. Figure 4.12 shows an example of the Django application with the global results, in this case showing only a sample of the users analyzed.
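As a rough illustration of how a tweet might be associated with political parties, the sketch below matches tweet tokens against one keyword set per party. The dictionary `PARTY_KEYWORDS`, its contents and the function name are assumptions made for this example, not the keyword collection used in the project.

```python
# Hypothetical keyword sets, one per party (illustrative only).
PARTY_KEYWORDS = {
    "PP": {"pp", "#pp", "@marianorajoy"},
    "PSOE": {"psoe", "#psoe"},
    "Podemos": {"podemos", "#podemos"},
}


def parties_in(text, party_keywords=PARTY_KEYWORDS):
    """Return the set of parties whose keywords appear in the tweet."""
    tokens = {tok.strip(".,;:!?\"'") for tok in text.lower().split()}
    return {
        party
        for party, keywords in party_keywords.items()
        if tokens & keywords  # non-empty intersection means a match
    }
```

Plain token matching of this kind cannot tell the party name "Podemos" apart from the Spanish verb form "podemos", so a real keyword set would need more care than this sketch shows.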

Figure 4.2: Architecture Tweet Analysis Strategy

The next sections describe each of the previous blocks in detail.

Clusters Analysis Strategy (CAS)

Once we have the friendship users FO and the political users O, we join them in the set L = O ∪ FO. We then repeat the iterative process until each node in the network has the label to which the maximum number of its neighbors belong. At the end of the iterative process, nodes with the same label are grouped together as communities.
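The iterative labelling process described above is a form of label propagation. The following plain-Python sketch shows the idea on a small in-memory graph; it is not the cluster implementation used in the project. It assumes an undirected adjacency dictionary in which every neighbor also appears as a key, and it breaks ties deterministically (smallest label) so that the result is reproducible.

```python
from collections import Counter


def label_propagation(adjacency, max_iters=100):
    """Group nodes into communities by propagating neighbor labels.

    adjacency: dict mapping each node to the set of its neighbors
    (assumed undirected, every neighbor is also a key).
    """
    labels = {node: node for node in adjacency}  # every node starts alone
    for _ in range(max_iters):
        changed = False
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label if it is already among the most frequent.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:  # every node agrees with its neighborhood majority
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())
```

For example, a graph made of two disconnected triangles yields two communities, one per triangle.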

The result is a partition of the political user set O into groups, O = O1 ∪ O2 ∪ O3 ∪ ... ∪ Om. At this point these are not political groups, only groups built from the users' friendships. To associate the political parties with the partitions of objective users O1, O2, ..., Ox, we need to obtain the friendships of the politician users of each political party. In these steps, for each FOi we look for the FPi with which it shares the highest number of elements.
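The association step can be sketched as a search for the largest set intersection. In the sketch below, `group_friends` stands for the friend sets FOi of each group and `party_friends` for the friend sets FPi of each party's politicians. Both names, the function name and the choice to leave a group unassigned when it overlaps with no party are assumptions made for illustration.

```python
def assign_parties(group_friends, party_friends):
    """Assign to each group the party whose friend set overlaps it most.

    group_friends: dict group id -> set of friends of that group (FOi)
    party_friends: dict party -> set of friends of its politicians (FPi),
                   assumed to be non-empty
    """
    assignment = {}
    for group, friends in group_friends.items():
        overlaps = {
            party: len(friends & politician_friends)
            for party, politician_friends in party_friends.items()
        }
        best = max(overlaps, key=overlaps.get)  # first party wins on ties
        # A group with no overlap at all is left unassigned (None).
        assignment[group] = best if overlaps[best] > 0 else None
    return assignment
```

A group whose friends are mostly followed by one party's politicians is labelled with that party, while a group with no common friends stays unlabelled.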

Figure 4.13: Example of a Twitter Affinity Digraph

In this digraph the friendship and followers lists are:

Users Analyzed (TEN)

This chapter describes how we analyze the set of political users against which the prediction results are compared. The results of the TAS and CAS strategies and of the sentiment analysis algorithm are then shown and discussed. We also give an example of a positive classification of one user for the political party 'Podemos'.

We also classify the same user negatively for the political party "PP", since the retweet 5.2 shows an image that refers to the corruption of the PP.

Tweets Analysis Strategy (TAS)

The algorithm's political result matches some of the positives identified by the user. The algorithm's political result matches some of the neutral parties identified by the user. The algorithm's political result matches some of the negatives identified by the user.

Users are classified according to the number of tweets they have sent. The first row shows the results for users who have published between 0 and 10 tweets/user, the second row for users with 0 to 25 tweets/user, the third row for users with 0 to 50 tweets/user, and the last row for users with 0 to 100 tweets/user. Additionally, Table 5.2 shows the same information for the secondary trend calculated by the TAS strategy, called TAS2.

Table 5.1 shows the asserts, errors and neutral results for the main tendency given by the TAS strategy, which is named TAS1

Basic sentiment analysis results

In this case, we can see that the percentage of asserts increases to 69%. We consider these results acceptable given the simplicity of the algorithm used in this project. We want to emphasize that if we increase the number of positive and negative words, the number of neutral tweets decreases.

Clustering results (CAS)

The algorithm's political result matches some of the neutral parties identified by the user, which we call a "partial success". The algorithm's political result matches some of the neutral parties identified by the user, but in the analysis we found one or more positive tendencies toward some party, which we call a "partial success error". The algorithm's political result matches some of the negatives identified by the user, which we call a "Flaw".

If we can consider the addition of Success, Partial Success and Partial Failure Success, then the total Assertion of the.

Political Tendency Results: TAS vs CAS Strategies

In this subsection, we compare the political trend obtained with the TAS strategy against the results of the analyzed users (TEN). We observe that the TAS strategy is tied to political events, so it is very sensitive to the daily news reflected in the users' tweets. As we can see, in the TAS strategy the major political trend of the users is based on PP and PSOE, but the sentiment analysis gives us many neutral values for the tweets.

Figure 5.9 shows the political trend across the 104 users of the TEN sample, with an assertion of 82.5%. Considering the trend of the analyzed users (TEN) shown in Figure 5.10, we can also observe a good agreement with the political parties, shown in Table 5.11. The reason is that the relationships between users, unlike the users' tweets, do not change frequently.

Figure 5.6: Political Tendency Users Analyzed (TEN)

5.5.1 TAS Strategy Results

Discussion

With these results, we can conclude that the CAS strategy provides an excellent way to predict the political orientation of the users, but it can only give us a static orientation. Users express different opinions in their tweets, but the relationships between users do not change often.

Conclusions

Future Work

Investigate whether the assertion of the CAS algorithm would improve by adding the politicians to the user grouping process. Show in a view a graph of the politicians, the most common related hashtags and their sentiment. Show in a view statistics on the work process of the CAS and TAS strategies, such as: tweets analyzed, users analyzed, latest users analyzed, etc.

Proceedings of the Twelfth Annual ACM International Conference on Information and Knowledge Management (CIKM), (November. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, GRADES '16, pages 2:1–2:8, New York, NY, USA, 2016. Show the results of the users analyzed by the CAS strategy, compared with the results of the users analyzed by the TAS strategy.

Users Analyzed (TEN)

The purpose of this section is to present the users analyzed (TEN), that is, the manually analyzed users, and to present an example of the Twitter Streaming API output, showing 3 tweet examples in JSON format.

Table 7.1: Users Analyzed TEN

Sentiment Analysis

Nuestra prioridad será proteger y promover a los mexicanos donde…https://t.co/Yr1gJpQhRd
La linterna del PSOE sigue girando a la extrema izquierda.
Ya estamos preparados en Cibeles, desde Getafe...porque no podemos permitir interrupciones en ninguna casa #NoMasCortesDeLuz https://t.co/FGXvvSNFVG.
Un expresidente del PP confirma ante el juez que el PP le pidió beneficiar a las empresas donantes https://t.co/NA7BpK8Z0Q vía @El_Plural
El ministro popular de Asuntos Exteriores avergüenza al Congreso: los españoles emigran "por la mente abierta" https://t.co/uEwvQhITDU
Bolsacarlosmari @CiudadanaMartaR Confié en @CiudadanosCs @GirautaOficial pero parece que hay pocos grupos interesados.

El PSOE evita recibir a las víctimas del franquismo tras preguntar al Gobierno sobre la ley de memoria https://t.co/sMwysExI4x por @ATorrus
Esperemos que @gabrielrufian y @aKollontai algún día se reúnan también con las familias de los asesinados por #ETA.
p.Ya era un crack en Antena 3 https://t.co/jqr5Fb0psn
La última decisión de @bdnpacac pone de relieve el verdadero espíritu del gobierno de la CUP - Podemos en #Badalona.
GirautaOficial sobre la reforma de la ley del TC: DECISIÓN JURÍDICA del "elegido ilustrado", no DEMOCRACIA https://t.co/2PfYt9Gw6d
Militantes tuiteros y simpatizantes de PP y C'S amenazan, humillan e insultan a la víctima del atentado en Berlín.

CAS strategy users results

Twitter Streaming API output example

Keywords

Collection Keywords

Figure

Figure 3.1: Hadoop node architecture
Figure 3.2: MapReduce Work flow
Figure 3.3: Spark Stack GraphX
Figure 3.5: Storm Topology
