
In this section, I discuss the extraction of follower network snapshots and present an overview of the snapshots thus extracted. An anomaly due in part to the data collection strategy, where extraneous events can cause the sudden influx of users not related to the communities of interest in the data, is discussed along with its wider implications for analysis of the data. The primary aim is to prepare data for the application of community detection algorithms. Assuming the communities of interest in the data form detectable communities in the follower network, this could be used as a filter to focus only on those communities, excluding extraneous users.

One avenue for analysis of network dynamics is to compare snapshots of the network taken at regular time intervals. This approach renders each snapshot amenable to existing community detection algorithms and other analysis techniques for static networks. Dynamics of inferred properties (e.g. communities) can then be studied by comparing results between snapshots. In some cases, inferred properties for one snapshot can be used as a starting point or prior for analysis of the next. Where possible, incorporating link creation and destruction times directly in the analysis algorithm would be preferable, but to my knowledge no such methods for dynamic community analysis exist at this time.
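As a rough illustration of comparing inferred properties between snapshots, the sketch below matches each community in one snapshot to its most similar community in the next using Jaccard overlap of member sets. This is only one possible comparison, not necessarily the one used later in this work; the function names and the toy communities are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two member collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(prev_communities, next_communities):
    """For each community in the earlier snapshot, find the most similar
    community in the later snapshot and report the overlap score."""
    matches = {}
    for name, members in prev_communities.items():
        best = max(
            next_communities,
            key=lambda k: jaccard(members, next_communities[k]),
            default=None,
        )
        score = jaccard(members, next_communities[best]) if best is not None else 0.0
        matches[name] = (best, score)
    return matches


# Toy communities (made-up user ids) for two consecutive snapshots.
week_1 = {"A": {1, 2, 3}, "B": {4, 5}}
week_2 = {"X": {1, 2, 3, 6}, "Y": {4, 5}}
matches = best_matches(week_1, week_2)
```

A low best-match score for a community would suggest that it split, merged or dissolved between the two snapshots.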

Only mutual links were included when extracting snapshots, for two reasons. First, a mutual link is more likely to represent a social connection than a link that is not reciprocated. Non-reciprocated links often represent a “fan club” or information source. Ignoring them also reduces the prominence of highly popular users, though not entirely, as some popular users follow many of their followers. The second reason is pragmatic and two-fold: the full network is too large for community detection algorithms to handle in reasonable time, and many algorithms require undirected networks as inputs (in particular, those used in Chapter 6).
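The mutual-link filter can be sketched as follows. This is a minimal illustration that assumes follow relations are available as (follower, followed) pairs; the function name and the toy data are hypothetical and do not reflect the actual extraction code.

```python
def mutual_links(directed_edges):
    """Return undirected edges (u, v), with u < v, for which both
    u -> v and v -> u appear in the directed follow relations."""
    edge_set = set(directed_edges)
    return {
        (min(u, v), max(u, v))
        for (u, v) in edge_set
        if u != v and (v, u) in edge_set
    }


# Toy follow relations (made-up user ids): (follower, followed).
follows = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a"), ("c", "d"), ("d", "c")]
mutual = mutual_links(follows)
```

Here the one-way links a -> c and d -> a are dropped, and each reciprocated pair is kept once as an undirected edge.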

An initial naive approach to extracting weekly network snapshots resulted in a network size that grew roughly linearly over time. This was unexpected, as overall tweet activity had remained relatively constant. Investigation showed that many links had not been polled for a long time, some since the beginning of data collection two and a half years before. Separating “stale” links, last observed longer ago than the empirical estimate of median link lifetime (Table 4.2), from “fresh” links, observed more recently than that, produced a more favourable picture in which the “fresh” network size declines slightly over time after an initial ramp-up (Figure 4.10).

Figure 4.10: Network sizes for weekly snapshots showing fresh, recently observed links and stale links last observed more than 96.1 days before.
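The fresh/stale split can be sketched as below. The 96.1-day cutoff is taken from the text; everything else is an assumption for illustration: the function name is hypothetical, and the input is assumed to map each link to its most recent observation at or before the snapshot time.

```python
from datetime import datetime, timedelta

# Cutoff from the text: empirical median link lifetime of 96.1 days.
STALE_AFTER = timedelta(days=96.1)


def split_fresh_stale(last_observed, snapshot_time, cutoff=STALE_AFTER):
    """Partition links into fresh and stale sets for one snapshot.

    last_observed maps each link to the time it was most recently
    observed at or before snapshot_time (an assumed input format).
    """
    fresh, stale = set(), set()
    for link, seen in last_observed.items():
        if snapshot_time - seen <= cutoff:
            fresh.add(link)
        else:
            stale.add(link)
    return fresh, stale


# Made-up observation times for two links.
snapshot = datetime(2013, 1, 1)
observations = {
    ("a", "b"): datetime(2012, 12, 1),  # seen 31 days before the snapshot
    ("c", "d"): datetime(2012, 6, 1),   # seen 214 days before the snapshot
}
fresh, stale = split_fresh_stale(observations, snapshot)
```

Repeating this for each weekly snapshot time gives the two size curves of the kind plotted in Figure 4.10.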

The small peak around 50 weeks corresponds to a particular event recorded in the data: ABC News in the United States ran a Twitter “chat” (online discussion) on eating disorders, using one of the search tags (#thinspo) in a tweet to introduce the discussion.

ABC tweet: “#EatingDisorder Twitter chat begins in 15 min w/ @drrichardbesser + experts. Will cover #anorexia #bulimia #thinspo. Use #abcDRBchat.”

Though the ABC had over 2 million followers at the time they tweeted, this substantial list was not recorded as it was too large for the database to fit into a single user object. Only 16 tweets in the data set contain the tag #abcDRBchat (indicating direct involvement in the chat) and the ABC only tweeted once; however, there was a surge of activity on the data collection search tags, and that surge is what can be seen in Figure 4.10. Notice that the width of the peak corresponds to the cut-off time (96.11 days) for a link to be discarded if neither end has authored a tweet containing a search tag. It can be seen that about 300 thousand new follow relations appeared over a relatively short time and that the users who contributed those links promptly stopped using the searched hashtags. Another similar, but weaker, event can be seen around 82 weeks.

This event nicely highlights the need to be cautious when drawing conclusions from data collected in this way. The “fringes” of the data are very noisy and only partially observed. One of the aims of the community detection work in Chapter 6 is to filter out such fringe data, retaining core users that represent coherent social groups and their interactions.

Another thing to notice in this event is the drop in the number of “stale” links (those last observed more than 96.1 days before the snapshot). This would seem to indicate long-lasting links to or from users who tweet infrequently on the search query tags, and who tweeted in response to the event. This highlights a deficiency in the binary cutoff for temporarily unobserved links, which apparently removed these links prematurely. Improvements might be possible through a more sophisticated link survival analysis incorporating extra user, tweet and link metadata, which may be able to identify such links and allow them to persist.

The gradual decline in snapshot size (the “fresh” networks in Figure 4.10) should not be taken as a decline in the popularity of eating disorder topics in online social media or in the community at large. Equally (if not more) likely is that the venue for eating disorder discussion has moved to another social media portal.

It is interesting that the snapshot sizes and the numbers of “stale” links for the full (directed) network are almost exactly proportional to the corresponding mutual link statistics, albeit with slightly different proportions (Figure 4.11). The mean and standard deviation of per week proportions are given in Table 4.3.

Figure 4.11: Mutual network sizes and scaled full network sizes evolve almost identically.

              Mean Ratio   SD
fresh links   3.48         0.16
stale links   3.32         0.07

Table 4.3: Mean and standard deviation of per week ratios of all vs. mutual links for fresh (recently observed) and stale (not observed recently) links. Note that the difference is not significant.

These proportions relate to “followback” rates, where followback means a user responding to someone following them by following that user in return. There is an etiquette surrounding followback, as shown by many tweets complaining of users who do not followback; however, the etiquette is not universally adopted. Many users are quite selective about who they follow, often not following back. This is perhaps especially the case for very popular users. The constant ratio is suggestive of robust behavioural statistics surrounding followback etiquette; however, it would be well to control for confounding variables such as user popularity (number of friends/followers), network statistics such as betweenness, and other socially relevant variables before drawing firm conclusions or spending significant effort orchestrating further controlled experiments.
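Per week ratios of the kind summarised in Table 4.3 could be computed along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the weekly counts are made-up numbers rather than values from the data set, and the sample standard deviation is used, which may differ from the estimator behind the reported figures.

```python
from statistics import mean, stdev


def ratio_summary(all_counts, mutual_counts):
    """Mean and sample standard deviation of per week ratios of
    all (directed) links to mutual links, skipping empty weeks."""
    ratios = [a / m for a, m in zip(all_counts, mutual_counts) if m > 0]
    return mean(ratios), stdev(ratios)


# Made-up weekly link counts, for illustration only.
all_links_per_week = [350, 700, 330]
mutual_links_per_week = [100, 200, 100]
ratio_mean, ratio_sd = ratio_summary(all_links_per_week, mutual_links_per_week)
```

A small standard deviation relative to the mean is what indicates that the full and mutual network sizes evolve almost proportionally.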
