Chapter 5. Conclusions and recommendations
N. Annex: Reviewers' suggestions
Due to its heterogeneity and increasing complexity, the Internet as a whole is considered immeasurable (Murray and Claffy, 2001). Nevertheless, some of its key structural properties can be measured using active or passive measurement approaches. To date, the most common methods are passive measurements, usually built into routers or switches, which track Internet traffic as it is routed through them. Active measurements, by contrast, inject test data packets into the network to ‘sniff out’ responses from devices such as routers (SLAC, 2001).
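To make the active approach concrete, the Python sketch below sends its own probe traffic and times the replies. It is a minimal illustration under stated assumptions, not a tool used in this study: because ICMP ping usually requires raw-socket privileges, it substitutes TCP handshake timing (a swapped-in technique), and the function name `tcp_connect_rtts` and the local test listener are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_rtts(host, port, count=5, timeout=2.0):
    """Active measurement sketch: open `count` TCP connections to host:port
    and return each handshake time in milliseconds (a stand-in for ping)."""
    rtts = []
    for _ in range(count):
        start = time.perf_counter()
        # The TCP handshake itself is the injected test traffic.
        with socket.create_connection((host, port), timeout=timeout):
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


if __name__ == "__main__":
    # Hypothetical local listener so the sketch runs offline; a real study
    # would probe a remote router or server instead.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    samples = tcp_connect_rtts("127.0.0.1", listener.getsockname()[1])
    listener.close()
    print(f"min {min(samples):.3f} ms, median {statistics.median(samples):.3f} ms")
```

A passive measurement, by contrast, would send nothing and would only observe traffic already crossing a router or switch.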
Most active and passive measurement techniques use crowdsourcing approaches. Targeted measurements of specific Internet problems, such as the analysis of structural bottlenecks, are considered especially valuable, more so than complete Internet mapping efforts (Murray and Claffy, 2001). However, we believe that their value lies not only in identifying such bottlenecks but also in rerouting traffic around them.
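One way to read the point about rerouting is as a widest-path (maximum-bottleneck) problem: a path's usable capacity is limited by its narrowest link, so traffic is best sent along the path whose narrowest link is widest. The Python sketch below is an illustrative assumption rather than a method from Murray and Claffy (2001): it applies a modified Dijkstra search to a hypothetical topology whose node names and link capacities are invented for the example.

```python
import heapq


def widest_path(graph, src, dst):
    """Return (path, bottleneck) maximising the minimum link capacity from src to dst.

    `graph` maps each node to a dict of {neighbour: link capacity}.
    Returns (None, 0) when dst is unreachable.
    """
    best = {src: float("inf")}  # widest bottleneck found so far per node
    prev = {}
    heap = [(-best[src], src)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best[node]:
            continue  # stale heap entry
        if node == dst:
            break
        for neighbour, capacity in graph.get(node, {}).items():
            candidate = min(width, capacity)  # a path is only as wide as its narrowest link
            if candidate > best.get(neighbour, 0):
                best[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (-candidate, neighbour))
    if dst not in best:
        return None, 0
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]


# Hypothetical access network with two upstream routes (capacities in arbitrary units).
EXAMPLE_LINKS = {
    "Access": {"Upstream1": 100, "Upstream2": 40},
    "Upstream1": {"Core": 10},  # structural bottleneck
    "Upstream2": {"Core": 40},
}

if __name__ == "__main__":
    print(widest_path(EXAMPLE_LINKS, "Access", "Core"))
```

In this example the route through `Upstream1` is constrained by its 10-unit link, so the search reroutes traffic via `Upstream2`, whose narrowest link carries 40 units.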
The Internet community has produced prominent examples of crowdsourcing, such as the collection of crisis information by Ushahidi (2016) and the launch of Wikipedia in 2001 (Wikimedia Foundation, 2016). The term ‘crowdsourcing’ was coined by Jeff Howe in a Wired (2006) magazine article and refers to a
‘participative online activity in which an …institution, …or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, …the voluntary undertaking of a task.’
Comparing 175 research articles, Estellés-Arolas and González-Ladrón-de-Guevara (2012, p.11) derive an integrated crowdsourcing definition that largely agrees with the one proposed by Jeff Howe. Numerous efforts aim to measure and characterize the structural properties of the Internet using crowdsourced measurements. A key example is the DIMES project, which studies the Internet structure with the help of a volunteer community (DIMES, 2012). DIMES distributes software agents that run autonomously on privately owned machines and measure traceroutes and ping (see Glossary) times for diagnostic purposes. Traceroute is a network diagnostic and measurement tool, used here in its Paris Traceroute (2016) version, that displays and measures the paths of data packets across Internet Protocol networks. The most recent DIMES activities appear to date back to 2012 (DIMES, 2012). Nevertheless, by crowdsourcing its agents, DIMES demonstrated the ability to discover hidden parts of the Internet structure, as Shavitt and Weinsberg (2011) show. However, the DIMES project did not link its findings to the affordability of Internet access for end-users at the Internet periphery. Another research project that uses crowdsourced agents is the CAIDA Archipelago, or CAIDA-Ark (2016), launched in 2007. It builds upon CAIDA’s Macroscopic Topology Project ‘Skitter’, retired in 2008 (see CAIDA, 2016c), and upon the DIMES project; networks can participate by hosting so-called ‘Ark monitors’ that collaborate specifically on active measurements of the Internet structure. While CAIDA-Ark (2016) focuses on Internet topology discovery and congestion, it also leaves out key economic questions and incentives. The still-active RIPE NCC Atlas project follows a slightly different crowdsourcing approach from DIMES: it provides a testing infrastructure, based on hardware probes, for community members (mostly network providers, as in the CAIDA Archipelago) who are interested in performing Internet connectivity and reachability measurements (RIPE NCC, 2016). In this respect, the RIPE NCC Atlas follows the best practices of CAIDA-Ark (2016). Like DIMES, Atlas uses traceroutes, among other techniques. The RIPE NCC Atlas is a successful crowdsourcing project, with 9,334 connected probes as of October 2016 (RIPE NCC, 2016b).
Here, again, RIPE focusses on real-time Internet usage measurements rather than on the key economic dimensions of the Internet structure.
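Traceroute infers a path by sending probes with increasing time-to-live (TTL) values and recording which router answers each one. The Python toy simulation below, which is neither DIMES nor Paris Traceroute code, illustrates why the Paris variant matters: behind a per-flow load balancer, classic traceroute's per-probe change of destination port can splice two paths into a false link, whereas keeping the flow identifier constant, as Paris Traceroute does, keeps every probe on one path. The topology, router names and the `flow_id % 2` balancing rule are hypothetical simplifications; real load balancers hash header fields such as the five-tuple.

```python
# Hypothetical topology: router "A" balances flows over two equal-cost paths.
PATHS = [
    ["A", "B1", "C1", "D"],
    ["A", "B2", "C2", "D"],
]
BASE_PORT = 33434  # classic UDP traceroute's default starting destination port


def probe(ttl, flow_id):
    """Return the router answering a probe whose TTL expires after `ttl` hops.

    Simplification (assumption): the balancer picks a path by flow_id % 2,
    whereas real routers hash header fields such as the five-tuple.
    """
    path = PATHS[flow_id % len(PATHS)]
    return path[min(ttl, len(path)) - 1]  # the destination answers once TTL is large enough


def classic_traceroute(max_ttl=4):
    # Classic traceroute uses a new destination port per probe, so each probe is a new flow.
    return [probe(ttl, BASE_PORT + ttl - 1) for ttl in range(1, max_ttl + 1)]


def paris_traceroute(max_ttl=4):
    # Paris-style probing keeps the flow identifier fixed, so all probes follow one path.
    return [probe(ttl, BASE_PORT) for ttl in range(1, max_ttl + 1)]


def inferred_links(hops):
    """Adjacent hop pairs, i.e. the links a naive analysis would infer."""
    return list(zip(hops, hops[1:]))


if __name__ == "__main__":
    print("classic:", inferred_links(classic_traceroute()))
    print("paris:  ", inferred_links(paris_traceroute()))
```

In this toy run, the classic variant reports the hops A, B2, C1, D and therefore a non-existent link between B2 and C1, while the Paris-style run reports the consistent path A, B1, C1, D.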
When it comes to Internet crowdsourcing projects that involve end-users rather than network providers, there are currently four notable projects in the research environment: Netalyzr, Netradar, OpenSignal and Portolan. First, ICSI Netalyzr, which originated at UC Berkeley, is primarily a debugging tool for testing network connection issues on Google Android devices (ICSI, 2016). The Berkeley International Computer Science Institute (ICSI) uses the Netalyzr tool for network diagnostics, measuring the health of the Internet’s edges rather than its structural properties. Both Netradar (2016) and OpenSignal (2016) provide Android and iOS applications that measure the signal coverage and performance of mobile broadband operator networks; the results can be embedded in Google Maps. While Netradar (2016) comes from Aalto University’s School of Engineering, OpenSignal (2016) is provided by a London-based, venture-backed company. Lastly, Portolan is a joint research effort of the Istituto di Informatica e Telematica (IIT) of the Italian National Research Council (CNR) and the University of Pisa (Portolan, 2015). Portolan is an active Internet measurement project and tool that aims to discover the structure of the Internet as well as to produce signal coverage maps similar to those of Netradar and OpenSignal. The Portolan Project also relies on crowdsourced data collection through an application for Android end-user devices. It offers a unique approach to measuring the Internet structure from an active Internet periphery perspective, as indicated by Faggiani et al. (2012; 2013; 2014a; 2014b) and Gregori et al. (2013). This allows the study of mobile broadband operators’ Autonomous Systems from an end-user perspective, as shown in a pilot case study on a Bhutanese incumbent mobile broadband operator by Giovannetti and Sigloch (2015). Some researchers from the computer science field (e.g. Vázquez, Pastor-Satorras and Vespignani, 2002) argue that traceroute analyses at the Internet Protocol level from a single location in the network are unreliable for constructing complete Internet maps, owing to cross-links and other technical issues. Knight et al. (2011) note that traceroutes are commonly used but also point to the deficiencies of such measurements, largely supporting the work of Willinger, Alderson and Doyle (2009). However, Knight et al. (2011) also point to the possibility of analysing networks at Autonomous System (AS) granularity. Other researchers, such as Feldman and Shavitt (2008), Siganos et al. (2003), Alvarez-Hamelin et al. (2008) and Giovannetti and Sigloch (2015), transform IP addresses to AS granularity in order to gain a better understanding of the upstream Internet market structure. Such transformations, however, depend heavily on the reliability of secondary data-fusion datasets, which is a major drawback. In her early work, Gao (2001) considers the Autonomous System level especially valuable for analysing commercial contractual relationships among Internet Service Providers. Dimitropoulos et al. (2013) agree and consider AS granularity especially valuable when merged with a secondary CAIDA (2016b) AS-relationship dataset. However, we believe that these commercial contractual relationships cannot be fully discovered in practice, given the more informal business relationships that also exist between Autonomous Systems.
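Merging AS paths with an AS-relationship dataset can be sketched as follows. The Python example assumes CAIDA’s serial-1 as-rel text format, in which each line reads `<as1>|<as2>|<rel>`, with `-1` meaning that as1 is a provider of as2 and `0` meaning that the two are peers; the sample lines and the AS path use documentation AS numbers and are hypothetical, not CAIDA data. Links absent from the dataset are labelled `unknown`, which mirrors the caveat that informal business relationships may not be discoverable.

```python
# Hypothetical excerpt in serial-1 style (documentation ASNs, not real CAIDA data).
SAMPLE_AS_REL = """\
# <as1>|<as2>|<rel>
64500|64496|-1
64501|64500|-1
64501|64502|0
"""


def load_as_relationships(text):
    """Parse serial-1 style lines into provider->customer pairs and peer pairs."""
    p2c, p2p = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        # Use the first three fields only; any extra fields are ignored.
        as1, as2, rel = (int(field) for field in line.split("|")[:3])
        if rel == -1:
            p2c.add((as1, as2))  # as1 is a provider of as2
        elif rel == 0:
            p2p.add(frozenset((as1, as2)))
    return p2c, p2p


def classify_link(a, b, p2c, p2p):
    """Label the traversal of the AS-level link a -> b."""
    if (b, a) in p2c:
        return "customer-to-provider"
    if (a, b) in p2c:
        return "provider-to-customer"
    if frozenset((a, b)) in p2p:
        return "peer-to-peer"
    return "unknown"  # not in the dataset (e.g. an informal agreement)


def upstream_providers(asn, p2c):
    """All ASes recorded as providers of `asn`."""
    return sorted(provider for provider, customer in p2c if customer == asn)


if __name__ == "__main__":
    p2c, p2p = load_as_relationships(SAMPLE_AS_REL)
    as_path = [64496, 64500, 64501, 64502, 64503]  # hypothetical path from an end-user AS
    for a, b in zip(as_path, as_path[1:]):
        print(a, "->", b, classify_link(a, b, p2c, p2p))
    print("upstreams of 64496:", upstream_providers(64496, p2c))
```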
Insight 5: Because it can measure the upstream Internet market structure from an Internet periphery perspective, the Portolan (2015) application seems more appropriate than the alternatives Netalyzr, Netradar and OpenSignal, which focus on other infrastructural specificities. By additionally using, filtering and integrating secondary data obtained from Maxmind (2015), we are able to analyse IP and AS granularities to understand the upstream Internet market structure in our case studies. In doing so, we follow the best practices of computer science researchers such as Alvarez-Hamelin et al. (2008). However, in addition to these researchers’ contributions, we add a critically relevant end-user perspective, focussing on the conditions of accessing the Internet infrastructure from these networks’ periphery.
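The IP-to-AS step can be sketched as a longest-prefix match against a prefix-to-ASN table, followed by filtering and collapsing of traceroute hops. The Python example assumes a CSV with `network`, `autonomous_system_number` and `autonomous_system_organization` columns, resembling a MaxMind-style ASN CSV export; the rows, hop addresses and ASNs are hypothetical documentation values, and the choice of address ranges to discard is an illustrative assumption, not the exact filtering applied in our case studies.

```python
import csv
import io
import ipaddress

# Hypothetical rows in the shape of a MaxMind-style ASN CSV (documentation prefixes and ASNs).
SAMPLE_ASN_CSV = """\
network,autonomous_system_number,autonomous_system_organization
192.0.2.0/24,64496,Example Mobile Operator
198.51.100.0/24,64500,Example Regional Transit
198.51.100.128/25,64501,Example Global Transit
203.0.113.0/24,64502,Example Content Network
"""

# Ranges discarded before mapping (assumption): RFC 1918 private space and the
# RFC 6598 shared (carrier-grade NAT) range often seen inside mobile networks.
UNROUTED = [ipaddress.ip_network(n) for n in
            ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10")]


def load_prefix_table(csv_text):
    """Read (network, asn) pairs, most specific prefixes first."""
    table = [(ipaddress.ip_network(row["network"]), int(row["autonomous_system_number"]))
             for row in csv.DictReader(io.StringIO(csv_text))]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def ip_to_asn(ip, table):
    """Longest-prefix match: the first hit in the sorted table is the most specific."""
    addr = ipaddress.ip_address(ip)
    for network, asn in table:
        if addr in network:
            return asn
    return None


def hops_to_as_path(hops, table):
    """Turn traceroute hop IPs (None = no reply) into an AS-level path."""
    as_path = []
    for hop in hops:
        if hop is None or any(ipaddress.ip_address(hop) in net for net in UNROUTED):
            continue  # unanswered probe or private/CGNAT hop
        asn = ip_to_asn(hop, table)
        if asn is not None and (not as_path or as_path[-1] != asn):
            as_path.append(asn)  # collapse consecutive hops inside the same AS
    return as_path


if __name__ == "__main__":
    table = load_prefix_table(SAMPLE_ASN_CSV)
    hops = ["10.0.0.1", "100.64.0.1", "192.0.2.1", "192.0.2.9", None,
            "198.51.100.10", "198.51.100.200", "203.0.113.5"]
    print(hops_to_as_path(hops, table))
```

Sorting the table by prefix length keeps the lookup simple; for full-size datasets a radix tree would be the more efficient design choice.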