Understanding the limitations of route monitor deployment is critical for any system relying on BGP data from multiple vantage points. This understanding also enables us to better interpret the findings of previous research in terms of their generality and representativeness. Note that it is impossible to obtain real-time routing data from every network because of scalability issues and privacy concerns. Moreover, a single BGP feed from one AS presents only a restricted view, given that an AS contains many routers, each with a potentially different view of routing dynamics. However, for detecting routing anomalies, traffic engineering, topology discovery, and other applications, it is useful to have additional feeds. Because adding a feed usually requires interacting with a particular ISP to set it up, monitor locations should be chosen carefully to maximize the overall effectiveness of the route monitoring system.
In this chapter we illustrate the importance of route monitor selection for various applications relying on BGP data. We study the impact of BGP data on three categories of applications, namely, (1) discovery of relatively stable Internet properties such as the AS topology and prefix-to-origin-AS mappings, (2) discovery of dynamic routing behavior such as IP prefix hijack attacks and routing instability, and (3) inference of important network properties such as AS relationships and AS-level paths. For each category, we study various monitor deployment strategies by choosing ASes with diverse topological properties.
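To make the first category concrete, the sketch below computes the AS topology and prefix-to-origin-AS mappings visible to a chosen set of monitors. It does this by taking the union over the monitors' AS paths, under simplifying assumptions. The input layout, the monitor identifiers, and the function names are hypothetical. Private ASNs are filtered using the RFC 6996 private-use ranges, and AS-path prepending is collapsed before links are extracted.

```python
from typing import Dict, List, Set, Tuple


def is_private_asn(asn: int) -> bool:
    """Private-use ASN ranges from RFC 6996 (16-bit and 32-bit)."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def observed_topology(
    routes: Dict[str, List[Tuple[str, List[int]]]],
    monitors: Set[str],
) -> Tuple[Set[int], Set[Tuple[int, int]], Dict[str, Set[int]]]:
    """Union of public ASes, public AS links, and prefix origins seen by `monitors`.

    `routes` (hypothetical layout) maps a monitor id to (prefix, AS path) pairs,
    each path listed from the monitor's AS toward the origin AS (last element).
    """
    ases: Set[int] = set()
    links: Set[Tuple[int, int]] = set()
    origins: Dict[str, Set[int]] = {}
    for monitor in monitors:
        for prefix, path in routes.get(monitor, []):
            if not path:
                continue
            # Collapse AS-path prepending (consecutive repeats of one ASN).
            hops = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
            ases.update(a for a in hops if not is_private_asn(a))
            for u, v in zip(hops, hops[1:]):
                # Keep only links between public ASes, stored as undirected pairs.
                if not (is_private_asn(u) or is_private_asn(v)):
                    links.add((min(u, v), max(u, v)))
            # The last AS on the path is taken as the origin of the prefix.
            origins.setdefault(prefix, set()).add(hops[-1])
    return ases, links, origins
```

Comparing the sizes of the returned sets for different monitor subsets is one simple way to quantify how many links and non-private ASes each deployment strategy adds.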
We summarize our key results as follows. For the first class, more vantage points generally improve the completeness and accuracy of the topological properties studied. We find that a larger set of monitors observes many more links but only slightly more non-private ASes; the additional ASes identified are mostly at the edge. Using Gao's degree-based relationship inference algorithm, we evaluate the accuracy of inferred paths by comparing them with the paths in BGP data in terms of path length. We found that adding vantage points yields only a small improvement in path prediction, which implies that Gao's algorithm is reasonably stable under changes in the BGP data. For routing instability detection, we found a large difference between deployment schemes, indicating that vantage points associated with core networks are more likely to observe network instabilities. For attack evasion, we show that detecting routing attacks requires taking into account the possibility of evasion due to visibility constraints.
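For readers unfamiliar with Gao's degree-based heuristic, the sketch below illustrates the basic idea. On each AS path, the AS with the highest degree is taken as the top provider. Links before it are treated as customer-to-provider and links after it as provider-to-customer. A link that shows transit in both directions is labeled as siblings. This is a simplified illustration, not the implementation used in this chapter. It omits the refinements of the full algorithm, such as tolerating conflicting observations and inferring peering links, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def _collapse(path: List[int]) -> List[int]:
    """Remove AS-path prepending (consecutive repeats of one ASN)."""
    return [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]


def as_degrees(paths: List[List[int]]) -> Dict[int, int]:
    """Degree of each AS = number of distinct neighbors seen in the paths."""
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for path in paths:
        hops = _collapse(path)
        for u, v in zip(hops, hops[1:]):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {asn: len(n) for asn, n in neighbors.items()}


def infer_relationships(paths: List[List[int]]) -> Dict[Tuple[int, int], str]:
    """Basic degree-based inference; keys are (a, b) with a < b.

    Labels: "c2p" (a is a customer of b), "p2c" (a is a provider of b),
    "s2s" (siblings: transit observed in both directions).
    """
    deg = as_degrees(paths)
    uphill: Set[Tuple[int, int]] = set()  # observed (customer, provider) pairs
    for path in paths:
        hops = _collapse(path)
        if len(hops) < 2:
            continue
        # The highest-degree AS on the path is taken as its top provider
        # (ties go to the earliest position).
        top = max(range(len(hops)), key=lambda i: deg[hops[i]])
        for i, (u, v) in enumerate(zip(hops, hops[1:])):
            # Before the top provider the path climbs (u is v's customer);
            # from the top onward it descends (v is u's customer).
            uphill.add((u, v) if i < top else (v, u))
    rels: Dict[Tuple[int, int], str] = {}
    for u, v in uphill:
        a, b = min(u, v), max(u, v)
        if (a, b) in uphill and (b, a) in uphill:
            rels[(a, b)] = "s2s"
        elif (a, b) in uphill:
            rels[(a, b)] = "c2p"
        else:
            rels[(a, b)] = "p2c"
    return rels
```

Because every decision hinges on which AS has the highest degree, adding or removing vantage points changes the result only when it changes that ranking along a path. This is one intuition for why the inferred paths can remain fairly stable as more BGP data is added.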
These visibility limitations motivate future work on building monitoring and diagnosis systems purely from end hosts, without ISP proprietary data. Revisiting the BGP-based monitoring described in Chapter II, studies relying on BGP routing data usually assume that data from the route monitoring systems is reasonably representative of the global Internet. Our work studied the limitations of route monitoring systems and the visibility constraints of different deployment scenarios. We are the first to point out how monitor location limits attack detection. This suggests that any detection system should be aware of the detection inaccuracy induced by vantage point constraints.
This work also suggests an inherent limitation of approaches relying on routing data alone. Because most ISPs are reluctant to reveal details of their networks, they normally keep their routing feeds publicly inaccessible. The existing public routing data repositories, RouteViews and RIPE, receive data from only around 154 ISPs, in most cases with one feed from each AS. The results in this chapter show that this data is sometimes insufficient to detect routing events, let alone to localize a failure to a particular ISP. Given this fundamental limitation, in Chapter IV we investigate techniques to detect and locate performance disruptions using an end-system-based approach.
CHAPTER IV
Diagnosing Routing Disruptions from End Systems
4.1 Introduction
The end-to-end performance of distributed applications and network services is susceptible to routing disruptions in ISP networks. Recent work has found that routing disruptions often lead to periods of significant packet drops, high latencies, or even temporary loss of reachability [51, 103, 123, 138]. The ability to pinpoint the network responsible for observed routing disruptions is critical for network operators to quickly identify the cause of the problems and mitigate the potential impact on customers. In response, operators may tune their network configurations or notify other ISPs, depending on whether routing disruptions originate from their internal networks, their border routers, or remote networks. They may also find alternate routes or inform customers about destinations that will experience degraded performance.
From the end users' perspective, the ability to diagnose routing disruptions also provides insight into the Internet infrastructure as a whole. Knowing which ISPs should be held accountable for which routing disruptions helps customers assess compliance with their service-level agreements (SLAs) and gives ISPs strong incentives to improve their service quality.
Past work on diagnosing routing events has relied on routing feeds from each ISP.
These techniques have proven to be effective in pinpointing routing events across
multiple ISPs [54] or specific to a particular ISP [125]. However, given that most
ISPs are reluctant about revealing details of their networks, they normally keep their
routing feeds publicly inaccessible. Today, the largest public routing data repositories,
RouteViews and RIPE, receive data from only around 154 ISPs [10, 14], in most cases
with one feed from each AS. These have been shown to be insufficient to localize routing
events to a particular ISP in most cases [119]. As a result, customers are in the dark
about whether their service providers meet their service agreements. Similarly, ISPs
have limited ways to find out whether the problems experienced by their customers
are caused by their neighbors or some remote networks. They usually have to rely on
phone calls or emails to perform troubleshooting [8].
Motivated by the above observations, we aim to develop new techniques for diagnosing routing events from end systems. End systems are hosts that end users have access to, typically located at the edge of the Internet. Our approach differs markedly from recent work on pinpointing routing events in that it relies purely on probing launched from end hosts and does not require any ISP proprietary information. In fact, by using active probing on the data plane, our system can more accurately capture the performance experienced along the data-plane forwarding path. Furthermore, our techniques can be easily applied to many different ISPs instead of being restricted to any particular one. This is especially useful for diagnosing inter-domain routing events, which often require cooperation among multiple ISPs. Our inference results can easily be made accessible to both customers and ISPs who need better visibility into other networks. This also helps with independent SLA monitoring and with managing routing disruptions that stem from other networks. In addition, end-system probing can be used both to diagnose routing events and to measure their performance impact. It offers a unique perspective on how routing events affect end-to-end network performance.
In this chapter, we consider the problem of diagnosing routing events for any given ISP based on end-system probing. Recognizing that identifying the root cause of routing events is intrinsically difficult, as illustrated by Teixeira and Rexford [119], we focus on finding explanations for routing events that the ISP should be held accountable for and can directly address, e.g., internal routing changes and peering session failures. In essence, we tackle a problem similar to the one specified by Wu [125], but without using the ISP's proprietary routing feeds. Given that end systems have no direct visibility into the routing state of an ISP, our system overcomes two key challenges: i) discovering, from end systems, the routing events that affect an ISP; and ii) inferring the cause of routing events based on observations from end systems. We present the details of our approach and its limitations in terms of coverage, probing granularity, and missed routing attributes in Section 4.3.
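The first challenge is discovering, from end systems, the routing events that affect an ISP. A deliberately simplified example can illustrate it. The sketch below compares two traceroutes toward the same destination and keeps only the hops that an assumed IP-to-AS mapping assigns to the ISP of interest. It then labels the change by whether the ingress, the egress, or only the internal hops differ. It is hypothetical and much coarser than the procedure in Section 4.4. In particular, it assumes an accurate IP-to-AS mapping and ignores router aliases and load balancing. Unresponsive hops (e.g., `"*"`) are simply dropped because they have no mapping.

```python
from typing import Dict, List


def isp_segment(hops: List[str], ip_to_asn: Dict[str, int], isp_asn: int) -> List[str]:
    """Router IPs on the traceroute that map to the ISP, in path order."""
    return [ip for ip in hops if ip_to_asn.get(ip) == isp_asn]


def classify_change(
    before: List[str], after: List[str], ip_to_asn: Dict[str, int], isp_asn: int
) -> str:
    """Hypothetical coarse label for how the ISP's part of a path changed."""
    seg_before = isp_segment(before, ip_to_asn, isp_asn)
    seg_after = isp_segment(after, ip_to_asn, isp_asn)
    if not seg_before and not seg_after:
        return "isp_not_traversed"
    if not seg_before or not seg_after:
        # The path started or stopped crossing the ISP.
        return "isp_entered_or_left"
    if seg_before == seg_after:
        return "no_change"
    # Checked in order: a different entry point, then a different exit point
    # (e.g., a hot-potato shift or a failed peering session), then internal hops.
    if seg_before[0] != seg_after[0]:
        return "ingress_change"
    if seg_before[-1] != seg_after[-1]:
        return "egress_change"
    return "internal_change"
```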
We have designed and implemented a system that diagnoses routing events based on end-system probing. It uses collaborative probing from end systems to identify and classify routing events that affect an ISP. It models the routing event correlation problem as a bipartite graph and searches for plausible explanations of these events using a greedy algorithm. Our algorithm is based on the intuition that routing events occurring close together are likely explained by only a few causes, which do not create many inconsistencies. We also use probing results to study the impact of routing events on end-to-end path latency.
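The greedy search over the bipartite graph can be pictured as a set-cover-style heuristic. In the sketch below, one side of the graph holds candidate causes and the other holds observed routing events. Each cause also carries a count of observations it is inconsistent with. The scoring rule (newly explained events minus a penalty per inconsistency) and all names are assumptions made for illustration. The algorithm actually used for root cause inference is the one discussed in Section 4.5.

```python
from typing import Dict, List, Set


def greedy_explain(
    explains: Dict[str, Set[str]],
    inconsistencies: Dict[str, int],
    events: Set[str],
    penalty: float = 1.0,
) -> List[str]:
    """Pick few causes that explain many events while creating few inconsistencies.

    `explains` holds the bipartite edges: candidate cause -> events it could explain.
    `inconsistencies` counts observations that contradict each cause (hypothetical).
    """
    uncovered = set(events)
    chosen: List[str] = []
    candidates = set(explains)
    while uncovered and candidates:

        def gain(cause: str) -> float:
            # Newly explained events, discounted by contradicting observations.
            return len(explains[cause] & uncovered) - penalty * inconsistencies.get(cause, 0)

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=gain)
        if gain(best) <= 0:
            break  # no remaining cause is worth its inconsistencies
        chosen.append(best)
        uncovered -= explains[best]
        candidates.discard(best)
    return chosen
```

Lowering `penalty` lets the search accept causes that explain many events despite some contradicting observations. Raising it favors fewer, cleaner explanations.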
We instantiate our system on PlanetLab and use it to diagnose routing events for five large ISPs over a period of more than three and a half months. Although each end host has only limited visibility into the routing state of these ISPs, our system is able to discover a large number of significant routing events during that period, e.g., hot-potato changes and peering session resets. We validate the accuracy of our inference results in two ways. Compared with an existing ISP-centric method, we are able to distinguish internal and external events with up to 91.2% accuracy. We also identify 4 out of 6 disruptions reported on the NANOG mailing lists [8].
We summarize our main contributions. Our work is the first to enable end systems to scalably and accurately diagnose the causes of routing events associated with large ISPs without requiring access to any proprietary data, such as real-time routing feeds from many routers inside an ISP. Unlike existing approaches to diagnosing routing events associated with ISPs, our approach of using end-system-based probing creates a more accurate view of the performance experienced along the data-plane forwarding path. Our work is a first step toward enabling diagnosis of routing disruptions on the global Internet.
The rest of this chapter is organized as follows. We describe the system architecture in Section 4.2, followed by a description of the collaborative probing in Section 4.3. Section 4.4 illustrates the procedure to identify individual routing changes. Then, in Section 4.5, we discuss the algorithm for root cause inference. The deployment results are shown in Section 4.6, with validation in Section 4.7.