Capítulo 5. Implementación básica BGP
5.6 Propagación de redes en BGP
The impact of the monitor to the TCP throughput was measured using the Iperf3 tool. For one test run the tool was run two times for 60 seconds, once with the monitor and once without. To reduce noise the results were averaged with a total of 801 runs of the same test. The averaged data for the throughput is shown in the table 5.1. The benchmarks show roughly a 10% lower TCP throughput with the monitor than without. The monitor has three parts which may affect the throughput. The HTB queuing disci- pline attached to the egress, the bpf filter itself, and the user space monitor program. The impact of the user space program should be minimal as it
is sleeping in 1 second interval during the test. The overhead of using the queuing disciplines which support filters at all amount to a quarter of the penalty in the throughput. The other three quarters can be attributed to the eBPF filter itself.
Untraced Qdiscs only Traced Average throughput 28.9 GBit/s 28.2 GBit/s 25.9 GBit/s
Standard deviation 0.26 GBit/s 0.31 GBit/s 0.76 GBit/s
Percent of untraced 100% 97.7% 89.7%
Table 5.1: Throughput test summary
5.1.2
HTTP
The results for the benchmark with HTTP traffic payload yield information about the impact of the monitor to the HTTP-request throughput and delay. Due to high amount jitter in the results, the values presented in this section are medians of the corresponding metrics. The median is taken from a total of 801 runs. As an example, the mean throughput is therefore the median of the means calculated for each run.
The HTTP throughput is plotted as a function of the error request percentage in the Figure 5.1. The presumption for the results is that the throughput of non-monitored and qdisc only versions is not affected at all by the percentage of the error request. The presumption for the monitored ver- sion is that the throughput decreases as the percentage of the error request increases. The basis for this presumption is that mirroring of the request has non-zero overhead. That overhead should increase as the percentage of error request and therefore mirrored data increases.
As per the presumptions, the HTTP throughput of the monitored traf- fic load indeed dropped in seemingly linear fashion from 97% of the non- monitored throughput to 89%. The results from raw TCP throughput bench- mark would suggest that the benchmark with only the queuing disciplines in place should result in decreased throughput when compared to the non- monitored benchmark. however, this was not observed.
Percentile 50 90 99 99.9 99.99 99.999 99.9999
untraced 0.154 0.200 18 32 46 57 66
tc only 0.156 0.207 19 33 47 57 207
traced 0.164 0.226 19 33 48 57 207
Table 5.2: HTTP request time (ms) percentiles with 0% errors The request percentiles in the tables 5.2 and 5.3 show that all meaningful differences in the request times happen before the 99.9th percentile regardless of the request error rate. The difference between the 99.9th percentiles with 100% error rate is at most 3%. The smaller percentiles are however affected by the monitoring and the error rate. Without any errors the median request time increases by 6% when the monitoring is introduced. With errors the same increase is 13%. The 99.9999 percentiles show a significant increase in delay for the tc only and traced benchmarks. This seems to be connected to the HTB queuing discipline used. However, the exact reason is unknown.
Percentile 50 90 99 99.9 99.99 99.999 99.9999
untraced 0.154 0.200 18 32 47 57 65
tc only 0.156 0.207 19 33 48 57 207
traced 0.174 0.244 20 33 47 57 206
Table 5.3: HTTP request time (ms) percentiles with 100% errors
5.2
Multi-host results
The Multi-host benchmarks were conducted in four different configurations. The ”traced” and ”untraced” configurations match their counterparts de- scribed in the single host configuration. The ”funnel” represents configu- ration where the Metrofunnel was used for network monitoring instead of the eBPF monitor. In the fourth configuration the eBPF monitor used is identical to the one used in the traced configuration. However instead of mirroring, the filter will just pass the packets to the network stack without further action. This configuration is referred in tables and figures as ”pass”.
5.2.1
TCP throughput
The results for TCP throughput are presented in the table 5.4. The results presented are averaged between 70 runs of the Iperf3 benchmark. Due to the
CHAPTER 5. EVALUATION 35
fact that a single core used for generating the traffic is able to saturate the 10G NIC, the results are uneventful. When the computing is transformed from CPU bound to network bound, the overhead of monitoring vanishes completely.
Average throughput Standard deviation
Untraced 8.768 GBit/s 0.003 GBit/s
Traced 8.768 GBit/s 0.003 GBit/s
Table 5.4: Multi-host TCP throughput test summary
5.2.2
HTTP
The results of the HTTP throughput are presented in the figure 5.4.The HTTP throughput measured with the monitor with drops to 30% of the non- monitored reference without mirroring enabled. 100% mirroring causes the throughput drop even further to 20% of the reference. This is a drastic drop when compared to the 10% overhead measured in single host benchmark.
Percentile 50 90 99 99.9 99.99 99.999 99.9999
untraced 0.262 0.378 0.578 1 14 23 36
funnel 0.304 0.527 0.930 19 31 45 61
pass 0.877 1 2 4 5 6 13
traced 0.886 1 3 4 4 6 12
Table 5.5: HTTP request time (ms) percentiles with 0% errors The with the overhead of mirroring is visible in the delays as well. The request delays for the 99.9 - 99.999 percentiles are greater for the mirrored requests than the monitored requests without mirroring. However, the ad- ditional overhead from mirroring is small when compared to the overhead of the monitoring itself.
Percentile 50 90 99 99.9 99.99 99.999 99.9999
untraced 0.266 0.374 0.562 0.977 15 23 36
funnel 0.300 0.530 0.971 18 30 45 61
pass 0.897 1 3 4 4 6 14
traced 1 2 4 7 9 10 15
Table 5.6: HTTP request time (ms) percentiles with 100% errors
5.3
Experimentation
Due to the large performance degradation caused by the eBPF monitor with the multi-host setup, additional experimentation was done identify the roots of the sudden increase in the overhead. The aim for these measurements were to identify the relationship of between the monitoring overhead and used CPUs.
The experimentation used exactly the same traffic loads and monitor, however the amount of CPU cores available to the traffic generators was varied. The results for these measurements are shown in the Figure 5.8 and 5.9. The results are a median from 30 independent runs of the same test.
Chapter 6
Discussion
This chapter discusses the relationship of the Linux kernel based microservice monitoring and existing solutions for microservice monitoring. Furthermore, problems regarding the kernel based monitoring, their solutions and future tidings are discussed.
6.1
Performance
The results for performance overhead of the implemented eBPF monitor are mixed at best. The performance impact of at most 10% for throughput and negligible impact on delay are promising. However, the multi-host test setup introduced a grave performance degradation to the HTTP benchmark. The throughput was down to the 30% and the impact on delay significantly larger than with the single host setup. The characteristics of the results suggests that the eBPF monitor hit a hard performance bottleneck when 12 cores or more were in use. This effectively prevented any additional processors or HTTP client from positively contributing to the HTTP throughput. The non-monitored setup however did not suffer from this bottleneck.
Even though the underlying reason for the bottleneck was not identified, just the presence of the bottleneck has implications regarding the use case of a containerized monitor. An integral part of the use case is that the containerized monitor should work predictably and the same way regardless of the host environment. It is clear, that using traffic control and eBPF for monitoring can have unpredictable host specific side effects. These side effects could likely be mitigated by careful configuration of the host. This host specific configuration however is again in direct contradiction of the use case of a host independent monitoring container.
As a summary for the performance, the eBPF monitor has very low over-
head in the best-case scenario. However reliably achieving the best-case scenario in the context of a containerized monitor might not currently be possible without host specific configuration.