Equations derived by fitting historical data have also been proposed. Perhaps the most extensive effort of this kind is made by MINTS, which has thoroughly analyzed the recorded activity of 100 major traffic sources around the world, including the largest IXPs [UoM2]. Incoming and outgoing traffic over several consecutive past years is characterized by a fitted trend, and the historical traffic aggregates are described by equations of the kind mentioned in the introductory chapter, more specifically of the following form:
$y = 10^{bx + d} \;\rightarrow\; y = 10^{d} \cdot 10^{bx}$ (7)
The London Internet Exchange (LINX), one of the largest IXPs, had an average traffic of 6.400e+10 bps between 4 February 2002 and 21 August 2009 and an annual growth rate of 1.5971 for the total traffic. It is characterized by the following equation (8), where x is the day and y the traffic in bps; its data and fitted curve are shown in figure 21 [UoM7]. All details of the data, relations and analyses of this extensive study can be found at the MINTS website.
$y = 10^{3.5082} \cdot 10^{0.0006x}$ (8)
Figure 21: Fitted traffic data for the LINX [UoM7]
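A fit of form (7) can be obtained by ordinary least squares on log10(y), since log10(y) = bx + d is linear in x. The following is only a minimal sketch under that assumption; it is not the MINTS procedure, whose fitting details are not given here. The function names are hypothetical, and the data are synthetic, generated from the printed coefficients of equation (8) rather than from LINX measurements.

```python
import math

def fit_log_linear(days, traffic_bps):
    """Least-squares fit of log10(y) = b*x + d, i.e. y = 10**d * 10**(b*x)."""
    logs = [math.log10(y) for y in traffic_bps]
    n = len(days)
    mean_x = sum(days) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(days, logs))
    b = sxy / sxx                 # slope: growth per day in log10 units
    d = mean_log - b * mean_x     # intercept: log10 of the traffic at x = 0
    return b, d

def annual_growth_factor(b, days_per_year=365):
    """Multiplicative growth over one year implied by the daily exponent b."""
    return 10 ** (b * days_per_year)

# Synthetic, noise-free data built from the coefficients printed in equation (8).
days = list(range(1, 1001))
traffic = [10 ** 3.5082 * 10 ** (0.0006 * x) for x in days]

b, d = fit_log_linear(days, traffic)
print(f"b = {b:.4f}, d = {d:.4f}, annual factor = {annual_growth_factor(b):.2f}")
```

With the exponent rounded to 0.0006 as printed, the implied annual factor is about 1.66. The reported rate of 1.5971 would correspond to b of roughly 0.00056, which also rounds to 0.0006, so the difference is consistent with rounding of the printed coefficient.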
Another study similar to MINTS, as referenced by Labovitz et al (2010), looks for an exponential fit of the following expression, where y is the traffic in bps and x is the day [Lab2010, p.84]:
$y = A \cdot 10^{Bx}, \quad x \in [1, 365]$ (9)
The fitted curve over 365 daily collected traffic traces from May 2008 to May 2009 can be seen in the following figure:
Figure 22: Curve fit over a year’s traffic by an anonymous provider [Lab2010, p.86]
The same study calculates the total traffic volume for May 2008 at 9 exabytes (EB), which matches Cisco’s estimates and is also compared with the MINTS figures [Lab2010, p.84-85] in the following table:
Table 9: Labovitz et al results for May 2008 (first column) [Lab2010, p.85]
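Because the fitted curves are expressed in bps while the totals above are monthly volumes, it helps to relate the two units. The sketch below converts between a mean rate and a monthly volume; it assumes a decimal exabyte (10^18 bytes) and a 31-day month for May, and the function names are hypothetical. It illustrates the unit conversion only and does not reproduce the calculation in [Lab2010].

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_EXABYTE = 10 ** 18  # assumption: decimal exabyte, not 2**60 bytes

def monthly_volume_eb(mean_rate_bps, days_in_month):
    """Total volume in EB carried at a constant mean rate over one month."""
    total_bits = mean_rate_bps * SECONDS_PER_DAY * days_in_month
    return total_bits / 8 / BYTES_PER_EXABYTE

def mean_rate_bps(volume_eb, days_in_month):
    """Mean rate in bps that would carry the given monthly volume."""
    total_bits = volume_eb * BYTES_PER_EXABYTE * 8
    return total_bits / (SECONDS_PER_DAY * days_in_month)

# Mean rate implied by 9 EB over the 31 days of May.
rate = mean_rate_bps(9, 31)
print(f"{rate / 1e12:.1f} Tbps")
```

Under these assumptions, 9 EB in a month corresponds to a mean rate of roughly 27 Tbps.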
Finally, further characterization studies, specifically for mobile network traffic and broken down by device type and application, are reported by Shafiq et al (2011). Aggregate and per-device traffic was observed for one week, during which diurnal characteristics are present [Sha2011, p.308], similar to the sinusoidal-like shape of weekly and day-to-day traffic mentioned in previous sections. Of greater importance here are the traffic volumes generated by three types of mobile devices over a number of consecutive years, for which a regression line is plotted through each type’s historical traffic [Sha2011, p.309], as shown in figure 23. The regression equation has the general form:
y(x) = a∙x + b (10)
Figure 23: Regression lines for mobile devices trend characterization [Sha2011, p.309]
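A regression line of form (10), together with one possible measure of how well it fits, can be sketched as follows. The coefficient of determination R² is used here purely as an illustrative goodness-of-fit measure; the cited study's own procedure and error metric are not stated in this text. The data values and function names are hypothetical and are not taken from [Sha2011].

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a and b in y(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical yearly traffic volumes for one device type (arbitrary units).
years = [1, 2, 3, 4, 5]
volume = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(years, volume)
r2 = r_squared(years, volume, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r2:.4f}")
```

Reporting a figure such as R² alongside each fitted line makes the amount of dispersion around the trend explicit.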
The equation in figure 23(b) characterizes the trend line quite accurately. However, most of the methods employed for long-term modeling and prediction show some level of dispersion in the characterization process that has not been clearly quantified, and this can lead to large fitting and even forecast errors. In addition, the more static methods described above do not appear to report the associated fitting error, for example for figures 21, 22 and 23, even though different techniques are employed in those studies. In this thesis, fitting errors are minimized to very low levels in order to produce mathematical formulas suitable for predicting traffic volumes precisely. The following chapter takes these issues into consideration and demonstrates an effective methodology for the long-term analysis and projection of network traffic.
CHAPTER 3 Methodology
“All is Number.
Number rules the Universe”
- Pythagoras
The proposed method for long term Internet traffic modelling and forecasting is presented in this chapter. Based on four distinct conditions, rigorous characterization of massive historical measurements can successfully indicate future figures using novel mathematical formulae.
3.1 Introduction
As outlined, a considerable part of the relevant research concentrates on popular statistical time series models, neural networks and analysis of the dynamics of collected traffic traces. Another important part turns to more static techniques and uses less dynamic assumptions, which makes it more relevant to this thesis’ methodology, albeit with a different approach. The material on which this investigation is based consists of actual historical data on Internet traffic volumes collected from various traffic sources, together with the evolution of the number of Internet users worldwide.
Furthermore, the methods used here aim to reveal how the values of the time series in this historical traffic are connected to each other. Certain connection properties have been observed over continuous chronological intervals; these can be represented with appropriate fitting curves which, in turn, can indicate the future growth of the corresponding traffic volumes. For most of the data there are hidden patterns, and these have been successfully detected in chronological order.
Subsequently, it has been observed that those patterns can be described with mathematical equations which have never been proposed before. In most of the cases reported in the core chapters, the proposed formulae (i) fit the values of the respective historical measurements closely and (ii) are expected to provide very good prediction results for the coming years, with an expected average prediction error far below 10%. In particular, where new traffic data have already been released, the methods proposed here have lower prediction errors than projections from other research bodies, averaging less than 5%.
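One common way to express an average prediction error as a percentage is the mean absolute percentage error (MAPE). The sketch below assumes that definition for illustration; the exact error measure used in this thesis is defined in the core chapters and may differ. The numbers and the function name are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Hypothetical released traffic figures and the corresponding projections.
actual = [100.0, 120.0, 150.0]
predicted = [96.0, 126.0, 144.0]

mape = mean_absolute_percentage_error(actual, predicted)
print(f"average prediction error: {mape:.2f}%")
```

With a measure of this kind, thresholds such as "below 10%" or "less than 5%" can be checked directly once new measurements become available.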