3/3 Local regression smoothing

We wish to extend the idea of moving averages to “moving lines.”

That is, instead of taking the average of the points, we may fit a straight line through these points and estimate the trend-cycle that way.

In Chapter 2, the least squares method of fitting a straight line was discussed. Recall that a straight trend line is represented by the equation T_t = a + bt. The two parameters, a and b, represent the intercept and slope respectively. The values of a and b can be found by minimizing the sum of squared errors, where the errors are the differences between the data values of the time series and the corresponding trend line values. That is, a and b are the values that minimize the sum of squares

$$\sum_{t=1}^{n} \left(Y_t - a - bt\right)^2.$$
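The least squares line can be computed directly from the standard closed-form solution. The sketch below is an illustration added here, not code from the original text; the function name `fit_line` is hypothetical, and time is indexed t = 1, ..., n as in the sum above.

```python
def fit_line(y):
    """Least squares straight line T_t = a + b*t, with t = 1, ..., n.

    Returns (a, b) minimizing the sum over t of (y[t-1] - a - b*t)**2.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(1, n + 1)
    t_bar = sum(ts) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-products of deviations over sum of squared t deviations.
    sxy = sum((t - t_bar) * (yt - y_bar) for t, yt in zip(ts, y))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sxy / sxx
    # The fitted line passes through the point of means (t_bar, y_bar).
    a = y_bar - b * t_bar
    return a, b


if __name__ == "__main__":
    print(fit_line([3, 5, 7, 9]))  # (1.0, 2.0): the data lie exactly on 1 + 2t
```

Because the line always passes through the point of means, only the slope needs a ratio of sums; the intercept follows from it.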

A straight-line trend is sometimes appropriate, but for many time series a curved trend is more suitable. For example, the shampoo data plotted in Figure 3-3 do not follow a straight line.

Local regression is a way of fitting a much more flexible trend-cycle curve to the data. Instead of fitting a straight line to the entire data set, we fit a series of straight lines to sections of the data.

The estimated trend-cycle at time t is T_t = a + bt, where a and b are chosen to minimize the weighted sum of squares

$$\sum_{j=-m}^{m} a_j \left(Y_{t+j} - a - b(t+j)\right)^2. \tag{3.15}$$

Note that there are different values of a and b for every value of t. In effect, a different straight line is fitted at each observation.
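Minimizing (3.15) has a closed-form solution. The following short derivation is an addition, not part of the original text. It rewrites the line in terms of the offset j from time t, writing c = a + bt for the fitted value at time t, so that

$$\sum_{j=-m}^{m} a_j \left(Y_{t+j} - a - b(t+j)\right)^2 = \sum_{j=-m}^{m} a_j \left(Y_{t+j} - c - bj\right)^2, \qquad c = a + bt.$$

Setting the partial derivatives with respect to c and b to zero gives the normal equations

$$\sum_{j=-m}^{m} a_j \left(Y_{t+j} - c - bj\right) = 0, \qquad \sum_{j=-m}^{m} a_j\, j \left(Y_{t+j} - c - bj\right) = 0.$$

With the weighted means $\bar{j} = \sum_j a_j j / \sum_j a_j$ and $\bar{Y} = \sum_j a_j Y_{t+j} / \sum_j a_j$, their solution is

$$\hat{b} = \frac{\sum_{j=-m}^{m} a_j (j - \bar{j})(Y_{t+j} - \bar{Y})}{\sum_{j=-m}^{m} a_j (j - \bar{j})^2}, \qquad \hat{T}_t = \hat{c} = \bar{Y} - \hat{b}\,\bar{j}.$$

In the interior of the series, with weights symmetric about j = 0, $\bar{j} = 0$ and the estimate reduces to the weighted average $\bar{Y}$, just as for a weighted moving average. At the ends the sums run only over the available observations, the window is one-sided, $\bar{j} \neq 0$, and the slope term adjusts the estimate; this offers one way to see where the reduced end bias comes from.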

The calculation of the trend-cycle at month 22 is shown in Figure 3-7.

The steps involved are as follows.

Step 1 The number of points to be used in the weighted regression was chosen to be 19. The shaded area, centered on month 22, shows the 19 points to be used, nine on either side of month 22.

Step 2 The observations are assigned weights using the weight function shown in the upper right panel. This is exactly the same weight function as (3.14), which was used in Figure 3-6. The function has a maximum at month 22; the months closest to month 22 receive the largest weights and months farther away receive smaller weights. The weights become zero at the boundaries of the shaded region. Months outside the shaded region receive zero weights, so they are excluded from the calculation.

Figure 3-7: The steps involved in calculating a local linear regression at month 22. (Panels: Step 1, Step 2, Step 3 and Result. Each panel plots total sales (liters) against month, except Step 2, which plots the weight function against month.)


Step 3 A line is fitted to the data using weighted least squares with the values of a and b chosen to minimize (3.15). The fit is shown in the lower left panel. The weights determine the influence each observation has on the fitting of the line. The estimate of trend-cycle for month 22 is shown by the filled circle.

The same calculations are carried out for each observation. The resulting trend-cycle estimates are joined together to form the line shown in the lower right panel. At the ends of the data, fewer observations are used in computing the fit.
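The three steps, repeated at every observation, can be sketched in Python as below. This is an illustrative sketch, not the book's code or the Loess implementation. The weight function (3.14) is not reproduced in this excerpt, so the tricube shape used here, the one commonly associated with Loess, is an assumption, and the names `tricube` and `local_linear_smooth` are hypothetical. Rather than solving for a and b in absolute time, the code fits the line in the offset j from t, so T_t is simply the fitted value at j = 0; this describes the same line.

```python
def tricube(j, m):
    """Hypothetical stand-in for the weight function Q(j, m) of (3.14).

    Largest at j = 0, decreasing as |j| grows, and zero for |j| > m.
    """
    u = abs(j) / (m + 1)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_smooth(y, k):
    """Trend-cycle estimate at every observation by local linear regression.

    k = 2m + 1 is the number of points in each window. Near the ends of the
    series (or when k exceeds len(y)) only the available observations are
    used, and their weights are rescaled to sum to one.
    """
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")
    m = (k - 1) // 2
    n = len(y)
    trend = []
    for t in range(n):
        # Step 1: offsets j of the observations inside both the window and the data.
        js = [j for j in range(-m, m + 1) if 0 <= t + j < n]
        # Step 2: weights from the (assumed) weight function, scaled to sum to one.
        raw = [tricube(j, m) for j in js]
        total = sum(raw)
        w = [r / total for r in raw]
        ys = [y[t + j] for j in js]
        # Step 3: weighted least squares line in the offset j.
        j_bar = sum(wi * j for wi, j in zip(w, js))
        y_bar = sum(wi * yj for wi, yj in zip(w, ys))
        sjj = sum(wi * (j - j_bar) ** 2 for wi, j in zip(w, js))
        sjy = sum(wi * (j - j_bar) * (yj - y_bar)
                  for wi, j, yj in zip(w, js, ys))
        b = sjy / sjj if sjj > 0 else 0.0  # slope; 0 if only one point
        # The fitted line evaluated at j = 0 (that is, at time t) is T_t.
        trend.append(y_bar - b * j_bar)
    return trend


if __name__ == "__main__":
    line = [2.0 + 3.0 * t for t in range(10)]
    smooth = local_linear_smooth(line, 7)
    # A perfectly straight series is reproduced, ends included.
    print(all(abs(s - v) < 1e-9 for s, v in zip(smooth, line)))  # True
```

Checking that an exactly linear series comes back unchanged, including at both ends, is a quick sanity test: a straight line fits such data perfectly in every window, however the window is truncated.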

Because a straight line is fitted to the data to estimate the trend-cycle, we do not have the same problem of bias at the end of the series that occurred with the moving average smoothers. This is the chief advantage of using local linear regression: it has smaller bias at the ends and in areas where there is strong cyclic behavior.

One parameter must be selected before fitting a local regression: the “smoothing parameter” k, the number of points used in each local fit (k = 2m + 1). The smoothing parameter is analogous to the order of a moving average; the larger the parameter, the smoother the resulting curve. This is illustrated in Figure 3-8, which shows three local regressions fitted to the shampoo sales data. In the top panel, k was set to 49 (or m = 24). Note that this is greater than the number of observations in the series. In this case, the weights are calculated in the same way as at the ends of the series: the weights corresponding to available data are simply set to Q(j, m) and scaled so that they sum to one. The fitted trend-cycle is too straight because k is too large. The second panel shows the trend-cycle calculated with k = 19 (or m = 9), as in Figure 3-7. In the bottom panel, k was set to 7 (or m = 3). Here the estimated trend-cycle is too rough; the local wiggles follow the randomness in the data rather than the underlying trend-cycle. The goal in choosing k is to produce a trend-cycle that is as smooth as possible without distorting the underlying pattern in the data. In this example, k = 19 is a good choice that follows the trend-cycle without undue wiggles.
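The relationship between k and m, and the rescaling of weights when the window extends past the available data, can be sketched as follows. The helper names are hypothetical, the tricube weight standing in for Q(j, m) is an assumption (the excerpt does not reproduce (3.14)), and the length n = 36 in the usage lines is an arbitrary illustrative value.

```python
def m_from_k(k):
    """Half-width m of a window of k = 2m + 1 points (k odd)."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    return (k - 1) // 2


def tricube(j, m):
    """Hypothetical stand-in for Q(j, m): tricube weight, zero for |j| > m."""
    u = abs(j) / (m + 1)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def window_weights(t, n, k, q=tricube):
    """Weights for the observations around index t (0-based) in a series of length n.

    Offsets j with t + j outside 0..n-1 are dropped and the remaining q(j, m)
    values are rescaled to sum to one. When k exceeds n, every observation
    falls inside the window, just as at the ends of the series.
    """
    m = m_from_k(k)
    js = [j for j in range(-m, m + 1) if 0 <= t + j < n]
    raw = [q(j, m) for j in js]
    total = sum(raw)
    return {j: r / total for j, r in zip(js, raw)}


if __name__ == "__main__":
    print([m_from_k(k) for k in (49, 19, 7)])  # [24, 9, 3]
    print(len(window_weights(17, 36, 49)))     # 36: the whole series is used
```

With k = 49 and only 36 observations, every point receives a positive weight, so each local line is fitted to nearly the whole series; this is consistent with the overly straight curve in the top panel of Figure 3-8.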

3/3/1 Loess

“Loess” is an implementation of local linear smoothing, developed by Bill Cleveland and coworkers at AT&T Bell Laboratories. It is described in Cleveland and Devlin (1988) and Cleveland, Devlin, and Grosse (1988). It is widely used and is available in several software packages.

Figure 3-8: Three local regression curves with different values of the smoothing parameter. From the top panel, the values of k are 49, 19, and 7. (Each panel plots total sales (liters) against month.)

The heart of Loess is local linear smoothing, but with some protection against extreme observations or outliers. An initial local regression is calculated as described in Figure 3-7. Then the irregular component is calculated using

$$\hat{E}_t = Y_t - \hat{T}_t.$$

These are simply the differences between each observation Y_t and the
