· · · (12)
· · · (13)
· ·
· (14)
· · (15)
3.5 Extrapolation and Weight Coefficients
In general, interpolation is more accurate than extrapolation. Extrapolation relies on speculation about the effect of the parameter being estimated beyond the conditions tested. Interpolation, at minimum, bounds the effect, since the effect under intermediate conditions will fall between those of the two conditions tested. However, the method may use extrapolation, for example when predicting points on the plane but outside the triangle shown in Figure 7. As with other models, extrapolation should be used with caution. Trying to predict vehicle activity that is very dissimilar to the baseline cycles used to create the model may produce unrepresentative results. Therefore, it is desirable to use baseline cycles that cover a wide envelope of vehicle activity to avoid extrapolations, or at least extrapolations far outside the bounds defined by the data.
Negative weights will result if the cycle being predicted is outside the region defined by the baseline cycles. Equations 16 and 17 show the set of equations to be solved for a simplified case where only two baseline cycles are available. Figure 8 shows a two‐dimensional illustration of the method. Baseline cycles 1 and 2 are used to generate the equation of a line. In this case the number of baseline cycles is two, so only one property (P in Equation 16) is used. The weight coefficient of baseline cycle 2 (w2) is illustrated as an arrow and has a value of 0 at point 1 and a value of 1 at point 2. If the cycle being predicted is between the two baseline cycles (point a in the figure), w2 will have a value between 0 and 1 and both weight coefficients are positive. On the other hand, if the predicted cycle is outside the region defined by the baseline cycles (point b in the figure), one of the weight coefficients (w2) will have a value greater than 1, making the other weight coefficient (w1) negative.
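\[
\begin{bmatrix} 1 & 1 \\ P_1 & P_2 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ P \end{bmatrix}
\tag{16}
\]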
(17)
Figure 8 Geometric interpretation of a two‐dimensional simplified model
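The two‐cycle case described above can be sketched numerically. This is a minimal illustration, not the author's implementation: it assumes the weights sum to one and reproduce the property of the predicted cycle, which is consistent with w2 being 0 at point 1 and 1 at point 2. The function name and the property values are hypothetical.

```python
def two_cycle_weights(p1, p2, p):
    """Weights (w1, w2) for two baseline cycles and one property.

    Assumes (not stated explicitly in the text) that
        w1 + w2 = 1  and  w1*p1 + w2*p2 = p,
    where p1, p2 are the baseline property values and p is the
    property value of the cycle being predicted.
    """
    if p1 == p2:
        raise ValueError("baseline cycles must differ in the property value")
    w2 = (p - p1) / (p2 - p1)   # 0 at baseline 1, 1 at baseline 2
    w1 = 1.0 - w2
    return w1, w2


# Hypothetical property values, chosen only for illustration.
P1, P2 = 10.0, 20.0

# Point a: between the two baseline cycles, so both weights are positive.
print(two_cycle_weights(P1, P2, 14.0))

# Point b: beyond baseline cycle 2, so w2 exceeds 1 and w1 turns negative.
print(two_cycle_weights(P1, P2, 25.0))
```

With these made-up numbers the second call lies outside the segment between the baselines, which is the situation labelled point b in Figure 8.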
Considering the three‐dimensional case (Equation 7), note that the solution of the system of three simultaneous equations does not depend on fuel consumption or emission rate values. The weights depend only on the properties (P1 and P2) of the baseline and unseen cycles. With this in mind, Figure 9 illustrates a two‐dimensional projection of the plane onto the property axes. Dotted lines mark the points where one of the weight coefficients has a value of zero. Six different regions of extrapolation occur; each is shown with the corresponding signs of the weight coefficients (w1, w2, w3) in parentheses. Any unseen cycle outside the triangular region determined by baseline cycles 1, 2, and 3 will cause either one or two of the weight coefficients to be negative.
Figure 9 Weight coefficient signs¹⁷
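The sign regions can be explored with a short sketch. It assumes the three weights sum to one and reproduce both properties (P1, P2) of the unseen cycle, so they behave like barycentric coordinates of the unseen cycle relative to the triangle of baseline cycles. The function names and the baseline property values are hypothetical, and the code uses only the properties, in line with the observation that the weights do not depend on fuel consumption or emission rates.

```python
def three_cycle_weights(baselines, unseen):
    """Weights (w1, w2, w3) from property values only.

    baselines: three (P1, P2) pairs, one per baseline cycle.
    unseen:    (P1, P2) pair of the cycle being predicted.
    Assumes the weights sum to 1 and reproduce both properties of the
    unseen cycle (barycentric coordinates in the property plane).
    """
    (x1, y1), (x2, y2), (x3, y3) = baselines
    x, y = unseen
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("baseline cycles are collinear in the property plane")
    w2 = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    w3 = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    w1 = 1.0 - w2 - w3
    return w1, w2, w3


def sign_pattern(weights, tol=1e-12):
    """Return the sign of each weight as '+', '-' or '0'."""
    return tuple("+" if w > tol else "-" if w < -tol else "0" for w in weights)


# Hypothetical baseline cycles placed at simple coordinates.
baselines = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

for unseen in [(0.25, 0.25), (1.0, 1.0), (-1.0, -1.0), (2.0, -0.5)]:
    weights = three_cycle_weights(baselines, unseen)
    print(unseen, sign_pattern(weights))
```

Only the first unseen point lies inside the triangle; the other three fall in different extrapolation regions, with one or two negative weights each.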
By inspection of Figure 9, one can find regions outside the triangle where the unseen cycle properties are still bounded by the property values of the baseline cycles (think of a rectangular box in which the triangle is inscribed). Going from a 1‐dimensional space to a 2‐dimensional space, the interpolation region is therefore smaller than might be expected. In general, adding dimensions to the model (i.e. adding another baseline cycle and property) makes the interpolation region smaller relative to the whole space. Figure 10 shows the interpolation regions in 2‐dimensional space and 3‐dimensional space. The ratio between the areas of the triangle and the circle is 0.4135, as calculated in Equation 18¹⁸. The ratio between the volumes of the tetrahedron and the sphere is 0.1225, as calculated in Equation 19¹⁹. For the 1‐dimensional case, if the baseline cycles are at the extreme boundaries of the space, all the space corresponds to an interpolation region and the ratio would be equal to 1. Adding dimensions to the model therefore reduces the region of interpolation with respect to the whole space. This fact can explain the occurrence of diminishing returns in accuracy when trying to add more than three baseline cycles to the model.
17 Signs within the parentheses represent the sign of each weight (w1, w2, w3).
18 The side of an equilateral triangle inscribed in a circle of radius r=1 is a=3/√3.
19 The side of a regular tetrahedron inscribed in a sphere of radius r=1 is a=4/√6.
Figure 10 Interpolation regions in two and three dimensions²⁰
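\[
\frac{A_{\text{triangle}}}{A_{\text{circle}}}
= \frac{\frac{\sqrt{3}}{4}\,a^{2}}{\pi r^{2}}
= \frac{3\sqrt{3}}{4\pi}
\approx 0.4135
\tag{18}
\]
\[
\frac{V_{\text{tetrahedron}}}{V_{\text{sphere}}}
= \frac{\frac{a^{3}}{6\sqrt{2}}}{\frac{4}{3}\pi r^{3}}
= \frac{2\sqrt{3}}{9\pi}
\approx 0.1225
\tag{19}
\]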
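The two ratios can be checked numerically. The sketch below uses the standard area and volume formulas for an equilateral triangle and a regular tetrahedron inscribed in a circle and a sphere of radius r; the function names are hypothetical and serve only as an arithmetic check.

```python
import math


def triangle_to_circle_ratio(r=1.0):
    """Area of an inscribed equilateral triangle over the area of the circle."""
    a = math.sqrt(3.0) * r                       # triangle side for radius r
    area_triangle = math.sqrt(3.0) / 4.0 * a ** 2
    return area_triangle / (math.pi * r ** 2)


def tetrahedron_to_sphere_ratio(r=1.0):
    """Volume of an inscribed regular tetrahedron over the volume of the sphere."""
    a = 4.0 / math.sqrt(6.0) * r                 # tetrahedron side for radius r
    volume_tetrahedron = a ** 3 / (6.0 * math.sqrt(2.0))
    return volume_tetrahedron / (4.0 / 3.0 * math.pi * r ** 3)


print(round(triangle_to_circle_ratio(), 4))      # 0.4135
print(round(tetrahedron_to_sphere_ratio(), 4))   # 0.1225
```

Both ratios are independent of r, so the choice r = 1 only simplifies the arithmetic.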
It is expected for the model to experience a loss of accuracy due to extrapolation. The magnitude of that loss will be higher if the predicted cycle is farther from the interpolation region. A good metric to quantify the magnitude of extrapolation (i.e. distance from the triangle or tetrahedron) is the absolute value of the highest weight coefficient. If no extrapolation occurs, weight coefficients’ absolute values will be less than one. If extrapolation occurs, one or more weight coefficient absolute values will be greater than one. The longer the distance to the interpolation region, the larger the absolute value of the weight coefficients. An extrapolation parameter could be defined as shown in Equation 20.
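\[
\text{Extrapolation parameter} =
\begin{cases}
0, & |w|_{\max} \le 1 \\
|w|_{\max} - 1, & |w|_{\max} > 1
\end{cases}
\tag{20}
\]
where $|w|_{\max}$ is the largest absolute value among the weight coefficients.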
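One reading of the extrapolation parameter described above is sketched here: it is zero when the largest absolute weight does not exceed one, and otherwise equals the amount by which that weight exceeds one. This is an interpretation of the text rather than a confirmed transcription of Equation 20, and the function name and sample weights are hypothetical.

```python
def extrapolation_parameter(weights):
    """Excess of the largest absolute weight coefficient over one.

    Assumed form: 0 if max|w| <= 1, else max|w| - 1.
    """
    w_max = max(abs(w) for w in weights)
    return max(0.0, w_max - 1.0)


# Hypothetical weight sets, each summing to one.
print(extrapolation_parameter((0.6, 0.4)))          # no weight exceeds one
print(extrapolation_parameter((-0.5, 1.5)))         # mild extrapolation
print(extrapolation_parameter((-0.5, 2.0, -0.5)))   # heavier extrapolation
```

Under this reading the parameter grows with the distance from the interpolation region, so it can be placed on the horizontal axis of an error plot.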
Figure 11 shows a scatter plot of the absolute percentage error versus the extrapolation parameter for the prediction of CO2 mass rate for a representative case²¹. Although the coefficient of determination shown (R² = 0.54) is not conclusive, the general trend observed for this and many other models is that attempting heavier extrapolations (higher extrapolation parameter values) caused larger prediction errors.
20 Source: http://www.math.ubc.ca/~cass/courses/m308‐03b/projects‐03b/wagner/Webpage.htm
21 Predictions using average road load power and aerodynamic speed as properties and all possible combinations of cycles available for bus 41 described in Section 5.1.
Figure 11 Relationship between extrapolation and absolute error