
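2.1.1 Empirical Estimator for Low Income Proportion

The low income proportion is defined as

$$\theta_{\alpha\beta} = P(X \le \alpha\xi_\beta) = F(\alpha\xi_\beta), \tag{2.1}$$

where $\xi_\beta = F^{-1}(\beta)$ is the $\beta$-th quantile of the income distribution $F$.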

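Let $X_1, X_2, \ldots, X_n$ be a simple random sample drawn from the population $X$ with cumulative distribution function $F(x)$, and let $F_n(x)$ denote the empirical distribution function of the sample. A natural empirical estimator for $\theta_{\alpha\beta}$ is $F_n(\alpha\xi_\beta)$. Since $F_n(\alpha\xi_\beta)$ depends on the $\alpha$-th fraction of the $\beta$-th quantile of the unknown population, $\xi_\beta$ must itself be estimated, and we use the $\beta$-th sample quantile $\hat{\xi}_\beta$ obtained from $F_n(x)$. Then an empirical estimate for $\theta_{\alpha\beta}$ can be defined as

$$\hat{\theta}_{\alpha\beta} = F_n(\alpha\hat{\xi}_\beta) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le \alpha\hat{\xi}_\beta).$$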
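As a minimal sketch of the empirical estimate above, the following Python snippet computes the share of observations at or below $\alpha$ times the sample $\beta$-quantile. The function name `empirical_lip` and the toy income values are hypothetical, and NumPy's default (linearly interpolated) sample quantile is an assumption that may differ slightly from $F_n^{-1}(\beta)$.

```python
import numpy as np

def empirical_lip(x, alpha, beta):
    """Empirical low income proportion F_n(alpha * xi_hat_beta).

    Hypothetical helper: xi_hat_beta is taken as NumPy's default
    (linearly interpolated) sample quantile, which is an assumption.
    """
    x = np.asarray(x, dtype=float)
    xi_hat = np.quantile(x, beta)               # sample beta-quantile
    return float(np.mean(x <= alpha * xi_hat))  # (1/n) * sum of indicators

# Toy data (illustrative values only, not from the study)
incomes = np.array([12.0, 18.0, 25.0, 31.0, 40.0, 55.0, 70.0, 120.0])
print(empirical_lip(incomes, alpha=0.5, beta=0.5))
```

With $\alpha = 0.5$ and $\beta = 0.5$ this is the proportion of observations falling below half of the sample median.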

2.1.2 A Kernel Estimator for a Low Income Proportion

The empirical estimator $\hat{\theta}_{\alpha\beta}$ is not smooth, whereas in many applications $\theta_{\alpha\beta}$ is a smooth function. We therefore use kernel methods to develop a smoothed estimator for $\theta_{\alpha\beta}$. An extensive literature documents the advantages of kernel estimation. Falk (1983, 1985) concluded that, for a distribution function $F(x)$ or its quantile function $F^{-1}(x)$, the corresponding kernel-based estimators asymptotically dominate the empirical estimators. Kernel estimators are also widely applied. Lloyd and Yong (1999) proved that the kernel estimator for the ROC curve outperforms the empirical estimator in the sense of a smaller mean squared error (MSE), although the difference between the two diminishes as the sample size increases. Hsieh and Turnbull (1966) showed that, for the Youden index, the kernel-based estimator is superior to the empirical estimator in that its MSE is asymptotically smaller by $O(h/n)$. In this section, we propose a kernel estimator for the low income proportion and develop confidence intervals based on bootstrap and jackknife empirical likelihood methods.

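The kernel function is defined as $K(x) = \int_{-\infty}^{x} \omega(y)\,dy$, where $\omega$ is a density function. Replacing the indicator function $I(\alpha\xi_\beta - X_i \ge 0)$ with the kernel function $K\big((\alpha\hat{\xi}_\beta - X_i)/h\big)$, where $h > 0$ is the bandwidth, we construct a kernel estimator of the low income proportion $\theta_{\alpha\beta}$ as

$$\hat{T}_n(\alpha,\beta) = \frac{1}{n}\sum_{i=1}^{n} K\!\left(\frac{\alpha\hat{\xi}_\beta - X_i}{h}\right). \tag{2.2}$$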
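The kernel estimator in (2.2) can be sketched in a few lines of Python. The text does not fix a particular $\omega$, so the Epanechnikov density $\omega(y) = 0.75(1 - y^2)$ on $[-1, 1]$ is assumed here purely for illustration (it has bounded support and a bounded derivative). The function names, the simulated exponential sample, and the bandwidth value are likewise hypothetical.

```python
import numpy as np

def epanechnikov_cdf(u):
    """K(u): integral of the Epanechnikov density from -infinity to u.

    Assumed kernel; K is 0 below -1, 1 above 1, and
    0.5 + 0.75*u - 0.25*u**3 in between.
    """
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def kernel_lip(x, alpha, beta, h):
    """Smoothed low income proportion T_hat_n(alpha, beta) with bandwidth h."""
    x = np.asarray(x, dtype=float)
    xi_hat = np.quantile(x, beta)  # sample beta-quantile (NumPy default, an assumption)
    return float(np.mean(epanechnikov_cdf((alpha * xi_hat - x) / h)))

# Simulated incomes from Exp(1); for this model theta = 1 - (1 - beta)**alpha
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=2000)
print(kernel_lip(sample, alpha=0.5, beta=0.5, h=0.1))  # h = 0.1 is an arbitrary choice
```

As $h$ shrinks toward zero, $K((\alpha\hat{\xi}_\beta - X_i)/h)$ approaches the indicator $I(X_i \le \alpha\hat{\xi}_\beta)$, so the smoothed estimate approaches the empirical one.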

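The kernel estimator $\hat{T}_n(\alpha,\beta)$ is asymptotically normal, as shown in Theorem 2.1.

Theorem 2.1. Assume that the density function $\omega$ of the kernel function $K$ has bounded support, that its first derivative $\omega'$ exists and is bounded on this support, and that $\int_{-\infty}^{\infty} |\omega'(y)|\,dy < \infty$. If $h = h(n) \to 0$ and $\sqrt{n}\,h \to \infty$ as $n \to \infty$, then

$$\sqrt{n}\,\{\hat{T}_n(\alpha,\beta) - \theta_{\alpha\beta}\} \xrightarrow{d} N(0, \sigma^2_{\alpha\beta}),$$

where

$$\sigma^2_{\alpha\beta} = \frac{\alpha^2\beta(1-\beta)f^2(\alpha\xi_\beta)}{f^2(\xi_\beta)} + \theta_{\alpha\beta}(1-\theta_{\alpha\beta}),$$

and $f(x)$ is the density function of the income distribution $F(x)$.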

2.1.3 Bandwidth Selection for the Kernel Estimator by the Cross-Validation Method

One of the difficulties in computing the smoothed estimator $\hat{T}_n(\alpha,\beta)$ is choosing the bandwidth $h$. Extensive simulation studies have shown that the choice of kernel function has little effect on kernel density estimates. As in many kernel methods, however, the choice of the bandwidth $h$ can strongly influence the performance of the proposed kernel estimator.

Many methods have been proposed for selecting the bandwidth of kernel estimators. In our study, we propose a cross-validation (CV) method for bandwidth selection. To ease implementation, we use 2-fold cross-validation. Based on our simulation analysis, we suggest a bandwidth of the form $h = c\,n^{-1/3}$, so that the choice of $h$ is controlled by the constant $c$. Here and hereafter, we write $\hat{T}_{n,c}(\alpha,\beta) = \hat{T}_n(\alpha,\beta)$ to make the dependence on $c$ explicit.

For a given $\beta$, we select $c$ by minimizing the mean squared error (MSE)

$$\mathrm{MSE}(c) = E\big[\hat{T}_{n,c}(\alpha,\beta) - \theta_{\alpha\beta}\big]^2.$$

For this purpose, we randomly split the sample into two parts: the first part is treated as the training sample and the second as the validation sample. The kernel estimator for the low income proportion, $\hat{T}^{(1)}_{n,c}(\alpha,\beta)$, is constructed from the training sample, while the empirical estimator $\hat{\theta}^{(2)}_{\alpha\beta}$ is constructed from the validation sample. Repeating this random split many times yields the following cross-validation estimate of the MSE:

$$\mathrm{CV}_c = \frac{1}{L}\sum_{l=1}^{L}\big[\hat{T}^{(1,l)}_{n,c}(\alpha,\beta) - \hat{\theta}^{(2,l)}_{\alpha\beta}\big]^2,$$

where $L$ is the number of random splits. The value of $c$ is then chosen as the constant that minimizes $\mathrm{CV}_c$.
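The repeated random-split procedure can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the Epanechnikov kernel, the function names, the candidate grid for $c$, and the simulated data are all hypothetical, and $n$ in $h = c\,n^{-1/3}$ is taken to be the training-sample size.

```python
import numpy as np

def epanechnikov_cdf(u):
    # Assumed kernel K: integrated Epanechnikov density on [-1, 1]
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def kernel_lip(x, alpha, beta, h):
    # Smoothed estimator T_hat_n(alpha, beta)
    return float(np.mean(epanechnikov_cdf((alpha * np.quantile(x, beta) - x) / h)))

def empirical_lip(x, alpha, beta):
    # Empirical estimator theta_hat_{alpha beta}
    return float(np.mean(x <= alpha * np.quantile(x, beta)))

def cv_score(x, alpha, beta, c, n_splits=200, seed=0):
    """CV_c: average squared gap between the training-half kernel estimate
    and the validation-half empirical estimate over L = n_splits splits."""
    rng = np.random.default_rng(seed)  # same seed => same splits for every c
    x = np.asarray(x, dtype=float)
    half = len(x) // 2
    errors = np.empty(n_splits)
    for l in range(n_splits):
        perm = rng.permutation(len(x))
        train, valid = x[perm[:half]], x[perm[half:]]
        h = c * len(train) ** (-1.0 / 3.0)  # assumption: n = training size
        errors[l] = (kernel_lip(train, alpha, beta, h)
                     - empirical_lip(valid, alpha, beta)) ** 2
    return float(errors.mean())

def select_c(x, alpha, beta, c_grid, **kwargs):
    """Return the grid value of c with the smallest CV_c, plus all scores."""
    scores = [cv_score(x, alpha, beta, c, **kwargs) for c in c_grid]
    return c_grid[int(np.argmin(scores))], scores

rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=400)  # simulated incomes, illustrative
best_c, scores = select_c(data, 0.5, 0.5, [0.25, 0.5, 1.0, 2.0, 4.0], n_splits=50)
print(best_c, scores)
```

Reusing the same seed for every candidate $c$ means all candidates are compared on identical splits, which removes split-to-split noise from the comparison. This is a design choice of the sketch, not something the text prescribes.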

Figure 2.1 shows a simulation example illustrating the relationship between the MSE and the constant $c$, which determines the bandwidth $h$. The plot of MSE against bandwidth is a U-shaped "smiling curve", and the value of $h$ at its lowest point is the optimal bandwidth.

Figure 2.1 Bandwidth selection by MSE.

Alternatively, if we focus on the overall performance of the smoothed estimator for the low income proportion across all $\beta$, we can use a similar cross-validation procedure and select $c$ by minimizing the average mean squared error (AMSE),

$$\mathrm{AMSE}(c) = E\,\frac{1}{K}\sum_{k=1}^{K}\big[\hat{T}_{n,c}(\alpha,\beta_k) - \theta_{\alpha\beta_k}\big]^2,$$

where $\beta_1, \ldots, \beta_K$ form a fine grid on $(0,1)$ and $K$ is the number of grid points.

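The cross-validation estimate of the AMSE is

$$\mathrm{ACV}_c = \frac{1}{L}\frac{1}{K}\sum_{l=1}^{L}\sum_{k=1}^{K}\big[\hat{T}^{(1,l)}_{n,c}(\alpha,\beta_k) - \hat{\theta}^{(2,l)}_{\alpha\beta_k}\big]^2.$$

Again, $c$ is chosen as the value that minimizes $\mathrm{ACV}_c$.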
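A vectorized sketch of the averaged criterion over a grid of $\beta$ values is given below. As before, the Epanechnikov kernel, the function name `acv_score`, the grid of $\beta_k$, the candidate values of $c$, and the simulated data are assumptions made for illustration, and $n$ in $h = c\,n^{-1/3}$ is taken to be the training-sample size.

```python
import numpy as np

def epanechnikov_cdf(u):
    # Assumed kernel K: integrated Epanechnikov density on [-1, 1]
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def acv_score(x, alpha, betas, c, n_splits=100, seed=0):
    """ACV_c: squared gap between the training-half kernel estimate and the
    validation-half empirical estimate, averaged over splits and beta_k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    betas = np.asarray(betas, dtype=float)
    half = len(x) // 2
    total = 0.0
    for _ in range(n_splits):
        perm = rng.permutation(len(x))
        train, valid = x[perm[:half]], x[perm[half:]]
        h = c * len(train) ** (-1.0 / 3.0)              # assumption: n = training size
        cut_train = alpha * np.quantile(train, betas)   # alpha * xi_hat for each beta_k
        cut_valid = alpha * np.quantile(valid, betas)
        # Rows index beta_k, columns index observations
        smooth = epanechnikov_cdf((cut_train[:, None] - train[None, :]) / h).mean(axis=1)
        empirical = (valid[None, :] <= cut_valid[:, None]).mean(axis=1)
        total += float(np.mean((smooth - empirical) ** 2))  # average over k
    return total / n_splits                                  # average over l

rng = np.random.default_rng(2)
data = rng.exponential(scale=1.0, size=400)  # simulated incomes, illustrative
betas = np.linspace(0.05, 0.95, 19)          # hypothetical grid on (0, 1)
c_grid = [0.25, 0.5, 1.0, 2.0, 4.0]          # hypothetical candidates for c
scores = [acv_score(data, 0.5, betas, c, n_splits=50) for c in c_grid]
print(c_grid[int(np.argmin(scores))], scores)
```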

Figure 2.2 illustrates the relationship between the bandwidth and the AMSE, and how we choose the constant $c$ for the bandwidth $h$.

Figure 2.2 Bandwidth selection by AMSE.