
Cluster analysis has been an important tool in regional economic analysis (Feser and Luger 2003) and is widely used in the descriptive analysis of company and industry clusters. Using the agglomerative hierarchical clustering algorithm developed by Ward in 1963, Feser (2003) identified knowledge-based occupation clusters in thirty-eight US metropolitan areas by comparing 1112 occupations according to 33 variables. The data came from the Occupational Information Network (O*NET) database of the US Department of Labor, and the clustering algorithm was run in SAS.

Spatial statistics are widely used in cluster analysis. Feser et al. (2005, 395) used the Getis/Ord local G statistic, a measure of spatial concentration, together with detailed county-level industry employment data from the US Bureau of Labor Statistics to identify groups of nominally linked industries clustered in particular locations in the US in 1989 and 1997. O’huallachain and Leslie (2007) used Ripley’s K-function, a method that compares a point pattern distribution with a reference distribution under complete spatial randomness such as a homogeneous Poisson process, to detect the clustering patterns of twelve disaggregated advanced producer service sectors in Phoenix, Arizona. Maoh and Kanaroglu (2007) estimated univariate K-functions and kernel surfaces to examine the range and shape of firm clustering in the city of Hamilton, Ontario. In his analysis of the locational strategies of different industries in eight Canadian cities, Shearmur (2007) developed a measure of spatial autocorrelation and concluded that functions fully characterize the space they occupy: high-order services and FIRE sectors are located in the CBDs and show a high level of spatial autocorrelation, indicating interdependence; manufacturing characterizes the outer reaches of metropolitan areas and is also highly autocorrelated. Finally, some areas do not have any specialization.

Cluster analysis methods and measures are also used to predict subcenter formation (Erickson 1985; McDonald 1987; Sasaki and Mun 1996). In this predictive approach, the agglomeration of service industries is explained as a function of the high transportation and opportunity costs of service sector professionals, the availability of labor and of commercial and/or retail space, the job/housing ratio, and the land use mix. Recent studies also focus on employment density (Anas et al. 1998; Lee 2007). Edge cities and subcenters are examples of agglomerations of commercial and retail firms located in suburban areas (Anas et al. 1998). The data for regression analysis mostly come from employment, transportation cost, population, and household distribution data found in regular census surveys and in other reports such as the US Department of Commerce’s County Business Patterns and ZIP Code Business Patterns (Glaeser et al. 2001; Glaeser and Kahn 2001).

Cluster analysis is typically implemented with clustering algorithms and statistical software such as SAS, CAST, and CrimeStat. In this thesis, CrimeStat is chosen because it integrates with the ArcGIS software used for geocoding firm and building data. CrimeStat is a spatial statistics package developed by Ned Levine & Associates to analyze crime incident location data. It is a stand-alone Windows XP Professional program that can interface with most desktop geographic information systems (GIS). Its purpose is to provide a variety of tools for the spatial analysis of point locations. It is designed to operate with data sets. Although CrimeStat was developed for analyzing crime incident locations, it can also be used for other applications involving point locations, such as the locations of arrests, motor vehicle crashes, emergency medical service pickups, or, in this case, the locations of firms and buildings (Levine 2004).

In CrimeStat, the analysis of concentrations of incidents clustered within a limited geographical area is called hot-spot analysis. This study uses the Nearest Neighbor Hierarchical Clustering Technique (Nnh) in CrimeStat, which has served as an analytic tool in various studies. Meyer (2006) used it to identify the clustering of small and medium-sized IT firms near higher education institutions in Canada. Elliot (2005) used the Nnh clustering technique to identify the clustering of graphic design firms in metropolitan Melbourne.

Nnh is a hierarchical clustering routine that clusters points together on the basis of spatial proximity (Levine 2004). Nnh is hierarchical because, after identifying the first set of clusters, the routine goes on to group the first-order clusters into higher-order clusters; it thus shows how the features are clustered at several geographic scales (Mitchell 2005). In the Nnh routine, the user has to define four parameters: the threshold distance, the probability level, the minimum number of points, and the visual output of the hot spots.

The threshold distance parameter offers two options: the random nearest neighbor distance, calculated by the Nnh routine, and a fixed distance specified by the user. Only points that are closer to one or more other points than the threshold distance are selected for clustering (Levine 2004). The mean random distance is calculated as:

$$\text{Mean Random Distance} = d(\text{ran}) = 0.5\sqrt{\frac{A}{N}} \qquad \text{(Levine 2004, 6.15)} \qquad (3.1)$$

A is the study area and N is the number of points. Dividing the area by the number of points gives the average area per point, and taking the square root converts this into a linear distance unit. The value is then halved, since the calculation assumes that all distances between points are counted twice (Mitchell 2005, 155).
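As a minimal sketch of Formula 3.1, the mean random distance can be computed directly from the study area and the number of points. The function name and the example figures (400 points in a 100 square-mile study area) are hypothetical and are not taken from the thesis data.

```python
import math


def mean_random_distance(area: float, n_points: int) -> float:
    """Expected nearest neighbor distance under spatial randomness (Formula 3.1)."""
    if area <= 0 or n_points <= 0:
        raise ValueError("area and n_points must be positive")
    # area / n_points is the average area per point; its square root is a length.
    return 0.5 * math.sqrt(area / n_points)


# Hypothetical example: 400 points in a study area of 100 square miles.
d_ran = mean_random_distance(100.0, 400)
print(d_ran)  # 0.25 (miles), since 0.5 * sqrt(100 / 400) = 0.5 * 0.5
```

The result is expressed in the linear unit that corresponds to the unit of area, so an area in square miles yields a distance in miles.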

The threshold distance option is closely linked with the probability level parameter, where the user specifies a confidence interval around the selected threshold distance. The confidence interval defines a probability for the distance between any pair of points. For a specific one-tailed probability p, fewer than p% of the incidents would have nearest neighbor distances smaller than the selected threshold distance if the distribution were spatially random. To be in a cluster, points must be closer than would be the case by chance (Levine 2004, 6.16). Higher confidence levels, which indicate that a certain percentage of points have distances larger than the threshold distance in a random distribution, will include fewer features but with a higher level of certainty. The t-value corresponding to this probability level is selected under the assumption that the degrees of freedom are at least 120, which is appropriate for a relatively large sample (Meyer 2006; Mitchell 2005).

$$\text{Confidence Interval for Mean Random Distance} = \text{Mean Random Distance} \pm t \cdot SE_{d(\text{ran})} = 0.5\sqrt{\frac{A}{N}} \pm t\left[\frac{0.26136}{\sqrt{N^{2}/A}}\right] \qquad \text{(Levine 2004, 6.15)} \qquad (3.2)$$

0.26136 is a constant derived from the standard error for a normal curve (Mitchell 2005). Thus, CrimeStat uses the principles of the normal distribution in defining complete spatial randomness.
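The interval in Formula 3.2 can be illustrated with a short sketch. The function name and the inputs are hypothetical; the t-value of 1.658 is the tabulated one-tailed value for p = 0.05 with 120 degrees of freedom. On the reading that clustered points must lie closer together than chance would suggest, the lower limit would serve as the threshold distance, but that reading is an assumption here and not a statement about CrimeStat's internals.

```python
import math

# Constant from Formula 3.2; it agrees with sqrt((4 - pi) / (4 * pi)) to five decimals.
SE_CONSTANT = 0.26136


def threshold_interval(area: float, n_points: int, t_value: float):
    """Return the (lower, upper) limits of Formula 3.2."""
    d_ran = 0.5 * math.sqrt(area / n_points)            # Formula 3.1
    se = SE_CONSTANT / math.sqrt(n_points ** 2 / area)  # standard error of d(ran)
    return d_ran - t_value * se, d_ran + t_value * se


# Hypothetical inputs: 400 points in 100 square miles, one-tailed p = 0.05.
lower, upper = threshold_interval(area=100.0, n_points=400, t_value=1.658)
print(round(lower, 4), round(upper, 4))  # 0.2392 0.2608
```

A smaller probability level implies a larger t-value and therefore a shorter lower limit, which is consistent with the text's remark that stricter levels admit fewer features.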

The second way to select a threshold distance is to choose a fixed distance (in miles, nautical miles, feet, kilometers, or meters). The main advantage of this method is that the search radius can be specified exactly, which may be useful for comparing the number of clusters across different distributions. However, the choice of threshold is subjective, so the user needs a solid justification for the selection. The larger the selected distance, the greater the likelihood that clusters will be found by chance. The significance of the clusters can be tested in CrimeStat using a Monte Carlo simulation (Levine 2004, 6.17). In this simulation, the routine randomly assigns N cases to a rectangle with the same area as A and evaluates the number of clusters according to the defined threshold distance and minimum number of points. It repeats this test K times, where K is defined by the user. By running the simulation many times, the user can see the minimum and maximum number of clusters, as well as the number of clusters at different percentiles, for the particular first-order Nnh; each percentile indicates the one-tailed error percentage of finding a greater number of clusters by chance (Levine 2004, 6.30). The confidence interval formula for the fixed distance is obtained by substituting the fixed distance for the mean random distance in Formula 3.2.

The minimum number of points is another important parameter in Nnh. The user can specify the minimum number of points that make up a cluster. The default is ten points. Decreasing this number produces more clusters; increasing it produces fewer.

This criterion is used to reduce the number of very small clusters in the analysis. If points were selected on the basis of the threshold distance criterion alone, the final analysis would contain a large number of very small clusters. To limit these and to reduce the possibility that clusters are found by chance, the user defines a minimum number of points. The more points a cluster is required to include, the lower the probability of obtaining it by chance. In the final clustering, Nnh shows only clusters that have at least the minimum number of points, each of which is closer to at least one other point than the specified threshold distance (Levine 2004; Meyer 2006).
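The interplay of the threshold distance, the minimum number of points, and the Monte Carlo test can be sketched as follows. This is a simplified stand-in and not CrimeStat's own routine: points are linked whenever they lie closer than the threshold, every chain of links counts as one group, and a square replaces the rectangle of area A. All function names, the point coordinates, and the parameter values are hypothetical.

```python
import math
import random
from itertools import combinations


def count_clusters(points, threshold, min_points):
    """Count linked groups with at least min_points members (simplified)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points closer than the threshold distance.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < threshold:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(points)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    # The minimum-size rule discards very small groups.
    return sum(1 for size in sizes.values() if size >= min_points)


def simulate_cluster_counts(n_points, area, threshold, min_points, runs, seed=0):
    """Sorted cluster counts for `runs` random patterns of n_points each."""
    rng = random.Random(seed)
    side = math.sqrt(area)  # assumption: a square stands in for the rectangle
    counts = []
    for _ in range(runs):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_points)]
        counts.append(count_clusters(pts, threshold, min_points))
    return sorted(counts)


def percentile(sorted_counts, q):
    """Nearest-rank percentile of an ascending list, for 0 < q <= 100."""
    rank = max(1, math.ceil(q * len(sorted_counts) / 100))
    return sorted_counts[rank - 1]


# Hypothetical observed pattern: two tight groups of three points and one outlier.
observed = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (9, 1)]
observed_count = count_clusters(observed, threshold=0.5, min_points=3)

simulated = simulate_cluster_counts(
    n_points=len(observed), area=100.0, threshold=0.5, min_points=3, runs=200
)
print("observed clusters:", observed_count)
print("simulated min, 95th percentile, max:",
      simulated[0], percentile(simulated, 95), simulated[-1])
```

In this example the observed pattern yields two clusters with a minimum of three points and none with a minimum of four, which illustrates how raising the minimum reduces the number of clusters. Comparing the observed count with the upper percentiles of the simulated counts indicates whether that many clusters would be likely under spatial randomness.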

The final parameter in the Nnh clustering routine is the visual output of the hot spots. There are three choices: ellipse, convex hull, or both. A standard deviational ellipse is calculated for each cluster, and the user can choose between 1X (the default), 1.5X, and 2X. One standard deviation will cover more than 50% of the cases, one and a half standard deviations more than 90%, and two standard deviations more than 99%, although the exact percentage depends on the distribution. The ellipse is an abstraction and does not represent the actual cluster. A convex hull is also calculated for each cluster; it draws a polygon around the points in the cluster. Compared with the ellipse, it is the literal definition of the cluster. Both the ellipse and the convex hull can be saved in ArcView ‘.shp’, MapInfo ‘.mif’, or Atlas*GIS ‘.bna’ formats (Levine 2004, 6.18).
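To make the convex hull output concrete, the sketch below computes the hull of one cluster's points with Andrew's monotone chain algorithm. This algorithm is chosen here for illustration only; the source does not state how CrimeStat computes its hulls, and the function name and sample coordinates are hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical cluster: four corner points and two interior points.
cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
print(convex_hull(cluster))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior points do not appear in the result, which reflects the text's point that the hull is a polygon drawn around the cluster's outermost members.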

3.5.1.2. Using Nearest Neighbor Hierarchical Clustering in Identifying Company and
