

In document FALLEN by Lauren Kate (pages 82-91)

The overall classification results presented in the previous section, Section 4.6, indicated that the proposed graph-based approach, which applies frequent subgraph mining to a graph/tree representation, performed well over the two satellite image data sets considered. The main findings from the four sets of experiments conducted were:

1. Classification effectiveness tended to improve as the support threshold decreased, because more frequent subgraphs were identified. The reported evaluation found that σ=10 produced the best performance.

2. The best classification performance in terms of feature selection mechanism, for both data sets (Site A and Site B), was obtained using Gain Ratio, followed by Information Gain and then Chi-Squared feature selection.

3. With respect to Gain Ratio feature selection, it was found that k=55 produced the overall best performance.

4. The most appropriate classifier generation mechanisms identified from the reported evaluation were: (i) Bayesian Network and (ii) AODE. The average AUCs obtained using Bayesian Network and AODE over both sites (Site A and Site B) were 0.844 and 0.839 respectively; the Bayesian Network classifier thus produced a slightly better overall performance than the AODE classifier.
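The Gain Ratio ranking referred to in findings 2 and 3 can be illustrated with a minimal sketch. It assumes that each feature is a discrete value (for example, 1 if a frequent subgraph occurs in a household quadtree and 0 otherwise); the function names and the toy data are hypothetical and are not taken from the reported evaluation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


def select_top_k(vectors, labels, k):
    """Return the indices of the k features with the highest gain ratio."""
    n_features = len(vectors[0])
    scores = [gain_ratio([v[j] for v in vectors], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): four households, three binary subgraph features.
vectors = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
labels = ["small", "small", "large", "large"]
top = select_top_k(vectors, labels, k=2)
```

In this toy example only the first feature separates the two classes, so it is ranked first; in the reported evaluation the corresponding cut-off was k=55.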

4.8 Summary

In this chapter the first of the proposed approaches to population estimation mining from satellite imagery has been presented. The proposed approach is based on a graph representation and uses a hierarchical decomposition whereby each individual household image is decomposed into a quadtree by recursively partitioning the image space into quadrants. Each household image was represented using a single quadtree. A frequent subgraph mining approach was then applied so that subgraphs that frequently occur across the image set could be identified. The set of identified frequent subgraphs was then transformed into a feature vector space. A feature selection approach was applied to the feature vectors and the most discriminative features (subgraphs) selected; a number of classification learning methods can then be applied to the resulting representation. The reported evaluation indicated that high classification accuracy was obtained when using a low support threshold (σ). Gain Ratio was found to be the most appropriate feature selection mechanism with respect to both data sets (Site A and Site B), and the most appropriate k value for Gain Ratio feature selection was found to be k=55. The most suitable classifier generator was found to be the Bayesian Network model. In the following chapter an alternative approach for classifying satellite images, using colour histograms and colour-based statistical features, is described.
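The recursive quadrant partitioning summarised above can be illustrated with a minimal sketch. It assumes a square greyscale image whose side is a power of two, and it stops splitting when a region is homogeneous or a maximum depth is reached; both the stopping rule and the node layout are assumptions made for illustration and may differ from the decomposition used in Chapter 4.

```python
def build_quadtree(image, x, y, size, depth, max_depth):
    """Recursively decompose a square region of a greyscale image into quadrants.

    Leaves hold the mean intensity of their region; internal nodes hold
    four children ordered NW, NE, SW, SE.
    """
    pixels = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    homogeneous = max(pixels) == min(pixels)
    if homogeneous or depth == max_depth or size == 1:
        return {"leaf": True, "mean": sum(pixels) / len(pixels)}
    half = size // 2
    children = [
        build_quadtree(image, x, y, half, depth + 1, max_depth),                # NW
        build_quadtree(image, x + half, y, half, depth + 1, max_depth),         # NE
        build_quadtree(image, x, y + half, half, depth + 1, max_depth),         # SW
        build_quadtree(image, x + half, y + half, half, depth + 1, max_depth),  # SE
    ]
    return {"leaf": False, "children": children}


def count_leaves(node):
    """Count the leaf nodes of a quadtree built by build_quadtree."""
    if node["leaf"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Toy 4x4 image (hypothetical): only the south-east quadrant is non-uniform.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 5, 7],
    [0, 0, 5, 7],
]
tree = build_quadtree(image, 0, 0, 4, depth=0, max_depth=3)
leaves = count_leaves(tree)
```

Three of the four quadrants of the toy image are uniform and become leaves immediately, while the fourth is split once more; the resulting tree is the kind of structure to which frequent subgraph mining would then be applied.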

Chapter 5

Population Estimation Mining using Satellite Imagery: The Colour Histogram Based Approach

5.1 Introduction

The proposed image colour based approach to population estimation mining using satellite imagery is presented in this chapter. Recall that the application of classification techniques to image data requires that the image data set under consideration is represented in a manner that captures the salient features of the data but at the same time is compatible with the classification techniques to be used. In the previous chapter a graph-based approach was suggested; in this chapter an alternative mechanism founded on the usage of image colour is proposed. The colours within an image are its most basic content; representing images in terms of this content is thus an obvious idea. One method of encapsulating image colour is to represent the distribution of colours within a given image using histograms [44, 118]. Thus, in the context of our segmented household data, the idea is to represent each household in terms of a collection of colour histograms, seven histograms per household. For classifier training purposes each collection of histograms has a family size (class) label associated with it. In addition, the use of simple statistical information concerning the distribution of colour across an image was also considered.
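The idea of representing a household image by the distribution of its colours can be illustrated with a minimal sketch. It computes one normalised histogram per channel for a list of (r, g, b) pixels; the use of three RGB channels, the bin count and the function names are assumptions made for illustration only, and stand in for the seven histograms per household used in the proposed representation.

```python
def channel_histogram(values, bins=8, max_value=255):
    """Normalised histogram of one colour channel (values in 0..max_value)."""
    counts = [0] * bins
    width = (max_value + 1) / bins
    for v in values:
        # Clamp to the last bin so that max_value is always counted.
        counts[min(int(v / width), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def household_features(pixels, bins=8):
    """Concatenate one histogram per channel for a list of (r, g, b) pixels."""
    features = []
    for channel in range(3):
        features.extend(channel_histogram([p[channel] for p in pixels], bins))
    return features


# Toy household segment (hypothetical): four pixels, four bins per channel.
pixels = [(255, 0, 0), (250, 10, 0), (0, 0, 128), (0, 0, 130)]
features = household_features(pixels, bins=4)
```

Normalising each histogram by the number of pixels makes household segments of different sizes comparable, which is one reasonable design choice when the segments vary in area.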

A schematic of the proposed colour histogram representation approach for population estimation mining is given in Figure 5.1. From the figure it can be seen that the overall approach encompasses three processes: (a) image segmentation, (b) feature extraction and (c) classifier generation. The image segmentation process (the top rectangular box in Figure 5.1) is the same as that used with respect to the graph-based approach described in Chapter 4; it was detailed in Chapter 3 and is thus not discussed further here. The feature extraction process is concerned with translating the segmented data into a form ready for classification: the colour histogram based representation in the case of the work presented in this chapter. From the figure it can also be seen that the feature extraction process comprises three steps: (i) colour histogram computation, (ii) the identification of a number of colour-based statistical metrics to be included in the representation, and (iii) reduction of the feature vector dimensions using a feature selection mechanism. The third process included in Figure 5.1 is classifier generation. This is where standard classifier generation methods can be applied to build the desired classifier, which can then be applied to unseen data.

Figure 5.1: Schematic illustrating the population estimation mining approach using colour histograms
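Steps (i) and (ii) of the feature extraction process can be illustrated with a minimal sketch in which a precomputed histogram is augmented with simple colour statistics and paired with a class label. The choice of mean and standard deviation as the statistics, and all of the names used, are assumptions made for illustration; the metrics actually used are those described in Section 5.3.

```python
import statistics


def colour_statistics(values):
    """Mean and population standard deviation of one colour channel."""
    return [statistics.fmean(values), statistics.pstdev(values)]


def build_feature_vector(histogram, channel_values, class_label):
    """Append the channel statistics to a histogram and attach the class label."""
    return histogram + colour_statistics(channel_values), class_label


# Toy example (hypothetical): a four-bin histogram and its source channel values.
vector, label = build_feature_vector([0.5, 0.0, 0.0, 0.5], [0, 0, 255, 255], "small")
```

The resulting labelled vectors would then be passed to step (iii), feature selection, before classifier generation.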

The rest of this chapter is organised as follows. Section 5.2 provides details of the proposed colour histogram representation. Section 5.3 discusses the calculation of the additional colour statistical metrics used to augment the basic colour histogram representation. The feature selection and classifier generation process is then considered in Section 5.4. Section 5.5 reports on the evaluation of the proposed approach, followed by some discussion in Section 5.6. Finally, the main findings and some associated conclusions are presented in Section 5.7.
