In this thesis, we approach habitat classification using ground-taken imagery as an automatic image annotation problem. In this scenario, the annotations are the different Phase 1 habitat classes. Consequently, our goal is to correctly identify which habitat classes are present in which photographs or, in other words, which annotations belong to which photographs.
Automatic image annotation (AIA) is an increasingly popular approach in the Computer Vision community. AIA was developed as a mechanism to deal with the exponential increase in visual data [185]. For example, Flickr surpassed 6 billion photographs in 2011, only six years after its foundation [125], and Geograph was hosting almost 4 million photographs from England, Ireland and the Isle of Man as of April 2014 [154]. Traditional image retrieval techniques proved inadequate when dealing with such a large number of images, especially because of the gap between content-based image retrieval and classification on the one hand and the image semantics understandable by humans on the other [213]. This gap is often referred to in the literature as the semantic gap [185].
AIA methods can be regarded as particularly well suited to bridging the semantic gap between low-level features and high-level semantics. In essence, AIA methods were developed to facilitate the search and navigation of large numbers of images. In [213], the authors propose AIA as an alternative to content-based and text-based annotation image retrieval.
The main aim of AIA methods is to automatically learn semantic concept models, in the form of annotations, from a large number of samples (images, in our case). New, unseen images are then labelled using these models. To this end, semantically labelled images are collected and significant features, such as those discussed in Section 2.2.1, are extracted. These features are used in conjunction with a machine learning algorithm that, once trained, is used to annotate unseen samples.
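The pipeline described above (collect labelled images, extract features, learn one model per annotation, annotate unseen images) can be illustrated with a minimal sketch. It is not the framework developed in this thesis: the nearest-centroid rule, the two-dimensional feature vectors and the label names are all assumptions chosen only to keep the example short.

```python
from math import dist


def centroid(vectors):
    """Mean of a non-empty list of equal-length feature vectors."""
    if not vectors:
        raise ValueError("each annotation needs positive and negative samples")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_concept_models(samples):
    """Learn one model per annotation from (features, labels) pairs.

    Each model stores the mean feature vector of the images that carry
    the annotation and of the images that do not.
    """
    all_labels = sorted({label for _, labels in samples for label in labels})
    models = {}
    for label in all_labels:
        positive = [f for f, labels in samples if label in labels]
        negative = [f for f, labels in samples if label not in labels]
        models[label] = (centroid(positive), centroid(negative))
    return models


def annotate(models, features):
    """Return every annotation whose positive centroid is the closer one."""
    return {
        label
        for label, (positive, negative) in models.items()
        if dist(features, positive) < dist(features, negative)
    }


# Hypothetical training set: toy 2-D features and illustrative labels.
training = [
    ([0.9, 0.1], {"woodland"}),
    ([0.8, 0.2], {"woodland", "grassland"}),
    ([0.2, 0.9], {"standing water"}),
    ([0.5, 0.6], {"grassland", "standing water"}),
]
models = train_concept_models(training)

print(sorted(annotate(models, [0.85, 0.15])))  # ['grassland', 'woodland']
print(sorted(annotate(models, [0.2, 0.9])))    # ['standing water']
```

Because each annotation is decided independently, an unseen image can receive zero, one or several annotations, which is the behaviour a photograph containing a variable number of habitats requires.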
AIA methods can be divided into three categories: single labelling annotations, multi-labelling annotations and annotations which use metadata to annotate images [213]. Our problem is inherently a multi-label problem, since the ground-taken photographs that we have collected contain a variable number of habitats. Moreover, we have used metadata in the decision-making process. Consequently, in this thesis, we have created a hybrid annotation framework which mixes approaches from the second and third categories.
There are many methods that have been developed for image annotation with general classes, also referred to as basic-level classes [209]. For example, [150] combined image annotation with semantic information and bag-of-features to classify photographs according to twenty-one classes such as building, grass, tree, cow, water, chair, road and cat. [167] used semantic texton forests to annotate and classify images with a similar classification scheme. [25] combined interactive and online learning to create a framework that was able to annotate bird images. [112] also developed a method for indoor and outdoor scene recognition based on partitioning an image into increasingly finer sub-regions and computing their histograms.
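The idea of partitioning an image into increasingly finer sub-regions and computing a histogram for each can be sketched as follows. This is an unweighted illustration under simplifying assumptions, not a reproduction of the method in [112]: the image is assumed to be a 2-D list of integer codes (for example, quantised feature indices), and the function name and parameters are hypothetical.

```python
def spatial_pyramid(image, n_bins, levels):
    """Concatenate per-cell histograms over a pyramid of grids.

    image  : 2-D list of integer codes in range(n_bins)
    n_bins : number of distinct codes
    levels : number of pyramid levels; level l uses a 2**l x 2**l grid
    """
    height, width = len(image), len(image[0])
    descriptor = []
    for level in range(levels):
        cells = 2 ** level
        for row in range(cells):
            for col in range(cells):
                # Integer bounds of this cell; the cells tile the image.
                top = row * height // cells
                bottom = (row + 1) * height // cells
                left = col * width // cells
                right = (col + 1) * width // cells
                histogram = [0] * n_bins
                for y in range(top, bottom):
                    for x in range(left, right):
                        histogram[image[y][x]] += 1
                descriptor.extend(histogram)
    return descriptor


# Toy 4x4 image with three codes.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
descriptor = spatial_pyramid(image, n_bins=3, levels=2)
print(descriptor[:3])   # whole-image histogram: [8, 4, 4]
print(descriptor[3:])   # four quadrants: [4, 0, 0, 0, 4, 0, 0, 0, 4, 4, 0, 0]
```

The whole-image histogram alone cannot tell where each code occurs, whereas the quadrant histograms record that code 1 appears only in the top-right region and code 2 only in the bottom-left.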
However, what makes the problem of habitat classification different from other image annotation problems is the nature of the classes that need to be recognised. Most of the existing AIA research focuses on object [22, 66, 150] or scene [112] recognition and annotation. In those works, the classes are easily identifiable and do not share semantic properties, and their classification is regarded as basic-level categorization (e.g. distinguishing between a boat and a cow, or between a chair and a building).
Instead of conventional and clearly separable classes, such as building, flower, tree, dog, cow, road, body, boat, mountain and forest [150, 167], Phase 1 is a hierarchical classification whose classes are difficult to identify and tell apart even for human surveyors [102]. As mentioned in Chapter 1, the aim in this case is not to classify trees, grass or water, for example, but to determine which kind of trees (broad-leaved or coniferous), grasses (improved, semi-improved or unimproved) or water (standing or running) appear in the photographs. In Computer Vision, this type of problem is referred to as a fine-grained visual categorization (FGVC) problem [24]. FGVC, in contrast to the concept of basic-level categorization presented previously, is also known as subordinate-level categorization [209]. In FGVC problems, the aim is the accurate discrimination between classes that share similar semantics [205].
FGVC has gained much interest in the Computer Vision field in the last few years, mainly due to its many applications and its technical challenges, since it tackles categorization problems that are difficult even for humans. Examples of FGVC applications include the categorization of leaves [108], flowers [136], dogs [120] and, more recently, birds [15]. As can be inferred, FGVC methods and approaches are extremely fitting for biological problems, especially those where taxonomy imposes a set of mutually exclusive subcategories [15].
Additionally, FGVC and image annotation are deeply connected. Most FGVC datasets and approaches work with different types of annotations and related metadata in order to extract as much information as possible from the images, which can help improve performance on such difficult classification tasks. Some alternatives have been developed to eliminate the use of annotations or of visual code-words, another popular approach applied to FGVC. An example is found in [210], in which the authors instead used a large number of random image templates to classify the unseen test samples. However, most of the state-of-the-art FGVC methods continue to use annotations as part of their framework due to their flexibility and the large amount of information they can provide [15, 58, 78]. For example, CUB-200-2011, created by [199], is a dataset for birds with parts and attributes, and Leeds Butterflies, created by [200], includes segmentations and text descriptions of butterflies.
Moreover, a methodology that has been successfully introduced in FGVC problems is the human-in-the-loop (HITL) approach [26]. Since FGVC problems are difficult for both humans and computers, HITL methods aim to be an intermediate solution which combines the strengths of both and progressively minimises the amount of human labour [24]. HITL methodology can be easily applied to many different problems, such as criminology [140], port design [29] and aviation [171]. However, it is particularly suitable for FGVC problems. For example, [26] developed a HITL method for bird classification and [151] used HITL technology for skin-lesion image recognition.
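One simple way to combine machine and human strengths is to let the machine label the samples it is confident about and to defer the rest to a person. The sketch below illustrates only this general idea and is not the method of [26] or [151]; the confidence threshold, the toy classifier, the sample names and the stand-in for the human expert are all hypothetical.

```python
def classify_with_human_in_the_loop(samples, machine, ask_human, threshold):
    """Label each sample, deferring to a person when the machine is unsure.

    machine(sample)   -> (label, confidence in [0, 1])
    ask_human(sample) -> label
    Returns the list of labels and the number of human queries made.
    """
    labels, queries = [], 0
    for sample in samples:
        label, confidence = machine(sample)
        if confidence < threshold:
            # The machine is not confident enough: ask the human instead.
            label = ask_human(sample)
            queries += 1
        labels.append(label)
    return labels, queries


def toy_machine(sample):
    """Hypothetical classifier: thresholds a single score in [0, 1]."""
    _, score = sample
    label = "broad-leaved" if score > 0.5 else "coniferous"
    return label, abs(score - 0.5) * 2


# Stand-in for a human expert: a lookup table of correct answers.
expert_answers = {"img2": "coniferous", "img4": "broad-leaved"}


def toy_human(sample):
    name, _ = sample
    return expert_answers[name]


samples = [("img1", 0.9), ("img2", 0.55), ("img3", 0.1), ("img4", 0.45)]
labels, queries = classify_with_human_in_the_loop(
    samples, toy_machine, toy_human, threshold=0.6
)
print(labels)   # ['broad-leaved', 'coniferous', 'coniferous', 'broad-leaved']
print(queries)  # 2
```

In this sketch the threshold controls the amount of human labour: as the machine becomes more reliable, the threshold can be lowered so that fewer samples are referred to the person.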
In summary, automatic image annotation is a very broad topic whose research has expanded greatly in recent years. It has been regarded as a methodology whose purpose is to bridge the semantic gap often associated with content-based image classification. Moreover, it has obtained excellent results in many classification problems [15, 112, 167]. In our case, given the semantic similarities between the classes that we aim to categorise, an FGVC image-annotation approach seems the most appropriate option.