Given the learning and modeling categories (Section 2.2), the main steps of a typical machine learning tool-chain are outlined and discussed under the following subheadings: data generation and pre-processing, feature extraction, and model training and inference.
2.3.1 Data pre-processing
The first step in many learning activities relies on knowledge of the data domain, because what counts as a feature in one application may be a disturbance in another. In a data-driven framework, the features underlying a process are best represented by the data from which they are derived, assuming the process is free of external disturbances. Smola and Vishwanathan (2008) note that new challenges and problems become easier to tackle by adapting existing techniques when they share data types, and/or exhibit similar relations among variables, with a previously tackled problem. Some form of pre-processing is nevertheless important for removing unwanted artifacts such as noise from the data. The quality of the datasets obtained from applications has to be examined closely. Qualities such as accuracy, completeness, consistency, interpretability and trust in the source have to be verified. Based on the required qualities, pre-processing tasks can be examined under the following categories: cleaning to tackle incompleteness, resolving inconsistencies in units and values, and ensuring robustness to high-frequency content. Other processes included in pre-processing are data integration, transformation, reduction and discretization.
Improvements in image enhancement are particularly important because they serve as pre-processing tools that aid further analysis. Vincent et al. (2008) introduced the concept of denoising auto-encoders for automating noise removal from images, while Jain and Seung (2008) applied convolutional neural networks to natural images for the same purpose. Enhancement procedures such as inpainting (Xie et al. (2012)) and deblurring (Schuler et al. (2014)), which have been reported in neural network frameworks, are other techniques that apply learning to pre-processing. An adaptive multi-column architecture was implemented by Agostinelli et al. (2013) to robustly denoise images corrupted with a variety of synthetically added noise types. A stacked denoising auto-encoder was used (Burger et al. (2012)) to reconstruct clean images from noisy images by exploiting the encoding layer of the multilayer perceptron (MLP).
Apart from being a pre-processing step for learning algorithms, image enhancement has tremendous engineering benefits for industrial and military applications. For instance, robot vision, drone imaging and the imaging sensors of military airplanes can all be aided by adequately pre-processed data. A recent study conducted by the present author with his team (Akintayo et al. (2015)) grouped the sources of low-light “corruption” in sensed images into three major categories, based on: the effects of the environment, described as a degraded visual environment that arises from unfavorable weather conditions such as fog and snow, and from the high dynamic range of the imaging scene; the sensor effects due to size, weight and power cost, which limit the extent of improvements to sensor devices; and the image processing and display effects of discretization, image de-mosaicing (deriving the channels for colored images) and smoothing in greyscale images. A sparse denoising auto-encoder framework called the low light network, LLNet (Lore et al. (2017)), was explored to enhance images taken with poor sensors and/or in degraded environmental conditions, owing to its reviewed benefits and track record. Another framework, ReProCS (Guo et al. (2014)), which effectively separates background from foreground, was used to detect objects of interest in a dark scene. Beyond all the pre-processing goals above, however, the basic techniques applied for most algorithms are standardization, normalization and, in some cases, pre-whitening.
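The three basic techniques named at the end of this subsection can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited works: the function names, the synthetic data and the small `eps` constants are assumptions made for illustration, and pre-whitening is shown in its PCA form, which is only one of several common variants.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Shift each column to zero mean and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps)

def min_max_normalize(X, eps=1e-12):
    """Rescale each column to the range [0, 1]."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo + eps)

def pca_whiten(X, eps=1e-8):
    """Decorrelate the columns and give each unit variance (PCA whitening)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return Xc @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
# Hypothetical data: 500 samples of 3 correlated features on different scales.
A = np.array([[5.0, 0.0, 0.0], [2.0, 0.5, 0.0], [1.0, 1.0, 20.0]])
X = rng.normal(size=(500, 3)) @ A + np.array([10.0, -3.0, 100.0])

X_std = standardize(X)
X_mm = min_max_normalize(X)
X_white = pca_whiten(X)
```

Standardization and normalization act on each feature independently, whereas whitening also removes the correlations between features, which is why it is usually reserved for algorithms that benefit from decorrelated inputs.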
2.3.2 Feature extraction
An important aspect of learning is feature extraction: the discovery of salient features that clearly characterize the goal to be achieved. Because of its importance, several machine learning and image processing packages, such as the Python language’s scikit-learn, include toolkits and libraries dedicated solely to feature extraction. Features (also called parameters, feature vectors, code vectors or descriptors) are usually unique attributes that describe a process. For instance, if the health condition of an individual produces various effects, the features can be thought of as the underlying symptoms. An important property of these features is adequacy. From that perspective, the features must represent exactly the underlying process while being robust to all other high-frequency content. Satisfying these properties is what Bengio (2009) called a good representation. In some neural network algorithms, for instance deep belief networks (Bengio et al. (2007)), Restricted Boltzmann Machines (Hinton and Salakhutdinov (2006)) and autoencoders (Vincent et al. (2008)), the feature extraction step is called a pre-training stage. Feature extraction also falls into the category of data mining and knowledge discovery. It is an attempt to unravel the “unknown” features from data with machine learning algorithms. Here, however, the word “unknown” refers to the algorithm’s own discovery, that is, automatic feature construction, since the features are usually known a priori to the domain experts. Feature construction is followed by an efficient search through a reduced set of features and by the computation of a criterion for assessing which features are the most descriptive of the process. The features are therefore a lower-dimensional abstraction of the dataset. In this manner, the data is projected onto a lower-dimensional manifold (Vincent et al. (2008)), which relates feature extraction to dimensionality reduction. The primitive algorithms are Fisher’s linear discriminant and the nearest neighbor algorithm.
The field has since progressed to the “state-of-the-art” feature extraction methods exemplified by support vector machines, multilayer perceptrons and ensemble methods. Thus, features can be thought of as a classifier’s prior knowledge of an underlying application that helps it perform the desired task.
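As an illustration of feature extraction as projection to a lower dimension, the sketch below computes Fisher’s linear discriminant for two classes using the standard closed form, in which the direction is proportional to the inverse within-class scatter applied to the difference of the class means. The function name `fisher_direction`, the synthetic data and the midpoint threshold are assumptions made for this example only.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector maximizing between-class over within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Solve Sw w = (m1 - m0) instead of forming the inverse explicitly.
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Hypothetical two-class data in 5 dimensions; only the first
# coordinate actually separates the classes.
X0 = rng.normal(size=(200, 5))
X1 = rng.normal(size=(200, 5)) + np.array([3.0, 0.0, 0.0, 0.0, 0.0])

w = fisher_direction(X0, X1)
z0, z1 = X0 @ w, X1 @ w  # each sample reduced to a single feature

# Classify by thresholding at the midpoint of the projected class means.
threshold = 0.5 * (z0.mean() + z1.mean())
accuracy = 0.5 * ((z0 < threshold).mean() + (z1 >= threshold).mean())
```

Each five-dimensional sample is reduced to one scalar that retains most of the class information, which is the sense in which the extracted feature is a lower-dimensional abstraction of the dataset.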
2.3.3 Model training & inference
The fidelity of the features extracted in the model learning stage is evaluated in the training stage on different tasks. It is essentially a feedback stage in which the performance, in the sense of the TPE model (Mitchell (2006)), of applying the learned features is evaluated and the error is fed back into the learning algorithm until an error threshold is reached. For example, neural networks are typically trained using the so-called “backpropagation” algorithm, which uses the error feedback to improve the model. Training layer-wise learned networks in this manner had proven difficult until Bengio et al. (2007) introduced techniques for achieving it in a “greedy fashion”. The training process typically includes a regularization function, as in LeCun et al. (1998a), without which the error profile would not generally decrease monotonically and the model would become prone to over-fitting. While optimizing the model parameters, momentum-based methods, for example Nesterov momentum (Sutskever et al. (2013)), are used to adaptively modify the step sizes of the optimization algorithm and reduce the tendency to get stuck in local minima. Machine learning algorithms other than neural networks have different training schemes. Usually, this is the aspect that differentiates the various machine learning algorithms available. For instance, Erdogan (2010) notes that the least squares regression method and the Fisher linear discriminant (FLD) differ only by the presence of a regularization factor in the former’s optimization, training or loss function. The trained model is finally used to perform a similar task on new test data, which is either held out of the training set (also called “out-of-sample”) or an entirely new dataset. This final step is called the inference step, since statistical decisions are made with the trained model.
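The training loop described above (error feedback until a threshold is reached, a regularization term, Nesterov momentum, and inference on held-out data) can be sketched on a deliberately simple model. The example uses linear regression rather than a neural network so that it stays short. The hyperparameters, the synthetic data and the names `train` and `mse` are assumptions for illustration and are not drawn from the cited works.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of the linear model X @ w."""
    r = X @ w - y
    return float(r @ r) / len(y)

def train(X, y, lr=0.05, momentum=0.9, weight_decay=1e-3,
          tol=1e-3, max_epochs=5000):
    """Gradient descent with Nesterov momentum on an L2-regularized loss."""
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    for epoch in range(1, max_epochs + 1):
        # Nesterov: evaluate the gradient at the look-ahead point.
        look = w + momentum * v
        grad = 2.0 * X.T @ (X @ look - y) / len(y) + 2.0 * weight_decay * look
        v = momentum * v - lr * grad
        w = w + v
        # Error feedback: stop once the training error reaches the threshold.
        if mse(w, X, y) < tol:
            break
    return w, epoch

rng = np.random.default_rng(0)
# Hypothetical regression data with a known weight vector and small noise.
w_true = np.array([1.5, -2.0, 0.5, 3.0])
X = rng.normal(size=(300, 4))
y = X @ w_true + 0.01 * rng.normal(size=300)

# Hold out 20% of the samples as an out-of-sample test set.
idx = rng.permutation(300)
train_idx, test_idx = idx[:240], idx[240:]

w, epochs = train(X[train_idx], y[train_idx])
train_err = mse(w, X[train_idx], y[train_idx])
test_err = mse(w, X[test_idx], y[test_idx])  # inference on unseen data
```

Comparing `train_err` with `test_err` is the simplest check for over-fitting: a test error far above the training error suggests that the regularization is too weak or that the model has fitted noise.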