
Goodfellow et al. [95] describe an autoencoder as “The quintessential example of a representation learning algorithm.” Autoencoders (also known as autoassociative NNs or replicator NNs) are a type of FNN designed to perform an identity mapping of the input data. In other words, autoencoders copy the input data to the output. The ANN is composed of two parts: the encoder function, which maps the input data into a representation internal to the NN, or code, and the decoder function, which maps from the code space back to the original data space. Let $H \in \mathbb{R}^{m \times p}$ represent the code representation internal to the NN. The function $f$, $H = f(X)$, is the mapping from the natural data space ($X \in \mathbb{R}^{m \times n}$) to the code space ($H \in \mathbb{R}^{m \times p}$), and the function $g$, $\hat{X} = g(H)$, is the mapping back to the natural space from the coded internal representation [95].

ANNs with the capacity to learn an exact identity mapping without error are not very useful, as they would not modify the data already available for analysis. Autoencoders are therefore designed with constraints that restrict their capability to approximate the identity function. By limiting its capacity, the autoencoder is forced to prioritize which aspects of the data set to learn to reproduce and, in doing so, learns properties of the data that are of interest to analysts. The efficacy of autoencoders as a data analysis tool depends on the constraints that limit their capability to reproduce the data exactly [95], [133].

The undercomplete autoencoder neural network (UANN) restricts the network's capacity to learn an exact identity mapping by imposing a restriction on the dimensionality of the code layer in the NN, as shown in Figure 14. The bottleneck layer has fewer neurons than the input and output layers; consequently, the internal code representation of the data, $H \in \mathbb{R}^{m \times p}$, has a smaller dimension than the input data, $X \in \mathbb{R}^{m \times n}$, with $p < n$. The internal dimensional reduction of the bottleneck layer forces the NN to recreate the higher-dimensional output data from the compressed representation. For the ANN to minimize the resultant error during training, it must learn the most salient features of the data in order to recreate the input data at the output layer.

Figure 14. Undercomplete Autoencoder Neural Network, adapted from [103]

The NN in essence compresses redundant information in the bottleneck layer, retaining only the patterns in the data that are useful for differentiating non-redundant information [52], [95], [133]. As the capacity of the autoencoder is limited by the bottleneck layer, so too is the capability of the network to retain information within the data used for differentiation of non-redundant information. Consequently, the ANN will reproduce common data with less error than rare (anomalous) data [56].
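The bottleneck effect described above can be illustrated with a minimal sketch in plain Python. It assumes a purely linear autoencoder with two input features and a single code unit, trained by stochastic gradient descent; this is far simpler than the network in Figure 14, and the function names, starting weights, learning rate, and toy data are all hypothetical choices made for illustration only.

```python
def encode(x, w):
    """Encoder f: compress a 2-feature observation into a 1-value code."""
    return w[0] * x[0] + w[1] * x[1]


def decode(h, v):
    """Decoder g: expand the 1-value code back to 2 features."""
    return [v[0] * h, v[1] * h]


def reconstruction_error(x, w, v):
    """Mean squared error between an observation and its reconstruction."""
    x_hat = decode(encode(x, w), v)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def train(data, epochs=200, lr=0.05):
    """Fit encoder weights w and decoder weights v by stochastic gradient descent."""
    w = [0.1, 0.1]  # arbitrary small starting weights (assumption)
    v = [0.1, 0.1]
    for _ in range(epochs):
        for x in data:
            h = encode(x, w)
            err = [xh - xi for xh, xi in zip(decode(h, v), x)]
            # Gradients of 0.5 * sum(err ** 2) with respect to v and w.
            grad_v = [e * h for e in err]
            back = err[0] * v[0] + err[1] * v[1]
            grad_w = [back * xi for xi in x]
            v = [vk - lr * gk for vk, gk in zip(v, grad_v)]
            w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
    return w, v


# "Common" data (toy assumption): every observation lies on the line x2 = 2 * x1.
normal = [[t / 10, 2 * t / 10] for t in range(-10, 11)]
w, v = train(normal)

typical = [0.5, 1.0]   # follows the pattern seen in training
unusual = [1.0, -2.0]  # breaks the pattern
print("typical:", reconstruction_error(typical, w, v))
print("unusual:", reconstruction_error(unusual, w, v))
```

Because the single code unit can only capture one direction in the data, the observation that follows the training pattern is reconstructed almost exactly, while the one that departs from it is reconstructed with a much larger error.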

Autoencoders have seen wide adoption in data reduction and information retrieval tasks. Autoencoders have also shown promise as a data preprocessing tool in classification tasks such as computer image recognition [134]. Japkowicz et al. [52] were the first to utilize ANNs for the task of anomaly detection. Their research used labeled (supervised) data sets. In the Japkowicz et al. approach, the autoencoder was trained in a semi-supervised manner on a subset of the data composed entirely of non-anomalous observations. Once trained on the normal data, the autoencoder was evaluated against the anomalous observations and compared to other common anomaly detection methods. The autoencoder performed as well as or better than the traditional anomaly detection methods examined in three different domain areas [52].

Hawkins et al. were the first to use ANNs to detect anomalies in an unsupervised dataset. The datasets contained class labels; however, these labels were omitted from consideration during NN training and were used only to evaluate the performance of the NNs after training. In the Hawkins et al. work, the datasets studied were first split into two distinct subsets, a training set and a testing set. The training set was used for parameter updates, and the test set for evaluating the performance of the NN. The NN objective function used in Hawkins et al. was the mean square error, defined as

$$MSE = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(x_{i,j} - \hat{x}_{i,j}\right)^2$$

where $n$ is the number of features, $m$ is the number of observations, $x_{i,j}$ is the true value of the $i$th observation for the $j$th feature, and $\hat{x}_{i,j}$ is the ANN predicted value.
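As a small worked illustration of this objective, the sketch below computes the mean square error in plain Python, assuming the standard form in which the squared differences are averaged over all observations and features. The function name and the two-observation toy data are hypothetical.

```python
def mean_square_error(x, x_hat):
    """Average squared difference over m observations (rows) and n features (columns)."""
    m = len(x)
    n = len(x[0])
    total = sum(
        (x[i][j] - x_hat[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return total / (m * n)


# Toy data (assumption): 2 observations with 2 features each.
x = [[1.0, 2.0], [3.0, 4.0]]      # true values
x_hat = [[1.0, 2.5], [2.0, 4.0]]  # values predicted by the network
print(mean_square_error(x, x_hat))  # (0 + 0.25 + 1 + 0) / 4 = 0.3125
```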

In addition to using unsupervised datasets, the Hawkins et al. work introduced the concept of using a score-based method, known as the outlier factor (OF), for identifying outliers in datasets analyzed with autoencoders. The outlier factor for each observation in the dataset is the average reconstruction error over all features, defined as

$$OF_i = \frac{1}{n} \sum_{j=1}^{n} \left(x_{i,j} - \hat{x}_{i,j}\right)^2 \tag{22}$$

where $n$ is the number of features, $x_{i,j}$ is the true value of the $i$th observation for the $j$th feature, and $\hat{x}_{i,j}$ is the ANN predicted value. The OF is evaluated for all observations in the dataset using a trained autoencoder, with higher OF values considered more likely to indicate anomalous data [56]. The Hawkins et al. work is the earliest example found of a purely unsupervised anomaly detection method using ANNs. The autoencoder methods established in Hawkins et al. were tested against traditional anomaly detection methods in Williams et al. using four datasets, one of which was in the domain of network intrusion detection. In the Williams et al. work, the ANN methods performed comparably to the traditional anomaly detection methods tested, and in the case of network intrusion detection, the ANN method’s performance surpassed that of the traditional anomaly detection methods [87].
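The outlier factor scoring can be sketched in a few lines of plain Python. The sketch assumes the OF of an observation is its squared reconstruction error averaged over the features, and that the reconstructions have already been produced by some trained autoencoder; the function name and the toy values are hypothetical.

```python
def outlier_factors(x, x_hat):
    """Per-observation average squared reconstruction error over all features."""
    n = len(x[0])
    return [
        sum((xi[j] - xh[j]) ** 2 for j in range(n)) / n
        for xi, xh in zip(x, x_hat)
    ]


# Toy data (assumption): 3 observations with 2 features each.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # true values
x_hat = [[1.0, 2.0], [3.0, 3.0], [2.0, 2.0]]  # autoencoder reconstructions

of = outlier_factors(x, x_hat)
# Rank observations from most to least likely anomalous (highest OF first).
ranking = sorted(range(len(of)), key=lambda i: of[i], reverse=True)
print(of)       # [0.0, 0.5, 12.5]
print(ranking)  # [2, 1, 0]
```

The third observation is reconstructed worst and therefore ranks first as the most likely anomaly. How many of the top-ranked observations to flag is a separate choice that this sketch does not make.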
