5 Design of the Satellite Subsystems
5.2 Structure Subsystem
5.2.1 Material Selection
Computer vision poses many challenges, and image classification is one of the most fundamental. Researchers have increasingly turned to convolutional neural networks (CNNs) as they have become a viable approach to image classification. Unfortunately, training a CNN requires a very large number of labeled samples, and collecting and labeling them is costly and time consuming. In addition, sufficient training data is often unavailable for a specific target class; it is therefore preferable to reuse, through transfer learning, the knowledge captured by a model previously trained on a large dataset.
In this thesis, we generated new representations of images by applying transfer learning to a deep CNN model. We started with a CNN trained on a huge dataset [9] and transferred its weights to our model. In our approach, we freeze these weights and feed the training images to the model in order to generate new representations of these inputs, called features. Our training dataset consisted of logo images cropped from a publicly released logo dataset [27], [28].
Because we transferred knowledge from a deep CNN model trained on a huge dataset, we avoided the time needed to build a model from scratch and reduced the expense of data labeling. The features obtained from our scarce training data lie in a high-dimensional space (e.g., 4096 dimensions when extracted from the sixth layer, FC6). These features were the inputs to the k-NN algorithm, which classified out-of-sample images. Through our methodology, we sought the configuration that best handles the scarcity of labeled training data. We experimented with multiple parameters, including the distance metric, the layer from which the features were extracted, the number k of nearest neighbors that vote, and the number of available training samples, to achieve the best classification results.
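The classification step described above can be sketched as a majority vote among the k nearest training features. This is a minimal illustration, not the thesis implementation: the function names `cosine_distance` and `knn_predict`, the toy 3-dimensional vectors (standing in for the 4096-dimensional activations), and the labels `logo_a` and `logo_b` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity; smaller values mean more similar vectors.

    Assumes neither vector is all zeros (the norm would be 0).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train_features, train_labels, query, k=3, distance=cosine_distance):
    """Label a query vector by majority vote among its k nearest training vectors."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy features; real inputs would be frozen-CNN activations.
train_features = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.0],
]
train_labels = ["logo_a", "logo_a", "logo_b", "logo_b"]

predicted = knn_predict(train_features, train_labels, [0.8, 0.3, 0.0], k=3)
print(predicted)
```

Because k-NN is nonparametric, no training step is needed beyond storing the extracted features, which is part of what makes it attractive when labeled data is scarce.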
To investigate which CNN layer is best for feature extraction, we first observed the dispersion and discrimination of the features extracted from the various layers by projecting them onto a 2-D plane. We chose the t-SNE technique [26] to visualize the high-dimensional data. The results show that features from the first layers are less discriminative and more spread out, whereas features from the last layers show less dispersion. We therefore focused our experiments on the last two layers, FC6 and FC7.
To determine the best value of the hyperparameter k, the number of nearest neighbors, and the best metric for a high-dimensional space, we ran our experiments for different values of k (1, 3, and 5), and at each iteration we considered two different metrics (Euclidean distance and cosine similarity). The accuracy of our classification was evaluated for each setting by collecting the number of True Positives (TPs), False Positives (FPs), and False Negatives (FNs) output by the k-NN classifier. Since we were interested mainly in scarce-data scenarios, we simulated small training datasets by choosing a specific number s of training instances from each class. This hyperparameter s was also varied in order to study its impact on the accuracy of the classification task.
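The two bookkeeping steps in this protocol, drawing s training instances per class and counting TPs, FPs, and FNs, can be sketched as follows. The helper names `sample_per_class`, `per_class_counts`, and `accuracy`, the fixed random seed, and the toy labels are assumptions made for illustration; the thesis does not specify how the sampling or counting was implemented.

```python
import random
from collections import defaultdict


def sample_per_class(features, labels, s, seed=0):
    """Keep at most s randomly chosen training instances from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for feature, label in zip(features, labels):
        by_class[label].append(feature)
    subset = []
    for label in sorted(by_class):
        items = by_class[label]
        for feature in rng.sample(items, min(s, len(items))):
            subset.append((feature, label))
    return subset


def per_class_counts(true_labels, predicted_labels):
    """Count TP, FP, and FN for every class that appears in either list."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            counts[truth]["TP"] += 1
        else:
            counts[guess]["FP"] += 1  # predicted class gains a false positive
            counts[truth]["FN"] += 1  # true class gains a false negative
    return dict(counts)


def accuracy(counts, n_samples):
    """Single-label accuracy: correctly classified samples over all samples."""
    return sum(c["TP"] for c in counts.values()) / n_samples


# Hypothetical ground truth and predictions for five test images.
truth = ["a", "a", "b", "b", "c"]
guesses = ["a", "b", "b", "b", "a"]
counts = per_class_counts(truth, guesses)
print(counts, accuracy(counts, len(truth)))

# Hypothetical training pool: three instances for each of two classes.
pool_features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
pool_labels = ["a", "a", "a", "b", "b", "b"]
subset = sample_per_class(pool_features, pool_labels, s=2)
```

Repeating the sampling with several seeds for each value of s would give a sense of the variance introduced by the random choice of training instances.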
Our final results show that FC6 is the best layer from which to extract the CNN activations. These results can be explained by the fact that the activations extracted from FC6 are not as general as those from the early layers, nor are they as task-specific as those from FC7 or from any classifier used in other research.
As for the metric, we compared the Euclidean distance with the cosine similarity for finding similar image representations in high-dimensional spaces. The cosine similarity gave better results in combination with the k-NN algorithm used in our research, and it is also the more common choice in spaces of such high dimensionality.
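One property that distinguishes the two metrics is that cosine similarity depends only on the direction of a vector, whereas Euclidean distance also depends on its magnitude. The sketch below uses hypothetical 2-D vectors (the names `same_direction_far` and `other_direction_near` are invented for the example) to show that the two metrics can select different nearest neighbors. It illustrates the property only; it is not an explanation of the results reported above.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors; smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors; larger means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [1.0, 1.0]
candidates = {
    "same_direction_far": [10.0, 10.0],   # same direction as the query, large norm
    "other_direction_near": [1.0, 0.0],   # close in space, different direction
}

nearest_euclidean = min(
    candidates, key=lambda name: euclidean_distance(query, candidates[name])
)
nearest_cosine = max(
    candidates, key=lambda name: cosine_similarity(query, candidates[name])
)

print(nearest_euclidean, nearest_cosine)
```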
Regarding the effect of training set size, we found that the k-NN solution applied to the activation codes extracted from the FC6 layer reaches good results on small datasets. For instance, we achieved 90% accuracy with only 17 training samples per class (19 classes were considered in our research), and the results improved as we increased the number of training samples (in our case, we varied s from 1 to 20).
Our work supports the mounting evidence that the activations extracted from CNNs are very effective image features. Unlike previous research, we methodically evaluated the impact of the number of training samples on classification performance while applying a nonparametric method, the k-NN classifier. Furthermore, our research helps in building a detector for objects similar to logos with few training samples for a specific level of accuracy. For example, with 17 training samples per class we expect a 90% classification accuracy.
Practically, we encourage using the solution that has been proposed in this thesis to implement sophisticated classification systems that overcome the shortage of training images and support tactical decision making. Our work has determined the configuration that best tolerates the lack of data while still being able to offer a desired level of classification accuracy, lower costs, and less computational time.
For future work, we suggest investigating fine-tuning as a transfer learning scenario, instead of freezing the transferred layers, while applying our proposed parameters: the number of nearest neighbors k, the metric, and the number of training samples. The results can then be compared with those we have obtained.