Application of the 5S - Identification of the proposal


This section discussed the results presented in Chapter 5 and tested the suitability of the highest-performing detector for real-time application. In terms of detection performance, the metrics used to evaluate CNN-based detectors indicate a very good model (high AP, low miss rate). A simple architecture, trained end to end with a limited amount of data, produces results that are comparable with state-of-the-art detectors for this specific task (vehicle detection). The discussion of run-time performance indicated that the vehicle detector, while not as fast as other state-of-the-art object detectors (YOLO, SSD), still performs well despite a lack of significant optimisations. The gap in speed between the proposed detector and other models is due to the structure of the Faster R-CNN architecture and the way detections are generated. Any discussion of performance should acknowledge the limitations of each architecture and the trade-off between accuracy and speed. There is nevertheless potential to improve the detection speed of the proposed model. Finally, the simple distance calculation highlighted the inherent issues with estimating distance using a monocular camera. The lack of any error correction leads to significantly inferior results compared with the more accurate camera system.
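To illustrate why an uncorrected monocular range estimate is fragile, the sketch below uses the standard pinhole-camera relation Z = f * W / w. This is an assumption made for illustration and is not necessarily the calculation used in this work; the function name, the focal length and the assumed vehicle width are all hypothetical values.

```python
def estimate_range_m(focal_length_px: float,
                     real_width_m: float,
                     bbox_width_px: float) -> float:
    """Pinhole-camera range estimate: Z = f * W / w.

    focal_length_px: camera focal length in pixels (hypothetical value below)
    real_width_m:    assumed physical width of the target vehicle in metres
    bbox_width_px:   width of the detected bounding box in pixels
    """
    if bbox_width_px <= 0:
        raise ValueError("bounding-box width must be positive")
    return focal_length_px * real_width_m / bbox_width_px


# Hypothetical calibration and target size, chosen only for illustration.
FOCAL_LENGTH_PX = 1000.0
ASSUMED_VEHICLE_WIDTH_M = 1.8

# Range for a 60-pixel-wide detection.
nominal = estimate_range_m(FOCAL_LENGTH_PX, ASSUMED_VEHICLE_WIDTH_M, 60.0)

# The same vehicle with the box localised 5 pixels too narrow.
shifted = estimate_range_m(FOCAL_LENGTH_PX, ASSUMED_VEHICLE_WIDTH_M, 55.0)

print(f"nominal range: {nominal:.1f} m, with 5 px box error: {shifted:.1f} m")
```

Under these assumed values, a bounding-box error of a few pixels shifts the estimate by several metres, and the effect grows with distance because the box width appears in the denominator.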


7 Conclusions

7.1 Introduction

Vehicle collisions are one of the most significant problems in transport, as they are one of the leading causes of death and injury around the world. Collisions are attributed to environmental, vehicle and human factors, with human errors (recognition, decision or performance errors) being the dominant causal factor.

To mitigate the effect of human errors in accidents, the automotive industry is moving towards removing the human element from driving. Research from both industry and academic institutions is heavily invested in bringing every necessary component (hardware, software, methods) together to reach that stage where a human driver is not required to handle any driving-related task. Either by revolution (going from 0 to full autonomy via high-tech solutions) or evolution (slowly improving and automating driving functions until full autonomy is achieved), soon human driving behaviour will no longer be a liability and a threat to safety.

A safe trip in an Autonomous Vehicle depends on an effective CAS, which ensures that all potential threats ahead of the vehicle are correctly identified and every danger is avoided. AVs usually use multiple sensors to scan the surrounding environment, collect information and detect targets, which means that hardware cost and system complexity are high. Data from multiple sensors (active and passive) need to be collected, processed and fused to confidently produce an accurate detection along with its classification as a pedestrian, vehicle or other object.

This research focuses on the task of vehicle detection and attempts to produce an accurate vehicle detector based on data from a single low-cost monocular camera. Current literature on object detection using vision systems favours two approaches. The first follows traditional image processing principles: specific visual cues identifying potential targets are sought in an image, and a classifier trained with relevant data is then used to verify each object's class. The second approach uses a specific type of Neural Network modified to process image data, the CNN, to unify the detection pipeline (ROI generation and classification) in a single process. While this approach was introduced some years ago (LeCun et al., 1989), its high computational requirements and need for large amounts of data meant that it has become a viable option for object detection only in the last few years, made possible by the rise in computing performance and deep learning.

The data used for this research were collected using Loughborough University's instrumented vehicle. Both training and testing data were manually annotated with ground truth labels, while radar data were used as ground truth for the camera's range measurements. The relevant literature on object detection does not usually follow this approach. Particularly for CNN-based detection, readily available datasets are preferred over processing raw data, and transfer learning (pre-trained CNNs) is preferred over training a new network from scratch. The reasons are the effort required to collect and manually annotate large amounts of data and, possibly, a concern that the amount of data collected will not be sufficient to train a network end to end.

This PhD can serve as a guide to developing object detectors using end-to-end training. It demonstrates that, for specific applications, neither complex structures nor reliance on out-of-the-box solutions is required. An alternative and efficient solution to vehicle detection, especially when the data are limited, is proposed here.

Additionally, the developed detector can be used as a base for more complex tasks such as TTC calculation for a CAS. The simple calculation performed here gives only a rough estimate of range, but it can be improved by accounting for range error, and it indicates that a complete CAS is possible using a camera sensor.
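As a rough illustration of how a range estimate could feed a TTC calculation, the sketch below uses the common definition of TTC as range divided by closing speed and propagates an assumed symmetric range error into lower and upper TTC bounds. The function names, the error model and the numerical values are hypothetical and are not taken from this work.

```python
import math


def time_to_collision_s(range_m: float, closing_speed_mps: float) -> float:
    """TTC = range / closing speed; infinite if the gap is not closing."""
    if range_m < 0:
        raise ValueError("range must be non-negative")
    if closing_speed_mps <= 0:
        return math.inf
    return range_m / closing_speed_mps


def ttc_bounds_s(range_m: float,
                 range_error_m: float,
                 closing_speed_mps: float) -> tuple[float, float]:
    """Lower and upper TTC for a range known only to within +/- range_error_m."""
    nearest = max(range_m - range_error_m, 0.0)  # range cannot be negative
    farthest = range_m + range_error_m
    return (time_to_collision_s(nearest, closing_speed_mps),
            time_to_collision_s(farthest, closing_speed_mps))


# Hypothetical scenario: 30 m estimated range, +/- 3 m error, closing at 10 m/s.
ttc = time_to_collision_s(30.0, 10.0)
ttc_low, ttc_high = ttc_bounds_s(30.0, 3.0, 10.0)
print(f"TTC: {ttc:.1f} s (between {ttc_low:.1f} s and {ttc_high:.1f} s)")
```

In a safety application the lower bound is the conservative choice, since a warning based on the nominal value could arrive too late if the true range is shorter than estimated.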

