The challenge of teacher training


The techniques we use as a baseline are accurate, but could be improved to obtain even higher accuracy. In [8], Benenson et al. give an overview of the evolution of pedestrian detection algorithms. They demonstrate that multiple accuracy improvements can be combined on top of the already accurate SqrtChnftrs detector [7] to obtain higher accuracy. The techniques they use are Local Decorrelation Features [67], motion information [74] and a model that exploits the relation between two pedestrians [72]. As they demonstrate, these approaches are beneficial and complementary to other pedestrian detection algorithms, and are thus candidate techniques to improve our own implementations as well. In [94], Yan et al. propose an extension of the DPM detector that takes resolution differences between detections into account, reaching high accuracy. In future work, we could investigate the improvements these techniques offer on top of the techniques discussed in this dissertation. It is important, however, to keep an eye on the computational cost at which these improvements would come.

For channel-based detectors, we already have a very fast implementation at our disposal in the ACF detector. Since it already uses a highly optimised implementation to calculate the channels, it may not even be worthwhile to port it to the GPU. The part-based DPM implementation, however, leaves considerable room for improvement. In our implementation of section 4.7 we did not fully exploit the possibilities of the GPU hardware. This implementation would benefit from exploiting data locality by implementing the full pipeline in one kernel, such that each thread calculates the final feature values of a small section of the image. By avoiding the transfer to global memory, which is currently the main bottleneck, a large speed-up could be obtained. The use of vector quantization on the HOG features, as in [80,81], allows a further large speed improvement. The two are complementary: the single-kernel pipeline reduces the calculation time of the HOG features, while the vector quantization technique optimizes the model evaluation process.
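The idea behind vector quantization for model evaluation can be sketched as follows. This is a minimal illustration of the general lookup-table principle, not the exact method of [80,81]: each HOG cell is replaced by the index of its nearest codebook entry, and the dot products between filter cells and codebook entries are precomputed once, so that scoring a window needs only lookups and additions. The codebook is assumed to be given (for instance learned with k-means), and all function names are hypothetical.

```python
def nearest(codebook, cell):
    """Index of the codebook entry closest to `cell` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], cell)),
    )


def quantize(hog, codebook):
    """Map a 2-D grid of HOG cell vectors to a grid of codebook indices."""
    return [[nearest(codebook, cell) for cell in row] for row in hog]


def build_table(filt, codebook):
    """table[i][j][k] is the dot product of filter cell (i, j) with codebook entry k."""
    return [
        [[sum(w * c for w, c in zip(fcell, entry)) for entry in codebook]
         for fcell in frow]
        for frow in filt
    ]


def score_at(ids, table, top, left):
    """Approximate filter response at (top, left) using lookups and additions only."""
    return sum(
        table[i][j][ids[top + i][left + j]]
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Toy example: 2-D "HOG" cells and a codebook with two entries (assumed values).
codebook = [[1.0, 0.0], [0.0, 1.0]]
hog = [[[0.9, 0.1], [0.2, 0.8]],
       [[0.1, 0.7], [1.0, 0.0]]]
filt = [[[2.0, 0.0], [0.0, 3.0]],
        [[1.0, 1.0], [0.5, 0.0]]]

ids = quantize(hog, codebook)
table = build_table(filt, codebook)
response = score_at(ids, table, 0, 0)
```

The table is built once per model, whereas the quantization is done once per image and shared by all filters. The response is an approximation of the exact dot product, so the codebook size trades accuracy for speed.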

In chapter 3 we proposed a technique to improve the detection accuracy by combining multiple "less performing" algorithms. Especially when using this technique, it is important that each detector can be evaluated as fast as possible. The combination approach, however, is not yet fully explored, and further research on this topic could lead to further accuracy improvements. These could possibly be obtained through options such as more dynamic values for Confidence and Complementarity, other combination methods, or combinations of models trained on different datasets to avoid a training set bias. Based on our rule of thumb to use detectors that are highly complementary, it could be beneficial to focus the training of detectors on this property, e.g. to train a detector that focuses specifically on finding pedestrians that are missed by the others, instead of finding all pedestrians.
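As a rough illustration of how training could be focused on complementarity, the sketch below selects the ground-truth pedestrians that none of the existing detectors found; these could then serve as positive samples for an additional detector. The box format `(x1, y1, x2, y2)`, the overlap threshold of 0.5 and the function names are assumptions made for this example, not part of our implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def missed_ground_truth(ground_truth, detector_outputs, threshold=0.5):
    """Ground-truth boxes that no detection of any detector overlaps sufficiently."""
    return [
        gt for gt in ground_truth
        if not any(
            iou(gt, det) >= threshold
            for detections in detector_outputs
            for det in detections
        )
    ]


# Toy example: two annotated pedestrians, two detectors (assumed values).
ground_truth = [(0, 0, 10, 20), (50, 0, 60, 20)]
detector_outputs = [
    [(1, 0, 11, 20)],      # detector A finds the first pedestrian
    [(200, 0, 210, 20)],   # detector B only produces a false detection
]
hard_samples = missed_ground_truth(ground_truth, detector_outputs)
```

Here `hard_samples` contains only the second pedestrian, which both detectors missed.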

The techniques we described in chapter 5 to reduce the search space based on scene knowledge currently require a lot of manual input to configure. Although effective, these approaches would benefit from a more automatic system to determine a good setting.

In our tracking-by-detection framework of section 5.4.2, we use two parameters. The first is the location and size of the Initialisation Regions, used to initialise tracks, while the second is the lifetime, which bridges sequences of frames where no matching detection can be found. These parameters are currently determined manually, based on intuition and visual inspection of the data. However, both could be determined automatically by running the pedestrian detector at full scale on a training dataset and tracking each of the pedestrians found. The Initialisation Regions would then be the locations where new pedestrians appear (such as the borders of the image), together with regions of the image where pedestrian detection does not perform well, which could lead to lost tracks. The lifetime should be chosen such that tracks are allowed to recover from a sequence of frames where no matching detection can be found. Evidently, a balance must be struck between the number of regions (and their size) and the processing speed: covering most of the image with full-scale detection yields high accuracy, but only a small speed improvement. Note, however, that these parameters are not independent: lowering the lifetime, for example, can induce more lost tracks, so that more Initialisation Regions may be necessary. This makes finding the best parameter selection a challenging task. We suggest determining these parameters iteratively.
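One possible way to derive both parameters from training tracks is sketched below. It assumes that each training track is available as a per-frame list of booleans (matching detection found or not) and that the image positions where tracks start, or where they were lost, are known. The coverage fraction, the grid cell size, the minimum count and all function names are hypothetical choices for this illustration.

```python
import math
from collections import Counter


def gap_lengths(matched):
    """Lengths of runs of missed frames that lie strictly inside a track."""
    gaps, run, seen_hit = [], 0, False
    for hit in matched:
        if hit:
            if seen_hit and run:
                gaps.append(run)
            run, seen_hit = 0, True
        elif seen_hit:
            run += 1
    return gaps  # misses before the first or after the last hit are ignored


def choose_lifetime(tracks, coverage=0.95):
    """Smallest lifetime that bridges at least `coverage` of all observed gaps."""
    gaps = sorted(g for track in tracks for g in gap_lengths(track))
    if not gaps:
        return 0
    index = max(math.ceil(coverage * len(gaps)) - 1, 0)
    return gaps[index]


def initialisation_cells(points, cell=40, min_count=2):
    """Grid cells (column, row) in which at least `min_count` tracks start or are lost."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    return sorted(c for c, n in counts.items() if n >= min_count)


# Toy example (assumed values): True means a matching detection was found.
tracks = [
    [True, False, True, True, False, False, True],
    [True, True, False, False, False, False, False, True],
    [False, True, False, True, False],
]
lifetime = choose_lifetime(tracks, coverage=0.75)

points = [(5, 100), (10, 110), (630, 200), (300, 240)]
cells = initialisation_cells(points)
```

Because the two parameters interact, such a procedure would be repeated: after choosing a lifetime, the tracks lost under that lifetime yield new points for the Initialisation Regions, and vice versa.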

The ground constraint we use is currently modelled as a first-order function that relates the y-position in the image to the height in pixels of the pedestrians. Although we used the ground truth for this task, the function can just as easily be obtained from detections. It is important that outliers are pruned and that the precision of the detector is sufficiently high (avoiding false detections in this calibration), while still using a large number of samples. Note that during evaluation of the detector, the ground constraint could be updated based on the detections in recent frames.
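A minimal sketch of such a calibration from detections is given below. It fits the first-order function h = a * y + b by least squares, keeps only high-scoring detections to limit false positives, and iteratively prunes outliers whose residual is large compared to the median absolute residual. The score threshold, the pruning factor, the one-pixel tolerance floor and the function names are assumptions for this example.

```python
from statistics import median


def fit_line(points):
    """Least-squares fit of h = a * y + b to a list of (y, h) pairs."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_h = sum(h for _, h in points) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in points)
    if var_y == 0:
        raise ValueError("need detections at more than one y-position")
    cov = sum((y - mean_y) * (h - mean_h) for y, h in points)
    a = cov / var_y
    return a, mean_h - a * mean_y


def fit_ground_constraint(detections, min_score=0.9, k=3.0, rounds=5):
    """Fit h = a * y + b from (y_bottom, height_px, score) detections."""
    points = [(y, h) for y, h, score in detections if score >= min_score]
    if len(points) < 2:
        raise ValueError("not enough confident detections")
    a, b = fit_line(points)
    for _ in range(rounds):
        residuals = [abs(h - (a * y + b)) for y, h in points]
        # Tolerance: k times a robust scale estimate, at least one pixel.
        tolerance = max(k * 1.4826 * median(residuals), 1.0)
        kept = [p for p, r in zip(points, residuals) if r <= tolerance]
        if len(kept) == len(points) or len(kept) < 2:
            break
        points = kept
        a, b = fit_line(points)
    return a, b


# Toy example (assumed values): heights follow h = 0.5 * y - 20.
detections = [(y, 0.5 * y - 20, 0.95) for y in range(100, 401, 20)]
detections.append((250, 300.0, 0.95))  # confident but wrong detection
detections.append((200, 10.0, 0.30))   # low-scoring false detection
a, b = fit_ground_constraint(detections)
```

To update the constraint during evaluation, the same function could be applied to a sliding window of detections from recent frames.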

The UAV case we discussed in chapter 7 performs well for the application we had in mind, where we assumed certain constraints (among others a single pedestrian, limited speed of movement, and the ability to stay at eye level of the pedestrian, which reduces the influence of viewpoint-induced distortions). Applying this in a more challenging context, such as following sports like snowboarding, has however not been tested and will probably require other settings to allow faster movement.
