
The proposed method has been evaluated on three multi-script datasets and one English-only dataset for two different tasks: on the one hand, text segmentation on the KAIST and MSRRC datasets, and on the other, text localization on the MSRA-TD500 and ICDAR2003 datasets (see Chapter 3 for dataset details).

Text Segmentation

Table 4.1 shows the results obtained on the KAIST and MSRRC datasets.

Sample qualitative results are shown in Figure 4.6, which illustrates the ability of our method to robustly extract regions in difficult cases such as curved and multi-colored text.

Scene text extraction based on perceptual organization

                        KAIST               MSRRC
Method                  p     r     f       p     r     f
Lee et al. [71]         0.69  0.60  0.64    -     -     -
OTCYMIST [69]           0.52  0.61  0.56    0.50  0.29  0.37
Sethi et al. [68]       -     -     -       0.33  0.72  0.45
Yin et al. [146,68]     -     -     -       0.71  0.67  0.69
This Chapter            0.67  0.78  0.71    0.64  0.58  0.61

Table 4.1: Scene text segmentation results (precision, recall, and f-score) in KAIST and MSRRC datasets.

The method presented in this Chapter participated as a competing entry in the 2013 Multi-script Robust Reading Competition (MSRRC), reaching second place. While the winner of the competition [146] shows better numbers in both precision and recall, as can be seen in Table 4.1, our proposal outperformed the other two entries in f-score by a noticeable margin. Consistently, our method also obtains a higher f-score than the methods of Lee et al. [71] and OTCYMIST [69] on the KAIST dataset.

Text Detection

Since the perceptually meaningful text groups detected by our method rarely correspond directly to the semantic level at which the ground truth information is defined (words in the case of ICDAR, and lines in the case of the MSRA-TD500 dataset), the proposed method is extended with a simple post-processing step in order to obtain text line level bounding boxes.

Figure 4.6: Qualitative segmentation results on sample images of the KAIST (top) and MSRRC (bottom) datasets.

We consider a group of regions to be a valid text line if the mean of the y-centers of its constituent regions lies in an interval of 40% around the y-center of their bounding box and the coefficient of variation of their distribution is lower than 0.2. Notice that in MSRA-TD500, as we consider text lines at any possible orientation, the orientation of the group (and consequently the definition of the y-axis) is always defined in relation to the axes of the circumscribed rectangle of minimum area for the given group. If this collinearity test fails, the group may comprise more than one text line. In such a case we perform a histogram projection analysis in order to identify the orientation of the text lines, and then split the initial group into candidate lines by clustering regions in the identified direction. This process is repeated iteratively until all regions have either been assigned to a valid text line or rejected, using the collinearity test described above, or until no more partitions can be found.
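The collinearity test described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this Chapter: the function name `is_valid_text_line` and its parameters are hypothetical, the "interval of 40%" is read as ±20% of the group height around the group's y-center, and the coefficient of variation is computed on y-centers measured from the top of the group's bounding box. Region boxes are assumed to be already expressed in the frame of the group's minimum-area rectangle.

```python
import numpy as np

def is_valid_text_line(boxes, interval=0.4, max_cv=0.2):
    """Hypothetical collinearity test for a group of regions.

    boxes: iterable of (x_min, y_min, x_max, y_max), assumed to be given
    in the frame of the group's minimum-area rectangle.
    """
    boxes = np.asarray(boxes, dtype=float)
    top = boxes[:, 1].min()
    bottom = boxes[:, 3].max()
    height = bottom - top
    if height <= 0:
        return False  # degenerate group, nothing to test

    # y-centers measured from the top of the group box (assumption),
    # so the test does not depend on where the group sits in the image.
    y_centers = (boxes[:, 1] + boxes[:, 3]) / 2.0 - top
    mean_y = y_centers.mean()

    # Condition 1: mean y-center within 40% of the height, centered on
    # the group's y-center (read here as +/- 20%, an assumption).
    centered = abs(mean_y - height / 2.0) <= (interval / 2.0) * height

    # Condition 2: coefficient of variation (std / mean) below 0.2.
    cv = y_centers.std() / mean_y
    return bool(centered and cv < max_cv)
```

Under these assumptions, three regions sitting on a common baseline pass the test, while two stacked lines fail on the coefficient of variation and would be handed to the splitting step.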

Table 4.2 shows a comparison of the results obtained with other state-of-the-art methods on the MSRA-TD500 and ICDAR2003 datasets.

Sample qualitative results are shown in Figure 4.7.

                        MSRA-TD500          ICDAR2003
Method                  P     R     f       P     R     f
Chen et al. [14]        0.05  0.05  0.05    0.60  0.60  0.58
Epshtein et al. [27]    0.25  0.25  0.25    0.73  0.60  0.66
Li et al. [74]          0.30  0.32  0.31    0.45  0.80  0.57
TD-ICDAR [144]          0.53  0.52  0.54    0.68  0.66  0.66
TD-Mixture [144]        0.63  0.63  0.60    0.69  0.66  0.67
This Chapter            0.58  0.54  0.56    0.71  0.57  0.64

Table 4.2: Scene text localization results (precision, recall, and f-score) in MSRA-TD500 and ICDAR2003 datasets.
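As a reading aid for the tables, the snippet below computes an f-score from a precision/recall pair. It assumes the usual harmonic-mean definition, which the text does not spell out here; the helper name `f_score` is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# "This Chapter" row, MSRA-TD500 columns of Table 4.2
print(round(f_score(0.58, 0.54), 2))  # → 0.56
```

Not every row of the tables reproduces exactly under this assumption, since benchmark protocols may aggregate scores per image before averaging.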

While our method is outperformed by TD-Mixture [144] on both text localization datasets, the most important outcome of the comparison in Table 4.2 is that our results are consistent with the claim of being script and orientation independent. In particular, the method presented in this Chapter outperforms other methods on MSRA-TD500 that are designed with only horizontal text in mind, even though some of them perform better than ours on ICDAR2003.

Figure 4.7: Qualitative localization results on sample images of the MSRA-TD500 dataset.

By analyzing the errors of our method in both localization and segmentation tasks we have found that in many cases we fail to detect small texts, which can be explained by the fact that the meaningfulness measure does not find those small clusters perceptually meaningful. On the other hand, we also found that sometimes the late fusion of the different similarity modalities through the evidence accumulation algorithm, while it helps in finding consensus and thus in increasing precision by removing duplicate detections, also removes groups that are detected as meaningful in only one of the similarity cues. This effectively reduces the potential recall rate of the method. To mitigate these weaknesses, in the next Chapter we explore the idea of an early fusion of the similarity modalities, and we also complement the meaningfulness test presented in this Chapter with an efficient discriminative classifier.
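The recall loss described above can be illustrated with a toy evidence accumulation sketch. It is a generic co-association scheme with a majority-vote threshold, written under our own assumptions rather than taken from the implementation used in this Chapter; the function names, the label `-1` for "not in any meaningful group", and the threshold of 0.5 are all hypothetical.

```python
import numpy as np

def co_association(partitions, n_items):
    """Fraction of cues in which each pair of items shares a group.

    partitions: one label list per similarity cue; label -1 (assumption)
    marks an item that belongs to no meaningful group in that cue.
    """
    votes = np.zeros((n_items, n_items))
    for labels in partitions:
        labels = np.asarray(labels)
        grouped = labels >= 0
        same = labels[:, None] == labels[None, :]
        same &= grouped[:, None] & grouped[None, :]
        votes += same
    return votes / len(partitions)

def consensus_groups(coassoc, threshold=0.5):
    """Connected components of pairs whose co-association exceeds threshold."""
    n = coassoc.shape[0]
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and coassoc[i, j] > threshold:
                    seen.add(j)
                    stack.append(j)
        if len(component) > 1:  # singletons are not reported as groups
            groups.append(sorted(component))
    return groups

# Three hypothetical cues over five regions: regions 0-2 are grouped by
# every cue, regions 3-4 are grouped by the first cue only.
partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, -1, -1],
    [0, 0, 0, -1, -1],
]
print(consensus_groups(co_association(partitions, 5)))  # → [[0, 1, 2]]
```

In this toy example the pair (3, 4) receives only one vote out of three, so the group is dropped by the consensus step even though one cue found it, which is the kind of loss an early fusion of the cues is meant to avoid.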

4.3 Conclusion

In this chapter we have proposed a new methodology for scene text extraction inspired by the human perception of textual content, largely based on perceptual organization. The proposed method requires practically no training, as the perceptual organization based analysis is parameter free. It is totally independent of the language and script in which text appears, it can deal efficiently with any type of font and text size, and it makes no assumptions about the orientation of the text. Experimental results demonstrate competitive performance when compared with the state of the art.
