The last step in representing the problem in matrix form is to arrange the total cost results in a convenient matrix structure. This matrix is assembled from a set of blocks such that the whole matrix is block-diagonal. The number of blocks generated to build the matrix depends on the number of letters in the word being analysed. The size and content of these blocks depend on the dimensions of the search window (SW) defined in the previous steps, as follows:
First Block
The first non-square block has only one row, connecting the source node to the first letter, and it has as many columns as there are pixels in the first letter’s SW.
Each element of this block is the distance cost of connecting the source node to one of the possible template placements over the first letter’s SW. Although the source node is an auxiliary concept used to build the graph, it is associated with the first point of the word’s contour line. The first row of the matrix therefore contains the distance between this point and the right reference point of the first letter.
82 Chapter 5: ALGORITHM KEY ELEMENTS
Middle Blocks
The middle blocks are square. There is one fewer square block than the total number of letters in the word. Each of these square blocks represents the connection between two consecutive letters: there is one block to connect the 1st and 2nd letters, another to connect the 2nd and 3rd, and so on. The last square block connects the penultimate letter with the last one, which means that there is no square block connecting the last letter of the word to anything else. Each block therefore connects an origin letter to a destination letter. The rows of each block refer to the pixels of the origin letter’s SW, and the columns refer to the pixels of the destination letter’s SW. Because the size of the SW is the same for all letters in a word, these intermediate blocks are square. To generate these blocks, the previous cost matrices have to be reshaped.
a) Template matching cost reshape
The template matching results for each letter are stored in a matrix of the size of the associated SW. To generate the global cost matrices, the NCC results stored in these matrices are reshaped into one column vector per letter.
Figure 57: Graphical clarification of the NCC template matching results matrix reshaping
To match the dimensions required for the matrix blocks, this data has to be rearranged. As remarked earlier, the template matching results actually belong to the nodes, but they are transferred to the links between nodes. Consider the set of edges that connect a particular placement of a letter’s template to all possible placements of the next letter’s template. All these edges must carry the cost of the first template placement, and each one must add to this value the distance cost that depends on the destination node of the edge, that is, on the position of the next template. In other words, the NCC contribution to the total cost is exactly the same for all edges leaving a node, while the distance contribution may differ from edge to edge depending on the arrival node.
The fact that the template matching cost between one template placement of a letter and all possible placements of the following letter is the same has to be reflected in the matrix. This is done by replicating the column vector as many times as there are pixels in the search window.
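The reshaping and replication just described can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `ncc_block` is hypothetical, and the row-major flattening order is an assumption, since the text does not state in which order the SW pixels are numbered.

```python
def ncc_block(ncc_matrix):
    """Flatten an SW-sized NCC cost matrix and replicate it into a square block."""
    # Assumption: pixels are numbered row by row (row-major order).
    column = [value for row in ncc_matrix for value in row]
    n = len(column)  # number of pixels in the search window
    # Row i holds the cost of origin placement i, repeated once per
    # destination placement, so every edge leaving node i carries the same value.
    return [[column[i]] * n for i in range(n)]


block = ncc_block([[0.1, 0.4], [0.3, 0.2]])
```

For a 2 x 2 search window the result is a 4 x 4 block whose rows are constant, which is the property required of the NCC contribution.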
b) Distance cost reshape
The distance cost, on the other hand, evaluates the magnitude associated with the origin template placed at each pixel of the origin SW and the destination template placed at each pixel of the destination SW, which is why it is represented by a square matrix.
The description of how to arrive at the shape and values of this matrix is covered in the previous section. Since the output of the distance assessment is already a square matrix of the desired dimensions, no further reshaping is required. The output matrix of that previous step can be evaluated directly to obtain its associated cost.
Section 5.3. Graph-based word segmentation 83
Figure 58: Graphical clarification of the inter-letter distances results matrix reshaping
Last Block
The last non-square block has only one column, connecting the last letter to the sink node, and it has as many rows as there are pixels in the last letter’s SW.
Each element of this block contains the template matching cost of the last letter, i.e., the assessment of placing the template of the last letter at each possible pixel of its SW. It also contains the associated distance of connecting the “end” point of each placement to the sink node. In this case, it is not necessary to replicate the vector containing this information, because all edges are connected to a single node.
Whole cost matrix
The last step is to add into a single matrix the evaluated and reshaped results of template matching and inter-letter distance. The total cost is computed as a linear combination of the two magnitudes. Different options are possible for generating this total cost, depending on the weight given to each component. For this reason, the weighting was kept as a free parameter to be tuned during the algorithm development.
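As a minimal sketch of this linear combination, the following function adds two blocks of equal size element by element. The names `total_cost_block`, `w_ncc` and `w_dist` are hypothetical; the text only states that the weights are free parameters, and the values used here are purely illustrative.

```python
def total_cost_block(ncc_block, dist_block, w_ncc=1.0, w_dist=1.0):
    """Combine an NCC block and a distance block into one total-cost block."""
    # Both blocks are assumed to have identical dimensions.
    return [
        [w_ncc * ncc + w_dist * dist for ncc, dist in zip(ncc_row, dist_row)]
        for ncc_row, dist_row in zip(ncc_block, dist_block)
    ]


combined = total_cost_block([[1, 1], [2, 2]], [[0, 4], [4, 0]], w_ncc=1.0, w_dist=0.5)
```

Tuning then amounts to changing `w_ncc` and `w_dist` and comparing the resulting segmentations.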
It must be highlighted that, at this point, multiple design decisions have been taken empirically, by comparing the quality of the results obtained in each case. Although the method is explained sequentially in this work, the program was developed iteratively, so that the design options of the different stages could be combined. In this way, a larger set of possibilities is sampled, which makes the greedy solution more robust.
5.3.4 Shortest path
Once the cost matrix is completely set up, the segmentation problem can be solved. Thanks to the graph-based formulation introduced in this work, we can solve the problem by means of a shortest-path algorithm. The shortest-path solution selects, for each letter, the warped template translation (i.e. the warped template position) that leads to the best global assessment. All the costs included in the matrix act as penalties for the segmentation of the word into letters. Thus, the graph-based segmentation jointly optimizes the similarity between the templates and the word image and the spatial coherence imposed by consecutive letters.
The shortest path must be found between the source node and the sink node; no other start or end nodes can be chosen. This forces the algorithm to include all the letters of the word in the solution: because of the block structure of the matrix, the source and sink nodes can only be connected by a path that passes through every letter.
To solve this problem we use Dijkstra’s algorithm. This method explores the possible paths from the source node towards the sink node, always giving priority to the path with the least accumulated cost.
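The following sketch shows how Dijkstra’s algorithm could be run directly on the block structure described above. It is an illustration under stated assumptions, not the program used in this work: the function name and the node numbering (source = 0, then the SW pixels of each letter in order, then the sink) are hypothetical, and all costs are assumed to be non-negative, as Dijkstra’s algorithm requires.

```python
import heapq


def shortest_segmentation_path(first_block, middle_blocks, last_block):
    """Return (path, cost) from source to sink over the layered cost blocks."""
    n = len(first_block)              # pixels per search window
    letters = len(middle_blocks) + 1  # one middle block fewer than letters
    source, sink = 0, 1 + letters * n

    def neighbours(node):
        if node == source:
            return [(1 + j, first_block[j]) for j in range(n)]
        letter, pixel = divmod(node - 1, n)
        if letter == letters - 1:     # last letter connects only to the sink
            return [(sink, last_block[pixel])]
        row = middle_blocks[letter][pixel]
        return [(1 + (letter + 1) * n + j, row[j]) for j in range(n)]

    best = {source: 0.0}
    parent = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == sink:
            break
        if cost > best.get(node, float("inf")):
            continue                  # stale queue entry
        for nxt, weight in neighbours(node):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))

    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[sink]


path, cost = shortest_segmentation_path([1, 5], [[[1, 10], [1, 1]]], [7, 1])
```

In this toy example with two letters and two pixels per SW, the cheapest route is source, second pixel of letter 1, second pixel of letter 2, sink. The path necessarily contains exactly one node per letter.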
The shortest-path solution contains the source and sink nodes and one middle node for each letter in the word. This solution is returned using the global numbering of all the nodes in the total cost matrix. To decode the solution, each middle node is mapped back to its corresponding SW pixel in the corresponding letter, and the index of this pixel is converted back to the original coordinates of the SW. The relative position between the SW and the original source image is then used to convert these local coordinates into global coordinates.
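The decoding step can be sketched as follows. The numbering convention is an assumption made for illustration (node 0 is the source, followed by the SW pixels of each letter in row-major order), and `decode_node` and `sw_origins` are hypothetical names; `sw_origins` holds the top-left corner of each letter’s SW in the source image.

```python
def decode_node(node, sw_height, sw_width, sw_origins):
    """Map a global node index to (letter index, (row, col) in the source image)."""
    pixels_per_sw = sw_height * sw_width
    # Remove the source node offset, then split into letter and local pixel index.
    letter, pixel = divmod(node - 1, pixels_per_sw)
    # Local SW coordinates, assuming row-major pixel numbering.
    row, col = divmod(pixel, sw_width)
    # Shift by the SW position to obtain coordinates in the source image.
    origin_row, origin_col = sw_origins[letter]
    return letter, (origin_row + row, origin_col + col)


letter, position = decode_node(6, 2, 2, [(10, 20), (10, 40)])
```

Here node 6 is the second pixel of the second letter’s 2 x 2 SW, which lies one column to the right of that SW’s top-left corner.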
In summary, the algorithm selects the set of pixels of the source image where the corresponding warped templates must be placed for the best letter recognition solution as a result of the joint optimization of template matching and spatial consistency.
5.4 CONCLUSION
Three main elements form the core of our method. For each one, we have presented its exact formulation and explained why it is relevant for our purpose. We have also reviewed the challenges that arise in its implementation, followed by the solutions that we have found to address them.
In particular, in this chapter we have described the template warping phase based on the Lucas-Kanade (LK) optimization algorithm. Including this phase in the method has been one of the main difficulties throughout the development, because the algorithm has proven to be sensitive to its initialization and to specific implementation considerations, such as image pre-processing. As a direct consequence of the weak points of this first phase, we arrive at another key element of the method: the distance transforms and image processing. This feature is included in this chapter because of its large impact on the template matching phases, especially on the LK method, in addition to its intrinsic complexity. As we have observed in this chapter, the results of the LK phase are strongly linked to the characteristics of the images that we use. It has therefore been crucial to analyse these two phases together to optimize the algorithm performance.
Finally, the third key element is the graph-based segmentation, which includes the fine alignment of the warped templates and the spatial constraints between consecutive letters. The main challenge of this stage is to build the graph in such a way that it accurately represents the whole problem and constrains the solution towards an optimal segmentation.
6 RESULTS VALIDATION
The proposed method has been evaluated on the reference handwriting data and later on the samples of children’s handwriting. We have analysed the results in a quantitative way to validate the method and its various phases.
6.1 VALIDATION CRITERIA
The presented work is a methodology for segmenting handwritten words into letters. The result, therefore, is a graphical representation of the start and end points of all the letters in the word. However, to validate the method it is convenient to assess this result numerically and quantify the accuracy achieved.
Multiple data sets were provided to us to develop this work. This is an advantage for the validation phase, because the algorithm can be tested on multiple problems with very different characteristics: different word content, different handwriting regularity or quality, etc.
Throughout this work, this large amount of data has been split into three main groups:
a) Algorithm-Design Data: This data consists of the reference handwritten words, i.e., the words written by C. Gosse. It was used in the first steps of the algorithm construction, as these are the most basic and ideal cases, where the complexity level is reduced.
b) Algorithm-Tuning Data: Among all the data files that belong to the children’s handwriting samples, approximately 40% were used to tune the preliminary implementation of the method. In this second set of data, the variability in handwriting increases significantly. This fact, together with the decrease in correctness and handwriting skills, adds complexity to the problem. Under these new circumstances, the method can be tuned to be more robust and to cover more difficult segmentation cases.
c) Algorithm-Validation Data: Finally, the rest of the data, which has not been used until now, serves to test the finished algorithm on new cases.
An important remark at this point is that, from all the available data, those samples that contain orthographic errors or were considered to be of anomalously bad quality have been discarded. A future version of the program could implement new functions to handle these cases. This is discussed in Chapter 7.
6.1.1 Quantification methodology
As mentioned above, we need a way to assess the obtained results, both to validate the algorithm and to compare its behaviour under different working conditions. A numerical quantification of the accuracy is therefore desired.
For this, we proceed as follows. We take all the words that will be used to test the algorithm and manually determine the solution that we expect. This means that we segment each word into letters by hand, according to the reference points that we have defined for the letters, trying to be as accurate as possible. Naturally, the solution obtained in this way is only a good approximation of the optimal one.
In parallel, the images are loaded into the program, which is executed to obtain the digital solution for the same word images. As presented before, the solution is provided as vertical straight lines on the links between letters, which indicate the right and left points of each letter. With the manual and digital solutions, we can quantify the accuracy of the method. For the method’s solution to be considered a valid segmentation of a particular letter, its right and left points must be close to the manually determined ones. More specifically, the manual frontier between letters must be contained within the interval delimited by two consecutive digital points, one being the end of one letter and the other the beginning of the following one. Hence, we associate with each letter a binary variable that equals 1 if both the right and left manual segmentation points belong to their link intervals, and 0 otherwise.
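A minimal sketch of this criterion is given below. The function names are hypothetical, positions are taken to be horizontal pixel coordinates, and treating the interval bounds as inclusive is an assumption; the numbers in the example are invented for illustration and are not results from this work.

```python
def letter_is_valid(manual_left, manual_right, left_link, right_link):
    """Return 1 if both manual frontiers fall inside their link intervals, else 0."""
    # Each link is (end of previous letter, start of next letter) in the digital solution.
    left_low, left_high = left_link
    right_low, right_high = right_link
    inside_left = left_low <= manual_left <= left_high
    inside_right = right_low <= manual_right <= right_high
    return int(inside_left and inside_right)


def success_ratio(flags):
    """Fraction of letters whose segmentation is counted as valid."""
    return sum(flags) / len(flags)


flags = [
    letter_is_valid(12, 40, (10, 15), (38, 44)),  # both frontiers inside
    letter_is_valid(40, 70, (38, 44), (72, 80)),  # right frontier outside
]
ratio = success_ratio(flags)
```

The per-letter flags can then be counted per word or aggregated over a whole data set.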
6.2 REFERENCE DATA
The reference data (words handwritten by C. Gosse) was initially used to design the program. Once the program had been adjusted with new data, the initial data set was reloaded to test the performance of the new version on the reference data. The expectation is that the program must still work properly for the reference image sets; the results may even have improved with respect to the initial ones.
The main objective of this section is to analyse the results that the designed program provides, in order to validate that the implemented solution fulfils the required function.
Throughout the algorithm development, multiple decisions have been taken with respect to the program implementation. Most of them are intrinsic to the digital implementation of the different steps and have been validated in the previous chapter. However, the use of the quadratic version of the Distance Map has yet to be validated on complete examples. For this reason, its validation is also included in this chapter.
Apart from the global validation and the analysis of the Distance Map variant to be used, this section aims to show that the different steps included in the method bring some improvement. We therefore discuss whether the template modification implemented through the Lucas-Kanade algorithm in a preliminary template matching has any positive effect on the solutions or whether it is non-essential. Finally, we also discuss the other element that represents the core of this work: the cost graph representation used to solve the segmentation with a shortest-path approach. To validate this, the results obtained with the complete method are compared with the results obtained when this graph is not constructed, that is, when only the template matching steps are executed.
To sum up, the main features tested in this section are the inclusion of the Lucas-Kanade template modification, the graph approach to problem solving and the use of the quadratic version of the distance map.
We point out that the main configuration, which has been designed throughout this work, is the one that uses the quadratic variant of the distance map of the images, includes the LK algorithm for template warping, and builds the final graph representation of the problem from both the template matching cost and the cost of the relative position between consecutive letters.
For each comparative approach, one of the main factors has been changed, one at a time, to isolate its effect. There are therefore 3 approaches to compare with the reference one. In the first one, the Lucas-Kanade preliminary template matching is skipped. This means that the raw templates are used directly for the NCC template matching.
In the second approach, the relative position between letters is ignored, so that only the template matching results are taken into account for word segmentation. This is equivalent to selecting the best match for each letter locally, without considering the accuracy over the whole word (and, consequently, without performing graph-based segmentation). It is done by leaving the contribution of the inter-letter distance out of the total cost matrix.
The third approach uses the standard version of the DT, in which each pixel is simply labelled with the Euclidean distance to the closest contour pixel, saturated at 255. This is applied to both the word images and the templates.
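The standard (linear) distance map used in the third approach can be sketched with a brute-force computation. This is only an illustration of the definition given above, not the implementation used in this work: the function name is hypothetical, the input is assumed to be a binary image in which 1 marks a contour pixel, and the exhaustive search is far slower than a real distance transform.

```python
import math


def linear_distance_map(contour, saturation=255.0):
    """Label each pixel with the Euclidean distance to the closest contour pixel."""
    # `contour` is a hypothetical binary image: 1 marks a contour pixel.
    points = [
        (r, c)
        for r, row in enumerate(contour)
        for c, value in enumerate(row)
        if value
    ]
    if not points:
        raise ValueError("the image contains no contour pixel")
    result = []
    for r, row in enumerate(contour):
        out_row = []
        for c in range(len(row)):
            nearest = min(math.hypot(r - pr, c - pc) for pr, pc in points)
            out_row.append(min(nearest, saturation))  # saturate large distances
        result.append(out_row)
    return result


distances = linear_distance_map([[1, 0, 0]])
```

The `saturation` argument defaults to the value 255 stated in the text; a smaller value is only useful for checking the clipping on small examples.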
Approach / Features   Reference (A)    Omitted LK (B)   Null Distance Cost (C)   Linear DM (D)
LK                    Included         Omitted          Included                 Included
Cost                  NCC + distance   NCC + distance   NCC                      NCC + distance
DM                    Quadratic        Quadratic        Quadratic                Linear
Success ratio         73.67%           71.51%           59.54%                   55.19%
Table 6: Scheme of the 4 algorithm configurations that we compare for the reference data-set
The same words have been loaded into the program under the four different configurations, and the segmentation result has been obtained for each one. Following the quantification methodology described above, the number of good letter matches is counted for each word.