Once the text lines are extracted, we have to segment the words on each text line. Word segmentation in handwritten documents is a difficult task because the intra-word spacing is sometimes wider than the inter-word spacing (Fig. 4.7). Thus, it is not always possible to segment the document at the word level perfectly using geometrical information only.
Figure 4.7 Example of a handwritten line where the space between characters of the same word is wider than the space between two neighbouring words.
Many different approaches to word segmentation have been proposed so far. Word segmentation algorithms can be categorized as top-down, bottom-up or hybrid. We experimented with well-known algorithms from each category and concluded that the scale-space algorithm proposed by Manmatha and Rothfeder [MR05] gives the best results for our collection of unconstrained handwritten documents. We therefore carry out word segmentation with an enhanced version of the scale-space algorithm. We obtain the scale-space using derivatives of fast anisotropic Gaussian filters implemented in the Fourier domain, so our approach to word segmentation is based on the same theory that we introduced for the extraction of lines. There are only two minor differences. First, we do not need to steer the Gaussians at different orientations, because words within a skew-corrected line are reasonably straight and, moreover, the aspect ratio of a word (the ratio of its width to its height) is much less than that of a text line. Second, we have to use two Gaussian filtering operations in order to compute the Laplacian of Gaussian (LoG) operator. This is explained in more detail in the following.
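The scale-space is computed by convolving the image with a kernel that is the sum of the unmixed second partial derivatives of a Gaussian (in the x and y directions) [MR05]:

$$\mathrm{LoG}(x, y; \sigma) = \frac{\partial^2 G(x, y; \sigma)}{\partial x^2} + \frac{\partial^2 G(x, y; \sigma)}{\partial y^2}$$

where G(x, y; σ) denotes the Gaussian kernel at scale σ.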
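The sum of second derivatives described above can be evaluated directly in the Fourier domain, where Gaussian smoothing and differentiation both become pointwise multiplications. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `log_fourier`, the axis-aligned scales `sigma_x` and `sigma_y`, and the test images are hypothetical, and the FFT implies periodic image borders.

```python
import numpy as np

def log_fourier(image, sigma_x, sigma_y):
    """Response of G_xx + G_yy for an axis-aligned anisotropic Gaussian G.

    Assumption: periodic borders, as implied by the discrete Fourier transform.
    """
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]  # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(cols)[None, :]  # horizontal frequency, cycles per pixel
    # The Fourier transform of a unit-area Gaussian is again a Gaussian.
    gauss = np.exp(-2.0 * np.pi ** 2 * ((sigma_x * fx) ** 2 + (sigma_y * fy) ** 2))
    # A second derivative along x multiplies the spectrum by -(2*pi*fx)^2,
    # and likewise along y, so their sum is the factor below.
    laplace = -(2.0 * np.pi) ** 2 * (fx ** 2 + fy ** 2)
    spectrum = np.fft.fft2(image) * gauss * laplace
    return np.real(np.fft.ifft2(spectrum))

# A single bright pixel: the response is the LoG kernel itself.
impulse = np.zeros((32, 64))
impulse[16, 32] = 1.0
response = log_fourier(impulse, sigma_x=6.0, sigma_y=2.0)

# A constant image has no structure, so its response vanishes.
flat_response = log_fourier(np.ones((32, 64)), sigma_x=6.0, sigma_y=2.0)
```

In this sketch the response to the impulse is negative at the impulse location and positive in the surround, and it sums to zero because the filter has no DC gain.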
Figure 4.8 Output of line and word segmentation algorithms for a handwritten French document.
This operator is called the Laplacian of Gaussian (LoG). It can be shown that the LoG operator can be approximated by the difference of two standard Gaussian filters:

$$\mathrm{LoG}(x, y; \sigma) \approx G(x, y; \sigma_1) - G(x, y; \sigma_2), \qquad \sigma_1 < \sigma_2$$

This equation subtracts a wide Gaussian from a narrow Gaussian in order to approximate, up to a constant factor, the sum of the second partial derivatives.
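To illustrate the difference-of-Gaussians idea on a single text line, the sketch below blurs a synthetic line image at two scales, subtracts the wide result from the narrow one, and reads word hypotheses off the positive regions. Everything specific here is an assumption made for illustration: the helper names, the scale ratio of 1.6, the values of `sigma_x` and `sigma_y`, the synthetic rectangles standing in for characters, and the use of a column projection in place of a full blob analysis. It uses plain spatial convolution rather than the Fourier-domain filters described in the text.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3 sigma and normalised to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma_x, sigma_y):
    """Separable anisotropic Gaussian blur with zero padding at the borders.

    Assumes the image is larger than both kernels in each dimension.
    """
    kx, ky = gaussian_kernel(sigma_x), gaussian_kernel(sigma_y)
    rows = np.apply_along_axis(np.convolve, 1, image.astype(float), kx, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, ky, mode="same")

def dog_response(image, sigma_x, sigma_y, ratio=1.6):
    """Narrow Gaussian minus wide Gaussian (ratio is an assumed value)."""
    narrow = blur(image, sigma_x, sigma_y)
    wide = blur(image, ratio * sigma_x, ratio * sigma_y)
    return narrow - wide

def word_extents(line, sigma_x, sigma_y, eps=1e-6):
    """Column ranges (start, end), end exclusive, where the response is positive."""
    positive_cols = (dog_response(line, sigma_x, sigma_y) > eps).any(axis=0)
    padded = np.concatenate(([False], positive_cols, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[0::2], edges[1::2])]

# Synthetic line (ink = 1): three "words" of 3, 4 and 2 characters.
# Characters are 6 px wide, 4 px apart; words are 40 px apart.
line = np.zeros((48, 220))
for start in (20, 30, 40, 86, 96, 106, 116, 162, 172):
    line[16:32, start:start + 6] = 1.0

words = word_extents(line, sigma_x=8.0, sigma_y=3.0)
print(len(words), words)
```

With a horizontal scale larger than the character spacing but smaller than the word spacing, the characters of one word fuse into a single positive region while the gaps between words stay negative. Choosing that scale is the hard part on real handwriting, where the two kinds of gap can overlap in size.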
Figure 4.9 Output of line and word segmentation algorithms for a handwritten English document.
The output of the word segmentation algorithm for a handwritten French document, as applied to each line of text separately, is shown in Figure 4.8, where the word hypotheses are represented with different colors. The text lines are extracted by the FFS filtering that we described in the previous section. Another sample output is shown in Figure 4.9 for a handwritten English document from the IAM database.
Chapter 5
Character Segmentation
5.1 Introduction
As we mentioned earlier, the main goal of this research is to develop a methodology for spotting arbitrary keywords. Therefore, we cannot rely on holistic word recognition approaches, because it is not possible to compile a large enough training database for all possible keywords. Consequently, our main approach is to use non-holistic (analytical) recognition methods, and so for general keyword detection, we need to either implicitly or explicitly divide each word into its constituent letters. This task is done by a character segmentation algorithm.
Most conventional character segmentation methods in the literature are based on the analysis of projection profiles or candidate segmentation points; in either case, the 2D information in the image is not exploited effectively [RMKI09, HAI07].
The segmentation paths are usually generated without taking into account the constraints on character shapes and neighboring characters. One fundamental assumption in these algorithms is that characters are separable by vertical lines (after slant correction). This assumption holds for machine-printed and simple cursive text, but not for complicated styles of handwriting. In general, when there is a considerable amount of overlap between neighboring characters, they are not separable by straight lines.
Samples of handwritten words with heavily overlapping characters are given in Fig. 5.1. In such cases, applying a typical character segmentation algorithm would damage some characters (i.e. some characters would lose parts of their strokes, and others would gain parts from neighboring characters).
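The limitation of vertical cuts can be made concrete with a toy projection-profile segmenter. This is a minimal sketch of the conventional baseline discussed above, not of the algorithm proposed in this chapter; the function name and the two small binary images are hypothetical.

```python
import numpy as np

def split_by_vertical_projection(word):
    """Cut a binary word image (ink = 1) at columns that contain no ink.

    Returns column ranges (start, end), end exclusive, one per segment.
    """
    has_ink = word.sum(axis=0) > 0
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[0::2], edges[1::2])]

# Two vertical stems separated by blank columns.
separable = np.zeros((8, 12), dtype=int)
separable[1:8, 2:4] = 1
separable[3:8, 6:8] = 1

# Same stems, but the first glyph has a top bar that overhangs the second
# glyph without touching it.
overlapping = separable.copy()
overlapping[1, 2:9] = 1

print(split_by_vertical_projection(separable))    # [(2, 4), (6, 8)]
print(split_by_vertical_projection(overlapping))  # [(2, 9)]
```

In the second image the two glyphs are still disjoint, so a curved path could separate them, but no blank column exists and the profile-based method returns a single segment.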
We have developed a new character segmentation algorithm based on background skeletal graphs. The algorithm relies on 2D data structures that correspond to arbitrary regions of the image, so that a character of any shape can be circumscribed by a region or a sequence of regions. Consequently, the algorithm is capable of finding the exact boundaries of a character no matter how much it overlaps with neighboring characters. Besides character segmentation, the character merging algorithm (discussed later in this chapter) also benefits from the 2D representation: incorporating contextual knowledge about characters into the merging algorithm is intuitive when the data structures correspond to characters or sub-characters.
Figure 5.1 Samples of handwritten words with a lot of overlapping between characters.
Any character segmentation algorithm, be it implicit or explicit, needs more than the geometrical information in the word image in order to segment it perfectly. In other words, it is not always possible to perfectly segment a word image into its constituent characters without knowing the corresponding transcription. The reason is that a word image may represent more than one transcription. Therefore, we have to segment the input word in all possible ways and then resolve the ambiguity using the context, which in its simplest form is a lexicon. In order to generate all valid segmentation hypotheses, we developed a new merging algorithm based on graph partitioning.
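To give a feel for the size of the hypothesis space, the sketch below enumerates every way of grouping an ordered sequence of sub-character pieces into characters. It is a plain recursive enumeration and not the graph-partitioning merging algorithm developed in this chapter. The function name, the limit `max_per_char`, and the length-based filter standing in for a lexicon are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

def groupings(n_pieces: int, max_per_char: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield every grouping of pieces 0..n_pieces-1 into consecutive runs.

    Each run is (start, end), end exclusive, and holds at most
    max_per_char pieces; one run corresponds to one character hypothesis.
    """
    if n_pieces == 0:
        yield []
        return
    for size in range(1, min(max_per_char, n_pieces) + 1):
        for rest in groupings(n_pieces - size, max_per_char):
            # Shift the remaining runs past the pieces used by the first run.
            yield [(0, size)] + [(a + size, b + size) for a, b in rest]

# Four pieces, at most two pieces per character.
hypotheses = list(groupings(4, 2))
print(len(hypotheses))  # 5

# Crude stand-in for a lexicon: keep only hypotheses with two characters.
two_chars = [h for h in groupings(4, 3) if len(h) == 2]
print(two_chars)  # [[(0, 1), (1, 4)], [(0, 2), (2, 4)], [(0, 3), (3, 4)]]
```

Without a limit on the pieces per character, n pieces admit 2^(n-1) groupings, which is why constraints from character shape and from the lexicon are needed to prune the hypotheses.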
In the rest of this chapter, we first present the terminology and a detailed description of the character segmentation algorithm, and then the character merging algorithm. We give illustrative examples as well as pseudocode for each algorithm.