eling the properties of handshape combinations as well as the properties of handshape variations within the HSBN relates to the fact that the start/end handshapes annotated for signs in the lexicon dataset are assumed to reflect the ground-truth hand configurations articulated within a given video sequence. Preparing handshape annotations is inherently subjective, because it requires assigning one label from a finite set of handshape classes to each hand configuration observed in the start/end hand images of the input signing video. Since hand configurations observed in signs often do not exactly match any handshape in the predetermined set, the annotators had to make a forced choice (the apparent difference in handshapes in some signs may therefore be greater than the actual difference in the hand configurations). In many cases, the hands are only partially visible because of self-occlusion and occlusion by the other hand. Differences in handshape annotations can also arise when annotators select different start and end frames for multiple productions of a sign. All these factors introduce additional variation into the sets of handshape labels for a given sign variant that are used to train the HSBN model. Therefore, a prior over the model parameters is incorporated during HSBN learning in order to improve the robustness of the estimated parameters.
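The form of the prior is not specified here. As a minimal sketch, assuming a symmetric Dirichlet prior with a hypothetical concentration parameter `alpha` on each multinomial (a common conjugate choice for discrete distributions), the maximum a posteriori (MAP) estimate adds pseudo-counts to the observed label counts, so that labels seen rarely because of annotation noise are not assigned zero probability. The function name `map_multinomial` and the toy label counts are illustrative only; with hidden states, the counts would be expected counts (for example from an EM-style procedure), which this sketch does not model.

```python
from collections import Counter


def map_multinomial(observations, labels, alpha=2.0):
    """MAP estimate of a multinomial under a symmetric Dirichlet(alpha) prior.

    observations: observed labels, each assumed to be a member of `labels`
    labels:       the full label inventory (e.g. a subset of handshape labels)
    alpha:        hypothetical concentration parameter; alpha >= 1 keeps the
                  MAP estimate well defined, and alpha = 1 gives the ML estimate
    """
    if alpha < 1:
        raise ValueError("alpha must be >= 1 for the MAP estimate to be defined")
    counts = Counter(observations)
    denom = len(observations) + len(labels) * (alpha - 1)
    if denom == 0:
        # No data and a flat prior: fall back to a uniform distribution.
        return {lab: 1.0 / len(labels) for lab in labels}
    # Each label receives (alpha - 1) pseudo-counts in addition to its observed count.
    return {lab: (counts[lab] + alpha - 1) / denom for lab in labels}


# Toy example: three observed labels over a four-label inventory.
inventory = ["5", "crvd-5", "A", "S"]
observed = ["5", "5", "crvd-5"]
print(map_multinomial(observed, inventory, alpha=2.0))  # unseen "A" and "S" get 1/7 each
print(map_multinomial(observed, inventory, alpha=1.0))  # ML estimate: "A" and "S" get 0
```

Larger values of `alpha` pull the estimate more strongly towards the uniform distribution, trading fidelity to the (noisy) annotations for robustness.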
4.4 Summary
The lexicon dataset was prepared with the goal of facilitating the development of a query-by-sign lookup system for an ASL dictionary. The lexicon dataset is unique in that it includes extensive annotations, painstakingly prepared by linguists, for several attributes of signs, with a specific focus on the properties of hand articulation. The annotations available for the sign productions in the dataset include the start/end video frames, the start/end handshapes, and morphological and articulatory classifications of signs. With the goal of distinguishing between variations in articulation that occur in
[Figure 4·4 panels. Left column: "Handshape annotations for the productions of four signs in the ASLLVD". Right column: "An example production of each sign along with its start/end handshape annotations" (images of the dominant and non-dominant hands at sign start and sign end).]

Sign: BLOSSOM
Signer  Dominant START  Non-dom. START  Dominant END    Non-dom. END
Li      flat-O          flat-O          5               5
Ty      flat-O          flat-O          crvd-5          crvd-5
Na      O               O               crvd-sprd-B     crvd-sprd-B
Br      flat-O          flat-O          5               5

Sign: APPOINTMENT
Signer  Dominant START  Non-dom. START  Dominant END    Non-dom. END
Li      5               5               A               A
Ty      5               crvd-flat-B     S               S
Na      crvd-5          crvd-5          S               S
Br      5               5               A               A
La      5               crvd-5          S               S
Da      5               5               S               S

Sign: COLLECT
Signer  Dominant START  Non-dom. START  Dominant END    Non-dom. END
Li      crvd-B          B-L             10              B-L
Li      crvd-5          B-L             A               B-L
Ty      bent-B-L        B-L             10              B-L
Na      5               B-L             10              B-L
Na      crvd-5          B-L             10              B-L
Br      5               B-L             A               B-L

Sign: BREAK-DOWN
Signer  Dominant START  Non-dom. START  Dominant END    Non-dom. END
Li      crvd-5          crvd-5          crvd-5          crvd-5
Li      5-C-L           5-C-L           crvd-5          crvd-5
Na      crvd-5          crvd-5          bent-B-L        bent-B-L
Br      crvd-5          crvd-5          crvd-5          crvd-5
Figure 4·4: Examples of handshape variation attested in the ASLLVD corpus. The focus here is on patterns of handshape variation produced by general language processes, that is, handshape variations that are not tightly linked to a specific item in the vocabulary. The left column shows the start/end handshape labels annotated by linguists on the dominant and non-dominant hands for selected signs. An example for each sign (dashed outline) is depicted in the right column.
general across the language and those variations that are, for the most part, particular to certain specific items in the vocabulary, each distinct sign has been annotated with a unique gloss (an English text label). Multiple productions, in many instances from different signers, are available for a large fraction of the signs in the dataset vocabulary. In total, the lexicon dataset includes 9,776 productions of 3,457 distinct signs.
We envision that the lexicon dataset can serve as a valuable resource for developing data-driven approaches to learning the properties of articulation as well as the patterns of articulatory variation observed in signs. In this research, we use the lexicon dataset specifically to learn and empirically evaluate the HSBN formulation for the task of handshape inference in monomorphemic lexical signs.
Chapter 5
HandShapes Bayesian Network (HSBN)
In this chapter, we formulate probabilistic models to represent the properties of start/end handshape combinations in monomorphemic lexical signs. The models are developed with an eye towards facilitating start/end handshape inference given video input of a sign. As summarized in the preceding chapter on ASL linguistics, the three main articulatory classes of monomorphemic lexical signs are:
(a) ‘two-handed : same handshapes’: the handshapes articulated on the two hands are the same (or very similar),
(b) ‘two-handed : different handshapes’: the two hands exhibit dissimilar handshapes at the start point, the end point, or both. The non-dominant hand takes one of a small subset of possible handshapes and does not change handshape between the start and end positions, and
(c) ‘one-handed’: only the dominant hand is involved in the articulation.
We propose a HandShape Bayesian Network (HSBN) model for each of these three articulatory classes. An HSBN is a probabilistic generative model that represents the likely combinations of start/end handshapes in monomorphemic lexical signs. We start by formulating the HSBN for the class of one-handed signs, and then extend this model to obtain the HSBNs for two-handed signs. The mathematical notation used in the HSBN formulation is summarized in Table 5.1.
Notation                                          Description
$I_{s;D}, I_{e;D}$                                Images of handshapes for the dominant hand observed in the input video at the start and end of the sign
$I_{s;N}, I_{e;N}$                                Images of handshapes for the non-dominant hand observed in the input video in two-handed signs
$\mathcal{X}$                                     Inventory of handshape labels, which contains 85 handshape distinctions in our implementation
$X_{s;D}, X_{e;D}, X_{s;N}, X_{e;N}$              Handshape labels from the set $\mathcal{X}$ for the observed start/end handshape images $i_{s;D}, i_{e;D}, i_{s;N}, i_{e;N}$
$Z_s, Z_e$                                        Variables depicting hidden (unobserved) start/end states
$\mathcal{Z} = (\mathcal{Z}_s, \mathcal{Z}_e)$    State space associated with the hidden variables $Z_s, Z_e$, which are estimated during HSBN learning

Table 5.1: Notations used in the HSBN formulation.
Figure 5·1: The HSBN$_{\text{dominant}}$ graphical model for handshape inference in one-handed signs.
5.1 HSBN for one-handed signs
For one-handed signs, the dominant hand alone participates in the articulation. Thus, our model for one-handed signs considers only the start and end handshapes of the signer’s dominant hand. The corresponding HSBN$_{\text{dominant}}$ model is depicted in Figure 5·1. The model comprises three layers of random variables. The lowest layer represents handshape images observed for the dominant hand at the start and end positions of the sign. The
images of the dominant hand are denoted using the random variables $I_{s;D}, I_{e;D}$. The middle layer of the model includes the random variables $X_{s;D}, X_{e;D}$, which denote the handshape labels for the start/end handshape images. In our implementation, the inventory of handshapes, $\mathcal{X}$, contains 85 labels. The top layer of the HSBN model accounts for the hidden variables. The labels for observed handshapes $X_{s;D}, X_{e;D}$ in the HSBN are obtained as different realizations of certain hidden states, $Z_s, Z_e$. Hidden variables are included in the HSBN to model the phenomena of handshape variation produced as a result of general phonological processes. The phenomena of sign-independent phonological variation are described in more detail in Chapters 2 and 4.
The HSBN is formulated for the handshape classification task, wherein labels from a predefined set of handshapes, $\mathcal{X}$, are desired as outputs of the handshape inference algorithm. A convenient modeling choice for the HSBN is to represent the hidden variables with a collection of discrete states. Probability distributions that involve the hidden variables $Z_s, Z_e$ then reduce to multinomial distributions, a property that enables relatively efficient algorithms for HSBN learning and handshape inference. Handshapes in signs are produced as the hands adopt configurations in a continuous parameter space; algorithms for handshape inference must therefore be robust to gradience in handshape configurations. In the proposed HSBN implementation, a degree of robustness to small differences in articulation is incorporated into the observation likelihood function by using an algorithm for non-rigid handshape image alignment. An alternative modeling choice that represents the hidden variables in a continuous domain (such as a Gaussian mixture model) requires a significantly larger training set in order to accommodate the wide range of hand orientations attested in signs. Furthermore, several handshapes are either indistinguishable or very similar in many of their 2D projections. We set aside the investigation of a continuous-domain representation for hidden variables as a topic for future work.
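To illustrate why discrete hidden states keep the computations tractable, the sketch below evaluates the probability of a start/end label pair, $P(X_s = x_s, X_e = x_e \mid \lambda)$, by summing exactly over all hidden-state pairs, using the parameters $\pi$, $a$, $b^s$, $b^e$ of Table 5.2. The function name, the two-state toy model, and its parameter values are made up for illustration and are not the learned HSBN; the cost is $O(|\mathcal{Z}_s| \cdot |\mathcal{Z}_e|)$ per label pair.

```python
def label_pair_probability(x_s, x_e, pi, a, b_s, b_e):
    """P(X_s = x_s, X_e = x_e | lambda) for a one-handed HSBN-style model.

    pi[z_s]       : P(Z_s = z_s)
    a[z_s][z_e]   : P(Z_e = z_e | Z_s = z_s)
    b_s[z_s][x]   : P(X_s = x | Z_s = z_s), as a dict over labels
    b_e[z_e][x]   : P(X_e = x | Z_e = z_e), as a dict over labels
    Both hidden variables are summed out exactly (a finite double sum).
    """
    total = 0.0
    for z_s, p_start in enumerate(pi):
        for z_e, p_trans in enumerate(a[z_s]):
            # Labels absent from a state's dict have zero probability under it.
            total += p_start * p_trans * b_s[z_s].get(x_s, 0.0) * b_e[z_e].get(x_e, 0.0)
    return total


# Hypothetical toy parameters: two start states and two end states.
pi = [0.6, 0.4]
a = [[0.9, 0.1],
     [0.2, 0.8]]
b_s = [{"flat-O": 0.8, "O": 0.2}, {"5": 0.7, "crvd-5": 0.3}]
b_e = [{"5": 0.6, "crvd-5": 0.4}, {"A": 0.5, "S": 0.5}]

print(label_pair_probability("flat-O", "5", pi, a, b_s, b_e))  # approximately 0.2592
```

Only the path $Z_s = 0 \rightarrow Z_e = 0$ contributes here, giving $0.6 \cdot 0.9 \cdot 0.8 \cdot 0.6$. With continuous hidden variables, the double sum would become an integral, which is one reason the discrete representation is convenient.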
Given the assumed representation for hidden variables in the HSBN, the probability distributions in the model and their associated parameters are defined as follows. The
Notation                              Description
$\pi_{z_s}$ or $\pi[z_s]$             The prior distribution $P(Z_s = z_s)$ for the hidden state at the start of a sign
$a_{z_s, z_e}$ or $a[z_s, z_e]$       Transition probabilities $P(Z_e = z_e \mid Z_s = z_s)$ for start/end hidden states
$b^s_{z_s}(x_s)$ or $b^s[z_s, x_s]$   The probability $P(X_s = x_s \mid Z_s = z_s)$ for an observed start handshape label to be obtained as a realization of the start hidden state
$b^e_{z_e}(x_e)$ or $b^e[z_e, x_e]$   The probability $P(X_e = x_e \mid Z_e = z_e)$ for an observed end handshape label to be obtained as a realization of the end hidden state
$\lambda$                             The parameters $\{\pi, a, b^s, b^e\}$ for the HSBN model

Table 5.2: Parameters for the HSBN formulation.
probability distribution over the start latent states is denoted as $\pi_{z_s} = P(Z_s = z_s)$. The start/end transitions in the model are represented as $a_{z_s, z_e} = P(Z_e = z_e \mid Z_s = z_s)$. The probability distributions for observed handshape configurations to be produced as different realizations of hidden states are given by $b^s_{z_s}(x_{s;D}) = P(X_{s;D} = x_{s;D} \mid Z_s = z_s)$ and $b^e_{z_e}(x_{e;D}) = P(X_{e;D} = x_{e;D} \mid Z_e = z_e)$. Taken together, these parameters are denoted as $\lambda$ and are summarized in Table 5.2.
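Because the HSBN is a generative model, its parameters $\lambda = \{\pi, a, b^s, b^e\}$ define how a start/end handshape pair is produced: first a start state, then an end state, then a label realized from each state. The sketch below performs this ancestral sampling with Python's standard `random` module. The function names and the toy parameter values are hypothetical and serve only to make the dependency order concrete; images are not sampled, since the observation likelihoods are defined separately.

```python
import random


def sample_categorical(probs, rng):
    """Draw one outcome from a categorical distribution given as {outcome: probability}."""
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]


def sample_one_handed_sign(pi, a, b_s, b_e, rng):
    """Ancestral sampling of (z_s, z_e, x_s, x_e) from lambda = {pi, a, b_s, b_e}."""
    z_s = sample_categorical(dict(enumerate(pi)), rng)      # Z_s ~ pi
    z_e = sample_categorical(dict(enumerate(a[z_s])), rng)  # Z_e ~ a[z_s, .]
    x_s = sample_categorical(b_s[z_s], rng)                 # X_s ~ b^s[z_s, .]
    x_e = sample_categorical(b_e[z_e], rng)                 # X_e ~ b^e[z_e, .]
    return z_s, z_e, x_s, x_e


# Hypothetical toy parameters: two start states and two end states.
pi = [0.6, 0.4]
a = [[0.9, 0.1],
     [0.2, 0.8]]
b_s = [{"flat-O": 0.8, "O": 0.2}, {"5": 0.7, "crvd-5": 0.3}]
b_e = [{"5": 0.6, "crvd-5": 0.4}, {"A": 0.5, "S": 0.5}]

rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_one_handed_sign(pi, a, b_s, b_e, rng) for _ in range(5)]
for z_s, z_e, x_s, x_e in samples:
    print(f"Z_s={z_s} Z_e={z_e}  start={x_s}  end={x_e}")
```

Several distinct hidden-state paths can produce the same label pair, and one hidden state can be realized as several labels; this is how the hidden layer absorbs handshape variation.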
The likelihoods of producing the observed start/end handshape appearances in the input video, given their corresponding handshape configuration labels, are denoted as $P(I_{s;D} = i_{s;D} \mid X_{s;D} = x_{s;D})$ and $P(I_{e;D} = i_{e;D} \mid X_{e;D} = x_{e;D})$. The expressions for these distributions are derived in a subsequent section on handshape inference.
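Reading the dependencies off the three-layer structure described above (hidden states at the top, handshape labels in the middle, handshape images at the bottom), the joint distribution for a one-handed sign and the resulting posterior over start/end labels can be sketched as follows. This is a derivation from the stated dependencies rather than a formula quoted from the model's later sections; the observation likelihoods are left abstract, and the posterior is normalized over all pairs $(x_{s;D}, x_{e;D}) \in \mathcal{X} \times \mathcal{X}$.

```latex
\begin{align}
P(i_{s;D}, i_{e;D}, x_{s;D}, x_{e;D}, z_s, z_e \mid \lambda)
  &= \pi_{z_s}\, a_{z_s, z_e}\, b^s_{z_s}(x_{s;D})\, b^e_{z_e}(x_{e;D})\,
     P(i_{s;D} \mid x_{s;D})\, P(i_{e;D} \mid x_{e;D}), \\
P(x_{s;D}, x_{e;D} \mid i_{s;D}, i_{e;D}, \lambda)
  &\propto P(i_{s;D} \mid x_{s;D})\, P(i_{e;D} \mid x_{e;D})
     \sum_{z_s} \sum_{z_e} \pi_{z_s}\, a_{z_s, z_e}\, b^s_{z_s}(x_{s;D})\, b^e_{z_e}(x_{e;D}).
\end{align}
```

The double sum acts as a learned prior over start/end label combinations, which the image likelihoods then reweight.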