5.7 Description of the proposal
5.7.5 Guidelines for evaluating the proposal
A predicate (P) is a main component of a CFG rule. The variable in an augmented rule ⟨R, P, A⟩ is constrained to occur at least once in a personal name dictionary P [3]. However, a token of a personal name can be absent from the dictionary. Hence, our modules return NIL when a token does not match any name in the dictionary.
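The NIL behaviour described above can be illustrated with a minimal Python sketch. The dictionary contents, the category labels and the function name `lookup_token` are assumptions made for illustration only; the sketch also simplifies by giving each token a single category, whereas a real name may be both a given name and a family name.

```python
from typing import Optional

# Hypothetical miniature dictionary; the real dictionary is far larger.
NAME_DICTIONARY = {
    "aaron": "given_name",
    "aaliyah": "given_name",
    "aagaard": "family_name",
}


def lookup_token(token: str) -> Optional[str]:
    """Return the dictionary category of a token, or None (NIL) if it is absent."""
    return NAME_DICTIONARY.get(token.strip().lower())
```

Here `None` plays the role of NIL: callers must be prepared for a token that matches no dictionary entry.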
We now introduce the part of the predicate (P) that is formed by our personal name dictionary. The dictionary is a collection of titles, nicknames, alternative names, given names and family names. The sources and methods we used to construct it are described in the subsections below.
Given Names and Family Names
Given names and family names are derived from two sources: the US census [9] and the YAGO knowledge base [40]. The US census website provides lists of given names and family names ("Census1990" and "Census2000"). The census data is clean, and given names are already distinguished from family names. YAGO, in contrast, provides full personal names and does not distinguish between a given name and a family name. We therefore use rules R1-R5, R7 and R14-R15 given in Table 4.5 to extract given names and family names. As a result, we obtain 19,295 given names and 182,661 family names. Table 4.6 shows an excerpt of the given names and family names in the dictionary, listed in alphabetical order.
Table 4.6 An excerpt from the given name and family name dictionary.
Dictionary     List of names
Given name     Aaliyah, Aaron, Aaryn, Aasif, Abaham, ...
Family name    Aabergh, Aaby, Aadland, Aafedt, Aagaard, ...
80 Personal Name Transformation With Context Free Grammar
Alternative Names
Alternative names are derived from the YAGO knowledge base. An alternative name is a name that combines letters and numbers, a name that consists of only one word (excluding prefix and suffix), or a name that contains more than four words (excluding prefix and suffix). We provide a link between an alternative name and its real name. We use rules R1, R8 and R14-R15 given in Table 4.5 to extract alternative names. In total, we extracted 10,122 alternative names from YAGO. Figure 4.4 shows an example of an alternative name dictionary listed in alphabetical order.
Fig. 4.4 Example of alternative names in a personal name dictionary.
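The three criteria for an alternative name can be sketched in Python as follows. This is an illustration under stated assumptions, not the extraction rules of Table 4.5: the prefix and suffix samples, the helper `core_words` and the function `is_alternative_name` are hypothetical, and "combines letters and numbers" is read here as "contains at least one letter and at least one digit".

```python
from typing import List

# Hypothetical prefix/suffix samples, lower-cased and without trailing dots.
PREFIXES = {"dr", "capt", "col", "dame"}
SUFFIXES = {"jr", "sr", "ii", "iii"}


def core_words(name: str) -> List[str]:
    """Split a name into words and drop one leading prefix and one trailing suffix."""
    words = name.split()
    if words and words[0].rstrip(".").lower() in PREFIXES:
        words = words[1:]
    if words and words[-1].rstrip(".").lower() in SUFFIXES:
        words = words[:-1]
    return words


def is_alternative_name(name: str) -> bool:
    """Apply the three criteria: letters mixed with digits, one word, or more than four words."""
    words = core_words(name)
    core = " ".join(words)
    has_letter = any(ch.isalpha() for ch in core)
    has_digit = any(ch.isdigit() for ch in core)
    return (has_letter and has_digit) or len(words) == 1 or len(words) > 4
```

For instance, `is_alternative_name("Dr. Madonna Jr.")` holds because only one word remains once the title and suffix are removed, while a regular three-word name such as "George Walker Bush" is rejected.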
Nicknames
A nickname is an informal name that is used to refer to a person. Nicknames appear frequently in web documents. For example, Bill Clinton and Bill Gates are used for the personal names William Jefferson Clinton and William Henry Gates, respectively.
Table 4.7 lists common nicknames for people in Anglo-Saxon countries, that is, countries that use English as the official language.
4.3 Personal Name Transformation Modules (PNTM) 81
Table 4.7 Example of traditional English nicknames
Male names                                   Female names
Name       Nicknames                         Name       Nicknames
Aaron      Erin, Iron, Ron, Ronnie           Amanda     Manda, Mandy
Benjamin   Ben, Bennie, Benjy, Jamie         Barbara    Bab, Babs, Barby, Bobbie
David      Dave, Davey, Day                  Dorothy    Dolly, Dot, Dortha, Dotty
Edward     Ed, Ned, Ted, Teddy, Eddie        Emily      Emmy, Millie, Emma, Em
We collected the standard nicknames from the following three websites:
1. http://www.tngenweb.org/franklin/frannick.htm [29] lists many nicknames derived from traditional English names.
2. http://www.censusdiggins.com/nicknames.htm [12] provides a list of the most common nicknames from A-Z.
3. https://github.com/carltonnorthern/nickname-and-diminutive-names-lookup [58] provides a nickname look-up system. The system was created by the Old Dominion University Web Science and Digital Libraries Research Group, which provides a CSV file that contains US given names and their associated nicknames.
Let NN be the set of nicknames, with n ∈ NN a single nickname, and let GN be the set of given names. A given name g ∈ GN may be associated with one or more nicknames n; conversely, we write GN(n) for the set of given names that a nickname n may refer to. For example, GN(Abbie) = {Abigail, Abner, Absalom}. In total, we extracted 1,454 nickname surface forms. A nickname surface form is an entry of the nickname dictionary that contains two components: a nickname and the set of given names it refers to. Figure 4.5 shows an example of a nickname dictionary listed in alphabetical order.
Fig. 4.5 Example of nicknames in a personal name dictionary.
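The mapping GN(n) from a nickname to its reference given names can be sketched in Python as an inverted index over (given name, nickname) pairs. The function name `build_nickname_dictionary` and the pair format are assumptions for illustration; the sample pairs are taken from the GN(Abbie) example and Table 4.7, not from the full source files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_nickname_dictionary(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (given name, nickname) pairs into nickname -> GN(nickname)."""
    gn: Dict[str, Set[str]] = defaultdict(set)
    for given_name, nickname in pairs:
        gn[nickname].add(given_name)
    return dict(gn)


# Sample pairs only; a real run would read them from the collected sources.
PAIRS = [
    ("Abigail", "Abbie"),
    ("Abner", "Abbie"),
    ("Absalom", "Abbie"),
    ("Aaron", "Ron"),
    ("Edward", "Ted"),
]
NICKNAMES = build_nickname_dictionary(PAIRS)
```

Each key of `NICKNAMES` together with its set of given names corresponds to one nickname surface form, so `NICKNAMES["Abbie"]` is the set {Abigail, Abner, Absalom}.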
Prefix and Suffix
The personal name titles (prefixes and suffixes) are derived from Grace Y.W. Tse [67], who studied "The grammatical behaviour of personal names in present-day English". Table 4.8 shows an example of the prefixes and suffixes in our dictionary, listed in alphabetical order.
Table 4.8 Example of prefix and suffix
Prefix    Baroness, Capt, Cdr, Chief, Col, Count, C.P.D., Dame, Det Chief Insp, Det Insp, Dr, Earl, Emperor
Suffix    A.B, B.A., B.S., II, III, Jr., Sr.
Personal Name Pattern
We use the PHP function preg_match to match the structure of a personal name against a standard personal name pattern. For example, the following PHP function "CheckPattern($name)" applies to a personal name made up of three words in the order given name, middle name and family name (e.g. George Walker Bush). Each word has at least two letters and may include symbols (".", "-" and "’") or numbers (e.g. O’Neill).
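Separately from the thesis's PHP function, the constraints just listed can be illustrated with a short Python sketch. The function name `check_pattern` and the regular expression are assumptions, not the thesis's pattern: each word is taken to start with an ASCII letter followed by at least one more letter, digit, ".", "-" or apostrophe, and non-ASCII letters are not covered.

```python
import re

# One word: a letter, then at least one more allowed character (an assumption).
WORD = r"[A-Za-z][A-Za-z0-9.'’-]+"

# Three such words separated by whitespace: given, middle and family name.
THREE_WORD_NAME = re.compile(r"\s+".join([WORD] * 3))


def check_pattern(name: str) -> bool:
    """Return True if the name consists of exactly three words of the assumed shape."""
    return THREE_WORD_NAME.fullmatch(name.strip()) is not None
```

Under these assumptions, "George Walker Bush" is accepted, while the two-word name "George Bush" is rejected.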