"By focusing on the facts to be maintained in a data base, we obtain a methodology for data analysis and design which is at once simpler and more powerful than other methodologies." (Kent, 1983 p.3)

Introduction

As the need for business users to participate in the construction of effective and useful data models has become recognised, one area of research that has developed is the exploration of the possibilities offered by natural language to provide this communication bridge (Metais et al., 1993; Sowa, 1984; Sowa, 1991; Steinberg et al., 1994; Tjoa & Berger, 1993; Vadera & Meziane, 1994; Way, 1991). However, Object-Role Modelling (ORM), particularly as used within NIAM and as supported by the CASE tool InfoModeler™,¹ already provides a natural language interface between the user and the modeller. ORM also fulfils Tsichritzis and Lochovsky's (1982) mapping requirement, in that the conceptual model it produces can be transformed by a straightforward algorithm (Nijssen & Halpin, 1989) into a normalised relational schema. NIAM is widely used in Europe and Australia and is increasingly recognised as one of the major data modelling approaches (Creasy, 1989; Kim & March, 1995; Laender & Flynn, 1992; Song & Forbes, 1991; Weber & Zhang, 1991).

Object-Role Modelling

Originating in the early 1970s, ORM views the world as made up of objects playing roles (Halpin, 1995) and "traditionally expresses all information in terms of elementary facts, constraints and derivation rules" (Halpin, 1993b p.1). There have been several methodologies developed for the creation of an ORM, of which NIAM (Natural Language Information Analysis Method) is the best known. "The fundamental approach of building a design by starting with specific examples and thereafter following a well-defined procedure" (Nijssen & Halpin, 1989 p.31) was initially developed by Nijssen during his work at Control Data in The Netherlands. Nijssen and Halpin (1989 p.31) attribute the initial proposal "to base conceptual schema concepts on elementary natural language sentences" to Falkenberg (1976), whose approach was in turn influenced by the work of the linguist Fillmore (1968). NIAM has subsequently been independently developed by both Nijssen (1994) and Halpin (1995).

1 InfoModeler™ is the registered trademark of Asymetrix Corporation. All references in this study are to

Hitchman (1995) reports that Nijssen has demonstrated that ORM and the E-R Model are capable of expressing a similar level of meaning, a view supported by Laender and Flynn (1994). Bronts et al. (1995) too conclude that "there is little difference in the way of modelling of E-R and NIAM", although in their terms the "way of working" is significantly different² (p.232). Apart from anecdotal evidence (e.g. Halpin & Orlowska, 1992), however, there is nothing to show whether the fact-oriented approach is more or less effective. Recent comparative studies of E-R and NIAM are inconclusive (Laender & Flynn, 1994; Kim & March, 1995; Shoval & Even-Chaime, 1987), although Weber and Zhang (1991) conclude that the constructs provided by NIAM are more powerful than those provided by the E-R Model. Halpin (1995) suggests that ORM and E-R modelling may have a complementary use, although his suggestion is limited to using E-R diagrams as a convenient means of summarising complex ORM models. However, even this limited suggestion is unusual, with most advocates of ORM maintaining the method's superiority over the E-R Model with almost religious fervour. It is not the intention of this study to enter this debate.

NIAM-CSDP

The complete NIAM method (termed the NIAM-ISDM) is made up of three stages: conceptual schema design (the NIAM-CSDP), conceptual schema transformations and relational implementation. A summary of the seven steps that make up the CSDP is shown in Figure 6 (Halpin, 1995 p.43). In essence, the steps provide a clear and well-defined procedure for building the conceptual schema, by capturing information requirements as natural language sentences, termed 'facts', extracting sentence patterns or 'fact types' from these sentences and using 'real' examples to validate the facts and assist in identifying the required constraints. On completion of the CSDP, the resulting schema can be adjusted in the conceptual schema transformation stage, although any generated alternatives must be equivalent in meaning to the original. This schema is then transformed into a normalised, relational schema by a straightforward published algorithm (Nijssen & Halpin, 1989).

1. Transform familiar information examples into elementary facts, and apply quality checks.

2. Draw the fact types and apply a population check.

3. Check for entity types that should be combined and note any arithmetic derivations.

4. Add uniqueness constraints for each fact type and check arity of fact types.

5. Add mandatory role constraints and check for logical derivations.

6. Add value, set comparison and subtyping constraints.

7. Add other constraints and perform final checks.

Figure 6. The 7 steps of the NIAM-CSDP (Halpin, 1995)

While some aspects of the process can be tedious and time-consuming when undertaken manually, the use of a CASE tool such as InfoModeler™ not only speeds up but also significantly simplifies the later stages of the process. With InfoModeler™, once the 'facts' have been added to the model, together with their constraints and example values, the tool is able to determine and create the diagrammatic representation of the sentence, including the basic constraints of uniqueness, cardinality and optionality. It is also able to validate the model for syntactic correctness, transform the ORM model into a database schema in "optimal normal form³" (Nijssen & Halpin, 1989, p.254) and generate appropriate SQL commands for a variety of DBMSs. It is, therefore, theoretically possible to create a physical schema from an entered set of natural language sentences, without further intervention, although in most cases some refinement is desirable.
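To make the idea of such a transformation concrete, the following is a minimal sketch only. It is not the published Nijssen and Halpin algorithm and says nothing about how InfoModeler™ works internally; it assumes a much-simplified grouping rule in which a binary fact type with a uniqueness constraint on one role becomes a column of the table keyed on the object type playing that role, while a fact type whose uniqueness constraint spans both roles receives a table of its own. The names `BinaryFactType` and `group_into_tables`, and the three example fact types, are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryFactType:
    """A binary fact type of the form <subject> <predicate> <obj> (hypothetical)."""
    subject: str
    predicate: str
    obj: str
    # Role(s) covered by the uniqueness constraint: "subject", "obj" or "both".
    unique_on: str


def group_into_tables(fact_types):
    """Group fact types into tables using the simplified rule assumed above."""
    tables = {}
    for ft in fact_types:
        if ft.unique_on == "both":
            # Many-to-many fact type: its own table, keyed on both columns.
            name = f"{ft.subject}_{ft.obj}"
            tables[name] = {"key": [ft.subject, ft.obj], "columns": []}
            continue
        if ft.unique_on == "subject":
            key, other = ft.subject, ft.obj
        elif ft.unique_on == "obj":
            key, other = ft.obj, ft.subject
        else:
            raise ValueError(f"unknown uniqueness constraint: {ft.unique_on!r}")
        # Functional fact type: a column in the table keyed on the unique role.
        table = tables.setdefault(key, {"key": [key], "columns": []})
        table["columns"].append(other)
    return tables


# Illustrative (assumed) fact types and constraints.
example = [
    BinaryFactType("Employee", "works for", "Department", "subject"),
    BinaryFactType("Employee", "has", "Name", "subject"),
    BinaryFactType("Employee", "speaks", "Language", "both"),
]
tables = group_into_tables(example)
```

Under these assumptions the two functional fact types end up as columns of a single Employee table, while the many-to-many fact type is kept separate. The sketch ignores mandatory roles, subtypes, unary and higher-arity fact types, all of which a real mapping procedure has to handle.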

3 Optimal Normal Form or ONF is described as basically equivalent to 5th Normal Form, although the "number of 5NF tables in the overall schema has been minimized" for efficiency (Nijssen & Halpin, 1989 p.254).

Details of all the phases of the NIAM-ISDM can be found in Nijssen and Halpin (1989), while Halpin (1995) has produced an extended version of NIAM, termed FORM (Formal Object-Role Modelling), which provides the underlying paradigm for the InfoModeler™ CASE tool. This study is mainly interested in the first step of the CSDP process, i.e. the collection of information examples and their transformation into elementary facts and fact types. This step, identified by Halpin and Orlowska (1992) as not only the most important but also "the foundation of NIAM's design procedure" (p.97), is procedurally equivalent to the identification of entities and relationships in the E-R approach.

Natural Language

NIAM theory begins from the axiom that all information communicated can be expressed as a set of elementary declarative natural language sentences from which general patterns or 'fact types', as shown in Figure 7, can be extracted (Nijssen, 1994). NIAM has its foundation in "linguistic theory and applies set theoretical concepts to induce formal information grammars from (these) sets of natural language sentences" (Schouten, 1993 p.1). Two further axioms state that all communication with the user is held in the user's language exclusively and that all communication with the user is illustrated with practical examples. This is justified by the argument that users are most comfortable describing their enterprise in their own natural language and that with a representative set of example sentences they are able to assess the validity of the set, allowing even complex constraints to be determined (Nijssen, 1994). As Biller and Neuhold (1978) remark, "it is very important to rely on the users understanding of natural languages, since only in this fashion can the connection between a data base and the reality about which statements are to be represented, be established" (p.11).

F1 Department (number) employs staff member (id)
Department with number "57" employs staff member with id "1122"
Department with number "57" employs staff member with id "2233"
Department with number "59" employs staff member with id "3344"

F2 Department (number) has name (value)
Department with number "57" has the name "Information Systems"
Department with number "59" has the name "Computer Science"
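The step from example sentences to sentence patterns can be sketched in a few lines of code. The following is an illustrative sketch rather than part of NIAM itself: it assumes that every label is written between straight double quotes, as in the examples above, and it simply strips the labels to leave the pattern. The names `QUOTED_LABEL`, `split_fact` and `group_by_fact_type` are hypothetical.

```python
import re
from collections import defaultdict

# A label is assumed to be any text between straight double quotes.
QUOTED_LABEL = re.compile(r'\s*"([^"]*)"')


def split_fact(sentence):
    """Separate a sentence into its pattern and the labels it contains."""
    labels = tuple(QUOTED_LABEL.findall(sentence))
    pattern = QUOTED_LABEL.sub("", sentence).strip()
    return pattern, labels


def group_by_fact_type(sentences):
    """Collect the example labels (the population) under each pattern."""
    populations = defaultdict(list)
    for sentence in sentences:
        pattern, labels = split_fact(sentence)
        populations[pattern].append(labels)
    return dict(populations)


sentences = [
    'Department with number "57" employs staff member with id "1122"',
    'Department with number "57" employs staff member with id "2233"',
    'Department with number "59" employs staff member with id "3344"',
    'Department with number "57" has the name "Information Systems"',
    'Department with number "59" has the name "Computer Science"',
]
populations = group_by_fact_type(sentences)
```

The five sentences collapse into two patterns, each with its own population of label tuples. Real verbalisations vary far more in wording than these examples, so a pattern match of this kind is no substitute for the modeller's judgement.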

NIAM also provides a graphical notation for representing the objects, roles and constraints identified by the elementary sentences and a simple example is provided at Figure 8. However, with the exception of some of the more complex constraints, which are more easily shown on a diagram, the natural language 'elementary facts' and their diagrammatic representation are equivalent in the sense that the graphical notation can be automatically transformed into the 'elementary fact types' or vice versa. This equivalence, sometimes termed 'semantic equivalence', is perhaps better described by the expression 'data equivalence' utilised by Biller and Neuhold (1978), who provide an informal definition of it in asserting that "two data bases are equivalent if they represent equivalent facts about a certain slice of reality" (p.12). However, they are clear that in using this term they are assuming "that it is known, whether two sets of natural language sentences are equivalent. Again we presuppose that the natural language is commonly understood" (ibid. p.12).

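[Figure 8 diagram not reproduced. Recoverable content: the predicate pair 'works for' / 'employs' with sample population (1234, 57), (1122, 57), (1112, 59), and the predicate 'has' with sample population (57, Info Systems), (59, Computer Science).]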

Figure 8. A simple NIAM diagram

InfoModeler™ capitalises on this perceived 'data equivalence' by providing a 'Verbalizer' report which displays the graphical representation of each 'elementary fact type' and its natural language equivalent together with the natural language examples which have been entered. This report, illustrated in Figure 10 on page 96, provides a version of the conceptual schema considered suitable for user verification.

The 'Elementary Fact' Concept

The most fundamental concept in NIAM is that of the 'elementary fact', derived from familiar, concrete examples within the UoD. An 'elementary fact' is defined as an assertion that an object plays a role or that one or more objects participate in a relationship (Nijssen & Halpin, 1989). In other words, it is an assertion about the UoD. The choice of the term 'fact' is not incidental but indicates that the system is to treat the assertion as being true of the UoD whether or not this is actually the case in the 'real' world (Halpin, 1993a). However, as Halpin (1993b) concedes, it is difficult to "define the notion (of an elementary 'fact') precisely" (p.2), although the definition given above is a useful working definition. Halpin also concedes that "expressing information as elementary facts is not always easy" (ibid. p.3) but nevertheless feels that the benefits more than justify wrestling with any difficulties. He expresses these benefits as follows:

• "By dealing with information in simple units we stand a better chance of getting a correct picture of the application being modeled;

• Constraints are easier to express and check (e.g. all functional dependencies should appear as uniqueness constraints and because facts types are shorter the number of possible constraint patterns in each one is reduced);

• The conceptual schema is easier to modify, since fact types can be added or deleted one at a time, rather than modifying compound fact types⁴;

• The same conceptual schema can be used to map to different data models (if we group fact types together into compound fact types on the conceptual schema, different groupings may actually be required in some target data models)" (ibid. p.3).

Halpin (1995) makes no mention here of the benefits of interacting with the users in their own natural language, benefits which are seen as self-evident by the NIAM community.

Some forms of ORM insist that all 'elementary facts' are binary; NIAM, however, allows elementary facts to be of any 'arity', i.e. unary, binary, ternary or higher, although binary facts are certainly the most common. Halpin (1993b) has demonstrated that all NIAM elementary facts of whatever arity can be expressed in binary form, but usually at the expense of natural expressiveness. From the statement 'Hedgehogs hibernate', for example, NIAM would allow expression of the unary fact:

(1) 'The animal with type 'hedgehog' hibernates'

which, as a binary fact, would need to be expressed in a form such as:

(2) 'The animal with type 'hedgehog' has HibernationStatus of 'H''.

Both these sentences qualify within NIAM as legitimate 'elementary facts', but as a major concern of NIAM is to "bridge the semantic gap between the informal user world and the formal modelling world" (Yunker, 1993 p.15), Halpin (1995) prefers to retain the first, more natural, form. Indeed, InfoModeler™, utilising the optimal normal form transformation algorithm, will map both sentences into the same relational structure (Appendix 1).

Determining whether or not a 'fact' is elementary can be problematic. In general terms, a 'fact' is elementary if it cannot be broken into two or more 'facts' without losing information. For example, Hedgehogs live in England, rendered as:

(3) The animal of type 'hedgehog' lives in the country with name 'England',

is a legitimate 'binary fact'. It cannot be expressed as two simpler sentences without losing the information that hedgehogs live in England. However, the sentence Hedgehogs live in England and hibernate, rendered as:

(4) The animal of type 'hedgehog' lives in the country with name 'England' and hibernates,

can clearly be broken down into the two 'elementary facts' (1) and (3)⁵. However, the distinction is not always so obvious. It is possible to find some linguistic heuristics to assist the modeller in judging when a 'fact' is elementary. The use of the conjunction 'and', for example, can often provide a clue that two 'elementary facts' are within the one sentence. However, this guideline is by no means foolproof, as sentences (5) and (6) illustrate. Sentence (5) has no 'and' but is clearly intended to convey the same semantics as sentence (4). Sentence (6), on the other hand, is a legitimate quaternary 'elementary fact', which despite the use of the conjunction cannot be further decomposed without information loss.

(5) The hibernating animal of type 'hedgehog' lives in the country with the name 'England'.

(6) The student with the student identifier '95000012' enrolled in the Paper with the code '57.366' and obtained a Grade of 'B' in the Year '1997'.

In these circumstances, modellers are required to determine the 'elementarity' of a 'fact' by reference to the known constraints provided by the concrete examples (Halpin, 1993b), although the final arbiter in unclear cases must always be the domain expert (Collingnon & van der Weide, 1994).
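The test "cannot be broken into two or more facts without losing information" can be illustrated against a sample population. The sketch below is an assumption-laden illustration, not a procedure given by Halpin: it projects a population onto two candidate smaller facts, joins the projections on their shared columns, and checks whether exactly the original rows come back. The function names and all sample rows other than the values quoted in sentences (3), (4) and (6) are hypothetical.

```python
def project(rows, columns):
    """Distinct values of the given columns, as a set of tuples."""
    return {tuple(row[c] for c in columns) for row in rows}


def splits_without_loss(rows, left, right):
    """True if joining the two projections rebuilds exactly the original rows."""
    shared = [c for c in left if c in right]
    rejoined = set()
    for l_vals in project(rows, left):
        l_rec = dict(zip(left, l_vals))
        for r_vals in project(rows, right):
            r_rec = dict(zip(right, r_vals))
            # Natural join: keep pairs that agree on every shared column.
            if all(l_rec[c] == r_rec[c] for c in shared):
                rejoined.add(frozenset({**l_rec, **r_rec}.items()))
    original = {frozenset(row.items()) for row in rows}
    return rejoined == original


# Sentence (4): hedgehogs live in England and hibernate (France is invented).
hedgehogs = [
    {"animal": "hedgehog", "country": "England", "hibernates": True},
    {"animal": "hedgehog", "country": "France", "hibernates": True},
]

# Sentence (6): the second and third rows are invented for the illustration.
enrolments = [
    {"student": "95000012", "paper": "57.366", "year": "1997", "grade": "B"},
    {"student": "95000012", "paper": "57.366", "year": "1996", "grade": "D"},
    {"student": "95000012", "paper": "57.101", "year": "1997", "grade": "A"},
]

hedgehog_split_ok = splits_without_loss(
    hedgehogs, ["animal", "country"], ["animal", "hibernates"])
enrolment_split_ok = splits_without_loss(
    enrolments, ["student", "paper", "year"], ["student", "paper", "grade"])
```

With these rows the hedgehog population splits cleanly, whereas splitting the enrolment population into year and grade facts produces spurious combinations, so that particular split loses information. A sample population can only refute a proposed split; it cannot prove that a split is always safe, which is why the domain expert remains the final arbiter.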

5 Of course this does not follow if hedgehogs that live in England hibernate but those resident elsewhere,

The Construction of Elementary Facts

There are three clear stages in the construction of 'elementary facts' (van der Lek et al., 1992). The first is the collection of concrete examples from the UoD, which serve to illustrate the relevant information. Within NIAM, it is recognised as being the users' responsibility to provide these examples, which are usually taken from both the input and output documents of the system and from interviews with the users themselves. It is critically important to the entire CSDP that the set of examples is sufficiently rich to describe all possible facts about the UoD (Calway & Sykes, 1995). The assumption underlying this, that the domain experts can and should provide a complete set of significant examples (Collingnon & van der Weide, 1994), has been criticised by some as being very limiting (Darke & Shanks, 1995c). Certainly the only guidelines that are given to meet a situation where suitable examples are not available, e.g. for a new system, are rather unsatisfactory. In this situation the analyst is advised to "begin by getting the user to write down some examples, and then work from these" (Nijssen & Halpin, 1989 p.35). As Darke and Shanks (1995c) also point out, there are other aspects of requirements elicitation and definition that are not addressed adequately in this stage. These are the social context in which the activity is taking place, and the resolution of potential conflict that can arise over either the problem definition or alternative viewpoints of the information requirements. These criticisms will be discussed further in the next chapter.

The second stage of the process is the verbalisation, or expression in natural language sentences, of the examples. This verbalisation comprises a set of sentences describing all the objects in the UoD and the roles that they play. For example, sentences derived from an example listing of employee details could include verbalisations such as:

A. Adams has the employee number '715' and works in the Sales Department. His office is in room 2.23 and his phone number is 4206 ... C. Smith has the employee number '716' and works in the Finance Department. His office is room 3.21 and his phone number is '4242' ...

This verbalisation is merely an intermediate step to provide a natural language basis from which the 'elementary facts', such as

The Employee with the number '715' has the name 'A Adams'

The Employee with the number '715' works in the 'Sales Department'

can be derived. In practice, most experienced modellers will often move directly from the examples to 'qualified facts' from which the fact types can be derived, in much the same way as an experienced relational modeller will often instinctively create entities in third normal form. 'Qualified elementary facts' have a formal structure depending on their 'arity'. Every 'qualified fact' must have a minimum of one object, reference mode, label and predicate, as this defines a 'unary fact', i.e.

<object>, <reference mode>, <label>, <predicate>

For 'binary facts' the first three elements are repeated after the predicate, thus:

<object>, <reference mode>, <label>, <predicate>, <object>, <reference mode>, <label>

The objects are the things of interest, the reference mode is the property of the object which allows one to identify which instance of the object is being referred to, the label is the actual value and the predicate is the role that the object(s) are participating in. Some of the 'qualified facts' from the previous verbalisation could be expressed thus:

The EMPLOYEE with employee # '715' has the NAME (with the value) of 'Adams A'

The EMPLOYEE with employee # '716' has the NAME (with the value) of 'Smith C'

The EMPLOYEE with employee # '715' works for the DEPARTMENT with the name 'Sales'

The EMPLOYEE with employee # '716' works for the DEPARTMENT with the name 'Marketing'

Here the objects are shown in upper case letters, the reference modes in bold, the labels are italicised and the predicates are underlined⁶. These facts are often written in abbreviated form with the superfluous words, such as 'with', 'the' and 'of', omitted and the reference mode shown in parentheses, e.g.

EMPLOYEE (employee #) '715' has NAME (value) 'Adams A'

EMPLOYEE (employee #) '715' works for DEPARTMENT (name) 'Sales'
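The formal structure of a 'qualified fact' lends itself to a small data structure. The following is a minimal sketch under stated assumptions: the class names `Term` and `QualifiedFact` and the method `abbreviated` are hypothetical, only unary and binary facts are handled, and the rendering simply reproduces the abbreviated notation shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One <object>, <reference mode>, <label> triple (hypothetical class)."""
    object_type: str     # e.g. "EMPLOYEE"
    reference_mode: str  # e.g. "employee #"
    label: str           # e.g. "715"

    def abbreviated(self):
        return f"{self.object_type} ({self.reference_mode}) '{self.label}'"


@dataclass(frozen=True)
class QualifiedFact:
    """A unary or binary qualified fact (higher arities are not covered)."""
    predicate: str
    terms: tuple  # one Term for a unary fact, two for a binary fact

    def abbreviated(self):
        if len(self.terms) not in (1, 2):
            raise ValueError("this sketch handles unary and binary facts only")
        first = self.terms[0].abbreviated()
        if len(self.terms) == 1:
            # Unary fact: <object>, <reference mode>, <label>, <predicate>
            return f"{first} {self.predicate}"
        # Binary fact: the first three elements repeat after the predicate.
        return f"{first} {self.predicate} {self.terms[1].abbreviated()}"


works_for = QualifiedFact(
    "works for",
    (Term("EMPLOYEE", "employee #", "715"), Term("DEPARTMENT", "name", "Sales")),
)
has_name = QualifiedFact(
    "has",
    (Term("EMPLOYEE", "employee #", "715"), Term("NAME", "value", "Adams A")),
)
```

Calling `works_for.abbreviated()` and `has_name.abbreviated()` yields the two abbreviated sentences given above. Keeping object, reference mode and label as separate fields is a design choice made here so that the fact type can be recovered by dropping the labels; it is not prescribed by the source.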

NIAM insists that each 'qualified fact' must include at least one object, which must participate in at least one role⁷. Additionally, the reference mode and label of each entity type object⁸ must also be expressed. Thus the hibernating hedgehog of Sentence (1) provides an example of the minimum expression allowable in NIAM, in the form,
