"'How old did you say you were?'

Alice made a short calculation, and said 'Seven years and six months.'

'Wrong!' Humpty Dumpty exclaimed triumphantly. 'You never said a word like it!'

'I thought you meant, "How old are you?"' Alice explained.

'If I'd meant that, I'd have said it,' said Humpty Dumpty.

Alice didn't want to begin another argument so she said nothing." (Lewis Carroll, 1871)

Introduction

The INTECoM approach requires a means of translating the datalogical design model into an infological form suitable for verification by users. The technique described in this chapter, termed NaLER (Natural Language for E-R/R), has been developed primarily to meet this need. However, the ability to understand the information content of E-R/R models, i.e. to 'read' them accurately, has a much wider application. It is a fundamental skill required by any person involved with E-R/R models in almost any capacity. Not only the modellers themselves and the users whose requirements have been sought, but also other end users, such as domain experts, auditors, systems analysts, database designers and administrators, need to 'read' a model. It is also an important skill for those undertaking teaching and research in data modelling. As Kent (1983 p.51) observes, "keeping a record of how the data elements express facts, in terms of the entities and relationships of the business, would constitute excellent documentation of what data is being maintained and what it means." Despite this growing need, from a widening range of people, little attention has been given to the skill of reading a model.

Data models have been likened to maps (e.g. Kent, 1978) and the ability to read a data model can be seen to be similar to reading a geographically based map. A map may be consulted by a variety of people for any one of a number of reasons. Each individual user needs to have either a working knowledge of the map symbols or access to a suitable annotated legend. With this knowledge, the user is able to extract useful information. It seems reasonable to suggest that a person will employ a familiar tool, i.e. natural language, in order to mentally manipulate this new information and will certainly use natural language in order to share, compare or review it with others.

While there is a substantial body of research on map interpretation (MacEachren, 1995), there appears to be little research that investigates how the interpretation of a data model is undertaken. However, it seems likely that a person attempting to make sense of such a model will behave in a similar manner to a map user and transform at least some of the semantics into natural language in order to access the new information in the model. Certainly several texts (Simsion, 1994; Veryard, 1984) suggest that analysts may occasionally need to phrase natural language descriptions of all or part of a model in order to facilitate validation of the model by non-technical users. However, there is no indication of how any such descriptions should be derived and certainly very little encouragement to do so. Indeed, Veryard (1984 p.19) states, "most users can be shown the model itself ... (as) ... there is very little notation to learn and the users need not be bothered with conceptual niceties." There seems to be an unstated consensus that the interpretation of a data model is straightforward and intuitive once the syntax of both the diagram and the accompanying data dictionary is understood. It is certainly assumed that learning how to build a data model leads directly to an understanding of how to determine what information one contains. This has led some authors to suggest that users may need to be trained more extensively in conceptual data modelling concepts (Hitchman, 1995; Metais et al., 1993). This is analogous to suggesting that only cartographers, or those non-cartographers with extensive cartographic training, can 'read' maps. The low level of data model usage, reported by both Hitchman (1995) and MacDonell (1994), would suggest that there is a real problem of comprehension and that data models need to be more accessible to a wider range of people if their well documented benefits are to be fully realised.

The need for semantic comprehension

Aside from the general desirability of understanding a conceptual data model to assist in tasks such as decision-making and strategic planning, there are a number of specific activities for which a clear and comprehensive understanding of its information content is essential. Firstly, the 'domain experts' among the users need to verify that the model represents an accurate and useful perspective of an organisation's "slice of reality" (Biller & Neuhold, 1978 p.11). Secondly, auditors may use a conceptual data model to confirm that the database that is derived from it processes data accurately and completely (Amer, 1993). The need is not confined to the user community, however; technical users of the data model also need this accurate understanding in order, for example, to specify functions against the data, design appropriate physical data structures or assess the impact of a new data model on existing data structures. Thus systems analysts, database designers, database administrators and data administrators make up a third group for whom a clear semantic comprehension is essential.

Data modellers themselves need to be able to recognise the different semantic implications inherent in alternative conceptual data structures and make informed choices between them (Simsion, 1994). The need for some mechanism of understanding by modellers has also been highlighted by the work of Batra and Antony (1994) in their investigation of common errors made by novice (student) modellers. Their findings suggested that their subjects tended to propose a solution that was based on their initial perception of the problem and that once an initial solution was formulated the basic structure of the model or representation was rarely changed. They suggested that a significant reason for this lack of adjustment to the initial solution was that "since there is no mechanism to inform the designer that the solution is incorrect, there is little motivation to modify the initial solution" (p.64). Batra and Sein (1994), working on the basis of these findings and recognising that "logical database design does not lend itself to self-monitoring by the designer" (p.653), have endeavoured to provide feedback to novice modellers via a design aid called SERFER. SERFER provides feedback by creating a series of structured natural language statements to the data designer based on their input description of a relation. However, the natural language sentences, illustrated in Figure 15, relate not to the information content of the model but to the syntactical rules that are being invoked.

Relationship(s) found:

There is a relationship between: Employee, Skill and Project

The degree of the relationship is: 3 (Ternary)

The connectivity of the relationship is: One-Many-Many

The connectivity is ... for: Project

The connectivity is ... for: Employee

The connectivity is ...many for: Skill

An instance each of Employee and Skill is associated with ... instances of Project

An instance each of Employee and Project is associated with one/many instances of Skill

An instance each of Skill and Project is associated with one/many instances of Employee

Figure 15. Example of Feedback from SERFER (Batra & Sein, 1994)

While the notion of timely feedback is valid and seems to offer some fruitful avenues for further research, the nature of the feedback proposed by Batra and Sein (1994) seems rather inappropriate. For novice modellers struggling to come to terms with the differences implicit in various forms of representation, natural language sentences of this form may be confusing and, therefore, less than helpful. While making clear statements about the syntactical elements, e.g. 'The connectivity is one for Skill', the SERFER feedback does nothing to help the modeller understand the semantic implications of this syntax, i.e. that while a Project may use many Skills, and an Employee may have many Skills, only one Skill may be used on a Project by one Employee. A natural language representation of an E-R/R model, which makes those implications explicit by focusing on the semantics rather than the syntax, would offer a more profitable tool to both modellers and users. The Natural Language for E-R/R models (NaLER) technique, described in this chapter, provides a means for creating such a representation.
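The semantic reading given above (one Skill per Employee and Project pair) can be expressed as a simple check over a population of the ternary relationship. The following Python fragment is only an illustrative sketch: the function name, the triple layout and the sample data are hypothetical and are not part of SERFER or NaLER.

```python
from collections import defaultdict

def violations(assignments):
    """Return the (employee, project) pairs associated with more than one skill.

    `assignments` is an iterable of (employee, skill, project) triples,
    a hypothetical population of the ternary relationship.
    """
    skills_by_pair = defaultdict(set)
    for employee, skill, project in assignments:
        skills_by_pair[(employee, project)].add(skill)
    return {pair: skills for pair, skills in skills_by_pair.items() if len(skills) > 1}

# Hypothetical sample data.
valid = [
    ("Ann", "Welding", "Bridge"),
    ("Ann", "Design", "Tunnel"),   # Ann has many skills, used on different projects
    ("Bob", "Design", "Bridge"),   # the Bridge project uses many skills
]
invalid = valid + [("Ann", "Design", "Bridge")]  # a second skill for (Ann, Bridge)

print(violations(valid))
print(violations(invalid))
```

The first population satisfies the constraint although both Ann and the Bridge project are linked to several skills; only the added triple, which gives one Employee two Skills on the same Project, is reported.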

Uses of the NaLER method

The creation of the NaLER technique presented here has two objectives. It is intended as an aid to the modeller within the design process and, more importantly, as a means of presenting the information content of the design model to users.

The first objective, then, is to provide a self-monitoring mechanism whereby modellers can create their own feedback in terms of the information content of their created structures. Based on the premise that "modelling is essentially making statements in some language" (Lindland et al., 1994 p.43) and that "all representation is an act of knowledge construction" (MacEachren, 1995 p.vii), a useful form of feedback would be to present a natural language translation of the information content of the data structure to the modeller. In this way a designer, in the act of creating a data model, can judge whether a statement created in the modelling language says what it was intended to say. It is interesting to note that while most texts provide simple translations of simple modelling examples and also rich scenarios to be turned into data models by the students, none provide a rich description of a complex model by way of comparison. However, it is the ability to compare the information within the modelling representation with the description of the UoD that is most useful in assisting modellers to understand the implications of the structures that they have created.

The second objective is to provide a means of constructing a description of the modelled world that can be compared with the 'real' world. The previous chapter suggested the use of NIAM sentences for semantic verification of user requirements. NaLER provides a means to create similar sentences for the verification of the semantics of the E-R/R design model. In this way, it can be confirmed that user requirements are still being supported.

NaLER also provides a helpful tool for teachers or researchers concerned with evaluating the semantic quality of data models. Chapter 8 showed that almost all researchers involved with data modelling judge the quality of a subject's data model by comparing it with a previously worked 'correct' solution. However, this process of comparison has several difficulties, one of which is determining semantic equivalence between two different structural representations. In order to determine whether two facts are equivalent there must be a commonly agreed method of extracting the facts from the data schema as well as an agreed means of comparing them. While their paper describes a fairly complex formal method for comparing the equivalence of different representations, Biller and Neuhold (1978) themselves conclude that comparing the data representation to reality "must rely on a common understanding of natural language" (p.29). As NaLER provides a means of extracting a complete set of natural language sentences from an E-R/R model, any number of sets can be compared either with each other or with an initial set extracted directly from the UoD. Such comparisons can highlight incorrect or new semantics and provide a genuine basis for assessing semantic equivalence between differing data structures.
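The comparison of sentence sets described above can be sketched with ordinary set operations. This Python fragment is an illustration only and assumes, unrealistically, that equivalent facts are always worded identically; in practice, judging equivalence still relies on a common understanding of natural language. The function name and the second, altered Zoo-Animal sentence are hypothetical.

```python
def compare_sentence_sets(reference, candidate):
    """Compare two sets of natural language sentences by exact wording.

    Returns (missing, new): sentences found only in the reference set,
    and sentences found only in the candidate set.
    """
    reference, candidate = set(reference), set(candidate)
    return reference - candidate, candidate - reference

# Sentences taken from the UoD (reference) and from a model under review.
uod_sentences = {
    "Each Zoo is uniquely identified by zoo-no.",
    "Each Zoo-Animal is uniquely identified by zoo-no, animal-no.",
}
model_sentences = {
    "Each Zoo is uniquely identified by zoo-no.",
    "Each Zoo-Animal is uniquely identified by animal-no.",  # hypothetical variant
}

missing, new = compare_sentence_sets(uod_sentences, model_sentences)
print("Missing from the model:", missing)
print("New in the model:", new)
```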

The technique could also assist with another problem faced by evaluators of data models: their reportedly low level of understandability. Shanks (1997) observes that data models need to be better explained and he suggests exploring the use of various natural language based techniques such as narrative scenario descriptions (ascribed to Carroll, 1995) and the argumentation-based design rationale (Buckingham et al., 1994). The NaLER technique described here could be a useful addition to these.

1. Document the model conventions.

2. Check assumptions.

3. Simple entities.

3.1 Construct primary key sentences.

3.2 Construct attribute sentences.

3.3 Construct relationship sentences.

4. Construct super/sub-type sentences.

5. Complex entities.

5.1 Construct relationship sentences.

5.2 Construct primary key sentences.

5.3 Construct attribute sentences.

6. Populate with examples.

7. Produce NaLER description.

Figure 16. NaLER -An overview

Thus, there are a number of situations that would benefit from a natural language interpretation of an E-R/R model and the creation of such an interpretation is a natural and intuitive response to the need to make sense of such a model. The proposed technique, an overview of which is illustrated in Figure 16, is designed to capture this response in an organised way and to encourage data modellers to create a semi-formal natural language description to elucidate the information that their models contain.

The NaLER Method

NaLER is designed for use with a relational data model, represented diagrammatically by an E-R/R diagram and supported by a data dictionary, as produced by many contemporary CASE tools. The more extensive the available documentation, the more effective the translation will be.

It is envisaged that the data designer would use NaLER either to check the semantic content of certain data structures under development or to present documentation to users for verification. The NaLER user will thus have a good understanding of both the relational paradigm and the syntax of E-R/R models. When used by practitioners it is therefore reasonable to expect the following prerequisites to be met.

P1 - Entities are named.

P2 - Entities will have a unique identifier or primary key.

P3 - Lines between entities denote Primary Key/Foreign Key relationships.

P4 - Relationships are named in at least one direction.

P5 - Relationship cardinality is indicated on the diagram.

1.1 Assumptions

Although there are a number of additional desirable elements, it is recognised that not all CASE tools provide the same level of documentation support and also, that when used in support of the modelling process, not all the information may have been recorded. Therefore, while full documentation is recommended, if the model is incomplete in some way there are a number of assumptions that can be made. These assumptions are that,

A1 - Relationships are optional unless clearly annotated as mandatory¹.

A2 - A 1-1 mandatory relationship is implicit in the position of an attribute in an entity².

A3 - If an attribute is described as nullable, the 1-1 relationship is optional.

A4 - Two attributes with the same name placed in different entities relate to the same 'real world' concept³.

A5 - If any such attribute is a primary key in one entity then it is a foreign key in any others in which it appears.

A6 - An entity whose primary key consists of two or more foreign keys is specifying a many to many relationship between the entities of which those attributes are primary keys.

¹ This assumption is based on the findings of Siau et al. (1995) who suggest that experienced modellers will almost always prefer to show relationships as optional unless there is very clear evidence to the contrary.

² This assumption only holds if the model is in 1NF and assumes that the intention of the designer would be, minimally, to create an E-R/R structure in first normal form.

³ While it is not a requirement that attribute names are unique within a model, it is generally good practice to make them so. Thus if the same attribute name appears in more than one entity it can be assumed to relate to the same 'real world' element. If it becomes clear that this is not the case, the attributes should be renamed to remove ambiguity.
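Assumptions A5 and A6 can be read as mechanical rules over a data dictionary. The Python sketch below is one possible reading, not a definitive implementation: the dictionary layout and function names are hypothetical, the Animal entity and the non-key attributes are invented for illustration, and A5 is simplified to single-attribute primary keys.

```python
def foreign_keys(entity, model):
    """A5 (simplified): an attribute that is the single-attribute primary key
    of another entity is treated as a foreign key referencing that entity."""
    refs = {}
    for attr in model[entity]["attributes"]:
        for other, spec in model.items():
            if other != entity and spec["primary_key"] == [attr]:
                refs[attr] = other
    return refs

def many_to_many(entity, model):
    """A6: if the primary key consists of two or more foreign keys, return the
    entities linked by the implied many-to-many relationship, otherwise None."""
    refs = foreign_keys(entity, model)
    key = model[entity]["primary_key"]
    if len(key) >= 2 and all(attr in refs for attr in key):
        return [refs[attr] for attr in key]
    return None

# Hypothetical data dictionary extract.
model = {
    "Zoo": {"primary_key": ["zoo-no"], "attributes": ["zoo-no", "zoo-name"]},
    "Animal": {"primary_key": ["animal-no"], "attributes": ["animal-no", "species"]},
    "Zoo-Animal": {
        "primary_key": ["zoo-no", "animal-no"],
        "attributes": ["zoo-no", "animal-no", "date-acquired"],
    },
}

print(foreign_keys("Zoo-Animal", model))
print(many_to_many("Zoo-Animal", model))
print(many_to_many("Zoo", model))
```

Under these assumptions Zoo-Animal is recognised as specifying a many to many relationship between Zoo and Animal, while Zoo itself is not.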

These are the only assumptions that should be required where the data designer who is using NaLER has constructed the model. However, when sentences are being extracted from models created by other designers, particularly students or research subjects, it is possible that a number of syntactic errors can impede the process. In this case some additional assumptions are proposed which, while requiring more judicious use, can assist in maximising the amount of useful information that can be extracted. These are,

A7 - For unnamed 1-m relationships, the parenthesised name '(has)' can be used, unless a more intuitive one is suggested by the names of the participating entities.

A8 - For unnamed 1-1 relationships, the name '(has)' can be used unless the two entities have an identical primary key in which case the name '(is)' can be used.

A9 - If a foreign key attribute exists without an existing relationship line, then the relationship should be treated as missing and unnamed. It may be useful to create the relevant sentence using 'has' as the relationship name and a cardinality of 1-m.

It is suggested that any adjustments that are made on the basis of these assumptions are clearly marked in the final description, either by the use of parentheses as here or by some type of formatting such as bolding or italicising.
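The naming defaults in A7 to A9 can be sketched as a small function. This Python fragment is an assumption-laden illustration: the function name and parameters are hypothetical, the relationship name 'employs' is invented, and the judgement in A7 about a "more intuitive" name is left to the modeller rather than automated. A missing relationship under A9 is treated here as an unnamed 1-m relationship, with the parentheses serving as the marker for an assumed name.

```python
def relationship_name(name, cardinality, first_key=None, second_key=None):
    """Return the relationship name to use when constructing a sentence.

    A recorded name is returned unchanged. An unnamed relationship receives
    the parenthesised default '(has)' (A7, A8), or '(is)' when it is 1-1 and
    the two entities have an identical primary key (A8).
    """
    if name:
        return name
    if cardinality == "1-1" and first_key is not None and first_key == second_key:
        return "(is)"
    return "(has)"

print(relationship_name("employs", "1-m"))                      # recorded name is kept
print(relationship_name(None, "1-m"))                           # A7, and A9 for a missing line
print(relationship_name(None, "1-1", ["zoo-no"], ["zoo-no"]))    # A8, identical primary keys
print(relationship_name(None, "1-1", ["zoo-no"], ["animal-no"]))  # A8, different primary keys
```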

1.2 Procedure

The procedure is broken down into 6 steps. For reference, all the statements are numbered as they are constructed; however, the ordering of the statements is not significant.

Step 1 - Identify and document the diagram conventions

The purpose of this step is to record and clarify what notation has been used to construct the diagram and data dictionary. Where a CASE tool has been used this may be unnecessary, although for future reference it is useful to record how the model is being interpreted. Any inconsistent use of notation in the model should also be noted.

Step 2 - Check what assumptions need to be made

The purpose of this step is to identify any areas in the model where assumptions from the above list will need to be made. Where this is a check of the designer's own work, it can be seen as a syntax check and it is expected that most of the ambiguities and omissions that are discovered can be resolved. In other circumstances, 'corrections' should not be incorporated. Any assumptions that are made should be recorded.

Step 3 - Identify each simple entity

This step is concerned with extracting the sentences that relate to the simple entities within the model by completing the following 3 tasks.⁴

3.1 Construct a sentence for the primary key attribute(s) as⁵ :-

Sn: Each <entity-name> is uniquely identified by <primary key>⁶ e.g.

S1: Each Zoo is uniquely identified by zoo-no.

S2: Each Zoo-Animal is uniquely identified by zoo-no, animal-no.

This task is intended to focus on the appropriateness of the chosen primary key and the entity name.
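The sentence template of task 3.1 can be filled in mechanically once the entity names and primary keys are known. The following Python sketch reproduces sentences S1 and S2 above; the function name and the list-of-pairs input format are hypothetical choices made for illustration.

```python
def primary_key_sentence(entity_name, primary_key):
    """Build the task 3.1 sentence for one entity.

    `primary_key` is a list of one or more attribute names.
    """
    if not primary_key:
        raise ValueError(f"{entity_name} has no primary key (prerequisite P2)")
    return f"Each {entity_name} is uniquely identified by {', '.join(primary_key)}."

# The two entities used in the examples above.
entities = [("Zoo", ["zoo-no"]), ("Zoo-Animal", ["zoo-no", "animal-no"])]

# Number the statements as they are constructed.
sentences = [
    f"S{n}: {primary_key_sentence(name, key)}"
    for n, (name, key) in enumerate(entities, start=1)
]
for sentence in sentences:
    print(sentence)
```

Generating the sentence does not replace the judgement the task calls for: the modeller still has to read each statement and decide whether the primary key and the entity name are appropriate.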

3.2 For each attribute, construct a sentence as :-

Sn: Each <entity name> (<primary key>) must have only one .
