Chapter 7. Chapter 7. Network Schema Design
10. CHAPTER CONCLUSION
•
Chapter 10. Conclusion. -182
they proceed satisfactorily thereafter. The main reasons for this seem to be the user's awe of using a computer terminal, and his misinterpretation of terminology such as 'object' or 'identifier'.
Once this has been clarified, the user can largely be left alone to complete the specification. Naturally some problems may still arise, but only in exceptional situations. When these test took place, a user manual had not been written for ADD.
areas highlighted by this experiment were used to write
The problem the User Manual for the system. Indeed, it is intended that this manual be updated periodically, as experience increases our understanding of users' difficulties. The usability of the present version of ADD
can be summarised as follows: After reading a few pages of a User Manual to clarify the concepts on which ADD is based, or after a short discussion with the database designer, a computer novice of average intelligence should be able to create an SDM specification using ADD, with little (if any) further assistance.
Regarding the second objective, we see that the ADD system does indeed produce a prototype DMS 1100 schema. The quality of this prototype is of interest here. Its strength lies in its choice of a logical structure, which should require minimal if any alterations. The wealth of information in the SDM model has been used to define records, sets and items in the best possible way.
The characteristics of attributes, interclass connections and attribute derivations which enhance one's understanding of the organisation, are utilised to improve the network. As the logical structure is based solely on a description of the data, and not on functional requirements, it should not need to be reorganised when processing requirements change. Integrity constraints are included where permitted by the DMS 1100 CHECK and RESULT clauses.
As these facilities are more restrictive than the SDM modelling
•
constructs such as mappings,the SDM constraints cannot always be incorporated. However, the prototype generat ed by ADD will definitely specify integrity constraints wherever this is possible.
A simple physical structure is used for the prototype, since optimising physical parameters was considered beyond the scope of this system. However, ADD does relieve the human designer of much of the tedium associated with the specification of physical criteria. For example, sort keys are designated where appropriate, ALIAS items are declared and singular sets chosen as VIA sets wherever these exist. Changes will probably be necessary before the final schema is decided upon and the outputs of the ADD system should assist the DBA in making these changes. These outputs comprise a copy of each run of the user interface, showing displays and responses; the SDM model specified; a list of all relations for the application; a DMS 1100 schema for the _application; and a listing of what happened to each item in the SDM description. (This can be either: the sets, records or items created for it, with any integrity constraints, or "not implemented").
The relation schema synthesised by ADD is in 2NF and adheres to the principles of normal form theory by removing embedded dependencies wherever these are recognised. For this reason, the system was extended to include an optional component whereby FDs between attributes could be identified.
It is appropriate at this point to explain how the ADD system has been tested. Since it was designed in a modular fashion, each aspect of the design was fully tested before continuing with the next. This involved many test runs, as all the different types of
•
•
Chapter 10. Conclusion . -184
situation under which a modelling feature could be used, were tested. This process proved to be worthwhile, as it resulted in two earlier versions of ADD being adjusted to arrive at the present system. The SDM example given in Hammer and McLeod's article [Hammer 1981] was used in its entirety as the final test, since it includes all modelling constructs used at least once . The results of this example are given in appendices F,G and H. To test the user interface program, a computer novice was asked to experiment with ADD (in addition to my own numerous runs). First the program, and then its displays, were modified as a result of
these experiments .
The advantages of ADD can be summarised as follows. It is helpful to systems analysts and DBAs in that much of their time usually spent in meeting with the user, or in coding the basic skeleton of the schema, is saved. Further, the database generated will form a good starting point for the process of database evaluation and tuning, as its logical structure has been carefully chosen. The system is original in that unlike the other automated database design systems [Gerritsen 1975,Hubbard 1979,Wang 1975] it is based on a semantic data model, not on processing re~uirements or functional dependencies.
ADD also deal s with a semantically richer re~uirements
specification, and unlike the others, automates the process of
re~uirements gathering as well. It also differs from the systems of Hubbard and of Wang and Wedekind in that it designs a Codasyl DBTG structure.
Although the network schema generated is specifically in ~he DDL of DMS 1100, this is so close to the CODASYL specifications 'that changes to accomodate other CODASYL DBMS's would be minimal.
•
•
10.2. Possible changes and extensions.
There are some modifications which could be considered for ADD.
For example, It may be argued that a DMS 1100 database should not have been used for storing the evolving network design, as
affects the system's portability. However, only
this the QDB-manipulation subroutine would need to be altered if a different database or ·data structure were used. As each call involves a very simple operation, altering this subroutine should be easily done. It would in fact be trivial if the system were used to design databases for another CODASYL DBMS. The QDB and its associated subroutine would require very few changes, because DMS 1100 adheres so strictly to the DBTG recommendations [Codasyl 1971, Codasyl 1977]. This small amount of effort that may be necessary is more than compensated for by having the versatility of a database for storing the structure being created. In fact, if ADD were developed _on another machine, its database system would almost certainly have been used for this purpose, because of the great extent to which it simplifies the system .
It became apparent, when immersed in the difficulties created by attribute derivations and subclass definitions, that there is definitely a trade-off between efficiency and integrity. It might be argued that some integrity checks could have been omitted.
This point is clearly debatable. One of the principles on which ADD is based is the following: if something forms part of the SDM specification, then it is sufficiently important to be included in the synthesised database. Furthermore, it was felt that decisions as to what was sufficiently unimportant to be able to ignore should not be made by an automated process, but by man.
•
•
Chapter 10. Conclusion. -186
Although ADD does not design a hierarchical database, this could fairly easily be incorporated into the system because a hierarchy is essentially a special type of network. Integrity constraints would need to be incorporated according to the hierarchical DBMS
involved, as would the handling of many:many relationships .
The present system could be extended by improving the physical aspect of the design, which would mean including another module to obtain the relevant data volume and processing details from the user. The system was initially envisaged as a design tool for conceptual schema design only. Thus the algorithm choosing physical criteria for the database aimed at the simplest, most straight forward solutions. The system could also be extended to build up multiple external views of an enterprise and then consolidate these into one global logical model. This however is a separate area of its own, and hence was not considered. It can be said with some certainty that the inclusion of this facility would greatly complicate the model-creation phase of ADD. The user interface could also be altered so as to incorporate some natural language understanding, or to provide more levels of
'help' . The ADD system could form a basis for future research, with automated loaders, query-language generators and programs to generate different subschemas being a natural extension to the system.
10.3. Design principles emerging from this study.
A great deal has been learnt in undertaking to write this automatic database designer. The importance of a good user interface was realised at the outset, and the high amount of care
•
•
•
•
that needs to be expended on such a program became evident when testing the system with non-DP persons.
The ADD system development clearly showed the ease with which a relational database can be designed, in comparison with a CODASYL one. This is probably due to the simplicity of the relational model, its high degree of data independence (no access paths need be specified) and the guidelines that exist in normal form theory.
However it should be noted that the question of integrity was not addressed because there is no relational DBMS on the Univac 1108.
Since most of the complexity in converting an SDM specification to a computer-compatible schema arises when incoporating integrity assertions, their inclusion may however increase the difficulties of relational schema design.
The usefulness of data models and the importance of one's choice of data model was evident early in the design of the system, and increasingly so in the latter stages of its development. The main problem with SDM was seen to be its incompatibility with CODASYL databases. Hence one can deduce that a model should be restricted in its use so that it remains within the scope of Database Management Systems and should not include features unimplementable on conventional systems. Some general principles for data model design are presented in the next section.
10.4. Design criteria for Data Models .
Having studied the various data models currently available, and implemented the ADD system, the following criteria for good data models are suggested. A data model is best used by the personnel of the organisation requiring the database. Only they truly
•
•
Chapter 10. Conclusion • -188
understand this enterprise and its data, in a way that systems analysts cannot, except possibly after much effort. Even then, users must still be able to understand the model. For this reason, a data model should be simple and sufficiently easy to be used by computer-naive persons, possibly with some assistance from
'
the analyst. Its syntax should be straight forward, avoiding complicated languages such as predicate calculus. Models should also not be too DP-like, as is DAPLEX for example [Shipman 1981].
For a model to be easy to conceptualise and work with, entities and attributes should be the central modelling elements, with relationships pl aying only an auxiliary role. If the user can specify his model by providing a list of all entities of interest, and then describe these by means of attributes, a correct and complete model should be obtained. Relationships can then be regarded either as entities or as attributes. To keep the complexity of the model within the capabilities of a non-computer user, the model should not be based on FDs (or MVDs). These can always be identif ied by (or under the guidance of) a systems analyst once the model is complete.
To increase the ease with which the model can be used to describe the real world, t here should be many modelling tools and the designer should be allowed a free interpretation of his environment. However, the ease with which the model is conceptualised should not be compromised, nor should any complicated constructs be introduced. Thus, semantically expressive modell ing elements for stating integrity assertions are desirable, as is the concept of generalisation, which will be applicable to relationships (as entities) as well. A name-based model is advocated, as this in conducive to a clearer view of a
•
•
•
model and forces the user to think carefully, lessening the probability of incorrect definitions. Similarly, entities should represent themselves, and the concept of keys should be de-emphasised, because they are not
'natural'to users
[Martin1973] .
Dynamics should not be part of a data
model,as this results in a database based on functional requirements, rather than
on inherentcharacteristics of data; furthermore dynamics constructs cannot be
incorporatedinto conventional schemas. This last point
isone of the most important to bear in mind when designing a data model.
If it is not compatible with available DBMS's, but provides modelling elements beyond their scope, then its suitabiltiy for database design is
jeopardised.10.5. Criteria for Database Design
Methodologies.Experience
with the ADD system has also indicated the
followingcriteria for database design methodologies. Above all, the user
shouldpartake in as much of the design process as is possible.
For