Chapter 7. Chapter 7. Network Schema Design
8. CHAPTER TUNING FOR PERFORMANCE
8.1. Introduction.
The ADD system generates a conceptual schema and incorporates simple physical specifications to create a complete DMS 1100 schema, which can be used as a prototype design. This schema may have to be tuned to improve its efficiency. As this depends on knowledge of the data itself- that is the relative importance, volume and frequency of use of each record and item, this is a process which is best done manu~lly. The following paragraphs provide guidelines for manual alteration of the schema to meet performance requirements.
8.2. General Guidelines for Improving Efficiency.
In improving t he efficiency of a database, the general method is to identify overhead situat~ons and then to resolve this by altering either the physical properties or the logical structure at that point. To alter the database in this way, it is necessary to gather a subst antial amount of extra information from the user.
For each record type, the expected: number of occurences, frequency and type of access, and selection criteria must be established. Where the frequency or type of access for an item differs from that of the record of which it is a part, the designer must be made aware of this. Further, the frequency and type of use of all relationships and the expected number of.
members of all set types should be supplied by the user. The ease
~ ·-=~-- ---- ----· ·-- ---- --- "1: -- - - -
•
Chapter 8. Tuning for Performance.
-164
and accuracy with which the user can estimate this information depends on the type of data· in the database. In some situations he may even be able to provide additional data such as common patterns of access to groups of items and/or records, security needs, time constraints on availability of data, long term archiving and any high-priority programs requiring special response times [Sperry 1977]. For the most part however, the user will have difficulty telling even the basic requirements, especially if inexperienced or new to computerisation.
Once this information is available, potential overhead situations can be located and rectified. The most common overheads are listed below.
1) maintenance of sorted sets. Especially when sets are large or volatile, order LAST or NEXT is preferable.
2) poor choice of the "VIA" set for a record type. If this set type is not the most frequently used relationship when accessing such records, or if it has very many members, an alternat~ve
should be used.
3) extremely large records. This results in few records per page and hence more page chang~s (I/O's) than would be necessary with smaller records. This can be rectified by having larger pages (which has its disadvantages) or, rather, by splitting off rarely-used items of such records, eg. ADDRESS, to form a separate record type. large items whose actual length varies from one record occurrence to another (e.g. a text) should be reduced to the smallest reasonable size, and an "itemextension" member record used for larger items.
4)
implicit many-to-many relationships. As an example, suppose most INCIDENTS involve only one SHIP, but a few may involve more than one. If a set INCIDENTS->
SHIPS is used there will be•
duplicate occurrences of some SHIPS. This causes serious overheads if the SHIPS record is large or is involved in many sets. The set should be replaced by an explicit many-to-many relationship.
5)
page "thrashing" in large set occurrences. Traversing the logical ring pointers may involve several page-swops depending on how much the physical ordering of members differs from their logical order. An ORDER of LAST minimises this possibility (and hence is used as a default in ADD). The use of PRIOR and/or OWNER links alleviates this, as the DBMS will then use the shortest path to a record. If the members are small and volatile, they can be combined into fewer record occurrences. Alternatively the member record can be replaced by a repeating group in the owner, with a"memberextension" record type for instances where the size of the repeating group is inadequate. In this way, fewer DML commands will be needed t o process the set, which is a considerable saving because of the "base" overhead involved in any execution of a DML command.
6) too many secondary sets. If a record is the member of more than one set, one is called
termed secondary. If the member
the prime set and the others are is accessed frequen~ly or is highly volatile, the
CHAIN to POINTER ARRAY.
mode of these sets should be changed from This reduces the number of I/O's required when traversing each set of which it is a member.
7)
incorrect choice of location mode can result in poor placement of records in a file or unacceptably slow accesses. Good discussions on choice of location mode can be found in [Ullmann 1980, Date 1981] .8) lack of redundancy. It is sometimes advisable to introduce a certain amount of so-called ''controlled redundancy" to facilitate
•
Chapter 8. Tuning for Performance. -166
database proces
sing. Factors such as stability and access freQuencies of
the items concerned must be considered as there is a trade-off here between saving time and space.
9)