The relational model is a theory in whichalldata is modelled asrelations; there are no records, no pointers or data structures – only relations. The concept of a relation is formally introduced later in this chapter. Until then, since tables are well suited to graphically representing relations, we shall treat ‘relation’ and ‘table’ as synonyms.
For example, information about departments and employees in an organisation might be modelled in two relations – Employees and Departments – as illustrated in Figure 3.1.
Employees
Emp_ID Emp_Name Salary Department
4182 A. Singh 36,003.12 Development
5542 M. Lee 39,818.15 Development
3351 T. Esterhazy 31,919.18 Development
6164 A. Barraclough 38,002.05 Marketing
9095 N. Prabakar 36,003.12 Research
Departments
Department Budget
Development 500,000
Marketing 150,000
Research 100,000
Figure 3.1. Two relations illustrated as tables.
Two conditions are assumed by the relational model that restrict the definition of a relation:
1. All data values areatomic (scalar). This means that its structure cannot be broken down into values of a more basic type. Figure 3.2 shows an incorrect relation and a corrected version. In the incorrect version, the ‘Children’
data value is a set of three values, and so can be further broken down; the decomposed set is used in the corrected version.
Incorrect relation Corrected relation
Parent Children Parent Children
Leda
Helen Leda Helen
Clytemnestra Leda Clytemnestra
Castor Leda Castor
Pollux Leda Pollux
Anakin Skywalker Luke Skywalker Anakin Skywalker Luke Skywalker Leia Organa Anakin Skywalker Leia Organa Shmi Skywalker Anakin Skywalker Shmi Skywalker Anakin Skywalker Figure 3.2. The table on the left does not represent a legal relation because the data values
for Children each represent a set of data. The table on the right is altered so that all values are atomic.
2. All information must be represented asexplicit data values. More formally, all base relations1 are definedextensionally.
Suppose, given the Parent-Child relation given in corrected form in Figure 3.2, we wish to define a new base relation called ‘Ancestor’. The Ancestor relation would link fathers, grandfathers, great-grandfathers and so on to their children and grandchildren.
Such a relationship is deterministic and easily computed based on the Parent-Child relation. A non-relational model might allow us to specify a set of logical rules to define the new base relation, but within the relational model, the relation must be explicit, as in Figure 3.3.
Ancestor Descendant
Shmi Skywalker Anakin Skywalker Shmi Skywalker Luke Skywalker Shmi Skywalker Leia Organa Anakin Skywalker Luke Skywalker Anakin Skywalker Leia Organa
Leda Helen
Leda Clytemnestra
Leda Castor
Leda Pollux
Figure 3.3. The ancestor relation must include all information explicitly if it is to be used as a base relation. Relations like this can easily become very large as the family tree expands.
Clearly such a restriction could give rise to issues in practice, but there do exist other mechanisms – such asviews – that we shall consider in later chapters which do make this restriction less problematic.
If relations are tables, then each column draws its data values from a domain. A relation shown as a table with three columns is defined over a set of three domains; formally, we would write this as R: D1 × D2 × D3. In general, any given relation will be defined over a set ofn domains as R: D1 × D2 × … × Dn.
There are only two data objects needed by the relational model:relations and domains. We will return to relational data objects later in this chapter.
1 Base relations will be defined later in this chapter in section 3.4.
Data objects are of little use unless you can do things with them. The relational model provides a set of operators for data manipulation. Two main approaches exist with respect to relational operators:
• declarative, represented byrelational calculus; and
• procedural, represented byrelational algebra.
In this chapter, we look only at relational algebra operators, since the most used database language,SQL, implements relational algebra operators.
Relational algebra operators are global (more formally,set at a time). This means that a single operator is considered to be applied to entire relationsat once, with the results being returned in the form of relations.
One important relational algebra operator is therestrict operator, of which we will see more later. If we wanted to take the relation containing employees from Figure 3.1 and show only the higher earners, then we would use the operator as follows:
RESTRICT Employees SUCH THAT Salary > 37000;2
The result of this expression is, of course, a relation. It is shown in Figure 3.4.
Emp_ID Emp_Name Salary Department
2 M. Lee 39,818.15 Development
4 A. Barraclough 38,002.05 Marketing
Figure 3.4. Relation resulting from a restriction operator.
The information relevant to a real-life system is modelled, within the relational model, by means of explicitly-defined relations. Since there is no such thing in the relational model as a pointer, relations that are in some way related to each other must be linked in some other way. This is achieved usingcorresponding fields, implemented usingkeys.
If you look back at the two relations shown in Figure 3.1, you will see that the Department column in both tables contains the same data values. In each relation, the Marketing department, for example, means the same thing. So the links are implemented purely in terms of data values.
Usually, a field will be constrained by restrictions on the real-world concepts they model, limiting it to a set ofcorrect orvalid values. For example, the Salary column in the Employee relation would not make sense if it contained a negative value, nor is Emp_Name likely to be valid with a value of “12”. This means that when devising a relational model, we do not define only the structure of the data. Limitations that constrain data to correct values must also be defined.
These defined limitations are calledintegrity constraints. In this chapter, we shall introduce two generally applicable integrity constraints: entity and referential integrity constraints.
Summarising what we have seen so far, the relational model can be defined as a way of talking about and handling data, under three categories:
• data representation (relational data objects)
• data manipulation (relational operators)
• integrity constraints representation (relational data integrity).
The rest of the chapter will cover these in more detail.
3.2.1 Relational DBMSs
A DBMS that implements relational theory is called, unsurprisingly a Relational Database Management System, orRDBMS. RDBMSs make up a large
proportion of the DBMSs currently in use.
2 The notation used here is chosen to illustrate the operations clearly.
It is based on relational algebra, but is not a standard notation.
Formal relational algebra will be introduced later.
As we have seen in Chapter 2 of this subject guide, an RDBMS uses the relational model to allow operations to be carried out at theconceptual level, on relations.
This abstraction protects the user from having to engage with theinternal level and the specifics of a platform, while at the same time making migration to a new platform – and even between different DBMS implementations – easier. For a reminder about these concepts, look at Figures 2.6 and 2.9 of Chapter 2 of the subject guide.
In general, RDBMSs are so much the dominant approach that most writers simply use the term DBMS and assume that it is relational. If a different model is used it will usually be specified explicitly. From here onwards, this subject guide will follow that practice.
Despite the prevalence of relational DBMSs, it is rarely the case that any system implements the full relational theory. This must be considered when modelling and implementing databases for a specific DBMS.