Determining the access methods, data clustering and partitioning strategies
Data distribution and replication in distributed systems.
A Relation has the following properties:
Has a name that is distinct from all other relation names in the relational schema.
Each cell contains exactly one atomic (single) value.
Each attribute has a distinct name.
The values of an attribute are all from the same domain.
Each tuple is distinct; there are no duplicate tuples.
The order of attributes has no significance.
The order of tuples has no significance, theoretically. (however, in practice, the order may affect the efficiency of accessing tuples.)
A relation is defined as a set of tuples Placing of proper attribute into a relation
The problem in relational model is to place the proper attributes into a relation and of grouping attributes in different relations. This problem can be solved by the following two methods:
1. By intuition and semantic. But problems are...
a. No semantic distinction between the two attributes, Ex. T (a, b) b. Duplication of attribute values
c. Semantic ambiguity d. Storage anomalies
2. Relational normalization using the notion of functional dependency.
Three design goals for relational databases
1. Avoid redundancies and resulting update, insertion, and deletion anomalies, by decomposing schemes as necessary.
2. Ensure that all decompositions are lossless-join.
3. Ensure that all decompositions are dependency-preserving.
The types of update anomalies that may occur on a relation that has redundant data.
A major aim of relational database design is to group attributes into relations so as to minimize information redundancy and thereby reduce the file storage space required by the base relations. Another serious difficulty using relations that have redundant information is the problem of update anomalies. These can be classified as insertion, deletion, or modification anomalies. The process of Normalization can reduce all these anomalies.
Types of anomalies
Insertion anomalies: An independent piece of information cannot be recorded into a relation unless an irrelevant information must be inserted together at the same time.
Update anomalies: The update of a piece of information must occur at multiple locations, not required by the referential integrity rule.
Deletion anomalies: The deletion of a piece of information unintentionally removes other information.
Purpose of Normalizing data:
Normalization can be achieved with the help of keys and functional dependencies(FD).
Normalization process, as first proposed by codd (1972).
Initially, codd proposed three normal forms which he called first, second and third normal form
Normalization of data can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundany and (2) minimizing the insertion, detection, and update anomalies.
The Concept of Functional Dependency:
Functional dependency describes the relationship between attributes in a relation.
If A and B are attributes of relation R, B is functionally dependent on A (denoted by A B), if each value of A in R is associated with exactly one value of B in R.
Given a table T with at least two attributes A and B, we say that A B (A functionally determines B, or B is functionally dependent on A)if it is the intent of the designer that for any set of rows that might exist in the table, two rows in T cannot agree on A and disagree on B. More formally, given two rows r and r in T, if r (A)= r (A) then r (B) = r (B).
Inference rules IR1 through IR3 are known as Armstrong’s inference rules.
Closure of set of attributes
for each functional dependency Y→Z in F do ifX ⊇ Y then X : X Z
Equivalence of sets of functional dependencies
Two sets of functional dependencies E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E, that is, E is equivalent to F if both the conditions E covers F and F covers E hold.
Minimal sets of functional dependencies
A set of functional dependencies F is minimal if it satisfies the following conditions 1. Every dependency in F has a single attribute for its right hand side.
2. We cannot replace any dependency X→ A in F with a dependency Y→A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F
We can thick of a minimal set of dependencies as being a set of dependencies in a standard or canonical form & with no redundancies.
Algorithm: Finding a Minimal Cover F for a Set of Functional Dependencies E 1) Set F: = E
2) Replace each functional dependency X {A , A A } in F by the n functional dependencies X A , X A . X A
3) For each functional dependency X A in F for each attribute B that is an element of X if
if {{F (X A)} {(X {B}) A}}is equivalent to F, then replace X A with (X {B}) A in F
4) For each remaining functional dependency X A in F if {F {X A}} is equivalent to F,
then remove X A from F Normalization and database Design
E – R Diagram provides macro view , determines entities
Normalization provide micro view of entities, focuses on characteristics of specific entities, may yield additional entities
Essence of Normalization
When a relation has more than one theme, break it in to two (or more) relations, each having a single theme.
Functional dependency and the process of normalization
Three normal forms were initially proposed, which are called first (1NF), second (2NF) and third (3NF) normal form. Subsequently, a stronger definition of third normal form was introduced and is a referred to as Boyce- Codd normal form (BCNF) .All these normal form are based on the functional dependencies among the attribute of a relation.
Steps in Normalization
First Normal Form:
A relation in 1NF if the values in the domain of each attribute of the relation are atomic.
It also disallows multivalued attributes that are themselves composite. They are called nested relations.
Second Normal Form:
It is based on full functional dependency
A relation schema R is in 2NF if it is in 1NF and every non-pirme attribute A in R is fully functional dependent on any key of R.
UNNORAMALIZED
Every constraint on the relation is as consequence of the definition of keys and domain (subsumes 4NF and 5NF).
Third Normal Form:
A relation R is in 3NF, if it is in 2NF and whenever a non trivial FDX → A holds in R either (i) X is a super key of R or
(ii) A is a prime attribute of R must be satisfied.
i.e., It disallows the transitive dependency i.e., relation should not have a FD in which a non-prime attribute depends on a non non-prime attribute.
Boyce Codd Normal Form:
A relation scheme R is in BCNF if it is in 3NF and whenever a nontrivial functional dependency X → A holds then X is a super key of R and it does not allow A to be prime.
This is more restrictive than 3NF
While decomposing relation to make them in BCNF we may loose some dependencies i.e.
BCNF does not gurqntee the dependency preservation property.
Multi-Valued Dependency (MVD) represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the set of values for B and C are independent of each other.
A multi-valued dependency can be defined as being trivial or nontrivial. A MVD A -- >> B in relation R is defined as being trivialif B is a subset of A or (b) A B R.
A MVD is defined as being nontrivial if neither (a) nor (b) are satisfied. A trivial MVD does not specify a constraint on a relation, while a nontrivial MVD does specify a constraint.
Fourth Normal Form:
A relation schema R is in 4NF whit respect to a set of dependencies F (including both functional dependency and multivalued dependency) if it is in BCNF and for every nontrivial multivalued dependency X→Y in F , X is a super ey for R
Every relation in 4NF is in BCNF.
Join dependencies
A join dependency (JD), denoted by JD(R , R . R ), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive (lossless) join decomposition into R , R . R that is for every such r we have ( (r), (r), . (r)) r.
A join dependency JD (R , R . R ), specified on relation schema R, is a trivial JD if one of the relation schemas R in JD (R , R . R ) is equal to R.
Fifth Normal Form
A relation schema R is in 5NF with respect to a set F of functional multivalued & join dependencies if, for every non-trivial dependency JD(R , R . R ),isF every R is a super key of R
Properties of Normal Forms of Their Decompositions
Property 3NF BCNF 4NF
Eliminates redundancy due to FD’s Yes Yes Yes Eliminates redundancy due to MVD’s No No Yes
Preserve FD’s Yes Maybe Maybe