Normalization in DBMS

Nishajha
7 min readFeb 22, 2021

Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships. The purpose of Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

Normalization is used for mainly two purposes,

  • Eliminating redundant(useless) data.
  • Ensuring data dependencies make sense i.e data is logically stored.

Normal Form

A table without redundancy and anomaly is known as a normal form.

Anomaly

Anomalies are the side effects that are caused due to data manipulation language operation i.e. insert, update, delete.

Database Normal Forms

Normal Forms

Here is a list of Normal Forms

  • 1NF (First Normal Form)
  • 2NF (Second Normal Form)
  • 3NF (Third Normal Form)
  • BCNF (Boyce-Codd Normal Form)
  • 4NF (Fourth Normal Form)
  • 5NF (Fifth Normal Form)
  • 6NF (Sixth Normal Form)

i.1NF (First Normal Form)

A table is said to be in 1NF if it satisfies the following conditions:

  1. The table should not be having duplicate records.
  2. The table should be having single-valued data or atomic data.

Example 1 — Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.

ii.2NF (Second Normal Form)

A table is said to be in 1NF if it satisfies the following conditions:

  1. The table should be in 1 NF.
  2. The table should not be having the partial functional dependency.

Partial Functional Dependency

The attribute which is dependent on another attribute that is a part of the composite key attribute.

composite key

It is a combination of two or more non-key attributes.

In relation to STUDENT_COURSE given in Table 3,

FD set: {COURSE_NO->COURSE_NAME}
Candidate Key: {STUD_NO, COURSE_NO}
  • In FD COURSE_NO->COURSE_NAME, COURSE_NO (a proper subset of candidate key) is determining COURSE_NAME (non-prime attribute). Hence, it is partial dependency and the relation is not in the second normal form.
  • To convert it to the second normal form, we will decompose the relation STUDENT_COURSE (STUD_NO, COURSE_NO, COURSE_NAME) as :
STUDENT_COURSE (STUD_NO, COURSE_NO)
COURSE (COURSE_NO, COURSE_NAME)
  • Note — This decomposition will be lossless join decomposition as well as dependency preserving.

Note: If the table consists of partial functional dependency we have to identify the attributes which are responsible and they are removed from the table.

iii.3NF (Third Normal Form)

A table is said to be in 1NF if it satisfies the following conditions:

  1. The table should be in 2NF.
  2. The table should not be having the transitive functional dependency.

Transitive functional dependency

An attribute is dependent on a non-key attribute which is dependent on a key attribute.

A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X –> Y

  1. X is a super key.
  2. Y is a prime attribute (each element of Y is part of some candidate key).

If A->B and B->C are two FDs then A->C is called transitive dependency.

In relation to STUDENT given in Table 4,

  • FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_NO -> STUD_COUNTRY, STUD_NO -> STUD_AGE, STUD_STATE -> STUD_COUNTRY}
  • Candidate Key: {STUD_NO}
  • For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form. To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as:
  • STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
  • STATE_COUNTRY (STATE, COUNTRY)

Note: If the table consists of transitive functional dependency we have to identify the attributes which are responsible and they are removed from the table.

iv. Boyce and Codd Normal Form(BCNF)

BCNF is a higher version of the Third Normal form. This form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, the following conditions must be satisfied:

  • R must be in 3rd Normal Form
  • For each functional dependency ( X → Y ), X should be a super Key.

A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in BCNF if in every non-trivial functional dependency X –> Y, X is a super key.

Example Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B->E}

  • Step 1. As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attribute of relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation, so there will be only 1 candidate key {AC}.
  • Step 2. The prime attribute is those attribute which is part of candidate key {A, C} in this example and others will be non-prime {B, D, E} in this example.
  • Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite attribute.
  • The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset of candidate key AC).
  • The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be super key or RHS should be a prime attribute.
  • So the highest normal form of relation will be the 2nd Normal form.

Key Points –

  1. BCNF is free from redundancy.
  2. If a relation is in BCNF, then 3NF is also satisfied.
  3. If all attributes of relation are prime attributes, then the relation is always in 3NF.
  4. A relation in a Relational Database is always and at least in 1NF form.
  5. Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
  6. If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute), then the Relation is always in 2NF( because no Partial functional dependency possible).
  7. Sometimes going for BCNF form may not preserve functional dependency. In that case, go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
  8. There are many more Normal forms that exist after BCNF, like 4NF and more. But in real-world database systems, it’s generally not required to go beyond BCNF.

v.Fourth Normal Form

If a table is in BCNF and has no multi-valued dependency then it will be in 4NF. For dependency A->B, if for a single value of A, multiple values of B exist, then this will be in multi-valued dependency. The fourth normal form (4NF) is a level of database normalization where there are no non-trivial multivalued dependencies other than a candidate key. It builds on the first three normal forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to a database meeting the requirements of BCNF, it must not contain more than one multivalued dependency.

vi. Fifth Normal Form(5 NF)

A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless. A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R. A relation decomposed into two relations must have loss-less join Property, which ensures that no spurious or extra tuples are generated when relations are reunited through a natural join.

Conclusion

Normalization helps in logical mapping, normalizing product data means that the information is always organized and stored in its proper place, without duplicates or outdated versions. The reliability of the data is increased for all those involved in accessing the databases, and there will be greater consistency in the information stored. The main advantage of normalizing data, apart from clearing out redundancies, is the design of a complete data system that will show how data from different tables relates one to another.

--

--