What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF).
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
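As a minimal sketch of these two rules, the following uses Python's built-in sqlite3 module. The table and column names (customer, customer_phone, and so on) are hypothetical and chosen only for illustration. Instead of duplicative columns such as phone1 and phone2 on the customer row, each phone number becomes its own row in a separate table, and every row is identified by a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per customer, identified by a primary key.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Instead of phone1, phone2, ... columns: one row per phone number.
    CREATE TABLE customer_phone (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# A customer can now have any number of phones without changing the schema.
phones = [
    row[0]
    for row in conn.execute(
        "SELECT phone FROM customer_phone WHERE customer_id = ? ORDER BY phone",
        (1,),
    )
]
print(phones)
```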
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
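The steps above can be sketched with Python's sqlite3 module. The schema is an assumption made for illustration: in a hypothetical order_item table keyed by (order_id, product_id), the product name would repeat on every row for the same product, so it moves to its own product table and is linked back with a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Data that applied to many order rows now lives in one place.
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

conn.execute("INSERT INTO product VALUES (10, 'Widget')")
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?)",
    [(1, 10, 2), (2, 10, 5)],
)

# The foreign key rejects an order line that points at an unknown product.
try:
    conn.execute("INSERT INTO order_item VALUES (3, 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The product name is recovered with a join when it is needed.
rows = conn.execute("""
    SELECT oi.order_id, p.product_name, oi.quantity
    FROM order_item AS oi
    JOIN product AS p ON p.product_id = oi.product_id
    ORDER BY oi.order_id
""").fetchall()
print(rows, rejected)
```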
Third Normal Form (3NF)
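Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key, that is, columns whose values are determined by another non-key column rather than by the key itself.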
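To make this concrete, here is a small sketch using Python's sqlite3 module. The employee_flat table and its columns are hypothetical. In it, dept_name is determined by dept_id rather than by the primary key emp_id, so the department name is moved to its own table and the employee row keeps only the dept_id reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical starting point: dept_name depends on dept_id, not on emp_id.
    CREATE TABLE employee_flat (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT,
        dept_id   INTEGER,
        dept_name TEXT
    );
    INSERT INTO employee_flat VALUES
        (1, 'Ada',   10, 'Research'),
        (2, 'Grace', 10, 'Research'),
        (3, 'Edgar', 20, 'Sales');

    -- 3NF version: the department name is stored once per department.
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department SELECT DISTINCT dept_id, dept_name FROM employee_flat;
    INSERT INTO employee SELECT emp_id, emp_name, dept_id FROM employee_flat;
""")

# Renaming a department is now a single-row update.
conn.execute("UPDATE department SET dept_name = 'R&D' WHERE dept_id = 10")

names = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(names)
```

In the flat table the same rename would have had to touch every employee row in that department, which is the kind of update anomaly 3NF is meant to prevent.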
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement:
- Meet all the requirements of the third normal form.
- Every determinant must be a candidate key.
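The determinant rule can be checked mechanically once the functional dependencies of a relation are written down. The sketch below is plain Python, and the function names closure and bcnf_violations are hypothetical helpers, not part of any library. It uses the common textbook formulation that the determinant of every non-trivial dependency must be a superkey (a candidate key or a superset of one). The sample relation (student, course, instructor) and its dependencies are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != relation
    ]


# Assumed example: each instructor teaches exactly one course, and a student
# taking a given course has exactly one instructor for it.
relation = frozenset({"student", "course", "instructor"})
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

violations = bcnf_violations(relation, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Here instructor determines course but does not determine student, so instructor is a determinant that is not a key. This relation satisfies 3NF, since course is part of a candidate key, yet it fails BCNF.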
Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
- Meet all the requirements of Boyce-Codd normal form.
- A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those whose determinant is a superkey.
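A multi-valued dependency arises when one table records two independent sets of facts about the same key. The sketch below is plain Python with made-up data: an employee's skills and spoken languages are unrelated to each other, so a single table has to store every combination. Splitting it into two relations removes the multi-valued dependency, and joining them back on the employee reproduces the original rows exactly.

```python
# Hypothetical relation (employee, skill, language): skills and languages are
# independent facts, so every skill must be paired with every language.
flat = {
    ("Ada", "SQL", "English"),
    ("Ada", "SQL", "French"),
    ("Ada", "Python", "English"),
    ("Ada", "Python", "French"),
}

# 4NF decomposition: one relation per independent fact.
emp_skill = {(emp, skill) for emp, skill, _ in flat}
emp_language = {(emp, language) for emp, _, language in flat}

# Natural join on the employee column.
rejoined = {
    (emp, skill, language)
    for emp, skill in emp_skill
    for other_emp, language in emp_language
    if emp == other_emp
}

print(sorted(emp_skill))
print(sorted(emp_language))
print(rejoined == flat)
```

Adding a third skill for this employee takes one new row in emp_skill, whereas the combined table would need one new row per language.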
Denormalization
Denormalization is the process of attempting to optimize the performance of a database by adding redundant data, moving from higher to lower normal forms in order to speed up database access. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance.
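As a rough illustration of adding redundant data for speed, the sketch below uses Python's sqlite3 module with a hypothetical orders and order_item schema. The order_total_cents column is redundant because it can always be computed from the line items. Storing it lets a read skip the aggregate query, at the cost of keeping the copy in sync. The trigger shown handles inserts only, so a real design would also need to cover updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id          INTEGER PRIMARY KEY,
        -- Redundant column, derivable from order_item and stored for fast reads.
        order_total_cents INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE order_item (
        order_id    INTEGER NOT NULL REFERENCES orders(order_id),
        line_no     INTEGER NOT NULL,
        price_cents INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
    -- The redundant copy must be maintained on every change (inserts only here).
    CREATE TRIGGER order_item_after_insert AFTER INSERT ON order_item
    BEGIN
        UPDATE orders
        SET order_total_cents = order_total_cents + NEW.price_cents
        WHERE order_id = NEW.order_id;
    END;
""")

conn.execute("INSERT INTO orders (order_id) VALUES (1)")
conn.executemany(
    "INSERT INTO order_item VALUES (1, ?, ?)",
    [(1, 1500), (2, 250)],
)

# Denormalized read: a single-row lookup.
stored = conn.execute(
    "SELECT order_total_cents FROM orders WHERE order_id = 1"
).fetchone()[0]

# Normalized read: aggregate over the line items.
computed = conn.execute(
    "SELECT SUM(price_cents) FROM order_item WHERE order_id = 1"
).fetchone()[0]

print(stored, computed)
```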
Why Denormalize a Database?
Only one valid reason exists for denormalizing a relational design: to enhance performance. However, several indicators help identify systems and tables that are potential denormalization candidates. These are:
* Many critical queries and reports exist which rely upon data from more than one table. Often these requests need to be processed in an online environment.
* Repeating groups exist which need to be processed in a group instead of individually.
* Many calculations need to be applied to one or many columns before queries can be successfully answered.
* Tables need to be accessed in different ways by different users during the same timeframe.
* Many large primary keys exist which are clumsy to query and consume a large amount of disk storage (DASD) when carried as foreign key columns in related tables.
* Certain columns are queried a large percentage of the time. Consider 60% or greater to be a cautionary number flagging denormalization as an option.