Data fragmentation in DBMS

Distributed Database systems provide distribution transparency of the data over the DBs. This is achieved by the concept called Data Fragmentation. That means, fragmenting the data over the network and over the DBs. Initially all the DBs and data are designed as per the standards of any database system – by applying normalization and denormalization. But the concept of distributed system makes these normalized data to be divided further. That means the main goal of DDBMS is to provide the data to the user from the nearest location to them and as fast as possible. Hence the data in a table are divided according their location or as per user’s requirement.

Dividing the whole table data into smaller chunks and storing them in different DBs in the DDBMS is called data fragmentation. By fragmenting the relation in DB allows:

Information about the fragmentation of the data is stored in DDC. When user sends a query, this DDC will determine which fragment to be accessed and it points that data fragment.

Fragmentation of data can be done according to the DBs and user requirement. But while fragmenting the data, below points should be kept in mind :

Table of Contents

There are 3 types of data fragmentations in DDBMS.

Horizontal Data Fragmentation :

As the name suggests, here the data / records are fragmented horizontally. i.e.; horizontal subset of table data is created and are stored in different database in DDB.

For example, consider the employees working at different locations of the organization like India, USA, UK etc. number of employees from all these locations are not a small number. They are huge in number. When any details of any one employee are required, whole table needs to be accessed to get the information. Again the employee table may present in any location in the world. But the concept of DDB is to place the data in the nearest DB so that it will be accessed quickly. Hence what we do is divide the entire employee table data horizontally based on the location. i.e.;

SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = ‘INDIA; SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = ‘USA’; SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = ‘UK;

Now these queries will give the subset of records from EMPLOYEE table depending on the location of the employees. These sub set of data will be stored in the DBs at respective locations. Any insert, update and delete on the employee records will be done on the DBs at their location and it will be synched with the main table at regular intervals.

Above is the simple example of horizontal fragmentation. This fragmentation can be done with more than one conditions joined by AND or OR clause. Fragmentation is done based on the requirement and the purpose of DDB.

Vertical Data Fragmentation :

This is the vertical subset of a relation. That means a relation / table is fragmented by considering the columns of it.

For example consider the EMPLOYEE table with ID, Name, Address, Age, location, DeptID, ProjID. The vertical fragmentation of this table may be dividing the table into different tables with one or more columns from EMPLOYEE.

SELECT EMP_ID, EMP _FIRST_NAME, EMP_LAST_NAME, AGE FROM EMPLOYEE; SELECT EMP_ID, STREETNUM, TOWN, STATE, COUNTRY, PIN FROM EMPLOYEE; SELECT EMP_ID, DEPTID FROM EMPLOYEE; SELECT EMP_ID, PROJID FROM EMPLOYEE;

This type of fragment will have fragmented details about whole employee. This will be useful when the user needs to query only few details about the employee. For example consider a query to find the department of the employee. This can be done by querying the third fragment of the table. Consider a query to find the name and age of an employee whose ID is given. This can be done by querying first fragment of the table. This will avoid performing ‘SELECT *’ operation which will need lot of memory to query the whole table – to traverse whole data as well as to hold all the columns.

In this fragment overlapping columns can be seen but these columns are primary key and are hardly changed throughout the life cycle of the record. Hence maintaining cost of this overlapping column is very least. In addition this column is required if we need to reconstruct the table or to pull the data from two fragments. Hence it still meets the conditions of fragmentation.

Hybrid Data Fragmentation :

This is the combination of horizontal as well as vertical fragmentation. This type of fragmentation will have horizontal fragmentation to have subset of data to be distributed over the DB, and vertical fragmentation to have subset of columns of the table.

As we observe in above diagram, this type of fragmentation can be done in any order. It does not have any particular order. It is solely based on the user requirement. But it should satisfy fragmentation conditions.

Consider the EMPLOYEE table with below fragmentations.

SELECT EMP_ID, EMP _FIRST_NAME, EMP_LAST_NAME, AGE FROM EMPLOYEE WHERE EMP_LOCATION = ‘INDIA; SELECT EMP_ID, DEPTID FROM EMPLOYEE WHERE EMP_LOCATION = ‘INDIA; SELECT EMP_ID, EMP _FIRST_NAME, EMP_LAST_NAME, AGE FROM EMPLOYEE WHERE EMP_LOCATION = ‘US; SELECT EMP_ID, PROJID FROM EMPLOYEE WHERE EMP_LOCATION = ‘US;

This is a hybrid or mixed fragmentation of EMPLOYEE table.

Recent Posts