How to Remove Duplicate Data from Database

Table of Contents

How to Remove Duplicate Data from Database

In this article, I give you ways to delete duplicate data from your database. If you are a developer must read this article and try this information for removed duplicate data from your database.

Data Integrity:

When you’re working on a SQL database you need to you know about the Data integrity. Data integrity is the process of accuracy and consistency of records in your database and also described the database access with the consistency and accuracy. Set up a database you must know about the Data integrity role in your database. Maintain records or data with the accuracy and consistency is called Data integrity.

Here we discuss about the duplicate data records in your table. When you are working with the Multiple data records in tables you must know about the duplicate data.

Duplicate Data contained more space in database and also slower process of your queries execution for retrieve the records. So I have decided to write an article on duplicate data and how to remove this data from your database.

let’s start with an example of delete duplicate records from table

For example, I have one table named ‘Employee’, there are multiple row records in this table but there is some duplicate records which is not useful for me and I want to delete this duplicate records.

In this table I have 5 columns:

Employee ID : Employee_ID
Employee Name: Employee_Name
Employee City: Employee_City
Employee Address: Employee_Address
Employee Mobile Number: Employee_mobno

You can create this table from your SQL server with query or or without query on right click on the table option in the left side of your database SQL server. send this you can insert records in this with the query and also manually fill the records in table.

For Example:

create table Employee
(
Employee_ID int,
Employee_Name varchar(20),
Employee_City varchar(20),
Employee_Address varchar(20),
Employee_mobno int

)

In this table I have two row records named employee ‘John’, these two rows having the same records Like same name, same city, Same address, same mobile number. this record is called duplicate of record. So I want to delete this data from my table. But there is one unique column in this table which is called employee ID, Due to auto increment and identity employee ID contain serial wise different data.

Ways to delete duplicate record:

Using Group By Statement:

In SQL server Group By statement is the result of one or more column and used with the aggregate function like COUNT, MAX, MIN, SUM etc. Using this Group By statement on your table, you can remove the duplicate records from your table.

Using Distinct Statement:

We are also known about the Distinct Statement in SQL Server which gives non duplicate rows in your tables. So I also create a new table named employee1 and all Distinct records init from my previous table employee. I drop my table employee and rename my new table from employee1 to employee.

Select distinct * into Employee1 From Employee

Using Common Table Expression(CTE):

Common table expression is temporary result which is created by this Select, insert and update queries. I used this hear with Row_number() and partition by For delete my duplicate records.