
Removing Duplicates

Discovering Duplicates

Duplicate rows are rows that have been recorded more than once.

Looking at our test data set, we can assume that rows 11 and 12 are duplicates.

To discover duplicates, we can use the duplicated() method.

The duplicated() method returns a Boolean value for each row: True if the row repeats an earlier row, otherwise False.

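As a minimal sketch, the snippet below uses a small made-up DataFrame (the column names and values are hypothetical, not the tutorial's test data set) in which one row repeats an earlier one. It shows the Boolean Series that duplicated() returns and how that Series can be used to select the duplicate rows.

```python
import pandas as pd

# Hypothetical stand-in for the test data set:
# the row at index 2 repeats the row at index 1 exactly.
df = pd.DataFrame({
    "Duration": [60, 45, 45, 60],
    "Pulse": [110, 117, 117, 103],
    "Calories": [409.1, 479.0, 479.0, 340.0],
})

# duplicated() returns a Boolean Series with one value per row:
# True marks a row that repeats an earlier row, False otherwise.
flags = df.duplicated()
print(flags)  # False, False, True, False

# The same Boolean Series can be used as a mask to show the duplicate rows.
print(df[flags])
```

By default, duplicated() compares all columns and flags every occurrence after the first, so the first copy of a repeated row is marked False.
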
Removing Duplicates

To remove duplicates, use the drop_duplicates() method. By default it returns a new DataFrame with the duplicate rows removed.

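A minimal sketch of drop_duplicates() with its default arguments, again on hypothetical data rather than the tutorial's test data set. It shows that the call returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical example data: the row at index 2 duplicates the row at index 1.
df = pd.DataFrame({
    "Duration": [60, 45, 45, 60],
    "Pulse": [110, 117, 117, 103],
})

# Without inplace, drop_duplicates() returns a NEW DataFrame; df is untouched.
cleaned = df.drop_duplicates()

print(len(df))                 # 4: the original still has every row
print(len(cleaned))            # 3: the repeated row is gone
print(cleaned.index.tolist())  # [0, 1, 3]: original index labels are kept
```

The surviving rows keep their original index labels, which is why the cleaned result is indexed 0, 1, 3 rather than 0, 1, 2.
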
Note: Passing inplace = True makes drop_duplicates() remove the duplicates from the original DataFrame and return None, instead of returning a new DataFrame.
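The following sketch, using hypothetical data of the same shape, shows what inplace=True does: the original DataFrame itself loses the duplicate row, and the call returns None.

```python
import pandas as pd

# Hypothetical example data: the row at index 2 duplicates the row at index 1.
df = pd.DataFrame({
    "Duration": [60, 45, 45, 60],
    "Pulse": [110, 117, 117, 103],
})

# With inplace=True the duplicates are removed from df itself,
# and the method returns None rather than a new DataFrame.
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # 3
```

A common mistake follows from this: writing df = df.drop_duplicates(inplace=True) replaces df with None. Use either the assignment form or inplace=True, not both.
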
