Pandas Tutorial · Cleaning Data
Removing Duplicates
- Duplicate rows are rows that have been recorded more than once.
- To remove duplicates, use the drop_duplicates() method.
Discovering Duplicates
Duplicate rows are rows that have been recorded more than once.
Looking at our test data set, we can assume that rows 11 and 12 are duplicates.
To discover duplicates, we can use the duplicated() method.
The duplicated() method returns a Boolean value for each row: True if the row repeats an earlier row, and False otherwise.
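As a minimal sketch of how duplicated() behaves, the example below uses a small made-up DataFrame rather than the tutorial's test data set. The column names and values are assumptions chosen only so that the last two rows are identical.

```python
import pandas as pd

# Hypothetical sample data (not the tutorial's data set).
# The last two rows are identical on purpose.
df = pd.DataFrame({
    "Duration": [60, 45, 60, 60],
    "Pulse": [110, 117, 103, 103],
})

# duplicated() returns a Boolean Series with one value per row.
# By default, the first occurrence is False and later repeats are True.
flags = df.duplicated()
print(flags)

# The Boolean Series can be used to show only the duplicated rows.
print(df[flags])
```

Only the final row is flagged here, because it is the second occurrence of the same values.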
Removing Duplicates
To remove duplicates, use the drop_duplicates() method.
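The following sketch shows drop_duplicates() used in two ways: returning a new DataFrame, and modifying the original with inplace=True. The DataFrame is again a made-up example, not the tutorial's test data set.

```python
import pandas as pd

# Hypothetical sample data (not the tutorial's data set).
df = pd.DataFrame({
    "Duration": [60, 45, 60, 60],
    "Pulse": [110, 117, 103, 103],
})

# Default behaviour: a new DataFrame is returned and df is left unchanged.
deduped = df.drop_duplicates()
print(len(df), len(deduped))

# With inplace=True, df itself is modified and the method returns None.
result = df.drop_duplicates(inplace=True)
print(result)
print(df)
```

The first form keeps the original data available for comparison, which can be convenient while exploring a data set.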
Note: Passing inplace=True to drop_duplicates() removes all duplicates
from the original DataFrame directly. In that case the method does NOT
return a new DataFrame (it returns None).