Pandas Tutorial · Cleaning Data

Cleaning Wrong Format


Data of Wrong Format

Cells with data in the wrong format can make it difficult, or even impossible, to analyze data.

To fix it, you have two options: remove the rows, or convert all cells in the columns into the same format.

Convert Into a Correct Format

In our DataFrame, two cells have the wrong format. Look at rows 22 and 26: every value in the 'Date' column should be a string that represents a date.

Let's try to convert all cells in the 'Date' column into dates.

Pandas provides the to_datetime() function for this.

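A minimal sketch of the conversion, not the tutorial's original listing. The three-row DataFrame is a hypothetical stand-in for the tutorial's data set, and the format="mixed" argument assumes pandas 2.0 or newer, where it asks to_datetime() to infer the format of each value separately:

```python
import pandas as pd

# Hypothetical sample standing in for the tutorial's data set:
# one well-formed date, one empty cell, one date without separators.
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Date": ["2020/12/25", None, "20201226"],
})

# format="mixed" (assumes pandas 2.0+) parses each value on its own,
# so differently formatted date strings can share one column.
df["Date"] = pd.to_datetime(df["Date"], format="mixed")

print(df.to_string())
```

In this sketch the empty cell becomes NaT, while the other two values become proper datetime values.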

After the conversion, the date in row 26 is fixed, but the empty date in row 22 becomes NaT (Not a Time), which is how pandas marks a missing datetime value. One way to deal with empty values is to remove the entire row.

Removing Rows

The conversion above produced a NaT value, which pandas treats as a missing (NULL) value, so we can remove that row with the dropna() method.

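A minimal sketch of removing the row, again using a hypothetical three-row DataFrame in place of the tutorial's data set and assuming pandas 2.0 or newer for format="mixed". The subset argument limits the check to the 'Date' column, so rows are dropped only when their date is missing:

```python
import pandas as pd

# Hypothetical sample standing in for the tutorial's data set.
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Date": ["2020/12/25", None, "20201226"],
})

# Convert first, so the empty cell becomes NaT (assumes pandas 2.0+).
df["Date"] = pd.to_datetime(df["Date"], format="mixed")

# Drop every row whose 'Date' is missing; other columns are not checked.
df = df.dropna(subset=["Date"])

print(df.to_string())
```

Assigning the result back to df keeps the change explicit. dropna() also accepts inplace=True if you prefer to modify the DataFrame directly.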
