
Excursus: Missing values

Data collected in real life are often messy and contain errors. A common problem is missing values, i.e. observations for which some characteristics were not recorded. Missing values are coded differently in each data set, but a handful of codes recur frequently.

If, for example, the mean of a statistical variable is calculated, you have to decide how to deal with missing values: Should the values be left out? Should the missing values be replaced by a specific value?
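The two options can be compared on a small example. This is only a sketch with made-up numbers; the Series and its values are assumptions for illustration and do not come from the data set used in this course.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; the third observation is missing.
values = pd.Series([2.0, 4.0, np.nan, 6.0])

# Option 1: leave the missing value out (the pandas default, skipna=True).
mean_skipped = values.mean()            # (2 + 4 + 6) / 3

# Option 2: replace the missing value with a fixed value, here 0.
mean_filled = values.fillna(0).mean()   # (2 + 4 + 0 + 6) / 4

# Without either decision, the result is itself missing.
mean_strict = values.mean(skipna=False)

print(mean_skipped, mean_filled, mean_strict)
```

The two strategies give different means, which is why the choice should be made deliberately.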

In DataFrames, missing values are indicated by the keyword NaN ("Not a Number"). When reading in data (see e.g. the function read_csv()), additional codes that should be treated as missing values can be specified with the argument na_values.

Case study

The library usage data set uses its own coding for missing values. This coding is not recognized when numeric columns are read in.

Although the affected column is numeric, it is stored as text, because the missing-value code is not recognized as a number. If you then try to calculate with it, for example a difference, you receive an error message, since text values cannot be subtracted.
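The problem can be reproduced with a few lines. This is a sketch under assumptions: the column names and the code "unknown" are hypothetical stand-ins, not the actual names or coding of the library usage data set.

```python
import io
import pandas as pd

# Hypothetical excerpt; "unknown" stands in for the data set's missing-value code.
csv_text = """patron_id,year_registered,last_active_year
1,2010,2016
2,2014,unknown
3,2003,2015
"""

df = pd.read_csv(io.StringIO(csv_text))

# The column holds years, but the stray text forces the whole column to text.
print(df.dtypes)

try:
    df["last_active_year"] - df["year_registered"]
    error_raised = False
except TypeError as err:
    # Text values cannot be subtracted from numbers.
    error_raised = True
    print("TypeError:", err)
```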

There are two ways to fix the problem. The first is to specify the coding for missing values already when reading in the data.
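A minimal sketch of the first way, assuming the data are read with read_csv() and its na_values argument. The column names and the code "unknown" are again hypothetical.

```python
import io
import pandas as pd

# Hypothetical excerpt; "unknown" stands in for the data set's missing-value code.
csv_text = """patron_id,year_registered,last_active_year
1,2010,2016
2,2014,unknown
3,2003,2015
"""

# Tell pandas which additional strings should be read as missing values.
df = pd.read_csv(io.StringIO(csv_text), na_values=["unknown"])

# The column is now numeric, and the calculation works.
years_active = df["last_active_year"] - df["year_registered"]
print(df.dtypes)
print(years_active)
```

The row with the missing value yields NaN in the result instead of an error.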

Or you explicitly convert the data type after reading in the data.
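One possible way to do the conversion is pd.to_numeric() with errors="coerce"; whether the course used exactly this call is an assumption. The column names and the code "unknown" are hypothetical.

```python
import io
import pandas as pd

# Hypothetical excerpt; "unknown" stands in for the data set's missing-value code.
csv_text = """patron_id,year_registered,last_active_year
1,2010,2016
2,2014,unknown
3,2003,2015
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to numbers; anything that cannot be parsed becomes NaN.
df["last_active_year"] = pd.to_numeric(df["last_active_year"], errors="coerce")

years_active = df["last_active_year"] - df["year_registered"]
print(years_active)
```

Note that errors="coerce" turns every unparsable entry into NaN, not only the known missing-value code, so typos in the data are silently treated as missing as well.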

Data types (1 minute)

What is the difference between value and value? What the value of the value? What is the value of the value? Is and the same?

Handling of missing values

For DataFrames and Series, pandas offers the useful functions isna(), dropna() and fillna() to identify missing values, to remove them or to replace them with other values.

Filter

The function isna() (alias isnull()) returns a Boolean for every position, which is True if there is a missing value at that position. In order for pandas to recognize missing values correctly, they must first be converted into the internal format NaN (see above).
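A short sketch of filtering with such a Boolean result, using a small made-up DataFrame (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({
    "patron_id": [1, 2, 3],
    "last_active_year": [2016.0, np.nan, 2015.0],
})

# True wherever the value is missing.
mask = df["last_active_year"].isna()

# Use the mask to select rows with, or without, a missing value.
rows_missing = df[mask]
rows_complete = df[~mask]

print(mask)
print(rows_missing)
```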

The useful command isna().sum() is a quick way to get the number of missing values in each column.
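Counting the missing values per column might look like this; the DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in two of three columns.
df = pd.DataFrame({
    "patron_id": [1, 2, 3],
    "year_registered": [2010, np.nan, 2003],
    "last_active_year": [np.nan, np.nan, 2015],
})

# isna() gives a DataFrame of Booleans; sum() adds them up column by column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```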

This works because Python implicitly converts a Boolean value to a number if necessary. True is converted to 1 and False to 0.
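The implicit conversion can be checked directly, both in plain Python and in pandas:

```python
import pandas as pd

# In arithmetic, True counts as 1 and False as 0.
two = True + True
count = sum([True, False, True])

# The same holds for a Series of Booleans.
flags = pd.Series([True, False, True, True])
n_true = flags.sum()        # number of True values
share_true = flags.mean()   # proportion of True values

print(two, count, n_true, share_true)
```

The mean of a Boolean Series is therefore a convenient way to get the proportion of missing values.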

Remove
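A minimal sketch of removing missing values, assuming dropna() is the function meant in this section. The DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns "a" and "b" contain missing values, "c" does not.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7, 8, 9],
})

# Drop every row that contains at least one missing value.
complete_rows = df.dropna()

# Drop rows only if the value in column "a" is missing.
rows_with_a = df.dropna(subset=["a"])

# Drop every column that contains at least one missing value.
complete_columns = df.dropna(axis="columns")

print(complete_rows)
```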

Replace
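A minimal sketch of replacing missing values, assuming fillna() is the function meant in this section. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing values.
s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Replace missing values with a fixed value.
filled_zero = s.fillna(0)

# Replace missing values with the mean of the observed values.
filled_mean = s.fillna(s.mean())

# Carry the last observed value forward.
filled_forward = s.ffill()

print(filled_zero.tolist(), filled_mean.tolist(), filled_forward.tolist())
```

Which replacement is sensible depends on the data; a fixed value such as 0 can distort later statistics.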

By default, operations such as dropna() or fillna() return new DataFrames or Series. The original variable remains untouched. With the argument inplace=True, the original objects are overwritten directly.
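The difference can be seen in a short sketch, here with dropna() and the argument inplace=True on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Default: a new DataFrame is returned, the original keeps all rows.
cleaned = df.dropna()
rows_before = len(df)
rows_cleaned = len(cleaned)

# With inplace=True the original object itself is changed.
df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_cleaned, rows_after)
```

Assigning the result back to the same name, as in df = df.dropna(), has the same effect and is often easier to read.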

Excursus: Missing values (20 min)

  • Which columns contain missing values?
  • Read the data set and create a DataFrame that no longer contains any observations with missing values.
  • Save this under the name.
  • How many observations were removed?