Convert Pandas Categorical Data For SciKit-Learn

As you work with different datasets, you will come across categorical data. Some people simply discard this data in their analysis or leave it out of their models. That is certainly an option, but categorical data often carries information that we want our models to use. If you are interested in a great resource for this you should check this out by Aurelien Geron.

Examples of values which may be represented in a categorical way:

  • Political party: Democratic, Republican, Independent
  • Religious affiliation: Christianity, Hinduism, Buddhism
  • Retail departments: shoes, apparel, home goods
  • Property styles: Bungalow, Bi-level, 2-story

While several algorithms can handle categorical and numerical values automatically with virtually no pre-processing, other algorithms require your categorical data to be converted into numerical values first.
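As a minimal sketch of what such a conversion can look like, the example below uses two common pandas approaches: integer codes via the `category` dtype and one-hot columns via `pd.get_dummies`. The DataFrame and its column names (`party`, `department`, `party_code`) are made up for illustration and are not taken from any particular dataset.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "party": ["Democratic", "Republican", "Independent", "Democratic"],
    "department": ["shoes", "apparel", "home goods", "shoes"],
})

# Option 1: integer codes. Converting to the "category" dtype sorts the
# distinct values and assigns each one an integer code.
df["party_code"] = df["party"].astype("category").cat.codes

# Option 2: one-hot encoding. Each distinct value becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["department"], dtype=int)

print(df[["party", "party_code"]])
print(encoded.filter(like="department_"))
```

Integer codes are compact, but they imply an ordering that may not exist in the data, so one-hot columns are often the safer choice for unordered categories. scikit-learn also offers `OrdinalEncoder` and `OneHotEncoder` in `sklearn.preprocessing`, which perform similar transformations and can be used inside a pipeline.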

If you want to better understand the kinds of data, take a look at Ian’s video below:

Further Reading

This section provides more resources on this topic if you are looking to go deeper.

Books