Convert Pandas Categorical Data For SciKit-Learn

As you work with different data sets, you will come across categorical data. Some people simply discard this data in their analysis or leave it out of their models. That is certainly an option; however, categorical data often represents information that we would want to bring into our analysis and models.

Examples of values which may be represented in a categorical way:

  • Political party: Democratic, Republican, Independent
  • Religious affiliation: Christianity, Hinduism, Buddhism
  • Retail departments: shoes, apparel, home goods
  • Property styles: Bungalow, Bi-level, 2-story

While several algorithms can automatically handle categorical and numerical values with virtually no pre-processing, other algorithms require your categorical data to be converted to numerical values.
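
To make the idea of converting categories to numbers concrete, here is a minimal sketch using pandas alone. The data frame, its column names (`party`, `sq_ft`), and its values are made up for illustration. It shows two common conversions, integer codes and one-hot (dummy) columns; which one suits a given model is a judgment call.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "party": ["Democratic", "Republican", "Independent", "Republican"],
    "sq_ft": [1200, 1850, 950, 2100],
})

# Option 1: integer codes. Converting to the "category" dtype assigns each
# distinct label an integer; for strings the categories are sorted alphabetically.
df["party"] = df["party"].astype("category")
party_codes = df["party"].cat.codes

# Option 2: one-hot (dummy) columns, one indicator column per label.
party_dummies = pd.get_dummies(df["party"], prefix="party")

print(party_codes.tolist())         # [0, 2, 1, 2]
print(list(party_dummies.columns))  # ['party_Democratic', 'party_Independent', 'party_Republican']
```

Integer codes imply an ordering (here Democratic < Independent < Republican) that does not really exist for these labels, so one-hot columns are often the safer choice for values with no natural order.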

If you want to better understand the different kinds of data, take a look at Ian’s video below: