In this context, outliers are data observations that are distant from other observations. There are a number of reasons why…
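As a rough illustration of "distant from other observations", the sketch below flags values that fall more than 1.5 interquartile ranges outside the quartiles. The IQR rule, the helper name `iqr_outliers`, and the sample values are assumptions chosen for illustration; the excerpt does not say which detection method the full article uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: six similar readings and one distant one.
sample = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(sample)
print(outliers)
```

The multiplier `k` controls how far a value must sit from the bulk of the data before it is flagged; 1.5 is a common convention rather than a fixed rule.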

As you encounter various data elements, you will come across categorical data. Some individuals simply discard this data in their analysis…

Given enough model building, most data scientists run into a sparse matrix. Effectively, this is simply when most of the…
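To make the idea concrete, here is a minimal sketch that stores a mostly-zero array in SciPy's compressed sparse row (CSR) format and measures how sparse it is. The example array and the variable names are assumptions for illustration only.

```python
import numpy as np
from scipy import sparse

# A small, hypothetical matrix in which most entries are zero.
dense = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
])

# CSR stores only the non-zero values plus their positions.
csr = sparse.csr_matrix(dense)

# Sparsity: the fraction of entries that are zero.
sparsity = 1.0 - csr.nnz / (dense.shape[0] * dense.shape[1])
print(csr.nnz, round(sparsity, 3))
```

For a matrix this small the savings are negligible; the format pays off when the matrix has many rows and columns and only a small share of non-zero entries.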

When working with one-dimensional arrays we use the term vector, and matrix is the term we use for the concept…

Generating simulated data for regression is a helpful technique to know, since you may want to create a model when…
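One way to do this is with scikit-learn's `make_regression`, sketched below. The sample size, feature count, noise level, and seed are arbitrary assumptions, and the excerpt does not confirm that the full article uses this function. Because the data is simulated, the true coefficients are known and can be compared with what a fitted model recovers.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Simulate a linear relationship with Gaussian noise.
# coef=True also returns the coefficients used to generate y.
X, y, true_coef = make_regression(
    n_samples=200,
    n_features=3,
    n_informative=3,
    noise=5.0,
    coef=True,
    random_state=0,
)

# Fit a model and compare its estimates with the known coefficients.
model = LinearRegression().fit(X, y)
print(true_coef)
print(model.coef_)
```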

We will use Gaussian blobs from scikit-learn to generate simulated clusters.
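A minimal sketch of generating such clusters with `make_blobs` follows. The number of samples, centers, and features, as well as the seed, are assumed values for illustration and are not taken from the original code.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Draw 300 two-dimensional points around 3 randomly placed centers.
X, labels = make_blobs(
    n_samples=300,
    centers=3,
    n_features=2,
    cluster_std=1.0,
    random_state=42,
)

# With an integer n_samples, points are split evenly across the centers.
counts = np.bincount(labels)
print(X.shape, counts)
```

Raising `cluster_std` makes the blobs overlap more, which is useful when you want a harder clustering problem.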

This is a classic dataset for regression models.

This creates clusters that are normally distributed and assigns an equal number of clusters to each class you choose. Interdependence…
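The description above matches the behavior of scikit-learn's `make_classification`, so the sketch below assumes that is the function being discussed. All parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification

# Two classes, each built from two normally distributed clusters.
# n_redundant features are linear combinations of the informative ones,
# which introduces interdependence between features.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=4,
    n_redundant=2,
    n_classes=2,
    n_clusters_per_class=2,
    random_state=0,
)

print(X.shape, sorted(set(y.tolist())))
```

Note that `n_classes * n_clusters_per_class` must not exceed `2 ** n_informative`, so very small values of `n_informative` limit how many clusters can be requested.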

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices…
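The transformer described here is scikit-learn's `DictVectorizer`. The short sketch below shows it on a few hypothetical records; the field names and values are invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records mixing a string-valued and a numeric feature.
records = [
    {"city": "Dublin", "temperature": 12.0},
    {"city": "London", "temperature": 18.0},
    {"city": "Dublin", "temperature": 15.0},
]

# sparse=False returns a dense NumPy array; the default (sparse=True)
# returns a scipy.sparse matrix instead.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# String values are one-hot encoded as "name=value" columns,
# while numeric values are passed through unchanged.
print(vec.feature_names_)
print(X)
```

Keeping the default sparse output is usually preferable when string-valued features take many distinct values, since most of the resulting one-hot columns are zero for any given record.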