## Detecting Outliers

In this context, outliers are data observations that are distant from other observations. There are a number of reasons why variability may exist in the data th...
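As a rough illustration of "distant from other observations", here is a minimal sketch that flags values lying more than 1.5 interquartile ranges outside the middle half of the data. The excerpt does not say which method the full post uses, so the IQR rule, the 1.5 multiplier, and the sample values are all assumptions made for this example.

```python
import numpy as np

# Hypothetical sample: six ordinary readings and one distant value.
data = np.array([10, 12, 11, 13, 12, 11, 95])

# Quartiles and interquartile range of the sample.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Assumed rule of thumb: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = data[(data < lower_bound) | (data > upper_bound)]
print(outliers)
```

The multiplier is a convention rather than a law; a larger value flags fewer points.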

# Category: Data Science

## Convert Pandas Categorical Data For SciKit-Learn

## Create a Sparse Matrix

## Transpose a Vector or a Matrix

## Make Simulated Data for Regression

## Make Simulated Data for Clustering

## Load Boston Housing Data SciKit-Learn

## Creating Simulated Data for Classification

## Loading Features from Dictionaries

As you work with different datasets, you will come across categorical data. Some individuals simply discard this data in their analysis or do not bring it i...
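One common way to keep categorical data in an analysis is to one-hot encode it. The sketch below is not necessarily the approach the full post takes; the `color` column and its values are made up for illustration, and `OneHotEncoder` is just one of several encoders scikit-learn offers.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical DataFrame with a single categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Fit the encoder and convert its sparse output to a dense array.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["color"]]).toarray()

# Categories are stored in sorted order, one output column per category.
categories = encoder.categories_[0].tolist()
print(categories)
print(encoded)
```

Each row of `encoded` has a single 1 in the column matching that row's category.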

Given enough model building, most data scientists run into a sparse matrix, which is simply a matrix in which most of the elements are zero. As you will see i...
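A minimal sketch of building a sparse matrix with SciPy, assuming the compressed sparse row (CSR) format; the small dense array is made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical dense matrix in which most elements are zero.
dense = np.array([[0, 0, 0],
                  [0, 1, 0],
                  [3, 0, 0]])

# CSR stores only the non-zero values and their positions.
matrix_sparse = sparse.csr_matrix(dense)

print(matrix_sparse)      # non-zero entries with their (row, column) indices
print(matrix_sparse.nnz)  # number of stored non-zero elements
```

Converting back with `matrix_sparse.toarray()` recovers the original dense array.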

When working with one-dimensional arrays we use the term vector, and matrix is the term we use for arrays of more than one dimens...
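A short NumPy sketch of transposing, with made-up values. Note that transposing a one-dimensional array leaves it unchanged, so the example also shows a row vector stored as a two-dimensional array.

```python
import numpy as np

# A one-dimensional vector and a 2 x 3 matrix (illustrative values).
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Transposing the matrix swaps rows and columns: 2 x 3 becomes 3 x 2.
matrix_t = matrix.T

# A 1-D array has no second axis, so .T returns the same shape.
vector_t = vector.T

# To get a column vector, start from a 2-D row vector instead.
row_vector = np.array([[1, 2, 3]])
column_vector = row_vector.T

print(matrix_t.shape, vector_t.shape, column_vector.shape)
```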

Generating simulated data for regression is a helpful technique to know, since you may want to build a model when real data is difficult to obtain. ...
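A minimal sketch using scikit-learn's `make_regression`. The sample count, feature count, noise level, and seed below are arbitrary choices for illustration, not values from the original post.

```python
from sklearn.datasets import make_regression

# Generate a feature matrix, a target vector, and the true coefficients.
features, target, coefficients = make_regression(
    n_samples=100,     # number of observations (arbitrary)
    n_features=3,      # total number of features
    n_informative=3,   # features that actually drive the target
    noise=0.5,         # standard deviation of Gaussian noise on the target
    coef=True,         # also return the underlying coefficients
    random_state=1,    # fixed seed for reproducibility
)

print(features.shape, target.shape, coefficients.shape)
```

Returning the coefficients is handy because a fitted model can then be checked against the known ground truth.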

We will be using Gaussian blobs from SciKit-Learn to generate simulated clusters. View the code on Gist.
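The Gist itself is not reproduced here, so the following is only a sketch of how `make_blobs` is typically called; the number of samples, centers, and the cluster spread are assumed values.

```python
from sklearn.datasets import make_blobs

# Generate points scattered around three Gaussian cluster centers.
features, labels = make_blobs(
    n_samples=150,    # total number of points (arbitrary)
    n_features=2,     # dimensionality of each point
    centers=3,        # number of clusters to generate
    cluster_std=0.5,  # spread of each cluster
    shuffle=True,     # shuffle the samples
    random_state=1,   # fixed seed for reproducibility
)

print(features.shape)
print(sorted(set(labels.tolist())))
```

`labels` records which center each point was drawn around, which is useful for checking a clustering algorithm's output.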

This is a classic dataset for regression models. View the code on Gist.

This creates clusters of normally distributed points and assigns an equal number of clusters to each class you choose. Interdependence between features and var...
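A minimal sketch of generating classification data with scikit-learn's `make_classification`. The parameter values are illustrative assumptions; the full post may use different ones.

```python
from sklearn.datasets import make_classification

# Generate a feature matrix and a vector of class labels.
features, target = make_classification(
    n_samples=100,    # number of observations (arbitrary)
    n_features=3,     # total number of features
    n_informative=3,  # features that carry class information
    n_redundant=0,    # no linear combinations of informative features
    n_classes=2,      # binary classification problem
    random_state=1,   # fixed seed for reproducibility
)

print(features.shape)
print(sorted(set(target.tolist())))
```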

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-l...
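The transformer described above is scikit-learn's `DictVectorizer`. Here is a small sketch with made-up dictionaries; `sparse=False` is chosen only so the result prints as a dense array.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical observations, each a mapping of feature name to value.
data = [
    {"Red": 2, "Blue": 4},
    {"Red": 4, "Blue": 3},
    {"Red": 1, "Yellow": 2},
    {"Red": 2, "Yellow": 2},
]

# sparse=False returns a dense NumPy array instead of a scipy.sparse matrix.
vectorizer = DictVectorizer(sparse=False)
features = vectorizer.fit_transform(data)

# Column order follows the sorted feature names.
feature_names = vectorizer.feature_names_
print(feature_names)
print(features)
```

Keys missing from a dictionary become zeros in that row, which is why the default output is a sparse matrix.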