Technical Notes

Artificial Intelligence & Machine Learning

People sometimes find it difficult to convert their ideas into working code, and this is especially true in data science. These technical notes are a resource for quickly becoming acquainted with specific concepts, bringing you closer to the solution you want to build. If you have an interest in statistics, data science, or scientific programming, and above all a passion for learning, these resources are for you.

 

Machine Learning

Vectors, Matrices, And Arrays

  • Transpose A Vector Or A Matrix
  • Selecting Elements In An Array
  • Reshape An Array
  • Invert A Matrix
  • Getting The Diagonal Of A Matrix
  • Flatten A Matrix
  • Find The Rank Of A Matrix
  • Find The Maximum And Minimum
  • Describe An Array
  • Create A Vector
  • Create A Sparse Matrix
  • Create A Matrix
  • Converting A Dictionary Into A Matrix
  • Calculate The Trace Of A Matrix
  • Calculate The Determinant Of A Matrix
  • Calculate The Average, Variance, And Standard Deviation
  • Calculate Dot Product Of Two Vectors
  • Apply Operations To Elements
  • Adding And Subtracting Matrices

Basics

Preprocessing Structured Data

  • Convert Pandas Categorical Data For Scikit-Learn
  • Delete Observations With Missing Values
  • Deleting Missing Values
  • Detecting Outliers
  • Discretize Features
  • Encoding Ordinal Categorical Features
  • Handling Imbalanced Classes With Downsampling
  • Handling Imbalanced Classes With Upsampling
  • Handling Outliers
  • Impute Missing Values With Means
  • Imputing Missing Class Labels
  • Imputing Missing Class Labels Using k-Nearest Neighbors
  • Normalizing Observations
  • One-Hot Encode Features With Multiple Labels
  • One-Hot Encode Nominal Categorical Features
  • Preprocessing Categorical Features
  • Preprocessing Iris Data
  • Rescale A Feature
  • Standardize A Feature

Preprocessing Images

  • Binarize Images
  • Blurring Images
  • Cropping Images
  • Detect Edges
  • Enhance Contrast Of Color Image
  • Enhance Contrast Of Greyscale Image
  • Harris Corner Detector
  • Installing OpenCV
  • Isolate Colors
  • Load Images
  • Remove Backgrounds
  • Save Images
  • Sharpen Images
  • Shi-Tomasi Corner Detector

Preprocessing Text

  • Bag Of Words
  • Parse HTML
  • Remove Punctuation
  • Remove Stop Words
  • Replace Characters
  • Stemming Words
  • Strip Whitespace
  • Tag Parts Of Speech
  • Term Frequency Inverse Document Frequency
  • Tokenize Text

Preprocessing Dates And Times

  • Break Up Dates And Times Into Multiple Features
  • Calculate Difference Between Dates And Times
  • Convert Strings To Dates
  • Convert pandas Columns Time Zone
  • Encode Days Of The Week
  • Handling Missing Values In Time Series
  • Handling Time Zones
  • Lag A Time Feature
  • Rolling Time Window
  • Select Date And Time Ranges

Feature Engineering

  • Dimensionality Reduction On Sparse Feature Matrix
  • Dimensionality Reduction With Kernel PCA
  • Dimensionality Reduction With PCA
  • Feature Extraction With PCA
  • Group Observations Using K-Means Clustering
  • Selecting The Best Number Of Components For LDA
  • Selecting The Best Number Of Components For TSVD
  • Using Linear Discriminant Analysis For Dimensionality Reduction

Feature Selection

  • ANOVA F-value For Feature Selection
  • Chi-Squared For Feature Selection
  • Drop Highly Correlated Features
  • Recursive Feature Elimination
  • Variance Thresholding Binary Features
  • Variance Thresholding For Feature

Model Evaluation

  • Accuracy
  • Create Baseline Classification Model
  • Create Baseline Regression Model
  • Cross Validation Pipeline
  • Cross Validation With Parameter Tuning Using Grid Search
  • Cross-Validation
  • Custom Performance Metric
  • F1 Score
  • Generate Text Reports On Performance
  • Nested Cross Validation
  • Plot The Learning Curve
  • Plot The Receiver Operating Characteristic Curve
  • Plot The Validation Curve
  • Precision
  • Recall
  • Split Data Into Training And Test Sets

Model Selection

  • Find Best Preprocessing Steps During Model Selection
  • Hyperparameter Tuning Using Grid Search
  • Hyperparameter Tuning Using Random Search
  • Model Selection Using Grid Search
  • Pipelines With Parameter Optimization

Linear Regression

  • Adding Interaction Terms
  • Create Interaction Features
  • Effect Of Alpha On Lasso Regression
  • Lasso Regression
  • Linear Regression
  • Linear Regression Using Scikit-Learn
  • Ridge Regression
  • Selecting The Best Alpha Value In Ridge Regression

Logistic Regression

  • Fast C Hyperparameter Tuning
  • Handling Imbalanced Classes In Logistic Regression
  • Logistic Regression
  • Logistic Regression On Very Large Data
  • Logistic Regression With L1 Regularization
  • One Vs. Rest Logistic Regression

Trees And Forests

  • Adaboost Classifier
  • Decision Tree Classifier
  • Decision Tree Regression
  • Feature Importance
  • Feature Selection Using Random Forest
  • Handle Imbalanced Classes In Random Forest
  • Random Forest Classifier
  • Random Forest Classifier Example
  • Random Forest Regression
  • Select Important Features In Random Forest
  • Titanic Competition With Random Forest
  • Visualize A Decision Tree

Nearest Neighbors

  • Identifying Best Value Of k
  • K-Nearest Neighbors Classification
  • Radius-Based Nearest Neighbor Classifier

Support Vector Machines

  • Calibrate Predicted Probabilities In SVC
  • Find Nearest Neighbors
  • Find Support Vectors
  • Imbalanced Classes In SVM
  • Plot The Support Vector Classifier’s Hyperplane
  • SVC Parameters When Using RBF Kernel
  • Support Vector Classifier

Naive Bayes

  • Bernoulli Naive Bayes Classifier
  • Calibrate Predicted Probabilities
  • Gaussian Naive Bayes Classifier
  • Multinomial Logistic Regression
  • Multinomial Naive Bayes Classifier
  • Naive Bayes Classifier From Scratch

Clustering

  • Agglomerative Clustering
  • DBSCAN Clustering
  • Evaluating Clustering
  • Meanshift Clustering
  • Mini-Batch k-Means Clustering
  • k-Means Clustering

Deep Learning

Keras

  • Adding Dropout
  • Convolutional Neural Network
  • Feedforward Neural Network For Binary Classification
  • Feedforward Neural Network For Multiclass Classification
  • Feedforward Neural Networks For Regression
  • LSTM Recurrent Neural Network
  • Neural Network Early Stopping
  • Neural Network Weight Regularization
  • Preprocessing Data For Neural Networks
  • Save Model Training Progress
  • Tuning Neural Network Hyperparameters
  • Visualize Loss History
  • Visualize Neural Network Architecture
  • Visualize Performance History
  • k-Fold Cross-Validating Neural Networks

Python

Basics

  • Add Padding Around String
  • All Combinations For A List 
  • Apply Operations Over Items In A List
  • Applying Functions To List Items
  • Arithmetic Essentials
  • Assignment Operators
  • Basic Operations With NumPy Array
  • Breaking Up String Variables
  • Brute Force D20 Roll Simulator
  • Cartesian Product
  • Chain Together Lists
  • Cleaning Text
  • Compare Two Dictionaries
  • Concurrent Processing
  • Continue And Break Loops
  • Convert HTML Characters To Strings
  • Converting Strings To Datetime
  • Create A New File Then Write To It
  • Create A Temporary File
  • Data Structure Basics
  • Date And Time Basics
  • Dictionary Basics
  • Display Scientific Notation As Floats
  • Exiting A Loop
  • Find The Max Value In A Dictionary
  • Flatten Lists Of Lists
  • For Loop
  • Formatting Numbers
  • Function Annotation Examples
  • Function Basics
  • Functions Vs. Generators
  • Generating Random Numbers With NumPy
  • Generator Expressions
  • Hard Wrapping Text
  • How To Use Default Dicts
  • If Else On Any Or All Elements
  • Indexing And Slicing NumPy Arrays
  • Iterate An Ifelse Over A List
  • Iterate Over Multiple Lists Simultaneously
  • Iterating Over Dictionary Keys
  • Lambda Functions
  • Logical Operations
  • Looping Over Two Lists
  • Mathematical Operations
  • Mocking Functions
  • Nested For Loops Using List Comprehension
  • Nesting Lists
  • Numpy Array Basics
  • Parallel Processing
  • Partial Function Applications
  • Priority Queues
  • Queues And Stacks
  • Recursive Functions
  • Scheduling Jobs In The Future
  • Select Random Element From A List
  • Selecting Items In A List With Filters
  • Set The Color Of A Matplotlib Plot
  • Sort A List Of Names By Last Name
  • Sort A List Of Strings By Length
  • Store API Credentials For Open Source Projects
  • String Formatting
  • String Indexing
  • String Operations
  • Swapping Variable Values
  • Try, Except, and Finally
  • Unpacking A Tuple
  • Unpacking Function Arguments
  • Using Named Tuples To Store Data
  • any(), all(), max(), min(), sum()
  • if and if else
  • repr vs. str
  • while Statement

Data Wrangling

  • Apply Functions By Group In Pandas
  • Apply Operations To Groups In Pandas
  • Applying Operations Over pandas Dataframes
  • Assign A New Column To A Pandas DataFrame
  • Break A List Into N-Sized Chunks
  • Breaking Up A String Into Columns Using Regex In pandas
  • Columns Shared By Two Data Frames
  • Construct A Dictionary From Multiple Lists
  • Convert A CSV Into Python Code To Recreate It
  • Convert A Categorical Variable Into Dummy Variables
  • Convert A String Categorical Variable To A Numeric Variable
  • Convert A Variable To A Time Variable In pandas
  • Count Values In Pandas Dataframe
  • Create A Pipeline In Pandas
  • Create A pandas Column With A For Loop
  • Create Counts Of Items
  • Create a Column Based on a Conditional in pandas
  • Creating Lists From Dictionary Keys And Values
  • Crosstabs In pandas
  • Delete Duplicates In pandas
  • Descriptive Statistics For pandas Dataframe
  • Dropping Rows And Columns In pandas Dataframe
  • Enumerate A List
  • Expand Cells Containing Lists Into Their Own Variables In Pandas
  • Filter pandas Dataframes
  • Find Largest Value In A Dataframe Column
  • Find Unique Values In Pandas Dataframes
  • Geocoding And Reverse Geocoding
  • Geolocate A City And Country
  • Geolocate A City Or Country
  • Group A Time Series With pandas
  • Group Data By Time
  • Group Pandas Data By Hour Of The Day
  • Grouping Rows In pandas
  • Hierarchical Data In pandas
  • Join And Merge Pandas Dataframe
  • List Unique Values In A pandas Column
  • Load A JSON File Into Pandas
  • Load An Excel File Into Pandas
  • Load Excel Spreadsheet As pandas Dataframe
  • Loading A CSV Into pandas
  • Long To Wide Format
  • Lower Case Column Names In Pandas Dataframe
  • Make New Columns Using Functions
  • Map External Values To Dataframe Values in pandas
  • Missing Data In pandas Dataframes
  • Moving Averages In pandas
  • Normalize A Column In pandas
  • Pivot Tables In pandas
  • Quickly Change A Column Of Strings In Pandas
  • Random Sampling Dataframe
  • Ranking Rows Of Pandas Dataframes
  • Regular Expression Basics
  • Regular Expression By Example
  • Reindexing pandas Series And Dataframes
  • Rename Column Headers In pandas
  • Rename Multiple pandas Dataframe Column Names
  • Replacing Values In pandas
  • Saving A pandas Dataframe As A CSV
  • Search A pandas Column For A Value
  • Select Rows When Columns Contain Certain Values
  • Select Rows With A Certain Value
  • Select Rows With Multiple Filters
  • Selecting pandas DataFrame Rows Based On Conditions
  • Simple Example Dataframes In pandas
  • Sorting Rows In pandas Dataframes
  • Split Lat/Long Coordinate Variables Into Separate Variables
  • Streaming Data Pipeline
  • String Munging In Dataframe
  • Using List Comprehensions With pandas
  • Using Seaborn To Visualize A pandas Dataframe
  • pandas Data Structures
  • pandas Time Series Basics

Data Visualization

  • Back To Back Bar Plot In MatPlotLib
  • Bar Plot In MatPlotLib
  • Color Palettes in Seaborn
  • Creating A Time Series Plot With Seaborn And pandas
  • Creating Scatterplots With Seaborn
  • Group Bar Plot In MatPlotLib
  • Histograms In MatPlotLib
  • Making A Matplotlib Scatterplot From A Pandas Dataframe
  • Matplotlib, A Simple Example
  • Pie Chart In MatPlotLib
  • Scatterplot In MatPlotLib
  • Stacked Percentage Bar Plot In MatPlotLib

Web Scraping

  • Beautiful Soup Basic HTML Scraping
  • Drilling Down With Beautiful Soup
  • Monitor A Website For Changes With Python

Testing

  • Simple Unit Test
  • Test Code Speed
  • Test For A Specific Exception
  • Test If Output Is Close To A Value
  • Testable Documentation

Other

  • Generate Tweets Using Markov Chains
  • Mine Twitter’s Stream For Hashtags Or Words
  • Simple Clustering With SciPy
  • What Is The Probability An Economy Class Seat Is An Aisle Seat?

Statistics

Frequentist

  • Bessel’s Correction
  • Demonstrate The Central Limit Theorem
  • Pearson’s Correlation Coefficient
  • Probability Mass Functions
  • Spearman’s Rank Correlation
  • T-Tests
  • Variance And Standard Deviation

Regular Expressions

  • Match A Symbol
  • Match A Unicode Character
  • Match A Word
  • Match Any Character
  • Match Any Of A List Of Characters
  • Match Any Of A Series Of Options
  • Match Any Of A Series Of Words
  • Match Dates
  • Match Email Addresses
  • Match Exact Text
  • Match Integers Of Any Length
  • Match Text Between HTML Tags
  • Match Times
  • Match URLs
  • Match US Phone Numbers
  • Match US and UK Spellings
  • Match Words With A Certain Ending
  • Match ZIP Codes

Mathematics

  • argmin and argmax

Software Engineering

Algorithms

  • Big-O Notation
  • Binary Search
  • Bubble Sort
  • Insertion Sort
  • Selection Sort

Cloud Computing

  • GitHub Cheatsheet
  • Run Project Jupyter Notebooks On Amazon EC2