Dimensionality Reduction

Motivation of dimensionality reduction, Principal Component Analysis (PCA), and applying PCA.

1. Motivation

I would like to give full credits to the respective authors as these are my personal python notebooks taken from deep learning courses from Andrew Ng, Data School and Udemy :) This is a simple python notebook hosted generously through Github Pages that is on my main personal notes repository on https://github.com/ritchieng/ritchieng.github.io. They are meant for my personal review but I have open-source my repository of personal notes as a lot of people found it useful.

1a. Motivation I: Data Compression

You are able to reduce the dimension of the data from 2D to 1D
- For example, pilot skill and pilot happiness can be reduced to pilot’s aptitude
- Generally, you can reduce x1 and x2 to z1
Your are able to reduce the dimension of the data from 3D to 2D
- Project the data such that they lie on a plane
- Specify two axes
  - z1
  - z2
- You would then be able to reduce the data’s dimension from 3D to 2D

1b. Motivation II: Visualization

Given a set of data, how are able to examine the data such as this?
We can use reduce the data’s dimensionality from 50D to 2D
- Typically we do not know what the 2 dimensions’ meanings are
- But we can make sense of out of the 2 dimensions

2. Principal Component Analysis (PCA)

2a. PCA Problem Formation

Let’s say we have the following 2D data
1. We can project with a diagonal line (red line)
  - PCA reduces the blue lines (the projection error)
    - Before performing PCA, perform mean normalization (mean = 0) and feature scaling
2. We can also project with another diagonal line (magenta)
  - But the projection errors are much larger
  - Hence PCA would choose the red line instead of this magenta line
Goal of PCA
- It’s trying to find a lower dimensional surface onto which to project the data, so as to minimize this squared projection error
- To minimize the square distance between each point and the location of where it gets projected.
PCA is not linear regression
- PCA is a minimization of the orthogonal distance

2b. Principal Component Analysis Algorithm

Data pre-processing step
- You must always do this before doing PCA
PCA intuition
- You need to compute the vector or vectors
  - Left graph: compute vector z(1)
  - Right graph: compute vector z(1) and z(2)
Procedure
- You can use eig (eigen) or svd (singular value decomposition) but the later is more stable
  - You can use any library in other languages that does singular value decomposition
  - You will get 3 matrices: U, S and V
  - But we only need matrix U where we manipulate to get z that is a k x 1 vector
Summary of PCA algorithm in octave

3. Applying PCA

3a. Reconstruction from Compressed Representation

We can go from lower dimensionality to higher dimensionality

3b. Choosing the Number of Principal Components

k is the number of principal components
- But how do we choose k?
There is a more efficient method on the right compared to the left
- We then use the S matrix for calculations
You would realise that PCA can retain a high percentage of the variance even after compressing the number of dimensions of the data

3c. Advice for Applying PCA

Supervised learning
- For many data sets, we can reduce by 5-10x easily to ensure our learning algorithm runs much faster
Application of PCA
1. Compression
  - Reduce memory or disk needed to store data
  - Speed up learning algorithm
    - We choose k by percentage of variance retained
2. Visualization
  - We choose only k = 2 or k = 3
Bad uses of PCA
1. To prevent over-fitting
  - Regularization is better because it is less likely to throw away valuable information as it knows the labels
2. Running PCA without consideration

Tags: