Motivation for dimensionality reduction, Principal Component Analysis (PCA), and applying PCA.

## 1. Motivation

I would like to give full credit to the respective authors, as these are my personal Python notebooks drawn from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages from my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are meant for my personal review, but I have open-sourced the repository as a lot of people found it useful.

### 1a. Motivation I: Data Compression

• You are able to reduce the dimension of the data from 2D to 1D
    • For example, pilot skill and pilot happiness can be reduced to a single measure of the pilot’s aptitude
    • Generally, you can reduce x1 and x2 to z1
• You are able to reduce the dimension of the data from 3D to 2D
    • Project the data such that they lie on a plane
    • Specify two axes, z1 and z2
    • You would then be able to represent every point with two coordinates, reducing the data’s dimension from 3D to 2D

### 1b. Motivation II: Visualization

• Given a set of data, how are we able to examine data such as this?
• We can reduce the data’s dimensionality from 50D to 2D
• Typically we do not know what the 2 dimensions mean
• But we can often make sense of the 2 dimensions after plotting the data

## 2. Principal Component Analysis (PCA)

### 2a. PCA Problem Formation

• Let’s say we have the following 2D data
1. We can project the data onto a diagonal line (red line)
    • PCA minimizes the lengths of the blue lines (the projection errors)
    • Before performing PCA, perform mean normalization (so each feature has mean = 0) and feature scaling
2. We can also project onto another diagonal line (magenta)
    • But the projection errors are much larger
    • Hence PCA would choose the red line instead of this magenta line
• Goal of PCA
    • It is trying to find a lower-dimensional surface onto which to project the data, so as to minimize this squared projection error
    • That is, to minimize the squared distance between each point and the location where it gets projected
• PCA is not linear regression
    • Linear regression minimizes the squared vertical distance between each point and the line, whereas PCA minimizes the orthogonal distance to the projection surface (see the sketch below)
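
To make the distinction concrete, here is a minimal NumPy sketch on synthetic data (all variable names here are my own, purely for illustration): least squares picks the slope that minimizes squared vertical distances, while PCA picks the direction that minimizes squared orthogonal distances, so the two lines generally differ.

```python
import numpy as np

# Synthetic 2D data
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.8, size=100)
x, y = x - x.mean(), y - y.mean()        # mean normalization

# Linear regression: slope that minimizes squared *vertical* distances
slope_lr = (x @ y) / (x @ x)

# PCA: first principal direction minimizes squared *orthogonal* distances
X = np.column_stack([x, y])
U, S, Vt = np.linalg.svd(X.T @ X / len(X))
slope_pca = U[1, 0] / U[0, 0]

print(f"linear regression slope: {slope_lr:.3f}")
print(f"first principal direction slope: {slope_pca:.3f}")  # generally different
```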

### 2b. Principal Component Analysis Algorithm

• Data pre-processing step
    • You must always perform mean normalization (and usually feature scaling) before doing PCA
• PCA intuition
    • You need to compute the vector or vectors onto which you project the data
    • Left graph: compute vector z(1)
    • Right graph: compute vectors z(1) and z(2)
• Procedure
    • You can use eig (eigendecomposition) or svd (singular value decomposition) on the covariance matrix, but the latter is more numerically stable
    • You can use any library in other languages that performs singular value decomposition
    • You will get 3 matrices: U, S and V
    • But we only need matrix U: taking its first k columns gives U_reduce, and z = U_reduce' * x is a k x 1 vector
• Summary of the PCA algorithm in Octave; a NumPy sketch of the same steps is given below
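
As a rough NumPy translation of those steps (my own sketch, not the course’s Octave code; X is assumed to be an m x n matrix with one example per row):

```python
import numpy as np

def pca(X, k):
    """Reduce X (m x n, one example per row) to k dimensions via SVD."""
    # Pre-processing: mean normalization and feature scaling
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma

    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m       # n x n covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)       # columns of U are the principal directions
    U_reduce = U[:, :k]                   # n x k: keep the first k columns of U
    Z = X_norm @ U_reduce                 # m x k: each row is a k-dimensional z
    return Z, U_reduce, S, mu, sigma
```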

## 3. Applying PCA

### 3a. Reconstruction from Compressed Representation

• We can map the compressed representation back from the lower-dimensional space to an approximation in the original higher-dimensional space
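
Continuing the pca() sketch above (same assumed names), the reverse mapping takes each k-dimensional z back to x_approx = U_reduce * z, an approximation that lies on the k-dimensional subspace:

```python
import numpy as np

# Build data that is nearly 2-dimensional, compress to k = 2, then reconstruct
# (pca() is the sketch from section 2b; the data here is synthetic)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 3))  # near rank-2 data in 3D
X += 0.01 * rng.normal(size=(100, 3))                    # a little noise

Z, U_reduce, S, mu, sigma = pca(X, k=2)
X_approx = Z @ U_reduce.T            # m x n: lies exactly on the 2-dim subspace
X_approx = X_approx * sigma + mu     # undo feature scaling and mean normalization
print(np.mean((X - X_approx) ** 2))  # small average squared reconstruction error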

### 3b. Choosing the Number of Principal Components

• k is the number of principal components
• But how do we choose k?
• There is a more efficient method (shown on the right) than repeatedly re-running PCA for each candidate k (shown on the left)
• We then use the S matrix from the SVD for the calculation, as in the sketch below
• You would realise that PCA can retain a high percentage of the variance even after substantially compressing the number of dimensions of the data
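
Concretely, after a single SVD the diagonal values in S let us compute the variance retained for every k at once: choose the smallest k such that (S_11 + ... + S_kk) / (S_11 + ... + S_nn) >= 0.99, i.e. 99% of the variance is retained. A sketch reusing the S returned by pca() above:

```python
import numpy as np

def choose_k(S, variance_retained=0.99):
    """Smallest k whose leading singular values retain the target fraction of variance."""
    ratios = np.cumsum(S) / np.sum(S)   # variance retained for k = 1, 2, ..., n
    return int(np.searchsorted(ratios, variance_retained)) + 1

# e.g. with S from pca() above:
# k = choose_k(S)
```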

### 3c. Advice for Applying PCA

• Supervised learning speedup
    • For many data sets, we can easily reduce the number of dimensions by 5-10x so that our learning algorithm runs much faster
    • The mapping from x to z should be defined by running PCA on the training set only, and then applied to the cross-validation and test sets (see the sketch after this list)
• Applications of PCA
    1. Compression
        • Reduce the memory or disk space needed to store data
        • Speed up a learning algorithm
        • We choose k by the percentage of variance retained
    2. Visualization
        • We choose only k = 2 or k = 3 so that the data can be plotted
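
A minimal sketch of that supervised workflow with scikit-learn (X_train, X_test, y_train, y_test and the logistic regression classifier are my own assumptions for illustration; n_components=0.99 tells PCA to keep enough components to retain 99% of the variance):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scaling and PCA parameters are fit on the training data only; the pipeline
# then applies that exact same mapping to any new examples at predict time.
model = make_pipeline(
    StandardScaler(),            # mean normalization + feature scaling
    PCA(n_components=0.99),      # keep enough components to retain 99% of variance
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```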