Introduction to Machine Learning: Supervised and Unsupervised Learning
I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from machine learning and deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages, and it lives in my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. These notes are meant for my personal review, but I have open-sourced the repository because a lot of people found it useful.
Examples of ML
 Database Mining
 Applications that can’t be programmed by hand
 Handwriting recognition
 NLP
 Self-customising programs
 Amazon
 Netflix recommendation systems
 Understand human learning
 Brain
 Real AI
1. Supervised Learning
 Given a data set with the right answers for each example
 Regression: predict continuous-valued output
1a. Classification
 Discrete valued output (0 or 1)
 Example: Breast cancer (malignant and benign)
 Tumor size
 Age
 Classify a new tumour by which of the two groups it is more likely to belong to (see the sketch below)
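A minimal sketch of what such a classifier could look like, assuming scikit-learn is available; the (tumour size, age) values and labels below are made up purely for illustration, not course data.

```python
# Sketch: classify tumours as benign (0) or malignant (1)
# from two features (tumour size, patient age).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.2, 34], [2.8, 51], [1.0, 29], [3.5, 62], [2.1, 45], [4.0, 70]])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = benign, 1 = malignant (illustrative labels)

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[3.0, 55]]))  # discrete-valued output: 0 or 1
```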
1c. Problem 1 (Regression Problem)
You have a large inventory of identical items. You want to predict how many of these items will sell within the next 3 months, e.g. 10,000 items (a continuous value).
1d. Problem 2 (Classification Problem)
You’d like software to examine individual accounts and decide, for each account, whether it has been hacked/compromised. 0: not hacked, 1: hacked
2. Unsupervised Learning
 Making sense of data by finding patterns we don’t know about in advance
 Examples
 Social network analysis
 Market segmentation
 Astronomical data analysis
2a. Cocktail Party Problem
 Given 2 audio recordings
 Feed them to an unsupervised learning algorithm
 Find structure
 Separate the audio sources (see the sketch below)
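The course only names this problem; one common way to demonstrate this kind of blind source separation is independent component analysis (ICA). The FastICA sketch below is a hedged illustration assuming scikit-learn, with two synthetic "recordings" standing in for real microphone signals.

```python
# Sketch: separate two mixed signals with ICA (cocktail party style).
import numpy as np
from sklearn.decomposition import FastICA

n = 2000
t = np.linspace(0, 8, n)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # two "speakers"
mixing = np.array([[1.0, 0.5], [0.5, 1.0]])
mixed = sources @ mixing.T                               # two "microphones"

ica = FastICA(n_components=2, random_state=0)
separated = ica.fit_transform(mixed)  # estimated independent sources
print(separated.shape)                # (2000, 2)
```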
2b. Unsupervised Learning Problems
 Given a set of news articles found on the web, group them into sets of articles about the same story
 Given a database of customer data, automatically discover market segments and group customers into those segments (see the sketch below)
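A hedged sketch of the market-segmentation idea: given unlabeled customer records, a clustering algorithm such as k-means groups them without being told the segments in advance. The feature values and the choice of 3 clusters are my own assumptions for illustration.

```python
# Sketch: discover market segments from unlabeled customer data with k-means.
# Features: (annual spend, visits per month); k=3 is an illustrative choice.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [200,  2], [220,  3], [1500, 10], [1400, 12],
    [800,  6], [750,  5], [210,  1], [1600, 11],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)  # no labels were given
print(segments)                           # cluster index per customer
```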
3. Model and Cost Function
3a. Model Representation
 Supervised learning: given the right answer for each example in the data
 Regression: predict real-valued output
 Classification: predict discrete-valued output
 Training set: original data
 m = number of training examples
 x = input variable/features
 y = output variable/target
 (x, y) = one training example
 (x^(i), y^(i)) = ith training example
 Representing the hypothesis h
 Linear regression with one variable (univariate linear regression)
 h(x) = theta_0 + theta_1 * x
 theta_0, theta_1: the parameters (see the sketch below)
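A minimal Python sketch of the univariate hypothesis; the variable names are mine, not the course's.

```python
# Hypothesis for univariate linear regression: h(x) = theta_0 + theta_1 * x
def hypothesis(x, theta_0, theta_1):
    return theta_0 + theta_1 * x

print(hypothesis(2.0, theta_0=1.0, theta_1=0.5))  # 2.0
```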
3b. Cost Function
 Minimise the squared error cost function: J(theta_0, theta_1) = (1/(2m)) * sum over i = 1..m of (h(x^(i)) - y^(i))^2 (sketched below)
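A hedged NumPy sketch of that squared-error cost; the array shapes, names, and toy data are my own choices.

```python
import numpy as np

def cost(theta_0, theta_1, x, y):
    """Squared error cost J(theta_0, theta_1) = (1/(2m)) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta_0 + theta_1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(cost(0.0, 1.0, x, y))  # 0.0 -- the line y = x fits this toy data exactly
```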
3c. h(x) and J(theta): 1 Parameter
 Assume a simplified cost function with only theta_1 (theta_0 set to 0)
 The theta_1 that minimises J(theta_1) fits the data best (see the sweep below)
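A small sketch of that idea: sweep theta_1 over a range, compute J(theta_1) for each value, and the minimum lands at the value whose line fits the data. The data and the sweep range are illustrative assumptions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost_theta1(theta_1):
    m = len(x)
    return np.sum((theta_1 * x - y) ** 2) / (2 * m)  # theta_0 fixed at 0

thetas = np.linspace(-1.0, 3.0, 41)
costs = np.array([cost_theta1(t) for t in thetas])
print(thetas[np.argmin(costs)])  # ~1.0 -- J is smallest where the line fits best
```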
3d. h(x) and J(theta0, theta1): 2 Parameters
 3D surface plot of J(theta_0, theta_1)
 Contour plot of J(theta_0, theta_1)
 The minimum lies inside the smallest concentric contour
4. Parameter Learning
4a. Gradient Descent
 Concept: start with some theta_0, theta_1 and keep changing them to reduce J(theta_0, theta_1)
 Gradient Descent Algorithm
 repeat until convergence: theta_j := theta_j - alpha * (∂/∂theta_j) J(theta_0, theta_1), for j = 0 and j = 1 (update both simultaneously)
 a := b means assignment (set a to the value of b)
 a = b is a truth assertion
 alpha is the learning rate (a number)
 large alpha: aggressive gradient descent (big steps)
 derivative term: slope of J(theta) (see the sketch below)
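A minimal sketch of the update rule on a simple one-parameter cost; the cost J(theta) = (theta - 3)^2, the alpha value, and the iteration count are my own illustrative choices.

```python
# Gradient descent on J(theta) = (theta - 3)^2, whose minimum is at theta = 3.
def dJ(theta):
    return 2 * (theta - 3)  # derivative (slope) of J at theta

theta = 0.0
alpha = 0.1  # learning rate (assumed value)

for _ in range(100):
    theta = theta - alpha * dJ(theta)  # repeat until convergence

print(theta)  # close to 3, the minimum of J
```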
4b. Gradient Descent Intuition
 Derivative intuition
 Alpha intuition
 If already at a local optimum -> derivative = 0
 Theta will not change!
 Steps become smaller even with alpha fixed
 This is because the derivative (slope) decreases as we near a local minimum
4c. Gradient Descent for Linear Regression
 Apply the gradient descent algorithm to linear regression
 Derivative notation: d for a single parameter, ∂ (partial derivative) when there are multiple parameters
 Plug the derivatives of J(theta_0, theta_1) into the gradient descent update
 The cost function for linear regression is always a convex function
 One global minimum
 Gradient descent for linear regression
 Keep changing parameters to reach global minimum
 This is called “Batch Gradient Descent”
 Each step uses all m training examples (the “batch”); see the sketch below
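To make the "plug the derivatives of J(theta_0, theta_1) into gradient descent" step concrete, here is a hedged batch gradient descent sketch for univariate linear regression; the toy data, alpha, and iteration count are assumptions of mine, not values from the course.

```python
import numpy as np

# Toy training set: y = 1 + 2x, so batch gradient descent should
# drive (theta_0, theta_1) toward (1, 2).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
m = len(x)

theta_0, theta_1 = 0.0, 0.0
alpha = 0.05  # learning rate (assumed value)

for _ in range(5000):
    predictions = theta_0 + theta_1 * x         # h(x) on ALL m examples (the batch)
    grad_0 = np.sum(predictions - y) / m        # dJ/d(theta_0)
    grad_1 = np.sum((predictions - y) * x) / m  # dJ/d(theta_1)
    # Simultaneous update of both parameters
    theta_0, theta_1 = theta_0 - alpha * grad_0, theta_1 - alpha * grad_1

print(theta_0, theta_1)  # approximately 1.0 and 2.0 (the global minimum)
```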