Pipeline, sliding windows, artificial data synthesis, and ceiling analysis.

## Photo OCR

I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages from my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are meant for my personal review, but I have open-sourced the repository because a lot of people found it useful.

### Problem Description and Pipeline

• Photo OCR (Optical Character Recognition) Problem
1. Given a picture, detect the location of text in the picture
2. Read the text at that location
• Photo OCR Pipeline
1. Text detection
2. Character segmentation
3. Character classification
• Classify each segmented character: the first is “A”, the second “D”, and so on
• When you design a machine learning system, one of the most important steps is defining the pipeline
• A pipeline is a sequence of steps or components that together make up the system
• Each step/module can be worked on by a different group, which splits the workload
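The three-stage pipeline can be sketched as plain function composition, where each stage is a separate, swappable component (which is what lets different groups work on different modules). This is a minimal sketch, not the course's code: the `Image`/`Box` types, `run_photo_ocr`, and the `fake_*` stub components are all hypothetical stand-ins for real trained models.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: an image is a 2-D list of pixel values,
# a box is (row, col, height, width).
Image = List[List[float]]
Box = Tuple[int, int, int, int]


def run_photo_ocr(
    image: Image,
    detect_text: Callable[[Image], List[Box]],
    segment_characters: Callable[[Image, Box], List[Box]],
    classify_character: Callable[[Image, Box], str],
) -> List[str]:
    """Chain the three pipeline stages; each stage is a swappable component."""
    words = []
    for text_box in detect_text(image):                            # 1. text detection
        char_boxes = segment_characters(image, text_box)            # 2. character segmentation
        chars = [classify_character(image, b) for b in char_boxes]  # 3. character classification
        words.append("".join(chars))
    return words


# Hypothetical stand-in components so the sketch runs end to end.
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 1, 2)]                   # one text region covering both pixels


def fake_segment(image: Image, box: Box) -> List[Box]:
    return [(0, 0, 1, 1), (0, 1, 1, 1)]     # two one-pixel "characters"


def fake_classify(image: Image, box: Box) -> str:
    return "AD"[box[1]]                     # column 0 -> "A", column 1 -> "D"


print(run_photo_ocr([[0.0, 0.0]], fake_detect, fake_segment, fake_classify))  # ['AD']
```

Because each stage only depends on the previous stage's output format, any one of them can be replaced (or made "perfect", as in ceiling analysis) without touching the others.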

### Sliding Windows

• To see how to detect objects in images, start with the simpler example of pedestrian detection
• Pedestrians are comparatively easy to detect because most of them have a similar aspect ratio, so a fixed-shape window works
• Supervised learning for pedestrian detection
• x = pixels in 82 x 36 image patches
• We can train a neural network to classify an image patch as either containing a pedestrian or not
• Sliding window detection
• We slide a green box (82 x 36) with a defined step-size/stride
• We continue sliding the window over the whole image
• To detect pedestrians at larger scales, we take larger boxes and resize each patch to 82 x 36 before classifying it
• That’s how we use a trained supervised learning classifier to find pedestrians in an image
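A multi-scale sliding window over a grayscale image can be sketched with NumPy as follows. Assumptions: the image is a 2-D array, `classifier` is any callable returning a score for an 82 x 36 patch, and `stride`, `scales`, `threshold`, and the nearest-neighbour `resize_nearest` helper are hypothetical choices for illustration. `toy_classifier` just averages pixel brightness and stands in for the trained neural network.

```python
import numpy as np

PATCH_H, PATCH_W = 82, 36  # patch size used in the notes (height x width)


def resize_nearest(patch: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize using plain NumPy indexing."""
    rows = np.arange(out_h) * patch.shape[0] // out_h
    cols = np.arange(out_w) * patch.shape[1] // out_w
    return patch[rows[:, None], cols[None, :]]


def sliding_window_detect(image, classifier, stride=4, scales=(1.0, 1.5, 2.0), threshold=0.5):
    """Return (row, col, height, width, score) for every window the classifier accepts."""
    detections = []
    for scale in scales:
        win_h, win_w = int(PATCH_H * scale), int(PATCH_W * scale)
        for r in range(0, image.shape[0] - win_h + 1, stride):
            for c in range(0, image.shape[1] - win_w + 1, stride):
                patch = image[r:r + win_h, c:c + win_w]
                # Larger windows are shrunk back to 82 x 36 so one classifier covers all scales.
                score = classifier(resize_nearest(patch, PATCH_H, PATCH_W))
                if score >= threshold:
                    detections.append((r, c, win_h, win_w, score))
    return detections


# Hypothetical stand-in for a trained classifier: score = mean brightness of the patch.
def toy_classifier(patch: np.ndarray) -> float:
    return float(patch.mean())


img = np.zeros((200, 200))
img[48:130, 100:136] = 1.0  # one bright 82 x 36 "pedestrian"
hits = sliding_window_detect(img, toy_classifier, scales=(1.0,), threshold=0.99)
print(hits)  # [(48, 100, 82, 36, 1.0)]
```

A smaller stride finds objects more precisely but costs more classifier calls; the stride of 4 here is only an example value.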
• Text detection
• Positive examples (y = 1), patches with text
• Negative examples (y = 0), patches without text
• Let us run a sliding window classifier on the image
• The classifier output (bottom left of the slide) shows white areas that indicate likely text
• Bright white: the classifier output a very high probability of text at that location
• We then take one more step: apply an expansion operator to the classifier output
• It takes each white region and expands it, so nearby text pixels merge into larger regions
• We then use heuristics to discard regions with an abnormal height-to-width ratio (text regions are usually wider than they are tall)
• What remains are the text regions
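The expansion step and the aspect-ratio heuristic can be sketched on a 2-D map of classifier probabilities. This sketch assumes the expansion operator marks a pixel white if any white pixel lies within `radius` pixels of it, finds regions with a simple 4-connected flood fill, and keeps only boxes at least `min_width_to_height` times wider than tall. The function names and all three parameters are hypothetical choices.

```python
from collections import deque

import numpy as np


def expand(mask: np.ndarray, radius: int) -> np.ndarray:
    """Expansion operator: a pixel becomes white if a white pixel is within `radius` (per axis)."""
    h, w = mask.shape
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            out |= padded[radius + dr:radius + dr + h, radius + dc:radius + dc + w]
    return out


def bounding_boxes(mask: np.ndarray):
    """Flood-fill 4-connected white regions; return (top, left, height, width) per region."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom - top + 1, right - left + 1))
    return boxes


def text_regions(prob_map, threshold=0.5, radius=2, min_width_to_height=2.0):
    """Threshold, expand, box, then drop boxes that are not clearly wider than tall."""
    boxes = bounding_boxes(expand(prob_map >= threshold, radius))
    return [b for b in boxes if b[3] / b[2] >= min_width_to_height]


prob = np.zeros((30, 60))
prob[10, 5:40:3] = 0.9  # scattered hits along a line of text
prob[15:28, 50] = 0.9   # a tall, thin blob that is unlikely to be text
print(text_regions(prob))  # [(8, 3, 5, 38)]
```

The scattered hits merge into one wide box after expansion, while the tall blob is removed by the aspect-ratio filter.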
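• Character segmentation: we start with the green rectangle and slide the window along the detected text region (a 1D sliding window)
• At each position, a classifier decides whether there should be a split between two characters in the middle of the window
• Character classification: train a NN to recognize each segmented character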
• Photo OCR pipeline summary
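The 1D sliding window for character segmentation can be sketched as follows. In the lecture the "split here?" decision comes from a trained classifier. Here `blank_centre`, which says "split" when the centre column of the window has no ink, is a hypothetical stand-in. `window_w`, `stride`, and `min_gap` are also assumed values.

```python
import numpy as np


def find_splits(text_region, is_split, window_w=8, stride=1, min_gap=2):
    """1D sliding window: record the window centre wherever the classifier says 'split here'."""
    splits = []
    for c in range(0, text_region.shape[1] - window_w + 1, stride):
        centre = c + window_w // 2
        if is_split(text_region[:, c:c + window_w]):
            # Neighbouring windows often fire on the same gap; keep only the first of a run.
            if not splits or centre - splits[-1] > min_gap:
                splits.append(centre)
    return splits


def cut_characters(text_region, splits):
    """Cut the region at each split column, giving one patch per character."""
    edges = [0] + splits + [text_region.shape[1]]
    return [text_region[:, a:b] for a, b in zip(edges, edges[1:]) if b > a]


# Hypothetical stand-in for the trained split classifier: split if the centre column is blank.
def blank_centre(window: np.ndarray) -> bool:
    return not window[:, window.shape[1] // 2].any()


region = np.ones((5, 13))
region[:, 6] = 0  # a one-pixel gap between two "characters"
splits = find_splits(region, blank_centre)
chars = cut_characters(region, splits)
print(splits, [ch.shape for ch in chars])  # [6] [(5, 6), (5, 7)]
```

Each patch in `chars` would then be passed to the character classification network.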

### Getting Lots of Data and Artificial Data

• Artificial data synthesis
• Creating data from scratch
• If we have a small training set, we can turn it into a large training set
• Example of artificial data synthesis for photo OCR: Method 1 (new data)
• We can take free fonts, render the characters and paste them onto random backgrounds
• The images on the right of the slide are synthesized
• Example of artificial data synthesis for photo OCR: Method 2 (distortion)
• We can distort existing examples to create new data
• In this case, the distortion is done by warping the image
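Both synthesis methods can be sketched with NumPy on tiny grayscale arrays. Assumptions: a hand-drawn 7 x 5 binary "F" stands in for a character rendered from a real font, uniform noise stands in for a real background photo, and the warp uses a sinusoidal displacement field with nearest-neighbour sampling. The function names, `amplitude`, and `period` are hypothetical choices, not the course's exact method.

```python
import numpy as np


def paste_on_background(glyph, bg_shape, rng):
    """Method 1 sketch: place a binary glyph on a random background at a random offset."""
    out = rng.uniform(0.0, 0.5, size=bg_shape)   # stand-in for a real background photo
    gh, gw = glyph.shape
    r = rng.integers(0, bg_shape[0] - gh + 1)
    c = rng.integers(0, bg_shape[1] - gw + 1)
    region = out[r:r + gh, c:c + gw]             # a view into `out`
    region[glyph > 0] = 1.0                      # glyph strokes are drawn in white
    return out


def warp(image, amplitude, period, rng):
    """Method 2 sketch: distort an example with a smooth sinusoidal displacement field."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    phase = rng.uniform(0.0, 2 * np.pi)
    # Each output pixel samples from a slightly shifted source location (nearest neighbour).
    src_r = rows + amplitude * np.sin(2 * np.pi * cols / period + phase)
    src_c = cols + amplitude * np.sin(2 * np.pi * rows / period + phase)
    src_r = np.clip(np.rint(src_r), 0, h - 1).astype(int)
    src_c = np.clip(np.rint(src_c), 0, w - 1).astype(int)
    return image[src_r, src_c]


rng = np.random.default_rng(0)
glyph = np.zeros((7, 5))
glyph[:, 0] = 1   # crude "F": vertical stroke
glyph[0, :] = 1   # top bar
glyph[3, :3] = 1  # middle bar
sample = paste_on_background(glyph, (16, 16), rng)
augmented = [warp(sample, amplitude=1.5, period=8, rng=rng) for _ in range(5)]
print(len(augmented), augmented[0].shape)  # 5 (16, 16)
```

The course also stresses that a distortion should resemble the kinds of variation you expect in the test set. Adding purely random, meaningless per-pixel noise usually does not help.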
• Discussion on getting more data
1. Make sure you have a low bias (high variance) classifier before expending the effort to get more data
• Plot the learning curves to find out
• Keep increasing the number of features or number of hidden units in the neural network until you have a low bias classifier
2. How much work would it be to get 10x as much data as you currently have?
• Artificial data synthesis
• Collect/label it yourself
• Crowdsource
• Hire people on the web to label data (e.g. Amazon Mechanical Turk)
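Point 1 above says to plot learning curves before collecting more data. The numbers behind such a plot can be computed as below. Assumptions: mean squared error is the error measure, least-squares linear regression (`fit_linear`/`predict_linear`) stands in for whatever learner you actually use, and the data is synthetic, for illustration only.

```python
import numpy as np


def learning_curve(X_tr, y_tr, X_val, y_val, sizes, fit, predict):
    """For each m in sizes, train on the first m examples and record train/validation MSE."""
    train_err, val_err = [], []
    for m in sizes:
        model = fit(X_tr[:m], y_tr[:m])
        train_err.append(float(np.mean((predict(model, X_tr[:m]) - y_tr[:m]) ** 2)))
        val_err.append(float(np.mean((predict(model, X_val) - y_val) ** 2)))
    return train_err, val_err


def fit_linear(X, y):
    """Least-squares linear regression with a bias column (stand-in for any learner)."""
    theta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
    return theta


def predict_linear(theta, X):
    return np.column_stack([np.ones(len(X)), X]) @ theta


# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=300)
sizes = [5, 10, 20, 50, 100, 200]
tr, va = learning_curve(X[:200], y[:200], X[200:], y[200:], sizes, fit_linear, predict_linear)
for m, a, b in zip(sizes, tr, va):
    print(f"m={m:4d}  train={a:.4f}  val={b:.4f}")
```

A large, persistent gap between validation and training error that is still shrinking as m grows suggests high variance, so more data is likely to help. If both errors plateau at a high value, the classifier has high bias, and adding features or hidden units comes first.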

### Ceiling Analysis: What Part of the Pipeline to Work on Next

• Ceiling analysis
• When you have a team working on a pipeline machine learning system
• This gives you an indication of which part of the pipeline is most worth working on
• Ceiling analysis definition
• Estimating the errors due to each component
• Photo OCR example
• Choose a single evaluation metric for the overall system (here, accuracy)
• Overall system: 72% accuracy
• Text detection
• Put a check mark on “text detection”: simulate a perfect text detection component
• Go to the test set and manually give that component the correct answers
• It’s as if you had a perfect text detection system
• Check the accuracy of the whole system (72% to 89%: a 17-point improvement)
• Keep the perfect text detection in place and move on to the next component in the pipeline
• Give it the correct “character segmentation” as well
• Check the accuracy of the whole system (89% to 90%: only a 1-point improvement)
• Finally, also give the last component in the pipeline (character classification) the correct answers
• Check the accuracy of the whole system (90% to 100%: a 10-point improvement)
• This shows the upside potential from each component
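The ceiling-analysis arithmetic is simple enough to write down directly. The sketch below uses only the Photo OCR accuracies from these notes. Each component's upside is the whole-system accuracy with that component made perfect, minus the accuracy of the row before it. `stages` and `ceiling_gains` are illustrative names.

```python
# Whole-system accuracy (%) after cumulatively replacing each component with
# ground truth; numbers are the Photo OCR ones from the notes above.
stages = [
    ("Overall system", 72.0),
    ("Text detection", 89.0),
    ("Character segmentation", 90.0),
    ("Character classification", 100.0),
]


def ceiling_gains(stages):
    """Upside of each component = accuracy with it made perfect minus the previous row."""
    return [(name, acc - prev) for (_, prev), (name, acc) in zip(stages, stages[1:])]


gains = ceiling_gains(stages)
for name, gain in gains:
    print(f"{name:<26}{gain:+.0f} points")
print("Most upside:", max(gains, key=lambda g: g[1])[0])  # Text detection
```

The ordering matters: each row keeps all earlier components perfect, so the gains add up to the total gap (28 points here) between the current system and 100%.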
• Another ceiling analysis example: face recognition from images
• Components most worth working on
• Perfect face detection (5.9% gain)
• Perfect eye segmentation (4% gain)
• Do not rely on your gut feeling to decide what to work on
• Use ceiling analysis