Polynomial regression with scikit-learn

Using numpy's polyfit

  • numpy.polyfit(x, y, deg)
  • Least squares polynomial fit
  • Returns a vector of coefficients p, ordered from the highest degree down to the constant term, that minimises the squared error.
In [1]:
import numpy as np
In [2]:
# create arrays of fake points
x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
In [4]:
# fit a polynomial of degree 3
z = np.polyfit(x, y, 3)
z
Out[4]:
array([ 0.08703704, -0.81349206,  1.69312169, -0.03968254])
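
The coefficients above can be turned back into a callable polynomial to check the fit. The following is a minimal sketch, not part of the original notebook; the names `p`, `y_fit` and `residuals` are introduced here for illustration only.

```python
import numpy as np

# Same fake points as in the cell above
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

# Degree-3 least squares fit; coefficients come back highest degree first
z = np.polyfit(x, y, 3)

# Wrap the coefficients in a callable polynomial (illustrative name)
p = np.poly1d(z)

# Evaluate the fitted polynomial at the original x values
y_fit = p(x)

# Differences between the data and the fitted curve
residuals = y - y_fit
print(residuals)
```

`np.polyval(z, x)` evaluates the same polynomial without building a `poly1d` object, which may be preferable when only the values are needed.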

Using scikit-learn's PolynomialFeatures

  • Generate polynomial and interaction features
  • Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree
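
To make "all polynomial combinations" concrete, here is a small sketch on a single sample with two features, a = 2 and b = 3. The sample values and the names `X_demo`, `poly_demo` and `expanded` are assumptions chosen for illustration; they do not appear in the original notebook.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3 (illustrative values)
X_demo = np.array([[2.0, 3.0]])

# Degree-2 expansion with the default settings (bias column included)
poly_demo = PolynomialFeatures(degree=2)
expanded = poly_demo.fit_transform(X_demo)

# Columns are, in order: 1, a, b, a^2, a*b, b^2
print(expanded)  # → [[1. 2. 3. 4. 6. 9.]]
```

Two input features therefore become six columns at degree 2: the constant term, the two original features, and the three degree-2 products.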
In [24]:
# Import
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
In [25]:
# Create matrix and vectors
X = [[0.44, 0.68], [0.99, 0.23]]
y = [109.85, 155.72]
X_test = [[0.49, 0.18]]  # one sample, as a 2D list of shape (1, 2)
In [28]:
# PolynomialFeatures (preprocessing)
poly = PolynomialFeatures(degree=2)
X_ = poly.fit_transform(X)
# Reuse the fitted transformer on the test sample; do not refit it
X_test_ = poly.transform(X_test)
In [31]:
# Instantiate
lg = LinearRegression()

# Fit
lg.fit(X_, y)

# Obtain coefficients
lg.coef_
Out[31]:
array([  0.        ,  19.4606578 , -15.92235638,  27.82874066,
        -2.52988551, -14.48934431])
In [32]:
# Predict
lg.predict(X_test_)
Out[32]:
array([ 126.84247142])
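
The transform and fit steps can also be chained so that the feature expansion is applied consistently at fit and predict time. The sketch below uses scikit-learn's `make_pipeline` as one possible way to do this; it is not part of the original notebook, and the names `model`, `pipeline_pred`, `manual` and `manual_pred` are illustrative.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same training data as above; the test sample is 2D (one row, two features)
X = [[0.44, 0.68], [0.99, 0.23]]
y = [109.85, 155.72]
X_test = [[0.49, 0.18]]

# Chain the degree-2 expansion and the linear model into one estimator
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
pipeline_pred = model.predict(X_test)

# The two-step version, for comparison
poly = PolynomialFeatures(degree=2)
manual = LinearRegression().fit(poly.fit_transform(X), y)
manual_pred = manual.predict(poly.transform(X_test))

print(pipeline_pred, manual_pred)
```

Both predictions should agree, since the pipeline performs the same expansion and fit internally. Note that with only two training samples and six expanded features the fit is underdetermined, so this data is useful for showing the API rather than for judging model quality.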