Vectorize your data
Vectorization Methods

Pandas vectorized methods

Concepts covered

  1. Lambda Refresher
    • lambda functions are small, anonymous inline functions that are defined on the fly in Python
    • lambda x: x >= 1 takes an input x and returns the result of x >= 1, which is a boolean (True or False)
  2. map()
    • create a new Series by applying the lambda function to each element
    • can only be used on a Series to return a new Series
  3. applymap()
    • create a new DataFrame by applying the lambda function to each element
    • can only be used on a DataFrame to return a new DataFrame
    • deprecated since pandas 2.1 in favor of DataFrame.map(), which does the same thing
  4. df.apply(numpy.mean)
    • Gets the mean of every column in a DataFrame
    • Gives the same result as df.mean() for numeric columns
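
The first two concepts above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the Series s and the names at_least_one and is_at_least_one are made up for the example. It shows that a lambda behaves like an ordinary named function when passed to map().

```python
import pandas as pd

# A lambda and an equivalent named function.
# Both names are hypothetical and used only for this illustration.
at_least_one = lambda x: x >= 1

def is_at_least_one(x):
    return x >= 1

# Small example Series (assumed data, not from the notebook below)
s = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

# Series.map applies the function to each element and returns a new Series
mapped_lambda = s.map(at_least_one)
mapped_def = s.map(is_at_least_one)

print(mapped_lambda.tolist())            # [False, True, True]
print(mapped_lambda.equals(mapped_def))  # True
```

The original Series is left unchanged; map() always returns a new object.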
In [50]:
import pandas as pd
import numpy as np
In [51]:
# columns
columns = ['one', 'two']
In [52]:
# index
index = ['a', 'b', 'c', 'd']
In [53]:
# lists
one = [1, 2, 3, 4]
two = [1, 2, 3, 4]
In [54]:
# dictionary
d = {
    'one': one,
    'two': two
}
In [55]:
# DataFrame
df = pd.DataFrame(d, columns=columns, index=index)
In [56]:
df
Out[56]:
   one  two
a    1    1
b    2    2
c    3    3
d    4    4
In [58]:
# mean of every single column in df
df.apply(np.mean)
Out[58]:
one    2.5
two    2.5
dtype: float64
In [61]:
# the built-in pandas method gives the same result
df.mean()
Out[61]:
one    2.5
two    2.5
dtype: float64
In [64]:
# .map() on a particular column (Series)
# goes through every value in the column and evaluates whether it is >= 1
df['one'].map(lambda x: x >= 1)
Out[64]:
a    True
b    True
c    True
d    True
Name: one, dtype: bool
In [65]:
# .applymap() on the whole DataFrame
# evaluates whether every value in every column is >= 1
df.applymap(lambda x: x >= 1)
Out[65]:
    one   two
a  True  True
b  True  True
c  True  True
d  True  True
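
For a simple comparison like this one, the same mask can also be produced without a lambda by comparing the DataFrame directly. The sketch below is an illustration under assumptions rather than part of the original notebook: it picks DataFrame.map() when the installed pandas provides it (2.1 and later) and falls back to applymap() otherwise, and the variable names are made up for the example.

```python
import pandas as pd

# Rebuild the DataFrame from the cells above so the block is self-contained
df = pd.DataFrame(
    {'one': [1, 2, 3, 4], 'two': [1, 2, 3, 4]},
    index=['a', 'b', 'c', 'd'],
)

# Use DataFrame.map where available (pandas 2.1+), else the older applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
mask_elementwise = elementwise(lambda x: x >= 1)

# The comparison operator is applied to the whole DataFrame at once
mask_vectorized = df >= 1

# Both approaches should produce the same boolean values
masks_match = bool((mask_elementwise == mask_vectorized).all().all())
print(masks_match)
```

The direct comparison avoids calling a Python function once per element, so it is usually the better choice when an operator or built-in method already expresses the operation.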