Apply function to Series and DataFrame using .map(), .apply() and .applymap()

Applying a function to a pandas Series or DataFrame

In [1]:
import pandas as pd
In [4]:
url = 'http://bit.ly/kaggletrain'
train = pd.read_csv(url)
train.head(3)
Out[4]:
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S

map() function as a Series method
Maps each value of a Series to a new value through a dict, a Series, or a function; mostly used to encode categorical data as numbers

In [8]:
# create new column
train['Sex_num'] = train.Sex.map({'female':0, 'male':1})
In [9]:
# let's compare the Sex and Sex_num columns
# male has been mapped to 1 and female to 0
train.loc[0:4, ['Sex', 'Sex_num']]
Out[9]:
Sex Sex_num
0 male 1
1 female 0
2 female 0
3 female 0
4 male 1
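map() accepts a dict, a Series, or a function. The minimal sketch below uses a small hypothetical Series rather than the Titanic data, and shows two behaviors worth knowing: a value that has no key in the dict becomes NaN instead of raising an error, and a function is called once per element.

```python
import pandas as pd

# hypothetical stand-in for the Titanic "Sex" column, plus one unexpected value
sex = pd.Series(['male', 'female', 'female', 'unknown'])

# mapping with a dict: values that have no key in the dict become NaN
sex_num = sex.map({'female': 0, 'male': 1})
print(sex_num.tolist())  # [1.0, 0.0, 0.0, nan]

# mapping with a function: it is called once for every element
sex_upper = sex.map(str.upper)
print(sex_upper.tolist())  # ['MALE', 'FEMALE', 'FEMALE', 'UNKNOWN']
```

Because of the NaN, the mapped column is stored as floats, so it is worth checking for unmapped categories after a map().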

apply() function as a Series method
Applies a function to each element in the Series

In [10]:
# say we want to calculate the length of each string in the "Name" column

# create new column
# we are applying Python's len function
train['Name_length'] = train.Name.apply(len)
In [12]:
# the apply() method applies the function to each element
train.loc[0:4, ['Name', 'Name_length']]
Out[12]:
                                                Name  Name_length
0                            Braund, Mr. Owen Harris           23
1  Cumings, Mrs. John Bradley (Florence Briggs Th...           51
2                             Heikkinen, Miss. Laina           22
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)           44
4                           Allen, Mr. William Henry           24
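For common string operations pandas also offers vectorized methods under the .str accessor. As a small sketch using three names copied from the output above, .str.len() gives the same lengths as .apply(len):

```python
import pandas as pd

# three names copied from the Titanic output above
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Allen, Mr. William Henry',
])

# apply() calls Python's built-in len on each element
lengths_apply = names.apply(len)

# the .str accessor computes string lengths without a Python-level callback
lengths_str = names.str.len()

print(lengths_apply.tolist())  # [23, 22, 24]
print(lengths_apply.tolist() == lengths_str.tolist())  # True
```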
In [16]:
import numpy as np

# say we want to round each value in the "Fare" column up to the next whole number
# numpy's ceil function rounds a number up
train['Fare_ceil'] = train.Fare.apply(np.ceil)
In [17]:
train.loc[0:4, ['Fare', 'Fare_ceil']]
Out[17]:
Fare Fare_ceil
0 7.2500 8.0
1 71.2833 72.0
2 7.9250 8.0
3 53.1000 54.0
4 8.0500 9.0
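NumPy ufuncs such as np.ceil also accept a whole Series and return a Series, so they can be called without apply(). A short sketch, using the five fares from the output above:

```python
import numpy as np
import pandas as pd

# the five fares shown in the output above
fare = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05])

# element-wise through apply()
ceil_apply = fare.apply(np.ceil)

# calling the ufunc on the whole Series directly
ceil_ufunc = np.ceil(fare)

print(ceil_ufunc.tolist())  # [8.0, 72.0, 8.0, 54.0, 9.0]
print(ceil_apply.equals(ceil_ufunc))  # True
```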
In [19]:
# let's extract the last name of each person

# we will use the str.split method to split each name at the comma
# the result is a Series whose elements are lists of strings
# here each list holds 2 strings, as you can see below
train.Name.str.split(',').head()
Out[19]:
0                           [Braund,  Mr. Owen Harris]
1    [Cumings,  Mrs. John Bradley (Florence Briggs ...
2                            [Heikkinen,  Miss. Laina]
3      [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
4                          [Allen,  Mr. William Henry]
Name: Name, dtype: object
In [22]:
# we only want the first string from each list
# so we define a function that returns the element at a given position
def get_element(my_list, position):
    return my_list[position]
In [23]:
# use our get_element function
# extra keyword arguments given to apply(), such as position=0, are passed on to get_element
train.Name.str.split(',').apply(get_element, position=0).head()
Out[23]:
0       Braund
1      Cumings
2    Heikkinen
3     Futrelle
4        Allen
Name: Name, dtype: object
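The sketch below, on two names from the output above, checks that apply() forwards the position keyword to get_element, and shows that the .str accessor can index into lists directly with .str.get(), which avoids writing a helper at all.

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'])

def get_element(my_list, position):
    # return the element of my_list at the given position
    return my_list[position]

split_names = names.str.split(',')

# the keyword argument position=0 is forwarded to get_element
last_apply = split_names.apply(get_element, position=0)

# .str.get() extracts the element at a position from each list
last_str = split_names.str.get(0)

print(last_apply.tolist())  # ['Braund', 'Heikkinen']
print(last_str.tolist())    # ['Braund', 'Heikkinen']
```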
In [25]:
# instead of defining get_element, we can use a lambda function
# input: x (the list, in this case)
# output: x[0] (the first string of the list, in this case)
train.Name.str.split(',').apply(lambda x: x[0]).head()
Out[25]:
0       Braund
1      Cumings
2    Heikkinen
3     Futrelle
4        Allen
Name: Name, dtype: object
In [27]:
# getting the second string
train.Name.str.split(',').apply(lambda x: x[1]).head()
Out[27]:
0                                Mr. Owen Harris
1     Mrs. John Bradley (Florence Briggs Thayer)
2                                    Miss. Laina
3             Mrs. Jacques Heath (Lily May Peel)
4                              Mr. William Henry
Name: Name, dtype: object
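Splitting on ',' keeps the space that followed the comma, so each second string actually starts with a space (the right-aligned output hides it). A minimal sketch on two sample names: str.split with expand=True returns one column per piece, and str.strip() removes the leading space.

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'])

# the second piece still starts with the space that followed the comma
second = names.str.split(',').apply(lambda x: x[1])
print(repr(second.iloc[0]))  # ' Mr. Owen Harris'

# expand=True returns a DataFrame with one column per piece
parts = names.str.split(',', expand=True)
parts[1] = parts[1].str.strip()
print(parts[1].tolist())  # ['Mr. Owen Harris', 'Miss. Laina']
```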

apply() function as a DataFrame method
Applies a function along either axis of the DataFrame: to each column (axis=0) or to each row (axis=1)

In [30]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
drinks.head()
Out[30]:
country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent
0 Afghanistan 0 0 0 0.0 Asia
1 Albania 89 132 54 4.9 Europe
2 Algeria 25 0 14 0.7 Africa
3 Andorra 245 138 312 12.4 Europe
4 Angola 217 57 45 5.9 Africa
In [32]:
drinks.loc[:, 'beer_servings':'wine_servings'].head()
Out[32]:
beer_servings spirit_servings wine_servings
0 0 0 0
1 89 132 54
2 25 0 14
3 245 138 312
4 217 57 45
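Note that a label-based slice with .loc includes both endpoints, so 'beer_servings':'wine_servings' keeps wine_servings (and train.loc[0:4] earlier returned five rows). Position-based slices with .iloc exclude the stop position, as in plain Python. A small sketch on a two-row stand-in for drinks:

```python
import pandas as pd

# two-row stand-in for the drinks DataFrame
drinks = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'beer_servings': [0, 89],
    'spirit_servings': [0, 132],
    'wine_servings': [0, 54],
    'total_litres_of_pure_alcohol': [0.0, 4.9],
})

# .loc label slices include BOTH endpoints
subset = drinks.loc[:, 'beer_servings':'wine_servings']
print(subset.columns.tolist())
# ['beer_servings', 'spirit_servings', 'wine_servings']

# .iloc position slices exclude the stop position
same = drinks.iloc[:, 1:4]
print(same.columns.tolist() == subset.columns.tolist())  # True
```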
In [33]:
# with axis=0, apply() travels downwards and passes each column to the function
# apply Python's built-in max() function
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=0)
Out[33]:
beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64
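To make the axis=0 behavior concrete, here is a sketch on the first five rows shown above. With axis=0 the function receives one column (a Series) at a time, so the result has one value per column; for a plain maximum, the built-in DataFrame.max() method gives the same numbers without apply().

```python
import pandas as pd

# first five rows of the three serving columns shown above
servings = pd.DataFrame({
    'beer_servings': [0, 89, 25, 245, 217],
    'spirit_servings': [0, 132, 0, 138, 57],
    'wine_servings': [0, 54, 14, 312, 45],
})

# axis=0: max() is called once per column
col_max = servings.apply(max, axis=0)
print(col_max.tolist())  # [245, 138, 312]

# the DataFrame method computes the same column maxima
print(col_max.tolist() == servings.max(axis=0).tolist())  # True
```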
In [34]:
# with axis=1, apply() travels across and passes each row to the function
# apply Python's built-in max() function
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=1)
Out[34]:
0        0
1      132
2       25
3      312
4      217
5      128
6      221
7      179
8      261
9      279
10      46
11     176
12      63
13       0
14     173
15     373
16     295
17     263
18      34
19      23
20     167
21     173
22     173
23     245
24      31
25     252
26      25
27      88
28      37
29     144
      ...
163    178
164     90
165    186
166    280
167     35
168     15
169    258
170    106
171      4
172     36
173     36
174    197
175     51
176     51
177     71
178     41
179     45
180    237
181    135
182    219
183     36
184    249
185    220
186    101
187     21
188    333
189    111
190      6
191     32
192     64
dtype: int64
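With axis=1 each row arrives as a Series indexed by the column names, so a custom function can combine values within a row. In this sketch the country names are used as the index purely for readability (an assumption; the original drinks DataFrame has a default integer index), and the row range (max minus min) is a hypothetical example function.

```python
import pandas as pd

# first three rows of the serving columns, indexed by country for readability
servings = pd.DataFrame(
    {
        'beer_servings': [0, 89, 25],
        'spirit_servings': [0, 132, 0],
        'wine_servings': [0, 54, 14],
    },
    index=['Afghanistan', 'Albania', 'Algeria'],
)

# axis=1: each row is passed as a Series
row_max = servings.apply(max, axis=1)
row_range = servings.apply(lambda row: row.max() - row.min(), axis=1)

print(row_max.tolist())    # [0, 132, 25]
print(row_range.tolist())  # [0, 78, 25]

# the same row maxima without apply()
print(row_max.tolist() == servings.max(axis=1).tolist())  # True
```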
In [35]:
# find the name of the column that holds each row's maximum
# pd.Series.idxmax returns the column label (the first one if there is a tie)
drinks.loc[:, 'beer_servings':'wine_servings'].apply(pd.Series.idxmax, axis=1)
Out[35]:
0        beer_servings
1      spirit_servings
2        beer_servings
3        wine_servings
4        beer_servings
5      spirit_servings
6        wine_servings
7      spirit_servings
8        beer_servings
9        beer_servings
10     spirit_servings
11     spirit_servings
12     spirit_servings
13       beer_servings
14     spirit_servings
15     spirit_servings
16       beer_servings
17       beer_servings
18       beer_servings
19       beer_servings
20       beer_servings
21     spirit_servings
22       beer_servings
23       beer_servings
24       beer_servings
25     spirit_servings
26       beer_servings
27       beer_servings
28       beer_servings
29       beer_servings
            ...
163    spirit_servings
164      beer_servings
165      wine_servings
166      wine_servings
167    spirit_servings
168    spirit_servings
169    spirit_servings
170      beer_servings
171      wine_servings
172      beer_servings
173      beer_servings
174      beer_servings
175      beer_servings
176      beer_servings
177    spirit_servings
178    spirit_servings
179      beer_servings
180    spirit_servings
181    spirit_servings
182      beer_servings
183      beer_servings
184      beer_servings
185      wine_servings
186    spirit_servings
187      beer_servings
188      beer_servings
189      beer_servings
190      beer_servings
191      beer_servings
192      beer_servings
dtype: object
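Since pandas 1.0, Series.argmax (which np.argmax calls on a Series) returns the integer position of the maximum rather than its label, so idxmax is the method to use when you want the column name. A sketch on the first four rows shown earlier compares the two; DataFrame.idxmax(axis=1) also does the whole job without apply().

```python
import numpy as np
import pandas as pd

# first four rows of the serving columns shown earlier
servings = pd.DataFrame({
    'beer_servings': [0, 89, 25, 245],
    'spirit_servings': [0, 132, 0, 138],
    'wine_servings': [0, 54, 14, 312],
})

# idxmax gives the column label of each row's maximum (first label on ties)
labels = servings.idxmax(axis=1)
print(labels.tolist())
# ['beer_servings', 'spirit_servings', 'beer_servings', 'wine_servings']

# np.argmax gives the integer position instead
positions = servings.apply(np.argmax, axis=1)
print(positions.tolist())  # [0, 1, 0, 2]
```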

applymap() as a DataFrame method
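Applies a function to every element of the DataFrame (in pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map())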

In [37]:
drinks.loc[:, 'beer_servings': 'wine_servings'].applymap(float).head()
Out[37]:
beer_servings spirit_servings wine_servings
0 0.0 0.0 0.0
1 89.0 132.0 54.0
2 25.0 0.0 14.0
3 245.0 138.0 312.0
4 217.0 57.0 45.0
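The sketch below picks DataFrame.map when the installed pandas has it and falls back to applymap otherwise, so it runs on old and new versions. It also shows that for a plain type conversion, astype(float) gives the same result; an element-wise function is more useful for something that is not a simple cast, such as the hypothetical text formatting at the end.

```python
import pandas as pd

servings = pd.DataFrame({
    'beer_servings': [0, 89, 25],
    'spirit_servings': [0, 132, 0],
    'wine_servings': [0, 54, 14],
})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = servings.map if hasattr(pd.DataFrame, 'map') else servings.applymap

# float() is called on every single element
as_float = elementwise(float)

# for a plain cast, astype is the vectorized equivalent
print(as_float.equals(servings.astype(float)))  # True

# an element-wise function that is not a simple cast (hypothetical example)
labels = elementwise(lambda v: f'{v} servings')
print(labels.iloc[1, 0])  # 89 servings
```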
In [41]:
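# overwrite the three serving columns with their float versions
# assigning through a list of columns replaces them, so the new float dtype is kept
cols = ['beer_servings', 'spirit_servings', 'wine_servings']
drinks[cols] = drinks[cols].applymap(float)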
drinks.head()
Out[41]:
country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent
0 Afghanistan 0.0 0.0 0.0 0.0 Asia
1 Albania 89.0 132.0 54.0 4.9 Europe
2 Algeria 25.0 0.0 14.0 0.7 Africa
3 Andorra 245.0 138.0 312.0 12.4 Europe
4 Angola 217.0 57.0 45.0 5.9 Africa
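A quick way to confirm the overwrite worked is to inspect dtypes. This sketch uses a two-row stand-in for drinks and astype(float) in place of applymap(float). Selecting the columns with a list, drinks[cols], replaces whole columns; assigning through .loc[:, ...] instead may, in recent pandas versions, write the values into the existing integer columns in place and leave their dtype unchanged.

```python
import pandas as pd

# two-row stand-in for the drinks DataFrame
drinks = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'beer_servings': [0, 89],
    'spirit_servings': [0, 132],
    'wine_servings': [0, 54],
})
cols = ['beer_servings', 'spirit_servings', 'wine_servings']

# replace the three integer columns with float versions
drinks[cols] = drinks[cols].astype(float)

# the serving columns should now report float64
print(drinks.dtypes)
```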