Changing Data Type in Pandas

This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.

Changing data type of a pandas Series

In [1]:
import pandas as pd
In [2]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
In [3]:
drinks.head()
Out[3]:
       country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0       Asia
1      Albania             89              132             54                           4.9     Europe
2      Algeria             25                0             14                           0.7     Africa
3      Andorra            245              138            312                          12.4     Europe
4       Angola            217               57             45                           5.9     Africa
In [5]:
drinks.dtypes
Out[5]:
country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

Data type summary

  • 3 integer columns (int64)
  • 1 floating-point column (float64)
  • 2 object columns (object), which here hold text
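
The same counts can be computed instead of read off by eye: calling value_counts() on dtypes tallies the columns per type. A minimal sketch, assuming a small hand-built copy of the five rows from drinks.head() so it runs without downloading the file (newer pandas versions may report the two text columns as str rather than object):

```python
import pandas as pd

# Hand-built copy of the five rows shown by drinks.head()
drinks = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [0, 89, 25, 245, 217],
    'spirit_servings': [0, 132, 0, 138, 57],
    'wine_servings': [0, 54, 14, 312, 45],
    'total_litres_of_pure_alcohol': [0.0, 4.9, 0.7, 12.4, 5.9],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
})

# dtypes is a Series with one entry per column; value_counts tallies each type
summary = drinks.dtypes.value_counts()
print(summary)
```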

Method 1: Change the data type after reading the CSV

In [8]:
# use .astype() to convert the column, then assign the result back
drinks['beer_servings'] = drinks.beer_servings.astype(float)
In [10]:
drinks.dtypes
Out[10]:
country                          object
beer_servings                   float64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
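
.astype() also accepts a dict that maps column names to types, which converts several columns in one call and leaves the others alone. A minimal sketch on a small frame built from the first three rows shown above:

```python
import pandas as pd

# First three rows of two drinks columns, plus a text column
df = pd.DataFrame({
    'beer_servings': [0, 89, 25],
    'wine_servings': [0, 54, 14],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# Map each column to its target type; astype returns a new DataFrame,
# so the result is assigned back
df = df.astype({'beer_servings': float, 'wine_servings': float})
print(df.dtypes)
```

The continent column is not listed in the dict, so it keeps its original type.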

Method 2: Set the data type while reading the CSV

In [11]:
# the dtype parameter maps column names to the type used while parsing
drinks = pd.read_csv(url, dtype={'beer_servings': float})
In [12]:
drinks.dtypes
Out[12]:
country                          object
beer_servings                   float64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
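
To see the effect of dtype without downloading anything, here is a minimal sketch that reads a tiny in-memory CSV (via io.StringIO, standing in for the file at the URL) with and without the argument:

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the downloaded drinks file
csv_text = """country,beer_servings,wine_servings
Afghanistan,0,0
Albania,89,54
Algeria,25,14
"""

# No dtype: pandas infers integer types for both numeric columns
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype dict: only beer_servings is forced to float; other columns are still inferred
typed = pd.read_csv(io.StringIO(csv_text), dtype={'beer_servings': float})

print(inferred.dtypes['beer_servings'])  # int64
print(typed.dtypes['beer_servings'])     # float64
print(typed.dtypes['wine_servings'])     # int64
```
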
In [13]:
# this file is tab-separated, which matches read_table's default separator
url = 'http://bit.ly/chiporders'
orders = pd.read_table(url)
In [14]:
orders.head()
Out[14]:
   order_id  quantity                              item_name                                 choice_description  item_price
0         1         1           Chips and Fresh Tomato Salsa                                                NaN       $2.39
1         1         1                                   Izze                                       [Clementine]       $3.39
2         1         1                       Nantucket Nectar                                            [Apple]       $3.39
3         1         1  Chips and Tomatillo-Green Chili Salsa                                                NaN       $2.39
4         2         2                           Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...      $16.98
In [15]:
orders.dtypes
Out[15]:
order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

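The issue here is that pandas does not recognize item_price as a number: each value starts with a dollar sign, so the column is read as strings (object dtype) instead of float64.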

In [18]:
# use the .str accessor to strip the literal '$' (regex=False), then convert to float
orders['item_price'] = orders.item_price.str.replace('$', '', regex=False).astype(float)
In [19]:
orders.dtypes
Out[19]:
order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
In [20]:
# we can now calculate the mean
orders.item_price.mean()
Out[20]:
7.464335785374397
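
.astype(float) works here because every stripped value is a valid number; a single unparseable entry would make it raise a ValueError. When that is possible, pd.to_numeric(..., errors='coerce') is an alternative that turns bad entries into NaN instead. A minimal sketch with made-up price strings, one of them deliberately invalid:

```python
import pandas as pd

# Made-up price strings; 'N/A' is not a number
prices = pd.Series(['$2.39', '$3.39', '$16.98', 'N/A'])

# Remove the dollar sign as a literal string, not a regex pattern
stripped = prices.str.replace('$', '', regex=False)

# errors='coerce' converts anything unparseable to NaN instead of raising
cleaned = pd.to_numeric(stripped, errors='coerce')
print(cleaned)

# mean() skips NaN, so this averages only the three valid prices
print(cleaned.mean())
```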

To find out whether each row of a column contains a certain string, use .str.contains(), which returns True or False for every row.

In [22]:
orders['item_name'].str.contains('Chicken').head()
Out[22]:
0    False
1    False
2    False
3    False
4     True
Name: item_name, dtype: bool
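
item_name has no missing values, but a column such as choice_description does (the NaN rows in the preview above). On an object-dtype column, .str.contains() returns NaN for those rows rather than False, which would make a later .astype(int) fail. Passing na=False fills them, and case=False makes the match case-insensitive. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up descriptions stored as object dtype; one row is missing
desc = pd.Series(['[Clementine]', np.nan, '[Tomatillo-Red Chili Salsa (Hot)]'],
                 dtype=object)

# Default behaviour: the missing row comes back as NaN, not False
raw = desc.str.contains('salsa', case=False)
print(raw)

# na=False treats missing rows as "does not contain", giving a clean bool Series
has_salsa = desc.str.contains('salsa', case=False, na=False)
print(has_salsa.astype(int))
```
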
In [23]:
# convert the booleans to integers (True becomes 1, False becomes 0)
orders['item_name'].str.contains('Chicken').astype(int).head()
Out[23]:
0    0
1    0
2    0
3    0
4    1
Name: item_name, dtype: int64
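
To keep the result, the 0/1 values can be stored as a new column, and summing that column counts the matching rows. A minimal sketch using the five item names from the orders preview above; the column name is_chicken is a hypothetical choice:

```python
import pandas as pd

# The five item names from the orders preview
orders = pd.DataFrame({'item_name': [
    'Chips and Fresh Tomato Salsa',
    'Izze',
    'Nantucket Nectar',
    'Chips and Tomatillo-Green Chili Salsa',
    'Chicken Bowl',
]})

# Hypothetical indicator column: 1 if the name contains 'Chicken', else 0
orders['is_chicken'] = orders['item_name'].str.contains('Chicken').astype(int)

# Summing the 0/1 column counts the matching rows
print(orders['is_chicken'].sum())  # 1
```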