Using String Methods in Pandas

This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.


In [1]:
# convert string to uppercase in Python
'hello'.upper()
Out[1]:
'HELLO'

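How about string methods in pandas?
There are many! pandas exposes them through the .str accessor, which applies a string method to every element of a Series.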
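A Python string method works on one string at a time, while a Series has no upper() of its own. The .str accessor is what makes the vectorized version available. A minimal sketch on a small hypothetical Series of item names (not the real dataset):

```python
import pandas as pd

# Hypothetical item names, used only for illustration
items = pd.Series(["Izze", "Chicken Bowl", "Canned Soda"])

# Built-in Python string method: one string at a time
print("Izze".upper())

# A Series does not have .upper() itself, so this raises AttributeError
try:
    items.upper()
except AttributeError as err:
    print("AttributeError:", err)

# The .str accessor applies upper() to every element of the Series
print(items.str.upper().tolist())
```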

In [2]:
import pandas as pd
In [3]:
# read the tab-separated orders data (read_table splits on tabs by default)
url = 'http://bit.ly/chiporders'
orders = pd.read_table(url)
In [4]:
orders.head()
Out[4]:
   order_id  quantity  item_name                              choice_description                                 item_price
0  1         1         Chips and Fresh Tomato Salsa           NaN                                                $2.39
1  1         1         Izze                                   [Clementine]                                       $3.39
2  1         1         Nantucket Nectar                       [Apple]                                            $3.39
3  1         1         Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
4  2         2         Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
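pd.read_table is a reader for delimited text whose default separator is a tab, so it should give the same result as pd.read_csv with sep='\t'. A minimal check, assuming a tiny hypothetical tab-separated sample held in memory so no network access is needed:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample modeled on the orders columns
tsv = "order_id\tquantity\titem_name\n1\t1\tIzze\n2\t2\tChicken Bowl\n"

# read_table defaults to a tab separator
a = pd.read_table(io.StringIO(tsv))

# read_csv needs the separator spelled out
b = pd.read_csv(io.StringIO(tsv), sep="\t")

print(a.equals(b))
```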

Making the item_name column uppercase

In [5]:
# .str is the string accessor: it applies a string method to each element of the Series
orders.item_name.str.upper()
Out[5]:
0                CHIPS AND FRESH TOMATO SALSA
1                                        IZZE
2                            NANTUCKET NECTAR
3       CHIPS AND TOMATILLO-GREEN CHILI SALSA
4                                CHICKEN BOWL
5                                CHICKEN BOWL
6                               SIDE OF CHIPS
7                               STEAK BURRITO
8                            STEAK SOFT TACOS
9                               STEAK BURRITO
10                        CHIPS AND GUACAMOLE
11                       CHICKEN CRISPY TACOS
12                         CHICKEN SOFT TACOS
13                               CHICKEN BOWL
14                        CHIPS AND GUACAMOLE
15      CHIPS AND TOMATILLO-GREEN CHILI SALSA
16                            CHICKEN BURRITO
17                            CHICKEN BURRITO
18                                CANNED SODA
19                               CHICKEN BOWL
20                        CHIPS AND GUACAMOLE
21                           BARBACOA BURRITO
22                           NANTUCKET NECTAR
23                            CHICKEN BURRITO
24                                       IZZE
25               CHIPS AND FRESH TOMATO SALSA
26                               CHICKEN BOWL
27                           CARNITAS BURRITO
28                                CANNED SODA
29                            CHICKEN BURRITO
                        ...                  
4592                         BARBACOA BURRITO
4593                            CARNITAS BOWL
4594                            BARBACOA BOWL
4595                             CHICKEN BOWL
4596                      CHIPS AND GUACAMOLE
4597                        CANNED SOFT DRINK
4598                            BOTTLED WATER
4599                             CHICKEN BOWL
4600                      CHIPS AND GUACAMOLE
4601                        CANNED SOFT DRINK
4602                         BARBACOA BURRITO
4603                         BARBACOA BURRITO
4604                             CHICKEN BOWL
4605                      CHIPS AND GUACAMOLE
4606                        CANNED SOFT DRINK
4607                            STEAK BURRITO
4608                           VEGGIE BURRITO
4609                        CANNED SOFT DRINK
4610                            STEAK BURRITO
4611                           VEGGIE BURRITO
4612                            CARNITAS BOWL
4613                                    CHIPS
4614                            BOTTLED WATER
4615                       CHICKEN SOFT TACOS
4616                      CHIPS AND GUACAMOLE
4617                            STEAK BURRITO
4618                            STEAK BURRITO
4619                       CHICKEN SALAD BOWL
4620                       CHICKEN SALAD BOWL
4621                       CHICKEN SALAD BOWL
Name: item_name, dtype: object
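.str.upper() returns a new Series with the same index and name; the original column is left unchanged until the result is assigned back. A small sketch on a hypothetical two-row DataFrame:

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame({"item_name": ["Izze", "Chicken Bowl"]})

upper = df["item_name"].str.upper()

print(upper.name)                    # item_name
print(df["item_name"].tolist())      # ['Izze', 'Chicken Bowl']  (original unchanged)
print(upper.index.equals(df.index))  # True
```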
In [6]:
# overwrite the item_name column with its uppercase version
orders.item_name = orders.item_name.str.upper()
In [7]:
orders.head()
Out[7]:
   order_id  quantity  item_name                              choice_description                                 item_price
0  1         1         CHIPS AND FRESH TOMATO SALSA           NaN                                                $2.39
1  1         1         IZZE                                   [Clementine]                                       $3.39
2  1         1         NANTUCKET NECTAR                       [Apple]                                            $3.39
3  1         1         CHIPS AND TOMATILLO-GREEN CHILI SALSA  NaN                                                $2.39
4  2         2         CHICKEN BOWL                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
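Assigning through the attribute (orders.item_name = ...) works because item_name already exists. pandas will not create a brand-new column through attribute assignment (it warns and sets a plain attribute instead), whereas bracket notation handles both cases. A sketch on a hypothetical two-row DataFrame; the column name item_lower is made up for illustration:

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame({"item_name": ["Izze", "Chicken Bowl"]})

# Bracket notation overwrites an existing column...
df["item_name"] = df["item_name"].str.upper()

# ...and can also create a new one (item_lower is a hypothetical name)
df["item_lower"] = df["item_name"].str.lower()

print(df)
```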

Check for the presence of a substring
.str.contains() returns a boolean Series (True or False for each row), which is useful for filtering data.

In [9]:
# item_name was uppercased above, so match without regard to case
orders.item_name.str.contains('Chicken', case=False).head()
Out[9]:
0    False
1    False
2    False
3    False
4     True
Name: item_name, dtype: bool
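To actually filter, the boolean Series can be used as a mask inside []. A column with missing values (like choice_description) yields NaN in the mask unless na=False is passed, which counts missing entries as non-matches. A sketch on hypothetical rows modeled on the orders data:

```python
import pandas as pd

# Hypothetical rows modeled on the orders data
orders = pd.DataFrame({
    "item_name": ["CHIPS", "CHICKEN BOWL", "IZZE", "CHICKEN SALAD BOWL"],
    "choice_description": [None, "[Fresh Tomato Salsa]", "[Clementine]", None],
})

# Boolean mask: True where item_name contains "chicken" in any case
mask = orders["item_name"].str.contains("chicken", case=False)
chicken_orders = orders[mask]
print(chicken_orders)

# na=False treats missing descriptions as non-matches, keeping the mask boolean
salsa_orders = orders[orders["choice_description"].str.contains("Salsa", na=False)]
print(salsa_orders)
```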

Replace characters and chain string methods

In [11]:
# remove the opening brackets ('[' is matched as a literal character)
orders.choice_description.str.replace('[', '', regex=False).head()
Out[11]:
0                                                  NaN
1                                          Clementine]
2                                               Apple]
3                                                  NaN
4    Tomatillo-Red Chili Salsa (Hot), Black Beans, ...
Name: choice_description, dtype: object
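The NaN values in rows 0 and 3 pass through unchanged: .str methods skip missing values rather than raising an error. A small check on a hypothetical Series with one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing value in the middle
s = pd.Series(["[Clementine]", np.nan, "[Apple]"])

# regex=False makes the literal treatment of "[" explicit
result = s.str.replace("[", "", regex=False)

print(result.tolist())
print(result.isna().tolist())  # [False, True, False]
```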
In [13]:
# chain string methods: each .str call returns a Series, so another .str call can follow
orders.choice_description.str.replace('[', '', regex=False).str.replace(']', '', regex=False).head()
Out[13]:
0                                                  NaN
1                                           Clementine
2                                                Apple
3                                                  NaN
4    Tomatillo-Red Chili Salsa (Hot), Black Beans, ...
Name: choice_description, dtype: object
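Chaining two replace calls removes every bracket, including the nested ones in row 4. For comparison, .str.strip('[]') only trims characters from the ends of each string, so an inner '[' would survive. A sketch on a hypothetical value shaped like row 4:

```python
import pandas as pd

# Hypothetical value with nested brackets, shaped like row 4 of choice_description
s = pd.Series(["[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice]]"])

# Chained replace calls remove every "[" and "]"
chained = s.str.replace("[", "", regex=False).str.replace("]", "", regex=False)
print(chained[0])   # Tomatillo-Red Chili Salsa (Hot), Black Beans, Rice

# strip trims only the leading/trailing brackets, so the inner "[" remains
stripped = s.str.strip("[]")
print(stripped[0])  # Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice
```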
In [16]:
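# a regex character class removes both brackets in one call
# (regex=True is required in pandas 2.0+, where the default is regex=False)
orders.choice_description.str.replace(r'[\[\]]', '', regex=True).head()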
Out[16]:
0                                                  NaN
1                                           Clementine
2                                                Apple
3                                                  NaN
4    Tomatillo-Red Chili Salsa (Hot), Black Beans, ...
Name: choice_description, dtype: object
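Whether the pattern is treated as a regular expression depends on the regex argument. Since pandas 2.0 the default is regex=False, in which case the text [\[\]] is searched for literally and nothing is replaced. The raw string (r'...') keeps Python from interpreting the backslashes itself. A sketch on a hypothetical two-element Series:

```python
import pandas as pd

# Hypothetical two-element Series
s = pd.Series(["[Clementine]", "[Apple]"])

pattern = r"[\[\]]"  # character class matching "[" or "]"

# As a regular expression: both brackets are removed
as_regex = s.str.replace(pattern, "", regex=True)
print(as_regex.tolist())    # ['Clementine', 'Apple']

# As a literal string: the text "[\[\]]" never occurs, so nothing changes
as_literal = s.str.replace(pattern, "", regex=False)
print(as_literal.tolist())  # ['[Clementine]', '[Apple]']
```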