Easy tabular data reading and manipulation with pandas
Reading Tabular Data

This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.

Reading a tabular data file into pandas

Tabular data file examples

  • csv
  • excel
  • table-like data format
In [1]:
# import pandas
import pandas as pd
In [8]:
# reading a well-formatted .tsv file
url = 'http://bit.ly/chiporders'
orders = pd.read_table(url)
orders.head()
Out[8]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98

read_table assumptions

  • file is separated by tabs
  • presence of a header role
In [10]:
url2 = 'http://bit.ly/movieusers'
users = pd.read_table(url2)
users.head()
Out[10]:
1|24|M|technician|85711
0 2|53|F|other|94043
1 3|23|M|writer|32067
2 4|24|M|technician|43537
3 5|33|F|other|15213
4 6|42|M|executive|98101

Issues

  • Separator is a pipe character
    • We need to tell pandas that this is the separator using sep=
  • There is no header
    • We need to use header=None
    • We can add a row of names for the columns using names=user_cols
In [15]:
user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_table(url2, sep='|', header=None, names=user_cols)
users.head()
Out[15]:
user_id age gender occupation zip_code
0 1 24 M technician 85711
1 2 53 F other 94043
2 3 23 M writer 32067
3 4 24 M technician 43537
4 5 33 F other 15213

Tips

  • If you've a data file where you've some text at the top and bottom of the file
    • skiprows=None
      • Skip rows at the top or bottom
    • skipfooter=None
Tags: pandas