Working with date and time in pandas
Working with Date and Time

This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.

Working with dates and times in pandas

In [1]:
import pandas as pd
In [2]:
url = 'http://bit.ly/uforeports'
ufo = pd.read_csv(url)
In [3]:
ufo.head()
Out[3]:
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00
In [4]:
ufo.dtypes
Out[4]:
City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object
In [10]:
# 5 characters from the end to 3 characters from the end
# this is quite a chore
ufo.Time.str.slice(-5, -3).astype(int).head()
Out[10]:
0    22
1    20
2    14
3    13
4    19
Name: Time, dtype: int64
In [14]:
# we will convert the Time column to datatime format
# there are many options to ensure this works well with your data
ufo['Time'] = pd.to_datetime(ufo.Time)
ufo.head()
Out[14]:
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 1930-06-01 22:00:00
1 Willingboro NaN OTHER NJ 1930-06-30 20:00:00
2 Holyoke NaN OVAL CO 1931-02-15 14:00:00
3 Abilene NaN DISK KS 1931-06-01 13:00:00
4 New York Worlds Fair NaN LIGHT NY 1933-04-18 19:00:00
In [12]:
ufo.dtypes
Out[12]:
City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object
In [16]:
ufo.Time.dt.hour.head()
Out[16]:
0    22
1    20
2    14
3    13
4    19
Name: Time, dtype: int64
In [18]:
ufo.Time.dt.weekday_name.head()
Out[18]:
0     Sunday
1     Monday
2     Sunday
3     Monday
4    Tuesday
Name: Time, dtype: object
In [19]:
ufo.Time.dt.weekday.head()
Out[19]:
0    6
1    0
2    6
3    0
4    1
Name: Time, dtype: int64
In [20]:
ufo.Time.dt.dayofyear.head()
Out[20]:
0    152
1    181
2     46
3    152
4    108
Name: Time, dtype: int64

Timestamps
We can use it for comparison and mathematical operations

In [22]:
ts = pd.to_datetime('1/1/1999')
In [25]:
ufo.loc[ufo.Time >= ts, :].head()
Out[25]:
City Colors Reported Shape Reported State Time
12832 Loma Rica NaN LIGHT CA 1999-01-01 02:30:00
12833 Bauxite NaN NaN AR 1999-01-01 03:00:00
12834 Florence NaN CYLINDER SC 1999-01-01 14:00:00
12835 Lake Henshaw NaN CIGAR CA 1999-01-01 15:00:00
12836 Wilmington Island NaN LIGHT GA 1999-01-01 17:15:00
In [26]:
ufo.Time.max()
Out[26]:
Timestamp('2000-12-31 23:59:00')
In [27]:
ufo.Time.max() - ufo.Time.min()
Out[27]:
Timedelta('25781 days 01:59:00')
In [28]:
(ufo.Time.max() - ufo.Time.min()).days
Out[28]:
25781

Plotting

In [29]:
%matplotlib inline
In [30]:
ufo['Year'] = ufo.Time.dt.year
In [31]:
ufo.head()
Out[31]:
City Colors Reported Shape Reported State Time Year
0 Ithaca NaN TRIANGLE NY 1930-06-01 22:00:00 1930
1 Willingboro NaN OTHER NJ 1930-06-30 20:00:00 1930
2 Holyoke NaN OVAL CO 1931-02-15 14:00:00 1931
3 Abilene NaN DISK KS 1931-06-01 13:00:00 1931
4 New York Worlds Fair NaN LIGHT NY 1933-04-18 19:00:00 1933
In [39]:
ufo.Year.value_counts().sort_index().plot()
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x118989da0>
Tags: pandas