This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.

Using "groupby" in pandas

import pandas as pd

url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)

drinks.head()

# get the mean of the beer_servings column
drinks.beer_servings.mean()

106.16062176165804

# mean beer_servings for each continent, using .groupby
drinks.groupby('continent').beer_servings.mean()

continent
Africa            61.471698
Asia              37.045455
Europe           193.777778
North America    145.434783
Oceania           89.687500
South America    175.083333
Name: beer_servings, dtype: float64

# select all rows where the "continent" column is 'Africa'
drinks[drinks.continent=='Africa'].head()

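# numeric_only=True skips text columns (country, continent); pandas 2.0+ raises a TypeError without it
drinks[drinks.continent=='Africa'].mean(numeric_only=True)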

beer_servings                   61.471698
spirit_servings                 16.339623
wine_servings                   16.264151
total_litres_of_pure_alcohol     3.007547
dtype: float64

drinks[drinks.continent=='Africa'].beer_servings.mean()

61.471698113207545

drinks[drinks.continent=='Europe'].beer_servings.mean()

193.77777777777777

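This is the same value that .groupby returned for Europe, and the Africa value above matches as well.

That is because .groupby('continent') splits the rows by continent and then computes the mean of beer_servings within each group, which is what the filters above do one continent at a time.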
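The equivalence between filtering and .groupby can be sketched on a small frame. The toy data below is made up for illustration and is not the drinks dataset; only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical toy data standing in for the drinks dataset (values are made up)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'continent': ['Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200, 300],
})

# Manual approach: filter one continent at a time, then take the mean
manual = {
    name: toy[toy.continent == name].beer_servings.mean()
    for name in toy.continent.unique()
}

# .groupby does the same split / apply / combine in a single step
grouped = toy.groupby('continent').beer_servings.mean()

print(manual)
print(grouped)
```

Both approaches give one mean per continent; .groupby simply avoids writing a filter for every group.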

.groupby max and min

drinks.groupby('continent').beer_servings.max()

continent
Africa           376
Asia             247
Europe           361
North America    285
Oceania          306
South America    333
Name: beer_servings, dtype: int64

drinks.groupby('continent').beer_servings.min()

continent
Africa            0
Asia              0
Europe            0
North America     1
Oceania           0
South America    93
Name: beer_servings, dtype: int64

Aggregate findings

drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean'])
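As a variation, .agg also accepts keyword arguments that name the result columns (named aggregation, available from pandas 0.25). A minimal sketch on made-up toy data; the output column names (countries, lowest, highest, average) are my own choices, not part of the original notes.

```python
import pandas as pd

# Hypothetical toy data (values are made up, not the drinks dataset)
toy = pd.DataFrame({
    'continent': ['Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200, 300],
})

# Each keyword becomes a column name; each value is the aggregation to apply
summary = toy.groupby('continent').beer_servings.agg(
    countries='count',
    lowest='min',
    highest='max',
    average='mean',
)
print(summary)
```

This returns the same kind of table as passing a list of function names, just with more descriptive headers.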

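You can get the mean of all numeric columns instead of specifying beer_servings. Passing numeric_only=True skips text columns such as country, which pandas 2.0 and later no longer drop automatically.

drinks.groupby('continent').mean(numeric_only=True)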
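There are two ways to restrict a grouped mean to numeric columns: pass numeric_only=True, or select the columns explicitly before aggregating. A small sketch, assuming made-up toy data with one text column (country) alongside two numeric ones:

```python
import pandas as pd

# Hypothetical toy data (values are made up, not the drinks dataset)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'continent': ['Africa', 'Africa', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200],
    'wine_servings': [2, 4, 50, 150],
})

# Option 1: let pandas skip the non-numeric 'country' column
all_means = toy.groupby('continent').mean(numeric_only=True)

# Option 2: name the columns to aggregate explicitly
picked = toy.groupby('continent')[['beer_servings', 'wine_servings']].mean()

print(all_means)
print(picked)
```

The explicit selection is more verbose but makes it obvious which columns end up in the result.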

Visualization

# allow plots to appear in notebook using matplotlib
%matplotlib inline

# numeric_only=True skips the text column country (required in pandas 2.0+)
data = drinks.groupby('continent').mean(numeric_only=True)
data

data.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x1179ad1d0>
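Plotting every column at once puts servings and litres on the same axis, so a single sorted column is often easier to read. A minimal sketch on made-up toy data; the title, axis label, and output file name are my own assumptions.

```python
import matplotlib
# Agg lets this run as a plain script; in a notebook, drop this line and keep %matplotlib inline
matplotlib.use('Agg')
import pandas as pd

# Hypothetical toy data (values are made up, not the drinks dataset)
toy = pd.DataFrame({
    'continent': ['Europe', 'Europe', 'Europe', 'Africa', 'Africa'],
    'beer_servings': [100, 200, 300, 10, 30],
})

# Mean of one column per continent, sorted so the bars rise from left to right
means = toy.groupby('continent').beer_servings.mean().sort_values()

ax = means.plot(kind='bar', title='Mean beer servings by continent')
ax.set_ylabel('beer_servings')

# Hypothetical output file name
ax.figure.savefig('beer_by_continent.png')
```

Series.plot returns the matplotlib Axes, so labels and titles can be adjusted afterwards with the usual Axes methods.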

Output of drinks.head():

       country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0       Asia
1      Albania             89              132             54                           4.9     Europe
2      Algeria             25                0             14                           0.7     Africa
3      Andorra            245              138            312                          12.4     Europe
4       Angola            217               57             45                           5.9     Africa

Output of drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean']):

               count  min  max        mean
continent
Africa            53    0  376   61.471698
Asia              44    0  247   37.045455
Europe            45    0  361  193.777778
North America     23    1  285  145.434783
Oceania           16    0  306   89.687500
South America     12   93  333  175.083333