Calculate mean and median
Central Tendency

Measures of Central Tendency

Describing a distribution using measures of center

  1. Mode
    • Value (on the x-axis) at which frequency is highest
    • Other cases
      • May be a range (bin) of values that occurs with the highest frequency
      • No mode for uniform distributions
      • May have multiple modes
      • May be categorical
        • x-axis: categories (plain and peanut)
        • y-axis: frequencies (plain = 60,000, peanut = 10,000)
          • Mode = plain (the category on the x-axis)
          • 60,000 and 10,000 are frequencies, not the mode
      • Not every score in the dataset affects the mode
        • [2, 2, 3, 4, 100]: mode = 2
        • The mode is still 2 even if we add a very large number such as 10000
      • The mode can change from sample to sample
        • It may differ from the population's mode
      • The mode of grouped data can change with the bin size
      • There is no equation for calculating the mode
  2. Median
    • Odd number of values: the middle value of the sorted data
    • Even number of values: the mean of the two middle values of the sorted data
    • Properties
      • Not strongly affected by outliers
      • Does not take every score into account (only the middle one or two values)
  3. Mean
    • The arithmetic average: sum of all values divided by the number of values
    • Properties
      • All scores of a distribution affect the mean
      • The mean has a formula: x̄ = (Σ xᵢ) / n
      • Different samples from the same population tend to have similar means
      • The mean is affected by outliers
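The properties listed above can be checked on the [2, 2, 3, 4, 100] example with a small from-scratch sketch. The helper names `modes`, `median` and `mean` are hypothetical, written only for illustration; they are not part of pandas or of the original notebook.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency (several if multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def mean(values):
    """Sum of all values divided by how many there are."""
    return sum(values) / len(values)


scores = [2, 2, 3, 4, 100]
with_outlier = scores + [10000]  # add one very large score

for label, data in [("original", scores), ("with 10000", with_outlier)]:
    print(label, "mode:", modes(data), "median:", median(data), "mean:", mean(data))

# Counter also works on category labels, so a categorical mode comes out the same way
print(modes(["plain", "plain", "peanut"]))  # ['plain']
```

The mode stays at 2 and the median only moves from 3 to 3.5 (because the count became even, not because of how large 10000 is), while the mean jumps from 22.2 to roughly 1685.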
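To illustrate how the mode of grouped data depends on the bin size, here is a minimal sketch that bins values into equal-width intervals starting at 0 and reports the most frequent bin(s). The helper `modal_bins` and the `sample` values are hypothetical, chosen only to make the effect visible.

```python
from collections import Counter


def modal_bins(values, width):
    """Group values into [k*width, (k+1)*width) bins; return the most frequent bin(s)."""
    counts = Counter(v // width for v in values)
    top = max(counts.values())
    return sorted((k * width, (k + 1) * width) for k, c in counts.items() if c == top)


# Hypothetical sample, used only to show the effect of bin width
sample = [1, 2, 3, 4, 10, 10, 11]

print(modal_bins(sample, 2))  # [(10, 12)]
print(modal_bins(sample, 5))  # [(0, 5)]
```

With narrow bins the cluster around 10 wins; with wider bins the four small values fall into one bin and the modal range moves to 0–5.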

Calculating measures of central tendency in Pandas

In [55]:
import pandas as pd
In [56]:
url = './fb_data.csv'
data = pd.read_csv(url, header=None)
In [64]:
data
Out[64]:
0
0 0
1 69
2 123
3 137
4 174
5 240
6 241
7 256
8 258
9 322
10 366
11 376
12 408
13 479
14 555
15 589
16 600
17 777
18 784
19 822
20 850
21 863
22 1116
23 1143
24 1214
25 1250
26 1776
In [58]:
# sorted(data) only sorts the column labels of a DataFrame, not its values;
# sort_values() reorders the rows by the values in column 0
data = data.sort_values(by=0)
data
Out[58]:
(identical to Out[64] above: the values were already in ascending order)
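Why `sorted(data)` does not sort the values can be checked on a small stand-in DataFrame. This sketch assumes a single unnamed column labelled `0`, the same shape that `read_csv(..., header=None)` produces; the four values are a subset of the listing above.

```python
import pandas as pd

# Small stand-in for fb_data.csv: one column whose label is the integer 0
df = pd.DataFrame({0: [258, 69, 1776, 0]})

# Iterating over a DataFrame yields its column labels, so sorted() sorts only the labels
print(sorted(df))                        # [0]

# sort_values() returns a new DataFrame with the rows ordered by column 0
print(df.sort_values(by=0)[0].tolist())  # [0, 69, 258, 1776]
```

`sort_values()` does not modify `df` in place unless `inplace=True` is passed, which is why the result has to be assigned back.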
In [59]:
# Since this is a pandas DataFrame, we can use mean() and median() methods
type(data)
Out[59]:
pandas.core.frame.DataFrame
In [60]:
data.mean()
Out[60]:
0    584.740741
dtype: float64
In [61]:
data.median()
Out[61]:
0    479.0
dtype: float64
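As a cross-check, the mean and median can be computed by hand from the 27 values listed in Out[64] above. In this sketch the values are typed in directly, so it runs without `fb_data.csv`.

```python
import pandas as pd

# The 27 values from the Out[64] listing above
values = [0, 69, 123, 137, 174, 240, 241, 256, 258, 322, 366, 376, 408, 479,
          555, 589, 600, 777, 784, 822, 850, 863, 1116, 1143, 1214, 1250, 1776]

n = len(values)                          # 27, an odd count
by_hand_mean = sum(values) / n           # 15788 / 27
by_hand_median = sorted(values)[n // 2]  # the 14th smallest value (index 13)

s = pd.Series(values)
print(n, sum(values), round(by_hand_mean, 6), by_hand_median)  # 27 15788 584.740741 479
print(s.mean(), s.median())
```

The mean sits above the median because the largest values (up to 1776) pull it upward, while the median depends only on the middle value.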
In [63]:
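# every value occurs exactly once, so the frequencies are flat (uniform) and
# there is no single mode: pandas returns all tied values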
data.mode()
Out[63]:
0
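The behaviour of `mode()` when every value is unique can be seen on a shorter series. This sketch uses the first five values of the data above plus the [2, 2, 3, 4, 100] example from the notes.

```python
import pandas as pd

# Every value occurs exactly once, as in the fb_data column
s = pd.Series([0, 69, 123, 137, 174])
print(s.value_counts().max())  # 1: no value is more frequent than any other
print(s.mode().tolist())       # [0, 69, 123, 137, 174]: all values tie as modes

# With a repeated value, mode() returns only the most frequent one
t = pd.Series([2, 2, 3, 4, 100])
print(t.mode().tolist())       # [2]
```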