# The Best “Average”? That Depends.

In everyday use, the word “average” means what statisticians call the mean. In statistics, however, “average” can refer to any of the three measures of central tendency: mean, median, and mode. As a quick review, the mean is the arithmetic average and is found by adding all the values in a data set and dividing by the total number of values. The median is the value in the middle of the data set when the data are in numerical order (with an even number of values, it is the average of the two middle values). The mode is the data value that occurs most often in the set. For any particular data set, we may not be able to find all three of these measures, and when we can, determining which is the “best” average is sometimes a matter of judgment.
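The three definitions above can be written out directly. The following is a minimal sketch in Python, not a reference implementation; the function names and the `sample` data are made up for illustration.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Every value tied for the highest frequency; a data set can have more than one mode."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


sample = [2, 3, 3, 5, 7]  # hypothetical data
print(mean(sample), median(sample), modes(sample))  # 4.0 3 [3]
```

Returning a list from `modes` is a design choice for this sketch: it makes ties visible instead of silently picking one value.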

## The Mode and Median

In most cases in which quantitative data are collected, the mean is the preferred measure of central tendency. However, some types of data do not lend themselves to even having a mean, and in other cases, the mean is not the best practical measure of the data. Here are a couple of examples of these situations.

If the data are nominal, that is, categories for which we record counts, such as votes in an election, the only measure of central tendency that applies is the mode. We just need to know which category, such as which candidate for an office, has the highest frequency: the most votes. With categorical data, where the categories are not numerical values, it is not even possible to find a mean or median.
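As a small illustration of finding the mode of nominal data, the sketch below tallies a handful of ballots. The candidate names and the ballots are hypothetical.

```python
from collections import Counter

# Hypothetical ballots: each entry is the candidate named on one ballot.
ballots = ["Rivera", "Chen", "Rivera", "Okafor", "Chen", "Rivera"]

tally = Counter(ballots)                 # frequency of each category
winner, votes = tally.most_common(1)[0]  # category with the highest frequency
print(winner, votes)  # Rivera 3
```

Note that the names cannot be added or put in numerical order, so no mean or median exists for them. If two candidates tied, `most_common(1)` would report only one of them, so a real tally would need to check for ties.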

The median, along with quartiles, deciles, and percentiles, is used to segment the data into equal-sized groups, regardless of the specific values. So the median is best used when we want to divide the data set into two equal groups. One use of the median is with income data. Reports of incomes in the United States use the median income, not the mean (though the mean can be calculated), as the average because the median marks the point below which half of the incomes fall. (In 2014, for example, the U.S. median household income was \$53,657.)
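To see the median acting as a dividing point, here is a short sketch using Python's standard `statistics` module. The income figures are invented for illustration, and `statistics.quantiles` is called with its default interpolation method, which is only one of several conventions for computing quartiles.

```python
import statistics

# Hypothetical household incomes, in thousands of dollars.
incomes = [18, 25, 31, 40, 47, 52, 58, 66, 90, 240]

middle = statistics.median(incomes)

# The median splits the data into two groups of equal size,
# no matter how large the highest incomes are.
below = [x for x in incomes if x < middle]
above = [x for x in incomes if x > middle]
print(middle, len(below), len(above))  # 49.5 5 5

# Quartiles extend the same idea to four equal-sized groups.
quartiles = statistics.quantiles(incomes, n=4)
print(quartiles)
```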

## The Mean or Not the Mean

When the data that are collected have specific numerical values, the mean is usually the best choice for the measure of central tendency. The reason the mean is the best is that it takes into account all the values in the data set. Consider this set of hypothetical data, which might represent the salaries (in thousands) of employees at a company:

30        30        30        30        30        30        30        32        32        32

32        32        34        34        34        34        38        38        38        45

For these data, we get

mean = 33.25

median = 32

mode = 30

While these values are relatively close, the mean would be the preferred measure of central tendency because it was calculated using all the data values.
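These three measures can be checked with Python's standard `statistics` module. This is just a quick verification sketch; the variable name `salaries` is chosen here for illustration.

```python
import statistics

# The 20 salaries (in thousands) from the data set above.
salaries = [30] * 7 + [32] * 5 + [34] * 4 + [38] * 3 + [45]

print(sum(salaries), len(salaries))   # 665 20
print(statistics.mean(salaries))      # 33.25
print(statistics.median(salaries))    # 32.0
print(statistics.mode(salaries))      # 30
```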

With data that are normally or nearly normally distributed, the mean, median, and mode are generally close to each other. When the data fluctuate but are not severely skewed, the mean is still generally considered the best measure of central tendency. However, for skewed data or data sets with outliers, another measure of central tendency may be better than the mean.

Let’s take a look at how an outlier affects the measures of central tendency, in particular the mean. An outlier is a data value that falls well outside the majority of the data set. Consider the set of data given above with one change (the last value is 100 instead of 45), and find the mean, median, and mode again.

30        30        30        30        30        30        30        32        32        32

32        32        34        34        34        34        38        38        38        100

Computing the measures of central tendency again, we find the following:

mean = 36

median = 32

mode = 30

So which of these values “best” represents the data? Here, the outlier has a big effect on the mean, raising it by about 3 (roughly \$3,000) to 36, so that it is larger than all but the highest four values in the data set. The median, which did not change, represents the midway mark of all the values, and in this data set it is a fairer “average” than the mean. The mode, which also did not change, is arguably a better measure than the mean as well, but because it is the lowest value in the data set, it is not a better measure of central tendency than the median.
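The comparison can be reproduced with a short sketch that computes each measure before and after the change. It uses Python's standard `statistics` module; the variable names are chosen for illustration.

```python
import statistics

# Original salaries (in thousands) and the same set with the last value replaced by 100.
base = [30] * 7 + [32] * 5 + [34] * 4 + [38] * 3 + [45]
with_outlier = base[:-1] + [100]

measures = [
    ("mean", statistics.mean),
    ("median", statistics.median),
    ("mode", statistics.mode),
]

for name, measure in measures:
    before, after = measure(base), measure(with_outlier)
    print(f"{name}: {before} -> {after} (change {after - before})")

# How many salaries lie above the new mean?
above_mean = sum(1 for x in with_outlier if x > statistics.mean(with_outlier))
print(above_mean)  # 4
```

Only the mean moves: a single changed value shifts the sum, while the middle position and the most frequent value stay where they were.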