Measures: Central Tendency

When we collect data, we often want to know where the values tend to gather, that is, where the ‘crowd’ of numbers is. The statistics that describe this are called measures of central tendency. The three main measures of central tendency are the mean, the median and the mode. In any given set of data, these may be three very different values, or they may be the same or nearly the same. Let’s take a brief look at each of these measures and then show how to find each using a small set of data.

Mean

The mean of a set of data is what we usually think of as the “average,” though in data science, “average” can be any of the measures of central tendency. We get the mean by adding up all the values in the data set and dividing the sum by the number of values.
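The definition above can be sketched in a few lines of Python. This is only an illustration: the function name mean and the toy values are assumptions chosen for the sketch, not part of the example later in this article.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
sample = [2, 4, 9]
print(mean(sample))  # (2 + 4 + 9) / 3 = 5.0
```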

Median

The median of a set of data is the value that is in the middle (like the median of the road) when the values are in numerical order. If there are an even number of values, then the median is the average of the two middle values. (Add the two middle values and divide by 2.)
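As a minimal sketch of the rule just described, the following Python function sorts the values and then handles the odd and even cases separately. The function name median and the toy lists are assumptions made for illustration.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)  # the values must be in numerical order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle value
    # even count: add the two middle values and divide by 2
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical toy data.
print(median([7, 1, 4]))      # 4
print(median([7, 1, 4, 10]))  # 5.5
```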

Mode

The mode of a set of data is the value that occurs the most often. We find this by determining the frequency of each value (that is, how many times each value appears in the data set). A data set can have more than one mode; a data set with two modes is called bimodal. In most cases, though, if more than two values tie for the highest frequency, the data set is said to have no mode.
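One way to sketch this frequency count in Python is with collections.Counter from the standard library. The function name modes and the toy lists are assumptions for illustration. The sketch simply returns every value tied for the highest frequency; deciding whether to report "no mode" when there are many ties is left to the reader.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)  # maps each value to how many times it occurs
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical toy data.
print(modes([3, 5, 5, 8]))     # [5]
print(modes([3, 3, 5, 5, 8]))  # [3, 5], a bimodal data set
```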

Central Tendency Example

Let us consider a small data set, one with 18 data values. This set might represent the ages of students in a particular college class.

19, 21, 29, 23, 22, 17, 30, 19, 48, 23, 26, 23, 19, 26, 23, 20, 22, 20

The first step in analyzing data is usually to organize it, and this generally involves putting the values in order, most often from least to greatest. Ordering is not necessary for finding the mean of the data set, but it is needed to find the median and it makes finding the mode easier. If you are doing this manually, be sure that every data value from the original set appears in your newly ordered list.

17, 19, 19, 19, 20, 20, 21, 22, 22, 23, 23, 23, 23, 26, 26, 29, 30, 48
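If the list was ordered by hand, a short check can confirm that no value was dropped or duplicated along the way. The sketch below is one possible check in Python; the variable names ages and hand_ordered are assumptions, and the values are copied from the two lists above.

```python
# The original data, in the order it was collected.
ages = [19, 21, 29, 23, 22, 17, 30, 19, 48, 23, 26, 23, 19, 26, 23, 20, 22, 20]

# The list as ordered by hand.
hand_ordered = [17, 19, 19, 19, 20, 20, 21, 22, 22, 23, 23, 23, 23, 26, 26, 29, 30, 48]

# Same number of values, and sorting the original reproduces the hand-ordered list.
same_length = len(hand_ordered) == len(ages)
same_values = sorted(ages) == hand_ordered
print(same_length, same_values)  # True True
```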

To find the mean of this data set, we add the values and then divide by the total number of values, which we know is 18.

(17 + 19 + 19 + 19 + 20 + 20 + 21 + 22 + 22 + 23 + 23 + 23 + 23 + 26 + 26 + 29 + 30 + 48)/18

= 430/18

= 23.888…

Rounding this to one decimal place, we get a mean age of 23.9 years.

To determine the median, which divides the ordered set of numbers into two equal parts, we find which value (if there is an odd number of data values in the set) or which two values (if there is an even number of data values in the set) are in the middle of the set. We do this by dividing the number of values, in this case 18, by 2. Here, we get 9. Because there is an even number of values, the median falls between the ninth and tenth values in the ordered set of values. These are shown in brackets below:

17, 19, 19, 19, 20, 20, 21, 22, [22, 23], 23, 23, 23, 26, 26, 29, 30, 48

To find the actual median, we now find the average of the two middle values: (22 + 23)/2 = 22.5 years.

[If there is an odd number of values in the data set, we find the middle value by adding 1 to the total number of values and then dividing by 2. So if a data set has 17 values, we add 1 to it, giving us 18, and then divide by 2, giving us 9. This means that the ninth value in the data set is the median. In the set above, if we removed the value 48, the median would be 22, the ninth value in the set.]
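The position arithmetic for both cases can be sketched as follows. The helper names middle_positions and median_by_position are hypothetical; positions are counted from 1, as in the text, and converted to Python's 0-based indexing only when the values are looked up.

```python
ordered = [17, 19, 19, 19, 20, 20, 21, 22, 22, 23, 23, 23, 23, 26, 26, 29, 30, 48]


def middle_positions(n):
    """1-based position(s) of the middle value(s) among n ordered values."""
    if n % 2 == 0:
        return [n // 2, n // 2 + 1]  # even count: two middle positions
    return [(n + 1) // 2]            # odd count: one middle position


def median_by_position(values):
    """Average the value(s) found at the middle position(s) of an ordered list."""
    positions = middle_positions(len(values))
    middle = [values[p - 1] for p in positions]  # convert 1-based to 0-based
    return sum(middle) / len(middle)


print(middle_positions(18), median_by_position(ordered))     # [9, 10] 22.5

without_48 = ordered[:-1]  # drop the largest value, leaving 17 values
print(middle_positions(17), median_by_position(without_48))  # [9] 22.0
```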

Finally, we can find the mode by counting to find the frequency of each value. With the values in order, this is an easy task:

17 → 1

19 → 3

20 → 2

21 → 1

22 → 2

23 → 4

26 → 2

29 → 1

30 → 1

48 → 1

Looking at the list above, we see that the value 23 occurs more than any other value. Therefore, the mode of this data set is 23.
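The same frequency table can be built with collections.Counter from Python's standard library. This is a sketch rather than the only approach; the variable names are assumptions, and the data are the ages from the example.

```python
from collections import Counter

ages = [19, 21, 29, 23, 22, 17, 30, 19, 48, 23, 26, 23, 19, 26, 23, 20, 22, 20]

frequency = Counter(ages)  # maps each age to the number of times it occurs

# Print the table in ascending order of age.
for value in sorted(frequency):
    print(f"{value} -> {frequency[value]}")

# most_common(1) returns a list holding the (value, count) pair with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)  # 23 4
```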

In practice, the work of finding the mean, median and mode will most likely be carried out by software, but it is important to understand what the numbers mean. To develop a deeper understanding of central tendency, you may want to consider pursuing a degree in data science.
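As one illustration of such software, Python's standard statistics module provides all three measures. The sketch below applies it to the ages from the example; the variable name ages is an assumption, and statistics.multimode requires Python 3.8 or later.

```python
import statistics

ages = [19, 21, 29, 23, 22, 17, 30, 19, 48, 23, 26, 23, 19, 26, 23, 20, 22, 20]

print(round(statistics.mean(ages), 1))  # 23.9
print(statistics.median(ages))          # 22.5
print(statistics.mode(ages))            # 23
print(statistics.multimode(ages))       # [23], a list of every value tied for most frequent
```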