We are all familiar with the term “the bell curve.” But just what does that mean? In this article, we will answer that question and describe some of the important features of the normal bell curve, which, in statistical lingo, is called the standard normal curve and looks something like this:
[Figure: Standard Normal Distribution]
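A set of data that fits a perfect bell curve has what is called a normal distribution; when the values are expressed as the number of standard deviations from the mean (so that the mean is 0 and the standard deviation is 1), it is called the standard normal distribution. In the standard normal distribution, the mean (µ, the Greek letter mu) of the data set is in the exact middle of the distribution, and a fixed percentage of the values falls within each given number of standard deviations of the mean. Here are the very particular features of a “true” standard normal bell curve: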
- The mean, median, and mode are all the same value.
- Approximately 68.26% of all the data values lie within one standard deviation (σ, the Greek letter sigma) of the mean (34.13% from z = -1 to z = 0 and 34.13% from z = 0 to z = 1, where z is the number of standard deviations from the mean).
- Approximately 95.44% of all the data values lie within two standard deviations of the mean (47.72% from z = -2 to z = 0 and 47.72% from z = 0 to z = 2).
- Approximately 99.74% of all the data values lie within three standard deviations of the mean (49.87% from z = -3 to z = 0 and 49.87% from z = 0 to z = 3).
As a demonstration of what this means, let’s say we have a distribution of values that fits a normal bell curve with a mean (µ) of 50 and a standard deviation (σ) of 10. In this distribution, then, 68.26% of all the values are between 50 - 10 = 40 and 50 + 10 = 60, 95.44% of all the values are between 50 - 2(10) = 30 and 50 + 2(10) = 70, and 99.74% of all the values are between 50 - 3(10) = 20 and 50 + 3(10) = 80.
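The ranges and percentages in this demonstration can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Example distribution from the text: mean 50, standard deviation 10.
mu, sigma = 50, 10
dist = NormalDist(mu=mu, sigma=sigma)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    lower, upper = mu - k * sigma, mu + k * sigma
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} sd: {mu - k * sigma} to {mu + k * sigma} "
          f"-> {share_within(k):.2%}")
```

The computed shares come out at roughly 68.27%, 95.45%, and 99.73%. They differ from the figures quoted above only in the last digit, because the quoted figures are doubled from rounded table values.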
In reality, most sets of data will not have a perfect standard normal distribution. However, we use the concept of the standard normal distribution to determine how far a specific value from an experiment lies from the mean and whether that difference is statistically significant, meaning it is not likely to have occurred by chance, which is an indication that the independent variable influenced the value.
For example, say the population mean on a test given to students each year is 73 with a standard deviation of 4. A new learning program is used by one group of students, and the mean score on the test by this group is 82. Because 82 is (82 - 73) / 4 = 2.25 standard deviations above the mean, which is more than two, there is a high probability that the new learning program was influential in producing this high score; that is, the score was not likely due to random chance. If the group had a mean score of 77, which is just (77 - 73) / 4 = 1 standard deviation above the population mean, then there is a good chance that the slightly higher-than-usual score was due to chance, so the new program was not necessarily effective.
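The test-score comparison can be sketched in Python as follows. Two assumptions are not part of the article: the code uses a two-sided tail probability, and, like the example, it treats the group's mean score as if it were a single value drawn from the population. A fuller analysis would also account for the size of the group. The function names are illustrative only.

```python
from statistics import NormalDist

# Population figures from the example in the text.
population_mean, population_sd = 73, 4

def z_score(value):
    """Number of standard deviations a value lies from the population mean."""
    return (value - population_mean) / population_sd

def two_sided_tail(z):
    """Probability of landing at least |z| standard deviations from the mean,
    in either direction, under a normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for score in (82, 77):
    z = z_score(score)
    print(f"score {score}: z = {z:.2f}, "
          f"two-sided tail probability = {two_sided_tail(z):.3f}")
```

Under these assumptions, a score of 82 has a tail probability of well under 5%, while a score of 77 has one of roughly 32%. This matches the conclusion that the first result is unlikely to be chance and the second could easily be.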
Kurtosis is the characteristic that describes how “skinny” or “fat” the bell of the normal bell curve is or, looking at the tails, whether they are thin (“light-tailed”) or wide (“heavy-tailed”). A light-tailed distribution will have fewer extreme values, while a heavy-tailed distribution will have more extreme values, though in both cases the distribution can still be symmetric about the mean.
There are formulas for calculating the kurtosis of a distribution, though they will not be discussed in this article. The formulas, in essence, measure the weight of the tails relative to the rest of the distribution. There are at least two different scales for kurtosis. One scale (often called excess kurtosis) uses 0 as the kurtosis of a true normal distribution, with light-tailed distributions having a negative kurtosis and heavy-tailed distributions having a positive kurtosis. The other scale uses 3 as the kurtosis of a normal distribution, so a light-tailed distribution has a kurtosis less than 3 and a heavy-tailed distribution has a kurtosis greater than 3. The kurtosis of a bell-shaped distribution helps us understand the probability of a value being in the tail of the distribution: the higher the kurtosis value, the more likely a specific value will be in the tail.
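To make the two scales concrete, here is a rough Python sketch using only the standard library. It assumes one common definition of kurtosis, the fourth central moment divided by the squared variance, with no small-sample correction. Other definitions exist, so this is not necessarily the formula a given statistics package uses. The three sample distributions and the function name `kurtosis` are chosen purely for illustration.

```python
import random
from statistics import fmean

def kurtosis(values, excess=False):
    """Fourth central moment divided by the squared variance.

    With excess=False a normal distribution scores about 3;
    with excess=True, 3 is subtracted so a normal distribution scores about 0.
    """
    mean = fmean(values)
    deviations = [v - mean for v in values]
    variance = fmean(d ** 2 for d in deviations)
    fourth_moment = fmean(d ** 4 for d in deviations)
    k = fourth_moment / variance ** 2
    return k - 3 if excess else k

rng = random.Random(0)  # fixed seed so the illustrative samples are reproducible
n = 50_000
samples = {
    "uniform (light-tailed)": [rng.uniform(-1, 1) for _ in range(n)],
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    # The difference of two exponential draws follows a Laplace distribution.
    "Laplace (heavy-tailed)": [rng.expovariate(1) - rng.expovariate(1)
                               for _ in range(n)],
}

for name, values in samples.items():
    print(f"{name}: kurtosis = {kurtosis(values):.2f}, "
          f"excess = {kurtosis(values, excess=True):.2f}")
```

The normal sample should land close to 3 on one scale and close to 0 on the other. The uniform sample falls below those values and the Laplace sample above them.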
If you’re interested in bell curves and distributions, perhaps a master’s degree in data science is the right career move for you.