Sampling Data

We often talk about the top 25% or top 10% or even top 1% of something. When we are segmenting data into percentages we commonly are talking about quartiles, deciles and percentiles. Quartiles divide the data into four parts; deciles divide the data into 10 parts; percentiles divide the data into 100 parts. Let’s take a look at how these different types of divisions are used.

Quartiles

The quartiles of a data set divide the data into four equal parts, with one-fourth of the data values in each part. The second quartile position is the median of the data set, which divides the data set in half. To find the median position of the data set, divide the total number of data values (n) by 2. If there are an even number of data values, the median is the value that is the average of the value in the position and the + 1 position. (If there are an odd number of data values, the median is the value in the position.) For example, if the data set has 20 values, then the median is the average of the data values in the = 10th and + 1 = 10 + 1 = 11th position. For example, in the data set below, with 20 values, the median is the average of 9 and 11, which is 10.

The first quartile is the median of the first half of the data set and marks the point at which 25% of the data values are lower and 75% are higher. The third quartile is the median of the second half of the data set and marks the point at which 25% of the data values are higher and 75% lower. In the data set above, there are ten data values in each half, so the first quartile is the average of the values in the fifth and sixth positions (both of which are 5, so the first quartile is 5) and the third quartile is the average of the values in the fifteenth and sixteenth positions (17 and 20, so the third quartile is 18.5).

Quartiles are often used as a measure of spread of the data in what is called the interquartile range (IQR). The IQR is simply the difference between the third quartile and first quartile. Thus, in the sample data set given above, the IRQ is 18.5 – 5 = 13.5. While on its own the IQR is not a very useful measure, it can be useful when comparing the spread of two different data sets that measure the same phenomenon.

Deciles and Percentiles

Deciles and percentiles are usually applied to large data sets. Deciles divide a data set into ten equal parts. One example of the use of deciles is in school awards or rankings. Students in the top 10% — or highest decile – may be given an honor cord or some other recognition. If there are 578 students in a graduating class, the top 10%, or 58 students, may be given the award. At the opposite end of the scale, students who score in the bottom 10% or 20% on a standardized test may be given extra assistance to help boost their scores.

Percentiles divide the data set into groupings of 1%. Standardized tests often report percentile scores. These scores help compare students’ performances to that of their peers (often across a state or country). The meaning of a percentile score is often misunderstood. A percentile score in this situation reflects the percentage of students who scored at or above that particular group of students. For example, students who receive a percentile ranking of 87 on a particular test received scores that were equal to or higher than 87% of students who took the test. For those who do not understand these scores, they often mistake them for the score the student received on the test.

Growth charts are another common example of an application of percentiles. To help doctors and parents determine if a child is developing normally, his or her measurements are compared to others in the same sex and age groups. The figure below shows a growth chart (from the My Growth Charts website) that gives the percentiles for height and weight for boys ages 0 to 5 years. A two-year-old boy who is 33 inches long, for example, is in the 12th percentile, meaning he is taller than or the same height as only 12% of all boys of his age. However, he weighs 31 pounds, putting him in the 89th percentile, making him heavier than or as heavy as 89% of his peers.

Learn More

If segmenting data is of interest to you, click here to learn more about data science degrees.