The story of the missing bullet holes is a legendary one in the worlds of statistics and military aviation history. It is a tale of insightful analysis overcoming the assumptions made by most of the people analyzing the data. Jordan Ellenberg tells the story well in his book How Not to Be Wrong: The Power of Mathematical Thinking (2014); a shorter version is recounted here.
During World War II, American planes were being shot down at a high rate. To reduce these losses, the American military wanted to add armor to the planes. But armor is heavy, and heavier planes are slower, use more fuel and are harder to maneuver, so the armor had to be placed strategically to provide the most protection for the least added weight.
To help decide where to place the armor, the military collected data by counting the bullet holes in various parts of airplanes that returned from combat missions. These are the figures Ellenberg lists in his book (p. 4).
Analyzing the Assumptions About the Data
Most of those who analyzed the data concluded that the fuselage was taking the most hits and should therefore get the armor. But Abraham Wald said no, that was not right. Wald was a brilliant statistician, one of many Eastern Europeans who fled the Nazis and went on to help the Allied war effort. So where on the plane did Wald recommend putting the armor?
Wald said it should go over the engines. Why? If hits were random, we would expect bullet holes to be spread roughly evenly over all parts of the planes, in proportion to each part's area. Yet the returning planes clearly had fewer bullet holes in the engines than elsewhere. Wald asked where the missing engine holes were and concluded that they were on the planes that had been shot down. In other words, planes with many bullet holes in the fuselage and other non-engine parts were making it back, but planes with many holes in the engines were not. And that makes sense: without working engines, planes cannot fly.
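To make Wald's reasoning concrete, here is a small Python simulation of survivorship bias. Everything in it is hypothetical: the section names, the assumption that every section has equal area, the number of hits per plane and the per-hit loss probabilities are invented for illustration and are not historical data. Hits land uniformly at random, an engine hit is assumed to be far more likely to bring a plane down, and the holes are tallied twice: once over every plane that flew, and once over only the planes that returned.

```python
import random
from collections import Counter

# Hypothetical plane sections, assumed to have equal area so that
# random hits are spread evenly across them.
SECTIONS = ["engine", "fuselage", "fuel system", "rest of plane"]

# Hypothetical chance that a single hit in each section downs the plane.
LOSS_PROB = {"engine": 0.5, "fuselage": 0.05, "fuel system": 0.1, "rest of plane": 0.05}


def simulate(n_planes=10_000, hits_per_plane=5, seed=42):
    """Count holes per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_holes = Counter()
    returned_holes = Counter()
    for _ in range(n_planes):
        # Each hit lands on a uniformly random section.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_holes.update(hits)
        # The plane returns only if it survives every individual hit.
        if all(rng.random() >= LOSS_PROB[s] for s in hits):
            returned_holes.update(hits)
    return all_holes, returned_holes


def shares(counts):
    """Convert raw hole counts into each section's share of all holes."""
    total = sum(counts.values())
    return {s: counts[s] / total for s in SECTIONS}


if __name__ == "__main__":
    all_holes, returned_holes = simulate()
    all_share, returned_share = shares(all_holes), shares(returned_holes)
    print(f"{'section':<14}{'all planes':>12}{'returned':>12}")
    for s in SECTIONS:
        print(f"{s:<14}{all_share[s]:>12.1%}{returned_share[s]:>12.1%}")
```

Under these made-up settings, about a quarter of all holes land on engines, but only around 15% of the holes on returning planes do, even though no section is hit more often than any other. Counting holes only on the survivors makes the most dangerous place to be hit look like the safest.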
Wald’s key insight came not so much from analyzing the data as from analyzing the assumptions about the data. The other analysts assumed that the returning planes, whose bullet holes were counted, were representative of all the planes. Wald understood that they were not, and that the solution to the problem lay in the missing holes. Following Wald’s advice, planes were fitted with armor to protect their engines; the number of lost planes began to decline, and an untold number of lives were saved, not only during the rest of World War II but in later conflicts, including Korea and Vietnam.
The lesson here is to consider carefully your assumptions about the data, the sample from which you draw the data and every other aspect of the situation. While it is almost impossible to avoid some assumptions when collecting and analyzing data, you must do your best not to let those assumptions cloud your judgment of what the data mean, or lead you to extend the results of an analysis to situations that differ from those represented by the sample. If you are interested in learning how insightful analysis can overcome such assumptions, consider pursuing a degree in data science.