Network analysis is an area of data science that examines the relationships among entities, especially people. One goal of network analysis is to use data to identify the regular features and rules that govern a network. Sports, with their constant human interaction, are a microcosm of the larger world and a good source of data for network analysis. One such analysis set out to determine the best player in the history of professional tennis.
The study, conducted by Filippo Radicchi, began with a download of data on all professional tennis matches played in Grand Slam or ATP World Tour tournaments between January 1968 and October 2010 from the website of the Association of Tennis Professionals (ATP). The data covered 3,700 players in 133,261 matches across 3,640 tournaments.
The analysis began by counting the wins and losses each player had against each opponent. One result was that a very large number of players played relatively few matches, while a small number of players played a large number of matches. That is, many players appeared in just a few of these tournaments over their professional careers. This finding is in line with discoveries from other sports, in which many players have short careers and few players have long ones.
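The counting step described above can be sketched in a few lines of Python. This is only an illustration: the `matches` list is hypothetical toy data, not results from the study, and the variable names are made up for this example.

```python
from collections import Counter

# Hypothetical toy results as (winner, loser) pairs; not data from the study.
matches = [
    ("A", "B"),
    ("A", "C"),
    ("A", "D"),
    ("A", "E"),
    ("B", "C"),
]

# Tally wins and losses per player.
wins = Counter(winner for winner, _ in matches)
losses = Counter(loser for _, loser in matches)

# Total matches played by each player (wins plus losses).
played = wins + losses

# Distribution of activity: how many players played exactly k matches.
players_per_match_count = Counter(played.values())

for k in sorted(players_per_match_count):
    print(f"{players_per_match_count[k]} player(s) played {k} match(es)")
```

Even in this tiny example, one player accounts for most of the activity while the others play only once or twice, which is the same kind of imbalance the study reports at a much larger scale.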
So, Who Is the Best?
To determine who was the best tennis player with statistical precision, network analysis was conducted on a subset of the player pool consisting of only the 24 players who were ranked number one during the years of the study. The author used a system of equations based, in part, on the wins and losses between pairs of players. Using the subset of number-one-ranked players added the element of “quality” of victories to the natural choice of “quantity” of victories, because the network for the subset included only victories against the other top-ranked players in the subset. However, not every number-one-ranked player played every other player on the list. John Newcombe, for example, did not play any matches against any other player in the subset in a major tournament. He ended up number 26 on the list of all-time best players.
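The article does not spell out the equations, so the following is only a sketch of one way to score players from pairwise wins and losses. It assumes a PageRank-style diffusion on a directed network in which each loser "passes credit" to the players who beat them. The function name `prestige_scores`, the damping value of 0.85, and the toy results are all assumptions for illustration, not details taken from the study.

```python
def prestige_scores(matches, damping=0.85, iterations=100):
    """PageRank-style scores on a loser -> winner network (illustrative sketch)."""
    players = sorted({p for match in matches for p in match})
    n = len(players)

    # weight[loser][winner] = number of times that winner beat that loser.
    weight = {p: {} for p in players}
    for winner, loser in matches:
        weight[loser][winner] = weight[loser].get(winner, 0) + 1

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Players who never lost have no outgoing links, so their score
        # is spread evenly over everyone to keep the total at 1.
        dangling = sum(score[p] for p in players if not weight[p])
        new_score = {p: (1 - damping) / n + damping * dangling / n for p in players}
        for loser, beaten_by in weight.items():
            total_losses = sum(beaten_by.values())
            for winner, count in beaten_by.items():
                # A loser's credit is split in proportion to losses per opponent.
                new_score[winner] += damping * score[loser] * count / total_losses
        score = new_score
    return score


# Hypothetical toy results as (winner, loser) pairs; not data from the study.
toy_matches = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
scores = prestige_scores(toy_matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In a scheme like this, a win counts for more when the beaten opponent has a high score, which is one way to combine the "quality" and "quantity" of victories.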
In the end, the network analysis for the 43-year period of the study indicated that the best player overall was Jimmy Connors of the United States, who played from 1970 to 1996, with Ivan Lendl (1978-1994) second and John McEnroe (1976-1994) third. Separate analyses based on type of court surface were also conducted. These analyses concluded that Guillermo Vilas (fourth overall, 1969-1992) was the best player on clay, Jimmy Connors was the best player on grass, and Andre Agassi (fifth overall, 1986-2006) was the best on hard courts.
To develop a deeper understanding of network analysis, consider pursuing a degree in data science or analytics.