Angelique Kerber, No. 1 in women’s tennis… for weeks already!

This Monday, September 12th, will be a historic day for German women’s tennis. Angelique Kerber will be the first German player to top the WTA ranking since Steffi Graf in 1997.

Having won the Australian Open at the beginning of this year, reached the Wimbledon final and then won the US Open, she has definitely earned it. I would even go a step further and say it has been overdue for a few weeks! To “prove” this claim, I grabbed all WTA matches from 1968 (yeah, I know Angelique wasn’t even alive then) until 29 August 2016 from here and here and built my own women’s tennis ranking with the power of Google’s PageRank.

Tennis and PageRank

Google’s famous PageRank algorithm for ranking websites has been used in an abundance of other settings to rank things. In a scientific paper in 2011, Radicchi used it to rank male tennis players and determine the best player in history (spoiler alert: Jimmy Connors). I will not go that far and will just try to give an alternative leaderboard for the WTA using the PageRank algorithm.

The WTA ranking is calculated weekly (except during Grand Slam tournaments), and all matches of the previous 52 weeks are used in the calculation. For the upcoming analyses, I also used the matches of a 52-week window to construct the network needed for the PageRank algorithm.

Each match is represented as a directed link, pointing from the loser to the winner. You can think of this as a flow of “prestige”. Losing a match comes with a loss of prestige, which is added to the winner’s prestige. The better the losing player was, the more prestige the winner earns from beating her.

Links between the same two players that point in the same direction are aggregated to form a weighted directed network of players. As a side note: the weighted indegree of a player gives her total number of wins and the weighted outdegree her number of losses.
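As a rough illustration of the construction described above, the following Python sketch (not the R code used for this post) aggregates matches into weighted loser-to-winner links and runs a plain power-iteration PageRank on them. The damping factor of 0.85, the even redistribution of prestige from players without any loss, and the player names A, B and C are all assumptions made for the example.

```python
from collections import defaultdict


def build_network(matches):
    """Aggregate (winner, loser) pairs into weighted loser -> winner links."""
    weights = defaultdict(float)
    for winner, loser in matches:
        weights[(loser, winner)] += 1.0
    return dict(weights)


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    players = sorted({p for edge in weights for p in edge})
    n = len(players)

    # Weighted outdegree = number of losses of each player.
    losses = defaultdict(float)
    for (loser, _), w in weights.items():
        losses[loser] += w

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Players without a loss have no outgoing link; their prestige
        # is spread evenly over all players (assumed handling).
        dangling = sum(score[p] for p in players if losses[p] == 0)
        new = {p: (1 - damping) / n + damping * dangling / n for p in players}
        # Each loser passes prestige to her winners, split by link weight.
        for (loser, winner), w in weights.items():
            new[winner] += damping * score[loser] * w / losses[loser]
        score = new
    return score


# Hypothetical matches as (winner, loser) pairs.
matches = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]
weights = build_network(matches)
scores = pagerank(weights)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data the two wins of A over B collapse into a single link of weight 2, and `ranking` lists the players from highest to lowest PageRank.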

Angelique vs. Serena

The current WTA ranking was released on 29 August 2016. I constructed the network for this ranking period (29 August 2015 to 29 August 2016) and applied PageRank to it.

The slopegraph below compares the top 10 of the WTA ranking with the top 10 according to PageRank.


So according to PageRank, Angelique would already have been number one in the latest ranking period. In fact, Serena is only ranked 4th by PageRank. In general, however, the two rankings are fairly comparable. The biggest discrepancy is Venus Williams, ranked 6th by the WTA and 14th by PageRank.

I was curious whether Angie (not Merkel) had been ranked number one before the 29th of August and expanded my analysis a bit. The figure below shows how the PageRank positions of Serena and Angie have evolved since January 2015.

Angie has actually been ranked number one since the 4th of July. Even before that, she was ranked number one several times. What is clearly visible, however, is that Serena was quite dominant over the last two years.

PageRank Number One since 1968

Since I have all WTA matches since 1968, why not do an analysis on the whole data set? I thus calculated the PageRank ranking for every position of a moving 52-week window of matches and extracted the number one of each ranking. I put together an interactive timeline with the googleVis package of R, which can be viewed by clicking on the image below.


Martina Navratilova seems to be the most dominant player, considering her long periods as PageRank number one.
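The moving-window procedure behind this timeline can be sketched in Python as follows (again, this is not the R code used for the post). The function names, the weekly step, the exact window boundaries and the sample matches are assumptions. To keep the sketch short, a simple win counter stands in for the ranking function; any function that maps (winner, loser) pairs to a score per player, such as a PageRank implementation, can be passed in instead.

```python
from collections import Counter
from datetime import date, timedelta


def window_leaders(matches, start, end, rank_fn, step_days=7, window_weeks=52):
    """Return (ranking_date, top_player) for each ranking date in [start, end].

    matches: iterable of (match_date, winner, loser) tuples.
    rank_fn: maps a list of (winner, loser) pairs to {player: score}.
    """
    leaders = []
    current = start
    while current <= end:
        window_start = current - timedelta(weeks=window_weeks)
        # Keep only matches played in the 52 weeks up to the ranking date.
        in_window = [(w, l) for d, w, l in matches if window_start < d <= current]
        if in_window:
            scores = rank_fn(in_window)
            leaders.append((current, max(scores, key=scores.get)))
        current += timedelta(days=step_days)
    return leaders


def win_count(pairs):
    """Stand-in ranking: number of wins per player (not PageRank)."""
    return Counter(winner for winner, _ in pairs)


# Hypothetical matches as (date, winner, loser).
matches = [
    (date(2015, 1, 10), "A", "B"),
    (date(2015, 2, 10), "A", "C"),
    (date(2016, 3, 1), "B", "C"),
    (date(2016, 3, 8), "B", "A"),
]
early = window_leaders(matches, date(2015, 3, 1), date(2015, 3, 1), win_count)
late = window_leaders(matches, date(2016, 3, 15), date(2016, 3, 15), win_count)
```

Here `early` names A as the leader, while in `late` the 2015 matches have dropped out of the window and B leads.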

Although I did not want to do it, I aggregated all matches in the dataset into one huge network and applied PageRank yet again.

The all-time top 10 according to PageRank is given below.

Rank Player
1 Martina Navratilova
2 Steffi Graf
3 Serena Williams
4 Chris Evert
5 Lindsay Davenport
6 Venus Williams
7 Arantxa Sanchez Vicario
8 Martina Hingis
9 Monica Seles
10 Gabriela Sabatini

Now we know that Jimmy Connors and Martina Navratilova were the best tennis players in history.

Game, Set and Match

This post is not as detailed as I wanted it to be, but I wanted to finish it before the US Open final was over. I will add more details on the R coding I did in the upcoming weeks.

Posted in: Data Analysis, Network Analysis

Written by Dmathlete