Feast, Famine, and Sloane Stephens

Italian translation at settesei.it

Last week, Sloane Stephens reeled off an impressive series of victories, defeating Garbine Muguruza, Angelique Kerber, Victoria Azarenka, and Jelena Ostapenko to secure the title at the WTA Premier Mandatory event in Miami. The trophy isn’t quite as life-changing as the one she claimed at the US Open last September, but it’s a close second, and the competition she faced along the way was every bit as good.

The Miami title comes with 1,000 WTA ranking points, and by adding those to her previous tally, Stephens moved into the top ten, reaching a career-high No. 9 on Monday. With two high-profile championships to her name, not to mention semifinal showings last summer in Toronto and Cincinnati, there’s little doubt she deserves it. Elo isn’t quite convinced, but its more sophisticated algorithm (and its disregard for the magnitude of the US Open and Miami titles) puts her within spitting distance of the top ten as well.

What makes Stephens’s rise to the top ten so remarkable is her efficiency in converting wins to ranking points. Since her return from injury at Wimbledon last year, she has played only 38 matches, winning 24 of them. She has suffered six first-round losses, plus two more defeats at last year’s Zhuhai Elite Trophy round-robin and another pair in the Fed Cup final against Belarus. All told, in the last nine months, she has won matches at only six different events. Her unusual record illustrates some of the quirks in the ranking system, and how a player who peaks at the right times can exploit them.

Twenty-four wins is almost never enough for a spot in the vaunted top ten. From 1990 to 2017, a player finished a season with a top-ten ranking while winning fewer than 30 matches only seven times. Only two of those involved fewer wins than Sloane’s 24: Monica Seles’s 1993 and 1995, the stretches leading up to her tragic on-court stabbing and following her eventual comeback. Here are the top-ten seasons with the fewest victories, including the last 52 weeks of a few players currently near the top of the WTA table:

Year  Player              YE Rk   W   L  W-L %  
1995  Monica Seles*           1  11   1    92%  
1993  Monica Seles            8  17   2    89%  
2018  Sloane Stephens**       9  24  14    63%  
2010  Serena Williams         4  25   4    86%  
1993  Jennifer Capriati       9  28  10    74%  
2015  Flavia Pennetta         8  28  20    58%  
2000  Mary Pierce             7  29  11    73%  
2004  Jennifer Capriati      10  29  12    71%  
1993  Mary Joe Fernandez      7  31  12    72%  
1995  Iva Majoli              9  31  13    70%  
2018  Venus Williams**        8  31  14    69%  
1995  Mary Joe Fernandez      8  31  15    67%  
2015  Lucie Safarova          9  32  21    60%  
2008  Maria Sharapova         9  33   6    85%  
1998  Steffi Graf             9  33   9    79%  
2018  Petra Kvitova**        10  33  14    70%

* ranking frozen after her assault

** rankings as of April 2, 2018; wins and losses based on previous 52 weeks

What almost all of these seasons have in common is exceptional performance at grand slams. Sloane won the US Open; Seles won the 1993 Australian Open; Serena Williams won a pair of majors in 2010; Flavia Pennetta capped an otherwise anonymous 2015 campaign with a title in New York. The slams are where the ranking points are.

Even within this group of slam successes, Sloane stands out. Of the 16 seasons on that list, only two (Pennetta’s and Lucie Safarova’s, both in 2015) featured a lower winning percentage than Stephens has managed since her comeback. In other words, most women who are this efficient with their victories don’t lose quite so early or often at lesser events.

That 63% winning percentage is even more extreme than the above list makes it look. Of the nearly 300 year-end top-tenners since 1990, only eight finished the season with a lower win rate. Here’s that list, expanded to 11 rows to include Stephens and another noteworthy recent season:

Year  Player              YE Rk   W   L  W-L %  
2014  Dominika Cibulkova     10  33  24    58%  
2000  Nathalie Tauziat       10  36  26    58%  
2015  Flavia Pennetta         8  28  20    58%  
1999  Nathalie Tauziat        7  37  25    60%  
2007  Marion Bartoli         10  47  31    60%  
2015  Lucie Safarova          9  32  21    60%  
2000  Anna Kournikova         8  47  29    62%  
2010  Jelena Jankovic         8  38  23    62%  
2018  Sloane Stephens*        9  24  14    63%  
2004  Elena Dementieva        6  40  23    63%  
2016  Garbine Muguruza        7  35  20    64%

* ranking as of April 2, 2018; wins and losses based on previous 52 weeks
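As a sanity check on the W-L % column, the short Python sketch below recomputes each winning percentage from the wins and losses shown and counts how many listed seasons fall below Stephens’s 24-14 mark. It only uses the rows in this table, so it checks the table against itself rather than re-deriving it from the full 1990-2017 data.

```python
# Rows copied from the table above: (year, player, wins, losses).
seasons = [
    (2014, "Dominika Cibulkova", 33, 24),
    (2000, "Nathalie Tauziat", 36, 26),
    (2015, "Flavia Pennetta", 28, 20),
    (1999, "Nathalie Tauziat", 37, 25),
    (2007, "Marion Bartoli", 47, 31),
    (2015, "Lucie Safarova", 32, 21),
    (2000, "Anna Kournikova", 47, 29),
    (2010, "Jelena Jankovic", 38, 23),
    (2004, "Elena Dementieva", 40, 23),
    (2016, "Garbine Muguruza", 35, 20),
]


def win_pct(wins: int, losses: int) -> float:
    """Share of completed matches won."""
    return wins / (wins + losses)


stephens = win_pct(24, 14)  # last 52 weeks as of April 2, 2018

# Seasons in the table with a strictly lower winning percentage.
lower = [(year, player) for year, player, w, l in seasons if win_pct(w, l) < stephens]

print(f"Stephens: {stephens:.1%}")
print(f"{len(lower)} listed seasons had a lower win rate")
```

Running it prints a 63.2% rate for Stephens and eight lower seasons, matching the text.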

Apart from Pennetta and Safarova, there’s not much overlap between these lists; the first group generally missed some time, then made up for it by scoring big at slams, while the second group slogged through a long season and leveled up with a strong finish or two at a major. The typical player with a 63% winning percentage doesn’t end up in the top ten: She wraps up the season, on average, in the mid-twenties. That’s still better than the average 24-win season, which results in a year-end finish near No. 40.

Stephens has always been a big-match player: She made an early splash at the 2013 Australian Open, reaching the semifinals and upsetting Serena as a 19-year-old, and her overall career record at majors (66%) is nearly ten percentage points higher than her record at other tour events (57%). For all that, she probably won’t conclude 2018 with such an extreme set of won-loss numbers. To do so, she’d most likely need to win a major to replace her 2017 US Open points while losing early at most other events. Now that she has recovered from injury, Stephens may maintain her feast-or-famine ways to some degree, but it’s unlikely she’ll continue to display such extreme peaks and valleys.

Measuring the Impact of Wimbledon’s Seeding Formula

Italian translation at settesei.it

Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass-court Grand Slam grants seeds to the top 32 players in each tour’s rankings, then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.
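The post doesn’t spell out the formula. As it was publicly described for the men’s draw at the time, a player’s seeding score was his ATP ranking points, plus 100% of the points he earned on grass in the last 12 months, plus 75% of the points from his best grass-court event in the 12 months before that. Treat those weights as an assumption to verify against the All England Club’s own wording; the Python sketch below, with hypothetical players and numbers, only shows how such a re-ordering of already-selected seeds works.

```python
from dataclasses import dataclass

# Assumed weights (not stated in this post); check against the official formula.
CURRENT_GRASS_WEIGHT = 1.00  # all grass points from the last 12 months
PRIOR_BEST_WEIGHT = 0.75     # best single grass event in the 12 months before that


@dataclass
class Player:
    name: str
    ranking_points: int          # official ATP ranking points
    grass_points_last_year: int  # sum of grass points, last 12 months
    best_grass_prior_year: int   # best grass result, the 12 months before that


def seeding_score(p: Player) -> float:
    """Grass-weighted score used to re-order the 32 seeds (assumed formula)."""
    return (p.ranking_points
            + CURRENT_GRASS_WEIGHT * p.grass_points_last_year
            + PRIOR_BEST_WEIGHT * p.best_grass_prior_year)


def wimbledon_order(players: list[Player]) -> list[str]:
    """Sort the already-selected seeds by seeding score, highest first."""
    return [p.name for p in sorted(players, key=seeding_score, reverse=True)]


seeds = [
    Player("Player A", 6000, 100, 90),    # hypothetical: higher ranked, weak on grass
    Player("Player B", 5000, 1200, 900),  # hypothetical: lower ranked, strong on grass
]
print(wimbledon_order(seeds))  # ['Player B', 'Player A']
```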

This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.

Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.
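The 75% figure follows from the draw procedure: seeds 5 through 8 are each placed into a different quarter at random, so any given top-four seed ends up with No. 5 in his quarter one time in four. A brute-force enumeration in Python, a minimal sketch assuming that uniform placement, confirms it:

```python
from fractions import Fraction
from itertools import permutations

TOP_FOUR = (1, 2, 3, 4)   # each quarter is labelled by its top-four seed
NEXT_FOUR = (5, 6, 7, 8)  # drawn into the four quarters at random


def easier_path_chance(top_seed: int) -> Fraction:
    """Probability that seed 5 is NOT in `top_seed`'s quarter, assuming every
    placement of seeds 5-8 into the quarters is equally likely."""
    placements = list(permutations(NEXT_FOUR))  # 24 equally likely placements
    easier = 0
    for placement in placements:
        quarter = dict(zip(TOP_FOUR, placement))  # top seed -> its 5-8 seed
        if quarter[top_seed] != 5:
            easier += 1
    return Fraction(easier, len(placements))


for seed in TOP_FOUR:
    print(seed, easier_path_chance(seed))  # 3/4 for every top-four seed
```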

Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.
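Here is a minimal Python sketch of that blend, assuming a plain 50/50 average of grass-court and overall Elo and the standard Elo logistic curve on a 400-point scale; the post doesn’t give the exact weights or parameters behind its ratings, and the `weight` parameter is a hypothetical knob that just makes the blend explicit.

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def weighted_surface_elo(surface_elo: float, overall_elo: float,
                         weight: float = 0.5) -> float:
    """Blend a surface-specific rating with an overall rating.
    weight=0.5 is a plain average (an assumption, not a published parameter)."""
    return weight * surface_elo + (1.0 - weight) * overall_elo


# Hypothetical player: 2100 on grass, 2000 overall, so the blend is 2050.
print(weighted_surface_elo(2100.0, 2000.0))

# Head-to-head chance using the gElo ratings listed below for Djokovic and Murray.
p = elo_win_prob(2296.5, 2247.6)
print(f"{p:.1%}")
```

Under these assumptions, Djokovic’s 49-point gElo edge over Murray translates to roughly a 57% chance of winning a single match between them.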

Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:

1   Novak Djokovic         2296.5  
2   Andy Murray            2247.6  
3   Roger Federer          2246.8  
4   Rafael Nadal           2101.4  
5   Juan Martin Del Potro  2037.5  
6   Kei Nishikori          2035.9  
7   Milos Raonic           2029.4  
8   Jo Wilfried Tsonga     2020.2  
9   Alexander Zverev       2010.2  
10  Marin Cilic            1997.7  
11  Nick Kyrgios           1967.7  
12  Tomas Berdych          1967.0  
13  Gilles Muller          1958.2  
14  Richard Gasquet        1953.4  
15  Stanislas Wawrinka     1952.8  
16  Feliciano Lopez        1945.3

We might quibble with some of these positions (the algorithm knows nothing about whatever is plaguing Djokovic, for one thing), but in general, gElo does a better job of reflecting surface-specific ability level than other systems.

The forecasts

Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, minus known withdrawals such as David Goffin and Pablo Carreno Busta; that list doesn’t differ much from the guys who will ultimately make up the field. Then, for each seeding method, we randomly generate 100,000 draws, simulate those brackets, and tally up the winners.
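The real simulation uses 128-player draws rebuilt at random 100,000 times per seeding method. The Python sketch below is a much smaller stand-in under stated assumptions: a fixed eight-player bracket seeded 1 v 8, 4 v 5, 3 v 6, 2 v 7, match probabilities from the standard 400-point Elo formula, and the top eight gElo ratings from the table above. A full version would also regenerate the draw on every trial according to each seeding method’s rules.

```python
import random
from collections import Counter

# Top eight gElo ratings from the table above.
RATINGS = {
    "Novak Djokovic": 2296.5,
    "Andy Murray": 2247.6,
    "Roger Federer": 2246.8,
    "Rafael Nadal": 2101.4,
    "Juan Martin Del Potro": 2037.5,
    "Kei Nishikori": 2035.9,
    "Milos Raonic": 2029.4,
    "Jo Wilfried Tsonga": 2020.2,
}

# Toy bracket order: 1 v 8, 4 v 5 in the top half; 3 v 6, 2 v 7 in the bottom.
DRAW = [
    "Novak Djokovic", "Jo Wilfried Tsonga",
    "Rafael Nadal", "Juan Martin Del Potro",
    "Roger Federer", "Kei Nishikori",
    "Andy Murray", "Milos Raonic",
]


def elo_win_prob(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_bracket(draw, ratings, rng):
    """Play one single-elimination bracket and return the champion."""
    field = list(draw)
    while len(field) > 1:
        # Adjacent pairs meet; each winner is drawn from the Elo probability.
        field = [
            a if rng.random() < elo_win_prob(ratings[a], ratings[b]) else b
            for a, b in zip(field[0::2], field[1::2])
        ]
    return field[0]


def title_odds(draw, ratings, trials=20_000, seed=0):
    """Share of simulated brackets won by each player."""
    rng = random.Random(seed)
    titles = Counter(play_bracket(draw, ratings, rng) for _ in range(trials))
    return {player: titles[player] / trials for player in draw}


odds = title_odds(DRAW, RATINGS)
for player, share in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{player:<24}{share:6.1%}")
```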

Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:

Player              ATP     W%  Wimb     W%  gElo     W%  
Andy Murray           1  23.6%     1  24.3%     2  24.1%  
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%  
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%  
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%  
Roger Federer         5  21.1%     3  22.4%     3  22.4%  
Marin Cilic           6   1.3%     7   1.0%    10   1.0%  
Milos Raonic          7   2.0%     6   1.6%     7   1.7%  
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%  
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%  
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%

Again, gElo is probably too optimistic about Djokovic (at least the betting market thinks so), but the point here is the difference between systems. Federer gets a slight bump for entering the top four, and Wawrinka, whom gElo really doesn’t like, loses a big chunk of his modest title hopes by falling out of the top four.

The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:

Player              ATP    SF%  Wimb    SF%  gElo    SF%  
Andy Murray           1  58.6%     1  64.1%     2  63.0%  
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%  
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%  
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%  
Roger Federer         5  49.6%     3  64.0%     3  63.2%  
Marin Cilic           6  13.6%     7  11.1%    10  10.3%  
Milos Raonic          7  17.3%     6  14.0%     7  15.2%  
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%  
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%  
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%

There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, but even Djokovic and Murray see a benefit because Federer is no longer a possible quarterfinal opponent. Once again, we see the biggest negative effect on Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.

Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.

That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.

The Steadily Less Predictable WTA

Italian translation at settesei.it

Update: The numbers in this post summarizing the effectiveness of sElo are much too high–a bug in my code led to calculating effectiveness with post-match ratings instead of pre-match ratings. The parts of the post that don’t have to do with sElo are unaffected and–I hope–remain of interest.

One of the talking points throughout the 2017 WTA season has been the unpredictability of the field. With the absence of Serena Williams, Victoria Azarenka, and until recently, Petra Kvitova and Maria Sharapova, there is a dearth of consistently dominant players. Many of the top remaining players have been unsteady as well, due to some combination of injury (Simona Halep), extreme surface preferences (Johanna Konta), and good old-fashioned regression to the mean (Angelique Kerber).

No top seed has won a title at the Premier level or above so far this year. Last week, Stephanie Kovalchik went into more detail, quantifying how seeds have failed to meet expectations and suggesting that the official WTA ranking system (the algorithm that determines which players get those seeds) has failed.

There are plenty of problems with the WTA ranking system, especially if you expect it to have predictive value–that is, if you want it to properly reflect the performance level of players right now. Kovalchik is correct that the rankings have done a particularly poor job this year identifying the best players. However, there’s something else going on: According to much more accurate algorithms, the WTA is more chaotic than it has been for decades.

Picking winners

Let’s start with a really basic measurement: picking winners. Through Rome, there had been more than 1100 completed WTA matches. The higher-ranked player won 62.4% of those. Since 1990, the ranking system has picked the winner of 67.9% of matches, and topped 70% during several years in the 1990s. It never fell below 66% until 2014, and this year’s 62.4% is the worst in the 28-year time frame under consideration.
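The “picking winners” measure is simple enough to sketch in a few lines of Python. The sketch assumes each match is recorded as the winner’s and loser’s rankings at the time; the sample matches are hypothetical, not real WTA results, and how to handle unranked players (for instance, by assigning them a very large rank) is left as an assumption.

```python
def higher_ranked_win_rate(matches):
    """matches: iterable of (winner_rank, loser_rank) pairs, where rank 1 is best.
    Returns the share of matches won by the better-ranked player."""
    decided = [(w, l) for w, l in matches if w != l]  # skip rare equal ranks
    if not decided:
        return float("nan")
    correct = sum(1 for w, l in decided if w < l)
    return correct / len(decided)


# Hypothetical sample, not real results.
sample = [(1, 40), (12, 3), (7, 25), (58, 20), (2, 9)]
print(higher_ranked_win_rate(sample))  # 0.6
```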

Elo does a little better. It rates players by the quality of their opponents, meaning that draw luck is taken out of the equation, and does a better job of estimating the ability level of players like Serena and Sharapova, who for various reasons have missed long stretches of time. Since 1990, Elo has picked the winner of 68.6% of matches, falling to an all-time low of 63.1% so far in 2017.
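A minimal Elo loop shows both why opponent quality matters (a win over a higher-rated player moves the ratings more) and why, as the update at the top of this post notes, each forecast must use the ratings from before the match. The constant K-factor and 1500 starting rating below are illustrative assumptions, not the parameters behind the numbers in this post.

```python
from collections import defaultdict

K = 32.0        # illustrative constant K-factor (assumption)
START = 1500.0  # illustrative starting rating for a new player (assumption)


def expected(rating_a, rating_b):
    """Probability that A beats B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_elo(matches):
    """matches: (winner, loser) pairs in chronological order.

    Returns the final ratings and the pre-match probability the system
    assigned to each eventual winner."""
    ratings = defaultdict(lambda: START)
    pre_match = []
    for winner, loser in matches:
        p = expected(ratings[winner], ratings[loser])  # forecast BEFORE updating
        pre_match.append(p)
        shift = K * (1.0 - p)  # larger when the winner was the underdog
        ratings[winner] += shift
        ratings[loser] -= shift
    return dict(ratings), pre_match


ratings, probs = run_elo([("A", "B"), ("A", "C"), ("C", "B")])
print(ratings)
print([round(p, 3) for p in probs])
```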

For a big improvement, we need surface-specific Elo (sElo). An effective surface-based system isn’t as complicated as I expected it to be: sElo simply generates separate ratings for each surface, using only matches on that surface. It has correctly predicted the winner of 76.2% of matches since 1990, almost cracking 80% back in 1992. Even sElo is baffled by 2017, falling to its lowest point, 71.0%, this year.
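In code, the surface-specific idea amounts to keeping one independent rating table per surface. The class below is a minimal Python sketch using the standard Elo update; the K-factor, starting rating, and surface labels are illustrative assumptions.

```python
from collections import defaultdict

K = 32.0        # illustrative constant K-factor (assumption)
START = 1500.0  # illustrative starting rating (assumption)


def expected(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


class SurfaceElo:
    """One independent Elo table per surface: a clay match never touches
    the hard-court or grass ratings."""

    def __init__(self):
        self.tables = defaultdict(lambda: defaultdict(lambda: START))

    def predict(self, a, b, surface):
        """Pre-match probability that a beats b on this surface."""
        table = self.tables[surface]
        return expected(table[a], table[b])

    def update(self, winner, loser, surface):
        """Record a result on one surface; returns the pre-match forecast."""
        table = self.tables[surface]
        p = expected(table[winner], table[loser])
        table[winner] += K * (1.0 - p)
        table[loser] -= K * (1.0 - p)
        return p


se = SurfaceElo()
se.update("A", "B", "Clay")           # A beats B on clay
print(se.predict("A", "B", "Clay"))   # above 0.5
print(se.predict("A", "B", "Hard"))   # 0.5: hard-court ratings untouched
```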

(sElo for all three major surfaces is now shown on the Tennis Abstract Elo ratings report.)

This graph shows how effectively the three algorithms picked winners. It’s clear that sElo is far better, and the graph also shows that some external factor is driving the predictability of results, affecting the accuracy of all three systems to a similar degree:

Brier scores

We see a similar effect if we use a more sophisticated method to rate the WTA ranking system against Elo and sElo. The Brier score of a collection of predictions measures not only how accurate they are, but also how well calibrated they are–that is, a player forecast to win a matchup 90% of the time really does win nine out of ten, not six out of ten, and vice versa. Brier scores average the square of the difference between each prediction and its corresponding result. Because it uses the square, very bad predictions (for instance, that a player has a 95% chance of winning a match she ended up losing) far outweigh more pedestrian ones (like a player with a 95% chance going on to win).
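The Brier score is short enough to write out directly. This Python sketch applies it to the two single-match cases from the paragraph above: a 95% favorite losing costs about 361 times as much as a 95% favorite winning.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (0 to 1) and
    outcomes (1 if the forecast player won, 0 if she lost). Lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


upset = brier_score([0.95], [0])    # ≈ 0.9025
routine = brier_score([0.95], [1])  # ≈ 0.0025
print(upset / routine)              # ≈ 361
```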

In 2017 so far, the official WTA ranking system has a Brier score of .237, compared to .226 for Elo and .187 for sElo. Lower is better, since we want a system that minimizes the difference between predictions and actual outcomes. All three numbers are the highest (that is, the worst) of any season since 1990. The corresponding averages over that time span are .207 (WTA), .202 (Elo), and .164 (sElo).

As with the simpler method of counting correct predictions, we see that Elo is a bit better than the official ranking, and both of the surface-agnostic methods are crushed by sElo, even though the surface-specific method uses considerably less data. (For instance, the clay-specific Elo ignores hard and grass court results entirely.) And just like the results of picking winners, we see that the differences in Brier scores of the three methods are fairly consistent, meaning that some other factor is causing the year-to-year differences:

The takeaway

The WTA ranking system has plenty of issues, but its unusually bad performance this year isn’t due to any quirk in the algorithm. Elo and sElo are structured completely differently–the only thing they have in common with the official system is that they use WTA match results–and they show the same trends in both of the above metrics.

One factor affecting the last two years of forecasting accuracy is the absence of players like Serena, Sharapova, and Azarenka. If those three played full schedules and won at their usual clip, there would be quite a few more correct predictions for all three systems, and perhaps there would be fewer big upsets from the players who have tried to replace them at the top of the game.

But that isn’t the whole story. A bunch of no-brainer predictions don’t affect Brier score very much, and the presence of heavily favored players also makes it more likely that massively surprising results occur, such as Serena’s loss to Madison Brengle or Sharapova’s ouster at the hands of Eugenie Bouchard. Many unexpected results are completely independent of the top ten, like Marketa Vondrousova’s recent title in Biel.

While some of the year-to-year differences in the graphs above are simply noise, the last several years look much more like a meaningful trend. It could be that we are seeing a large-scale changing of the guard, with young players (and their low rankings) regularly upsetting established stars, while the biggest names in the sport are spending more time on the sidelines. Upsets may also be somewhat contagious: When one 19-year-old aspirant sees a peer beating top-tenners, she may be more confident that she can do the same.

Whatever influences have given us the WTA’s current state of unpredictability, we can see that it’s not just a mirage created by a flawed ranking system. Upsets are more common now than at any other point in recent memory, whichever algorithm you use to pick your favorites.

Playing Even Better Than Number One

Italian translation at settesei.it

Last night in Miami, Venus Williams beat newly re-minted WTA No. 1 Angelique Kerber. Venus, of course, has plenty of experience clashing with the very best in women’s tennis, with 15 Grand Slam finals and three spells at the No. 1 ranking herself.

Last night’s quarterfinal was Venus’s 37th match against a WTA No. 1 and her 15th win. Kerber became the sixth different top-ranked player to lose at the hands of the elder Williams sister.

All of these numbers are very impressive, especially when you consider that, taken as a whole, WTA No. 1s have won just over 88% of their nearly 2,300 matches since the modern ranking system was instituted. However, Venus doesn’t hold the record in any of these categories.

Records against No. 1s are a somewhat odd classification, since the best players tend to reach the top spot themselves. For example, Martina Hingis played only 11 matches against top-ranked opponents, barely one-fifth as many as the leader in that category. On the other hand, injuries and other layoffs have meant that many all-time greats have found themselves lower in the rankings for long stretches. That is particularly true of Venus and Serena Williams.

With her 37 matches played against No. 1s, Venus is approaching the top of the list, but it will take a superhuman effort to catch Arantxa Sanchez Vicario, at 51:

Rank  Player                   Matches vs No. 1
1     Arantxa Sanchez Vicario                51
2     Gabriela Sabatini                      38
3     Venus Williams                         37
4     Lindsay Davenport                      34
5     Conchita Martinez                      33
6     Helena Sukova                          31
7     Serena Williams                        28
8     Svetlana Kuznetsova                    27
-     Jana Novotna                           27
10    Amelie Mauresmo                        25
11    Maria Sharapova                        23

The record for wins against No. 1s is a more achievable goal. Martina Navratilova holds it at 18*, followed by Serena at 16, and then Lindsay Davenport and Venus at 15:

Rank  Player               Wins  Losses
1     Martina Navratilova    18*      
2     Serena Williams        16      12
3     Lindsay Davenport      15      19
-     Venus Williams         15      22
5     Steffi Graf            11       8
6     Gabriela Sabatini      10      28
7     Amelie Mauresmo         8      17
8     Svetlana Kuznetsova     7      20
-     Maria Sharapova         7      16
-     Mary Pierce             7      15
-     Justine Henin           7       9

*My database does not have rankings throughout Navratilova’s entire career, but other sources credit her with 18 wins.

Win percentage against top-ranked opponents is a bit trickier, as it depends on where you set the minimum number of matches. I’ve drawn the line at five. That’s rather low, but I wanted to include Alize Cornet and Elina Svitolina, active players who have each won three of their six matches against No. 1s. By this standard, Venus ranks eighth, though equally reasonable thresholds of 8 or 10 matches would move her up two or three places, respectively:

Rank  Player             Wins  Losses   Win%
1     Steffi Graf          11       8  57.9%
2     Serena Williams      16      12  57.1%
3     Petra Kvitova         5       4  55.6%
4     Elina Svitolina       3       3  50.0%
-     Alize Cornet          3       3  50.0%
6     Lindsay Davenport    15      19  44.1%
7     Justine Henin         7       9  43.8%
8     Venus Williams       15      22  40.5%
9     Vera Zvonareva        4       7  36.4%
-     Dinara Safina         4       7  36.4%

Remember that the average player wins fewer than 12% of matches against No. 1s!
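Because the ranking depends on the cutoff, it’s easy to check Venus’s position at different thresholds. This Python sketch uses only the ten records in the table above. That’s enough: every player left off the list already trails Venus at the five-match threshold, and raising the threshold can only remove players. Tied players share a position, as in the table.

```python
# (wins, losses) against WTA No. 1s, copied from the table above.
records = {
    "Steffi Graf": (11, 8),
    "Serena Williams": (16, 12),
    "Petra Kvitova": (5, 4),
    "Elina Svitolina": (3, 3),
    "Alize Cornet": (3, 3),
    "Lindsay Davenport": (15, 19),
    "Justine Henin": (7, 9),
    "Venus Williams": (15, 22),
    "Vera Zvonareva": (4, 7),
    "Dinara Safina": (4, 7),
}


def standing(name, records, min_matches):
    """Competition-style rank among players with at least `min_matches`:
    1 + the number of eligible players with a strictly higher win rate."""
    pct = {n: w / (w + l) for n, (w, l) in records.items() if w + l >= min_matches}
    return 1 + sum(1 for other in pct.values() if other > pct[name])


for threshold in (5, 8, 10):
    print(threshold, standing("Venus Williams", records, threshold))
# 5 8
# 8 6
# 10 5
```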

Finally, Venus’s defeat of Kerber gave her a win against her sixth different No. 1, moving her into second place in that department. As is so often the case, she trails only her sister, who has beaten seven. Oddly enough, there is very little overlap between Serena’s and Venus’s lists: Their only common victims are Hingis and Davenport. The full list:

Rank  Player               No. 1s defeated
1     Serena Williams                    7
2     Venus Williams                     6
3     Steffi Graf                        5
-     Kim Clijsters                      5
-     Amelie Mauresmo                    5
-     Maria Sharapova                    5
7     Petra Kvitova                      4
-     Lindsay Davenport                  4
-     Justine Henin                      4
-     Svetlana Kuznetsova                4

If Karolina Pliskova, who now stands within 1500 points of No. 1 and could further close the gap in Miami, reaches the top spot, Venus may get a chance to beat a seventh No. 1. Of course, Serena could get that chance as well, and a win would give her an eighth.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s not far-fetched to ask which of these models performs better and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the 2016 US Open: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP rankings, we use a specific formula [1]; for Pinnacle, one of the biggest tennis bookmakers, we calculate the implied probabilities from the posted odds, minus the overround [2].
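Both conversions are easy to sketch in Python. The first function is the ranking-points formula from footnote 1. For the betting odds, the post doesn’t say how the overround was removed; the second function assumes the simplest approach, scaling the two raw implied probabilities (1 / decimal odds) so they sum to one. The points and odds in the example are hypothetical.

```python
def atp_points_prob(points_a: float, points_b: float, e: float = 0.85) -> float:
    """P(a) = a^e / (a^e + b^e), with e = 0.85 for ATP men's singles."""
    return points_a ** e / (points_a ** e + points_b ** e)


def implied_probs(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Convert decimal odds to win probabilities, removing the overround by
    proportional normalization (an assumption; other methods exist)."""
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    book = raw_a + raw_b  # exceeds 1 by the bookmaker's margin
    return raw_a / book, raw_b / book


print(round(atp_points_prob(3000, 1000), 3))  # hypothetical point totals
print(implied_probs(1.50, 2.60))              # hypothetical odds
```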

Next, we simply compare forecasts with reality for each model, asking: If player A was predicted to be the winner ($P(a) > 0.5$), did he really win the match? Doing this for each match and each model (ignoring retirements and walkovers) yields the following results.

Model      % correct
Pinnacle      76.92%
538           75.21%
TA            74.36%
ATP           72.65%
Riles         70.09%

The table shows the share of each model’s predictions that were correct. The betting model (based on Pinnacle’s odds) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room for improvement.

However, just looking at the percentage of correctly called matches does not tell the whole story. There are more granular metrics for investigating the performance of a prediction model. Calibration, for instance, captures a model’s ability to provide forecast probabilities that are close to the true probabilities: In an ideal model, 70% forecasts come true in exactly 70% of cases. Resolution measures how much the observed outcomes within each group of forecasts differ from the overall average outcome. The rationale is that simply forecasting the overall average for every match would produce a reasonably well-calibrated set of predictions, but it would not be as useful as a method that achieves the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) the forecasts are, the better.

In the following table, we group the predictions into bins by forecast probability and show the percentage of correct predictions in each bin. This also enables us to calculate Calibration and Resolution measures for each model.

Model     50-59%  60-69%  70-79%  80-89%  90-100%   Cal    Res    Brier
538          53%     61%     85%     80%      91%   .003   .082   .171
TA           56%     75%     78%     74%      90%   .003   .072   .182
Riles        56%     86%     81%     63%      67%   .017   .056   .211
ATP          50%     73%     77%     84%     100%   .003   .068   .185
Pinnacle     52%     91%     71%     77%      95%   .015   .093   .172

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, such as the fact that only 67% of the Riles model’s 90-100% forecasts were correct, can be explained by small sample size (only three forecasts in that case). However, two cases with larger samples caught my interest: Both the Riles and Pinnacle models appear strongly underconfident (the difference is statistically significant) with their 60-69% predictions. In other words, these probabilities should have been higher, because these forecasts actually came true 86% and 91% of the time, respectively [3]. For betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal what punters call value. For the Riles model, this might be a starting point for tweaking the model.
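How much can three forecasts tell us? The Python sketch below computes, for a perfectly calibrated model, the chance of getting two or fewer of three forecasts right. It assumes, purely for illustration, that all three were 95% forecasts; the post doesn’t report their exact values. The same function could test the 60-69% bins, but the post doesn’t give those bins’ sample sizes.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# Riles' 90-100% bin: 2 of 3 forecasts came true.
# Hypothetical assumption: each of those forecasts was 95%.
print(round(prob_at_most(2, 3, 0.95), 3))  # 0.143
```

In other words, a well-calibrated model would fall this short roughly one time in seven, so that bin isn’t strong evidence against Riles.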

The last three columns of the table show Calibration (lower is better), Resolution (higher is better), and the Brier score (lower is better). The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single measure of prediction accuracy. On this subset of data, the models of FiveThirtyEight and Pinnacle perform essentially equally well. After a slight gap, the Tennis Abstract model and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution and hence ranks fifth.
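The usual way to split the Brier score is Murphy’s decomposition: Brier = Calibration − Resolution + Uncertainty. Calibration (also called reliability) is the weighted squared gap between each bin’s average forecast and its observed frequency; Resolution is the weighted squared gap between each bin’s observed frequency and the overall base rate; Uncertainty is base rate × (1 − base rate). The Python sketch below is a minimal implementation assuming ten equal-width bins; the post doesn’t state its exact binning, so treat this as an illustration rather than the code behind the table.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes, n_bins=10):
    """Return (calibration, resolution, uncertainty) for binary outcomes.

    Brier = calibration - resolution + uncertainty holds exactly when every
    forecast in a bin is identical, and approximately otherwise."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))  # p = 1.0 joins the top bin
    calibration = resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_forecast = sum(p for p, _ in members) / n_k
        observed = sum(o for _, o in members) / n_k
        calibration += n_k * (mean_forecast - observed) ** 2
        resolution += n_k * (observed - base_rate) ** 2
    return calibration / n, resolution / n, base_rate * (1 - base_rate)


# Hypothetical toy data: three 60% forecasts and two 90% forecasts.
forecasts = [0.6, 0.6, 0.6, 0.9, 0.9]
outcomes = [1, 0, 1, 1, 1]
cal, res, unc = murphy_decomposition(forecasts, outcomes)
print(round(cal - res + unc, 4))  # 0.14, the same as the direct Brier score
```

The table’s numbers are consistent with this identity if the uncertainty term is 0.25: For FiveThirtyEight, .003 − .082 + .25 = .171, and the same arithmetic reproduces the other four Brier scores to within .001. An uncertainty of 0.25 corresponds to a 50% base rate, which is what you’d expect if outcomes are scored from the perspective of an arbitrarily assigned “player A”; that is an inference from the numbers, not something the analysis states.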

To conclude, I would like to show a common way to display a set of predictions graphically. The reliability diagram compares how often forecasts actually came true with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If a forecast line is above the black line, the forecasts are underconfident; if it is below, they are overconfident. Given that we only investigated one tournament and therefore had to work with a small sample (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job of avoiding overconfident forecasts, even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I have shown some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail, for example by comparing how well the models do for different kinds of players (e.g., based on ranking), on different surfaces, etc. That is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. $P(a) = \frac{a^e}{a^e + b^e}$, where $a$ is player A’s ranking-point total, $b$ is player B’s ranking-point total, and $e$ is a constant. We use $e = 0.85$ for ATP men’s singles.

2. The betting market is not really a model in itself: The bookmakers’ goal is simply to balance their book. This means that the odds more or less reflect the wisdom of the crowd, which makes them a very good predictor.

3. One example of a match where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the similarly underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more bettors from the US are likely to be confident about the American Jared Donaldson and place bets on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

Why Novak Djokovic is Still Number One

Italian translation at settesei.it

Two weeks ago, Andy Murray took over the ATP #1 ranking from Novak Djokovic. Yesterday, he defeated Djokovic in their first meeting since June, securing his place at the top of the year-end ranking table. Murray has been outstanding in the second half of this season, winning all but three of his matches since the Roland Garros final, and he capped the year in style, beating four top-five players to claim the title at the World Tour Finals.

Despite all that, Murray is not the best player in the world. That title still belongs to Djokovic. Since June, Murray has closed the gap, establishing himself as part of what we might call the "Big Two," but he hasn't quite ousted his rival. There's no question that Murray has played better over this period (that sort of thing is occasionally debatable, but this season it's simply historical fact). Identifying the best player, however, implies something more predictive, and that is much more difficult to determine by simply looking over a list of recent results.

The ATP rankings generally do a good job of telling us which players are better than others. But the official system has two major problems: It ignores opponent quality, and it artificially limits its scope to the last 52 weeks. Pundits and fans tend to have different problems: They often give too much credit to opponent quality (“He beat Djokovic, so now he’s number one!”) and exhibit an even more extreme recency bias (“He’s looked unbeatable this week!”).

Two systems that avoid these issues, Elo and Jrank, both place Djokovic comfortably ahead of Murray. These algorithms handle the details of recent matches and opponent quality differently from each other, but what they share is more important: They consider opponent quality, and they don't use an arbitrary time cutoff like the ATP ranking system does.

Here’s how the three methods would forecast a Djokovic-Murray match, were it held today:

  • ATP: Murray favored, 51.6% chance of winning
  • Elo: Djokovic favored, 61.6% chance of winning
  • Jrank: Djokovic favored, 57.0% chance of winning

Betting markets favored Djokovic by a margin of slightly more than 60/40 yesterday, though bettors probably gave him some of that edge because they thought Murray would be fatigued after his marathon match on Saturday.

As I wrote last week, Elo doesn’t deny that Murray has had a tremendous half-season. Instead, it gives him less credit than the official algorithm does for victories over lesser opponents (such as John Isner in the Paris Masters final), and it recognizes that he started his current run of form at an enormous disadvantage. With his title in London, Murray reached a new peak Elo rating, but it still isn’t enough to overtake Djokovic.

Even though Elo still prefers Novak by a healthy margin, it reflects how much the situation at the top of the ranking list has changed. At the beginning of 2016, Elo gave Djokovic a 76.5% chance of winning a head-to-head against Murray, and that probability rose as high as 81% in April. It fell below 70% after the Olympics, and the gap is now the smallest it has been since February 2011.

Last week illustrates how difficult it will be for Murray to take over the #1 place in the Elo rankings. The pre-tournament Elo difference of 91 points between the two players shrank by only 8%, to 84 points. Murray's win yesterday was worth only a bit more than seven points, and Djokovic had several opportunities to nudge his rating upward in his first four matches, as well. Despite some of Novak's head-scratching losses this fall, he still wins most of his matches, some of them against very good players, which slows the decline of his Elo rating.
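The mechanics behind these small shifts can be sketched with the textbook Elo update: the winner gains K times the difference between the actual result (1) and his expected score, and the loser gives up the same amount. The post doesn't state the K-factor or other implementation details, so the fixed `k` below is a hypothetical placeholder and the ratings in any example are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(winner: float, loser: float, k: float) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one match.

    The winner gains k * (1 - expected score); the loser drops by the
    same amount, so the total rating in the pool is unchanged.
    `k` is a hypothetical fixed K-factor, not the value used for the
    ratings quoted in the post.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta
```

Because an underdog's expected score is below 0.5, an upset moves the ratings more than a win by the favorite would. Still, each result shifts a rating by only a fraction of K, so closing a 90-point gap takes many results, especially while the favorite keeps collecting points from his other wins.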

Of course, Elo is just a measuring stick–like any ranking system, it doesn’t tell us what’s really happening on court. It’s possible that Murray has made a significant (and semi-permanent) leap forward or that Djokovic has taken a major step back. On the other hand, streaks happen even without such leaps, and they always end. The smart money is usually on small, gradual changes to the status quo, and Elo gives us a way to measure those changes.

For Elo to rate Murray ahead of Djokovic, it will probably require several more months of these gradual changes. The only faster alternative is for Djokovic to start losing more matches to the likes of Jiri Vesely and Sam Querrey. When faced with dramatic evidence, Elo makes more dramatic changes. While Djokovic has occasionally provided that evidence this season, he has usually offered enough proof–like four wins at the World Tour Finals–to comfortably maintain his position at the top.

Factchecking the History of the ATP Number One With Elo

Italian translation at settesei.it

As I wrote at The Economist this week, Andy Murray might sit atop the ATP rankings, but he probably isn’t the best player in tennis right now. That honor still belongs to Novak Djokovic, who comes in higher on the Elo ranking list, which uses an algorithm that is more predictive of match outcomes than the ATP table.

This isn’t the first time Elo has disagreed with the official rankings over the name at the top. Of the 26 men to have reached the ATP number one ranking, only 18 also became number one on the Elo list. A 19th player, Guillermo Coria, was briefly Elo #1 despite never achieving the same feat on the ATP rankings.

Four of the remaining eight players–Murray, Patrick Rafter, Marcelo Rios, and John Newcombe–climbed as high as #2 in the Elo rankings, while the last four–Thomas Muster, Carlos Moya, Marat Safin, and Yevgeny Kafelnikov–only got as high as #3. Moya and Kafelnikov are extreme cases of the rankings mismatch, as neither player spent even a single full season inside the Elo top five.

By any measure, though, Murray has spent a lot of time close to the top spot. What makes his current ascent to the #1 spot so odd is that in the past, Elo thought he was much closer. Despite his outstanding play over the last several months, there is still a 100-point Elo gap between him and Djokovic. That’s a lot of space: Most of the field at the WTA Finals in Singapore this year was within a little more than a 100-point range.

January 2010 was the Brit’s best shot. At the end of 2009, Murray, Djokovic, and Roger Federer were tightly packed at the top of the Elo leaderboard. In December, Murray was #3, but he trailed Fed–and the #1 position–by only 25 points. In January, Novak took over the top spot, and Murray closed to within 16 points–a small enough margin that one big upset could make the difference. Altogether, Murray has spent 63 weeks within 100 points of the Elo top spot, none of those since August 2013.

For most of the intervening three-plus years, Djokovic has been steadily setting himself apart from the pack. He reached his career Elo peak in April of this season, opening up a lead of almost 200 points over Federer, who was then #2, and 250 points over Murray. Since Roland Garros, Murray has closed the gap somewhat, but his lack of opportunities against highly-rated players has slowed his climb.

If Murray defeats Djokovic in the final this week in London, it will make the debate more interesting, not to mention secure the year-end ATP #1 ranking for the Brit. But it won’t affect the Elo standings. When two players have such lengthy track records, one match doesn’t come close to eliminating a 100-point gap. Novak will end the season as Elo #1, and he is well-positioned to maintain that position well into 2017.

Elina Svitolina and Multiple #1 Upsets

Last week in Beijing, Elina Svitolina beat new WTA #1 Angelique Kerber. It was the first time the Ukrainian defeated Kerber this season, but it wasn’t her first 2016 triumph over a player ranked #1. At the Rio Olympics in August, Svitolina upset then-top-ranked Serena Williams.

It's unusual for a player to face two (or more) different #1-ranked opponents in the same season. Since 1985, it has happened 136 times on the WTA tour and 148 times on the ATP tour. That's fewer than five times per season on each tour.

Of course, it’s much less common to upset multiple #1-ranked opponents, as Svitolina did. This was only the 16th time a woman did so (again, since 1985), while it has happened on the men’s side 18 times.

Here is a full list of WTA player-seasons that featured defeats of more than one top-ranked player:

Year  Player               Upsets                      
2016  Elina Svitolina      Kerber; Serena              
2010  Samantha Stosur      Serena; Wozniacki           
2009  Venus Williams       Serena; Safina              
2008  Dinara Safina        Henin; Sharapova; Jankovic  
2006  Justine Henin        Davenport; Mauresmo         
2003  Justine Henin        Serena; Clijsters           
2002  Kim Clijsters        Serena; Venus               
2002  Serena Williams      Capriati; Venus             
2001  Lindsay Davenport    Capriati; Hingis            
1999  Amelie Mauresmo      Hingis; Davenport           
1999  Venus Williams       Davenport; Hingis           
1997  Amanda Coetzer       Hingis; Graf                
1996  Jana Novotna         Graf; Seles                 
1996  Kimiko Date Krumm    Graf; Seles                 
1991  Martina Navratilova  Graf; Seles                 
1991  Gabriela Sabatini    Graf; Seles

It’s quite an accomplished list. As we might expect, there’s a lot of overlap between the players who achieved these upsets and past and future #1-ranked players. The real standouts here are Justine Henin and Venus Williams, who managed the feat twice, and Dinara Safina, who faced three different #1s in 2008, going undefeated against them.

Here are the men who beat multiple #1s in the same season:

Year  Player                 Upsets             
2013  Juan Martin Del Potro  Nadal; Djokovic    
2012  Andy Murray            Federer; Djokovic  
2011  David Ferrer           Nadal; Djokovic    
2011  Jo-Wilfried Tsonga     Nadal; Djokovic    
2010  Marcos Baghdatis       Nadal; Federer     
2009  Juan Martin Del Potro  Nadal; Federer     
2008  Andy Murray            Nadal; Federer     
2008  Gilles Simon           Nadal; Federer     
2003  Rainer Schuettler      Roddick; Agassi    
2003  Fernando Gonzalez      Hewitt; Agassi     
2001  Greg Rusedski          Safin; Kuerten     
2001  Max Mirnyi             Safin; Kuerten     
1995  Michael Chang          Agassi; Sampras    
1992  Richard Krajicek       Courier; Edberg    
1991  Guy Forget             Edberg; Becker     
1991  Andrei Cherkasov       Edberg; Becker     
1990  Boris Becker           Lendl; Edberg      
1988  Boris Becker           Wilander; Lendl

This list isn’t quite as impressive, though it does capture several very good players at their best.  It also highlights the world-beating potential of Max Mirnyi, who–despite never reaching the top 15 himself–finished the 2001 season with a 3-1 record against ATP #1s.

The rarity of facing multiple #1s in the same season–let alone beating them–stops us from drawing any meaningful conclusions about what Svitolina’s feat indicates for her future. At the very least, however, it reminds us of the Ukrainian’s potential as a future star, and puts her among some very good historical company.

How Elo Solves the Olympics Ranking Points Conundrum

Italian translation at settesei.it

Last week's Olympic tennis tournament had superstars, it had drama, and it had tears, but it didn't have ranking points. Surprise medalists Monica Puig and Juan Martin del Potro scored huge triumphs for themselves and their countries, yet they still languish at 35th and 141st in their respective tours' rankings.

The official ATP and WTA rankings have always represented a collection of compromises, as they try to accomplish dual goals of rewarding certain behaviors (like showing up for high-profile events) and identifying the best players for entry in upcoming tournaments. Stripping the Olympics of ranking points altogether was an even weirder compromise than usual. Four years ago in London, some points were awarded and almost all the top players on both tours showed up, even though many of them could’ve won more points playing elsewhere.

For most players, the chance at Olympic gold was enough. The level of competition was quite high, so while the ATP and WTA tours treat the tournament in Rio as a mere exhibition, those of us who want to measure player ability and make forecasts must factor Olympics results into our calculations.

Elo, a rating system originally designed for chess that I've been using for tennis for the past year, is an excellent tool for integrating Rio results with the rest of this season's wins and losses. Broadly speaking, it awards points to match winners and subtracts points from losers. Beating a top player is worth many more points than beating a lower-rated one. There is no penalty for not playing: for example, Stan Wawrinka's and Simona Halep's ratings are unchanged from a week ago.

Unlike the ATP and WTA ranking systems, which award points based on the level of tournament and round, Elo is context-neutral. Del Potro’s Elo rating improved quite a bit thanks to his first-round upset of Novak Djokovic–the same amount it would have increased if he had beaten Djokovic in, say, the Toronto final.

Many fans object to this, on the reasonable assumption that context matters. It certainly seems like the Wimbledon final should count for more than, say, a Monte Carlo quarterfinal, even if the same player defeats the same opponent in both matches.

However, results matter for ranking systems, too. A good rating system will do two things: predict winners correctly more often than other systems, and give more accurate degrees of confidence for those predictions. (For example, in a sample of 100 matches in which the system gives one player a 70% chance of winning, the favorite should win 70 times.) Elo, with its ignorance of context, predicts more winners and gives more accurate forecast certainties than any other system I’m aware of.
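Both criteria can be measured directly from a list of forecasts and results. This minimal Python sketch, with hypothetical function names and toy data, computes the share of matches in which the favorite won and the observed win rate within a forecast band, such as matches where the system gave roughly 70%.

```python
import math


def favorite_accuracy(forecasts, outcomes):
    """Share of matches in which the player given more than 50% actually won.

    forecasts: probability that player A wins; outcomes: 1 if A won, else 0.
    Exact 50/50 forecasts are skipped because they pick no winner.
    """
    picks = [(p > 0.5) == bool(won)
             for p, won in zip(forecasts, outcomes) if p != 0.5]
    return sum(picks) / len(picks) if picks else math.nan


def observed_rate(forecasts, outcomes, lo, hi):
    """Observed win rate for player A over matches forecast in [lo, hi)."""
    hits = [won for p, won in zip(forecasts, outcomes) if lo <= p < hi]
    return sum(hits) / len(hits) if hits else math.nan
```

For a well-calibrated system, `observed_rate(forecasts, outcomes, 0.65, 0.75)` should come out close to 0.70 once the sample is large enough; with only a few matches in a band, the number is mostly noise.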

For one thing, it wipes the floor with the official rankings. While it's possible that tweaking Elo with context-aware details would improve the results further, the improvement would likely be minor compared to the massive difference between Elo's accuracy and that of the ATP and WTA algorithms.

Relying on a context-neutral system is perfect for tennis. Instead of altering the ranking system with every change in tournament format, we can always rate players the same way, using only their wins, losses, and opponents. In the case of the Olympics, it doesn’t matter which players participate, or what anyone thinks about the overall level of play. If you defeat a trio of top players, as Puig did, your rating skyrockets. Simple as that.

Two weeks ago, Puig was ranked 49th among WTA players by Elo–several places lower than her WTA ranking of 37. After beating Garbine Muguruza, Petra Kvitova, and Angelique Kerber, her Elo ranking jumped to 22nd. While it’s tough, intuitively, to know just how much weight to assign to such an outlier of a result, her Elo rating just outside the top 20 seems much more plausible than Puig’s effectively unchanged WTA ranking in the mid-30s.

Del Potro is another interesting test case, as his injury-riddled career presents difficulties for any rating system. According to the ATP algorithm, he is still outside the top 100 in the world–a common predicament for once-elite players who don’t immediately return to winning ways.

Elo has the opposite problem with players who miss a lot of time due to injury. When a player doesn’t compete, Elo assumes his level doesn’t change. That’s clearly wrong, and it has cast a lot of doubt over del Potro’s place in the Elo rankings this season. The more matches he plays, the more his rating will reflect his current ability, but his #10 position in the pre-Olympics Elo rankings seemed overly influenced by his former greatness.

(A more sophisticated Elo-based system, Glicko, was created in part to improve ratings for competitors with few recent results. I’ve tinkered with Glicko quite a bit in hopes of more accurately measuring the current levels of players like Delpo, but so far, the system as a whole hasn’t come close to matching Elo’s accuracy while also addressing the problem of long layoffs. For what it’s worth, Glicko ranked del Potro around #16 before the Olympics.)

Del Potro's success in Rio boosted him three places in the Elo rankings, up to #7. While that still owes something to the lingering influence of his pre-injury results, it's the first time his post-injury Elo rating has come close to passing the smell test.

You can see the full current lists elsewhere on the site: here are ATP Elo ratings and WTA Elo ratings.

Any rating system is only as good as the assumptions and data that go into it. The official ATP and WTA ranking systems have long suffered from improvised assumptions and conflicting goals. When an important event like the Olympics is excluded altogether, the data is incomplete as well. Now as much as ever, Elo shines as an alternative method. In addition to a more predictive algorithm, Elo can give Rio results the weight they deserve.

The Case for Novak Djokovic … and Roger Federer … and Rafael Nadal

Italian translation at settesei.it

By winning the US Open last weekend and increasing his career total to ten Grand Slams, Novak Djokovic has pushed himself even further into conversations about the greatest of all time. At the very least, his 2015 season is shaping up to be one of the best in tennis history.

A recent FiveThirtyEight article introduced Elo ratings into the debate, showing that Djokovic’s career peak–achieved earlier this year at the French Open–is the highest of anyone’s, just above 2007 Roger Federer and 1980 Bjorn Borg. In implementing my own Elo ratings, I’ve discovered just how close those peaks are.

Here are my results for the top 15 peaks of all time [1]:

Player                 Year   Elo  
Novak Djokovic         2015  2525  
Roger Federer          2007  2524  
Bjorn Borg             1980  2519  
John McEnroe           1985  2496  
Rafael Nadal           2013  2489  
Ivan Lendl             1986  2458  
Andy Murray            2009  2388  
Jimmy Connors          1979  2384  
Boris Becker           1990  2383  
Pete Sampras           1994  2376  
Andre Agassi           1995  2355  
Mats Wilander          1984  2355  
Juan Martin del Potro  2009  2352  
Stefan Edberg          1988  2346  
Guillermo Vilas        1978  2325

A one-point gap is effectively nothing: It means that peak Djokovic would have a 50.1% chance of beating peak Federer. The 35-point gap separating Novak from peak Rafael Nadal is considerably more meaningful, implying that the better player has a 55% chance of winning.
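Those percentages follow from the standard logistic Elo curve on a 400-point scale, which reproduces the figures quoted here (a one-point gap gives about 50.1%, a 35-point gap about 55%). Assuming that curve, the Python sketch below converts a rating gap into a win probability and back; the function names are my own.

```python
import math


def win_probability(rating_gap: float) -> float:
    """Chance that the higher-rated player wins, given the Elo gap
    (standard logistic curve on a 400-point scale)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


def gap_for_probability(p: float) -> float:
    """Inverse of win_probability: the Elo gap that implies probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 400 * math.log10(p / (1 - p))
```

For example, `gap_for_probability(0.64)` returns just under 100, the gap at which the favorite wins about 64% of the time.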

Surface-specific Elo

If we limit our scope to hard-court matches, Djokovic is still a very strong contender, but Fed’s 2007 peak is clearly the best of all time:

Player          Year  Hard Ct Elo  
Roger Federer   2007         2453  
Novak Djokovic  2014         2418  
Ivan Lendl      1989         2370  
Pete Sampras    1997         2356  
Rafael Nadal    2014         2342  
John McEnroe    1986         2332  
Andy Murray     2009         2330  
Andre Agassi    1995         2326  
Stefan Edberg   1987         2285  
Lleyton Hewitt  2002         2262

Ivan Lendl and Pete Sampras make much better showings on this list than on the overall ranking. Still, they are far behind Fed and Novak–the roughly 100-point difference between peak Fed and peak Pete is equivalent to a 64% probability that the higher-rated player would win.

On clay, I’ll give you three guesses who tops the list–and your first two guesses don’t count. It isn’t even close:

Player           Year  Clay Ct Elo  
Rafael Nadal     2009         2550  
Bjorn Borg       1982         2475  
Novak Djokovic   2015         2421  
Ivan Lendl       1988         2408  
Mats Wilander    1984         2386  
Roger Federer    2009         2343  
Jose Luis Clerc  1981         2318  
Guillermo Vilas  1982         2316  
Thomas Muster    1996         2313  
Jimmy Connors    1980         2307

Borg was great, but Nadal is in another league entirely. Though Djokovic has pushed Nadal out of many greatest-of-all-time debates–at least for the time being–there’s little doubt that Rafa is the greatest clay court player of all time, and likely the most dominant player in tennis history on any single surface.

Djokovic is well back of both Nadal and Borg, but in his favor, he’s the only player ranked in the top three for both major surfaces.

The survivor

As the second graph in the 538 article shows, Federer stands out as the greatest player of all time at his age. Most players have retired long before their 34th birthday, and even those who stick around aren’t usually contesting Grand Slam finals. In fact, Federer’s Elo rating of 2393 after his US Open semifinal win against Stanislas Wawrinka last week would rank as the sixth-highest peak of all time, behind Lendl and just ahead of Andy Murray.

Here are the top ten Elo peaks for players aged 34 or older:

Player         Age   34+ Elo  
Roger Federer  34.1     2393  
Jimmy Connors  34.1     2234  
Andre Agassi   35.3     2207  
Rod Laver      36.6     2207  
Ken Rosewall   37.4     2195  
Tommy Haas     35.3     2111  
Arthur Ashe    35.7     2107  
Ivan Lendl     34.1     2054  
Andres Gimeno  35.0     2035  
Mark Cox       34.0     2014

The 160-point gap between Federer and Jimmy Connors implies that 34-year-old Fed would win about 70% of the time against 34-year-old Connors. No one has ever sustained this level of play–or anything close to it–for this long.

At the risk of belaboring the point, similar arguments can be made for 33-year-old Fed, all the way to 30-year-old Fed. At almost any stage in the last four years, Federer has been better than any player in history at that age [2].  Djokovic has matched many of Roger’s career accomplishments so far, especially on clay, but it would be truly remarkable if he maintained a similar level of play through the end of the decade.

Current Elo ratings

While it’s not really germane to today’s subject, I’ve got the numbers, so let’s take a look at the current ATP Elo ratings. Since Elo is new to most tennis fans, I’ve included columns to indicate each player’s chances of beating Djokovic and of beating the current #10, Milos Raonic, based on their rating. As a general rule, a 100-point gap translates to a 64% chance of winning for the favorite, a 200-point gap implies 76%, and a 500-point gap is equivalent to 95%.

Rank  Player                  Elo  Vs #1  Vs #10  
1     Novak Djokovic         2511      -     91%  
2     Roger Federer          2386    33%     84%  
3     Andy Murray            2332    26%     79%  
4     Kei Nishikori          2256    19%     71%  
5     Rafael Nadal           2256    19%     71%  
6     Stan Wawrinka          2186    13%     62%  
7     David Ferrer           2159    12%     58%  
8     Tomas Berdych          2148    11%     56%  
9     Richard Gasquet        2128    10%     54%  
10    Milos Raonic           2103     9%       -  
                                                  
Rank  Player                  Elo  Vs #1  Vs #10  
11    Gael Monfils           2084     8%     47%  
12    Jo-Wilfried Tsonga     2083     8%     47%  
13    Marin Cilic            2081     8%     47%  
14    Kevin Anderson         2074     7%     46%  
15    John Isner             2035     6%     40%  
16    David Goffin           2027     6%     39%  
17    Grigor Dimitrov        2021     6%     38%  
18    Gilles Simon           2005     5%     36%  
19    Jack Sock              1994     5%     35%  
20    Roberto Bautista Agut  1986     5%     34%  
                                                  
Rank  Player                  Elo  Vs #1  Vs #10  
21    Philipp Kohlschreiber  1982     5%     33%  
22    Tommy Robredo          1963     4%     31%  
23    Feliciano Lopez        1955     4%     30%  
24    Nick Kyrgios           1951     4%     29%  
25    Ivo Karlovic           1949     4%     29%  
26    Jeremy Chardy          1940     4%     28%  
27    Alexandr Dolgopolov    1940     4%     28%  
28    Bernard Tomic          1936     4%     28%  
29    Fernando Verdasco      1932     3%     27%  
30    Fabio Fognini          1925     3%     26%
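To show where the last two columns come from, here is a small Python sketch that recomputes them for a handful of rows from the ratings listed above, assuming the standard logistic Elo formula on a 400-point scale. Only five players are included to keep it short, and the helper names are hypothetical.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expectation for player A on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# A few rows copied from the table above.
RATINGS = {
    "Novak Djokovic": 2511,
    "Roger Federer": 2386,
    "Andy Murray": 2332,
    "Milos Raonic": 2103,
    "Gael Monfils": 2084,
}


def vs_columns(player, number_one="Novak Djokovic", number_ten="Milos Raonic"):
    """Return (chance vs #1, chance vs #10); None where the player is that opponent."""
    r = RATINGS[player]
    vs1 = None if player == number_one else win_probability(r, RATINGS[number_one])
    vs10 = None if player == number_ten else win_probability(r, RATINGS[number_ten])
    return vs1, vs10


def fmt(x):
    """Format a probability as a whole percentage, or '-' for a self-match."""
    return "-" if x is None else f"{x:.0%}"


for name in RATINGS:
    vs1, vs10 = vs_columns(name)
    print(f"{name:<16}{fmt(vs1):>6}{fmt(vs10):>7}")
```

Run as is, it prints 33% and 84% for Federer and 9% for Raonic against #1, matching the table.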
