The Indian Wells Quarter of Death

The Indian Wells men’s draw looks a bit lopsided this year. The bottom quarter, anchored by No. 2 seed Novak Djokovic, also features Roger Federer, Rafael Nadal, Juan Martin del Potro, and Nick Kyrgios. It doesn’t take much analysis to see that the bracket makes life more difficult for Djokovic, and by extension, it cleared the way for Andy Murray. Alas, Murray lost his opening match against Vasek Pospisil on Saturday, making No. 3 seed Stan Wawrinka the luckiest man in the desert.

The draw sets up some very noteworthy potential matches: Federer and Nadal haven’t played before the quarterfinal since their first encounter back in 2004, and Fed hasn’t played Djokovic before the semis in more than 40 meetings, since 2007. Kyrgios, who has now beaten all three of the elites in his quarter, is likely to get another chance to prove his mettle against the best.

I haven’t done a piece on draw luck for a while, and this seemed like a great time to revisit the subject. The principle is straightforward: By taking the tournament field and generating random draws, we can do a sort of “retro-forecast” of what each player’s chances looked like before the draw was conducted–back when Djokovic’s road wouldn’t necessarily be so rocky. By comparing the retro-forecast to a projection based on the actual draw, we can see how much the luck of the draw impacted each player’s odds of piling up ranking points or winning the title.
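A minimal sketch of the retro-forecast idea, assuming a toy eight-player field with made-up Elo-style ratings rather than the real jrank numbers. The function names (win_prob, title_odds, pre_draw_odds) and the seeding rule (top two seeds at opposite ends, everyone else shuffled) are illustrative assumptions, not the code behind the tables in this post.

```python
import random

# Hypothetical ratings for an eight-player field (not real jrank values).
RATINGS = {"A": 2200, "B": 2150, "C": 2100, "D": 2050,
           "E": 1900, "F": 1850, "G": 1800, "H": 1750}


def win_prob(r_a, r_b):
    """Elo-style chance that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def title_odds(bracket, ratings):
    """Exact title odds for a bracket listed in draw order (power-of-two size)."""
    # Each slot maps player -> probability of emerging from that slot.
    slots = [{p: 1.0} for p in bracket]
    while len(slots) > 1:
        merged_round = []
        for i in range(0, len(slots), 2):
            left, right = slots[i], slots[i + 1]
            merged = {}
            for side, other in ((left, right), (right, left)):
                for a, pa in side.items():
                    beat = sum(pb * win_prob(ratings[a], ratings[b])
                               for b, pb in other.items())
                    merged[a] = pa * beat
            merged_round.append(merged)
        slots = merged_round
    return slots[0]


def pre_draw_odds(seeds, others, ratings, n_draws=500, seed=0):
    """Average title odds over random draws, top two seeds at opposite ends."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in ratings}
    for _ in range(n_draws):
        middle = others[:]
        rng.shuffle(middle)
        draw = [seeds[0]] + middle + [seeds[1]]
        for p, prob in title_odds(draw, ratings).items():
            totals[p] += prob
    return {p: t / n_draws for p, t in totals.items()}


pre = pre_draw_odds(["A", "B"], list("CDEFGH"), RATINGS)
# A lopsided draw: the second seed's half holds the three best unseeded players.
post = title_odds(["A", "H", "G", "F", "E", "D", "C", "B"], RATINGS)
for name in ("A", "B"):
    print(f"{name}: pre-draw {pre[name]:.1%}, post-draw {post[name]:.1%}")
```

In this toy field, the second seed plays the role of Djokovic: his post-draw title odds fall below the pre-draw average because the strongest unseeded players all landed in his half.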

Here are the eight players most heavily favored by the pre-draw forecast, along with their chances of winning the title, both before and after the draw was conducted:

Player                 Pre-Draw  Post-Draw  
Novak Djokovic           26.08%     19.05%  
Andy Murray              19.30%     26.03%  
Roger Federer            10.24%      8.71%  
Rafael Nadal              5.46%      4.80%  
Stan Wawrinka             5.08%      7.14%  
Kei Nishikori             5.01%      5.67%  
Nick Kyrgios              4.05%      2.62%  
Juan Martin del Potro     4.00%      2.34%

These odds are based on my jrank rating system, which correlates closely with Elo. I use jrank here instead of Elo because it’s surface-specific. I’m also ignoring the first round of the main draw, which–since all 32 seeds get a first-round bye–is just a glorified qualifying round and has very little effect on the title chances of seeded players.

As you can see, the bottom quarter–the “group of death”–is in fact where title hopes go to die. Djokovic, who is still considered to be the best player in the game by both jrank and Elo, had a 26% pre-draw chance of defending his title, but it dropped to 19% once the names were placed in the bracket. Not coincidentally, Murray’s odds went in the opposite direction. Federer’s and Nadal’s title chances weren’t hit quite as hard, largely because they weren’t expected to get past Djokovic, no matter when they faced him.

The issue here isn’t just luck; it’s the limitation of the ATP ranking system. No one really thinks that del Potro entered the tournament as the 31st favorite, or that Kyrgios came in as the 15th. No set of rankings is perfect, but at the moment, the official rankings do a particularly poor job of reflecting the players with the best chances of winning hard court matches. The less reliable the rankings, the better the chance of a lopsided draw like the one in Indian Wells.

For a more in-depth look at the effect of the draw on players with lesser chances of winning the title, we need to look at “expected ranking points.” Using the odds that a player reaches each round, we can calculate his expected points for the entire event. For someone like Kyle Edmund, who would have almost no chance of winning the title regardless of the draw, expected points tells a more detailed story of the power of draw luck. Here are the ten players who were punished most severely by the bracket:

Player                 Pre-Draw Pts Post-Draw Pts  Effect  
Kyle Edmund                    28.8          14.3  -50.2%  
Steve Johnson                  65.7          36.5  -44.3%  
Vasek Pospisil                 29.1          19.4  -33.2%  
Juan Martin del Potro         154.0         104.2  -32.3%  
Stephane Robert                20.3          14.2  -30.1%  
Federico Delbonis              20.0          14.5  -27.9%  
Novak Djokovic                429.3         325.4  -24.2%  
Nick Kyrgios                  163.5         124.6  -23.8%  
Horacio Zeballos               17.6          14.1  -20.0%  
Alexander Zverev              113.6          91.5  -19.4%

At most tournaments, this list is dominated by players like Edmund and Pospisil: unseeded men with the misfortune of drawing an elite opponent in the first round. Much less common is to see so many seeds–particularly a top-two player–rating as the most unlucky. While Federer and Nadal don’t quite make the cut here, the numbers bear out our intuition: Fed’s draw knocked his expected points from 257 down to 227, and Nadal’s reduced his projected tally from 195 to 178.

The opposite list–those who enjoyed the best draw luck–features a lot of names from the top half, including both Murray and Wawrinka. Murray squandered his good fortune, putting Wawrinka in an even better position to take advantage of his own:

Player              Pre-Draw Pts  Post-Draw Pts  Effect  
Malek Jaziri                21.9           31.6   44.4%  
Damir Dzumhur               29.1           39.0   33.9%  
Martin Klizan               27.6           36.4   32.1%  
Joao Sousa                  24.7           31.1   25.9%  
Peter Gojowczyk             20.4           25.5   24.9%  
Tomas Berdych               93.6          116.6   24.6%  
Mischa Zverev               58.5           72.5   23.8%  
Yoshihito Nishioka          26.9           32.6   21.1%  
John Isner                  80.2           97.0   21.0%  
Andy Murray                369.1          444.2   20.3%  
Stan Wawrinka              197.8          237.7   20.1%

Over the course of the season, quirks like these tend to even out. Djokovic, on the other hand, must be wondering how he angered the draw gods: Just to earn a quarter-final place against Roger or Rafa, he’ll need to face Kyrgios and Delpo for the second consecutive tournament.

If Federer, Kyrgios, and del Potro can bring their ATP rankings closer in line with their true talent, they are less likely to find themselves in such dangerous draw sections. For Djokovic, that would be excellent news.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s not far-fetched to ask which of these models performs better and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the 2016 US Open: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP rankings, we use a specific formula [1], and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities from the posted odds (minus the overround) [2].
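As a rough illustration of the Pinnacle step, here is a minimal sketch that strips the overround by scaling the inverse odds so they sum to one. This proportional normalization is only one common approach and is an assumption here, since the exact method isn't spelled out. The function name and the prices are hypothetical.

```python
def implied_probabilities(odds_a, odds_b):
    """Turn two-way decimal odds into probabilities that sum to one."""
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    booksum = raw_a + raw_b      # exceeds 1 by the bookmaker's margin
    overround = booksum - 1.0
    # Assumption: spread the margin proportionally across both players.
    return raw_a / booksum, raw_b / booksum, overround


# Hypothetical prices, not actual Pinnacle odds.
p_a, p_b, margin = implied_probabilities(1.50, 2.70)
print(f"P(a) = {p_a:.3f}, P(b) = {p_b:.3f}, overround = {margin:.3f}")
```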

Next, we simply compare forecasts with reality for each model, asking: If player A was predicted to be the winner (P(a) > 0.5), did he really win the match? Doing that for each match and each model (ignoring retirements and walkovers), we get the following results.

Model		% correct
Pinnacle	76.92%
538		75.21%
TA		74.36%
ATP		72.65%
Riles		70.09%

The table shows the percentage of predictions that were actually right. The betting model (based on the odds of Pinnacle) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.

However, the percentage of correctly called matches does not tell the whole story. There are more granular metrics for evaluating a prediction model. Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, 70% forecasts come true in exactly 70% of cases. Resolution measures how much the forecasts differ from the overall average. The rationale is that always forecasting the expected average will produce a reasonably well-calibrated set of predictions, but it will not be as useful as a method that achieves the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) the forecasts are, the better.

In the following table, we sort the predictions into bins by forecast probability and show the percentage of predictions in each bin that were correct. This also enables us to calculate Calibration and Resolution measures for each model.

Model    50-59%  60-69%  70-79%  80-89%  90-100% Cal  Res   Brier
538      53%     61%     85%     80%     91%     .003 .082  .171
TA       56%     75%     78%     74%     90%     .003 .072  .182
Riles    56%     86%     81%     63%     67%     .017 .056  .211
ATP      50%     73%     77%     84%     100%    .003 .068  .185
Pinnacle 52%     91%     71%     77%     95%     .015 .093  .172

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that only 67% of the Riles model’s 90-100% forecasts were correct, can be explained by small sample size (only three forecasts in that case). However, there are two cases with larger samples that caught my interest: Both the Riles and Pinnacle models seem to be strongly underconfident (to a statistically significant degree) in their 60-69% predictions. In other words, these probabilities should have been higher, because in reality these forecasts came true 86% and 91% of the time, respectively [3]. For the betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this could be a starting point for tweaking the model.

The last three columns show Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better). The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle perform essentially equally well on this subset of data. Then there is a slight gap until the Tennis Abstract model and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, and hence ranks fifth in this analysis.
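The three measures are tied together by the standard decomposition Brier = Calibration - Resolution + Uncertainty. The figures in the table above are consistent with that identity, within rounding, for an uncertainty term near 0.25 (for FiveThirtyEight, .003 - .082 + .250 = .171). The sketch below shows one way to compute the pieces. The function name, the ten equal-width bins, and the sample data are assumptions for illustration, not the code or data used for the table.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes, n_bins=10):
    """Return Brier score plus calibration, resolution and uncertainty terms.

    forecasts: predicted probabilities that the event happens.
    outcomes:  1 if the event happened, 0 otherwise.
    The identity brier = cal - res + unc is exact only when all forecasts
    inside a bin are equal; with mixed bins it is an approximation.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[min(int(f * n_bins), n_bins - 1)].append((f, o))

    calibration = resolution = 0.0
    for members in bins.values():
        k = len(members)
        mean_forecast = sum(f for f, _ in members) / k
        hit_rate = sum(o for _, o in members) / k
        calibration += k * (mean_forecast - hit_rate) ** 2
        resolution += k * (hit_rate - base_rate) ** 2

    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return {
        "brier": brier,
        "calibration": calibration / n,
        "resolution": resolution / n,
        "uncertainty": base_rate * (1 - base_rate),
    }


# Made-up forecasts and results, purely to exercise the function.
sample_forecasts = [0.55] * 4 + [0.75] * 4 + [0.95] * 2
sample_outcomes = [1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
parts = brier_decomposition(sample_forecasts, sample_outcomes)
print({name: round(value, 4) for name, value in parts.items()})
```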

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If a forecast line is above the black line, the forecasts are underconfident; if it is below, they are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job of avoiding overestimates, even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I have shown some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing how well the models do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

This is a guest article by me, Peter Wetz. I am a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. P(a) = a^e / (a^e + b^e) where a are player A’s ranking points, b are player B’s ranking points, and e is a constant. We use e = 0.85 for ATP men’s singles.
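The formula in footnote 1 is short enough to write out directly. This is a minimal sketch; the function name and the example point totals are made up.

```python
def ranking_points_forecast(points_a, points_b, e=0.85):
    """P(a) = a^e / (a^e + b^e), with a and b as ranking points."""
    a = points_a ** e
    b = points_b ** e
    return a / (a + b)


# Hypothetical point totals: one player has four times the other's points.
p_favorite = ranking_points_forecast(12000, 3000)
print(f"{p_favorite:.3f}")
```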

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly-attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo correctly predicts the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. To put it another way, this is more evidence that Davis Cup is about the chalk.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, the no-ad scoring and third-set tiebreak introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

Player                 Side  D-Lo     
Juan Martin del Potro  ARG   1759     
Leonardo Mayer         ARG   1593  *  
Federico Delbonis      ARG   1540     
Guido Pella            ARG   1454  *  
                                      
Ivan Dodig             CRO   1856  *  
Marin Cilic            CRO   1677     
Ivo Karlovic           CRO   1580     
Franco Skugor          CRO   1569  *

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.
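The 189-point edge and the 74.8% figure can be reproduced from the table with the usual Elo formula, on the assumption that a team’s rating is the average of its two players’ D-Lo ratings (the numbers quoted here are consistent with that). The sketch below covers only the no-ad, super-tiebreak probability; the best-of-five conversion is not reproduced, and the function names are hypothetical.

```python
# D-Lo ratings copied from the table above.
DLO = {
    "del Potro": 1759, "Mayer": 1593, "Delbonis": 1540, "Pella": 1454,
    "Dodig": 1856, "Cilic": 1677, "Karlovic": 1580, "Skugor": 1569,
}


def team_rating(player_1, player_2):
    """Assumption: a doubles team is rated at the average of its two players."""
    return (DLO[player_1] + DLO[player_2]) / 2.0


def elo_win_prob(rating_diff):
    """Standard Elo expectation for the side that is rating_diff points ahead."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


edge = team_rating("Dodig", "Skugor") - team_rating("Mayer", "Pella")
cilic_edge = team_rating("Cilic", "Dodig") - team_rating("Mayer", "Pella")
print(f"Dodig/Skugor edge: {edge:.0f} points, P(win) = {elo_win_prob(edge):.1%}")
print(f"Cilic/Dodig edge:  {cilic_edge:.0f} points")
```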

Making matters worse for Argentina, it’s likely that Croatia could improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently-scheduled match, shown from the perspective of Croatia:

Rubber                      Forecast (CRO)  
Cilic v Delbonis                     90.8%  
Karlovic v del Potro                 15.8%  
Dodig/Skugor v Mayer/Pella           83.7%  
Cilic v del Potro                    36.3%  
Karlovic v Delbonis                  75.8%

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, who it ranks 68th in the world, against his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market which, a few hours before play begins, gives Croatia about a 65% chance.
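One way to get from the rubber-by-rubber numbers to a tie forecast is to treat the five rubbers as independent and ask how often Croatia wins at least three. That is an assumption about the method, and the function name is hypothetical. With the rounded percentages from the table, this sketch lands within a few tenths of a point of the 74.2% figure, and the chance of a 1-1 split after the first two rubbers comes out a little higher than that.

```python
def prob_at_least(win_probs, needed):
    """Chance of winning at least `needed` of a set of independent matches."""
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in win_probs:
        updated = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            updated[wins] += mass * (1.0 - p)
            updated[wins + 1] += mass * p
        dist = updated
    return sum(dist[needed:])


# Croatia's forecast in each rubber, copied from the table above.
rubbers = [0.908, 0.158, 0.837, 0.363, 0.758]

tie_win = prob_at_least(rubbers, 3)
day_one_split = rubbers[0] * (1 - rubbers[1]) + (1 - rubbers[0]) * rubbers[1]
print(f"Croatia wins the tie: {tie_win:.1%}")
print(f"1-1 after two rubbers: {day_one_split:.1%}")
```

Dead rubbers don’t change the answer: the side that would win three of five if every rubber were played is the same side that clinches first.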

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

Forecasting the 2016 ATP World Tour Finals

Andy Murray is the #1 seed this week in London, but as I wrote for The Economist, Novak Djokovic likely remains the best player in the world. According to my Elo ratings, he would have a 63% chance of winning a head-to-head match between the two. And with the added benefit of an easier round-robin draw, the math heavily favors Djokovic to win the tournament.

Here are the results of a Monte Carlo simulation of the draw:

Player        SF      F      W  
Djokovic   95.3%  73.9%  54.6%  
Murray     86.3%  58.3%  29.7%  
Nishikori  60.4%  24.9%   7.8%  
Raonic     50.9%  16.3%   3.3%  
Wawrinka   29.4%   7.8%   1.6%  
Monfils    33.2%   8.7%   1.4%  
Cilic      23.9%   5.8%   1.1%  
Thiem      20.7%   4.1%   0.5%

I don’t think I’ve ever seen a player favored so heavily to progress out of the group stage. Murray’s 86% chance of doing so is quite high in itself; Novak’s 95% is otherworldly. His head-to-heads against the other players in his group are backed up by major differences in Elo points–Dominic Thiem is a lowly 15th on the Elo list, given only a 7.4% chance of beating the Serb.

If Milos Raonic is unable to compete, Djokovic’s chances climb even higher. Here are the probabilities if David Goffin takes Raonic’s place in the bracket:

Player        SF      F      W  
Djokovic   96.8%  75.2%  55.4%  
Murray     86.2%  60.7%  30.6%  
Nishikori  60.7%  26.3%   8.1%  
Monfils    47.7%  12.4%   1.8%  
Wawrinka   29.3%   8.5%   1.7%  
Cilic      23.8%   6.2%   1.1%  
Thiem      29.5%   5.8%   0.7%  
Goffin     26.0%   4.9%   0.5%

The luck of the draw was on Novak’s side. I ran another simulation with Djokovic and Murray swapping groups. Here, Djokovic is still heavily favored to win the tournament, but Murray’s semifinal chances get a sizable boost:

Player        SF      F      W  
Djokovic   92.8%  75.1%  54.9%  
Murray     90.9%  58.1%  29.8%  
Nishikori  58.4%  26.9%   7.5%  
Raonic     52.3%  14.3%   3.3%  
Wawrinka   26.9%   8.4%   1.6%  
Monfils    35.3%   7.5%   1.4%  
Cilic      21.9%   6.2%   1.0%  
Thiem      21.6%   3.4%   0.5%

Elo rates Djokovic so highly that he is favored no matter what the draw. But the draw certainly helped.

Doubles!

I’ve finally put together a sufficient doubles dataset to generate Elo ratings and tournament forecasts for ATP doubles. While I’m not quite ready to go into detail, I can say that, by using the Elo algorithm and rating players individually, the resulting forecasts outperform the ATP rankings about as much as singles Elo ratings do.

Here is the forecast for the doubles event at the World Tour Finals:

Team               SF      F      W  
Herbert/Mahut   76.4%  49.5%  32.1%  
Bryan/Bryan     68.7%  36.8%  19.9%  
Kontinen/Peers  55.7%  29.1%  13.8%  
Dodig/Melo      58.4%  28.1%  13.2%  
Murray/Soares   48.3%  20.8%   8.6%  
Lopez/Lopez     37.7%  16.4%   6.2%  
Klaasen/Ram     30.2%  11.9%   4.0%  
Huey/Mirnyi     24.6%   7.3%   2.2%

This distribution is more like what round-robin forecasts usually look like, without a massive gap between the top of the field and the rest. Pierre-Hugues Herbert and Nicolas Mahut are the top rated team, followed closely by Bob Bryan and Mike Bryan. Max Mirnyi was, at his peak, one of the highest Elo-rated doubles players, but his pairing with Treat Huey is the weakest of the bunch.

The men’s doubles bracket has some legendary names, along with some players–like Herbert and Henri Kontinen–who may develop into all-time greats, but it has no competitors who loom over the rest of the field like Murray and Djokovic do in singles.

Elo-Forecasting the WTA Tour Finals in Singapore

With the field of eight divided into two round-robin groups for the WTA Tour Finals in Singapore, we can play around with some forecasts for this event. I’ve updated my Elo ratings through last week’s tournaments, and the first thing that jumps out is how different they are from the official rankings.

Here’s the Singapore field:

EloRank  Player                Elo  Group  
2        Maria Sharapova      2296    RED  
4        Simona Halep         2181    RED  
6        Garbine Muguruza     2147  WHITE  
8        Petra Kvitova        2136  WHITE  
9        Angelique Kerber     2129  WHITE  
11       Agnieszka Radwanska  2100    RED  
15       Lucie Safarova       2051  WHITE  
21       Flavia Pennetta      2004    RED

Serena Williams (#1 in just about every imaginable ranking system) chose not to play, but if Elo ruled the day, Belinda Bencic, Venus Williams, and Victoria Azarenka would be playing this week in place of Agnieszka Radwanska, Lucie Safarova, and Flavia Pennetta.

Anyway, we’ll work with what we’ve got. Maria Sharapova is, according to Elo, a huge favorite here. The ratings translate into a forecast that looks like this:

Player                  SF  Final  Title  
Maria Sharapova      83.7%  61.1%  43.6%  
Simona Halep         60.8%  35.4%  15.9%  
Garbine Muguruza     59.4%  25.7%  11.3%  
Petra Kvitova        55.2%  23.0%   9.8%  
Angelique Kerber     53.1%  21.7%   8.8%  
Agnieszka Radwanska  37.4%  17.4%   6.1%  
Lucie Safarova       32.3%   9.7%   3.1%  
Flavia Pennetta      18.1%   6.0%   1.4%

If Sharapova is really that good, the loser in today’s draw was Simona Halep. The top seed would typically benefit from having the second seed in the other group, but because Garbine Muguruza recently took over the third spot in the rankings, Pova entered the draw as a dangerous floater.

However, these ratings don’t reflect the fact that Sharapova hasn’t completed a match since Wimbledon. They don’t decline with inactivity, so Pova’s rating is the same as it was the day after she lost to Serena back in July. (My algorithm also excludes retirements, so her attempted return in Wuhan isn’t considered.)

With as little as we know about Sharapova’s health, it’s tough to know how to tweak her rating. For lack of any better ideas, I revised her Elo rating to 2132, right between Petra Kvitova and Angelique Kerber. At her best, Sharapova is better than that, but consider this a way of factoring in the substantial possibility that she’ll play much, much worse–or that she’ll get injured and her matches will be played by Carla Suarez Navarro instead. The revised forecast:

Player                  SF  Final  Title  
Simona Halep         69.9%  40.9%  24.0%  
Garbine Muguruza     59.4%  31.5%  16.5%  
Maria Sharapova      57.6%  29.5%  14.5%  
Petra Kvitova        55.6%  28.4%  14.4%  
Angelique Kerber     52.5%  26.3%  13.2%  
Agnieszka Radwanska  47.9%  22.3%   9.9%  
Lucie Safarova       32.6%  12.9%   4.9%  
Flavia Pennetta      24.7%   8.3%   2.7%

If this is a reasonably accurate estimate of Sharapova’s current ability, the Red group suddenly looks like the right place to be. Because Elo doesn’t give any particular weight to Grand Slams, it suggests that the official rankings far overestimate the current level of Safarova and Pennetta. The weakness of those two makes Halep a very likely semifinalist and also means that, in this forecast, the winner of the tournament is more likely (54% to 46%) to come from the White group.

Without Serena, and with Sharapova’s health in question, there are simply no dominant players in the field this week. If nothing else, these forecasts illustrate that we’d be foolish to take any Singapore predictions too seriously.

Forecasting the Effects of Performance Byes in Beijing

To the uninitiated, the WTA draw in Beijing this week looks a little strange. The 64-player draw includes four byes, which were given to the four semifinalists from last week’s event in Wuhan. So instead of empty places in the bracket next to the top four seeds, those free passes go to the 5th, 10th, and 15th seeds, along with one unseeded player, Venus Williams.

“Performance byes”–those given to players based on their results the previous week, rather than their seed–have occasionally featured in WTA draws over the last few years. If you’re interested in their recent history, Victoria Chiesa wrote an excellent overview.

I’m interested in measuring the benefit these byes confer on the recipients–and the negative effect they have on the players who would have received those byes had they been awarded in the usual way. I’ve written about the effects of byes before, but I haven’t contrasted different approaches to awarding them.

This week, the beneficiaries are Garbine Muguruza, Angelique Kerber, Roberta Vinci, and Venus Williams. The top four seeds–the women who were atypically required to play first-round matches–were Simona Halep, Petra Kvitova, Flavia Pennetta, and Agnieszka Radwanska.

To quantify the impact of the various possible formats of a 64-player draw, I used a variety of tools: Elo to rate players and predict match outcomes, Monte Carlo tournament simulations to consider many different permutations of each draw, and a modified version of my code to “reseed” brackets. While this is complicated stuff under the hood, the results aren’t that opaque.

Here are three different types of 64-player draws that Beijing might have employed:

  1. Performance byes to last week’s semifinalists. This gives a substantial boost to the players receiving byes, and compared to any other format, has a negative effect on top players. Not only are the top four seeds required to play a first-round match, they are a bit more likely to play last week’s semifinalists, since the byes give those players a better chance of advancing.
  2. Byes to the top four seeds. The top four seeds get an obvious boost, and everyone else suffers a bit, as they are that much more likely to face the top four.
  3. No byes: 64 players in the draw instead of 60. The clear winners in this scenario are the players who wouldn’t otherwise make it into the main draw. Unseeded players (excluding Venus) also benefit slightly, as the lack of byes means that top players are less likely to advance.

Let’s crunch the numbers. For each of the three scenarios, I ran simulations based on the field without knowing how the draw turned out. That is, Kvitova is always seeded second, but she doesn’t always play Sara Errani in the first round. This approach eliminates any biases in the actual draw. To simulate the 64-player field, I added the four top-ranked players who lost in the final round of qualifying.

To compare the effects of each draw type on every player, I calculated “expected points” based on their probability of reaching each round. For instance, if Halep entered the tournament with a 20% chance of winning the event with its 1,000 ranking points, she’d have 200 “expected points,” plus her expected points for the higher probabilities (and lower number of points) of reaching every round in between. It’s simply a way of combining a lot of probabilities into a single easier-to-understand number.
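Here is a minimal sketch of that calculation. It turns the probability of reaching each stage into the probability of finishing there, then weights each finish by its ranking points. The points schedule and the probabilities are illustrative assumptions, and the function name is hypothetical.

```python
# Illustrative schedule: points earned for finishing at each stage,
# listed from a first-round loss up to the title (assumed values).
SCHEDULE = [("R64", 10), ("R32", 65), ("R16", 120), ("QF", 215),
            ("SF", 390), ("F", 650), ("W", 1000)]


def expected_points(reach, schedule=SCHEDULE):
    """Expected ranking points from the probability of reaching each stage.

    reach[stage] is the chance of getting at least that far; reach["W"]
    is the chance of winning the title.
    """
    total = 0.0
    for i, (stage, points) in enumerate(schedule):
        p_here = reach[stage]
        p_next = reach[schedule[i + 1][0]] if i + 1 < len(schedule) else 0.0
        total += (p_here - p_next) * points  # chance of finishing exactly here
    return total


# Made-up probabilities for a strong seed with a 20% title chance.
example_reach = {"R64": 1.0, "R32": 0.9, "R16": 0.7, "QF": 0.5,
                 "SF": 0.4, "F": 0.3, "W": 0.2}
print(expected_points(example_reach))
```

In this made-up case, the 20% title chance contributes 200 of the 363.5 expected points, and the rest comes from the earlier finishes.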

Here are the expected points in each draw scenario (plus the actual Beijing draw) for the top four players, the four players who received performance byes, plus a couple of others (Belinda Bencic and Caroline Wozniacki) who rated particularly highly:

Player               Seed  PerfByes  TopByes  NoByes  Actual  
Simona Halep            1       323      364     330     341  
Petra Kvitova           2       276      323     290     291  
Venus Williams                  247      216     218     279  
Belinda Bencic         11       255      249     268     254  
Garbine Muguruza        5       243      202     210     227  
Angelique Kerber       10       260      224     235     227  
Caroline Wozniacki      8       208      203     205     199  
Flavia Pennetta         3       142      177     144     195  
Agnieszka Radwanska     4       185      233     192     188  
Roberta Vinci          15       120       91      94      90

As expected, the top four seeds reap far more points when given first-round byes. It’s most noticeable for Pennetta and Radwanska, who would enjoy a boost of more than 20% in expected points if given a first-round bye. Oddly, though, the draw worked out very favorably for Flavia–Elo gave her a 95% chance of beating her first-round opponent Xinyun Han, and her draw steered her relatively clear of other dangerous players in subsequent rounds.

Similarly, the performance byes are worth a 15 to 30% advantage in expected points to the players who receive them. Vinci is the biggest winner here, as we would generally expect from the player most likely to suffer an upset without the bye.

Like Pennetta, Venus was treated very well by the way the draw turned out. The bye already gave her an approximately 15% boost compared to her expectations without a bye, and the draw tacked another 13% onto that. Both the structure of the draw and some luck on draw day made her the event’s third most likely champion, while the other scenarios would have left her in fifth.
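The percentages above can be recomputed directly from the table. In this sketch the expected-points figures are copied from the table, while the list of performance-bye recipients is inferred from it and the choice of baseline column is an assumption, so both baselines are shown.

```python
# Expected points copied from the table above for the four players
# who appear to have received performance byes.
EXPECTED = {
    "Venus Williams":   {"PerfByes": 247, "TopByes": 216, "NoByes": 218, "Actual": 279},
    "Garbine Muguruza": {"PerfByes": 243, "TopByes": 202, "NoByes": 210, "Actual": 227},
    "Angelique Kerber": {"PerfByes": 260, "TopByes": 224, "NoByes": 235, "Actual": 227},
    "Roberta Vinci":    {"PerfByes": 120, "TopByes": 91, "NoByes": 94, "Actual": 90},
}


def pct_change(new, base):
    """Percentage change from base to new."""
    return 100.0 * (new - base) / base


def bye_boosts(row):
    """Boost from a performance bye, measured against each alternative format."""
    return {base: pct_change(row["PerfByes"], row[base])
            for base in ("TopByes", "NoByes")}


for player, row in EXPECTED.items():
    boosts = bye_boosts(row)
    print(f"{player:18s} vs TopByes {boosts['TopByes']:+.1f}%  "
          f"vs NoByes {boosts['NoByes']:+.1f}%")

# Extra gain for Venus from how the actual draw fell, on top of her bye.
venus = EXPECTED["Venus Williams"]
draw_luck = pct_change(venus["Actual"], venus["PerfByes"])
print(f"Venus, actual draw vs PerfByes: {draw_luck:+.1f}%")
```

Measured against the top-four-byes format, the boosts run from about 14% for Venus to about 32% for Vinci; measured against the no-bye format they are a bit smaller.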

All byes–conventional or unconventional–work to the advantage of some players and against others. However they are granted, they tend to work in favor of those who are already successful, whether that success is over the course of a year or a single week.

Performance byes are easy enough to defend: They give successful players a bit more rest between two demanding events, and from the tour’s perspective, they make it a little more likely that last week’s best players won’t pull out of this week’s tourney. And if all byes tend to make the rich a little richer, at least performance byes open the possibility of benefiting different players than usual.

The Pervasive Role of Luck in Tennis

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important that we examine the various forms of luck in conjunction with each other to get a better sense of just how much of a factor luck can be.

Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite only winning 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which; we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”
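To see how a match can go to the player who wins fewer points, consider a constructed scoreline. Everything in this sketch is hypothetical: player A wins 6-4, 6-4, taking each of his games four points to two, while player B wins each of his games at love.

```python
# Each game is recorded as (points won by A, points won by B).
GAME_WON_BY_A = (4, 2)  # A wins the game, dropping two points along the way
GAME_WON_BY_B = (0, 4)  # B wins the game at love


def one_set(games_a, games_b):
    """List of games for a set that A wins games_a to games_b."""
    return [GAME_WON_BY_A] * games_a + [GAME_WON_BY_B] * games_b


def tally(games):
    """Return (points for A, points for B, games for A, games for B)."""
    points_a = sum(a for a, _ in games)
    points_b = sum(b for _, b in games)
    games_a = sum(1 for a, b in games if a > b)
    return points_a, points_b, games_a, len(games) - games_a


match = one_set(6, 4) + one_set(6, 4)
points_a, points_b, games_a, games_b = tally(match)
share_a = points_a / (points_a + points_b)
print(f"A wins {games_a} games to {games_b} "
      f"with {points_a} of {points_a + points_b} points ({share_a:.1%})")
```

Here A takes the match in straight sets with 48 of 104 points, because B’s points are concentrated in the games B was already winning comfortably.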

Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which points in the tournament.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–this sort of player would quickly be labeled “streaky,” but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.
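The two 20-20 seasons can be compared with a little arithmetic. The points schedule below is an assumption for illustration (10 points for a first-round loss, 45 for a second-round loss, 90 for a third-round loss), and the names are hypothetical.

```python
# Assumed ranking points by the round in which a player loses.
POINTS_FOR_EXIT_ROUND = {1: 10, 2: 45, 3: 90}


def season_summary(exit_rounds):
    """Return (wins, losses, points) for a season of tournament exits."""
    wins = sum(r - 1 for r in exit_rounds)  # matches won before each loss
    losses = len(exit_rounds)               # one loss per tournament
    points = sum(POINTS_FOR_EXIT_ROUND[r] for r in exit_rounds)
    return wins, losses, points


steady = [2] * 20                 # wins one match, then loses, every week
clustered = [3] * 10 + [1] * 10   # two wins in ten events, none in the rest

print("steady:   ", season_summary(steady))
print("clustered:", season_summary(clustered))
```

Both seasons come out at 20-20, but under this schedule the clustered season earns 1,000 points to the steady season’s 900. The gap grows under any schedule where rewards rise faster with each round, and it disappears only if rewards are strictly proportional to matches won.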

The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect”–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings–a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open and the other is unseeded.

These two players–who are virtually indistinguishable, remember–face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost definitely face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying the higher ranking–that he didn’t earn in the first place.

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: One player gets a huge opportunity (cash and ranking points, even if they lose in the first round!) while the other one gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem, it’s the cascading opportunities it can generate. Sure, sometimes it turns into nothing–Ryan Harrison’s career is starting to look that way–but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to gain from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players face good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how much luck plays a part in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune–especially when bad luck starts to breed more opportunities for bad luck–to tumble down the list.

Even if many of the forms of luck I’ve discussed are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rates are temporary–they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.