Del Potro’s Draws and the Possible Persistence of Bad Luck

Tennis’s draw gods have not been kind to Juan Martin del Potro this year.

In Acapulco and Indian Wells, he drew Novak Djokovic as his second-match opponent. In Miami, Delpo got a third-rounder with Roger Federer. In each of the March Masters events, with 1,000 ranking points at stake, del Potro was handed the most difficult opponents for his first round against a fellow seed. Thanks in part to the resulting early exits, one of the most dangerous players on tour is still languishing outside of the top 30 in the ATP rankings.

When I wrote about the Indian Wells quarter of death–the section of the draw containing del Potro, Djokovic, Federer, Rafael Nadal, and Nick Kyrgios–I attempted to quantify the effect of the draw on each player’s expected ranking points. Before each player’s name was placed in the bracket, my model predicted that Delpo would earn about 150 ranking points–the weighted average of his likelihood of reaching the third round, the fourth round, and so on–and after the draw was conducted, his higher probability of a clash with Djokovic knocked that number down to just over 100. That negative effect was one of the worst of any player in the tournament.

The story in Miami is similar, if less extreme. Pre-draw, Delpo’s expected points were 183. Post-draw: 155. In the four tournaments he has entered this year, he has been uniformly unlucky:

Tournament    Pre-Draw  Post-Draw  Effect  
Delray Beach      89.3       74.0  -17.1%  
Acapulco         121.5       97.1  -20.1%  
Indian Wells     154.6      102.5  -33.7%  
Miami            182.9      155.4  -15.0%  
TOTAL            548.2      429.0  -21.7%

*The numbers above for Indian Wells differ slightly from those I published in the Indian Wells article, since the simulations I ran for this post consider the entire 96-player field, not just the 64-player second round.

The good news, as we’ll see, is that it’s virtually impossible for this degree of misfortune to continue. The bad news is that those 119 points are gone forever, and at Delpo’s current position in the ranking table, that disadvantage will affect his tournament seeds, which in turn will result in worse draws (earlier meetings with higher-ranked players, independent of luck) for at least another few weeks.

Before we go any further, let me review the methodology I’m using here. (If you’re not interested, skip this paragraph.) For “post-draw” expected points, I’m taking jrank-based forecasts–like the ones on the front page of Tennis Abstract–and using each player’s probability of reaching each round to calculate a weighted average of expected points. “Pre-draw” forecasts are much more computationally demanding. In Miami, for instance, Delpo could’ve faced any of the 64 unseeded players in the second round and been slated to meet any of the top eight seeds in the third round. For each tournament, I ran a Monte Carlo simulation with the tournament seeds, generating a new draw and simulating the tournament 100,000 times, then tallying all those outcomes. So in the pre-draw forecast, Delpo had a one-eighth chance of getting Fed in the third round, a one-eighth chance of getting Kei Nishikori there, and so on.
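To make the weighted average concrete, here is a minimal Python sketch. The points table is an assumption (exit-round values for a seeded player with a first-round bye at a 96-player Masters event), and the round-by-round probabilities are invented for illustration; neither is taken from the actual jrank forecasts.

```python
# Rounds in order; "W" means winning the title.
ROUNDS = ["R64", "R32", "R16", "QF", "SF", "F", "W"]

# ASSUMPTION: ranking points awarded when a seeded player's run ends in each
# round of a 96-player Masters event (10 for losing the opener after a bye).
POINTS = {"R64": 10, "R32": 45, "R16": 90, "QF": 180, "SF": 360, "F": 600, "W": 1000}


def expected_points(reach, points=POINTS):
    """Weighted average of points, given P(reaching each round).

    reach[r] is the probability of reaching round r; reach["W"] is the
    probability of winning the title. The chance of a run ending in round r
    is reach[r] minus the chance of reaching the next round.
    """
    total = 0.0
    for i, rnd in enumerate(ROUNDS):
        next_reach = reach[ROUNDS[i + 1]] if i + 1 < len(ROUNDS) else 0.0
        total += (reach[rnd] - next_reach) * points[rnd]
    return total


def draw_effect(pre, post):
    """Relative change in expected points caused by the draw."""
    return (post - pre) / pre


# HYPOTHETICAL probabilities, for illustration only.
pre_draw = {"R64": 1.0, "R32": 0.80, "R16": 0.45, "QF": 0.25, "SF": 0.12, "F": 0.06, "W": 0.03}
post_draw = {"R64": 1.0, "R32": 0.80, "R16": 0.25, "QF": 0.15, "SF": 0.08, "F": 0.04, "W": 0.02}

pre_pts = expected_points(pre_draw)
post_pts = expected_points(post_draw)
print(f"pre-draw {pre_pts:.1f}, post-draw {post_pts:.1f}, "
      f"effect {draw_effect(pre_pts, post_pts):+.1%}")
```

The same `draw_effect` calculation applied to the Indian Wells row of the table above (154.6 pre-draw, 102.5 post-draw) gives the -33.7% shown there.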

It seems clear that a 22%, 119-point rankings hit over the course of four tournaments is some seriously bad luck. Last year, there were about 750 instances of a player being seeded at an ATP tournament, and in fewer than 60 of those, the draw resulted in an effect of -22% or worse on the player’s expected ranking points. And that’s just one tournament! The odds that Delpo would get such a rough deal in all four of his 2017 tournaments are 1 in more than 20,000.
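The arithmetic behind that last figure can be sketched as follows, assuming the four draws are independent and using 60 out of 750 as an upper bound on the single-tournament rate (the text says "fewer than 60", so the true odds are somewhat longer):

```python
seeded_instances = 750  # seeded-player appearances at ATP events last year
bad_draws = 60          # upper bound on draws with an effect of -22% or worse

# Chance of one such draw, and of four in a row if draws are independent.
p_single = bad_draws / seeded_instances
p_all_four = p_single ** 4
odds_against = 1 / p_all_four

print(f"single tournament: {p_single:.0%}")
print(f"all four: about 1 in {odds_against:,.0f}")
```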

Over the course of a full season, draw luck mostly evens out. It’s rare to see an effect of more than 10% in either direction. Last year, Thiemo de Bakker saw a painful difference of 18% between his pre-draw and post-draw expected points in 12 ATP events, but everyone else with at least that many tournaments fell between -11% and +11%, with three-quarters of players between -5% and +5%. Even when draw luck doesn’t balance itself out, the effect isn’t as bad as what Delpo has seen in 2017.

Del Potro’s own experience in 2016 is a case in point. His most memorable event of the season was the Olympics, where he drew Djokovic in the first round, so it’s easy to recall his year as being equally riddled with bad luck. But in his 12 other ATP events, the draw aided him in six–including a +34% boost at the US Open–and hurt him at the other six. Altogether, his 2016 ATP draws gave him a 5.9% advantage over his “pre-draw” expected points–a bonus of 17 ranking points. (I didn’t include the Olympics, since no ranking points were awarded there.)

Taken together, Delpo’s 2016-17 draws have deprived him of about 100 ranking points, which would move him three spots up the ranking table. So even with a short stretch of extreme misfortune, draw luck hasn’t affected him that much. Last year’s most extreme case among elite players, Richard Gasquet, suffered a similar effect: His draws knocked down his expected take by 9%, or 237 points, a difference that would bump him up from #22 to #19 in this week’s ranking list.

There are many reasons to believe that del Potro is a much better player than his current ranking suggests, such as his Elo rating, which stands at No. 7. But his ATP ranking reflects his limited schedule and modest start last year much more than it does the vagaries of each week’s brackets. The chances are near zero that he will continue to draw the toughest player in each tournament’s field in the earliest possible round, so we’ll soon have a better idea of what exactly he is capable of, and where exactly he should stand in the rankings.

The Indian Wells Quarter of Death

The Indian Wells men’s draw looks a bit lopsided this year. The bottom quarter, anchored by No. 2 seed Novak Djokovic, also features Roger Federer, Rafael Nadal, Juan Martin del Potro, and Nick Kyrgios. It doesn’t take much analysis to see that the bracket makes life more difficult for Djokovic, and by extension, it cleared the way for Andy Murray. Alas, Murray lost his opening match against Vasek Pospisil on Saturday, making No. 3 seed Stan Wawrinka the luckiest man in the desert.

The draw sets up some very noteworthy potential matches: Federer and Nadal haven’t played before the quarterfinal since their first encounter back in 2004, and Fed hasn’t played Djokovic before the semis in more than 40 meetings, since 2007. Kyrgios, who has now beaten all three of the elites in his quarter, is likely to get another chance to prove his mettle against the best.

I haven’t done a piece on draw luck for a while, and this seemed like a great time to revisit the subject. The principle is straightforward: By taking the tournament field and generating random draws, we can do a sort of “retro-forecast” of what each player’s chances looked like before the draw was conducted–back when Djokovic’s road wouldn’t necessarily be so rocky. By comparing the retro-forecast to a projection based on the actual draw, we can see how much the luck of the draw impacted each player’s odds of piling up ranking points or winning the title.
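As a rough illustration of the retro-forecast idea, here is a toy Python sketch with an eight-player field. Everything in it is an assumption made for the example: the player names and ratings are hypothetical, match odds use the standard Elo logistic curve rather than jrank, and the seeding rule (top two seeds at opposite ends, everyone else shuffled) is far simpler than a real 96-player draw.

```python
import random


def win_prob(r_a, r_b):
    """Standard Elo logistic: probability that rating r_a beats r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play_bracket(bracket, ratings, rng):
    """Single elimination; adjacent entries meet. Returns the champion."""
    alive = list(bracket)
    while len(alive) > 1:
        nxt = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            nxt.append(a if rng.random() < win_prob(ratings[a], ratings[b]) else b)
        alive = nxt
    return alive[0]


def random_draw(seeds, floaters, rng):
    """Top seed in the first slot, second seed in the last, the rest shuffled."""
    rest = list(floaters)
    rng.shuffle(rest)
    return [seeds[0]] + rest + [seeds[1]]


def title_odds(ratings, draw_fn, n_sims, rng):
    """Share of simulated tournaments won by each player."""
    wins = {name: 0 for name in ratings}
    for _ in range(n_sims):
        wins[play_bracket(draw_fn(), ratings, rng)] += 1
    return {name: w / n_sims for name, w in wins.items()}


# HYPOTHETICAL field: two seeds, one dangerous unseeded player, five others.
ratings = {"Seed1": 2200, "Seed2": 2150, "Floater": 2100,
           "D": 1900, "E": 1900, "F": 1850, "G": 1850, "H": 1800}
seeds = ["Seed1", "Seed2"]
floaters = ["Floater", "D", "E", "F", "G", "H"]
rng = random.Random(0)

# Pre-draw: a fresh random draw for every simulated tournament.
pre = title_odds(ratings, lambda: random_draw(seeds, floaters, rng), 20000, rng)

# Post-draw: one fixed bracket in which the floater opens against Seed2.
actual = ["Seed1", "H", "G", "F", "E", "D", "Floater", "Seed2"]
post = title_odds(ratings, lambda: actual, 20000, rng)

for name in ("Seed1", "Seed2", "Floater"):
    print(f"{name:8s} pre {pre[name]:.3f}  post {post[name]:.3f}")
```

In this toy setup the second seed's title chances fall once the dangerous floater lands in his section, and the top seed's chances rise, which is the same pattern as Djokovic and Murray in the real draw.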

Here are the eight players most heavily favored by the pre-draw forecast, along with their chances of winning the title, both before and after the draw was conducted:

Player                 Pre-Draw  Post-Draw  
Novak Djokovic           26.08%     19.05%  
Andy Murray              19.30%     26.03%  
Roger Federer            10.24%      8.71%  
Rafael Nadal              5.46%      4.80%  
Stan Wawrinka             5.08%      7.14%  
Kei Nishikori             5.01%      5.67%  
Nick Kyrgios              4.05%      2.62%  
Juan Martin del Potro     4.00%      2.34%

These odds are based on my jrank rating system, which correlates closely with Elo. I use jrank here instead of Elo because it’s surface-specific. I’m also ignoring the first round of the main draw, which–since all 32 seeds get a first-round bye–is just a glorified qualifying round and has very little effect on the title chances of seeded players.

As you can see, the bottom quarter–the “group of death”–is in fact where title hopes go to die. Djokovic, who is still considered to be the best player in the game by both jrank and Elo, had a 26% pre-draw chance of defending his title, but it dropped to 19% once the names were placed in the bracket. Not coincidentally, Murray’s odds went in the opposite direction. Federer’s and Nadal’s title chances weren’t hit quite as hard, largely because they weren’t expected to get past Djokovic, no matter when they faced him.

The issue here isn’t just luck; it’s the limitation of the ATP ranking system. No one really thinks that del Potro entered the tournament as the 31st favorite, or that Kyrgios came in as the 15th. No set of rankings is perfect, but at the moment, the official rankings do a particularly poor job of reflecting the players with the best chances of winning hard court matches. The less reliable the rankings, the greater the chance of a lopsided draw like the one in Indian Wells.

For a more in-depth look at the effect of the draw on players with lesser chances of winning the title, we need to look at “expected ranking points.” Using the odds that a player reaches each round, we can calculate his expected points for the entire event. For someone like Kyle Edmund, who would have almost no chance of winning the title regardless of the draw, expected points tells a more detailed story of the power of draw luck. Here are the ten players who were punished most severely by the bracket:

Player                 Pre-Draw Pts Post-Draw Pts  Effect  
Kyle Edmund                    28.8          14.3  -50.2%  
Steve Johnson                  65.7          36.5  -44.3%  
Vasek Pospisil                 29.1          19.4  -33.2%  
Juan Martin del Potro         154.0         104.2  -32.3%  
Stephane Robert                20.3          14.2  -30.1%  
Federico Delbonis              20.0          14.5  -27.9%  
Novak Djokovic                429.3         325.4  -24.2%  
Nick Kyrgios                  163.5         124.6  -23.8%  
Horacio Zeballos               17.6          14.1  -20.0%  
Alexander Zverev              113.6          91.5  -19.4%

At most tournaments, this list is dominated by players like Edmund and Pospisil: unseeded men with the misfortune of drawing an elite opponent in the first round. Much less common is to see so many seeds–particularly a top-two player–rating as the most unlucky. While Federer and Nadal don’t quite make the cut here, the numbers bear out our intuition: Fed’s draw knocked his expected points from 257 down to 227, and Nadal’s reduced his projected tally from 195 to 178.

The opposite list–those who enjoyed the best draw luck–features a lot of names from the top half, including both Murray and Wawrinka. Murray squandered his good fortune, putting Wawrinka in an even better position to take advantage of his own:

Player              Pre-Draw Pts  Post-Draw Pts  Effect  
Malek Jaziri                21.9           31.6   44.4%  
Damir Dzumhur               29.1           39.0   33.9%  
Martin Klizan               27.6           36.4   32.1%  
Joao Sousa                  24.7           31.1   25.9%  
Peter Gojowczyk             20.4           25.5   24.9%  
Tomas Berdych               93.6          116.6   24.6%  
Mischa Zverev               58.5           72.5   23.8%  
Yoshihito Nishioka          26.9           32.6   21.1%  
John Isner                  80.2           97.0   21.0%  
Andy Murray                369.1          444.2   20.3%  
Stan Wawrinka              197.8          237.7   20.1%

Over the course of the season, quirks like these tend to even out. Djokovic, on the other hand, must be wondering how he angered the draw gods: Just to earn a quarter-final place against Roger or Rafa, he’ll need to face Kyrgios and Delpo for the second consecutive tournament.

If Federer, Kyrgios, and del Potro can bring their ATP rankings closer in line with their true talent, they are less likely to find themselves in such dangerous draw sections. For Djokovic, that would be excellent news.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s not far-fetched to ask which of these models performs best and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the US Open 2016: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP ranking, we use a specific formula (see footnote 1), and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities based on the provided odds, minus the overround (see footnote 2).
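Both conversions are short enough to sketch in Python. The ranking-points formula is the one given in footnote 1. The overround removal below uses simple proportional normalization, which is an assumption on my part, since the exact method is not spelled out here; the ranking points and odds in the example are made up.

```python
def ranking_points_forecast(points_a, points_b, e=0.85):
    """P(A beats B) from ATP ranking points, per footnote 1."""
    return points_a ** e / (points_a ** e + points_b ** e)


def implied_probabilities(odds_a, odds_b):
    """Turn two-way decimal odds into probabilities that sum to 1.

    ASSUMPTION: the overround (the amount by which the raw inverse odds
    exceed 1) is removed by proportional normalization.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    book = raw_a + raw_b  # greater than 1 when the bookmaker takes a margin
    return raw_a / book, raw_b / book


# HYPOTHETICAL inputs, for illustration only.
p_rank = ranking_points_forecast(3000, 1200)
p_fav, p_dog = implied_probabilities(1.50, 2.70)
print(f"ranking-based forecast: {p_rank:.3f}")
print(f"market-implied: {p_fav:.3f} vs {p_dog:.3f}")
```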

Next, we simply compare forecasts with reality for each model, asking: if player A was predicted to be the winner (P(a) > 0.5), did he really win the match? When we do that for each match and each model (ignoring retirements and walkovers), we come up with the following results.

Model     % correct
Pinnacle     76.92%
538          75.21%
TA           74.36%
ATP          72.65%
Riles        70.09%

What we see here is the percentage of predictions that were actually right. The betting model (based on the odds of Pinnacle) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.

However, just looking at the percentage of correctly called matches does not tell the whole story. In fact, there are more granular metrics to investigate the performance of a prediction model: Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, we want 70% forecasts to be true in exactly 70% of the cases. Resolution measures how much the forecasts differ from the overall average. The rationale here is that just using the expected average values for forecasting will lead to a reasonably well-calibrated set of predictions; however, it will not be as useful as a method that manages the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) forecasts are, the better.

In the following table we categorize the set of predictions into bins of different probabilities and show what percentage of the predictions in each bin were correct. This also enables us to calculate Calibration and Resolution measures for each model.

Model    50-59%  60-69%  70-79%  80-89%  90-100% Cal  Res   Brier
538      53%     61%     85%     80%     91%     .003 .082  .171
TA       56%     75%     78%     74%     90%     .003 .072  .182
Riles    56%     86%     81%     63%     67%     .017 .056  .211
ATP      50%     73%     77%     84%     100%    .003 .068  .185
Pinnacle 52%     91%     71%     77%     95%     .015 .093  .172

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that for the Riles model only 67% of the 90-100% forecasts were correct, can be explained by small sample size (only three in that case). However, there are still two interesting cases, the 60-69% bins for Riles and Pinnacle, where the sample size is better and which caught my interest. Both the Riles and Pinnacle models seem to be strongly underconfident (statistically significant) with their 60-69% predictions. In other words, these probabilities should have been higher, because, in reality, these forecasts were actually true 86% and 91% of the time (see footnote 3). For the betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this might be a starting point for tweaking the model.

In the last three columns Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better) are shown. The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle (for the used subset of data) essentially perform equally well. Then there is a slight gap until the model of Tennis Abstract and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, hence ranking fifth in this analysis.
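For readers who want to compute these three measures themselves, here is a minimal Python sketch of the usual decomposition, Brier = Calibration - Resolution + Uncertainty. The binning into ten equal-width bins and the toy data are assumptions for the example, not the data behind the table. The identity is exact only when all forecasts in a bin share the same value; with mixed forecasts in a bin it holds approximately. The columns in the table above are consistent with it to within rounding if the uncertainty term is about .250.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes, n_bins=10):
    """Return (calibration, resolution, uncertainty, brier).

    forecasts: predicted probabilities in [0, 1]
    outcomes:  1 if the predicted event happened, else 0
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group forecasts into equal-width probability bins.
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        idx = min(int(f * n_bins), n_bins - 1)
        bins[idx].append((f, o))

    calibration = 0.0
    resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_forecast = sum(f for f, _ in members) / n_k
        hit_rate = sum(o for _, o in members) / n_k
        calibration += n_k * (mean_forecast - hit_rate) ** 2
        resolution += n_k * (hit_rate - base_rate) ** 2
    calibration /= n
    resolution /= n

    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return calibration, resolution, uncertainty, brier


# TOY data: one forecast value per bin, so the identity holds exactly.
forecasts = [0.55] * 4 + [0.65] * 4 + [0.85] * 4
outcomes = [1, 0, 1, 0] + [1, 1, 1, 0] + [1, 1, 1, 1]

cal, res, unc, brier = brier_decomposition(forecasts, outcomes)
print(f"Cal {cal:.3f}  Res {res:.3f}  Unc {unc:.3f}  Brier {brier:.3f}")
```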

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If the forecast lines are above the black line, it means that forecasts are underconfident, in the opposite case, forecasts are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job in preventing overestimations even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I was able to show some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing how well the models do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

This is a guest article by me, Peter Wetz. I am a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. P(a) = a^e / (a^e + b^e), where a and b are the ranking points of players A and B, respectively, and e is a constant. We use e = 0.85 for ATP men’s singles.

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly-attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo correctly predicts the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. To put it another way, this is more evidence that Davis Cup is about the chalk.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, the no-ad scoring and third-set tiebreak introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

Player                 Side  D-Lo     
Juan Martin del Potro  ARG   1759     
Leonardo Mayer         ARG   1593  *  
Federico Delbonis      ARG   1540     
Guido Pella            ARG   1454  *  
                                      
Ivan Dodig             CRO   1856  *  
Marin Cilic            CRO   1677     
Ivo Karlovic           CRO   1580     
Franco Skugor          CRO   1569  *

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.

Making matters worse for Argentina, it’s likely that Croatia could improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.
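The rating gaps quoted above can be reproduced with a short Python sketch. It assumes a team's rating is the average of its two players' D-Lo ratings and that the gap is converted with the standard Elo logistic curve; those assumptions match the 189-point and 243-point edges and the 74.8% figure. The further adjustment to a best-of-five forecast is not modeled here.

```python
# D-Lo ratings from the table above.
DLO = {
    "del Potro": 1759, "Mayer": 1593, "Delbonis": 1540, "Pella": 1454,
    "Dodig": 1856, "Cilic": 1677, "Karlovic": 1580, "Skugor": 1569,
}


def team_rating(player_1, player_2):
    """ASSUMPTION: a doubles team is rated at the average of its players."""
    return (DLO[player_1] + DLO[player_2]) / 2.0


def elo_win_prob(rating_diff):
    """Standard Elo logistic curve on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


argentina = team_rating("Mayer", "Pella")

scheduled_edge = team_rating("Dodig", "Skugor") - argentina
with_cilic_edge = team_rating("Dodig", "Cilic") - argentina

print(f"Dodig/Skugor edge: {scheduled_edge:.0f} points, "
      f"{elo_win_prob(scheduled_edge):.1%} in the no-ad, super-tiebreak format")
print(f"Dodig/Cilic edge: {with_cilic_edge:.0f} points")
```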

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently-scheduled match, shown from the perspective of Croatia:

Rubber                      Forecast (CRO)  
Cilic v Delbonis                     90.8%  
Karlovic v del Potro                 15.8%  
Dodig/Skugor v Mayer/Pella           83.7%  
Cilic v del Potro                    36.3%  
Karlovic v Delbonis                  75.8%

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, who it ranks 68th in the world, against his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market which, a few hours before play begins, gives Croatia about a 65% chance.
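One way to get from the five rubber forecasts to a tie forecast is sketched below in Python, assuming the rubbers are independent and that the tie goes to the first side to win three. Using the rounded percentages from the table, it lands at roughly 74%, close to the figure above; the small difference presumably comes from rounding in the inputs.

```python
def tie_win_prob(rubber_probs, wins_needed=3):
    """P(winning at least `wins_needed` rubbers), treating rubbers as independent.

    Dead rubbers do not matter: winning at least three of five is
    equivalent to being the first side to reach three wins.
    """
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in rubber_probs:
        new = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            new[wins] += mass * (1.0 - p)
            new[wins + 1] += mass * p
        dist = new
    return sum(dist[wins_needed:])


# Croatia's chances in each rubber, from the table above.
croatia = [0.908, 0.158, 0.837, 0.363, 0.758]
print(f"Croatia wins the tie: {tie_win_prob(croatia):.1%}")
```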

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

Forecasting the 2016 ATP World Tour Finals

Andy Murray is the #1 seed this week in London, but as I wrote for The Economist, Novak Djokovic likely remains the best player in the world. According to my Elo ratings, he would have a 63% chance of winning a head-to-head match between the two. And with the added benefit of an easier round-robin draw, the math heavily favors Djokovic to win the tournament.

Here are the results of a Monte Carlo simulation of the draw:

Player        SF      F      W  
Djokovic   95.3%  73.9%  54.6%  
Murray     86.3%  58.3%  29.7%  
Nishikori  60.4%  24.9%   7.8%  
Raonic     50.9%  16.3%   3.3%  
Wawrinka   29.4%   7.8%   1.6%  
Monfils    33.2%   8.7%   1.4%  
Cilic      23.9%   5.8%   1.1%  
Thiem      20.7%   4.1%   0.5%

I don’t think I’ve ever seen a player favored so heavily to progress out of the group stage. Murray’s 86% chance of doing so is quite high in itself; Novak’s 95% is otherworldly. His head-to-heads against the other players in his group are backed up by major differences in Elo points–Dominic Thiem is a lowly 15th on the Elo list, given only a 7.4% chance of beating the Serb.

If Milos Raonic is unable to compete, Djokovic’s chances climb even higher. Here are the probabilities if David Goffin takes Raonic’s place in the bracket:

Player        SF      F      W  
Djokovic   96.8%  75.2%  55.4%  
Murray     86.2%  60.7%  30.6%  
Nishikori  60.7%  26.3%   8.1%  
Monfils    47.7%  12.4%   1.8%  
Wawrinka   29.3%   8.5%   1.7%  
Cilic      23.8%   6.2%   1.1%  
Thiem      29.5%   5.8%   0.7%  
Goffin     26.0%   4.9%   0.5%

The luck of the draw was on Novak’s side. I ran another simulation with Djokovic and Murray swapping groups. Here, Djokovic is still heavily favored to win the tournament, but Murray’s semifinal chances get a sizable boost:

Player        SF      F      W  
Djokovic   92.8%  75.1%  54.9%  
Murray     90.9%  58.1%  29.8%  
Nishikori  58.4%  26.9%   7.5%  
Raonic     52.3%  14.3%   3.3%  
Wawrinka   26.9%   8.4%   1.6%  
Monfils    35.3%   7.5%   1.4%  
Cilic      21.9%   6.2%   1.0%  
Thiem      21.6%   3.4%   0.5%

Elo rates Djokovic so highly that he is favored no matter what the draw. But the draw certainly helped.

Doubles!

I’ve finally put together a sufficient doubles dataset to generate Elo ratings and tournament forecasts for ATP doubles. While I’m not quite ready to go into detail, I can say that, by using the Elo algorithm and rating players individually, the resulting forecasts outperform the ATP rankings about as much as singles Elo ratings do.

Here is the forecast for the doubles event at the World Tour Finals:

Team               SF      F      W  
Herbert/Mahut   76.4%  49.5%  32.1%  
Bryan/Bryan     68.7%  36.8%  19.9%  
Kontinen/Peers  55.7%  29.1%  13.8%  
Dodig/Melo      58.4%  28.1%  13.2%  
Murray/Soares   48.3%  20.8%   8.6%  
Lopez/Lopez     37.7%  16.4%   6.2%  
Klaasen/Ram     30.2%  11.9%   4.0%  
Huey/Mirnyi     24.6%   7.3%   2.2%

This distribution is more like what round-robin forecasts usually look like, without a massive gap between the top of the field and the rest. Pierre-Hugues Herbert and Nicolas Mahut are the top-rated team, followed closely by Bob Bryan and Mike Bryan. Max Mirnyi was, at his peak, one of the highest Elo-rated doubles players, but his pairing with Treat Huey is the weakest of the bunch.

The men’s doubles bracket has some legendary names, along with some players–like Herbert and Henri Kontinen–who may develop into all-time greats, but it has no competitors who loom over the rest of the field like Murray and Djokovic do in singles.

Elo-Forecasting the WTA Tour Finals in Singapore

With the field of eight divided into two round-robin groups for the WTA Tour Finals in Singapore, we can play around with some forecasts for this event. I’ve updated my Elo ratings through last week’s tournaments, and the first thing that jumps out is how different they are from the official rankings.

Here’s the Singapore field:

EloRank  Player                Elo  Group  
2        Maria Sharapova      2296    RED  
4        Simona Halep         2181    RED  
6        Garbine Muguruza     2147  WHITE  
8        Petra Kvitova        2136  WHITE  
9        Angelique Kerber     2129  WHITE  
11       Agnieszka Radwanska  2100    RED  
15       Lucie Safarova       2051  WHITE  
21       Flavia Pennetta      2004    RED

Serena Williams (#1 in just about every imaginable ranking system) chose not to play, but if Elo ruled the day, Belinda Bencic, Venus Williams, and Victoria Azarenka would be playing this week in place of Agnieszka Radwanska, Lucie Safarova, and Flavia Pennetta.

Anyway, we’ll work with what we’ve got. Maria Sharapova is, according to Elo, a huge favorite here. The ratings translate into a forecast that looks like this:

Player                  SF  Final  Title  
Maria Sharapova      83.7%  61.1%  43.6%  
Simona Halep         60.8%  35.4%  15.9%  
Garbine Muguruza     59.4%  25.7%  11.3%  
Petra Kvitova        55.2%  23.0%   9.8%  
Angelique Kerber     53.1%  21.7%   8.8%  
Agnieszka Radwanska  37.4%  17.4%   6.1%  
Lucie Safarova       32.3%   9.7%   3.1%  
Flavia Pennetta      18.1%   6.0%   1.4%
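A forecast of this shape can be approximated with a small Monte Carlo simulation. The Python sketch below uses the Elo ratings and groups from the field table above, but several details are assumptions: match odds come from the standard Elo logistic curve, ties in the group standings are broken at random rather than by sets, games, or head-to-head results, and each group winner meets the other group's runner-up in the semifinals. Its output should land in the same neighborhood as the table but will not match it exactly.

```python
import random

# Elo ratings and groups from the field table above.
FIELD = {
    "Sharapova": (2296, "RED"), "Halep": (2181, "RED"),
    "Muguruza": (2147, "WHITE"), "Kvitova": (2136, "WHITE"),
    "Kerber": (2129, "WHITE"), "Radwanska": (2100, "RED"),
    "Safarova": (2051, "WHITE"), "Pennetta": (2004, "RED"),
}


def win_prob(a, b):
    """Standard Elo logistic: probability that player a beats player b."""
    return 1.0 / (1.0 + 10 ** ((FIELD[b][0] - FIELD[a][0]) / 400.0))


def play(a, b, rng):
    return a if rng.random() < win_prob(a, b) else b


def group_top_two(players, rng):
    """Round robin within a group; returns (winner, runner-up)."""
    wins = {p: 0 for p in players}
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            wins[play(a, b, rng)] += 1
    # SIMPLIFICATION: players level on wins are separated at random.
    ranked = sorted(players, key=lambda p: (wins[p], rng.random()), reverse=True)
    return ranked[0], ranked[1]


def simulate(n_sims=20000, seed=0):
    rng = random.Random(seed)
    red = [p for p, (_, g) in FIELD.items() if g == "RED"]
    white = [p for p, (_, g) in FIELD.items() if g == "WHITE"]
    counts = {p: {"SF": 0, "Final": 0, "Title": 0} for p in FIELD}
    for _ in range(n_sims):
        r1, r2 = group_top_two(red, rng)
        w1, w2 = group_top_two(white, rng)
        for p in (r1, r2, w1, w2):
            counts[p]["SF"] += 1
        finalist_1 = play(r1, w2, rng)  # group winner vs other runner-up
        finalist_2 = play(w1, r2, rng)
        counts[finalist_1]["Final"] += 1
        counts[finalist_2]["Final"] += 1
        counts[play(finalist_1, finalist_2, rng)]["Title"] += 1
    return {p: {k: v / n_sims for k, v in c.items()} for p, c in counts.items()}


forecast = simulate()
for player, row in sorted(forecast.items(), key=lambda kv: -kv[1]["Title"]):
    print(f"{player:10s} {row['SF']:6.1%} {row['Final']:6.1%} {row['Title']:6.1%}")
```

Rerunning `simulate` after lowering one player's rating in `FIELD` is a quick way to see how sensitive the whole table is to a single rating.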

If Sharapova is really that good, the loser in today’s draw was Simona Halep. The top seed would typically benefit from having the second seed in the other group, but because Garbine Muguruza recently took over the third spot in the rankings, Pova entered the draw as a dangerous floater.

However, these ratings don’t reflect the fact that Sharapova hasn’t completed a match since Wimbledon. They don’t decline with inactivity, so Pova’s rating is the same as it was the day after she lost to Serena back in July. (My algorithm also excludes retirements, so her attempted return in Wuhan isn’t considered.)

With as little as we know about Sharapova’s health, it’s tough to know how to tweak her rating. For lack of any better ideas, I revised her Elo rating to 2132, right between Petra Kvitova and Angelique Kerber. At her best, Sharapova is better than that, but consider this a way of factoring in the substantial possibility that she’ll play much, much worse–or that she’ll get injured and her matches will be played by Carla Suarez Navarro instead. The revised forecast:

Player                  SF  Final  Title  
Simona Halep         69.9%  40.9%  24.0%  
Garbine Muguruza     59.4%  31.5%  16.5%  
Maria Sharapova      57.6%  29.5%  14.5%  
Petra Kvitova        55.6%  28.4%  14.4%  
Angelique Kerber     52.5%  26.3%  13.2%  
Agnieszka Radwanska  47.9%  22.3%   9.9%  
Lucie Safarova       32.6%  12.9%   4.9%  
Flavia Pennetta      24.7%   8.3%   2.7%

If this is a reasonably accurate estimate of Sharapova’s current ability, the Red group suddenly looks like the right place to be. Because Elo doesn’t give any particular weight to Grand Slams, it suggests that the official rankings far overestimate the current level of Safarova and Pennetta. The weakness of those two makes Halep a very likely semifinalist and also means that, in this forecast, the winner of the tournament is more likely (54% to 46%) to come from the White group.

Without Serena, and with Sharapova’s health in question, there are simply no dominant players in the field this week. If nothing else, these forecasts illustrate that we’d be foolish to take any Singapore predictions too seriously.
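For readers who want to experiment with forecasts like the two above, here is a minimal Monte Carlo sketch of a two-group round-robin event. It is not the code behind the tables: it assumes the standard Elo win-probability formula, breaks ties in the group standings at random instead of applying the WTA's actual tiebreak rules, and the names `FIELD`, `win_prob`, `play`, `group_top_two`, and `forecast` are all made up for illustration. Only the ratings and groups come from the Singapore field table.

```python
import random

# Elo ratings and groups copied from the Singapore field table above.
FIELD = {
    "Sharapova": (2296, "RED"),
    "Halep": (2181, "RED"),
    "Muguruza": (2147, "WHITE"),
    "Kvitova": (2136, "WHITE"),
    "Kerber": (2129, "WHITE"),
    "Radwanska": (2100, "RED"),
    "Safarova": (2051, "WHITE"),
    "Pennetta": (2004, "RED"),
}


def win_prob(elo_a, elo_b):
    """Standard Elo expected score for player A (an assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def play(a, b, elo, rng):
    """Return the winner of one simulated match."""
    return a if rng.random() < win_prob(elo[a], elo[b]) else b


def group_top_two(players, elo, rng):
    """Play a full round robin and return the top two finishers."""
    wins = {p: 0 for p in players}
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            wins[play(a, b, elo, rng)] += 1
    # Simplification: players level on wins are separated at random.
    ranked = sorted(players, key=lambda p: (wins[p], rng.random()), reverse=True)
    return ranked[0], ranked[1]


def forecast(field, n_sims=20000, seed=0):
    """Estimate each player's chance of reaching the SF, the final, and the title."""
    rng = random.Random(seed)
    elo = {p: rating for p, (rating, _) in field.items()}
    groups = {}
    for p, (_, group) in field.items():
        groups.setdefault(group, []).append(p)
    g1, g2 = groups.values()  # assumes exactly two groups
    counts = {p: {"SF": 0, "Final": 0, "Title": 0} for p in field}
    for _ in range(n_sims):
        a1, a2 = group_top_two(g1, elo, rng)
        b1, b2 = group_top_two(g2, elo, rng)
        for p in (a1, a2, b1, b2):
            counts[p]["SF"] += 1
        # Each group winner meets the other group's runner-up.
        f1 = play(a1, b2, elo, rng)
        f2 = play(b1, a2, elo, rng)
        counts[f1]["Final"] += 1
        counts[f2]["Final"] += 1
        counts[play(f1, f2, elo, rng)]["Title"] += 1
    return {p: {k: v / n_sims for k, v in c.items()} for p, c in counts.items()}
```

To try the revised scenario, copy `FIELD`, replace Sharapova's entry with `(2132, "RED")`, and call `forecast` again. Because of the simplified tiebreak, the output should land near the published numbers without matching them exactly.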

Forecasting the Effects of Performance Byes in Beijing

To the uninitiated, the WTA draw in Beijing this week looks a little strange. The 64-player draw includes four byes, which were given to the four semifinalists from last week’s event in Wuhan. So instead of empty places in the bracket next to the top four seeds, those free passes go to the 5th, 10th, and 15th seeds, along with one unseeded player, Venus Williams.

“Performance byes”–those given to players based on their results the previous week, rather than their seed–have occasionally featured in WTA draws over the last few years. If you’re interested in their recent history, Victoria Chiesa wrote an excellent overview.

I’m interested in measuring the benefit these byes confer on the recipients–and the negative effect they have on the players who would have received those byes had they been awarded in the usual way. I’ve written about the effects of byes before, but I haven’t contrasted different approaches to awarding them.

This week, the beneficiaries are Garbine Muguruza, Angelique Kerber, Roberta Vinci, and Venus Williams. The top four seeds–the women who were atypically required to play first-round matches–are Simona Halep, Petra Kvitova, Flavia Pennetta, and Agnieszka Radwanska.

To quantify the impact of the various possible formats of a 64-player draw, I used a variety of tools: Elo to rate players and predict match outcomes, Monte Carlo tournament simulations to consider many different permutations of each draw, and a modified version of my code to “reseed” brackets. While this is complicated stuff under the hood, the results aren’t that opaque.

Here are three different types of 64-player draws that Beijing might have employed:

  1. Performance byes to last week’s semifinalists. This gives a substantial boost to the players receiving byes, and compared to any other format, has a negative effect on top players. Not only are the top four seeds required to play a first-round match, they are a bit more likely to play last week’s semifinalists, since the byes give those players a better chance of advancing.
  2. Byes to the top four seeds. The top four seeds get an obvious boost, and everyone else suffers a bit, as they are that much more likely to face the top four.
  3. No byes: 64 players in the draw instead of 60. The clear winners in this scenario are the players who wouldn’t otherwise make it into the main draw. Unseeded players (excluding Venus) also benefit slightly, as the lack of byes means that top players are less likely to advance.

Let’s crunch the numbers. For each of the three scenarios, I ran simulations based on the field without knowing how the draw turned out. That is, Kvitova is always seeded second, but she doesn’t always play Sara Errani in the first round. This approach eliminates any biases in the actual draw. To simulate the 64-player field, I added the four top-ranked players who lost in the final round of qualifying.
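The idea of re-drawing the bracket on every simulation, with byes handed to different players, can be sketched on a toy scale. Everything here is an assumption made for illustration: the six players and their ratings are invented, the standard Elo formula is assumed, and the draw is fully random, ignoring the fixed bracket positions that real seeds occupy. A bye is modeled as a first-round pairing with `None`.

```python
import random

# Hypothetical six-player field with made-up Elo ratings (not from the article).
ELO = {"P1": 2200, "P2": 2150, "P3": 2100, "P4": 2050, "P5": 2000, "P6": 1950}


def win_prob(elo_a, elo_b):
    """Standard Elo expected score for player A (an assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def play(a, b, elo, rng):
    """Return the winner of one simulated match."""
    return a if rng.random() < win_prob(elo[a], elo[b]) else b


def make_draw(bye_players, others, rng):
    """Random first-round pairings; each bye recipient is paired with None.

    Assumes len(others) is even and the number of pairs is a power of two.
    """
    pool = list(others)
    rng.shuffle(pool)
    pairs = [(p, None) for p in bye_players]
    pairs += [(pool[i], pool[i + 1]) for i in range(0, len(pool), 2)]
    rng.shuffle(pairs)  # simplification: no seeded bracket positions
    return pairs


def play_bracket(pairs, elo, rng):
    """Play a single-elimination bracket and return the champion."""
    alive = [a if b is None else play(a, b, elo, rng) for a, b in pairs]
    while len(alive) > 1:
        alive = [play(alive[i], alive[i + 1], elo, rng)
                 for i in range(0, len(alive), 2)]
    return alive[0]


def title_odds(bye_players, elo, n_sims=20000, seed=0):
    """Estimate title probabilities, drawing a fresh bracket each simulation."""
    rng = random.Random(seed)
    others = [p for p in elo if p not in bye_players]
    wins = {p: 0 for p in elo}
    for _ in range(n_sims):
        wins[play_bracket(make_draw(bye_players, others, rng), elo, rng)] += 1
    return {p: w / n_sims for p, w in wins.items()}
```

Comparing `title_odds(["P1", "P2"], ELO)` with `title_odds(["P5", "P6"], ELO)` gives a rough feel for the first two formats: the same field, with the byes moved from the strongest players to two lower-rated ones.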

To compare the effects of each draw type on every player, I calculated “expected points” based on their probability of reaching each round. For instance, if Halep entered the tournament with a 20% chance of winning the event with its 1,000 ranking points, that alone would be worth 200 “expected points.” Added to that are her expected points for every round in between, which she is more likely to reach but which are worth fewer points. It’s simply a way of combining a lot of probabilities into a single easier-to-understand number.
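As a rough sketch of that calculation: a player collects the points for the last round she reaches, so her expected points are the sum, over rounds, of the probability of finishing exactly there times the points for that finish. The 1,000 points for the title come from the text; the other values in `POINTS`, the round labels, and the sample probabilities are illustrative assumptions, not figures from the simulations.

```python
# Rounds of a 64-player draw, in order; "W" means winning the title.
ROUNDS = ["R64", "R32", "R16", "QF", "SF", "F", "W"]

# Points for finishing in each round. Only the 1,000 for the title is taken
# from the article; the rest are illustrative placeholders.
POINTS = {"R64": 10, "R32": 65, "R16": 120, "QF": 215, "SF": 390, "F": 650, "W": 1000}


def expected_points(reach, points=POINTS, rounds=ROUNDS):
    """Expected ranking points given reach[r] = P(player reaches round r).

    The chance of finishing exactly in round r is the chance of reaching r
    minus the chance of reaching the round after it.
    """
    total = 0.0
    for i, r in enumerate(rounds):
        p_next = reach[rounds[i + 1]] if i + 1 < len(rounds) else 0.0
        total += (reach[r] - p_next) * points[r]
    return total


# Made-up probabilities for a player with a 20% chance of winning the title.
sample = {"R64": 1.0, "R32": 0.9, "R16": 0.75, "QF": 0.6,
          "SF": 0.45, "F": 0.3, "W": 0.2}
print(round(expected_points(sample), 1))  # 384.5
```

In this made-up case, the title accounts for 200 of the 384.5 expected points, and the earlier rounds contribute the rest.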

Here are the expected points in each draw scenario (plus the actual Beijing draw) for the top four players, the four players who received performance byes, plus a couple of others (Belinda Bencic and Caroline Wozniacki) who rated particularly highly:

Player               Seed  PerfByes  TopByes  NoByes  Actual  
Simona Halep            1       323      364     330     341  
Petra Kvitova           2       276      323     290     291  
Venus Williams                  247      216     218     279  
Belinda Bencic         11       255      249     268     254  
Garbine Muguruza        5       243      202     210     227  
Angelique Kerber       10       260      224     235     227  
Caroline Wozniacki      8       208      203     205     199  
Flavia Pennetta         3       142      177     144     195  
Agnieszka Radwanska     4       185      233     192     188  
Roberta Vinci          15       120       91      94      90

Unsurprisingly, the top four seeds reap far more expected points when given first-round byes. The effect is most noticeable for Pennetta and Radwanska, who would enjoy a boost of more than 20% in expected points with a first-round bye. Oddly, though, the draw worked out very favorably for Flavia–Elo gave her a 95% chance of beating her first-round opponent Xinyun Han, and her draw steered her relatively clear of other dangerous players in subsequent rounds.

Similarly, the performance byes are worth a 15 to 30% advantage in expected points to the players who receive them. Vinci is the biggest winner here, as we would generally expect from the player most likely to suffer an upset without the bye.

Like Pennetta, Venus was treated very well by the way the draw turned out. The bye already gave her an approximately 15% boost compared to her expectations without a bye, and the draw tacked another 13% onto that. Both the structure of the draw and some luck on draw day made her the event’s third most likely champion, while the other scenarios would have left her in fifth.
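The percentages in the last few paragraphs can be approximated from the table itself. The short sketch below recomputes a few of them from the rounded table values, so results may differ by a point or two from figures based on unrounded simulation output. The names `TABLE` and `pct_change` are made up for illustration.

```python
# Expected points copied from the table above:
# (PerfByes, TopByes, NoByes, Actual)
TABLE = {
    "Venus Williams": (247, 216, 218, 279),
    "Garbine Muguruza": (243, 202, 210, 227),
    "Angelique Kerber": (260, 224, 235, 227),
    "Flavia Pennetta": (142, 177, 144, 195),
    "Agnieszka Radwanska": (185, 233, 192, 188),
    "Roberta Vinci": (120, 91, 94, 90),
}


def pct_change(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


for player, (perf, top, none, actual) in TABLE.items():
    print(f"{player:20s}"
          f" perf bye vs no byes: {pct_change(perf, none):+6.1f}%"
          f"  top bye vs no byes: {pct_change(top, none):+6.1f}%"
          f"  actual vs perf byes: {pct_change(actual, perf):+6.1f}%")
```

Which column serves as the baseline is a judgment call. Here the no-bye format is used for the first two comparisons, and the performance-bye format for the effect of the actual draw.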

All byes–conventional or unconventional–work to the advantage of some players and against others. However they are granted, they tend to work in favor of those who are already successful, whether that success is over the course of a year or a single week.

Performance byes are easy enough to defend: They give successful players a bit more rest between two demanding events, and from the tour’s perspective, they make it a little more likely that last week’s best players won’t pull out of this week’s tourney. And if all byes tend to make the rich a little richer, at least performance byes open the possibility of benefiting different players than usual.