Men’s Doubles On the Dirt

Angelique Kerber wasn’t the only top seed to crash out early at this year’s French Open. In the men’s doubles draw, the top section opened up when Henri Kontinen and John Peers, the world’s top-ranked team, lost to the Spanish pair of David Marrero and Tommy Robredo. It’s plausible to attribute the upset to the clay, as Kontinen-Peers have tallied a pedestrian five wins against four losses on the dirt this season and one could guess that the Spaniards are at their strongest on clay.

Fortunately we don’t have to guess. Using a doubles variant of sElo–surface-specific Elo, which I began writing about a few days ago in the context of women’s singles–we can make rough estimates of how Kontinen/Peers would fare against Marrero/Robredo on each surface. The top seeds are solid on all surfaces–less than a year ago, they won a clay title in Hamburg–but stronger on hard courts. sElo ranks them 4th and 8th on hard, but 10th and 13th on clay among tour regulars.  Marrero is the surface-specialist of the bunch, ranking 37th on clay and 78th on hard. Robredo throws a wrench into the exercise, as he has played very little doubles recently, only eight events since the beginning of 2016.

Using these numbers–including those derived from Robredo’s limited sample–we find that sElo would have given Kontinen/Peers a 73.6% chance of winning yesterday, compared to a 78.3% chance on a hard court. Even if we adjust Robredo’s clay-court sElo to something closer to his all-surface rating, the top seeds still look like 69% favorites.

A more striking example comes from yesterday’s other big upset, in which Julio Peralta and Horacio Zeballos took out Feliciano Lopez and Marc Lopez. On any surface, the Lopezes are the superior team, but Peralta and Zeballos have a much larger surface differential:

Player    Hard sElo  Clay sElo  
M Lopez        1720       1804  
F Lopez        1713       1772  
Zeballos       1651       1756  
Peralta        1517       1770

On a hard court, sElo gives the Lopezes a 68.1% chance of winning this matchup. But on clay, the gap narrows all the way to 53.6%. It’s still a bit of an upset for the South Americans, but not one that should come as much of a surprise.
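The percentages above can be approximately reproduced with a short sketch. It rests on two assumptions that the post does not spell out: that a team's rating is the simple average of its two players' sElo ratings, and that win probability follows the standard Elo logistic curve with a 400-point scale. The function names are illustrative only.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation for side A (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def team_rating(player_1, player_2):
    """Assumption: a doubles team is rated at the mean of its two players."""
    return (player_1 + player_2) / 2.0


# (hard sElo, clay sElo), copied from the table above
selo = {
    "M Lopez": (1720, 1804),
    "F Lopez": (1713, 1772),
    "Zeballos": (1651, 1756),
    "Peralta": (1517, 1770),
}


def lopez_win_prob(surface_index):
    """Chance that the Lopezes win; surface_index 0 is hard, 1 is clay."""
    lopezes = team_rating(selo["M Lopez"][surface_index],
                          selo["F Lopez"][surface_index])
    challengers = team_rating(selo["Zeballos"][surface_index],
                              selo["Peralta"][surface_index])
    return elo_win_prob(lopezes, challengers)


p_hard = lopez_win_prob(0)
p_clay = lopez_win_prob(1)
print(f"hard: {p_hard:.3f}, clay: {p_clay:.3f}")
```

Under these assumptions the sketch lands within a rounding step of the figures quoted above, roughly 68% on hard and a little under 54% on clay.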

Mismatches

I’ve speculated in the past that surface preferences aren’t as pronounced in doubles as they are in singles. Regardless of surface, points are shorter, and many teams position one player at the net even on the dirt. While some hard-courters are probably uncomfortable on clay (and vice versa), I wouldn’t expect the effects to be as substantial as they are in singles.

The numbers tell a different story. Here are the top ten, ranked by hard court sElo:

Rank  Player          Hard sElo  
1     Jack Sock            1947  
2     Nicolas Mahut        1893  
3     Marcelo Melo         1883  
4     Henri Kontinen       1879  
5     P-H Herbert          1862  
6     Bob Bryan            1851  
7     Mike Bryan           1846  
8     John Peers           1842  
9     Bruno Soares         1829  
10    Jamie Murray         1828

By clay court sElo:

Rank  Player                Clay sElo  
1     Mike Bryan                 1950  
2     Bob Bryan                  1950  
3     P-H Herbert                1894  
4     Nicolas Mahut              1889  
5     Jack Sock                  1887  
6     Robert Farah               1850  
7     Juan Sebastian Cabal       1849  
8     Pablo Cuevas               1824  
9     Rohan Bopanna              1812  
10    John Peers                 1810

Jamie Murray and Bruno Soares, who appear in the hard court top ten, sit outside the top 25 in clay court sElo. Robert Farah and Juan Sebastian Cabal are 41st and 42nd in hard court sElo, despite ranking in the clay court top seven. Pablo Cuevas, another clay court top-tenner, is 87th on the hard court list.

To go beyond these anecdotes–noteworthy as they are–we need to compare the level of surface preference in men’s doubles to other tours. To do that, I calculated the correlation coefficient between hard court and clay court sElo for the top 50 players (ranked by overall Elo) in men’s doubles, men’s singles, and women’s singles. (I don’t yet have an adequate database to generate ratings for women’s doubles.)

In other words, we’re testing how much a player’s results on one surface predict his or her results on the other major surface. The higher the correlation coefficient, the more likely it is that a player will have similar results on hard and clay. Here’s how the tours compare:

Tour             Correl  
Men's Singles     0.708  
Women's Singles   0.417  
Men's Doubles     0.323

In contrast to my hypothesis above, surface preferences in men’s doubles appear to be much stronger than in either men’s or women’s singles. (And there’s a huge difference between men’s and women’s singles, but that’s a subject for another day.)
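For readers who want to see the mechanics, here is a minimal sketch of a Pearson correlation between hard and clay sElo. It uses only the six players who appear in both top-ten lists above, so it is a toy subset and not the top-50 sample behind the table; the result should not be compared to the 0.323 figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (hard sElo, clay sElo) for players listed in both top tens above
both_lists = {
    "Jack Sock": (1947, 1887),
    "Nicolas Mahut": (1893, 1889),
    "P-H Herbert": (1862, 1894),
    "Bob Bryan": (1851, 1950),
    "Mike Bryan": (1846, 1950),
    "John Peers": (1842, 1810),
}

hard = [ratings[0] for ratings in both_lists.values()]
clay = [ratings[1] for ratings in both_lists.values()]
r = pearson(hard, clay)
print(f"correlation on this six-player subset: {r:.3f}")
```

Because the subset is restricted to players who are elite on both surfaces, it says little about the tour as a whole; the point is only to show what the coefficient measures.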

Randomness

I suspect that the low correlation of surface-specific Elos in men’s doubles is partly due to the more random nature of doubles results. Because the event is more serve-dominated, there are more close sets ending in tiebreaks, and because of the no-ad, super-tiebreak format used outside of Slams, tight matches are decided by a smaller number of points. Thus, every doubles player’s results–and their various Elo ratings–reflect the influence of chance more than singles results do.

Another consideration–one that I haven’t yet made sense of–is that surface-specific ratings don’t improve doubles forecasts the way that they do men’s and women’s singles predictions. As I wrote on Sunday, sElo represents a big improvement over surface-neutral Elo for women’s forecasts, and in an upcoming post, I’ll be able to make some similar observations for the men’s game. Using Brier score, a measure of the calibration of predictions, we can see the effect of using surface-specific Elo ratings in 2016 tour-level matches:

Tour             Elo Brier  sElo Brier  
Men's Singles        0.202       0.169  
Women's Singles      0.220       0.179  
Men's Doubles        0.171       0.181

The lower the Brier score, the more accurate the forecasts. This isn’t a fluke of 2016: The differences in men’s doubles Brier scores are around 0.01 for each of the last 15 seasons. By this measure, Elo does a very good job predicting the outcome of men’s doubles matches, but the surface-specific sElo represents a small step back. It could be that the smaller sample–using only one surface’s worth of results–is more damaging to forecasts in doubles than it is in singles.

Doubles analytics is particularly uncharted territory, and there’s plenty of work remaining for researchers even in this narrow subtopic. There’s lots of work to do for the world’s top doubles players as well, now that we can point to a noticeably weaker surface for so many of them.

The Steadily Less Predictable WTA

Update: The numbers in this post summarizing the effectiveness of sElo are much too high–a bug in my code led to calculating effectiveness with post-match ratings instead of pre-match ratings. The parts of the post that don’t have to do with sElo are unaffected and–I hope–remain of interest.

One of the talking points throughout the 2017 WTA season has been the unpredictability of the field. With the absence of Serena Williams, Victoria Azarenka, and until recently, Petra Kvitova and Maria Sharapova, there is a dearth of consistently dominant players. Many of the top remaining players have been unsteady as well, due to some combination of injury (Simona Halep), extreme surface preferences (Johanna Konta), and good old-fashioned regression to the mean (Angelique Kerber).

No top seed has won a title at the Premier level or above so far this year. Last week, Stephanie Kovalchik went into more detail, quantifying how seeds have failed to meet expectations and suggesting that the official WTA ranking system–the algorithm that determines which players get those seeds–has failed.

There are plenty of problems with the WTA ranking system, especially if you expect it to have predictive value–that is, if you want it to properly reflect the performance level of players right now. Kovalchik is correct that the rankings have done a particularly poor job this year identifying the best players. However, there’s something else going on: According to much more accurate algorithms, the WTA is more chaotic than it has been for decades.

Picking winners

Let’s start with a really basic measurement: picking winners. Through Rome, there had been more than 1100 completed WTA matches. The higher-ranked player won 62.4% of those. Since 1990, the ranking system has picked the winner of 67.9% of matches, and topped 70% during several years in the 1990s. It never fell below 66% until 2014, and this year’s 62.4% is the worst in the 28-year time frame under consideration.

Elo does a little better. It rates players by the quality of their opponents, meaning that draw luck is taken out of the equation, and does a better job of estimating the ability level of players like Serena and Sharapova, who for various reasons have missed long stretches of time. Since 1990, Elo has picked the winner of 68.6% of matches, falling to an all-time low of 63.1% so far in 2017.

For a big improvement, we need surface-specific Elo (sElo). An effective surface-based system isn’t as complicated as I expected it to be. By generating separate rankings for each surface (using only matches on that surface), sElo has correctly predicted the winner of 76.2% of matches since 1990, almost cracking 80% back in 1992. Even sElo is baffled by 2017, falling to its lowest point of 71.0%.

(sElo for all three major surfaces is now shown on the Tennis Abstract Elo ratings report.)

This graph shows how effectively the three algorithms picked winners. It’s clear that sElo is far better, and the graph also shows that some external factor is driving the predictability of results, affecting the accuracy of all three systems to a similar degree:

Brier scores

We see a similar effect if we use a more sophisticated method to rate the WTA ranking system against Elo and sElo. The Brier score of a collection of predictions measures not only how accurate they are, but also how well calibrated they are–that is, a player forecast to win a matchup 90% of the time really does win nine out of ten, not six out of ten, and vice versa. Brier scores average the square of the difference between each prediction and its corresponding result. Because it uses the square, very bad predictions (for instance, that a player has a 95% chance of winning a match she ended up losing) far outweigh more pedestrian ones (like a player with a 95% chance going on to win).
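A minimal sketch of both metrics, using the definition given above (the average squared difference between each forecast and its result). The function names and the four-match sample are hypothetical and only illustrate the arithmetic.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast and result (1 = win, 0 = loss)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def pick_rate(forecasts, outcomes):
    """Share of matches in which the side forecast above 50% actually won."""
    correct = sum(1 for f, o in zip(forecasts, outcomes) if (f > 0.5) == (o == 1))
    return correct / len(forecasts)


# The example from the text: a 95% favorite who loses vs. one who wins.
upset_penalty = brier_score([0.95], [0])   # squared error of 0.95
routine_penalty = brier_score([0.95], [1])  # squared error of 0.05

# Hypothetical forecasts for the first-named player in four matches.
sample_forecasts = [0.95, 0.70, 0.60, 0.55]
sample_outcomes = [1, 1, 0, 1]
print(brier_score(sample_forecasts, sample_outcomes))
print(pick_rate(sample_forecasts, sample_outcomes))
```

The single upset costs 0.9025 while the routine win costs only 0.0025, which is why a handful of badly missed forecasts can dominate a season's score.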

In 2017 so far, the official WTA ranking system has a Brier score of .237, compared to Elo of .226 and sElo of .187. Lower is better, since we want a system that minimizes the difference between predictions and actual outcomes. All three numbers are the highest of any season since 1990. The corresponding averages over that time span are .207 (WTA), .202 (Elo), and .164 (sElo).

As with the simpler method of counting correct predictions, we see that Elo is a bit better than the official ranking, and both of the surface-agnostic methods are crushed by sElo, even though the surface-specific method uses considerably less data. (For instance, the clay-specific Elo ignores hard and grass court results entirely.) And just like the results of picking winners, we see that the differences in Brier scores of the three methods are fairly consistent, meaning that some other factor is causing the year-to-year differences:

The takeaway

The WTA ranking system has plenty of issues, but its unusually bad performance this year isn’t due to any quirk in the algorithm. Elo and sElo are structured completely differently–the only thing they have in common with the official system is that they use WTA match results–and they show the same trends in both of the above metrics.

One factor affecting the last two years of forecasting accuracy is the absence of players like Serena, Sharapova, and Azarenka. If those three played full schedules and won at their usual clip, there would be quite a few more correct predictions for all three systems, and perhaps there would be fewer big upsets from the players who have tried to replace them at the top of the game.

But that isn’t the whole story. A bunch of no-brainer predictions don’t affect Brier score very much, and the presence of heavily-favored players also makes it more likely that massively surprising results occur, such as Serena’s loss to Madison Brengle, or Sharapova’s ouster at the hands of Eugenie Bouchard. Many unexpected results are completely independent of the top ten, like Marketa Vondrousova’s recent title in Biel.

While some of the year-to-year differences in the graphs above are simply noise, the last several years look much more like a meaningful trend. It could be that we are seeing a large-scale changing of the guard, with young players (and their low rankings) regularly upsetting established stars, while the biggest names in the sport are spending more time on the sidelines. Upsets may also be somewhat contagious: When one 19-year-old aspirant sees a peer beating top-tenners, she may be more confident that she can do the same.

Whatever influences have given us the WTA’s current state of unpredictability, we can see that it’s not just a mirage created by a flawed ranking system. Upsets are more common now than at any other point in recent memory, whichever algorithm you use to pick your favorites.

The Indian Wells Quarter of Death

The Indian Wells men’s draw looks a bit lopsided this year. The bottom quarter, anchored by No. 2 seed Novak Djokovic, also features Roger Federer, Rafael Nadal, Juan Martin del Potro, and Nick Kyrgios. It doesn’t take much analysis to see that the bracket makes life more difficult for Djokovic, and by extension, it cleared the way for Andy Murray. Alas, Murray lost his opening match against Vasek Pospisil on Saturday, making No. 3 seed Stan Wawrinka the luckiest man in the desert.

The draw sets up some very noteworthy potential matches: Federer and Nadal haven’t played before the quarterfinal since their first encounter back in 2004, and Fed hasn’t played Djokovic before the semis in more than 40 meetings, since 2007. Kyrgios, who has now beaten all three of the elites in his quarter, is likely to get another chance to prove his mettle against the best.

I haven’t done a piece on draw luck for a while, and this seemed like a great time to revisit the subject. The principle is straightforward: By taking the tournament field and generating random draws, we can do a sort of “retro-forecast” of what each player’s chances looked like before the draw was conducted–back when Djokovic’s road wouldn’t necessarily be so rocky. By comparing the retro-forecast to a projection based on the actual draw, we can see how much the luck of the draw impacted each player’s odds of piling up ranking points or winning the title.
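The retro-forecast idea can be sketched on a toy field. Everything specific here is an assumption for illustration: the eight players and their ratings are made up, the bracket is shuffled with no seeding constraints (a real draw places seeds in fixed slots), and match odds come from the standard Elo curve instead of jrank.

```python
import random


def win_prob(rating_a, rating_b):
    """Elo-style chance that A beats B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def title_odds(bracket, ratings):
    """Exact title odds for a fixed single-elimination bracket.

    bracket is a list of names whose length is a power of two;
    adjacent entries meet in the first round.
    """
    # Each slot maps player -> probability of emerging from that slot.
    slots = [{name: 1.0} for name in bracket]
    while len(slots) > 1:
        next_round = []
        for i in range(0, len(slots), 2):
            left, right = slots[i], slots[i + 1]
            merged = {}
            for a, pa in left.items():
                merged[a] = pa * sum(pb * win_prob(ratings[a], ratings[b])
                                     for b, pb in right.items())
            for b, pb in right.items():
                merged[b] = pb * sum(pa * win_prob(ratings[b], ratings[a])
                                     for a, pa in left.items())
            next_round.append(merged)
        slots = next_round
    return slots[0]


def pre_draw_odds(ratings, n_draws=2000, seed=0):
    """Average title odds over many random (unseeded) draws."""
    rng = random.Random(seed)
    names = list(ratings)
    totals = {name: 0.0 for name in names}
    for _ in range(n_draws):
        rng.shuffle(names)
        for name, p in title_odds(names, ratings).items():
            totals[name] += p
    return {name: total / n_draws for name, total in totals.items()}


# Hypothetical ratings; "A" is the favorite.
ratings = {"A": 2200, "B": 2100, "C": 2050, "D": 2000,
           "E": 1900, "F": 1850, "G": 1800, "H": 1750}

pre = pre_draw_odds(ratings)
# A "quarter of death" style draw: the four strongest players share one half.
post = title_odds(["A", "B", "C", "D", "E", "F", "G", "H"], ratings)
print(f"A pre-draw: {pre['A']:.3f}, post-draw: {post['A']:.3f}")
```

Comparing `pre` to `post` for each player gives the same kind of before-and-after picture as the tables that follow, on a much smaller scale.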

Here are the eight players most heavily favored by the pre-draw forecast, along with their chances of winning the title, both before and after the draw was conducted:

Player                 Pre-Draw  Post-Draw  
Novak Djokovic           26.08%     19.05%  
Andy Murray              19.30%     26.03%  
Roger Federer            10.24%      8.71%  
Rafael Nadal              5.46%      4.80%  
Stan Wawrinka             5.08%      7.14%  
Kei Nishikori             5.01%      5.67%  
Nick Kyrgios              4.05%      2.62%  
Juan Martin del Potro     4.00%      2.34%

These odds are based on my jrank rating system, which correlates closely with Elo. I use jrank here instead of Elo because it’s surface-specific. I’m also ignoring the first round of the main draw, which–since all 32 seeds get a first-round bye–is just a glorified qualifying round and has very little effect on the title chances of seeded players.

As you can see, the bottom quarter–the “group of death”–is in fact where title hopes go to die. Djokovic, who is still considered to be the best player in the game by both jrank and Elo, had a 26% pre-draw chance of defending his title, but it dropped to 19% once the names were placed in the bracket. Not coincidentally, Murray’s odds went in the opposite direction. Federer’s and Nadal’s title chances weren’t hit quite as hard, largely because they weren’t expected to get past Djokovic, no matter when they faced him.

The issue here isn’t just luck, it’s the limitation of the ATP ranking system. No one really thinks that del Potro entered the tournament as the 31st favorite, or that Kyrgios came in as the 15th. No set of rankings is perfect, but at the moment, the official rankings do a particularly poor job of reflecting the players with the best chances of winning hard court matches.  The less reliable the rankings, the better chance of a lopsided draw like the one in Indian Wells.

For a more in-depth look at the effect of the draw on players with lesser chances of winning the title, we need to look at “expected ranking points.” Using the odds that a player reaches each round, we can calculate his expected points for the entire event. For someone like Kyle Edmund, who would have almost no chance of winning the title regardless of the draw, expected points tells a more detailed story of the power of draw luck. Here are the ten players who were punished most severely by the bracket:
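The expected-points calculation can be sketched as follows. The points schedule is an assumption (a Masters-style scale, starting from the round of 64 since the first round is ignored here), and the reach probabilities are invented for a hypothetical unseeded player; neither comes from the forecasts in this post.

```python
# Rounds in playing order; a player who loses in a round earns that round's points.
ROUNDS = ["R64", "R32", "R16", "QF", "SF", "F"]
LOSER_POINTS = {"R64": 25, "R32": 45, "R16": 90, "QF": 180, "SF": 360, "F": 600}
TITLE_POINTS = 1000


def expected_points(reach, title_prob):
    """Expected ranking points from round-by-round reach probabilities.

    reach maps each round to the probability of playing in it;
    title_prob is the probability of winning the final.
    """
    total = TITLE_POINTS * title_prob
    for i, rnd in enumerate(ROUNDS):
        # Probability of getting past this round.
        p_next = reach[ROUNDS[i + 1]] if i + 1 < len(ROUNDS) else title_prob
        # Probability of losing exactly in this round.
        total += LOSER_POINTS[rnd] * (reach[rnd] - p_next)
    return total


# Hypothetical unseeded player facing an elite opponent right away.
tough_draw = {"R64": 1.0, "R32": 0.15, "R16": 0.05, "QF": 0.02, "SF": 0.005, "F": 0.001}
# The same player with a softer opening match.
soft_draw = {"R64": 1.0, "R32": 0.55, "R16": 0.15, "QF": 0.04, "SF": 0.01, "F": 0.002}

print(expected_points(tough_draw, 0.0002))
print(expected_points(soft_draw, 0.0005))
```

Even though the title chance is negligible in both cases, the expected tally moves a lot with the opening-round opponent, which is the effect the table below captures.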

Player                 Pre-Draw Pts Post-Draw Pts  Effect  
Kyle Edmund                    28.8          14.3  -50.2%  
Steve Johnson                  65.7          36.5  -44.3%  
Vasek Pospisil                 29.1          19.4  -33.2%  
Juan Martin del Potro         154.0         104.2  -32.3%  
Stephane Robert                20.3          14.2  -30.1%  
Federico Delbonis              20.0          14.5  -27.9%  
Novak Djokovic                429.3         325.4  -24.2%  
Nick Kyrgios                  163.5         124.6  -23.8%  
Horacio Zeballos               17.6          14.1  -20.0%  
Alexander Zverev              113.6          91.5  -19.4%

At most tournaments, this list is dominated by players like Edmund and Pospisil: unseeded men with the misfortune of drawing an elite opponent in the first round. Much less common is to see so many seeds–particularly a top-two player–rating as the most unlucky. While Federer and Nadal don’t quite make the cut here, the numbers bear out our intuition: Fed’s draw knocked his expected points from 257 down to 227, and Nadal’s reduced his projected tally from 195 to 178.

The opposite list–those who enjoyed the best draw luck–features a lot of names from the top half, including both Murray and Wawrinka. Murray squandered his good fortune, putting Wawrinka in an even better position to take advantage of his own:

Player              Pre-Draw Pts  Post-Draw Pts  Effect  
Malek Jaziri                21.9           31.6   44.4%  
Damir Dzumhur               29.1           39.0   33.9%  
Martin Klizan               27.6           36.4   32.1%  
Joao Sousa                  24.7           31.1   25.9%  
Peter Gojowczyk             20.4           25.5   24.9%  
Tomas Berdych               93.6          116.6   24.6%  
Mischa Zverev               58.5           72.5   23.8%  
Yoshihito Nishioka          26.9           32.6   21.1%  
John Isner                  80.2           97.0   21.0%  
Andy Murray                369.1          444.2   20.3%  
Stan Wawrinka              197.8          237.7   20.1%

Over the course of the season, quirks like these tend to even out. Djokovic, on the other hand, must be wondering how he angered the draw gods: Just to earn a quarter-final place against Roger or Rafa, he’ll need to face Kyrgios and Delpo for the second consecutive tournament.

If Federer, Kyrgios, and del Potro can bring their ATP rankings closer in line with their true talent, they are less likely to find themselves in such dangerous draw sections. For Djokovic, that would be excellent news.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s natural to ask which of these models performs best and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the US Open 2016: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. For inferring forecasts from the ATP ranking, we use a specific formula (see footnote 1), and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities based on the provided odds, minus the overround (see footnote 2).
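The two non-Elo forecasts can be sketched in a few lines. The ranking-points function follows the formula given in the first footnote, with e = 0.85. The odds conversion is an assumption: it takes decimal odds and removes the overround by proportional normalization, which is one common approach but not necessarily the one used for this analysis. The sample numbers are hypothetical.

```python
def ranking_points_forecast(points_a, points_b, e=0.85):
    """P(a) = a^e / (a^e + b^e), with a and b the players' ranking points."""
    a = points_a ** e
    b = points_b ** e
    return a / (a + b)


def implied_probabilities(odds_a, odds_b):
    """Turn decimal odds into probabilities that sum to one.

    Assumption: the overround is spread proportionally across both sides.
    """
    raw_a = 1.0 / odds_a
    raw_b = 1.0 / odds_b
    overround = raw_a + raw_b  # exceeds 1 by the bookmaker's margin
    return raw_a / overround, raw_b / overround


# Hypothetical inputs, for illustration only.
print(ranking_points_forecast(4000, 1000))
print(implied_probabilities(1.50, 2.60))
```

With e below 1, a fourfold advantage in ranking points translates into something well short of a fourfold advantage in odds, which tempers the forecasts for heavily favored players.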

Next, we simply compare forecasts with reality for each model, asking: if player A was predicted to be the winner (P(a) > 0.5), did he really win the match? When we do that for each match and each model (ignoring retirements and walkovers), we come up with the following results.

Model     % correct  
Pinnacle     76.92%  
538          75.21%  
TA           74.36%  
ATP          72.65%  
Riles        70.09%

What we see here is the percentage of predictions that were actually right. The betting model (based on the odds of Pinnacle) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.

However, just looking at the percentage of correctly called matches does not tell the whole story. In fact, there are more granular metrics to investigate the performance of a prediction model. Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, we want 70% forecasts to be true in exactly 70% of the cases. Resolution measures how much the forecasts differ from the overall average. The rationale here is that just using the expected average values for forecasting will lead to a reasonably well-calibrated set of predictions, but it will not be as useful as a method that manages the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) forecasts are, the better.
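As a rough sketch of how these two quantities can be computed, the function below bins forecasts and applies the standard decomposition of the Brier score into calibration, resolution, and uncertainty. The ten-bin layout and the eight sample forecasts are assumptions for illustration, not the data behind this analysis.

```python
def brier_decomposition(forecasts, outcomes, n_bins=10):
    """Return (calibration, resolution, uncertainty) for binned forecasts.

    calibration: weighted squared gap between mean forecast and observed rate per bin
    resolution:  weighted squared gap between observed rate per bin and the base rate
    uncertainty: base_rate * (1 - base_rate)
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = {}
    for f, o in zip(forecasts, outcomes):
        index = min(int(f * n_bins), n_bins - 1)
        bins.setdefault(index, []).append((f, o))
    calibration = 0.0
    resolution = 0.0
    for members in bins.values():
        size = len(members)
        mean_forecast = sum(f for f, _ in members) / size
        observed_rate = sum(o for _, o in members) / size
        calibration += size * (mean_forecast - observed_rate) ** 2
        resolution += size * (observed_rate - base_rate) ** 2
    return calibration / n, resolution / n, base_rate * (1 - base_rate)


# Hypothetical forecasts: four at 60% and four at 90%.
demo_forecasts = [0.6, 0.6, 0.6, 0.6, 0.9, 0.9, 0.9, 0.9]
demo_outcomes = [1, 1, 0, 0, 1, 1, 1, 0]
cal, res, unc = brier_decomposition(demo_forecasts, demo_outcomes)
brier = sum((f - o) ** 2 for f, o in zip(demo_forecasts, demo_outcomes)) / len(demo_forecasts)
print(cal, res, unc, brier)
```

When every forecast in a bin is identical, as in this toy sample, calibration minus resolution plus uncertainty equals the Brier score exactly; with mixed forecasts inside a bin the identity only holds approximately.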

In the following table we categorize the set of predictions into bins of different probabilities and show what percentage of the predictions were correct per bin. This also enables us to calculate Calibration and Resolution measures for each model.

Model    50-59%  60-69%  70-79%  80-89%  90-100% Cal  Res   Brier
538      53%     61%     85%     80%     91%     .003 .082  .171
TA       56%     75%     78%     74%     90%     .003 .072  .182
Riles    56%     86%     81%     63%     67%     .017 .056  .211
ATP      50%     73%     77%     84%     100%    .003 .068  .185
Pinnacle 52%     91%     71%     77%     95%     .015 .093  .172

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that for the Riles model only 67% of the 90-100% forecasts were correct, can be explained by small sample size (only three in that case). However, there are still two interesting cases where the sample size is larger. Both the Riles and Pinnacle models seem to be strongly underconfident (statistically significant) with their 60-69% predictions. In other words, these probabilities should have been higher, because, in reality, these forecasts were actually true 86% and 91% of the time (see footnote 3). For the betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this might be a starting point to tweak the model.

In the last three columns Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better) are shown. The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle (for the used subset of data) essentially perform equally well. Then there is a slight gap until the model of Tennis Abstract and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, hence ranking fifth in this analysis.
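The three columns can be checked against each other with the usual relationship Brier = Calibration - Resolution + Uncertainty. The uncertainty term is not reported in the table; the value 0.25 used below is an inference, corresponding to a 50% base rate, and it happens to reproduce the published Brier scores to within rounding.

```python
UNCERTAINTY = 0.25  # assumed, not stated in the text: 0.5 * (1 - 0.5)

# (Calibration, Resolution, Brier) copied from the table above
table = {
    "538": (0.003, 0.082, 0.171),
    "TA": (0.003, 0.072, 0.182),
    "Riles": (0.017, 0.056, 0.211),
    "ATP": (0.003, 0.068, 0.185),
    "Pinnacle": (0.015, 0.093, 0.172),
}

gaps = {}
for model, (cal, res, brier) in table.items():
    rebuilt = cal - res + UNCERTAINTY
    gaps[model] = abs(rebuilt - brier)
    print(f"{model:9s} rebuilt {rebuilt:.3f}  reported {brier:.3f}")
```

The largest gap is one unit in the third decimal place, which is what rounding of the inputs would produce.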

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If the forecast lines are above the black line, it means that forecasts are underconfident; in the opposite case, forecasts are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job in preventing overestimations even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared among each other in a meaningful way. Moreover, I hope I could exhibit some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing the models in how well they do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. P(a) = a^e / (a^e + b^e) where a are player A’s ranking points, b are player B’s ranking points, and e is a constant. We use e = 0.85 for ATP men’s singles.

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). Turns out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

The Unexpectedly Predictable IPTL

December is here, and with the tennis offseason almost five days old, it’s time to resume the annual ritual of pretending we care about exhibitions. The hit-and-giggle circuit gets underway in earnest tomorrow with the kickoff, in Japan, of the 2016 IPTL slate.

The star-studded IPTL, or International Premier Tennis League, is two years old, and uses a format similar to that of the USA’s World Team Tennis. Each match consists of five separate sets: one each of men’s singles, women’s singles, (men’s) champions’ singles, men’s doubles, and mixed doubles. Games are no-ad, each set is played to six games, and a tiebreak is played at 5-5. At the end of all those sets, if both teams have the same number of games, representatives of each side’s sponsors thumb-wrestle to determine the winner. Or something like that. It doesn’t really matter.

As with any exhibition, players don’t take the competition too seriously. Elites who sit out November tournaments due to injury find themselves able to compete in December, given a sufficient appearance fee. It’s entertaining, but compared to the first eleven months of the year, it isn’t “real” tennis.

That triggers an unusual research question: How predictable are IPTL sets? If players have nothing at stake, are outcomes simply random? Or do all the participants ease off to an equivalent degree, resulting in the usual proportion of sets going the way of the favorite?

Last season, there were 29 IPTL “matches,” meaning that we have a dataset consisting of 29 sets each of men’s singles, women’s singles, and men’s doubles. (For lack of data, I won’t look at mixed doubles, and for lack of interest, forget about champions’ singles.) Except for a handful of singles specialists who played doubles, we have plenty of data on every player. Using Elo ratings, we can generate forecasts for every set based on each competitor’s level at the time.

Elo-based predictions spit out forecasts for standard best-of-three contests, so we’ll need to adjust those a bit. Single-set results are more random, so we would expect a few more upsets. For instance, when Roger Federer faced Ivo Karlovic last December, Elo gave him an 89.9% chance of winning a traditional match, and the relevant IPTL forecast is a more modest 80.3%. With these estimates, we can see how many sets went the way of the favorite and how many upsets we should have expected given the short format.
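One way to make this adjustment is sketched below, under the assumption that sets are independent and that a player has the same chance p of winning each one. A best-of-three match is then won with probability p^2 * (3 - 2p), and the single-set forecast is found by inverting that relationship numerically. The post does not state the exact method, but this assumption reproduces the Federer-Karlovic figures.

```python
def match_prob_from_set_prob(p):
    """Chance of winning a best-of-three match given per-set win probability p."""
    # Win 2-0 (p*p) or 2-1 in either order (2 * p*p*(1-p)).
    return p * p * (3 - 2 * p)


def set_prob_from_match_prob(match_prob, tol=1e-10):
    """Invert the best-of-three formula by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if match_prob_from_set_prob(mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


federer_set = set_prob_from_match_prob(0.899)
print(f"single-set forecast: {federer_set:.3f}")
```

Bisection works here because the match probability rises steadily as the per-set probability goes from 0 to 1, so there is exactly one answer for any match forecast.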

Let’s start with men’s singles. Karlovic beat Federer, and Nick Kyrgios lost a set to Ivan Dodig, but in general, decisions went the direction we would expect. Of the 29 sets, favorites won 18, or 62.1%. The Elo single-set forecasts imply that the favorites should have won 64.2%, or 18.6 sets. So far, so predictable: If IPTL were a regular-season event, its results wouldn’t be statistically out of place.

The results are similar for women’s singles. The forecasts show the women’s field to be more lopsided, due mostly to the presence of Serena Williams and Maria Sharapova. Elo expected that the favorites would win 20.4, or 70.4% of the 29 sets. In fact, the favorites won 21 of 29.

The men’s doubles results are more complex, but they nonetheless provide further evidence that IPTL results are predictable. Elo implied that most of the men’s doubles matches were close: Only one match (Kei Nishikori and Pierre-Hugues Herbert against Gael Monfils and Rohan Bopanna) had a forecast above 62%, and overall, the system expected only 16.4 victories for the favorites, or 56.4%. In fact, the Elo-favored teams won 19, or 65.5% of the 29 sets, more than the singles favorites did.

The difference of less than three wins in a small sample could easily just be noise, but even so, a couple of explanations spring to mind. First, almost every team had at least one doubles specialist, and those guys are accustomed to the rapid-fire no-ad format. Second, the higher-than-usual number of non-specialists–such as Federer, Nishikori, and Monfils–means that the player ratings may not be as reliable as they are for specialists, or for singles. It might be the case that Nishikori is a better doubles player than Monfils, but because both usually stick to singles, no rating system can capture the difference in abilities very accurately.
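To put a rough number on the "could easily just be noise" claim, the sketch below computes a binomial tail probability. It simplifies heavily: every set is treated as an independent trial with the same 56.4% chance for the favorite, whereas the actual forecasts varied from set to set.

```python
from math import comb


def binom_tail(n, k, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 19 or more wins for the favorites in 29 sets, at the average forecast of 56.4%.
tail = binom_tail(29, 19, 0.564)
print(f"chance of 19+ favorite wins: {tail:.3f}")
```

Under this simplification the result is far from any conventional significance threshold, which is consistent with treating the gap as noise.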

Here is a summary of all these results:

Competition      Sets  Fave W  Fave W%  Elo Forecast%  
Men's Singles      29      18    62.1%          64.2%  
Women's Singles    29      21    72.4%          70.4%  
ALL SINGLES        58      39    67.2%          67.3%  
                                                       
Men's Doubles      29      19    65.5%          56.4%  
ALL SETS           87      58    66.7%          63.7%

Taken together, last season’s evidence shows that IPTL contests tend to go the way of the favorites. In fact, when we account for the differences in format, favorites win more often than we’d expect. That’s the surprising bit. The conventional wisdom suggests that the elites became champions thanks to their prowess at high-pressure moments; many dozens of pros could reach the top if they were only stronger mentally. In exhos, the mental game is largely taken out of the picture, yet in this case, the elites are still winning.

No matter how often the favorites win, these matches are still meaningless, and I’m not about to include them in the next round of player ratings. However, it’s a mistake to disregard exhibitions entirely. By offering a contrast to the high-pressure tournaments of the regular season, they may offer us perspectives we can’t get anywhere else.

Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo correctly predicts the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. To put it another way, this is more evidence that Davis Cup is about the chalk.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.
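A calibration check of this kind can be sketched as follows: group matches by the favorite's forecast, then compare the average forecast in each group with the share of matches the favorite actually won. The function name, the bin width, and the toy inputs are all illustrative. The numbers are made up and are not real match data.

```python
from collections import defaultdict


def calibration_table(forecasts, favorite_won, bin_width=0.1):
    """Group matches by forecast bin and compare forecast with actual win rate.

    forecasts: favorite's win probability per match (0.5 to 1.0)
    favorite_won: True/False per match
    Returns {bin_start: (matches, mean_forecast, actual_win_rate)}.
    """
    bins = defaultdict(list)
    for p, won in zip(forecasts, favorite_won):
        # Small epsilon guards against float error at bin edges (e.g. 0.6 / 0.1)
        index = int(p / bin_width + 1e-9)
        bins[round(index * bin_width, 2)].append((p, won))
    table = {}
    for start, rows in sorted(bins.items()):
        n = len(rows)
        mean_forecast = sum(p for p, _ in rows) / n
        win_rate = sum(1 for _, won in rows if won) / n
        table[start] = (n, mean_forecast, win_rate)
    return table


# Made-up toy inputs, not real match data
forecasts = [0.61, 0.58, 0.63, 0.72, 0.66, 0.55]
favorite_won = [True, False, True, True, True, False]
for start, (n, mean_forecast, win_rate) in calibration_table(forecasts, favorite_won).items():
    print(f"{start:.1f}: n={n}, forecast={mean_forecast:.3f}, actual={win_rate:.3f}")
```

A well-calibrated system shows forecast and actual columns that track each other in every bin. The Davis Cup pattern described above would show up as an actual win rate sitting well above the forecast.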

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, the no-ad scoring and third-set tiebreak introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

Player                 Side  D-Lo     
Juan Martin del Potro  ARG   1759     
Leonardo Mayer         ARG   1593  *  
Federico Delbonis      ARG   1540     
Guido Pella            ARG   1454  *  
                                      
Ivan Dodig             CRO   1856  *  
Marin Cilic            CRO   1677     
Ivo Karlovic           CRO   1580     
Franco Skugor          CRO   1569  *

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.
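The rating-to-probability step can be reproduced from the table above. This sketch assumes a team's D-Lo is the average of its two players' ratings and that the standard Elo logistic curve on a 400-point scale applies. Neither is spelled out here, but together they reproduce the 189-point edge and the 74.8% figure. It covers only the no-ad, super-tiebreak forecast, not the best-of-five conversion.

```python
def team_rating(player_a, player_b):
    """Assumed team rating: the mean of the two players' D-Lo ratings."""
    return (player_a + player_b) / 2


def elo_win_prob(rating_diff):
    """Standard Elo expected score for the side ahead by rating_diff points."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


dlo = {"Dodig": 1856, "Skugor": 1569, "Mayer": 1593, "Pella": 1454}

croatia = team_rating(dlo["Dodig"], dlo["Skugor"])
argentina = team_rating(dlo["Mayer"], dlo["Pella"])
edge = croatia - argentina
print(edge, round(elo_win_prob(edge), 3))  # 189.0 0.748
```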

Making matters worse for Argentina, it’s likely that Croatia could improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently scheduled match, shown from the perspective of Croatia:

Rubber                      Forecast (CRO)  
Cilic v Delbonis                     90.8%  
Karlovic v del Potro                 15.8%  
Dodig/Skugor v Mayer/Pella           83.7%  
Cilic v del Potro                    36.3%  
Karlovic v Delbonis                  75.8%

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, whom it ranks 68th in the world, compared with his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market, which, a few hours before play begins, gives Croatia about a 65% chance.
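One way to combine the five rubber forecasts is to build the distribution of Croatia's win total, one rubber at a time. The sketch below assumes the rubbers are independent and treats all five as played, since a dead rubber cannot change who reaches three wins. The function name is illustrative.

```python
def tie_win_prob(rubber_probs, wins_needed=3):
    """P(at least wins_needed rubbers won), rubbers treated as independent."""
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in rubber_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)      # rubber lost
            new[k + 1] += mass * p        # rubber won
        dist = new
    return sum(dist[wins_needed:])


# Croatia's forecast in each rubber, from the table above
croatia = [0.908, 0.158, 0.837, 0.363, 0.758]
print(round(tie_win_prob(croatia), 3))
```

With the rounded percentages from the table, this comes out at about 74.4%, close to but not exactly the 74.2% quoted above. A small difference is plausible given that the published inputs are rounded.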

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

Why Novak Djokovic is Still Number One

Two weeks ago, Andy Murray took over the ATP #1 ranking from Novak Djokovic. Yesterday, he defeated Djokovic in their first meeting since June, securing his place at the top of the year-end ranking table. Murray has been outstanding in the second half of this season, winning all but three of his matches since the Roland Garros final, and he capped the year in style, beating four top-five players to claim the title at the World Tour Finals.

Despite all that, Murray is not the best player in the world. That title still belongs to Djokovic. Since June, Murray has closed the gap, establishing himself as part of what we might call the “Big Two,” but he hasn’t quite ousted his rival. There’s no question that over this period, Murray has played better–that sort of thing is occasionally debatable, but this season it’s just historical fact–but identifying the best player implies something more predictive, and it’s much more difficult to determine by simply looking over a list of recent results.

The ATP rankings generally do a good job of telling us which players are better than others. But the official system has two major problems: It ignores opponent quality, and it artificially limits its scope to the last 52 weeks. Pundits and fans tend to have different problems: They often give too much credit to opponent quality (“He beat Djokovic, so now he’s number one!”) and exhibit an even more extreme recency bias (“He’s looked unbeatable this week!”).

Two systems that avoid these issues–Elo and Jrank–both place Djokovic comfortably ahead of Murray. These algorithms handle the details of recent matches and opponent quality differently from each other, but what they share is more important: They consider opponent quality and they don’t use an arbitrary time cutoff like the ATP ranking system does.

Here’s how the three methods would forecast a Djokovic-Murray match, were it held today:

  • ATP: Murray favored, 51.6% chance of winning
  • Elo: Djokovic favored, 61.6% chance of winning
  • Jrank: Djokovic favored, 57.0% chance of winning

Betting markets favored Djokovic by a margin of slightly more than 60/40 yesterday, though bettors probably gave him some of that edge because they thought Murray would be fatigued after his marathon match on Saturday.

As I wrote last week, Elo doesn’t deny that Murray has had a tremendous half-season. Instead, it gives him less credit than the official algorithm does for victories over lesser opponents (such as John Isner in the Paris Masters final), and it recognizes that he started his current run of form at an enormous disadvantage. With his title in London, Murray reached a new peak Elo rating, but it still isn’t enough to overtake Djokovic.

Even though Elo still prefers Novak by a healthy margin, it reflects how much the situation at the top of the ranking list has changed. At the beginning of 2016, Elo gave Djokovic a 76.5% chance of winning a head-to-head against Murray, and that probability rose as high as 81% in April. It fell below 70% after the Olympics, and the gap is now the smallest it has been since February 2011.
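Those head-to-head probabilities can be read back as rating gaps. Assuming the standard Elo logistic curve on a 400-point scale (an assumption of this sketch, as the exact formula isn't given here), the gap is 400 times the base-10 log of the favorite's odds.

```python
from math import log10


def rating_gap(win_prob):
    """Elo point gap implied by the favorite's head-to-head win probability."""
    return 400 * log10(win_prob / (1 - win_prob))


# Djokovic's chances against Murray at the points mentioned above
for label, p in [("start of 2016", 0.765), ("April peak", 0.81), ("70% threshold", 0.70)]:
    print(f"{label}: {rating_gap(p):.0f} points")
```

On that scale, the 76.5% forecast from the start of the year corresponds to a gap of roughly 205 points, which gives a sense of how much ground has been made up since.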

Last week illustrates how difficult it will be for Murray to take over the #1 Elo ranking. The pre-tournament Elo difference of 91 points between the two players has shrunk by only 8%, to 84 points. Murray’s win yesterday was worth a bit more than a measly seven points, but Djokovic had several opportunities to nudge his rating upwards in his first four matches, as well. Despite some of Novak’s head-scratching losses this fall, he still wins most of his matches–some of them against very good players–slowing the decline of his Elo rating.

Of course, Elo is just a measuring stick–like any ranking system, it doesn’t tell us what’s really happening on court. It’s possible that Murray has made a significant (and semi-permanent) leap forward or that Djokovic has taken a major step back. On the other hand, streaks happen even without such leaps, and they always end. The smart money is usually on small, gradual changes to the status quo, and Elo gives us a way to measure those changes.

For Elo to rate Murray ahead of Djokovic, it will probably require several more months of these gradual changes. The only faster alternative is for Djokovic to start losing more matches to the likes of Jiri Vesely and Sam Querrey. When faced with dramatic evidence, Elo makes more dramatic changes. While Djokovic has occasionally provided that evidence this season, he has usually offered enough proof–like four wins at the World Tour Finals–to comfortably maintain his position at the top.