Best of Five and Marin Cilic’s Improbable Collapse

Italian translation at settesei.it

Leading up to the final two rubbers of this year’s Davis Cup final in Croatia, the hosts were heavily favored. They held a 2-1 advantage, and both of the remaining singles matches would pit a Croatian against a lower-ranked Argentine. To win the Cup, they only needed to win one of those matches.

When Marin Cilic built a two-set lead over Juan Martin del Potro, Croatian fans could be forgiven for thinking it was in the bag. Instead, Delpo fought back to win in five sets, and Federico Delbonis upset a flat Ivo Karlovic to seal Argentina’s first-ever Davis Cup title. Some people will point to the Cilic-Delpo match time of 4:53 as another reason to switch to best-of-three. The rest of us will see it as yet another reminder of why best-of-five must retain its role on tennis’s biggest stages.

In a best-of-three format, Cilic would’ve claimed the Cup for Croatia after two hours of play. Instead, he merely came very close. My Elo singles ratings gave Cilic a 36.3% chance of beating Delpo and Karlovic a 75.8% chance of defeating Delbonis. Taken together, that’s a likelihood of 84.6% that Croatia would claim the trophy. After Cilic won the first two sets, his odds increased to about 81%, pushing Croatia’s chances over the 95% mark. In fourteen previous tries, del Potro had never recovered from an 0-2 deficit.
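Croatia needed only one of the last two rubbers, so the combined figure is one minus the chance that both Croatians lose. The sketch below assumes the two matches are independent. The post doesn't state its method, but this assumption reproduces its numbers. The function name is hypothetical.

```python
def p_at_least_one(probs):
    """Chance of winning at least one of several independent matches."""
    p_lose_all = 1.0
    for p in probs:
        p_lose_all *= 1.0 - p  # lose this one too
    return 1.0 - p_lose_all

# Pre-match Elo forecasts: Cilic over del Potro, Karlovic over Delbonis.
before = p_at_least_one([0.363, 0.758])
# After Cilic took the first two sets, his odds rose to about 81%.
after_two_sets = p_at_least_one([0.81, 0.758])

print(f"{before:.1%}")          # 84.6%
print(f"{after_two_sets:.1%}")  # 95.4%
```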

And then Argentina came back. Comebacks from two sets down tend to stick in our memory, so it’s easy to forget just how rare they are. Yesterday’s match was only the 28th such comeback in 2016. That’s out of a pool of 656 best-of-five contests, including 431 in which one player built a 2-0 lead. This year isn’t unusual: Going back to 2000, the number of wins from a 0-2 deficit has never exceeded 32.

Comebacks from 0-2 are even rarer in Davis Cup. At the World Group level this year, including play-offs, Delpo was the 61st player to fall into a 0-2 hole, but he was only the second to recover and win the match. The other was Jack Sock, whose July comeback (over Cilic–more on that in a bit) wasn’t enough to move his USA squad into the semifinals. Since 2000, 5.8% of 2-0 leads have turned into comeback victories, but only 4.3% of World Group 2-0 leads have done the same.

Cilic’s season has defied the numbers. In addition to his 2-0 collapses against Sock and del Potro, he held a 2-0 advantage before losing to Roger Federer in the Wimbledon quarterfinals. His 2016 is only the third season in ATP history in which a player lost three or more matches after winning the first two sets. The previous two–Viktor Troicki’s 2015 season and Jan Siemerink’s 1997–are unlikely to make Cilic feel any better.

Still, even Cilic’s record indicates the rarity of victories from an 0-2 disadvantage. Before the Wimbledon quarterfinal, the Croatian had never lost a match after taking the first two sets, for a record of 60-0. Even now, his Davis Cup record after going up two sets to love is a respectable 11-2. His overall career mark of 95.7% (66-3) is better than average.

Unless Cilic crumbles under certain spotlights (but not others, as evidenced by his five-set win over Delbonis on Friday), his series of unfortunate collapses may just be a fluke. In addition to that 60-0 streak, he has never had a problem converting one-set leads in best-of-three matches. This year, he won 29 out of 33 best-of-threes after winning the first set, an above-average rate of 88%. (And one of the losses was against Dominic Thiem, so he never had a chance.)

The longer the match format, the more likely that the better player emerges triumphant. That’s why there are fewer upsets in best-of-five than in best-of-three, and why tiebreaks are often little better than flips of a coin. Usually that works in favor of a top-tenner such as Cilic: In most matchups he is the superior player. But in two of his three collapses this season, he’s fallen victim to a favorite who uses the longer format to overcome an early run of poor form.
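A toy model shows why longer formats favor the better player. The sketch below assumes that every set is an independent trial, won by the stronger player with the same probability p each time. Real sets aren't fully independent, and this is not the model behind the Elo forecasts.

```python
from math import comb

def p_win_match(p_set, sets_to_win):
    """Probability of winning a match that ends when one player takes
    `sets_to_win` sets, if each set is won independently with p_set."""
    q = 1.0 - p_set
    total = 0.0
    # Win the deciding set after dropping exactly `lost` sets along the way.
    for lost in range(sets_to_win):
        total += comb(sets_to_win - 1 + lost, lost) * p_set**sets_to_win * q**lost
    return total

for p in (0.55, 0.60, 0.65):
    bo3 = p_win_match(p, 2)  # best-of-three
    bo5 = p_win_match(p, 3)  # best-of-five
    print(f"set p={p:.2f}  best-of-3={bo3:.3f}  best-of-5={bo5:.3f}")
```

With a 55% chance per set, the favorite's match odds rise from about 57% in best-of-three to about 59% in best-of-five. The same mechanism works against an underdog who starts hot and then fades.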

The debate over best-of-five will surely continue, despite this weekend’s Davis Cup tie adding another unforgettable five-set epic to an already long list. But after Delpo’s performance yesterday, you’ll have a harder time finding someone to campaign for shorter matches–especially in Argentina.

Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo has correctly predicted the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. That’s more evidence that Davis Cup is about the chalk: the favorites usually win.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.
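A calibration check like this one can be sketched in a few lines. Group the forecasts into buckets, then compare each bucket's average forecast with how often the favorite actually won. The inputs below are hypothetical toy values, not D-Lo output.

```python
from collections import defaultdict

def calibration_table(forecasts, favorite_won, bucket_width=0.05):
    """For each forecast bucket, return (mean forecast, observed win rate, n).

    Forecasts are grouped to the nearest multiple of `bucket_width`.
    """
    buckets = defaultdict(list)
    for p, won in zip(forecasts, favorite_won):
        buckets[round(p / bucket_width)].append((p, won))
    rows = []
    for key in sorted(buckets):
        pairs = buckets[key]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(1 for _, won in pairs if won) / len(pairs)
        rows.append((mean_forecast, win_rate, len(pairs)))
    return rows

# Toy data: ten matches forecast at 60% for the favorite, seven won.
toy = calibration_table([0.60] * 10, [True] * 7 + [False] * 3)
print(toy)
```

For a well-calibrated system, each bucket's win rate sits close to its mean forecast. The Davis Cup pattern described above would instead show win rates roughly ten points higher than the forecasts.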

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, no-ad scoring and a super-tiebreak in place of the third set introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

Player                 Side  D-Lo     
Juan Martin del Potro  ARG   1759     
Leonardo Mayer         ARG   1593  *  
Federico Delbonis      ARG   1540     
Guido Pella            ARG   1454  *  
                                      
Ivan Dodig             CRO   1856  *  
Marin Cilic            CRO   1677     
Ivo Karlovic           CRO   1580     
Franco Skugor          CRO   1569  *

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.
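The numbers in this paragraph are consistent with two assumptions: each pair's D-Lo ratings are averaged, and the gap is converted with the standard Elo logistic curve on a 400-point scale. The sketch below works under those assumptions. It does not reproduce the conversion to best-of-five.

```python
def team_rating(r1, r2):
    """Assumed team strength: the mean of the two players' D-Lo ratings."""
    return (r1 + r2) / 2

def elo_win_prob(rating_gap):
    """Standard Elo expectation for a side with a `rating_gap`-point edge."""
    return 1 / (1 + 10 ** (-rating_gap / 400))

dodig_skugor = team_rating(1856, 1569)
mayer_pella = team_rating(1593, 1454)
gap = dodig_skugor - mayer_pella
print(gap, f"{elo_win_prob(gap):.1%}")  # 189.0 74.8%
```

Replacing Skugor with Cilic, `team_rating(1677, 1856)`, yields the 243-point gap mentioned below. Before any best-of-five adjustment, that gap is worth about 80% on this curve.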

Making matters worse for Argentina, Croatia could well improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently scheduled match, shown from the perspective of Croatia:

Rubber                      Forecast (CRO)  
Cilic v Delbonis                     90.8%  
Karlovic v del Potro                 15.8%  
Dodig/Skugor v Mayer/Pella           83.7%  
Cilic v del Potro                    36.3%  
Karlovic v Delbonis                  75.8%

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, whom it ranks 68th in the world, compared with his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market, which, a few hours before play begins, gives Croatia about a 65% chance.
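One way to combine the five rubber forecasts into a tie probability is to build the distribution of Croatian rubber wins one match at a time. The sketch below assumes the rubbers are independent. The post doesn't describe its exact method.

```python
def win_distribution(probs):
    """dist[k] = probability of winning exactly k of the independent matches."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob_k in enumerate(dist):
            new[k] += prob_k * (1 - p)  # lose this rubber
            new[k + 1] += prob_k * p    # win this rubber
        dist = new
    return dist

# Croatia's forecasts, in order of play (from the table above).
rubbers = [0.908, 0.158, 0.837, 0.363, 0.758]
dist = win_distribution(rubbers)
p_croatia = sum(dist[3:])  # three or more rubber wins takes the tie

# Chance that Friday's two singles rubbers end 1-1.
p_split = win_distribution(rubbers[:2])[1]
print(f"tie: {p_croatia:.1%}, Friday split: {p_split:.1%}")
```

With the rounded table values, this gives roughly 74.4%, close to the 74.2% quoted. The small gap may come from unrounded inputs or a different method. The same approach puts a 1-1 split of Friday's singles at about 78%, which is why that split is more likely than a Croatian victory.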

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

Can Nick Kyrgios Win a Grand Slam?

Italian translation at settesei.it

Today’s breaking news? Former Wimbledon finalist Mark Philippoussis thinks that Nick Kyrgios can win the Australian Open. Hey, it’s almost the offseason. We take our news wherever we can get it.

Still, it’s an interesting question. Is it possible for such a volatile, one-dimensional player to string together seven wins on one of the biggest stages in the sport? Philippoussis–not the most versatile of players himself–reached two Slam finals. A big serve can take you far.

Last year, I published a post investigating the “minimum viable return game,” the level of return success that a player would need to maintain in order to reach the highest echelon of men’s tennis. It’s rare to finish a season in the top ten without winning at least 38% of return points, though a few players, including Milos Raonic, have managed it. When I wrote that article, Kyrgios’s average for the previous 52 weeks was a measly 31.7%, almost in the territory of John Isner and Ivo Karlovic.

Kyrgios has improved since then. In 2016, he won 35.4% of return points, almost equal to Raonic’s 35.9%–and most would agree that Milos had an excellent year. Philippoussis’s career mark was only 34.9%, though Kyrgios would be lucky to play as many tournaments on grass and carpet as Philippoussis did. Still, a sub-36% rate of return points won isn’t usually good enough in today’s game: Raonic was only the third player since 1991 (along with Pete Sampras and Goran Ivanisevic) to finish a season in the top five with such a low rate.

Then again, Philippoussis didn’t say anything about finishing in the top five. The “minimum viable Slam-winning return game” might be different. Looking at all Grand Slam champions back to 1991, here are the lowest single-tournament rates of return points won:

Year  Slam             Player               RPW%                     
2001  Wimbledon        Goran Ivanisevic    31.1%  
1996  US Open          Pete Sampras        32.8%  
2009  Wimbledon        Roger Federer       33.7%  
2002  US Open          Pete Sampras        35.6%  
2000  Wimbledon        Pete Sampras        36.6%  
2010  Wimbledon        Rafael Nadal        36.8%  
2014  Australian Open  Stan Wawrinka       37.0%  
1998  Wimbledon        Pete Sampras        37.2%  
1991  Wimbledon        Michael Stich       37.4%  
2000  US Open          Marat Safin         37.5%

Wimbledon is well represented here, as we might expect. Not so for Kyrgios’s home Slam: Stan Wawrinka’s 2014 Australian Open title is the only time it appears in the top 20, even though the Melbourne courts have played very fast in recent years. Every other Melbourne titlist won at least 39.5% of return points. As with year-end top-ten finishes, 38% is a reasonable rule of thumb for the minimum viable level, though on rare occasions, it is possible to come in below that.

The bar is set: Can Kyrgios clear it? Eighteen months ago, when Kyrgios’s 52-week return-points-won average was below 32%, the obvious answer would have been no. His current mark above 35% makes the question a more interesting one. To win a Slam, he’ll probably need to return better, but only for seven matches.

The Australian has enjoyed one seven-match streak–in fact, a nine-match run–that would be more than good enough. Combining his title in Marseille and his semifinal showing in Dubai this February, Kyrgios played almost nine matches (he retired with a back injury in the last one) while winning a whopping 41.5% of return points. At 42 of the last 104 Slams, the champion has won return points at a lower rate.

However, February was an aberration. To approximate Kyrgios’s success over the length of a Slam, I looked at his return points won over every possible streak of ten matches. (Most of his matches have been best-of-three, so ten matches is about the same number of points as a Slam title run.) Aside from the streaks involving Marseille and Dubai this year, he has never topped 37% for that length of time.
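Ten-match windows like these can be computed by pooling points across each window, so that longer matches count for more. An alternative is to average the match percentages. The pooling choice is an assumption, because the post doesn't say which approach it used. The input format is hypothetical: one (return points won, return points played) pair per match, in chronological order.

```python
def rolling_return_rate(matches, window=10):
    """Return-points-won rate for every run of `window` consecutive matches.

    `matches` is a chronological list of (points_won, points_played) tuples.
    Points are pooled within each window, so longer matches weigh more.
    """
    rates = []
    won = played = 0
    for i, (w, p) in enumerate(matches):
        won += w
        played += p
        if i >= window:  # drop the match that just left the window
            old_w, old_p = matches[i - window]
            won -= old_w
            played -= old_p
        if i >= window - 1:  # window is full
            rates.append(won / played)
    return rates

# Hypothetical example: twelve identical matches at 35 of 100 return points.
rates = rolling_return_rate([(35, 100)] * 12)
print(len(rates), max(rates))
```

A check such as `max(rates) > 0.37` would then flag runs like the Marseille-Dubai stretch.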

There’s always hope for improvement, especially for a mercurial 21-year-old in a sport dominated by older men. But the evidence is against him here, as well. Research by falstaff78 suggests that players do not substantially improve their return statistics as they mature. That may seem counterintuitive, since some players clearly do develop their skills. However, as players get better, they go deeper in tournaments and alter their schedules, changing the mix of opponents they face. Two years ago, Kyrgios faced seven top-20 players. This year he played 18. Raonic, who represents an optimistic career trajectory for Kyrgios, faced 26 this season.

Against the top 20–the sorts of Grand Slam opponents a player has to beat to get from the fourth round to the trophy ceremony–Kyrgios has won less than 30% of his career return points. Even Raonic, who has yet to win a Slam himself, has done better, winning 32.6% of return points against top-20 opponents this year.

There’s little doubt that Kyrgios has the serve to win Grand Slams. And once the Big Four retire, I suppose someone will have to win the majors. But even in weak eras, you need to break serve, and at Slams, you typically need to do so many times, and against very high-quality opponents. The evidence we have so far strongly implies that Kyrgios, like Philippoussis before him, will struggle to triumph at a Slam.

The Speed of Every Surface, 2016 Edition

More than five years after I first started trying to use ATP match stats to estimate surface speed, the issue remains a contentious one. Most commentators agree that surface speeds have converged and generally gotten slower. The ATP has begun to release a trickle of court speed data, but it raises more questions than answers.

It’s been three years since I’ve published surface speed numbers, so we’re due for an update. Before we do that, it’s important to understand what exactly these figures mean, as well as their limitations.

Court surfaces–and, more broadly, the environments in which pro matches are played–have a variety of characteristics. Some courts are faster or slower and some cause higher or lower bounces. Tournaments use different balls, are played at a range of elevations, and take place in all sorts of weather conditions. All of these factors, and more, affect how matches are played.

Due to the limits of available tennis data, however, we can’t isolate those different factors. It would be great to know which surfaces allowed for the most effective slice approaches or the deadliest drop shots, but we don’t have the data to even begin trying to answer those questions. The Match Charting Project is a step in the right direction, but with only a few hundred men’s matches per year, there isn’t quite enough to compare surfaces while controlling for different players and playing styles.

So we work with what we have. Faster surfaces are more favorable to the server, which shows up in ace counts and service breaks. The ATP publishes those basic stats for every match, so that’s what we’ll use. When I first researched this issue, I discovered that there isn’t much difference between counting aces and counting service breaks, except that there’s a wider variation in ace rates between faster and slower surfaces, so the resulting numbers are easier to understand.

At the risk of repeating myself: Measuring surface speed by ace rate ignores a lot of court characteristics. It is far from complete and certainly imperfect. It does, however, give us an idea of how tournaments compare in one important regard.

Aces, adjusted

That said, simply counting aces–for example, 6.8% of points in Buenos Aires this year and 11.2% of points in Los Cabos–isn’t good enough. Players make scheduling choices based on their strengths and preferences, so the guys who show up for clay court events tend, on average, to be weaker servers than those who play on hard and grass courts. To take an extreme example, Gilles Muller managed to play only two matches on clay this season. As it turns out, the courts in Buenos Aires and Los Cabos had almost identical effects on ace rates–the difference is entirely due to the mix of players in each draw.

So we adjust for the makeup of the field. For every player with at least three tour-level matches on clay and another three on hard or grass, I calculated their season average ace rates on clay and hard/grass, which I then weighted (one-third clay, two-thirds hard/grass) so that the numbers give us an idea of what their ace rate would’ve been had they played an “average” (that is, unbiased by scheduling preferences) season. I’ve lumped hard and grass together here, not because they are the same–of course they’re not–but because the small number of grass court events makes it difficult to treat on its own.
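The field adjustment starts from each player's surface-neutral ace rate. The sketch below applies that weighting with the post's minimum of three matches in each surface group. The function and constant names are hypothetical, and rates are assumed to be aces per service point.

```python
MIN_SURFACE_MATCHES = 3
CLAY_WEIGHT = 1 / 3  # one-third clay, two-thirds hard/grass

def neutral_ace_rate(clay_rate, clay_matches, fast_rate, fast_matches):
    """Blend a player's clay and hard/grass ace rates into one 'average
    season' rate, or return None if either sample is too small."""
    if clay_matches < MIN_SURFACE_MATCHES or fast_matches < MIN_SURFACE_MATCHES:
        return None
    return CLAY_WEIGHT * clay_rate + (1 - CLAY_WEIGHT) * fast_rate

# Hypothetical player: 6% aces on clay, 12% on hard/grass.
print(neutral_ace_rate(0.06, 8, 0.12, 20))
# Hypothetical player with only two clay matches: excluded.
print(neutral_ace_rate(0.09, 2, 0.15, 30))
```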

With player averages in hand, we can go through every match of the season (between players who meet our minimums) and, using their ace rates and the rates at which players hit aces against them, calculate a “predicted” ace rate for the match, given a neutral surface. Then, by comparing the match’s actual ace rate to the neutral prediction, we get one data point regarding the surface’s effect on aces. If the actual ace rate is greater than the prediction, it suggests the surface is faster than average. If the prediction is greater than the ace rate, it implies the surface is slower than average.

No single match can tell us about a court’s tendency, but by aggregating all the matches at an event, we get a fairly good idea. With that final step, we get a single number per event. A neutral surface rates at 1, faster surfaces are greater than 1, and slower surfaces are less than 1. For instance, this algorithm rates the 2016 Paris Masters as 1.18, meaning that there were 18% more aces than we would expect on a neutral surface, rating Bercy as faster than all but 10 other events this season.
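The post doesn't give the formula for combining a server's ace rate with the rate at which the returner concedes aces. The sketch below assumes one common choice: scale the server's rate by the returner's rate relative to a tour average. It also assumes the event rating is total actual aces divided by total predicted aces. That matches the post's reading of 1.18 as "18% more aces than expected," but it may not be the exact aggregation used. All names and numbers are hypothetical.

```python
from dataclasses import dataclass

TOUR_AVG_ACE_RATE = 0.09  # hypothetical tour-wide aces per service point

@dataclass
class ServeSide:
    service_points: int
    aces: int
    server_rate: float       # server's neutral ace rate
    returner_allowed: float  # returner's neutral rate of conceding aces

def predicted_aces(side):
    """Assumed multiplicative model: scale the server's rate by how
    ace-prone the returner is relative to the tour average."""
    rate = side.server_rate * side.returner_allowed / TOUR_AVG_ACE_RATE
    return side.service_points * rate

def event_rating(matches):
    """Actual aces over predicted aces across every serving side at an event.
    1.0 is neutral; above 1 suggests a faster court, below 1 a slower one."""
    actual = sum(s.aces for match in matches for s in match)
    expected = sum(predicted_aces(s) for match in matches for s in match)
    return actual / expected

# One hypothetical match; both returners concede aces at the tour average.
match = (
    ServeSide(service_points=100, aces=12, server_rate=0.10, returner_allowed=0.09),
    ServeSide(service_points=80, aces=6, server_rate=0.05, returner_allowed=0.09),
)
print(round(event_rating([match]), 3))
```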

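Whew! Here are the ace-based surface ratings for every current tour-level event over the last three seasons, listed from fastest to slowest by 2016 rating. The “2016 Ace%” column is the raw share of points that ended in aces; the last three columns are the adjusted ratings for 2016, 2015, and 2014: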

Tournament            Surface  2016 Ace%  2016  2015  2014  
Shenzhen                 Hard      12.9%  1.54  1.20  1.49  
Quito                    Clay      11.9%  1.50  0.89        
Metz                     Hard      12.6%  1.43  1.28  1.37  
Marseille                Hard      15.3%  1.38  1.28  1.26  
Stuttgart               Grass      13.3%  1.38  1.32  0.89  
Chengdu                  Hard      11.7%  1.27              
Australian Open          Hard      12.3%  1.25  1.19  1.12  
Queen's Club            Grass      14.3%  1.25  1.27  1.26  
Washington               Hard      19.5%  1.24  1.12  1.25  
Cincinnati Masters       Hard      14.2%  1.18  1.04  1.17  
Paris Masters            Hard      13.7%  1.18  1.03  1.03  
Brisbane                 Hard      12.2%  1.16  1.20  1.23  
Canada Masters           Hard      12.6%  1.16  1.08  1.00  
Halle                   Grass      12.2%  1.16  1.12  1.31  
Nottingham              Grass      12.0%  1.15  1.21        
Gstaad                   Clay      10.1%  1.12  0.84  0.77  
Basel                    Hard      10.1%  1.12  1.01  1.20  
Tokyo                    Hard      11.5%  1.12  1.00  1.06  
Chennai                  Hard      10.3%  1.12  0.91  0.65  
Auckland                 Hard      12.9%  1.11  1.21  1.01  
Doha                     Hard       8.8%  1.11  1.06  0.83  
Sydney                   Hard      10.5%  1.11  1.32  1.27  
Montpellier              Hard       9.7%  1.10  1.29  1.29  
Shanghai Masters         Hard      10.7%  1.10  1.05  1.34  
Kitzbuhel                Clay       6.9%  1.09  0.85  0.81  
's-Hertogenbosch        Grass      13.2%  1.08  1.06  1.05  
Winston-Salem            Hard      10.4%  1.07  1.33  1.10  
Newport                 Grass      11.0%  1.07  1.26  1.23  
Tour Finals              Hard       9.5%  1.06  0.99  0.89  
Wimbledon               Grass      11.8%  1.06  1.20  1.35  
Rotterdam                Hard       9.8%  1.04  1.19  1.08  
Vienna                   Hard      11.8%  1.02  1.39  1.26  
Memphis                  Hard       8.7%  1.00  1.19  0.94  
Miami Masters            Hard      10.0%  1.00  0.86  1.04  
Sofia                    Hard       8.4%  1.00              
Beijing                  Hard       9.4%  0.99  1.05  0.81  
Atlanta                  Hard      15.5%  0.97  1.35  0.90  
St. Petersburg           Hard       8.1%  0.97  0.98        
Marrakech                Clay       8.5%  0.95              
Olympics                 Hard       7.1%  0.95              
Moscow                   Hard       6.6%  0.94  1.08  1.12  
Antwerp                  Hard       8.6%  0.93              
Delray Beach             Hard       9.2%  0.92  0.88  0.93  
US Open                  Hard       8.9%  0.91  1.10  1.10  
Dubai                    Hard       9.4%  0.88  0.93  0.81  
Madrid Masters           Clay       8.6%  0.86  0.85  0.94  
Los Cabos                Hard      11.2%  0.85              
Buenos Aires             Clay       6.8%  0.85  0.78  0.64  
Houston                  Clay      11.5%  0.84  0.76  0.70  
Sao Paulo                Clay       7.1%  0.83  1.03  1.20  
Acapulco                 Hard      10.5%  0.83  0.67  0.98  
Indian Wells Masters     Hard       8.2%  0.83  0.99  0.90  
Stockholm                Hard       7.6%  0.82  1.13  1.15  
Rio de Janeiro           Clay       7.4%  0.81  0.80  0.77  
Estoril                  Clay       7.4%  0.80  0.63  0.62  
Nice                     Clay       6.3%  0.79  0.64  0.74  
Geneva                   Clay       8.3%  0.77  0.78        
Umag                     Clay       5.4%  0.77  0.67  0.76  
Roland Garros            Clay       7.6%  0.77  0.72  0.71  
Rome Masters             Clay       7.2%  0.76  0.94  0.74  
Bucharest                Clay       5.9%  0.71  0.59  0.51  
Munich                   Clay       6.3%  0.71  1.01  0.87  
Monte Carlo Masters      Clay       6.2%  0.70  0.63  0.64  
Istanbul                 Clay       5.7%  0.67  0.83        
Barcelona                Clay       5.4%  0.65  0.70  0.72  
Bastad                   Clay       5.3%  0.65  0.64  1.07  
Hamburg                  Clay       5.7%  0.60  0.62  0.79

As usual, we have an interesting mix of usual suspects and surprises. The top of the list is primarily indoor hard and grass courts, along with the high-altitude clay in Quito and Gstaad. However, in both of the latter cases, those tournaments had lower-than-expected ace rates in 2015. The surface ratings for 250s are particularly volatile: in addition to the small number of matches, many of those matches must be discarded because one or both of the players didn’t meet our minimums. For the 2015 Quito event, we have only 11 matches to work with.

The sample size problem doesn’t apply to larger events, however, so we can have a fair amount of confidence in the ratings for the Australian Open, showing up here as the fastest of the Grand Slams–considerably faster than Wimbledon, which is only a few ticks above neutral.

Ace ratings and Court Pace Index

Last month, TennisTV released some data on court speed for this season’s Masters events. Court Pace Index (CPI) is a commonly accepted measure of the speed of the surface itself–that is, the physical makeup of the court. As I’ve said, that’s far from the only factor affecting how a court plays, but it is an important one.


Here’s how my surface ratings compare to CPI:

Tournament            Surface  TA Rating   CPI  
Cincinnati Masters       Hard       1.18  35.1  
Paris Masters            Hard       1.18  39.1  
Canada Masters           Hard       1.16  35.2  
Shanghai Masters         Hard       1.10  44.1  
Tour Finals              Hard       1.06  40.6  
Miami Masters            Hard       1.00  33.1  
Madrid Masters           Clay       0.86  22.5  
Indian Wells Masters     Hard       0.83  30.0  
Rome Masters             Clay       0.76  24.0  
Monte Carlo Masters      Clay       0.70  23.7

It’s noteworthy that Madrid is, by my measure, the most ace-friendly of the three clay-court Masters, while its CPI is the lowest. Altitude could account for the difference.

The biggest mismatch, though, is the Tour Finals. The O2 Arena has one of the highest CPIs, but it doesn’t rate very far above average in aces. The Tour Finals has always been a bit problematic, as there is an unusually small number of matches, and the level of returning is very, very high. My algorithm takes into account how well each player prevents aces, but perhaps that issue is more complex when our view is limited to only the very best players.
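A Pearson correlation across the ten events in the comparison table puts a single number on how closely the two measures agree. The post doesn't compute this; it is one option among several. The sketch below uses the table's values directly.

```python
from math import sqrt

# (TA rating, CPI) for the 2016 events in the comparison table.
events = {
    "Cincinnati": (1.18, 35.1), "Paris": (1.18, 39.1),
    "Canada": (1.16, 35.2), "Shanghai": (1.10, 44.1),
    "Tour Finals": (1.06, 40.6), "Miami": (1.00, 33.1),
    "Madrid": (0.86, 22.5), "Indian Wells": (0.83, 30.0),
    "Rome": (0.76, 24.0), "Monte Carlo": (0.70, 23.7),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ratings, cpis = zip(*events.values())
r = pearson(ratings, cpis)
print(round(r, 2))
```

With these ten rows, the coefficient comes out around 0.8. That is a strong relationship, but not a tight enough one to treat ace rate as a stand-in for CPI at individual events like Madrid or the Tour Finals.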

TennisTV also showed CPI for the last several years of Tour Finals:


Compared to my ratings:

Year  TA Rating   CPI  
2016       1.06  40.6  
2015       0.99  34.0  
2014       0.89  33.6  
2013       0.90  32.8  
2012       1.18  33.9

If the table cut off after 2013, it would look like a relatively good fit. As it is, the relationship between CPI and my rating for 2012 wouldn’t be out of place in the previous table, which included a 35.1 CPI for Cincinnati to go with an ace-based rating of 1.18.

I hope that this is a sign of more data to come. If so, we can move beyond approximations based on ace rate to get a better sense of what factors influence play at the ATP level. More data won’t settle the age-old surface speed debates, but it will make them a whole lot more interesting.

Andy Murray and the Longest Break-Per-Match Streaks

Italian translation at settesei.it

Among Andy Murray’s many accomplishments in 2016, he achieved an impressive–though obscure–feat. In each one of his 87 matches, he broke serve at least once. He has broken at least once per match since failing to do so against Roger Federer in the 2015 Cincinnati semifinals, for an active streak of 107 matches.

Where does that place him among the greats of men’s tennis? Just how unusual is it to break serve in every match for an entire season? As is the case with too many tennis statistics, we don’t know. Someone finds an impressive-sounding stat, and that’s the end of the story. We can’t always fix that, but in this case, we can add some context to Murray’s accomplishment.

Full break-per-match seasons

I’ve collected break stats for matches back to 1991, though we need to keep in mind that there are some mistakes in the 1990s data. Further, Davis Cup presents a problem, as it is excluded entirely. Sometimes we can tell from the scoreline that a player broke serve–as with all of Murray’s Davis Cup matches this year–but often we cannot. I’ll have more to say about that in specific cases below.

Since 1991, there have been at least 14, and perhaps as many as 20, instances in which a player broke serve in every match of a season. (Minimum 40 tour-level matches, and I’ve excluded retirements when calculating both minimums and the streaks themselves.) I say “instances” because several players–Andre Agassi, Lleyton Hewitt, Rafael Nadal, and Nikolay Davydenko–pulled it off more than once. Hewitt’s 2001 season had the most matches–95–of any of them, followed by Murray’s 2016 and Nadal’s 2005, at 87 each.

Here is the complete list:

Player                  Season  Matches  (Unsure)  
Andy Murray               2016       87         0  
Juan Monaco               2014       41         0  
Novak Djokovic            2013       83         0  
Rafael Nadal              2010       79         0  
Nikolay Davydenko         2008       73         0  
Nikolay Davydenko         2007       82         0  
Lleyton Hewitt            2006       46         0  
Rafael Nadal              2005       87         0  
David Nalbandian          2005       63         0  
Andre Agassi              2003       55         0  
Lleyton Hewitt            2001       95         0  
Lleyton Hewitt            2000       76         1  
Hernan Gumy               1997       53         1  
Alex Corretja             1997       67         0  
Andre Agassi              1995       81         0  
Magnus Gustafsson         1994       40         0  
Carlos Costa              1992       60         0  
Guillermo Perez Roldan    1991       40         2  
Ivan Lendl                1991       72         0  
Boris Becker              1991       61         2

(The “Unsure” column indicates how many matches are missing stats and may not have included a break of serve.)
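Lists like this can be built from per-match break counts. The sketch below uses hypothetical field names. Each match records its breaks of serve (None when stats are missing) and whether it ended in a retirement. Following the post, retirements are dropped before the 40-match minimum is applied, and missing-stat matches are tallied as "unsure" rather than counted as misses.

```python
MIN_SEASON_MATCHES = 40

def break_every_match_season(matches):
    """Return (matches, unsure) if the player broke serve in every counted
    match of the season, else None.

    `matches` is a list of dicts with keys 'breaks' (int, or None when
    stats are missing) and 'retirement' (bool).
    """
    counted = [m for m in matches if not m["retirement"]]
    if len(counted) < MIN_SEASON_MATCHES:
        return None
    unsure = 0
    for m in counted:
        if m["breaks"] is None:
            unsure += 1      # can't tell whether he broke
        elif m["breaks"] == 0:
            return None      # shut out at least once
    return len(counted), unsure

# Hypothetical season: 45 completed matches with breaks, one with missing
# stats, and one retirement that is ignored.
season = ([{"breaks": 3, "retirement": False}] * 45
          + [{"breaks": None, "retirement": False}]
          + [{"breaks": 0, "retirement": True}])
print(break_every_match_season(season))
```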

Several more players came close. Federer broke serve in all but one match in three separate seasons. Agassi, Novak Djokovic, David Ferrer, and Thomas Muster all did so twice.

We shouldn’t be surprised that so many players–especially the greats–have broken so often. It’s very rare to win a match without breaking serve: Of the 2,570 ATP tour-level matches from this season for which I have match stats, the winner broke serve in all but 30 of them. Even losers break serve in more than two out of every three matches: In 2016, the loser broke serve in 1,843 of the 2,570 matches, 72% of the time.

Still, there are enough dominant servers on tour that it is difficult to last an entire season without being shut out of the break column. In 1995, Muster broke serve in 99 matches, but failed to do so when he drew the big-serving (and completely unheralded) qualifier TJ Middleton on the carpet in St. Petersburg. Murray’s current streak is all the more impressive because, in his 107 matches, he has faced Milos Raonic six times, John Isner four times, Kevin Anderson and Nick Kyrgios twice each, and Ivo Karlovic once. Given the chance, he probably would’ve broken TJ Middleton as well.

Break-per-match streaks

For Murray to surpass the longest streaks in this category, it will take several more months of high-quality returning. As we saw above, Davydenko and Hewitt may have gone two full years breaking serve in every match they played. In both cases, the lack of ITF data makes their records unclear, but regardless of those details, Davydenko has set an extremely high bar.

Here are all the break-per-match streaks of 100 or more matches since 2000:

Player             Start   End  Streak  Possible  
Nikolay Davydenko   2006  2009     159       182  
Rafael Nadal        2004  2006     156            
Rafael Nadal        2009  2011     146            
Andre Agassi        2002  2004     143            
Novak Djokovic      2012  2014     127            
Lleyton Hewitt      1999  2002     124       230  
Andy Murray         2015  2016     107         ∞  
David Nalbandian    2004  2006     104

This season, Murray didn’t play his 53rd match until August at the Olympics; he’ll need to break serve at least once in that many matches to reach the top of this list.
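The 53-match target is a one-line calculation: to stand alone at the top, Murray’s streak has to exceed the record, not merely equal it. A small Python helper (the function name is illustrative):

```python
def matches_to_pass(record: int, current: int) -> int:
    """Further consecutive matches with a break needed to exceed `record` outright."""
    return max(record - current + 1, 0)

print(matches_to_pass(159, 107))  # Davydenko's streak as listed in the table
print(matches_to_pass(182, 107))  # the 182-match version of Davydenko's streak
```

Against the 159-match figure, Murray needs 53 more matches; against the 182-match version, he would need 76.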

The exact length of Davydenko’s streak hinges on his 2008 Davis Cup semifinal match against Juan Martin del Potro, which he lost in straight sets. If he broke serve in that match, his streak stretched into early 2009, spanning 182 matches.

(Edit: Thanks to Andrew Moss, we now know that Davydenko did break serve in that match, according to this contemporaneous report.)

Hewitt’s best streak is even more unclear. I don’t have break stats for his 6-3 6-3 loss to Max Mirnyi at the 2000 Olympics. If he didn’t break Mirnyi–a definite possibility, given The Beast’s serving prowess–the streak is “only” 124 matches. If he did, the streak is at least 187, and the exact length depends on more unknowns, including both of his singles matches in the 1999 Davis Cup final against France.

(Edit #2! Thanks to Carl, we know that Hewitt broke Mirnyi, so his streak is at least 187 matches. The next issue is his last match of the 1999 season, a dead rubber against Sebastian Grosjean in that year’s Davis Cup final. Hewitt lost 6-4 6-3, but Grosjean was hardly an overpowering server. Hewitt lost his previous Davis Cup match in straight sets as well, a live rubber against Cedric Pioline, and a match report establishes that Hewitt broke serve. If he broke Grosjean, the streak stretches back to April 1999, and numbers the full 230 matches.)

In any case, Murray has already earned himself a place among the greatest returners in modern tennis. In 2017, we’ll see just how far he can climb this list.

Why Novak Djokovic is Still Number One

Italian translation at settesei.it

Two weeks ago, Andy Murray took over the ATP #1 ranking from Novak Djokovic. Yesterday, he defeated Djokovic in their first meeting since June, securing his place at the top of the year-end ranking table. Murray has been outstanding in the second half of this season, winning all but three of his matches since the Roland Garros final, and he capped the year in style, beating four top-five players to claim the title at the World Tour Finals.

Despite all that, Murray is not the best player in the world. That title still belongs to Djokovic. Since June, Murray has closed the gap, establishing himself as part of what we might call the “Big Two,” but he hasn’t quite ousted his rival. There’s no question that Murray has played better over this period. That sort of thing is occasionally debatable, but this season it’s just historical fact. Identifying the best player, however, implies something more predictive, and that is much more difficult to determine by simply looking over a list of recent results.

The ATP rankings generally do a good job of telling us which players are better than others. But the official system has two major problems: It ignores opponent quality, and it artificially limits its scope to the last 52 weeks. Pundits and fans tend to have different problems: They often give too much credit to opponent quality (“He beat Djokovic, so now he’s number one!”) and exhibit an even more extreme recency bias (“He’s looked unbeatable this week!”).

Two systems that avoid these issues–Elo and Jrank–both place Djokovic comfortably ahead of Murray. These algorithms handle the details of recent matches and opponent quality differently from each other, but what they have in common is more important: They consider opponent quality, and they don’t use an arbitrary time cutoff like the ATP ranking system does.

Here’s how the three methods would forecast a Djokovic-Murray match, were it held today:

  • ATP: Murray favored, 51.6% chance of winning
  • Elo: Djokovic favored, 61.6% chance of winning
  • Jrank: Djokovic favored, 57.0% chance of winning

Betting markets favored Djokovic by a margin of slightly more than 60/40 yesterday, though bettors probably gave him some of that edge because they thought Murray would be fatigued after his marathon match on Saturday.

As I wrote last week, Elo doesn’t deny that Murray has had a tremendous half-season. Instead, it gives him less credit than the official algorithm does for victories over lesser opponents (such as John Isner in the Paris Masters final), and it recognizes that he started his current run of form at an enormous disadvantage. With his title in London, Murray reached a new peak Elo rating, but it still isn’t enough to overtake Djokovic.

Even though Elo still prefers Novak by a healthy margin, it reflects how much the situation at the top of the ranking list has changed. At the beginning of 2016, Elo gave Djokovic a 76.5% chance of winning a head-to-head against Murray, and that probability rose as high as 81% in April. It fell below 70% after the Olympics, and the gap is now the smallest it has been since February 2011.

Last week illustrates how difficult it will be for Murray to take over the #1 place in the Elo rankings. The pre-tournament Elo difference of 91 points between the two players has shrunk by only 8%, to 84 points. Murray’s win yesterday was worth a bit more than a measly seven points, but Djokovic had several opportunities to nudge his rating upwards in his first four matches, as well. Despite some of Novak’s head-scratching losses this fall, he still wins most of his matches–some of them against very good players–slowing the decline of his Elo rating.
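Elo turns a rating gap into a win probability. Here is a minimal Python sketch, assuming the conventional logistic curve with a 400-point scale; the post doesn’t spell out the exact formula behind its ratings, so treat this as an approximation.

```python
def elo_win_prob(gap: float) -> float:
    """Chance that the higher-rated player wins, given the Elo gap in points.

    Assumes the standard logistic form with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** (-gap / 400))

for gap in (91, 84, 250):
    print(f"{gap:>3}-point gap: {elo_win_prob(gap):.1%}")

# Relative shrinkage of the Djokovic-Murray gap over the week
print(f"gap shrank by {(91 - 84) / 91:.1%}")
```

Under this assumption, the 91-point gap works out to roughly 63% and the 84-point gap to about 62%, close to the 61.6% quoted above; small differences presumably come from details of the ratings that this sketch doesn’t model. A 250-point gap, like Djokovic’s April lead over Murray, gives about 81%. The gap shrank by about 7.7%, which rounds to the 8% in the text.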

Of course, Elo is just a measuring stick–like any ranking system, it doesn’t tell us what’s really happening on court. It’s possible that Murray has made a significant (and semi-permanent) leap forward or that Djokovic has taken a major step back. On the other hand, streaks happen even without such leaps, and they always end. The smart money is usually on small, gradual changes to the status quo, and Elo gives us a way to measure those changes.

For Elo to rate Murray ahead of Djokovic, it will probably require several more months of these gradual changes. The only faster alternative is for Djokovic to start losing more matches to the likes of Jiri Vesely and Sam Querrey. When faced with dramatic evidence, Elo makes more dramatic changes. While Djokovic has occasionally provided that evidence this season, he has usually offered enough proof–like four wins at the World Tour Finals–to comfortably maintain his position at the top.

Factchecking the History of the ATP Number One With Elo

Italian translation at settesei.it

As I wrote at The Economist this week, Andy Murray might sit atop the ATP rankings, but he probably isn’t the best player in tennis right now. That honor still belongs to Novak Djokovic, who comes in higher on the Elo ranking list, which uses an algorithm that is more predictive of match outcomes than the ATP table.

This isn’t the first time Elo has disagreed with the official rankings over the name at the top. Of the 26 men to have reached the ATP number one ranking, only 18 also became number one on the Elo list. A 19th player, Guillermo Coria, was briefly Elo #1 despite never achieving the same feat on the ATP rankings.

Four of the remaining eight players–Murray, Patrick Rafter, Marcelo Rios, and John Newcombe–climbed as high as #2 in the Elo rankings, while the last four–Thomas Muster, Carlos Moya, Marat Safin, and Yevgeny Kafelnikov–only got as high as #3. Moya and Kafelnikov are extreme cases of the rankings mismatch, as neither player spent even a single full season inside the Elo top five.

By any measure, though, Murray has spent a lot of time close to the top spot. What makes his current ascent to the #1 spot so odd is that in the past, Elo thought he was much closer. Despite his outstanding play over the last several months, there is still a 100-point Elo gap between him and Djokovic. That’s a lot of space: Most of the field at the WTA Finals in Singapore this year was within a little more than a 100-point range.

January 2010 was the Brit’s best shot. At the end of 2009, Murray, Djokovic, and Roger Federer were tightly packed at the top of the Elo leaderboard. In December, Murray was #3, but he trailed Fed–and the #1 position–by only 25 points. In January, Novak took over the top spot, and Murray closed to within 16 points–a small enough margin that one big upset could make the difference. Altogether, Murray has spent 63 weeks within 100 points of the Elo top spot, none of those since August 2013.

For most of the intervening three-plus years, Djokovic has been steadily setting himself apart from the pack. He reached his career Elo peak in April of this season, opening up a lead of almost 200 points over Federer, who was then #2, and 250 points over Murray. Since Roland Garros, Murray has closed the gap somewhat, but his lack of opportunities against highly-rated players has slowed his climb.

If Murray defeats Djokovic in the final this week in London, it will make the debate more interesting, not to mention secure the year-end ATP #1 ranking for the Brit. But it won’t affect the Elo standings. When two players have such lengthy track records, one match doesn’t come close to eliminating a 100-point gap. Novak will end the season as Elo #1, and he is in a strong position to stay there well into 2017.

Dominic Thiem and the Best Deciding-Sets Seasons in ATP History

Italian translation at settesei.it

Yesterday at the ATP World Tour Finals, Dominic Thiem won a three-set match against Gael Monfils, his 22nd deciding-set victory of 2016. Despite losing to Novak Djokovic in three sets on Sunday, Thiem is enjoying one of the best deciding-set seasons in ATP history.

The loss to Djokovic was only Thiem’s third in 25 deciding sets this year. He began the season with 14 consecutive deciding-set wins, including back-to-back third-set tiebreaks in Buenos Aires against Rafael Nadal and Nicolas Almagro. He strung together another seven straight between May and September, including a grass-court upset of Roger Federer in Stuttgart.

Among players who contested at least 20 deciding sets in a season, Thiem’s winning percentage of 88% is the fifth-best record in the ATP’s modern era. Not every player reaches the 20-decider threshold–some, like Djokovic, avoid it by winning most of their matches in straight sets–but it’s no statistical oddity. There have been nearly 1,000 player-seasons with at least 20 deciders since the 1970s, including Andy Murray’s 17-6 record in 2016.

Outstanding single-season deciding-set records don’t guarantee long-term success. Thiem appears on this list amid a mix of famous and lesser-known names, from Federer to Onny Parun:

Player           Year  Deciders  Wins  Win Perc  
Mario Ancic      2006        24    22     91.7%  
Ilie Nastase     1971        23    21     91.3%  
Tom Okker        1974        20    18     90.0%  
Roger Federer    2006        20    18     90.0%  
Dominic Thiem    2016        25    22     88.0%  
Kei Nishikori    2014        24    21     87.5%  
Stan Smith       1972        22    19     86.4%  
Joakim Nystrom   1984        22    19     86.4%  
Guillermo Vilas  1977        29    25     86.2%  
Onny Parun       1975        34    29     85.3%
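The leaderboard is a filter (at least 20 deciders) followed by a sort on winning percentage. Here is a minimal Python sketch using only the player-seasons mentioned in this post, so it is a small sample rather than the full set of nearly 1,000 qualifying seasons.

```python
# (player, year, deciders played, deciders won), from the table and text above.
SEASONS = [
    ("Mario Ancic", 2006, 24, 22),
    ("Ilie Nastase", 1971, 23, 21),
    ("Tom Okker", 1974, 20, 18),
    ("Roger Federer", 2006, 20, 18),
    ("Dominic Thiem", 2016, 25, 22),
    ("Kei Nishikori", 2014, 24, 21),
    ("Onny Parun", 1975, 34, 29),
    ("Andy Murray", 2016, 23, 17),
    ("Grigor Dimitrov", 2014, 22, 18),
    ("Grigor Dimitrov", 2015, 21, 11),
    ("Pete Sampras", 1996, 24, 20),  # derived from "83% of his 24 deciders"
    ("Yevgeny Kafelnikov", 1996, 40, 28),
    ("Dmitry Tursunov", 2006, 37, 20),
]

def leaderboard(seasons, min_deciders=20):
    """Rank player-seasons with at least `min_deciders` deciding sets by win rate."""
    eligible = [s for s in seasons if s[2] >= min_deciders]
    # sorted() is stable, so ties (Okker and Federer at 90%) keep their input order.
    return sorted(eligible, key=lambda s: s[3] / s[2], reverse=True)

for rank, (player, year, played, won) in enumerate(leaderboard(SEASONS), start=1):
    print(f"{rank:>2}. {player:<20}{year}  {won}-{played - won}  {won / played:.1%}")
```

Raising `min_deciders` shows how few seasons involve truly heavy deciding-set workloads: at 30 or more, only Parun, Kafelnikov, and Tursunov remain in this sample.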

Parun’s 1975 season is particularly notable, as no other player has won so many deciding sets in a single year. In 1996, Yevgeny Kafelnikov came close, winning 28. One gets the idea he was trying: He played 105 matches that year, 40 of which went the distance. In more recent years, big names have played more limited schedules, and Thiem is the only active player to win at least 22 deciding sets in a single season. Dmitry Tursunov gave it a shot in 2006, playing 37 deciders, but he won only 20.

Like so many tennis stats, this one can be fluky. For every Kei Nishikori–who has won an incredible 77% of deciding sets at tour level, including some record-setting streaks–there is a Grigor Dimitrov, who won 18 of 22 deciding sets in 2014, then barely broke even the following year, claiming only 11 of 21. Of the 27 players who have posted a 20-decider, 80% winning percentage season, not a single one managed an 80% winning percentage the following year.

For all of his talents, Thiem probably won’t follow in Nishikori’s footsteps. The Austrian won only half of his 40 deciding sets before this season. But a more modest record in these matches is hardly insurmountable. In 1996, Pete Sampras put together his best deciding-sets record, winning 83% of his 24 deciders. The following year, his record fell to a pedestrian 56%, which didn’t keep him from winning two Grand Slams and finishing the season at the top of the rankings.

If Thiem is to continue climbing the rankings, he’ll be better off taking Djokovic’s path, winning most of his deciding sets, but playing them much less frequently. In the last decade, Novak has played 20 deciding sets in a season only three times, and he has only gone the distance 10 times in 2016. Even Nishikori would have to agree: Djokovic’s method is working just fine.

Forecasting the 2016 ATP World Tour Finals

Italian translation at settesei.it

Andy Murray is the #1 seed this week in London, but as I wrote for The Economist, Novak Djokovic likely remains the best player in the world. According to my Elo ratings, he would have a 63% chance of winning a head-to-head match between the two. And with the added benefit of an easier round-robin draw, the math heavily favors Djokovic to win the tournament.

Here are the results of a Monte Carlo simulation of the draw:

Player        SF      F      W  
Djokovic   95.3%  73.9%  54.6%  
Murray     86.3%  58.3%  29.7%  
Nishikori  60.4%  24.9%   7.8%  
Raonic     50.9%  16.3%   3.3%  
Wawrinka   29.4%   7.8%   1.6%  
Monfils    33.2%   8.7%   1.4%  
Cilic      23.9%   5.8%   1.1%  
Thiem      20.7%   4.1%   0.5%
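The post doesn’t publish its simulation code or the Elo ratings behind it. Here is a minimal Python sketch of how a Monte Carlo forecast of this format could work, assuming Elo-based match probabilities, two four-player round-robin groups whose top two advance, and semifinals that pair each group winner with the other group’s runner-up. The ratings and player labels are hypothetical placeholders, and ties on wins are broken at random rather than by the ATP’s actual tiebreak rules.

```python
import random
from collections import Counter

# HYPOTHETICAL ratings for illustration only; not the author's Elo values.
RATINGS = {"A1": 2300, "A2": 2100, "A3": 2050, "A4": 2000,
           "B1": 2200, "B2": 2150, "B3": 2080, "B4": 2050}
GROUP_A = ["A1", "A2", "A3", "A4"]
GROUP_B = ["B1", "B2", "B3", "B4"]

def win_prob(r_a, r_b):
    """Elo win probability for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def play(a, b, rng):
    """Simulate one match and return the winner."""
    return a if rng.random() < win_prob(RATINGS[a], RATINGS[b]) else b

def group_stage(group, rng):
    """Play a round robin; return (group winner, runner-up)."""
    wins = {p: 0 for p in group}
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            wins[play(a, b, rng)] += 1
    # Simplification: ties on wins are broken at random, not by ATP tiebreak rules.
    order = sorted(group, key=lambda p: (wins[p], rng.random()), reverse=True)
    return order[0], order[1]

def simulate(n=20000, seed=1):
    """Return {player: (P(semifinal), P(final), P(title))}."""
    rng = random.Random(seed)
    sf, final, title = Counter(), Counter(), Counter()
    for _ in range(n):
        a1, a2 = group_stage(GROUP_A, rng)
        b1, b2 = group_stage(GROUP_B, rng)
        sf.update((a1, a2, b1, b2))
        # Each group winner meets the other group's runner-up.
        f1, f2 = play(a1, b2, rng), play(b1, a2, rng)
        final.update((f1, f2))
        title[play(f1, f2, rng)] += 1
    return {p: (sf[p] / n, final[p] / n, title[p] / n) for p in GROUP_A + GROUP_B}

if __name__ == "__main__":
    results = simulate()
    print(f"{'Player':<8}{'SF':>8}{'F':>8}{'W':>8}")
    for p, (s, f, w) in sorted(results.items(), key=lambda kv: -kv[1][2]):
        print(f"{p:<8}{s:>8.1%}{f:>8.1%}{w:>8.1%}")
```

Each column sums to the number of slots it represents (4 semifinalists, 2 finalists, 1 champion), a handy sanity check on any such table. Moving players between `GROUP_A` and `GROUP_B` mimics the group-swap experiment later in this post; a faithful version would also need the real tiebreak rules and ratings.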

I don’t think I’ve ever seen a player favored so heavily to progress out of the group stage. Murray’s 86% chance of doing so is quite high in itself; Novak’s 95% is otherworldly. His head-to-heads against the other players in his group are backed up by major differences in Elo points–Dominic Thiem is a lowly 15th on the Elo list, given only a 7.4% chance of beating the Serb.

If Milos Raonic is unable to compete, Djokovic’s chances climb even higher. Here are the probabilities if David Goffin takes Raonic’s place in the bracket:

Player        SF      F      W  
Djokovic   96.8%  75.2%  55.4%  
Murray     86.2%  60.7%  30.6%  
Nishikori  60.7%  26.3%   8.1%  
Monfils    47.7%  12.4%   1.8%  
Wawrinka   29.3%   8.5%   1.7%  
Cilic      23.8%   6.2%   1.1%  
Thiem      29.5%   5.8%   0.7%  
Goffin     26.0%   4.9%   0.5%

The luck of the draw was on Novak’s side. I ran another simulation with Djokovic and Murray swapping groups. Here, Djokovic is still heavily favored to win the tournament, but Murray’s semifinal chances get a sizable boost:

Player        SF      F      W  
Djokovic   92.8%  75.1%  54.9%  
Murray     90.9%  58.1%  29.8%  
Nishikori  58.4%  26.9%   7.5%  
Raonic     52.3%  14.3%   3.3%  
Wawrinka   26.9%   8.4%   1.6%  
Monfils    35.3%   7.5%   1.4%  
Cilic      21.9%   6.2%   1.0%  
Thiem      21.6%   3.4%   0.5%

Elo rates Djokovic so highly that he is favored no matter what the draw. But the draw certainly helped.

Doubles!

I’ve finally put together a sufficient doubles dataset to generate Elo ratings and tournament forecasts for ATP doubles. While I’m not quite ready to go into detail, I can say that the resulting forecasts, which use the Elo algorithm and rate players individually, outperform the ATP rankings about as much as singles Elo ratings do.
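The post doesn’t explain how individual ratings combine into a team forecast. As a minimal sketch, assuming (hypothetically) that a team’s strength is the average of its two players’ Elo ratings and that both partners share each rating change, an update could look like this in Python. The K-factor and the player labels are placeholders.

```python
K = 32  # HYPOTHETICAL K-factor; the post does not state one

def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score for a side rated r_a against a side rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_doubles(ratings, team_a, team_b, a_won, k=K):
    """Update individual ratings after a doubles match; return team_a's change.

    Assumption: a team's rating is the mean of its players' ratings, and
    both partners receive the same adjustment.
    """
    r_a = sum(ratings[p] for p in team_a) / len(team_a)
    r_b = sum(ratings[p] for p in team_b) / len(team_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    for p in team_a:
        ratings[p] += delta
    for p in team_b:
        ratings[p] -= delta
    return delta

ratings = {"P1": 1600, "P2": 1500, "P3": 1550, "P4": 1550}  # hypothetical players
update_doubles(ratings, ("P1", "P2"), ("P3", "P4"), a_won=True)
print(ratings)
```

Because each player is rated separately, a new pairing starts from its members’ histories rather than from scratch. That is one plausible reason individual ratings could help with doubles, where partnerships change often.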

Here is the forecast for the doubles event at the World Tour Finals:

Team               SF      F      W  
Herbert/Mahut   76.4%  49.5%  32.1%  
Bryan/Bryan     68.7%  36.8%  19.9%  
Kontinen/Peers  55.7%  29.1%  13.8%  
Dodig/Melo      58.4%  28.1%  13.2%  
Murray/Soares   48.3%  20.8%   8.6%  
Lopez/Lopez     37.7%  16.4%   6.2%  
Klaasen/Ram     30.2%  11.9%   4.0%  
Huey/Mirnyi     24.6%   7.3%   2.2%

This distribution is more like what round-robin forecasts usually look like, without a massive gap between the top of the field and the rest. Pierre-Hugues Herbert and Nicolas Mahut are the top-rated team, followed closely by Bob Bryan and Mike Bryan. Max Mirnyi was, at his peak, one of the highest Elo-rated doubles players, but his pairing with Treat Huey is the weakest of the bunch.

The men’s doubles bracket has some legendary names, along with some players–like Herbert and Henri Kontinen–who may develop into all-time greats, but it has no competitors who loom over the rest of the field like Murray and Djokovic do in singles.

How To Keep Round Robin Matches Interesting, Part Two

Italian translation at settesei.it

Earlier this week, I published a deep dive into the possible outcomes of four-player round robin groups and offered an ideal schedule that would minimize the likelihood of dead rubbers on the final day. I’ve since heard from a few readers who pointed out two things:

  1. You might do better if you determined the schedule for day two after getting the results of the first two matches.
  2. Major tournaments such as the ATP and WTA Tour Finals already do this, pairing the winners of the first two matches and the losers of the first two matches on day two.

This is an appealing idea. You’re guaranteed to end the second day with one undefeated (2-0) player, two competitors at 1-1, and the last at 0-2. The two participants at 1-1 have everything to play for, and depending on day three’s schedule and tiebreak factors, the 0-2 player could still be in the running as well.

Best of all, you avoid the nightmare scenario of two undefeated players and two eliminated players, in which the final two matches are nearly meaningless.

However, this “contingent schedule” approach isn’t perfect.

Surprise, surprise

We learned in my last post that, if we set the entire schedule before play begins, the likelihood of a dead rubber on the final day is 17%, and if we choose the optimal schedule, leaving #4 vs #1 and #3 vs #2 for the final day, we can drop those chances as low as 10.7%.

(These were based on a range of player skill levels equivalent to 200 points on the Elo scale. The bigger the range of player skills–for instance, the ATP finals is likely to have a group with a range well over 300–the more dramatic the differences in these numbers.)

In addition, we discovered that “dead/seed” matches–those in which one player is already eliminated and the other can only affect their semifinal seeding–are even more common. When the schedule is chosen in advance, the probability of a dead rubber or a “dead/seed” match is always near 40%.

If the day two schedule is determined by day one outcomes, the overall likelihood of these “mostly meaningless” (dead or “dead/seed”) matches drops to about 30%. That’s a major step in the right direction.

Yet there is a drawback: The chances of a dead rubber increase! With the contingent day two schedule, there is a roughly 20% chance of a completely meaningless match on day three.

Our intuition should bear this out. After day two, we are guaranteed one 2-0 player and one 0-2 player. It is somewhat likely that these two have faced each other already, but there still remains a reasonable chance they will play on day three. If they do, the 0-2 player is already eliminated–there will be two 2-1 players at the end of day three. The 2-0 player has clinched a place in the semifinals, so the most that could be at stake is a semifinal seeding.

In other words, if the “winner versus winner” schedule results in a 2-0 vs 0-2 matchup on day three, the odds are that it’s meaningless. And this schedule often does just that.
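The structural claims here can be checked by enumerating every outcome of the first two days. Here is a minimal Python sketch, assuming day one pairs #1 with #4 and #2 with #3 (any day-one pairing gives the same structure) and day two pairs the two winners and the two losers. Players are indexed 0 to 3 by ranking, and each path treats every match as a coin flip.

```python
from itertools import combinations, product

def play(match, first_wins):
    """Return (winner, loser) for a match given as a pair of player indices."""
    a, b = match
    return (a, b) if first_wins else (b, a)

def run_path(bits, day_one=((0, 3), (1, 2))):
    """Play days one and two for one outcome path; return (wins, day-three pairings)."""
    wins = {p: 0 for p in range(4)}
    played = set()
    results = []
    for match, bit in zip(day_one, bits[:2]):
        winner, loser = play(match, bit)
        wins[winner] += 1
        played.add(frozenset(match))
        results.append((winner, loser))
    (w1, l1), (w2, l2) = results
    # Contingent day two: winners meet winners, losers meet losers.
    for match, bit in zip(((w1, w2), (l1, l2)), bits[2:]):
        winner, _ = play(match, bit)
        wins[winner] += 1
        played.add(frozenset(match))
    day_three = [m for m in combinations(range(4), 2) if frozenset(m) not in played]
    return wins, day_three

def count_top_vs_bottom():
    """Count outcome paths in which the 2-0 player meets the 0-2 player on day three."""
    paths = list(product((True, False), repeat=4))
    hits = 0
    for bits in paths:
        wins, day_three = run_path(bits)
        # Guaranteed after day two: one 2-0, two 1-1, one 0-2.
        assert sorted(wins.values()) == [0, 1, 1, 2]
        leader = max(wins, key=wins.get)
        trailer = min(wins, key=wins.get)
        if tuple(sorted((leader, trailer))) in day_three:
            hits += 1
    return hits, len(paths)

print(count_top_vs_bottom())
```

With every match a coin flip, the 2-0 player meets the 0-2 player on day three in exactly half of the 16 paths: it happens whenever the 0-2 player is the day-one loser the leader hasn’t faced. With unequal players the share shifts with the ratings, and whether such a match is truly dead also depends on tiebreak rules, which this sketch does not model.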

The ideal contingent schedule

If the goal is to avoid dead rubbers at all costs, the contingent schedule is not for you. You can do a better job by properly arranging the schedule in advance. However, a reasonable person might prefer the contingent schedule because it completely avoids the risk of the low-probability “nightmare scenario” that I described above, of two mostly meaningless day three matches.

Within the contingent schedule, there’s still room for optimization. If the day one slate consists of matches setting #1 against #3 and #2 against #4 (sorted by ranking), the probability of a meaningless match on day three is about average. If day one features #1 vs #2 and #3 vs #4, the odds are even higher: about a 21% chance of a dead rubber and another 11% chance of a “dead/seed” match.

That leaves us with the optimal day one schedule of #1 vs #4 and #2 vs #3. It lowers the probability of a dead rubber to 19% and the chances of a “dead/seed” match to 9.7%. Neither number represents a big difference, but given all the eyes on every match at major year-end events, it seems foolish not to make a small change in order to maximize the probability that both day three matches will matter.