New at Tennis Abstract: Point-by-Point Stats

Yesterday, I announced the new ATP doubles results on Tennis Abstract. Today, I want to show you something else I rolled out over the offseason: sequential point-by-point stats for more than 100,000 matches.

Traditional match stats can do no more than summarize the action. Point-by-point stats are so much more revealing: They show us how matches unfold and allow us to look much deeper into topics such as momentum and situational skill. These are subjects that remain mysteries–or, at the very least, poorly quantified–in tennis.

As an example, let’s take a look at the new data available for one memorable contest, the World Tour Finals semifinal between Andy Murray and Milos Raonic:

The centerpiece of each page is a win probability graph, which shows the odds that one player would win the match after each point. These graphs do not take player skill into account, though they are adjusted for gender and surface. The red line shows one player’s win probability, while the grey line indicates “volatility”–a measure of how much each point matters. You can see exact win probability and volatility numbers by moving your cursor over the graph. Most match graphs aren’t nearly as dramatic as this one; of course, most matches aren’t nearly as dramatic as this one was.

(I’ve written a lot about win probability in the past, and I’ve also published the code I use to calculate in-match win probability.)

Next comes a table with situational serving stats for both players. In the screenshot above, you can see deuce/ad splits; the page continues, with tiebreak-specific totals and tallies for break points, set points, and match points. After that is an exhaustive, point-by-point text recap of the match, which displays the sequence of every point played.

I’ve tried to make these point-by-point match pages as easy to find as possible. Whenever you see a link on a match score, just click that link for the point-by-point page. For instance, here is part of Andy Murray’s page, showing where to click to find the Murray-Raonic example shown above:

As you can see from all the blue scores in this screenshot, most 2016 ATP tour-level matches have point-by-point data available. The same is true for the last few seasons, as well as top-level WTA matches. The lower the level, the fewer matches are available, but you might be surprised by the breadth and depth of the coverage. The site now contains point-by-point data for almost half of 2016 main-draw men’s Futures matches. For instance, here’s the graph for a Futures final last May between Stefanos Tsitsipas and Casper Ruud.

I’ll keep these as up-to-date as I can, but with my current setup, you can expect to wait 1-4 weeks after a match before the point-by-point page becomes available. I’m hoping to further automate the process and shorten the wait over the course of this season.

Enjoy!

New at Tennis Abstract: ATP Doubles!

At last, I’ve added ATP doubles results to player pages at Tennis Abstract. Doubles has long been relegated to second-class status by tennis analytics, largely because the data just isn’t there. Now, much more is readily available.

Tennis Abstract now has career doubles results (including Challengers, Futures, and Satellites) for thousands of ATP players, and they’ll be updated throughout each day’s play, just like singles results. Match times and traditional match stats are included for most 2016 ATP and Challenger tournaments, and I hope that will continue to be the case in 2017 and beyond.

Let me give you a brief tour of what you’ll find, using doubles legend Jack Sock as a starting point:

The big red “1” shows where to click to switch over to doubles results. For full-time doubles specialists, you won’t have to click–the site will automatically show you doubles results.

The “2” indicates three new doubles-specific filters: by partner, by opponent, and by opposing team. For instance, you can see Sock’s results with Vasek Pospisil, his eight matches against Daniel Nestor, or his twelve meetings with the Bryan Brothers. You may always combine multiple filters, so for example, you can look at Sock’s record against the Bryans only when partnering Pospisil.

There are three more new filters, marked by the big “3” toward the bottom. The “vs Hands” filter allows you to select matches against righty-righty, righty-lefty, or lefty-lefty teams. “Partner Hand” and “Partner Rank” make it possible to limit matches to those in which the partner had certain characteristics.

Finally, the “4” shows you where to access more detailed stats. Doubles results take a lot more room to display than singles results, so on the default view, the only “stats” on offer are match time and Dominance Ratio. Click on “Serve,” “Return,” or “Raw” to get the other traditional numbers, such as ace rate, first-serve points won, or break points converted. All of these numbers are totals for each team; individual player stats are almost never available for doubles matches.

I hope you enjoy this new resource. It’s something I’ve wanted for a long time, so I’m excited to be able to use it myself. There are still some minor gaps in the record, as well as some kinks in the functionality, so please be patient as I try to work all of that out.

For those of you who’d like to see WTA doubles results, as well: Me too! I can’t promise any particular deadline, but I’ve already done much of the work to build the dataset, so I’m hoping to add them to women’s player pages early this year. Stay tuned!

The Match Charting Project, 2017 Update

2016 was a great year for the Match Charting Project (MCP), my crowdsourced effort to improve the state of tennis statistics. Many new contributors joined the project, the data played a part in more research than ever, and best of all, we added over 1,000 new matches to the database.

For those who don’t know, the MCP is a volunteer effort from dozens of devoted tennis fans to collect shot-by-shot data for professional matches. The resulting data is vastly more detailed than anything else available to the public. You can find an extremely in-depth report on every match in the database–for example, here’s the 2016 Singapore final–as well as an equally detailed report on every player with more than one charted match. Here’s Andy Murray.

In 2016, we:

  • added 1,145 new matches to the database, more than in any previous year;
  • charted more WTA than ATP matches, bringing women’s tennis to near parity in the project;
  • nearly completed the set of charted Grand Slam finals back to 1980;
  • filled in the gaps to have at least one charted match of every member of the ATP top 200, and 198 of the WTA top 200;
  • reached double digits in charted matches for every player in the ATP top 49 (sorry, Florian Mayer, we’re working on it!) and the WTA top 58;
  • logged over 174,000 points and nearly 700,000 shots.

I believe 2017 can be even better. To make that happen, we could really use your help. As with most projects of this nature, a small number of contributors do the bulk of the work, and the MCP is no different–Isaac and Edo both charted more than 200 matches last year.

There are plenty of reasons to contribute: It will make you a more knowledgeable tennis fan, it will help add to the sum of human knowledge, and it can even be fun. Click here to find out how to get started.

I’m proud of the work we’ve done so far, and I hope that the first 2,700 matches are only the beginning.

The Unexpectedly Predictable IPTL

December is here, and with the tennis offseason almost five days old, it’s time to resume the annual ritual of pretending we care about exhibitions. The hit-and-giggle circuit gets underway in earnest tomorrow with the kickoff, in Japan, of the 2016 IPTL slate.

The star-studded IPTL, or International Premier Tennis League, is two years old, and uses a format similar to that of the USA’s World Team Tennis. Each match consists of five separate sets: one each of men’s singles, women’s singles, (men’s) champions’ singles, men’s doubles, and mixed doubles. Games are no-ad, each set is played to six games, and a tiebreak is played at 5-5. At the end of all those sets, if both teams have the same number of games, representatives of each side’s sponsors thumb-wrestle to determine the winner. Or something like that. It doesn’t really matter.

As with any exhibition, players don’t take the competition too seriously. Elites who sit out November tournaments due to injury find themselves able to compete in December, given a sufficient appearance fee. It’s entertaining, but compared to the first eleven months of the year, it isn’t “real” tennis.

That triggers an unusual research question: How predictable are IPTL sets? If players have nothing at stake, are outcomes simply random? Or do all the participants ease off to an equivalent degree, resulting in the usual proportion of sets going the way of the favorite?

Last season, there were 29 IPTL “matches,” meaning that we have a dataset consisting of 29 sets each of men’s singles, women’s singles, and men’s doubles. (For lack of data, I won’t look at mixed doubles, and for lack of interest, forget about champions’ singles.) Except for a handful of singles specialists who played doubles, we have plenty of data on every player. Using Elo ratings, we can generate forecasts for every set based on each competitor’s level at the time.

Elo-based predictions spit out forecasts for standard best-of-three contests, so we’ll need to adjust those a bit. Single-set results are more random, so we would expect a few more upsets. For instance, when Roger Federer faced Ivo Karlovic last December, Elo gave him an 89.9% chance of winning a traditional match, and the relevant IPTL forecast is a more modest 80.3%. With these estimates, we can see how many sets went the way of the favorite and how many upsets we should have expected given the short format.
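To make the adjustment concrete, here is a minimal sketch of one way to shrink a best-of-three forecast down to a single set. It assumes every set is won independently with the same probability, which is a simplification and not necessarily the exact conversion behind the IPTL forecasts. The function names are hypothetical.

```python
def best_of_three_prob(set_prob):
    """Chance of winning a best-of-three match if each set is won
    independently with probability set_prob (an assumed model)."""
    return set_prob ** 2 * (3 - 2 * set_prob)


def single_set_prob(match_prob, tol=1e-9):
    """Invert best_of_three_prob by bisection to recover the set-level probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if best_of_three_prob(mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Federer vs. Karlovic: 89.9% over best-of-three
print(round(single_set_prob(0.899), 3))  # 0.803
```

Under these assumptions, the 89.9% match forecast maps to roughly 80.3% for one set, in line with the figure quoted above.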

Let’s start with men’s singles. Karlovic beat Federer, and Nick Kyrgios lost a set to Ivan Dodig, but in general, decisions went the direction we would expect. Of the 29 sets, favorites won 18, or 62.1%. The Elo single-set forecasts imply that the favorites should have won 64.2%, or 18.6 sets. So far, so predictable: If IPTL were a regular-season event, its results wouldn’t be statistically out of place.

The results are similar for women’s singles. The forecasts show the women’s field to be more lopsided, due mostly to the presence of Serena Williams and Maria Sharapova. Elo expected that the favorites would win 20.4, or 70.4% of the 29 sets. In fact, the favorites won 21 of 29.

The men’s doubles results are more complex, but they nonetheless provide further evidence that IPTL results are predictable. Elo implied that most of the men’s doubles matches were close: Only one match (Kei Nishikori and Pierre-Hugues Herbert against Gael Monfils and Rohan Bopanna) had a forecast above 62%, and overall, the system expected only 16.4 victories for the favorites, or 56.4%. In fact, the Elo-favored teams won 19, or 65.5% of the 29 sets, more than the singles favorites did.

The difference of less than three wins in a small sample could easily just be noise, but even so, a couple of explanations spring to mind. First, almost every team had at least one doubles specialist, and those guys are accustomed to the rapid-fire no-ad format. Second, the higher-than-usual number of non-specialists–such as Federer, Nishikori, and Monfils–means that the player ratings may not be as reliable as they are for specialists, or for singles. It might be the case that Nishikori is a better doubles player than Monfils, but because both usually stick to singles, no rating system can capture the difference in abilities very accurately.
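A rough way to check the "noise" reading is a binomial tail probability. The sketch below assumes every doubles set had the same 56.4% chance of going to the favorite; the real forecasts varied from set to set, so treat this as an approximation. The helper name is hypothetical.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 29 doubles sets, favorites forecast at 56.4% on average, 19 actual wins
print(round(binom_tail(29, 19, 0.564), 3))
```

If the printed probability sits well above a conventional threshold such as 0.05, a result of 19 or more favorite wins is unremarkable under the forecasts.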

Here is a summary of all these results:

Competition      Sets  Fave W  Fave W%  Elo Forecast%  
Men's Singles      29      18    62.1%          64.2%  
Women's Singles    29      21    72.4%          70.4%  
ALL SINGLES        58      39    67.2%          67.3%  
                                                       
Men's Doubles      29      19    65.5%          56.4%  
ALL SETS           87      58    66.7%          63.7%

Taken together, last season’s evidence shows that IPTL contests tend to go the way of the favorites. In fact, when we account for the differences in format, favorites win more often than we’d expect. That’s the surprising bit. The conventional wisdom suggests that the elites became champions thanks to their prowess at high-pressure moments; many dozens of pros could reach the top if they were only stronger mentally. In exhos, the mental game is largely taken out of the picture, yet in this case, the elites are still winning.

No matter how often the favorites win, these matches are still meaningless, and I’m not about to include them in the next round of player ratings. However, it’s a mistake to disregard exhibitions entirely. By offering a contrast to the high-pressure tournaments of the regular season, they may offer us perspectives we can’t get anywhere else.

The Most Exciting Matches of the 2016 WTA Season

Italian translation at settesei.it

In my most recent piece for The Economist, I used a metric called Excitement Index (EI) to consider the implications of shortening singles matches to a format like the no-ad, super-tiebreak rules used for doubles. In my simulations, the shorter format didn’t fare well: The most gripping contests are often the longest ones, and the full-length third set is frequently the best part.

I used data from ATP tournaments in that piece, and several readers have asked how women’s matches score on the EI scale. Many matches from the 2016 season rate extremely highly, while some players we tend to think of as exciting fail to register among the best by this metric. I’ll share some of the results in a moment.

First, a quick overview of EI. We can calculate the probability that each player will win a match at any point in the contest, and using those numbers, it’s possible to determine the leverage of every point–that is, the difference between a player’s odds if she wins the next point and her odds if she loses it. At 40-0, down a break in the first set, that leverage is very low: less than 2%. In a tight third-set tiebreak, leverage can climb as high as 25%. The average point is around 5% to 6%, and as long as neither player has a substantial lead, points at 30-30 or later are higher.

EI is calculated by averaging the leverage of every point in the match. The more high-leverage points, the higher the EI. To make the results a bit more viewer-friendly, I multiply the average leverage by 1,000, so if the typical point has the potential for a 5% (0.05) swing, the EI is 50. The most boring matches, like Garbine Muguruza’s 6-1 6-0 dismantling of Ekaterina Makarova in Rome, rate below 25. The most exciting will occasionally top 100, and the average WTA match this year scored a 53.7. By comparison, the average ATP match this year rated at 48.9.
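The calculation described above can be sketched in a few lines. This assumes the win probabilities for each point (one value if the player wins it, one if she loses it) are already available from some win probability model. The function names and the sample numbers are hypothetical.

```python
def leverage(wp_if_won, wp_if_lost):
    """Swing in one player's match win probability riding on a single point."""
    return abs(wp_if_won - wp_if_lost)


def excitement_index(points):
    """points: list of (wp_if_won, wp_if_lost) pairs, one per point played.
    Returns the average leverage multiplied by 1,000."""
    if not points:
        raise ValueError("need at least one point")
    return 1000 * sum(leverage(won, lost) for won, lost in points) / len(points)


# Hypothetical three-point stretch, not taken from a real match
sample = [(0.52, 0.47), (0.55, 0.49), (0.60, 0.50)]
print(round(excitement_index(sample), 1))  # 70.0
```

A match made up entirely of 5% points would score 50 on this scale, matching the example in the text.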

Of course, the number and magnitude of crucial moments isn’t the only thing that can make a tennis match “exciting.” Finals tend to be more gripping than first-round tilts, long rallies and daring net play are more watchable than error-riddled ballbashing, and Fed Cup rubbers feature crowds that can make the warmup feel like a third-set tiebreak. When news outlets make their “Best Matches of 2016” lists, they’ll surely take some of those other factors into account. EI takes a narrower view, and it is able to show us which matches, independent of context, offered the most pressure-packed tennis.

Here are the top ten matches of the 2016 WTA season, ranked by EI:

Tournament    Match                Score                    EI  
Charleston    Lucic/Mladenovic     4-6 6-4 7-6(13)       109.9  
Wimbledon     Cibulkova/Radwanska  6-3 5-7 9-7           105.0  
Wimbledon     Safarova/Cepelova    4-6 6-1 12-10         101.7  
Kuala Lumpur  Nara/Hantuchova      6-4 6-7(4) 7-6(10)    100.2  
Brisbane      CSN/Lepchenko        4-6 6-4 7-5            99.0  
Quebec City   Vickery/Tig          7-6(5) 6-7(3) 7-6(7)   98.5  
Miami         Garcia/Petkovic      7-6(5) 3-6 7-6(2)      98.1  
Wimbledon     Vesnina/Makarova     5-7 6-1 9-7            97.2  
Beijing       Keys/Kvitova         6-3 6-7(2) 7-6(5)      96.8  
Acapulco      Stephens/Cibulkova   6-4 4-6 7-6(5)         96.7

Getting to 6-6 in the final set is clearly a good way to appear on this list. The top fifty matches of the season (out of about 2,700) all reached at least 5-5 in the third. The highest-rated clash that didn’t get that far was Angelique Kerber’s 1-6 7-6(2) 6-4 defeat of Elina Svitolina, with an EI of 88.2. Svitolina’s 4-6 6-3 6-4 victory over Bethanie Mattek Sands in Wuhan, the top match on the list without any sets reaching 5-5, scored an EI of 87.3.

Wimbledon featured an unusual number of very exciting matches this year, especially compared to Roland Garros and the Australian Open, the other tournaments that forgo a tiebreak in the final set. The top-rated French Open contest was the first-rounder between Johanna Larsson and Magda Linette, which scored 95.3 and ranks 13th for the season, while the highest EI among Aussie Open matches is all the way down at 27th on the list, a 92.8 between Monica Puig and Kristyna Pliskova.

Dominika Cibulkova is the only player who appears twice on this list. That doesn’t mean she’s a sure thing for exciting matches: As we’ll see, elite players rarely are. The only year-end top-tenner who ranks among the highest average EIs is Svetlana Kuznetsova, who played as many “very exciting” matches–those rating among the top fifth of matches this season–as any other woman on tour:

Rank  Player                M  Avg EI  V. Exc  Exc %  Bor %  
1     Kristina Mladenovic  60    59.8      19  55.0%  25.0%  
2     Christina McHale     46    59.6      16  50.0%  19.6%  
3     Heather Watson       35    58.5      12  48.6%  25.7%  
4     Jelena Jankovic      43    57.6      12  55.8%  30.2%  
5     Svetlana Kuznetsova  64    57.4      21  48.4%  32.8%  
6     Venus Williams       38    57.1      10  55.3%  31.6%  
7     Yanina Wickmayer     43    56.5      13  46.5%  30.2%  
8     Alison Riske         46    56.5      10  45.7%  32.6%  
9     Caroline Garcia      62    56.4      18  43.5%  33.9%  
10    Irina-Camelia Begu   42    56.4      14  45.2%  40.5% 

(Minimum 35 tour-level matches (“M” above), excluding retirements. My data is also missing a random handful of matches throughout the season.)

The “V. Exc” column tallies how many top-quintile matches the player took part in. The “Exc %” column shows the percent of matches that rated in the top 40% of all WTA contests, while “Bor %” shows the same for the bottom 40%, the more boring matches. Big servers who reach a disproportionate number of tiebreaks and 7-5 sets do well on this list, though it is far from a perfect correspondence. Tiebreaks can create a lot of big moments, but if there were many love service games en route to 6-6, the overall picture isn’t nearly so exciting.

Unlike Kuznetsova, who played a whopping 32 deciding sets this year, most of the other top women enjoy plenty of blowouts. Muguruza, Simona Halep, and Serena Williams occupy the very last three places on the average-EI ranking, largely because when they win, they do so handily–and they win a lot. The next table shows the WTA year-end top-ten, with their ranking (out of 59) on the average-EI list:

Rank  Player        WTA#  Matches  Avg EI  V. Exc  Exc %  Bor %  
5     Kuznetsova       9       64    57.4      21  48.4%  32.8%  
13    Pliskova         6       66    55.6      19  48.5%  39.4%  
16    Keys             8       64    55.4      13  40.6%  35.9%  
23    Cibulkova        5       68    54.6      21  42.6%  42.6%  
28    Kerber           1       77    54.0      12  42.9%  41.6%  
      tour average                   53.7          40.0%  40.0%  
41    Radwanska        3       69    52.5      12  29.0%  44.9%  
51    Konta           10       67    51.2      12  34.3%  46.3%  
57    Muguruza         7       51    49.9       5  33.3%  43.1%  
58    Halep            4       59    49.6       8  30.5%  50.8%  
59    Williams         2       44    48.1       3  27.3%  50.0%

It’s a good thing that fans love Serena, because her matches rarely provide much in the way of big moments. As low as Williams and Halep rate on this measure, Victoria Azarenka scores even lower. Her Miami fourth-rounder against Muguruza was her only match this season to rank in the “exciting” category, and her average EI was a mere 44.0.

Clearly, EI isn’t much of a method for identifying the best players. Even looking at the lowest-rated competitors by EI would be misleading: In 56th place, right above Muguruza, is the otherwise unheralded Nao Hibino. EI excels as a metric for ferreting out the most riveting individual matches, whether they were broadcast worldwide or ignored entirely. And the next time someone suggests shortening matches, EI is a great tool to highlight just how much excitement would be lost by doing so.

How Argentina’s Road Warriors Defied the Davis Cup Home-Court Odds

Italian translation at settesei.it

The conventional wisdom has long held that there is a home court advantage in Davis Cup. It makes sense: In almost every sport, there is a documented advantage to playing at home, and Davis Cup gives us what seem to be the most extreme home courts in tennis.

However, Argentina won this year’s competition despite playing all four of their ties on the road. After the first round this season, only one of seven hosts managed to give the home crowd a victory. Bob Bryan has some ideas as to why:

Which is it? Do players excel in front of an enthusiastic home crowd, on a surface chosen for their advantage? Or do they suffer from the distractions that Bryan cites?

To answer that question, I looked at 322 Davis Cup ties, encompassing all World Group and World Group Play-off weekends back to 2003. Of those, the home side won 196, or 60.9% of the time. So far, the conventional wisdom looks pretty good.

But we need to do more. To check whether the hosting teams were actually better, meaning that they should have won more ties regardless of venue, I used singles and doubles Elo ratings to simulate every match of every one of those ties. (In cases where the tie was decided before the fourth or fifth rubber, I simulated matches between the best available players who could have contested those matches.) Based on those simulations, the hosts “should” have won 171 of the 322 ties, or 53.1%.
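For readers who want to try something similar, here is a minimal sketch of a neutral-court tie forecast. It uses the standard Elo logistic formula and treats the five rubbers as independent, which may differ from the actual simulation used here (for example, it ignores any best-of-five adjustment). All ratings below are hypothetical.

```python
from itertools import product


def elo_win_prob(rating_a, rating_b):
    """Standard Elo logistic expectation that A beats B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def tie_win_prob(rubber_probs):
    """Probability the host wins at least three of five rubbers, treating
    rubbers as independent. rubber_probs holds the host's chance in each one."""
    total = 0.0
    for outcome in product([True, False], repeat=len(rubber_probs)):
        if sum(outcome) >= 3:
            prob = 1.0
            for won, p in zip(outcome, rubber_probs):
                prob *= p if won else 1 - p
            total += prob
    return total


# Hypothetical ratings: host singles 2000 and 1850, visitors 1950 and 1900,
# doubles teams rated 1800 (host) and 1850 (visitor)
rubbers = [
    elo_win_prob(2000, 1900),  # day one: host #1 vs visitor #2
    elo_win_prob(1850, 1950),  # day one: host #2 vs visitor #1
    elo_win_prob(1800, 1850),  # day two: doubles
    elo_win_prob(2000, 1950),  # day three: #1 vs #1
    elo_win_prob(1850, 1900),  # day three: #2 vs #2
]
print(round(tie_win_prob(rubbers), 3))
```

Enumerating all 32 outcomes gives the same answer as stopping once a side reaches three wins, since dead rubbers cannot change who took the tie. Summing these probabilities across many ties gives an expected number of host wins to compare with the actual count.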

The evidence in favor of home-court advantage–and against Bryan’s “distractions” theory–is strong. Home sides have won World Group ties about 15% more often than we would expect. Some of that is likely due to the hosts’ ability to choose surface. I doubt surface accounts for the whole effect, since some court types (like the medium-slow hard court in Croatia last weekend) don’t heavily favor either side, and many ties are rather lopsided regardless of surface. Teasing out the surface advantage from the more intangible home-court edge is a worthy subject to study, but I won’t be going any further in that direction today.

If distractions are a danger to hosts, we might expect to see the home court advantage erode in later rounds. Many early-round matchups are minor news events compared to semifinals and finals. (On the other hand, there were over 100 representatives of the Argentinian press in Croatia last weekend, so the effect isn’t entirely straightforward.) The following table shows how home sides have fared in each round:

Round         Ties  Home Win %  Wins/Exp  
First Round    112       58.9%      1.11  
Quarterfinal    56       60.7%      1.16  
Semifinal       28       82.1%      1.30  
Final           14       57.1%      1.14  
Play-off       112       58.9%      1.14

Aside from a blip at the semifinal level, home-court advantage is quite consistent from one round to the next. The “Wins/Exp” shows how much better the hosts fared than my simulations would have predicted; for instance, in first-round encounters, hosts won 11% more ties than expected.

There is also no meaningful difference between home court advantage on day one and day three. The hosts’ singles players win 15% more matches than my simulations would expect on day one, and 15% more on day three. The day three divide is intriguing: Home players win the fourth rubber 12% more often than expected, but they claim the deciding fifth rubber a whopping 23% more frequently than they would in neutral environments. However, only 91 of the 322 ties involved five live rubbers, so the extreme home advantage in the deciding match may be nothing more than a random spike.

The doubles rubber is less likely to be influenced by venue. Compared to the 15% advantage enjoyed by World Group singles players, the hosting side’s doubles pairings win only 6% more often than expected. This again raises the issue of surface: Not only are doubles results less influenced by court speed than singles results, but home sides are less likely to choose a surface based on the desire of their doubles team, if that preference clashes with the needs of their singles players.

Argentina on the road

In the sense that they never played at home or chose a surface, Argentina beat the odds in all four rounds this year. Of course, home court advantage can only take you so far; it helps to have a good squad. My simulations indicate that the Argentines had a nearly 4-in-5 chance of defeating their Polish first-round opponents on neutral ground, while Juan Martin del Potro and company had a more modest 59% chance of beating the Italians in Italy.

For the last two rounds, though, the Argentines were fighting an uphill battle. The semifinal venue in Glasgow didn’t matter much; the prospect of facing the Murray brothers meant Argentina had less than a 10% chance of advancing no matter what the location. And as I wrote last week, Croatia was rightfully favored in the final. Playing yet another tie on the road simply made the task more difficult.

Once we adjust my simulations of each tie for home court advantage, it turns out that Argentina’s chances of winning the Cup this year were less than 1%, barely 1 in 200. The following table shows the last 14 winners, along with the number of ties they played at home and their chances of winning the Cup in my simulations, given which countries they ended up facing and the players who turned up for each tie:

Year  Winner  Home Ties  Win Prob  
2016  ARG             0      0.5%  
2015  GBR             3     18.9%  
2014  SUI             2     54.7%  
2013  CZE             1     10.5%  
2012  CZE             3     19.7%  
2011  ESP             2     12.2%  
2010  SRB             3     17.6%  
2009  ESP             4     44.0%  
2008  ESP             1     14.3%  
2007  USA             2     24.4%  
2006  RUS             2      1.7%  
2005  CRO             2      7.4%  
2004  ESP             3     23.8%  
2003  AUS             3     15.9%

In the time span I’ve studied, only the 2006 Russian squad managed anything close to the same season-long series of upsets. (I don’t yet have adequate doubles data to analyze earlier Davis Cup competitions.) At the other end of the spectrum, the simulations emphasize how smoothly Switzerland swept through the bracket in 2014. A wide-open draw, together with Roger Federer, certainly helps.

It was tough going for Argentina, and the luck of the home-court draw made it tougher. Without a solid #2 singles player or an elite doubles specialist, it isn’t likely to get much easier. For all that, they’ll open the 2017 Davis Cup campaign against Italy with at least one unfamiliar weapon in their arsenal: They finally get to play a tie at home.

Best of Five and Marin Cilic’s Improbable Collapse

Italian translation at settesei.it

Leading up to the final two rubbers of this year’s Davis Cup final in Croatia, the hosts were heavily favored. They held a 2-1 advantage, and both of the remaining singles matches would pit a Croatian against a lower-ranked Argentine. To win the Cup, they only needed to win one of those matches.

When Marin Cilic built a two-set lead over Juan Martin del Potro, Croatian fans could be forgiven for thinking it was in the bag. Instead, Delpo fought back to win in five sets, and Federico Delbonis upset a flat Ivo Karlovic to seal Argentina’s first-ever Davis Cup title. Some people will point to the Cilic-Delpo match time of 4:53 as another reason to switch to best-of-three. The rest of us will see it as yet another reminder of why best-of-five must retain its role on tennis’s biggest stages.

In a best-of-three format, Cilic would’ve claimed the Cup for Croatia after two hours of play. Instead, he merely came very close. My Elo singles ratings gave Cilic a 36.3% chance of beating Delpo and Karlovic a 75.8% chance of defeating Delbonis. Taken together, that’s a likelihood of 84.6% that Croatia would claim the trophy. After Cilic won the first two sets, his odds increased to about 81%, pushing Croatia’s chances over the 95% mark. In fourteen previous tries, del Potro had never recovered from an 0-2 deficit.
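The arithmetic behind those combined figures is short enough to spell out. This sketch assumes the two remaining rubbers are independent; the function name is hypothetical.

```python
def at_least_one_win(probs):
    """Chance of winning at least one of several independent matches."""
    miss_all = 1.0
    for p in probs:
        miss_all *= 1 - p
    return 1 - miss_all


print(round(at_least_one_win([0.363, 0.758]), 3))  # before the fourth rubber: 0.846
print(round(at_least_one_win([0.81, 0.758]), 3))   # with Cilic up two sets: 0.954
```

Croatia fails to win the Cup only if both Cilic and Karlovic lose, so the combined chance is one minus the product of the two loss probabilities.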

And then Argentina came back. Comebacks from two sets down tend to stick in our memory, so it’s easy to forget just how rare they are. Yesterday’s match was only the 28th such comeback in 2016. That’s out of a pool of 656 best-of-five contests, including 431 in which one player built a 2-0 lead. This year isn’t unusual: Going back to 2000, the number of wins from a 0-2 deficit has never exceeded 32.

Comebacks from 0-2 are even rarer in Davis Cup. At the World Group level this year, including play-offs, Delpo was the 61st player to fall into a 0-2 hole, but he was only the second to recover and win the match. The other was Jack Sock, whose July comeback (over Cilic–more on that in a bit) wasn’t enough to move his USA squad into the semifinals. Since 2000, 5.8% of 2-0 leads turn into comeback victories, but only 4.3% of World Group 2-0 leads do the same.

Cilic’s season has defied the numbers. In addition to his 2-0 collapses against Sock and del Potro, he held a 2-0 advantage before losing to Roger Federer in the Wimbledon quarterfinals. His 2016 is only the third time in ATP history that a player lost three or more matches after winning the first two sets. The previous two–Viktor Troicki’s 2015 season and Jan Siemerink’s 1997–are unlikely to make Cilic feel any better.

Still, even Cilic’s record indicates the rarity of victories from an 0-2 disadvantage. Before the Wimbledon quarterfinal, the Croatian had never lost a match after taking the first two sets, for a record of 60-0. Even now, his Davis Cup record after going up two sets to love is a respectable 11-2. His overall career mark of 95.7% (66-3) is better than average.

Unless Cilic crumbles under certain spotlights (but not others, as evidenced by his five-set win over Delbonis on Friday), his series of unfortunate collapses may just be a fluke. In addition to that 60-0 streak, he has never had a problem converting one-set leads in best-of-three matches. This year, he won 29 out of 33 best-of-threes after winning the first set, an above-average rate of 88%. (And one of the losses was against Dominic Thiem, so he never had a chance.)

The longer the match format, the more likely that the better player emerges triumphant. That’s why there are fewer upsets in best-of-five than in best-of-three, and why tiebreaks are often little better than flips of a coin. Usually that works in favor of a top-tenner such as Cilic: In most matchups he is the superior player. But in two of his three collapses this season, he’s fallen victim to a favorite who uses the longer format to overcome an early run of poor form.

The debate over best-of-five will surely continue, despite this weekend’s Davis Cup tie adding another unforgettable five-set epic to an already long list. But after Delpo’s performance yesterday, you’ll have a harder time finding someone to campaign for shorter matches–especially in Argentina.

Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly-attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo correctly predicts the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. To put it another way, this is more evidence that Davis Cup is about the chalk.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.
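To make the idea of calibration concrete, here is a minimal sketch of how one might check it. The function name `calibration_table` and the sample data are made up for illustration; this is not the code behind the numbers above. It groups forecasts into ten-point bins and compares the average forecast in each bin to how often the favorite actually won.

```python
from collections import defaultdict

def calibration_table(forecasts, bin_size_pct=10):
    """Group (favorite_prob, favorite_won) pairs into probability bins.

    Returns {bin_lower_bound_pct: (n_matches, mean_forecast, actual_win_rate)}.
    """
    bins = defaultdict(list)
    for prob, favorite_won in forecasts:
        # Work in whole percentages to avoid floating-point surprises at bin edges.
        pct = round(prob * 100)
        lower = (pct // bin_size_pct) * bin_size_pct
        bins[lower].append((prob, favorite_won))

    table = {}
    for lower, rows in sorted(bins.items()):
        n = len(rows)
        mean_forecast = sum(p for p, _ in rows) / n
        win_rate = sum(1 for _, won in rows if won) / n
        table[lower] = (n, mean_forecast, win_rate)
    return table

# Made-up data: 100 matches per group, every favorite forecast at 60%.
tour_level = [(0.60, i < 60) for i in range(100)]  # favorite wins 60 of 100
davis_cup = [(0.60, i < 70) for i in range(100)]   # favorite wins 70 of 100

tour_table = calibration_table(tour_level)
davis_table = calibration_table(davis_cup)
print(tour_table[60])   # forecast and win rate agree: well-calibrated
print(davis_table[60])  # win rate exceeds forecast: underconfident
```

In a well-calibrated system, the second and third numbers in each bin are close to each other. A win rate well above the mean forecast, as in the invented Davis Cup sample, is the pattern described above.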

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, the no-ad scoring and third-set tiebreak introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

Player                 Side  D-Lo     
Juan Martin del Potro  ARG   1759     
Leonardo Mayer         ARG   1593  *  
Federico Delbonis      ARG   1540     
Guido Pella            ARG   1454  *  
                                      
Ivan Dodig             CRO   1856  *  
Marin Cilic            CRO   1677     
Ivo Karlovic           CRO   1580     
Franco Skugor          CRO   1569  *

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.
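For readers who want to check the arithmetic, here is a minimal sketch of the conversion from ratings to a win probability. It assumes that a team's rating is the simple average of its two players' D-Lo ratings (an assumption, though it does reproduce the 189-point edge) and uses the standard Elo expectation formula. The function names are made up for illustration.

```python
def team_rating(player_a, player_b):
    """Assumption: a team's rating is the average of its two players' ratings."""
    return (player_a + player_b) / 2

def elo_win_probability(rating_diff):
    """Standard Elo expectation for the side that leads by rating_diff points."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

croatia = team_rating(1856, 1569)    # Dodig, Skugor
argentina = team_rating(1593, 1454)  # Mayer, Pella

edge = croatia - argentina
prob = elo_win_probability(edge)
print(edge, round(prob, 3))  # 189.0 0.748
```

This covers only the no-ad, super-tiebreak figure. The adjustment to best-of-five depends on a model of the scoring format and is not attempted here.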

Making matters worse for Argentina, it’s likely that Croatia could improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently scheduled match, shown from the perspective of Croatia:

Rubber                      Forecast (CRO)  
Cilic v Delbonis                     90.8%  
Karlovic v del Potro                 15.8%  
Dodig/Skugor v Mayer/Pella           83.7%  
Cilic v del Potro                    36.3%  
Karlovic v Delbonis                  75.8%

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, whom it ranks 68th in the world, compared with his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market, which, a few hours before play begins, gives Croatia about a 65% chance.
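Here is one way to sketch how five rubber forecasts combine into a tie forecast. It assumes the rubbers are independent and that the lineups stay as currently scheduled; the function name is made up for illustration. Because it works from the rounded percentages in the table, the result can differ from the figure above by a fraction of a percentage point.

```python
def tie_win_probability(rubber_probs, wins_needed=3):
    """Probability of winning at least wins_needed rubbers, assuming independence.

    Treating all five rubbers as played is harmless: dead rubbers cannot
    change which side reaches three wins first.
    """
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in rubber_probs:
        new_dist = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            new_dist[wins] += mass * (1 - p)  # rubber lost
            new_dist[wins + 1] += mass * p    # rubber won
        dist = new_dist
    return sum(dist[wins_needed:])

# Croatia's forecast in each rubber, taken from the table above.
croatia_rubbers = [0.908, 0.158, 0.837, 0.363, 0.758]
tie_prob = tie_win_probability(croatia_rubbers)

# Chance that the first two singles rubbers are split 1-1.
first, second = croatia_rubbers[:2]
split_prob = first * (1 - second) + (1 - first) * second

print(round(tie_prob, 3), round(split_prob, 3))
```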

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

Can Nick Kyrgios Win a Grand Slam?

Italian translation at settesei.it

Today’s breaking news? Former Wimbledon finalist Mark Philippoussis thinks that Nick Kyrgios can win the Australian Open. Hey, it’s almost the offseason. We take our news wherever we can get it.

Still, it’s an interesting question. Is it possible for such a volatile, one-dimensional player to string together seven wins on one of the biggest stages in the sport? Philippoussis–not the most versatile of players himself–reached two Slam finals. A big serve can take you far.

Last year, I published a post investigating the “minimum viable return game,” the level of return success that a player would need to maintain in order to reach the highest echelon of men’s tennis. It’s rare to finish a season in the top ten without winning at least 38% of return points, though a few players, including Milos Raonic, have managed it. When I wrote that article, Kyrgios’s average for the previous 52 weeks was a measly 31.7%, almost in the territory of John Isner and Ivo Karlovic.

Kyrgios has improved since then. In 2016, he won 35.4% of return points, almost equal to Raonic’s 35.9%–and most would agree that Milos had an excellent year. Philippoussis’s career mark was only 34.9%, though Kyrgios would be lucky to play as many tournaments on grass and carpet as Philippoussis did. Still, a sub-36% rate of return points won isn’t usually good enough in today’s game: Raonic was only the third player since 1991 (along with Pete Sampras and Goran Ivanisevic) to finish a season in the top five with such a low rate.

Then again, Philippoussis didn’t say anything about finishing in the top five. The “minimum viable Slam-winning return game” might be different. Looking at all Grand Slam champions back to 1991, here are the lowest single-tournament rates of return points won:

Year  Slam             Player               RPW%                     
2001  Wimbledon        Goran Ivanisevic    31.1%  
1996  US Open          Pete Sampras        32.8%  
2009  Wimbledon        Roger Federer       33.7%  
2002  US Open          Pete Sampras        35.6%  
2000  Wimbledon        Pete Sampras        36.6%  
2010  Wimbledon        Rafael Nadal        36.8%  
2014  Australian Open  Stan Wawrinka       37.0%  
1998  Wimbledon        Pete Sampras        37.2%  
1991  Wimbledon        Michael Stich       37.4%  
2000  US Open          Marat Safin         37.5%

Wimbledon is well-represented here, as we might expect. Not so for Kyrgios’s home Slam: Stan Wawrinka’s 2014 Australian Open title is the only time it appears in the top 20, even though it has played very fast in recent years. Every other Melbourne titlist won at least 39.5% of return points. As with year-end top-ten finishes, 38% is a reasonable rule of thumb for the minimum viable level, though on rare occasions, it is possible to come in below that.

The bar is set: Can Kyrgios clear it? Eighteen months ago, when Kyrgios’s 52-week return-points-won average was below 32%, the obvious answer would have been negative. His current mark above 35% makes the question a more interesting one. To win a Slam, he’ll probably need to return better, but only for seven matches.

The Australian has enjoyed one seven-match streak–in fact, a nine-match run–that would be more than good enough. Combining his title in Marseille and his semifinal showing in Dubai this February, Kyrgios played almost nine matches (he retired with a back injury in the last one) while winning a whopping 41.5% of return points. At 42 of the last 104 Slams, the champion has won return points at a lower rate.

However, February was an aberration. To approximate Kyrgios’s success over the length of a Slam, I looked at his return points won over every possible streak of ten matches. (Most of his matches have been best-of-three, so ten matches is about the same number of points as a Slam title run.) Aside from the streaks involving Marseille and Dubai this year, he has never topped 37% for that length of time.
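The "every possible streak" approach can be sketched as a rolling window over a chronological match log. The function name `rolling_rpw` and the sample data are invented for illustration, and pooling points across the window (rather than averaging per-match percentages) is an assumption about how the rate is computed.

```python
def rolling_rpw(matches, window=10):
    """Return-points-won rate for every consecutive run of `window` matches.

    matches: chronological list of (return_points_won, return_points_played).
    Points are pooled across each window, so longer matches count for more.
    """
    rates = []
    for start in range(len(matches) - window + 1):
        streak = matches[start:start + window]
        won = sum(w for w, _ in streak)
        played = sum(n for _, n in streak)
        rates.append(won / played)
    return rates

# Made-up log: ten ordinary return performances, then two strong ones.
sample = [(20, 60)] * 10 + [(30, 60)] * 2
rates = rolling_rpw(sample)
best = max(rates)
print([round(r, 3) for r in rates])  # one rate per ten-match window
```

Taking the maximum of the resulting list gives a player's best stretch of that length, which is the number to compare against the rule-of-thumb threshold.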

There’s always hope for improvement, especially for a mercurial 21-year-old in a sport dominated by older men. But the evidence is against him here, as well. Research by falstaff78 suggests that players do not substantially improve their return statistics as they mature. That may seem counterintuitive, since some players clearly do develop their skills. However, as players get better, they go deeper in tournaments and alter their schedules, changing the mix of opponents they face. Two years ago, Kyrgios faced seven top-20 players. This year he played 18. Raonic, who represents an optimistic career trajectory for Kyrgios, faced 26 this season.

Against the top 20–the sorts of Grand Slam opponents a player has to beat to get from the fourth round to the trophy ceremony–Kyrgios has won less than 30% of his career return points. Even Raonic, who has yet to win a Slam himself, has done better, and won 32.6% of return points against top-20 opponents this year.

There’s little doubt that Kyrgios has the serve to win Grand Slams. And once the Big Four retire, I suppose someone will have to win the majors. But even in weak eras, you need to break serve, and at Slams, you typically need to do so many times, and against very high-quality opponents. The evidence we have so far strongly implies that Kyrgios, like Philippoussis before him, will struggle to triumph at a Slam.

The Speed of Every Surface, 2016 Edition

More than five years after I first started trying to use ATP match stats to estimate surface speed, the issue remains a contentious one. Most commentators agree that surface speeds have converged and generally gotten slower. The ATP has begun to release a trickle of court speed data, but it raises more questions than answers.

It’s been three years since I’ve published surface speed numbers, so we’re due for an update. Before we do that, it’s important to understand what exactly these figures mean, as well as their limitations.

Court surfaces–and, more broadly, the environments in which pro matches are played–have a variety of characteristics. Some courts are faster or slower and some cause higher or lower bounces. Tournaments use different balls, are played at a range of elevations, and take place in all sorts of weather conditions. All of these factors, and more, affect how matches are played.

Due to the limits of available tennis data, however, we can’t isolate those different factors. It would be great to know which surfaces allowed for the most effective slice approaches or the deadliest drop shots, but we don’t have the data to even begin trying to answer those questions. The Match Charting Project is a step in the right direction, but with only a few hundred men’s matches per year, there isn’t quite enough to compare surfaces while controlling for different players and playing styles.

So we work with what we have. Faster surfaces are more favorable to the server, which shows up in ace counts and service breaks. The ATP publishes those basic stats for every match, so that’s what we’ll use. When I first researched this issue, I discovered that there isn’t much difference between counting aces and counting service breaks, except that there’s a wider variation in ace rates between faster and slower surfaces, so the resulting numbers are easier to understand.

At the risk of repeating myself: Measuring surface speed by ace rate ignores a lot of court characteristics. It is far from complete and certainly imperfect. It does, however, give us an idea of how tournaments compare in one important regard.

Aces, adjusted

That said, simply counting aces–for example, 6.8% of points in Buenos Aires this year and 11.2% of points in Los Cabos–isn’t good enough. Players make scheduling choices based on their strengths and preferences, so the guys who show up for clay court events tend, on average, to be weaker servers than those who play on hard and grass courts. To take an extreme example, Gilles Muller managed to play only two matches on clay this season. As it turns out, the courts in Buenos Aires and Los Cabos had almost identical effects on ace rates–the difference is entirely due to the mix of players in each draw.

So we adjust for the makeup of the field. For every player with at least three tour-level matches on clay and another three on hard or grass, I calculated their season average ace rates on clay and hard/grass, which I then weighted (one-third clay, two-thirds hard/grass) so that the numbers give us an idea of what their ace rate would’ve been had they played an “average” (that is, unbiased by scheduling preferences) season. I’ve lumped hard and grass together here, not because they are the same–of course they’re not–but because the small number of grass court events makes it difficult to treat grass on its own.
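As a rough illustration of that weighting, here is a minimal sketch. The function names and the sample numbers are made up; the one-third/two-thirds weights and the three-match minimums come from the description above.

```python
def meets_minimums(clay_matches, fast_matches, minimum=3):
    """A player qualifies with at least `minimum` matches on clay and on hard/grass."""
    return clay_matches >= minimum and fast_matches >= minimum

def neutral_ace_rate(clay_aces, clay_points, fast_aces, fast_points,
                     clay_weight=1 / 3):
    """Weighted season ace rate: one-third clay, two-thirds hard/grass."""
    clay_rate = clay_aces / clay_points
    fast_rate = fast_aces / fast_points
    return clay_weight * clay_rate + (1 - clay_weight) * fast_rate

# Made-up player: 5% aces on clay, 10% on hard/grass.
if meets_minimums(clay_matches=8, fast_matches=20):
    rate = neutral_ace_rate(clay_aces=30, clay_points=600,
                            fast_aces=120, fast_points=1200)
    print(round(rate, 4))  # 0.0833
```

The fixed weights are what remove the scheduling bias: a player who skips most of the clay season still gets a clay component worth one-third of his neutral rate.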

With player averages in hand, we can go through every match of the season (between players who meet our minimums) and, using their ace rates and the rates at which players hit aces against them, calculate a “predicted” ace rate for the match, given a neutral surface. Then, by comparing the match’s actual ace rate to the neutral prediction, we get one data point regarding the surface’s effect on aces. If the actual ace rate is greater than the prediction, it suggests the surface is faster than average. If the prediction is greater than the ace rate, it implies the surface is slower than average.

No single match can tell us about a court’s tendency, but by aggregating all the matches at an event, we get a fairly good idea. With that final step, we get a single number per event. A neutral surface rates at 1, faster surfaces are greater than 1, and slower surfaces are less than 1. For instance, this algorithm rates the 2016 Paris Masters as 1.18, meaning that there were 18% more aces than we would expect on a neutral surface, rating Bercy as faster than all but 10 other events this season.
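The last two steps can be sketched as follows. The description above does not spell out exactly how a server's ace rate and a returner's ace-rate-against are combined, or how matches are aggregated, so this sketch makes two labeled assumptions: a multiplicative adjustment relative to the tour average, and an event rating equal to total actual aces divided by total predicted aces. All names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Player:
    ace_rate: float          # neutral-surface ace rate on own serve
    ace_rate_against: float  # neutral-surface rate at which opponents ace him

def expected_aces(server, returner, service_points, tour_avg):
    # Assumption: scale the server's rate by how ace-prone the returner is
    # relative to the tour average.
    rate = server.ace_rate * returner.ace_rate_against / tour_avg
    return rate * service_points

def surface_rating(matches, tour_avg):
    """matches: list of (p1, p2, p1_service_points, p2_service_points, actual_aces)."""
    actual = 0
    expected = 0.0
    for p1, p2, sp1, sp2, aces in matches:
        actual += aces
        expected += expected_aces(p1, p2, sp1, tour_avg)
        expected += expected_aces(p2, p1, sp2, tour_avg)
    # Assumption: pool all matches, so 1.0 means exactly as many aces as predicted.
    return actual / expected

TOUR_AVG = 0.08  # made-up tour-wide ace rate
big_server = Player(ace_rate=0.12, ace_rate_against=0.08)
average_player = Player(ace_rate=0.08, ace_rate_against=0.08)

# The neutral prediction for this matchup is 16 aces over 160 service points.
fast_event = [(big_server, average_player, 80, 80, 20)]
slow_event = [(big_server, average_player, 80, 80, 12)]
fast = surface_rating(fast_event, TOUR_AVG)
slow = surface_rating(slow_event, TOUR_AVG)
print(round(fast, 2), round(slow, 2))  # 1.25 0.75
```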

Whew! Here are the ace-based surface ratings for the last three seasons of every current tour-level event, listed from fastest to slowest:

Tournament            Surface  2016 Ace%  2016  2015  2014  
Shenzhen                 Hard      12.9%  1.54  1.20  1.49  
Quito                    Clay      11.9%  1.50  0.89        
Metz                     Hard      12.6%  1.43  1.28  1.37  
Marseille                Hard      15.3%  1.38  1.28  1.26  
Stuttgart               Grass      13.3%  1.38  1.32  0.89  
Chengdu                  Hard      11.7%  1.27              
Australian Open          Hard      12.3%  1.25  1.19  1.12  
Queen's Club            Grass      14.3%  1.25  1.27  1.26  
Washington               Hard      19.5%  1.24  1.12  1.25  
Cincinnati Masters       Hard      14.2%  1.18  1.04  1.17  
Paris Masters            Hard      13.7%  1.18  1.03  1.03  
Brisbane                 Hard      12.2%  1.16  1.20  1.23  
Canada Masters           Hard      12.6%  1.16  1.08  1.00  
Halle                   Grass      12.2%  1.16  1.12  1.31  
Nottingham              Grass      12.0%  1.15  1.21        
Gstaad                   Clay      10.1%  1.12  0.84  0.77  
Basel                    Hard      10.1%  1.12  1.01  1.20  
Tokyo                    Hard      11.5%  1.12  1.00  1.06  
Chennai                  Hard      10.3%  1.12  0.91  0.65  
Auckland                 Hard      12.9%  1.11  1.21  1.01  
                                                            
Tournament            Surface  2016 Ace%  2016  2015  2014  
Doha                     Hard       8.8%  1.11  1.06  0.83  
Sydney                   Hard      10.5%  1.11  1.32  1.27  
Montpellier              Hard       9.7%  1.10  1.29  1.29  
Shanghai Masters         Hard      10.7%  1.10  1.05  1.34  
Kitzbuhel                Clay       6.9%  1.09  0.85  0.81  
's-Hertogenbosch        Grass      13.2%  1.08  1.06  1.05  
Winston-Salem            Hard      10.4%  1.07  1.33  1.10  
Newport                 Grass      11.0%  1.07  1.26  1.23  
Tour Finals              Hard       9.5%  1.06  0.99  0.89  
Wimbledon               Grass      11.8%  1.06  1.20  1.35  
Rotterdam                Hard       9.8%  1.04  1.19  1.08  
Vienna                   Hard      11.8%  1.02  1.39  1.26  
Memphis                  Hard       8.7%  1.00  1.19  0.94  
Miami Masters            Hard      10.0%  1.00  0.86  1.04  
Sofia                    Hard       8.4%  1.00              
Beijing                  Hard       9.4%  0.99  1.05  0.81  
Atlanta                  Hard      15.5%  0.97  1.35  0.90  
St. Petersburg           Hard       8.1%  0.97  0.98        
Marrakech                Clay       8.5%  0.95              
Olympics                 Hard       7.1%  0.95              
                                                            
Tournament            Surface  2016 Ace%  2016  2015  2014  
Moscow                   Hard       6.6%  0.94  1.08  1.12  
Antwerp                  Hard       8.6%  0.93              
Delray Beach             Hard       9.2%  0.92  0.88  0.93  
US Open                  Hard       8.9%  0.91  1.10  1.10  
Dubai                    Hard       9.4%  0.88  0.93  0.81  
Madrid Masters           Clay       8.6%  0.86  0.85  0.94  
Los Cabos                Hard      11.2%  0.85              
Buenos Aires             Clay       6.8%  0.85  0.78  0.64  
Houston                  Clay      11.5%  0.84  0.76  0.70  
Sao Paulo                Clay       7.1%  0.83  1.03  1.20  
Acapulco                 Hard      10.5%  0.83  0.67  0.98  
Indian Wells Masters     Hard       8.2%  0.83  0.99  0.90  
Stockholm                Hard       7.6%  0.82  1.13  1.15  
Rio de Janeiro           Clay       7.4%  0.81  0.80  0.77  
Estoril                  Clay       7.4%  0.80  0.63  0.62  
Nice                     Clay       6.3%  0.79  0.64  0.74  
Geneva                   Clay       8.3%  0.77  0.78        
Umag                     Clay       5.4%  0.77  0.67  0.76  
Roland Garros            Clay       7.6%  0.77  0.72  0.71  
Rome Masters             Clay       7.2%  0.76  0.94  0.74  
Bucharest                Clay       5.9%  0.71  0.59  0.51  
Munich                   Clay       6.3%  0.71  1.01  0.87  
Monte Carlo Masters      Clay       6.2%  0.70  0.63  0.64  
Istanbul                 Clay       5.7%  0.67  0.83        
Barcelona                Clay       5.4%  0.65  0.70  0.72  
Bastad                   Clay       5.3%  0.65  0.64  1.07  
Hamburg                  Clay       5.7%  0.60  0.62  0.79

As always, we have an interesting mix of usual suspects and surprises. The top of the list is primarily indoor hard and grass courts, along with the high-altitude clay in Quito and Gstaad. However, in both of the latter cases, those tournaments had lower-than-expected ace rates in 2015. The surface ratings for 250s are particularly volatile because, in addition to the small number of matches, many of these matches must be discarded because one or both of the players didn’t meet our minimums. For the 2015 Quito event, we have only 11 matches to work with.

The sample size problem doesn’t apply to larger events, however, so we can have a fair amount of confidence in the ratings for the Australian Open, showing up here as the fastest of the Grand Slams–considerably faster than Wimbledon, which is only a few ticks above neutral.

Ace ratings and Court Pace Index

Last month, TennisTV released some data on court speed for this season’s Masters events. Court Pace Index (CPI) is a commonly-accepted measure of the speed of the surface itself–that is, the physical makeup of the court. As I’ve said, that’s far from the only factor affecting how a court plays, but it is an important one.

[Image: TennisTV graphic showing Court Pace Index for this season’s Masters events]

Here’s how my surface ratings compare to CPI:

Tournament            Surface  TA Rating   CPI  
Cincinnati Masters       Hard       1.18  35.1  
Paris Masters            Hard       1.18  39.1  
Canada Masters           Hard       1.16  35.2  
Shanghai Masters         Hard       1.10  44.1  
Tour Finals              Hard       1.06  40.6  
Miami Masters            Hard       1.00  33.1  
Madrid Masters           Clay       0.86  22.5  
Indian Wells Masters     Hard       0.83  30.0  
Rome Masters             Clay       0.76  24.0  
Monte Carlo Masters      Clay       0.70  23.7

It’s noteworthy that Madrid is, by my measure, the most ace-friendly of the three clay-court Masters, while its CPI is the lowest. Altitude could account for the difference.

The biggest mismatch, though, is the Tour Finals. The O2 Arena has one of the highest CPIs, but it doesn’t rate very far above average in aces. The Tour Finals has always been a bit problematic, as there is an unusually small number of matches, and the level of returning is very, very high. My algorithm takes into account how well each player prevents aces, but perhaps that issue is more complex when our view is limited to only the very best players.
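One rough way to summarize how closely the two measures in the table track each other is a correlation coefficient. This sketch computes a plain Pearson correlation from the ten rows above; it is an illustration of the comparison, not part of the rating algorithm, and the helper name `pearson` is my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# (event, ace-based rating, CPI), copied from the table above.
events = [
    ("Cincinnati Masters", 1.18, 35.1),
    ("Paris Masters", 1.18, 39.1),
    ("Canada Masters", 1.16, 35.2),
    ("Shanghai Masters", 1.10, 44.1),
    ("Tour Finals", 1.06, 40.6),
    ("Miami Masters", 1.00, 33.1),
    ("Madrid Masters", 0.86, 22.5),
    ("Indian Wells Masters", 0.83, 30.0),
    ("Rome Masters", 0.76, 24.0),
    ("Monte Carlo Masters", 0.70, 23.7),
]

ratings = [rating for _, rating, _ in events]
cpis = [cpi for _, _, cpi in events]
r = pearson(ratings, cpis)
print(round(r, 2))
```

With only ten events, the coefficient is a loose summary at best. It comes out strongly positive, which fits the picture above: the two measures broadly agree, with the mismatches confined to a few events.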

TennisTV also showed CPI for the last several years of Tour Finals:

[Image: TennisTV graphic showing Court Pace Index at the Tour Finals over the last several years]

Compared to my ratings:

Year  TA Rating   CPI  
2016       1.06  40.6  
2015       0.99  34.0  
2014       0.89  33.6  
2013       0.90  32.8  
2012       1.18  33.9

If the table cut off after 2013, it would look like a relatively good fit. As it is, the relationship between CPI and my rating for 2012 wouldn’t be out of place in the previous table, which included a 35.1 CPI for Cincinnati to go with an ace-based rating of 1.18.

I hope that this is a sign of more data to come. If so, we can move beyond approximations based on ace rate to get a better sense of what factors influence play at the ATP level. More data won’t settle the age-old surface speed debates, but it will make them a whole lot more interesting.