April 2018 - Heavy Topspin

Podcast Episode 23: Rafa, Las Undecimas, and The Shot Clock

Episode 23 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, starts with the greatness of Rafael Nadal, moves on to a broader discussion of the rest of the men’s clay court season and some talk about return aggression, and concludes with our thoughts on the shot clock, which will be implemented at this year’s US Open.

Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: An index for this episode, thanks to FBITennis:

Nadal’s Undecima (Barcelona)	1:05
Alexander Zverev on Clay	9:12
Del Potro’s Chances vs Rafa on Clay	16:50
Return Aggressiveness	18:44
Tennys Sandgren	25:25
Adrian Mannarino	32:50
Shot Clock at the U.S. Open	38:43
Sidebar: Longest Rally in Pro Tennis (w/Time Violation)	41:40
Shot Clock (Reprise)	48:57
Time between First and Second Serves	52:38

The Most Aggressive ATP Returners

In yesterday’s post, I outlined a new method to measure return aggression. Using Aggression Score (AS) as a starting point, I made some adjustments in order to treat return winners (and induced forced errors) and return errors separately. The resulting metric–Return Aggression Score (RAS)–gives equal weight to return winners and return errors. A positive RAS represents an aggressive return game, while a negative number indicates a more conservative one. The most aggressive single-match performances were nearly four standard deviations above the mean, while player averages varied between about one standard deviation above and below the mean.

We can now point the algorithm at the ATP, and calculate RAS for each player in the 1,500 or so 2010-present men’s matches logged by the Match Charting Project.

The difference between the frequency of return errors and return winners is even greater for men than it is for women. The WTA tour averages, as we saw yesterday, are 17.8% and 5.5%, respectively, and the men’s averages are 20.9% and 4.1%. Thus, treating the two categories separately is even more important when analyzing ATP matches.

The overall range in single-match RAS figures is about the same as it is for women. The most aggressive one-match returners are nearly four standard deviations above the mean (a RAS mark near 4.0), while the lowest are almost two standard deviations below (RAS marks near -2.0). What differs between genders is that the most aggressive men’s single-match performances are not clustered around one player, the way Serena Williams dominates the women’s list. Of the top ten one-match men’s RAS marks, only one player appears twice, and that is partly an accident:

Year  Event         Returner      Opponent   RAS  
2015  Halle         Berdych       Karlovic  3.96  
2014  Halle         D Brown       Nadal     3.72  
2016  Stuttgart     Marchenko     Groth     3.49  
2014  Aus Open      Dolgopolov    Berankis  2.99  
2016  Dallas CH     Tiafoe        Groth     2.91  
2014  Bogota        J Wang        Karlovic  2.79  
2015  Fairfield CH  Tiafoe        D Brown   2.72  
2017  Montpellier   De Schepper   M Zverev  2.64  
2015  Madrid        Isner         Kyrgios   2.60  
2014  Halle         An Kuznetsov  D Brown   2.58

Two factors make it more likely a returner appears on this list: His opponent, and the surface. Facing a serve-and-volleyer means adopting a higher-risk return strategy, and playing on a faster surface has a similar effect. Four of the top ten matches here were played on grass, and seven of the ten returners faced opponents who often come in behind their serves. Frances Tiafoe is partly responsible for his double-appearance here, but I suspect it has more to do with his opponents.

Grass is, by far, the most extreme surface in its effect on return tactics. Here are the numbers for each court type, along with the RAS of the average match on that surface:

Surface  RetE%  RetW%    RAS  
Hard     21.4%   4.1%   0.04  
Grass    25.3%   5.6%   0.54  
Clay     18.5%   3.5%  -0.24  
Average  20.9%   4.1%   0.00

Even though the average clay court match isn’t as extreme as a grass court match in this regard, the ten least aggressive single-match return performances all took place on clay, five of them recorded by Rafael Nadal.

Player averages

The Match Charting Project has at least 10 matches (2010-present) for about 75 players. Here is the top quintile, the 15 most aggressive players of that group:

Player                 Matches  RetPts   RAS  
Dustin Brown                11     676  1.90  
Ivo Karlovic                16    1116  0.85  
John Isner                  30    2202  0.77  
Alexandr Dolgopolov         20    1417  0.76  
Philipp Kohlschreiber       18    1334  0.69  
Lukas Rosol                 11     841  0.67  
Vasek Pospisil              14     812  0.62  
Andrey Kuznetsov            11     585  0.54  
Benoit Paire                17    1198  0.54  
Jeremy Chardy               14     923  0.39  
Kevin Anderson              23    1681  0.39  
Kei Nishikori               47    3128  0.38  
Milos Raonic                42    3211  0.34  
Sam Querrey                 17    1219  0.31  
Fernando Verdasco           17    1109  0.30

There’s aggression, and then there’s Dustin Brown. No other player is one full standard deviation above average, and he is nearly two, more than twice as aggressive as the next-most tactically extreme ATPer.

We don’t see quite the same extremes in the other direction, just a bunch of clay-courters:

Player                  Matches  RetPts    RAS  
Jiri Vesely                  11     716  -0.76  
Marcel Granollers            12     746  -0.64  
Paolo Lorenzi                13     912  -0.58  
Inigo Cervantes Huegun       10     705  -0.58  
Tommy Robredo                10     622  -0.57  
Damir Dzumhur                11     688  -0.56  
Guido Pella                  11     749  -0.51  
Guillermo Garcia Lopez       10     734  -0.49  
Casper Ruud                  16    1000  -0.48  
Hyeon Chung                  10     621  -0.48  
Rafael Nadal                157   11773  -0.42  
Richard Gasquet              36    2180  -0.42  
Roberto Bautista Agut        25    1633  -0.42  
Diego Schwartzman            44    3289  -0.42  
Juan Martin Del Potro        42    2900  -0.40

These least-aggressive numbers are partly a reflection of playing styles, and partly the surface, as we’ve already seen.

Next, let’s look at how much players alter their style to the circumstances. Here are 16 players–top guys along with some others I found interesting–along with their average RAS numbers on the three major surfaces:

Player                   RAS   Hard   Clay  Grass  
John Isner              0.77   0.71   1.03   0.72  
Marin Cilic             0.28   0.09   0.02   1.38  
Jo Wilfried Tsonga      0.24   0.31  -0.22   0.38  
Gilles Muller           0.10   0.07  -0.74   1.13  
Roger Federer           0.08   0.04  -0.07   0.40  
Grigor Dimitrov         0.07   0.12  -0.30   0.28  
Novak Djokovic          0.02   0.03  -0.12   0.25  
Nick Kyrgios            0.02  -0.06   0.07   1.20  
Jack Sock              -0.08  -0.09   0.08         
Stanislas Wawrinka     -0.09  -0.11  -0.23   0.95  
Alexander Zverev       -0.13  -0.06  -0.33   0.18  
Andy Murray            -0.20  -0.25  -0.32   0.15  
Dominic Thiem          -0.24  -0.13  -0.40   0.25  
Juan Martin Del Potro  -0.40  -0.43  -0.58  -0.07  
Diego Schwartzman      -0.42  -0.34  -0.45         
Rafael Nadal           -0.42  -0.25  -0.76   0.57

The big servers have some surprises in store: John Isner is more aggressive on the return on clay than on other surfaces, and Jack Sock and Nick Kyrgios show the same, at least compared to hard courts. Marin Cilic is extremely aggressive on the grass court return, but his clay court tactics are similar to those on hard courts. In stark contrast is Gilles Muller, second only to Nadal as a conservative returner on clay, but close to average on hard courts and quite aggressive on grass.

One of the many underexplored topics in tennis analytics is the different ways players change (or choose not to change) their tactics on different surfaces. While comparing Return Aggression Score by surface is a tiny step in that direction, it does suggest just how much those strategies vary.

As always, a reminder that analyses like these are only possible with the volunteer-generated shot-by-shot logs of the Match Charting Project. I hope you’ll contribute.

Measuring Return Aggression

In the last couple of years, I’ve gotten a lot of mileage out of a metric called Aggression Score (AS), first outlined by Lowell West. The stat is useful because of its simplicity. The more aggressive a player is, the more she’ll rack up both winners and unforced errors. AS, then, is essentially the rate at which a player hits winners and unforced errors.

Yet one limitation lies in Aggression Score’s simplicity. It works best when winners and unforced errors move together, and when they are roughly similar. If someone is having a really bad day, her unforced errors might skyrocket, resulting in a higher AS, even if the root cause of the errors is poor play, not aggression. On the flip side, a locked-in player will see her AS increase by hitting more winners, even if those winners are more a reflection of good form than a high-risk tactic.

I’ve long wanted to extend the idea behind Aggression Score to return tactics, but when we narrow our view to the second shot of the rally, the simplicity of the metric becomes a handicap. On the return, the vast majority of “aggressive” shots are errors, so the results will be swamped by error rate, minimizing the role of return winners, which are a more reliable indicator. Using Match Charting Project data from 2010-present women’s tennis, returns result in errors 18% of the time, while they turn into winners (or they induce forced errors) less than one-third as often, 5.5% of the time. The appealingly simple Aggression Score formula, narrowed to consider only returns of serve, won’t do the job here.

Return aggression score

Let’s walk through a formula to measure return aggression, using last month’s Miami final between Sloane Stephens and Jelena Ostapenko as an example. Tallying up return points (excluding aces and service winners), along with return errors* and return winners** for both players from the match chart, we get the following:

Returner          RetPts  RetErr  RetWin  RetE%  RetW%  
Sloane Stephens       64       9       1  14.1%   1.6%  
Jelena Ostapenko      63      11       6  17.5%   9.5%

* “errors” are a combination of forced and unforced, because most return errors are scored as forced errors, and because the distinction between the two is so unreliable as to be meaningless. Some forced error returns are nearly impossible to make, so they don’t really belong in this analysis, but with the state of available data, it’ll have to do.

** throughout this post, I’ll use “winners” as short-hand for “winners plus induced forced errors” — that is, shots that were good enough to end the point.

These numbers make clear which of the two players is the aggressive one, and they confirm the obvious: Ostapenko plays much higher-risk tennis than Stephens does. In this case, Ostapenko’s rates are nearly equal to or above the tour averages of 17.8% and 5.5%, while both of Stephens’s are well below them.

The next step is to normalize the error and winner rates so that we can more easily see how they relate to each other. To do that, I simply divide each number by the tour average:

Returner          RetE%  RetW%  RetE+  RetW+  
Sloane Stephens   14.1%   1.6%   0.79   0.28  
Jelena Ostapenko  17.5%   9.5%   0.98   1.73

The last two columns show the normalized figures, which reflect how each rate compares to tour average, where 1.0 is average, greater than 1 means more aggressive, and less than 1 means less aggressive.

We’re not quite done yet, because, as Ostapenko and Stephens illustrate, return winner rates are much noisier than return error rates. That’s largely a function of how few there are. The gap between the two players’ normalized rates, 0.28 and 1.73, looks huge, but represents a difference of only five winners. If we leave return winner rates untouched, we’ll end up with a metric that varies largely due to movement in winner rates–the opposite problem from where we started.

To put winners and errors on a more equal footing, we can express both in terms of standard deviations. The standard deviation of the adjusted error ratio is 0.404, while the standard deviation of the adjusted winner ratio is 0.768, so when we subtract the average of 1.0 from each ratio and divide the result by the standard deviation, we’re essentially reducing the variance in the winner number by half. The resulting numbers tell us how many standard deviations a certain statistic is above or below the mean, and these final results give us winner and error rates that are finally comparable to each other:

Returner          RetE+  RetW+  RetE-SD  RetW-SD  
Sloane Stephens    0.79   0.28    -0.52    -0.93  
Jelena Ostapenko   0.98   1.73    -0.05     0.95

(Math-oriented readers might notice that the last two steps don’t need to be separate; we could just as easily think of these last two numbers as standard deviations above or below the mean of the original winner and error rates. I included the intermediate step to–I hope–make the process a bit more intuitive.)
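For readers who want that equivalence written out, here is a sketch in notation that does not appear in the original post. It assumes the tour average is also the mean of the match-level rates, so that the normalized ratio has a mean of 1.0:

```latex
% r        : a player's raw return error (or return winner) rate in one match
% \mu      : tour-average rate (17.8% for errors, 5.5% for winners)
% \sigma_+ : standard deviation of the normalized ratio r/\mu (0.404 or 0.768)
z \;=\; \frac{r/\mu - 1}{\sigma_+} \;=\; \frac{r - \mu}{\mu\,\sigma_+}
```

Dividing every rate by the constant \mu scales the standard deviation by the same constant, so \mu\sigma_+ is the standard deviation of the raw rate and the right-hand side is an ordinary z-score. For Stephens’s error rate, (0.79 - 1) / 0.404 gives the -0.52 shown in the table.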

Our final stat, Return Aggression Score (RAS) is simply the average of those two rates measured in standard deviations:

Returner          RetE-SD  RetW-SD    RAS  
Sloane Stephens     -0.52    -0.93  -0.73  
Jelena Ostapenko    -0.05     0.95   0.45
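The whole calculation fits in a few lines. The following is a minimal sketch, not the author's code: the function and constant names are made up for illustration, and the constants are the WTA tour averages and standard deviations quoted above, so the numbers would need to be swapped for any other dataset.

```python
# WTA figures quoted in this post (Match Charting Project, 2010-present).
TOUR_RET_ERR_RATE = 0.178  # tour-average return error rate
TOUR_RET_WIN_RATE = 0.055  # tour-average return winner rate (incl. induced forced errors)
SD_ERR_RATIO = 0.404       # standard deviation of the normalized error ratio
SD_WIN_RATIO = 0.768       # standard deviation of the normalized winner ratio


def return_aggression_score(ret_pts, ret_err, ret_win):
    """Return (RetE-SD, RetW-SD, RAS) for one player-match."""
    # Step 1: normalize each rate by the tour average (RetE+, RetW+).
    err_ratio = (ret_err / ret_pts) / TOUR_RET_ERR_RATE
    win_ratio = (ret_win / ret_pts) / TOUR_RET_WIN_RATE
    # Step 2: express each ratio as standard deviations from the mean of 1.0.
    err_sd = (err_ratio - 1.0) / SD_ERR_RATIO
    win_sd = (win_ratio - 1.0) / SD_WIN_RATIO
    # Step 3: RAS is the simple average of the two.
    return err_sd, win_sd, (err_sd + win_sd) / 2.0


# The Miami final example: (return points, return errors, return winners).
for name, pts, err, win in [("Sloane Stephens", 64, 9, 1),
                            ("Jelena Ostapenko", 63, 11, 6)]:
    e, w, ras = return_aggression_score(pts, err, win)
    print(f"{name:<18}{e:7.2f}{w:7.2f}{ras:7.2f}")
```

Run on the Miami final counts, this reproduces the table: -0.52, -0.93 and -0.73 for Stephens, and -0.05, 0.95 and 0.45 for Ostapenko.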

Positive numbers represent more aggression than tour average; negative numbers less aggression. Ostapenko’s +0.45 figure is higher than about 75% of player-matches among the nearly 4,000 in the Match Charting Project dataset, though as we’ll see, it is far more conservative than her typical strategy. Stephens’s -0.73 mark is at the opposite position on the spectrum, higher than only one-quarter of player-matches. It is also lower than her own average, though it is higher than the -0.97 RAS she posted in the US Open final last fall.

The extremes

The first test of any new metric is whether the results actually make sense, and we need look no further than the top ten most aggressive player-matches for confirmation. Five of the top ten most aggressive single-match return performances belong to Serena Williams, and the overall most aggressive match is Serena’s 2013 Roland Garros semifinal against Sara Errani, which rates at 3.63–well over three standard deviations above the mean. The other players represented in the top ten are Ostapenko, Oceane Dodin, Petra Kvitova, Madison Keys, and Julia Goerges–a who’s who of high-risk returning in women’s tennis.

The opposite end of the spectrum includes another group of predictable names, such as Simona Halep, Agnieszka Radwanska, Caroline Wozniacki, Annika Beck, and Errani. Two of Halep’s early matches are lowest and third-lowest, including the 2012 Brussels final against Radwanska, in which her return aggression was 1.6 standard deviations below the mean. It’s not as extreme a mark as Serena’s performances, but that’s the nature of the metric: Halep returned 46 of 48 non-ace serves, and none of the 46 returns went for winners. It’s tough to be less aggressive than that.

The leaderboard

The Match Charting Project has shot-by-shot data on at least ten matches each for over 100 WTA players. Of those, here are the top ten, as ranked by RAS:

Player                    Matches  RetPts   RAS  
Oceane Dodin                   11     665  1.18  
Aryna Sabalenka                11     816  1.12  
Camila Giorgi                  19    1155  1.07  
Mirjana Lucic                  11     707  1.05  
Julia Goerges                  27    1715  0.94  
Petra Kvitova                  65    4142  0.90  
Serena Williams                91    5593  0.90  
Jelena Ostapenko               35    2522  0.88  
Anastasia Pavlyuchenkova       21    1180  0.78  
Lucie Safarova                 34    2294  0.77

We’ve already seen some of these names, in our discussion of the highest single-match marks. When we average across contests, a few more players turn up with RAS marks over one full standard deviation above the mean: Aryna Sabalenka, Camila Giorgi, and Mirjana Lucic-Baroni.

Again, the more conservative players don’t look as extreme: Only Madison Brengle has a RAS more than one standard deviation below the mean. I’ve included the top 20 on this list because so many notable names (Wozniacki, Radwanska, Kerber) are between 11 and 20:

Player                Matches  RetPts     RAS  
Madison Brengle            11     702   -1.06  
Monica Niculescu           32    2099   -0.93  
Stefanie Voegele           12     855   -0.85  
Annika Beck                16    1181   -0.78  
Lara Arruabarrena          10     627   -0.72  
Johanna Larsson            14     873   -0.65  
Barbora Strycova           20    1275   -0.63  
Sara Errani                25    1546   -0.60  
Carla Suarez Navarro       36    2585   -0.55  
Svetlana Kuznetsova        27    2271   -0.55  
Viktorija Golubic          16    1272   -0.53  
Agnieszka Radwanska        96    6239   -0.51  
Yulia Putintseva           22    1552   -0.51  
Caroline Wozniacki         80    5165   -0.50  
Christina McHale           11     763   -0.48  
Angelique Kerber           93    6611   -0.46  
Louisa Chirico             13     806   -0.44  
Darya Kasatkina            26    1586   -0.43  
Magdalena Rybarikova       12     725   -0.41  
Anastasija Sevastova       30    1952   -0.40

A few more notable names: Halep, Stephens and Elina Svitolina all count among the next ten lowest, with RAS figures between -0.30 and -0.36. The most “average” player among the game’s best is Victoria Azarenka, who rates at -0.08. Venus Williams, Johanna Konta, and Garbine Muguruza make up a notable group of aggressive-but-not-really-aggressive women between +0.15 and +0.20, just outside of the game’s top third, while Maria Sharapova, at +0.63, misses our first list by only a few places.

Unsurprisingly, these results track quite closely to overall Aggression Score figures, as players who adopt a high-risk strategy overall are probably doing the same when facing the serve. This metric, however, allows us to identify players–or even single matches–for which the two strategies don’t move in concert. Further, the approach I’ve taken here, to separate and normalize winners and errors, rather than treat them as an undifferentiated mass, could be applied to Aggression Score itself, or to other more targeted versions of the metric, such as a third-shot AS, or a backhand-specific AS.

As always, the more data we have, the more we can learn from it. Analyses like these are only possible with the work of the volunteers who have contributed to the Match Charting Project. Please help us continue to expand our coverage and give analysts the opportunity to look at shot-by-shot data, instead of just the basics published by tennis’s official federations.

Translating ATP Statistics Across Main Tour and Challenger Levels

Italian translation at settesei.it

What is the gap between the top-level ATP Tour and the lower-level ATP Challenger Tour? Some players pile up trophies in the minor leagues yet have a hard time converting that success to match wins on the big tour, while others struggle with the week-to-week grind of the challengers but excel when given opportunities on the larger stage.

Let’s take a look at a method that measures the difference between the skill level on the two tours. Once we can translate stats between levels, we can identify those players who are much better or worse than expected when they have the chance to compete against the best.

The algorithm I’ll use is almost identical to the one baseball analysts have used for decades to determine league equivalencies. For instance, we might find that a batting average of .300 in Triple-A (the highest minor league) is equivalent to .280 in the majors, meaning that, if a player is batting .300 in Triple-A, we’ll expect him to bat .280 in the majors. In tennis terms, it may be that a 10% ace rate in challengers is equivalent to an 8% ace rate on the main tour. Not every player will exhibit that precise drop in performance–some may even appear to get a little better–but on average, a league equivalency tells us what to expect when a player changes levels.

Here is the algorithm for league equivalencies, as applied to men’s tennis:

Pick a stat to focus on. I’ll use Total Points Won (TPW) here.
Neutralize that stat as much as possible. In baseball, that means controlling for the difference in parks; in tennis, it means controlling for competition. For the following, I’ve adjusted for each player’s quality of competition using a method I described about a year ago. Most players’ numbers are about the same after the adjustment, but a particularly easy or tough schedule means a bigger shift. For instance, Denis Shapovalov posted a TPW of 49.8% on the big tour last season, but because he played such high-quality competition, the adjustment bumps him up to 52.1%, 18th among tour regulars.
Identify players who competed at both levels, and find their adjusted stats at each level. Shapovalov played 18 tour-level matches and 30 challenger-level matches last year, with adjusted TPW numbers of 52.1% and 54.4%, respectively.
Calculate the ratio for each player. For Shapovalov last year, it was 1.044 (54.4 / 52.1).
Finally, take a weighted average of every player’s ratio. The weight is determined by the minimum number of matches played at either level, so for Shapovalov, it’s 18. Using the minimum means that a player like Gleb Sakharov (1 ATP match, 37 challenger matches) can be included in the calculation, but has very little effect on the end result.
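
The last two steps above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the author's implementation: the function name and tuple layout are invented, and while Shapovalov's figures come from the text, Sakharov's TPW values are hypothetical placeholders (only his match counts are given).

```python
def league_equivalency(players):
    """Weighted average of challenger-to-tour ratios for one stat.

    players: iterable of (tour_stat, challenger_stat, tour_matches, challenger_matches),
    where the stats are already adjusted for quality of competition.
    """
    weighted_sum = 0.0
    total_weight = 0
    for tour_stat, ch_stat, tour_matches, ch_matches in players:
        ratio = ch_stat / tour_stat              # per-player ratio
        weight = min(tour_matches, ch_matches)   # the smaller sample sets the weight
        weighted_sum += ratio * weight
        total_weight += weight
    return weighted_sum / total_weight


players = [
    (52.1, 54.4, 18, 30),  # Denis Shapovalov, 2017 (figures from the text)
    (45.0, 52.0, 1, 37),   # Gleb Sakharov: match counts from the text, TPW values hypothetical
]
print(round(league_equivalency(players[:1]), 3))  # Shapovalov alone: 1.044
print(round(league_equivalency(players), 3))      # barely moved by the 1-match player
```

Because the weight is the smaller of the two match counts, Sakharov's single tour-level match counts for 1 part in 19 here, which is why a lopsided sample can stay in the pool without distorting the result.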

Here are the results for the last six full seasons. Each ratio is the relationship between challenger-level TPW and tour-level TPW:

Year  Ratio  
2017  1.086  
2016  1.086  
2015  1.098  
2014  1.103  
2013  1.100  
2012  1.100

The average of these yearly equivalency factors is roughly the difference between a 52.5% TPW at challengers and a 48.0% TPW on the main tour. The shift from 2012-15 to 2016-17 may reflect the injuries that have sidelined the elites. With fewer elite players on court, the gap between the two tours narrows.
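
As a quick check on that arithmetic, here is a small sketch that translates a challenger-level TPW into its expected tour-level equivalent. It assumes a simple unweighted mean of the six yearly ratios, which may not be exactly how the figure above was derived, and the function name is illustrative.

```python
# Yearly challenger-to-tour TPW ratios from the table above.
YEARLY_RATIOS = {2017: 1.086, 2016: 1.086, 2015: 1.098,
                 2014: 1.103, 2013: 1.100, 2012: 1.100}

# Assumption: an unweighted mean across seasons.
AVG_RATIO = sum(YEARLY_RATIOS.values()) / len(YEARLY_RATIOS)


def challenger_to_tour(tpw_challenger, ratio=AVG_RATIO):
    """Expected tour-level TPW (in percent) for a given challenger-level TPW."""
    return tpw_challenger / ratio


print(round(AVG_RATIO, 4))                 # 1.0955
print(round(challenger_to_tour(52.5), 1))  # 47.9, roughly the 48.0% quoted above
```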

Now that we know the difference between the levels, we can find the players who defy the usual patterns. Of the 100 players with the most “paired” matches–that is, with the most matches at both levels in the same years–here are the 20 with the lowest ratios. Low ratios mean less difference in performance between the two levels, so these guys are either overperforming at tour level or underperforming at challengers:

Player              ATP M  CH M  Min M  Ratio  
Matthew Ebden          62   140     39  0.982  
Jared Donaldson        68    78     37  1.030  
Jack Sock              81    45     38  1.039  
James Duckworth        53   156     53  1.042  
Andrey Rublev          56    79     42  1.047  
Vasek Pospisil         96    76     60  1.047  
Thiemo De Bakker       48    87     44  1.048  
Samuel Groth           84   133     58  1.049  
Michael Berrer         59   107     56  1.050  
Ruben Bemelmans        41   178     41  1.052  
Dustin Brown          120   173    111  1.055  
Benoit Paire          295    53     53  1.059  
Peter Gojowczyk        46   132     44  1.059  
Michael Russell        58    78     58  1.061  
Marius Copil           58   180     58  1.063  
Taylor Harry Fritz     59    44     41  1.065  
Jordan Thompson        38    88     38  1.066  
Illya Marchenko        56   116     37  1.066  
Tatsuma Ito            65   179     65  1.066  
Ryan Harrison         124    84     59  1.068

The middle columns show the total number of ATP matches, challenger matches, and “paired” matches between 2012 and 2017 (“Min M”) for each player. (The last number gives an indication of just how much data was available for the single-player calculation.) Aside from a few big-serving North Americans near the top of this list, I don’t see many commonalities. There are some youngsters, some veterans, and more big servers than not, but no clear pattern.

(Shapovalov doesn’t have enough paired matches to qualify, but his overall ratio is 1.035, good for third on this list.)

Here is the opposite list, the quintile of 20 players who have overperformed at challengers or underperformed on tour:

Player               ATP M  CH M  Min M  Ratio  
Florian Mayer          152    45     45  1.180  
Mikhail Youzhny         91    38     38  1.169  
Aljaz Bedene           144   121     80  1.160  
Filippo Volandri        62   101     62  1.158  
Robin Haase            194    71     71  1.157  
Tobias Kamke           102   144     73  1.155  
Adrian Mannarino       234   115     86  1.155  
Filip Krajinovic        36   167     36  1.148  
Albert Ramos           111    67     62  1.144  
Paul Henri Mathieu     147    96     82  1.141  
Kenny De Schepper       77   196     77  1.140  
Facundo Bagnis          45   197     45  1.136  
Pablo Cuevas           127    52     43  1.136  
Ivan Dodig              76    48     41  1.135  
Santiago Giraldo       146    70     56  1.135  
Paolo Lorenzi          204   191    124  1.135  
Thomaz Bellucci        162    44     44  1.134  
Albert Montanes        113   109     70  1.130  
Rogerio Dutra Silva     57   210     57  1.130  
Lukas Lacko            122   181    108  1.129

There are more clay-courters here than on the first list, and the very top of the ranking includes veterans who have mastered the challenger level, even if they still struggle to maintain a foothold on the main tour. I’ve had to exclude one player who belongs on this list: Gilles Muller broke my algorithm with his 45-9 challenger season in 2014. When I took him out of the 2014 calculations, the overall numbers changed very little, but it means no Muller here. Whatever his exact ratio, I can say that his tour-level performance hasn’t matched that 2014 run at challengers.

The bottoms of the two lists indicate that there isn’t that much variation between players. The middle 60% of players all have ratios between about 1.07 and 1.13, while the yearly averages hover around 1.09 and 1.10. Some players under consideration here have fewer than 50 “paired” matches over the six seasons, so a difference of a couple hundredths is far too little to draw any conclusions.

Beyond suggesting what to expect from players when they move up from challengers to the main tour, this algorithm could be applied to other pairs of levels, such as ITF Futures and challengers, or women’s ITFs and the WTA tour. It could even compare narrower levels, such as ITF $10,000 events with ITF $15,000s, or ATP 250s with ATP 500s. The method is a staple of analytics in other sports, and it has a place in tennis as well.

Big Four Losing Streaks

Italian translation at settesei.it

This is a guest post by Peter Wetz.

Novak Djokovic’s loss to Benoit Paire in his first match at this year’s Miami Masters caused a lot of head scratching. Benoit evened his head-to-head record against Novak: alongside Hyeon Chung, he is now one of only two active players with a balanced record against the Serbian, while four active players hold positive records. It was also Novak’s third consecutive loss.

Novak immediately made some changes, announcing the end of his partnership with his coach Andre Agassi and part-time coach Radek Stepanek after having worked with them just a few months.

A losing streak of this length by such a dominant player must be rare, and it prompted me to look for similar instances among the big four. The following table shows, in reverse chronological order, every losing streak of three or more matches that a member of the big four has suffered since cracking the top ten. The last column shows the Elo-based probability (Prob) of such a streak, which is simply the product of the probabilities of losing each match in the streak.

Player    Start       End    Length  Prob  
Djokovic  2018-01-15  -*          3  0.002% (0.027%**)  
Murray    2011-01-17  03-23       4  0.02%  
Murray    2010-03-11  04-11       3  0.63%  
Nadal     2009-11-08  11-22       4  1.89%  
Djokovic  2007-10-15  11-12       5  0.07%  
Federer   2002-07-08  08-19       4  0.66%

* Streak still active

** Probability when adjusting Elo ratings due to absence from the tour

The table shows that since August 2002 Roger Federer has never lost more than two matches in a row. Even his four-match losing streak is the second most likely on the list, owing to the strong competition he had to face. In November 2009 Rafael Nadal lost four matches in a row, but with a probability far higher than the other streaks. The reason is that three of the four matches occurred at the World Tour Finals, increasing the likelihood of a loss.

A number that stands out is the probability of Novak’s current streak: 0.002%. However, this number is based on traditional Elo ratings, which do not take into account player absence, for instance, due to injury. Before this season Novak took a six-month break because of an elbow injury.

As has already been discussed, there are ways to adjust Elo ratings for players returning to the tour. In the case of Maria Sharapova, who was absent for 15 months, a 200 point drop in her first five matches after the break was more in line with her level of play than simply assuming that she remained as competitive as before. For this analysis I used a drop of 150 rating points for Novak, which results in a more realistic streak probability of 0.027%, still the second lowest in the list.
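
To make the streak calculation concrete, here is a minimal sketch using the standard Elo expected-score formula with a 400-point scale. The ratings are hypothetical placeholders, not the ratings behind the table, and the sketch holds the favorite's rating fixed across the streak, whereas a real Elo rating would be updated after every match.

```python
def elo_win_prob(rating, opp_rating):
    """Standard Elo expected score for the player rated `rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def streak_prob(rating, opp_ratings, penalty=0.0):
    """Probability of losing every match: the product of the loss probabilities.

    `penalty` is subtracted from the player's rating to model an absence
    from the tour (150 points in the adjusted Djokovic figure above).
    """
    prob = 1.0
    for opp in opp_ratings:
        prob *= 1.0 - elo_win_prob(rating - penalty, opp)
    return prob


# Hypothetical ratings, for illustration only.
favorite = 2200
opponents = [1900, 1700, 1800]
print(f"unadjusted: {streak_prob(favorite, opponents):.4%}")
print(f"adjusted:   {streak_prob(favorite, opponents, penalty=150):.4%}")
```

Because each loss probability rises when the favorite's rating is reduced, the adjusted streak probability is always the larger of the two, which matches the direction of the 0.002% to 0.027% shift in the table.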

This brings us to Andy Murray’s losing streak of 2011, which most of us probably have already forgotten. After losing the Australian Open final to Novak, Andy lost against Marcos Baghdatis (#20) in Rotterdam, Donald Young (#143) in Indian Wells, and Alex Bogomolov (#118) in Miami. This looks very similar to Novak’s current situation, but Murray bounced back to achieve a 50-9 record for the remainder of the season. It remains to be seen whether Djokovic can do the same.

—

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Podcast Episode 22: Davis Cup, Present and Future

Episode 22 of the Tennis Abstract Podcast features a new voice for the show, Austrian tennis fan and occasional guest blogger Peter Wetz. Peter and I delve into the weekend’s Davis Cup action, with particular focus on Rafael Nadal’s return (and the potential threats to his dominance on clay) and the deep rosters of the France and USA squads.

We also consider the ITF’s proposal to radically revamp Davis Cup and discuss what kind of year-end festival of tennis might attract both players and fans. Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Here are some timestamps for the topics we covered in this episode, thanks to FBITennis:

Nadal’s return (Davis Cup) and 2018 clay court competition	1:45
Upcoming Davis Cup semifinals	13:50
Lefty advantage	26:30
Davis Cup: Austria vs Russia	36:10
Proposals to change Davis Cup	44:50

Houston and the Swarm of American Men

Italian translation at settesei.it

Of the 28 men in the ATP Houston main draw this week, 15 have a “USA” next to their names. The Americans include three of the top four seeds (John Isner, Sam Querrey, and Jack Sock), two of the four qualifiers (Stefan Kozlov and Denis Kudla) and one of the three wild cards (Mackenzie McDonald). The home-country dominance at the US Clay Court Championships hearkens back to earlier eras of professional tennis, when a few countries–the USA often first among them–dominated the ranks.

Those days are long gone, but this week’s turnout in Texas is the latest sign of an American resurgence. Sure, many top players are taking the week off, and plenty of European contenders opted for a similarly valuable event in Marrakech, so US players hardly represent half of the best ATPers. But 15 of 28–a main draw made up of such a high percentage of USAs–is something the tennis world hasn’t seen in a long time.

Going back five decades, there have been just over 400 ATP-level tournaments in which one country represented more than half of main draw entrants–an average of about eight events per year. The average is misleading, though: Houston is the first time it has happened since 2004, and there are only two previous instances in the last two decades. To find another draw so packed with Americans, we need to go back to 1996. Here are the last 20 tournaments in which one country represented more than half of main draw players:

Date      Tourney         Draw  Country  Count      %  
20040412  Valencia        32    ESP         20  62.5%  
19990913  Mallorca        32    ESP         18  56.3%  
19970908  Marbella        32    ESP         18  56.3%  
19960930  Marbella        32    ESP         18  56.3%  
19960212  San Jose        32    USA         17  53.1%  
19951002  Valencia        32    ESP         18  56.3%  
19950206  San Jose        32    USA         18  56.3%  
19940214  Philadelphia    32    USA         18  56.3%  
19940131  San Jose        32    USA         17  53.1%  
19930802  Los Angeles     32    USA         17  53.1%  
19930201  San Francisco   32    USA         19  59.4%  
19920803  Los Angeles     32    USA         17  53.1%  
19910708  Newport         32    USA         17  53.1%  
19910506  Charlotte       32    USA         17  53.1%  
19910401  Orlando         32    USA         20  62.5%  
19900730  Los Angeles     32    USA         19  59.4%  
19900507  Kiawah Island   32    USA         24  75.0%  
19900402  Orlando         32    USA         17  53.1%  
19900219  Philadelphia    48    USA         27  56.3%  
19900212  Toronto Indoor  56    USA         30  53.6%
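The filter behind this table is simple enough to sketch in a few lines of Python. This is only an illustration, not the code used for the post: the function name `majority_country` and the input format (one country code per main-draw entrant) are assumptions, and the non-American entries in the Houston example are placeholders.

```python
from collections import Counter

def majority_country(entrants):
    """Return (country, count, share) if one country holds strictly more
    than half of the main draw, otherwise None.

    `entrants` is a hypothetical input: one country code per player.
    """
    if not entrants:
        return None
    country, count = Counter(entrants).most_common(1)[0]
    # "More than half" is strict, so an exact 50/50 split does not qualify.
    if count * 2 > len(entrants):
        return country, count, count / len(entrants)
    return None

# Houston 2018: 15 Americans in a 28-player draw. The other 13 entrants
# get placeholder codes, since only the American count matters here.
houston = ["USA"] * 15 + [f"X{i}" for i in range(13)]
result = majority_country(houston)
print(result)
```

Run against every main draw in a tournament database, a check like this would produce the rows above, with Houston's 15 of 28 working out to 53.6%.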

The four most recent tournaments took place in three different places, but were instances of the same event. The rest of the draws on this list suggest just how many good tennis players were produced in that era by the United States. In about 85% of the tournaments in which one country made up more than half of the field, the dominant nation was the USA. Australia accounts for another 50, all at tournaments in Oz, most of them before 1980. The US is the only country to fill up more than half of a draw outside of its own borders.

What makes this week’s feat in Houston even more remarkable is that the tournament’s organizers gave only one of the three wild cards to a local player. (The other two went to 4th seed Nick Kyrgios, who didn’t bother to enter via conventional means, and fan favorite Dustin Brown.) In other words, Americans would have accounted for half of the draw even without the aid of wild cards.
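The arithmetic is easy to check. The sketch below assumes the stricter standard means "at least half of the draw once the country's wild cards are removed," which is my reading of the Houston case (14 of 28); the function name and parameters are hypothetical.

```python
def half_without_wild_cards(draw_size, country_count, country_wild_cards):
    """True if a country's non-wild-card entrants make up at least half
    of the draw. The 'at least half' threshold is an assumption."""
    non_wild_cards = country_count - country_wild_cards
    return non_wild_cards * 2 >= draw_size

# Houston 2018: 28-player draw, 15 Americans, one of them a wild card.
houston_ok = half_without_wild_cards(28, 15, 1)
print(houston_ok)
```

With one wild card removed, 14 Americans remain in a 28-player draw, exactly half.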

This more specialized feat–non-wild cards from one country accounting for half of the draw–is even rarer over the last 25 years or so. Of the 20 tournaments listed above, only nine met this more rigorous standard. The other 11 only cleared the bar with the aid of wild cards. The 2004 Valencia tournament still qualifies, but for the most recent instance on American soil, we need to go back more than 25 years, to the 1993 event in San Francisco. That tourney had good reason to retain at least one wild card for a foreigner, as the organizers managed to attract Bjorn Borg. Borg lost in the first round, and Andre Agassi took the championship with a final-round win over Brad Gilbert.

It remains to be seen whether the sheer force of numbers will be enough to keep the Houston title in American hands. (Steve Johnson, the sixth seed this week, won it last year.) Of the 400-plus events with more than half of players representing the same flag, the winner has come from that dominant country about 73% of the time. My model suggests it is a toss-up this week, with a 48.9% probability that a US player wins it all. One of the favorites, however, is Australian Nick Kyrgios, with nearly a 45% chance of winning himself. One dark horse is the most interesting of all: Fifth seed Fernando Verdasco won this event four years ago. And fourteen years ago in Valencia, the last time one country made up more than half of an ATP draw, Verdasco was the man who hoisted the trophy.

Feast, Famine, and Sloane Stephens

Italian translation at settesei.it

Last week, Sloane Stephens reeled off an impressive series of victories, defeating Garbine Muguruza, Angelique Kerber, Victoria Azarenka, and Jelena Ostapenko to secure the title at the WTA Premier Mandatory event in Miami. The trophy isn’t quite as life-changing as the one she claimed at the US Open last September, but it’s a close second, and the competition she faced along the way was every bit as good.

The Miami title comes with 1,000 WTA ranking points, and by adding those to her previous tally, Stephens moved into the top ten, reaching a career high No. 9 on Monday. With two high-profile championships to her name, not to mention semifinal showings last summer in Toronto and Cincinnati, there’s little doubt she deserves it. Elo isn’t quite convinced, but its more sophisticated algorithm (and its disregard for the magnitude of the US Open and Miami titles) puts her within spitting distance of the top ten as well.

What makes Stephens’s rise to the top ten so remarkable is her efficiency in converting wins to ranking points. Since her return from injury at Wimbledon last year, she has played only 38 matches, winning 24 of them. She has suffered six first-round losses, plus two more defeats at last year’s Zhuhai Elite Trophy round-robin and another pair in the Fed Cup final against Belarus. All told, in the last nine months, she has won matches at only six different events. Her unusual record illustrates some of the quirks in the ranking system, and how a player who peaks at the right times can exploit them.

24 wins is almost never enough for a spot in the vaunted top ten. From 1990 to 2017, a player has finished a season with a top-ten ranking only seven times while winning fewer than 30 matches. Only two of those involved fewer wins than Sloane’s 24: Monica Seles’s 1993 and 1995, the timespans leading up to her tragic on-court stabbing and following her eventual comeback. Here are the top-ten seasons with the fewest victories, including the last 52 weeks of a few players currently near the top of the WTA table:

Year  Player              YE Rk   W   L  W-L %  
1995  Monica Seles*           1  11   1    92%  
1993  Monica Seles            8  17   2    89%  
2018  Sloane Stephens**       9  24  14    63%  
2010  Serena Williams         4  25   4    86%  
1993  Jennifer Capriati       9  28  10    74%  
2015  Flavia Pennetta         8  28  20    58%  
2000  Mary Pierce             7  29  11    73%  
2004  Jennifer Capriati      10  29  12    71%  
1993  Mary Joe Fernandez      7  31  12    72%  
1995  Iva Majoli              9  31  13    70%  
2018  Venus Williams**        8  31  14    69%  
1995  Mary Joe Fernandez      8  31  15    67%  
2015  Lucie Safarova          9  32  21    60%  
2008  Maria Sharapova         9  33   6    85%  
1998  Steffi Graf             9  33   9    79%  
2018  Petra Kvitova**        10  33  14    70%

* ranking frozen after her assault

** rankings as of April 2, 2018; wins and losses based on previous 52 weeks
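For readers who want to check the W-L % column, here is a minimal sketch using four rows copied from the table. The tuple layout and the `win_pct` helper are illustrative assumptions, not the code behind the post.

```python
# (year, player, wins, losses), copied from the table above
seasons = [
    (2015, "Flavia Pennetta", 28, 20),
    (2018, "Sloane Stephens", 24, 14),
    (2010, "Serena Williams", 25, 4),
    (1995, "Monica Seles", 11, 1),
]

def win_pct(wins, losses):
    """Winning percentage on a 0-100 scale."""
    return 100 * wins / (wins + losses)

# Sort by number of wins, fewest first, as the table does.
fewest_wins = sorted(seasons, key=lambda s: s[2])

for year, player, wins, losses in fewest_wins:
    print(f"{year}  {player:<18} {wins:>3} {losses:>3}  {win_pct(wins, losses):.0f}%")
```

Stephens's 24-14 comes out to 24 / 38, or about 63%.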

What almost all of these seasons have in common is exceptional performances at grand slams. Sloane won the US Open; Seles won the 1993 Australian; Serena Williams won a pair of majors in 2010; Flavia Pennetta capped an otherwise anonymous 2015 campaign with a title in New York. The slams are where the ranking points are.

Even within this group of slam successes, Sloane stands out. Of the 16 players on that list, only two–Pennetta and Lucie Safarova–won matches at a lower rate than Stephens has since her comeback. In other words, most women who are this efficient with their victories don’t lose quite so early or often at lesser events.

That 63% won-loss record is even more extreme than the above list makes it look. Of the nearly 300 year-end top-tenners since 1990, only eight finished the season with a lower win rate. Here’s that list, expanded to the top 11 to include another noteworthy recent season:

Year  Player              YE Rk   W   L  W-L %  
2014  Dominika Cibulkova     10  33  24    58%  
2000  Nathalie Tauziat       10  36  26    58%  
2015  Flavia Pennetta         8  28  20    58%  
1999  Nathalie Tauziat        7  37  25    60%  
2007  Marion Bartoli         10  47  31    60%  
2015  Lucie Safarova          9  32  21    60%  
2000  Anna Kournikova         8  47  29    62%  
2010  Jelena Jankovic         8  38  23    62%  
2018  Sloane Stephens*        9  24  14    63%  
2004  Elena Dementieva        6  40  23    63%  
2016  Garbine Muguruza        7  35  20    64%

* ranking as of April 2, 2018; wins and losses based on previous 52 weeks

There’s not much overlap between these lists; the first group generally missed some time, then made up for it by scoring big at slams, while the second group slogged through a long season and leveled up with a strong finish or two at a major. The typical player with a 63% winning percentage doesn’t end up in the top ten: She wraps up the season, on average, in the mid-twenties. At least that’s better than the average 24-win season: Those result in year-end finishes near No. 40.

Stephens has always been a big-match player: She made an early splash at the 2013 Australian Open, reaching the semifinals and upsetting Serena as a 19-year-old, and her overall career record at majors (66%) is nearly ten percentage points higher than her record at other tour events (57%). For all that, she will probably not conclude 2018 with such an extreme set of won-loss numbers. To do so, she’d probably need to win a major to replace her 2017 US Open points while losing early at most other events. Recovered from injury, Stephens may maintain her feast-or-famine ways to some degree, but it’s unlikely she’ll continue to display such extreme peaks and valleys.