Measuring Return Aggression

In the last couple of years, I’ve gotten a lot of mileage out of a metric called Aggression Score (AS), first outlined by Lowell West. The stat is useful because it is simple: the more aggressive a player is, the more she’ll rack up both winners and unforced errors. AS, then, is essentially the rate at which a player hits winners and unforced errors.
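A minimal sketch of that definition, for illustration only. The function name and the choice of denominator (total shots hit) are assumptions; the original formulation may count shot opportunities differently.

```python
def aggression_score(winners: int, unforced_errors: int, shots: int) -> float:
    """Rate at which a player's shots end the point as a winner or an unforced error.

    `shots` is assumed to be the number of shots the player hit; this
    denominator is an illustrative choice, not taken from the original metric.
    """
    if shots <= 0:
        raise ValueError("shots must be positive")
    return (winners + unforced_errors) / shots


# Hypothetical match totals, not real data.
print(f"{aggression_score(winners=20, unforced_errors=30, shots=250):.1%}")
```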

Yet one limitation lies in Aggression Score’s simplicity. It works best when winners and unforced errors move together, and when the two counts are of roughly similar size. If someone is having a really bad day, her unforced errors might skyrocket, resulting in a higher AS, even if the root cause of the errors is poor play, not aggression. On the flip side, a locked-in player will see her AS increase by hitting more winners, even if those winners are more a reflection of good form than a high-risk tactic.

I’ve long wanted to extend the idea behind Aggression Score to return tactics, but when we narrow our view to the second shot of the rally, the simplicity of the metric becomes a handicap. On the return, the vast majority of “aggressive” shots are errors, so the results will be swamped by error rate, minimizing the role of return winners, which are a more reliable indicator. In Match Charting Project data on women’s tennis from 2010 to the present, returns result in errors 18% of the time, while they turn into winners (or induce forced errors) less than one-third as often, 5.5% of the time. The appealingly simple Aggression Score formula, narrowed to consider only returns of serve, won’t do the job here.

Return aggression score

Let’s walk through a formula to measure return aggression, using last month’s Miami final between Sloane Stephens and Jelena Ostapenko as an example. Tallying up return points (excluding aces and service winners), along with return errors* and return winners** for both players from the match chart, we get the following:

Returner          RetPts  RetErr  RetWin  RetE%  RetW%  
Sloane Stephens       64       9       1  14.1%   1.6%  
Jelena Ostapenko      63      11       6  17.5%   9.5%

* “errors” are a combination of forced and unforced, because most return errors are scored as forced errors, and because the distinction between the two is so unreliable as to be meaningless. Some forced error returns are nearly impossible to make, so they don’t really belong in this analysis, but with the state of available data, it’ll have to do.

** throughout this post, I’ll use “winners” as short-hand for “winners plus induced forced errors” — that is, shots that were good enough to end the point.

These numbers make clear which of the two players is the aggressive one, and they confirm the obvious: Ostapenko plays much higher-risk tennis than Stephens does. In this case, Ostapenko’s rates are nearly equal to or above the tour averages of 17.8% and 5.5%, while both of Stephens’s are well below them.

The next step is to normalize the error and winner rates so that we can more easily see how they relate to each other. To do that, I simply divide each number by the tour average:

Returner          RetE%  RetW%  RetE+  RetW+  
Sloane Stephens   14.1%   1.6%   0.79   0.28  
Jelena Ostapenko  17.5%   9.5%   0.98   1.73

The last two columns show the normalized figures, which reflect how each rate compares to tour average, where 1.0 is average, greater than 1 means more aggressive, and less than 1 means less aggressive.
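The two steps so far can be sketched in a few lines of Python. The counts and tour averages come from the tables and text above; the function and constant names are made up for this example.

```python
# Tour-average return error and winner rates quoted in the text.
TOUR_AVG_ERR = 0.178
TOUR_AVG_WIN = 0.055


def return_rates(ret_pts: int, ret_err: int, ret_win: int) -> tuple[float, float]:
    """Return (error rate, winner rate) per return point."""
    return ret_err / ret_pts, ret_win / ret_pts


def normalize(rate: float, tour_avg: float) -> float:
    """Express a rate relative to tour average, where 1.0 is average."""
    return rate / tour_avg


# (RetPts, RetErr, RetWin) from the Miami final match chart.
miami_final = {
    "Sloane Stephens": (64, 9, 1),
    "Jelena Ostapenko": (63, 11, 6),
}

for name, (pts, err, win) in miami_final.items():
    err_rate, win_rate = return_rates(pts, err, win)
    print(
        f"{name:18s} {err_rate:6.1%} {win_rate:6.1%} "
        f"{normalize(err_rate, TOUR_AVG_ERR):5.2f} {normalize(win_rate, TOUR_AVG_WIN):5.2f}"
    )
```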

We’re not quite done yet, because, as Ostapenko and Stephens illustrate, return winner rates are much noisier than return error rates. That’s largely a function of how few there are. The gap between the two players’ normalized rates, 0.28 and 1.73, looks huge, but represents a difference of only five winners. If we leave return winner rates untouched, we’ll end up with a metric that varies largely due to movement in winner rates–the opposite problem from where we started.

To put winners and errors on a more equal footing, we can express both in terms of standard deviations. The standard deviation of the adjusted error ratio is 0.404, while the standard deviation of the adjusted winner ratio is 0.768, so when we subtract the tour average of 1.0 from each ratio and divide by the corresponding standard deviation, we’re essentially reducing the variance in the winner number by half. For Stephens’s errors, that is (0.79 - 1.0) / 0.404 = -0.52. The resulting numbers tell us how many standard deviations a certain statistic is above or below the mean, and these final results give us winner and error rates that are finally comparable to each other:

Returner          RetE+  RetW+  RetE-SD  RetW-SD  
Sloane Stephens    0.79   0.28    -0.52    -0.93  
Jelena Ostapenko   0.98   1.73    -0.05     0.95

(Math-oriented readers might notice that the last two steps don’t need to be separate; we could just as easily think of these last two numbers as standard deviations above or below the mean of the original winner and error rates. I included the intermediate step to–I hope–make the process a bit more intuitive.)

Our final stat, Return Aggression Score (RAS) is simply the average of those two rates measured in standard deviations:

Returner          RetE-SD  RetW-SD    RAS  
Sloane Stephens     -0.52    -0.93  -0.73  
Jelena Ostapenko    -0.05     0.95   0.45

Positive numbers represent more aggression than tour average; negative numbers less aggression. Ostapenko’s +0.45 figure is higher than about 75% of player-matches among the nearly 4,000 in the Match Charting Project dataset, though as we’ll see, it is far more conservative than her typical strategy. Stephens’s -0.73 mark is at the opposite position on the spectrum, higher than only one-quarter of player-matches. It is also lower than her own average, though it is higher than the -0.97 RAS she posted in the US Open final last fall.
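Putting the whole calculation together, here is a sketch of RAS as a single function. It assumes the tour averages and standard deviations quoted above, which are rounded, so results computed this way could differ in the last digit from figures based on unrounded inputs. The names are illustrative.

```python
# Tour averages and standard deviations of the normalized ratios, as quoted in the text.
TOUR_AVG_ERR = 0.178
TOUR_AVG_WIN = 0.055
SD_ERR_RATIO = 0.404
SD_WIN_RATIO = 0.768


def return_aggression_score(ret_pts: int, ret_err: int, ret_win: int) -> float:
    """Average of the error and winner rates, each in standard deviations from tour average."""
    if ret_pts <= 0:
        raise ValueError("ret_pts must be positive")
    # Step 1: rates per return point, divided by tour average (1.0 = average).
    err_plus = (ret_err / ret_pts) / TOUR_AVG_ERR
    win_plus = (ret_win / ret_pts) / TOUR_AVG_WIN
    # Step 2: distance from the average of 1.0, in standard deviations.
    err_sd = (err_plus - 1.0) / SD_ERR_RATIO
    win_sd = (win_plus - 1.0) / SD_WIN_RATIO
    # Step 3: simple average of the two.
    return (err_sd + win_sd) / 2


print(round(return_aggression_score(64, 9, 1), 2))   # Stephens, Miami final
print(round(return_aggression_score(63, 11, 6), 2))  # Ostapenko, Miami final
```

With the match-chart counts from the Miami final, this reproduces the -0.73 and 0.45 shown in the table to two decimal places.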

The extremes

The first test of any new metric is whether the results actually make sense, and we need look no further than the top ten most aggressive player-matches for confirmation. Five of the top ten most aggressive single-match return performances belong to Serena Williams, and the overall most aggressive match is Serena’s 2013 Roland Garros semifinal against Sara Errani, which rates at 3.63–well over three standard deviations above the mean. The other players represented in the top ten are Ostapenko, Oceane Dodin, Petra Kvitova, Madison Keys, and Julia Goerges–a who’s who of high-risk returning in women’s tennis.

The opposite end of the spectrum includes another group of predictable names, such as Simona Halep, Agnieszka Radwanska, Caroline Wozniacki, Annika Beck, and Errani. Two of Halep’s early matches are lowest and third-lowest, including the 2012 Brussels final against Radwanska, in which her return aggression was 1.6 standard deviations below the mean. It’s not as extreme a mark as Serena’s performances, but that’s the nature of the metric: Halep returned 46 of 48 non-ace serves, and none of the 46 returns went for winners. It’s tough to be less aggressive than that.

The leaderboard

The Match Charting Project has shot-by-shot data on at least ten matches each for over 100 WTA players. Of those, here are the top ten, as ranked by RAS:

Player                    Matches  RetPts   RAS  
Oceane Dodin                   11     665  1.18  
Aryna Sabalenka                11     816  1.12  
Camila Giorgi                  19    1155  1.07  
Mirjana Lucic                  11     707  1.05  
Julia Goerges                  27    1715  0.94  
Petra Kvitova                  65    4142  0.90  
Serena Williams                91    5593  0.90  
Jelena Ostapenko               35    2522  0.88  
Anastasia Pavlyuchenkova       21    1180  0.78  
Lucie Safarova                 34    2294  0.77

We’ve already seen some of these names, in our discussion of the highest single-match marks. When we average across contests, a few more players turn up with RAS marks over one full standard deviation above the mean: Aryna Sabalenka, Camila Giorgi, and Mirjana Lucic-Baroni.

Again, the more conservative players don’t look as extreme: Only Madison Brengle has a RAS more than one standard deviation below the mean. I’ve included the 20 lowest marks on this list because so many notable names (Wozniacki, Radwanska, Kerber) are between 11 and 20:

Player                Matches  RetPts     RAS  
Madison Brengle            11     702   -1.06  
Monica Niculescu           32    2099   -0.93  
Stefanie Voegele           12     855   -0.85  
Annika Beck                16    1181   -0.78  
Lara Arruabarrena          10     627   -0.72  
Johanna Larsson            14     873   -0.65  
Barbora Strycova           20    1275   -0.63  
Sara Errani                25    1546   -0.60  
Carla Suarez Navarro       36    2585   -0.55  
Svetlana Kuznetsova        27    2271   -0.55 

Player                Matches  RetPts     RAS  
Viktorija Golubic          16    1272   -0.53  
Agnieszka Radwanska        96    6239   -0.51  
Yulia Putintseva           22    1552   -0.51  
Caroline Wozniacki         80    5165   -0.50  
Christina McHale           11     763   -0.48  
Angelique Kerber           93    6611   -0.46  
Louisa Chirico             13     806   -0.44  
Darya Kasatkina            26    1586   -0.43  
Magdalena Rybarikova       12     725   -0.41  
Anastasija Sevastova       30    1952   -0.40

A few more notable names: Halep, Stephens and Elina Svitolina all count among the next ten lowest, with RAS figures between -0.30 and -0.36. The most “average” player among the game’s best is Victoria Azarenka, who rates at -0.08. Venus Williams, Johanna Konta, and Garbine Muguruza make up a notable group of aggressive-but-not-really-aggressive women between +0.15 and +0.20, just outside of the game’s top third, while Maria Sharapova, at +0.63, misses our first list by only a few places.

Unsurprisingly, these results track quite closely to overall Aggression Score figures, as players who adopt a high-risk strategy overall are probably doing the same when facing the serve. This metric, however, allows us to identify players–or even single matches–for which the two strategies don’t move in concert. Further, the approach I’ve taken here, to separate and normalize winners and errors, rather than treat them as an undifferentiated mass, could be applied to Aggression Score itself, or to other more targeted versions of the metric, such as a third-shot AS, or a backhand-specific AS.

As always, the more data we have, the more we can learn from it. Analyses like these are only possible with the work of the volunteers who have contributed to the Match Charting Project. Please help us continue to expand our coverage and give analysts the opportunity to look at shot-by-shot data, instead of just the basics published by tennis’s official federations.

Translating ATP Statistics Across Main Tour and Challenger Levels

Italian translation at settesei.it

What is the gap between the top-level ATP Tour and the lower-level ATP Challenger Tour? Some players pile up trophies in the minor leagues yet have a hard time converting that success to match wins on the big tour, while others struggle with the week-to-week grind of the challengers but excel when given opportunities on the larger stage.

Let’s take a look at a method that measures the difference between the skill level on the two tours. Once we can translate stats between levels, we can identify those players who are much better or worse than expected when they have the chance to compete against the best.

The algorithm I’ll use is almost identical to the one baseball analysts have used for decades to determine league equivalencies. For instance, we might find that a batting average of .300 in Triple-A (the highest minor league) is equivalent to .280 in the majors, meaning that, if a player is batting .300 in Triple-A, we’ll expect him to bat .280 in the majors. In tennis terms, it may be that a 10% ace rate in challengers is equivalent to an 8% ace rate on the main tour. Not every player will exhibit that precise drop in performance–some may even appear to get a little better–but on average, a league equivalency tells us what to expect when a player changes levels.

Here is the algorithm for league equivalencies, as applied to men’s tennis:

  1. Pick a stat to focus on. I’ll use Total Points Won (TPW) here.
  2. Neutralize that stat as much as possible. In baseball, that means controlling for the difference in parks; in tennis, it means controlling for competition. For the following, I’ve adjusted for each player’s quality of competition using a method I described about a year ago. Most players’ numbers are about the same after the adjustment, but a particularly easy or tough schedule means a bigger shift. For instance, Denis Shapovalov posted a TPW of 49.8% on the big tour last season, but because he played such high-quality competition, the adjustment bumps him up to 52.1%, 18th among tour regulars.
  3. Identify players who competed at both levels, and find their adjusted stats at each level. Shapovalov played 18 tour-level matches and 30 challenger-level matches last year, with adjusted TPW numbers of 52.1% and 54.4%, respectively.
  4. Calculate the ratio for each player. For Shapovalov last year, it was 1.044 (54.4 / 52.1).
  5. Finally, take a weighted average of every player’s ratio. The weight is determined by the minimum number of matches played at either level, so for Shapovalov, it’s 18. Using the minimum means that a player like Gleb Sakharov (1 ATP match, 37 challenger matches) can be included in the calculation, but has very little effect on the end result.
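Steps 3 through 5 can be sketched as follows. This is a simplified illustration that takes already-adjusted TPW figures as input; the competition adjustment in step 2 is not reproduced here. The class and function names are invented for the example, and the second player is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    tour_matches: int
    chall_matches: int
    tour_tpw: float   # competition-adjusted TPW at tour level, in percent
    chall_tpw: float  # competition-adjusted TPW at challenger level, in percent


def equivalency_ratio(players: list[PlayerSeason]) -> float:
    """Weighted average of challenger/tour TPW ratios.

    Each player's weight is the smaller of his two match counts, so a player
    with a single match at one level barely moves the result.
    """
    weighted_sum = 0.0
    total_weight = 0
    for p in players:
        weight = min(p.tour_matches, p.chall_matches)
        if weight == 0:
            continue  # no paired matches, nothing to compare
        weighted_sum += weight * (p.chall_tpw / p.tour_tpw)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no players with matches at both levels")
    return weighted_sum / total_weight


# Shapovalov's figures are from the text; the second entry is a made-up
# player with one tour-level match, included to show the weighting.
shapovalov = PlayerSeason("Denis Shapovalov", 18, 30, 52.1, 54.4)
hypothetical = PlayerSeason("Hypothetical Player", 1, 37, 45.0, 53.0)

print(round(equivalency_ratio([shapovalov]), 3))
print(round(equivalency_ratio([shapovalov, hypothetical]), 3))
```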

Here are the results for the last six full seasons. Each ratio is the relationship between challenger-level TPW and tour-level TPW:

Year  Ratio  
2017  1.086  
2016  1.086  
2015  1.098  
2014  1.103  
2013  1.100  
2012  1.100

The average of these yearly equivalency factors is roughly the difference between a 52.5% TPW at challengers and a 48.0% TPW on the main tour. The shift from 2012-15 to 2016-17 may reflect the injuries that have sidelined the elites. With fewer elite players on court, the gap between the two tours narrows.
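For anyone who wants to check the arithmetic, here is a short sketch that averages the yearly ratios from the table and uses the result to translate a challenger-level TPW to its tour-level equivalent. The function name is illustrative, and dividing by the ratio is simply the definition of the ratio (challenger TPW over tour TPW) run in reverse.

```python
# Yearly challenger-to-tour TPW ratios from the table above.
yearly_ratio = {
    2017: 1.086,
    2016: 1.086,
    2015: 1.098,
    2014: 1.103,
    2013: 1.100,
    2012: 1.100,
}

avg_ratio = sum(yearly_ratio.values()) / len(yearly_ratio)


def challenger_to_tour(chall_tpw: float, ratio: float = avg_ratio) -> float:
    """Expected tour-level TPW (in percent) for a given challenger-level TPW."""
    return chall_tpw / ratio


print(round(avg_ratio, 4))
print(round(challenger_to_tour(52.5), 1))
```

A 52.5% TPW at challengers translates to just under 48% on the main tour, in line with the figure quoted above.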

Now that we know the difference between the levels, we can find the players who defy the usual patterns. Of the 100 players with the most “paired” matches–that is, with the most matches at both levels in the same years–here are the 20 with the lowest ratios. Low ratios mean less difference in performance between the two levels, so these guys are either overperforming at tour level or underperforming at challengers:

Player              ATP M  CH M  Min M  Ratio  
Matthew Ebden          62   140     39  0.982  
Jared Donaldson        68    78     37  1.030  
Jack Sock              81    45     38  1.039  
James Duckworth        53   156     53  1.042  
Andrey Rublev          56    79     42  1.047  
Vasek Pospisil         96    76     60  1.047  
Thiemo De Bakker       48    87     44  1.048  
Samuel Groth           84   133     58  1.049  
Michael Berrer         59   107     56  1.050  
Ruben Bemelmans        41   178     41  1.052  
Dustin Brown          120   173    111  1.055  
Benoit Paire          295    53     53  1.059  
Peter Gojowczyk        46   132     44  1.059  
Michael Russell        58    78     58  1.061  
Marius Copil           58   180     58  1.063  
Taylor Harry Fritz     59    44     41  1.065  
Jordan Thompson        38    88     38  1.066  
Illya Marchenko        56   116     37  1.066  
Tatsuma Ito            65   179     65  1.066  
Ryan Harrison         124    84     59  1.068

The middle columns show the total number of ATP matches, challenger matches, and “paired” matches between 2012 and 2017 (“Min M”) for each player. (The last number gives an indication of just how much data was available for the single-player calculation.) Aside from a few big-serving North Americans near the top of this list, I don’t see a lot of obvious commonalities. There are some youngsters, some veterans, and more big servers than not, but no clear pattern.

(Shapovalov doesn’t have enough paired matches to qualify, but his overall ratio is 1.035, good for third on this list.)

Here is the opposite list, the quintile of 20 players who have overperformed at challengers or underperformed on tour:

Player               ATP M  CH M  Min M  Ratio  
Florian Mayer          152    45     45  1.180  
Mikhail Youzhny         91    38     38  1.169  
Aljaz Bedene           144   121     80  1.160  
Filippo Volandri        62   101     62  1.158  
Robin Haase            194    71     71  1.157  
Tobias Kamke           102   144     73  1.155  
Adrian Mannarino       234   115     86  1.155  
Filip Krajinovic        36   167     36  1.148  
Albert Ramos           111    67     62  1.144  
Paul Henri Mathieu     147    96     82  1.141  
Kenny De Schepper       77   196     77  1.140  
Facundo Bagnis          45   197     45  1.136  
Pablo Cuevas           127    52     43  1.136  
Ivan Dodig              76    48     41  1.135  
Santiago Giraldo       146    70     56  1.135  
Paolo Lorenzi          204   191    124  1.135  
Thomaz Bellucci        162    44     44  1.134  
Albert Montanes        113   109     70  1.130  
Rogerio Dutra Silva     57   210     57  1.130  
Lukas Lacko            122   181    108  1.129

There are more clay-courters here than on the first list, and the very top of the ranking includes veterans who have mastered the challenger level, even if they still struggle to maintain a foothold on the main tour. I’ve had to exclude one player who belongs on this list: Gilles Muller broke my algorithm with his 45-9 challenger season in 2014. When I took him out of the 2014 calculations, the overall numbers changed very little, but it means no Muller here. Whatever his exact ratio, I can say that his tour-level performance hasn’t matched that 2014 run at challengers.

The bottoms of the two lists indicate that there isn’t that much variation between players. The middle 60% of players all have ratios between about 1.07 and 1.13, while the yearly averages hover around 1.09 and 1.10. Some players under consideration here have fewer than 50 “paired” matches over the six seasons, so a difference of a couple hundredths is far too little to draw any conclusions.

Beyond suggesting what to expect from players when they move up from challengers to the main tour, the same algorithm could be applied to other pairs of levels, such as ITF Futures and challengers, or women’s ITFs and the WTA tour. It could even compare narrower levels, such as ITF $10,000 events with ITF $15,000s, or ATP 250s with ATP 500s. The method is a staple of analytics in other sports, and it has a place in tennis, as well.

Big Four Losing Streaks

Italian translation at settesei.it

This is a guest post by Peter Wetz.

Novak Djokovic’s loss against Benoit Paire in his first match at this year’s Miami Masters caused a lot of head scratching. Not only did Benoit level his head-to-head against Novak (along with Hyeon Chung, he is now one of only two active players with a balanced record against Novak, while four active players hold winning records), but this was also the Serbian’s third consecutive loss.

Novak immediately made some changes, announcing the end of his partnership with his coach Andre Agassi and part-time coach Radek Stepanek after having worked with them just a few months.

A losing streak of this length by such a dominant player must be rare, and it prompted me to look for similar instances among the big four. The following table shows, in reverse chronological order, every losing streak of three or more matches that a member of the big four has suffered after cracking the top ten. The last column shows the Elo-based probability (Prob) of having such a streak. This is simply the product of the probabilities of losing the matches that made up the streak.

Player    Start       End    Length  Prob
Djokovic  2018-01-15  -*          3  0.002% (0.027%**)
Murray    2011-01-17  03-23       4  0.02%
Murray    2010-03-11  04-11       3  0.63%
Nadal     2009-11-08  11-22       4  1.89%
Djokovic  2007-10-15  11-12       5  0.07%
Federer   2002-07-08  08-19       4  0.66%

* Streak still active

** Probability when adjusting Elo ratings due to absence from the tour

The table shows that Roger Federer has never lost more than two matches in a row since August 2002. Even his four-match losing streak is the second most likely on the list, owing to the strong competition he had to face. In November 2009 Rafael Nadal lost four matches in a row, but with a probability far higher than the other streaks. The reason is that three of the four matches occurred at the World Tour Finals, where every opponent is a top player and a loss is more likely.

A number that stands out is the probability of Novak’s current streak: 0.002%. However, this number is based on traditional Elo ratings, which do not take into account player absence, for instance due to injury. Before this season Novak took a six-month break because of an elbow injury.

As has already been discussed, there are ways to adjust Elo ratings for players coming back on the tour. In the case of Maria Sharapova, who was absent for 15 months, a 200 point drop in her first five matches after the break was more in line with her level of play than simply assuming that she remained as competitive as before. For this analysis I used a drop of 150 rating points for Novak, which results in a more realistic streak probability of 0.027%, still the second lowest in the list.
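The streak calculation, with and without a comeback penalty, can be sketched as below. This assumes the standard logistic Elo formula on a 400-point scale, which may not match the exact Elo variant behind the table, and every rating in the example is hypothetical rather than taken from the actual matches.

```python
import math


def elo_win_prob(rating: float, opp_rating: float) -> float:
    """Standard Elo expectation: probability that `rating` beats `opp_rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def streak_probability(rating: float, opp_ratings: list[float], penalty: float = 0.0) -> float:
    """Probability of losing every match in a sequence.

    The result is the product of the individual loss probabilities. `penalty`
    is subtracted from the player's rating to model a return from a long
    absence. For simplicity, the rating is held fixed across the streak.
    """
    adjusted = rating - penalty
    return math.prod(1.0 - elo_win_prob(adjusted, opp) for opp in opp_ratings)


# Hypothetical ratings for a top player and three lower-rated opponents.
player = 2200.0
opponents = [1900.0, 1750.0, 1850.0]

print(f"unadjusted:        {streak_probability(player, opponents):.4%}")
print(f"150-point penalty: {streak_probability(player, opponents, penalty=150):.4%}")
```

Because each loss probability rises when the rating drops, the penalized streak probability is always the larger of the two, which is the direction of the adjustment described above.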

This brings us to Andy Murray’s losing streak of 2011, which most of us probably have already forgotten. After losing the Australian Open final to Novak, Andy lost against Marcos Baghdatis (#20) in Rotterdam, Donald Young (#143) in Indian Wells, and Alex Bogomolov (#118) in Miami. This looks very similar to Novak’s current situation, but Murray bounced back to achieve a 50-9 record for the remainder of the season. It remains to be seen whether Djokovic can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Podcast Episode 22: Davis Cup, Present and Future

Episode 22 of the Tennis Abstract Podcast features a new voice for the show, Austrian tennis fan and occasional guest blogger Peter Wetz. Peter and I delve into the weekend’s Davis Cup action, with particular focus on Rafael Nadal’s return (and the potential threats to his dominance on clay) and the deep rosters of the France and USA squads.

We also consider the ITF’s proposal to radically revamp Davis Cup and discuss what kind of year-end festival of tennis might attract both players and fans. Thanks for listening!


Update: Here are some timestamps for the topics we covered in this episode, thanks to FBITennis:

Nadal’s return (Davis Cup) and 2018 clay court competition 1:45
Upcoming Davis Cup semifinals 13:50
Lefty advantage 26:30
Davis Cup: Austria vs Russia 36:10
Proposals to change Davis Cup 44:50

Houston and the Swarm of American Men

Italian translation at settesei.it

Of the 28 men in the ATP Houston main draw this week, 15 have a “USA” next to their names. The Americans include three of the top four seeds (John Isner, Sam Querrey, and Jack Sock), two of the four qualifiers (Stefan Kozlov and Denis Kudla) and one of the three wild cards (Mackenzie McDonald). The home-country dominance at the US Clay Court Championships hearkens back to earlier eras of professional tennis, when a few countries–the USA often first among them–dominated the ranks.

Those days are long gone, but this week’s turnout in Texas is the latest sign of an American resurgence. Sure, many top players are taking the week off, and plenty of European contenders opted for a similarly valuable event in Marrakech, so US players hardly represent half of the best ATPers. But 15 of 28–a main draw made up of such a high percentage of USAs–is something the tennis world hasn’t seen in a long time.

Going back five decades, there have been just over 400 ATP-level tournaments in which one country represented more than half of main draw entrants–an average of about eight events per year. The average is misleading, though: Houston is the first time it has happened since 2004, and there are only two previous instances in the last two decades. To find another draw so packed with Americans, we need to go back to 1996. Here are the last 20 tournaments in which one country represented more than half of main draw players:

Date      Tourney         Draw  Country  Count      %  
20040412  Valencia        32    ESP         20  62.5%  
19990913  Mallorca        32    ESP         18  56.3%  
19970908  Marbella        32    ESP         18  56.3%  
19960930  Marbella        32    ESP         18  56.3%  
19960212  San Jose        32    USA         17  53.1%  
19951002  Valencia        32    ESP         18  56.3%  
19950206  San Jose        32    USA         18  56.3%  
19940214  Philadelphia    32    USA         18  56.3%  
19940131  San Jose        32    USA         17  53.1%  
19930802  Los Angeles     32    USA         17  53.1%  
19930201  San Francisco   32    USA         19  59.4%  
19920803  Los Angeles     32    USA         17  53.1%  
19910708  Newport         32    USA         17  53.1%  
19910506  Charlotte       32    USA         17  53.1%  
19910401  Orlando         32    USA         20  62.5%  
19900730  Los Angeles     32    USA         19  59.4%  
19900507  Kiawah Island   32    USA         24  75.0%  
19900402  Orlando         32    USA         17  53.1%  
19900219  Philadelphia    48    USA         27  56.3%  
19900212  Toronto Indoor  56    USA         30  53.6%

The four most recent tournaments took place in three different places, but were instances of the same event. The rest of the draws on this list suggest just how many good tennis players were produced in that era by the United States. In about 85% of the tournaments in which one country made up half or more of the field, the dominant nation was the USA. Australia accounts for another 50, all at tournaments in Oz, most of them before 1980. The US is the only country to fill up more than half of a draw outside of its own borders.
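The test applied to each tournament is simple to state in code. In this sketch, a draw is represented as a list of country codes, one per main draw entrant; that representation and the function name are assumptions made for illustration, not a description of the underlying database.

```python
from collections import Counter
from typing import Optional


def majority_country(entrants: list[str]) -> Optional[tuple[str, int, float]]:
    """Return (country, count, share) if one country has MORE than half the draw, else None."""
    if not entrants:
        return None
    country, count = Counter(entrants).most_common(1)[0]
    share = count / len(entrants)
    return (country, count, share) if share > 0.5 else None


# Houston this week: 15 Americans in a 28-player draw. The other 13 codes
# are placeholders, since only the American count matters here.
houston = ["USA"] * 15 + [f"X{i:02d}" for i in range(13)]

result = majority_country(houston)
if result is not None:
    country, count, share = result
    print(f"{country} {count} {share:.1%}")
```

A draw that is exactly half from one country, such as 16 of 32, returns `None` under this stricter "more than half" definition.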

What makes this week’s feat in Houston even more remarkable is that the tournament’s organizers gave only one of the three wild cards to a local player. (The other two went to 4th seed Nick Kyrgios, who didn’t bother to enter via conventional means, and fan favorite Dustin Brown.) In other words, Americans would have accounted for half of the draw even without the aid of wild cards.

This more specialized feat–non-wild cards from one country accounting for half of the draw–is even rarer over the last 25 years or so. Of the 20 tournaments listed above, only nine met this more rigorous standard. The other 11 only cleared the bar with the aid of wild cards. The 2004 Valencia tournament still qualifies, but for the most recent instance on American soil, we need to go back more than 25 years, to the 1993 event in San Francisco. That tourney had good reason to retain at least one wild card for a foreigner, as the organizers managed to attract Bjorn Borg. Borg lost in the first round, and Andre Agassi took the championship with a final-round win over Brad Gilbert.

It remains to be seen whether the sheer force of numbers will be enough to keep the Houston title in American hands. (Steve Johnson, the sixth seed this week, won it last year.) Of the 400-plus events with more than half of players representing the same flag, the winner has come from that dominant country about 73% of the time. My model suggests it is a toss-up this week, with a 48.9% probability that a US player wins it all. One of the favorites, however, is Australian Nick Kyrgios, with nearly a 45% chance of winning himself. One dark horse is the most interesting of all: Fifth seed Fernando Verdasco won this event four years ago. And fourteen years ago in Valencia, the last time one country made up more than half of an ATP draw, Verdasco was the man who hoisted the trophy.

Feast, Famine, and Sloane Stephens

Italian translation at settesei.it

Last week, Sloane Stephens reeled off an impressive series of victories, defeating Garbine Muguruza, Angelique Kerber, Victoria Azarenka, and Jelena Ostapenko to secure the title at the WTA Premier Mandatory event in Miami. The trophy isn’t quite as life-changing as the one she claimed at the US Open last September, but it’s a close second, and the competition she faced along the way was every bit as good.

The Miami title comes with 1,000 WTA ranking points, and by adding those to her previous tally, Stephens moved into the top ten, reaching a career high No. 9 on Monday. With two high-profile championships to her name, not to mention semifinal showings last summer in Toronto and Cincinnati, there’s little doubt she deserves it. Elo isn’t quite convinced, but its more sophisticated algorithm (and its disregard for the magnitude of the US Open and Miami titles) puts her within spitting distance of the top ten as well.

What makes Stephens’s rise to the top ten so remarkable is her efficiency in converting wins to ranking points. Since her return from injury at Wimbledon last year, she has played only 38 matches, winning 24 of them. She has suffered six first-round losses, plus two more defeats at last year’s Zhuhai Elite Trophy round-robin and another pair in the Fed Cup final against Belarus. All told, in the last nine months, she has won matches at only six different events. Her unusual record illustrates some of the quirks in the ranking system, and how a player who peaks at the right times can exploit them.

24 wins is almost never enough for a spot in the vaunted top ten. From 1990 to 2017, a player has finished a season with a top-ten ranking only seven times while winning fewer than 30 matches. Only two of those involved fewer wins than Sloane’s 24: Monica Seles’s 1993 and 1995, the timespans leading up to her tragic on-court stabbing and following her eventual comeback. Here are the top-ten seasons with the fewest victories, including the last 52 weeks of a few players currently near the top of the WTA table:

Year  Player              YE Rk   W   L  W-L %  
1995  Monica Seles*           1  11   1    92%  
1993  Monica Seles            8  17   2    89%  
2018  Sloane Stephens**       9  24  14    63%  
2010  Serena Williams         4  25   4    86%  
1993  Jennifer Capriati       9  28  10    74%  
2015  Flavia Pennetta         8  28  20    58%  
2000  Mary Pierce             7  29  11    73%  
2004  Jennifer Capriati      10  29  12    71%  
1993  Mary Joe Fernandez      7  31  12    72%  
1995  Iva Majoli              9  31  13    70%  
2018  Venus Williams**        8  31  14    69%  
1995  Mary Joe Fernandez      8  31  15    67%  
2015  Lucie Safarova          9  32  21    60%  
2008  Maria Sharapova         9  33   6    85%  
1998  Steffi Graf             9  33   9    79%  
2018  Petra Kvitova**        10  33  14    70%

* ranking frozen after her assault

** rankings as of April 2, 2018; wins and losses based on previous 52 weeks
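As a quick check on the table, the sketch below recomputes the win percentages and counts the completed seasons with fewer than 30 wins. The rows are copied from the table above, leaving out the three 2018 entries based on the previous 52 weeks; the variable and function names are illustrative.

```python
# (year, player, year-end rank, wins, losses) for the completed seasons in the table.
seasons = [
    (1995, "Monica Seles", 1, 11, 1),
    (1993, "Monica Seles", 8, 17, 2),
    (2010, "Serena Williams", 4, 25, 4),
    (1993, "Jennifer Capriati", 9, 28, 10),
    (2015, "Flavia Pennetta", 8, 28, 20),
    (2000, "Mary Pierce", 7, 29, 11),
    (2004, "Jennifer Capriati", 10, 29, 12),
    (1993, "Mary Joe Fernandez", 7, 31, 12),
    (1995, "Iva Majoli", 9, 31, 13),
    (1995, "Mary Joe Fernandez", 8, 31, 15),
    (2015, "Lucie Safarova", 9, 32, 21),
    (2008, "Maria Sharapova", 9, 33, 6),
    (1998, "Steffi Graf", 9, 33, 9),
]


def win_pct(wins: int, losses: int) -> float:
    """Share of matches won."""
    return wins / (wins + losses)


# Top-ten finishes achieved with fewer than 30 match wins.
under_30 = [s for s in seasons if s[2] <= 10 and s[3] < 30]
print(len(under_30))

# Stephens over the previous 52 weeks: 24 wins, 14 losses.
print(f"{win_pct(24, 14):.0%}")
```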

What almost all of these seasons have in common is exceptional performances at grand slams. Sloane won the US Open; Seles won the 1993 Australian; Serena Williams won a pair of majors in 2010; Flavia Pennetta capped an otherwise anonymous 2015 campaign with a title in New York. The slams are where the rankings points are.

Even within this group of slam successes, Sloane stands out. Of the 16 entries on that list, only two–Pennetta and Lucie Safarova–won matches at a lower rate than Stephens has since her comeback. In other words, most women who are this efficient with their victories don’t lose quite so early or often at lesser events.

That 63% won-loss record is even more extreme than the above list makes it look. Of the nearly 300 year-end top-tenners since 1990, only eight finished the season with a lower win rate. Here’s that list, expanded to the top 11 to include another noteworthy recent season:

Year  Player              YE Rk   W   L  W-L %  
2014  Dominika Cibulkova     10  33  24    58%  
2000  Nathalie Tauziat       10  36  26    58%  
2015  Flavia Pennetta         8  28  20    58%  
1999  Nathalie Tauziat        7  37  25    60%  
2007  Marion Bartoli         10  47  31    60%  
2015  Lucie Safarova          9  32  21    60%  
2000  Anna Kournikova         8  47  29    62%  
2010  Jelena Jankovic         8  38  23    62%  
2018  Sloane Stephens*        9  24  14    63%  
2004  Elena Dementieva        6  40  23    63%  
2016  Garbine Muguruza        7  35  20    64%

* ranking as of April 2, 2018; wins and losses based on previous 52 weeks
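The W-L % column is simply wins divided by total matches, and the list is sorted on that ratio. As a minimal sketch, here is the calculation for four rows copied from the table above; the function and variable names are my own, not part of any published code:

```python
# (year, player, wins, losses), copied from the table above
seasons = [
    (2016, "Garbine Muguruza", 35, 20),
    (2018, "Sloane Stephens", 24, 14),
    (2015, "Flavia Pennetta", 28, 20),
    (2007, "Marion Bartoli", 47, 31),
]


def win_rate(wins, losses):
    """Fraction of matches won."""
    return wins / (wins + losses)


# Lowest win rate first, matching the order of the table
ranked = sorted(seasons, key=lambda s: win_rate(s[2], s[3]))

for year, player, wins, losses in ranked:
    pct = win_rate(wins, losses)
    print(f"{year}  {player:<20}{wins:>3}{losses:>4}{pct:>7.0%}")
```

Sorting on the unrounded ratio rather than the displayed percentage keeps seasons that round to the same figure in a stable, meaningful order.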

There’s not much overlap between these lists; the first group generally missed some time, then made up for it by scoring big at slams, while the second group slogged through a long season and leveled up with a strong finish or two at a major. The typical player with a 63% winning percentage doesn’t end up in the top ten: She wraps up the season, on average, in the mid-twenties. At least that’s better than the average 24-win season: Those result in year-end finishes near No. 40.

Stephens has always been a big-match player: She made an early splash at the 2013 Australian Open, reaching the semifinals and upsetting Serena as a 19-year-old, and her overall career record at majors (66%) is nearly ten percentage points higher than her record at other tour events (57%). For all that, she will probably not conclude 2018 with such an extreme set of won-loss numbers. To do so, she’d probably need to win a major to replace her 2017 US Open points while losing early at most other events. Recovered from injury, Stephens may maintain her feast-or-famine ways to some degree, but it’s unlikely she’ll continue to display such extreme peaks and valleys.

Should Serena Be Seeded?

Italian translation at settesei.it

Serena Williams returned to professional tennis this month after more than a year of pregnancy, childbirth, and recovery. She took wild cards into both Indian Wells and Miami, competing as an unseeded player for the first time since August 2011. In her initial effort in California, she reached the third round before falling to sister Venus, and this week in Miami, she drew Indian Wells champ Naomi Osaka in her opening match and went home early, losing 6-3 6-2.

Seeing Serena without a number next to her name feels wrong. She left the tour for maternity leave just after winning last year’s Australian Open, a title that moved her back into the No. 1 ranking position. While she is clearly rusty–as she has been after previous absences–there’s little doubt she’ll quickly resume competing at a top-32 level (the threshold for an Indian Wells or Miami seed), if not considerably higher.

The brutal Miami draw and Serena’s ensuing early exit prompted all sorts of commentary, much of it calling for a rule change, some castigating the WTA for its lack of a maternity leave policy. The latter is not quite true: The WTA rulebook addresses absences for childbirth and treats returning players almost exactly as it handles women coming back from injury. Nevertheless, edge cases–like the greatest player in women’s tennis rejoining the tour without a single ranking point to her name–tend to put rules to the test.

Seedings are not just a convenient way to identify the top players on a printed bracket. They have an effect on the outcome of the tournament. In the March tournaments, seeded players get free passes to the second round. At every event, the seeding system keeps top players away from each other until the final rounds. Even minor differences, like the one between the fourth and fifth seeds, can have a major effect on two players’ potential routes to the title. This is all to say: Seedings matter, not just to returning players like Serena, but also to everyone else in the draw. While granting a seed to Williams right now may be the right thing to do, it would also push another seeded player into the unseeded pool, affecting that competitor’s chances at late-round ranking points and prize money. It’s important to acknowledge how the rules affect the entire field.

In a moment, I’ll outline various approaches the WTA could take to deal with future maternity leaves. I don’t have a strong opinion; there’s merit in each of them, as I’ll try to explain. What is most important to me, as a fan, is that any rules adopted are designed for the benefit of the whole tour, not just patches to handle once-in-a-generation superstars. Serena deserves a fair shake from the WTA, and her peers are entitled to the same.

1. Minor tweaks to the existing rule. The most likely outcome is almost always the status quo, and Osaka notwithstanding, the status quo is not that bad. The WTA rules allow for returning players (whether from injury or motherhood) to use a “Special Ranking” (SR) in eight events, including two slams. The SR is the player’s ranking at the time she left the tour, and it determines whether she qualifies to enter tournaments upon her return. While Serena used wild cards for her two events thus far (more on that later), she could have used her SR for either or both.

In other words, new mothers are already allowed to pick up where they left off … with the important exception of seeding. Serena’s SR will allow her to enter, say, the French Open as if she were the No. 1 ranked player, but unless Roland Garros invokes its right to tweak seedings (like Wimbledon does), her seed will be determined by her actual ranking at that time. Since it’s only two months away, it’s very possible she’ll be unseeded there as well, making possible another nasty first-round matchup in the vein of the Simona Halep-Maria Sharapova opener at last year’s US Open.

The debate over seeding boils down to “respect” versus “practicality.” Serena’s achievements and her probable quick return to greatness suggest that she “deserves” to be seeded as such. On the other hand, many players (including Sharapova, different as her situation is) have had a hard time returning to their previous level. The post-comeback results of Sharapova or, more recently, Novak Djokovic, indicate that a star’s ranking 12 months ago might not tell you much about how she’ll play now. Seedings exist partly to induce top players to compete, but also to increase the likelihood that the best women will face each other in the final rounds. By the latter criterion, it’s not clear that Serena (or any returning player) should immediately reclaim a top seed.

If the WTA does stick with this basic principle, I would suggest offering a few more SR entries–perhaps 12 instead of 8, and 3 slams instead of 2. Maternity leave necessitates more time on the sidelines than the six-month injury break required to qualify for the SR rule, and it may require still more time to return to form. The WTA might also convince the ITF to offer an additional few SR entries to lower-level events. Kei Nishikori came back from injury by playing a couple of Challengers; women might prefer to get their feet wet with a few ITF $100Ks before using their SR entries on top-tier events.

2. Link seeding to Special Rankings. The second option is essentially what fans wanted when they realized Serena might not make it to the Miami second round. Instead of using current ranking to determine seeding, tournaments could use SR for players who used them to enter the event.

There is a precedent for this: Monica Seles was given a top seeding when she returned from injuries sustained during her 1993 on-court stabbing. More than two years later, she came back as the top seed in Canada and the second seed at the 1995 US Open, where she lived up to that draw placement, winning 11 matches in a row before falling to Steffi Graf in the New York final.

The pros and cons of this route are the opposite of the first proposal. Giving players their pre-break seeding would show respect for their accomplishments, but since most players don’t come back from any length of time off court the way Seles did, it’s possible the seedings would appear overly optimistic. (And yes, I realize the irony of saying so during the 2018 Miami tournament, when the top two seeded women won only one match between them.)

3. Devise a time-off-court algorithm. Players usually need some time to resume their former level, but their skill upon return has some relationship to how they played before. When I wrote about Sharapova’s return from her drugs ban last year, I showed that elite players who missed a year or more (for whatever reason) tended to play much worse than their pre-break level for their first five or so matches, and then at a moderately lower level for the next 50. I measured it in Elo points: a 200-point drop at first, then a 100-point drop.

I don’t expect the WTA to adopt Elo anytime soon, but an algorithm of this sort could be based on any ranking system, and it represents a reasonable compromise between the first two positions. For someone as dominant as Serena, it would fulfill most of her fans’ wishes: A 200-point drop from her pre-break level would still leave her roughly even with Halep, meaning that a system of this sort would’ve made her the first or second seed in this month’s draws.

A better illustration of how the algorithm would work requires a player who didn’t so overwhelmingly outclass the rest of the field: If Wozniacki (current Elo: 2156) were to miss the next year, her seeding upon return would use an Elo 200 points lower, of 1956, dropping her to about 30th (assuming all the top players were competing). After the first five matches, when players usually start getting their groove back, her seedings would rise to around 15th. Several months in, her ranking would rise, and her seeding would no longer need to be adjusted.
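To make the Wozniacki example concrete, here is a minimal sketch of the adjustment in Python. The 200- and 100-point penalties and the 5- and 50-match windows come from the approximate figures above. The function name, its parameters, and the hard cutoffs between phases are my own simplifications, not a worked-out proposal:

```python
def seeding_elo(pre_break_elo, matches_since_return,
                initial_penalty=200, later_penalty=100,
                initial_matches=5, later_matches=50):
    """Elo rating to use for seeding a player returning from a long layoff.

    Assumption: the penalty changes in hard steps. A real system would
    probably phase it out gradually and switch to the player's current
    rating once she has played enough matches.
    """
    if matches_since_return < initial_matches:
        # First five or so matches: well below the pre-break level
        return pre_break_elo - initial_penalty
    if matches_since_return < initial_matches + later_matches:
        # Next 50 matches: moderately below the pre-break level
        return pre_break_elo - later_penalty
    # After that, no adjustment is applied
    return pre_break_elo


wozniacki = 2156
print(seeding_elo(wozniacki, 0))   # seeding rating for her first match back
print(seeding_elo(wozniacki, 5))   # seeding rating once five matches are played
```

Turning the adjusted rating into a seed then only requires ranking it against the ratings of everyone else in the draw.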

The obvious flaw here is the level of complexity. My algorithm is approximate at best and would need to be improved for such an important role. The advantage, though, is that if an acceptable formula could be found, it would allow the WTA to offer a perfect compromise between the needs of returning mothers and the rights of the rest of the field.

And about those wild cards… 

I’ve mentioned that Serena used wild cards to enter both Indian Wells and Miami, even though she could have used her Special Ranking. Just about every WTA event would happily hand her a wild card, as they should. So in Serena’s case, the SR rule is largely irrelevant–if it didn’t exist, she could immediately resume a full schedule.

I also wrote that, as a fan, what matters to me is that all tour players are treated equally. Tournament entries are opportunities to gain ranking points, which in turn determine entries and seeding, which affects the likelihood of racking up wins and titles. Wild cards are often thought of as gifts, but we rarely acknowledge the effect that those gifts have on the players who rarely get them. Because tournaments understandably tend to hand out free passes to home-country players (like Donald Young) and marquee personalities (like Eugenie Bouchard), the wild card system introduces systemic bias into rankings and results. Wild cards can’t make a journeyman into a superstar, but they can boost a player from the top 200 to the top 100, or from No. 70 to No. 50. For some tour players, these differences really matter.

Thus, when a superstar or media darling–or just a player from a country that happens to host a lot of tournaments, like the United States–returns from maternity leave, injury, or a suspension, the regular rules don’t apply. Maria Sharapova was wild carded into most of the tournaments she wanted to play last year, while Sara Errani has spent the last six months playing ITFs, $125Ks, and qualifying. Sharapova gets to play matches with 100 ranking points at stake while Errani contests entire tournaments with less on the line.

Wildly different as their cases are, Serena’s situation with regard to wild cards is the same as Sharapova’s. Her allotment of SR entries doesn’t matter. But imagine if, say, Anastasija Sevastova or Magdalena Rybarikova took time off to have a child. They might get a few free entries into European international-level events, or maybe a wild card into a tournament they’ve previously won. But for the most part, a Sevastova or a Rybarikova–despite taking her hypothetical absence while a top-20 player–would be jealously protecting her eight SRs. She would need them.

Just to be clear, I’m not trying to say that Serena doesn’t “deserve” all the wild cards she’s going to get. Her achievements make it obvious that she does. On a tour where events can award draw places at their discretion, no one deserves them more. However, the very existence of those discretionary spots means that maternity leave means something very different for Serena than it would for the more anonymous players near the top of the WTA rankings.

How about this proposal, then: For players coming back from maternity leave, expand the number of SR entries from 8 to 12, and tack on another four free entries to ITFs, so that returning players can have a child knowing that they’ll be able to compete at the top level for nearly a season once they come back. But–they may accept no wild cards during that time. If they take a wild card, they lose their SRs. That proposal would put all players on an even keel: Close to a year of tournament entries at their pre-break ranking. It would give the next Serena-level superstar plenty of time to regain her lost status, and best of all, it would do the same for her lesser-known peers.

Podcast Episode 21: Talking About Talking About Tactics

Episode 21 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, is our attempt to reconcile discussions of tennis tactics with those of tennis analytics. We start with our interest in the talented and crafty Alex De Minaur, then cover the oft-ignored tactical skills of Jo-Wilfried Tsonga, the usefulness of tactical tips in the amateur game, and the constraints that coaches face when turning their observations into insights that players can use.

As a bonus, we also touch on this year’s new incentive to play doubles at Indian Wells, the crazy new proposal to overhaul Davis Cup, and the recent spate of tennis movies, especially Battle of the Sexes. In keeping with Jeff’s recent return to the U.S., it’s a super-sized episode, clocking in at just under 95 minutes.

Finally, we hope our audio quality is continuing to improve. We’re both now using proper mics, and the difference is, well, audible. Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Thank you to FBITennis for outlining this episode, with timestamps:

Alex De Minaur 1:38
Tsonga Decisionmaking 8:10
Pesky/Smart Players 12:30
What are “tactics”? How to analyze? 15:00
Testing effectiveness of shot patterns 20:43
DIY Tactics 27:00
Pros and Coaching Advice 34:40
Which analytics might be useful to pros? 39:58
Indian Wells Singles/Doubles Bonus 1:02:00
Reformatting Davis Cup 1:05:19
Tennis Movies 1:18:00

ATP Streaks of 2017

Italian translation at settesei.it

This is a guest post by Peter Wetz.

With the recent update of Jeff’s ATP and WTA GitHub repositories, we can take a look at notable streaks that happened in 2017. In this post I show the longest streaks of matches won and lost, and of tiebreaks won and lost, on the 2017 ATP tour.

Let’s start with matches won:

Name               Start   End     Length
Rafael Nadal       04-17   05-15   17
Rafael Nadal       08-28   10-09   16
Roger Federer      06-19   08-07   16
Roger Federer      10-09   11-13   13
Roger Federer      03-06   03-20   12
Alexander Zverev   07-31   08-07   10
Rafael Nadal       05-29   07-03   10
Stan Wawrinka      05-22   05-29   10
Grigor Dimitrov    01-02   01-16   10

We see that, as far as streaks are concerned, the 2017 season was dominated by Roger Federer and Rafael Nadal. Rafa’s streak of 17 wins, which was halted by Dominic Thiem in the Rome quarterfinal, is the only streak containing three back-to-back tournament wins. Besides Roger and Rafa, only Alexander Zverev won two tournaments back-to-back.
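For readers who want to reproduce lists like this one, the core step is finding the longest run of identical outcomes in a player’s results. Below is a minimal Python sketch. It assumes you have already reduced one player’s season to a chronologically ordered sequence of "W" and "L" markers; the function name and the sample sequence are purely illustrative and are not taken from the repositories:

```python
from itertools import groupby


def longest_streak(results, outcome="W"):
    """Length of the longest run of `outcome` in a chronological sequence."""
    best = 0
    # groupby collapses consecutive equal items into runs
    for key, run in groupby(results):
        if key == outcome:
            best = max(best, sum(1 for _ in run))
    return best


# Illustrative sequence only, not a real player's season
sample = list("WWLWWWLL")
print(longest_streak(sample, "W"))
print(longest_streak(sample, "L"))
```

The same function works for tiebreak streaks if the sequence holds one marker per tiebreak instead of one per match.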

When we talk about the less glamorous category of losing streaks, two names should immediately pop into our minds: Vincent Spadea and Donald Young. The former holds the record of 21 consecutive matches lost* at the ATP level, and the latter holds one of the longer losing streaks (17 matches lost in a row) in recent years.

During the 2017 ATP season no player came close to any of these marks, but still there were a few moments where players seemed to have forgotten how to win a match. The following list shows all players with 8 or more consecutive matches lost.

Name                  Start   End     Length
Pablo Cuevas          05-29   10-23   10
Maximilian Marterer   02-06   08-28   10
Paolo Lorenzi         08-28   10-30    8 
Malek Jaziri          03-20   07-03    8 
Daniil Medvedev       07-31   10-09    8 
Stefanos Tsitsipas    02-13   10-02    8 

Regarding Maximilian Marterer’s streak of 10 matches lost, we have to mention that he played a good season at the Challenger level. In between his losses at the ATP level there were deep runs at various Challenger tournaments. Still, it must be frustrating to lose your first round main draw match every time after having successfully gone through qualies. This fact accounts for 7 of his 10 losses at ATP main draws (the other 3 coming from entries as a wild card). Pablo Cuevas, the other player having lost 10 matches in a row last season, on the other hand, achieved a real losing streak with no Challenger level wins hidden among them.

Winning tiebreaks has been discussed on this blog a lot. One of the conclusions was that three players–Roger Federer, Rafael Nadal, and John Isner–have consistently outperformed their tiebreak expectations. The list of consecutive tiebreaks won in 2017 supports this statement, as can be seen in the following table.

Name           Start   End     Length
John Isner     05-15   05-29   11   
Roger Federer  06-19   08-07   8 
Roger Federer  03-06   03-20   8 
(Many tied)                    7

John Isner’s streak ran over the course of 8 matches, including 2 that he lost, whereas Roger won every match in which he won the tiebreaks contributing to his streaks.

The list of consecutive tiebreaks lost looks as follows.

Name            Start   End     Length
Lucas Pouille   07-03   10-09   12 
Florian Mayer   01-02   07-03   11
Dusan Lajovic   03-06   07-24   8

Lucas Pouille holds the crown for most tiebreaks lost in a row in 2017. In fact he got really close to Robin Haase’s infamous run of 13 tiebreaks lost.

Finally, I want to present an odd 2017 achievement by Nick Kyrgios: He is the only player ever to lose three matches in a row by retirement.

Date   Tourney   Matchup                           Result
07-31  W'ington  Nick Kyrgios vs Tennys Sandgren   3-6 0-3 RET
07-03  Wimby     Nick Kyrgios vs Pierre H Herbert  3-6 4-6 RET
06-19  London    Nick Kyrgios vs Donald Young      6-7(3) 0-0 RET

The list of 2018 streaks is shaping up nicely already: Doubles partners Oliver Marach and Mate Pavic opened their season with 17 straight wins, including three titles, finally ending with a loss in the Rotterdam final to Pierre-Hugues Herbert and Nicolas Mahut. Marach can even claim an 18-match streak, since he won a Davis Cup match for Austria last month while pairing with Philipp Oswald. Doubles data is tougher to come by, but it’s safe to say that the season-opening run for Marach/Pavic will have a prominent place in any summaries of this year’s ATP streaks.

 

* The list excludes one loss of Vincent Spadea at the 1999 World Team Cup in Düsseldorf, where he lost to Rainer Schüttler 5-7, 6-3, 1-6.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Trivia: Deja Vu All Over Again

Italian translation at settesei.it

In the last several days, Fernando Verdasco has seen a little too much of Diego Schwartzman. On Sunday in Rio de Janeiro, the two players met in the final of the 500-level clay court event, which Schwartzman won in straight sets. Both players immediately headed for the hard court tournament in Acapulco, where they drew each other in the first round. Verdasco lost again, this time winning six games instead of five.

This sort of final-to-first-round scenario, with back-to-back matches against the same opponent, is quite rare, and the surface switch makes this one even more unlikely. For one thing, the tour doesn’t move from one court type to another very frequently, and when it does, players don’t always travel through the same sequence of events. Another cause of improbability is that a pair of players who contest a final are usually pretty good, meaning that both of them are often seeded at their next event, making a first-round meeting impossible. In order to see a pair of consecutive matches like Schwartzman’s and Verdasco’s, we require synchronized schedules and a hefty helping of luck.

As Carl Bialik pointed out, this isn’t the first time Verdasco has played back-to-back matches in February against the same opponent, albeit on the same surface: He did so in 2011, dropping the San Jose final and then a Memphis first-rounder to Milos Raonic. Remarkably, when we broaden the search a bit, Verdasco’s name comes up twice more. In 2009, he lost to Radek Stepanek in the Brisbane final, then in his next event, the Australian Open, he beat Stepanek in the third round. (Radek played Sydney in the meantime, for what it’s worth.) And five years later, Verdasco overcame Nicolas Almagro to win the 2014 Houston title, then faced his countryman in his next event two weeks later, losing to Almagro in the round of 16. (Again, while they were back-to-back tourneys for Verdasco, Nico squeezed in a few matches in Monte Carlo in between.)

Back to the matter at hand: In the course of five decades of Open Era men’s tennis, just about everything has happened at least once before. But this exact scenario–two guys facing each other in a final, then a first round match the very next week on a different surface–is a new one. Relax any one of those constraints, and we see a few instances in the past.

Since 1970, there have been about 3,750 tour-level finals. Roughly one-third of the time, the two finalists ended up playing each other at least once more over the course of the season. 197 of those pairs drew each other in their very next event, and in another 62 of the finals, one of the players faced the other in his next tournament (though the other had played an event or two in the meantime, like Almagro and Stepanek). Several of the 197 duos played each other the next week, though it is a bit more common that there was a week off in between.
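The "very next event" test is easy to state in code. The Python sketch below is a rough illustration of the idea, not the query actually used for these counts. The Match record, its field names, and the two sample rows (with tournament-week dates that should be treated as illustrative) are assumptions of mine:

```python
from collections import namedtuple

# Hypothetical record layout; `date` is an ISO string so it sorts correctly
Match = namedtuple("Match", "date tourney surface round winner loser")


def next_event(matches, player, after_date):
    """Name of the first tournament `player` appears in after `after_date`."""
    later = [m for m in matches
             if m.date > after_date and player in (m.winner, m.loser)]
    return min(later, key=lambda m: m.date).tourney if later else None


def find_rematch(matches, final):
    """Return the rematch if both finalists met in their very next event."""
    a, b = final.winner, final.loser
    event_a = next_event(matches, a, final.date)
    event_b = next_event(matches, b, final.date)
    if event_a is None or event_a != event_b:
        return None
    for m in matches:
        if m.tourney == event_a and {m.winner, m.loser} == {a, b}:
            return m
    return None


# Sample rows based on the matches described in this post
rio_final = Match("2018-02-19", "Rio de Janeiro", "Clay", "F",
                  "Diego Schwartzman", "Fernando Verdasco")
acapulco = Match("2018-02-26", "Acapulco", "Hard", "R32",
                 "Diego Schwartzman", "Fernando Verdasco")

rematch = find_rematch([rio_final, acapulco], rio_final)
print(rematch.round, rematch.surface != rio_final.surface)
```

Because each record carries the round and surface, narrowing the results to early-round or different-surface rematches is a matter of adding one more condition.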

Of the 197 finalist pairs, 25 of them drew each other in the round of 32 or earlier in their following tournament, though not all of those were first-round matches. (Or, in the case of Andy Murray and Philipp Kohlschreiber in 2015 after contesting the Munich final, they played in Murray’s first Madrid match the following week but not Kohlschreiber’s, since Murray had a bye.) The most common round in which finalists met again was another final, which ensued about one-third of the time.

Dividing up the 197 pairs a different way, about one-fifth (39) played the follow-up match on a different surface. In only a few of these instances were the two surfaces hard and clay; a disproportionate number of these back-to-back matches happened in the 1970s and early 1980s, when carpet was a regular feature on tour, so the hard-to-carpet or carpet-to-hard transition shows up in these results much more frequently than hard-to-clay or clay-to-hard. Across all surface pairings, only three of these 39 rematches occurred in the round of 32, and none in the round of 64 or 128.

The three precedents for Schwartzman’s back-to-back wins have several things in common. First, as in Diego’s feat, the same player won both matches. The other two shared traits set them apart from the Schwartzman double: In each case, there was a one-week break between the tournaments, and one of the events was played on carpet.

The first similar achievement was recorded by Tom Gorman, who won consecutive matches against Bob Carmichael in 1976. The first was the Sacramento final (on carpet), followed by the first round in Las Vegas (on hard). Next up was Martin Jaite’s pair of wins over Javier Sanchez in 1989. After triumphing in the Sao Paulo final (on carpet), Jaite won a hard-court first-rounder against the same opponent two weeks later. Finally, Fernando Gonzalez defeated Jose Acasuso twice in a row in 2002, first in the clay-court final in Palermo, then a bit more than a week later on carpet in the first round in Lyon.

Like Schwartzman and his three closest predecessors, most of the finalists managed to defend their victory. Of the different-surface instances, the same player won both matches 26 of 39 times. When the two matches took place on the same surface, the title winner won the next match 101 of 158 times. Most recently, Yuichi Sugita failed to do so: After beating Adrian Mannarino for his first tour-level title in Antalya last summer, he met the Frenchman again in the Wimbledon second round and lost. In a more notable exception, Andre Agassi knocked out Petr Korda for the 1991 Washington title, then lost to Korda in his first match the next week in Montreal. (It wasn’t Korda’s first match, as he didn’t get a bye like Agassi did, but the extra effort paid off. The Czech reached the final.)

We could wait fifty years for an exact parallel of Schwartzman’s feat. Or we could set the bar a little lower and see a rematch almost immediately: Another of last week’s finalist pairs, Lucas Pouille and Karen Khachanov, followed up their Marseille title match with another meeting in the Dubai second round only three days later. Regardless of which standard you choose, there’s one person who would surely prefer to take a break from consecutive matches against the same opponent, and that’s Fernando Verdasco.