The Negative Impact of Time on Court

With 96 men’s matches in the books so far at Roland Garros this year, we’ve seen only one go to the absolute limit, past 6-6 in the fifth set. Still, we’ve had our share of lengthy, brutal five-set fights, including three matches in the first round that exceeded the four-hour mark. The three winners of those battles–Victor Estrella, David Ferrer, and Rogerio Dutra Silva–all fell to their second-round opponents.

A few years ago, I identified a “hangover effect” after Grand Slam marathons, defined as those matches that reach 6-6 in the fifth. Players who emerge victorious from such lengthy struggles would often already be considered underdogs in their next matches–after all, elite players rarely need to work so hard to advance–but marathon winners underperform even when we take their underdog status into account. (Earlier this week, I showed that women suffer little or no hangover effect after marathon third sets.)

A number of readers suggested I take a broader look at the effect of match length. After all, there are plenty of slugfests that fall just short of the marathon threshold, and some of those, like Ferrer’s loss yesterday to Feliciano Lopez, 6-4 in the final set, are more physically testing than some of those that reach 6-6. Match time still isn’t a perfect metric for potential fatigue–a four-hour match against Ferrer is qualitatively different from four hours on court with Ivo Karlovic–but it’s the best proxy we have for a very large sample of matches.

What happens next?

I took over 7,200 completed men’s singles matches from Grand Slams back to 2001 and separated them into groups by match time: 1:00 to 1:29, 1:30 to 1:59, and so on, up to a final category of 4:30 and above. Then I looked at how the winners of all those matches fared against their next opponents:

Prev Length   Matches  Wins  Win %  
1:00 to 1:29      448   275  61.4%  
1:30 to 1:59     1918  1107  57.7%  
2:00 to 2:29     1734   875  50.5%  
2:30 to 2:59     1384   632  45.7%  
3:00 to 3:29      976   430  44.1%  
3:30 to 3:59      539   232  43.0%  
4:00 to 4:29      188    64  34.0%  
4:30 and up        72    23  31.9%

The trend couldn’t be any clearer. If the only thing you know about a Slam matchup is how long the players spent on court in their previous match, you’d bet on the guy who recorded his last win in the shortest amount of time.
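The grouping step is simple enough to sketch in a few lines. This is only an illustration of the bucketing described above, not the code behind the table: the function names, the input format (pairs of previous-match minutes and a won-next-match flag), and the sample records are all made up for the example.

```python
from collections import defaultdict

def length_bucket(minutes):
    """Map a match time in minutes to a 30-minute category label.

    Matches under one hour fall outside the categories and return None.
    """
    if minutes < 60:
        return None
    if minutes >= 270:
        return "4:30 and up"
    start = (minutes // 30) * 30
    end = start + 29
    return f"{start // 60}:{start % 60:02d} to {end // 60}:{end % 60:02d}"

def next_match_win_rates(records):
    """records: iterable of (previous_match_minutes, won_next_match) pairs.

    Returns {bucket: (matches, wins, win_rate)}.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket -> [matches, wins]
    for minutes, won_next in records:
        bucket = length_bucket(minutes)
        if bucket is None:
            continue
        tally[bucket][0] += 1
        tally[bucket][1] += int(won_next)
    return {b: (n, w, w / n) for b, (n, w) in tally.items()}

# Made-up records, for illustration only
sample = [(74, True), (85, True), (214, False), (275, False), (300, True)]
rates = next_match_win_rates(sample)
for bucket, (n, w, rate) in sorted(rates.items()):
    print(bucket, n, w, f"{rate:.1%}")
```

A 3:34 match, for instance, is 214 minutes and lands in the 3:30 to 3:59 group.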

Of course, we know a lot more about the players than that. Andy Murray spent 3:34 on court yesterday, but even with his clay-court struggles this year, we would favor him in the third round against most of the men in the draw. As I’ve done in previous studies, let’s account for overall player skill by estimating the probability of each player winning each of these 7,200+ matches. Here are the same match-length categories, with “expected wins” (based on surface-specific Elo, or sElo) shown as well:

Prev Length   Wins  Exp Wins  Exp Win %  Ratio  
1:00 to 1:29   275       258      57.5%   1.07  
1:30 to 1:59  1107      1058      55.2%   1.05  
2:00 to 2:29   875       881      50.8%   0.99  
2:30 to 2:59   632       657      47.5%   0.96  
3:00 to 3:29   430       445      45.6%   0.97  
3:30 to 3:59   232       244      45.3%   0.95  
4:00 to 4:29    64        77      41.2%   0.83  
4:30 and up     23        30      42.1%   0.76

Again, there’s not much ambiguity in the trend here. Better players spend less time on court, so if you know someone beat their previous opponent in 1:14, you can infer that he’s a very good player. Often that assumption is wrong, but in the aggregate, it holds up.

The “Ratio” column shows the relationship between actual winning percentage (from the first table) and expected winning percentage. If previous match time had no effect, we’d expect to see ratios randomly hovering around 1. Instead, we see a steady decline from 1.07 at the top–meaning that players coming off of short matches win 7% more often than their skill level would otherwise lead us to forecast–to 0.76 at the bottom, indicating that competitors tend to underperform following a battle of 4:30 or longer.
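For readers who want to reproduce the ratio, here is a rough sketch. It assumes the textbook Elo expected-score formula on a 400-point scale, which may not match the exact sElo implementation, and it uses the rounded win totals from the table, so a ratio can differ from the published one in the second decimal place.

```python
def elo_win_prob(rating_a, rating_b):
    """Textbook Elo expected score for player A against player B.

    The 400-point scale is the conventional choice and is an assumption here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def performance_ratio(actual_wins, win_probs):
    """Actual wins divided by expected wins (the sum of pre-match win probabilities)."""
    expected_wins = sum(win_probs)
    return actual_wins / expected_wins

# Rounded (wins, expected wins) totals copied from the table
table = {
    "1:00 to 1:29": (275, 258),
    "2:00 to 2:29": (875, 881),
    "4:00 to 4:29": (64, 77),
}
for label, (wins, exp_wins) in table.items():
    print(label, round(wins / exp_wins, 2))
```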

It’s difficult to know whether we’re seeing a direct effect of time on court or a proxy for form. As good as surface-specific Elo ratings are, they don’t capture everything that could possibly predict the outcome of a match, especially micro-level considerations like a player’s comfort on a specific type of surface or at a certain tournament. sElo also needs a little time to catch up with players making fast improvements, particularly when they are very young. All this is to say that our correction for overall skill level will never be perfect.

Thus, a 75-minute win may improve a player’s chances by keeping him fresh for the next round … or it might tell us that–for whatever reason–he’s a stronger competitor right now than our model gives him credit for. One point in favor of the latter is that, at the most extreme, less time on court doesn’t help: Players don’t appear to benefit from advancing via walkover. That isn’t a slam-dunk argument–some commentators believe that walkovers could be detrimental due to the long resulting layoff at a Slam–but it does show us that less time on court isn’t always a positive.

Whatever the underlying cause, we can tweak our projections accordingly. Murray could be a little weaker than usual tomorrow after his lengthy battle yesterday with Martin Klizan. Albert Ramos, the only man to complete a second-rounder in less than 90 minutes, might be playing a bit better than his rating suggests. It’s certainly evident that match time has something to tell us even when players aren’t stretched to the breaking point of a marathon fifth set.

Angelique Kerber’s Unclutch Unforced Errors

It’s been a rough year for Angelique Kerber. Despite her No. 1 WTA ranking and place at the top of the French Open draw, she lost her opening match on Sunday against the unseeded Ekaterina Makarova. Adding insult to injury, the loss goes down in the record books as a lopsided-looking 6-2 6-2.

Andrea Petkovic chimed in with her diagnosis of Kerber’s woes:

She’s simply playing without confidence right now. It was tight, even though the scoreline was 2 and 2 but everyone who knows a thing about tennis knew that Angie made errors whenever it mattered because she’s playing without any confidence right now – errors she didn’t make last year.

This is one version of a common analysis: A player lost because she crumbled on the big points. While that probably doesn’t cover all of Kerber’s issues on Sunday–Makarova won 72 points to her 55–it is true that big points have a disproportionate effect on the end result. For every player who squanders a dozen break points yet still wins the match, there are others who falter at crucial moments and ultimately lose.

This family of theories–that a player over- or under-performed at big moments–is testable. For instance, I showed last summer that Roger Federer’s Wimbledon loss to Milos Raonic was due in part to his weaker performance on more important points. We can do the same with Kerber’s early exit.

Here’s how it works. Once we calculate each player’s probability of winning the match before each point, we can assign each point a measure of importance–I prefer to call it leverage, or LEV–that quantifies how much the single point could affect the outcome of the match. At 3-0, 40-0, it’s almost zero. At 3-3, 40-AD in the deciding set, it might be over 10%. Across an entire tournament’s worth of matches, the average LEV is around 5% to 6%.
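To make the idea concrete, here is a deliberately simplified sketch. It measures leverage within a single service game rather than across the whole match: the swing in the server's chance of holding depending on who wins the next point. The fixed point-win probability `p` and the function names are assumptions for illustration; a match-level LEV applies the same difference to match-win probabilities.

```python
def game_win_prob(p, server_pts, returner_pts):
    """Probability the server wins the game from the given point score,
    assuming the server wins each point independently with probability p."""
    if server_pts >= 4 and server_pts - returner_pts >= 2:
        return 1.0
    if returner_pts >= 4 and returner_pts - server_pts >= 2:
        return 0.0
    if server_pts >= 3 and returner_pts >= 3:
        q = 1.0 - p
        deuce = p * p / (p * p + q * q)  # closed form for winning from deuce
        if server_pts == returner_pts:
            return deuce
        if server_pts > returner_pts:
            return p + q * deuce         # advantage server
        return p * deuce                 # advantage returner
    return (p * game_win_prob(p, server_pts + 1, returner_pts)
            + (1.0 - p) * game_win_prob(p, server_pts, returner_pts + 1))

def game_leverage(p, server_pts, returner_pts):
    """Swing in game-win probability between winning and losing the next point."""
    return (game_win_prob(p, server_pts + 1, returner_pts)
            - game_win_prob(p, server_pts, returner_pts + 1))

p = 0.6  # assumed server point-win rate
print("40-0 :", round(game_leverage(p, 3, 0), 3))
print("30-40:", round(game_leverage(p, 2, 3), 3))
```

As with match-level leverage, a point played at 40-0 barely moves the needle, while a break point swings the outcome of the game considerably.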

If Petko is right, we’ll find that the average LEV of Kerber’s unforced errors was higher than on other points. (I’ve excluded points that ended with the serve, since neither player had a chance to commit an unforced error.) Sure enough, Kerber’s 13 groundstroke UEs (that is, excluding double faults) had an average LEV of 5.5%, compared to 3.8% on points that ended some other way. Her UE points were 45% more important than non-UE points.
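The comparison itself is just a ratio of two averages. The sketch below assumes each rally point is stored as a (leverage, ended-in-unforced-error) pair, which is a made-up format for illustration; the final lines redo the arithmetic with the two averages quoted above.

```python
def ue_leverage_ratio(points):
    """points: list of (leverage, ended_in_unforced_error) pairs for rally points.

    Returns mean leverage of UE points divided by mean leverage of all other points.
    """
    ue = [lev for lev, is_ue in points if is_ue]
    other = [lev for lev, is_ue in points if not is_ue]
    return (sum(ue) / len(ue)) / (sum(other) / len(other))

# Averages quoted in the text: 5.5% on UE points, 3.8% on other points
ratio = 0.055 / 0.038
print(f"UE points were {ratio - 1:.0%} more important")
```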

Let’s put that number in perspective. Among the 86 women for whom I have point-by-point UE data for their first-round matches this week*, ten timed their errors even worse than Kerber did. Magdalena Rybarikova was the most extreme: Her eight UEs against Coco Vandeweghe were more than twice as important, on average, as the rest of the points in that match. Seven of the ten women with bad timing lost their matches, and two others–Agnieszka Radwanska and Marketa Vondrousova–committed so few errors (3 and 4, respectively), that it didn’t really matter. Only Dominika Cibulkova, whose 15 errors were about as badly timed as Kerber’s, suffered from unclutch UEs yet managed to advance.

* This data comes from the Roland Garros website. I aggregate it after each major and make it available here.

Another important reference point: Unforced errors are evenly distributed across all leverage levels. Our instincts might tell us otherwise–we might disproportionately recall UEs that came under pressure–but the numbers don’t bear it out. Thus, Kerber’s badly timed errors are just as badly timed when we compare her to tour average.

They are also poorly timed when compared to her other recent performances at majors. Petkovic implied as much when she said her compatriot was making “errors she didn’t make last year.” Across her 19 matches at the previous four Slams, her UEs occurred on points that were 11% less important than non-UE points. Her errors caused her to lose relatively more important points in only 5 of the 19 matches, and even in those matches, her UE points were never more than 31% more important than her non-UE points, the margin she posted in Melbourne this year against Tsurenko. That’s still better than her performance on Sunday.

Across so many matches, a difference of 11% is substantial. Of the 30 players with point-by-point UE data for at least eight matches at the previous four majors, only three did a better job timing their unforced errors. Radwanska heads the list, at 16%, followed by Timea Bacsinszky at 14% and Kiki Bertens at 12%. The other 26 players committed their unforced errors at more important moments than Kerber did.

As is so often the case in tennis, it’s difficult to establish if a stat like this is indicative of a longer-term trend, or if it is mostly noise. We don’t have point-by-point data for most of Kerber’s matches, so we can’t take the obvious next step of checking the rest of her 2017 matches for similarly unclutch performances. Instead, we’ll have to keep tabs on how well she limits UEs at big moments on those occasions where we have the data necessary to do so.

Bouncing Back From a Marathon Third Set

In this year’s edition of the French Open, we’ve already seen two women’s matches charge past the 6-6 mark in the third set. On Sunday, Madison Brengle outlasted Julia Goerges 13-11 in the decider, and yesterday, Kristina Mladenovic overcame Jennifer Brady 9-7 in the final set. Marathon three-setters aren’t as gut-busting as the five-set equivalent on the men’s tour, yet they still require players to go beyond the usual limit of a tour match.

Do marathon three-setters affect the fortunes of those players that move on to the next round? Back in 2012, I published a study showing that men who win marathon five-setters (that is, matches that go to 8-6 or longer) win fewer than 30% of their following matches, a rate far worse than what we would expect, given the quality of their next opponents. It seems likely that long three-setters wouldn’t have the same effect, especially since many top women are willing to play five-setters themselves.

The numbers bear out the intuition. From 2001 to the 2017 Australian Open, there have been 185 marathon three-setters in Grand Slam main draws, and the winners of those matches have gone on to win 42.2% of their next contests. That’s more than the equivalent number for men, and it’s even better than it sounds.

Players who need to go deep into a third set to vanquish an early-round opponent are, on average, weaker than those who win in straight sets, so many of the marathon women would already be considered underdogs in their next matches. Using sElo–surface-specific Elo, which I recently introduced–we see that these 185 marathon women would have been expected to win only 44.0% of their following matches. There may be a real effect here, but it is a minor one, especially compared to the fortunes of players who struggle through marathon five-setters.

I ran the same algorithm for women’s Slam matches that ended at 7-6, 7-5, and 6-4 or 6-3 in the final set. Since only the US Open uses the third-set tiebreak format, the available sample for that score is limited, which may explain a slightly wacky result. For the other scores, we see numbers that are roughly similar to the marathon findings. Winners tend to be underdogs against their next opponents, but there is little, if any, hangover effect:

3rd Set Score  Sample  Next W%  Next ExpW%  
Marathons         185    42.2%       44.0%  
7-6                56    48.2%       42.2%  
7-5               232    43.1%       42.7%  
6-4 / 6-3         421    41.6%       43.2%
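One rough way to gauge how small these gaps are is to express each one in standard errors. The sketch below is a back-of-the-envelope check and not part of the original analysis: it treats every match in a group as if it carried the group's average expected win probability, which is a simplifying assumption.

```python
from math import sqrt

# (sample size, actual next-match win rate, expected win rate) from the table
rows = {
    "Marathons": (185, 0.422, 0.440),
    "7-6": (56, 0.482, 0.422),
    "7-5": (232, 0.431, 0.427),
    "6-4 / 6-3": (421, 0.416, 0.432),
}

def z_score(n, actual, expected):
    """Gap between actual and expected win rate, in binomial standard errors."""
    std_err = sqrt(expected * (1.0 - expected) / n)
    return (actual - expected) / std_err

z_scores = {label: z_score(*row) for label, row in rows.items()}
for label, z in z_scores.items():
    print(f"{label:<10} {z:+.2f}")
```

Under that approximation, every row falls well within two standard errors of its expectation, which is consistent with little or no hangover effect.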

In short: A long match often tells us something about the winner’s chances against her next foe, but it’s something that we already knew. The tight three-setter itself–marathon or otherwise–has little effect on her chances later on. That’s good news for Mladenovic, who will be back on court tomorrow against Sara Errani, an opponent likely to give her another grueling workout.

Diego Schwartzman’s Return Game Is Even Better Than I Thought

Diego Schwartzman is one of the most unusual players on the ATP tour. He is even shorter than David Ferrer, and his serve will never be a weapon, so the only way he can compete is by neutralizing everyone else’s offerings and winning baseline battles. Up to No. 34 in this week’s official rankings and No. 35 on the Elo list, he’s proven he can do that against some very good players.

Using the ATP stats leaderboard at Tennis Abstract, we can get a quick sense of how his return game compares with the elites. At tour level in the last 52 weeks (through Monte Carlo), he ranks third with 42.3% return points won, behind only Andy Murray and Novak Djokovic. He is particularly effective against second serves, winning 56.6% of those, better than anyone else on tour. He has broken in 31.8% of his return games, another third-place showing, this time behind Murray and Rafael Nadal.

Yet the leaderboard warns us to tread carefully. In the last year, Murray’s opponents have been far superior to Schwartzman’s, with a median rank of 24 and a mean rank of 41.5. The Argentine’s opponents have rated at 45.5 and 54.8, respectively. Murray, Djokovic, and Nadal are far better all-around players than Schwartzman, so they regularly reach later rounds, where the quality of competition goes way up.

Competition quality is one of the knottiest aspects of tennis analytics, and it is far from being solved. If we want to compare Murray to Djokovic, competition quality isn’t such a big factor. One or the other might get lucky over a span of months, but in the long run, the two best players on tour will face roughly equivalent levels of competition. But when we expand our view to players like Schwartzman–or even a top-tenner such as Dominic Thiem–we can no longer assume that opponent quality will even out. To use a term from other sports, the ATP has a very unbalanced schedule, and the schedule is always more challenging for the best players.

Correcting for competition quality is also key to understanding how any particular player evolves over time. If a player’s results improve, he’ll usually start facing more challenging competition, as Schwartzman is doing this spring in his first shot at the full slate of clay-court Masters events. If his return numbers decline, is he actually playing worse, or is he simply competing at his past level against tougher opponents?

Adjusting for competition

To properly compare players, we need to identify similarities in their schedules. Any pair of tour regulars have played many of the same opponents, even if they’ve never played each other. For instance, since the beginning of last season, Murray and Djokovic have faced 18 of the same players–some more than once. Further down the ranking list, players tend to have fewer opponents in common, but as we’ll see, that’s an obstacle we can overcome.

Here’s how the adjustment works: For a pair of players, find all the opponents both men have faced on the same surface. For example, both Murray and Djokovic have played David Goffin on clay in the last 16 months. Murray won 53.7% of clay return points against the Belgian, while Djokovic won only 42.1%, meaning that Djokovic returned about 22% worse than Murray did. We repeat the process for every surface-player combination, weight the results so that longer matches (or larger numbers of matches) count more heavily, and find the average.
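A minimal sketch of that pairwise comparison follows. The data layout, the function name, and the choice of weight (the smaller of the two players' return-point counts against the shared opponent) are assumptions for illustration; the text specifies only that longer or more numerous matches count more heavily. The point counts are invented so that the rates match the Goffin example.

```python
def relative_return(player_a, player_b):
    """Weighted average of B's return-point win rate relative to A's,
    over opponents both have faced on the same surface.

    Each argument maps (opponent, surface) -> (return_points_won, return_points_played).
    Returns None if the two players share no opponent-surface combination.
    """
    shared = player_a.keys() & player_b.keys()
    weighted_sum = 0.0
    total_weight = 0.0
    for key in shared:
        won_a, played_a = player_a[key]
        won_b, played_b = player_b[key]
        ratio = (won_b / played_b) / (won_a / played_a)
        weight = min(played_a, played_b)  # assumed weighting scheme
        weighted_sum += ratio * weight
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Invented counts chosen to reproduce the quoted rates (53.7% and 42.1%)
murray = {("David Goffin", "clay"): (537, 1000)}
djokovic = {("David Goffin", "clay"): (421, 1000)}
goffin_ratio = relative_return(murray, djokovic)
print(round(goffin_ratio, 3))
```

On this single shared opponent, the ratio comes out a little under 0.79, which is the "about 22% worse" figure from the example.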

When we do that for the top two men, we find that Djokovic has returned 2.3% better. (That’s a percentage, not percentage points. A great returner wins about 40% of return points, and a 2.3% improvement on that is roughly 41%.) Our finding suggests that Murray has faced somewhat weaker-serving competition: Since the beginning of 2016, he has won 42.9% of return points, compared to Djokovic’s 43.3%–a smaller gap than the competition-adjusted one.

It takes more work to reliably compare someone like Schwartzman to the elites, since their schedules overlap so much less. So before adjusting Diego’s return numbers, we’ll take several intermediate steps. Let’s start with the world No. 3 Stanislas Wawrinka. We follow the above process twice: Once for Wawrinka and Murray, then again for Stan and Novak. Run the numbers, and we find that Wawrinka’s return game is 12.5% weaker than Murray’s and 14.3% weaker than Djokovic’s. Wawrinka’s rates relative to the other two players correspond very well with what we already found, suggesting that Djokovic is a little better than his rival. Weighting the two numbers by sample size–which, in this case, is almost identical–we slightly adjust those two comparisons and conclude that Wawrinka’s return game is 12.4% worse than Murray’s.

Generating competition-adjusted numbers for each subsequent player follows the same pattern. For No. 4 Federer, we run the algorithm three times, once for each of the players ranked above him, then we aggregate the results. For No. 34 Schwartzman, we go through the process 33 times. Thanks to the magic of computers, it takes only a few seconds to adjust 16 months’ worth of return stats for the ATP top 50.
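The aggregation step might look something like the sketch below. It assumes each pairwise comparison is first converted to the Murray-relative scale by multiplying by the already-rated player's own rating, then averaged with sample-size weights. The function name, the third player's ratios, and the sample sizes are all hypothetical; only Djokovic's 1.023 comes from the comparison described earlier.

```python
def chained_rating(comparisons, ratings):
    """Combine pairwise comparisons against already-rated players.

    comparisons: {rated_player: (ratio_vs_that_player, sample_size)}
    ratings:     {rated_player: rating relative to the baseline player}
    Returns the sample-size-weighted rating relative to the baseline.
    """
    weighted_sum = sum(ratio * ratings[name] * n
                       for name, (ratio, n) in comparisons.items())
    total_n = sum(n for _, n in comparisons.values())
    return weighted_sum / total_n

# Baseline is Murray at 1.0; Djokovic returned 2.3% better
ratings = {"Murray": 1.0, "Djokovic": 1.023}

# Hypothetical third player: ratios and sample sizes are made up
comparisons = {"Murray": (0.95, 500), "Djokovic": (0.93, 500)}
new_rating = chained_rating(comparisons, ratings)
ratings["Hypothetical Player"] = new_rating  # available for the next player down
print(round(new_rating, 3))
```

Each newly rated player is added to `ratings`, so the next player down the list can be compared against everyone above him.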

Below are the results for 2016-17. Players are ranked by “relative return points won” (REL RPW), where a rating of 1.0 is arbitrarily given to Murray, and a rating of 0.98 means that a player wins 2% fewer return points than Murray against equivalent opposition. The “EX RPW” column puts those numbers in a more familiar context: The top-ranked player’s rating is set equal to 43.0%–approximately the best RPW of any player in the last few seasons–and everyone else’s is adjusted accordingly. The last two columns show each player’s actual rate of return points won and their rank among the ATP top 50:

RANK  PLAYER                 REL RPW  EX RPW  ACTUAL  RANK  
1     Diego Schwartzman         1.04   43.0%   42.4%     4  
2     Novak Djokovic            1.02   42.1%   43.3%     1  
3     Andy Murray               1.00   41.2%   42.9%     2  
4     Rafael Nadal              0.98   40.3%   42.6%     3  
5     David Goffin              0.97   40.1%   41.3%     5  
6     Gilles Simon              0.96   39.6%   40.1%     9  
7     Kei Nishikori             0.95   39.3%   40.1%    10  
8     David Ferrer              0.95   39.1%   40.6%     7  
9     Roger Federer             0.94   38.7%   38.7%    15  
10    Gael Monfils              0.93   38.5%   39.8%    11  


RANK  PLAYER                 REL RPW  EX RPW  ACTUAL  RANK
11    Roberto Bautista Agut     0.93   38.3%   40.3%     8  
12    Ryan Harrison             0.92   37.9%   36.7%    33  
13    Richard Gasquet           0.92   37.9%   40.8%     6  
14    Daniel Evans              0.91   37.6%   36.9%    27  
15    Juan Martin Del Potro     0.91   37.5%   36.8%    32  
16    Benoit Paire              0.90   37.0%   38.1%    19  
17    Mischa Zverev             0.90   36.9%   36.9%    28  
18    Grigor Dimitrov           0.89   36.4%   38.2%    18  
19    Fabio Fognini             0.88   36.4%   39.7%    12  
20    Fernando Verdasco         0.88   36.4%   38.3%    16  

RANK  PLAYER                 REL RPW  EX RPW  ACTUAL  RANK
21    Joao Sousa                0.88   36.2%   38.3%    17  
22    Dominic Thiem             0.88   36.2%   38.1%    20  
23    Stani Wawrinka            0.88   36.1%   37.5%    22  
24    Alexander Zverev          0.88   36.0%   37.5%    23  
25    Albert Ramos              0.87   35.9%   38.9%    14  
26    Kyle Edmund               0.86   35.5%   36.1%    37  
27    Jack Sock                 0.86   35.5%   36.6%    34  
28    Viktor Troicki            0.86   35.4%   37.1%    26  
29    Marin Cilic               0.86   35.4%   37.3%    25  
30    Pablo Carreno Busta       0.86   35.3%   39.4%    13  

RANK  PLAYER                 REL RPW  EX RPW  ACTUAL  RANK
31    Milos Raonic              0.86   35.2%   36.1%    38  
32    Pablo Cuevas              0.85   35.1%   36.9%    29  
33    Tomas Berdych             0.85   35.1%   36.9%    30  
34    Borna Coric               0.85   34.9%   36.1%    39  
35    Nick Kyrgios              0.85   34.9%   35.7%    41  
36    Philipp Kohlschreiber     0.84   34.7%   37.9%    21  
37    Jo Wilfried Tsonga        0.84   34.6%   36.2%    36  
38    Sam Querrey               0.83   34.3%   34.6%    44  
39    Lucas Pouille             0.82   33.9%   36.9%    31  
40    Feliciano Lopez           0.81   33.2%   35.2%    43  

RANK  PLAYER                 REL RPW  EX RPW  ACTUAL  RANK
41    Robin Haase               0.80   33.0%   36.1%    40  
42    Paolo Lorenzi             0.80   32.9%   37.5%    24  
43    Donald Young              0.78   32.2%   36.3%    35  
44    Bernard Tomic             0.78   32.1%   34.1%    45  
45    Nicolas Mahut             0.76   31.4%   35.4%    42  
46    Steve Johnson             0.75   31.0%   33.8%    46  
47    Florian Mayer             0.74   30.3%   33.5%    47  
48    John Isner                0.73   30.0%   29.8%    49  
49    Gilles Muller             0.72   29.8%   32.4%    48  
50    Ivo Karlovic              0.63   25.9%   26.4%    50
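The EX RPW column is a straight rescaling of REL RPW, which can be sketched as follows. The function name is made up, and because the table shows REL RPW rounded to two decimals, results computed from those rounded values can differ from the published EX RPW by a few tenths of a percentage point.

```python
def ex_rpw(rel_rpw, top_rel_rpw, top_rate=0.430):
    """Scale a relative rating so that the top-rated player lands on top_rate."""
    return top_rate * rel_rpw / top_rel_rpw

top = 1.04  # Schwartzman's rounded REL RPW
for name, rel in [("Diego Schwartzman", 1.04), ("Andy Murray", 1.00), ("Ivo Karlovic", 0.63)]:
    print(f"{name:<18} {ex_rpw(rel, top):.1%}")
```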

The big surprise: Schwartzman is number one! While the average ranking of his opponents was considerably lower than that of the elites, it appears that he has faced bigger-serving opponents than have Murray or Djokovic. The top five on this list–Schwartzman, Djokovic, Murray, Nadal, and Goffin–do not force any major re-evaluation of who we consider to be the game’s best returners, but the competition-adjusted metric does offer more evidence that Schwartzman really belongs there.

There is a similar predictability at the bottom of the list. The five players rated the worst by the competition-adjusted metric–Steve Johnson, Florian Mayer, John Isner, Gilles Muller, and Ivo Karlovic–are the same five who sit at the bottom of the actual RPW ranking, with only Isner and Muller swapping places. This degree of consistency at the top and bottom of the list is reassuring: The metric is correcting for something important, but it isn’t spitting out any truly crazy results.

There are, however, some surprises. Three players do very well when their return games are adjusted for competition: Ryan Harrison, Daniel Evans, and Juan Martin del Potro, all of whom jump from the bottom half to the top 15. In a sense, this is a surface adjustment for Harrison and Evans, both of whom have played almost exclusively on hard courts. Players win fewer return points on faster surfaces (and faster surfaces attract bigger-serving competitors, magnifying the effect), so when adjusted for competition, someone who plays only on hard courts will see his numbers improve. Del Potro, on the other hand, has been absolutely hammered by tough competition, so in his case the correction is giving him credit for the difficult opponents he has had to face.

Several clay-court specialists find their return stats adjusted in the wrong direction. Last week’s finalist, Albert Ramos, falls from 14th to 25th, Pablo Carreno Busta drops from 13th to 30th, and Roberto Bautista Agut and Paolo Lorenzi see their numbers take a hit as well. This is the reverse of the effect that pushed Harrison and Evans up the list: Clay-court specialists spend more time on the dirt and they play against weaker-serving opponents, so their season averages make them look like better returners than they really are. It appears that these players are all particularly bad on hard courts: When I ran the algorithm with only clay-court results, Bautista Agut, Ramos, and Carreno Busta all appeared among the top 12 in competition-adjusted return points won. It’s their abysmal hard-court performances that pull down their longer-term numbers.

Beyond RPW

This algorithm–or something like it–has a great deal of potential beyond simply correcting return points won for tour-level competition quality. It could be used for any stat, and if competition-adjusted return rates were combined with corrected rates of service points won, it would generate a plausible overall player rating system.

Such a rating system would be more valuable if the algorithm were extended to players beyond the top 50, as well. Just as Schwartzman doesn’t yet have that many common opponents with the elites, Challenger-level stalwarts don’t share many opponents with tour regulars. But there is enough overlap that, when combining the shared opponents of dozens of players, we might be able to get a better grip on how Challenger-level competition compares to that of the highest levels. Essentially, we can compare adjacent levels–the elites to the middle of the pack (say, ATP ranks 21 to 50), the middle of the pack to the next 50, and so on–to get a more comprehensive idea of how much players must improve to achieve certain goals.

Finally, by adjusting serve and return stats so that we have a set of competition-neutral numbers for every player, for each season of his career, we will gain a clearer picture of which players are improving and by how much. Official rankings and Elo ratings tell us a lot, but they are sometimes fooled by lucky breaks, close wins, or inconsistent opposition. And they cannot isolate individual stats, which may be particularly useful for developmental purposes.

Adjusting for opposition quality is standard practice for analysts of many other sports, and it will help tennis analytics move forward as well. If nothing else, it has shown us that one extreme performance–Schwartzman’s return game–is much more than a fluke, and that service return greatness isn’t limited to the big four.

Del Potro’s Draws and the Possible Persistence of Bad Luck

Tennis’s draw gods have not been kind to Juan Martin del Potro this year.

In Acapulco and Indian Wells, he drew Novak Djokovic as his second-match opponent. In Miami, Delpo got a third-rounder with Roger Federer. In each of the March Masters events, with 1,000 ranking points at stake, del Potro was handed the most difficult opponents for his first round against a fellow seed. Thanks in part to the resulting early exits, one of the most dangerous players on tour is still languishing outside of the top 30 in the ATP rankings.

When I wrote about the Indian Wells quarter of death–the section of the draw containing del Potro, Djokovic, Federer, Rafael Nadal, and Nick Kyrgios–I attempted to quantify the effect of the draw on each player’s expected ranking points. Before each player’s name was placed in the bracket, my model predicted that Delpo would earn about 150 ranking points–the weighted average of his likelihood of reaching the third round, the fourth round, and so on–and after the draw was conducted, his higher probability of a clash with Djokovic knocked that number down to just over 100. That negative effect was one of the worst of any player in the tournament.

The story in Miami is similar, if less extreme. Pre-draw, Delpo’s expected points were 183. Post draw: 155. In the four tournaments he has entered this year, he has been uniformly unlucky:

Tournament    Pre-Draw  Post-Draw  Effect  
Delray Beach      89.3       74.0  -17.1%  
Acapulco         121.5       97.1  -20.1%  
Indian Wells     154.6      102.5  -33.7%  
Miami            182.9      155.4  -15.0%  
TOTAL            548.2      429.0  -21.7%

*The numbers above for Indian Wells are slightly different than what I published in the Indian Wells article, since the simulations I ran for this post consider the entire 96-player field, not just the 64-player second round.
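The "Effect" column is simply the percentage change from pre-draw to post-draw expected points. A quick sketch using the rounded figures from the table (so the totals can be off by a tenth here and there):

```python
# (pre-draw, post-draw) expected ranking points, copied from the table
draws = {
    "Delray Beach": (89.3, 74.0),
    "Acapulco": (121.5, 97.1),
    "Indian Wells": (154.6, 102.5),
    "Miami": (182.9, 155.4),
}

def draw_effect(pre, post):
    """Fractional change in expected points caused by the draw."""
    return (post - pre) / pre

for name, (pre, post) in draws.items():
    print(f"{name:<13} {draw_effect(pre, post):+.1%}")

total_pre = sum(pre for pre, _ in draws.values())
total_post = sum(post for _, post in draws.values())
points_lost = total_pre - total_post
print(f"Total lost: {points_lost:.0f} points ({draw_effect(total_pre, total_post):+.1%})")
```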

The good news, as we’ll see, is that it’s virtually impossible for this degree of misfortune to continue. The bad news is that those 119 points are gone forever, and at Delpo’s current position in the ranking table, that disadvantage will affect his tournament seeds, which in turn will result in worse draws (earlier meetings with higher-ranked players, independent of luck) for at least another few weeks.

Before we go any further, let me review the methodology I’m using here. (If you’re not interested, skip this paragraph.) For “post-draw” expected points, I’m taking jrank-based forecasts–like the ones on the front page of Tennis Abstract–and using each player’s probability of each round to calculate a weighted average of expected points. “Pre-draw” forecasts are much more computationally demanding. In Miami, for instance, Delpo could’ve faced any of the 64 unseeded players in the second round and been slated to meet any of the top eight seeds in the third round. For each tournament, I ran a Monte Carlo simulation with the tournament seeds, generating a new draw and simulating the tournament 100,000 times, then summing all those outcomes. So in the pre-draw forecast, Delpo had a one-eighth chance of getting Fed in the third round, a one-eighth chance of getting Kei Nishikori there, and so on.
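In heavily simplified form, the two calculations look something like the sketch below. Nothing here is the actual forecasting code: the points schedule, the exit probabilities, and the per-opponent expected points are hypothetical, and the "draw" is reduced to picking one of eight possible third-round seeds at random, whereas a full simulation would regenerate the whole bracket and play out every match.

```python
import random

def expected_points(exit_probs, points_for_exit):
    """Weighted average of ranking points over the rounds where a player might exit."""
    return sum(prob * points_for_exit[rnd] for rnd, prob in exit_probs.items())

def pre_draw_expected_points(post_draw_points_by_seed, n_sims=100_000, rng_seed=0):
    """Monte Carlo average over random draws.

    Each simulated draw assigns one of the possible third-round seeds with
    equal probability and records the post-draw expectation for that draw.
    """
    rng = random.Random(rng_seed)
    possible = list(post_draw_points_by_seed.values())
    total = 0.0
    for _ in range(n_sims):
        total += rng.choice(possible)
    return total / n_sims

# Hypothetical points schedule and exit probabilities
points = {"R2": 10, "R3": 45, "R4": 90, "QF": 180}
exit_probs = {"R2": 0.25, "R3": 0.50, "R4": 0.15, "QF": 0.10}
post_draw = expected_points(exit_probs, points)

# Hypothetical post-draw expectations against each of the top eight seeds
by_seed = {f"seed {i}": pts
           for i, pts in enumerate([100, 120, 130, 140, 150, 155, 160, 165], start=1)}
pre_draw = pre_draw_expected_points(by_seed)
print(round(post_draw, 1), round(pre_draw, 1))
```

With the eight opponents equally likely, the simulated pre-draw figure converges on the plain average of the eight post-draw expectations, which is the "one-eighth chance" logic described above.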

It seems clear that a 22%, 119-point rankings hit over the course of four tournaments is some seriously bad luck. Last year, there were about 750 instances of a player being seeded at an ATP tournament, and in fewer than 60 of those, the draw resulted in an effect of -22% or worse on the player’s expected ranking points. And that’s just one tournament! The odds that Delpo would get such a rough deal in all four of his 2017 tournaments are 1 in more than 20,000.
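The arithmetic behind that last figure is worth spelling out. The sketch assumes, as the estimate implicitly does, that draws at different tournaments are independent, and it uses 60 out of 750 as an upper bound on the single-tournament rate:

```python
# Upper bound on the chance of a -22% (or worse) draw at a single tournament
p_single = 60 / 750

# Chance of four such draws in a row, assuming independent tournaments
p_all_four = p_single ** 4

print(f"single tournament: {p_single:.1%}")
print(f"all four: about 1 in {round(1 / p_all_four):,}")
```

Since the true count is "fewer than 60," the real odds are somewhat longer than this bound.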

Over the course of a full season, draw luck mostly evens out. It’s rare to see an effect of more than 10% in either direction. Last year, Thiemo de Bakker saw a painful difference of 18% between his pre-draw and post-draw expected points in 12 ATP events, but everyone else with at least that many tournaments fell between -11% and +11%, with three-quarters of players between -5% and +5%. Even when draw luck doesn’t balance itself out, the effect isn’t as bad as what Delpo has seen in 2017.

Del Potro’s own experience in 2016 is a case in point. His most memorable event of the season was the Olympics, where he drew Djokovic in the first round, so it’s easy to recall his year as being equally riddled with bad luck. But in his 12 other ATP events, the draw aided him in six–including a +34% boost at the US Open–and hurt him at the other six. Altogether, his 2016 ATP draws gave him a 5.9% advantage over his “pre-draw” expected points–a bonus of 17 ranking points. (I didn’t include the Olympics, since no ranking points were awarded there.)

Taken together, Delpo’s 2016-17 draws have deprived him of about 100 ranking points, which would move him three spots up the ranking table. So even with a short stretch of extreme misfortune, draw luck hasn’t affected him that much. Last year’s most extreme case among elite players, Richard Gasquet, suffered a similar effect: His draws knocked down his expected take by 9%, or 237 points, a difference that would bump him up from #22 to #19 in this week’s ranking list.

There are many reasons to believe that del Potro is a much better player than his current ranking suggests, such as his Elo rating, which stands at No. 7. But his ATP ranking reflects his limited schedule and modest start last year much more than it does the vagaries of each week’s brackets. The chances are near zero that he will continue to draw the toughest player in each tournament’s field in the earliest possible round, so we’ll soon have a better idea of what exactly he is capable of, and where exactly he should stand in the rankings.

Are Taller Players the Future of Tennis?

This is a guest post by Wiley Schubert Reed.

This week, the Memphis Open features the three tallest players ever to play professional tennis: 6-foot-10 John Isner, 6-foot-11 Ivo Karlovic, and 6-foot-11 Reilly Opelka. And while these three certainly stand out among all players in the sport, they are by no means the only giants in the game. Also in the Memphis draw: 6-foot-5 Dustin Brown, 6-foot-6 Sam Querrey, and 6-foot-8 Kevin Anderson. (Brown withdrew due to injury, and with Opelka’s second-round loss yesterday, Isner and Karlovic are the only giants remaining in the field.)

There is no denying that the players on the ATP and WTA tours are taller than the ones who were competing 25 years ago. The takeover by the tall has been obvious for some time in the men’s game, and it’s extended to near the very top of the women’s game as well. But despite alarms raised about the unbeatable giants among men, the merely tall men have held on to control of the game.

The main reason: The elegant symmetry at the game’s heart. The tallest players have an edge on serve, but that’s just half of tennis. And on the return, extreme height–at least for the men–turns out to be a big disadvantage. But a rising crop of tall men have shown promise beyond their service games. If one of the tallest young stars is going to challenge the likes of Novak Djokovic and Andy Murray, he’ll have to do it by trying to return serve like them, too.

Sorting out exactly how much height helps a player is a complicated thing. Just looking at the top 100 pros, for instance, makes the state of things look like a blowout win in favor of the tall. The median top-100 man is nearly an inch taller today than in 1990, and the average top-100 woman is 1.5 inches taller [1]. The number of extremely tall players in the top 100 has gone up, too:

                                    1990  Aug 2016  
Top 100 Men      Median Height  6-ft-0.0  6-ft-0.8  
               At least 6-ft-5        3%       16%  
Top 100 Women    Median Height  5-ft-6.9  5-ft-8.5  
                 At least 6-ft        8%        9%

Height is clearly a competitive advantage, as taller young players rise faster through the rankings than their shorter peers. Among the top 100 juniors each year from 2000 to 2009 [2], the tallest players (6-foot-5 and over for men and 6-foot and over for women) [3] typically sat in the middle of the rankings. But they did better as pros: Four years later, the tallest men were ranked about 127 spots higher, on average, than shorter players their age, and the tallest women about 113 spots higher.

Boys' pro ranking by height

Girls' pro ranking by height

Thus, juniors who are very tall have the best chance to build a solid pro career. But does that advantage hold within the top 100 of the pro rankings? Are the tallest pros the highest ranked? 

For the women, they clearly are. From 1985 to 2016, the median top 10 woman was 1.2 inches taller than the median player ranked between No. 11 and No. 100. The tallest women are also winning an outsize share of titles: Women 6-foot and taller won 15.0 percent of Grand Slams while making up only 6.6 percent of the top 100 over the same period. Most of these wins were by Lindsay Davenport, Venus Williams and Maria Sharapova. Garbiñe Muguruza became the latest 6-foot women’s champ at the French Open last year [4].

It’s a different story for the men, however. From 1985 to 2016, the median height of both the top 10 men and men ranked No. 11 to No. 100 was the same: 6-foot-0.8. And in those same 32 years, only three Grand Slam titles (2.4 percent) were won by players 6-foot-5 or taller (one each by Richard Krajicek, Juan Martin del Potro and Marin Cilic), while over the same period, players 6-foot-5 and above made up 7.7 percent of the top 100. In short, the tallest women are overperforming, while the tallest men are underperforming.
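One way to put "overperforming" and "underperforming" on a common scale is to divide each group's share of Grand Slam titles by its share of the top 100. The ratio below is only an illustrative summary of the percentages quoted above, not a statistic used in the original analysis; a value of 1.0 would mean titles in proportion to presence in the top 100.

```python
def representation_ratio(title_share_pct, top100_share_pct):
    """Share of titles divided by share of the top 100 (both in percent)."""
    return title_share_pct / top100_share_pct


# Percentages as quoted in the text for 1985 to 2016.
tall_women = representation_ratio(15.0, 6.6)   # women 6-foot and taller
tall_men = representation_ratio(2.4, 7.7)      # men 6-foot-5 and taller

print(f"tallest women: {tall_women:.2f}x, tallest men: {tall_men:.2f}x")
```

By this measure the tallest women have won a bit more than twice their proportional share of majors, and the tallest men less than a third of theirs.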

Why have all the big men accomplished so little collectively? One big reason is that whatever edge the tallest men gain in serving is cancelled out by their disadvantage when returning serve. I compared total points played by top-100 pros since 2011, and found that while players 6-foot-5 and over have a clear service advantage and return disadvantage, their height doesn’t seem to have a major impact on overall points won:

Height            % Svc Pts Won  % Ret Pts Won  % Tot Pts Won  
6-ft-5 and above          66.8%          35.7%          51.2%  
6-ft-1 to 6-ft-4          64.5%          37.8%          51.1%  
6-ft-0 and below          62.3%          39.1%          51.1%

Taller players serve better for two reasons. First, their height lets them serve at a sharper angle by changing the geometry of the court. With a sharper angle available to them, they have a greater margin for error to clear the top of the net while still getting the ball to bounce on or inside the service line. And a sharper angle also makes the ball bounce higher, up and out of returners’ strike zone [5].

Serve trajectory

Disregarding spin, for a 6-foot player to serve the ball at 120 miles per hour at the same angle as a 6-foot-5 player, he would need to stand more than 3 feet inside the baseline.
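The geometric half of that argument can be sketched with straight lines. This is a deliberately crude model that ignores gravity, drag, and spin, all of which matter on a real serve, so it is not the calculation behind the figure above. The contact heights are hypothetical; only the court dimensions are standard. It measures the window of downward angles that both clear the net and land inside the service line for a serve hit down the middle from the baseline.

```python
import math

BASELINE_TO_NET_FT = 39.0
BASELINE_TO_FAR_SERVICE_LINE_FT = 60.0   # 39 ft to the net plus 21 ft beyond it
NET_HEIGHT_AT_CENTER_FT = 3.0


def angle_window_deg(contact_height_ft):
    """Width, in degrees, of the range of straight-line serve angles that go in.

    Angles are measured downward from horizontal. A negative result means no
    straight line from that height can clear the net and still land in the box.
    """
    # Any flatter than this and the ball lands beyond the service line.
    shallowest = math.atan(contact_height_ft / BASELINE_TO_FAR_SERVICE_LINE_FT)
    # Any steeper than this and the ball hits the net.
    steepest = math.atan(
        (contact_height_ft - NET_HEIGHT_AT_CENTER_FT) / BASELINE_TO_NET_FT
    )
    return math.degrees(steepest - shallowest)


# Contact height at which the window closes to zero in this simplified model.
min_contact_height_ft = (
    NET_HEIGHT_AT_CENTER_FT * BASELINE_TO_FAR_SERVICE_LINE_FT
    / (BASELINE_TO_FAR_SERVICE_LINE_FT - BASELINE_TO_NET_FT)
)

for height in (9.0, 9.6):   # hypothetical contact heights for two servers
    print(f"contact at {height} ft: window of {angle_window_deg(height):.2f} degrees")
print(f"window closes below about {min_contact_height_ft:.2f} ft")
```

Even in this stripped-down version, a few extra inches of contact height widen the window noticeably, which is the "margin for error" described above. Gravity and topspin pull real serves down into the box, so actual margins are larger than these numbers suggest.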

Second, a taller player’s longer serving arm allows him to whip the ball faster. For you physics fans, the torque (in this case magnitude of force imparted on the ball) is directly proportional to the radius of the lever arm (in this case the server’s extended arm and racket). As radius (arm length) increases, so does torque. There is no way for shorter players to make up this advantage. Six-foot-8 Kevin Anderson, current No. 74 in the world and one of the tallest players ever to make the top 10, told me, “I always say it’ll be easier for me to move like Djokovic than it will be for Djokovic to serve like me.”

One would think that height could be an advantage on return as well, with increased wingspan offering greater reach. 18-year-old, 6-foot-11 Reilly Opelka, who is already as tall as the tour’s reigning giant Ivo Karlovic and who ESPN commentator Brad Gilbert said will be “for sure the biggest ever,” told me his height gives him longer leverage. “My reach is a lot longer than a normal tennis player, so I’m able to cover a couple extra inches, which is pretty huge in tennis.”

But Gilbert and Tennis Channel commentator Justin Gimelstob said they believe tall players struggle on return because their higher center of gravity hurts their movement. If a very tall man can learn to move like the merely tall players who have long dominated the sport––Djokovic, Murray (6-foot-3), Roger Federer (6-foot-1) and Rafael Nadal (6-foot-1)––Gilbert thinks he could be hard to stop. “If you’re 6-foot-6 and are able to move like that, I can easily see that size dominating,” he said.

Interestingly, Gilbert pointed out that some of the best returners in the women’s game––such as Victoria Azarenka (6-foot-0) and Maria Sharapova (6-foot-2)––are among its tallest players [6]. Carl Bialik asked three American women — 5-foot-11 Julia Boserup, 5-foot-10 Jennifer Brady and 5-foot-4 Sachia Vickery — why they think taller women aren’t at a disadvantage on return. They cited two main reasons: 1) Women are returning women’s serves, which are slower and have less spin, on average, than men’s serves, so they have more time to make up for any difficulty in movement; and 2) Women play on the same size court that men do, but a height that’s relatively tall for a woman is about average for men, and it’s a height that works well for returning, no matter your gender.

“On the women’s side, we don’t really have anyone who’s almost 6-foot-11 or 7-foot tall,” Brady said. While she’s above average height on the women’s tour, “I’m not as tall as Reilly Opelka,” she said.

Another reason players as tall as Opelka tend to struggle on return could be that they focus more in practice on improving their service game, which exacerbates the serve-oriented skew of their games. “Being tall helps with the serve and you maybe tend to focus on your serve games even more,” Karlovic, the tallest top 100 player at 6-foot-11 [7], said in an interview conducted on my behalf by members of the ATP World Tour PR & Marketing staff at the Bucharest tournament in April. “Shorter players aren’t as strong at serve so they work their return more.”

The careers of all active male players 6-foot-5 and above who have finished at least one year in the top 100 bear this out. Their percentage of service points won increased by about 6 percentage points over their first eight years on tour [8], while their percentage of return points won increased by only about 1.5 percentage points. In contrast, Novak Djokovic has steadily improved his share of return points won, from 36.7 percent in 2005 to 43.9 percent in 2016.

When very tall men break through, it’s usually because of strong performance on return: del Potro and Cilic, who are both 6-foot-6, boosted their return performances to win the US Open in 2009 and 2014, respectively. At the 2009 US Open, del Potro won 44 percent of return points, up from his 40 percent rate on the whole year, including the Open. At the 2014 US Open, Cilic won 41 percent of return points, up from 38 percent that year. And they didn’t improve their return games by facing easy slates of opponents: Each man improved on his return-point winning rates against those same opponents over his career by about the same amount as he elevated his return game compared to the season as a whole.

“It’s a different type of pressure when you’re playing a big server who is putting pressure on you on both the serve and the return,” Gimelstob said. “That’s what Cilic was doing when he won the US Open. That’s the challenge of playing del Potro because he hits the ball so well, but obviously serves so well, also.” To put things into perspective, if del Potro and Cilic had returned at these levels across 2016, each would have ranked among the top seven returners in the game, joining Djokovic, Nadal, Murray, 5-foot-11 David Goffin, and 5-foot-9 David Ferrer. Neither man, though, has been able to return to a Slam final; del Potro has struggled with injury and Cilic with inconsistency.

For the tallest players, return performance is the difference between making the top 50 and making the top 10. Active players 6-foot-5 and above who finished a year ranked in the top 10 won, on average, 67.7 percent of service points that year, while those who finished a year ranked 11 through 50 won 68.1 percent. That’s a difference of only 0.4 percentage points, and it runs in favor of the lower-ranked group. The difference in return performance, however, is far more striking: Tall players who made the top 10 won return points at a rate nearly 4 percentage points higher than did those ranked 11 through 50.

Tall players' points won

A solid-serving player 6-foot-5 or taller who can consistently win more than 38 percent of points on return has an excellent chance of making the top 10. Tomas Berdych and del Potro have done it, and Milos Raonic is approaching that mark, one reason he reached his first major final this year at Wimbledon. Today there are several tall young men who look like they could eventually win 38 percent of return points or better. Alexander Zverev (ranked 18) and Karen Khachanov (ranked 48) are both 6-foot-6, each won about 38 percent of return points in 2016, and neither is older than 20. Khachanov has impressed Gilbert and Karlovic. “That guy moves tremendous for 6-foot-6,” Gilbert said.

Other giants have impressed recently. Jiri Vesely, who is 23 and 6-foot-6, beat Novak Djokovic last year in Monte Carlo and won nearly 36 percent of return points in 2016. Opelka reached his first tour-level semifinal, in Atlanta. Most of the top 10 seeds at Wimbledon lost to players 6-foot-5 or taller. Del Potro won Olympic silver, beating Djokovic and Nadal along the way.

But moving from the top 10 to the top 1 or 2 is another question. Can a taller tennis player develop the skills to move as well as the top shorter players, and win multiple major titles? Well, it’s happened in basketball. “We haven’t had a big guy play tennis that’s like 6-foot-6, 6-foot-7, 6-foot-8, that’s moved like an NBA guy,” Gilbert said. “When you get that, that’s when you get a multiple Slam winner.” Anderson agrees that height is not the obstacle to movement that people make it out to be: “You know, LeBron is 6-foot-8. If he can move as well as somebody who’s 5-foot-10, his size now is a huge advantage; there’s not a negative to it.”

Opelka, who qualified for his first Grand Slam main draw at the 2017 Australian Open, where he pushed 11th-ranked David Goffin to five sets, says he is specifically focusing on the return part of his game in practice. “I’ve been spending a ton of time working on my return. When you look at the drills I’m doing in the gym, they work on explosive movement.” But he also points out that basketball players “move better than [tennis players] and are more explosive than [tennis players]” because of their incredible muscle mass, which won’t work for tennis. “I don’t know how they’d be able to keep up for four or five hours with that mass and muscle.” Put LeBron on Arthur Ashe Stadium at the US Open in 100-degree heat for an afternoon, and “it’s tough to say how they’ll compare.”

Zverev, who is 19 and 6-foot-6, agrees that tall tennis players face unique challenges: “Movement is much more difficult, and I think building your body is more difficult as well.” But the people I talked to believe that both Opelka and Zverev could be at the top of the game in a few years’ time. “Zverev––that guy could be No. 1 in the world,” Gilbert said. “He serves great, he returns great and he moves great.” And as for Opelka, Gilbert says: “Right now he’s got a monster serve. If he can develop movement, or a return game, who knows where he could go?”

Whether the tallest guys can develop the skills to consistently return at the level of a Djokovic or a Murray remains to be seen. But starting out with a huge serve is a major step toward eventually challenging them. As Opelka says, “every inch is important.”

 

Wiley Schubert Reed is a junior tennis player and fan who has written about tennis for fivethirtyeight.com. He is a senior at the United Nations International School in New York and will be entering Harvard University in the fall.

Benchmarks for Shot-by-Shot Analysis

In my post last week, I outlined what the error stats of the future may look like. A wide range of advanced stats across different sports, from baseball to ice hockey–and increasingly in tennis–follow the same general algorithm:

  1. Classify events (shots, opportunities, whatever) into categories;
  2. Establish expected levels of performance–often league-average–for each category;
  3. Compare players (or specific games or tournaments) to those expected levels.
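The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Match Charting Project code: the tuple layout, the player names, and the shot counts are all hypothetical, and the only statistic computed is an unforced error rate per category.

```python
from collections import defaultdict

# Hypothetical charted shots: (player, shot type, direction, unforced error?)
SHOTS = (
    [("Player X", "forehand", "crosscourt", False)] * 9
    + [("Player X", "forehand", "crosscourt", True)] * 1
    + [("Player Y", "forehand", "crosscourt", False)] * 7
    + [("Player Y", "forehand", "crosscourt", True)] * 3
)


def error_rates(shots):
    """Steps 1 and 2: bucket shots by (type, direction), then average each bucket."""
    counts = defaultdict(lambda: [0, 0])   # category -> [errors, attempts]
    for _player, shot_type, direction, is_error in shots:
        bucket = counts[(shot_type, direction)]
        bucket[0] += is_error
        bucket[1] += 1
    return {cat: errors / attempts for cat, (errors, attempts) in counts.items()}


def player_vs_benchmark(shots, player):
    """Step 3: the player's rate in each category minus the all-player benchmark."""
    benchmark = error_rates(shots)
    own = error_rates([s for s in shots if s[0] == player])
    return {cat: own[cat] - benchmark[cat] for cat in own}


for name in ("Player X", "Player Y"):
    for category, diff in player_vs_benchmark(SHOTS, name).items():
        print(f"{name} {category}: {diff:+.0%} vs. benchmark")
```

Swapping in a different classification (step 1) or a different outcome measure (step 2) changes only the bucketing key and the quantity being averaged; the comparison in step 3 stays the same.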

The first step is, by far, the most complex. Classification depends in large part on available data. In baseball, for example, the earliest fielding metrics of this type had little more to work with than the number of balls in play. Now, batted balls can be categorized by exact location, launch angle, speed off the bat, and more. Having more data doesn’t necessarily make the task any simpler, as there are so many potential classification methods one could use.

The same will be true in tennis, eventually, when Hawkeye data (or something similar) is publicly available. For now, those of us relying on public datasets still have plenty to work with, particularly the 1.6 million shots logged as part of the Match Charting Project.*

*The Match Charting Project is a crowd-sourced effort to track professional matches. Please help us improve tennis analytics by contributing to this one-of-a-kind dataset. Click here to find out how to get started.

The shot-coding method I adopted for the Match Charting Project makes step one of the algorithm relatively straightforward. MCP data classifies shots in two primary ways: type (forehand, backhand, backhand slice, forehand volley, etc.) and direction (down the middle, or to the right or left corner). While this approach omits many details (depth, speed, spin, etc.), it’s about as much data as we can expect a human coder to track in real-time.

For example, we could use the MCP data to find the ATP tour-average rate of unforced errors when a player tries to hit a cross-court forehand, then compare everyone on tour to that benchmark. The tour average is 10%, Novak Djokovic’s unforced error rate is 7%, and John Isner’s is 17%. Of course, that isn’t the whole picture when comparing the effectiveness of cross-court forehands: While the average ATPer hits 7% of his cross-court forehands for winners, Djokovic’s rate is only 6% compared to Isner’s 16%.

However, it’s necessary to take a wider perspective. Instead of shots, I believe it will be more valuable to investigate shot opportunities. That is, instead of asking what happens when a player is in position to hit a specific shot, we should be figuring out what happens when the player is presented with a chance to hit a shot in a certain part of the court.

This is particularly important if we want to get beyond the misleading distinction between forced and unforced errors. (As well as the line between errors and an opponent’s winners, which lie on the same continuum–winners are simply shots that were too good to allow a player to make a forced error.) In the Isner/Djokovic example above, our denominator was “forehands in a certain part of the court that the player had a reasonable chance of putting back in play”–that is, successful forehands plus forehand unforced errors. We aren’t comparing apples to apples here: Given the exact same opportunities, Djokovic is going to reach more balls, perhaps making unforced errors where we would call Isner’s mistakes forced errors.

Outcomes of opportunities

Let me clarify exactly what I mean by shot opportunities. They are defined by what a player’s opponent does, regardless of how the player himself manages to respond–or if he manages to get a racket on the ball at all. For instance, assuming a matchup between right-handers, here is a cross-court forehand:

illustration of a shot opportunity

Player A, at the top of the diagram, is hitting the shot, presenting player B with a shot opportunity. Here is one way of classifying the outcomes that could ensue, together with the abbreviations I’ll use for each in the charts below:

  • player B fails to reach the ball, resulting in a winner for player A (vs W)
  • player B reaches the ball, but commits a forced error (FE)
  • player B commits an unforced error (UFE)
  • player B puts the ball back in play, but goes on to lose the point (ip-L)
  • player B puts the ball back in play, presents player A with a “makeable” shot, and goes on to win the point (ip-W)
  • player B causes player A to commit a forced error (ind FE)
  • player B hits a winner (W)
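As a sketch of how these seven outcomes might be tallied, the snippet below counts outcome codes for a set of opportunities and adds up the three that end with player B winning the point. The abbreviations come from the list above; the sample counts are invented for illustration and the function names are hypothetical.

```python
from collections import Counter

# Outcome codes from the list above, ordered from worst to best for player B.
OUTCOMES = ["vs W", "FE", "UFE", "ip-L", "ip-W", "ind FE", "W"]
# Outcomes in which player B ends up winning the point.
POINT_WON = {"ip-W", "ind FE", "W"}


def outcome_shares(observed):
    """Fraction of opportunities that ended in each outcome code."""
    unknown = set(observed) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unrecognized outcome codes: {sorted(unknown)}")
    counts = Counter(observed)
    total = len(observed)
    return {code: counts[code] / total for code in OUTCOMES}


def points_won_share(observed):
    """Fraction of opportunities that player B converted into points won."""
    shares = outcome_shares(observed)
    return sum(shares[code] for code in POINT_WON)


# Invented sample of 100 opportunities.
sample = (["vs W"] * 6 + ["FE"] * 5 + ["UFE"] * 10 + ["ip-L"] * 37
          + ["ip-W"] * 30 + ["ind FE"] * 5 + ["W"] * 7)

for code, share in outcome_shares(sample).items():
    print(f"{code:>6}: {share:.0%}")
print(f"points won: {points_won_share(sample):.0%}")
```

Because the denominator is every opportunity, including balls the player never reached, the shares always sum to 100% and no judgment call between forced and unforced errors is needed to compute the points-won figure.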

As always, for any given denominator, we could devise different categories, perhaps combining forced and unforced errors into one, or further classifying the “in play” categories to identify whether the player is setting himself up to quickly end the point. We might also look at different categories altogether, like shot selection.

In any case, the categories above give us a good general idea of how players respond to different opportunities, and how those opportunities differ from each other. The following chart shows–to adopt the language of the example above–player B’s outcomes based on player A’s shots, categorized only by shot type:

Outcomes of opportunities by shot type

The outcomes are stacked from worst to best. At the bottom is the percentage of opponent winners (vs W)–opportunities where the player we’re interested in didn’t even make contact with the ball. At the top is the percentage of winners (W) that our player hit in response to the opportunity. As we’d expect, forehands present the most difficult opportunities: 5.7% of them go for winners and another 4.6% result in forced errors. Players are able to convert those opportunities into points won only 42.3% of the time, compared to 46.3% when facing a backhand, 52.5% when facing a backhand slice (or chip), and 56.3% when facing a forehand slice.

The above chart is based on about 374,000 shots: All the baseline opportunities that arose (that is, excluding serves, which need to be treated separately) in over 1,000 logged matches between two righties. Of course, there are plenty of important variables to further distinguish those shots, beyond simply categorizing by shot type. Here are the outcomes of shot opportunities at various stages of the rally when the player’s opponent hits a forehand:

Outcomes of forehand responses based on number of shots

The leftmost column can be seen as the results of “opportunities to hit a third shot”–that is, outcomes when the serve return is a forehand. Once again, the numbers are in line with what we would expect: The best time to hit a winner off a forehand is on the third shot–the “serve-plus-one” tactic. We can see that in another way in the next column, representing opportunities to hit a fourth shot. If your opponent hits a forehand in play for his serve-plus-one shot, there’s a 10% chance you won’t even be able to get a racket on it. The average player’s chances of winning the point from that position are only 38.4%.

Beyond the 3rd and 4th shot, I’ve divided opportunities into those faced by the server (5th shot, 7th shot, and so on) and those faced by the returner (6th, 8th, etc.). As you can see, by the 5th shot, there isn’t much of a difference, at least not when facing a forehand.

Let’s look at one more chart: Outcomes of opportunities when the opponent hits a forehand in various directions. (Again, we’re only looking at righty-righty matchups.)

Outcomes of forehand responses based on shot direction

There’s very little difference between the two corners, and it’s clear that it’s more difficult to make good on a shot opportunity in either corner than it is from the middle. It’s interesting to note here that, when faced with a forehand that lands in play–regardless of where it is aimed–the average player has less than a 50% chance of winning the point. This is a confusing instance of selection bias that crops up occasionally in tennis analytics: Because a significant percentage of shots are errors, the player who just placed a shot in the court has a temporary advantage.
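A toy model shows why that advantage appears even between identical players. Assume, purely for illustration, that every shot misses with the same fixed probability e and that nothing else ends a point. If w is the chance that the player who just landed a shot wins the point, then w = e + (1 - e)(1 - w), which gives w = 1 / (2 - e). The sketch below checks that closed form against a simulation; the 10% error rate is a hypothetical input, not a figure from the data.

```python
import random


def hitter_win_prob(error_rate):
    """Closed form: chance that the player who just landed a shot wins the point."""
    return 1 / (2 - error_rate)


def simulate(error_rate, n_points, seed=0):
    """Monte Carlo check, starting each point right after player A lands a shot."""
    rng = random.Random(seed)
    wins_for_a = 0
    for _ in range(n_points):
        hitter_is_a = False                 # player B has to respond first
        while rng.random() >= error_rate:   # shot lands in, so the rally continues
            hitter_is_a = not hitter_is_a
        # The current hitter has just missed, so the other player wins the point.
        wins_for_a += not hitter_is_a
    return wins_for_a / n_points


e = 0.10   # hypothetical per-shot error rate
print(f"closed form: {hitter_win_prob(e):.3f}, simulated: {simulate(e, 100_000):.3f}")
print(f"opponent's chance: {1 - hitter_win_prob(e):.3f}")
```

With any positive error rate, the player facing the shot starts below 50%, simply because he is the next one who can miss.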

Next steps

If you’re wondering what the point of all of this is, I understand. (And I appreciate you getting this far despite your reservations.) Until we drill down to much more specific situations–and maybe even then–these tour averages are no more than curiosities. It doesn’t exactly turn the analytics world upside down to show that forehands are more effective than backhand slices, or that hitting to the corners is more effective than hitting down the middle.

These averages are ultimately only tools to better quantify the accomplishments of specific players. As I continue to explore this type of algorithm, combined with the growing Match Charting Project dataset, we’ll learn a lot more about the characteristics of the world’s best players, and what makes some so much more effective than others.