Podcast Episode 17: US Open in Review

Episode 17 of the Tennis Abstract Podcast, with Carl Bialik, is our US Open recap. We start with a discussion of Sloane Stephens–her performance here as well as what we expect from her, and we delve into possible explanations of her impressive performance in particular against aggressive players.

We then talk Nadal and his no-nonsense strategy to defeat Kevin Anderson, and consider best-case scenarios for the rest of Kev’s career. We finish up with some doubles and a bit on this weekend’s Davis Cup ties. As always, thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Podcast Episode 16: The Second Half of our Second Week Chat

Episode 16 of the Tennis Abstract Podcast, with Carl Bialik, didn’t quite work out as planned — my microphone malfunctioned for much of the first half of the recording — but the second half of our conversation could be salvaged. Thus, this episode is missing the big news of the second week, the all-American women’s semifinals, but we still touched on a variety of burning US Open topics, like the youngsters making news in New York and the inevitable hypotheticals of the players who weren’t able to participate in Flushing this year.

This is a shorter-than-usual episode, clocking in at 34 minutes; we hope to return on Friday with another mid-Slam update. Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Quantifying Cakewalks, or The Time Rafa Finally Got Lucky

During this year’s US Open, much has been made of some rather patchy sections of the draw. Many great players are sitting out the tournament with injury, and plenty of others crashed out early. Pablo Carreno Busta reached the quarterfinals by defeating four straight qualifiers, and Rafael Nadal could conceivably win the title without beating a single top-20 player.

None of this is a reflection on the players themselves: They can play only the draw they’re dealt, and we’ll never know how they would’ve handled a more challenging array of opponents. The weakness of the draw, however, could affect how we remember this tournament. If we are going to let the quality of the field color our memories, we should at least try to put this year’s draws in context to see how they compare with those at past majors.

How to measure draw paths

There are lots of ways to quantify draw quality. (There’s an entire category on this blog devoted to it.) Since we’re interested in the specific sets of opponents faced by our remaining contenders, we need a metric that focuses on those. It doesn’t really matter that, say, Nick Kyrgios was in the draw, since none of the semifinalists had to play him.

Instead of draw difficulty, what we’re after is what I’ll call path ease. It’s a straightforward enough concept: How hard is it to beat the specific set of guys that Rafa (for instance) had to play?

To get a number, we’ll need a few things: The surface-weighted Elo ratings of each one of a player’s opponents, along with a sort of “reference Elo” for an average major semifinalist. (Or finalist, or title winner.) To determine the ease of Nadal’s path so far, we don’t want to use Nadal’s Elo. If we did that, the exact same path would look easier or harder depending on the quality of the player who faced it.

(The exact value of the “reference Elo” isn’t that important, but for those of you interested in the numbers: I found the average Elo rating of every slam semifinalist, finalist, and winner back to 1988 on each of the three major surfaces. On hard courts, those numbers are 2145, 2198, and 2233, respectively. When measuring the difficulty of a path to the semifinal round, I used the first of those numbers; for the difficulty of a path to the title, I used the last.)

To measure path ease, then, we answer the question: What are the odds that an average slam semifinalist (for instance) would beat this particular set of players? In Rafa’s case, he has yet to face a player with a weighted-hard-court Elo rating above 1900, and the typical 2145-rated semifinalist would beat those five players 71.5% of the time. That’s a bit easier than Kevin Anderson’s path to the semis, but a bit harder than Carreno Busta’s. Juan Martin del Potro, on the other hand, is in a different world altogether. Here are the path ease numbers for all four semifinalists, showing the likelihood that average contenders in each round would advance, giving the difficulty of the draws each player has faced:

Semifinalist   Semi Path  Final Path  Title Path  
Nadal              71.5%       49.7%       51.4%  
del Potro           9.1%        7.5%       10.0%  
Anderson           69.1%       68.9%       47.1%  
Carreno Busta      74.3%       71.2%       48.4%

(We don’t yet know each player’s path to the title, so I averaged the Elos of possible opponents. Anderson and Carreno Busta are very close, so for Rafa and Delpo, their potential final opponent doesn’t make much difference.)
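For readers who want to see the mechanics, here is a minimal Python sketch of the path ease calculation. It assumes the standard Elo expected-score formula (400-point scale) and treats each round as independent, so path ease is the product of the reference player's win probabilities. The function names and the opponent ratings are invented for illustration; only the 2145 hard-court reference Elo comes from the post.

```python
def elo_win_prob(player_elo, opponent_elo):
    """Standard Elo expectation that `player_elo` beats `opponent_elo`."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400.0))


def path_ease(reference_elo, opponent_elos):
    """Chance that a reference-rated player beats every opponent in turn."""
    ease = 1.0
    for opp in opponent_elos:
        ease *= elo_win_prob(reference_elo, opp)
    return ease


# Hypothetical five-match path to a semifinal (ratings invented).
SEMIFINALIST_ELO = 2145  # hard-court reference Elo quoted above
example_path = [1650, 1700, 1750, 1800, 1850]
print(f"{path_ease(SEMIFINALIST_ELO, example_path):.1%}")
```

Passing a higher reference Elo (say, 2233 for title winners) makes the same opponents look easier, which is why numbers computed against different reference ratings are not directly comparable.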

There’s one quirk with this metric that you might have noticed: For Nadal and del Potro, their difficulty of reaching the final is greater than that of winning the title altogether! Obviously that doesn’t make logical sense–the numbers work out that way because of the “reference Elos” I’m using. The average slam winner is better than the average slam finalist, so the table is really saying that it’s easier for the average slam winner to beat Rafa’s seven opponents than it would be for the average slam finalist to get past his first six opponents. This metric works best when comparing title paths to title paths, or semifinal paths to semifinal paths, which is what we’ll do for the rest of this post.

Caveats and quirks aside, it’s striking just how easy three of the semifinal paths have been compared to del Potro’s much more arduous route. Even if we discount the difficulty of beating Roger Federer–Elo thinks he’s the best active player on hard courts but doesn’t know about his health issues–Delpo’s path is wildly different from those of his semifinal and possible final opponents.

Cakewalks in context

Semifinalist path eases of 69% or higher–that is, easier–are extremely rare. In fact, the paths of Anderson, Carreno Busta, and Nadal are all among the ten easiest in the last thirty years! Here are the previous top ten:

Year  Slam             Semifinalist               Path Ease  
1989  Australian Open  Thomas Muster                  84.1%  
1989  Australian Open  Miloslav Mecir                 74.2%  
1990  Australian Open  Ivan Lendl                     73.8%  
2006  Roland Garros    Ivan Ljubicic                  73.7%  
1988  Australian Open  Ivan Lendl                     72.2%  
1988  Australian Open  Pat Cash                       70.1%  
2004  Australian Open  Juan Carlos Ferrero            69.2%  
1996  US Open          Michael Chang                  68.8%  
1990  Roland Garros    Andres Gomez                   68.4%  
1996  Australian Open  Michael Chang                  66.2%

In the last decade, the easiest path to the semifinal was Stan Wawrinka’s route to the 2016 French Open final four, which rated 59.8%. As we’ll see further on, Wawrinka’s draw got a lot more difficult after that.

Del Potro’s draw so far isn’t quite as extreme, but it is quite difficult in the historical context. Of the nearly 500 major semifinalists since 1988, all but 15 had an easier route than his 9.1% path ease. Here are the ten hardest, all of whom faced draws that would have given the average slam semifinalist less than an 8% chance of getting that far:

Year  Slam             Semifinalist              Path Ease  
2009  Roland Garros    Robin Soderling                1.6%  
1988  Roland Garros    Jonas Svensson                 1.9%  
2017  Wimbledon        Tomas Berdych                  3.7%  
1996  Wimbledon        Richard Krajicek               6.4%  
2011  Wimbledon        Jo Wilfried Tsonga             6.6%  
2012  US Open          Tomas Berdych                  6.8%  
2017  Roland Garros    Dominic Thiem                  6.9%  
2014  Australian Open  Stan Wawrinka                  7.0%  
1989  Roland Garros    Michael Chang                  7.1%  
2017  Wimbledon        Sam Querrey                    7.5%

Previewing the history books

In the long term, we’ll care a lot more about how the 2017 US Open champion won the title than how he made it through the first five rounds. As we saw above, three of the four semifinalists have a path ease of around 50% to win the title–again, meaning that a typical slam winner would have a roughly 50/50 chance of getting past this particular set of seven opponents.

No major winner in recent memory has had it so easy. Nadal’s path would rate first in the last thirty years, while Carreno Busta’s or Anderson’s would rate in the top five. (If it comes to that, their exact numbers will depend on who they face in the final.) Here is the list that those three men have the chance to disrupt:

Year  Slam             Winner                  Path Ease  
2002  Australian Open  Thomas Johansson            48.1%  
2001  Australian Open  Andre Agassi                47.6%  
1999  Roland Garros    Andre Agassi                45.6%  
2000  Wimbledon        Pete Sampras                45.3%  
2006  Australian Open  Roger Federer               44.5%  
1997  Australian Open  Pete Sampras                44.4%  
2003  Australian Open  Andre Agassi                43.9%  
1999  US Open          Andre Agassi                41.5%  
2002  Wimbledon        Lleyton Hewitt              39.9%  
1998  Wimbledon        Pete Sampras                39.1%

At the 2006 Australian Open, Federer lucked into a path that was nearly as easy as Rafa’s this year. His 2003 Wimbledon title just missed the top ten as well. By comparison, Novak Djokovic has never won a major with a path ease greater than 18.7%–harder than that faced by more than half of major winners.

Nadal has hardly had it easy as he has racked up his 15 grand slams, either. Here are the top ten most difficult title paths:

Year  Slam             Winner                Path Ease  
2014  Australian Open  Stan Wawrinka              2.2%  
2015  Roland Garros    Stan Wawrinka              3.1%  
2016  US Open          Stan Wawrinka              3.2%  
2013  Roland Garros    Rafael Nadal               4.4%  
2014  Roland Garros    Rafael Nadal               4.7%  
1989  Roland Garros    Michael Chang              5.0%  
2012  Roland Garros    Rafael Nadal               5.2%  
2016  Australian Open  Novak Djokovic             5.4%  
2009  US Open          J.M. Del Potro             5.9%  
1990  Wimbledon        Stefan Edberg              6.2%

As I hinted in the title of this post, while Nadal got lucky in New York this year, it hasn’t always been that way. He appears three times on this list, facing greater challenges than any major winner other than Wawrinka the giant-killer.

On average, Rafa’s grand slam title paths haven’t been quite as harrowing as Djokovic’s, but compared to most other greats of the last few decades, he has worked hard for his titles. Here are the average path eases of players with at least three majors since 1988:

Player           Majors        Avg Path Ease  
Stan Wawrinka         3                 2.8%  
Novak Djokovic       12                11.3%  
Rafael Nadal         15                13.6%  
Stefan Edberg         4                14.6%  
Andy Murray           3                18.8%  
Boris Becker          4                18.8%  
Mats Wilander         3                19.8%  
Gustavo Kuerten       3                22.0%  
Roger Federer        19                23.5%  
Jim Courier           4                26.4%  
Pete Sampras         14                28.9%  
Andre Agassi          8                32.3%

If Rafa adds to his grand slam haul this weekend, his average path ease will take a bit of a hit. Still, he’ll only move one place down the list, behind Stefan Edberg. After more than a decade of battling all-time greats in the late rounds of majors, it’s fair to say that Nadal deserved this cakewalk.
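The effect on Rafa's career average is simple arithmetic. Here is a quick sketch using the rounded figures from the tables above (15 titles averaging 13.6%, plus a title path of 51.4%); the function and variable names are just for illustration.

```python
def updated_average(old_avg, old_count, new_value):
    """Running mean after adding one more observation."""
    return (old_avg * old_count + new_value) / (old_count + 1)


nadal_new_avg = updated_average(13.6, 15, 51.4)  # all values in percent
print(round(nadal_new_avg, 1))
```

The result lands between Edberg's 14.6% and Murray's 18.8%, hence the one-place drop.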


Update: This post reads a bit differently than when I first wrote it: I’ve changed the references to “path difficulty” to “path ease” to make it clearer what the metric is showing.

Nadal and Anderson advanced to the final, so we can now determine the exact path ease number for whichever one of them wins the title. Rafa’s exact number remains 51.4%, and should he win, his career average across 16 slams will increase to about 16%. Anderson’s path ease to the title is “only” 41.3%, which would be good for ninth on the list shown above, and just barely second easiest of the last 30 US Opens.

Measuring the Impact of the Serve in Men’s Tennis

By just about any measure, the serve is the most important shot in tennis. In men’s professional tennis, with its powerful deliveries and short points, the serve is all the more crucial. It is the one shot guaranteed to occur in every rally, and in many points, it is the only shot.

Yet we don’t have a good way of measuring exactly how important it is. It’s easy to determine which players have the best serves–they tend to show up at the top of the leaderboards for aces and service points won–but the available statistics are very limited if we want a more precise picture. The ace stat counts only a subset of those points decided by the serve, and the tally of service points won (or 1st serve points won, or 2nd serve points won) combines the effect of the serve with all of the other shots in a player’s arsenal.

Aces are not the only points in which the serve is decisive, and some service points won are decided long after the serve ceases to have any relevance to the point. What we need is a method to estimate how much impact the serve has on points of various lengths.

It seems like a fair assumption that if a server hits a winner on his second shot, the serve itself deserves some of the credit, even if the returner got it back in play. In any particular instance, the serve might be really important–imagine Roger Federer swatting away a weak return from the service line–or downright counterproductive–think of Rafael Nadal lunging to defend against a good return and hitting a miraculous down-the-line winner. With the wide variety of paths a tennis point can follow, though, all we can do is generalize. And in the aggregate, the serve probably has a lot to do with a 3-shot rally. At the other extreme, a 25-shot rally may start with a great serve or a mediocre one, but by the time the point is decided, the effect of the serve has been canceled out.

With data from the Match Charting Project, we can quantify the effect. Using about 1,200 tour-level men’s matches from 2000 to the present, I looked at each of the server’s shots grouped by the stage of the rally–that is, his second shot, his third shot, and so on–and calculated how frequently it ended the point. A player’s underlying skills shouldn’t change during a point–his forehand is as good at the end as it is at the beginning, unless fatigue strikes–so if the serve had no effect on the success of subsequent shots, players would end the point equally often with every shot.

Of course, the serve does have an effect, so points won by the server end much more frequently on the few shots just after the serve than they do later on. This graph illustrates how the “point ending rate” changes:

On first serve points (the blue line), if the server has a “makeable” second shot (the third shot of the rally, “3” on the horizontal axis, where “makeable” is defined as a shot that results in an unforced error or is put back in play), there is a 28.1% chance it ends the point in the server’s favor, either with a winner or by inducing an error on the next shot. On the following shot, the rate falls to 25.6%, then 21.8%, and then down into what we’ll call the “base rate” range between 18% and 20%.

The base rate tells us how often players are able to end points in their favor after the serve ceases to provide an advantage. Since the point ending rate after first serves stabilizes beginning with the server’s fifth shot (the ninth shot of the rally), we can pinpoint that stage of the rally as the moment–for the average player, anyway–when the serve is no longer an advantage.
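To make the "point ending rate" concrete, here is a minimal Python sketch. The input layout is an assumption made for illustration and is not the Match Charting Project's actual format: each point is a list with one boolean per makeable server shot after the serve, where True means that shot ended the point in the server's favor.

```python
from collections import defaultdict


def point_ending_rates(points):
    """Share of the server's makeable shots, by shot number, that win the point.

    `points` is a list of lists; each inner list holds one boolean per
    makeable server shot after the serve (True = that shot won the point).
    """
    attempts = defaultdict(int)
    wins = defaultdict(int)
    for shots in points:
        for i, won in enumerate(shots):
            shot_number = i + 2  # the serve is shot 1, so index 0 is the 2nd shot
            attempts[shot_number] += 1
            wins[shot_number] += int(won)
    return {n: wins[n] / attempts[n] for n in sorted(attempts)}


# Invented toy data, far too small to mean anything.
toy_points = [
    [True],
    [False, True],
    [False, False],
    [False, False, True],
    [False],
]
rates = point_ending_rates(toy_points)
print(rates)
```

With real data, the base rate would be the level at which these per-shot rates flatten out.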

As the graph shows, second serve points (shown with a red line) are a very different story. It appears that the serve has no impact once the returner gets the ball back in play. Even that slight blip with the server’s third shot (“5” on the horizontal axis, for the rally’s fifth shot) is no higher than the point ending rate on the 15th shot of first-serve rallies. This tallies with the conclusions of some other research I did six years ago, and it has the added benefit of agreeing with common sense, since ATP servers win only about half of their second serve points.

Of course, some players get plenty of positive after-effects from their second serves: When John Isner hits a second shot on a second-serve point, he finishes the point in his favor 30% of the time, a number that falls to 22% by his fourth shot. His second serve has effects that mirror those of an average player’s first serve.

Removing unforced errors

I wanted to build this metric without resorting to the vagaries of differentiating forced and unforced errors, but it wasn’t to be. The “point-ending” rates shown above include points that ended when the server’s opponent made an unforced error. We can argue about whether, or how much, such errors should be credited to the server, but for our purposes today, the important thing is that unforced errors aren’t affected that much by the stage of the rally.

If we want to isolate the effect of the serve, then, we should remove unforced errors. When we do so, we discover an even sharper effect. The rate at which the server hits winners (or induces forced errors) depends heavily on the stage of the rally. Here’s the same graph as above, only with opponent unforced errors removed:

The two graphs look very similar. Again, the first serve loses its effect around the 9th shot in the rally, and the second serve confers no advantage on later shots in the point. The important difference to notice is the ratio between the peak winner rate and the base rate, the latter of which is now just above 10%. When we counted unforced errors, the ratio between peak and base rate was about 3:2. With unforced errors removed, the ratio is close to 2:1, suggesting that when the server hits a winner on his second shot, the serve and the winner contributed roughly equally to the outcome of the point. It seems more appropriate to skip opponent unforced errors when measuring the effect of the serve, and the resulting 2:1 ratio jibes better with my intuition.

Making a metric

Now for the fun part. To narrow our focus, let’s zero in on one particular question: What percentage of service points won can be attributed to the serve? To answer that question, I want to consider only the server’s own efforts. For unreturned serves and unforced errors, we might be tempted to give negative credit to the other player. But for today’s purposes, I want to divvy up the credit among the server’s assets–his serve and his other shots–like separating the contributions of a baseball team’s pitching from its defense.

For unreturned serves, that’s easy. 100% of the credit belongs to the serve.

For second serve points in which the return was put in play, 0% of the credit goes to the serve. As we’ve seen, for the average player, once the return comes back, the server no longer has an advantage.

For first-serve points in which the return was put in play and the server won by his fourth shot, the serve gets some credit, but not all, and the amount of credit depends on how quickly the point ended. The following table shows the exact rates at which players hit winners on each shot, in the “Winner %” column:

Server's…  Winner %  W%/Base  Shot credit  Serve credit  
2nd shot      21.2%     1.96        51.0%         49.0%  
3rd shot      18.1%     1.68        59.6%         40.4%  
4th shot      13.3%     1.23        81.0%         19.0%  
5th+          10.8%     1.00       100.0%          0.0%

Compared to a base rate of 10.8% winners per shot opportunity, we can calculate the approximate value of the serve in points that end on the server’s 2nd, 3rd, and 4th shots. The resulting numbers come out close to round figures, so because these are hardly laws of nature (and the sample of charted matches has its biases), we’ll go with round numbers. We’ll give the serve 50% of the credit when the server needed only two shots, 40% when he needed three shots, and 20% when he needed four shots. After that, the advantage conferred by the serve is usually canceled out, so in longer rallies, the serve gets 0% of the credit.
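The credit columns in the table can be approximated with a few lines of Python. This sketch assumes that the shot's own credit is the base rate divided by the winner rate (the reciprocal of the W%/Base column) and that the serve gets the remainder; the function name is made up for illustration.

```python
def serve_credit(winner_rate, base_rate):
    """Portion of a point-ending shot's success attributed to the serve.

    The shot itself is credited with base_rate / winner_rate (what it would
    achieve with no help from the serve); the serve gets the remainder.
    """
    if winner_rate <= base_rate:
        return 0.0
    return 1.0 - base_rate / winner_rate


BASE_RATE = 10.8  # percent, from the "5th+" row
for label, rate in [("2nd", 21.2), ("3rd", 18.1), ("4th", 13.3), ("5th+", 10.8)]:
    print(label, f"{serve_credit(rate, BASE_RATE):.1%}")
```

Because the inputs here are the rounded percentages from the table, the outputs can differ from the published column by a few tenths of a point.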

Tour averages

Finally, we can begin to answer the question, What percentage of service points won can be attributed to the serve? This, I believe, is a good proxy for the slipperier query I started with, How important is the serve?

To do that, we take the same subset of 1,200 or so charted matches, tally the number of unreturned serves and first-serve points that ended with various numbers of shots, and assign credit to the serve based on the multipliers above. Adding up all the credit due to the serve gives us a raw number of “points” that the player won thanks to his serve. When we divide that number by the actual number of service points won, we find out how much of his service success was due to the serve itself. Let’s call the resulting number Serve Impact, or SvI.
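As a sketch of the tally just described, the following Python function computes SvI from counts of points. The argument names and the example counts are invented; the 50/40/20 multipliers are the rounded figures from the previous section, applied to first-serve points only.

```python
# Rounded multipliers: serve credit by the server's shot on which he won the point.
FIRST_SERVE_MULTIPLIERS = {2: 0.5, 3: 0.4, 4: 0.2}


def serve_impact(unreturned, first_serve_wins_by_shot, service_points_won):
    """Serve Impact: serve-credited points divided by service points won.

    unreturned: service points won with no return in play (aces included).
    first_serve_wins_by_shot: {server's shot number: first-serve points won on that shot}.
    service_points_won: all service points won, on first and second serves.
    """
    credit = float(unreturned)
    for shot, count in first_serve_wins_by_shot.items():
        # Shots beyond the fourth earn the serve no credit.
        credit += FIRST_SERVE_MULTIPLIERS.get(shot, 0.0) * count
    return credit / service_points_won


# Invented counts for a single hypothetical player.
svi = serve_impact(
    unreturned=400,
    first_serve_wins_by_shot={2: 150, 3: 80, 4: 40, 5: 60},
    service_points_won=1000,
)
print(f"SvI = {svi:.1%}")
```

Second-serve points with the return in play contribute nothing to the numerator, so they appear only in `service_points_won`.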

Here are the aggregates for the entire tour, as well as for each major surface:

         1st SvI  2nd SvI  Total SvI  
Overall    63.4%    31.0%      53.6%  
Hard       64.6%    31.5%      54.4%  
Clay       56.9%    27.0%      47.8%  
Grass      70.8%    37.3%      61.5%

Bottom line, it appears that just over half of service points won are attributable to the serve itself. As expected, that number is lower on clay and higher on grass.

Since about two-thirds of the points that men win come on their own serves, we can go even one step further: roughly one-third of the points won by a men’s tennis player are due to his serve.
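The back-of-the-envelope step above is a single multiplication. A tiny sketch, using the overall SvI of 53.6% and the round two-thirds figure (the function name is hypothetical):

```python
def serve_share_of_all_points(svi, serve_share_of_points_won):
    """Fraction of all points won (serve and return) credited to the serve."""
    return svi * serve_share_of_points_won


tour = serve_share_of_all_points(0.536, 2 / 3)
print(f"{tour:.1%}")
```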

Player by player

These are averages, and the most interesting players rarely hew to the mean. Using the 50/40/20 multipliers, Isner’s SvI is a whopping 70.8% and Diego Schwartzman’s is a mere 37.7%. As far from the middle as those are, they understate the uniqueness of these players. I hinted above that the same multipliers are not appropriate for everyone; the average player reaps no positive after-effects of his second serve, but Isner certainly does. The standard formula we’ve used so far credits Isner with an outrageous SvI, even without giving him credit for the “second serve plus one” points he racks up.

In other words, to get player-specific results, we need player-specific multipliers. To do that, we start by finding a player-specific base rate, for which we’ll use the winner (and induced forced error) rate for all shots starting with the server’s fifth shot on first-serve points and shots starting with the server’s fourth on second-serve points. Then we check the winner rate on the server’s 2nd, 3rd, and 4th shots on first-serve points and his 2nd and 3rd shots on second-serve points, and if the rate is at least 20% higher than the base rate, we give the player’s serve the corresponding amount of credit.
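Here is a rough Python sketch of that procedure for one player. It assumes the "corresponding amount of credit" is computed the same way as in the tour-wide table (one minus base rate over winner rate); the function name and the example rates are invented.

```python
def player_multipliers(winner_rates, base_rate, threshold=1.2):
    """Serve-credit multiplier for each early shot of one player.

    winner_rates: {server's shot number: winner (plus forced error) rate}.
    base_rate: the same player's rate on later shots.
    A shot earns serve credit only if its rate is at least `threshold`
    times the base rate; otherwise the multiplier is zero.
    """
    multipliers = {}
    for shot, rate in winner_rates.items():
        if base_rate > 0 and rate >= threshold * base_rate:
            multipliers[shot] = 1.0 - base_rate / rate
        else:
            multipliers[shot] = 0.0
    return multipliers


# Invented first-serve rates for a hypothetical player.
example = player_multipliers({2: 0.24, 3: 0.15, 4: 0.125}, base_rate=0.12)
print(example)
```

The same function would be run separately on second-serve points, with that player's second-serve base rate.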

Here are the resulting multipliers for a quartet of players you might find interesting, with plenty of surprises already:

                   1st serve              2nd serve       
                    2nd shot  3rd  4th     2nd shot  3rd  
Roger Federer            55%  50%  30%           0%   0%  
Rafael Nadal             31%   0%   0%           0%   0%  
John Isner               46%  41%   0%          34%   0%  
Diego Schwartzman        20%  35%   0%           0%  25%  
Average                  50%  40%  20%           0%   0%

Roger Federer gets more positive after-effects from his first serve than average, more even than Isner does. The big American is a tricky case, both because so few of his serves come back and because he is so aggressive at all times, meaning that his base winner rate is very high. At the other extreme, Schwartzman and Rafael Nadal get very little follow-on benefit from their serves. Schwartzman’s multipliers are particularly intriguing, since on both first and second serves, his winner rate on his third shot is higher than on his second shot. Serve plus two, anyone?

Using player-specific multipliers makes Isner’s and Schwartzman’s SvI numbers more extreme. Isner’s ticks up a bit to 72.4% (just behind Ivo Karlovic), while Schwartzman’s drops to 35.0%, the lowest of anyone I’ve looked at. I’ve calculated multipliers and SvI for all 33 players with at least 1,000 tour-level service points in the Match Charting Project database:

Player                 1st SvI  2nd SvI  Total SvI  
Ivo Karlovic             79.2%    56.1%      73.3%  
John Isner               78.3%    54.3%      72.4%  
Andy Roddick             77.8%    51.0%      71.1%  
Feliciano Lopez          83.3%    37.1%      68.9%  
Kevin Anderson           77.7%    42.5%      68.4%  
Milos Raonic             77.4%    36.0%      66.0%  
Marin Cilic              77.1%    34.1%      63.3%  
Nick Kyrgios             70.6%    41.0%      62.5%  
Alexandr Dolgopolov      74.0%    37.8%      61.3%  
Gael Monfils             69.8%    37.7%      60.8%  
Roger Federer            70.6%    32.0%      58.8%  
                                                    
Player                 1st SvI  2nd SvI  Total SvI  
Bernard Tomic            67.6%    28.7%      58.5%  
Tomas Berdych            71.6%    27.0%      57.2%  
Alexander Zverev         65.4%    30.2%      54.9%  
Fernando Verdasco        61.6%    32.9%      54.3%  
Stan Wawrinka            65.4%    33.7%      54.2%  
Lleyton Hewitt           66.7%    32.1%      53.4%  
Juan Martin Del Potro    63.1%    28.2%      53.4%  
Grigor Dimitrov          62.9%    28.6%      53.3%  
Jo Wilfried Tsonga       65.3%    25.9%      52.7%  
Marat Safin              68.4%    22.7%      52.3%  
Andy Murray              63.4%    27.5%      52.0%  
                                                    
Player                 1st SvI  2nd SvI  Total SvI  
Dominic Thiem            60.6%    28.9%      50.8%  
Roberto Bautista Agut    55.9%    32.5%      49.5%  
Pablo Cuevas             57.9%    28.9%      47.8%  
Richard Gasquet          56.0%    29.0%      47.5%  
Novak Djokovic           56.0%    26.8%      47.3%  
Andre Agassi             54.3%    31.4%      47.1%  
Gilles Simon             55.7%    28.4%      46.7%  
Kei Nishikori            52.2%    30.8%      45.2%  
David Ferrer             46.9%    28.2%      41.0%  
Rafael Nadal             42.8%    27.1%      38.8%  
Diego Schwartzman        39.5%    25.8%      35.0%

At the risk of belaboring the point, this table shows just how massive the difference is between the biggest servers and their opposites. Karlovic’s serve accounts for nearly three-quarters of his success on service points, while Schwartzman’s can be credited with barely one-third. Even those numbers don’t tell the whole story: Because Ivo’s game relies so much more on service games than Diego’s does, 54% of Karlovic’s total points won–serve and return–are due to his serve, while only 20% of Schwartzman’s are.

We didn’t need a lengthy analysis to show us that the serve is important in men’s tennis, or that it represents a much bigger chunk of some players’ success than others. But now, instead of asserting a vague truism–the serve is a big deal–we can begin to understand just how much it influences results, and how much weak-serving players need to compensate just to stay even with their more powerful peers.

Podcast Episode 15: Return of the Podcast

The long-awaited Episode 15 of the Tennis Abstract Podcast is Carl Bialik’s and my lightning-round update on the summer of tennis and a midway-point checkup of the US Open. I boldly select Diego Schwartzman and Anastasija Sevastova as my picks to win it all, while Carl says much more sensible things.

This is a shorter-than-usual episode, clocking in at 34 minutes; we hope to return on Friday with another mid-Slam update. Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

How Much Does Height Matter in Men’s Tennis?

Clearly, height matters. On average, tall players can serve faster and more effectively than can shorter players. And usually, short players who succeed on tour do so by returning and moving better than their taller colleagues. The conventional wisdom is that height is an advantage, but only up to a point. An inch or two above six feet (a range between 185 and 190 cm) is good, but much more than that is too much. No player above 6’4″ (193 cm, Marat Safin) has ever reached No. 1 in the ATP rankings.

While 5’7″ (170 cm) Diego Schwartzman’s surprise run to the US Open quarterfinals has brought this issue to the forefront, pundits and fans talk about it all the time. This is a topic crying out for some basic data analysis, yet as is too often the case in tennis, some really simple work is missing from the conversation. Let’s try to fix that.

When I say “basic,” I really mean it. We all know that tall men hit more aces than short men. But how many? How strong is the relationship between height and, say, first serve points won? In this post, I’ll show the relationship between height and each of nine different stats, from overall records to serve- and return-specific numbers.

For my dataset, I took age-25 seasons from 1998 to 2017 in which the player completed at least 30 tour-level matches. (I used only one season per player so that the best players with the longest careers wouldn’t be weighted too heavily.) That gives us 156 player-seasons, from Hicham Arazi and Greg Rusedski in 1998 up to Schwartzman and Jack Sock in 2017. There aren’t very many players at the extremes, so I lumped together everyone 5’8″ (173 cm) and below and did the same with everyone 6’5″ (196 cm) and above. I also grouped players standing 5’10” with those at 5’9″, because there were only four 5’10” guys in the dataset.

That gives us nine “height levels”: one per inch from 5’8″ to 6’5″ with the exception of 5’10”. (The ATP website displays heights in meters, but its database must record and/or store them in inches, because every height translates to something close to an integer height in inches. For example, no player is listed at 174 cm, or 5’8.5″.) Some individual heights are certainly exaggerated, since male athletes and their organizations tend to inflate them, but we have to make do with the information available, and we may assume that the exaggerations are fairly consistent.
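The grouping rule can be written down in a few lines. This is a sketch that assumes heights arrive as whole inches; the function names are hypothetical.

```python
def height_level(height_inches):
    """Map a listed height in whole inches to one of the nine groups used here."""
    if height_inches <= 68:   # 5'8" and below
        return 68
    if height_inches == 70:   # 5'10" is folded into the 5'9" group
        return 69
    if height_inches >= 77:   # 6'5" and above
        return 77
    return height_inches


def level_in_cm(level_inches):
    """Centimeter label for a height level, rounded to the nearest cm."""
    return round(level_inches * 2.54)


# Every height from 5'4" to 6'11" collapses into nine levels.
levels = sorted({height_level(h) for h in range(64, 84)})
print(len(levels), [level_in_cm(level) for level in levels])
```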

Let’s start with the most basic building block of tennis, the match win. There is a reasonably strong relationship here, although the group of players at 6’1″ is nearly as good as the tallest subset. In each of these graphs, height is given on the horizontal axis in centimeters, from 173 (the 5’8″ and below group) up to 196 (the 6’5″ and higher group).

There is a similar, albeit slightly weaker, relationship when we look at the level of single points. Since a small difference in points results in a larger difference in matches won (at the extreme, winning 55% of points translates to nearly a 100% chance of winning the match) this isn’t a surprise. At the match level, r^2 = 0.38, and at the point level, below, r^2 = 0.27:

(If you’re wondering how all of the averages are above 50%, it’s because the sample is limited to player-seasons with at least 30 matches. A fair number of those matches are against players who aren’t tour regulars, and the regulars–the guys in this sample–win a hefty proportion of those matches.)
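For reference, r^2 here is the square of the Pearson correlation between height and the stat in question. The sketch below computes it in plain Python for any two paired lists. The post does not say whether the figure is taken over individual player-seasons or over the group averages, so the function makes no assumption either way, and the sample numbers are invented.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented data: height in cm and share of total points won.
heights = [173, 175, 180, 183, 185, 188, 191, 193, 196]
points_won = [0.505, 0.509, 0.507, 0.512, 0.516, 0.511, 0.514, 0.513, 0.519]
print(round(r_squared(heights, points_won), 2))
```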

Serve stats

Now we get to confirm our main assumptions. Taller players are better servers, and the gap is enormous, ranging from 60% of service points won for the shortest players up to nearly 70% for the tallest:

As strong as that relationship is (r^2 = 0.81), the relationship between height and ace rate is stronger still, at r^2 = 0.83:

Aces don’t tell the whole story–the stat with the strongest correlation to height is first serve points won (r^2 = 0.92) as you can see here:

But this is where things start to get interesting. Nearly every inch makes a player more effective on the first serve, but opponents are able to negotiate tall players’ second serves much more successfully. There remains a modest relationship with height (r^2 = 0.18), but it is the weakest of all the stats presented here:

It’s nice to be tall, as anyone who has seen John Isner casually spin a second-serve ace out of the reach of an unlucky opponent can attest. But except in the tallest category, height doesn’t confer much of a second-serve advantage. Players standing 6’4″ (193 cm) win about as many second-serve points as do players at 5’9″ (175 cm). That doesn’t mean that the second serves of the shorter players are just as good–they probably aren’t–but that shorter players tend to possess other skills that they can leverage in second-serve points, which usually last longer. For the purposes of today’s overview, it doesn’t really matter why short players are able to negate the advantage of height on second serve points, just that they are clearly able to do so.
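For readers who want to reproduce figures like the r^2 values quoted in this section, here is a minimal sketch of the calculation. It assumes r^2 is the squared Pearson correlation between the nine height levels and the average of a stat at each level; the post does not spell out the exact procedure, so treat that as my assumption. The `toy_spw` numbers are invented placeholders, not the real averages.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Height levels in cm (one per group), paired with made-up averages.
heights_cm = [173, 175, 180, 183, 185, 188, 191, 193, 196]
toy_spw = [60.5, 61.0, 62.5, 63.0, 64.5, 64.0, 65.5, 66.0, 69.0]  # hypothetical values

print(round(r_squared(heights_cm, toy_spw), 2))
```

Because there are only nine points per graph, a single unusual group (such as the strong 6’1″ cohort) can move r^2 noticeably.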

Return stats

We wouldn’t be having this conversation–and David Ferrer wouldn’t be headed to a likely place in the Hall of Fame–if the inverse relationship between height and return effectiveness weren’t nearly as strong as the positive one between height and serving prowess. “Nearly” is the key word here. The relationship between height and overall return points won is almost as strong (r^2 = 0.74) as that of height and overall service points won, but not quite:

Schwartzman is doing more than his part to hold up the left side of that trendline: He is both the shortest player in the top 50 and the best returner. On first serve points, however, there’s only so much the returner can do, so while shorter players still have an advantage, it is less substantial. The relationship here is a bit weaker, at r^2 = 0.63:

It follows, then, that the relationship between height and second-serve return points won must be stronger, at r^2 = 0.77:

The overall and first-serve return point graphs make clear just how much worse the tallest players are than the rest of the pack. The graphs exaggerate it a bit, because I’ve grouped players from 6’5″ all the way up to 6’11”, and the Isners of the sport are considerably less effective than players such as Marin Cilic. Still, we find plenty of confirmation for the conventional wisdom that a height of 6’2″ or 6’3″ (188 cm to 190 cm) allows players to remain effective on both sides of the ball, while a small increase from there can be a disadvantage.

A note on selection bias

It’s easy to lapse into shorthand and say something like, “shorter players are better returners.” More precisely, what we mean is, “of the players who have become tour regulars, shorter players are better returners.” They have to be, because it is nearly impossible for them to be top-tier servers. If they’ve cracked the top 50, they must have developed a world-class return game. The shorter the player, the more likely this is true.

The same logic is considerably weaker if we descend a couple rungs lower on the ladder of tennis skill. In collegiate tennis, it’s still an advantage to be tall–as Isner can attest–but a player such as 5’10” Benjamin Becker can serve as well as nearly all the competition he will face at that level.
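The selection effect is easy to demonstrate with a toy simulation. In the sketch below, return skill is generated with no connection to height at all, serve skill rises with height, and only players whose combined skill clears a cutoff make the "tour." Every number and name here is invented for illustration; the model deliberately ignores any genuine physical link between height and returning so that the selection effect is isolated.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=20000, cutoff=2.0, seed=1):
    """Return (height/return correlation for everyone, same for 'tour regulars')."""
    rng = random.Random(seed)
    everyone = []
    for _ in range(n):
        height = rng.gauss(0, 1)             # standardized height
        serve = height + rng.gauss(0, 0.5)   # serve skill rises with height
        ret = rng.gauss(0, 1)                # return skill is unrelated to height
        everyone.append((height, ret, serve + ret))
    # Only players whose overall skill clears the cutoff become tour regulars.
    regulars = [player for player in everyone if player[2] > cutoff]
    corr_all = pearson([p[0] for p in everyone], [p[1] for p in everyone])
    corr_tour = pearson([p[0] for p in regulars], [p[1] for p in regulars])
    return corr_all, corr_tour


corr_all, corr_tour = simulate()
print(round(corr_all, 2), round(corr_tour, 2))
```

Across the whole simulated population the correlation between height and return skill is close to zero, but among the players who make the cut it is clearly negative: the short players who survive selection are the ones who return well enough to compensate.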

One more note on selection bias

My choice to use each player’s age-25 season might understate the ability of either short or tall players. It is possible that certain playing styles result in earlier or later peaks, meaning that while tall players could be better at age 25, shorter players may be superior at age 28. There are anecdotes that support the argument in both directions, so I don’t think it’s a major issue, but it is one worthy of additional study.

Further reading

A guest post on this blog earlier this year posed the question, Are Taller Players the Future of Tennis?

I didn’t mention serve speed in the above, but here’s a quick study of the fastest serves and their correlation with height.

Sebastian Ofner and ATP Debuts

This is a guest post by Peter Wetz.

Sebastian Ofner, the still relatively young Austrian, received some media attention this June when he qualified for the Wimbledon main draw at his first attempt and even reached the round of 32 by beating Thomaz Bellucci and Jack Sock. Therefore, some people, including me, had an eye on the 21-year-old when he made his ATP tour debut* at Kitzbuhel a few weeks later, where he was awarded a wild card.

Stunningly, Ofner made it into the semifinals despite having drawn top seed Pablo Cuevas in the second round. Cuevas, who admittedly seems to be out of form lately (or possibly is just regressing to his mean), had a 79% chance of reaching the quarterfinal when the draw came out, according to First Ball In’s forecast.

Let’s look at the numbers to contextualize Ofner’s achievement. How deep do players go when making their debut at ATP level? How often would we expect to see what Ofner did in Kitzbuhel?

The following table shows how often ATP debutants reached each round, broken down by type of entry into the main draw (WC = wild card, Q = qualifier, Direct = direct acceptance, All = WC + Q + Direct). The data covers tournaments from 1990 onward.

Round   WC       Q        Direct   All
R16     14.51%   26.73%   24.46%   21.77%
QF       2.39%    6.39%    4.32%    4.64%
SF       0.51%    2.30%    2.16%    1.59%
F        0.17%    0.64%    0.72%    0.46%
W        0.17%    0.26%    0.72%    0.27%

Since 1990 there have been 1507 ATP debuts: 586 wild cards (39%), 782 qualifiers (52%) and 139 direct acceptances (9%). Given these numbers, we would expect a wild card debutant to reach the semifinal (or go further) about once every 9 years. In other words, it is a once-in-a-decade feat. In fact, in the 28 years of data, only Lleyton Hewitt (Adelaide 1998), Michael Ryderstedt (Stockholm 2004) and Ernests Gulbis (St. Petersburg 2006) accomplished what Ofner did. Only Hewitt went on to win the tournament.
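The "every 9 years" estimate follows from the figures above with a little arithmetic. A quick sketch, using only numbers from the text and the table (the variable names are mine):

```python
wild_card_debuts = 586   # wild card debuts since 1990
years_of_data = 28       # 1990 through 2017
sf_rate = 0.0051         # share of wild card debutants reaching the semifinal or better

debuts_per_year = wild_card_debuts / years_of_data
semifinal_runs_per_year = debuts_per_year * sf_rate
years_between_runs = 1 / semifinal_runs_per_year

print(round(debuts_per_year, 1), round(years_between_runs, 1))
```

About 21 wild cards make their debut each year, and at a 0.51% success rate that works out to one semifinal run every nine years or so, consistent with the three earlier instances in 28 years.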

More than half of the debutants who reached a final, across all entry types, went on to win the tournament: in absolute terms, 4 of 7 finalists. (Due to the small sample size, it is perfectly possible that this is just noise in the data.)
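The absolute counts can be recovered from the percentages in the table and the number of debuts per entry type. This is just a consistency check on the published figures; the helper name `implied_count` is my own.

```python
debuts = {"WC": 586, "Q": 782, "Direct": 139}
final_pct = {"WC": 0.17, "Q": 0.64, "Direct": 0.72}  # reached the final (%)
win_pct = {"WC": 0.17, "Q": 0.26, "Direct": 0.72}    # won the title (%)


def implied_count(pct, n):
    """Number of players implied by a percentage of n debuts."""
    return round(pct / 100 * n)


finalists = sum(implied_count(final_pct[k], debuts[k]) for k in debuts)
winners = sum(implied_count(win_pct[k], debuts[k]) for k in debuts)

print(finalists, winners)
```

The per-type counts add up to 7 finalists and 4 champions, which matches the "All" column (0.46% and 0.27% of 1507 debuts).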

If we exclude rounds starting from the semifinals because of small sample sizes, qualifiers outperform direct acceptances. This may be the result of qualifiers having already played two or three matches and having already become accustomed to the conditions, making it easier for them than it is for debutants who were accepted directly into the main draw. But to really prove this, more investigation is needed.

For now we know that what Sebastian Ofner has achieved rarely happens. We should also know that by no means is his feat a predictor of future greatness.

* I define Kitzbuhel as Ofner’s ATP tour debut because Grand Slam events are run by the ITF. However, Grand Slam statistics, such as match wins, are included in ATP statistics.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.