Measuring the Best Smashes in Tennis

How can we identify the best shots in tennis? At first glance, it seems like a simple problem. Thanks to the shot-by-shot data collected for over 3,500 matches by the Match Charting Project, we can look at every instance of the shot in question and see what happened. If a player hits a lot of winners, or wins most of the ensuing points, he or she is probably pretty good at that shot. Lots of unforced errors would lead us to conclude the opposite.

A friend recently posed a more specific question: Who has the best smash in the men’s game? Compared to other shots such as, say, slice backhands, smashes should be pretty easy to evaluate. A large percentage of them end the point–in the contemporary men’s game (I discuss the women’s game later on), 69% are winners or induce forced errors–which reduces the problem to a straightforward one.

The simplest algorithm to answer my friend’s question is to determine how often each player ends the point in his favor when hitting a smash–that is, with a winner or by inducing a forced error. Call the resulting ratio “W/SM.” The Match Charting Project (MCP) dataset has at least 10 tour-level matches for 80 different men, and the W/SM ratio for those players ranges from 84% (Jeremy Chardy) all the way down to 30% (Paolo Lorenzi). Both of those extremes are represented by players with relatively small samples; if we limit our scope to men with at least 90 recorded smashes, the range isn’t quite as wide. The best of the bunch is Jo-Wilfried Tsonga, at 78%, and the “worst” is Ivan Lendl, at 57%. That isn’t quite fair to Lendl, since smash success rates have improved quite a bit over the years, and Lendl’s rate is only a couple of percentage points below the average for the 1980s. Among active players with at least 90 smashes in the books, Stan Wawrinka brings up the rear, with a W/SM of 65%.

We can look at the longer-term effects of a player’s smashes without adding much complexity. It’s ideal to end the point with a smash, but most players would settle for winning the point. When hitting a smash, ATPers these days end up winning the point 81% of the time, ranging from 97% (Chardy again) down to 45% (Lorenzi again). Once again, Tsonga leads the pack of the bigger-sample-size players, winning the point 90% of the time after hitting a smash, and among active players, Wawrinka is still at the bottom of that subset, at 77%.
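Here is a minimal sketch of that tallying, assuming each charted smash has been boiled down to two fields: how the shot itself turned out, and who eventually won the point. The `Smash` record and its field names are hypothetical, not the Match Charting Project's actual shot coding, and the sample data is made up.

```python
from dataclasses import dataclass

@dataclass
class Smash:
    # Hypothetical fields; the real MCP shot coding is more detailed.
    result: str       # "winner" (incl. induced forced error), "error", or "in_play"
    won_point: bool   # did the player hitting the smash eventually win the point?

def smash_rates(smashes):
    """Return (W/SM, E/SM, PTS/SM) as fractions of all smashes hit."""
    n = len(smashes)
    if n == 0:
        raise ValueError("no smashes recorded")
    winners = sum(s.result == "winner" for s in smashes)
    errors = sum(s.result == "error" for s in smashes)
    points_won = sum(s.won_point for s in smashes)
    return winners / n, errors / n, points_won / n

# Made-up sample of ten smashes, for illustration only.
sample = (
    [Smash("winner", True)] * 7
    + [Smash("error", False)] * 1
    + [Smash("in_play", True)] * 1
    + [Smash("in_play", False)] * 1
)
w_sm, e_sm, pts_sm = smash_rates(sample)
print(f"W/SM {w_sm:.0%}  E/SM {e_sm:.0%}  PTS/SM {pts_sm:.0%}")
```

Note that PTS/SM can exceed W/SM because some smashes that come back still lead to a point won a shot or two later.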

Here is a list of all players with at least 90 smashes in the MCP dataset, with their winners (and induced forced errors) per smash (W/SM), errors per smash (E/SM), and points won per smash (PTS/SM):

```PLAYER              W/SM  E/SM  PTS/SM
Jo-Wilfried Tsonga   78%    6%     90%
Tomas Berdych        76%    6%     88%
Pete Sampras         75%    7%     86%
Roger Federer        73%    7%     86%
Rafael Nadal         69%    7%     84%
Milos Raonic         73%    9%     82%
Andy Murray          67%    6%     82%
Kei Nishikori        68%   11%     81%
David Ferrer         71%    9%     81%
Andre Agassi         67%    8%     80%
Novak Djokovic       66%    9%     80%
Stefan Edberg        62%   12%     78%
Stan Wawrinka        65%   10%     77%
Ivan Lendl           57%   13%     71%```

These numbers give us a pretty good idea of who you should back if the ATP ever hosts the smash-hitting equivalent of baseball’s Home Run Derby. Best of all, they don’t commit any egregious offenses against common sense: We’d expect to see Tsonga and Roger Federer near the top, and we’d know something was wrong if Novak Djokovic were too far from the bottom.

Smash opportunities

Still, we need to do better. Almost every shot made in a tennis match represents a decision made by the player hitting it: topspin or slice? backhand or run-around forehand? approach or stay back? Many smashes are obvious choices, but a large number are not. Different players make different choices, and to evaluate any particular shot, we need to subtly reframe the question. Instead of vaguely asking for “the best,” we’d be better served looking for the player who gets the most value out of his smash. While the two questions are similar, they are not the same.

Let’s expand our view to what we might call “smash opportunities.” Once again, smashes make our task relatively straightforward: We can define a smash opportunity simply as a lob hit by the opponent.* In the contemporary ATP, roughly 72% of lobs result in smashes–the rest either go for winners or are handled with a different shot. Different players have very different strategies: Federer, Pete Sampras, and Milos Raonic all hit smashes in at least 84% of opportunities, while a few other men come in under 50%. Nick Kyrgios, for instance, tried a smash in only 20 of 49 recorded opportunities (41%). Of those players with more available data, Juan Martin del Potro elected to go for the overhead in 61 of 114 chances (54%), and Andy Murray in 271 of 433 (63%).

* Using an imperfect dataset, it’s a bit more complicated; sometimes the shots that precede smashes are coded as topspin or slice groundstrokes. I’ve counted those as smash opportunities as well.
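The smashes-per-opportunity rate is just a ratio of two counts. As a quick check, this snippet recomputes it from the smash and opportunity counts quoted above for Kyrgios, del Potro, and Murray; the function name `sm_per_smo` is only an illustrative label.

```python
# (smashes hit, smash opportunities) as quoted in the text above.
recorded = {
    "Nick Kyrgios": (20, 49),
    "Juan Martin del Potro": (61, 114),
    "Andy Murray": (271, 433),
}

def sm_per_smo(smashes, opportunities):
    """Share of smash opportunities (lobs faced) answered with a smash."""
    return smashes / opportunities

rates = {name: sm_per_smo(*counts) for name, counts in recorded.items()}
for name, rate in rates.items():
    print(f"{name:<24}{rate:6.1%}")
```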

Not all lobs are created equal, of course. With a large number of points, we would expect them to even out, but even then, a player’s overall style may affect the smash opportunities he sees. That’s a more difficult issue for another day; for now, it’s easiest to assume that each player’s mix of smash opportunities is roughly equal, though we’ll keep in mind the likelihood that we’ve swept some complexity under the rug.

With such a wide range of smashes per smash opportunities (SM/SMO), it’s clear that some players’ average smashes are more difficult than others. Federer hits about half again as many smashes per opportunity as del Potro does, suggesting that Fed’s attempts are more difficult than Delpo’s; on those more difficult attempts, Delpo is choosing a different shot. The Argentine is very effective when he opts for the smash, winning 84% of those points, but it seems likely that his rate would not be so high if he hit smashes as frequently as Federer does.

This leads us to a slightly different question: Which players are most effective when dealing with smash opportunities? The smash itself doesn’t necessarily matter–if a player is equally effective with, say, swinging volleys, the lack of a smash would be irrelevant. The smash is simply an effective tool that most players employ to deal with these situations.

Smash opportunities don’t offer the same level of guarantee that smashes themselves do: In the ATP these days, players win 72% of points after being handed a smash opportunity, and 56% of the shots they hit result in winners or induced forced errors. Looking at these situations takes us a bit off-track, but it also allows us to study a broader question with more impact on the game as a whole, because smash opportunities represent a larger number of shots than smashes themselves do.

Here is a list of all the players with at least 99 smash opportunities in the MCP dataset, along with the rate at which they hit smashes (SM/SMO), the rate at which they hit winners or induced forced errors in response to smash opportunities (W/SMO), hit errors in those situations (E/SMO), and won the points when given lobs (PTW/SMO). As in the list above, players are ranked by the rightmost column, points won.

```PLAYER              SM/SMO  W/SMO  E/SMO  PTW/SMO
Jo-Wilfried Tsonga     80%    68%    13%      80%
Roger Federer          84%    66%    13%      78%
Pete Sampras           86%    68%    15%      78%
Tomas Berdych          75%    66%    16%      76%
Milos Raonic           85%    67%    14%      76%
Novak Djokovic         81%    60%    13%      75%
Kevin Anderson         66%    57%    12%      74%
Rafael Nadal           74%    57%    16%      73%
Andre Agassi           77%    62%    17%      73%
Boris Becker           85%    59%    18%      72%
Stan Wawrinka          79%    58%    15%      72%
Kei Nishikori          72%    57%    17%      70%
Andy Murray            63%    52%    15%      70%
Dominic Thiem          66%    52%    11%      70%
David Ferrer           71%    57%    17%      69%
Pablo Cuevas           73%    54%    14%      67%
Stefan Edberg          81%    52%    23%      65%
Bjorn Borg             81%    41%    20%      63%
JM del Potro           54%    48%    19%      60%
Ivan Lendl             74%    45%    28%      59%
John McEnroe           74%    43%    24%      56%```

The order of this list has much in common with the previous one, with names like Federer, Sampras, and Tsonga at the top. Yet there are key differences: Djokovic and Wawrinka are particularly effective when they respond to a lob with something other than an overhead, while del Potro is the opposite, landing near the bottom of this ranking despite being quite effective with the smash itself.

The rate at which a player converts opportunities to smashes has some impact on his overall success rate on smash opportunities, but the relationship isn’t that strong (r^2 = 0.18). Other options, such as swinging volleys or mid-court forehands, also give players a good chance of winning the point.
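For readers who want to try the calculation, here is a sketch of a Pearson r-squared computed from the rounded SM/SMO and PTW/SMO values in the 21-player table above. The figure quoted in the text may well have been computed on a different set of players or on unrounded rates, so this is not guaranteed to reproduce it exactly.

```python
# (SM/SMO, PTW/SMO) in percent, copied from the 21-player table above.
pairs = [
    (80, 80), (84, 78), (86, 78), (75, 76), (85, 76), (81, 75), (66, 74),
    (74, 73), (77, 73), (85, 72), (79, 72), (72, 70), (63, 70), (66, 70),
    (71, 69), (73, 67), (81, 65), (81, 63), (54, 60), (74, 59), (74, 56),
]

def r_squared(points):
    """Squared Pearson correlation between the x and y values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return (sxy * sxy) / (sxx * syy)

r2 = r_squared(pairs)
print(f"r^2 on the table's rounded values: {r2:.2f}")
```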

Smash value

Let’s get back to my revised question: Who gets the most value out of his smash? A good answer needs to combine how well he hits it with how often he hits it. Once we can quantify that, we’ll be able to see just how much a good or bad smash can impact a player’s bottom line, measured in overall points won, and how much a great smash differs from an abysmal one.

As noted above, the average current-day ATPer wins the point 81% of the time that he hits a smash. Let’s reframe that in terms of the probability of winning a point: When a lob is flying through the air and a player readies his racket to hit an overhead, his chance of winning the point is 81%–most of the hard work is already done, having generated such a favorable situation. If our player ends up winning the point, the smash improved his odds by 0.19 points (from 0.81 to 1.0), and if he ends up losing the point, the smash hurt his odds by 0.81 (from 0.81 to 0.0). A player who hits five successful smashes in a row has a smash worth about one total point: 5 multiplied by 0.19 equals 0.95.

We can use this simple formula to estimate how much each player’s smash is worth, denominated in points. We’ll call that Point Probability Added (PPA). Finally, we need to take into account how often the player hits his smash. To do so, we’ll simply divide PPA by total number of points played, then multiply by 100 to make the results more readable. The metric, then, is PPA per 100 points, reflecting the impact of the smash in a typical short match. Most players have similar numbers of smash opportunities, but as we’ve seen, some choose to hit far more overheads than others. When we divide by points, we give more credit to players who hit their smashes more often.
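A small sketch of the arithmetic, using the 81% baseline from the text. The function names and the example player's counts are made up for illustration; only the baseline and the five-smash example come from the article.

```python
def point_probability_added(points_won, points_lost, baseline):
    """Sum of win-probability changes across all shots of one type.

    Each point won adds (1 - baseline); each point lost subtracts baseline.
    """
    return points_won * (1 - baseline) - points_lost * baseline

def ppa_per_100(points_won, points_lost, baseline, total_points_played):
    """PPA scaled to a rate per 100 points played."""
    ppa = point_probability_added(points_won, points_lost, baseline)
    return 100 * ppa / total_points_played

SMASH_BASELINE = 0.81  # tour-average share of points won after hitting a smash

# Five winning smashes in a row: worth a little under one point.
five_in_a_row = point_probability_added(5, 0, SMASH_BASELINE)
print(round(five_in_a_row, 2))  # 0.95

# Hypothetical player: 180 points won and 20 lost on smashes, 12,000 points played.
example = ppa_per_100(180, 20, SMASH_BASELINE, 12_000)
print(round(example, 2))
```

Passing a different baseline (say, the rate for smash opportunities rather than smashes) reuses the same function for any other situation.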

The overall impact of the smash turns out to be quite small. Here are the 1990s-and-later players with at least 99 smash opportunities in the dataset along with their smash PPA per 100 points:

```PLAYER                 SM PPA/100
Jo-Wilfried Tsonga           0.17
Pete Sampras                 0.11
Tomas Berdych                0.11
Roger Federer                0.10
Rafael Nadal                 0.05
Milos Raonic                 0.04
Juan Martin del Potro        0.02
Andy Murray                  0.01
Kevin Anderson               0.01
Kei Nishikori                0.00
David Ferrer                 0.00
Andre Agassi                 0.00
Novak Djokovic              -0.02
Stan Wawrinka               -0.07
Dominic Thiem               -0.07
Pablo Cuevas                -0.10```

Tsonga reigns supreme, from the most basic measurement to the most complex. His 0.17 smash PPA per 100 points means that the quality of his overhead earns him about one extra point (compared to an average ATPer) every 600 points. That doesn’t sound like much, and rightfully so: He hits fewer than one smash per 50 points, and as good as Tsonga is, the average player has a very serviceable smash as well.

The list gives us an idea of the overall range of smash-hitting ability, as well. Among active players, the laggard in this group is Pablo Cuevas, at -0.1 points per 100, meaning that his subpar smash costs him one point out of every thousand he plays. It’s possible to be worse–in Lorenzi’s small sample, his rate is -0.65–but if we limit our scope to these well-studied players, the difference between the high and low extremes is barely 0.25 points per 100, or one point out of every 400.
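Converting a PPA/100 figure into "one point every N points" is a one-line calculation. The helper name below is just an illustrative label; the two rates are taken from the table above.

```python
def points_per_extra_point(ppa_per_100):
    """How many points must be played for the shot to be worth one point."""
    return 100 / abs(ppa_per_100)

for name, rate in [("Jo-Wilfried Tsonga", 0.17), ("Pablo Cuevas", -0.10)]:
    direction = "gains" if rate > 0 else "loses"
    print(f"{name} {direction} about one point every "
          f"{points_per_extra_point(rate):.0f} points")
```

Tsonga's exact figure comes out a bit under 600, which the text rounds up.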

I’ve excluded several players from earlier generations from this list; as mentioned earlier, the average smash success rate in those days was lower, so measuring legends like McEnroe and Borg using a 2010s-based point probability formula is flat-out wrong. That said, we’re on safe ground with Sampras and Agassi; the rate at which players convert smashes into points won has remained fairly steady since the early 1990s.

Lob-responding value

We’ve seen the potential impact of smash skill; let’s widen our scope again and look at the potential impact of smash opportunity skill. When a player is faced with a lob, but before he decides what shot to hit, his chance of winning the point is about 72%. Thus, hitting a shot that results in winning the point is worth 0.28 points of point probability added, while a choice that ends up losing the point translates to -0.72.

There are more smash opportunities than smashes, and more room to improve on the average (72% instead of 81%), so we would expect to see a bigger range of PPA per 100 points. Put another way, we would expect that lob-responding skill, which includes smashes, is more important than smash-specific skill.

It’s a modest difference, but it does look like lob-responding skill has a bigger range than smash skill. Here is the same group of players, still showing their PPA/100 for smashes (SM PPA/100), now also including their PPA/100 for smash opportunities (SMO PPA/100):

```PLAYER                 SM PPA/100  SMO PPA/100
Jo-Wilfried Tsonga           0.17         0.18
Roger Federer                0.10         0.16
Pete Sampras                 0.11         0.16
Milos Raonic                 0.04         0.12
Tomas Berdych                0.11         0.09
Kevin Anderson               0.01         0.08
Novak Djokovic              -0.02         0.07
Rafael Nadal                 0.05         0.03
Andre Agassi                 0.00         0.01
Stan Wawrinka               -0.07         0.00
Kei Nishikori                0.00        -0.03
Andy Murray                  0.01        -0.03
Dominic Thiem               -0.07        -0.05
David Ferrer                 0.00        -0.06
Pablo Cuevas                -0.10        -0.12
Juan Martin del Potro        0.02        -0.19```

Djokovic and Delpo draw our attention again as the players whose smash skills do not accurately represent their smash opportunity skills. Djokovic is slightly below average with smashes, but a few notches above the norm on opportunities; Delpo is a tick above average when he hits smashes, but dreadful when dealing with lobs in general.

As it turns out, we can measure the best smashes in tennis, both to compare players and to get a general sense of the shot’s importance. What we’ve also seen is that smashes don’t tell the entire story–we learn more about a player’s overall ability when we widen our view to smash opportunities.

Smashes in the women’s game

Contemporary women hit far fewer smashes than men do, and they win points less often when they hit them. Despite the differences, the reasoning outlined above applies just as well to the WTA. Let’s take a look.

In the WTA of this decade, smashes result in winners (or induced forced errors) 63% of the time, and smashes result in points won about 75% of the time. Both numbers are lower than the equivalent ATP figures (69% and 81%, respectively), but not dramatically so. Here are the rates of winners, errors, and points won per smash for the 14 women with at least 80 smashes in the MCP dataset:

```PLAYER               W/SM  E/SM  PTS/SM
Jelena Jankovic       73%    9%     83%
Serena Williams       72%   13%     81%
Steffi Graf           61%    9%     81%
Svetlana Kuznetsova   70%   10%     79%
Simona Halep          66%   11%     76%
Caroline Wozniacki    61%   16%     74%
Karolina Pliskova     62%   18%     74%
Agnieszka Radwanska   54%   13%     74%
Angelique Kerber      57%   15%     72%
Martina Navratilova   54%   13%     71%
Monica Niculescu      50%   15%     70%
Garbine Muguruza      63%   19%     70%
Petra Kvitova         59%   22%     68%
Roberta Vinci         58%   14%     68%```

Historical shot-by-shot data is less representative for women than for men, so it’s probably safest to assume that trends in smash success rates have been similar for women and for men. If that’s true, Steffi Graf’s era is similar to the present, while Martina Navratilova’s prime saw far fewer smashes going for winners or points won.

Where the women’s game really differs from the men’s is the difference between smash opportunities (lobs) and smashes. As we saw above, 72% of ATP smash opportunities result in smashes. In the current WTA, the corresponding figure is less than half that: 35%. Some of the single-player numbers are almost too extreme to be believed: In 12 matches, Catherine Bellis faced 41 lobs and hit 3 smashes; in 29 charted matches, Jelena Ostapenko saw 103 smash opportunities and tried only 10 smashes. A generation ago, the gender difference was tiny: Graf, Martina Hingis, Arantxa Sanchez Vicario, and Monica Seles all hit smashes in at least three-quarters of their opportunities. But among active players, only Barbora Strycova comes in above 70%.

Here are the smash opportunity numbers for the 17 women with at least 150 smash opportunities in the MCP dataset. SM/SMO is smashes per chance, W/SMO is winners (and induced forced errors) per smash opportunity, E/SMO is errors per opportunity, and PTW/SMO is points won per smash opportunity:

```PLAYER                SM/SMO  W/SMO  E/SMO  PTW/SMO
Maria Sharapova          12%    57%    11%      76%
Serena Williams          55%    58%    18%      72%
Steffi Graf              82%    52%    17%      71%
Karolina Pliskova        47%    52%    16%      70%
Simona Halep             14%    41%    11%      69%
Carla Suarez Navarro     25%    33%     9%      69%
Eugenie Bouchard         29%    50%    18%      68%
Victoria Azarenka        35%    52%    17%      67%
Angelique Kerber         39%    42%    14%      66%
Garbine Muguruza         43%    51%    18%      66%
Monica Niculescu         57%    41%    19%      65%
Petra Kvitova            48%    50%    19%      65%
Agnieszka Radwanska      44%    42%    18%      65%
Johanna Konta            30%    47%    21%      64%
Caroline Wozniacki       36%    44%    18%      64%
Elina Svitolina          14%    38%    14%      63%
Martina Navratilova      67%    42%    26%      58%```

It’s clear from the top of this list that women’s tennis is a different ballgame. Maria Sharapova almost never opts for an overhead, but when faced with a lob, she is the best of them all. Next up is Serena Williams, who hits almost as many smashes as any active player on this list, and is nearly as successful. Recall that in the men’s game, there is a modest positive correlation between smashes per opportunity and points won per smash opportunity; here, the relationship is weaker, and slightly negative.

Because most women hit so few smashes, there isn’t quite as much to be gained by using point probability added (PPA) to measure WTA smash skill. Graf was exceptionally good, comparable to Tsonga in the value she extracted from her smash, but among active players, only Serena and Victoria Azarenka can claim a smash that is worth close to one point per thousand. At the other extreme, Monica Niculescu is nearly as bad as Graf was good, suggesting she ought to figure out a way to respond to more smash opportunities with her signature forehand slice.

Here is the same group of women (minus Navratilova, whose era makes PPA comparisons misleading), with their PPA per 100 points for smashes (SM PPA/100) and smash opportunities (SMO PPA/100):

```PLAYER                SM PPA/100  SMO PPA/100
Maria Sharapova             0.03         0.21
Serena Williams             0.09         0.15
Steffi Graf                 0.15         0.14
Karolina Pliskova          -0.01         0.09
Carla Suarez Navarro        0.04         0.08
Simona Halep                0.00         0.07
Eugenie Bouchard           -0.02         0.03
Victoria Azarenka           0.08         0.00
Angelique Kerber           -0.03        -0.02
Garbine Muguruza           -0.07        -0.03
Petra Kvitova              -0.07        -0.04
Monica Niculescu           -0.13        -0.06
Caroline Wozniacki         -0.01        -0.07
Agnieszka Radwanska        -0.02        -0.07
Johanna Konta              -0.12        -0.08
Elina Svitolina             0.01        -0.09```

The table is sorted by smash opportunity PPA, which tells us about a much more relevant skill in the women’s game. Sharapova’s lob-responding ability is well ahead of the pack, worth better than one point above average per 500, with Serena and Graf not far behind. The overall range among these well-studied players, from Sharapova’s 0.21 to Elina Svitolina’s -0.09, is somewhat smaller than the equivalent range in the ATP, but with such outliers as Sharapova here and Delpo on the men’s side, it’s tough to draw firm conclusions from small subsets of players, however elite they are.

Final thought

The approach I’ve outlined here to measure the impact of smash and smash-opportunity skills is one that could be applied to other shots. Smashes are a good place to start because they are so simple: Many of them end points, and even when they don’t, they often virtually guarantee that one player will win the point. While smashes are a bit more complex than they first appear, the complications involved in applying a similar algorithm to, say, backhands and backhand opportunities, are considerably greater. That said, I believe this algorithm represents a promising entry point to these more daunting problems.

Measuring the Impact of the Serve in Men’s Tennis

By just about any measure, the serve is the most important shot in tennis. In men’s professional tennis, with its powerful deliveries and short points, the serve is all the more crucial. It is the one shot guaranteed to occur in every rally, and in many points, it is the only shot.

Yet we don’t have a good way of measuring exactly how important it is. It’s easy to determine which players have the best serves–they tend to show up at the top of the leaderboards for aces and service points won–but the available statistics are very limited if we want a more precise picture. The ace stat counts only a subset of those points decided by the serve, and the tally of service points won (or 1st serve points won, or 2nd serve points won) combines the effect of the serve with all of the other shots in a player’s arsenal.

Aces are not the only points in which the serve is decisive, and some service points won are decided long after the serve ceases to have any relevance to the point. What we need is a method to estimate how much impact the serve has on points of various lengths.

It seems like a fair assumption that if a server hits a winner on his second shot, the serve itself deserves some of the credit, even if the returner got it back in play. In any particular instance, the serve might be really important–imagine Roger Federer swatting away a weak return from the service line–or downright counterproductive–think of Rafael Nadal lunging to defend against a good return and hitting a miraculous down-the-line winner. With the wide variety of paths a tennis point can follow, though, all we can do is generalize. And in the aggregate, the serve probably has a lot to do with a 3-shot rally. At the other extreme, a 25-shot rally may start with a great serve or a mediocre one, but by the time the point is decided, the effect of the serve has been canceled out.

With data from the Match Charting Project, we can quantify the effect. Using about 1,200 tour-level men’s matches from 2000 to the present, I looked at each of the server’s shots grouped by the stage of the rally–that is, his second shot, his third shot, and so on–and calculated how frequently it ended the point. A player’s underlying skills shouldn’t change during a point–his forehand is as good at the end as it is at the beginning, unless fatigue strikes–so if the serve had no effect on the success of subsequent shots, players would end the point equally often with every shot.

Of course, the serve does have an effect, so points won by the server end much more frequently on the few shots just after the serve than they do later on. This graph illustrates how the “point ending rate” changes:

On first serve points (the blue line), if the server has a “makeable” second shot (the third shot of the rally, “3” on the horizontal axis, where “makeable” is defined as a shot that results in an unforced error or is put back in play), there is a 28.1% chance it ends the point in the server’s favor, either with a winner or by inducing an error on the next shot. On the following shot, the rate falls to 25.6%, then 21.8%, and then down into what we’ll call the “base rate” range between 18% and 20%.

The base rate tells us how often players are able to end points in their favor after the serve ceases to provide an advantage. Since the point ending rate stabilizes beginning with the server’s fifth shot (the ninth shot of the rally, after first serves), we can pinpoint that stage of the rally as the moment–for the average player, anyway–when the serve is no longer an advantage.
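To make the “point ending rate” concrete, here is a rough sketch of how it could be tallied. It assumes the charted rallies have already been flattened into one record per makeable shot by the server, holding the server's shot number and whether that shot ended the point in his favor. That input format is hypothetical, and the toy counts are made up.

```python
from collections import defaultdict

def point_ending_rates(shot_opportunities):
    """Map server shot number -> share of makeable shots that ended the point
    in the server's favor.

    `shot_opportunities` is an iterable of (server_shot_number, ended_point)
    pairs, a hypothetical flattening of charted rallies.
    """
    made = defaultdict(int)
    total = defaultdict(int)
    for shot_number, ended_point in shot_opportunities:
        total[shot_number] += 1
        made[shot_number] += int(ended_point)
    return {n: made[n] / total[n] for n in sorted(total)}

# Made-up counts, for illustration only.
toy = [(2, True)] * 3 + [(2, False)] * 7 + [(3, True)] * 2 + [(3, False)] * 8
rates = point_ending_rates(toy)
print(rates)  # {2: 0.3, 3: 0.2}
```

Run separately on first-serve and second-serve points, a tally like this yields one rate per shot number for each serve, and the base rate is where those rates flatten out.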

As the graph shows, second serve points (shown with a red line) are a very different story. It appears that the serve has no impact once the returner gets the ball back in play. Even that slight blip with the server’s third shot (“5” on the horizontal axis, for the rally’s fifth shot) is no higher than the point ending rate on the 15th shot of first-serve rallies. This tallies with the conclusions of some other research I did six years ago, and it has the added benefit of agreeing with common sense, since ATP servers win only about half of their second serve points.

Of course, some players get plenty of positive after-effects from their second serves: When John Isner hits a second shot on a second-serve point, he finishes the point in his favor 30% of the time, a number that falls to 22% by his fourth shot. His second serve has effects that mirror those of an average player’s first serve.

Removing unforced errors

I wanted to build this metric without resorting to the vagaries of differentiating forced and unforced errors, but it wasn’t to be. The “point-ending” rates shown above include points that ended when the server’s opponent made an unforced error. We can argue about whether, or how much, such errors should be credited to the server, but for our purposes today, the important thing is that unforced errors aren’t affected that much by the stage of the rally.

If we want to isolate the effect of the serve, then, we should remove unforced errors. When we do so, we discover an even sharper effect. The rate at which the server hits winners (or induces forced errors) depends heavily on the stage of the rally. Here’s the same graph as above, only with opponent unforced errors removed:

The two graphs look very similar. Again, the first serve loses its effect around the 9th shot in the rally, and the second serve confers no advantage on later shots in the point. The important difference to notice is the ratio between the peak winner rate and the base rate; the base rate itself is now just above 10%. When we counted unforced errors, the ratio between peak and base rate was about 3:2. With unforced errors removed, the ratio is close to 2:1, suggesting that when the server hits a winner on his second shot, the serve and the winner contributed roughly equally to the outcome of the point. It seems more appropriate to skip opponent unforced errors when measuring the effect of the serve, and the resulting 2:1 ratio jibes better with my intuition.

Making a metric

Now for the fun part. To narrow our focus, let’s zero in on one particular question: What percentage of service points won can be attributed to the serve? To answer that question, I want to consider only the server’s own efforts. For unreturned serves and unforced errors, we might be tempted to give negative credit to the other player. But for today’s purposes, I want to divvy up the credit among the server’s assets–his serve and his other shots–like separating the contributions of a baseball team’s pitching from its defense.

For unreturned serves, that’s easy. 100% of the credit belongs to the serve.

For second serve points in which the return was put in play, 0% of the credit goes to the serve. As we’ve seen, for the average player, once the return comes back, the server no longer has an advantage.

For first-serve points in which the return was put in play and the server won by his fourth shot, the serve gets some credit, but not all, and the amount of credit depends on how quickly the point ended. The following table shows the exact rates at which players hit winners on each shot, in the “Winner %” column:

```Server's…  Winner %  W%/Base  Shot credit  Serve credit
2nd shot      21.2%     1.96        51.0%         49.0%
3rd shot      18.1%     1.68        59.6%         40.4%
4th shot      13.3%     1.23        81.0%         19.0%
5th+          10.8%     1.00       100.0%          0.0%```

Compared to a base rate of 10.8% winners per shot opportunity, we can calculate the approximate value of the serve in points that end on the server’s 2nd, 3rd, and 4th shots. The resulting numbers come out close to round figures, and because these are hardly laws of nature (and the sample of charted matches has its biases), we’ll go with round numbers. We’ll give the serve 50% of the credit when the server needed only two shots, 40% when he needed three shots, and 20% when he needed four shots. After that, the advantage conferred by the serve is usually canceled out, so in longer rallies, the serve gets 0% of the credit.
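The table's columns appear to follow a simple rule: the shot keeps a share of the credit equal to the base rate divided by that shot's winner rate, and the serve gets the remainder. The sketch below applies that reading to the winner rates in the table. Because it works from unrounded ratios, a result can differ from the table by a tenth of a percentage point or so.

```python
BASE_RATE = 10.8  # winner % on the server's 5th and later shots

winner_pct = {"2nd shot": 21.2, "3rd shot": 18.1, "4th shot": 13.3}

def serve_credit(winner_rate, base_rate=BASE_RATE):
    """Share of a point-ending shot credited to the serve.

    The shot keeps base_rate / winner_rate; the serve gets the excess.
    """
    return 1 - base_rate / winner_rate

credits = {shot: serve_credit(rate) for shot, rate in winner_pct.items()}
for shot, credit in credits.items():
    print(f"{shot}: serve credit {credit:.1%}")
```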

Tour averages

Finally, we can begin to answer the question: What percentage of service points won can be attributed to the serve? This, I believe, is a good proxy for the slipperier query I started with: How important is the serve?

To do that, we take the same subset of 1,200 or so charted matches, tally the number of unreturned serves and first-serve points that ended with various numbers of shots, and assign credit to the serve based on the multipliers above. Adding up all the credit due to the serve gives us a raw number of “points” that the player won thanks to his serve. When we divide that number by the actual number of service points won, we find out how much of his service success was due to the serve itself. Let’s call the resulting number Serve Impact, or SvI.
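In code, the SvI calculation might look like the sketch below. The function signature and the sample totals are invented for illustration. It assumes the counts passed in are points the server won, grouped by how many shots he needed, and it uses the rounded 50/40/20 multipliers.

```python
# Rounded serve-credit multipliers from the text, keyed by server's shot count.
FIRST_SERVE_CREDIT = {2: 0.5, 3: 0.4, 4: 0.2}

def serve_impact(unreturned, first_serve_wins_by_shots, service_points_won):
    """SvI: serve-credited points divided by all service points won.

    unreturned: service points won when the serve did not come back.
    first_serve_wins_by_shots: {server shot count: first-serve points won}.
    """
    credited = float(unreturned)  # unreturned serves: 100% credit
    for shots, wins in first_serve_wins_by_shots.items():
        # Five or more shots, and anything not listed, earns the serve nothing.
        credited += FIRST_SERVE_CREDIT.get(shots, 0.0) * wins
    return credited / service_points_won

# Made-up totals for one hypothetical player.
svi = serve_impact(
    unreturned=400,
    first_serve_wins_by_shots={2: 200, 3: 100, 4: 50, 5: 80},
    service_points_won=1000,
)
print(f"SvI: {svi:.1%}")
```

Second-serve points with the return in play never enter the numerator, since they carry 0% serve credit under the average-player multipliers; they still count in `service_points_won`.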

Here are the aggregates for the entire tour, as well as for each major surface:

```         1st SvI  2nd SvI  Total SvI
Overall    63.4%    31.0%      53.6%
Hard       64.6%    31.5%      54.4%
Clay       56.9%    27.0%      47.8%
Grass      70.8%    37.3%      61.5%```

Bottom line, it appears that just over half of service points won are attributable to the serve itself. As expected, that number is lower on clay and higher on grass.

Since about two-thirds of the points that men win come on their own serves, we can go even one step further: roughly one-third of the points won by a men’s tennis player are due to his serve.
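The back-of-the-envelope arithmetic behind that “one-third,” using the overall Total SvI from the table and the two-thirds figure as stated (both are approximations, so the result is too):

```python
total_svi = 0.536           # overall Total SvI from the table above
share_won_on_serve = 2 / 3  # "about two-thirds", as stated in the text

serve_share_of_all_points = total_svi * share_won_on_serve
print(f"{serve_share_of_all_points:.1%}")  # 35.7%
```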

Player by player

These are averages, and the most interesting players rarely hew to the mean. Using the 50/40/20 multipliers, Isner’s SvI is a whopping 70.8% and Diego Schwartzman’s is a mere 37.7%. As far from the middle as those are, they understate the uniqueness of these players. I hinted above that the same multipliers are not appropriate for everyone; the average player reaps no positive after-effects of his second serve, but Isner certainly does. The standard formula we’ve used so far credits Isner with an outrageous SvI, even without giving him credit for the “second serve plus one” points he racks up.

In other words, to get player-specific results, we need player-specific multipliers. To do that, we start by finding a player-specific base rate, for which we’ll use the winner (and induced forced error) rate for all shots starting with the server’s fifth shot on first-serve points and shots starting with the server’s fourth on second-serve points. Then we check the winner rate on the server’s 2nd, 3rd, and 4th shots on first-serve points and his 2nd and 3rd shots on second-serve points, and if the rate is at least 20% higher than the base rate, we give the player’s serve the corresponding amount of credit.
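The text above does not spell out how the "corresponding amount of credit" is computed, so the following is only one plausible reading, not necessarily the formula behind the numbers in this post. It treats "20% higher" as a relative threshold and, when the threshold is cleared, credits the serve with the excess over the base rate as a fraction of the observed rate. The function name and the sample rates are hypothetical.

```python
def serve_credit(shot_rate, base_rate, threshold=0.20):
    """Credit given to the serve for points ended at one shot number.

    Assumed rule: if shot_rate is at least `threshold` (relative) above
    base_rate, credit the excess share, 1 - base_rate / shot_rate.
    Otherwise the serve gets no credit for that shot.
    """
    if shot_rate <= 0 or shot_rate < base_rate * (1 + threshold):
        return 0.0
    return 1 - base_rate / shot_rate


# Invented winner rates, for illustration only.
base = 0.10
for label, rate in [("2nd shot", 0.20), ("3rd shot", 0.11)]:
    print(label, f"{serve_credit(rate, base):.0%}")
```

Under this reading, a shot whose winner rate is double the base rate earns the serve half the credit, and anything within 20% of the base rate earns none.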

Here are the resulting multipliers for a quartet of players you might find interesting, with plenty of surprises already:

```                   1st serve              2nd serve
                    2nd shot  3rd  4th     2nd shot  3rd
Roger Federer            55%  50%  30%           0%   0%
Rafael Nadal             31%   0%   0%           0%   0%
John Isner               46%  41%   0%          34%   0%
Diego Schwartzman        20%  35%   0%           0%  25%
Average                  50%  30%  20%           0%   0%```

Roger Federer gets more positive after-effects from his first serve than average, more even than Isner does. The big American is a tricky case, both because so few of his serves come back and because he is so aggressive at all times, meaning that his base winner rate is very high. At the other extreme, Schwartzman and Rafael Nadal get very little follow-on benefit from their serves. Schwartzman’s multipliers are particularly intriguing, since on both first and second serves, his winner rate on his third shot is higher than on his second shot. Serve plus two, anyone?

Using player-specific multipliers makes Isner’s and Schwartzman’s SvI numbers more extreme. Isner’s ticks up a bit to 72.4% (just behind Ivo Karlovic), while Schwartzman’s drops to 35.0%, the lowest of anyone I’ve looked at. I’ve calculated multipliers and SvI for all 33 players with at least 1,000 tour-level service points in the Match Charting Project database:

```Player                 1st SvI  2nd SvI  Total SvI
Ivo Karlovic             79.2%    56.1%      73.3%
John Isner               78.3%    54.3%      72.4%
Andy Roddick             77.8%    51.0%      71.1%
Feliciano Lopez          83.3%    37.1%      68.9%
Kevin Anderson           77.7%    42.5%      68.4%
Milos Raonic             77.4%    36.0%      66.0%
Marin Cilic              77.1%    34.1%      63.3%
Nick Kyrgios             70.6%    41.0%      62.5%
Alexandr Dolgopolov      74.0%    37.8%      61.3%
Gael Monfils             69.8%    37.7%      60.8%
Roger Federer            70.6%    32.0%      58.8%

Player                 1st SvI  2nd SvI  Total SvI
Bernard Tomic            67.6%    28.7%      58.5%
Tomas Berdych            71.6%    27.0%      57.2%
Alexander Zverev         65.4%    30.2%      54.9%
Fernando Verdasco        61.6%    32.9%      54.3%
Stan Wawrinka            65.4%    33.7%      54.2%
Lleyton Hewitt           66.7%    32.1%      53.4%
Juan Martin Del Potro    63.1%    28.2%      53.4%
Grigor Dimitrov          62.9%    28.6%      53.3%
Jo Wilfried Tsonga       65.3%    25.9%      52.7%
Marat Safin              68.4%    22.7%      52.3%
Andy Murray              63.4%    27.5%      52.0%

Player                 1st SvI  2nd SvI  Total SvI
Dominic Thiem            60.6%    28.9%      50.8%
Roberto Bautista Agut    55.9%    32.5%      49.5%
Pablo Cuevas             57.9%    28.9%      47.8%
Richard Gasquet          56.0%    29.0%      47.5%
Novak Djokovic           56.0%    26.8%      47.3%
Andre Agassi             54.3%    31.4%      47.1%
Gilles Simon             55.7%    28.4%      46.7%
Kei Nishikori            52.2%    30.8%      45.2%
David Ferrer             46.9%    28.2%      41.0%
Rafael Nadal             42.8%    27.1%      38.8%
Diego Schwartzman        39.5%    25.8%      35.0%```

At the risk of belaboring the point, this table shows just how massive the difference is between the biggest servers and their opposites. Karlovic’s serve accounts for nearly three-quarters of his success on service points, while Schwartzman’s can be credited with barely one-third. Even those numbers don’t tell the whole story: Because Ivo’s game relies so much more on service games than Diego’s does, it means that 54% of Karlovic’s total points won–serve and return–are due to his serve, while only 20% of Schwartzman’s are.

We didn’t need a lengthy analysis to show us that the serve is important in men’s tennis, or that it represents a much bigger chunk of some players’ success than others. But now, instead of asserting a vague truism–the serve is a big deal–we can begin to understand just how much it influences results, and how much weak-serving players need to compensate just to stay even with their more powerful peers.

Just How Aggressive is Jelena Ostapenko?

If you picked up only two stats about surprise Roland Garros champion Jelena Ostapenko, you probably heard that, first, her average forehand is faster than Andy Murray’s, and second, she hit 299 winners in her seven French Open matches. I’m not yet sure how much emphasis we should put on shot speed, and I instinctively distrust raw totals, but even with those caveats, it’s hard not to be impressed.

Compared to the likes of Simona Halep, Timea Bacsinszky, and Caroline Wozniacki, the last three women she upset en route to her maiden title, Ostapenko was practically playing a different game. Her style is more reminiscent of fellow Slam winners Petra Kvitova and Maria Sharapova, who don’t construct points so much as they destruct them. What I’d like to know, then, is how Ostapenko stacks up against the most aggressive players on the WTA tour.

Thankfully we already have a metric for this: Aggression Score, which I’ll abbreviate as AGG. This stat requires that we know three things about every point: How many shots were hit, who won it, and how. With that data, we figure out what percentage of a player’s shots resulted in winners, unforced errors, or her opponent’s forced errors. (Technically, the denominator is “shot opportunities,” which includes shots a player didn’t manage to hit after her opponent hit a winner. That doesn’t affect the results too much.) For today’s purposes, I’m calculating AGG without a player’s serves–both aces and forced return errors–so we’re capturing only rally aggression.
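In code, the rally-only version described above comes down to a single ratio. This is a minimal sketch with hypothetical parameter names and invented counts; serves are assumed to have been excluded before the counts are passed in.

```python
def aggression_score(winners, unforced_errors, induced_forced_errors,
                     shot_opportunities):
    """Share of shot opportunities that ended the point, for better or worse."""
    if shot_opportunities <= 0:
        raise ValueError("shot_opportunities must be positive")
    point_ending = winners + unforced_errors + induced_forced_errors
    return point_ending / shot_opportunities


# Invented rally counts for one player in one match.
agg = aggression_score(winners=20, unforced_errors=25,
                       induced_forced_errors=15, shot_opportunities=200)
print(f"AGG = {agg:.3f}")
```

Note that unforced errors raise the score just as winners do: the stat measures how often a player tries to end the point, not how well it works.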

The typical range of this version of AGG is between 0.1–very passive–and 0.3–extremely aggressive. Based on the nearly 1,600 women’s matches in the Match Charting Project dataset, Kvitova and Julia Goerges represent the aggressive end, with average AGGs around .275. We only have four Samantha Crawford matches in the database, but early signs suggest she could outpace even those women, as her average is at .312. At the other end of the spectrum, Madison Brengle is at 0.11, with Wozniacki and Sara Errani at 0.12. In the Match Charting data, there are single-day performances that rise as high as 0.44 (Serena Williams over Errani at the 2013 French Open) and fall as low as 0.06. In the final against Ostapenko, Halep’s aggression score was 0.08, half of her average of 0.16.

Context established, let’s see where Ostapenko fits in, starting with the Roland Garros final. Against Halep, her AGG was a whopping .327. That’s third highest of any player in a major final, behind Kvitova at Wimbledon in 2014 (.344) and Serena at the 2007 Australian Open (.328). (We have data for every Grand Slam final back to 1999, and most of them before that.) Using data from IBM Pointstream, which encompasses almost all matches at Roland Garros this year, Ostapenko’s aggression in the final was 7th-highest of any match in the tournament–out of 188 player-matches with the necessary data–behind two showings from Bethanie Mattek Sands, one each from Goerges, Madison Keys, and Mirjana Lucic … and Ostapenko’s first-round win against Louisa Chirico. It was also the third-highest recorded against Halep out of more than 200 Simona matches in the Match Charting dataset.

You get the picture: The French Open final was a serious display of aggression, at least from one side of the court. That level of ball-bashing was nothing new for the Latvian, either. We have charting data for her last three matches at Roland Garros, along with two matches from Charleston and one from Prague this clay season. Of those six performances, Ostapenko’s lowest AGG was .275, against Wozniacki in the Paris quarters. Her average across the six was .303.

If those recent matches indicate what we’ll see from her in the future, she will likely score as the most aggressive rallying player on the WTA tour. Because she played less aggressively in her earlier matches on tour, her career average still trails those of Kvitova and Goerges, but not by much–and probably not for long. It’s scary to consider what might happen as she gets stronger; we’ll have to wait and see how her tactics evolve, as well.

The Match Charting Project contains at least 15 matches on 62 different players–here is the rally-only aggression score for all of them:

```PLAYER                    MATCHES  RALLY AGG
Julia Goerges                  15      0.277
Petra Kvitova                  57      0.277
Jelena Ostapenko               17      0.271
Madison Keys                   35      0.261
Camila Giorgi                  17      0.257
Sabine Lisicki                 19      0.246
Caroline Garcia                15      0.242
Coco Vandeweghe                17      0.238
Serena Williams               108      0.237
Laura Siegemund                19      0.235
Anastasia Pavlyuchenkova       17      0.230
Danka Kovinic                  15      0.223
Kristina Mladenovic            28      0.222
Na Li                          15      0.218
Maria Sharapova                73      0.217

PLAYER                    MATCHES  RALLY AGG
Eugenie Bouchard               52      0.214
Ana Ivanovic                   46      0.211
Garbine Muguruza               57      0.210
Lucie Safarova                 29      0.209
Karolina Pliskova              42      0.207
Elena Vesnina                  20      0.207
Venus Williams                 46      0.205
Johanna Konta                  31      0.205
Monica Puig                    15      0.203
Dominika Cibulkova             38      0.198
Martina Navratilova            25      0.197
Steffi Graf                    39      0.196
Anastasija Sevastova           17      0.194
Samantha Stosur                19      0.193
Sloane Stephens                15      0.190

PLAYER                    MATCHES  RALLY AGG
Ekaterina Makarova             23      0.189
Lauren Davis                   16      0.186
Heather Watson                 16      0.185
Daria Gavrilova                20      0.183
Justine Henin                  28      0.183
Kiki Bertens                   15      0.181
Monica Seles                   18      0.179
Svetlana Kuznetsova            28      0.174
Timea Bacsinszky               28      0.174
Victoria Azarenka              55      0.170
Andrea Petkovic                24      0.166
Roberta Vinci                  23      0.164
Barbora Strycova               16      0.163
Belinda Bencic                 31      0.163
Jelena Jankovic                24      0.162

PLAYER                    MATCHES  RALLY AGG
Alison Riske                   15      0.161
Angelique Kerber               83      0.161
Flavia Pennetta                23      0.160
Simona Halep                  218      0.160
Carla Suarez Navarro           31      0.159
Martina Hingis                 15      0.157
Chris Evert                    20      0.152
Darya Kasatkina                18      0.148
Elina Svitolina                46      0.141
Yulia Putintseva               15      0.137
Alize Cornet                   18      0.136
Agnieszka Radwanska            90      0.130
Annika Beck                    16      0.126
Monica Niculescu               25      0.124
Caroline Wozniacki             62      0.122
Sara Errani                    23      0.121```

(A few of the match counts differ slightly from what you’ll find on the MCP home page. I’ve thrown out a few matches with too much missing data or in formats that didn’t play nice with the script I wrote to calculate aggression score.)

3,000 Matches!

Last week, the Match Charting Project hit an exciting milestone: 3,000 matches!

The MCP has been logging shot-by-shot records of professional matches for about two and a half years now, and in doing so, we’ve built an open dataset unlike anything else in the tennis world. We have detailed records of at least one match from almost every player in the ATP and WTA top 200s, and extensive data on the top players of each tour. Altogether, we’ve tracked 450,000 points and over 1.7 million shots.

The research that could be conducted using this data is almost inexhaustible, and we’ve barely scraped the surface. My work on Federer’s new-and-improved backhand was just one example of what the Match Charting Project has made possible.

One of the most valuable aspects of the project last year was the addition–spearheaded by Edo–of nearly all men’s and women’s Grand Slam finals back to 1980. (We’re still missing a handful of them–if you can help us find video, we’d be very grateful!) This year, we’ve taken on another challenge: All of the head-to-heads of the ATP Big Four. Already, we’ve covered the 37 meetings of Federer and Nadal (through yesterday’s Miami final), and we’re near the 75% mark for the 216 total matches contested among these four all-time greats.

Meanwhile, we’re continuing to add a broad range of matches almost as soon as they happen, including over 20 each from Indian Wells and Miami, along with the occasional ITF and Challenger contest. While the data is skewed toward a handful of popular players, we’ve been careful to amass several matches for nearly every player of consequence on both tours.

If you’re interested in tennis analytics, I hope you’ll consider contributing to the project by charting matches. This data doesn’t magically collect itself, and like most volunteer-driven endeavors, a small number of contributors are responsible for a substantial percentage of the work. Even a single match is a useful addition, and the biggest risk you face is that you’ll get hooked.

Click here to find out how to get started.

Here’s to the next 3,000 matches!

The Federer Backhand That Finally Beat Nadal

Roger Federer and Rafael Nadal first met on court in 2004, and they contested their first Grand Slam final two years later. The head-to-head has long skewed in Rafa’s favor: Entering yesterday’s match, Nadal led 23-11, including 9-2 in majors. Nadal’s defense has usually trumped Roger’s offense, but after a five-set battle in yesterday’s Australian Open final, it was Federer who came out on top. Rafa’s signature topspin was less explosive than usual, and Federer’s extremely aggressive tactics took advantage of the fast conditions to generate one opportunity after another in the deciding fifth set.

In the past, Nadal’s topspin has been particularly damaging to Federer’s one-handed backhand, one of the most beautiful shots in the sport–but not the most effective. The last time the two players met in Melbourne, in a 2014 semifinal the Spaniard won in straight sets, Nadal hit 89 crosscourt forehands, shots that challenge Federer’s backhand, nearly three-quarters of them (66) in points he won. Yesterday, he hit 122 crosscourt forehands, fewer than half of them in points he won. Rafa’s tactics were similar, but instead of advancing easily, he came out on the losing side.

Federer’s backhand was unusually effective yesterday, especially compared to his other matches against Nadal. It wasn’t the only thing he did well, but as we’ll see, it accounted for more than the difference between the two players.

A metric I’ve devised called Backhand Potency (BHP) illustrates just how much better Fed executed with his one-hander. BHP approximates the number of points whose outcomes were affected by the backhand: add one point for a winner or an opponent’s forced error, subtract one for an unforced error, add a half-point for a backhand that set up a winner or opponent’s error on the following shot, and subtract a half-point for a backhand that set up a winning shot from the opponent. Divide by the total number of backhands, multiply by 100*, and the result is the net effect of each player’s backhand. Using shot-by-shot data from over 1,400 men’s matches logged by the Match Charting Project, we can calculate BHP for dozens of active players and many former stars.

* The average men’s match consists of approximately 125 backhands (excluding slices), while Federer and Nadal each hit over 200 in yesterday’s five-setter.
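The BHP recipe translates almost directly into code. Here is a minimal sketch; the function and parameter names are hypothetical, and the counts are invented for a match with a typical number of backhands.

```python
def backhand_potency(winners_and_forced, unforced_errors,
                     setups_for, setups_against, total_backhands):
    """Net backhand effect per 100 backhands.

    winners_and_forced: backhand winners plus opponent's forced errors (+1 each)
    unforced_errors:    backhand unforced errors (-1 each)
    setups_for:         backhands that set up a winner or error on the
                        player's following shot (+0.5 each)
    setups_against:     backhands that set up a winning shot for the
                        opponent (-0.5 each)
    """
    if total_backhands <= 0:
        raise ValueError("total_backhands must be positive")
    net = (winners_and_forced - unforced_errors
           + 0.5 * setups_for - 0.5 * setups_against)
    return 100 * net / total_backhands


# Invented counts for a single match.
bhp = backhand_potency(winners_and_forced=12, unforced_errors=10,
                       setups_for=8, setups_against=6, total_backhands=125)
print(f"BHP = {bhp:+.1f} per 100 backhands")
```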

By the BHP metric, Federer’s backhand is neutral: +0.2 points per 100 backhands. Fed wins most points with his serve and his forehand; a neutral BHP indicates that while his backhand isn’t doing the damage, at least it isn’t working against him. Nadal’s BHP is +1.7 per 100 backhands, a few ticks below those of Murray and Djokovic, whose BHPs are +2.6 and +2.5, respectively. Among the game’s current elite, Kei Nishikori sports the best BHP, at +3.6, while Andre Agassi’s was a whopping +5.0. At the other extreme, Marin Cilic’s is -2.9, Milos Raonic’s is -3.7, and Jack Sock’s is -6.6. Fortunately, you don’t have to hit very many backhands to shine in doubles.

BHP tells us just how much Federer’s backhand excelled yesterday: It rose to +7.8 per 100 shots, a better mark than Fed has ever posted against his rival. Here are his BHPs for every Slam meeting:

```Match       RF BHP
2006 RG      -11.2
2006 WIMB*    -3.4
2007 RG       -0.7
2007 WIMB*    -1.0
2008 RG      -10.1
2008 WIMB     -0.8
2009 AO        0.0
2011 RG       -3.7
2012 AO       -0.2
2014 AO       -9.9
2017 AO*      +7.8

* matches won by Federer
```

Yesterday’s rate of +7.8 per 100 shots equates to an advantage of +17 over the course of his 219 backhands. One unit of BHP is equivalent to about two-thirds of a point of match play, since BHP can award up to a combined 1.5 points for the two shots that set up and then finish a point. Thus, a +17 BHP accounts for about 11 points, exactly the difference between Federer and Nadal yesterday. Such a performance differs greatly from what Nadal has done to Fed’s backhand in the past: On average, Rafa has knocked his BHP down to -1.9, a bit more than Nadal’s effect on his typical opponent, which is a -1.7 point drop. In the 25 Federer-Nadal matches for which the Match Charting Project has data, Federer has only posted a positive BHP five times, and before yesterday’s match, none of those achievements came at a major.
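The conversion from a per-100 rate to points of match play can be checked in a few lines. The function name is hypothetical; the inputs are the figures quoted above (+7.8 per 100, 219 backhands, and about two-thirds of a point per unit of BHP).

```python
def bhp_to_points(bhp_per_100, backhands, points_per_unit=2 / 3):
    """Convert a per-100 BHP rate into raw BHP units and approximate points."""
    units = bhp_per_100 * backhands / 100
    return units, units * points_per_unit


units, points = bhp_to_points(bhp_per_100=7.8, backhands=219)
print(f"{units:.1f} BHP units, about {points:.1f} points")
```

Rounded to whole numbers, that reproduces the +17 and the 11 points cited in the text.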

The career-long trend suggests that, next time Federer and Nadal meet, the topspin-versus-backhand matchup will return to normal. The only previous time Federer recorded a +5 BHP or better against Nadal, at the 2007 Tour Finals, he followed it up by falling to -10.1 in their next match, at the 2008 French Open. He didn’t post another positive BHP until 2010, six matches later.

Outlier or not, Federer’s backhand performance yesterday changed history. Using the approximation provided by BHP, had Federer brought his neutral backhand, Nadal would have won 52% of the 289 points played–exactly his career average against the Swiss–instead of the 48% he actually won. The long-standing rivalry has required both players to improve their games for more than a decade, and at least for one day, Federer finally plugged the gap against the opponent who has frustrated him the most.
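That counterfactual follows from the point totals. The sketch below uses a hypothetical function name and assumes the swing is zero-sum: the roughly 11 points credited to Federer's backhand are moved to Nadal's column while the total stays at 289.

```python
def counterfactual_share(total_points, margin, swing):
    """Return the loser's share of points, actual and after a swing.

    margin: how many more points the winner won than the loser.
    swing:  points moved from the winner to the loser in the counterfactual.
    """
    loser_points = (total_points - margin) / 2
    actual = loser_points / total_points
    adjusted = (loser_points + swing) / total_points
    return actual, adjusted


actual, adjusted = counterfactual_share(total_points=289, margin=11, swing=11)
print(f"actual {actual:.0%}, with a neutral Federer backhand {adjusted:.0%}")
```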

Benchmarks for Shot-by-Shot Analysis

In my post last week, I outlined what the error stats of the future may look like. A wide range of advanced stats across different sports, from baseball to ice hockey–and increasingly in tennis–follow the same general algorithm:

1. Classify events (shots, opportunities, whatever) into categories;
2. Establish expected levels of performance–often league-average–for each category;
3. Compare players (or specific games or tournaments) to those expected levels.
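The three steps can be sketched generically as follows. Everything here is illustrative: the event tuples, the category label, and the player names are made up, and "success" stands for whatever outcome is being measured (an unforced error, a winner, and so on).

```python
from collections import defaultdict


def rates_by_category(events):
    """Steps 1 and 2: tally outcomes per category, tour-wide and per player.

    events: iterable of (player, category, success) tuples, where the
    category has already been assigned by some classification scheme.
    """
    tour = defaultdict(lambda: [0, 0])     # category -> [successes, attempts]
    players = defaultdict(lambda: [0, 0])  # (player, category) -> same
    for player, category, success in events:
        for tally in (tour[category], players[(player, category)]):
            tally[0] += int(success)
            tally[1] += 1
    tour_rates = {c: s / n for c, (s, n) in tour.items()}
    player_rates = {k: s / n for k, (s, n) in players.items()}
    return tour_rates, player_rates


def versus_expected(tour_rates, player_rates):
    """Step 3: each player's rate minus the tour-average rate."""
    return {(p, c): r - tour_rates[c] for (p, c), r in player_rates.items()}


# Toy data: did a cross-court forehand end in an unforced error?
events = (
    [("Player A", "cc_forehand", ufe) for ufe in (True, False, False, False)]
    + [("Player B", "cc_forehand", ufe) for ufe in (True, True, True, False)]
)
tour_rates, player_rates = rates_by_category(events)
diffs = versus_expected(tour_rates, player_rates)
print(diffs)
```

With real data, the tour average would come from all charted matches rather than from the players being compared, but the comparison step is the same.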

The first step is, by far, the most complex. Classification depends in large part on available data. In baseball, for example, the earliest fielding metrics of this type had little more to work with than the number of balls in play. Now, batted balls can be categorized by exact location, launch angle, speed off the bat, and more. Having more data doesn’t necessarily make the task any simpler, as there are so many potential classification methods one could use.

The same will be true in tennis, eventually, when Hawkeye data (or something similar) is publicly available. For now, those of us relying on public datasets still have plenty to work with, particularly the 1.6 million shots logged as part of the Match Charting Project.*

*The Match Charting Project is a crowd-sourced effort to track professional matches. Please help us improve tennis analytics by contributing to this one-of-a-kind dataset. Click here to find out how to get started.

The shot-coding method I adopted for the Match Charting Project makes step one of the algorithm relatively straightforward. MCP data classifies shots in two primary ways: type (forehand, backhand, backhand slice, forehand volley, etc.) and direction (down the middle, or to the right or left corner). While this approach omits many details (depth, speed, spin, etc.), it’s about as much data as we can expect a human coder to track in real-time.

For example, we could use the MCP data to find the ATP tour-average rate of unforced errors when a player tries to hit a cross-court forehand, then compare everyone on tour to that benchmark. Tour average is 10%, Novak Djokovic’s unforced error rate is 7%, and John Isner’s is 17%. Of course, that isn’t the whole picture when comparing the effectiveness of cross-court forehands: While the average ATPer hits 7% of his cross-court forehands for winners, Djokovic’s rate is only 6% compared to Isner’s 16%.

However, it’s necessary to take a wider perspective. Instead of shots, I believe it will be more valuable to investigate shot opportunities. That is, instead of asking what happens when a player is in position to hit a specific shot, we should be figuring out what happens when the player is presented with a chance to hit a shot in a certain part of the court.

This is particularly important if we want to get beyond the misleading distinction between forced and unforced errors. (As well as the line between errors and an opponent’s winners, which lie on the same continuum–winners are simply shots that were too good to allow a player to make a forced error.) In the Isner/Djokovic example above, our denominator was “forehands in a certain part of the court that the player had a reasonable chance of putting back in play”–that is, successful forehands plus forehand unforced errors. We aren’t comparing apples to apples here: Given the exact same opportunities, Djokovic is going to reach more balls, perhaps making unforced errors where we would call Isner’s mistakes forced errors.

Outcomes of opportunities

Let me clarify exactly what I mean by shot opportunities. They are defined by what a player’s opponent does, regardless of how the player himself manages to respond–or if he manages to get a racket on the ball at all. For instance, assuming a matchup between right-handers, here is a cross-court forehand:

Player A, at the top of the diagram, is hitting the shot, presenting player B with a shot opportunity. Here is one way of classifying the outcomes that could ensue, together with the abbreviations I’ll use for each in the charts below:

• player B fails to reach the ball, resulting in a winner for player A (vs W)
• player B reaches the ball, but commits a forced error (FE)
• player B commits an unforced error (UFE)
• player B puts the ball back in play, but goes on to lose the point (ip-L)
• player B puts the ball back in play, presents player A with a “makeable” shot, and goes on to win the point (ip-W)
• player B causes player A to commit a forced error (ind FE)
• player B hits a winner (W)
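One way to turn the seven outcomes listed above into numbers is to tally their shares for a set of opportunities and add up the ones that player B goes on to win. This is a sketch only; the function names and the sample list are hypothetical, and the outcome labels reuse the abbreviations from the list.

```python
from collections import Counter

# The seven outcomes, in the order listed above.
OUTCOMES = ["vs W", "FE", "UFE", "ip-L", "ip-W", "ind FE", "W"]
# Outcomes in which player B ends up winning the point.
POINT_WON = {"ip-W", "ind FE", "W"}


def outcome_shares(outcomes):
    """Return each outcome's share of all opportunities."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no opportunities to tally")
    return {o: counts[o] / total for o in OUTCOMES}


def win_rate(shares):
    """Share of opportunities that player B converts into points won."""
    return sum(shares[o] for o in POINT_WON)


# Ten invented opportunities.
sample = ["vs W", "FE", "UFE", "ip-L", "ip-L",
          "ip-W", "ip-W", "ip-W", "ind FE", "W"]
shares = outcome_shares(sample)
print(shares)
print(f"points won: {win_rate(shares):.0%}")
```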

As always, for any given denominator, we could devise different categories, perhaps combining forced and unforced errors into one, or further classifying the “in play” categories to identify whether the player is setting himself up to quickly end the point. We might also look at different categories altogether, like shot selection.

In any case, the categories above give us a good general idea of how players respond to different opportunities, and how those opportunities differ from each other. The following chart shows–to adopt the language of the example above–player B’s outcomes based on player A’s shots, categorized only by shot type:

The outcomes are stacked from worst to best. At the bottom is the percentage of opponent winners (vs W)–opportunities where the player we’re interested in didn’t even make contact with the ball. At the top is the percentage of winners (W) that our player hit in response to the opportunity. As we’d expect, forehands present the most difficult opportunities: 5.7% of them go for winners and another 4.6% result in forced errors. Players are able to convert those opportunities into points won only 42.3% of the time, compared to 46.3% when facing a backhand, 52.5% when facing a backhand slice (or chip), and 56.3% when facing a forehand slice.

The above chart is based on about 374,000 shots: All the baseline opportunities that arose (that is, excluding serves, which need to be treated separately) in over 1,000 logged matches between two righties. Of course, there are plenty of important variables to further distinguish those shots, beyond simply categorizing by shot type. Here are the outcomes of shot opportunities at various stages of the rally when the player’s opponent hits a forehand:

The leftmost column can be seen as the results of “opportunities to hit a third shot”–that is, outcomes when the serve return is a forehand. Once again, the numbers are in line with what we would expect: The best time to hit a winner off a forehand is on the third shot–the “serve-plus-one” tactic. We can see that in another way in the next column, representing opportunities to hit a fourth shot. If your opponent hits a forehand in play for his serve-plus-one shot, there’s a 10% chance you won’t even be able to get a racket on it. The average player’s chances of winning the point from that position are only 38.4%.

Beyond the 3rd and 4th shot, I’ve divided opportunities into those faced by the server (5th shot, 7th shot, and so on) and those faced by the returner (6th, 8th, etc.). As you can see, by the 5th shot, there isn’t much of a difference, at least not when facing a forehand.

Let’s look at one more chart: Outcomes of opportunities when the opponent hits a forehand in various directions. (Again, we’re only looking at righty-righty matchups.)

There’s very little difference between the two corners, and it’s clear that it’s more difficult to make good on a shot opportunity in either corner than it is from the middle. It’s interesting to note here that, when faced with a forehand that lands in play–regardless of where it is aimed–the average player has less than a 50% chance of winning the point. This is a confusing instance of selection bias that crops up occasionally in tennis analytics: Because a significant percentage of shots are errors, the player who just placed a shot in the court has a temporary advantage.

Next steps

If you’re wondering what the point of all of this is, I understand. (And I appreciate you getting this far despite your reservations.) Until we drill down to much more specific situations–and maybe even then–these tour averages are no more than curiosities. It doesn’t exactly turn the analytics world upside down to show that forehands are more effective than backhand slices, or that hitting to the corners is more effective than hitting down the middle.

These averages are ultimately only tools to better quantify the accomplishments of specific players. As I continue to explore this type of algorithm, combined with the growing Match Charting Project dataset, we’ll learn a lot more about the characteristics of the world’s best players, and what makes some so much more effective than others.

The Match Charting Project, 2017 Update

2016 was a great year for the Match Charting Project (MCP), my crowdsourced effort to improve the state of tennis statistics. Many new contributors joined the project, the data played a part in more research than ever, and best of all, we added over 1,000 new matches to the database.

For those who don’t know, the MCP is a volunteer effort from dozens of devoted tennis fans to collect shot-by-shot data for professional matches. The resulting data is vastly more detailed than anything else available to the public. You can find an extremely in-depth report on every match in the database–for example, here’s the 2016 Singapore final–as well as an equally detailed report on every player with more than one charted match. Here’s Andy Murray.

In 2016, we:

• added 1,145 new matches to the database, more than in any previous year;
• charted more WTA than ATP matches, bringing women’s tennis to near parity in the project;
• nearly completed the set of charted Grand Slam finals back to 1980;
• filled in the gaps to have at least one charted match of every member of the ATP top 200, and 198 of the WTA top 200;
• reached double digits in charted matches for every player in the ATP top 49 (sorry, Florian Mayer, we’re working on it!) and the WTA top 58;
• logged over 174,000 points and nearly 700,000 shots.

I believe 2017 can be even better. To make that happen, we could really use your help. As with most projects of this nature, a small number of contributors do the bulk of the work, and the MCP is no different–Isaac and Edo both charted more than 200 matches last year.

There are plenty of reasons to contribute: It will make you a more knowledgeable tennis fan, it will help add to the sum of human knowledge, and it can even be fun. Click here to find out how to get started.

I’m proud of the work we’ve done so far, and I hope that the first 2,700 matches are only the beginning.