The Federer Backhand That Finally Beat Nadal

Italian translation at settesei.it

Roger Federer and Rafael Nadal first met on court in 2004, and they contested their first Grand Slam final two years later. The head-to-head has long skewed in Rafa’s favor: Entering yesterday’s match, Nadal led 23-11, including 9-2 in majors. Nadal’s defense has usually trumped Roger’s offense, but after a five-set battle in yesterday’s Australian Open final, it was Federer who came out on top. Rafa’s signature topspin was less explosive than usual, and Federer’s extremely aggressive tactics took advantage of the fast conditions to generate one opportunity after another in the deciding fifth set.

In the past, Nadal’s topspin has been particularly damaging to Federer’s one-handed backhand, one of the most beautiful shots in the sport–but not the most effective. The last time the two players met in Melbourne, in a 2014 semifinal the Spaniard won in straight sets, Nadal hit 89 crosscourt forehands, shots that challenge Federer’s backhand, nearly three-quarters of them (66) in points he won. Yesterday, he hit 122 crosscourt forehands, less than half of them in points he won. Rafa’s tactics were similar, but instead of advancing easily, he came out on the losing side.

Federer’s backhand was unusually effective yesterday, especially compared to his other matches against Nadal. It wasn’t the only thing he did well, but as we’ll see, it accounted for more than the difference between the two players.

A metric I’ve devised called Backhand Potency (BHP) illustrates just how much better Fed executed with his one-hander. BHP approximates the number of points whose outcomes were affected by the backhand: add one point for a winner or an opponent’s forced error, subtract one for an unforced error, add a half-point for a backhand that set up a winner or opponent’s error on the following shot, and subtract a half-point for a backhand that set up a winning shot from the opponent. Divide by the total number of backhands, multiply by 100*, and the result is the net effect of each player’s backhand. Using shot-by-shot data from over 1,400 men’s matches logged by the Match Charting Project, we can calculate BHP for dozens of active players and many former stars.

* The average men’s match consists of approximately 125 backhands (excluding slices), while Federer and Nadal each hit over 200 in yesterday’s five-setter.
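
To make the definition concrete, here is a minimal Python sketch of the BHP calculation. The outcome labels and weights mirror the description above; the input format is a simplified stand-in, not the Match Charting Project’s actual shot encoding.

    # Weights per backhand, per the BHP definition above. The outcome
    # labels are illustrative, not the MCP's real codes.
    WEIGHTS = {
        "winner": 1.0,           # backhand winner
        "induced_error": 1.0,    # opponent's forced error
        "unforced_error": -1.0,  # backhand unforced error
        "setup_win": 0.5,        # set up a winner/error on the next shot
        "setup_loss": -0.5,      # set up the opponent's winning shot
        "neutral": 0.0,          # rally continued with no immediate effect
    }

    def backhand_potency(backhands):
        """BHP: net points per 100 backhands (slices excluded upstream)."""
        if not backhands:
            return 0.0
        total = sum(WEIGHTS[outcome] for outcome in backhands)
        return 100 * total / len(backhands)

    # Toy example: 200 backhands, mostly neutral, a few point-enders.
    shots = (["winner"] * 20 + ["unforced_error"] * 18 + ["setup_win"] * 10
             + ["setup_loss"] * 8 + ["neutral"] * 144)
    print(round(backhand_potency(shots), 1))  # +1.5 per 100 backhands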

By the BHP metric, Federer’s backhand is neutral: +0.2 points per 100 backhands. Fed wins most points with his serve and his forehand; a neutral BHP indicates that while his backhand isn’t doing the damage, at least it isn’t working against him. Nadal’s BHP is +1.7 per 100 backhands, a few ticks below those of Murray and Djokovic, whose BHPs are +2.6 and +2.5, respectively. Among the game’s current elite, Kei Nishikori sports the best BHP, at +3.6, while Andre Agassi’s was a whopping +5.0. At the other extreme, Marin Cilic’s is -2.9, Milos Raonic’s is -3.7, and Jack Sock’s is -6.6. Fortunately, you don’t have to hit very many backhands to shine in doubles.

BHP tells us just how much Federer’s backhand excelled yesterday: It rose to +7.8 per 100 shots, a better mark than Fed has ever posted against his rival. Here are his BHPs for every Slam meeting:

Match       RF BHP  
2006 RG      -11.2  
2006 WIMB*    -3.4  
2007 RG       -0.7  
2007 WIMB*    -1.0  
2008 RG      -10.1  
2008 WIMB     -0.8  
2009 AO        0.0  
2011 RG       -3.7  
2012 AO       -0.2  
2014 AO       -9.9  
2017 AO*      +7.8 

* matches won by Federer

Yesterday’s rate of +7.8 per 100 shots equates to an advantage of +17 over the course of his 219 backhands. One unit of BHP is equivalent to about two-thirds of a point of match play, since BHP can award up to a combined 1.5 points for the two shots that set up and then finish a point. Thus, a +17 BHP accounts for about 11 points, exactly the difference between Federer and Nadal yesterday. Such a performance differs greatly from what Nadal has done to Fed’s backhand in the past: On average, Rafa has knocked Fed’s BHP down to -1.9–a 2.1-point drop from his +0.2 baseline, and a bit more than the -1.7 point drop Nadal inflicts on his typical opponent. In the 25 Federer-Nadal matches for which the Match Charting Project has data, Federer has posted a positive BHP only five times, and before yesterday’s match, none of those achievements came at a major.
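
Spelled out as arithmetic (a rough conversion, using the two-thirds rule of thumb above):

    bhp_rate = 7.8                            # BHP per 100 backhands
    backhands = 219
    units = bhp_rate * backhands / 100        # units of BHP over the match
    points = units * (2 / 3)                  # one unit ~ two-thirds of a point
    print(round(units, 1), round(points, 1))  # 17.1 11.4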

The career-long trend suggests that, next time Federer and Nadal meet, the topspin-versus-backhand matchup will return to normal. The only previous time Federer recorded a +5 BHP or better against Nadal, at the 2007 Tour Finals, he followed it up by falling to -10.1 in their next match, at the 2008 French Open. He didn’t post another positive BHP until 2010, six matches later.

Outlier or not, Federer’s backhand performance yesterday changed history.  Using the approximation provided by BHP, had Federer brought his neutral backhand, Nadal would have won 52% of the 289 points played—exactly his career average against the Swiss—instead of the 48% he actually won. The long-standing rivalry has required both players to improve their games for more than a decade, and at least for one day, Federer finally plugged the gap against the opponent who has frustrated him the most.

Benchmarks for Shot-by-Shot Analysis

Italian translation at settesei.it

In my post last week, I outlined what the error stats of the future may look like. A wide range of advanced stats across different sports, from baseball to ice hockey–and increasingly in tennis–follow the same general algorithm:

  1. Classify events (shots, opportunities, whatever) into categories;
  2. Establish expected levels of performance–often league-average–for each category;
  3. Compare players (or specific games or tournaments) to those expected levels.
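
Here’s a minimal Python sketch of that three-step loop; the category scheme and the toy numbers below are invented for illustration.

    from collections import defaultdict

    # Step 1: each shot is classified by a category key, e.g. (type, direction).
    # Step 2: tour-wide success rates per category serve as the expected level.
    # Step 3: a player's actual results are compared to those expectations.

    def benchmark(tour_shots, player_shots, success):
        """tour_shots, player_shots: lists of (category, outcome) pairs.
        success: function mapping an outcome to True/False."""
        totals, wins = defaultdict(int), defaultdict(int)
        for cat, outcome in tour_shots:
            totals[cat] += 1
            wins[cat] += success(outcome)
        expected = actual = n = 0
        for cat, outcome in player_shots:
            if cat in totals:                  # skip categories with no baseline
                expected += wins[cat] / totals[cat]
                actual += success(outcome)
                n += 1
        return (actual - expected) / n if n else 0.0  # per-shot edge vs. tour

    # Toy data: the tour keeps 90% of crosscourt forehands in play;
    # our player keeps 95% in play, a +0.05 per-shot edge.
    cat = ("FH", "crosscourt")
    tour = [(cat, "in_play")] * 90 + [(cat, "ufe")] * 10
    player = [(cat, "in_play")] * 19 + [(cat, "ufe")] * 1
    print(benchmark(tour, player, lambda o: o != "ufe"))  # 0.05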

The first step is, by far, the most complex. Classification depends in large part on available data. In baseball, for example, the earliest fielding metrics of this type had little more to work with than the number of balls in play. Now, batted balls can be categorized by exact location, launch angle, speed off the bat, and more. Having more data doesn’t necessarily make the task any simpler, as there are so many potential classification methods one could use.

The same will be true in tennis, eventually, when Hawkeye data (or something similar) is publicly available. For now, those of us relying on public datasets still have plenty to work with, particularly the 1.6 million shots logged as part of the Match Charting Project.*

*The Match Charting Project is a crowd-sourced effort to track professional matches. Please help us improve tennis analytics by contributing to this one-of-a-kind dataset. Click here to find out how to get started.

The shot-coding method I adopted for the Match Charting Project makes step one of the algorithm relatively straightforward. MCP data classifies shots in two primary ways: type (forehand, backhand, backhand slice, forehand volley, etc.) and direction (down the middle, or to the right or left corner). While this approach omits many details (depth, speed, spin, etc.), it’s about as much data as we can expect a human coder to track in real-time.

For example, we could use the MCP data to find the ATP tour-average rate of unforced errors when a player tries to hit a cross-court forehand, then compare everyone on tour to that benchmark. Tour average is 10%, Novak Djokovic’s unforced error rate is 7%, and John Isner’s is 17%. Of course, that isn’t the whole picture when comparing the effectiveness of cross-court forehands: While the average ATPer hits 7% of his cross-court forehands for winners, Djokovic’s rate is only 6% compared to Isner’s 16%.

However, it’s necessary to take a wider perspective. Instead of shots, I believe it will be more valuable to investigate shot opportunities. That is, instead of asking what happens when a player is in position to hit a specific shot, we should be figuring out what happens when the player is presented with a chance to hit a shot in a certain part of the court.

This is particularly important if we want to get beyond the misleading distinction between forced and unforced errors. (As well as the line between errors and an opponent’s winners, which lie on the same continuum–winners are simply shots that were too good to allow a player to make a forced error.) In the Isner/Djokovic example above, our denominator was “forehands in a certain part of the court that the player had a reasonable chance of putting back in play”–that is, successful forehands plus forehand unforced errors. We aren’t comparing apples to apples here: Given the exact same opportunities, Djokovic is going to reach more balls, perhaps making unforced errors where we would call Isner’s mistakes forced errors.

Outcomes of opportunities

Let me clarify exactly what I mean by shot opportunities. They are defined by what a player’s opponent does, regardless of how the player himself manages to respond–or if he manages to get a racket on the ball at all. For instance, assuming a matchup between right-handers, here is a cross-court forehand:

illustration of a shot opportunity

Player A, at the top of the diagram, is hitting the shot, presenting player B with a shot opportunity. Here is one way of classifying the outcomes that could ensue, together with the abbreviations I’ll use for each in the charts below:

  • player B fails to reach the ball, resulting in a winner for player A (vs W)
  • player B reaches the ball, but commits a forced error (FE)
  • player B commits an unforced error (UFE)
  • player B puts the ball back in play, but goes on to lose the point (ip-L)
  • player B puts the ball back in play, presents player A with a “makeable” shot, and goes on to win the point (ip-W)
  • player B causes player A to commit a forced error (ind FE)
  • player B hits a winner (W)
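
One way to code this classification, from player B’s perspective, is sketched below; the argument names are hypothetical simplifications of charted data, not the Match Charting Project’s real shot codes.

    def classify_opportunity(touched, b_error, error_type, b_shot, won_point):
        """Return one of the seven outcome categories listed above."""
        if not touched:
            return "vs W"        # player A's shot was an outright winner
        if b_error:
            return "FE" if error_type == "forced" else "UFE"
        if b_shot == "winner":
            return "W"
        if b_shot == "induced_forced":
            return "ind FE"      # B's reply forced an error from player A
        return "ip-W" if won_point else "ip-L"  # ball back in play

    print(classify_opportunity(True, False, None, "in_play", True))  # ip-W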

As always, for any given denominator, we could devise different categories, perhaps combining forced and unforced errors into one, or further classifying the “in play” categories to identify whether the player is setting himself up to quickly end the point. We might also look at different categories altogether, like shot selection.

In any case, the categories above give us a good general idea of how players respond to different opportunities, and how those opportunities differ from each other. The following chart shows–to adopt the language of the example above–player B’s outcomes based on player A’s shots, categorized only by shot type:

Outcomes of opportunities by shot type

The outcomes are stacked from worst to best. At the bottom is the percentage of opponent winners (vs W)–opportunities where the player we’re interested in didn’t even make contact with the ball. At the top is the percentage of winners (W) that our player hit in response to the opportunity. As we’d expect, forehands present the most difficult opportunities: 5.7% of them go for winners and another 4.6% result in forced errors. Players are able to convert those opportunities into points won only 42.3% of the time, compared to 46.3% when facing a backhand, 52.5% when facing a backhand slice (or chip), and 56.3% when facing a forehand slice.

The above chart is based on about 374,000 shots: All the baseline opportunities that arose (that is, excluding serves, which need to be treated separately) in over 1,000 logged matches between two righties. Of course, there are plenty of important variables to further distinguish those shots, beyond simply categorizing by shot type. Here are the outcomes of shot opportunities at various stages of the rally when the player’s opponent hits a forehand:

Outcomes of forehand responses based on number of shots

The leftmost column can be seen as the results of “opportunities to hit a third shot”–that is, outcomes when the serve return is a forehand. Once again, the numbers are in line with what we would expect: The best time to hit a winner off a forehand is on the third shot–the “serve-plus-one” tactic. We can see that in another way in the next column, representing opportunities to hit a fourth shot. If your opponent hits a forehand in play for his serve-plus-one shot, there’s a 10% chance you won’t even be able to get a racket on it. The average player’s chances of winning the point from that position are only 38.4%.

Beyond the 3rd and 4th shot, I’ve divided opportunities into those faced by the server (5th shot, 7th shot, and so on) and those faced by the returner (6th, 8th, etc.). As you can see, by the 5th shot, there isn’t much of a difference, at least not when facing a forehand.

Let’s look at one more chart: Outcomes of opportunities when the opponent hits a forehand in various directions. (Again, we’re only looking at righty-righty matchups.)

Outcomes of forehand responses based on shot direction

There’s very little difference between the two corners, and it’s clear that it’s more difficult to make good on a shot opportunity in either corner than it is from the middle. It’s interesting to note here that, when faced with a forehand that lands in play–regardless of where it is aimed–the average player has less than a 50% chance of winning the point. This is a confusing instance of selection bias that crops up occasionally in tennis analytics: Because a significant percentage of shots are errors, the player who just placed a shot in the court has a temporary advantage.

Next steps

If you’re wondering what the point of all of this is, I understand. (And I appreciate you getting this far despite your reservations.) Until we drill down to much more specific situations–and maybe even then–these tour averages are no more than curiosities. It doesn’t exactly turn the analytics world upside down to show that forehands are more effective than backhand slices, or that hitting to the corners is more effective than hitting down the middle.

These averages are ultimately only tools to better quantify the accomplishments of specific players. As I continue to explore this type of algorithm, combined with the growing Match Charting Project dataset, we’ll learn a lot more about the characteristics of the world’s best players, and what makes some so much more effective than others.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s natural to ask which of these models performs better and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models–FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market–for the 2016 US Open. The first three models are based on Elo. For inferring forecasts from the ATP rankings, we use a specific formula¹, and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities based on the offered odds (minus the overround)².
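
For the curious, here is a sketch of both conversions in Python: the rankings formula from footnote 1, and implied probabilities from decimal odds with the overround removed by proportional normalization (one common method; bookmakers’ margins can be stripped in other ways).

    def p_from_rankings(a_points, b_points, e=0.85):
        """Footnote 1: P(A beats B) from ATP ranking points."""
        return a_points**e / (a_points**e + b_points**e)

    def p_from_odds(odds_a, odds_b):
        """Implied P(A wins) from decimal odds; the overround disappears
        when the raw inverse odds are normalized to sum to one."""
        raw_a, raw_b = 1 / odds_a, 1 / odds_b
        return raw_a / (raw_a + raw_b)

    print(round(p_from_rankings(4000, 2000), 3))  # 0.643
    print(round(p_from_odds(1.50, 2.75), 3))      # 0.647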

Next, we simply compare forecasts with reality for each model, asking: If player A was predicted to be the winner ($latex P(a) > 0.5$), did he really win the match? When we do that for each match and each model (ignoring retirements and walkovers), we come up with the following results.

Model		% correct
Pinnacle	76.92%
538		75.21%
TA		74.36%
ATP		72.65%
Riles		70.09%

What we see here is what percentage of each model’s predictions turned out to be right. The betting model (based on Pinnacle’s odds) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP rankings. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.

However, just looking at the percentage of correctly called matches does not tell the whole story. There are more granular metrics for investigating the performance of a prediction model: Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, 70% forecasts should come true in exactly 70% of cases. Resolution measures how much the forecasts differ from the overall average. The rationale here is that simply forecasting the expected average value every time will produce a reasonably well-calibrated set of predictions; however, it will not be as useful as a method that achieves the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) the forecasts, the better.

In the following table, we sort the predictions into bins by forecast probability and show what percentage of the predictions in each bin came true. This also enables us to calculate Calibration and Resolution measures for each model.

Model    50-59%  60-69%  70-79%  80-89%  90-100% Cal  Res   Brier
538      53%     61%     85%     80%     91%     .003 .082  .171
TA       56%     75%     78%     74%     90%     .003 .072  .182
Riles    56%     86%     81%     63%     67%     .017 .056  .211
ATP      50%     73%     77%     84%     100%    .003 .068  .185
Pinnacle 52%     91%     71%     77%     95%     .015 .093  .172

As we can see, the predictions are not always in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that only 67% of the Riles model’s 90-100% forecasts were correct, can be explained by small sample size (only three forecasts in that case). However, two cases with larger samples–the 60-69% bins for Riles and Pinnacle–raised my interest. Both models seem to be strongly (and statistically significantly) underconfident with their 60-69% predictions. In other words, these probabilities should have been higher, because these forecasts actually came true 86% and 91% of the time, respectively.³ For betting aficionados, the fact that Pinnacle underestimates the favorites here may be particularly interesting, because it could reveal some value, as punters would say. For the Riles model, this could be a starting point for tweaking.

The last three columns show Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better). The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single measure of prediction accuracy. The models of FiveThirtyEight and Pinnacle perform essentially equally well on this subset of data. Then there is a slight gap until the Tennis Abstract and ATP ranking models come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, and hence ranks fifth in this analysis.
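
For completeness, here is a sketch of how those three quantities relate, via the standard Murphy decomposition of the Brier score. The uniform binning is a simplification, so the identity only holds approximately.

    import numpy as np

    def brier_decomposition(forecasts, outcomes, n_bins=5):
        """Murphy decomposition: Brier = Calibration - Resolution + Uncertainty.
        (Calibration is often called reliability; lower is better.
        Resolution is higher-better.) The identity is exact only when
        each bin contains a single forecast value."""
        f = np.asarray(forecasts, dtype=float)
        o = np.asarray(outcomes, dtype=float)
        base = o.mean()
        bins = np.minimum((f * n_bins).astype(int), n_bins - 1)
        cal = res = 0.0
        for k in range(n_bins):
            mask = bins == k
            if mask.any():
                w = mask.mean()   # share of forecasts falling in this bin
                cal += w * (f[mask].mean() - o[mask].mean()) ** 2
                res += w * (o[mask].mean() - base) ** 2
        brier = np.mean((f - o) ** 2)
        return brier, cal, res, base * (1 - base)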

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If the forecast lines are above the black line, it means that forecasts are underconfident; in the opposite case, forecasts are overconfident. Given that we only investigated one tournament and therefore had to work with a small sample (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job of avoiding overconfidence, even though it is known to be outperformed by Elo in terms of prediction accuracy.
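
For readers who want to build one of these diagrams, here is a short sketch; the forecasts and outcomes below are synthetic toy data (deliberately well calibrated), standing in for a model’s real 117 predictions.

    import numpy as np
    import matplotlib.pyplot as plt

    def reliability_curve(forecasts, outcomes, n_bins=5):
        """Average forecast vs. observed win rate, per forecast bin."""
        f, o = np.asarray(forecasts, float), np.asarray(outcomes, float)
        bins = np.minimum((f * n_bins).astype(int), n_bins - 1)
        pts = [(f[bins == k].mean(), o[bins == k].mean())
               for k in range(n_bins) if (bins == k).any()]
        return zip(*pts)

    rng = np.random.default_rng(7)
    fcasts = rng.uniform(0.5, 1.0, 117)   # favorites' forecast probabilities
    results = rng.random(117) < fcasts    # toy outcomes, calibrated by design
    xs, ys = reliability_curve(fcasts, results)
    plt.plot([0.5, 1.0], [0.5, 1.0], "k-", label="perfect calibration")
    plt.plot(list(xs), list(ys), "o-", label="toy model")
    plt.xlabel("forecast probability")
    plt.ylabel("observed win rate")
    plt.legend()
    plt.show()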

To sum up, this analysis shows how different predictive models for tennis can be compared with one another in a meaningful way. Moreover, I hope I was able to show some of the areas where each model does well and where it falls short. Obviously, this investigation could go into much more detail, for example by comparing how the models fare for different kinds of players (e.g., based on ranking), different surfaces, and so on. That is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. $latex P(a) = a^e / (a^e + b^e) $ where $latex a $ is the number of player A’s ranking points, $latex b $ is the number of player B’s ranking points, and $latex e $ is a constant. We use $latex e = 0.85 $ for ATP men’s singles.

2. The betting market in itself is not really a model; the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making the market a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also-underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turns out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where American bettors were likely to be confident about the American player Jared Donaldson and hence place bets on him. As a consequence, to balance the book, Pinnacle lowered the odds on Donaldson, which resulted in higher odds (and a lower implied probability) for Karlovic.

The Continuum of Errors

Italian translation at settesei.it

When is an error unforced? If you envision designing an algorithm to answer that question, it quickly becomes unmanageable. You’d need to take into account player position, shot velocity, angle, and spin, surface speed, and perhaps more. Many errors are obviously forced or unforced, but plenty fall into an ambiguous middle ground.

Most of the unforced error counts we see these days–via broadcasts or in post-match recaps–are counted by hand. A scorer is given some guidance, and he or she tallies each kind of error. If the human-scoring algorithm is boiled down to a single rule, it’s something like: “Would a typical pro be expected to make that shot?” Some scorers limit the number of unforced errors by always counting serve returns, or net shots, or attempted passing shots, as forced.

Of course, any attempt to sort missed shots into only two buckets is a gross oversimplification. I don’t think this is a radical viewpoint. Many tennis commentators acknowledge this when they explain that a player’s unforced error count “doesn’t tell the whole story,” or something to that effect. In the past, I’ve written about the limitations of the frequently-cited winner-to-unforced error ratio, and the similarity between unforced errors and the rightly-maligned fielding errors stat in baseball.

Imagine for a moment that we have better data to work with–say, Hawkeye data that isn’t locked in silos–and we can sketch out an improved way of looking at errors.

First, instead of classifying only errors, it’s more instructive to sort potential shots into three categories: shots returned in play, errors (which we can further distinguish later on), and opponent winners. In other words: Did you make it, did you miss it, or did you fail to even get a racket on it? One man’s forced error is another man’s ball put back in play*, so we need to consider the full range of possible outcomes from each potential shot.

*especially if the first man is Bernard Tomic and the other man is Andy Murray.

The key to gaining insight from tennis statistics is increasing the amount of context available–for instance, taking a player’s stats from today and comparing them to the typical performance of a tour player, or contrasting them with how he or she played in the last similar matchup. Errors are no different.

Here’s a basic example. In the sixth game of Angelique Kerber’s match in Sydney this week against Darya Kasatkina, Kerber hit a down-the-line forehand:

Kerber hits a down-the-line forehand

Thanks to the Match Charting Project, we have data for about 350 of Kerber’s down-the-line forehands, so we know it goes for a winner 25% of the time, and her opponent hits a forced error another 9% of the time. Say that a further 11% of replies are unforced errors, and we have a profile for what usually happens when Kerber goes down the line: 25% winners, 20% opponent errors, 55% put back in play. We might dig even deeper and establish that the 55% put back in play consists of 30% that ultimately resulted in Kerber winning the point against 25% that she eventually lost.

In this case, Kasatkina was able to get a racket on the ball, but missed the shot, resulting in what most scorers would agree was a forced error:

Kasatkina lunges for the return

This single instance–Kasatkina hitting a forced error against a very effective type of offensive shot–doesn’t tell us anything on its own. Imagine, though, that we tracked several players in 100 attempts each to reply to a Kerber down-the-line forehand. We might discover that Kasatkina lets 35 of 100 go for winners, or that Simona Halep lets only 15 go for winners and gets 70 back in play, or that Anastasia Pavlyuchenkova hits an error on 30 of the 100 attempts.

My point is this: With more granular data, we can put errors in a real-life context. Instead of making a judgment about the difficulty of a certain shot (or relying on a scorer to do so), it’s feasible to let an algorithm do the work on 100 shots, telling us whether a player is getting to more balls than the average player, or making more errors than she usually does.

The continuum, and the future

In the example outlined above, there are a lot of important details that I didn’t mention. In comparing Kasatkina’s error to a few hundred other down-the-line Kerber forehands, we don’t know whether the shot was harder than usual, whether it was placed more accurately in the corner, whether Kasatkina was in better position than Kerber’s typical opponent on that type of shot, or the speed of the surface. Over the course of 100 down-the-line forehands, those factors would probably even out. But in Tuesday’s match, Kerber hit only 18 of them. While a typical best-of-three match will give us a few hundred shots to work with, this level of analysis can only tell us so much about specific shots.

The ideal error-classifying algorithm of the future would do much better. It would take all of the variables I’ve mentioned (and more, undoubtedly) and, for any shot, calculate the likelihood of different outcomes. At the moment of the first image above, when the ball has just come off of Kerber’s racket, with Kasatkina on the wrong half of the baseline, we might estimate that there is a 35% chance of a winner, a 25% chance of an error, and a 40% chance that ball is returned in play. Depending on the type of analysis we’re doing, we could calculate those numbers for the average WTA player, or for Kasatkina herself.

Those estimates would allow us, in effect, to “rate” errors. In this example, the algorithm gives Kasatkina only a 40% chance of getting the ball back in play. By contrast, an average rallying shot probably has a 90% chance of ending up back in play. Instead of placing errors in buckets of “forced” and “unforced,” we could draw lines wherever we wish, perhaps separating potential shots into quintiles. We would be able to quantify whether, for instance, Andy Murray gets more of the most unreturnable shots back in play than Novak Djokovic does. Even if we have an intuition about that already, we can’t even begin to prove it until we’ve established precisely what that “unreturnable” quintile (or quartile, or whatever) consists of.
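
Nothing about such an algorithm is exotic from a modeling standpoint: given shot-level features, any multiclass classifier can emit these outcome probabilities. Here is a toy sketch using scikit-learn, with invented features and synthetic labels standing in for the much richer Hawkeye inputs.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Invented shot-level features: incoming speed (kph), distance the
    # receiver must cover (m), and angle off the center line (degrees).
    X = np.column_stack([
        rng.normal(120, 15, 1000),
        rng.uniform(0, 6, 1000),
        rng.uniform(0, 30, 1000),
    ])
    # Synthetic labels: 0 = returned in play, 1 = error, 2 = clean winner.
    # Harder incoming shots (faster, farther away) produce worse outcomes.
    difficulty = 0.02 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1, 1000)
    y = np.digitize(difficulty, [5.5, 7.0])

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Estimated outcome probabilities for one shot: 130 kph, 4 m away, sharp angle.
    p_in_play, p_error, p_winner = model.predict_proba([[130, 4.0, 25]])[0]
    print(round(p_in_play, 2), round(p_error, 2), round(p_winner, 2))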

This sort of analysis would be engaging even for those fans who never look at aggregate stats. Imagine if a broadcaster could point to a specific shot and say that Murray had only a 2% chance of putting it back in play. In topsy-turvy rallies, this approach could generate a win probability graph for a single point, an image that could encapsulate just how hard a player worked to come back from the brink.

Fortunately, the technology to accomplish this is already here. Researchers with access to subsets of Hawkeye data have begun drilling down to the factors that influence things like shot selection. Playsight’s “SmartCourts” classify errors into forced and unforced in close to real time, suggesting that there is something much more sophisticated running in the background, even if its AI occasionally makes clunky mistakes. Another possible route is applying existing machine learning algorithms to large quantities of match video, letting the algorithms work out for themselves which factors best predict winners, errors, and other shot outcomes.

Someday, tennis fans will look back on the early 21st century and marvel at just how little we knew about the sport back then.

New at Tennis Abstract: Point-by-Point Stats

Yesterday, I announced the new ATP doubles results on Tennis Abstract. Today, I want to show you something else I rolled out over the offseason: sequential point-by-point stats for more than 100,000 matches.

Traditional match stats can do no more than summarize the action. Point-by-point stats are so much more revealing: They show us how matches unfold and allow us to look much deeper into topics such as momentum and situational skill. These are subjects that remain mysteries–or, at the very least, poorly quantified–in tennis.

As an example, let’s take a look at the new data available for one memorable contest, the World Tour Finals semifinal between Andy Murray and Milos Raonic:

The centerpiece of each page is a win probability graph, which shows the odds that one player would win the match after each point. These graphs do not take player skill into account, though they are adjusted for gender and surface. The red line shows one player’s win probability, while the grey line indicates “volatility”–a measure of how much each point matters. You can see exact win probability and volatility numbers by moving your cursor over the graph. Most match graphs aren’t nearly as dramatic as this one; of course, most matches aren’t nearly as dramatic as this one was.

(I’ve written a lot about win probability in the past, and I’ve also published the code I use to calculate in-match win probability.)
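To give a flavor of how such a model works, here is a minimal sketch of one building block–the probability that the server holds from any score, assuming each point is won independently with a fixed probability p. This is an illustration, not the published implementation.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def p_hold(p, s=0, r=0):
        """P(server holds) from point score (s, r), with points won
        independently by the server at rate p."""
        if s >= 4 and s - r >= 2:
            return 1.0
        if r >= 4 and r - s >= 2:
            return 0.0
        if s >= 3 and s == r:  # deuce: closed-form limit of the recursion
            return p * p / (p * p + (1 - p) * (1 - p))
        return p * p_hold(p, s + 1, r) + (1 - p) * p_hold(p, s, r + 1)

    print(round(p_hold(0.65), 3))  # a 65% server holds about 83% of games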

Next comes a table with situational serving stats for both players. In the screenshot above, you can see deuce/ad splits; the page continues, with tiebreak-specific totals and tallies for break points, set points, and match points. After that is an exhaustive, point-by-point text recap of the match, which displays the sequence of every point played.

I’ve tried to make these point-by-point match pages as easy to find as possible. Whenever you see a link on a match score, just click that link for the point-by-point page. For instance, here is part of Andy Murray’s page, showing where to click to find the Murray-Raonic example shown above:

As you can see from all the blue scores in this screenshot, most 2016 ATP tour-level matches have point-by-point data available. The same is true for the last few seasons, as well as top-level WTA matches. The lower the level, the fewer matches are available, but you might be surprised by the breadth and depth of the coverage. The site now contains point-by-point data for almost half of 2016 main-draw men’s Futures matches. For instance, here’s the graph for a Futures final last May between Stefanos Tsitsipas and Casper Ruud.

I’ll keep these as up-to-date as I can, but with my current setup, you can expect to wait 1-4 weeks after a match before the point-by-point page becomes available. I’m hoping to further automate the process and shorten the wait over the course of this season.

Enjoy!

New at Tennis Abstract: ATP Doubles!

At last, I’ve added ATP doubles results to player pages at Tennis Abstract. Doubles has long been relegated to second-class status by tennis analytics, largely because the data just isn’t there. Now, much more is readily available.

Tennis Abstract now has career doubles results (including Challengers, Futures, and Satellites) for thousands of ATP players, and they’ll be updated throughout each day’s play, just like singles results. Match times and traditional match stats are included for most 2016 ATP and Challenger tournaments, and I hope that will continue to be the case in 2017 and beyond.

Let me give you a brief tour of what you’ll find, using doubles legend Jack Sock as a starting point:

The big red “1” shows where to click to switch over to doubles results. For full-time doubles specialists, you won’t have to click–the site will automatically show you doubles results.

The “2” indicates three new doubles-specific filters: by partner, by opponent, and by opposing team. For instance, you can see Sock’s results with Vasek Pospisil, his eight matches against Daniel Nestor, or his twelve meetings with the Bryan Brothers. You may always combine multiple filters, so for example, you can look at Sock’s record against the Bryans only when partnering Pospisil.

There are three more new filters, marked by the big “3” toward the bottom. The “vs Hands” filter allows you to select matches against righty-righty, righty-lefty, or lefty-lefty teams. “Partner Hand” and “Partner Rank” make it possible to limit matches to those in which the partner had certain characteristics.

Finally, the “4” shows you where to access more detailed stats. Doubles results take a lot more room to display than singles results, so on the default view, the only “stats” on offer are match time and Dominance Ratio. Click on “Serve,” “Return,” or “Raw” to get the other traditional numbers, such as ace rate, first-serve points won, or break points converted. All of these numbers are totals for each team; individual player stats are almost never available for doubles matches.

I hope you enjoy this new resource. It’s something I’ve wanted for a long time, so I’m excited to be able to use it myself. There are still some minor gaps in the record, as well as some kinks in the functionality, so please be patient as I try to work all of that out.

For those of you who’d like to see WTA doubles results, as well: Me too! I can’t promise any particular deadline, but I’ve already done much of the work to build the dataset, so I’m hoping to add them to women’s player pages early this year. Stay tuned!

The Match Charting Project, 2017 Update

2016 was a great year for the Match Charting Project (MCP), my crowdsourced effort to improve the state of tennis statistics. Many new contributors joined the project, the data played a part in more research than ever, and best of all, we added over 1,000 new matches to the database.

For those who don’t know, the MCP is a volunteer effort from dozens of devoted tennis fans to collect shot-by-shot data for professional matches. The resulting data is vastly more detailed than anything else available to the public. You can find an extremely in-depth report on every match in the database–for example, here’s the 2016 Singapore final–as well as an equally detailed report on every player with more than one charted match. Here’s Andy Murray.

In 2016, we:

  • added 1,145 new matches to the database, more than in any previous year;
  • charted more WTA than ATP matches, bringing women’s tennis to near parity in the project;
  • nearly completed the set of charted Grand Slam finals back to 1980;
  • filled in the gaps to have at least one charted match of every member of the ATP top 200, and 198 of the WTA top 200;
  • reached double digits in charted matches for every player in the ATP top 49 (sorry, Florian Mayer, we’re working on it!) and the WTA top 58;
  • logged over 174,000 points and nearly 700,000 shots.

I believe 2017 can be even better. To make that happen, we could really use your help. As with most projects of this nature, a small number of contributors do the bulk of the work, and the MCP is no different–Isaac and Edo both charted more than 200 matches last year.

There are plenty of reasons to contribute: It will make you a more knowledgeable tennis fan, it will help add to the sum of human knowledge, and it can even be fun. Click here to find out how to get started.

I’m proud of the work we’ve done so far, and I hope that the first 2,700 matches are only the beginning.

The Unexpectedly Predictable IPTL

December is here, and with the tennis offseason almost five days old, it’s time to resume the annual ritual of pretending we care about exhibitions. The hit-and-giggle circuit gets underway in earnest tomorrow with the kickoff, in Japan, of the 2016 IPTL slate.

The star-studded IPTL, or International Premier Tennis League, is two years old, and uses a format similar to that of the USA’s World Team Tennis. Each match consists of five separate sets: one each of men’s singles, women’s singles, (men’s) champions’ singles, men’s doubles, and mixed doubles. Games are no-ad, each set is played to six games, and a tiebreak is played at 5-5. At the end of all those sets, if both teams have the same number of games, representatives of each side’s sponsors thumb-wrestle to determine the winner. Or something like that. It doesn’t really matter.

As with any exhibition, players don’t take the competition too seriously. Elites who sit out November tournaments due to injury find themselves able to compete in December, given a sufficient appearance fee. It’s entertaining, but compared to the first eleven months of the year, it isn’t “real” tennis.

That triggers an unusual research question: How predictable are IPTL sets? If players have nothing at stake, are outcomes simply random? Or do all the participants ease off to an equivalent degree, resulting in the usual proportion of sets going the way of the favorite?

Last season, there were 29 IPTL “matches,” meaning that we have a dataset consisting of 29 sets each of men’s singles, women’s singles, and men’s doubles. (For lack of data, I won’t look at mixed doubles, and for lack of interest, forget about champions’ singles.) Except for a handful of singles specialists who played doubles, we have plenty of data on every player. Using Elo ratings, we can generate forecasts for every set based on each competitor’s level at the time.

Elo-based predictions spit out forecasts for standard best-of-three contests, so we’ll need to adjust those a bit. Single-set results are more random, so we would expect a few more upsets. For instance, when Roger Federer faced Ivo Karlovic last December, Elo gave him an 89.9% chance of winning a traditional match, and the relevant IPTL forecast is a more modest 80.3%. With these estimates, we can see how many sets went the way of the favorite and how many upsets we should have expected given the short format.
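
That adjustment is consistent with treating a match as best-of-three independent sets: if p is the single-set win probability, the match probability is m = p^2 * (3 - 2p), and inverting that formula for Federer’s 89.9% yields the 80.3% single-set figure. A sketch of the inversion (bisection is my choice; any root-finder works):

    def match_prob(p):
        """Best-of-three win probability from single-set probability p,
        assuming sets are independent and identically distributed."""
        return p * p * (3 - 2 * p)

    def set_prob(m, tol=1e-9):
        """Invert match_prob by bisection (it is monotone on [0, 1])."""
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if match_prob(mid) < m else (lo, mid)
        return (lo + hi) / 2

    print(round(set_prob(0.899), 3))  # 0.803 -- the Federer-Karlovic example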

Let’s start with men’s singles. Karlovic beat Federer, and Nick Kyrgios lost a set to Ivan Dodig, but in general, decisions went the direction we would expect. Of the 29 sets, favorites won 18, or 62.1%. The Elo single-set forecasts imply that the favorites should have won 64.2%, or 18.6 sets. So far, so predictable: If IPTL were a regular-season event, its results wouldn’t be statistically out of place.

The results are similar for women’s singles. The forecasts show the women’s field to be more lopsided, due mostly to the presence of Serena Williams and Maria Sharapova. Elo expected that the favorites would win 20.4, or 70.4% of the 29 sets. In fact, the favorites won 21 of 29.

The men’s doubles results are more complex, but they nonetheless provide further evidence that IPTL results are predictable. Elo implied that most of the men’s doubles matches were close: Only one match (Kei Nishikori and Pierre-Hugues Herbert against Gael Monfils and Rohan Bopanna) had a forecast above 62%, and overall, the system expected only 16.4 victories for the favorites, or 56.4%. In fact, the Elo-favored teams won 19, or 65.5% of the 29 sets, more than the singles favorites did.

The difference of less than three wins in a small sample could easily just be noise, but even so, a couple of explanations spring to mind. First, almost every team had at least one doubles specialist, and those guys are accustomed to the rapid-fire no-ad format. Second, the higher-than-usual number of non-specialists–such as Federer, Nishikori, and Monfils–means that the player ratings may not be as reliable as they are for specialists, or for singles. It might be the case that Nishikori is a better doubles player than Monfils, but because both usually stick to singles, no rating system can capture the difference in abilities very accurately.

Here is a summary of all these results:

Competition      Sets  Fave W  Fave W%  Elo Forecast%  
Men's Singles      29      18    62.1%          64.2%  
Women's Singles    29      21    72.4%          70.4%  
ALL SINGLES        58      39    67.3%          67.3%  
                                                       
Men's Doubles      29      19    65.5%          56.4%  
ALL SETS           87      58    66.7%          63.7%

Taken together, last season’s evidence shows that IPTL contests tend to go the way of the favorites. In fact, when we account for the differences in format, favorites win more often than we’d expect. That’s the surprising bit. The conventional wisdom suggests that the elites became champions thanks to their prowess at high-pressure moments; many dozens of pros could reach the top if they were only stronger mentally. In exhos, the mental game is largely taken out of the picture, yet in this case, the elites are still winning.

No matter how often the favorites win, these matches are still meaningless, and I’m not about to include them in the next round of player ratings. However, it’s a mistake to disregard exhibitions entirely. By offering a contrast to the high-pressure tournaments of the regular season, they may offer us perspectives we can’t get anywhere else.

The Most Exciting Matches of the 2016 WTA Season

Italian translation at settesei.it

In my most recent piece for The Economist, I used a metric called Excitement Index (EI) to consider the implications of shortening singles matches to a format like the no-ad, super-tiebreak rules used for doubles. In my simulations, the shorter format didn’t fare well: The most gripping contests are often the longest ones, and the full-length third set is frequently the best part.

I used data from ATP tournaments in that piece, and several readers have asked how women’s matches score on the EI scale. Many matches from the 2016 season rate extremely highly, while some players we tend to think of as exciting fail to register among the best by this metric. I’ll share some of the results in a moment.

First, a quick overview of EI. We can calculate the probability that each player will win a match at any point in the contest, and using those numbers, it’s possible to determine the leverage of every point–that is, the difference between a player’s odds if she wins the next point and her odds if she loses it. At 40-0, down a break in the first set, that leverage is very low: less than 2%. In a tight third-set tiebreak, leverage can climb as high as 25%. The average point is around 5% to 6%, and as long as neither player has a substantial lead, points at 30-30 or later carry higher leverage than that.

EI is calculated by averaging the leverage of every point in the match. The more high-leverage points, the higher the EI. To make the results a bit more viewer-friendly, I multiply the average leverage by 1,000, so if the typical point has the potential for a 5% (0.05) swing, the EI is 50. The most boring matches, like Garbine Muguruza’s 6-1 6-0 dismantling of Ekaterina Makarova in Rome, rate below 25. The most exciting will occasionally top 100, and the average WTA match this year scored a 53.7. By comparison, the average ATP match this year rated at 48.9.
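
In code, EI is just an average. The sketch below assumes we already have, for every point, the match-win probabilities after winning and after losing that point (the output of a win-probability model); those two numbers define the point’s leverage.

    def excitement_index(point_leverages):
        """EI: average leverage per point, scaled by 1,000. Each entry is
        (win prob if this point is won, win prob if it is lost)."""
        swings = [w - l for w, l in point_leverages]
        return 1000 * sum(swings) / len(swings)

    # Toy stretch: three routine points and one 25% swing point.
    points = [(0.55, 0.50), (0.56, 0.51), (0.60, 0.52), (0.70, 0.45)]
    print(excitement_index(points))  # 107.5 -- gripping tennis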

Of course, the number and magnitude of crucial moments isn’t the only thing that can make a tennis match “exciting.” Finals tend to be more gripping than first-round tilts, long rallies and daring net play are more watchable than error-riddled ballbashing, and Fed Cup rubbers feature crowds that can make the warmup feel like a third-set tiebreak. When news outlets make their “Best Matches of 2016” lists, they’ll surely take some of those other factors into account. EI takes a narrower view, and it is able to show us which matches, independent of context, offered the most pressure-packed tennis.

Here are the top ten matches of the 2016 WTA season, ranked by EI:

Tournament    Match                Score                    EI  
Charleston    Lucic/Mladenovic     4-6 6-4 7-6(13)       109.9  
Wimbledon     Cibulkova/Radwanska  6-3 5-7 9-7           105.0  
Wimbledon     Safarova/Cepelova    4-6 6-1 12-10         101.7  
Kuala Lumpur  Nara/Hantuchova      6-4 6-7(4) 7-6(10)    100.2  
Brisbane      CSN/Lepchenko        4-6 6-4 7-5            99.0  
Quebec City   Vickery/Tig          7-6(5) 6-7(3) 7-6(7)   98.5  
Miami         Garcia/Petkovic      7-6(5) 3-6 7-6(2)      98.1  
Wimbledon     Vesnina/Makarova     5-7 6-1 9-7            97.2  
Beijing       Keys/Kvitova         6-3 6-7(2) 7-6(5)      96.8  
Acapulco      Stephens/Cibulkova   6-4 4-6 7-6(5)         96.7

Getting to 6-6 in the final set is clearly a good way to appear on this list. The top fifty matches of the season (out of about 2,700) all reached at least 5-5 in the third. The highest-rated clash that didn’t get that far was Angelique Kerber’s 1-6 7-6(2) 6-4 defeat of Elina Svitolina, with an EI of 88.2. Svitolina’s 4-6 6-3 6-4 victory over Bethanie Mattek-Sands in Wuhan, the top match on the list without any sets reaching 5-5, scored an EI of 87.3.

Wimbledon featured an unusual number of very exciting matches this year, especially compared to Roland Garros and the Australian Open, the other tournaments that forgo a tiebreak in the final set. The top-rated French Open contest was the first-rounder between Johanna Larsson and Magda Linette, which scored 95.3 and ranks 13th for the season, while the highest EI among Aussie Open matches is all the way down at 27th on the list, a 92.8 between Monica Puig and Kristyna Pliskova.

Dominika Cibulkova is the only player who appears twice on this list. That doesn’t mean she’s a sure thing for exciting matches: As we’ll see, elite players rarely are. The only year-end top-tenner who ranks among the highest average EIs is Svetlana Kuznetsova, who played as many “very exciting” matches–those rating among the top fifth of matches this season–as any other woman on tour:

Rank  Player                M  Avg EI  V. Exc  Exc %  Bor %  
1     Kristina Mladenovic  60    59.8      19  55.0%  25.0%  
2     Christina McHale     46    59.6      16  50.0%  19.6%  
3     Heather Watson       35    58.5      12  48.6%  25.7%  
4     Jelena Jankovic      43    57.6      12  55.8%  30.2%  
5     Svetlana Kuznetsova  64    57.4      21  48.4%  32.8%  
6     Venus Williams       38    57.1      10  55.3%  31.6%  
7     Yanina Wickmayer     43    56.5      13  46.5%  30.2%  
8     Alison Riske         46    56.5      10  45.7%  32.6%  
9     Caroline Garcia      62    56.4      18  43.5%  33.9%  
10    Irina-Camelia Begu   42    56.4      14  45.2%  40.5% 

(Minimum 35 tour-level matches (“M” above), excluding retirements. My data is also missing a random handful of matches throughout the season.)

The “V. Exc” column tallies how many top-quintile matches the player took part in. The “Exc %” column shows the percent of matches that rated in the top 40% of all WTA contests, while “Bor %” shows the same for the bottom 40%, the more boring matches. Big servers who reach a disproportionate number of tiebreaks and 7-5 sets do well on this list, though it is far from a perfect correspondence. Tiebreaks can create a lot of big moments, but if there were many love service games en route to 6-6, the overall picture isn’t nearly so exciting.

Unlike Kuznetsova, who played a whopping 32 deciding sets this year, most of the other top women enjoy plenty of blowouts. Muguruza, Simona Halep, and Serena Williams occupy the very last three places on the average-EI ranking, largely because when they win, they do so handily–and they win a lot. The next table shows the WTA year-end top-ten, with their ranking (out of 59) on the average-EI list:

Rank  Player        WTA#  Matches  Avg EI  V. Exc  Exc %  Bor %  
5     Kuznetsova       9       64    57.4      21  48.4%  32.8%  
13    Pliskova         6       66    55.6      19  48.5%  39.4%  
16    Keys             8       64    55.4      13  40.6%  35.9%  
23    Cibulkova        5       68    54.6      21  42.6%  42.6%  
28    Kerber           1       77    54.0      12  42.9%  41.6%  
      tour average                   53.7          40.0%  40.0%  
41    Radwanska        3       69    52.5      12  29.0%  44.9%  
51    Konta           10       67    51.2      12  34.3%  46.3%  
57    Muguruza         7       51    49.9       5  33.3%  43.1%  
58    Halep            4       59    49.6       8  30.5%  50.8%  
59    Williams         2       44    48.1       3  27.3%  50.0%

It’s a good thing that fans love Serena, because her matches rarely provide much in the way of big moments. As low as Williams and Halep rate on this measure, Victoria Azarenka scores even lower. Her Miami fourth-rounder against Muguruza was her only match this season to rank in the “exciting” category, and her average EI was a mere 44.0.

Clearly, EI isn’t much of a method for identifying the best players. Even looking at the lowest-rated competitors by EI would be misleading: In 56th place, right above Muguruza, is the otherwise unheralded Nao Hibino. EI excels as a metric for ferreting out the most riveting individual matches, whether they were broadcast worldwide or ignored entirely. And the next time someone suggests shortening matches, EI is a great tool to highlight just how much excitement would be lost by doing so.

How Argentina’s Road Warriors Defied the Davis Cup Home-Court Odds

Italian translation at settesei.it

The conventional wisdom has long held that there is a home court advantage in Davis Cup. It makes sense: In almost every sport, there is a documented advantage to playing at home, and Davis Cup gives us what seem to be the most extreme home courts in tennis.

However, Argentina won this year’s competition despite playing all four of their ties on the road. After the first round this season, only one of seven hosts managed to give the home crowd a victory. Bob Bryan has some ideas as to why:

https://twitter.com/Bryanbros/status/803244964784308227

Which is it? Do players excel in front of an enthusiastic home crowd, on a surface chosen for their advantage? Or do they suffer from the distractions that Bryan cites?

To answer that question, I looked at 322 Davis Cup ties, encompassing all World Group and World Group Play-off weekends back to 2003. Of those, the home side won 196, or 60.9% of the time. So far, the conventional wisdom looks pretty good.

But we need to do more. To check whether the hosting teams were actually better, meaning that they should have won more ties regardless of venue, I used singles and doubles Elo ratings to simulate every match of every one of those ties. (In cases where the tie was decided before the fourth or fifth rubber, I simulated matches between the best available players who could have contested those matches.) Based on those simulations, the hosts “should” have won 171 of the 322 ties, or 53.1%.
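
The tie-level simulation is simple once you have per-rubber win probabilities from Elo. Here is a minimal Monte Carlo sketch; the probabilities in the example are invented, and real ties involve wrinkles (like lineup changes once the tie is decided) that this ignores.

    import random

    def simulate_tie(rubber_probs, n=100_000, seed=42):
        """Monte Carlo estimate of the host winning a Davis Cup tie
        (first to three of five rubbers). rubber_probs holds the host's
        Elo-based win probability for each rubber: four singles plus
        the doubles."""
        rng = random.Random(seed)
        wins = sum(
            sum(rng.random() < p for p in rubber_probs) >= 3
            for _ in range(n)
        )
        return wins / n

    # A host that is a modest favorite in every rubber wins the tie ~61%.
    print(simulate_tie([0.60, 0.55, 0.50, 0.60, 0.55]))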

The evidence in favor of home-court advantage–and against Bryan’s “distractions” theory–is strong. Home sides have won World Group ties about 15% more often than we would expect. Some of that is likely due to the hosts’ ability to choose surface. I doubt surface accounts for the whole effect, since some court types (like the medium-slow hard court in Croatia last weekend) don’t heavily favor either side, and many ties are rather lopsided regardless of surface. Teasing out the surface advantage from the more intangible home-court edge is a worthy subject to study, but I won’t be going any further in that direction today.

If distractions are a danger to hosts, we might expect to see the home court advantage erode in later rounds. Many early-round matchups are minor news events compared to semifinals and finals. (On the other hand, there were over 100 representatives of the Argentinian press in Croatia last weekend, so the effect isn’t entirely straightforward.) The following table shows how home sides have fared in each round:

Round         Ties  Home Win %  Wins/Exp  
First Round    112       58.9%      1.11  
Quarterfinal    56       60.7%      1.16  
Semifinal       28       82.1%      1.30  
Final           14       57.1%      1.14  
Play-off       112       58.9%      1.14

Aside from a blip at the semifinal level, home-court advantage is quite consistent from one round to the next. The “Wins/Exp” shows how much better the hosts fared than my simulations would have predicted; for instance, in first-round encounters, hosts won 11% more ties than expected.

There is also no meaningful difference between home court advantage on day one and day three. The hosts’ singles players win 15% more matches than my simulations would expect on day one, and 15% more on day three. The day three divide is intriguing: Home players win the fourth rubber 12% more often than expected, but they claim the deciding fifth rubber a whopping 23% more frequently than they would in neutral environments. However, only 91 of the 322 ties involved five live rubbers, so the extreme home advantage in the deciding match may be nothing more than a random spike.

The doubles rubber is less likely to be influenced by venue. Compared to the 15% advantage enjoyed by World Group singles players, the hosting side’s doubles pairings win only 6% more often than expected. This again raises the issue of surface: Not only are doubles results less influenced by court speed than singles results, but home sides are less likely to choose a surface based on the desire of their doubles team, if that preference clashes with the needs of their singles players.

Argentina on the road

In the sense that they never played at home or chose a surface, Argentina beat the odds in all four rounds this year. Of course, home court advantage can only take you so far; it helps to have a good squad. My simulations indicate that the Argentines had a nearly 4-in-5 chance of defeating their Polish first-round opponents on neutral ground, while Juan Martin del Potro and company had a more modest 59% chance of beating the Italians in Italy.

For the last two rounds, though, the Argentines were fighting an uphill battle. The semifinal venue in Glasgow didn’t matter much; the prospect of facing the Murray brothers meant Argentina had less than a 10% chance of advancing no matter what the location. And as I wrote last week, Croatia was rightfully favored in the final. Playing yet another tie on the road simply made the task more difficult.

Once we adjust my simulations of each tie for home court advantage, it turns out that Argentina’s chances of winning the Cup this year were less than 1%, barely 1 in 200. The following table shows the last 14 winners, along with the number of ties they played at home and their chances of winning the Cup in my simulations, given which countries they ended up facing and the players who turned up for each tie:

Year  Winner  Home Ties  Win Prob  
2016  ARG             0      0.5%  
2015  GBR             3     18.9%  
2014  SUI             2     54.7%  
2013  CZE             1     10.5%  
2012  CZE             3     19.7%  
2011  ESP             2     12.2%  
2010  SRB             3     17.6%  
2009  ESP             4     44.0%  
2008  ESP             1     14.3%  
2007  USA             2     24.4%  
2006  RUS             2      1.7%  
2005  CRO             2      7.4%  
2004  ESP             3     23.8%  
2003  AUS             3     15.9%

In the time span I’ve studied, only the 2006 Russian squad managed anything close to the same season-long series of upsets. (I don’t yet have adequate doubles data to analyze earlier Davis Cup competitions.)  At the other end of the spectrum, the simulations emphasize how smoothly Switzerland swept through the bracket in 2014. A wide-open draw, together with Roger Federer, certainly helps.

It was tough going for Argentina, and the luck of the home-court draw made it tougher. Without a solid #2 singles player or an elite doubles specialist, it isn’t likely to get much easier. For all that, they’ll open the 2017 Davis Cup campaign against Italy with at least one unfamiliar weapon in their arsenal: They finally get to play a tie at home.