We need some context to appreciate just what an outlier that is. Of 50 tour-level WTA tournaments this year, no other titlist has spent more than about 11 hours and 35 minutes on court–and that includes Grand Slam winners, who play two more matches than McHale did! Before Christina’s marathon effort last week, the champion who spent the most time on court in a 32-draw event was Dominika Cibulkova, who needed “only” 9 hours and 20 minutes to win in Eastbourne.

There’s no complete source for historical WTA match-time data, so we can’t determine just how rare 13-hour efforts were in years past. We can, however, hunt for tournaments in which the winner needed to play so many sets.

Going back to 1991–encompassing almost 1,500 events–McHale’s effort marks only the second time a player has won a tournament while playing 15 sets in five matches. The only previous instance was Anastasia Pavlyuchenkova’s Paris title run in 2014. Serena Williams played five three-setters en route to the Roland Garros title last year, but of course, she played two other matches as well. Three other players–none since 2003–received first-round byes and then won tournaments by playing three sets in each of their four matches.

In general, we might expect a player who goes the distance in every round to struggle in the final. First of all, we would expect her to be tired–especially if, as is almost always the case, her opponent hasn’t spent as much time on court. Second, we might deduce that, if a player needed three sets to win her early-round matches, she’s in relatively weak form compared to the typical tour-level finalist.

Sure enough, the last 25 years of WTA history give us 16 players who reached a final by playing three sets in every round. Of the 16, only four–McHale, Pavlyuchenkova, and two others who didn’t require three sets in the final–won the title. The other 12 couldn’t retain their three-set magic and lost in the final.

While 16 players don’t make up much of a sample, we get a similar result if we broaden our view to those who played three-setters in exactly three of their four matches before the final. Excluding those who faced opponents who also played so many three-setters, we’re left with 134 players, only 48 (35.8%) of whom won the title match. A simple ranking-based forecast indicates that 58 (43.3%) of those players should have won, suggesting that while these players are indeed weaker than their more-dominant opponents, their underperformance may be due partly to fatigue.
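As a back-of-the-envelope check on that gap (48 actual title wins versus the 58 expected), here is how one might ask whether it could be mere noise. This sketch is not part of the original analysis–just a quick binomial test using the numbers stated above:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, observed = 134, 48        # finalists in the sample, and actual title wins
p_forecast = 58 / 134        # ranking-based expectation (~43.3%)

# How surprising is 48 or fewer wins if the forecast win rate is right?
p_value = binom_cdf(observed, n, p_forecast)
print(f"P(<= {observed} wins | p = {p_forecast:.3f}) = {p_value:.3f}")
```

The result hovers around the conventional significance threshold, which fits the hedged conclusion in the text: suggestive of fatigue, but not proof.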

McHale spent over 10 hours on court simply reaching the Tokyo final, far more than the six-plus hours required by her opponent, Katerina Siniakova. Even when a player doesn’t spend the record-setting amount of time on court that the American did this week, competitors tend to underperform after playing so many three-setters. The fact that McHale didn’t, and that she triumphed in yet another marathon match, makes her achievement all the more impressive.

With stats of 0-10 in ATP quarterfinals, he is still pretty far away from Teymuraz Gabashvili’s streak of 0-16. Teymuraz lost six more quarterfinals than Andrey before winning his first QF this January against a retiring Bernard Tomic, yet he climbed only as high as a ranking of 50. Still, we could argue that Teymuraz’s QF losing streak is not really over, given that his lone win came against a possibly injured player.

Running the numbers can answer questions such as *“Who could climb up highest in the rankings without having won an ATP quarterfinal?”* Doing so will put Andrey’s number 42 into perspective and will possibly reveal some other statistical trivia.

| Player | Rank | Date | Won QF |
|---|---|---|---|
| Andrei Chesnokov | 30 | 1986.11.03 | 1 |
| Yen Hsun Lu | 33 | 2010.11.01 | 1 |
| Nick Kyrgios | 34 | 2015.04.06 | 1 |
| Adrian Voinea | 36 | 1996.04.15 | 1 |
| Paul Haarhuis | 36 | 1990.07.09 | 1 |
| Jaime Yzaga | 40 | 1986.03.03 | 1 |
| Antonio Zugarelli | 41 | 1973.08.23 | 1 |
| Bernard Tomic | 41 | 2011.11.07 | 1 |
| Omar Camporese | 41 | 1989.10.09 | 1 |
| Wayne Ferreira | 41 | 1991.12.02 | 1 |
| Andrey Kuznetsov | 42 | 2016.08.22 | 0 |
| David Goffin | 42 | 2012.10.29 | 1 |
| Mischa Zverev | 45 | 2009.06.08 | 1 |
| Alexandr Dolgopolov | 46 | 2010.06.07 | 1 |
| Andrew Sznajder | 46 | 1989.09.25 | 1 |
| Lukas Rosol | 46 | 2013.04.08 | 1 |
| Ulf Stenlund | 46 | 1986.07.07 | 1 |
| Dominic Thiem | 47 | 2014.07.21 | 1 |
| Janko Tipsarevic | 47 | 2007.07.16 | 1 |
| Paul Annacone | 47 | 1985.04.08 | 1 |
| Renzo Furlan | 47 | 1991.06.17 | 1 |
| Mike Fishbach | 47 | 1978.01.16 | 0 |
| Oscar Hernandez | 48 | 2007.10.08 | 1 |
| Ronald Agenor | 48 | 1985.11.25 | 1 |
| Gary Donnelly | 48 | 1986.11.10 | 0 |
| Francisco Gonzalez | 49 | 1978.07.12 | 1 |
| Paolo Lorenzi | 49 | 2013.03.04 | 1 |
| Boris Becker | 50 | 1985.05.06 | 1 |
| Brett Steven | 50 | 1993.02.15 | 1 |
| Dominik Hrbaty | 50 | 1997.05.19 | 1 |
| Mike Leach | 50 | 1985.02.18 | 1 |
| Patrik Kuhnen | 50 | 1988.08.01 | 1 |
| Teymuraz Gabashvili | 50 | 2015.07.20 | 1 |
| Blaine Willenborg | 50 | 1984.09.10 | 0 |

The table shows career highs (up to #50) for players *before* they won their first ATP QF. A 0 in the last column indicates that the player can still climb in this table, because he has not yet won a QF. Retired players who never got past a QF during their careers are also denoted with a 0.

I wonder: who had Andrei Chesnokov on the radar for this? Before winning his first ATP QF, he pushed his ranking as high as 30; he later went on to a career high of 9. Nick Kyrgios also improved his ranking quickly without needing to go as deep as a SF. His Wimbledon 2014 QF, Roland Garros 2015 R32, and Australian Open 2015 QF runs took him to #34 without a single ATP QF win. I would also particularly like to highlight Alexandr Dolgopolov, who reached #46 before even having played a single QF.

Looking only at players who are still active *and* still able to improve their career-high ranking without having reached an ATP SF, we get the following picture:

| Player | Rank | Date |
|---|---|---|
| Andrey Kuznetsov | 42 | 2016.08.22 |
| Rui Machado | 59 | 2011.10.03 |
| Tatsuma Ito | 60 | 2012.10.22 |
| Matthew Ebden | 61 | 2012.10.01 |
| Kenny De Schepper | 62 | 2014.04.07 |
| Pere Riba | 65 | 2011.05.16 |
| Tim Smyczek | 68 | 2015.04.06 |
| Blaz Kavcic | 68 | 2012.08.06 |
| Alejandro Gonzalez | 70 | 2014.06.09 |

Andrey stands relatively alone: Rui Machado, second on the list, reached his highest ranking about five years ago. Skimming the rest of the table, we would be surprised if anyone came close to Andrey’s 42 anytime soon, though a sudden, unexpected streak by an up-and-coming player could always change that.

So what practical implications does this have for analyzing tennis? Hardly any, I am afraid. Still, we can infer that it is possible to get well within the top 50 without ever winning more than two matches at a single tournament, over a span that can stretch across a player’s whole career. Of course, it would be interesting to see how long such players can stay in this ranking range, which guarantees direct acceptance into ATP tournaments and, hence, a more or less regular income from R32, R16, and QF prize money. Moreover, as the case of 2015-ish Nick Kyrgios shows, it raises the question of how a player’s ranking points are composed: performing well on the big stage of Masters or Grand Slams can be enough for a decent ranking, even alongside poor performances at ATP 250s. On the other hand, are there players whose ATP points breakdown reveals that they chase easier points at ATP 250s while never making deep runs at Masters or Grand Slams? These are questions I would like to answer in a future post.

—

This is a guest article by me, Peter Wetz. I am a computer scientist interested in racket sports and data analytics based in Vienna, Austria. I would like to thank Jeff for being open-minded and allowing me to post these surface-scratching lines here.

As is often the case with new metrics, no one seems to be asking whether these new stats mean anything. Thanks to IBM (you never thought I’d write *that*, did you?), we have more than merely anecdotal data to play with, and we can start to answer that question.

At Roland Garros and Wimbledon this year, distance run during each point was tracked for players on several main courts. From those two Slams, we have point-by-point distance numbers for 103 of the 254 men’s singles matches. A substantial group of women’s matches is available as well, and I’ll look at those in a future post.

Let’s start by getting a feel for the range of these numbers. Of the available non-retirement matches, the shortest distance run was in Rafael Nadal’s first-round match in Paris against Sam Groth. Nadal ran 960 meters against Groth’s 923–the only match in the dataset with a total distance run under two kilometers.

At the other extreme, Novak Djokovic ran 4.3 km in his fourth-round Roland Garros match against Roberto Bautista Agut, who himself tallied a whopping 4.6 km. Novak’s French Open final against Andy Murray is also near the top of the list. The two players totaled 6.7 km, with Djokovic’s 3.4 km edging out Murray’s 3.3 km. Murray is a familiar face in these marathon matches, figuring in four of the top ten. (Thanks to his recent success, he’s also wildly overrepresented in our sample, appearing 14 times.)

Between these extremes, the *average* match features a combined 4.4 km of running, or just over 20 meters per point. If we limit our view to points of five shots or longer (a very approximate way of separating rallies from points in which the serve largely determines the outcome), the average distance per point is 42 meters.

Naturally, on the Paris clay, points are longer and players do more running. In the average Roland Garros match, the competitors combined for 4.8 km, compared to 4.1 km at Wimbledon. (The dataset consists of about twice as many Wimbledon matches, so the overall numbers are skewed in that direction.) Measured by the point, that’s 47 meters per point on clay and 37 meters per point on grass.

**Not a key to the match**

All that running may be necessary, but covering more distance than your opponent doesn’t seem to have anything to do with winning the match. Of the 103 matches, almost exactly half (53) were won by the player who ran farther.

It’s possible that running more or less is a benefit for certain players. Surprisingly, Murray ran less than his opponent in 10 of his 14 matches, including his French Open contests against Ivo Karlovic and John Isner. (Big servers, immobile as they tend to be, may induce even less running in their opponents, since so many of their shots are all-or-nothing. On the other hand, Murray outran another big server, Nick Kyrgios, at Wimbledon.)

We think of physical players like Murray and Djokovic as the ones covering the entire court, and by doing so, they simultaneously force their opponents to do the same–or more. In Novak’s ten Roland Garros and Wimbledon matches, he ran farther than his opponent only twice–in the Paris final against Murray, and in the second round of Wimbledon against Adrian Mannarino. In general, running fewer meters doesn’t appear to be a leading indicator of victory, but for certain players in the Murray-Djokovic mold, it may be.

In the same vein, *combined* distance run may turn out to be a worthwhile metric. For men who earn their money in long, physical rallies, total distance run could serve as a proxy for their success in forcing a certain kind of match.

It’s also possible that aggregate numbers will never be more than curiosities. In the average match, there was only a 125-meter difference between the distances covered by the two players. In percentage terms, that means one player outran the other by only 5.5%. And as we’ll see in a moment, a difference of that magnitude could happen simply because one player racked up more points on serve.

**Point-level characteristics**

In the majority of points, the returner does a lot more running than the server does. The server usually forces his opponent to start running first, and in today’s men’s game, the server rarely needs to scramble too much to hit his next shot.

On average, the returner must run just over 10% farther. When the first serve is put in play, that difference jumps to 12%. On second-serve points, it drops to 7%.

By extension, we would expect that the player who runs farther would, more often than not, lose the point. That’s not because running more is necessarily bad, but because of the inherent server’s advantage, which has the side effect of showing up in the distance-run stats as well. That hypothesis turns out to be correct: The player who runs farther in a single point loses the point 56% of the time.

When we narrow our view to only those points with five shots or more, we see that running more is still associated with losing. In these longer rallies, the player who covered more distance loses 58% of the points.

Some of the “extra” running in shorter points can be attributed to returning serve–and thus, we can assume that players are losing points because of the disadvantage of returning, not necessarily because they ran so much. But even in very long rallies of 10 shots or more, the player who runs farther is more likely to lose the point. Even at the level of a single point, my suggestion above, that physical players succeed by forcing opponents to work even harder than they do, seems valid.

With barely 100 matches of data–and a somewhat biased sample, no less–there are only so many conclusions we can draw about distance run stats. Two Grand Slams worth of show court matches is just enough to give us a general context for understanding these numbers and to hint at some interesting findings about the best players. Let’s hope that IBM continues to collect these stats, and that the ATP and WTA follow suit.

- every men’s Wimbledon and US Open final all the way back to 1980;
- every men’s Slam final since 1989 Wimbledon;
- every women’s Slam final back to 2001, with a single exception.

The Match Charting Project gathers and standardizes data that, for many of these matches, simply didn’t exist before. These recaps give us shot-by-shot breakdowns of historically important matches, allowing us to quantify how the game has changed–at least at the very highest level–over the last 35 years. A couple of months ago, I did one small project using this data to approximate surface speed changes–that’s just the tip of the iceberg in terms of what you can do with this data. (The dataset is also publicly available, so have fun!)

We’ve got about 30 Slam finals left to chart, and you might be able to help. As always, we are actively looking for new contributors to the project to chart matches (here’s how to get started, and why you should, and you don’t have to chart Slam finals!), but right now, I have a different request.

We’ve scoured the internet, from YouTube to Youku to torrent trackers, to find video for all of these matches. While I don’t expect any of you to have the 1980 Teacher-Warwick Australian Open final sitting around on your hard drive, I’ve got higher hopes for some of the more recent matches we’re missing.

If you have full (or nearly full) video for any of these matches, or you know of a (preferably free) source where we can find them, please–please, please!–drop me a line. Once we have the video, Edo or I will do the rest, and the project will become even more valuable.

- 2007 Wimbledon, Venus Williams vs Marion Bartoli
- 2000 US Open, Venus Williams vs Lindsay Davenport
- 1998 US Open, Lindsay Davenport vs Martina Hingis
- 1989 or 1995 Roland Garros, and 1994 US Open, Steffi Graf vs Arantxa Sanchez Vicario
- 1990 Wimbledon, Martina Navratilova vs Zina Garrison
- 1989 Roland Garros, Stefan Edberg vs Michael Chang
- 1987 Australian Open, Stefan Edberg vs Pat Cash
- 1985 or 1987 Roland Garros, Ivan Lendl vs Mats Wilander

There are several more finals from the 1980s that we’re still looking for. Here’s the complete list.

Thanks for your help!

Aside from the traditional break point stats, which have plenty of limitations, we don’t have a good way to measure clutch performance in tennis. There’s a lot more to this issue than counting break points won and lost, and it turns out that a lot of the work necessary to quantify clutchness is already done.

I’ve written many times about *win probability* in tennis. At any given point score, we can calculate the likelihood that each player will go on to win the match. Back in 2010, I borrowed a page from baseball analysts and introduced the concept of *volatility*, as well. (Click the link to see a visual representation of both metrics for an entire match.) Volatility, or *leverage*, measures the importance of each point–the difference in win probability between a player winning it or losing it.
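These win probabilities can be built up recursively from the point score. Here is a minimal game-level sketch (match-level versions nest games, sets, and tiebreaks the same way); it assumes, purely for illustration, that the server wins each point independently with a fixed probability `p`:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_win_game(p, server_pts=0, returner_pts=0):
    """P(server holds) from a given point score, assuming the server
    wins each point independently with probability p."""
    if server_pts >= 4 and server_pts - returner_pts >= 2:
        return 1.0
    if returner_pts >= 4 and returner_pts - server_pts >= 2:
        return 0.0
    if server_pts == returner_pts == 3:
        # From deuce, the server wins with probability p^2 / (p^2 + q^2).
        q = 1 - p
        return p * p / (p * p + q * q)
    return (p * p_win_game(p, server_pts + 1, returner_pts)
            + (1 - p) * p_win_game(p, server_pts, returner_pts + 1))

# A server winning 65% of points holds about 83% of the time;
# from 40-0 (3-0), the hold is nearly certain.
print(round(p_win_game(0.65), 2), round(p_win_game(0.65, 3, 0), 3))
```

The iid-points assumption is a simplification, but it is the standard starting point for tennis win-probability models.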

To put it simply, the higher the leverage of a point, the more valuable it is to win. “High leverage point” is just a more technical way of saying “big point.” To be considered clutch, a player should be winning more high-leverage points than low-leverage points. You don’t have to win a disproportionate number of high-leverage points to be a very *good* player–Roger Federer’s break point record is proof of that–but high-leverage points are key to being a *clutch* player.

(I’m not the only person to think about these issues. Stephanie wrote about this topic in December and calculated a full-year clutch metric for the 2015 ATP season.)

To make this more concrete, I calculated win probability and leverage (LEV) for every point in the Wimbledon semifinal between Federer and Milos Raonic. For the first point of the match, LEV = 2.2%. Raonic could boost his match odds to 50.7% by winning it or drop to 48.5% by losing it. The highest leverage in the match was a whopping 32.8%, when Federer (twice) had game point at 1-2 in the fifth set. The lowest leverage of the match was a mere 0.03%, when Raonic served at 40-0, down a break in the third set. The average LEV in the match was 5.7%, a rather high figure befitting such a tight match.
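The arithmetic behind those LEV figures is simple; here it is as a tiny helper, using the first-point numbers above:

```python
def leverage(p_if_win, p_if_lose):
    """Leverage of a point: the swing in match-win probability
    between winning and losing it."""
    return p_if_win - p_if_lose

# Raonic's first point: 50.7% match odds if he wins it, 48.5% if he loses.
print(f"LEV = {leverage(0.507, 0.485):.1%}")  # -> LEV = 2.2%
```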

On average, the 166 points that Raonic won were slightly more important, with LEV = 5.85%, than Federer’s 160, at LEV = 5.62%. Without doing a lot more work with match-level leverage figures, I don’t know whether that’s a terribly meaningful difference. What *is* clear, though, is that certain parts of Federer’s game fell apart when he needed them most.

By Wimbledon’s official count, Federer committed nine unforced errors, not counting his five double faults, which we’ll get to in a minute. (The Match Charting Project log says Fed had 15, but that’s a discussion for another day.) There were 180 points in the match where the return was put in play, with an average LEV = 6.0%. Federer’s unforced errors, by contrast, had an average LEV nearly twice as high, at 11.0%! The typical leverage of Raonic’s unforced errors was a much less noteworthy 6.8%.

Fed’s double fault timing was even worse. Those of us who watched the fourth set don’t need a fancy metric to tell us that, but I’ll do it anyway. His five double faults had an average LEV of 13.7%. Raonic double faulted more than twice as often, but the average LEV of those points, 4.0%, means that his 11 doubles had less of an impact on the outcome of the match than Roger’s five.

Even the famous Federer forehand looks like less of a weapon when we add leverage to the mix. Fed hit 26 forehand winners, in points with average LEV = 5.1%. Raonic’s 23 forehand winners occurred during points with average LEV = 7.0%.

Taking these three stats together, it seems like Federer saved his greatness for the points that didn’t matter as much.

**The bigger picture**

When we look at a handful of stats from a single match, we’re not improving much on a commentator who vaguely summarizes a performance by saying that a player didn’t win enough of the big points. While it’s nice to attach concrete numbers to these things, the numbers are only worth so much without more context.

In order to gain a more meaningful understanding of this (or any) performance with leverage stats, there are many, many more questions we should be able to answer. Were Federer’s high-leverage performances typical? Does Milos often double fault on less important points? Do higher-leverage points usually result in more returns in play? How much can leverage explain the outcome of very close matches?

These questions (and dozens, if not hundreds more) signal to me that this is a fruitful field for further study. The smaller-scale numbers, like the average leverage of points ending with unforced errors, seem to have particular potential. For instance, it may be that Federer is less likely to go for a big forehand on a high-leverage point.

Despite the dangers of small samples, these metrics allow us to pinpoint what, exactly, players did at more crucial moments. Unlike some of the more simplistic stats that tennis fans are forced to rely on, leverage numbers could help us understand the situational tendencies of every player on tour, leading to a better grasp of each match as it happens.

The official ATP and WTA rankings have always represented a collection of compromises, as they try to accomplish dual goals of rewarding certain behaviors (like showing up for high-profile events) and identifying the best players for entry in upcoming tournaments. Stripping the Olympics of ranking points altogether was an even weirder compromise than usual. Four years ago in London, some points were awarded and almost all the top players on both tours showed up, even though many of them could’ve won more points playing elsewhere.

For most players, the chance at Olympic gold was enough. The level of competition was quite high, so while the ATP and WTA tours treat the tournament in Rio as a mere exhibition, those of us who want to measure player ability and make forecasts must factor Olympics results into our calculations.

Elo, a rating system originally designed for chess that I’ve been using for tennis for the past year, is an excellent tool to use to integrate Rio results with the rest of this season’s wins and losses. Broadly speaking, it awards points to match winners and subtracts points from losers. Beating a top player is worth many more points than beating a lower-rated one. There is no penalty for not playing–for example, Stan Wawrinka‘s and Simona Halep‘s ratings are unchanged from a week ago.

Unlike the ATP and WTA ranking systems, which award points based on the level of tournament and round, Elo is context-neutral. Del Potro’s Elo rating improved quite a bit thanks to his first-round upset of Novak Djokovic–the same amount it would have increased if he had beaten Djokovic in, say, the Toronto final.
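For the curious, the context-neutral update works roughly as follows. This is the standard chess-style Elo rule; the K-factor of 32 is a placeholder, not the value tennis Elo implementations actually use (those typically scale K by a player's match count):

```python
def expected_score(r_a, r_b):
    """Chance that a player rated r_a beats one rated r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner, r_loser, k=32):
    """Return updated (winner, loser) ratings after one match.
    Context (tournament, round) never enters the calculation."""
    gain = k * (1 - expected_score(r_winner, r_loser))
    return r_winner + gain, r_loser - gain

# Beating a top player is worth far more than beating a lower-rated one:
after_upset = elo_update(1800, 2100)    # underdog wins: big gain
after_routine = elo_update(2100, 1800)  # favorite wins: small gain
print(after_upset[0] - 1800, after_routine[0] - 2100)
```

Note the zero-sum structure: whatever the winner gains, the loser gives up, which is why a player who doesn't compete keeps exactly the same rating.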

Many fans object to this, on the reasonable assumption that context matters. It certainly seems like the Wimbledon final should count for more than, say, a Monte Carlo quarterfinal, even if the same player defeats the same opponent in both matches.

However, results matter for ranking systems, too. A good rating system will do two things: predict winners correctly more often than other systems, and give more accurate degrees of confidence for those predictions. (For example, in a sample of 100 matches in which the system gives one player a 70% chance of winning, the favorite should win about 70 times.) Elo, with its ignorance of context, predicts more winners and gives more accurate forecast certainties than any other system I’m aware of.
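That calibration property is easy to check given a set of forecasts. A toy sketch (the data here is invented purely for illustration):

```python
def bucket_accuracy(forecasts, lo=0.65, hi=0.75):
    """forecasts: (favorite_win_prob, favorite_won) pairs.
    Returns how often favorites in the [lo, hi) bucket actually won."""
    in_bucket = [won for prob, won in forecasts if lo <= prob < hi]
    return sum(in_bucket) / len(in_bucket)

# A well-calibrated system's ~70% favorites should win about 70% of the time:
sample = [(0.70, True)] * 7 + [(0.70, False)] * 3
print(bucket_accuracy(sample))  # -> 0.7
```

Repeating this across every probability bucket gives a full calibration curve for a rating system.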

For one thing, it wipes the floor with the official rankings. While it’s possible that tweaking Elo with context-aware details would improve the results even further, the improvement would likely be minor compared to the massive difference between Elo’s accuracy and that of the ATP and WTA algorithms.

Relying on a context-neutral system is perfect for tennis. Instead of altering the ranking system with every change in tournament format, we can always rate players the same way, using only their wins, losses, and opponents. In the case of the Olympics, it doesn’t matter which players participate, or what anyone thinks about the overall level of play. If you defeat a trio of top players, as Puig did, your rating skyrockets. Simple as that.

Two weeks ago, Puig was ranked 49th among WTA players by Elo–several places lower than her WTA ranking of 37. After beating Garbine Muguruza, Petra Kvitova, and Angelique Kerber, her Elo ranking jumped to 22nd. While it’s tough, intuitively, to know just how much weight to assign to such an outlier of a result, her Elo rating just outside the top 20 seems much more plausible than Puig’s effectively unchanged WTA ranking in the mid-30s.

Del Potro is another interesting test case, as his injury-riddled career presents difficulties for any rating system. According to the ATP algorithm, he is still outside the top 100 in the world–a common predicament for once-elite players who don’t immediately return to winning ways.

Elo has the opposite problem with players who miss a lot of time due to injury. When a player doesn’t compete, Elo assumes his level doesn’t change. That’s clearly wrong, and it has cast a lot of doubt over del Potro’s place in the Elo rankings this season. The more matches he plays, the more his rating will reflect his current ability, but his #10 position in the pre-Olympics Elo rankings seemed overly influenced by his former greatness.

(A more sophisticated Elo-based system, Glicko, was created in part to improve ratings for competitors with few recent results. I’ve tinkered with Glicko quite a bit in hopes of more accurately measuring the current levels of players like Delpo, but so far, the system as a whole hasn’t come close to matching Elo’s accuracy while also addressing the problem of long layoffs. For what it’s worth, Glicko ranked del Potro around #16 before the Olympics.)

Del Potro’s success in Rio boosted him three places in the Elo rankings, up to #7. While that still owes something to the lingering influence of his pre-injury results, it’s the first time his post-injury Elo rating comes close to passing the smell test.

You can see the full current lists elsewhere on the site: here are ATP Elo ratings and WTA Elo ratings.

Any rating system is only as good as the assumptions and data that go into it. The official ATP and WTA ranking systems have long suffered from improvised assumptions and conflicting goals. When an important event like the Olympics is excluded altogether, the data is incomplete as well. Now as much as ever, Elo shines as an alternative method. In addition to a more predictive algorithm, Elo can give Rio results the weight they deserve.

One of the most frequent complaints was that I was looking at the wrong data–surface speed should really be quantified by rally length, spin rate, or any number of other things. As is so often the case with tennis analytics, we have only so much choice in the matter. At the time, I was using all the data that existed.

Thanks to the Match Charting Project–with a particular tip of the cap to Edo Salvati–a lot more data is available now. We have shot-by-shot stats for 223 Grand Slam finals, including over three-fourths of Slam finals back to 1980. While we’ll never be able to measure anything like ITF Court Pace Rating for surfaces thirty years in the past, this shot-by-shot data allows us to get closer to the truth of the matter.

Sure enough, when we take a look at a simple (but until recently, unavailable) metric such as rally length, we find that the sport’s major surfaces are playing a lot more similarly than they used to. The first graph shows a five-year rolling average* for the rally length in the men’s finals of each Grand Slam from 1985 to 2015:

** since some matches are missing, the five-year rolling averages each represent the mean of anywhere from two to five Slam finals.*
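A rolling average over spotty data like this simply averages whatever falls inside the window. A sketch of the idea, with made-up rally lengths standing in for the real data:

```python
def rolling_avg(values_by_year, year, window=5):
    """Mean over [year - window + 1, year], using only years with data.
    Returns None if no finals in the window were charted."""
    window_vals = [values_by_year[y]
                   for y in range(year - window + 1, year + 1)
                   if y in values_by_year]
    return sum(window_vals) / len(window_vals) if window_vals else None

# Hypothetical rally lengths for one Slam's men's finals (1992 missing):
rallies = {1990: 2.1, 1991: 2.0, 1993: 2.4, 1994: 2.2}
print(rolling_avg(rallies, 1994))  # mean of the four charted finals
```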

Over the last decade and a half, the hard-court and grass-court Slams have crept steadily upward, with average rally lengths now similar to those at Roland Garros, traditionally the slowest of the four Grand Slam surfaces. The movement is most dramatic on the Wimbledon grass, which for many years saw an average rally length of a mere two shots.

For all the advantages of rally length and shot-by-shot data, there’s one massive limitation to this analysis: It doesn’t control for player. (My older analysis, with more limited data per match, but for many more matches, *was* able to control for player.) Pete Sampras contributed to 15 of our data points, but none on clay. Andres Gomez makes an appearance, but only at Roland Garros. Until we have shot-by-shot data on multiple surfaces for more of these players, there’s not much we can do to control for this severe case of selection bias.

So we’re left with something of a chicken-and-egg problem. Back in the early 90’s, when Roland Garros finals averaged almost six shots per point and Wimbledon finals averaged barely two shots per point, how much of the difference was due to the surface itself, and how much to the fact that certain players reached the final? The surface itself certainly doesn’t account for everything–in 1988, Mats Wilander and Ivan Lendl averaged over seven shots per point at the US Open, and in 2002, David Nalbandian and Lleyton Hewitt topped 5.5 shots per point at Wimbledon.

Still, outliers and selection bias aside, the rally length convergence we see in the graph above reflects a real phenomenon, even if it is amplified by the bias. After all, players who prefer short points win more matches on grass because grass lends itself to short points, and in an earlier era, “short points” meant something more extreme than it does today.

The same graph for women’s Grand Slam finals shows some convergence, though not as much:

Part of the reason that the convergence is more muted is that there’s less selection bias. The all-surface dominance of a few players–Chris Evert, Martina Navratilova, and Steffi Graf–means that, if only by historical accident, there is less bias than in men’s finals.

We still need a lot more data before we can make confident statements about surface speeds in 20th-century tennis. (You can help us get there by charting some matches!) But as we gather more information, we’re able to better illustrate how the surfaces have become less unique over the years.

We’ve learned over the last several years that human line-calling is pretty darn good, so players don’t turn to Hawkeye that often. At the Australian Open this year, men challenged fewer than nine calls per match–well under three per set or, put another way, fewer than 1.5 challenges per player per set. Even at that low rate of less than one challenge per thirty points, players are usually wrong: only about one in three challenged calls is overturned.

So while challenges are technically scarce, they aren’t *that* scarce. It’s a rare match in which a player challenges so often *and* is so frequently incorrect that he runs out. That said, it does happen, and while running out of challenges is a low-probability event, it is very high risk. Getting a call overturned at a crucial moment could be the difference between winning and losing a tight match. Most of the time, challenges seem worthless, but in certain circumstances, they can be very valuable indeed.

Just how valuable? That’s what I hope to figure out. To do so, we’ll need to estimate the frequency with which players miss opportunities to overturn line calls because they’ve exhausted their challenges, and we’ll need to calculate the potential impact of failing to overturn those calls.

A few notes before we get any further. The extra challenge awarded to each player at the beginning of a tiebreak would make the analysis much more daunting, so I’ve ignored both that extra challenge and points played in tiebreaks. I suspect it has little effect on the results. I’ve limited this analysis to the ATP, since men challenge more frequently and get calls overturned more often. And finally, this is a very complex, sprawling subject, so we often have to make simplifying assumptions or plug in educated guesses where data isn’t available.

**Running out of challenges**

The Australian Open data mentioned above is typical for ATP challenges. It is very similar to a subset of Match Charting Project data, suggesting that both challenge frequency and accuracy are about the same across the tour as they are in Melbourne.

Let’s assume that each player challenges a call roughly once every sixty points, or 1.7%. Given an approximate success rate of 30%, each player makes an incorrect challenge on about 1.2% of points and a correct challenge on 0.5% of points. Later on, I’ll introduce a different set of assumptions so we can see what different parameters do to the results.

Running out of challenges isn’t in itself a problem. We’re interested in scenarios in which a player not only exhausts his challenges but also misses an opportunity to overturn a call later in the set. These situations are much rarer than those in which a player might simply *want* to contest a call: the roughly 70% of hypothetical challenges that would have been wrong don’t concern us, since they wouldn’t have had any effect on the outcome of the match.

For each possible set length, from 24-point golden sets up to 93-point marathons, I ran a Monte Carlo simulation, using the assumptions given above, to determine the probability that, in a set of that length, a player would miss a chance to overturn a later call. As noted above, I’ve excluded tiebreaks from this analysis, so I counted only the number of points up to 6-6. I also excluded all “advantage” fifth sets.
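A minimal sketch of that per-set simulation might look like the following. The rates and the three-challenge rule come from the assumptions above; the function names and structure are my own illustration, not the actual code behind these results:

```python
import random

P_INCORRECT = 0.012  # assumed: player challenges a point and is wrong
P_CORRECT = 0.005    # assumed: player challenges and the call is overturned

def missed_overturn(set_length, challenges=3):
    """Simulate one player's challenges over a set of the given length.

    A correct challenge is retained; only an incorrect challenge is
    lost. Returns True if a would-be-successful challenge arises after
    all the challenges have been burned.
    """
    remaining = challenges
    for _ in range(set_length):
        r = random.random()
        if r < P_INCORRECT:
            # wrong challenge: burn one if any remain
            if remaining > 0:
                remaining -= 1
        elif r < P_INCORRECT + P_CORRECT:
            # a call that could have been overturned
            if remaining == 0:
                return True  # out of challenges: missed opportunity
    return False

def miss_rate(set_length, trials=10_000):
    """Fraction of simulated sets with a missed overturn chance."""
    return sum(missed_overturn(set_length) for _ in range(trials)) / trials
```

With these parameters, `miss_rate(57)` lands in the same neighborhood as the 0.27% figure for 57-point sets, and swapping in the “extreme case” rates discussed later (a 3.0% challenge frequency at 40% accuracy, i.e. 1.8% incorrect and 1.2% correct per point) raises the miss rate accordingly.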

For example, the most common set length in the data set is 57 points, which occurred 647 times. In 10,000 simulations, a player missed a chance to overturn a call 0.27% of the time. The longer the set, the more likely that challenge scarcity would become an issue. In 10,000 simulations of 85-point sets, players ran out of challenges more than three times as often. In 0.92% of the simulations, a player was unable to challenge a call that would have been overturned.

These simulations are simple, assuming that each point is identical. Of course, players are aware of the cap on challenges, so with only one challenge remaining, they may be less likely to contest a “probably correct” call, and they would be very unlikely to use a challenge to earn a few extra seconds of rest. Further, the fact that players sometimes use Hawkeye for a bit of a break suggests that what we might call “true” challenges–instances in which the player believes the original call was wrong–are a bit less frequent than the numbers we’re using. Ultimately, we can’t address these concerns without a more complex model *and* quite a bit of data we don’t have.

Back to the results. Taking every possible set length and the results of the simulation for each one, we find the average player is likely to run out of challenges and miss a chance to overturn a call roughly once every 320 sets, or 0.31% of the time. That’s not very often–for almost all players, it’s less than once per season.

**The impact of (not) overturning a call**

Just because such an outcome is infrequent doesn’t necessarily mean it isn’t important. If a low-probability event has a high enough impact when it does occur, it’s still worth planning for.

Toward the end of a set, when most of these missed chances would occur, points can be very important, like break point at 5-6. But other points are almost meaningless, like 40-0 in just about any game.

To estimate the impact of these missed opportunities, I ran another set of Monte Carlo simulations. (This gets a bit hairy–bear with me.) For each set length, for those cases when a player ran out of challenges, I found the average number of points at which he used his last challenge. Then, for each run of the simulation, I took a random set from the last few years of ATP data with the corresponding number of points, chose a random point between the average time that the challenges ran out and the end of the set, and measured the importance of that point.

To quantify the importance of the point, I calculated three probabilities from the perspective of the player who lost the point and, had he conserved his challenges, could have overturned it:

1. his odds of winning the set before that point was played
2. his odds of winning the set after that point was played (and not overturned)
3. his odds of winning the set had the call been overturned and the point awarded to him.

(To generate these probabilities, I used my win probability code posted here with the assumption that each player wins 65% of his service points. The model treats points as independent–that is, the outcome of one point does not depend on the outcomes of previous points–which is not precisely true, but it’s close, and it makes things immensely more straightforward. Alert readers will also note that I’ve ignored the possibility of yet *another* call that could be overturned. However, the extremely low probability of that event convinced me to avoid the additional complexity required to model it.)
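The linked win-probability code isn’t reproduced here, but its core idea–a recursion over scores, with points treated as independent–can be sketched at the game level. The 65% service-point figure is the assumption stated above; everything else in this snippet is my own illustrative reconstruction, and extending it to full sets is the same recursion applied over games:

```python
from functools import lru_cache

P_SERVE = 0.65  # assumed probability the server wins any given point

@lru_cache(maxsize=None)
def game_win_prob(s, r):
    """Probability the server wins the game from s-r (in points won),
    treating every point as independent and identically distributed."""
    p, q = P_SERVE, 1 - P_SERVE
    if s >= 4 and s - r >= 2:
        return 1.0
    if r >= 4 and r - s >= 2:
        return 0.0
    if s == 3 and r == 3:
        # deuce: closed form for a win-by-two race
        return p * p / (p * p + q * q)
    return p * game_win_prob(s + 1, r) + q * game_win_prob(s, r + 1)
```

With a 65% service-point probability, `game_win_prob(0, 0)` comes out to roughly 0.83–that is, about an 83% hold rate.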

Given these numbers, we can calculate the possible effects of the challenge he couldn’t make. The difference between (2) and (3) is the effect if the call would’ve been overturned and awarded to him. The difference between (1) and (2) is the effect if the point would have been replayed. This is essentially the same concept as “leverage index” in baseball analytics.

Again, we’re missing some data–I have no idea what percentage of overturned calls result in each of those two outcomes. For today, we’ll say it’s half and half, so to boil down the effect of the missed challenge to a single number, we’ll average those two differences.

For example, let’s say we’re at five games all, and the returner wins the first point of the 11th game. The server’s odds of winning the set have decreased from 50% (at 5-all, love-all) to 43.0%. If the server got the call overturned and was awarded the point, his odds would increase to 53.8%. Thus, the win probability impact of overturning the call and taking the point is 10.8%, while the effect of forcing a replay is 7.0%. For the purposes of this simulation, we’re averaging these two numbers and using 8.9% as the win probability impact of this missed opportunity to challenge.
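In code form, the averaging in this example is just a few lines (the three probabilities are the ones worked out above; the half-and-half split is the assumption stated earlier):

```python
p_before = 0.500      # odds of winning the set before the point
p_after = 0.430       # odds after losing the point (call not overturned)
p_overturned = 0.538  # odds had the call been overturned, point awarded

effect_if_awarded = p_overturned - p_after   # 0.108
effect_if_replayed = p_before - p_after      # 0.070

# half-and-half assumption about the two overturn outcomes:
impact = (effect_if_awarded + effect_if_replayed) / 2  # 0.089
```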

Back to the big picture. For each set length, I ran 1,000 simulations like the one described above and averaged the results. In short sets under 40 points, the win probability impact of the missed challenge is less than five percentage points. The longer the set, the bigger the effect: Long sets are typically closer and the points tend to be higher-leverage. In 85-point sets, for instance, the average effect of the missed challenge is a whopping 20 percentage points–meaning that if a player more skillfully conserved his challenges in five such sets, he’d be able to reverse the outcome of one of them.

On average, the win probability effect of the missed challenge is 12.4 percentage points. In other words, for every eight such missed opportunities a player avoided through better challenge management, he would win one additional set.

**The (small) big picture**

Let’s put together the two findings. Based on our assumptions, players run out of challenges and forgo a chance to overturn a later call about once every 320 sets. We now know that the cost of such a mistake is, on average, a 12.4 percentage point win probability hit.

Thus, poor challenge management costs an average player one set out of every 2,600. Given that many matches are played on clay or on courts without Hawkeye, that’s *maybe* once in a career. As long as the assumptions I’ve used are in the right ballpark, the effect isn’t even worth talking about. The mental cost of thinking more carefully before each challenge might well outweigh this exceedingly unlikely benefit.
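As a sanity check, the arithmetic combining the two findings is simply:

```python
miss_frequency = 1 / 320  # missed-overturn chances per set (from above)
avg_impact = 0.124        # average win-probability cost of each miss

# expected sets lost per set played, and its reciprocal
sets_per_lost_set = 1 / (miss_frequency * avg_impact)
# roughly 2,580 -- i.e., about one set out of every 2,600
```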

What if some of the assumptions are wrong? Anecdotally, it seems like challenges cluster in certain matches, because of poor officiating, bad lighting, extreme spin, precise hitting, or some combination of these. It seems possible that certain scenarios would arise in which a player would want to challenge much more frequently, and even though he might gain some accuracy, he would still increase the risk of running out.

I ran the same algorithms for what seems to me to be an extreme case, almost doubling the frequency with which each player challenges, to 3.0%, and somewhat increasing the accuracy rate, to 40%.

With these parameters, a player would run out of challenges and miss an opportunity to overturn a call about six times more often–once every 54 sets, or 1.8% of the time. The impact of each of these missed opportunities doesn’t change, so the overall result also increases by a factor of six. In this extreme case, poor challenge management would cost a player the set 0.28% of the time, or once every 356 sets. That’s a less outrageous number, representing perhaps one set every second year, but it also applies to an unusual set of circumstances that is very unlikely to follow a player to every match.

It seems clear that three challenges is enough. Even in long sets, players usually don’t run out, and when they do, it’s rare that they miss an opportunity that a fourth challenge would have afforded them. The effect of a missed chance can be enormous, but such chances are so infrequent that players would see little or no benefit from conserving their challenges more tactically.

In the last week, two developers have released GUIs to make charting easier and more engaging. When I first started the project, I put together an Excel spreadsheet that tracks all the user input and keeps score. I’ve used that spreadsheet for the hundreds of matches I’ve charted, but I recognize that it’s not the most intuitive system for some people.

The first new interface is thanks to Stephanie Kovalchik, who writes the tennis blog On the T. (And who has contributed to the MCP in the past.) Her GUI is entirely click-based, which means you don’t have to learn the various letter- and number-codes that are required for the traditional MCP spreadsheet.

While it’s web-based, it has some of the look and feel of a modern handheld app. It’s probably the easiest way to get started contributing to the project.

(Which reminds me, Brian Hrebec wrote an Android app for the project almost two years ago, and I haven’t given it the attention it deserves. It also makes getting started relatively easy, especially if you’d like to chart on an Android device.)

The second new interface is thanks to Charles Allen, of Tennis Visuals. Also web-based, his app requires that you use the same letter- and number-based codes as the original spreadsheet, but sweetens the deal with live visualizations that update after each point:

With four ways to chart matches and add to the Match Charting Project database, there are even fewer excuses not to contribute. If you’re still not convinced, I have even more reasons for you to consider. And if you’re ready to jump in, just click over to one of the new GUIs, or click here for my Quick Start guide.


Elo is a rating system originally designed for chess, and now used across a wide range of sports. It awards points based on *who* you beat, not when you beat them. That’s in direct contrast to the official ATP and WTA ranking systems, which award points based on tournament and round, regardless of whether you play a qualifier or the number one player in the world.
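For reference, the standard Elo update looks like the sketch below. The constants here are the conventional chess ones; the exact parameters behind these tennis ratings may differ:

```python
def elo_update(winner_rating, loser_rating, k=32):
    """Update two ratings after a match. k=32 is a conventional
    illustrative choice, not necessarily what these rankings use."""
    # expected score for the eventual winner, given the rating gap
    expected = 1 / (1 + 10 ** ((loser_rating - winner_rating) / 400))
    delta = k * (1 - expected)
    return winner_rating + delta, loser_rating - delta
```

Beating an equal-rated opponent moves each player 16 points, while upsetting a much higher-rated one earns nearly the full 32–which is exactly the “who you beat” property described above.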

As such, there are some notable differences between Elo-based rankings and the official lists. In addition to some rearrangement in the top ten, ATP Elo ratings place last week’s champion Roberto Bautista Agut up at #12 (compared to #17 in the official ranking) and Jack Sock at #13 (instead of #23).

The shuffling is even more dramatic on the women’s side. Belinda Bencic, still outside the top ten in the official WTA ranking, is up to #5 by Elo. After her Fed Cup heroics last weekend, Bencic is a single Elo point away from drawing equal with #4 Angelique Kerber.

These new Elo reports also show peaks for every player. That way, you can see how close each player is to his or her career best. You can also spot which players–like Bencic and Bautista Agut–are currently at their peak.

Like any rating system, Elo isn’t perfect. In this simple form, it doesn’t consider surface at all. I haven’t factored Challenger, ITF, or qualifying results into these calculations, either. Elo also doesn’t make any adjustments when a player misses considerable time to injury; he or she simply re-assumes the old rating upon returning.

That said, Elo is a more reliable way of comparing players and predicting match outcomes than the official ranking system. And now, you can check in on each player’s rating every week.
