How Much Is a Challenge Worth?

When the Hawkeye line-calling system is available, tennis players are given the right to make three incorrect challenges per set. As with any situation involving scarcity, there’s a choice to make: Take the chance of getting a call overturned, or make sure to keep your options open for later?

We’ve learned over the last several years that human line-calling is pretty darn good, so players don’t turn to Hawkeye that often. At the Australian Open this year, men challenged fewer than nine calls per match–well under three per set or, put another way, less than 1.5 challenges per player per set. Even at that low rate of fewer than once per thirty points, players are usually wrong. Only about one in three calls are overturned.

So while challenges are technically scarce, they aren’t that scarce. It’s a rare match in which a player challenges so often and is so frequently incorrect that he runs out. That said, it does happen, and while running out of challenges is a low-probability event, it’s a high-stakes one. Getting a call overturned at a crucial moment could be the difference between winning and losing a tight match. Most of the time, challenges seem worthless, but in certain circumstances, they can be very valuable indeed.

Just how valuable? That’s what I hope to figure out. To do so, we’ll need to estimate the frequency with which players miss opportunities to overturn line calls because they’ve exhausted their challenges, and we’ll need to calculate the potential impact of failing to overturn those calls.

A few notes before we get any further.  The extra challenge awarded to each player at the beginning of a tiebreak would make the analysis much more daunting, so I’ve ignored both that extra challenge and points played in tiebreaks. I suspect it has little effect on the results. I’ve limited this analysis to the ATP, since men challenge more frequently and get calls overturned more often. And finally, this is a very complex, sprawling subject, so we often have to make simplifying assumptions or plug in educated guesses where data isn’t available.

Running out of challenges

The Australian Open data mentioned above is typical for ATP challenges. It is very similar to a subset of Match Charting Project data, suggesting that both challenge frequency and accuracy are about the same across the tour as they are in Melbourne.

Let’s assume that each player challenges a call roughly once every sixty points, or 1.7%. Given an approximate success rate of 30%, each player makes an incorrect challenge on about 1.2% of points and a correct challenge on 0.5% of points. Later on, I’ll introduce a different set of assumptions so we can see what different parameters do to the results.

Running out of challenges isn’t in itself a problem. We’re interested in scenarios when a player not only exhausts his challenges, but when he also misses an opportunity to overturn a call later in the set. These situations are much less common than all of those in which a player might want to contest a call, but we don’t care about the 70% of those challenges that would be wrong, as they wouldn’t have any effect on the outcome of the match.

For each possible set length, from 24-point golden sets up to 93-point marathons, I ran a Monte Carlo simulation, using the assumptions given above, to determine the probability that, in a set of that length, a player would miss a chance to overturn a later call. As noted above, I’ve excluded tiebreaks from this analysis, so I counted only the number of points up to 6-6. I also excluded all “advantage” fifth sets.

For example, the most common set length in the data set is 57 points, which occurred 647 times. In 10,000 simulations, a player missed a chance to overturn a call 0.27% of the time. The longer the set, the more likely that challenge scarcity would become an issue. In 10,000 simulations of 85-point sets, players ran out of challenges more than three times as often. In 0.92% of the simulations, a player was unable to challenge a call that would have been overturned.
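For readers who want to experiment, here is a minimal sketch of this kind of simulation in Python. It is not the code behind the numbers above. The function name and its parameters are my own illustrative choices, and it bakes in the simplifications already described: every point is identical, an incorrect challenge happens on 1.2% of points, a would-be-correct challenge on 0.5%, and a player is out after three incorrect challenges.

```python
import random


def missed_chance_rate(set_points, p_wrong=0.012, p_right=0.005,
                       max_wrong=3, trials=10_000, seed=0):
    """Estimate how often one player exhausts his challenges in a set of
    `set_points` points and then faces a call he could have overturned.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        wrong = 0  # incorrect challenges used so far in this set
        for _ in range(set_points):
            u = rng.random()
            if wrong < max_wrong:
                # Challenges remain: only incorrect ones use them up.
                if u < p_wrong:
                    wrong += 1
            elif p_wrong <= u < p_wrong + p_right:
                # Out of challenges, and this call would have been overturned.
                missed += 1
                break
    return missed / trials


rate_57 = missed_chance_rate(57)  # the most common set length
rate_85 = missed_chance_rate(85)  # a long set
```

With only 10,000 trials and an event this rare, the estimates will bounce around from seed to seed, so treat any single run as a rough guide rather than a precise figure.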

These simulations are simple, assuming that each point is identical. Of course, players are aware of the cap on challenges, so with only one challenge remaining, they may be less likely to contest a “probably correct” call, and they would be very unlikely to use a challenge to earn a few extra seconds of rest. Further, the fact that players sometimes use Hawkeye for a bit of a break suggests that what we might call “true” challenges–instances in which the player believes the original call was wrong–are a bit less frequent than the numbers we’re using. Ultimately, we can’t address these concerns without a more complex model and quite a bit of data we don’t have.

Back to the results. Taking every possible set length and the results of the simulation for each one, we find the average player is likely to run out of challenges and miss a chance to overturn a call roughly once every 320 sets, or 0.31% of the time. That’s not very often–for almost all players, it’s less than once per season.

The impact of (not) overturning a call

Just because such an outcome is infrequent doesn’t necessarily mean it isn’t important. If a low-probability event has a high enough impact when it does occur, it’s still worth planning for.

Toward the end of a set, when most of these missed chances would occur, points can be very important, like break point at 5-6. But other points are almost meaningless, like 40-0 in just about any game.

To estimate the impact of these missed opportunities, I ran another set of Monte Carlo simulations. (This gets a bit hairy–bear with me.) For each set length, for those cases when a player ran out of challenges, I found the average number of points at which he used his last challenge. Then, for each run of the simulation, I took a random set from the last few years of ATP data with the corresponding number of points, chose a random point between the average time that the challenges ran out and the end of the set, and measured the importance of that point.

To quantify the importance of the point, I calculated three probabilities from the perspective of the player who lost the point and, had he conserved his challenges, could have overturned it:

  1. his odds of winning the set before that point was played
  2. his odds of winning the set after that point was played (and not overturned)
  3. his odds of winning the set had the call been overturned and the point awarded to him.

(To generate these probabilities, I used my win probability code posted here with the assumption that each player wins 65% of his service points. The model treats points as independent–that is, the outcome of one point does not depend on the outcomes of previous points–which is not precisely true, but it’s close, and it makes things immensely more straightforward. Alert readers will also note that I’ve ignored the possibility of yet another call that could be overturned. However, the extremely low probability of that event convinced me to avoid the additional complexity required to model it.)

Given these numbers, we can calculate the possible effects of the challenge he couldn’t make. The difference between (2) and (3) is the effect if the call would’ve been overturned and awarded to him. The difference between (1) and (2) is the effect if the point would have been replayed. This is essentially the same concept as “leverage index” in baseball analytics.

Again, we’re missing some data–I have no idea what percentage of overturned calls result in each of those two outcomes. For today, we’ll say it’s half and half, so to boil down the effect of the missed challenge to a single number, we’ll average those two differences.

For example, let’s say we’re at five games all, and the returner wins the first point of the 11th game. The server’s odds of winning the set have decreased from 50% (at 5-all, love-all) to 43.0%. If the server got the call overturned and was awarded the point, his odds would increase to 53.8%. Thus, the win probability impact of overturning the call and taking the point is 10.8 percentage points, while the effect of forcing a replay is 7.0 percentage points. For the purposes of this simulation, we’re averaging these two numbers and using 8.9 percentage points as the win probability impact of this missed opportunity to challenge.
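The 5-all example can be reproduced with a small points-are-independent model. What follows is a sketch, not the win probability code mentioned earlier. The function names are mine, and it assumes both players win 65% of service points, that a tiebreak follows 6-6, and that the tiebreak is a coin flip between two equal players.

```python
from functools import lru_cache

P_SERVE = 0.65  # assumed service-point win rate for both players


@lru_cache(maxsize=None)
def hold_prob(s, r, p=P_SERVE):
    """Chance the server wins the game from s server points, r returner points."""
    if s >= 4 and s - r >= 2:
        return 1.0
    if r >= 4 and r - s >= 2:
        return 0.0
    if s >= 3 and s == r:
        # Deuce: closed form for winning two points in a row before the opponent.
        return p * p / (p * p + (1 - p) ** 2)
    return p * hold_prob(s + 1, r, p) + (1 - p) * hold_prob(s, r + 1, p)


def set_prob_from_five_all(s, r, p=P_SERVE):
    """Server's chance of winning the set at 5-5, with the game score at s-r."""
    h = hold_prob(0, 0, p)       # either player's chance of holding a fresh game
    x = hold_prob(s, r, p)       # chance of holding the current game
    tiebreak = 0.5               # assumption: equal players, coin-flip tiebreak
    after_hold = (1 - h) + h * tiebreak   # up 6-5: break to win, else tiebreak
    after_break = (1 - h) * tiebreak      # down 5-6: must break, then tiebreak
    return x * after_hold + (1 - x) * after_break


before = set_prob_from_five_all(0, 0)      # (1) before the point: 50%
after = set_prob_from_five_all(0, 1)       # (2) point lost: roughly 43.0%
overturned = set_prob_from_five_all(1, 0)  # (3) point awarded: roughly 53.8%

# Average of the "awarded" effect and the "replay" effect: roughly 8.9 points.
impact = ((overturned - after) + (before - after)) / 2
```

The half-and-half weighting in the last line is the same educated guess described above, so changing that split is a one-line tweak.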

Back to the big picture. For each set length, I ran 1,000 simulations like what I’ve described above and averaged the results. In short sets under 40 points, the win probability impact of the missed challenge is less than five percentage points. The longer the set, the bigger the effect: Long sets are typically closer and the points tend to be higher-leverage. In 85-point sets, for instance, the average effect of the missed challenge is a whopping 20 percentage points–meaning that if a player more skillfully conserved his challenges in five such sets, he’d be able to reverse the outcome of one of them.

On average, the win probability effect of the missed challenge is 12.4 percentage points. In other words, better challenge management would win a player one more set for every eight times he didn’t lose such an opportunity by squandering his challenges.

The (small) big picture

Let’s put together the two findings. Based on our assumptions, players run out of challenges and forgo a chance to overturn a later call about once every 320 sets. We now know that the cost of such a mistake is, on average, a 12.4 percentage point win probability hit.

Thus, challenge management costs an average player one set out of every 2600. Given that many matches are played on clay or on courts without Hawkeye, that’s maybe once in a career. As long as the assumptions I’ve used are in the right ballpark, the effect isn’t even worth talking about. The mental cost of a player thinking more carefully before challenging might be greater than this exceedingly unlikely benefit.
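The arithmetic behind that figure is simple enough to write out. This is just a sketch that combines the two baseline estimates from the text; the variable names are mine.

```python
sets_per_missed_chance = 320  # a missed chance roughly once every 320 sets
avg_impact = 0.124            # average win probability hit per missed chance

# Expected sets lost per set played, and its reciprocal.
cost_per_set = avg_impact / sets_per_missed_chance
sets_per_lost_set = 1 / cost_per_set  # about 2,580, which rounds to 2,600
```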

What if some of the assumptions are wrong? Anecdotally, it seems like challenges cluster in certain matches, because of poor officiating, bad lighting, extreme spin, precise hitting, or some combination of these. It seems possible that certain scenarios would arise in which a player would want to challenge much more frequently, and even though he might gain some accuracy, he would still increase the risk.

I ran the same algorithms for what seems to me to be an extreme case, almost doubling the frequency with which each player challenges, to 3.0%, and somewhat increasing the accuracy rate, to 40%.

With these parameters, a player would run out of challenges and miss an opportunity to overturn a call about six times more often–once every 54 sets, or 1.8% of the time. The impact of each of these missed opportunities doesn’t change, so the overall result also increases by a factor of six. In this extreme case, poor challenge management would cost a player the set 0.28% of the time, or once every 356 sets. That’s a less outrageous number, representing perhaps one set every second year, but it also applies to unusual sets of circumstances which are very unlikely to follow a player to every match.

It seems clear that three challenges is enough. Even in long sets, players usually don’t run out, and when they do, it’s rare that they miss an opportunity that a fourth challenge would have afforded them. The effect of a missed chance can be enormous, but such chances are so infrequent that players would see little or no benefit from tactically conserving challenges.

What Happens After an Unsuccessful First Serve Challenge?

A lot of first serves miss, so every player has a well-established routine between the first and second serve. So much so that, traditionally, if something disrupts that routine, the receiver may grant the server another first serve.

Hawkeye has changed all that. If the server doubts the line call, he or she may challenge it. That results in a lengthy wait, usually some crowd noise, and a general wreckage of that between-serves routine.

The conventional wisdom seems to be that the long pause is harmful to the server: that if the challenge fails, the server is less likely to put the second serve in the box. And if the second serve does go in, it’s weaker than average, so the server is less likely to win the point.

My analysis of over 200 first-serve challenges casts doubt on the conventional wisdom. It’s another triumph for the null hypothesis, the only force in tennis as dominant as Novak Djokovic.

As I’ve charted matches for the Match Charting Project, I’ve noted each challenge, the type of challenge, and whether it was successful. I’ve accumulated 116 ATP and 89 WTA instances in which a player unsuccessfully challenged the call on his or her own first serve. For each of these challenges, I also calculated some match-level stats for that server: how often s/he made the second serve, and how often s/he won second serve points.

Of the 116 unsuccessful ATP challenges, players made 106 of their second serves. Based on their overall rates in those matches, we’d expect them to make 106.6 of them. They won exactly half–58–of those points, and their performance in those matches suggests that they “should” have won 58.2 of them.

In other words, players are recovering from the disruption and performing almost exactly as they normally do.

For WTAers, it’s a similar story. Players made 77 of their 89 second serves. If they landed second serves at the same rate they did in the rest of those matches, they’d have made 77.1. They won 38 of the 89 points, compared to an expected 40 points. That last difference, of five percent, is the only one that is more than a rounding error. Even if the effect is real–which is doubtful, given the conflicting ATP number and the small sample size–it’s a small one.
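One plausible way to build an “expected” total like the ones above is to add up each server’s match-level rate, one term per failed challenge, and compare the sum to the actual count. The sketch below assumes that approach. The function name is hypothetical and the four records are made-up placeholders to show the mechanics, not Match Charting Project data.

```python
def expected_vs_actual(challenges):
    """challenges: list of (outcome, match_rate) pairs, where outcome is 1 if
    the second serve after the failed challenge went in (else 0) and
    match_rate is that server's second-serve-in rate for the whole match.
    """
    actual = sum(outcome for outcome, _ in challenges)
    expected = sum(rate for _, rate in challenges)
    return actual, expected


# Placeholder records for illustration only (not real data).
toy_challenges = [(1, 0.92), (1, 0.88), (0, 0.95), (1, 0.90)]
actual, expected = expected_vs_actual(toy_challenges)
```

The same function works for points won: swap in 1 or 0 for whether the server won the point, and the server’s second-serve points won rate for the match.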

Of course, the potential benefit of challenging the call on your first serve is big: If you’re right, you either win the point or get another first serve. Of the challenges I’ve tracked, men were successful 38% of the time on their first serves, and women were right 32% of the time.

There’s no evidence here that players are harmed by appealing to Hawkeye on their own first serves. Apart from the small risk of running out of challenges, it’s all upside. Tennis pros adore routine, but in this case, they perform just as well when the routine is disrupted.

There Is No Analytics Revolution In Tennis

I’m sure you’ve heard about the trend. First, statistics overhauled baseball, and teams in every major sport now employ quants to search out that extra edge. Tennis has lagged behind the others, but with the help of big data, we’re on the cusp of a whole new era.

That’s the story, anyway. Yesterday brought us another example.

What happened in baseball is, quite simply, never going to happen in tennis.

To oversimplify a bit, the “Moneyball revolution” refers to front offices using analytics to identify underrated and underpriced players. To a lesser extent, it refers to deploying those players in a smarter way–say, rearranging the batting order or attempting fewer stolen bases.

In tennis, there are no front offices. Players aren’t paid salaries by teams. And there are no managers to decide how best to use their players.

In short: There are no organizations with both the incentives and the resources to analyze data.

Of course, when people get breathless about all the raw data floating around in tennis, that isn’t what they’re talking about. (No one really thinks Hawkeye data is going to revolutionize, say, the World Team Tennis draft.) Instead, they are implying that the data can be analyzed in such a way as to be actionable for players.

That’s an admirable objective. In theory, Kevin Anderson’s coach could look at all the data from all the matches between Anderson and Tomas Berdych and identify which tactics worked, which didn’t, and make recommendations accordingly. Of course, Kevin’s coach is already watching all those matches, taking notes, reviewing video, and presumably making recommendations, so if big data is going to change the game, it needs to somehow offer coaches demonstrably better insights.

With all the cameras pointed at tennis’s show courts, that’s certainly possible. The closest analogue in baseball is the pitch f/x system, which tracks the speed, location, and movement of every pitch. Some pitchers have been able to use pitch f/x data to analyze and improve upon their own performance. The same could eventually happen in tennis. But there are systemic reasons why it hasn’t yet, and those root causes are unlikely to disappear anytime soon.

What needs to change

Hawkeye cameras are aimed at a lot of courts and have the capability of collecting an enormous amount of data. That’s how broadcasts are able to bring you stats like average net clearance and meters run. Those cameras also help generate graphics like those showing where all of a player’s serves landed.

After a match is over, with no calls left to be overturned and no broadcast needs likely to arise, what happens to the data? For all practical purposes, it gets stashed in the attic and forgotten. (Here’s a more thorough explanation.) Contrast that to Major League Baseball, which makes all pitch f/x data available immediately–to the public, for free–and has archived it indefinitely.

If tennis is to see any meaningful analytical breakthroughs, Hawkeye data needs to be aggregated in a single database. Results from one match are sometimes interesting (hey look, Andy’s net clearance is 15% greater than Roger’s!), but if we’re always looking at one match, or one tournament, at a time, we’ll never learn which of these Hawkeye-derived statistics matter, or how much.

IBM, the collector of much of this information, may already maintain some version of that database. But the results are jaw-droppingly uninspiring. On broadcasts, we get the same old stats and graphics. When IBM has ventured into predicting match outcomes, their “millions of data points” are outperformed by my much simpler model.

IBM is the one organization in the sport with the resources to do the kind of analysis that will transform tennis. But they have no incentive to do so. To IBM (and now SAP, in the women’s game), tennis is a public relations opportunity, one that allows them to brand tournament websites and on-screen graphics with their logo. (Not to mention those suspiciously pro-IBM trend pieces linked to above.)

Players might eventually benefit from data-based insights, but only a tiny fraction of them could afford to hire even a single analyst. (Hi Simona! Text me anytime!)

Once again, we have to turn to baseball for a precedent. Even in that immense sport, with its billion-dollar franchises, it was amateurs–outsiders–who did the work that brought about the analytics revolution. Even now, with teams aggressively hiring promising talent from outside the game, many of the most profitable insights still come from independent researchers. If MLB made its data as inaccessible as tennis does, that trend would’ve ground to a halt long ago.

Nice as it is to dream about a better world of tennis data, we’re unlikely to see it anytime soon. Tennis doesn’t have a commissioner, so there’s no one to appoint a data czar, let alone anyone who could convince the alphabet soup of the ATP, WTA, ITF, IBM, SAP, and Hawkeye to aggregate their data in any meaningful way.

Until that happens, and until the data is publicly available, there will be no analytics revolution in tennis. We’ll continue to get what we have now: the occasional Hawkeye stat, free of context, illustrating the same sort of analysis we’ve been hearing for decades.

Halep’s Beatdown, Challenges by Gender, Djokovic Unthreatened

Thanks to the dominance of players like Serena Williams and Victoria Azarenka, it’s not much of a surprise to see a scoreline like 6-1 6-0 in the first week of a Grand Slam.  But when an upset comes with scores like that, we should sit up and take notice.

That’s what Simona Halep did to Maria Kirilenko, and trust me, it wasn’t any closer than the score suggests.  Halep has a deceptively big game, content to counterpunch but always looking for an opening for what can be a monster backhand.  I charted her match yesterday (along with Vika’s third-rounder against Alize Cornet), so look for some detailed stats from those matches later today.

Even before the first matches were played, it was clear that the Romanian landed in the right part of the draw, in a quarter free of Serena, Vika, Agnieszka Radwanska, and Na Li.  With the early upsets of Sara Errani and Caroline Wozniacki, the two highest-ranked women in her quarter, Halep’s position looks even better.

Strangely enough, though, her next two opponents are women she might prefer not to face.  Flavia Pennetta, who will play her in the round of 16, was the last woman outside of the top 20 to beat Halep.  (Granted, Simona retired in the third set with a lower back injury.)  Her likely quarterfinal opponent, Roberta Vinci, is a more interesting case.  The pair have already faced off three times this year, and on the first of those occasions, Vinci dished out Halep’s worst loss of the year, a 6-0 6-3 drubbing on the carpet in Paris.  Since then, Simona has won two equally lopsided matches, on both clay and grass.

The way Halep was playing yesterday, though, we can safely pencil her into the semifinals, regardless of whom she draws in the meantime.

Did you know that, at Grand Slams, men use the challenge system more than women do?

At the Open so far this year, men have made 7.52 challenges per match, while women have made 3.38.  The same pattern held at the Australian Open and Wimbledon this year.  In general, there are about twice as many challenges in a men’s Slam match as in a women’s Slam match.

Of course, a big part of that discrepancy arises because men play best-of-5 matches while women play best-of-3.  The more sets, the more points, and the more points, the more potential reasons to challenge.

Still, the structural difference doesn’t entirely account for the gap.  For instance, there were roughly 90 men’s matches and 90 women’s matches played on Hawkeye courts in Melbourne this year, and the men’s matches averaged about 60% more points.  Men challenged calls once every 32 points, while women challenged once every 37 points.

That’s not quite as dramatic as the 2:1 ratio we started with, but it’s still notable, and it has remained consistent throughout multiple slams this year.
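As a back-of-the-envelope check, the Melbourne figures can be split into a match-length effect and a per-point effect. This sketch uses only the rounded numbers quoted above and assumes, as the text does, roughly equal numbers of men’s and women’s matches; the variable names are mine.

```python
points_ratio = 1.6               # men's matches had about 60% more points
men_points_per_challenge = 32    # men: one challenge every 32 points
women_points_per_challenge = 37  # women: one challenge every 37 points

# How much more often men challenge on a per-point basis (about 1.16x).
per_point_ratio = women_points_per_challenge / men_points_per_challenge

# Combined with longer matches, the per-match gap (about 1.85x).
per_match_ratio = points_ratio * per_point_ratio
```

So most of the per-match gap comes from match length, with a per-point difference of roughly 16% left over.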

One possibility is that men challenge more because, on average, they hit the ball harder, particularly on the serve.  The harder the shot, the tougher it is for everyone to see exactly where it lands, and the greater the likelihood of disagreement.  To corroborate, it would be interesting to know whether chair umpires are more or less likely to overrule in men’s matches.

Yesterday I noted that Djokovic had a remarkably easy path to the quarterfinals.  If Marcel Granollers beats Tim Smyczek, the Spaniard will be Novak’s highest-ranked opponent en route to the quarters.  (That’s assuming Djokovic beats 95th-ranked Joao Sousa today.)

If Granollers advances, Djokovic’s first four opponents will have the following rankings: 112, 87, 95, and 43.  In 24 previous Grand Slam quarterfinal runs, Novak has needed to beat someone in the top 40 in 20 of them, and someone in the top 30 in 17.

If, as all patriotic Americans fervently hope, Smyczek wins today, we’ll venture into more extreme territory.  In that case, Djokovic’s highest-ranked opponent will have been 87th-ranked Benjamin Becker.  One suspects that a fair number of ATP players could advance to the quarterfinals given this draw.

In the Smyczek scenario, Djokovic will have faced an easier path than Roger Federer ever has in his 36 Slam quarterfinal showings.  As Carl Bialik reported during last year’s French Open, Roger’s first four rounds at Roland Garros were the easiest of his career–his highest-ranked opponent was #78 Tobias Kamke.

Federer’s experience leaves it unclear whether such a friendly draw is a good thing.  In the quarterfinals of that tournament, he lost his first two sets to Juan Martin del Potro before charging back for the five-set victory.  Perhaps we can expect such a thriller from Djokovic and Tommy Haas next week.

Want to know more about Tim Smyczek?  Here’s a good place to start.

Here’s another excellent win probability graph from Betting Market Analytics, this time covering the five-setter between Hewitt and del Potro.