How Much Is a Challenge Worth?

When the Hawkeye line-calling system is available, tennis players are given the right to make three incorrect challenges per set. As with any situation involving scarcity, there’s a choice to make: Take the chance of getting a call overturned, or make sure to keep your options open for later?

We’ve learned over the last several years that human line-calling is pretty darn good, so players don’t turn to Hawkeye that often. At the Australian Open this year, men challenged fewer than nine calls per match–well under three per set or, put another way, less than 1.5 challenges per player per set. Even at that low rate of fewer than once per thirty points, players are usually wrong. Only about one in three calls are overturned.

So while challenges are technically scarce, they aren’t that scarce. It’s a rare match in which a player challenges so often and is so frequently incorrect that he runs out. That said, it does happen, and while running out of challenges is low-probability, it’s very high risk. Getting a call overturned at a crucial moment could be the difference between winning and losing a tight match. Most of the time, challenges seem worthless, but in certain circumstances, they can be very valuable indeed.

Just how valuable? That’s what I hope to figure out. To do so, we’ll need to estimate the frequency with which players miss opportunities to overturn line calls because they’ve exhausted their challenges, and we’ll need to calculate the potential impact of failing to overturn those calls.

A few notes before we get any further. The extra challenge awarded to each player at the beginning of a tiebreak would make the analysis much more daunting, so I’ve ignored both that extra challenge and points played in tiebreaks. I suspect it has little effect on the results. I’ve limited this analysis to the ATP, since men challenge more frequently and get calls overturned more often. And finally, this is a very complex, sprawling subject, so we often have to make simplifying assumptions or plug in educated guesses where data isn’t available.

Running out of challenges

The Australian Open data mentioned above is typical for ATP challenges. It is very similar to a subset of Match Charting Project data, suggesting that both challenge frequency and accuracy are about the same across the tour as they are in Melbourne.

Let’s assume that each player challenges a call roughly once every sixty points, or 1.7%. Given an approximate success rate of 30%, each player makes an incorrect challenge on about 1.2% of points and a correct challenge on 0.5% of points. Later on, I’ll introduce a different set of assumptions so we can see what different parameters do to the results.

Running out of challenges isn’t in itself a problem. We’re interested in scenarios when a player not only exhausts his challenges, but when he also misses an opportunity to overturn a call later in the set. These situations are much less common than all of those in which a player might want to contest a call, but we don’t care about the 70% of those challenges that would be wrong, as they wouldn’t have any effect on the outcome of the match.

For each possible set length, from 24-point golden sets up to 93-point marathons, I ran a Monte Carlo simulation, using the assumptions given above, to determine the probability that, in a set of that length, a player would miss a chance to overturn a later call. As noted above, I’ve excluded tiebreaks from this analysis, so I counted only the number of points up to 6-6. I also excluded all “advantage” fifth sets.

For example, the most common set length in the data set is 57 points, which occurred 647 times. In 10,000 simulations, a player missed a chance to overturn a call 0.27% of the time. The longer the set, the more likely that challenge scarcity would become an issue. In 10,000 simulations of 85-point sets, players ran out of challenges more than three times as often. In 0.92% of the simulations, a player was unable to challenge a call that would have been overturned.
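
Here is a minimal sketch of this kind of simulation, using the assumptions given above: every point is independent, a player makes an incorrect challenge on 1.2% of points, a call that would be overturned comes up on 0.5% of points, and three incorrect challenges are allowed per set. The function name and structure are illustrative and not the code behind the numbers quoted here, so the exact percentages will vary from run to run.

```python
import random

def missed_overturn_rate(set_points, n_sims=10_000, p_wrong=0.012,
                         p_right=0.005, max_wrong=3, seed=None):
    """Estimate how often a player exhausts his challenges and then
    faces a call he could have overturned, in a set of `set_points` points."""
    rng = random.Random(seed)
    missed = 0
    for _sim in range(n_sims):
        wrong_left = max_wrong
        for _point in range(set_points):
            r = rng.random()
            if r < p_wrong:
                # The player would make a challenge that fails.
                if wrong_left > 0:
                    wrong_left -= 1
            elif r < p_wrong + p_right:
                # A call that a challenge would overturn.
                # Correct challenges are free, so only count it
                # when no challenges remain.
                if wrong_left == 0:
                    missed += 1
                    break  # count at most one missed chance per set
    return missed / n_sims

if __name__ == "__main__":
    for length in (57, 85):
        print(length, missed_overturn_rate(length, seed=1))
```

Stopping at the first missed chance mirrors the simplification of ignoring a second overturnable call in the same set.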

These simulations are simple, assuming that each point is identical. Of course, players are aware of the cap on challenges, so with only one challenge remaining, they may be less likely to contest a “probably correct” call, and they would be very unlikely to use a challenge to earn a few extra seconds of rest. Further, the fact that players sometimes use Hawkeye for a bit of a break suggests that what we might call “true” challenges–instances in which the player believes the original call was wrong–are a bit less frequent than the numbers we’re using. Ultimately, we can’t address these concerns without a more complex model and quite a bit of data we don’t have.

Back to the results. Taking every possible set length and the results of the simulation for each one, we find the average player is likely to run out of challenges and miss a chance to overturn a call roughly once every 320 sets, or 0.31% of the time. That’s not very often–for almost all players, it’s less than once per season.

The impact of (not) overturning a call

Just because such an outcome is infrequent doesn’t necessarily mean it isn’t important. If a low-probability event has a high enough impact when it does occur, it’s still worth planning for.

Toward the end of a set, when most of these missed chances would occur, points can be very important, like break point at 5-6. But other points are almost meaningless, like 40-0 in just about any game.

To estimate the impact of these missed opportunities, I ran another set of Monte Carlo simulations. (This gets a bit hairy–bear with me.) For each set length, for those cases when a player ran out of challenges, I found the average number of points at which he used his last challenge. Then, for each run of the simulation, I took a random set from the last few years of ATP data with the corresponding number of points, chose a random point between the average time that the challenges ran out and the end of the set, and measured the importance of that point.

To quantify the importance of the point, I calculated three probabilities from the perspective of the player who lost the point and, had he conserved his challenges, could have overturned it:

  1. his odds of winning the set before that point was played
  2. his odds of winning the set after that point was played (and not overturned)
  3. his odds of winning the set had the call been overturned and the point awarded to him.

(To generate these probabilities, I used my win probability code posted here with the assumption that each player wins 65% of his service points. The model treats points as independent–that is, the outcome of one point does not depend on the outcomes of previous points–which is not precisely true, but it’s close, and it makes things immensely more straightforward. Alert readers will also note that I’ve ignored the possibility of yet another call that could be overturned. However, the extremely low probability of that event convinced me to avoid the additional complexity required to model it.)

Given these numbers, we can calculate the possible effects of the challenge he couldn’t make. The difference between (2) and (3) is the effect if the call would’ve been overturned and awarded to him. The difference between (1) and (2) is the effect if the point would have been replayed. This is essentially the same concept as “leverage index” in baseball analytics.

Again, we’re missing some data–I have no idea what percentage of overturned calls result in each of those two outcomes. For today, we’ll say it’s half and half, so to boil down the effect of the missed challenge to a single number, we’ll average those two differences.

For example, let’s say we’re at five games all, and the returner wins the first point of the 11th game. The server’s odds of winning the set have decreased from 50% (at 5-all, love-all) to 43.0%. If the server got the call overturned and was awarded the point, his odds would increase to 53.8%. Thus, the win probability impact of overturning the call and taking the point is 10.8%, while the effect of forcing a replay is 7.0%. For the purposes of this simulation, we’re averaging these two numbers and using 8.9% as the win probability impact of this missed opportunity to challenge.
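
The 5-all example can be checked with a small sketch of a point-level model. It assumes, as above, that both players win 65% of their service points and that points are independent. It also treats a tiebreak as a coin flip, which is my simplification for two identical servers. The function names are hypothetical, and this is not the win probability code referred to earlier.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def game_win_prob(server_pts, returner_pts, p):
    """Probability the server wins the game from the given point score
    (0, 1, 2, 3 = love, 15, 30, 40), winning each point with probability p."""
    q = 1.0 - p
    if server_pts >= 3 and returner_pts >= 3:
        deuce = p * p / (p * p + q * q)
        if server_pts == returner_pts:
            return deuce
        # Advantage server, or advantage returner.
        return p + q * deuce if server_pts > returner_pts else p * deuce
    if server_pts == 4:
        return 1.0
    if returner_pts == 4:
        return 0.0
    return (p * game_win_prob(server_pts + 1, returner_pts, p)
            + q * game_win_prob(server_pts, returner_pts + 1, p))

def set_win_prob_at_5_5(server_pts, returner_pts, p=0.65, tiebreak=0.5):
    """Server's chance of winning the set at 5-5, from the given point score."""
    g = game_win_prob(server_pts, returner_pts, p)  # hold the current game
    h = game_win_prob(0, 0, p)                      # opponent holds the next game
    up_6_5 = (1.0 - h) + h * tiebreak    # break to win, else tiebreak
    down_5_6 = (1.0 - h) * tiebreak      # must break just to reach a tiebreak
    return g * up_6_5 + (1.0 - g) * down_5_6

before = set_win_prob_at_5_5(0, 0)      # about 0.500
lost_point = set_win_prob_at_5_5(0, 1)  # about 0.430
overturned = set_win_prob_at_5_5(1, 0)  # about 0.538

# Average the "point awarded" and "point replayed" effects.
impact = ((overturned - lost_point) + (before - lost_point)) / 2
print(round(impact, 3))  # about 0.089
```

Under these assumptions the sketch lands on the same 43.0%, 53.8%, and 8.9% figures as the example.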

Back to the big picture. For each set length, I ran 1,000 simulations like what I’ve described above and averaged the results. In short sets under 40 points, the win probability impact of the missed challenge is less than five percentage points. The longer the set, the bigger the effect: Long sets are typically closer and the points tend to be higher-leverage. In 85-point sets, for instance, the average effect of the missed challenge is a whopping 20 percentage points–meaning that if a player more skillfully conserved his challenges in five such sets, he’d be able to reverse the outcome of one of them.

On average, the win probability effect of the missed challenge is 12.4 percentage points. In other words, better challenge management would win a player one more set for every eight times he didn’t lose such an opportunity by squandering his challenges.

The (small) big picture

Let’s put together the two findings. Based on our assumptions, players run out of challenges and forgo a chance to overturn a later call about once every 320 sets. We now know that the cost of such a mistake is, on average, a 12.4 percentage point win probability hit.

Thus, challenge management costs an average player one set out of every 2600. Given that many matches are played on clay or on courts without Hawkeye, that’s maybe once in a career. As long as the assumptions I’ve used are in the right ballpark, the effect isn’t even worth talking about. The mental cost of a player thinking more carefully before challenging might be greater than this exceedingly unlikely benefit.
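
The arithmetic behind that figure is just the frequency of a missed chance multiplied by its average cost. A quick sketch, with variable names of my own choosing:

```python
sets_per_missed_chance = 320   # from the first set of simulations
avg_win_prob_swing = 0.124     # from the second set of simulations

# Expected sets lost per set played, and its reciprocal.
lost_per_set = avg_win_prob_swing / sets_per_missed_chance
sets_per_lost_set = 1 / lost_per_set

print(round(sets_per_lost_set))  # 2581, i.e. roughly one set in 2,600
```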

What if some of the assumptions are wrong? Anecdotally, it seems like challenges cluster in certain matches, because of poor officiating, bad lighting, extreme spin, precise hitting, or some combination of these. It seems possible that certain scenarios would arise in which a player would want to challenge much more frequently, and even though he might gain some accuracy, he would still increase the risk.

I ran the same algorithms for what seems to me to be an extreme case, almost doubling the frequency with which each player challenges, to 3.0%, and somewhat increasing the accuracy rate, to 40%.

With these parameters, a player would run out of challenges and miss an opportunity to overturn a call about six times more often–once every 54 sets, or 1.8% of the time. The impact of each of these missed opportunities doesn’t change, so the overall result also increases by a factor of six. In this extreme case, poor challenge management would cost a player the set 0.28% of the time, or once every 356 sets. That’s a less outrageous number, representing perhaps one set every second year, but it also applies to unusual sets of circumstances which are very unlikely to follow a player to every match.

It seems clear that three challenges is enough. Even in long sets, players usually don’t run out, and when they do, it’s rare that they miss an opportunity that a fourth challenge would have afforded them. The effect of a missed chance can be enormous, but such chances are so infrequent that players would see little or no benefit from tactically conserving challenges.

Two New Ways to Chart Tennis Matches

Readers of this site are probably already aware of the Match Charting Project, my effort to coordinate volunteer contributions to build a massive shot-by-shot database of professional tennis. If this is the first you’ve heard of it, I encourage you to check out the detailed match- and player-level data we’ve gathered already.

In the last week, two developers have released GUIs to make charting easier and more engaging. When I first started the project, I put together an Excel spreadsheet that tracks all the user input and keeps score. I’ve used that spreadsheet for the hundreds of matches I’ve charted, but I recognize that it’s not the most intuitive system for some people.

The first new interface is thanks to Stephanie Kovalchik, who writes the tennis blog On the T. (And who has contributed to the MCP in the past.) Her GUI is entirely click-based, which means you don’t have to learn the various letter- and number-codes that are required for the traditional MCP spreadsheet.

While it’s web-based, it has some of the look and feel of a modern handheld app. It’s probably the easiest way to get started contributing to the project.

(Which reminds me, Brian Hrebec wrote an Android app for the project almost two years ago, and I haven’t given it the attention it deserves. It also makes getting started relatively easy, especially if you’d like to chart on an Android device.)

The second new interface is thanks to Charles Allen, of Tennis Visuals. Also web-based, his app requires that you use the same letter- and number-based codes as the original spreadsheet, but sweetens the deal with live visualizations that update after each point.

With four ways to chart matches and add to the Match Charting Project database, there are even fewer excuses not to contribute. If you’re still not convinced, I have even more reasons for you to consider. And if you’re ready to jump in, just click over to one of the new GUIs, or click here for my Quick Start guide.

 

New at TennisAbstract: Weekly Elo Reports

Starting today, you can find weekly Elo ranking reports on the home page of Tennis Abstract. Here are the men’s ratings, and here are the women’s ratings.

Elo is a rating system originally designed for chess, and now used across a wide range of sports. It awards points based on who you beat, not when you beat them. That’s in direct contrast to the official ATP and WTA ranking systems, which award points based on tournament and round, regardless of whether you play a qualifier or the number one player in the world.
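
For readers who haven’t seen the mechanics, here is a sketch of the textbook Elo update. The 400-point scale is the standard chess formulation; the K-factor of 32 is a conventional placeholder and an assumption on my part, not necessarily the value used for these reports.

```python
def elo_expected(rating_a, rating_b):
    """Expected score (win probability) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(winner, loser, k=32):
    """Return the new (winner, loser) ratings after one match.
    k=32 is an assumed placeholder value."""
    delta = k * (1.0 - elo_expected(winner, loser))
    return winner + delta, loser - delta

# An upset moves the ratings far more than an expected result.
print(elo_update(1800, 2200))  # underdog wins: gains about 29 points
print(elo_update(2200, 1800))  # favorite wins: gains about 3 points
```

The size of the adjustment depends only on the two ratings, which is why the round and the tournament never enter into it.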

As such, there are some notable differences between Elo-based rankings and the official lists. In addition to some rearrangement in the top ten, ATP Elo ratings place last week’s champion Roberto Bautista Agut up at #12 (compared to #17 in the official ranking) and Jack Sock at #13 (instead of #23).

The shuffling is even more dramatic on the women’s side. Belinda Bencic, still outside the top ten in the official WTA ranking, is up to #5 by Elo. After her Fed Cup heroics last weekend, Bencic is a single Elo point away from drawing equal with #4 Angelique Kerber.

These new Elo reports also show peaks for every player. That way, you can see how close each player is to his or her career best. You can also spot which players–like Bencic and Bautista Agut–are currently at their peak.

Like any rating system, Elo isn’t perfect. In this simple form, it doesn’t consider surface at all. I haven’t factored Challenger, ITF, or qualifying results into these calculations, either. Elo also doesn’t make any adjustments when a player misses considerable time to injury; a player just re-assumes his or her old rating when they return.

That said, Elo is a more reliable way of comparing players and predicting match outcomes than the official ranking system. And now, you can check in on each player’s rating every week.

What Happens After an Unsuccessful First Serve Challenge?

A lot of first serves miss, so every player has a well-established routine between the first and second serve. So much so that, traditionally, if something disrupts that routine, the receiver may grant the server another first serve.

Hawkeye has changed all that. If the server doubts the line call, he or she may challenge it. That results in a lengthy wait, usually some crowd noise, and a general wreckage of that between-serves routine.

The conventional wisdom seems to be that the long pause is harmful to the server: that if the challenge fails, the server is less likely to put the second serve in the box. And if the second serve does go in, it’s weaker than average, so the server is less likely to win the point.

My analysis of over 200 first-serve challenges casts doubt on the conventional wisdom. It’s another triumph for the null hypothesis, the only force in tennis as dominant as Novak Djokovic.

As I’ve charted matches for the Match Charting Project, I’ve noted each challenge, the type of challenge, and whether it was successful. I’ve accumulated 116 ATP and 89 WTA instances in which a player unsuccessfully challenged the call on his or her own first serve. For each of these challenges, I also calculated some match-level stats for that server: how often s/he made the second serve, and how often s/he won second serve points.

Of the 116 unsuccessful ATP challenges, players made 106 of their second serves. Based on their overall rates in those matches, we’d expect them to make 106.6 of them. They won exactly half–58–of those points, and their performance in those matches suggests that they “should” have won 58.2 of them.

In other words, players are recovering from the disruption and performing almost exactly as they normally do.

For WTAers, it’s a similar story. Players made 77 of their 89 second serves. If they landed second serves at the same rate they did in the rest of those matches, they’d have made 77.1. They won 38 of the 89 points, compared to an expected 40 points. That last difference, of five percent, is the only one that is more than a rounding error. Even if the effect is real–which is doubtful, given the conflicting ATP number and the small sample size–it’s a small one.
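
The comparison works roughly like the sketch below. I’m assuming the expected total is the sum of each server’s match-level rate, one term per challenge. The three rows are made-up placeholders to show the shape of the calculation, not data from the charted matches.

```python
def observed_vs_expected(challenges):
    """challenges: list of (match_rate, outcome) pairs, where match_rate is
    the server's success rate in that match and outcome is 1 (success) or 0."""
    expected = sum(rate for rate, _outcome in challenges)
    observed = sum(outcome for _rate, outcome in challenges)
    return observed, expected

# Hypothetical rows: (second-serve-in rate for the match, serve landed?)
sample = [(0.90, 1), (0.80, 1), (0.95, 0)]
observed, expected = observed_vs_expected(sample)
print(observed, round(expected, 2))  # 2 2.65
```

The same function covers second-serve points won by swapping in the match-level win rate and the point outcome.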

Of course, the potential benefit of challenging the call on your first serve is big: If you’re right, you either win the point or get another first serve. Of the challenges I’ve tracked, men were successful 38% of the time on their first serves, and women were right 32% of the time.

There’s no evidence here that players are harmed by appealing to Hawkeye on their own first serves. Apart from the small risk of running out of challenges, it’s all upside. Tennis pros adore routine, but in this case, they perform just as well when the routine is disrupted.

First and Second Serves: Another ATP Info-miss

Breaking news, everybody: First serves are better than second serves!

That’s what I learned, anyway, from the latest article in the “Infosys ATP Beyond the Numbers” series:

When you average out the Top 10 players in the 2015 season, they are saving break points 72 per cent of the time when making a first serve. On average, that drops to 53 per cent with second serves. That 19 per cent difference is one of the most important, hidden metrics in our sport.

Is the difference between first and second serves “important?” Definitely. Is it in any way “hidden?” Not so much.

The melodramatic phrasing here suggests that break points are different from regular points, perhaps with a much larger spread between first and second serve winning percentages. But no, that’s not the case.

Last year, top ten players won 75.6% of first-serve points and 55.4% of second-serve points. Combined with the Infosys numbers–which I can’t verify, because the ATP doesn’t make the necessary raw data available–that means that top ten players win about 5% less often when making a first serve on break point, and about 4% less often when missing their first serve on break point.
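
Those relative differences come from dividing the gap by the overall rate. A quick check, using the four percentages quoted above and a helper name of my own:

```python
def relative_drop(overall_pct, break_point_pct):
    """Fractional decline from the overall rate to the break-point rate."""
    return (overall_pct - break_point_pct) / overall_pct

first_serve = relative_drop(75.6, 72.0)    # about 0.048
second_serve = relative_drop(55.4, 53.0)   # about 0.043

print(round(first_serve, 3), round(second_serve, 3))
```

Either way, the drop on break points is in the range of four to five percent for both serves.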

At the risk of belaboring this: When it comes to the importance of making your first serve, break points are no different than other points.

Even that 5% difference is less meaningful than it looks. Break points don’t occur at random–better opponents generate more break opportunities. If you play two matches, one against Novak Djokovic and one against Jerzy Janowicz, you’re likely to face far more break points against Novak than against Jerzy … and of course, you’re less likely to win them.

Pundits tend to focus on break points, and in part, they are right to do so, because this small subset of points has an outsized effect on match outcomes. However, because of the small sample, it’s easy–and far too common–to read too much into break point results. My research has repeatedly shown that, once you control for opponent quality, most players win break points about as often as they do non-break points.

The ATP is sitting on a wealth of information. If we’re going to learn anything meaningful when they go “beyond the numbers,” it would be nice if they took advantage of more of their data and offered up more sophisticated analysis.

Match Charting Project February Update

At the beginning of the year, I announced an ambitious goal: to double the number of matches in the Match Charting Project dataset. That’s a target of 1,617 new matches in 2016–about 135 per month, or 4.5 per day.

So far, so good! In January, ten contributors combined to add 162 new matches to the total. Our biggest heroes were Edo, with 35 matches, including many Grand Slam finals; Isaac, with 33; and Edged, whose 22 included some of the dramatic late-round men’s matches from Melbourne.

As we close in on the 1,800-match mark, I’m excited to announce a new addition to the stats and reports available on Tennis Abstract. Now, for every player with at least two charted matches in the database, there’s a dedicated player page with hundreds of aggregate data points for that player.

Here’s Novak Djokovic’s page, and here’s Angelique Kerber’s. I’m still working on integrating these pages into the rest of Tennis Abstract, but for now, you’ll be able to access them by clicking on the match totals next to every player’s name on the Match Charting home page.

These pages each feature four charts, which compare the player’s typical rally length, shot selection, winner types, and unforced error types to tour average. The other links on each page take you to tables very similar to those on the MCP match reports. Move your cursor over any rate to see the relevant tour average, as well as that player’s rates on each surface.

I hope you like this new addition, which owes so much to the amazing efforts of so many volunteer charters.

I hope, too, that you’ll be inspired to contribute to the project as well. When you’re ready to try your hand at charting, start here. As always, the more matches we have, the more valuable the project becomes.

Is Milos Raonic’s Return Game Improving?

It’s no secret that Milos Raonic’s return game is a liability. He has reached the game’s elite level with a dominant serve, and he broke into the top five on the strength of a historically-great record in tiebreaks.

Last year, Raonic’s tiebreak record fell back to earth (as these things usually do) and he dropped out of the top ten. Now, in a new season with a new coach, Carlos Moya, Raonic reeled off nine straight victories, finally losing in five sets to Andy Murray in today’s Australian Open semifinal.

Until today’s match, when Raonic won a dismal 25% of return points, the numbers were looking good. Milos won 36.5% of return points in his four matches in Brisbane, which is a little bit better than the 35% tour average on hard courts. With his serve, he doesn’t need to be a great returner; simply improving that aspect of his game to average would make him a dominant force on tour.

This is a crucial number to watch, because it could be the difference between Milos becoming number one in the world and Milos languishing in the back half of the top ten. It’s incredibly rare that players with weak return games are able to maintain a position at the very top of the rankings.

Through the quarterfinals in Melbourne, the positive signs kept piling up. For each of his 2016 opponents, I tallied their 2015 service points won on hard courts. In 6 of 10 matches this month, Milos kept their number below their 2015 average. In a 7th match, against Gael Monfils, he was one return point away from doing the same.

By comparison, in 2015, Raonic held hard-court opponents below their average rate of service points won only 9 times in 35 tries. Even in his career-best season of 2014, he did so in only 15 of 41 matches. Even with the weak return numbers against Murray, this is Raonic’s best ever 10-match stretch, by this metric.

The difference is more dramatic when we combine all these single-match measurements into a single metric per season. For each match, I calculated how well Milos returned relative to an average player against his opponent that day. For example, against Murray today, he won 25% of return points compared to an average hard-court Murray opponent’s 33.7%. In percentage terms, Raonic returned 26% worse than average.

Aggregating all of his 2016 matches, Raonic has returned 6% better than average. In 2015 hard-court matches, he was 10% below average; in 2014, 3% below average, and in 2013, 7% below average.
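
Here is a sketch of the per-match calculation, using the Murray numbers from above. How the matches are combined into a season figure isn’t spelled out here, so the unweighted mean in `season_relative_return` is an assumption of mine, and both function names are hypothetical.

```python
def relative_return(actual_rpw, typical_rpw_vs_opponent):
    """Return points won relative to what an average player wins
    against the same opponent (0.0 means exactly average)."""
    return actual_rpw / typical_rpw_vs_opponent - 1.0

def season_relative_return(matches):
    """Unweighted mean of per-match values (an assumed aggregation).
    matches: list of (actual_rpw, typical_rpw_vs_opponent) pairs."""
    values = [relative_return(actual, typical) for actual, typical in matches]
    return sum(values) / len(values)

murray = relative_return(0.25, 0.337)
print(round(murray, 2))  # -0.26, i.e. 26% worse than average
```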

A nine-match stretch of good form is hardly proof that a player has massively improved half of his game, but it’s certainly encouraging. While we all know that Milos is an elite server, it’s his return game that will determine how great he becomes.