How Elo Solves the Olympics Ranking Points Conundrum

Italian translation at settesei.it

Last week’s Olympic tennis tournament had superstars, it had drama, and it had tears, but it didn’t have ranking points. Surprise medalists Monica Puig and Juan Martin del Potro scored huge triumphs for themselves and their countries, yet they still languish at 35th and 141st in their respective tours’ rankings.

The official ATP and WTA rankings have always represented a collection of compromises, as they try to accomplish dual goals of rewarding certain behaviors (like showing up for high-profile events) and identifying the best players for entry in upcoming tournaments. Stripping the Olympics of ranking points altogether was an even weirder compromise than usual. Four years ago in London, some points were awarded and almost all the top players on both tours showed up, even though many of them could’ve won more points playing elsewhere.

For most players, the chance at Olympic gold was enough. The level of competition was quite high, so while the ATP and WTA tours treat the tournament in Rio as a mere exhibition, those of us who want to measure player ability and make forecasts must factor Olympics results into our calculations.

Elo, a rating system originally designed for chess that I’ve been using for tennis for the past year, is an excellent tool for integrating Rio results with the rest of this season’s wins and losses. Broadly speaking, it awards points to match winners and subtracts points from losers. Beating a top player is worth many more points than beating a lower-rated one. There is no penalty for not playing–for example, Stan Wawrinka’s and Simona Halep’s ratings are unchanged from a week ago.

Unlike the ATP and WTA ranking systems, which award points based on the level of tournament and round, Elo is context-neutral. Del Potro’s Elo rating improved quite a bit thanks to his first-round upset of Novak Djokovic–the same amount it would have increased if he had beaten Djokovic in, say, the Toronto final.
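For readers who want to see the mechanics, here is a minimal sketch of a standard Elo update in Python. The 400-point scale and the K-factor of 32 are the conventional chess defaults, used here purely as assumptions; this article doesn't specify the constants behind the tennis ratings, and the function names are illustrative.

```python
def expected_score(rating_a, rating_b):
    """Elo's forecast probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(winner_rating, loser_rating, k=32.0):
    """Return (new_winner_rating, new_loser_rating) after one match.

    k is an assumed K-factor, not a value taken from the article.
    """
    # The less expected the win, the bigger the transfer of points.
    gain = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + gain, loser_rating - gain


# Hypothetical ratings: an upset moves far more points than an expected win.
underdog_after, _ = update_ratings(1800.0, 2200.0)
favorite_after, _ = update_ratings(2200.0, 1800.0)
print(round(underdog_after - 1800.0, 1), round(favorite_after - 2200.0, 1))
```

Nothing in the update refers to the tournament or the round, which is all that "context-neutral" means here.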

Many fans object to this, on the reasonable assumption that context matters. It certainly seems like the Wimbledon final should count for more than, say, a Monte Carlo quarterfinal, even if the same player defeats the same opponent in both matches.

However, results matter for ranking systems, too. A good rating system will do two things: predict winners correctly more often than other systems, and give more accurate degrees of confidence for those predictions. (For example, in a sample of 100 matches in which the system gives one player a 70% chance of winning, the favorite should win about 70 times.) Elo, with its ignorance of context, predicts more winners and gives more accurate forecast certainties than any other system I’m aware of.
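Those two criteria can be checked with a few lines of code. The sketch below is one common way to do it, not necessarily the evaluation used for the claims here: a calibration table compares forecast probabilities with actual win rates, and the Brier score summarizes forecast error in a single number. The sample data is hypothetical.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared error between forecast probability and outcome (1 or 0)."""
    return sum((p - won) ** 2 for p, won in forecasts) / len(forecasts)


def calibration_table(forecasts):
    """Group forecasts into 10-point bins; return the actual win rate per bin."""
    wins = defaultdict(int)
    counts = defaultdict(int)
    for p, won in forecasts:
        bin_start = int(round(p * 100)) // 10 * 10  # e.g. 0.70 -> 70
        counts[bin_start] += 1
        wins[bin_start] += won
    return {b: wins[b] / counts[b] for b in sorted(counts)}


# Hypothetical sample: 100 matches forecast at 70%, the favorite wins 70.
sample = [(0.70, 1)] * 70 + [(0.70, 0)] * 30
print(calibration_table(sample), round(brier_score(sample), 3))
```

A well-calibrated system shows actual win rates close to each bin's forecast, and a lower Brier score than its rivals on the same matches.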

For one thing, it wipes the floor with the official rankings. While it’s possible that tweaking Elo with context-aware details would better the results even more, the improvement would likely be minor compared to the massive difference between Elo’s accuracy and that of the ATP and WTA algorithms.

Relying on a context-neutral system is perfect for tennis. Instead of altering the ranking system with every change in tournament format, we can always rate players the same way, using only their wins, losses, and opponents. In the case of the Olympics, it doesn’t matter which players participate, or what anyone thinks about the overall level of play. If you defeat a trio of top players, as Puig did, your rating skyrockets. Simple as that.

Two weeks ago, Puig was ranked 49th among WTA players by Elo–a dozen places lower than her WTA ranking of 37. After beating Garbine Muguruza, Petra Kvitova, and Angelique Kerber, her Elo ranking jumped to 22nd. While it’s tough, intuitively, to know just how much weight to assign to such an outlier of a result, an Elo ranking just outside the top 20 seems much more plausible than Puig’s effectively unchanged WTA ranking in the mid-30s.

Del Potro is another interesting test case, as his injury-riddled career presents difficulties for any rating system. According to the ATP algorithm, he is still outside the top 100 in the world–a common predicament for once-elite players who don’t immediately return to winning ways.

Elo has the opposite problem with players who miss a lot of time due to injury. When a player doesn’t compete, Elo assumes his level doesn’t change. That’s clearly wrong, and it has cast a lot of doubt over del Potro’s place in the Elo rankings this season. The more matches he plays, the more his rating will reflect his current ability, but his #10 position in the pre-Olympics Elo rankings seemed overly influenced by his former greatness.

(A more sophisticated Elo-based system, Glicko, was created in part to improve ratings for competitors with few recent results. I’ve tinkered with Glicko quite a bit in hopes of more accurately measuring the current levels of players like Delpo, but so far, the system as a whole hasn’t come close to matching Elo’s accuracy while also addressing the problem of long layoffs. For what it’s worth, Glicko ranked del Potro around #16 before the Olympics.)

Del Potro’s success in Rio boosted him three places in the Elo rankings, up to #7. While that still owes something to the lingering influence of his pre-injury results, it’s the first time his post-injury Elo rating comes close to passing the smell test.

You can see the full current lists elsewhere on the site: here are ATP Elo ratings and WTA Elo ratings.

Any rating system is only as good as the assumptions and data that go into it. The official ATP and WTA ranking systems have long suffered from improvised assumptions and conflicting goals. When an important event like the Olympics is excluded altogether, the data is incomplete as well. Now as much as ever, Elo shines as an alternative method. In addition to a more predictive algorithm, Elo can give Rio results the weight they deserve.

The Grass is Slowing: Another Look at Surface Speed Convergence

A few years ago, I posted one of my most-read and most-debated articles, called The Mirage of Surface Speed Convergence.  Using the ATP’s data on ace rates and breaks of serve going back to 1991, it argued that surface speeds aren’t really converging, at least to the extent we can measure them with those two tools.

One of the most frequent complaints was that I was looking at the wrong data–surface speed should really be quantified by rally length, spin rate, or any number of other things. As is so often the case with tennis analytics, we have only so much choice in the matter. At the time, I was using all the data that existed.

Thanks to the Match Charting Project–with a particular tip of the cap to Edo Salvati–a lot more data is available now. We have shot-by-shot stats for 223 Grand Slam finals, including over three-fourths of Slam finals back to 1980. While we’ll never be able to measure anything like ITF Court Pace Rating for surfaces thirty years in the past, this shot-by-shot data allows us to get closer to the truth of the matter.

Sure enough, when we take a look at a simple (but until recently, unavailable) metric such as rally length, we find that the sport’s major surfaces are playing a lot more similarly than they used to. The first graph shows a five-year rolling average* for the rally length in the men’s finals of each Grand Slam from 1985 to 2015:

[Graph: five-year rolling average rally length in men’s Grand Slam finals, by tournament, 1985 to 2015]

* since some matches are missing, the five-year rolling averages each represent the mean of anywhere from two to five Slam finals.
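As a rough illustration of how such an average behaves when some finals are missing, here is a small Python sketch. It assumes a trailing five-year window and a minimum of two charted finals per window; whether the graph uses a trailing or centered window isn't stated, and the rally lengths in the example are made up.

```python
def rolling_mean(values_by_year, year, window=5, min_count=2):
    """Mean of whatever values exist in the `window` years ending at `year`.

    Returns None when fewer than `min_count` years in the window have data.
    """
    vals = [values_by_year[y]
            for y in range(year - window + 1, year + 1)
            if y in values_by_year]
    if len(vals) < min_count:
        return None
    return sum(vals) / len(vals)


# Hypothetical average rally lengths for one Slam's finals; 2002 and 2004 are uncharted.
rally_length = {2001: 4.0, 2003: 5.0, 2005: 6.0}
print(rolling_mean(rally_length, 2005))
```

Each plotted point is then the mean of however many finals fall inside the window, which is why some points rest on only two matches.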

Over the last decade and a half, the hard-court and grass-court slams have crept steadily upward, with average rally lengths now similar to those at Roland Garros, traditionally the slowest of the four Grand Slam surfaces. The movement is most dramatic in the Wimbledon grass, which for many years saw an average rally length of a mere two shots.

For all the advantages of rally length and shot-by-shot data, there’s one massive limitation to this analysis: It doesn’t control for player. (My older analysis, with more limited data per match, but for many more matches, was able to control for player.) Pete Sampras contributed to 15 of our data points, but none on clay. Andres Gomez makes an appearance, but only at Roland Garros. Until we have shot-by-shot data on multiple surfaces for more of these players, there’s not much we can do to control for this severe case of selection bias.

So we’re left with something of a chicken-and-egg problem. Back in the early ’90s, when Roland Garros finals averaged almost six shots per point and Wimbledon finals averaged barely two shots per point, how much of the difference was due to the surface itself, and how much to the fact that certain players reached the final? The surface itself certainly doesn’t account for everything–in 1988, Mats Wilander and Ivan Lendl averaged over seven shots per point at the US Open, and in 2002, David Nalbandian and Lleyton Hewitt topped 5.5 shots per point at Wimbledon.

Still, outliers and selection bias aside, the rally length convergence we see in the graph above reflects a real phenomenon, even if it is amplified by the bias. After all, players who prefer short points win more matches on grass because grass lends itself to short points, and in an earlier era, “short points” meant something more extreme than it does today.

The same graph for women’s Grand Slam finals shows some convergence, though not as much:

[Graph: five-year rolling average rally length in women’s Grand Slam finals, by tournament]

The convergence is more muted in part because there is less selection bias to amplify it. The all-surface dominance of a few players–Chris Evert, Martina Navratilova, and Steffi Graf–means that, if only by historical accident, the same names turn up in finals on every surface.

We still need a lot more data before we can make confident statements about surface speeds in 20th-century tennis. (You can help us get there by charting some matches!) But as we gather more information, we’re able to better illustrate how the surfaces have become less unique over the years.

How Much Is a Challenge Worth?

When the Hawkeye line-calling system is available, tennis players are given the right to make three incorrect challenges per set. As with any situation involving scarcity, there’s a choice to make: Take the chance of getting a call overturned, or make sure to keep your options open for later?

We’ve learned over the last several years that human line-calling is pretty darn good, so players don’t turn to Hawkeye that often. At the Australian Open this year, men challenged fewer than nine calls per match–well under three per set or, put another way, fewer than 1.5 challenges per player per set. Even at that low rate of fewer than once per thirty points, players are usually wrong. Only about one in three calls is overturned.

So while challenges are technically scarce, they aren’t that scarce.  It’s a rare match in which a player challenges so often and is so frequently incorrect that he runs out. That said, it does happen, and while running out of challenges is low-probability, it’s very high risk. Getting a call overturned at a crucial moment could be the difference between winning and losing a tight match. Most of the time, challenges seem worthless, but in certain circumstances, they can be very valuable indeed.

Just how valuable? That’s what I hope to figure out. To do so, we’ll need to estimate the frequency with which players miss opportunities to overturn line calls because they’ve exhausted their challenges, and we’ll need to calculate the potential impact of failing to overturn those calls.

A few notes before we get any further.  The extra challenge awarded to each player at the beginning of a tiebreak would make the analysis much more daunting, so I’ve ignored both that extra challenge and points played in tiebreaks. I suspect it has little effect on the results. I’ve limited this analysis to the ATP, since men challenge more frequently and get calls overturned more often. And finally, this is a very complex, sprawling subject, so we often have to make simplifying assumptions or plug in educated guesses where data isn’t available.

Running out of challenges

The Australian Open data mentioned above is typical for ATP challenges. It is very similar to a subset of Match Charting Project data, suggesting that both challenge frequency and accuracy are about the same across the tour as they are in Melbourne.

Let’s assume that each player challenges a call roughly once every sixty points, or 1.7%. Given an approximate success rate of 30%, each player makes an incorrect challenge on about 1.2% of points and a correct challenge on 0.5% of points. Later on, I’ll introduce a different set of assumptions so we can see what different parameters do to the results.

Running out of challenges isn’t in itself a problem. We’re interested in scenarios when a player not only exhausts his challenges, but when he also misses an opportunity to overturn a call later in the set. These situations are much less common than all of those in which a player might want to contest a call, but we don’t care about the 70% of those challenges that would be wrong, as they wouldn’t have any effect on the outcome of the match.

For each possible set length, from 24-point golden sets up to 93-point marathons, I ran a Monte Carlo simulation, using the assumptions given above, to determine the probability that, in a set of that length, a player would miss a chance to overturn a later call. As noted above, I’ve excluded tiebreaks from this analysis, so I counted only the number of points up to 6-6. I also excluded all “advantage” fifth sets.

For example, the most common set length in the data set is 57 points, which occurred 647 times. In 10,000 simulations, a player missed a chance to overturn a call 0.27% of the time. The longer the set, the more likely that challenge scarcity would become an issue. In 10,000 simulations of 85-point sets, players ran out of challenges more than three times as often. In 0.92% of the simulations, a player was unable to challenge a call that would have been overturned.
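A simulation along these lines can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original code: it uses the per-point rates given above (1.2% incorrect challenges, 0.5% correct ones), treats every point as identical, and counts a set as a "miss" the first time a would-be-correct challenge comes up after three incorrect ones. The function and parameter names are made up for this example.

```python
import random


def missed_overturn_rate(set_points, n_sims=10_000, p_wrong=0.012,
                         p_right=0.005, max_wrong=3, seed=0):
    """Share of simulated sets in which one player, having used up his
    incorrect challenges, later faces a call he could have overturned."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(n_sims):
        wrong_left = max_wrong
        for _ in range(set_points):
            r = rng.random()
            if r < p_right:
                # A call that would be overturned if he could challenge it.
                if wrong_left == 0:
                    missed += 1
                    break
            elif r < p_right + p_wrong:
                # An incorrect challenge uses one up, if any remain.
                if wrong_left > 0:
                    wrong_left -= 1
    return missed / n_sims


for length in (57, 85):
    print(length, missed_overturn_rate(length))
```

With only 10,000 runs per set length, the estimates will wobble from one seed to the next, so small differences between set lengths shouldn't be over-read.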

These simulations are simple, assuming that each point is identical. Of course, players are aware of the cap on challenges, so with only one challenge remaining, they may be less likely to contest a “probably correct” call, and they would be very unlikely to use a challenge to earn a few extra seconds of rest. Further, the fact that players sometimes use Hawkeye for a bit of a break suggests that what we might call “true” challenges–instances in which the player believes the original call was wrong–are a bit less frequent than the numbers we’re using. Ultimately, we can’t address these concerns without a more complex model and quite a bit of data we don’t have.

Back to the results. Taking every possible set length and the results of the simulation for each one, we find the average player is likely to run out of challenges and miss a chance to overturn a call roughly once every 320 sets, or 0.31% of the time. That’s not very often–for almost all players, it’s less than once per season.

The impact of (not) overturning a call

Just because such an outcome is infrequent doesn’t necessarily mean it isn’t important. If a low-probability event has a high enough impact when it does occur, it’s still worth planning for.

Toward the end of a set, when most of these missed chances would occur, points can be very important, like break point at 5-6. But other points are almost meaningless, like 40-0 in just about any game.

To estimate the impact of these missed opportunities, I ran another set of Monte Carlo simulations. (This gets a bit hairy–bear with me.) For each set length, for those cases when a player ran out of challenges, I found the average number of points at which he used his last challenge. Then, for each run of the simulation, I took a random set from the last few years of ATP data with the corresponding number of points, chose a random point between the average time that the challenges ran out and the end of the set, and measured the importance of that point.

To quantify the importance of the point, I calculated three probabilities from the perspective of the player who lost the point and, had he conserved his challenges, could have overturned it:

  1. his odds of winning the set before that point was played
  2. his odds of winning the set after that point was played (and not overturned)
  3. his odds of winning the set had the call been overturned and the point awarded to him.

(To generate these probabilities, I used my win probability code posted here with the assumption that each player wins 65% of his service points. The model treats points as independent–that is, the outcome of one point does not depend on the outcomes of previous points–which is not precisely true, but it’s close, and it makes things immensely more straightforward. Alert readers will also note that I’ve ignored the possibility of yet another call that could be overturned. However, the extremely low probability of that event convinced me to avoid the additional complexity required to model it.)

Given these numbers, we can calculate the possible effects of the challenge he couldn’t make. The difference between (2) and (3) is the effect if the call would’ve been overturned and awarded to him. The difference between (1) and (2) is the effect if the point would have been replayed. This is essentially the same concept as “leverage index” in baseball analytics.

Again, we’re missing some data–I have no idea what percentage of overturned calls result in each of those two outcomes. For today, we’ll say it’s half and half, so to boil down the effect of the missed challenge to a single number, we’ll average those two differences.

For example, let’s say we’re at five games all, and the returner wins the first point of the 11th game. The server’s odds of winning the set have decreased from 50% (at 5-all, love-all) to 43.0%. If the server got the call overturned and was awarded the point, his odds would increase to 53.8%. Thus, the win probability impact of overturning the call and taking the point is 10.8%, while the effect of forcing a replay is 7.0%. For the purposes of this simulation, we’re averaging these two numbers and using 8.9% as the win probability impact of this missed opportunity to challenge.
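The example above can be reproduced with a short recursive model. This is a sketch, not the win probability code referenced earlier: it assumes independent points, a 65% service-point win rate for both players, and a 50/50 tiebreak at 6-6 (which follows from symmetry between equal players). The function names are illustrative.

```python
from functools import lru_cache

P_SERVE = 0.65  # each player wins 65% of his service points


@lru_cache(maxsize=None)
def game_win_prob(server_pts, returner_pts, p=P_SERVE):
    """Probability that the server wins the game from this point score."""
    if server_pts >= 4 and server_pts - returner_pts >= 2:
        return 1.0
    if returner_pts >= 4 and returner_pts - server_pts >= 2:
        return 0.0
    if server_pts >= 3 and server_pts == returner_pts:
        # Deuce: closed form for winning two points in a row first.
        return p * p / (1 - 2 * p * (1 - p))
    return (p * game_win_prob(server_pts + 1, returner_pts, p)
            + (1 - p) * game_win_prob(server_pts, returner_pts + 1, p))


@lru_cache(maxsize=None)
def set_win_prob(games_a, games_b, a_serving, p=P_SERVE):
    """P(player A wins the set) at the start of a game."""
    if games_a == 6 and games_b == 6:
        return 0.5  # tiebreak between equal players, assumed 50/50
    if games_a >= 6 and games_a - games_b >= 2:
        return 1.0
    if games_b >= 6 and games_b - games_a >= 2:
        return 0.0
    return set_win_prob_in_game(games_a, games_b, a_serving, 0, 0, p)


def set_win_prob_in_game(games_a, games_b, a_serving, server_pts, returner_pts, p=P_SERVE):
    """P(player A wins the set) from a point score inside the current game."""
    server_wins = game_win_prob(server_pts, returner_pts, p)
    a_wins_game = server_wins if a_serving else 1 - server_wins
    return (a_wins_game * set_win_prob(games_a + 1, games_b, not a_serving, p)
            + (1 - a_wins_game) * set_win_prob(games_a, games_b + 1, not a_serving, p))


# Server's view at 5-all: before the point, after losing it, and had he won it.
before = set_win_prob_in_game(5, 5, True, 0, 0)
after_lost = set_win_prob_in_game(5, 5, True, 0, 1)
if_awarded = set_win_prob_in_game(5, 5, True, 1, 0)

effect_awarded = if_awarded - after_lost   # call overturned, point awarded
effect_replay = before - after_lost        # call overturned, point replayed
print(round((effect_awarded + effect_replay) / 2, 3))
```

Under these assumptions the three probabilities come out at about 50%, 43.0%, and 53.8%, in line with the figures in the example, and the averaged effect is roughly 8.9 percentage points.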

Back to the big picture. For each set length, I ran 1,000 simulations like what I’ve described above and averaged the results. In short sets under 40 points, the win probability impact of the missed challenge is less than five percentage points. The longer the set, the bigger the effect: Long sets are typically closer and the points tend to be higher-leverage. In 85-point sets, for instance, the average effect of the missed challenge is a whopping 20 percentage points–meaning that if a player more skillfully conserved his challenges in five such sets, he’d be able to reverse the outcome of one of them.

On average, the win probability effect of the missed challenge is 12.4 percentage points. In other words, better challenge management would win a player one more set for every eight times he didn’t lose such an opportunity by squandering his challenges.

The (small) big picture

Let’s put together the two findings. Based on our assumptions, players run out of challenges and forgo a chance to overturn a later call about once every 320 sets. We now know that the cost of such a mistake is, on average, a 12.4 percentage point win probability hit.

Thus, challenge management costs an average player one set out of every 2600. Given that many matches are played on clay or on courts without Hawkeye, that’s maybe once in a career. As long as the assumptions I’ve used are in the right ballpark, the effect isn’t even worth talking about. The mental cost of a player thinking more carefully before challenging might be greater than this exceedingly unlikely benefit.
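The arithmetic behind that figure is a single multiplication. As a quick check in Python, using only the two numbers quoted above (the variable names are illustrative):

```python
sets_per_missed_chance = 320   # one missed overturn roughly every 320 sets
avg_win_prob_impact = 0.124    # 12.4 percentage points per missed chance

# Expected share of sets lost to poor challenge management.
cost_per_set = (1 / sets_per_missed_chance) * avg_win_prob_impact
sets_per_lost_set = 1 / cost_per_set

print(f"about one set in every {round(sets_per_lost_set, -2):.0f}")
```

The unrounded result is a little under 2,600 sets, which is where the "one set out of every 2600" figure comes from.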

What if some of the assumptions are wrong? Anecdotally, it seems like challenges cluster in certain matches, because of poor officiating, bad lighting, extreme spin, precise hitting, or some combination of these. It seems possible that certain scenarios would arise in which a player would want to challenge much more frequently, and even though he might gain some accuracy, he would still increase the risk.

I ran the same algorithms for what seems to me to be an extreme case, almost doubling the frequency with which each player challenges, to 3.0%, and somewhat increasing the accuracy rate, to 40%.

With these parameters, a player would run out of challenges and miss an opportunity to overturn a call about six times more often–once every 54 sets, or 1.8% of the time. The impact of each of these missed opportunities doesn’t change, so the overall result also increases by a factor of six. In this extreme case, poor challenge management would cost a player the set 0.28% of the time, or once every 356 sets. That’s a less outrageous number, representing perhaps one set every second year, but it also applies to unusual sets of circumstances which are very unlikely to follow a player to every match.

It seems clear that three challenges is enough. Even in long sets, players usually don’t run out, and when they do, it’s rare that they miss an opportunity that a fourth challenge would have afforded them. The effect of a missed chance can be enormous, but they are so infrequent that players would see little or no benefit from tactically conserving challenges.

Two New Ways to Chart Tennis Matches

Readers of this site are probably already aware of the Match Charting Project, my effort to coordinate volunteer contributions to build a massive shot-by-shot database of professional tennis. If this is the first you’ve heard of it, I encourage you to check out the detailed match- and player-level data we’ve gathered already.

In the last week, two developers have released GUIs to make charting easier and more engaging. When I first started the project, I put together an Excel spreadsheet that tracks all the user input and keeps score. I’ve used that spreadsheet for the hundreds of matches I’ve charted, but I recognize that it’s not the most intuitive system for some people.

The first new interface is thanks to Stephanie Kovalchik, who writes the tennis blog On the T. (And who has contributed to the MCP in the past.) Her GUI is entirely click-based, which means you don’t have to learn the various letter- and number-codes that are required for the traditional MCP spreadsheet.

[Screenshot: Stephanie Kovalchik’s click-based charting GUI]

While it’s web-based, it has some of the look and feel of a modern handheld app. It’s probably the easiest way to get started contributing to the project.

(Which reminds me, Brian Hrebec wrote an Android app for the project almost two years ago, and I haven’t given it the attention it deserves. It also makes getting started relatively easy, especially if you’d like to chart on an Android device. [Update, December 2019: Unfortunately, it appears this app is no longer available.])

[Update, December 2019: The other GUI referred to in the title has bugs that render the output unusable. I recommend sticking with the standard MCP spreadsheet, or with Stephanie’s GUI described above.]

With four ways to chart matches and add to the Match Charting Project database, there are even fewer excuses not to contribute. If you’re still not convinced, I have even more reasons for you to consider. And if you’re ready to jump in, just click over to one of the new GUIs, or click here for my Quick Start guide.

New at TennisAbstract: Weekly Elo Reports

Starting today, you can find weekly Elo ranking reports on the home page of Tennis Abstract. Here are the men’s ratings, and here are the women’s ratings.

Elo is a rating system originally designed for chess, and now used across a wide range of sports. It awards points based on who you beat, not when you beat them. That’s in direct contrast to the official ATP and WTA ranking systems, which award points based on tournament and round, regardless of whether you play a qualifier or the number one player in the world.

As such, there are some notable differences between Elo-based rankings and the official lists. In addition to some rearrangement in the top ten, ATP Elo ratings place last week’s champion Roberto Bautista Agut up at #12 (compared to #17 in the official ranking) and Jack Sock at #13 (instead of #23).

The shuffling is even more dramatic on the women’s side. Belinda Bencic, still outside the top ten in the official WTA ranking, is up to #5 by Elo. After her Fed Cup heroics last weekend, Bencic is a single Elo point away from drawing equal with #4 Angelique Kerber.

These new Elo reports also show peaks for every player. That way, you can see how close each player is to his or her career best. You can also spot which players–like Bencic and Bautista Agut–are currently at their peak.

Like any rating system, Elo isn’t perfect. In this simple form, it doesn’t consider surface at all. I haven’t factored Challenger, ITF, or qualifying results into these calculations, either. Elo also doesn’t make any adjustments when a player misses considerable time to injury; a player just re-assumes his or her old rating when they return.

That said, Elo is a more reliable way of comparing players and predicting match outcomes than the official ranking system. And now, you can check in on each player’s rating every week.

What Happens After an Unsuccessful First Serve Challenge?

A lot of first serves miss, so every player has a well-established routine between the first and second serve. So much so that, traditionally, if something disrupts that routine, the receiver may grant the server another first serve.

Hawkeye has changed all that. If the server doubts the line call, he or she may challenge it. That results in a lengthy wait, usually some crowd noise, and a general wreckage of that between-serves routine.

The conventional wisdom seems to be that the long pause is harmful to the server: that if the challenge fails, the server is less likely to put the second serve in the box. And if the second serve does go in, it’s weaker than average, so the server is less likely to win the point.

My analysis of over 200 first-serve challenges casts doubt on the conventional wisdom. It’s another triumph for the null hypothesis, the only force in tennis as dominant as Novak Djokovic.

As I’ve charted matches for the Match Charting Project, I’ve noted each challenge, the type of challenge, and whether it was successful. I’ve accumulated 116 ATP and 89 WTA instances in which a player unsuccessfully challenged the call on his or her own first serve. For each of these challenges, I also calculated some match-level stats for that server: how often s/he made the second serve, and how often s/he won second serve points.

Of the 116 unsuccessful ATP challenges, players made 106 of their second serves. Based on their overall rates in those matches, we’d expect them to make 106.6 of them. They won exactly half–58–of those points, and their performance in those matches suggests that they “should” have won 58.2 of them.

In other words, players are recovering from the disruption and performing almost exactly as they normally do.

For WTAers, it’s a similar story. Players made 77 of their 89 second serves. If they landed second serves at the same rate they did in the rest of those matches, they’d have made 77.1. They won 38 of the 89 points, compared to an expected 40 points. That last difference, of five percent, is the only one that is more than a rounding error. Even if the effect is real–which is doubtful, given the conflicting ATP number and the small sample size–it’s a small one.
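The comparison boils down to summing each server's match-level rates to get an expected count, then setting that beside what actually happened after the failed challenges. A minimal sketch, with hypothetical field names and made-up sample rows:

```python
def expected_vs_actual(challenges):
    """Compare post-challenge second serves with each server's match-level rates.

    Each item is a dict with hypothetical keys:
      second_serve_in, won_point          -> what happened after the failed challenge
      match_second_in_rate, match_second_won_rate -> that server's rates in the same match
    """
    return {
        "actual_in": sum(1 for c in challenges if c["second_serve_in"]),
        "expected_in": sum(c["match_second_in_rate"] for c in challenges),
        "actual_won": sum(1 for c in challenges if c["won_point"]),
        "expected_won": sum(c["match_second_won_rate"] for c in challenges),
    }


# Two made-up challenges, for illustration only.
sample_challenges = [
    {"second_serve_in": True, "won_point": True,
     "match_second_in_rate": 0.90, "match_second_won_rate": 0.50},
    {"second_serve_in": True, "won_point": False,
     "match_second_in_rate": 0.92, "match_second_won_rate": 0.52},
]
print(expected_vs_actual(sample_challenges))
```

Using the rates from the same match controls, at least roughly, for the server and the opponent on the day.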

Of course, the potential benefit of challenging the call on your first serve is big: If you’re right, you either win the point or get another first serve. Of the challenges I’ve tracked, men were successful 38% of the time on their first serves, and women were right 32% of the time.

There’s no evidence here that players are harmed by appealing to Hawkeye on their own first serves. Apart from the small risk of running out of challenges, it’s all upside. Tennis pros adore routine, but in this case, they perform just as well when the routine is disrupted.

First and Second Serves: Another ATP Info-miss

Breaking news, everybody: First serves are better than second serves!

That’s what I learned, anyway, from the latest article in the “Infosys ATP Beyond the Numbers” series:

When you average out the Top 10 players in the 2015 season, they are saving break points 72 per cent of the time when making a first serve. On average, that drops to 53 per cent with second serves. That 19 per cent difference is one of the most important, hidden metrics in our sport.

Is the difference between first and second serves “important?” Definitely. Is it in any way “hidden?” Not so much.

The melodramatic phrasing here suggests that break points are different from regular points, perhaps with a much larger spread between first and second serve winning percentages. But no, that’s not the case.

Last year, top ten players won 75.6% of first-serve points and 55.4% of second-serve points. Combined with the Infosys numbers–which I can’t verify, because the ATP doesn’t make the necessary raw data available–that means that top ten players win 5% less often when making a first serve on break point, and 5% less often when missing their first serve on break point.
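To make the comparison explicit, here is the arithmetic in Python, using only the four percentages quoted above (the variable names are illustrative):

```python
# Top ten players, 2015, in percent.
bp_first, bp_second = 72.0, 53.0      # break points saved (Infosys figures)
all_first, all_second = 75.6, 55.4    # all service points won

# Gap between first and second serves, in percentage points.
bp_spread = bp_first - bp_second
all_spread = all_first - all_second

# Relative drop on break points compared with all points.
first_drop = 1 - bp_first / all_first
second_drop = 1 - bp_second / all_second

print(round(bp_spread, 1), round(all_spread, 1))
print(round(first_drop * 100, 1), round(second_drop * 100, 1))
```

The first-versus-second-serve gap is about 19 points on break points and about 20 points overall, and both serve types fall by a little under 5% in relative terms.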

At the risk of belaboring this: When it comes to the importance of making your first serve, break points are no different than other points.

Even that 5% difference is less meaningful than it looks. Break points don’t occur at random–better opponents generate more break opportunities. If you play two matches, one against Novak Djokovic and one against Jerzy Janowicz, you’re likely to face far more break points against Novak than against Jerzy … and of course, you’re less likely to win them.

Pundits tend to focus on break points, and in part, they are right to do so, because this small subset of points has an outsized effect on match outcomes. However, because of the small sample, it’s easy–and far too common–to read too much into break point results. My research has repeatedly shown that, once you control for opponent quality, most players win break points about as often as they do non-break points.

The ATP is sitting on a wealth of information. If we’re going to learn anything meaningful when they go “beyond the numbers,” it would be nice if they took advantage of more of their data and offered up more sophisticated analysis.

Match Charting Project February Update

At the beginning of the year, I announced an ambitious goal: to double the number of matches in the Match Charting Project dataset. That’s a target of 1,617 new matches in 2016–about 135 per month, or 4.5 per day.

So far, so good! In January, ten contributors combined to add 162 new matches to the total. Our biggest heroes were Edo, with 35 matches, including many Grand Slam finals; Isaac, with 33; and Edged, whose 22 included some of the dramatic late-round men’s matches from Melbourne.

As we close in on the 1,800-match mark, I’m excited to announce a new addition to the stats and reports available on Tennis Abstract. Now, for every player with at least two charted matches in the database, there’s a dedicated player page with hundreds of aggregate data points for that player.

Here’s Novak Djokovic’s page, and here’s Angelique Kerber’s. I’m still working on integrating these pages into the rest of Tennis Abstract, but for now, you’ll be able to access them by clicking on the match totals next to every player’s name on the Match Charting home page.

These pages each feature four charts, which compare the player’s typical rally length, shot selection, winner types, and unforced error types to tour average. The other links on each page take you to tables very similar to those on the MCP match reports. Move your cursor over any rate to see the relevant tour average, as well as that player’s rates on each surface.

I hope you like this new addition, which owes so much to the amazing efforts of so many volunteer charters.

I hope, too, that you’ll be inspired to contribute to the project as well. When you’re ready to try your hand at charting, start here. As always, the more matches we have, the more valuable the project becomes.

Is Milos Raonic’s Return Game Improving?

It’s no secret that Milos Raonic‘s return game is a liability. He has reached the game’s elite level with a dominant serve, and he broke into the top five on the strength of a historically great record in tiebreaks.

Last year, Raonic’s tiebreak record fell back to earth (as these things usually do) and he dropped out of the top ten. Now, in a new season with a new coach, Carlos Moya, Raonic reeled off nine straight victories, finally losing in five sets to Andy Murray in today’s Australian Open semifinal.

Until today’s match, when Raonic won a dismal 25% of return points, the numbers were looking good. Milos won 36.5% of return points in his four matches in Brisbane, which is a little bit better than the 35% tour average on hard courts. With his serve, he doesn’t need to be a great returner; simply improving that aspect of his game to average would make him a dominant force on tour.

This is a crucial number to watch, because it could be the difference between Milos becoming number one in the world and Milos languishing in the back half of the top ten. It’s incredibly rare that players with weak return games are able to maintain a position at the very top of the rankings.

Through the quarterfinals in Melbourne, the positive signs kept piling up. For each of his 2016 opponents, I tallied their 2015 service points won on hard courts. In 6 of 10 matches this month, Milos kept their number below their 2015 average. In a 7th match, against Gael Monfils, he was one return point away from doing the same.

By comparison, in 2015, Raonic held hard-court opponents below their average rate of service points won only 9 times in 35 tries. Even in his career-best season of 2014, he did so in only 15 of 41 matches. Even with the weak return numbers against Murray, this is Raonic’s best-ever 10-match stretch by this metric.

The difference is more dramatic when we combine all these single-match measurements into a single metric per season. For each match, I calculated how well Milos returned relative to an average player against his opponent that day. For example, against Murray today, he won 25% of return points compared to an average hard-court Murray opponent’s 33.7%. In percentage terms, Raonic returned 26% worse than average.

Aggregating all of his 2016 matches, Raonic has returned 6% better than average. In 2015 hard-court matches, he was 10% below average; in 2014, 3% below average, and in 2013, 7% below average.
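The per-match comparison, and one way to roll it up into a season figure, can be sketched in Python. The function names and the sample match records are hypothetical, and the aggregation (total return points won divided by the total an average returner would be expected to win) is an assumption, since the text does not spell out exactly how the season numbers were combined.

```python
def relative_return(rpw_rate, avg_rate_vs_opponent):
    """Percent better (+) or worse (-) than an average returner facing the same opponent."""
    return (rpw_rate / avg_rate_vs_opponent - 1) * 100


def season_relative_return(matches):
    """Aggregate matches given as (return_points_won, return_points_played, avg_rate_vs_opponent).

    Assumption: each match is weighted by return points played, comparing actual
    points won with the number an average returner would expect to win.
    """
    actual = sum(won for won, played, avg in matches)
    expected = sum(played * avg for won, played, avg in matches)
    return (actual / expected - 1) * 100


# The Murray semifinal from the text: 25% of return points won vs. a 33.7% average.
murray = relative_return(0.25, 0.337)
print(f"{murray:.0f}%")  # -26%

# Made-up match records, for illustration only.
sample = [(30, 100, 0.30), (44, 100, 0.40)]
print(f"{season_relative_return(sample):+.1f}%")
```

Weighting by points played keeps a short, lopsided match from counting as much as a five-setter; a simple average of the per-match percentages would be another reasonable choice and would give slightly different season numbers.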

A nine-match stretch of good form is hardly proof that a player has massively improved half of his game, but it’s certainly encouraging. While we all know that Milos is an elite server, it’s his return game that will determine how great he becomes.

How Dangerous Is It To Fix a Single Service Game?

Italian translation at settesei.it

Earlier this week, I offered a rough outline of the economics of fixing tennis matches, calculating the expected prize money that players forgo at various levels when they lose on purpose. The vast gulf between prize money, especially at lower-level events, and fixing fees suggests that gamblers must pay high premiums to convince players to do something ethically repugnant and fraught with risk.

So much for match-level fixes. What about single service games? In Ben Rothenberg’s recent report, a shadowy insider offers the following data points:

Buying a service break at a Futures event cost $300 to $500, he said. A set was $1,000 to $2,000, and a match was $2,000 to $3,000.

In other words, a service break is valued at between 10% and 25% of the cost of an entire match: $300 against a $3,000 match at the low end, $500 against a $2,000 match at the high end. The article doesn’t mention service-break prices at higher levels, so we’ll have to use the Futures numbers as our reference point.

Selling a service break might be a way to have your cake and eat it too, taking some cash from gamblers while retaining the chance to advance in the draw and earn ranking points. But it won’t always work out that way.

I ran some simulations to see how much a service break should cost, based on the simplifying assumption that prices correspond to chances of winning and, by extension, forgone prize money. It turns out that the range of 10% to 25% is exactly right.

Let’s start with the simplest scenario: Two equal men with middle-of-the-road serves, which win them 63% of service points. In an honest match, these two would each have a 50% chance of winning. If one of them guarantees a break in his second service game, he is effectively lowering his chances of winning the match to 38.5%, dropping his expected prize money for the tournament by 23%.

If our players have weaker serves, for instance each winning 55% of service points, the fixer’s chances of winning the match fall to about 42%, only a 16% haircut. With stronger serves, using the extreme case of 70% of points going the way of the server, the fixer’s chances drop to 34%, a loss of 32% in his expected prize money.

This last scenario–two equal players with big serves–is the one that confers the most value on a single service break. We can use that 32% sacrifice as an upper bound for the worth of a single fixed break.
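For readers who want to experiment with these scenarios, here is a minimal sketch of one way to compute them exactly rather than by simulation. It is not the code behind the numbers above. It assumes every point is independent, a best-of-three match with a tiebreak in every set, and that the fixer serves first and throws his second service game of the match; all function names are hypothetical. In this simplified model, who serves first does not change the chance of winning a set, so that last assumption should not matter.

```python
from functools import lru_cache


def hold_prob(p):
    """P(server wins a game) when he wins each point independently with probability p."""
    q = 1 - p
    from_deuce = p * p / (1 - 2 * p * q)
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * from_deuce


def tiebreak_prob(pa, pb):
    """P(A wins a first-to-7 tiebreak), with A serving the first point."""

    @lru_cache(maxsize=None)
    def win(a, b):
        if (a, b) == (6, 6):
            # From 6-6, points come in pairs (one on each serve) until
            # someone wins both points of a pair.
            a_both, b_both = pa * (1 - pb), (1 - pa) * pb
            return a_both / (a_both + b_both)
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        a_serves = ((a + b + 1) // 2) % 2 == 0  # serve order A, BB, AA, ...
        pw = pa if a_serves else 1 - pb
        return pw * win(a + 1, b) + (1 - pw) * win(a, b + 1)

    return win(0, 0)


def set_prob(pa, pb, fixed_game=None):
    """P(A wins a tiebreak set), A serving first.

    If fixed_game is set, A is certain to lose the game with that 0-based index.
    """
    hold_a, hold_b = hold_prob(pa), hold_prob(pb)
    tiebreak = tiebreak_prob(pa, pb)

    @lru_cache(maxsize=None)
    def win(a, b):
        if (a, b) == (6, 6):
            return tiebreak
        if a == 7 or (a == 6 and b <= 4):
            return 1.0
        if b == 7 or (b == 6 and a <= 4):
            return 0.0
        game = a + b
        if game == fixed_game:
            pw = 0.0
        elif game % 2 == 0:  # A serves the even-numbered games
            pw = hold_a
        else:
            pw = 1 - hold_b
        return pw * win(a + 1, b) + (1 - pw) * win(a, b + 1)

    return win(0, 0)


def match_prob(pa, pb, fix=False):
    """P(A wins a best-of-three match); with fix=True, A throws his second service game."""
    first = set_prob(pa, pb, fixed_game=2 if fix else None)  # game index 2 = A's 2nd service game
    later = set_prob(pa, pb)  # sets two and three are played honestly
    return first * (later + (1 - later) * later) + (1 - first) * later * later


for p in (0.55, 0.63, 0.70):
    honest, fixed = match_prob(p, p), match_prob(p, p, fix=True)
    print(f"serve {p:.0%}: honest {honest:.1%}, fixed {fixed:.1%}, haircut {1 - fixed / honest:.0%}")
```

Under these assumptions the results should land close to the figures quoted above, though small differences are possible because the details of the original simulations are not given. Passing different values for `pa` and `pb` covers the unequal-player cases as well.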

Fixed contests have more value to gamblers when the better player is guaranteed to lose, and in those cases, a service break doesn’t have as much impact on the outcome of the match. If the fixer is considerably better than his opponent, he was probably going to break serve a few times more than his opponent would, so losing a single game is less likely to determine the outcome of the match.

Let’s take a few examples:

  • If one player wins 64% of service points and the other wins 62%, the favorite has a 60% chance of winning. If he fixes one service break, his chances of winning fall to just below 48%, about a 20% drop in expected prize money.
  • When one player wins 65% of service points against an opponent winning 61%, his chances in an honest match are 69.3%. Giving up one fixed service break, his odds fall to 57.4%, a sacrifice of roughly 17%.
  • A 67% server facing a 60% server has an 80.8% chance of winning. With one fixed service break, that drops to 70.7%, a loss of 12.5%.
  • A huge favorite winning 68% of service points against his opponent’s 58% has an 89.5% chance of advancing to the next round. Guarantee a break in one of his service games, and his odds drop to 82%, a loss of 8.4%.

With the exception of very lopsided matches (for which there might not be as many betting markets), we have our lower bound, not far below 10%.
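The "haircut" in each example is just the relative drop in match-win probability, which stands in for forgone prize money under the simplifying assumption stated earlier. A short sketch using the probabilities quoted in the list (the function name is hypothetical, and 48% is used for the first example's "just below 48%"):

```python
def haircut(honest, fixed):
    """Relative drop in match-win probability, used here as a proxy for forgone prize money."""
    return 1 - fixed / honest


# (favorite's serve %, underdog's serve %, honest win prob, win prob after one fixed break)
examples = [
    (64, 62, 0.600, 0.480),
    (65, 61, 0.693, 0.574),
    (67, 60, 0.808, 0.707),
    (68, 58, 0.895, 0.820),
]

for fav, dog, honest, fixed in examples:
    print(f"{fav}% vs {dog}%: {haircut(honest, fixed):.1%}")
```

The proxy is only exact if a first-round loser earns nothing; any guaranteed first-round prize money would make the true percentage loss somewhat smaller.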

The average Futures first-rounder, if we can generalize from such a mixed bag of matches, is somewhere in the middle of those examples–not an even contest, but without a heavy favorite. So the typical value of a fixed service break is between about 12% and 20% of the value of the match, right in the middle of the range of estimates given by Rothenberg’s source.

Even in this hidden, illegal marketplace, the numbers we’ve seen so far suggest that both gamblers and players act fairly rationally. Amid a sea of bad news, that’s a good sign for tennis’s governing bodies: It promises that players will respond in a predictable manner to changing incentives. Unfortunately, it remains to be seen whether the incentives will change.