Is Milos Raonic’s Return Game Improving?

It’s no secret that Milos Raonic’s return game is a liability. He has reached the game’s elite level with a dominant serve, and he broke into the top five on the strength of a historically great record in tiebreaks.

Last year, Raonic’s tiebreak record fell back to earth (as these things usually do) and he dropped out of the top ten. Now, in a new season with a new coach, Carlos Moya, Raonic reeled off nine straight victories, finally losing in five sets to Andy Murray in today’s Australian Open semifinal.

Until today’s match, when Raonic won a dismal 25% of return points, the numbers were looking good. Milos won 36.5% of return points in his four matches in Brisbane, which is a little bit better than the 35% tour average on hard courts. With his serve, he doesn’t need to be a great returner; simply improving that aspect of his game to average would make him a dominant force on tour.

This is a crucial number to watch, because it could be the difference between Milos becoming number one in the world and Milos languishing in the back half of the top ten. It’s incredibly rare that players with weak return games are able to maintain a position at the very top of the rankings.

Through the quarterfinals in Melbourne, the positive signs kept piling up. For each of his 2016 opponents, I tallied their 2015 service points won on hard courts. In 6 of 10 matches this month, Milos kept their number below their 2015 average. In a 7th match, against Gael Monfils, he was one return point away from doing the same.

By comparison, in 2015, Raonic held hard-court opponents to their average rate of service points won only 9 times in 35 tries. Even in his career-best season of 2014, he did so in only 15 of 41 matches. Even with the weak return numbers against Murray, this is Raonic’s best ever 10-match stretch, by this metric.

The difference is more dramatic when we combine all these single-match measurements into a single metric per season. For each match, I calculated how well Milos returned relative to an average player against his opponent that day. For example, against Murray today, he won 25% of return points compared to an average hard-court Murray opponent’s 33.7%. In percentage terms, Raonic returned 26% worse than average.
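The per-match comparison is a simple ratio. Here is a minimal sketch in Python. `relative_return` is a name I've made up for illustration, and the only inputs are the two rates quoted above for the Murray match.

```python
def relative_return(rpw, opp_avg_rpw):
    """Return-points-won rate relative to an average opponent of the same server.

    rpw: fraction of return points the player won in the match.
    opp_avg_rpw: fraction of return points an average hard-court opponent
        wins against that server (1 minus the server's average service
        points won).
    Negative values mean the player returned worse than average.
    """
    return rpw / opp_avg_rpw - 1


# Raonic vs. Murray, 2016 Australian Open semifinal
r = relative_return(0.25, 0.337)
print(f"{r:+.0%}")  # prints -26%
```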

Aggregating all of his 2016 matches, Raonic has returned 6% better than average. In 2015 hard-court matches, he was 10% below average; in 2014, 3% below average, and in 2013, 7% below average.

A nine-match stretch of good form is hardly proof that a player has massively improved half of his game, but it’s certainly encouraging. While we all know that Milos is an elite server, it’s his return game that will determine how great he becomes.

How Dangerous Is It To Fix a Single Service Game?

Earlier this week, I offered a rough outline of the economics of fixing tennis matches, calculating the expected prize money that players forgo at various levels when they lose on purpose. The vast gulf between prize money, especially at lower-level events, and fixing fees suggests that gamblers must pay high premiums to convince players to do something ethically repugnant and fraught with risk.

So much for match-level fixes. What about single service games? In Ben Rothenberg’s recent report, a shadowy insider offers the following data points:

Buying a service break at a Futures event cost $300 to $500, he said. A set was $1,000 to $2,000, and a match was $2,000 to $3,000.

In other words, a service break is valued at between 10% and 25% of the cost of an entire match. The article doesn’t mention service-break prices at higher levels, so we’ll have to use the Futures numbers as our reference point.
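The 10% and 25% bounds come from pairing the cheapest break with the most expensive match, and vice versa. A quick check of that arithmetic, using only the Futures prices quoted by Rothenberg's source:

```python
# Futures fixing prices quoted in the report, in US dollars: (low, high)
break_price = (300, 500)
match_price = (2_000, 3_000)

# Cheapest break relative to the most expensive match, and vice versa
low_ratio = break_price[0] / match_price[1]   # 300 / 3000
high_ratio = break_price[1] / match_price[0]  # 500 / 2000
print(f"{low_ratio:.0%} to {high_ratio:.0%}")  # prints 10% to 25%
```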

Selling a service break might be a way to have your cake and eat it too, taking some cash from gamblers while retaining the chance to advance in the draw and earn ranking points. But it won’t always work out that way.

I ran some simulations to see how much a service break should cost, based on the simplifying assumption that prices correspond to chances of winning and, by extension, forgone prize money. It turns out that the range of 10% to 25% is exactly right.

Let’s start with the simplest scenario: Two equal men with middle-of-the-road serves, which win them 63% of service points. In an honest match, these two would each have a 50% chance of winning. If one of them guarantees a break in his second service game, he is effectively lowering his chances of winning the match to 38.5%, dropping his expected prize money for the tournament by 23%.
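The simulations aren't described in detail here, so the following is only a minimal Monte Carlo sketch of the idea under my own assumptions: best-of-three sets, a standard tiebreak at 6-6, a coin toss for first serve, and every point won by the server with a fixed probability that doesn't depend on the score. The function names (`hold`, `tiebreak`, `fixer_wins`, `win_prob`) are hypothetical. Its estimates should land in the same neighborhood as the 38.5% figure above, but they won't necessarily match it exactly.

```python
import random

# ASSUMPTIONS (mine, not the article's): best-of-three sets, tiebreak at 6-6,
# coin toss for first serve, and each point won by the server with a fixed
# probability that never changes with the score.


def hold(p, rng):
    """Play one service game; return True if the server holds."""
    won = lost = 0
    while True:
        if rng.random() < p:
            won += 1
        else:
            lost += 1
        if won >= 4 and won - lost >= 2:
            return True
        if lost >= 4 and lost - won >= 2:
            return False


def tiebreak(first_server, probs, rng):
    """Play a first-to-7, win-by-2 tiebreak; return the winner's index (0 or 1)."""
    pts = [0, 0]
    n = 0
    while True:
        # first_server serves point 1, then each player serves two in a row
        srv = first_server if ((n + 1) // 2) % 2 == 0 else 1 - first_server
        pts[srv if rng.random() < probs[srv] else 1 - srv] += 1
        n += 1
        if max(pts) >= 7 and abs(pts[0] - pts[1]) >= 2:
            return 0 if pts[0] > pts[1] else 1


def fixer_wins(p_fixer, p_opp, thrown_game, rng):
    """Simulate one match; return True if player 0 (the would-be fixer) wins.

    thrown_game: which of player 0's service games (1-based) he deliberately
    loses, or None for an honest match.
    """
    probs = (p_fixer, p_opp)
    sets = [0, 0]
    server = rng.randrange(2)  # coin toss for first serve
    served = 0  # service games player 0 has started so far
    while max(sets) < 2:
        games = [0, 0]
        while True:
            if games == [6, 6]:
                games[tiebreak(server, probs, rng)] += 1
                server = 1 - server  # the tiebreak's first receiver serves next
                break
            if server == 0:
                served += 1
                held = served != thrown_game and hold(p_fixer, rng)
            else:
                held = hold(p_opp, rng)
            games[server if held else 1 - server] += 1
            server = 1 - server
            if max(games) >= 6 and abs(games[0] - games[1]) >= 2:
                break
        sets[0 if games[0] > games[1] else 1] += 1
    return sets[0] == 2


def win_prob(p_fixer, p_opp, thrown_game=None, n=10_000, seed=1):
    """Monte Carlo estimate of player 0's chance of winning the match."""
    rng = random.Random(seed)
    wins = sum(fixer_wins(p_fixer, p_opp, thrown_game, rng) for _ in range(n))
    return wins / n


honest = win_prob(0.63, 0.63)
fixed = win_prob(0.63, 0.63, thrown_game=2)
print(f"honest {honest:.3f}, one thrown game {fixed:.3f}, "
      f"haircut {1 - fixed / honest:.0%}")
```

Changing the two serve percentages lets you explore the other scenarios in this post; `thrown_game=2` mirrors the "second service game" assumption in the text.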

If our players have weaker serves, for instance each winning 55% of service points, the fixer’s chances of winning the match fall to about 42%, only a 16% haircut. With stronger serves, using the extreme case of 70% of points going the way of the server, the fixer’s chances drop to 34%, a loss of 32% in his expected prize money.

This last scenario–two equal players with big serves–is the one that confers the most value on a single service break. We can use that 32% sacrifice as an upper bound for the worth of a single fixed break.

Fixed contests have more value to gamblers when the better player is guaranteed to lose, and in those cases, a service break doesn’t have as much impact on the outcome of the match. If the fixer is considerably better than his opponent, he was probably going to break serve a few times more than his opponent would, so losing a single game is less likely to determine the outcome of the match.

Let’s take a few examples:

  • If one player wins 64% of service points and the other wins 62%, the favorite has a 60% chance of winning. If he fixes one service break, his chances of winning fall to just below 48%, about a 20% drop in expected prize money.
  • When one player wins 65% of service points against an opponent winning 61%, his chances in an honest match are 69.3%. Giving up one fixed service break, his odds fall to 57.4%, a sacrifice of roughly 17%.
  • A 67% server facing a 60% server has an 80.8% chance of winning. With one fixed service break, that drops to 70.7%, a loss of 12.5%.
  • A huge favorite winning 68% of service points against his opponent’s 58% has an 89.5% chance of advancing to the next round. Guarantee a break in one of his service games, and his odds drop to 82%, a loss of 8.4%.
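Each haircut in the list is the relative drop in win probability. Under the simplifying assumption that expected prize money is proportional to the chance of winning, that is also the relative drop in expected prize money. Recomputing them from the quoted probabilities (treating "just below 48%" as 48%):

```python
# (honest win probability, win probability with one thrown service game),
# as quoted in the examples above
examples = {
    "64% vs 62%": (0.600, 0.480),
    "65% vs 61%": (0.693, 0.574),
    "67% vs 60%": (0.808, 0.707),
    "68% vs 58%": (0.895, 0.820),
}

# Simplifying assumption from the text: expected prize money is proportional
# to win probability, so the relative drop in one is the relative drop in the other.
haircuts = {label: 1 - fixed / honest for label, (honest, fixed) in examples.items()}

for label, cut in haircuts.items():
    print(f"{label}: {cut:.1%}")
```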

With the exception of very lopsided matches (for which there might not be as many betting markets), we have our lower bound, not far below 10%.

The average Futures first-rounder, if we can generalize from such a mixed bag of matches, is somewhere in the middle of those examples–not an even contest, but without a heavy favorite. So the typical value of a fixed service break is between about 12% and 20% of the value of the match, right in the middle of the range of estimates given by Rothenberg’s source.

Even in this hidden, illegal marketplace, the numbers we’ve seen so far suggest that both gamblers and players act reasonably rationally. Amid a sea of bad news, that’s a good sign for tennis’s governing bodies: It promises that players will respond in a predictable manner to changing incentives. Unfortunately, it remains to be seen whether the incentives will change.

The Weirdest Thing About David Marrero’s Suspicious Mixed Doubles Match

You’ve probably seen the news: There was suspicious betting activity on a mixed doubles match a few days ago, hinting that some bettors knew ahead of time that David Marrero and Lara Arruabarena were going to lose to Andrea Hlavackova and Lukasz Kubot.

I don’t know whether it was a fix, or if someone leaked information, or if it was a publicity stunt by Pinnacle, who reported the suspicious activity. I don’t really care. Instead, what stuck out to me was this odd claim from Marrero, as reported by the Times:

“Normally, when I play, I play full power, in doubles or singles,” said Marrero, who won the doubles title at the 2013 ATP World Tour Finals. “But when I see the lady in front of me, I feel my hand wants to play, but my head says, ‘Be careful.’ This is not a good combination.”

As the Times also points out, Marrero’s record in mixed doubles is abysmal: 7-21 (with nine different partners), including 10 consecutive losses. He has, at times, ranked among the best doubles players in the world, yet managed to lose mixed matches alongside other greats, such as Hlavackova and Sara Errani. In six matches with Arantxa Parra-Santonja, a doubles specialist with eight tour-level titles, he’s lost the lot.

Assuming Marrero isn’t regularly fixing Grand Slam mixed doubles matches–after all, fixing a match this week would be awfully dumb–it’s clear that he’s not very good in this format. Here’s the weird thing: Before this mini-scandal, nobody was paying any attention.

Yeah, of course, it’s mixed doubles, which is little more than a glorified exhibition. Tennis isn’t great when it comes to statkeeping, and there’s virtually no one paying attention to doubles stats. The situation with mixed doubles is even worse. But if a singles player had a losing streak of 10 in just about anything, fans would know about it, and people would be watching closely.

Given the nature of the mixed doubles event–specialists frequently switch partners, and the format includes a super-tiebreak in place of a third set–we wouldn’t expect too many extremes. In fact, of the 36 players who have contested at least 15 mixed matches since 2009 (28 slams plus the 2012 Olympics), only Leander Paes, with a 63-21 record, has been as good as Marrero has been bad. No one else has won more than 70% of their mixed matches.

And since mixed doubles draws are full of non-specialists (like Naomi Broady and Neal Skupski, who beat Marrero and Parra-Santonja at Wimbledon in 2014) we would expect the specialists to perform better than average. Sure enough, of those 36 regulars, 25 have winning percentages of 50% or better, and all but four have won at least 43% of matches. Only Marrero and Raquel Atawo (formerly Kops-Jones) hold winning percentages below 36%.

Let’s say we give Marrero the benefit of the doubt–as far as the fixing goes, anyway–and accept his claim that he’s uncomfortable playing when there’s a woman across the net. It’s a strange state of affairs when (a) he continues playing almost every possible mixed doubles event despite his discomfort; (b) women choose to partner with him, either ignorant of his discomfort or simply happy to get into the draw; and (c) it’s possible to play 21 Grand Slams before the public gets any inkling that one of the 64 players in the mixed draw has a fundamental issue playing normally on the mixed doubles court.

Such comprehensive, long-standing ignorance isn’t out of place in tennis, especially in doubles. But given what we now know about David Marrero, the suspicious betting activity isn’t the influx of money against him–it’s the fact that anyone ever put money on him to win a mixed doubles match.

The Cost of Fixing a Tennis Match

In the last week, we’ve seen an enormous amount of speculation about match fixing in men’s tennis. There are plenty of signs that players are fixing matches and some alarming evidence that the sport’s governing bodies aren’t taking action.

A few numbers have popped up in interviews with insiders: A player might be offered $50,000 or $60,000 to fix a match. Back when Novak Djokovic was apparently asked (indirectly) to lose on purpose, the price is said to have been $200,000.

Those of you who know your tennis economics know that those are huge sums relative to what most players can expect to earn on tour. At most ATP 250s, only the titlist takes home a check worth more than $50,000. Even at Masters-level events, a player has to reach the quarterfinals to earn that much.

Most of the players whose names turn up in fixing allegations are rarely reaching those heights. Instead, they’re playing a first-rounder at a 250 for the chance to earn an extra $3,000 or $4,000. Even when giving a full effort, these players are rarely heavy favorites, so they might only have a 50% chance of winning that second-round money in matches they don’t fix.

It’s clear that the vast majority of ATP matches are worth much less to the players–in prize money terms–than they are to gamblers. If press reports are to be believed, gamblers offer payments to many more players than ever accept them, so the gamblers presumably have a reasonably good idea of what it costs to fix a match.

If we can pin down the value of a match played fairly, we can get an idea of the sort of multiple that gamblers must pay to fix a match. Of course, the multiple won’t be the same for every player, or in every situation, but it will shed some additional light on the situation.

I’ve attempted to quantify the prize-money value, for both players, of every match in the 2015 ATP and ATP Challenger seasons. For each match, using the same general methodology as my tournament forecasts, I generated the probability that each player would reach the next round and every round beyond that. When combined with the additional prize money a player earns for reaching each successive round, these probabilities give us an “expected value” for each match. That’s our best guess of what a player forgoes if he opts to fix a match.
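The forecasting model itself isn't shown here, but the expected-value step is straightforward once you have round-by-round probabilities. A minimal sketch, assuming you already have the player's chance of winning each successive match (each conditional on reaching it) and the prize money for each finishing position. The function name and the numbers in the example are hypothetical, not taken from any real draw.

```python
def match_expected_value(win_probs, prize_by_round):
    """Expected prize money a player forgoes by throwing the current match.

    win_probs: chance of winning the current match and each later one, in
        order, each conditional on having reached it.
    prize_by_round: prize money for losing the current match, then for each
        later finish, ending with the winner's check (one entry longer than
        win_probs).
    Returns expected prize money above the amount already guaranteed.
    """
    if len(prize_by_round) != len(win_probs) + 1:
        raise ValueError("need one more prize level than match probabilities")
    ev, reach = 0.0, 1.0
    for k, p in enumerate(win_probs):
        reach *= p  # unconditional chance of winning this match
        ev += reach * (prize_by_round[k + 1] - prize_by_round[k])
    return ev


# Hypothetical three-round event: a 50% chance in each match
ev = match_expected_value([0.5, 0.5, 0.5], [1_000, 2_000, 4_000, 8_000])
print(ev)  # 1500.0
```

Throwing the match locks in the first-round loser's check, so the return value is exactly the money put at risk by fixing.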

As an example, consider Fernando Verdasco at last year’s ATP 250 in Metz. In the first round, my model gave him a 53% chance of defeating Alexander Zverev. Based on the entire draw, it estimated Verdasco’s chances of reaching each successive round, up to a 2.9% chance of winning the title. Run the numbers, and his expected value for that match was $3,855–incidentally, the lowest expected value of any of his matches last year.

Once Verdasco advanced to the second round and was guaranteed second-round prize money, he had a 54% chance of getting past Gilles Muller, and an expected value of $6,239 for playing that match honestly. Overall, the median expected value of a Verdasco match last year was roughly $9,500, and the highest expected value of any match–his US Open first-rounder–was just over $45,000.

As it turns out, that $9,500 median is very close to the median for all ATP tour-level matches, $9,667. Put another way, well over half of last year’s ATP matches had an expected value under $10,000, which is 20% of what gamblers are apparently willing to pay to fix a match.

The value of not fixing is even lower for many substantial subsets of tour matches. The median expected value of a first-round match–including the lucrative ones at the majors–is only $6,200. The median expected value of a first-round match at a 250- or 500-level event is a mere $4,200.

Consider the case of Andrey Golubev, a player who has turned up in fixing allegations in the past, and appeared among the players flagged by Buzzfeed’s recent investigation. The median expected value of his 13 tour-level matches last year was $3,450; all but three were under $5,100.

So far, we’ve only discussed tour-level matches–and for players, that’s where the real money is. Most fixing allegations these days pertain to the Challenger level and below, where the entire purse of some events is equal to what a single first-round loser will receive this fortnight in Melbourne.

Golubev played the majority of last season on the Challenger tour, so we can see what it was worth to him to play honestly at the lower level. In 21 matches, his median expected prize money was a whopping $692, and 15 of the 21 matches had expected values under $1,000.

Want another example? Take Denys Molchanov, who has also been identified as a possible fixer. In 27 Challenger matches last year, his median expected prize money was $612, 23 of the 27 matches had expected values under $1,000, and no match he played was, by this measure, worth more than $1,200.

Altogether, the median expected value of a Challenger match last year was $514, and almost 80% of matches had expected prize money values of less than $1,000. The betting volume on Challenger matches is generally lower, so the value of fixing these matches is also lower, but the prize money discrepancy is at least as great.

Without knowing more about how much players have been paid to fix matches–and adjusting for the possibility that they are sometimes injured, with correspondingly lower chances of winning, when they do so–we’ll never be able to establish a precise relationship between expected prize money values and fixing fees. That said, the available anecdotal evidence, combined with my analysis here, can give us a rough idea.

If we take $50,000 as a typical payment to fix an anonymous early-round ATP match, that’s probably about 10 times the expected value of that match to the player who fixes it. It’s less clear, though, whether that extra $45,000 should be understood as a 10x multiple of expected value, or a fee to offset the very high cost–albeit one with very low probability–of the player suffering a long-term suspension, banishment from the game, or even legal penalties.

Further, the expected value of a match is more than just its prize money. When a player wins matches, he gains more ranking points, and those points can get him entry into higher-level (and higher-paycheck) events and earn him seedings at the events he’s already playing. It’s very difficult to assign a money value to ranking points, especially since they are non-linear: The 50 points that push you past the Grand Slam direct entry threshold are enormously valuable, but the 50 points that move you from #41 to #39 are usually worthless.

For both of these reasons, we can’t explain exactly how fixing fees are established, or–more importantly–how fixing might be affected by increased prize money. After all, plenty of people have been clamoring for years for more prize money at the lower levels of the game, and when we see players apparently losing matches on purpose in order to finance their life on tour, that gives more ammunition to their cause.

However, based on the numbers we’ve seen so far, it’s far from obvious that increasing prize money–whether at the Challenger level or in early rounds of 250s–would do much to solve the problem.

Consider a radical move such as doubling prize money at all Challengers. That would cost about $8,000,000 per season, and it would make tennis a much more viable option for the many players who don’t receive substantial support from their national federations. But would it put a dent in match-fixing?

Simply doubling the numbers above, we find that almost half of all Challenger matches would still be worth $1,000 or less, with almost 80% under $2,000. If some players will fix matches for ten times the expected prize money, that means we still have thousands of matches that are “fixable” for $10,000. Perhaps an increase in overall prize money would mean that more players would refuse to fix on moral grounds. But financially, even an expensive solution like this one is very unlikely to eliminate fixing.

If recent reports are to be believed, the governing bodies of tennis are doing very little to stop match fixing. Spending that same $8,000,000 to massively increase the size of the Tennis Integrity Unit (assuming that it uses that money wisely and acts on its findings) would probably have a much greater effect than the same amount spent to increase prize money.

If enforcement is more effective, the risk that a player takes each time he fixes a match is that much greater. Increasing that risk would require that gamblers pay a higher multiple (or a higher “risk of banishment” flat fee) for every match, not just those for which prize money is higher. It wouldn’t prevent every corrupt player from entering the sport, and it wouldn’t address non-financial issues like situations in which a player’s family is threatened, but it would make fixing matches a lot more expensive.

Comparing fixing fees to the prize money expectations I’ve generated, it’s clear that the players who do fix matches–themselves already in the minority–charge a very high premium for doing so. They are very concerned about getting caught, or they would simply rather not fix matches. The problem can’t be solved by any feasible boost in prize money, but it can be mitigated by increasing the premium that gamblers have to pay.

The Tennis Data Storytelling Challenge

Want to show off your data analysis and visualization skills, and dig into some tennis data while you’re at it? Nikita, from the Tennis Notebook, and I are hosting the Tennis Data Storytelling Challenge, which will run for the next several months. You can read all about it–and sign up–at her site.

There’s plenty of data out there for you to use, and I encourage you to explore it to get a better sense of what you might do with it. Nikita has focused in particular on data from the Match Charting Project, the crowdsourced effort I’ve coordinated to collect shot-by-shot stats for hundreds of professional matches. The best way to learn about that project is to jump in and chart a match (or ten) yourself.

So, what should you write about? If you have to ask, I suspect you’re not watching enough tennis–or, at least, you’re not watching with a sufficiently critical eye.

The best analytical work comes from people with deep domain knowledge in addition to the data science skills they need to do the analysis. The more you watch the sport intently, listen to commentators with a skeptical ear, and think for yourself about what’s happening on court, the better your work will be.

In addition to those that Nikita posted recently, here are a few more general subjects to get you pointed in the right direction:

  1. How do lefties differ–in tactics or in results–from right-handers?
  2. What shots or tactics are more successful on one surface than on another?
  3. What happens to a player’s game when s/he starts getting tired, and how does that show up in the data?
  4. Are there particular skills or playing styles that transfer well from the lower levels (Challengers, etc.) to tour level?
  5. What happens in pressure situations? Do some players excel more than others? Do players lean on different tactics than they do in lower-leverage situations?

The list goes on–literally. I’ve posted a list of nearly 150 topics, ranging from simple queries to barely-touched fields of research. Feel free to add to it.

Remember, however, that many of these are big questions–far too complicated to adequately address in 1,500 words. I’d rather see a very carefully researched, convincing narrative about a narrow topic than an attempt to cover too much ground at once.

Again, if you want to join the fun, head over to The Tennis Notebook and sign up for the Challenge!

Is Grand Slam Qualifying Worth Tanking For?

Earlier today in Hobart, Naomi Osaka lost her second-round match to Mona Barthel. Coming into the match, she was in a tricky position: If she won, she wouldn’t be able to play Australian Open qualifying. For a young player outside the top 100, a tour-level quarterfinal would be nice, but presumably Melbourne was intended to be the centerpiece of her trip to Australia.

Since she lost the match, she’ll be able to play qualifying. But what if she hadn’t? Is this a situation in which a player would benefit from losing a match?

Put another way: In a position like Osaka’s, what are the incentives? If she could choose between the International-level quarterfinal and the Slam qualifying berth, which should she pick? Or, put more crassly, should a player in this position tank?

Let’s review the scenarios. In scenario A, Osaka wins the Hobart second-rounder, reaches the quarterfinal, and has a chance to go even further. She can’t play the Australian Open in any form. In scenario B, she loses the second-rounder, enters Melbourne qualifying and has a chance to reach the main draw.

Before we go through the numbers, take a guess: Which scenario is likely to give Osaka more ranking points? What about prize money?

Scenario A is more straightforward. By reaching the quarterfinals, she earns 30 additional ranking points and US$2,590 beyond what a second-round loser makes. Beyond that, we need to calculate “expected” points and prize money, using the amounts on offer for each round and combining them with her odds of getting there.

Let’s estimate that Osaka would have about a 25% chance of winning her quarterfinal match and earning an additional 50 points and $5,400. In expected terms, that’s 12.5 points and $1,350. If she progresses, we’ll give her a 25% chance of reaching the final, then in the final, a 15% chance of winning the title.

Adding up these various possibilities, from her guaranteed QF points to her 0.94% chance (25% * 25% * 15%) of winning the Hobart title, we see that her expected rewards in scenario A are roughly 48 ranking points and just under $4,800.
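The points half of that calculation can be reproduced in a few lines. The article gives the +30 (QF) and +50 (SF) steps; the final and title values below are my assumption of the 2016 WTA International table (SF 110, F 180, W 280), and with them the total comes out to the "roughly 48" quoted above. Prize money works the same way, but the final and title checks aren't given here, so I've left them out.

```python
def expected_gain(win_probs, rewards):
    """Expected reward above rewards[0], given the chance of winning each
    successive match (each conditional on reaching it)."""
    total, reach = 0.0, 1.0
    for k, p in enumerate(win_probs):
        reach *= p  # unconditional chance of getting one round further
        total += reach * (rewards[k + 1] - rewards[k])
    return total


# ASSUMPTION: 2016 WTA International points for losing in R16, QF, SF, F, and winning
points = [30, 60, 110, 180, 280]
# Second-rounder won (scenario A), then the article's 25%, 25%, and 15%
win_probs = [1.0, 0.25, 0.25, 0.15]

print(round(expected_gain(win_probs, points), 1))  # 47.8
```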

Scenario B starts in a very different place. Thanks to the recent increases in Grand Slam prize money, every player in the qualifying takes home at least US$3,150. That’s already close to Osaka’s expected financial reward from advancing in Hobart. The points are a different story, though: First-round qualifying losers only get 2 WTA ranking points.

I’ll spare you all the calculations for scenario B, but I’ve assumed that Osaka would have a 70% chance of winning qualifying round 1, a 60% chance of winning QR2, and a 50% chance of winning QR3 and qualifying. Those might be a little bit high, but if they are, consider it compensation for the possibility that she’ll reach the main draw as a lucky loser. (Also, if we knock her chances all the way down to 50%, 45%, and 40%, the conclusions are the same, even if the points and prize money in scenario B are quite a bit lower.)

Those estimated probabilities translate into an expectation of about 23 ranking points and US$11,100. Osaka isn’t guaranteed any money beyond the initial $3,150, but the rewards for qualifying are enormous, especially compared to the prize money in Hobart. A first-round main draw loser in Melbourne takes home more money than the losing finalist does in Hobart.

And, of course, if she does qualify, there’s a chance she’ll go further. Since 2000, female Slam qualifiers have reached the second round 41% of the time, the third round 9% of the time, the fourth round 1.8% of the time, and the quarterfinals 0.3% of the time. Those odds, combined with her 21% chance of reaching the main draw in the first place, translate into an additional 7 expected ranking points and $2,600 in prize money.
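Scenario B's points can be checked the same way. The point values are my assumption of the 2016 WTA Grand Slam table (Q1 loss 2, Q2 loss 20, Q3 loss 30, qualifier 40, plus main-draw points of 10, 70, 130, 240, and 430 from the first round through the quarterfinals). They aren't stated in the text, but they reproduce the 23 and 7 expected points above. The main-draw shares are the historical qualifier figures just quoted.

```python
# ASSUMPTION: 2016 WTA Grand Slam points. A qualifier keeps her 40 qualifying
# points and adds main-draw points (R128 10, R64 70, R32 130, R16 240, QF 430).
QUALIFIER = 40
qual_points = [2, 20, 30, QUALIFIER + 10]                       # lose Q1, Q2, Q3, or R1
main_points = [QUALIFIER + p for p in (10, 70, 130, 240, 430)]  # lose R1 ... reach QF


def expected_points(reach, points):
    """Expected points, given a floor of points[0] and the unconditional
    probability reach[k] of getting as far as the stage worth points[k + 1]."""
    total = points[0]
    for k, r in enumerate(reach):
        total += r * (points[k + 1] - points[k])
    return total


p_qualify = 0.7 * 0.6 * 0.5                   # 21%
qual_reach = [0.7, 0.7 * 0.6, p_qualify]      # reach Q2, reach Q3, qualify
qualifying = expected_points(qual_reach, qual_points)

# Share of female Slam qualifiers reaching R2, R3, R4, and the QF since 2000
main_reach = [p_qualify * s for s in (0.41, 0.09, 0.018, 0.003)]
extra = expected_points(main_reach, main_points) - main_points[0]

print(round(qualifying, 1), round(extra, 1), round(qualifying + extra, 1))  # 23.0 6.8 29.8
```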

All told, scenario B gives us 30 expected ranking points and US$13,600 in expected prize money.

The Slam option results in far more cash, while the International route is worth more ranking points. In the long term, those ranking points would have some financial value, possibly earning Osaka entry into a few higher-level events than she would otherwise qualify for. But that value probably doesn’t overcome the nearly $9,000 gap in immediate prize money.

I hope that no player ever tanks a match at a tour-level event so they can make it in time for Slam qualifying. But if one does, we’ll at least understand the logic behind it.

Winners, Errors, and Misinformation

Of the general ways in which points end (winners, unforced errors, and forced errors), which is the most common? It’s so basic a question that I’d never thought to investigate it. As it turns out, other people have, and they’re making tenuous claims based on their results.

A friend sent me a link to this advertisement for an instructional course, which–eventually, far into a painfully slow video–explains that more points on the pro tour end in forced errors than in winners or unforced errors. And because of this, the video argues, you can use some of the same patterns the pros use with the goal of generating forced errors. Apparently, aiming for winners is too risky, as is waiting for unforced errors.

Pedagogically, it seems reasonable enough to encourage patience and tactical conservatism. I don’t know the first thing about helping amateurs improve their tennis game, and I’ll happily defer to the experts.

However, the use of pro tennis data sparked my interest. I was immediately skeptical of these claims, which were apparently based on Grand Slam matches from 2012.

Using my datasets extracted from IBM Pointstream’s records of the last several slams, I tested the 2015 French Open and the 2015 US Open, tallying winners, unforced errors, and forced errors for men and women at both events. Here’s how they break down:

Dataset    Winners  Unforced  Forced  
FO Men       33.8%     32.9%   33.3%  
FO Women     32.7%     37.8%   29.5%  
USO Men      34.3%     31.6%   34.1%  
USO Women    31.0%     38.0%   30.9%

On both surfaces, men’s points split fairly evenly among the three categories. For women, winners are roughly even with forced errors (though there are more winners on clay) and unforced errors are the most common type of point-ending shot.

The Pointstream-based dataset has a limitation, though, and you might have already guessed what it is. A sizable percentage of forced errors are serve returns, which don’t really seem pertinent to a discussion of tactics. We can separate aces from winners and double faults from unforced errors, but not forced error returns from other forced errors.

For that, we need the resources of the Match Charting Project. That data gives us almost 1500 matches (evenly split between men and women) once we limit our view to tour-level contests. The MCP dataset contains everything Pointstream does–winners, unforced and forced errors–and much, much more. For our purposes, the key addition is rally length, which allows us to differentiate between forced error returns and forced errors that came later in rallies.

With the MCP data, we can remove serve statistics from this discussion altogether, excluding aces, double faults, and forced error returns, none of which are tactics in the sense we usually use the word.
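The filtering step is easy to express in code. This is a minimal sketch using a hypothetical record layout (`Point`, with an `outcome` label and a `shots` count that includes the serve), not the Match Charting Project's actual file format. A forced error on shot 2 is treated as a forced error return.

```python
from collections import Counter
from typing import NamedTuple


class Point(NamedTuple):
    # HYPOTHETICAL record layout, not the Match Charting Project's actual schema
    outcome: str  # "ace", "double_fault", "winner", "unforced", or "forced"
    shots: int    # shots struck in the point, counting the serve and the final shot


def rally_enders(points):
    """Share of winners, unforced errors, and forced errors after setting aside
    aces, double faults, and forced error returns (forced errors on shot 2)."""
    counts = Counter(
        p.outcome
        for p in points
        if p.outcome not in ("ace", "double_fault")
        and not (p.outcome == "forced" and p.shots == 2)
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rally-ending points to count")
    return {k: counts[k] / total for k in ("winner", "unforced", "forced")}


sample = [
    Point("ace", 1), Point("double_fault", 1), Point("forced", 2),
    Point("winner", 5), Point("unforced", 4), Point("unforced", 7), Point("forced", 6),
]
shares = rally_enders(sample)
print(shares)  # {'winner': 0.25, 'unforced': 0.5, 'forced': 0.25}
```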

Here’s the frequency of each type of point-ender:

Dataset  Winners  Unforced  Forced  
Men        32.5%     45.8%   21.7%  
Women      32.4%     49.4%   18.2%

When serves are no longer cluttering the picture, winners retain their relative importance, but the distribution of errors changes enormously. Now, we see that once the returner gets the ball back in play (or receives a serve he or she should be able to put back in play), unforced errors outnumber forced errors by more than two to one.

(I also calculated clay-specific numbers, and all the rates were within one percentage point of the overall averages.)

Forced errors are the most common type of point-ender in only 14 of 728 charted men’s matches and 4 of 751 charted women’s matches. Even if you’re concerned about the representativeness of the MCP sample or the error-labeling tendencies of the charters and make substantial adjustments to allow for them, these results overwhelmingly establish that unforced errors are the most common way in which rallies end.

I’m not sure how applicable the tactics and tendencies of pro players are to amateur coaching, so it’s possible that these numbers are irrelevant to a great deal of coaching pedagogy. But if you’re going to base your instructional technique on pro tennis stats, it seems reasonable to start by getting the numbers right.

The Match Charting Project is making it possible to answer questions about tennis that were previously unanswerable. Project data is open to all researchers. Please help us grow the project by watching tennis and charting matches!