Is Milos Raonic’s Return Game Improving?

It’s no secret that Milos Raonic’s return game is a liability. He has reached the game’s elite level with a dominant serve, and he broke into the top five on the strength of a historically great record in tiebreaks.

Last year, Raonic’s tiebreak record fell back to earth (as these things usually do) and he dropped out of the top ten. Now, in a new season with a new coach, Carlos Moya, Raonic reeled off nine straight victories, finally losing in five sets to Andy Murray in today’s Australian Open semifinal.

Until today’s match, when Raonic won a dismal 25% of return points, the numbers were looking good. Milos won 36.5% of return points in his four matches in Brisbane, which is a little bit better than the 35% tour average on hard courts. With his serve, he doesn’t need to be a great returner; simply improving that aspect of his game to average would make him a dominant force on tour.

This is a crucial number to watch, because it could be the difference between Milos becoming number one in the world and Milos languishing in the back half of the top ten. It’s incredibly rare that players with weak return games are able to maintain a position at the very top of the rankings.

Through the quarterfinals in Melbourne, the positive signs kept piling up. For each of his 2016 opponents, I tallied their 2015 service points won on hard courts. In 6 of 10 matches this month, Milos kept their number below their 2015 average. In a 7th match, against Gael Monfils, he was one return point away from doing the same.

By comparison, in 2015, Raonic held hard-court opponents below their average rate of service points won only 9 times in 35 tries. Even in his career-best season of 2014, he did so in only 15 of 41 matches. Despite the weak return numbers against Murray, this is Raonic’s best-ever 10-match stretch by this metric.

The difference is more dramatic when we combine all these single-match measurements into a single metric per season. For each match, I calculated how well Milos returned relative to an average player against his opponent that day. For example, against Murray today, he won 25% of return points compared to an average hard-court Murray opponent’s 33.7%. In percentage terms, Raonic returned 26% worse than average.
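The arithmetic behind that 26% figure can be sketched in a few lines of Python. The function name is made up for illustration, and the only inputs are the two rates quoted above.

```python
def relative_return(actual_rpw, expected_rpw):
    """Return points won relative to an average opponent's rate, as a fraction."""
    return actual_rpw / expected_rpw - 1.0


# Raonic vs. Murray: 25% of return points won, against a 33.7% baseline
diff = relative_return(0.25, 0.337)
print(f"{diff:+.1%}")
```

A negative result means a worse-than-average return performance against that opponent; a positive one means better than average.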

Aggregating all of his 2016 matches, Raonic has returned 6% better than average. In 2015 hard-court matches, he was 10% below average; in 2014, 3% below average, and in 2013, 7% below average.

A nine-match stretch of good form is hardly proof that a player has massively improved half of his game, but it’s certainly encouraging. While we all know that Milos is an elite server, it’s his return game that will determine how great he becomes.

How Dangerous Is It To Fix a Single Service Game?

Italian translation at settesei.it

Earlier this week, I offered a rough outline of the economics of fixing tennis matches, calculating the expected prize money that players forgo at various levels when they lose on purpose. The vast gulf between prize money, especially at lower-level events, and fixing fees suggests that gamblers must pay high premiums to convince players to do something ethically repugnant and fraught with risk.

So much for match-level fixes. What about single service games? In Ben Rothenberg’s recent report, a shadowy insider offers the following data points:

Buying a service break at a Futures event cost $300 to $500, he said. A set was $1,000 to $2,000, and a match was $2,000 to $3,000.

In other words, a service break is valued at between 10% and 25% of the cost of an entire match. The article doesn’t mention service-break prices at higher levels, so we’ll have to use the Futures numbers as our reference point.
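For anyone checking the range, it comes from pairing the cheapest break with the priciest match, and vice versa. A minimal sketch using only the Futures prices quoted above (the variable names are mine):

```python
# Futures prices quoted by Rothenberg's source, in US dollars (low, high)
break_price = (300, 500)
match_price = (2000, 3000)

# Cheapest break against the priciest match, and the reverse
low_ratio = break_price[0] / match_price[1]
high_ratio = break_price[1] / match_price[0]
print(f"{low_ratio:.0%} to {high_ratio:.0%}")
```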

Selling a service break might be a way to have your cake and eat it too, taking some cash from gamblers while retaining the chance to advance in the draw and earn ranking points. But it won’t always work out that way.

I ran some simulations to see how much a service break should cost, based on the simplifying assumption that prices correspond to chances of winning and, by extension, forgone prize money. It turns out that the range of 10% to 25% is exactly right.

Let’s start with the simplest scenario: Two equal men with middle-of-the-road serves, which win them 63% of service points. In an honest match, these two would each have a 50% chance of winning. If one of them guarantees a break in his second service game, he is effectively lowering his chances of winning the match to 38.5%, dropping his expected prize money for the tournament by 23%.

If our players have weaker serves, for instance each winning 55% of service points, the fixer’s chances of winning the match fall to about 42%, only a 16% haircut. With stronger serves, using the extreme case of 70% of points going the way of the server, the fixer’s chances drop to 34%, a loss of 32% in his expected prize money.

This last scenario–two equal players with big serves–is the one that confers the most value on a single service break. We can use that 32% sacrifice as an upper bound for the worth of a single fixed break.
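For readers who want to experiment with scenarios like these, here is a rough Monte Carlo sketch. It is not the simulation used for the numbers above. It assumes best-of-three tiebreak sets, a coin flip for who serves first, independent points with a fixed serve-win rate for each player, and a thrown game in the fixer's second service game of the match. All function names are hypothetical, and results will vary a little with the seed and the number of trials.

```python
import random


def hold_prob(p):
    """Chance the server wins a game when he wins each point with probability p."""
    q = 1.0 - p
    # Wins to love, 15, and 30, plus reaching deuce (20 * p^3 * q^3) and winning from there
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * p**2 / (1 - 2 * p * q)


def play_tiebreak(p_serve, first_server, rng):
    """Simulate a tiebreak point by point; return the winner (0 or 1)."""
    points = [0, 0]
    n = 0
    while max(points) < 7 or abs(points[0] - points[1]) < 2:
        # The first server serves one point, then the serve changes every two points
        server = first_server if ((n + 1) // 2) % 2 == 0 else 1 - first_server
        winner = server if rng.random() < p_serve[server] else 1 - server
        points[winner] += 1
        n += 1
    return 0 if points[0] > points[1] else 1


def fixer_wins(p_serve, fixed_break, rng):
    """Simulate one best-of-three match; player 0 is the (potential) fixer."""
    hold = [hold_prob(p) for p in p_serve]
    server = rng.randrange(2)  # coin flip for who serves first
    fixer_service_games = 0
    sets = [0, 0]
    while max(sets) < 2:
        games = [0, 0]
        while not ((max(games) >= 6 and abs(games[0] - games[1]) >= 2) or max(games) == 7):
            if games == [6, 6]:
                winner = play_tiebreak(p_serve, server, rng)
            else:
                if server == 0:
                    fixer_service_games += 1
                # The fixer's second service game of the match is lost on purpose
                thrown = fixed_break and server == 0 and fixer_service_games == 2
                held = not thrown and rng.random() < hold[server]
                winner = server if held else 1 - server
            games[winner] += 1
            server = 1 - server
        sets[0 if games[0] > games[1] else 1] += 1
    return sets[0] == 2


def win_prob(p_serve, fixed_break, trials=20000, seed=1):
    """Estimate the fixer's chance of winning the match."""
    rng = random.Random(seed)
    return sum(fixer_wins(p_serve, fixed_break, rng) for _ in range(trials)) / trials


honest = win_prob((0.63, 0.63), fixed_break=False)
fixed = win_prob((0.63, 0.63), fixed_break=True)
print(f"honest: {honest:.1%}, with one thrown service game: {fixed:.1%}")
```

Games are resolved with the closed-form hold probability rather than point by point, which keeps the run fast; only tiebreaks are played out point by point. Changing the pair passed to `win_prob` reproduces the weaker-serve and stronger-serve cases, and unequal values cover the favorite-versus-underdog examples.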

Fixed contests have more value to gamblers when the better player is guaranteed to lose, and in those cases, a service break doesn’t have as much impact on the outcome of the match. If the fixer is considerably better than his opponent, he was probably going to break serve a few times more than his opponent would, so losing a single game is less likely to determine the outcome of the match.

Let’s take a few examples:

  • If one player wins 64% of service points and the other wins 62%, the favorite has a 60% chance of winning. If he fixes one service break, his chances of winning fall to just below 48%, about a 20% drop in expected prize money.
  • When one player wins 65% of service points against an opponent winning 61%, his chances in an honest match are 69.3%. Giving up one fixed service break, his odds fall to 57.4%, a sacrifice of roughly 17%.
  • A 67% server facing a 60% server has an 80.8% chance of winning. With one fixed service break, that drops to 70.7%, a loss of 12.5%.
  • A huge favorite winning 68% of service points against his opponent’s 58% has an 89.5% chance of advancing to the next round. Guarantee a break in one of his service games, and his odds drop to 82%, a loss of 8.4%.
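The percentage losses in these examples follow directly from the two win probabilities, under the simplifying assumption stated earlier that expected prize money scales with the chance of winning the match. A small sketch (the function name is mine, and the first scenario uses 48% as a rounded stand-in for "just below 48%"):

```python
def prize_money_drop(honest_pct, fixed_pct):
    """Fractional loss in expected prize money when win chance falls from honest to fixed."""
    return 1.0 - fixed_pct / honest_pct


# (honest win %, win % after one fixed break), as quoted in the examples above
scenarios = {
    "64% vs 62%": (60.0, 48.0),
    "65% vs 61%": (69.3, 57.4),
    "67% vs 60%": (80.8, 70.7),
    "68% vs 58%": (89.5, 82.0),
}

for label, (honest_pct, fixed_pct) in scenarios.items():
    print(f"{label}: {prize_money_drop(honest_pct, fixed_pct):.1%}")
```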

With the exception of very lopsided matches (for which there might not be as many betting markets), we have our lower bound, not far below 10%.

The average Futures first-rounder, if we can generalize from such a mixed bag of matches, is somewhere in the middle of those examples–not an even contest, but without a heavy favorite. So the typical value of a fixed service break is between about 12% and 20% of the value of the match, right in the middle of the range of estimates given by Rothenberg’s source.

Even in this hidden, illegal marketplace, the numbers we’ve seen so far suggest that both gamblers and players act reasonably rationally. Amid a sea of bad news, that’s a good sign for tennis’s governing bodies: It promises that players will respond in a predictable manner to changing incentives. Unfortunately, it remains to be seen whether the incentives will change.

The Weirdest Thing About David Marrero’s Suspicious Mixed Doubles Match

You’ve probably seen the news: There was suspicious betting activity on a mixed doubles match a few days ago, hinting that some bettors knew ahead of time that David Marrero and Lara Arruabarena were going to lose to Andrea Hlavackova and Lukasz Kubot.

I don’t know whether it was a fix, or if someone leaked information, or if it was a publicity stunt by Pinnacle, who reported the suspicious activity. I don’t really care. Instead, what stuck out to me was this odd claim from Marrero, as reported by the Times:

“Normally, when I play, I play full power, in doubles or singles,” said Marrero, who won the doubles title at the 2013 ATP World Tour Finals. “But when I see the lady in front of me, I feel my hand wants to play, but my head says, ‘Be careful.’ This is not a good combination.”

As the Times also points out, Marrero’s record in mixed doubles is abysmal: 7-21 (with nine different partners), including 10 consecutive losses. He has, at times, ranked among the best doubles players in the world, yet managed to lose mixed matches alongside other greats, such as Hlavackova and Sara Errani. In six matches with Arantxa Parra-Santonja, a doubles specialist with eight tour-level titles, he’s lost the lot.

Assuming Marrero isn’t regularly fixing Grand Slam mixed doubles matches–after all, fixing a match this week would be awfully dumb–it’s clear that he’s not very good in this format. Here’s the weird thing: Before this mini-scandal, nobody was paying any attention.

Yeah, of course, it’s mixed doubles, which is little more than a glorified exhibition. Tennis isn’t great when it comes to statkeeping, and there’s virtually no one paying attention to doubles stats. The situation with mixed doubles is even worse. But if a singles player had a losing streak of 10 of just about anything, fans would know about it, and people would be watching closely.

Given the nature of the mixed doubles event–specialists frequently switch partners, and the format includes a super-tiebreak in place of a third set–we wouldn’t expect too many extremes. In fact, of the 36 players who have contested at least 15 mixed matches since 2009 (28 slams plus the 2012 Olympics), only Leander Paes, with a 63-21 record, has been as good as Marrero has been bad. No one else has won more than 70% of their mixed matches.
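The symmetry between the two records is easy to verify. A quick sketch using the win-loss figures quoted in this post (the helper name is mine):

```python
def win_pct(wins, losses):
    """Share of matches won."""
    return wins / (wins + losses)


print(f"Paes:    {win_pct(63, 21):.0%}")
print(f"Marrero: {win_pct(7, 21):.0%}")
```

Paes sits as far above 50% as Marrero sits below it.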

And since mixed doubles draws are full of non-specialists (like Naomi Broady and Neal Skupski, who beat Marrero and Parra-Santonja at Wimbledon in 2014) we would expect the specialists to perform better than average. Sure enough, of those 36 regulars, 25 have winning percentages of 50% or better, and all but four have won at least 43% of matches. Only Marrero and Raquel Atawo (formerly Kops-Jones) hold winning percentages below 36%.

Let’s say we give Marrero the benefit of the doubt–as far as the fixing goes, anyway–and accept his claim that he’s uncomfortable playing when there’s a woman across the net. It’s a strange state of affairs when (a) he continues playing almost every possible mixed doubles event despite his discomfort; (b) women choose to partner with him, either ignorant of his discomfort or simply happy to get into the draw; and (c) it’s possible to play 21 Grand Slams before the public gets any inkling that one of the 64 players in the mixed draw has a fundamental issue playing normally on the mixed doubles court.

Such comprehensive, long-standing ignorance isn’t out of place in tennis, especially in doubles. But given what we now know about David Marrero, the suspicious betting activity isn’t the influx of money against him–it’s the fact that anyone ever put money on him to win a mixed doubles match.

The Cost of Fixing a Tennis Match

Italian translation at settesei.it

In the last week, we’ve seen an enormous amount of speculation about match fixing in men’s tennis. There are plenty of signs that players are fixing matches and some alarming evidence that the sport’s governing bodies aren’t taking action.

A few numbers have popped up in interviews with insiders: A player might be offered $50,000 or $60,000 to fix a match. Back when Novak Djokovic was apparently asked (indirectly) to lose on purpose, the price is said to have been $200,000.

Those of you who know your tennis economics know that those are huge sums relative to what most players can expect to earn on tour. At most ATP 250s, only the titlist takes home a check worth more than $50,000. Even at Masters-level events, a player has to reach the quarterfinals to earn that much.

Most of the players whose names turn up in fixing allegations are rarely reaching those heights. Instead, they’re playing a first-rounder at a 250 for the chance to earn an extra $3,000 or $4,000. Even when giving a full effort, these players are rarely heavy favorites, so they might only have a 50% chance of winning that second-round money in matches they don’t fix.

It’s clear that the vast majority of ATP matches are worth much less to the players–in prize money terms–than they are to gamblers. If press reports are to be believed, gamblers offer payments to many more players than ever accept them, so the gamblers presumably have a reasonably good idea of what it costs to fix a match.

If we can pin down the value of a match played fairly, we can get an idea of the sort of multiple that gamblers must pay to fix a match. Of course, the multiple won’t be the same for every player, or in every situation, but it will shed some additional light on the situation.

I’ve attempted to quantify the prize-money value of every match in the 2015 ATP and ATP Challenger seasons for both players. For each match, using the same general methodology as my tournament forecasts, I generated the probability that each player would reach the next round and every round beyond that. When combined with the additional prize money a player earns for reaching each successive round, these probabilities give us an “expected value” for each match. That’s our best guess of what a player forgoes if he opts to fix a match.
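As a sketch of that calculation: multiply the chance of reaching each later round by the extra prize money that round pays, then add it all up. The probabilities and prize increments below are invented for illustration; they are not output from the forecasting model or figures from any real tournament.

```python
def match_expected_value(reach_probs, prize_increments):
    """Sum over later rounds of P(reach round) * additional prize money for reaching it."""
    if len(reach_probs) != len(prize_increments):
        raise ValueError("need one prize increment per round")
    return sum(p * inc for p, inc in zip(reach_probs, prize_increments))


# HYPOTHETICAL first-round match at a small event (all numbers made up)
reach_probs = [0.50, 0.22, 0.09, 0.04, 0.015]        # R2, QF, SF, final, title
prize_increments = [3500, 5000, 9000, 16000, 35000]  # extra dollars for each step

ev = match_expected_value(reach_probs, prize_increments)
print(f"expected value of playing honestly: ${ev:,.0f}")
```

Prize money already guaranteed before the match is left out, since the player keeps it whether or not he wins.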

As an example, consider Fernando Verdasco at last year’s ATP 250 in Metz. In the first round, my model gave him a 53% chance of defeating Alexander Zverev. Based on the entire draw, it estimated Verdasco’s chances of reaching each successive round, up to a 2.9% chance of winning the title. Run the numbers, and his expected value for that match was $3,855–incidentally, the lowest expected value of any of his matches last year.

Once Verdasco advanced to the second round and was guaranteed second-round prize money, he had a 54% chance of getting past Gilles Muller, and an expected value of $6,239 for playing that match honestly. Overall, the median expected value of a Verdasco match last year was roughly $9,500, and the highest expected value of any match–his US Open first-rounder–was just over $45,000.

As it turns out, that $9,500 median is very close to the median for all ATP tour-level matches, $9,667. Put another way, well over half of last year’s ATP matches had an expected value under $10,000, or 20% of what gamblers are apparently willing to pay to fix a match.

The value of not fixing is even lower for many substantial subsets of tour matches. The median expected value of a first-round match–including the lucrative ones at the majors–is only $6,200. The median expected value of a first-round match at a 250- or 500-level event is a mere $4,200.

Consider the case of Andrey Golubev, a player who has turned up in fixing allegations in the past, and appeared among the players flagged by BuzzFeed’s recent investigation. The median expected value of his 13 tour-level matches last year was $3,450; all but three were under $5,100.

So far, we’ve only discussed tour-level matches–and for players, that’s where the real money is. Most fixing allegations these days pertain to the Challenger level and below, where the entire purse of some events is equal to what a single first-round loser will receive this fortnight in Melbourne.

Golubev played the majority of last season on the Challenger tour, so we can see what it was worth to him to play honestly at the lower level. In 21 matches, his median expected prize money was a whopping $692, and 15 of the 21 matches had expected values under $1,000.

Want another example? Take Denys Molchanov, who has also been identified as a possible fixer. In 27 Challenger matches last year, his median expected prize money was $612, 23 of the 27 matches had expected values under $1,000, and no match he played was, by this measure, worth more than $1,200.

Altogether, the median expected value of a Challenger match last year was $514, and almost 80% of matches had expected prize money values of less than $1,000. The betting volume on Challenger matches is generally lower, so the value of fixing these matches is also lower, but the prize money discrepancy is at least as great.

Without knowing more about how much players have been paid to fix matches–and adjusting for the possibility that they are sometimes injured, with correspondingly lower chances of winning, when they do so–we’ll never be able to establish a precise relationship between expected prize money values and fixing fees. That said, the available anecdotal evidence, combined with my analysis here, can give us a rough idea.

If we take $50,000 as a typical payment to fix an anonymous early-round ATP match, that’s probably about 10 times the expected value of that match to the player who fixes it. It’s less clear, though, whether that extra $45,000 should be understood as a 10x multiple of expected value, or a fee to offset the very high cost–albeit one with very low probability–of the player suffering a long-term suspension, banishment from the game, or even legal penalties.

Further, the expected value of a match is more than just its prize money. When a player wins matches, he gains more ranking points, and those points can get him entry into higher-level (and higher-paycheck) events and earn him seedings at the events he’s already playing. It’s very difficult to assign a money value to ranking points, especially since they are non-linear: The 50 points that push you past the Grand Slam direct entry threshold are enormously valuable, but the 50 points that move you from #41 to #39 are usually worthless.

For both of these reasons, we can’t explain exactly how fixing fees are established, or–more importantly–how fixing might be affected by increased prize money. After all, plenty of people have been clamoring for years for more prize money at the lower levels of the game, and when we see players apparently losing matches on purpose in order to finance their life on tour, that gives more ammunition to their cause.

However, based on the numbers we’ve seen so far, it’s far from obvious that increasing prize money–whether at the Challenger level or in early rounds of 250s–would do much to solve the problem.

Consider a radical move such as doubling prize money at all Challengers. That would cost about $8,000,000 per season, and it would make tennis a much more viable option for the many players who don’t receive substantial support from their national federations. But would it put a dent in match-fixing?

Simply doubling the numbers above, we find that almost half of all Challenger matches would still be worth $1,000 or less, with almost 80% under $2,000. If some players will fix matches for ten times the expected prize money, that means we still have thousands of matches that are “fixable” for $10,000. Perhaps an increase in overall prize money would mean that more players would refuse to fix on moral grounds. But financially, even an expensive solution like this one is very unlikely to eliminate fixing.

If recent reports are to be believed, the governing bodies of tennis are doing very little to stop match fixing. Spending that same $8,000,000 to massively increase the size of the Tennis Integrity Unit (assuming that it uses that money wisely and acts on its findings) would probably have a much greater effect than the same amount spent to increase prize money.

If enforcement is more effective, the risk that a player takes each time he fixes a match is that much greater. Increasing that risk would require that gamblers pay a higher multiple (or a higher “risk of banishment” flat fee) for every match, not just those for which prize money is higher. It wouldn’t prevent every corrupt player from entering the sport, and it wouldn’t address non-financial issues like situations in which a player’s family is threatened, but it would make fixing matches a lot more expensive.

Comparing fixing fees to the prize money expectations I’ve generated, it’s clear that the players who do fix matches–themselves already in the minority–charge a very high premium for doing so. They are very concerned about getting caught, or they would simply rather not fix matches. The problem can’t be solved by any feasible boost in prize money, but it can be mitigated by increasing the premium that gamblers have to pay.

The Tennis Data Storytelling Challenge

Want to show off your data analysis and visualization skills, and dig into some tennis data while you’re at it? Nikita, from the Tennis Notebook, and I are hosting the Tennis Data Storytelling Challenge, which will run for the next several months. You can read all about it–and sign up–at her site.

There’s plenty of data out there for you to use, and I encourage you to explore it to get a better sense of what you might do with it. Nikita has focused in particular on data from the Match Charting Project, the crowdsourced effort I’ve coordinated to collect shot-by-shot stats for hundreds of professional matches. The best way to learn about that project is to jump in and chart a match (or ten) yourself.

So, what should you write about? If you have to ask, I suspect you’re not watching enough tennis–or, at least, you’re not watching with a sufficiently critical eye.

The best analytical work comes from people with deep domain knowledge in addition to the data science skills they need to do the analysis. The more you watch the sport intently, listen to commentators with a skeptical ear, and think for yourself about what’s happening on court, the better your work will be.

In addition to those that Nikita posted recently, here are a few more general subjects to get you pointed in the right direction:

  1. How do lefties differ–in tactics or in results–from right-handers?
  2. What shots or tactics are more successful on one surface than on another?
  3. What happens to a player’s game when s/he starts getting tired, and how does that show up in the data?
  4. Are there particular skills or playing styles that transfer well from the lower levels (Challengers, etc) to tour level?
  5. What happens in pressure situations? Do some players excel more than others? Do players lean on different tactics than they do in lower-leverage situations?

The list goes on–literally. I’ve posted a list of nearly 150 topics, ranging from simple queries to barely-touched fields of research. Feel free to add to it.

Remember, however, that many of these are big questions–far too complicated to adequately address in 1,500 words. I’d rather see a very carefully researched, convincing narrative about a narrow topic than an attempt to cover too much ground at once.

Again, if you want to join the fun, head over to The Tennis Notebook and sign up for the Challenge!

Is Grand Slam Qualifying Worth Tanking For?

Italian translation at settesei.it

Earlier today in Hobart, Naomi Osaka lost her second-round match to Mona Barthel. Coming into the match, she was in a tricky position: If she won, she wouldn’t be able to play Australian Open qualifying. For a young player outside the top 100, a tour-level quarterfinal would be nice, but presumably Melbourne was intended to be the centerpiece of her trip to Australia.

Since she lost the match, she’ll be able to play qualifying. But what if she hadn’t? Is this a situation in which a player would benefit from losing a match?

Put another way: In a position like Osaka’s, what are the incentives? If she could choose between the International-level quarterfinal and the Slam qualifying berth, which should she pick? Or, put more crassly, should a player in this position tank?

Let’s review the scenarios. In scenario A, Osaka wins the Hobart second-rounder, reaches the quarterfinal, and has a chance to go even further. She can’t play the Australian Open in any form. In scenario B, she loses the second-rounder, enters Melbourne qualifying and has a chance to reach the main draw.

Before we go through the numbers, take a guess: Which scenario is likely to give Osaka more ranking points? What about prize money?

Scenario A is more straightforward. By reaching the quarterfinals, she earns 30 additional ranking points and US$2,590 beyond what a second-round loser makes. Beyond that, we need to calculate “expected” points and prize money, using the amounts on offer for each round and combining them with her odds of getting there.

Let’s estimate that Osaka would have about a 25% chance of winning her quarterfinal match and earning an additional 50 points and $5,400. In expected terms, that’s 12.5 points and $1,350. If she progresses, we’ll give her a 25% chance of reaching the final, then in the final, a 15% chance of winning the title.

Adding up these various possibilities, from her guaranteed QF points to her 0.94% chance (25%*25%*15%) of winning the Hobart title, we see that her expected rewards in scenario A are roughly 48 ranking points and just under $4,800.
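Here is a sketch of the ranking-points half of that sum. The 30 guaranteed points, the 50 extra points for a semifinal, and the three probabilities come from the text. The extra points for reaching the final (70) and winning the title (100) are my assumptions, taken from the standard International-level points table (60, 110, 180, 280 for QF, SF, F, W). Prize money is left out because the final and title purses aren’t quoted here.

```python
# Chance of winning each remaining match (from the text)
p_win_qf, p_win_sf, p_win_final = 0.25, 0.25, 0.15

# Additional ranking points for each step
pts_quarterfinal = 30   # guaranteed by winning the second-rounder (from the text)
pts_semifinal = 50      # from the text
pts_final = 70          # ASSUMPTION: 180 - 110 in the International points table
pts_title = 100         # ASSUMPTION: 280 - 180 in the International points table

p_reach_sf = p_win_qf
p_reach_final = p_reach_sf * p_win_sf
p_title = p_reach_final * p_win_final

expected_points = (pts_quarterfinal
                   + p_reach_sf * pts_semifinal
                   + p_reach_final * pts_final
                   + p_title * pts_title)
print(f"title chance: {p_title:.2%}, expected points: {expected_points:.1f}")
```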

Scenario B starts in a very different place. Thanks to the recent increases in Grand Slam prize money, every player in the qualifying takes home at least US$3,150. That’s already close to Osaka’s expected financial reward from advancing in Hobart. The points are a different story, though: First-round qualifying losers only get 2 WTA ranking points.

I’ll spare you all the calculations for scenario B, but I’ve assumed that Osaka would have a 70% chance of winning qualifying round 1, a 60% chance of winning QR2, and a 50% chance of winning QR3 and qualifying. Those might be a little bit high, but if they are, consider it compensation for the possibility that she’ll reach the main draw as a lucky loser. (Also, if we knock her chances all the way down to 50%, 45%, and 40%, the conclusions are the same, even if the points and prize money in scenario B are quite a bit lower.)

Those estimated probabilities translate into an expectation of about 23 ranking points and US$11,100. Osaka isn’t guaranteed any money beyond the initial $3,150, but the rewards for qualifying are enormous, especially compared to the prize money in Hobart. A first-round main draw loser in Melbourne takes home more money than the losing finalist does in Hobart.

And, of course, if she does qualify, there’s a chance she’ll go further. Since 2000, female Slam qualifiers have reached the second round 41% of the time, the third round 9% of the time, the fourth round 1.8% of the time, and the quarterfinals 0.3% of the time. Those odds, combined with her 21% chance of reaching the main draw in the first place, translate into an additional 7 expected ranking points and $2,600 in prize money.
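The 21% figure is just the product of the three round-by-round estimates, and the main-draw odds chain onto it the same way. A minimal sketch using only the probabilities quoted above (the names are mine):

```python
from math import prod


def chance_to_qualify(round_win_probs):
    """Probability of winning every qualifying round in sequence."""
    return prod(round_win_probs)


optimistic = chance_to_qualify([0.70, 0.60, 0.50])
pessimistic = chance_to_qualify([0.50, 0.45, 0.40])

# Historical rates at which female Slam qualifiers reach later rounds (from the text)
reach_given_qualified = {"R2": 0.41, "R3": 0.09, "R4": 0.018, "QF": 0.003}
overall = {rnd: optimistic * p for rnd, p in reach_given_qualified.items()}

print(f"qualify: {optimistic:.0%} (or {pessimistic:.0%} with the lower estimates)")
for rnd, p in overall.items():
    print(f"reach {rnd}: {p:.2%}")
```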

All told, scenario B gives us 30 expected ranking points and US$13,600 in expected prize money.

The Slam option results in far more cash, while the International route is worth more ranking points. In the long term, those ranking points would have some financial value, possibly earning Osaka entry into a few higher-level events than she would otherwise qualify for. But that value probably doesn’t overcome the nearly $9,000 gap in immediate prize money.

I hope that no player ever tanks a match at a tour-level event so they can make it in time for Slam qualifying. But if one does, we’ll at least understand the logic behind it.

Winners, Errors, and Misinformation

Italian translation at settesei.it

Of the general ways in which points end (winners, unforced errors, and forced errors), which is the most common? It’s so basic a question that I’d never thought to investigate it. As it turns out, other people have, and they’re making tenuous claims based on their results.

A friend sent me a link to this advertisement for an instructional course, which–eventually, far into a painfully slow video–explains that more points on the pro tour end in forced errors than in winners or unforced errors. And because of this, the video argues, you can use some of the same patterns the pros use with the goal of generating forced errors. Apparently, aiming for winners is too risky, as is waiting for unforced errors.

Pedagogically, it seems reasonable enough to encourage patience and tactical conservatism. I don’t know the first thing about helping amateurs improve their tennis game, and I’ll happily defer to the experts.

However, the use of pro tennis data sparked my interest. I was immediately skeptical of these claims, which were apparently based on Grand Slam matches from 2012.

Using my datasets extracted from IBM Pointstream’s records of the last several slams, I tested the 2015 French Open and the 2015 US Open, tallying winners, unforced errors, and forced errors for men and women at both events. Here’s how they break down:

Dataset    Winners  Unforced  Forced  
FO Men       33.8%     32.9%   33.3%  
FO Women     32.7%     37.8%   29.5%  
                                      
USO Men      34.3%     31.6%   34.1%  
USO Women    31.0%     38.0%   30.9%

On both surfaces, men’s points split fairly evenly among the three categories. For women, winners are roughly even with forced errors (though there are more winners on clay) and unforced errors are the most common type of point-ending shot.

The Pointstream-based dataset has a limitation, though, and you might have already guessed what it is. A sizable percentage of forced errors are serve returns, which don’t really seem pertinent to a discussion of tactics. We can separate aces from winners and double faults from unforced errors, but not forced error returns from forced errors.

For that, we need the resources of the Match Charting Project. That data gives us almost 1,500 matches (evenly split between men and women) once we limit our view to tour-level contests. The MCP dataset contains everything Pointstream does–winners, unforced and forced errors–and much, much more. For our purposes, the key addition is rally length, which allows us to differentiate between forced error returns and forced errors that came later in rallies.

With the MCP data, we can remove serve statistics from this discussion altogether, excluding aces, double faults, and forced error returns, none of which are tactics in the sense we usually use the word.

Here’s the frequency of each type of point-ender:

Dataset  Winners  Unforced  Forced  
Men        32.5%     45.8%   21.7%  
Women      32.4%     49.4%   18.2%

When serves are no longer cluttering the picture, winners retain their relative importance, but the distribution of errors changes enormously. Now, we see that once the returner gets the ball back in play (or receives a serve he or she should be able to put back in play), unforced errors outnumber forced errors by more than two to one.

(I also calculated clay-specific numbers, and all the rates were within one percentage point of the overall averages.)

Forced errors are the most common type of point-ender in only 14 of 728 charted men’s matches and 4 of 751 charted women’s matches. Even if you’re concerned about the representativeness of the MCP sample or the error-labeling tendencies of the charters and make substantial adjustments to allow for them, these results overwhelmingly establish that unforced errors are the most common way in which rallies end.
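To put numbers on "more than two to one" and on how rarely forced errors lead a match, here is a short sketch using only the Match Charting Project figures quoted above (the dictionary layout and function names are mine):

```python
# Share of rally-ending shots by type, serve-related points removed (from the table above)
point_enders = {
    "Men": {"winners": 32.5, "unforced": 45.8, "forced": 21.7},
    "Women": {"winners": 32.4, "unforced": 49.4, "forced": 18.2},
}
# Matches in which forced errors were the most common point-ender: (count, charted matches)
forced_most_common = {"Men": (14, 728), "Women": (4, 751)}


def unforced_per_forced(shares):
    """Unforced errors for every forced error."""
    return shares["unforced"] / shares["forced"]


def forced_lead_rate(count, total):
    """Fraction of charted matches in which forced errors were the top point-ender."""
    return count / total


for tour, shares in point_enders.items():
    count, total = forced_most_common[tour]
    print(f"{tour}: {unforced_per_forced(shares):.2f} unforced per forced error; "
          f"forced errors lead in {forced_lead_rate(count, total):.1%} of matches")
```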

I’m not sure how applicable the tactics and tendencies of pro players are to amateur coaching, so it’s possible that these numbers are irrelevant to a great deal of coaching pedagogy. But if you’re going to base your instructional technique on pro tennis stats, it seems reasonable to start by getting the numbers right.

The Match Charting Project is making it possible to answer questions about tennis that were previously unanswerable. Project data is open to all researchers. Please help us grow the project by watching tennis and charting matches!

Winning Return Points When It Matters

In my post last week about players who have performed better than expected in tiebreaks (temporarily, anyway), I speculated that big servers may try harder in tiebreaks than in return games.

If we interpret “try harder” as “win points more frequently,” we can test it. With my point-by-point dataset, we can look at every top player in the men’s game and compare their return-point performance in tiebreaks to their return-point performance earlier in the set.

As it turns out, top players post better return numbers in tiebreaks than they do earlier in the set. I looked at every match in my dataset (most tour-level matches from the last few seasons) for the ATP top 50, and found that these players, on average, won 5.2% more return points in tiebreaks than they did earlier in those sets.

That same group of players saw their serve performance decline slightly, by 1.1%. Since the top 50 frequently play each other, it’s no surprise that the serve and return numbers point in different directions. However, the return point increase and the serve point decrease don’t cancel each other out, suggesting that the top 50 is winning a particularly large number of tiebreaks against the rest of the pack, mostly by improving their return game once the tiebreak begins.

(There’s a little bit of confirmation bias here, since some of the players on the edge of the top 50 got there thanks to good luck in recent tiebreaks. However, most of the top 50–especially those players who make up the largest part of this dataset–have been part of this sample of players for years, so the bias remains only minor.)

My initial speculation concerned big servers–the players who might reasonably relax during return games, knowing that they probably won’t break anyway. However, big servers aren’t any more likely than others to return better in tiebreaks. (Or, put another way, to return worse before tiebreaks.) John Isner, Ivo Karlovic, Kevin Anderson, and Roger Federer all win slightly more return points in tiebreaks than they do earlier in sets, but don’t improve as much as the 5.2% average. What’s more, Isner and Anderson improve their serve performance for tiebreaks slightly more than they do their return performance.

There are a few players who may be relaxing in return games. Bernard Tomic improves his return points won by a whopping 27% in tiebreaks, Marin Cilic improves by 16%, and Milos Raonic improves by 11%. Tomic and Raonic are particularly ineffective in return games when they have a break advantage in the set (more on that in a moment), so it’s plausible they are saving their effort for more important moments.

Despite these examples, this is hardly a clear-cut phenomenon. Kei Nishikori, for example, ups his return game in tiebreaks almost as much as Cilic does, and we would never think of him as a big server, nor do I think he often shows signs of tactically relaxing in return games. We have plenty of data for most of these players, so many of these trends are more than just statistical noise, but the results for individual players don’t coalesce into any simple, overarching narratives about tiebreak tendencies.

There is one nearly universal tendency that turned up in this research. When leading a set by one break or more, almost every player returns worse. (Conversely, when down a break, almost every player serves better.) The typical top 50 player’s return game declines by almost 5%, meaning that a player winning 35% of return points falls to 33.4%.
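
The percentages here are relative changes, not percentage-point differences. A minimal sketch of that arithmetic, using a hypothetical helper name and the example figures from the paragraph above:

```python
def relative_change(new_rate, base_rate):
    """Relative change of new_rate versus base_rate, as a fraction."""
    return (new_rate - base_rate) / base_rate

# Example from the text: a player who wins 35% of return points overall
# but only 33.4% when leading the set by a break.
decline = relative_change(0.334, 0.35)
print(f"{decline:.1%}")  # -4.6%
```

In percentage points the drop is only 1.6, which is why the same effect can look either small or large depending on how it is quoted.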

Almost every player fits this pattern. 48 of the top 50–everyone except for David Ferrer and Aljaz Bedene–win fewer return points when up a break, and 46 of 50 win more service points when down a break.

Pinning down exactly why this is the case is–as usual–more difficult than establishing that the phenomenon exists. It may be that players are relaxing on return. A one-break advantage, especially late, is often enough to win the set, so it may make sense for players to conserve their energy for their own service games. Looking at it from the server’s perspective, that one-break disadvantage might remove some pressure.

What’s clear is this: Players return worse than usual when up a break, and better than usual in tiebreaks. The changes are much more pronounced for some ATPers than others, but there’s no clear relationship with big serving. As ever, tiebreaks remain fascinating and more than a little inscrutable.

The Luck of the Tiebreak, 2015 in Review

Tiebreak outcomes are influenced by luck a lot more than most people think. All else equal, big servers aren’t any more successful than weak servers, and one season’s tiebreak king is often the next season’s tiebreak chump.

I’ve written a lot about this in the past, so I won’t repeat myself too much. (If you want to read more, here’s a good place to start.) In short, the data shows this: Good players win more tiebreaks than bad players do, but only because they’re better in general, not because they have special tiebreak skills. Very few players perform better or worse than they usually do in tiebreaks.

In the past, I’ve found that three players–Roger Federer, Rafael Nadal, and John Isner–consistently increase their level in tiebreaks. In other words, when you calculate how many tiebreaks Federer (or Nadal, or Isner) should win based on his overall rate of serve and return points won, you discover that he wins even more tiebreaks than that.
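
The post doesn’t spell out how serve and return rates are turned into an expected number of tiebreak wins. One common approach, sketched here as an assumption rather than as the method actually used, treats every point as independent and works out the chance of winning a single first-to-seven tiebreak. The function name and the example rates are hypothetical.

```python
from functools import lru_cache

def tiebreak_win_prob(p_serve, p_return):
    """Chance of winning a first-to-7, win-by-2 tiebreak.

    p_serve:  probability the player wins a point on his own serve.
    p_return: probability he wins a point on the opponent's serve.
    Assumes points are independent and both rates lie strictly
    between 0 and 1. The player is taken to serve the first point.
    """

    @lru_cache(maxsize=None)
    def win_from(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6, each pair of points has one serve per player;
            # the tiebreak ends when someone wins both points of a pair.
            both = p_serve * p_return
            neither = (1 - p_serve) * (1 - p_return)
            return both / (both + neither)
        # Serving order: one point for the first server, then two each.
        next_point = a + b + 1
        our_serve = (next_point // 2) % 2 == 0
        p = p_serve if our_serve else p_return
        return p * win_from(a + 1, b) + (1 - p) * win_from(a, b + 1)

    return win_from(0, 0)

# Two evenly matched players: each wins 65% of his own service points.
print(round(tiebreak_win_prob(0.65, 0.35), 3))  # 0.5
```

Under this model, adding up the per-tiebreak probabilities across a season would give an expected win total. Doing that properly needs opponent-specific rates for each match, which this sketch leaves out.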

In any given year, some players score very high or very low–winning or losing far more tiebreaks than their overall level of play would suggest that they should. But the vast majority of those players regress to the mean in subsequent years.

Here’s a look at which players outperformed the most in 2015 (minimum 20 tiebreaks). TBExp is the number of tiebreaks we would expect them to win, given their usual rate of serve and return points won. TBOE (Tie Breaks Over Expectations) is the difference between the number they won and the number we’d expect them to win, and TBOR is that difference divided by total tiebreaks.

Player              TBs  TBWon  TBExp  TBOE   TBOR  
Stan Wawrinka        46     34   24.9   9.1  19.8%  
Martin Klizan        25     17   12.2   4.8  19.0%  
Marin Cilic          35     26   21.0   5.0  14.2%  
Tomas Berdych        34     24   20.0   4.0  11.7%  
John Isner           64     39   31.7   7.3  11.3%  
Feliciano Lopez      42     27   22.4   4.6  11.0%  
Jiri Vesely          28     16   13.2   2.8  10.1%  
Sam Groth            31     18   14.9   3.1  10.1%  
Gilles Muller        45     27   22.7   4.3   9.5%  
Gael Monfils         28     18   15.4   2.6   9.4%
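
As a quick check on the definitions, TBOE and TBOR can be recomputed from the other columns. This is a minimal sketch with a hypothetical function name, using Wawrinka’s row from the table:

```python
def tiebreak_over_expectation(played, won, expected):
    """Return (TBOE, TBOR) as defined in the text."""
    tboe = won - expected   # tiebreaks won beyond expectation
    tbor = tboe / played    # that surplus as a share of tiebreaks played
    return tboe, tbor

# Stan Wawrinka, 2015: 46 tiebreaks played, 34 won, 24.9 expected.
tboe, tbor = tiebreak_over_expectation(46, 34, 24.9)
print(f"TBOE={tboe:.1f}, TBOR={tbor:.1%}")  # TBOE=9.1, TBOR=19.8%
```

Because the table prints TBExp rounded to one decimal, recomputing TBOR for other rows can differ from the printed value by a few tenths of a percentage point.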

There are a lot of big servers here (more on that later) and a lot of new faces. Federer and Nadal were roughly neutral in 2015, winning about as many tiebreaks as we’d expect. Of the tiebreak masters, only Isner remained among the leaders. He has never posted a season below +5% TBOR, and only twice has he been below +11% TBOR. Just from this leaderboard, you can tell how elite that is.

Along with Isner, we have Marin Cilic, Feliciano Lopez, Sam Groth, and Gilles Muller, all players one would reasonably consider to be big servers. As I mentioned above, big serving doesn’t typically correlate with exceeding tiebreak expectations. It may just be a fluke: Lopez was roughly neutral in 2013 and 2014, and -15% in 2012; Groth doesn’t have much of a tour-level track record, but was -5% in 2014; Muller has been up and down throughout his career; and Cilic almost always underperformed until 2013.

Adding to the “fluke” argument is the case of Ivo Karlovic. His -14% TBOR this year was one of the worst among players who contested 20 or more tiebreaks, and he’s been exactly neutral over the last decade.

Let’s take a closer look at a few players.

Stan Wawrinka: For the second year in a row, he won at least 15% more tiebreaks than expected. Whether it’s clutch, focus, or dumb luck, the shift in his tiebreak fortunes dovetails nicely with his upward career trajectory. From 2006-13, he only posted one season at neutral or better, and his overall TBOR of -9% was one of the worst in the game for that span.

Cilic’s story is similar. Before 2013, he posted only one season above expectations. Since then, he’s won 19%, 16%, and 14% more tiebreaks than expected.

While only anecdotes, these two cases contradict an idea I’ve heard quite a bit, that players weaken in the clutch as they get older. The subject often comes up in the context of Karlovic’s tiebreak futility or Federer’s break point frustrations. It’s tough to prove one way or the other, in part because there’s no generally accepted measure of clutch in tennis. (If indeed there is any persistent clutch skill.) Using a measure like TBOR is dangerous, both because it is so noisy, and because of survivorship bias–players who get worse as they get older are more likely to fall in the rankings and play fewer tour matches as a result.

Another complicating factor is worthy of further study. To estimate how many tiebreaks a player should win, we need to take our expectation from somewhere. I’m using each player’s overall rates of serve and return points won. But if a player is trying harder in tiebreaks (assuming more effort translates into better results), we would expect him to win more points in tiebreaks than his overall rates suggest, so an expectation built on those overall rates would understate his true tiebreak level.

Isner has admitted to coasting on unimportant points, and for someone with his game style, a whole lot of return points can be classified as unimportant. Very generally speaking, the more one-dimensional the player, the more reason he has to take it easy during return games, and the more he does so, the more we would observe that he outperforms expectations in tiebreaks–simply because he sets expectations artificially low.

That might be an explanation for Isner’s consistent appearance on these leaderboards. And if we assume that players become more strategically sound as they age–or simply better at tactically conserving energy–we might have a reason why older players score higher in this metric.

Two more players worth mentioning are Milos Raonic and Kei Nishikori. They were 5th and 6th on the 2014 leaderboard, outperforming expectations by 15% and 14%, respectively. In 2015, Raonic fell to neutral, and Nishikori (in far fewer tiebreaks) dropped to -14%, nearly the bottom of the rankings. Taken together, it’s a good reminder of the volatility of these numbers. In Raonic’s case, it’s a warning that relying too much on winning tiebreaks (which, by extension, implies relying too little on one’s return game) is a poor recipe for long-term success.

Finally, some notes on the big four. Novak Djokovic and Andy Murray have never figured heavily in these discussions, both because they don’t play a ton of tiebreaks, and because they don’t persistently out- or underperform expectations. Federer and Nadal, however, were long among the best. Both have returned to the middle of the pack: Federer hasn’t posted a TBOR above 5% since 2011, and Nadal underperformed by 8.5% in 2014 before bouncing back to neutral last season.

Whatever tiebreak skill Roger and Rafa once had now eludes them. On the other hand, ten months of good tiebreak luck can happen to anyone, even a legend. If either player can recapture that tiebreak magic–even if it’s mere luck that allows them to do so–it might translate into a few more wins as they try to reclaim the top spot in the rankings.

A New Year For the Match Charting Project

The 2015 tennis season was an amazing one for the Match Charting Project. We added more than 1,000 new matches to the database, including 800 from the 2015 season alone. In about two and a half years, the project has grown from little more than a half-baked idea to a tremendous resource for tennis researchers.

The Match Charting Project relies on volunteers to record details of every point of professional matches. Over 50 of you have taken the time to learn the method and chart at least one match, and some of you have gone way, way beyond that. Taken together, the results are outstanding.

In a sport where most data is hidden away by federations and sponsors, the Match Charting Project is one of the few bright spots for analysts. Anyone can use this data to research players, tendencies, and tactics. Anyone can contribute and help us learn more about the game.

We now have shot-by-shot data for over 1,600 matches, including sizable samples for most of the current ATP and WTA top 40. We have particularly large datasets for some top players, including the ATP big four and several WTA favorites. The database includes at least one match for every player in the ATP and WTA top 100, as well as detailed records of matches for many notable retired players.

We made huge progress last year, but I think we can do even better.

In 2015, we added 1,069 matches to the database, just under three per day. At the end of the day on December 31st, we had a total of 1,617 matches covered.

My goal for 2016 is to double that: another 1,617 new matches in 2016, a rate of about four and a half per day. To accomplish that, we’ll need more of you to pitch in. Hopefully those of you who have contributed in the past will continue to do so. Charting 1,600 matches is no easy feat, but with enough of us working toward that goal, we’ll get there.

For my part, in addition to charting an unhealthy number of matches, I’ll continue to write about my findings from the MCP dataset, and I’ll be developing ways to make the data more accessible to fans. Keep an eye out for updates–other researchers are working on projects that should create even more interest in the Match Charting Project.

Want to find out more? Ready to contribute? Here’s a list of MCP-related resources to fill you in on all the details of the project: