A Preface to All GOAT Arguments

Earlier this week, The Economist published my piece about Rafael Nadal’s and Roger Federer’s grand slam counts. I made the case that, because Nadal’s paths to major titles had been more difficult (the 2017 US Open notwithstanding), his 16 slams are worth more–barely!–than Federer’s.

Inevitably, some readers reduced my conclusion to something like, “stats prove that Nadal is the greatest ever.” Whoa there, kiddos. It may be true that Nadal is better than Federer, and we could probably make a solid argument based on the stats. But a rating of 18.8 to 18.7, based on 35 tournaments, can’t quite carry that burden.

There are two major steps in settling any “greatest ever” debate (tennis or otherwise). The first is definitional. What do we mean by “greatest?” How much more important are slams than non-slams? What about longevity? Rankings? Accomplishments across different surfaces? How much weight do we give a player’s peak? How much does the level of competition matter? What about head-to-head records? I could go on and on. Only when we decide what “greatest” means can we even attempt to make an argument for one player over another.

The second step–answering the questions posed by the first–is more work-intensive, but much less open to debate. If we decide that the greatest male tennis player of all time is the one who achieved the highest Elo rating at his peak, we can do the math. (It’s Novak Djokovic.) If you pick out ten questions that are plausible proxies for “who’s the greatest?” you won’t always get the same answer. Longevity-focused variations tend to give you Federer. (Or Jimmy Connors.) Questions based solely on peak-level accomplishments will net Djokovic (or maybe Bjorn Borg). Much of the territory in between is owned by Nadal, unless you consider the amateur era, in which case Rod Laver takes a bite out of Rafa’s share.

Of course, many fans skip straight to the third step–basking in the reflected glory of their hero–and work backwards. With a firm belief that their favorite player is the GOAT, they decide that the most relevant questions are the ones that crown their man. This approach fuels plenty of online debates, but it’s not quite at my desired level of rigor.

When the big three have all retired, someone could probably write an entire book laying out all the ways we might determine “greatest” and working out who, by the various definitions, comes out on top. Most of what we’re doing now is simply contributing sections of chapters to that eventual project. Now or then, one blog post will never be enough to settle a debate of this magnitude.

In the meantime, we can aim to shed more light on the comparisons we’re already making. Grand slam titles aren’t everything, but they are important, and “19 is more than 16” is a key weapon in the arsenal of Federer partisans. Establishing that this particular 19 isn’t really any better than that particular 16 doesn’t end the debate any more than “19 is more than 16” ever did. But I hope that it made us a little more knowledgeable about the sport and the feats of its greatest competitors.

At the one-article, 1,000-word scale, we can achieve a lot of interesting things. But for an issue this wide-ranging, we can’t hope to settle it in one fell swoop. The answers are hard to find, and choosing the right question is even more difficult.


Little Data, Big Potential

This is a guest post by Carl Bialik.

I had more data on my last 30 minutes of playing tennis than I’d gotten in my first 10 years of playing tennis  — and it just made me want so much more.

Ben Rothenberg and I had just played four supertiebreakers, after 10 minutes of warmup and before a forehand drill. And for most of that time — all but a brief break while PlaySight staff showed the WTA’s Micky Lawler the system — 10 PlaySight cameras were recording our every move and every shot: speed, spin, trajectory and whether it landed in or out. Immediately after every point, we could walk over to the kiosk right next to the net to watch video replays and get our stats. The tennis sure didn’t look professional-grade, but the stats did: spin rate, net clearance, winners, unforced errors, net points won.

Later that night, we could go online and watch and laugh with friends and family. If you’re as good as Ben and I are, laugh you will: As bad as we knew the tennis was by glancing over to Dominic Thiem and Jordan Thompson on the next practice court, it was so much worse when viewed on video, from the kind of camera angle that usually yields footage of uberfit tennis-playing pros, not uberslow tennis-writing bros.

This wasn’t the first time I’d seen video evidence of my take on tennis, an affront to aesthetes everywhere. Though my first decade and a half of awkward swings and shoddy footwork went thankfully unrecorded, in the last five years I’d started to quantify my tennis self. First there was the time my friend Alex, a techie, mounted a camera on a smartphone during our match in a London park. Then in Paris a few years later, I roped him into joining me for a test of Mojjo, a PlaySight competitor that used just one camera, enough to record video later published online, with our consent and to our shame. Last year, Tennis Abstract proprietor Jeff Sackmann and I demo-ed a PlaySight court with Gordon Uehling, founder of the company.

With PlaySight and Mojjo still only in a handful of courts available to civilians, that probably puts me — and Alex, Jeff and Ben — in the top 5 or 10 percent of players at our level for access to advanced data on our games. (Jeff may object to being included in this playing level, but our USPS Tennis Abstract Head2Head suggests he belongs.) So as a member of the upper echelon of stats-aware casual players, what’s left once I’m done geeking out on the video replays and RPM stats? What actionable information is there about how I should change my game?

Little data, modest lessons

After reviewing my footage and data, I’m still looking for answers. Just a little bit of tennis data isn’t much more useful than none.

Take the serve, the most common shot in tennis. In any one set, I might hit a few dozen. But what can I learn from them? Half are to the deuce court, and half are to the ad court. And almost half of the ones that land in are second serves. Even with my limited repertoire, some are flat while others have slice. Some are out wide, some down the T and some to the body — usually, for me, a euphemism for, I missed my target.

Playsight groundstroke report

If I hit only five slice first serves out wide to the deuce court, three went in, one was unreturned and I won one of the two ensuing rallies, what the hell does that mean? It doesn’t tell me a whole lot about what would’ve happened if I’d gotten a chance to try that serve once more that day against Ben, let alone what would happen the next time we played: when he had his own racquet, when we weren’t hitting alongside pros and in front of confused fans, with different balls on a different surface without the desert sun above us, at a different time of day when we’re in different frames of mind. And the data says even less about how that serve would have done against a different opponent.
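One rough way to see how little five serves can say is to put a confidence interval around the observed rate. The sketch below is only an illustration, not anything PlaySight reports: the function name `wilson_interval` and the choice of a 95% Wilson score interval (z = 1.96) are assumptions made here.

```python
import math


def wilson_interval(successes, attempts, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion."""
    if attempts <= 0:
        raise ValueError("need at least one attempt")
    p_hat = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p_hat + z**2 / (2 * attempts)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / attempts + z**2 / (4 * attempts**2))
        / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Three of the five slice first serves out wide landed in.
low, high = wilson_interval(3, 5)
print(f"observed 60% in, plausible range {low:.0%} to {high:.0%}")
```

For three of five, the interval runs from roughly 23% to 88%, which is another way of saying that the sample barely narrows anything down.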

That’s the serve, a shot I’ll hit at least once on about half of points in any match. The story’s even tougher for rarer shots, like a backhand drop half volley or a forehand crosscourt defensive lob, shots so rare they might come up once or twice every 10 matches.

More eyes on the court

It’s cool to know that my spinniest forehand had 1,010 RPM (I hit pretty flat compared to Jack Sock’s 3,337 RPM), but the real value I see is in the kind of data collected on that London court: the video. PlaySight doesn’t yet know enough about me to know that my footwork was sloppier than usual on that forehand, but I do, and it’s a good reminder to get moving quickly and take small steps. And if I were focusing on the ball and my own feet, I might have missed that Ben leans to his backhand side instead of truly split-stepping, but if I catch him on video I can use that tendency to attack his forehand side next time.

Playsight video with shot stats

Video is especially useful for players who are most focused on technique. As you might have gathered, I’m not, but I can still get a tactical edge from studying patterns that PlaySight doesn’t yet identify.

Where PlaySight and its ilk could really drive breakthroughs is by combining all of the data at its disposal. The company’s software knows about only one of the thousands of hours I’ve spent playing tennis in the last five years. But it has tens of thousands of hours of tennis in its database. Even a player as idiosyncratic as me should have a doppelganger or two in a data set that big. And some of them must’ve faced an opponent like Ben. Then there are partial doppelgangers: women who serve like me even though all of our other shots are different; or juniors whose backhands resemble mine (and hopefully are being coached into a new one).  Start grouping those videos together — I’m thinking of machine learning, clustering and classifying — and you can start building a sample of some meaningful size. PlaySight is already thinking this way, looking to add features that can tell a player, say, “Your backhand percentage in matches is 11 percent below other PlaySight users of a similar age/ability,” according to Jeff Angus, marketing manager for the company, who ran the demo for Ben and me.

The hardware side of PlaySight is tricky. It needs to install cameras and kiosks, weatherproofing them when the court is outdoors, and protect them from human error and carelessness. It’s in a handful of clubs, and the number probably won’t expand much: The company is focusing more on the college game. Even when Alex and I, two players at the very center of PlaySight’s target audience among casual players, happened to book a PlaySight court recently in San Francisco, we decided it wasn’t worth the few minutes it would have taken at the kiosk to register — or, in my case, remember my password. The cameras stood watch, but the footage was forever lost.

Bigger data, big questions

I’m more excited by PlaySight’s software side. I probably will never play enough points on PlaySight courts for the company to tell me how to play better or smarter — unless I pay to install the system at my home courts. But if it gets cheaper and easier to collect decent video of my own matches — really a matter of a decent mount and protector for a smartphone and enough storage space — why couldn’t I upload my video to the company? And why couldn’t it find video of enough Bizarro Carls and Bizarro Carl opponents around the world to make a decent guess about where I should be hitting forehands?

There are bigger, deeper tennis mysteries waiting to be solved. As memorably argued by John McPhee in Levels of the Game, tennis isn’t so much one sport as dozens of different ones, each a different level of play united only by common rules and equipment. And a match between two players even from adjacent levels in his hierarchy typically is a rout. Yet tactically my matches aren’t so different from the ones I see on TV, or even from the practice set played by Thiem and Thompson a few feet from us. Hit to the backhand, disguise your shots, attack short balls and approach the net, hit drop shots if your opponent is playing too far back. And always, make your first serve and get your returns in.

So can a tactic from one level of the game even apply to one much lower? I’m no Radwanska and Ben is no Cibulkova, but could our class of play share enough similarity (mathematically, is Carl : Ben :: Aga : Pome) that what works for the pros works for me? If so, then medium-sized data on my style is just a subset of big data from analogous styles at every level of the game, and I might even find out if that backhand drop half volley is a good idea. (Probably not.)

PlaySight was the prompt, but it’s not the company’s job to fulfill product features only I care about. It doesn’t have to be PlaySight. Maybe it’s Mojjo, maybe Cizr. Or maybe it’s some college student who likes tennis and is looking for a machine-learning class. Hawk-Eye, the higher-tech, higher-priced, older competitor to PlaySight, has been slow to share its data with researchers and journalists. If PlaySight has figured out that most coaches value the video and don’t care much for stats, why not release the raw footage and stats to researchers, anonymized, who might get cracking on the tennis classification question or any of a dozen other tennis analysis questions I’ve never thought to ask? (Here’s a list of some Jeff and I have brainstormed, and here are his six big ones.) I hear all the time from people who like tennis and data and want to marry the two, not for money but to practice, to learn, to discover, and to share their findings. And other than what Jeff’s made available on GitHub, there’s not much data to share. (Just the other week, an MIT grad asked for tennis data to start analyzing.)

Sharing data with outside researchers “isn’t currently in the road map for our product team, but that could change,” Angus said, if sharing data can help the company make its data “actionable” for users to improve their games.

Maybe there aren’t enough rec players who’d want the data with enough cash to make such ventures worthwhile. But college teams could use every edge. Rising juniors have the most plastic games and the biggest upside. And where a few inches can change a pro career, surely some of the top women and men could also benefit from PlaySight-driven insights.

Yet even the multimillionaire ruling class of the sport is subject to the same limitations driven by the fractured nature of the sport: Each event has its own data and own systems. Even at Indian Wells, where Hawk-Eye exists on every match court, just two practice courts have PlaySight; the company was hoping to install four more for this year’s tournament and is still aiming to install them soon. Realistically, unless pros pay to install PlaySight on their own practice courts and play lots of practice matches there, few will get enough data to be actionable. But if PlaySight, Hawk-Eye or a rival can make sense of all the collective video out there, maybe the most tactical players can turn smarts and stats into competitive advantages on par with big serves and wicked topspin forehands.

PlaySight has already done lots of cool stuff with its tennis data, but the real analytics breakthroughs in the sport are ahead of us.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.

The Five Big Questions in Tennis Analytics

The fledgling field of tennis analytics can seem rather chaotic, with scores of mini-studies that don’t fit together in any obvious way. Some seem important but unfinished while others are entertaining but trivial.

Let me try to impose some structure on this project by classifying research topics into what I’ll call the Five Big Questions, each of which is really just an umbrella for hundreds more (like these). As we’ll see, there are really six categories, not five, which just goes to show: analytics is about more than just counting.

1. What’s the long-term forecast?

Beyond the realm of the next few tournaments, what does the evidence tell us about the future? This question encompasses everything from seasons to entire careers. What are the odds that Roger Federer reclaims the No. 1 ranking? How many Grand Slams will Nick Kyrgios win? How soon will Catherine Bellis crack the top ten?

The most important questions in this category are the hardest ones to answer: Given the limited data we have on junior players, what can we predict–and with what level of confidence–about their future? These are questions that national federations would love to answer, but they are far from the only stakeholders. Everyone from sponsors to tournaments to the players’ families themselves has an interest in picking future stars. Further, the better we can answer these questions, the more prepared we can be for the natural follow-ups. What can we (as families, coaches, federations, etc.) do to improve the odds that a player succeeds?

2. Who will win the next match?

The second question is also concerned with forecasting, and it is the subject that has received–by far–the most analytical attention. Not only is it fun and engaging to try to pick winners, there’s an enormous global industry with billions of dollars at stake trying to make more accurate forecasts.

As an analyst, I’m not terribly interested in picking winners for the sake of picking winners. More valuable is the quest to identify all of the factors that influence match outcomes, like the role of fatigue, or a player’s preference for certain conditions, or the specifics of a given matchup. Player rating systems fall into this category, and it’s important to remember they are only a tool for forecasting, not an end in themselves.

As a meta-question in this category, one might ask how accurate a set of forecasts could possibly become. Or, posed differently, how big of a role does chance play in match outcomes?

3. When and why does the i.i.d. model break down?

A lot of sports analysis depends on the assumption that events are “independent and identically distributed”–i.e., factors like streakiness, momentum, and clutch are either nonexistent or impossible to measure. In tennis terms, the i.i.d. model might assume that a player converts break points at the same rate that she wins all ad-court points, or that a player holds serve while serving for the set just as often as he holds serve in general.

The conventional wisdom strongly disagrees, but it is rarely consistent. (“It’s hard to serve for the set” but “this player is particularly good when leading.”) This boils down to yet another set of forecasting questions. We might know that a player wins 65% of service points, but what are her chances of winning this point, given the context?

I suspect that thorough analysis will reveal plenty of small discrepancies between reality and the i.i.d. model, especially at the level of individual players. More than with the first two topics, the limited sample sizes for many specific contexts mean we must always be careful to distinguish actual effects from noise and look for long-term trends.
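As a baseline for these comparisons, the i.i.d. assumption lets us turn a point-level rate into a game-level one. The sketch below uses the standard closed-form result for a service game in which every point is won independently with the same probability. The function name `hold_probability` is introduced here for illustration, and the 65% input is the example figure from above.

```python
def hold_probability(p):
    """Chance of holding serve if each point is won independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    # Win the game to love, 15, or 30: the server takes the final point plus
    # three earlier ones, losing 0, 1, or 2 points (1, 4, and 10 orderings).
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce at three points each: 20 possible orderings.
    reach_deuce = 20 * p**3 * q**3
    # From deuce, win two straight points before the returner does.
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return win_before_deuce + reach_deuce * win_from_deuce


iid_hold = hold_probability(0.65)
print(f"i.i.d. hold rate at 65% of service points won: {iid_hold:.1%}")
```

Under these assumptions, a player who wins 65% of service points should hold about 83% of the time. If she holds noticeably less often when serving for the set, that gap is the kind of discrepancy this question is about, subject to the sample-size caution above.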

4. How good is that shot?

As more tennis data becomes available in a variety of formats, the focus of tennis analytics will become more granular. The Match Charting Project offers more than 3,000 matches’ worth of shot-by-shot logs. Even without the details of each shot–like court position, speed, and spin–we can start measuring the effectiveness of specific players’ shots, such as Federer’s backhand.

With more granular data on every shot, analysts will be able to be even more precise. Eventually we may know the effect of adding five miles per hour to your average forehand speed, or the value of hitting a shot from just inside the baseline instead of just behind. Some academics–notably Stephanie Kovalchik–have begun digging into this sort of data, and the future of this subfield will depend a great deal on whether these datasets ever become available to the public.

5. How effective is that tactic?

Analyzing a single shot has its limits. Aside from the serve, every shot in tennis has a context–and even serves usually form part of the backdrop for other shots. Many of the most basic tactical questions have yet to be quantified, such as the success rate of approaching to the backhand instead of the forehand.

As with the previous topic, the questions about tactics get a lot more interesting–and immensely more complicated–as soon as Hawkeye-type data is available. With enough location, speed, and spin data, we’ll be able to measure the positions from which approach shots are most successful, and the type (and direction) that is most effective from each position. We could quantify the costs and benefits of running around a forehand: How good does the forehand have to be to counteract the weaker court position that results?

We can scrape the surface of this subject with the Match Charting Project, but ultimately, this territory belongs to those with camera tracking data.

6. What is the ideal structure of the sport?

Like I said, there are really just five questions. Forecasting careers, matches, and points, and quantifying shots and tactics encompass, for me, the entire range of “tennis analytics.”

However, there are plenty of tennis-related questions that we might assign to the larger field of “business of sports.” How should prize money be distributed? What is the best way to structure the tour to balance the interests of veterans and newcomers? Are there too many top-level tournaments, or too few? What the hell should we do with Davis Cup, anyway?

Many of these issues are–for now–philosophical questions that boil down to preferences and gut instincts. Controlled experiments will always be difficult if only because of the time frames involved: If we change the Davis Cup format and it loses popularity, is it causation or just correlation? We can’t replicate the experiment. But despite the challenges, these are major questions, and analysts may be able to offer valuable insights.

Now … let’s get to work.

The Continuum of Errors

When is an error unforced? If you envision designing an algorithm to answer that question, it quickly becomes unmanageable. You’d need to take into account player position, shot velocity, angle, and spin, surface speed, and perhaps more. Many errors are obviously forced or unforced, but plenty fall into an ambiguous middle ground.

Most of the unforced error counts we see these days–via broadcasts or in post-match recaps–are counted by hand. A scorer is given some guidance, and he or she tallies each kind of error. If the human-scoring algorithm is boiled down to a single rule, it’s something like: “Would a typical pro be expected to make that shot?” Some scorers limit the number of unforced errors by always counting serve returns, or net shots, or attempted passing shots, as forced.

Of course, any attempt to sort missed shots into only two buckets is a gross oversimplification. I don’t think this is a radical viewpoint. Many tennis commentators acknowledge this when they explain that a player’s unforced error count “doesn’t tell the whole story,” or something to that effect. In the past, I’ve written about the limitations of the frequently-cited winner-to-unforced error ratio, and the similarity between unforced errors and the rightly-maligned fielding errors stat in baseball.

Imagine for a moment that we have better data to work with–say, Hawkeye data that isn’t locked in silos–and we can sketch out an improved way of looking at errors.

First, instead of classifying only errors, it’s more instructive to sort potential shots into three categories: shots returned in play, errors (which we can further distinguish later on), and opponent winners. In other words: Did you make it, did you miss it, or did you fail to even get a racket on it? One man’s forced error is another man’s ball put back in play*, so we need to consider the full range of possible outcomes from each potential shot.

*especially if the first man is Bernard Tomic and the other man is Andy Murray.

The key to gaining insight from tennis statistics is increasing the amount of context available–for instance, taking a player’s stats from today and comparing them to the typical performance of a tour player, or contrasting them with how he or she played in the last similar matchup. Errors are no different.

Here’s a basic example. In the sixth game of Angelique Kerber‘s match in Sydney this week against Darya Kasatkina, she hit a down-the-line forehand:

Kerber hits a down-the-line forehand

Thanks to the Match Charting Project, we have data for about 350 of Kerber’s down-the-line forehands, so we know it goes for a winner 25% of the time, and her opponent hits a forced error another 9% of the time. Say that a further 11% turn into unforced errors, and we have a profile for what usually happens when Kerber goes down the line: 25% winners, 20% errors, 55% put back in play. We might dig even deeper and establish that the 55% put back in play consists of 30% that ultimately resulted in Kerber winning the point against 25% that she eventually lost.

In this case, Kasatkina was able to get a racket on the ball, but missed the shot, resulting in what most scorers would agree was a forced error:

Kasatkina lunges for the return

This single instance–Kasatkina hitting a forced error against a very effective type of offensive shot–doesn’t tell us anything on its own. Imagine, though, that we tracked several players in 100 attempts each to reply to a Kerber down-the-line forehand. We might discover that Kasatkina lets 35 of 100 go for winners, or that Simona Halep lets only 15 go for winners and gets 70 back in play, or that Anastasia Pavlyuchenkova hits an error on 30 of the 100 attempts.

My point is this: With more granular data, we can put errors in a real-life context. Instead of making a judgment about the difficulty of a certain shot (or relying on a scorer to do so), it’s feasible to let an algorithm do the work on 100 shots, telling us whether a player is getting to more balls than the average player, or making more errors than she usually does.
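The comparison described above can be sketched in a few lines. This is only an illustration: the baseline is the Kerber profile from earlier (including its “say that” 11% unforced-error figure), the per-player numbers are the hypothetical ones from the text, and outcomes the text doesn’t specify are left out rather than guessed. The names `BASELINE`, `observed`, and `differences` are introduced here.

```python
# Outcomes per 100 Kerber down-the-line forehands, from the profile above:
# 25 winners, 9 forced plus 11 (hypothetical) unforced errors, the rest in play.
BASELINE = {"winner": 25, "error": 9 + 11}
BASELINE["in_play"] = 100 - BASELINE["winner"] - BASELINE["error"]

# Hypothetical per-100 results for three opponents, as given in the text.
# Outcomes the text does not mention are omitted.
observed = {
    "Kasatkina": {"winner": 35},
    "Halep": {"winner": 15, "in_play": 70},
    "Pavlyuchenkova": {"error": 30},
}


def differences(counts, baseline=BASELINE):
    """Return each known outcome's gap from the baseline, in shots per 100."""
    return {outcome: count - baseline[outcome] for outcome, count in counts.items()}


for player, counts in observed.items():
    print(player, differences(counts))
```

A positive `winner` gap means the opponent is reaching fewer balls than Kerber’s typical opponent, while a positive `in_play` gap means she is retrieving more of them.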

The continuum, and the future

In the example outlined above, there are a lot of important details that I didn’t mention. In comparing Kasatkina’s error to a few hundred other down-the-line Kerber forehands, we don’t know whether the shot was harder than usual, whether it was placed more accurately in the corner, whether Kasatkina was in better position than Kerber’s typical opponent on that type of shot, or how fast the surface was. Over the course of 100 down-the-line forehands, those factors would probably even out. But in Tuesday’s match, Kerber hit only 18 of them. While a typical best-of-three match will give us a few hundred shots to work with, this level of analysis can only tell us so much about specific shots.

The ideal error-classifying algorithm of the future would do much better. It would take all of the variables I’ve mentioned (and more, undoubtedly) and, for any shot, calculate the likelihood of different outcomes. At the moment of the first image above, when the ball has just come off of Kerber’s racket, with Kasatkina on the wrong half of the baseline, we might estimate that there is a 35% chance of a winner, a 25% chance of an error, and a 40% chance that the ball is returned in play. Depending on the type of analysis we’re doing, we could calculate those numbers for the average WTA player, or for Kasatkina herself.

Those estimates would allow us, in effect, to “rate” errors. In this example, the algorithm gives Kasatkina only a 40% chance of getting the ball back in play. By contrast, an average rallying shot probably has a 90% chance of ending up back in play. Instead of placing errors in buckets of “forced” and “unforced,” we could draw lines wherever we wish, perhaps separating potential shots into quintiles. We would be able to quantify whether, for instance, Andy Murray gets more of the most unreturnable shots back in play than Novak Djokovic does. Even if we have an intuition about that already, we can’t even begin to prove it until we’ve established precisely what that “unreturnable” quintile (or quartile, or whatever) consists of.
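The quintile idea above can be sketched as follows. Everything here is hypothetical: the per-shot return probabilities would have to come from an error-classifying model that doesn’t yet exist, the ten sample shots are invented for illustration (only the 2%, 40%, and 90% figures echo numbers in the text), and the name `quintile_report` is introduced here.

```python
def quintile_report(shots):
    """Group shots into five equal-sized difficulty bands, hardest first.

    shots: list of (p_in_play, returned) pairs, where p_in_play is a model's
    estimated chance of getting the ball back and returned is what happened.
    """
    ranked = sorted(shots, key=lambda shot: shot[0])
    n = len(ranked)
    report = []
    for i in range(5):
        group = ranked[i * n // 5:(i + 1) * n // 5]
        if not group:
            continue  # fewer than five shots: some bands stay empty
        expected = sum(p for p, _ in group) / len(group)
        actual = sum(1 for _, returned in group if returned) / len(group)
        report.append(
            {"quintile": i + 1, "shots": len(group), "expected": expected, "actual": actual}
        )
    return report


# Invented sample: (estimated chance of returning the ball, whether it came back).
sample_shots = [
    (0.02, False), (0.10, True), (0.25, False), (0.40, False), (0.55, True),
    (0.60, True), (0.75, True), (0.85, False), (0.90, True), (0.95, True),
]

for row in quintile_report(sample_shots):
    print(row)
```

Comparing `actual` with `expected` in the first band is the kind of check that could tell us whether one player really does get more of the most unreturnable shots back than another, given far more shots than this toy sample.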

This sort of analysis would be engaging even for those fans who never look at aggregate stats. Imagine if a broadcaster could point to a specific shot and say that Murray had only a 2% chance of putting it back in play. In topsy-turvy rallies, this approach could generate a win probability graph for a single point, an image that could encapsulate just how hard a player worked to come back from the brink.

Fortunately, the technology to accomplish this is already here. Researchers with access to subsets of Hawkeye data have begun drilling down to the factors that influence things like shot selection. Playsight’s “SmartCourts” classify errors into forced and unforced in close to real time, suggesting that there is something much more sophisticated running in the background, even if its AI occasionally makes clunky mistakes. Another possible route is applying existing machine learning algorithms to large quantities of match video, letting the algorithms work out for themselves which factors best predict winners, errors, and other shot outcomes.

Someday, tennis fans will look back on the early 21st century and marvel at just how little we knew about the sport back then.

All the Answers

At the end of Turing’s Cathedral, George Dyson suggests that while computers aren’t always able to usefully respond to our questions, they are able to generate a stunning, unprecedented array of answers–even if the corresponding questions have never been asked.

Think of a search engine: It has indexed every possible word and phrase, in many cases still waiting for the first user to search for it.

Tennis Abstract is no different. Using the menus on the left-hand side of Roger Federer’s page–even ignoring the filters for head-to-heads, tournaments, countries, matchstats, and custom settings like those for date and rank–you can run five trillion different queries. That’s twelve zeroes–and that’s just Federer. Judging by my traffic numbers, it will be a bit longer before all of those have been tried.

Every filter is there for a reason–an attempt to answer some meaningful question about a player. But the vast majority of those five trillion queries settle debates that no one in their right mind would ever have, like Roger’s 2010 hard-court Masters record when winning a set 6-1 against a player outside the top 10. (He was 2-0.)

The danger in having all these answers is that it can be tempting to pretend we were asking the questions–or worse, that we were asking the questions and suspected all along that the answers would turn out this way.

The Hawkeye data on tennis broadcasts is a great example. When a graphic shows us the trajectory of several serves, or the path of the ball over every shot of a rally, we’re looking at an enormous amount of raw data, more than most of us could comprehend if it weren’t presented against the familiar backdrop of a tennis court. Given all those answers, our first instinct is too often to seek evidence for something we were already pretty sure about–that Jack Sock’s topspin is doing the damage, or Rafael Nadal’s second serve is attackable.

It’s tough to argue with those kinds of claims, especially when a high-tech graphic appears to serve as confirmation. But while those graphics (or those results of long-tail Tennis Abstract queries) are “answers,” they address only narrow questions, rarely proving the points we pretend they do.

These narrow answers are merely jumping-off points for meaningful questions. Instead of looking at a breakdown of Novak Djokovic’s backhands over the course of a match and declaring, “I knew it, his down-the-line backhand is the best in the game,” we should realize we’re looking at a small sample, devoid of context, and take the opportunity to ask, “Is his down-the-line backhand always this good?” or “How does his down-the-line backhand compare to others?” Or even, “How much does a down-the-line backhand increase a player’s odds of winning a rally?”

Unfortunately, the discussion usually stops before a meaningful question is ever asked. Even without publicly released Hawkeye data, we’re beginning to have the necessary data to research many of these questions.

As much as we love to complain about the dearth of tennis analytics, too many people draw conclusions from the pseudo-answers of fancy graphics. With more data available to us than ever before, it is a shame to mistake narrow, facile answers for broad, meaningful ones.

The Pervasive Role of Luck in Tennis

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important that we examine the different forms of luck in conjunction with each other to get a better sense of just how much of a factor luck can be.

Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite only winning 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which; we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”
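For anyone who wants to run this kind of count on their own match data, here is a minimal sketch. The `MatchResult` schema and the sample numbers are made up for illustration (they are not real ATP results); the only thing taken from the discussion above is the pair of thresholds, 50% and 53% of total points.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """One completed match, seen from the winner's side (hypothetical schema)."""
    winner_points: int  # total points won by the match winner
    loser_points: int   # total points won by the match loser

    @property
    def winner_share(self) -> float:
        """Fraction of all points in the match that the winner won."""
        return self.winner_points / (self.winner_points + self.loser_points)


def share_of_matches(matches, low, high):
    """Fraction of matches whose winner took a share of points in (low, high]."""
    hits = sum(1 for m in matches if low < m.winner_share <= high)
    return hits / len(matches)


# Made-up sample, NOT real ATP results.
sample = [
    MatchResult(70, 72),  # winner took fewer than half the points
    MatchResult(80, 80),  # winner took exactly half
    MatchResult(90, 84),  # winner took a bit under 52%
    MatchResult(65, 50),  # winner took a bit under 57%
]

# Winners who "failed to win more than half of points played".
minority_winners = share_of_matches(sample, 0.0, 0.50)
# Winners above half, but who "failed to win more than 53% of points".
narrow_winners = share_of_matches(sample, 0.50, 0.53)

print(f"won with at most half the points: {minority_winners:.0%}")
print(f"won with more than half but at most 53%: {narrow_winners:.0%}")
```

The count says nothing about which of those matches were lucky. It only measures how many results sit inside the range where a handful of points could have flipped the outcome.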

Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which points in the tournament.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–this sort of player would quickly be labeled “streaky”–but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.
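The two 20-20 seasons can be put side by side with a few lines of arithmetic. This is a sketch under a stated assumption: the points awarded for losing in each round are illustrative values I picked for the example, not an official table, and the size of the gap depends entirely on which table you plug in.

```python
# Illustrative points for the round in which a player loses. These values are
# assumptions chosen for the sketch, not an official points table.
POINTS_BY_EXIT_ROUND = {1: 0, 2: 20, 3: 45}


def season_summary(exit_rounds):
    """Return (wins, losses, points) for a list of exit rounds, one per event.

    Losing in round r means r - 1 wins and exactly one loss at that event.
    """
    wins = sum(r - 1 for r in exit_rounds)
    losses = len(exit_rounds)
    points = sum(POINTS_BY_EXIT_ROUND[r] for r in exit_rounds)
    return wins, losses, points


# Wins his first-round match and loses in the second round, 20 times.
steady = season_summary([2] * 20)
# Loses in the first round 10 times, reaches (and loses in) the third round 10 times.
clustered = season_summary([1] * 10 + [3] * 10)

print("steady:   ", steady)
print("clustered:", clustered)
```

Both seasons come out at 20 wins and 20 losses, but whenever the points on offer grow faster than linearly from round to round, as they do in this made-up table, the clustered season collects more of them.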

The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect”–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings–a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open and the other is unseeded.

These two players–who are virtually indistinguishable, remember–face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost definitely face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying the higher ranking–that he didn’t earn in the first place.
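To put rough numbers on the gap, here is a back-of-the-envelope sketch. It assumes a 128-player draw with 32 seeds, each seed placed in its own four-player section so that no two seeds can meet before the third round, and unseeded players dropped into the remaining 96 slots at random. The three win probabilities are invented inputs, and treating every seed as equally strong is a simplification.

```python
SEEDS = 32
DRAW_SIZE = 128
UNSEEDED_SLOTS = DRAW_SIZE - SEEDS  # 96 slots, 32 of them directly opposite a seed


def seeded_reaches_third_round(p_beat_unseeded):
    """A seed is guaranteed two unseeded opponents in the first two rounds."""
    return p_beat_unseeded ** 2


def unseeded_reaches_third_round(p_beat_unseeded, p_beat_seed, p_seed_wins_r1):
    """Same player, same skill, but placed at random among the unseeded slots."""
    p_draw_seed_r1 = SEEDS / UNSEEDED_SLOTS  # one chance in three

    # Case 1: seed in round one, then an unseeded opponent in round two.
    via_seed_first = p_beat_seed * p_beat_unseeded

    # Case 2: unseeded opponent in round one, then the section's seed in
    # round two if the seed survived, otherwise another unseeded player.
    p_win_r2 = p_seed_wins_r1 * p_beat_seed + (1 - p_seed_wins_r1) * p_beat_unseeded
    via_unseeded_first = p_beat_unseeded * p_win_r2

    return p_draw_seed_r1 * via_seed_first + (1 - p_draw_seed_r1) * via_unseeded_first


# Invented inputs for two "virtually indistinguishable" players.
P_BEAT_UNSEEDED = 0.60  # chance of beating a typical unseeded opponent
P_BEAT_SEED = 0.35      # chance of beating a typical seed
P_SEED_WINS_R1 = 0.75   # chance a typical seed survives his own first round

seeded = seeded_reaches_third_round(P_BEAT_UNSEEDED)
unseeded = unseeded_reaches_third_round(P_BEAT_UNSEEDED, P_BEAT_SEED, P_SEED_WINS_R1)
print(f"seeded reaches round three:   {seeded:.1%}")
print(f"unseeded reaches round three: {unseeded:.1%}")
```

With these made-up inputs the seeded player reaches the third round 36% of the time and the unseeded one 23.5% of the time, even though the two have identical skill. Different inputs will move both numbers, but as long as seeds are tougher opponents than unseeded players, the seeded player keeps the edge.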

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: One player gets a huge opportunity (cash and ranking points, even if they lose in the first round!) while the other one gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem; it’s the cascading opportunities it can generate. Sure, sometimes it turns into nothing–Ryan Harrison’s career is starting to look that way–but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to gain from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players face good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how much luck plays a part in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune–especially when bad luck starts to breed more opportunities for bad luck–to tumble down the list.

Even if many of the forms of luck I’ve discussed are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rates are temporary–they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.

Toward Atomic Statistics

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points.  How’s that for cutting edge insight: You get better results when you win more points.  If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat.  That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel.  It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.
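As a sketch of what counting this might look like, the snippet below turns a list of return landing depths into the two rates just mentioned. The input format and the sample depths are hypothetical. The court measurements are the standard ones (78 feet long, service line 21 feet from the net), and I am reading “backmost quarter” as the last quarter, by length, of the server’s half of the court.

```python
NET_TO_SERVICE_LINE_FT = 21.0  # standard court: service line is 21 ft from the net
NET_TO_BASELINE_FT = 39.0      # half of the 78 ft court length
# Assumption: "backmost quarter" means the last quarter of the half-court by length.
BACK_QUARTER_START_FT = NET_TO_BASELINE_FT * 0.75  # 29.25 ft from the net


def return_depth_profile(landing_depths_ft):
    """Summarize in-play service returns by how deep they land.

    landing_depths_ft: distance from the net, in feet, at which each return
    bounced on the server's side (a hypothetical charting or tracking input).
    Depths beyond the baseline are treated as out and ignored.
    """
    in_play = [d for d in landing_depths_ft if 0 < d <= NET_TO_BASELINE_FT]
    if not in_play:
        raise ValueError("no in-play returns to summarize")
    total = len(in_play)
    past_service_line = sum(d > NET_TO_SERVICE_LINE_FT for d in in_play) / total
    back_quarter = sum(d >= BACK_QUARTER_START_FT for d in in_play) / total
    return {"past_service_line": past_service_line, "back_quarter": back_quarter}


# Made-up depths for eight returns, in feet from the net.
sample_returns = [12.0, 18.5, 22.0, 25.0, 30.0, 33.5, 36.0, 38.0]
profile = return_depth_profile(sample_returns)
print(profile)
```

A fuller version would split the numbers by first and second serve, since return depth against a big first serve and against a weak second serve are very different things.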

Here are more atomic statistics with the same type of potential:

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Serves (and other errors) into the net, as opposed to other types of errors.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches.
  • Drop shot success rate (off of each wing).

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific.  Sure, knowing a player’s ratio of winners to unforced errors for a match gives some indication of how well he or she played, but what’s the takeaway?  Federer needs to be less sloppy?  He needs to hit more winners?  Once again, it’s easy to see why players aren’t clamoring to hear these numbers.  No baseball pitcher benefits from learning he should give up fewer runs, and no hockey goaltender benefits from hearing that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach.  Even if Hawkeye material remains mostly impenetrable, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.