The Pervasive Role of Luck in Tennis

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important that we examine these different forms of luck in conjunction with each other to get a better sense of just how much of a factor luck can be.

Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite only winning 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which; we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”

Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which points in the tournament.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–this sort of player would quickly be labeled “streaky,” but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.
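
To put rough numbers on the two 20-20 seasons described above, here is a minimal sketch. The points schedule (0 for a first-round loss, 20 for a second-round loss, 45 for a third-round loss) is an assumption chosen for illustration, not an official table, and the names are hypothetical.

```python
# Assumed ranking points by number of wins at one event (illustrative only).
POINTS_BY_WINS = {0: 0, 1: 20, 2: 45}

def season_points(wins_per_event):
    """Total points for a season, given match wins at each tournament."""
    return sum(POINTS_BY_WINS[w] for w in wins_per_event)

# 20 events each: both seasons add up to 20 wins and 20 losses.
steady = [1] * 20               # win round one, lose round two, every week
streaky = [2] * 10 + [0] * 10   # two wins in ten events, none in the other ten

for name, season in (("steady", steady), ("streaky", streaky)):
    wins = sum(season)
    losses = len(season)        # one loss per event entered
    print(name, f"{wins}-{losses}", season_points(season))
```

Under this assumed schedule the streaky season comes out ahead, 450 points to 400. The size of the gap depends entirely on how steeply the points (or prize money) rise from round to round, so a steeper schedule would widen it.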

The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect”–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings–a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open and the other is unseeded.

These two players–who are virtually indistinguishable, remember–face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost definitely face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying the higher ranking–that he didn’t earn in the first place.

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: One player gets a huge opportunity (cash and ranking points, even if they lose in the first round!) while the other one gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem, it’s the cascading opportunities it can generate. Sure, sometimes it turns into nothing–Ryan Harrison’s career is starting to look that way–but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to gain from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players face good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how much luck plays a part in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune–especially when bad luck starts to breed more opportunities for bad luck–to tumble down the list.

Even if many of the forms of luck I’ve discussed are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rates are temporary–they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.

Toward Atomic Statistics

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points.  How’s that for cutting edge insight: You get better results when you win more points.  If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat.  That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel.  It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.
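
As a rough sketch of how such a stat could be counted, the snippet below takes the landing depth of each in-play return and reports the two shares mentioned above. The input format and the sample numbers are made up for illustration. I am also assuming that "backmost quarter" means the last quarter of the server's half of the court: the baseline is 39 feet from the net and the service line 21 feet, so that quarter starts at 29.25 feet.

```python
# Court geometry in feet, measured from the net toward the server's baseline.
SERVICE_LINE_FT = 21.0
BASELINE_FT = 39.0
BACK_QUARTER_FT = BASELINE_FT * 0.75   # 29.25 ft: where the back quarter starts

def return_depth_summary(depths_ft):
    """Share of in-play returns past the service line and in the back quarter.

    depths_ft: landing depth of each in-play return, in feet from the net
    (a hypothetical input format; no particular data feed is assumed).
    """
    total = len(depths_ft)
    if total == 0:
        return {"past_service_line": 0.0, "back_quarter": 0.0}
    past_line = sum(1 for d in depths_ft if d > SERVICE_LINE_FT)
    back_quarter = sum(1 for d in depths_ft if d >= BACK_QUARTER_FT)
    return {
        "past_service_line": past_line / total,
        "back_quarter": back_quarter / total,
    }

# Made-up landing depths for ten returns, for illustration only.
sample = [15.0, 18.5, 22.0, 24.0, 27.5, 30.0, 31.0, 33.5, 35.0, 38.0]
print(return_depth_summary(sample))
```

For this sample, 80% of returns land past the service line and 50% land in the back quarter. Returns that miss the court are left out here; counting them would be a separate, equally atomic number.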

Here are more atomic statistics with the same type of potential:

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Serves (and other errors) into the net, as opposed to other types of errors.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches
  • Drop shot success rate (off of each wing).

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific.  Sure, knowing a player’s winner-to-unforced-error ratio for a match is some indication of how well he or she played, but what’s the takeaway? Federer needs to be less sloppy? He needs to hit more winners?  Once again, it’s easy to see why players aren’t clamoring to hear these numbers.  No baseball pitcher benefits from learning he should give up fewer runs, or a hockey goaltender that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach.  Even if Hawkeye material remains mostly impenetrable, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.

Why the ATP is More Popular Than the WTA

Last night, Fernando Gonzalez played the last match of his career.  Gonzo is a fan favorite, with a historically great forehand that propelled him to finals at the 2007 Australian Open and the 2008 Olympics.  He won tour-level titles over a ten-year span.

Next month, the man in the limelight will be Ivan Ljubicic.  He doesn’t exactly qualify as a “fan favorite,” but tennis aficionados have grown to appreciate his deadly service accuracy, beautiful one-handed backhand, and intelligence on and off the court.

Men’s tennis is in the age of the veteran.  Even though we’re talking about 20-somethings and a few 30-year-olds, virtually every player at the top of the game five years ago is still in the mix today.  With the exception of Andre Agassi, every top-ranked player from the last ten years is still active.

And fans love veterans.  The current state of the ATP is tailor-made for fan interest.

There are two things going on here.  One is simply a matter of familiarity.  If you lost interest in tennis for the last five years, you might be surprised to find Mario Ancic out of the game, Arnaud Clement still in it, and Andy Roddick well out of the top ten, but the cast of characters would be immediately recognizable.  It’s like a television soap opera–you only have to watch an episode or two before you’re back in the swing of things.

The other factor is what we might call the “Agassi effect.”  In the late 80’s and early 90’s, Agassi was the stereotypical brash youngster, offending the effete and challenging Wimbledon’s all-white rule.  A decade and a half later, he was perhaps the most popular player in the game, the very picture of sportsmanship and class.  Few players undergo such a radical transformation in the eyes of the public, but the general direction is very common.

Only a few years ago, Rafael Nadal was a divisive figure, mocked by many for his sleeveless tops and bulging biceps.  More recently, Novak Djokovic was widely disliked.  I’m sure detractors are still out there, but they are much quieter.  Think back to the early days of just about any veteran’s career–Andy Roddick was exciting to American fans, objectionable to most everybody else.  Lleyton Hewitt was another Agassi, and he didn’t grow out of it as quickly.

Yet for all that, can you think of a player who has gotten less popular as he ages?  Perhaps this phenomenon is unique to individual sports.  In team sports, some figures seem to attract fans, but others lose them, as they sign mega-contracts with new teams, becoming viewed as sellouts.  (Or worse, if they take the mega-contract, then never perform as well again.)

The phenomenon of gaining fans with age isn’t limited to men–veteran WTA players experience it, as well.  It seems like Kim Clijsters was better loved upon her return to the game than she was the day she retired.  Even the Williams sisters seem to have fewer detractors these days than they did several years ago.  But while the WTA has its share of vets, it has far fewer players who have persisted at the top of the game.

Only two players from the 2007 year-end top ten (Maria Sharapova and Marion Bartoli) are in the top ten of today’s WTA rankings.  Most of the WTA’s vets have hung around on the fringes of the game’s best for years.  Li Na, Sam Stosur, and Vera Zvonareva have all given us their share of highlights, but to extend my soap opera analogy, they are peripheral characters who star in a few episodes, only to disappear into the background again.  Someone who hasn’t watched women’s tennis for a few years would have a hard time catching up.

Of course, none of this is to say that men’s tennis is inherently better.  At various times in the past, the WTA has had a stronger stable of perennial stars, and when that is the case, it rakes in the ratings.  Victoria Azarenka may not be as obviously bankable as a charmer like Caroline Wozniacki or a cover girl like Maria Sharapova, but by winning consistently, she gives the women’s game a head start toward developing what the ATP possesses right now.  If a few other players rise to the challenge for more than a couple months at a time, we might do more than just talk about Djokovic, Federer, and Nadal all the time.

What Does the “Hot Hand” Mean in Tennis?

In sports analytics, the topic of streakiness–the “hot hand”–is a popular one. Nearly everyone believes it exists, that players (or even teams) can go on a hot or cold streak, during which they temporarily play above or below their true level.

To a certain extent, streakiness is inevitable–if you flip a coin 100 times, you’ll see segments of 5 or 10 flips in which most of the flips are heads. That’s not because the coin suddenly got “better”; it’s a natural occurrence over a long enough time span. So if you watch an entire tennis match, there are bound to be games where one player seems to be performing better than usual, perhaps stringing together several aces or exceptional winners.

The question, then, is whether a player is more streaky than would occur purely at random. To take just one example, let’s say a player hits aces on 10% of service points. If he did occasionally serve better than usual, we would observe that after he hits one ace, he is more likely (say, 15% or 20%) to hit another ace. A missed first or second serve might make it more likely that he misses his next try.
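
One way to frame that test is to compare a server's overall ace rate with his ace rate on the point right after an ace. The sketch below is only an illustration under simplifying assumptions: it treats a player's service points as one long sequence (ignoring game and set boundaries), the function names are hypothetical, and the data is simulated with no hot hand built in, so any streaks it produces are pure chance.

```python
import random

def ace_rates(points):
    """Return (overall ace rate, ace rate on the serve right after an ace).

    points: sequence of 0/1 flags in serving order, 1 meaning an ace.
    """
    overall = sum(points) / len(points)
    followers = [cur for prev, cur in zip(points, points[1:]) if prev]
    after_ace = sum(followers) / len(followers) if followers else float("nan")
    return overall, after_ace

def longest_ace_run(points):
    """Length of the longest run of consecutive aces."""
    best = current = 0
    for p in points:
        current = current + 1 if p else 0
        best = max(best, current)
    return best

# Simulate a server with no hot hand: every point is an independent 10% chance.
rng = random.Random(0)
simulated = [1 if rng.random() < 0.10 else 0 for _ in range(100_000)]
overall, after_ace = ace_rates(simulated)
print(f"overall: {overall:.3f}  after an ace: {after_ace:.3f}  "
      f"longest run: {longest_ace_run(simulated)}")
```

With independent serves, the two rates should land close to each other even though runs of consecutive aces still appear. A real hot hand would show up as an after-ace rate clearly above the overall rate, beyond what sampling noise allows.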

My last couple of topics–differences in the deuce/ad court, and the “reverse hot hand” at 30-40–have hinted that tennis may be structured in a way that prevents players from getting hot.

One of the most popular subjects for hot hand research is basketball free throw shooting. Researchers like it because it’s as close as basketball players get to a laboratory: every shot is from the same distance, there’s no defensive quality to consider, and even better, players usually get two tries, one right after the other. There’s nothing like it in tennis.

The one thing that seems a bit akin to free throw shooting is serving, especially for more dominant players. John Isner, Roger Federer, and Milos Raonic seem to go on serving streaks; certainly they can play game after game and control play with unreturnable serving. But when we look closer, their experience is much more nuanced. As we’ve seen, players are generally better in one court, deuce or ad, than in the other. It would be as if a basketball player shot one free throw, then took two steps to the left and one step forward before attempting his next shot.

And, of course, there’s another player on the court. If Federer uses a relatively slow serve out wide in the deuce court for a service winner at 15-15, he is much less likely to use the same tactic at 30-30 or 40-15. Even if he was capable of hitting 50 perfect serves of that nature, he would never do so in a match. If it has any relevance for professional tennis, the hot hand must refer to something broader than a single skill.

On a more general level, the rules of tennis involve alternation more than most sports. Sure, most sports give the ball to the other team after a goal, but the length of possession–or in baseball, the length of an inning–can vary widely. In tennis, you can only add one game to your tally before handing the ball to your opponent. And even within that game, you are constantly moving from your stronger court to your weaker court; your opponent might be doing the same.

My question to you is this: If there is a hot hand in tennis, where would you expect to find it? Consecutive aces? Aces specifically in the deuce court? Service winners? Short service points? Points won? Return points won? Games won? First serves in? Point-ending winners? Avoidance of unforced errors? It’s possible that any or all of these things could occur in bunches, but which of them would indicate what we think of as a tennis player on a hot streak?

The Problem With “Unforced Errors”

In any sport, there are a handful of stats that are frequently cited, but are ultimately of limited use.  Often, these statistics tell you something, but are misunderstood to imply something more.  Simple examples are many “counting” stats — points scored in basketball, touchdowns thrown in football, RBI in baseball.  In all of those cases, they indicate something good, but don’t give you context — lots of field goal attempts, a great offensive line, or good hitters on base in front of you, to take those three cases.

The stat in tennis that aggravates me most is the unforced error.  Not only does it ignore some important context (as in the other-sport stats I just mentioned), but it relies on the judgment of a scorer.


The second problem is the more serious one.  How much does a number mean if two people watching the same match wouldn’t come up with the same result?  This was a hot-button issue during Wimbledon, when the scorers were assigning an unusually small number of UEs, especially on serve returns.

If you’re watching the match, you might not notice.  If the end-of-set stats show that Nadal had 8 UEs and Federer had 17, that does tell you something … Federer was making more obvious mistakes.  But if you want to compare that to a Nadal/Federer match three weeks ago, or last year, those numbers are all but useless.

I suspect that, at events like Wimbledon, someone from the ITF, or maybe IBM, is giving standardized instructions to scorers with general rules for categorizing errors.  That would be a good start, especially if it were implemented across all tournaments at all professional levels.

…but it doesn’t matter

I suspect that no matter how consistent scorers are, the distinction between “unforced” and “forced” errors will always be arbitrary.  Consider the case of service returns.  There are occasional points, especially on second serve returns, where the returning player misses an easy shot.  But more frequently, the returning player is immediately on defense.  When is an error “unforced” on the return of a 130 mile-per-hour shot?

Ultimately, we will probably have computerized systems that classify errors for us.  If you have all the necessary data and crunch the numbers, a 125-mph serve down the T in the ad court might be returned 60% of the time, meaning there is a 40% chance of an error or non-return.  With those numbers on every serve (and every other shot, eventually), we could set the line for an “unforced” error on a shot that the average top-100 player would make, say, 75% of the time.  Or we could have different classifications: “unforced errors,” “disastrous errors,” “mildly forced errors,” and so on, indicating different percentage ranges.
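
Here is a minimal sketch of what that kind of classification might look like. The 75% line for "unforced" comes from the paragraph above; the 90% and 50% cutoffs, the extra "forced error" label, and the function name are my own assumptions for illustration. The make-probability itself would have to come from shot-level data that isn't publicly available.

```python
def classify_error(make_probability):
    """Label a missed shot by how often an average top-100 player makes it.

    make_probability: estimated chance (0 to 1) that the shot is made.
    The 0.75 cutoff follows the text; 0.90 and 0.50 are assumed values.
    """
    if not 0.0 <= make_probability <= 1.0:
        raise ValueError("make_probability must be between 0 and 1")
    if make_probability >= 0.90:
        return "disastrous error"
    if make_probability >= 0.75:
        return "unforced error"
    if make_probability >= 0.50:
        return "mildly forced error"
    return "forced error"

# The 125-mph serve example: returned 60% of the time.
print(classify_error(0.60))
```

Under these assumed cutoffs, a missed return of that 125-mph serve counts as a mildly forced error. Moving the thresholds changes the labels, but at least the same shot would get the same label every time.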

The problem we have now is that professionals are so good (and their equipment is so advanced) that almost every shot can be offensive, meaning that players are almost always–to some extent–on defense.  If you’re rallying with Nadal, you might hit some winners, but you’re always fighting the spin.  If you’re rallying with Federer, the spin isn’t so bad, but you’re always trying to keep the ball away from his forehand.  (If you’re rallying with Djokovic, you’re wishing you had hit a better serve.)  That perpetual semi-defensive posture means that nearly every error is, to some extent, forced.  And because players are so good, we expect them to return every reachable ball, suggesting that nearly every error is, to some extent, unforced.


The wisdom of baseball analysts

A very similar problem arises in baseball.  If a fielder makes a misplay (according to the official scorer), he is charged with an “error.”  Paradoxically, some of the best fielders end up with the highest error totals.  If, say, a shortstop has great range, he’ll reach a lot of groundballs, and have more chances to make bad throws, thus racking up the errors.

For decades, fans considered errors to be the standard measure of defensive prowess–a stat called “fielding percentage” measures the ratio of plays-successfully-made to chances.   (In other words, 1 minus “error rate.”)  But because of the paradox mentioned above, the highest fielding percentages do not necessarily belong to the best fielders.

The solution: Ignore errors, look only at plays made.  (This is an oversimplification, but not by much.)  If Shortstop A makes more plays than Shortstop B, it doesn’t matter whether A makes more errors.  The guy you want on your team is the one who makes more plays.

Essentially, baseball errors correspond to tennis unforced errors, and baseball plays-not-made (shortstop dives for the ball and can’t reach it) correspond to tennis forced errors.  The stat that ends up mattering to baseball analysts–“plays made”–corresponds to “shots successfully returned.”  The analogy is imperfect, but it illustrates the problem with separating one type of non-play from another.


If we don’t distinguish between different types of errors, we’re left with “shots made” and “shots not made,” or–even less satisfactorily–“points won” and “points lost.”  Not exactly a step in the right direction, since we’re already counting points!

Still, I suspect it’s better to have no stat than to have a misleading stat.  Rally counts are a positive step, since we can look at outcomes for different types of points.  If you win a lot of 10-or-more-stroke rallies, that identifies you as a certain type of player (or playing a certain kind of match).  It doesn’t matter whether you lose that sort of point on an unforced error or your opponent’s winner–both outcomes might stem from the same tactical mistake three or four strokes sooner.

Either that, or we can wait until we can calculate real-time win probability and start categorizing errors with extreme precision.  “Unforced errors” aren’t going away any time soon, but as fans, we can be smarter about how much attention we grant to individual numbers.