The Continuum of Errors

When is an error unforced? If you envision designing an algorithm to answer that question, it quickly becomes unmanageable. You’d need to take into account player position; shot velocity, angle, and spin; surface speed; and perhaps more. Many errors are obviously forced or unforced, but plenty fall into an ambiguous middle ground.

Most of the unforced error counts we see these days–via broadcasts or in post-match recaps–are counted by hand. A scorer is given some guidance, and he or she tallies each kind of error. If the human-scoring algorithm is boiled down to a single rule, it’s something like: “Would a typical pro be expected to make that shot?” Some scorers limit the number of unforced errors by always counting serve returns, or net shots, or attempted passing shots, as forced.

Of course, any attempt to sort missed shots into only two buckets is a gross oversimplification. I don’t think this is a radical viewpoint. Many tennis commentators acknowledge this when they explain that a player’s unforced error count “doesn’t tell the whole story,” or something to that effect. In the past, I’ve written about the limitations of the frequently-cited winner-to-unforced error ratio, and the similarity between unforced errors and the rightly-maligned fielding errors stat in baseball.

Imagine for a moment that we have better data to work with–say, Hawkeye data that isn’t locked in silos–and we can sketch out an improved way of looking at errors.

First, instead of classifying only errors, it’s more instructive to sort potential shots into three categories: shots returned in play, errors (which we can further distinguish later on), and opponent winners. In other words: Did you make it, did you miss it, or did you fail to even get a racket on it? One man’s forced error is another man’s ball put back in play*, so we need to consider the full range of possible outcomes from each potential shot.

*especially if the first man is Bernard Tomic and the other man is Andy Murray.

The key to gaining insight from tennis statistics is increasing the amount of context available–for instance, taking a player’s stats from today and comparing them to the typical performance of a tour player, or contrasting them with how he or she played in the last similar matchup. Errors are no different.

Here’s a basic example. In the sixth game of Angelique Kerber‘s match in Sydney this week against Darya Kasatkina, she hit a down-the-line forehand:

Kerber hits a down-the-line forehand

Thanks to the Match Charting Project, we have data for about 350 of Kerber’s down-the-line forehands, so we know that they go for winners 25% of the time, and that her opponent hits a forced error another 9% of the time. Say that a further 11% turn into unforced errors, and we have a profile for what usually happens when Kerber goes down the line: 25% winners, 20% errors, 55% put back in play. We might dig even deeper and establish that the 55% put back in play consists of 30% that ultimately resulted in Kerber winning the point against 25% that she eventually lost.
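Building a profile like this is just counting. Here is a minimal sketch, using made-up outcome labels (the Match Charting Project’s actual shot codes differ) and toy data matching the percentages above:

```python
from collections import Counter

def outcome_profile(outcomes):
    """Summarize what happens when a player hits a given shot type.

    `outcomes` is one label per charted shot: 'winner', 'forced_error',
    'unforced_error', or 'in_play'. Returns each outcome's share.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return {outcome: count / total for outcome, count in counts.items()}

# Toy data matching the Kerber down-the-line forehand profile:
# 25% winners, 9% forced errors, 11% unforced errors, 55% in play.
shots = (["winner"] * 25 + ["forced_error"] * 9 +
         ["unforced_error"] * 11 + ["in_play"] * 55)

profile = outcome_profile(shots)
print(profile["winner"])   # 0.25
print(profile["in_play"])  # 0.55
```

With real charting data, the same function could be run on any subset of shots, such as down-the-line forehands against a particular opponent.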

In this case, Kasatkina was able to get a racket on the ball, but missed the shot, resulting in what most scorers would agree was a forced error:

Kasatkina lunges for the return

This single instance–Kasatkina hitting a forced error against a very effective type of offensive shot–doesn’t tell us anything on its own. Imagine, though, that we tracked several players in 100 attempts each to reply to a Kerber down-the-line forehand. We might discover that Kasatkina lets 35 of 100 go for winners, or that Simona Halep lets only 15 go for winners and gets 70 back in play, or that Anastasia Pavlyuchenkova hits an error on 30 of the 100 attempts.

My point is this: With more granular data, we can put errors in a real-life context. Instead of making a judgment about the difficulty of a certain shot (or relying on a scorer to do so), it’s feasible to let an algorithm do the work on 100 shots, telling us whether a player is getting to more balls than the average player, or making more errors than she usually does.

The continuum, and the future

In the example outlined above, there are a lot of important details that I didn’t mention. In comparing Kasatkina’s error to a few hundred other down-the-line Kerber forehands, we don’t know whether the shot was harder than usual, whether it was placed more accurately in the corner, whether Kasatkina was in better position than Kerber’s typical opponent on that type of shot, or the speed of the surface. Over the course of 100 down-the-line forehands, those factors would probably even out. But in Tuesday’s match, Kerber hit only 18 of them. While a typical best-of-three match will give us a few hundred shots to work with, this level of analysis can only tell us so much about specific shots.

The ideal error-classifying algorithm of the future would do much better. It would take all of the variables I’ve mentioned (and more, undoubtedly) and, for any shot, calculate the likelihood of different outcomes. At the moment of the first image above, when the ball has just come off of Kerber’s racket, with Kasatkina on the wrong half of the baseline, we might estimate that there is a 35% chance of a winner, a 25% chance of an error, and a 40% chance that the ball is returned in play. Depending on the type of analysis we’re doing, we could calculate those numbers for the average WTA player, or for Kasatkina herself.

Those estimates would allow us, in effect, to “rate” errors. In this example, the algorithm gives Kasatkina only a 40% chance of getting the ball back in play. By contrast, an average rallying shot probably has a 90% chance of ending up back in play. Instead of placing errors in buckets of “forced” and “unforced,” we could draw lines wherever we wish, perhaps separating potential shots into quintiles. We would be able to quantify whether, for instance, Andy Murray gets more of the most unreturnable shots back in play than Novak Djokovic does. Even if we have an intuition about that already, we can’t even begin to prove it until we’ve established precisely what that “unreturnable” quintile (or quartile, or whatever) consists of.
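Once a model produces a make-probability for each potential shot, drawing those quintile lines is straightforward. A sketch, with invented probabilities standing in for real model output:

```python
def assign_quintiles(make_probs):
    """Bucket potential shots into five difficulty tiers by their
    predicted chance of being put back in play: quintile 0 is the
    most unreturnable fifth, quintile 4 the most routine."""
    order = sorted(range(len(make_probs)), key=lambda i: make_probs[i])
    quintiles = [0] * len(make_probs)
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // len(make_probs)
    return quintiles

# Invented model estimates for eight incoming shots:
probs = [0.02, 0.15, 0.35, 0.40, 0.60, 0.75, 0.90, 0.95]
print(assign_quintiles(probs))  # [0, 0, 1, 1, 2, 3, 3, 4]
```

With real data, the Murray-versus-Djokovic question reduces to computing each player’s in-play rate on quintile-0 balls only.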

This sort of analysis would be engaging even for those fans who never look at aggregate stats. Imagine if a broadcaster could point to a specific shot and say that Murray had only a 2% chance of putting it back in play. In topsy-turvy rallies, this approach could generate a win probability graph for a single point, an image that could encapsulate just how hard a player worked to come back from the brink.

Fortunately, the technology to accomplish this is already here. Researchers with access to subsets of Hawkeye data have begun drilling down to the factors that influence things like shot selection. Playsight’s “SmartCourts” classify errors into forced and unforced in close to real time, suggesting that there is something much more sophisticated running in the background, even if its AI occasionally makes clunky mistakes. Another possible route is applying existing machine learning algorithms to large quantities of match video, letting the algorithms work out for themselves which factors best predict winners, errors, and other shot outcomes.

Someday, tennis fans will look back on the early 21st century and marvel at just how little we knew about the sport back then.

All the Answers

At the end of Turing’s Cathedral, George Dyson suggests that while computers aren’t always able to usefully respond to our questions, they are able to generate a stunning, unprecedented array of answers–even if the corresponding questions have never been asked.

Think of a search engine: It has indexed every possible word and phrase, in many cases still waiting for the first user to search for it.

Tennis Abstract is no different. Using the menus on the left-hand side of Roger Federer’s page–even ignoring the filters for head-to-heads, tournaments, countries, matchstats, and custom settings like those for date and rank–you can run five trillion different queries. That’s twelve zeroes–and that’s just Federer. Judging by my traffic numbers, it will be a bit longer before all of those have been tried.

Every filter is there for a reason–an attempt to answer some meaningful question about a player. But the vast majority of those five trillion queries settle debates that no one in their right mind would ever have, like Roger’s 2010 hard-court Masters record when winning a set 6-1 against a player outside the top 10. (He was 2-0.)

The danger in having all these answers is that it can be tempting to pretend we were asking the questions–or worse, that we were asking the questions and suspected all along that the answers would turn out this way.

The Hawkeye data on tennis broadcasts is a great example. When a graphic shows us the trajectory of several serves, or the path of the ball over every shot of a rally, we’re looking at an enormous amount of raw data, more than most of us could comprehend if it weren’t presented against the familiar backdrop of a tennis court. Given all those answers, our first instinct is too often to seek evidence for something we were already pretty sure about–that Jack Sock’s topspin is doing the damage, or Rafael Nadal’s second serve is attackable.

It’s tough to argue with those kinds of claims, especially when a high-tech graphic appears to serve as confirmation. But while those graphics (or those results of long-tail Tennis Abstract queries) are “answers,” they address only narrow questions, rarely proving the points we pretend they do.

These narrow answers are merely jumping-off points for meaningful questions. Instead of looking at a breakdown of Novak Djokovic’s backhands over the course of a match and declaring, “I knew it, his down-the-line backhand is the best in the game,” we should realize we’re looking at a small sample, devoid of context, and take the opportunity to ask, “Is his down-the-line backhand always this good?” or “How does his down-the-line backhand compare to others?” Or even, “How much does a down-the-line backhand increase a player’s odds of winning a rally?”

Unfortunately, the discussion usually stops before a meaningful question is ever asked. Even without publicly released Hawkeye data, we’re beginning to have the necessary data to research many of these questions.

As much as we love to complain about the dearth of tennis analytics, too many people draw conclusions from the pseudo-answers of fancy graphics. With more data available to us than ever before, it is a shame to mistake narrow, facile answers for broad, meaningful ones.

The Pervasive Role of Luck in Tennis

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important to examine these forms of luck in conjunction with one another to get a better sense of just how much of a factor luck can be.

Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite only winning 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which; we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”
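A quick simulation makes the point. The sketch below plays out simplified best-of-three matches between two exactly equal players–every point a 50/50 coin flip, advantage sets, no tiebreaks or serve alternation, all simplifying assumptions–and counts how often the match winner actually won fewer points than the loser:

```python
import random

def play_game(p):
    """One game: first to 4 points, win by two. Returns whether
    player A won, plus each player's point count."""
    a = b = 0
    while max(a, b) < 4 or abs(a - b) < 2:
        if random.random() < p:
            a += 1
        else:
            b += 1
    return a > b, a, b

def play_match(p):
    """Simplified best-of-three: advantage sets (first to 6 games,
    win by two), no tiebreaks, no serve alternation."""
    a_sets = b_sets = a_pts = b_pts = 0
    while max(a_sets, b_sets) < 2:
        a_games = b_games = 0
        while max(a_games, b_games) < 6 or abs(a_games - b_games) < 2:
            a_won, ga, gb = play_game(p)
            a_pts, b_pts = a_pts + ga, b_pts + gb
            if a_won:
                a_games += 1
            else:
                b_games += 1
        if a_games > b_games:
            a_sets += 1
        else:
            b_sets += 1
    return a_sets > b_sets, a_pts, b_pts

random.seed(7)
trials = 2000
flukes = 0
for _ in range(trials):
    a_won, a_pts, b_pts = play_match(0.5)
    winner_pts, loser_pts = (a_pts, b_pts) if a_won else (b_pts, a_pts)
    if winner_pts < loser_pts:
        flukes += 1
frac = flukes / trials
print(frac)  # a nonzero share of winners won fewer points than the losers
```

Even with no clutch play and no difference in skill, the scoring structure alone hands a meaningful share of matches to the player who won fewer points.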

Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which stages of the draw.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–this sort of player would quickly be labeled “streaky”–but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.
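The arithmetic behind the clustering comparison is simple once you attach point values to rounds. The values below are invented (real ATP points depend on tournament level), but they share the real system’s key property: rewards grow faster than linearly with each round, so clustered wins pay more:

```python
# Hypothetical ranking points by the round in which a player lost
# (invented values; real ATP points vary by event, but they share
# this faster-than-linear shape).
POINTS = {"R1 loss": 0, "R2 loss": 20, "R3 loss": 50}

def season_points(results):
    """Total ranking points for a season, given one exit round per event."""
    return sum(POINTS[r] for r in results)

# Both players go 20-20 over 20 events.
steady = ["R2 loss"] * 20                      # wins every first round
streaky = ["R1 loss"] * 10 + ["R3 loss"] * 10  # 10 early exits, 10 longer runs

print(season_points(steady))   # 400
print(season_points(streaky))  # 500
```

Identical win-loss records, but the player whose wins cluster ends the season with more points.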

The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect”–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings–a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open and the other is unseeded.

These two players–who are virtually indistinguishable, remember–face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost definitely face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying a higher ranking that he didn’t fully earn in the first place.

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: One player gets a huge opportunity (cash and ranking points, even if they lose in the first round!) while the other one gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem, it’s the cascading opportunities it can generate. Sure, sometimes it turns into nothing–Ryan Harrison’s career is starting to look that way–but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to gain from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players face good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how much luck plays a part in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune–especially when bad luck starts to breed more opportunities for bad luck–to tumble down the list.

Even if many of the forms of luck I’ve discussed are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rates are temporary–they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.

Toward Atomic Statistics

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points.  How’s that for cutting edge insight: You get better results when you win more points.  If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat.  That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel.  It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.

Here are more atomic statistics with the same type of potential:

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Serves (and other errors) into the net, as opposed to other types of errors.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches.
  • Drop shot success rate (off of each wing).

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific.  Sure, knowing a player’s winner/UFE rate for a match is some indication of how well he or she played, but what’s the takeaway? Federer needs to be less sloppy? He needs to hit more winners?  Once again, it’s easy to see why players aren’t clamoring to hear these numbers.  No baseball pitcher benefits from learning he should give up fewer runs, or a hockey goaltender that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach.  Even if Hawkeye material remains mostly impenetrable, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.

Why the ATP is More Popular Than the WTA

Last night, Fernando Gonzalez played the last match of his career.  Gonzo is a fan favorite, with a historically great forehand that propelled him to finals at the 2007 Australian Open and the 2008 Olympics.  He won tour-level titles over a ten-year span.

Next month, the man in the limelight will be Ivan Ljubicic.  He doesn’t exactly qualify as a “fan favorite,” but tennis aficionados have grown to appreciate his deadly service accuracy, beautiful one-handed backhand, and intelligence on and off the court.

Men’s tennis is in the age of the veteran.  Even though we’re talking about 20-somethings and a few 30-year-olds, virtually every player at the top of the game five years ago is still in the mix today.  With the exception of Andre Agassi, every top-ranked player from the last ten years is still active.

And fans love veterans.  The current state of the ATP is tailor-made for fan interest.

There are two things going on here.  One is simply a matter of familiarity.  If you lost interest in tennis for the last five years, you might be surprised to find Mario Ancic out of the game, Arnaud Clement still in it, and Andy Roddick well out of the top ten, but the cast of characters would be immediately recognizable.  It’s like a television soap opera–you only have to watch an episode or two before you’re back in the swing of things.

The other factor is what we might call the “Agassi effect.”  In the late ’80s and early ’90s, Agassi was the stereotypical brash youngster, offending the effete and challenging Wimbledon’s all-white rule.  A decade and a half later, he was perhaps the most popular player in the game, the very picture of sportsmanship and class.  Few players undergo such a radical transformation in the eyes of the public, but the general direction is very common.

Only a few years ago, Rafael Nadal was a divisive figure, mocked by many for his sleeveless tops and bulging biceps.  More recently, Novak Djokovic was widely disliked.  I’m sure detractors are still out there, but they are much quieter.  Think back to the early days of just about any veteran’s career–Andy Roddick was exciting to American fans, objectionable to most everybody else.  Lleyton Hewitt was another Agassi, and he didn’t grow out of it as quickly.

Yet for all that, can you think of a player who has gotten less popular as he ages?  Perhaps this phenomenon is unique to individual sports.  In team sports, some figures seem to attract fans, but others lose them, as they sign mega-contracts with new teams, becoming viewed as sellouts.  (Or worse, if they take the mega-contract, then never perform as well again.)

The phenomenon of gaining fans with age isn’t limited to men–veteran WTA players experience it, as well.  It seems like Kim Clijsters was better loved upon her return to the game than she was the day she retired.  Even the Williams sisters seem to have fewer detractors these days than they did several years ago.  But while the WTA has its share of vets, it has far fewer players who have persisted at the top of the game.

Only two players from the 2007 year-end top ten (Maria Sharapova and Marion Bartoli) are in the top ten of today’s WTA rankings.  Most of the WTA’s vets have hung around on the fringes of the game’s best for years.  Li Na, Sam Stosur, and Vera Zvonareva have all given us their share of highlights, but to extend my soap opera analogy, they are peripheral characters who star in a few episodes, only to disappear into the background again.  Someone who hasn’t watched women’s tennis for a few years would have a hard time catching up.

Of course, none of this is to say that men’s tennis is inherently better.  At various times in the past, the WTA has had a stronger stable of perennial stars, and when that is the case, it rakes in the ratings.  Victoria Azarenka may not be as obviously bankable as a charmer like Caroline Wozniacki or a cover girl like Maria Sharapova, but by winning consistently, she gives the women’s game a head start toward developing what the ATP possesses right now.  If a few other players rise to the challenge for more than a couple months at a time, we might do more than just talk about Djokovic, Federer, and Nadal all the time.

What Does the “Hot Hand” Mean in Tennis?

In sports analytics, the topic of streakiness–the “hot hand”–is a popular one. Nearly everyone believes it exists, that players (or even teams) can go on a hot or cold streak, during which they temporarily play above or below their true level.

To a certain extent, streakiness is inevitable–if you flip a coin 100 times, you’ll see segments of 5 or 10 flips in which most of the flips are heads. That’s not because the coin suddenly got “better,” it’s a natural occurrence over a long enough time span. So if you watch an entire tennis match, there are bound to be games where one player seems to be performing better than usual, perhaps stringing together several aces or exceptional winners.

The question, then, is whether a player is more streaky than would occur purely at random. To take just one example, let’s say a player hits aces on 10% of service points. If he did occasionally serve better than usual, we would observe that after he hits one ace, he is more likely (say, 15% or 20%) to hit another ace. A missed first or second serve might make it more likely that he misses his next try.
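That test is easy to state in code. A minimal sketch, with a toy 0/1 sequence standing in for real point-by-point serve data:

```python
def conditional_ace_rate(points):
    """Compare a server's overall ace rate to his ace rate on points
    immediately following an ace. `points` is a sequence of 0/1 flags,
    one per service point, in order. A genuinely 'hot' server would
    show a higher rate after an ace than overall."""
    overall = sum(points) / len(points)
    after_ace = [curr for prev, curr in zip(points, points[1:]) if prev == 1]
    after = sum(after_ace) / len(after_ace) if after_ace else float("nan")
    return overall, after

# Toy sequence in which aces tend to come in pairs:
seq = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
print(conditional_ace_rate(seq))  # (0.4, 0.5)
```

On real data, the gap between the two rates (tested across many matches, not one toy sequence) is what would distinguish a streaky server from random clustering.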

My last couple of topics–differences in the deuce/ad court, and the “reverse hot hand” at 30-40–have hinted that tennis may be structured in a way that prevents players from getting hot.

One of the most popular subjects for hot hand research is basketball free throw shooting. Researchers like it because it’s as close as basketball players get to a laboratory: every shot is from the same distance, there’s no defensive quality to consider, and even better, players usually get two tries, one right after the other. There’s nothing like it in tennis.

The one thing that seems a bit akin to free throw shooting is serving, especially for more dominant players. John Isner, Roger Federer, and Milos Raonic seem to go on serving streaks; certainly they can play game after game and control play with unreturnable serving. But when we look closer, their experience is much more nuanced. As we’ve seen, players generally are better in the deuce or ad court. It would be as if a basketball player shot one free throw, then took two steps to the left and one step forward before attempting his next shot.

And, of course, there’s another player on the court. If Federer uses a relatively slow serve out wide in the deuce court for a service winner at 15-15, he is much less likely to use the same tactic at 30-30 or 40-15. Even if he was capable of hitting 50 perfect serves of that nature, he would never do so in a match. If it has any relevance for professional tennis, the hot hand must refer to something broader than a single skill.

On a more general level, the rules of tennis involve alternation more than most sports. Sure, most sports give the ball to the other team after a goal, but the length of possession–or in baseball, the length of an inning–can vary widely. In tennis, you can only add one game to your tally before handing the ball to your opponent. And even within that game, you are constantly moving from your stronger court to your weaker court; your opponent might be doing the same.

My question to you is this: If there is a hot hand in tennis, where would you expect to find it? Consecutive aces? Aces specifically in the deuce court? Service winners? Short service points? Points won? Return points won? Games won? First serves in? Point-ending winners? Avoidance of unforced errors? It’s possible that any or all of these things could occur in bunches, but which of them would indicate what we think of as a tennis player on a hot streak?

The Problem With “Unforced Errors”

In any sport, there are a handful of stats that are frequently cited, but are ultimately of limited use.  Often, these statistics tell you something, but are misunderstood to imply something more.  Simple examples are many “counting” stats — points scored in basketball, touchdowns thrown in football, RBI in baseball.  In all of those cases, they indicate something good, but don’t give you context — lots of field goal attempts, a great offensive line, or good hitters on base in front of you, to take those three cases.

The stat in tennis that aggravates me most is the unforced error.  Not only does it ignore some important context (as in the other-sport stats I just mentioned), but it relies on the judgment of a scorer.

The second problem is the more serious one.  How much does a number mean if two people watching the same match wouldn’t come up with the same result?  This was a hot-button issue during Wimbledon, when the scorers were assigning an unusually small number of UEs, especially on serve returns.

If you’re watching the match, you might not notice.  If the end-of-set stats show that Nadal had 8 UEs and Federer had 17, that does tell you something … Federer was making more obvious mistakes.  But if you want to compare that to a Nadal/Federer match three weeks ago, or last year, those numbers are all but useless.

I suspect that, at events like Wimbledon, someone from the ITF, or maybe IBM, is giving standardized instructions to scorers with general rules for categorizing errors.  That would be a good start, especially if it were implemented across all tournaments at all professional levels.

…but it doesn’t matter

I suspect that no matter how consistent scorers are, the distinction between “unforced” and “forced” errors will always be arbitrary.  Consider the case of service returns.  There are occasional points, especially on second serve returns, where the returning player misses an easy shot.  But more frequently, the returning player is immediately on defense.  When is an error “unforced” on the return of a 130-mile-per-hour serve?

Ultimately, we will probably have computerized systems that classify errors for us.  If you have all the necessary data and crunch the numbers, a 125-mph serve down the T in the ad court might be returned 60% of the time, meaning there is a 40% chance of an error or non-return.  With those numbers on every serve (and every other shot, eventually), we could set the line for an “unforced” error on a shot that the average top-100 player would make, say, 75% of the time.  Or we could have different classifications: “unforced errors,” “disastrous errors,” “mildly forced errors,” and so on, indicating different percentage ranges.
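To make that concrete, here is a minimal sketch of such a classifier, assuming we somehow had an expected-make percentage for every missed shot.  The cutoffs and category names are illustrative guesses, not established definitions:

```python
def classify_error(make_prob):
    """Bucket a missed shot by the fraction of top-100 players
    who would be expected to make it (thresholds are hypothetical)."""
    if make_prob >= 0.90:
        return "disastrous error"
    if make_prob >= 0.75:
        return "unforced error"
    if make_prob >= 0.40:
        return "mildly forced error"
    return "forced error"

# The 125-mph serve down the T, returned 60% of the time:
print(classify_error(0.60))  # mildly forced error
```

The exact percentage ranges would be up for debate, but the point is that the line between categories becomes an explicit, adjustable number instead of a scorer’s snap judgment.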

The problem we have now is that professionals are so good (and their equipment is so advanced) that almost every shot can be offensive, meaning that players are almost always–to some extent–on defense.  If you’re rallying with Nadal, you might hit some winners, but you’re always fighting the spin.  If you’re rallying with Federer, the spin isn’t so bad, but you’re always trying to keep the ball away from his forehand.  (If you’re rallying with Djokovic, you’re wishing you had hit a better serve.)  That perpetual semi-defensive posture means that nearly every error is, to some extent, forced.  And because players are so good, we expect them to return every reachable ball, suggesting that nearly every error is, to some extent, unforced.


The wisdom of baseball analysts

A very similar problem arises in baseball.  If a fielder makes a misplay (according to the official scorer), he is charged with an “error.”  Paradoxically, some of the best fielders end up with the highest error totals.  If, say, a shortstop has great range, he’ll reach a lot of groundballs, and have more chances to make bad throws, thus racking up the errors.

For decades, fans considered errors to be the standard measure of defensive prowess–a stat called “fielding percentage” measures the ratio of plays-successfully-made to chances.  (In other words, 1 minus “error rate.”)  But because of the paradox mentioned above, the highest fielding percentages do not necessarily belong to the best fielders.

The solution: Ignore errors, look only at plays made.  (This is an oversimplification, but not by much.)  If Shortstop A makes more plays than Shortstop B, it doesn’t matter whether A makes more errors.  The guy you want on your team is the one who makes more plays.
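A toy example (with made-up numbers) shows how error counts and plays made can point in opposite directions:

```python
def fielding_pct(chances, errors):
    # plays made divided by chances -- i.e. 1 minus error rate
    return (chances - errors) / chances

def plays_made(chances, errors):
    return chances - errors

# Shortstop A has more range: more chances, but also more errors.
a_chances, a_errors = 500, 20
b_chances, b_errors = 440, 10

print(fielding_pct(a_chances, a_errors))  # 0.96
print(fielding_pct(b_chances, b_errors))  # ~0.977
print(plays_made(a_chances, a_errors))    # 480
print(plays_made(b_chances, b_errors))    # 430
```

Shortstop B “wins” on fielding percentage, but Shortstop A turns 50 more balls into outs–he’s the one you want.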

Essentially, baseball errors correspond to tennis unforced errors, and baseball plays-not-made (shortstop dives for the ball and can’t reach it) correspond to tennis forced errors.  The stat that ends up mattering to baseball analysts–“plays made”–corresponds to “shots successfully returned.”  The analogy is imperfect, but it illustrates the problem with separating one type of non-play from another.


If we don’t distinguish between different types of errors, we’re left with “shots made” and “shots not made,” or–even less satisfactorily–“points won” and “points lost.”  Not exactly a step in the right direction, since we’re already counting points!

Still, I suspect it’s better to have no stat than a misleading one.  Rally counts are a positive step, since we can look at outcomes for different types of points.  If you win a lot of 10-or-more-stroke rallies, that identifies you as a certain type of player (or as playing a certain kind of match).  It doesn’t matter whether you lose that sort of point with an unforced error or to your opponent’s winner–both outcomes might stem from the same tactical mistake three or four strokes sooner.
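Rally-count breakdowns of that kind are easy to compute from a simple point log.  A sketch, using an invented (rally length, point winner) format:

```python
from collections import Counter

# Hypothetical point log: (rally length in strokes, who won the point)
points = [(3, "A"), (12, "B"), (15, "B"), (2, "A"),
          (11, "B"), (4, "A"), (10, "A"), (1, "B")]

def bucket(length):
    # split points into short and long rallies at the 10-stroke line
    return "10+" if length >= 10 else "0-9"

tally = Counter((bucket(length), winner) for length, winner in points)
for b in ("0-9", "10+"):
    print(b, "strokes -- A:", tally[(b, "A")], " B:", tally[(b, "B")])
```

However the individual errors were scored, B’s edge in long rallies says something about the shape of the match that a raw unforced error count doesn’t.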

Either that, or we can wait until we can calculate real-time win probability and start categorizing errors with extreme precision.  “Unforced errors” aren’t going away any time soon, but as fans, we can be smarter about how much attention we grant to individual numbers.