The Misleading Stat Sheet

A glance at the stat sheet from Serena Williams’s third-round match against Jie Zheng suggests that Serena dominated: 23 aces to 1, 3 break point conversions to none, 54 winners to 21, 84% of second-serve points won to 50%, and 55% of the total points played.

Of course, by the more important stats, games and sets, Serena didn’t dominate. She barely snuck through, losing a first-set tiebreak and going to 9-7 in the third.

Rick Devereaux, who brought this contrast to my attention, suggests that grass-court tennis, with more clean winners and fewer unforced errors than slower-paced styles, may be responsible. That’s certainly part of the equation.

In fact, the Serena/Zheng match highlights the limits of the traditional stat sheet, especially on a surface that favors the server. Apart from winners and unforced errors, nearly every stat directly captures some aspect of serving prowess, either yours or your opponent’s. And in an era when nearly everyone is an excellent server, it doesn’t matter much whether you’ve put together a great serving performance or merely a good one.

To get to tiebreaks (or 9-7, or 70-68), you don’t have to be as good as your opponent; you just need to be good enough to hold. Even the “winners” stat reflects serving dominance, since so many winners are third shots behind a serve. The vast majority of the stats from Serena’s match tell us that the American was more dominant on her serve than Zheng was. And, of course, while Zheng was good enough to hold to 6-6 and 7-7, she lost the second set fairly badly, so the stats are a weighted average of two almost-even sets and one lopsided one.

When we find a mismatch between stat sheet and scoreline, we’re usually seeing one of two things:

  1. One player was much more dominant on serve (think 4- or 5-point games instead of 6 or more).
  2. One player won a lot of clutch points (such as deuce on serve) while losing unimportant ones (such as 40-0 on serve), padding her opponent’s stat sheet.
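To see how the first pattern pads the point totals, here is a minimal Python sketch using invented scores, not the actual Serena/Zheng point-by-point data. Both players hold every service game, but player A always holds at love while player B always holds from 40-30; B then takes the tiebreak 7-5. The helper names `point_share` and `games_won` are hypothetical, chosen only for illustration.

```python
# A minimal sketch: tally points from a list of games to show how the
# share of points won can diverge from the set score.
# Each entry is (points won by player A, points won by player B) for one
# game or tiebreak. All scores below are hypothetical.

def point_share(games):
    """Return player A's share of all points played."""
    a_total = sum(a for a, _ in games)
    b_total = sum(b for _, b in games)
    return a_total / (a_total + b_total)

def games_won(games):
    """Count games won by each player (the winner took more points)."""
    a_games = sum(1 for a, b in games if a > b)
    return a_games, len(games) - a_games

a_holds = [(4, 0)] * 6   # A holds at love six times
b_holds = [(2, 4)] * 6   # B holds from 40-30 six times
tiebreak = [(5, 7)]      # B wins the tiebreak 7-5
set_games = a_holds + b_holds + tiebreak

print(games_won(set_games))                     # (6, 7)
print(f"A won {point_share(set_games):.1%} of points")
```

Player A wins 41 of 72 points (about 57%) and still loses the set 6-7, because every extra point A piles up comes in games A was going to win anyway.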

Oddly, in the men’s game, the players we think of as most dominant on serve rarely give us mismatched stat sheets like this; quite the opposite. Note the wording: “one player was much more dominant.” There’s no doubt John Isner can dominate on serve, but since almost all his opponents are also good servers, his weak return game means that he is often the less dominant server, winning service games at 40-30 and losing return games at 0-40 or 15-40. Isner has even won more than 20 career matches despite losing more than half of the points played!

The same reasoning doesn’t apply to Serena.  She may be as big a server (relative to her opponents) as Isner, but her return game is also world-class.  And in the WTA, there are far more weak-to-middling servers.  On grass, as Rick points out, those weak-to-middling servers are (usually) still able to hold, making it more likely that a dominant performance on paper ends at 9-7 in a deciding set.

5 thoughts on “The Misleading Stat Sheet”

  1. As you note, “[Zheng] lost the second set fairly badly, so the stats are a weighted average of two almost-even sets and one lopsided one.” It seems to me that’s an even simpler explanation for why total stats can be misleading, regardless of surface. But a very interesting read nonetheless.

    Maybe you’ve already covered this in a previous post, but can you think of kinds of stats that aren’t kept in tennis but should be?

    1. It’s true that one lopsided set has some effect on the misleading stats, but in this case it isn’t the whole story, nor is a lopsided set necessary to generate misleading stats.

      My biggest gripe with traditional stats is that indistinguishable outcomes are treated as distinct. For example, aces are counted but unreturned serves are not; “unforced” errors are counted but forced errors generally are not, and there is little agreement about what separates the two. Winners are counted, but as with aces, a winning shot isn’t a “winner” if the other player gets a racquet on it. That’s part of Rick’s point: a fast court means more ‘clean’ winners and aces, making the stats look more impressive even when the same shots on a different surface would have had the same end result.

      1. I guess, then, I’d ask, how could/might tennis stats be kept in an ideal world where metrics have more meaning?

        I guess things like aces are counted because (1) they’re easy to count, and (2) a “clean” shot like an ace (or a winner) looks impressive, so it seems to the naked eye (and the naked brain) to be important. But given improvements not just in technology but in analysis, what can now be counted or quantified that might be more revealing?

        1. A good answer to that question will probably require another post of its own. But a good start would be (a) unreturnable serves (instead of aces), and (b) some measurement of performance on high-leverage points (including, but not limited to, break points), such as deuce and 40-30 on serve.

          There’s also the meta-question of what the point of tennis stats is. In team sports (where stats have become big business), the point is to tease out individual contributions: we care about Derek Jeter’s stats because we don’t have any other way to quantify how many wins he’s added to the Yankees’ total. But we know how many wins Serena has added to Serena’s total! So there’s some value in ‘component’ stats (like those on current stat sheets, and others that have been dreamed up), but it’s either for entertainment or for predictive value, perhaps telling us that Zheng’s performance, though gritty, doesn’t mean she’s about to jump in the rankings this hard-court season.
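One way to make the “high-leverage points” idea from the last reply concrete is to measure how much a single point changes the server’s chance of holding: the hold probability if the server wins the point, minus the hold probability if she loses it. The Python sketch below assumes every service point is won independently with the same probability `p` (a simplification real matches don’t obey), and the value `p = 0.65` is hypothetical. The names `hold_prob` and `leverage` are illustrative, not an existing library’s API.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def hold_prob(p, s, r):
    """Probability that the server wins the game from s points (server)
    to r points (returner), assuming each service point is won
    independently with the same probability p."""
    q = 1 - p
    if s >= 4 and s - r >= 2:
        return 1.0  # server has held
    if r >= 4 and r - s >= 2:
        return 0.0  # returner has broken
    if s >= 3 and r >= 3 and s == r:
        # Deuce: server must win two points in a row before losing two.
        return p * p / (p * p + q * q)
    return p * hold_prob(p, s + 1, r) + q * hold_prob(p, s, r + 1)

def leverage(p, s, r):
    """Swing in the server's hold probability riding on the next point."""
    return hold_prob(p, s + 1, r) - hold_prob(p, s, r + 1)

LABELS = {(3, 0): "40-0", (3, 2): "40-30", (3, 3): "deuce", (2, 3): "30-40"}

p = 0.65  # hypothetical share of service points won
for (s, r), label in LABELS.items():
    print(f"{label:>6}: leverage {leverage(p, s, r):.3f}")
```

With `p = 0.65`, the point at 40-0 moves the hold probability by under 3 percentage points, while deuce moves it by about 42 and a 30-40 break point by about 78. A clutch stat could then weight each point won or lost by its leverage instead of counting all points equally.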
