# Toward Atomic Statistics

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.” And why would he be? Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and second-serve points won.

In other words, statistics that look better when you’re winning points. How’s that for cutting-edge insight: You get better results when you win more points. If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat. That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel. It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

## Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.
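To make "clearly defined" concrete, here is a minimal sketch of how return depth could be tallied once someone has charted where each return lands. The `Return` record and `return_depth_profile` function are hypothetical names, not part of any existing dataset or tool. The sketch also assumes that "backmost quarter" means the last quarter of the returner's target half of the court, measured from the net. The court dimensions are the standard ones: service line 21 feet from the net, baseline 39 feet.

```python
from dataclasses import dataclass

# Standard court geometry, measured in feet from the net.
SERVICE_LINE_FT = 21.0
BASELINE_FT = 39.0
# Assumption: "backmost quarter" is the last quarter of the half-court depth.
BACK_QUARTER_FT = BASELINE_FT * 0.75  # 29.25 ft from the net


@dataclass
class Return:
    """One charted service return (hypothetical record layout)."""
    depth_ft: float  # where the return landed, in feet from the net
    in_play: bool    # False if the return missed (into the net or out)


def return_depth_profile(returns):
    """Shares of all returns that landed in, past the service line, and deep."""
    total = len(returns)
    if total == 0:
        return {"in_play": 0.0, "past_service_line": 0.0, "back_quarter": 0.0}
    landed = [r for r in returns if r.in_play]
    past_line = sum(r.depth_ft > SERVICE_LINE_FT for r in landed)
    deep = sum(r.depth_ft >= BACK_QUARTER_FT for r in landed)
    return {
        "in_play": len(landed) / total,
        "past_service_line": past_line / total,
        "back_quarter": deep / total,
    }
```

Each share uses all returns attempted as the denominator, so a player who misses half his returns can't post a flattering depth number from the few that land.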

Here are more atomic statistics with the same type of potential:

• Percentage of service returns chipped or sliced.
• Percentage of backhands chipped or sliced.
• Serves (and other errors) into the net, as opposed to other types of errors.
• Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
• Net approaches
• Drop shot success rate (off of each wing).
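Several of the stats in this list reduce to simple counts over a shot-by-shot log. The sketch below assumes a hypothetical log in which each shot is a dict with `type`, `spin`, `direction`, and `error` fields. Those field names and values are invented for illustration and don't correspond to any existing data feed.

```python
from collections import Counter


def slice_rate(shots, shot_type):
    """Fraction of shots of one type (e.g. "return") that were chipped or sliced."""
    relevant = [s for s in shots if s["type"] == shot_type]
    if not relevant:
        return None  # nothing charted for this shot type
    return sum(s["spin"] == "slice" for s in relevant) / len(relevant)


def direction_mix(shots, shot_type):
    """Share of shots of one type hit in each direction."""
    relevant = [s for s in shots if s["type"] == shot_type]
    counts = Counter(s["direction"] for s in relevant)
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


def net_error_share(shots):
    """Fraction of all errors that went into the net rather than long or wide."""
    errors = [s for s in shots if s.get("error")]
    if not errors:
        return None
    return sum(s["error"] == "net" for s in errors) / len(errors)
```

None of these numbers says whether a tactic worked. They only record what the player actually did, which is the starting point for asking that question.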

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific. Sure, knowing a player’s ratio of winners to unforced errors for a match is some indication of how well he or she played, but what’s the takeaway? Federer needs to be less sloppy? He needs to hit more winners? Once again, it’s easy to see why players aren’t clamoring to hear these numbers. No baseball pitcher benefits from learning he should give up fewer runs, and no hockey goaltender from learning he needs to allow fewer goals.

## Glimmers of hope

With full access to Hawk-Eye data, this sort of analysis (and much, much more) is within reach. Even if Hawk-Eye material remains mostly inaccessible, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.

## 2 thoughts on “Toward Atomic Statistics”

1. Jeff – Not sure what the bulleted list of atomic stats has in common with return depth as you explained it:
– % returns chipped: yes, it’s within the control of the player, but does the raw # provide anything without comparing points W/L after a chip return with those W/L after other returns; is that what you mean?
– same with % of BH’s chipped or sliced
– errors into the net: is the assumption that such errors are unforced or avoidable, or what?
– variety of direction: if Rafa pounds Fed’s BH enough to be able to hit a few winners to his FH side and extract errors from the BH, how does that help Rafa know if he has maximized his tactical opportunity v Fed? so many other factors involved (surface, health, other stats in match)
– net approaches: same kind of Qs
– drop shot success: if 5 work, would 10 be better? not necessarily…

Rick

1. You could make the same point about return depth: it’s not fully within the control of the player. Nothing is, except for the serve, and as I noted, even ace count isn’t wholly unaffected by the other player. But by identifying these components, we get closer to identifying what players are doing to help or hurt themselves. To use a baseball analogy, it’s a lot better to know that a pitcher threw 70% strikes than to know that he gave up 5 runs. 70% strikes isn’t a guarantee of success, but it tells us more about his direct contribution to the game than the 5 runs does.

If we have a group of 10 Tsonga matches, and we know his # of net approaches in each match, we can start to draw conclusions from that based on the outcome of each. No, it doesn’t mean that more is necessarily good, and certainly not that infinitely more is infinitely good, but it would tell us something we don’t know, and something that could help Tsonga better plan tactics for his next match or help his opponent do the same.