There’s a lot more that can be done with tennis data. Everyone knows this. Even the ATP and WTA tours–along with their rather prominent partners–know this.
Both tours are sitting on a mountain of information that they’ve barely exploited: umpire scorecard data. It’s not cutting edge–there are no cameras, no courtside loggers counting unforced errors and winners. It’s just a log of every point, along with first or second serve, aces, and double faults. Despite those limits, there are many untapped advantages.
First: There’s an umpire scorecard for every match. Not every match on a TV court, not every match on a Hawkeye court, not every main draw match. Every match. If an ATP, WTA, or ITF umpire is officiating the match, to the best of my knowledge, there is a scorecard–when you see a chair umpire tap on a screen, this is what they’re recording. That means data on thousands of matches and players every year, from Novak Djokovic to Djordje Djokovic.
It’s tough to overstate how valuable that is. The main drawback of most tennis stats is context. For instance, when Hawkeye puts a graphic on your TV screen, it’s often based on data from a single match or the present tournament. IBM’s much-publicized analytics are based on Grand Slam matches only. Umpire scorecards have no such problem.
Second, there’s a ton of information lurking in this low-tech tracking system. The basics of first and second serves, aces, and double faults may not sound like much, but as we’ll see below, they open the door to a huge array of stats. ATP and WTA “Match Stats” are compiled from these scorecards, but they only scratch the surface.
How to do more with scorecards
In a minute, I’ll make specific suggestions for additional totals and rates that the tours could compile from the data they already have. Before that, let me explain why simply expanding the contents of “Match Stats” should be Plan B.
More and more journalism is data-based, and more and more avid fans are, to some extent or other, analyzing tennis for publication. In other words, there is a rapidly growing base of analysts who don’t need data pre-packaged for them. Every match is different, and the numbers needed to illustrate any match report are different as well. For broader analysis, like comparing players over the course of a season, the need for customized data is more important still.
So: Release the point-by-point data from the scorecards.
Another benefit of the simplicity of umpire scorecard data is that more analysts can easily manage it. No organization could foresee everything that might be interesting about a match, so why even try? Not every journalist will want to dig into a point-by-point spreadsheet to see how often Julien Benneteau missed his first serve of a game, or how Rafael Nadal responded every time he fell behind 0-30. But some will do just that. When they do, their work benefits, their readers have more ways to engage with the next match they watch, and the sport ultimately wins.
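To show how little work a question like the 0-30 one takes once the points are available, here is a minimal Python sketch. The `Point` record layout, the `make_game` helper, and the sample games are all hypothetical: the real scorecard format isn't public, so the field names are my own assumption. It also assumes regular (non-tiebreak) games listed in match order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical record layout; the real scorecard format is not public.
    game_id: int       # running game number within the match
    server: str
    server_pts: int    # points the server had won in this game before the point
    returner_pts: int  # points the returner had won in this game before the point
    server_won: bool


def hold_rate_after(points, server, server_pts, returner_pts):
    """Return (games_reaching_score, games_held) for one server and one score."""
    reached = set()
    last_point = {}
    for p in points:
        if p.server != server:
            continue
        last_point[p.game_id] = p  # points are assumed to be in match order
        if (p.server_pts, p.returner_pts) == (server_pts, returner_pts):
            reached.add(p.game_id)
    # In a completed regular game, whoever wins the last point wins the game.
    held = sum(1 for g in reached if last_point[g].server_won)
    return len(reached), held


def make_game(game_id, server, outcomes):
    """Build Point records for one game from a list of server-won booleans."""
    pts, s, r = [], 0, 0
    for won in outcomes:
        pts.append(Point(game_id, server, s, r, won))
        if won:
            s += 1
        else:
            r += 1
    return pts


# Made-up sample: three service games, two of which reach 0-30.
sample = (
    make_game(1, "Nadal", [False, False, True, True, True, True])
    + make_game(2, "Nadal", [False, False, False, False])
    + make_game(3, "Nadal", [True, True, True, True])
)
print(hold_rate_after(sample, "Nadal", 0, 2))  # (2, 1)
```

Here `0, 2` stands for 0-30 counted in points rather than tennis scores. In the made-up sample, the server reaches 0-30 twice and holds once.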
A not-so-brief wish list
I have a sneaking feeling that no one’s going to release point-by-point data for every ATP or WTA match. I hope that’s not the case, but if it is, the tours should still consider vastly expanding the stats they compile for each match–including past matches for as far back as their databases go.
- Deuce/ad comparisons. Some players serve much more effectively in one court than the other. For all deuce-court service points, I would like: (a) total points, (b) aces, (c) double faults, and (d) first serves in. Same for ad-court service points.
- Break point stats. Same as the above, for each server when facing break point: (a) aces, (b) double faults, and (c) first serves in.
- Break point games. In how many games did each player earn a break point?
- Stats for other important point scores. Break points are key, but other scenarios are important as well. If I have to pick only a few, let’s start with 0-30, 15-30, deuce, and ad-in (including 40-30). For all service points at each of those scores, I’d like (a) total points, (b) aces, (c) double faults, and (d) first serves in.
- Set points and match points: Same as above. Fans love match point stats.
- The game sequence–at what points did breaks of serve occur? This would allow us to answer many oft-posed questions: Do players hold serve more early in sets? Do breaks of serve more frequently follow breaks than holds? (And if so, how much more often?) Are players more likely to drop serve immediately after winning a tight set?
- Set-by-set breakdowns of all stats that are currently kept, plus all of the above. The live scoring app separates stats by set, but there is no official archive with set-by-set breakdowns. This is particularly key for journalists attempting to tell the story of a match, when a small change in approach can turn the tide.
- Tiebreak breakdowns. Tiebreaks–especially long ones–have a life of their own, and analysts should be able to see all of the same stats for each tiebreak as for each set as a whole. For example, it would be interesting to see if a player’s ace or double fault rates (or even his or her first-serve percentage) changed between the first twelve games of a set and the breaker.
- A list of the score when each double fault occurred. (Aces would be nice, too.) Especially in men’s tennis, DFs are quite rare, and they often loom large in match narratives.
- Longest streaks for each player: consecutive aces, consecutive double faults, consecutive points won on serve, consecutive points won overall, and the score at the beginning and end of each of those streaks.
- For doubles matches, a separation of all of the above service stats by server. For the Samuel Groth/Leander Paes partnership, aggregate serve stats (as they are presented now) aren’t going to tell you anything useful about either player’s performance at the line.
To reiterate, all of this stuff is in the scorecards. Most of the above are no more difficult to compile than the Match Stats that the tours already publish.
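As a rough illustration of that claim, the sketch below computes the deuce/ad and break point splits for one server from a point log. Everything about the data layout is an assumption on my part: `ServePoint`, its field names, and the sample points are hypothetical, not the tours' actual format. It relies on two facts about scoring: the serve comes from the deuce court whenever an even number of points has been played in the game, and (under advantage scoring) a point is a break point when the returner has at least three points and leads.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ServePoint:
    # Hypothetical record layout; field names are assumptions.
    server: str
    server_pts: int    # server's points in the game before this point
    returner_pts: int  # returner's points in the game before this point
    first_serve_in: bool
    ace: bool
    double_fault: bool


def court(p):
    """Deuce court when an even number of points has been played in the game."""
    return "deuce" if (p.server_pts + p.returner_pts) % 2 == 0 else "ad"


def is_break_point(p):
    """Advantage scoring assumed; no-ad deciding points are not counted."""
    return p.returner_pts >= 3 and p.returner_pts > p.server_pts


def serve_splits(points, server):
    """Tally points, aces, double faults, and first serves in per situation."""
    totals = {}
    for p in points:
        if p.server != server:
            continue
        labels = [court(p)]
        if is_break_point(p):
            labels.append("break_point")
        for label in labels:
            t = totals.setdefault(label, Counter())
            t["points"] += 1
            t["aces"] += p.ace
            t["double_faults"] += p.double_fault
            t["first_serves_in"] += p.first_serve_in
    return totals


# Made-up points, separated by server as a doubles log would need to be.
points = [
    ServePoint("Groth", 0, 0, True, True, False),    # deuce court, ace
    ServePoint("Groth", 1, 0, False, False, True),   # ad court, double fault
    ServePoint("Groth", 1, 1, True, False, False),   # deuce court
    ServePoint("Groth", 1, 2, False, False, False),  # ad court
    ServePoint("Groth", 1, 3, True, True, False),    # deuce court, break point, ace
    ServePoint("Paes", 0, 0, True, False, False),
]
splits = serve_splits(points, "Groth")
for label, tally in splits.items():
    print(label, dict(tally))
```

The same loop extends to the other scores on the list (0-30, 15-30, set point, and so on) by adding more labels. Because the tally is keyed by server, it also covers the doubles case.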
If the tours added everything on my list, that would be one big step out of the dark ages for tennis. Certainly, tennis writers would be able to file more intelligent stories and fans would have a much better way to experience the performances of their favorite players.
If the tours published current and archived raw point-by-point data, tennis would go one better: it would become an example for many other individual sports to follow. We would see a boom in fan engagement as every follower of the sport would have the opportunity to learn much more about tennis and relive matches–whether last week or late last century–in detail.
We’re not talking about a multi-million dollar infrastructure investment. To achieve all this, the tours need only do a little bit more with what they already have.