There Is No Analytics Revolution In Tennis

Italian translation at settesei.it

I’m sure you’ve heard about the trend. First, statistics overhauled baseball, and teams in every major sport now employ quants to search out that extra edge. Tennis has lagged behind the others, but with the help of big data, we’re on the cusp of a whole new era.

That’s the story, anyway. Yesterday brought us another example.

What happened in baseball is, quite simply, never going to happen in tennis.

To oversimplify a bit, the “Moneyball revolution” refers to front offices using analytics to identify underrated and underpriced players. To a lesser extent, it refers to deploying those players in a smarter way–say, rearranging the batting order or attempting fewer stolen bases.

In tennis, there are no front offices. Players aren’t paid salaries by teams. And there are no managers to decide how best to use their players.

In short: There are no organizations with both the incentives and the resources to analyze data.

Of course, when people get breathless about all the raw data floating around in tennis, that isn’t what they’re talking about. (No one really thinks Hawkeye data is going to revolutionize, say, the World Team Tennis draft.) Instead, they are implying that the data can be analyzed in such a way as to be actionable for players.

That’s an admirable objective. In theory, Kevin Anderson’s coach could look at all the data from all the matches between Anderson and Tomas Berdych, identify which tactics worked and which didn’t, and make recommendations accordingly. Of course, Kevin’s coach is already watching all those matches, taking notes, reviewing video, and presumably making recommendations, so if big data is going to change the game, it needs to somehow offer coaches demonstrably better insights.

With all the cameras pointed at tennis’s show courts, that’s certainly possible. The closest analogue in baseball is the pitch f/x system, which tracks the speed, location, and movement of every pitch. Some pitchers have been able to use pitch f/x data to analyze and improve upon their own performance. The same could eventually happen in tennis. But there are systemic reasons why it hasn’t yet, and those root causes are unlikely to disappear anytime soon.

What needs to change

Hawkeye cameras are aimed at a lot of courts and have the capability of collecting an enormous amount of data. That’s how broadcasts are able to bring you stats like average net clearance and meters run. Those cameras also help generate graphics like those showing where all of a player’s serves landed.

After a match is over, with no calls left to be overturned and no broadcast needs likely to arise, what happens to the data? For all practical purposes, it gets stashed in the attic and forgotten. (Here’s a more thorough explanation.) Contrast that with Major League Baseball, which makes all pitch f/x data available immediately–to the public, for free–and has archived it indefinitely.

If tennis is to see any meaningful analytical breakthroughs, Hawkeye data needs to be aggregated in a single database. Results from one match are sometimes interesting (hey look, Andy’s net clearance is 15% greater than Roger’s!), but if we’re always looking at one match, or one tournament, at a time, we’ll never learn which of these Hawkeye-derived statistics matter, or how much.

IBM, the collector of much of this information, may already maintain some version of that database. But the results are jaw-droppingly uninspiring. On broadcasts, we get the same old stats and graphics. When IBM has ventured into predicting match outcomes, their “millions of data points” are outperformed by my much simpler model.

IBM is the one organization in the sport with the resources to do the kind of analysis that will transform tennis. But they have no incentive to do so. To IBM (and now SAP, in the women’s game), tennis is a public relations opportunity, one that allows them to brand tournament websites and on-screen graphics with their logo. (Not to mention those suspiciously pro-IBM trend pieces linked to above.)

Players might eventually benefit from data-based insights, but only a tiny fraction of them could afford to hire even a single analyst. (Hi Simona! Text me anytime!)

Once again, we have to turn to baseball for a precedent. Even in that immense sport, with its billion-dollar franchises, it was amateurs–outsiders–who did the work that brought about the analytics revolution. Even now, with teams aggressively hiring promising talent from outside the game, many of the most profitable insights still come from independent researchers. If MLB made its data as inaccessible as tennis does, that trend would’ve ground to a halt long ago.

Nice as it is to dream about a better world of tennis data, we’re unlikely to see it anytime soon. Tennis doesn’t have a commissioner, so there’s no one to appoint a data czar, let alone anyone who could convince the alphabet soup of the ATP, WTA, ITF, IBM, SAP, and Hawkeye to aggregate their data in any meaningful way.

Until that happens, and until the data is publicly available, there will be no analytics revolution in tennis. We’ll continue to get what we have now: the occasional Hawkeye stat, free of context, illustrating the same sort of analysis we’ve been hearing for decades.

Disorder of Play

Imagine you’re a rabid Chicago Cubs fan (sorry), and you’re looking forward to the season starting in a couple of weeks. You’re thinking of making a road trip to see your favorite team. You go to the Cubs website, and all you can find are some vague references to a big series in St. Louis in May. Nothing more.

You check out MLB.com and find a story about the matchup between the Cubs and White Sox, but it’s mostly about last year. Finally you start checking the websites for other MLB stadiums, and you discover that the Cubs are scheduled to play in Milwaukee for three days in June. You consider checking another couple dozen sites and finally give up.

Baseball fans know just how ridiculous that is–you can find a Cubs schedule in any of hundreds of places, with clickable links to every other MLB team’s slate for the season. You can see a list of every Opening Day matchup or, if you want, every game scheduled for the 5th of September.

Yet this fictional scenario of fruitless schedule-hunting is exactly what tennis fans face every week of the season. It’s easy to find out where tournaments will be held, but often impossible–and always irritating–to establish who will be playing. If you want to know what the next few weeks look like for your favorite player (especially if your favorite player isn’t named Roger, Rafa, Novak, or Andy), good luck. Patience is a virtue, I guess.

Unlisted lists

Players formally commit to tour events several weeks ahead of time. Each tournament has an entry deadline (top-tier events are six weeks in advance, Challengers three weeks), and once entries are in, we have what is called–you guessed it–an entry list. You can see the list for the ATP Houston event here, since the tournament organizers chose to publish it. Not all events do.

And even when they do, they rarely keep them up to date. Throughout the several weeks between the initial list and the beginning of qualifying rounds, players withdraw and alternates enter the mix. Especially at the 250 level, it’s not uncommon for 10 or more alternates to find their way into the main draw. But with an old list (if there is a list at all), how to know whether Tim Smyczek or Dominic Thiem or Dudi Sela or Somdev Devvarman is going to be there?

Making matters worse, Wild Card entries–players who are chosen in part to increase fan interest at an event–are often published elsewhere, for instance in a press release. Eighteen days from the opening of the tournament, Houston hasn’t said anything about who any of those players will be. (Though if I were a betting man, here’s where I’d put my money.)

Usually, if you’re willing to put in some effort and you want to know a specific fact–Is Bernard Tomic going to play Monte Carlo? Is Tommy Robredo going to defend his title in Casablanca?–you can find it. But is that really the best the ATP can do? Again, think of the scenario in which it takes a super sleuth to find out where the Cubs will be playing in a month.

Help us become bigger fans

Sporting organizations thrive on big fans, the ones who travel to events (paying for lots of tickets), pony up for year-long subscriptions to streaming services, and stock up on branded merchandise. These fans want to know what’s going on all the time, and they care about more than just the two players who might appear on the front page of the newspaper.

It would be so simple to make available an actual schedule, like other sports started doing back in the 19th century. In fact, before the ATP password-protected their entry lists, I did just that. Here’s a simple page that shows everyone who was on an entry list in a six-week period, along with links to the lists for each event.
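To give a sense of how little machinery a player-by-player schedule requires, here is a minimal sketch in Python. It is not the code behind the page mentioned above. The event names, player names, and dates are hypothetical placeholders, and `player_schedules` is a name made up for this illustration. It assumes each entry list is already available as a tournament name, a start date, and a list of entered players.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical entry lists: (tournament, start date, entered players).
# These are placeholders for illustration, not real entry data.
ENTRY_LISTS = [
    ("Event A", date(2014, 4, 7), ["Player One", "Player Two"]),
    ("Event B", date(2014, 4, 14), ["Player Two", "Player Three"]),
    ("Event C", date(2014, 6, 9), ["Player One"]),
]


def player_schedules(entry_lists, today, weeks=6):
    """Map each player to the events they have entered in the next `weeks` weeks."""
    cutoff = today + timedelta(weeks=weeks)
    schedules = defaultdict(list)
    for tournament, start, players in entry_lists:
        # Keep only events that start inside the look-ahead window.
        if today <= start <= cutoff:
            for player in players:
                schedules[player].append((start, tournament))
    # Sort players alphabetically and each player's events chronologically.
    return {p: sorted(events) for p, events in sorted(schedules.items())}


if __name__ == "__main__":
    schedules = player_schedules(ENTRY_LISTS, date(2014, 3, 24))
    for player, events in schedules.items():
        print(player, [f"{start:%b %d} {name}" for start, name in events])
```

The whole job is inverting tournament-by-tournament lists into a player-by-player view. The hard part is not the code but getting up-to-date entry lists in the first place.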

That information is out there. It’s an insult to fans to hide it. We want to get excited about our favorite players–both the ones who are guaranteed a seed and the ones who are holding out hope of a spot in the main draw. We deserve better.