At the end of Turing’s Cathedral, George Dyson suggests that while computers aren’t always able to usefully respond to our questions, they are able to generate a stunning, unprecedented array of answers–even if the corresponding questions have never been asked.
Think of a search engine: It has indexed every possible word and phrase, many of them still waiting for the first user to search for them.
Tennis Abstract is no different. Using the menus on the left-hand side of Roger Federer’s page–even ignoring the filters for head-to-heads, tournaments, countries, matchstats, and custom settings like those for date and rank–you can run five trillion different queries. That’s twelve zeroes–and that’s just Federer. Judging by my traffic numbers, it will be a bit longer before all of those have been tried.
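To see how a handful of menus balloons into trillions of queries, here is a minimal sketch in Python. The filter names and option counts are invented for illustration; they are not the actual Tennis Abstract menus. The only idea it relies on is that each filter is set independently, so the number of distinct queries is the product of the option counts.

```python
from math import prod

# Hypothetical menus: names and option counts are made up for illustration
# and do not reflect the real Tennis Abstract filters.
filter_options = {
    "surface": 5,
    "level": 8,
    "round": 9,
    "year": 20,
}


def count_queries(options):
    """Return the number of distinct queries when every filter is set independently."""
    # One choice per filter, so the total is the product of the option counts.
    return prod(options.values())


total = count_queries(filter_options)
print(f"{total:,} possible queries from {len(filter_options)} menus")
```

Four small menus already allow thousands of combinations, and every additional menu multiplies the total again, which is how a single sidebar can reach the trillions.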
Every filter is there for a reason–an attempt to answer some meaningful question about a player. But the vast majority of those five trillion queries settle debates that no one in their right mind would ever have, like Roger’s 2010 hard-court Masters record when winning a set 6-1 against a player outside the top 10. (He was 2-0.)
The danger in having all these answers is that it can be tempting to pretend we were asking the questions–or worse, that we were asking the questions and suspected all along that the answers would turn out this way.
The Hawkeye data on tennis broadcasts is a great example. When a graphic shows us the trajectory of several serves, or the path of the ball over every shot of a rally, we’re looking at an enormous amount of raw data, more than most of us could comprehend if it weren’t presented against the familiar backdrop of a tennis court. Given all those answers, our first instinct is too often to seek evidence for something we were already pretty sure about–that Jack Sock’s topspin is doing the damage, or Rafael Nadal’s second serve is attackable.
It’s tough to argue with those kinds of claims, especially when a high-tech graphic appears to serve as confirmation. But while those graphics (or those results of long-tail Tennis Abstract queries) are “answers,” they address only narrow questions, rarely proving the points we pretend they do.
These narrow answers are merely jumping-off points for meaningful questions. Instead of looking at a breakdown of Novak Djokovic’s backhands over the course of a match and declaring, “I knew it, his down-the-line backhand is the best in the game,” we should realize we’re looking at a small sample, devoid of context, and take the opportunity to ask, “Is his down-the-line backhand always this good?” or “How does his down-the-line backhand compare to others?” Or even, “How much does a down-the-line backhand increase a player’s odds of winning a rally?”
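One way to put a number on the small-sample problem is to attach an interval to a win rate instead of quoting it bare. The sketch below uses the Wilson score interval, a standard choice for proportions from small samples. It is my own illustration rather than anything Tennis Abstract computes, and it treats each match as an independent trial, which is itself a simplifying assumption. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt


def wilson_interval(wins, matches, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    p = wins / matches
    denom = 1 + z**2 / matches
    center = (p + z**2 / (2 * matches)) / denom
    half_width = z * sqrt(p * (1 - p) / matches + z**2 / (4 * matches**2)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# The 2-0 record from the long-tail Federer query mentioned earlier.
low, high = wilson_interval(2, 2)
print(f"2-0 record: plausible win rate between {low:.0%} and {high:.0%}")
```

Under those assumptions, a 2-0 record is consistent with a true win rate anywhere from roughly 34% to 100%, which is another way of saying it settles almost nothing on its own.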
Unfortunately, the discussion usually stops before a meaningful question is ever asked. That is a missed opportunity: even without publicly released Hawkeye data, we’re beginning to have what we need to research many of these questions.
As much as we love to complain about the dearth of tennis analytics, too many people draw conclusions from the pseudo-answers of fancy graphics. With more data available to us than ever before, it is a shame to mistake narrow, facile answers for broad, meaningful ones.