## Novak Djokovic and a First-Serve Key to the Match

Landing lots of first serves is a good thing, right? Actually, how much it matters–even whether it matters–depends on who you’re talking about.

When I criticized IBM’s Keys to the Match after last year’s US Open, I identified first-serve percentage as one of three “generic keys” (along with first-serve points won and second-serve points won) that, when combined, did a better job of predicting the outcome of matches than IBM’s allegedly more sophisticated markers.  First-serve percentage is the weakest of the three generic keys–after all, the other two count points won, which, short of counting sets, is as relevant as you can get.

First-serve percentage is a particularly appealing key because it is entirely dependent on one player. While a server may change his strategy based on the returning skills of his opponent, the returner has nothing to do with whether or not first serves go in the box.  Unlike the other two generic targets and the vast majority of IBM’s keys, a first-serve percentage goal is truly actionable: it is entirely within one player’s control to achieve.

In general, first-serve percentage correlates very strongly with winning percentage.  On the ATP tour from 2010 to 2013, when a player made exactly half of his first serves, he won 42.8% of the time. At 60% first serves in, he won 47.0% of the time. At 70%, he won 57.4% of the time.

This graph shows the rates at which players win matches when their first-serve percentages are between 50% and 72%:

As the first-serve percentage increases on the horizontal axis, winning percentage steadily rises as well.  With real-world tennis data, you’ll rarely see a relationship much clearer than this one.
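For anyone who wants to reproduce this kind of chart, here is a minimal sketch of the bucketing step. The input layout (one tuple per player per match) and the function name are my own assumptions for illustration, and the sample rows are made up to show the shape of the data, not real results.

```python
from collections import defaultdict


def win_rate_by_first_serve_pct(matches):
    """Group matches by rounded first-serve percentage and return win rates.

    `matches` is an iterable of (first_serves_in, serve_points, won) tuples,
    one per player per match. This layout is an assumption for illustration.
    """
    tally = defaultdict(lambda: [0, 0])  # percentage -> [wins, matches]
    for first_in, serve_points, won in matches:
        if serve_points == 0:
            continue  # no service points, so no percentage to compute
        pct = round(100 * first_in / serve_points)
        tally[pct][0] += 1 if won else 0
        tally[pct][1] += 1
    return {pct: wins / played for pct, (wins, played) in sorted(tally.items())}


# Made-up rows, only to show the shape of the input and output.
sample = [(30, 60, True), (30, 60, False), (42, 60, True)]
rates = win_rate_by_first_serve_pct(sample)
```

With real data, each bucket would need enough matches behind it before its win rate means much; thin buckets are where the jagged edges in these graphs come from.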

### Different players, different keys

When we use the same approach to look at specific players, the message starts to get muddled.  Here’s the same data for Novak Djokovic, 2009-13:

While we shouldn’t read too much into any particular jag in this graph, it’s clear that the overall trend is very different from the first graph. Calculate the correlation coefficient, and we find that Djokovic’s winning percentage has a negative relationship with his first-serve percentage. All else equal, he’s slightly more likely to win matches when he makes fewer first serves.
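The exact calculation isn't spelled out here, so treat the following as one plausible reading rather than the method behind the graph: a plain Pearson correlation between first-serve percentage level and winning percentage at that level. As input it uses the three tour-wide figures quoted earlier (50%, 60%, 70%); a player-specific version would feed in that player's own buckets instead. The function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


# Tour-wide figures from the text: first-serve percentage -> winning percentage.
levels = [50, 60, 70]
win_pcts = [42.8, 47.0, 57.4]
r = pearson(levels, win_pcts)  # strongly positive for the tour as a whole
```

A negative value from the same calculation on one player's buckets is what "a negative relationship" means in this context.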

Djokovic isn’t alone in displaying this sort of negative relationship, either. The three tour regulars with even more extreme profiles over the last five years are Marin Cilic, Gilles Simon, and the always-unique John Isner.

Isner regularly posts first-serve percentages well above those of other players, including 39 career matches in which he topped 75%. That sort of number would be a near guarantee of victory for most players–for instance, Andy Murray is 32-3 in matches when he hits at least 70% of first serves in–but Isner has only won 62% of his 75%+ performances.  He is nearly as good (57%) when landing 65% or fewer of his first serves.

Djokovic, Isner, and this handful of others reveal a topic on which the tennis conventional wisdom can tie itself in knots. You need to make your first serve, but your first serve also needs to be a weapon, so you can’t take too much off of it.

The specific implied relationship–that every player has a “sweet spot” between giving up too much power and missing too many first serves–doesn’t show up in the numbers. But it does seem that different players face different risks.  The typical pro could stand to make more first serves. But a few guys find that their results improve when they make fewer–presumably because they’re taking more risks in an attempt to hit better ones.

### Demonstrating the key

Of the players who made the cut for this study–at least 10 matches each at 10 different first-serve-percentage levels in the last five years–9 of 21 display relationships between first-serve percentage and winning percentage at least as positive as Isner’s is negative.  The most traditional player in that regard is Philipp Kohlschreiber. His graph looks a bit like a horse:

More than any other player, Kohli’s results have a fairly clear-cut inflection point. While it’s obscured a bit by the noisy dip at 64%, the German wins far more matches when he reaches 65% than when he doesn’t.

Kohlschreiber is joined by a group almost as motley as the one that sits at the other extreme. The other players with the strongest positive relationships between first serve percentage and winning percentage are Richard Gasquet, Murray, Roger Federer, Jeremy Chardy, and Juan Martin del Potro.

These player-specific findings tell us that in some matchups, we’ll have to be a little more subtle in what we look for from each guy. When Murray plays Djokovic, we should keep an eye on the first-serve percentages of both competitors–the one to see that he’s making enough, and the other to check that he isn’t making too many.

## Analytics That Aren’t: Why I’m Not Excited about SAP in Tennis

It’s not analytics, it’s marketing.

The Grand Slams (with IBM) and now the WTA (with SAP) are claiming to deliver powerful analytics to tennis fans.  And it’s certainly true that IBM and SAP collect way more data than the tours would without them.  But what happens to that data?  What analytics do fans actually get?

Based on our experience after several years of IBM working with the Slams and Hawkeye operating at top tournaments, the answers aren’t very promising.  IBM tracks lots of interesting stats, makes some shiny graphs available during matches, and the end result of all this is … Keys to the Match?

Once matches are over and the performance of the Keys to the Match is (blessedly) forgotten, all that data goes into a black hole.

Here’s the message: IBM collects the data. IBM analyzes the data. IBM owns the data. IBM plasters their logo and their “Big Data” slogans all over anything that contains any part of the data. The tournaments and tours are complicit in this: IBM signs a big contract, makes their analytics part of their marketing, and the tournaments and tours consider it a big step forward for tennis analysis.

Sometimes, marketing-driven analytics can be fun.  It gives some fans what they want–counts of forehand winners, or average first-serve speeds. But let’s not fool ourselves. What IBM offers isn’t advancing our knowledge of tennis. In fact, it may be strengthening the same false beliefs that analytical work should be correcting.

### SAP: Same Story (So Far)

Early evidence suggests that SAP, in its partnership with the WTA, will follow exactly the same model:

> SAP will provide the media with insightful and easily consumable post-match notes which offer point-by-point analysis via a simple point tracker, highlight key events in the match, and compare previous head-to-head and 2013 season performance statistics.

“Easily consumable” is code for “we decide what the narratives are, and we come up with numbers to amplify those narratives.”

Narrative-driven analytics are just as bad as–and perhaps more insidious than–marketing-driven analytics, which are simply useless.  The amount of raw data generated in a tennis match is enormous, which is why TV broadcasts give us the same small tidbits of Hawkeye data: distance run during a point, average rally hit point, and so on.  So, under the weight of all those possibilities, why not just find the numbers that support the prevailing narrative? The media will cite those numbers, the fans will feel edified, and SAP will get its name dropped all over the place.

What we’re missing here is context.  Take this SAP-generated stat from a writeup on the WTA site:

> The first promising sign for Sharapova against Kanepi was her rally hit point. Sharapova made contact with the ball 76% of the time behind the baseline compared to 89% for her opponent. It doesn’t matter so much what the percentage is – only that it is better than the person standing on the other side of the net.

Is that actually true? I don’t think anyone has ever published any research on whether rally hit point correlates with winning, though it seems sensible enough. In any case, these numbers are crying out for more context.  Is 76% good for Maria? How about keeping her opponent behind the baseline 89% of the time? Is the gap between 76% and 89% particularly large on the WTA? Does Maria’s rally hit point in one match tell us anything about her likely rally hit point in her next match?  After all, the article purports to offer “keys to match” for Maria against her next opponent, Serena Williams.

Here’s another one:

> There is a lot to be said for winning the first point of your own service game and that rung true for Sharapova in her quarterfinal. When she won the opening point in 11 of her service games she went on to win nine of those games.

Is there any evidence that winning your first point is more valuable than, say, winning your second point?  Does Sharapova typically have a tough time winning her opening service point?  Is Kanepi a notably difficult returner on the deuce side, or early in games?  “There is a lot to be said” means, roughly, that “we hear this claim a lot, and SAP generated this stat.”

In any type of analytical work, context is everything.  Narrative-driven analytics strip out all context.

### The alternative

IBM, SAP, and Hawkeye are tracking a huge amount of tennis data.  For the most part, the raw data is inaccessible to researchers.  The outsiders who are most likely to provide the context that tennis stats so desperately need just don’t have the tools to evaluate these narrative-driven offerings.

Other sporting organizations–notably Major League Baseball–make huge amounts of raw data available.  All this data makes fans more engaged, not less. It’s simply another way for the tours to get fans excited about the game. Statheads–and the lovely people who read their blogs–buy tickets too.

So, SAP, how about it?  Make your branded graphics for TV broadcasts. Provide your easily consumable stats for the media.  But while you’re at it, make your raw data available for independent researchers. That’s something we should all be able to get excited about.

## Simpler, Better Keys to the Match

Italian translation at settesei.it

If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM.  Logos and banner ads were everywhere, and even usually-reliable news sites made a point of telling us about the company’s cutting-edge analytics.

Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match.  The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.

Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.

It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic-Wawrinka semifinal.  Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.

According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded.  But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter.  In their quest for the unusual, IBM’s team missed out on the predictive.

### IBM vs generic

IBM offered keys for 86 of the 127 men’s matches at the US Open this year.  In 20 of those matches, the loser met as many or more of the keys as the winner did.  On average, the winner of each match met 1.13 more IBM keys than the loser did.

This is IBM’s best performance of the year so far.  At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner.  At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, the numbers were 1.08 and 21.
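The two yardsticks used here (the average key margin, and the number of matches where the loser met as many or more keys) are simple to compute once you have per-match key counts. A rough sketch, with a hypothetical function name and made-up sample counts rather than the actual US Open tallies:

```python
def evaluate_keys(results):
    """Summarize how well a set of keys separates winners from losers.

    `results` is a list of (winner_keys_met, loser_keys_met) pairs, one per
    match. Returns (average margin, matches where loser met >= winner's keys).
    """
    if not results:
        raise ValueError("need at least one match")
    margin = sum(winner - loser for winner, loser in results) / len(results)
    loser_tied_or_ahead = sum(1 for winner, loser in results if loser >= winner)
    return margin, loser_tied_or_ahead


# Made-up key counts for four matches, purely illustrative.
sample = [(3, 1), (2, 2), (1, 2), (3, 0)]
avg_margin, upsets = evaluate_keys(sample)
```

A higher margin and a lower second number both indicate keys that track match outcomes more closely.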

Without some kind of reference point, it’s tough to know how good or bad these numbers are.  As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”

It’s not that difficult.  In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.

I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s. (Carl calls them “Sackmann Keys;” I’m going to call them “generic keys.”)  It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.

Unsurprisingly, two of the most effective stats are winning percentage on first serves, and winning percentage on second serves.  As I’ll discuss in future posts, these stats–and others–show surprising discontinuities.  That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match.  These measurements are tailor-made for keys.

For a third key, I tried first-serve percentage.  It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of no clear correlation with them.  You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa.  And contrary to some received wisdom, there does not seem to be some high level of first-serve percentage where more first serves is a bad thing.  It’s not linear, but the more first serves you put in the box, the better your odds of winning.

Put it all together, and we have three generic keys:

• Winning percentage on first-serve points better than 74%
• Winning percentage on second-serve points better than 52%
• First-serve percentage better than 62%
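As a sketch of how these three thresholds could be applied to a single player's match stats, something like the following would do. The `ServeStats` layout and function name are hypothetical, and I'm assuming that every service point without a first serve in counts as a second-serve point (so double faults are second-serve points lost) and that "better than" means strictly greater.

```python
from dataclasses import dataclass


@dataclass
class ServeStats:
    """One player's serving line for one match (hypothetical layout)."""
    serve_points: int  # total service points played
    first_in: int      # first serves that landed in
    first_won: int     # points won behind a first serve
    second_won: int    # points won behind a second serve


def generic_keys_met(stats, first_won_min=0.74, second_won_min=0.52,
                     first_in_min=0.62):
    """Count how many of the three generic keys the player met (0 to 3)."""
    # Assumption: every service point without a first serve in is a
    # second-serve point, so double faults count as second-serve points lost.
    second_points = stats.serve_points - stats.first_in
    met = 0
    if stats.first_in > 0 and stats.first_won / stats.first_in > first_won_min:
        met += 1
    if second_points > 0 and stats.second_won / second_points > second_won_min:
        met += 1
    if stats.serve_points > 0 and stats.first_in / stats.serve_points > first_in_min:
        met += 1
    return met


# Made-up stat lines, only to show the function in use.
strong = ServeStats(serve_points=60, first_in=40, first_won=32, second_won=12)
weak = ServeStats(serve_points=60, first_in=30, first_won=20, second_won=10)
```

Keeping the thresholds as parameters leaves room for surface-specific or player-specific values without changing the counting logic.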

These numbers are based on the last few years of ATP results on every surface except for clay.  For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.

For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better.  Using my indicators–the same three for every player–the loser met as many or more keys 16 times (compared to IBM’s 20) and the winner averaged 1.15 more keys (compared to IBM’s 1.13) than the loser.  Results for other slams (with slightly different thresholds for the different surface at Roland Garros) netted similar numbers.

### A smarter planet

It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual.  It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.

Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy.  In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site.  “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.

Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right.  And it’s not hard to spell “IBM.”

### Better keys, more insight

Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one.  Commentators often talk about hitting certain targets, like 70% of first serves in.  Yet to my knowledge, no one had done the research.

With my generic keys as a first step, this path could get a lot more interesting.  While these single numbers are good guides to performance on hard courts, several extensions spring to mind.

Mainly, these numbers could be improved by making player-specific adjustments.  74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner?  His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players.  For other players, perhaps a higher rate of first serves in is crucial for victory.  Or their thresholds vary particularly dramatically based on surface.

In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved.  Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.