The Changing Depth of the WTA

During yesterday’s broadcast of the Australian Open match between Alison Riske and Yanina Wickmayer, commentator Elise Burgin discussed whether the depth of the WTA has increased over the years. She felt strongly that it has, and she had a very useful illustration on screen, as 55th-ranked Riske was putting on an impressive display of shotmaking en route to a 6-1 6-1 victory.

From a quantitative perspective, “depth” can be hard to pin down. If lower-ranked players are holding their own against the top five, or ten, or thirty, it could mean that the field is very deep, or it could mean that we’re in an era without all-time greats.  As Burgin pointed out, the WTA might not currently have a top five to match those of some recent eras, but there’s little doubt that today’s top two could line up with just about any of the last few decades.

It would be very difficult to settle whether today’s top ranks are good, bad, or otherwise in historical terms, so for now, let’s assume they are average.  We’ll return to that in a bit.

Let’s start by looking at how the WTA top 32 has fared against everybody else. This encompasses about 900 matches per season. The trend isn’t overwhelming, but it does seem that the top 32 is not quite as dominant as it was in some previous periods:

[Graph: WTA top 32 winning percentage against all other players, by season]

The 2012 and 2013 winning percentages of 73.4% and 74.7% represented the lowest two-year span since 1984 (where my ranking database begins).  Aside from the outlier years of 2004 and 2007, the top 32 has won fewer than 77% of its matches against the pack for more than a decade.  In the 1980s and 1990s, the top 32 was consistently above that number.
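For readers who want to reproduce this kind of number, here is a minimal sketch of how a top-N winning percentage against the pack could be computed. It assumes match results are available as (season, winner rank, loser rank) tuples. The function name, the data layout, and the toy results are illustrative assumptions, not my ranking database or the code behind the graph.

```python
from collections import defaultdict

def top_vs_pack_win_pct(matches, cutoff=32):
    """Return {season: winning percentage of the top `cutoff` against everyone else}.

    `matches` is an iterable of (season, winner_rank, loser_rank) tuples.
    Matches between two top players, or between two pack players, are skipped.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for season, winner_rank, loser_rank in matches:
        winner_top = winner_rank <= cutoff
        loser_top = loser_rank <= cutoff
        if winner_top == loser_top:
            continue  # not a top-vs-pack match
        total[season] += 1
        if winner_top:
            wins[season] += 1
    return {s: 100.0 * wins[s] / total[s] for s in sorted(total)}

# Hypothetical toy data, not real results.
sample = [
    (2013, 5, 80),    # top-32 player beats a pack player
    (2013, 120, 12),  # upset: pack player beats a top-32 player
    (2013, 3, 7),     # both inside the top 32: ignored
    (2013, 20, 45),
    (2013, 30, 33),
]
print(top_vs_pack_win_pct(sample))             # cutoff at the top 32
print(top_vs_pack_win_pct(sample, cutoff=10))  # cutoff at the top 10
```

Moving the `cutoff` argument is all it takes to redraw the line between "the top" and "the pack."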

Of course, drawing the line at the top 32 is arbitrary. Most of us would think of the 19th- or 26th-ranked player as part of the pack, not as a defining player of this generation.  Let’s see how the graph looks if we draw the line at the top 10:

[Graph: WTA top 10 winning percentage against all other players, by season]

Looking at the top 10 against everyone else doesn’t differentiate the current era quite as much as the top 32 does, but it continues to show that the pack is quite competitive in historical terms.

Since 1984, the top 10 has won almost exactly 80% of matches against everyone else, and for the last two years, the WTA has matched that number.  However, in the very recent past, from 2009 to 2011, the pack posted the three best single-season records against the top 10, peaking in 2010, when the top 10 won only 74% of matches against others.

As I noted at the outset, comparing “the top” with “the pack” in a series of years implies that one or the other is a constant. The top–especially a small group such as the top 10–almost certainly isn’t. In 2010, that great season for the pack, Serena Williams played only 29 matches, compared to 62 in 2009 and 82 in 2013. Add another 30 or 50 Serena matches to the sample and maybe the pack wouldn’t have looked so good.

While the pack is less affected by single injuries, it probably isn’t a constant either. After all, the claim that launched this post is that the pack has improved.  Thus, we can’t entirely trust these numbers as a rating of the top based on their record against the pack, or as a rating of the pack based on their record against the top.

However, we can see broad trends and supplement them with some qualitative judgments.  If you believe that today’s top ten is a particularly weak one, the fact that the pack is winning only 20% of their matches against that group isn’t exactly an endorsement. If you think the top of the game is particularly strong, that 20% looks much better, supporting Burgin’s position that the pack is better than ever.

An alternative theory that may explain this intuition about the pack is based on injuries. WTA injury numbers (based on retirements and withdrawals, anyway) are at an all-time high, and advances in sports medicine are getting players back on court quicker than ever. Thus, there is always a pool of players whose talent level is not represented by their ranking, either because they are injury prone and never reach that ranking, or because they’ve recently missed time and seen their ranking fall during that period.

Of course, there have always been players in the field returning from injury, but at any given time, there are probably more today than there were twenty years ago. And that means more unseeded, lower-ranked competitors with the capability of beating a top player. They usually don’t–as in the cases of Andrea Petkovic, Venus Williams, and Vera Zvonareva this week–but if you’re looking at a draw hunting for dark horses and interesting early-round matchups, those are the sorts of names that deliver.

Given all the moving parts in this sort of analysis, it’s tough to draw conclusions. If a couple of players suddenly emerge as dominant players and complement Serena and Vika at the top of the game, we could see these numbers swing in favor of the top. If Serena suddenly retires, they’ll probably swing in favor of the pack. For now, the best I can offer is that the pack–whether defined as those outside the top 10, top 32, or any number in between–is probably a bit better than the WTA’s historical average.

Roger Federer’s Break Point Opportunities

Remember Roger Federer’s dreadful performance on break points against Tommy Robredo at last year’s US Open? Of course you do. He had 16 chances to break, converted only two of them, and lost the match in straight sets. Then we all cried.

Yesterday, Federer won in straight sets against James Duckworth, but his break point performance wasn’t much better.  Four breaks of serve was all he needed to cruise to victory, but the Australian saved 13 other break chances.  In his disappointing loss to Lleyton Hewitt in Brisbane, Fed only converted 1 of 10 break chances.

Is this the end? Is a lack of break point conversions the monster that will finally slay the old man?

Not so fast.

To identify how bad (or, possibly, good) Federer has been on break points, we must compare that performance to his record on other return points. Roger isn’t the same kind of master returner as Novak Djokovic or Rafael Nadal, so it would be unrealistic to expect him to convert as many break points as they do. To control for general returning ability, we must compare break point conversion rate to winning percentage on all other return points.
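As a rough sketch of that comparison, the ratio can be computed from four season totals. The function name and the sample totals below are hypothetical, and the sketch assumes the return-point totals include break points, so they are subtracted out first.

```python
def break_point_ratio(bp_won, bp_total, return_pts_won, return_pts_total):
    """Break point conversion rate divided by win rate on all other return points.

    The return_pts_* totals are assumed to include break points, so the
    break points are removed to get the "other return points" baseline.
    A result above 1 means better on break points than on other return points.
    """
    other_won = return_pts_won - bp_won
    other_total = return_pts_total - bp_total
    bp_rate = bp_won / bp_total
    other_rate = other_won / other_total
    return bp_rate / other_rate

# Hypothetical season totals, chosen only to illustrate the calculation.
ratio = break_point_ratio(bp_won=180, bp_total=450,
                          return_pts_won=2280, return_pts_total=5450)
print(f"{ratio:.3f}")  # 40% on break points vs 42% on other return points
```

A ratio of 0.95 would read as "5% below expectations" in the terms used here.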

Sure enough, 2013 wasn’t a good year for Fed.  His break point conversion rate was 8% lower than his winning percentage on other return points.  When I ran these numbers after the Robredo match, that ranked 40th out of the ATP top 50.

Most of us, thinking back to Fed’s glory days, surely imagine that this is new.  And it’s true: 2013 was a bad year. But watch out for runaway narratives–there’s more randomness here than trend.  The graph below shows how Fed has performed each year on break point conversions.  A number above 1 is good: He’s winning more break point chances than other return points, as in 2009, when he exceeded expectations by 4.4%. Below 1 is bad: Last year was 7.8% below expectations.

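[Graph: Federer’s break point conversion rate relative to his winning percentage on other return points, by year]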

If you see a pattern here, I’m impressed.  2013 was bad, but not as bad as 2003, when 21-year-old Fed performed more than 10% worse on break point chances than on other return points.  He also went 78-17, winning seven tournaments, including Wimbledon and the Masters Cup, raising his ranking from #6 to #2.

Last year’s break point record was also comparable to 2007, when he converted 5.9% fewer break points than expected … and won three Grand Slams.

As with so many popular tennis stats, this one just doesn’t have that much of a relationship with winning.  Breaks matter, but missed break chances don’t. In Federer’s case, even breaks don’t always matter that much–he’s one of history’s best in tiebreaks.

The bigger picture with break point conversions

Over his career, Federer has been just a tick below average on break points, winning about 1.5% more other return points than break points. The year-to-year fluctuations don’t appear to be terribly meaningful.

That isn’t to say that no player has strong break point tendencies.  Nadal has consistently excelled in these clutch situations, winning more break points than expected for each of the last five seasons.  He is even better when facing break point, typically winning about 7% more service points in that situation than in others.  (Some of that is due to the advantage of a lefty serving in the ad court.)

Novak Djokovic has also been a little better on break points than on return points as a whole. But last year–a season he finished within a whisker of #1–his performance in those situations was almost as poor as Federer’s.

Andy Murray is consistent when handed break point chances–consistently bad. Since 2006, he has only exceeded expectations once. In 2012–a pretty good year for him by most standards–he won 7.3% fewer break point chances than other return points.

David Ferrer? A tick below expectations. 7.7% below other return points in 2013. Juan Martin del Potro? Consistently above expectations, including an impressive +6.8% in 2011.  Stanislas Wawrinka? -7.3% in 2011, +7.8% in 2012, then in his breakthrough 2013 campaign, -3.0%.

Constant exposure to break point stats has tricked us into thinking they are particularly meaningful. There are plenty of reasons why Federer is winning fewer matches than he used to–for one thing, he’s almost as old as I am–but break point performance just isn’t that important.

The Geriatric Australian Open

You’ve probably heard about the steady aging of professional tennis.  In both the men’s and women’s games, fewer teenagers than ever are winning important matches, and more and more thirty-somethings are remaining at the top of the game.

My favorite illustration: 25 years ago, the oldest man in the Australian Open draw was Johan Kriek, about two months short of his 31st birthday when the tournament began.  This year, 24 men in the main draw are older.

A total of 33 men in the singles draw have reached their fourth decade, only the third time in tournament history that the number has exceeded 20.  If lucky loser Stephane Robert replaces the injured Gilles Simon, we’ll have 34 thirty-somethings, tied with the all-time record, set in 2012.

Even without Simon’s withdrawal, we already have a record for average age in the men’s draw.  That figure this year is 27 years and 126 days, 80 days more than the previous record, set last year.  (Replacing Simon with Robert would add another 11 days to the average.) The new record also marks the seventh consecutive year that the average age of the men’s singles draw has increased.

While the age of the women’s draw isn’t quite record-setting, the rise of thirty-somethings in the women’s game has been even more rapid.  Only 13 years ago, in 2001, Els Callens was the only woman over the age of 30 in the draw (she was a mere 156 days past her 30th birthday).  This year, there are a record-high 15 players over the age of 30 in the women’s singles draw.

The 2012 Aussie Open field remains the oldest on record, at 24 years and 321 days.  This year’s draw–at 24 years and 292 days–is close enough that, had 16-year-old Ana Konjuh lost her third-round qualifying match to Olga Savchuk, ten years her senior, we would be looking at a new record.

Long term trends and the folly of forecasting

By just about any metric you might devise, the game has gotten steadily older for about 25 years.  As with any trend in the news, this one has led too many commentators–both casual and more academic–to claim that this is a permanent trend, or that “you’ll never see another teenage tennis champ.”

Protip: Never put your money on “never.”

What these arguments often fail to account for is that, for about twenty years after the inception of the pro game in the late 1960s, the sport–both men’s and women’s–consistently got younger.  When the 2012 Wimbledon men’s draw broke that event’s record for average age, the record it was breaking was from 1968.

Sure, there are plenty of possible explanations for the steady age decline of the 1970s and 1980s, just as there are many for the current increase.  And there are probably hard limits at either extreme that prevent the age of the game from swinging too far in either direction.

In any case, we’re not in the middle of an infinite rise in ages any more than we were amid an endless decline in 1985. Twenty years from now, the 2014 Aussie Open data points could be a meaningless step on this upward path or an important inflection point in another shift in the game. We’re unlikely to see a teenage Slam champ next year, or the year after that, but is it really possible to make a sensible case that, in six years, today’s 12-year-olds will be helpless against today’s 24-year-olds?

What we can be confident about is what has happened, and even without accounting for the return of Pat Rafter, this year’s Melbourne field represents yet another data point in the aging of elite-level tennis.

Detailed stats: Lots of great things are happening with the Match Charting Project. Several people have stepped forward and started contributing to the project already this year, and we’re up to 144 matches in the database.  From Day One in Australia: Bencic vs Date-Krumm, Venus vs Makarova, and Errani vs Goerges.  I hope you’ll join in the fun.

Winners and Losers in the 2014 Australian Open Men’s Draw

Every draw carries with it plenty of luck, but even by Grand Slam standards, this year’s Australian Open men’s singles draw seems a bit lopsided. The top half makes possible a Rafael Nadal-Roger Federer semifinal, at least if Federer gets past Andy Murray and Nadal beats the likes of Bernard Tomic.

While Novak Djokovic is seeded below Nadal, he gets the benefit of a projected semifinal matchup with David Ferrer. A more substantial challenge may arise one round earlier, as a possible quarterfinal opponent is Stanislas Wawrinka, who took Djokovic to a fifth set twice in the last four Grand Slams.

As I’ve done in the past, let’s quantify each player’s draw luck.  Using my forecast, combined with a forecast generated by randomizing the bracket, we can see who were the biggest winners and losers in yesterday’s draw ceremony.

The algorithmic approach is most useful in confirming our suspicions about the draw luck of the top players. Djokovic and Ferrer, the top seeds in the bottom half, definitely came out ahead. While Djokovic had a respectable 28.0% chance of winning the tournament in the randomized projection, he has a 33.7% chance given the way the draw turned out. In terms of expected ranking points, the draw gave him a 10.7% boost, from an expectation of 747 points to one of 827 points. In percentage terms, Ferrer’s expectation jumped even more, from 312 to 368 (18.0%).

Nadal, however, had the worst draw luck of the top ten seeds.  Before the bracket was arranged, he had a 30.7% chance of winning the title, with an expectation of 763 ranking points.  Once the draw was set, his title chances fell to 24.9% and his point expectation dropped to 662.  No one else in the top ten lost more than 7% of their expected ranking points on draw day; Nadal lost 13%.
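The percentage gains and losses quoted here are simple relative changes in expected ranking points. A minimal sketch, with a hypothetical function name and using only the rounded expectations given in the text:

```python
def draw_luck_pct(pre_points, post_points):
    """Percentage change in expected ranking points caused by the draw."""
    return 100.0 * (post_points - pre_points) / pre_points

# Pre- and post-draw expectations quoted in the text (rounded to whole points).
quoted = {
    "Novak Djokovic": (747, 827),
    "David Ferrer": (312, 368),
    "Rafael Nadal": (763, 662),
}
for player, (pre, post) in quoted.items():
    print(f"{player}: {draw_luck_pct(pre, post):+.1f}%")
```

Because the inputs here are rounded to whole points, a result can differ from the published figure by a tenth of a percentage point or so. That is presumably because the published percentages came from unrounded expectations, though that is an assumption on my part.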

It doesn’t take an algorithm, though, to identify the draw’s worst losers.  They’re placed where you’ll always find them: right next to the top two seeds.  In the randomized projection, Tomic had a 58% chance of winning his first-round match and a 27% chance of reaching the third round.  In reality, though, he’ll play Nadal first.  His slight chance of earning a place in the second round gives him an expectation of 29 ranking points (10 of which he earns simply by showing up).  In the random projection, his ranking point expectation was 75.

Lukas Lacko, the unlucky man who will play Djokovic in the first round, didn’t suffer quite so much, if only because he didn’t have such high expectations in the first place. Before the draw, he could expect 48 ranking points and a 15% chance of reaching the third round. Now, his projection is a mere 24 ranking points, one of the worst in the entire draw.

The luckiest players are always those who had little chance of progressing far in the draw, but managed to draw someone equally inept.  At the Australian Open, the four luckiest guys have yet to be identified: all are qualifiers.  The luckiest man of all will be the one who is placed in the topmost qualifying spot, opposite Lucas Pouille.  At this stage, my rating system doesn’t think much of the Frenchman, so it is likely that the qualifier will be the heavy favorite entering that match.

In the randomized projection, each qualifier has a 29% chance of winning his first match and a 6% chance of winning his second, for a weighted average of 32 ranking points.  The man who plays Pouille, however, will enter the field with an expectation of 55 ranking points.  Other qualifiers with nearly the same happy outcome will be those who draw Federico Delbonis, Julian Reister, and Jan Hajek in the opening round.

Here are the pre-draw and post-draw expected ranking points of the men’s seeds, along with the percentage of pre-draw points they gained or lost:

Player                 Seed  Pre  Post  Change  
Rafael Nadal           1     763   662  -13.2%  
Novak Djokovic         2     747   827   10.7%  
David Ferrer           3     312   368   18.0%  
Andy Murray            4     473   488    3.1%  
Juan Martin Del Potro  5     421   393   -6.6%  
Roger Federer          6     411   397   -3.4%  
Tomas Berdych          7     264   317   20.2%  
Stanislas Wawrinka     8     290   279   -3.9%  

Player                 Seed  Pre  Post  Change
Richard Gasquet        9     186   186    0.1%  
Jo Wilfried Tsonga     10    151   187   23.8%  
Milos Raonic           11    223   234    5.0%  
Tommy Haas             12    207   222    7.5%  
John Isner             13    176   196   11.2%  
Mikhail Youzhny        14    190   193    1.5%  
Fabio Fognini          15    101    81  -19.3%  
Kei Nishikori          16    172   135  -21.6%  

Player                 Seed  Pre  Post  Change
Tommy Robredo          17     71    61  -13.4%  
Gilles Simon           18    116    95  -18.3%  
Kevin Anderson         19     80   107   33.9%  
Jerzy Janowicz         20     99   154   55.3%  
Philipp Kohlschreiber  21    125   132    6.2%  
Grigor Dimitrov        22    136   122  -10.1%  
Ernests Gulbis         23    125   107  -14.1%  
Andreas Seppi          24     94    49  -47.8%  

Player                 Seed  Pre  Post  Change
Gael Monfils           25    147   101  -31.4%  
Feliciano Lopez        26    100    80  -20.7%  
Benoit Paire           27     94    89   -5.5%  
Vasek Pospisil         28     82    81   -0.9%  
Jeremy Chardy          29    111   126   13.7%  
Dmitry Tursunov        30    101    80  -21.0%  
Fernando Verdasco      31    106   105   -0.8%  
Ivan Dodig             32    104   106    1.8%

Men, Women, and Unforced Errors

Italian translation at settesei.it

If you’ve ever suffered through a debate about the relative merits of men’s and women’s tennis, you’ve probably heard the assertion that women’s tennis is sloppier–“riddled with unforced errors,” perhaps.  Maybe you’ve even made that claim yourself, which is understandable, given how often some version of it crops up, unchallenged, in tennis commentary.

But is it really true?  Do WTA matches feature so many more unforced errors than ATP matches? Unforced errors were counted at most slam matches last year, so we can find out.

Let’s start with the most recent results.  In men’s matches at the 2013 US Open, 33.2% of points ended in an unforced error.  Play may have tightened up just a bit in the final week: In the round of 16 and later, 32.9% of points ended in UFEs.

Women’s matches did, in fact, feature a higher rate of unforced errors. Considering the entire tournament, 39.7% of points ended that way, while in the fourth round and later, the rate dropped to 36.7%.

So yes, there are more unforced errors in the women’s game.  There are similar gaps between ATP and WTA error rates at Wimbledon and the Australian Open, and while the difference on the French Open clay is smaller, it is still present.

Eyeballing errors

However, these aren’t massive differences.  Using the US Open numbers, we can calculate that WTA points ended in UFEs about 20% more often than ATP points.  In the last four rounds of the tournament, when more people are watching closely and drawing conclusions, that difference drops to 11.7%.

Without a scorebook in hand, that gap may well be too small to spot. In a typical set of, say, 60 points, the average ATP pairing committed about 20 UFEs, against a typical WTA matchup’s 24. That’s one extra unforced error every other game–if that. Looking at the four final rounds, the difference drops to 20 UFEs in a men’s set against 22 in a women’s set. Two extra errors a set.
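The per-set arithmetic above is easy to check. This sketch uses the US Open rates quoted earlier and the same 60-point set; the function name is hypothetical, and the 60-point set is only the round number used for illustration.

```python
def ufes_per_set(ufe_rate_pct, points_per_set=60):
    """Expected unforced errors in a set, given UFEs as a percentage of points."""
    return ufe_rate_pct / 100.0 * points_per_set

# 2013 US Open rates quoted in the text: (ATP, WTA)
rates = {
    "entire tournament": (33.2, 39.7),
    "R16 and later": (32.9, 36.7),
}
for label, (atp, wta) in rates.items():
    atp_ufes = round(ufes_per_set(atp))
    wta_ufes = round(ufes_per_set(wta))
    print(f"{label}: ATP {atp_ufes}, WTA {wta_ufes}, WTA/ATP {wta / atp:.2f}")
```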

The divide is real, but it hardly seems substantial enough to represent a major difference in the quality of play or in the viewing experience.

Here are the numbers for the entire field at all four 2013 slams, followed by the rates in the final 16:

Slam             ATP UFE%  WTA UFE%  WTA/ATP
Australian Open     36.2%     44.4%     1.22
French Open         33.6%     37.0%     1.10
Wimbledon           19.1%     24.6%     1.29
US Open             33.2%     39.7%     1.20

R16 and later:
Slam             ATP UFE%  WTA UFE%  WTA/ATP
Australian Open     36.4%     41.1%     1.13
French Open         33.9%     34.9%     1.03
Wimbledon           20.5%     24.4%     1.19
US Open             32.9%     36.8%     1.12

Don’t read too much into the contrasts between one slam and another–what’s important here is how the same set of scorers, in the same conditions, are judging men’s and women’s matches.  Wimbledon, especially, is known for its, shall we say, unique approach to counting unforced errors.

Instead, a power gap

The French Open rates are by far the closest of those at the four slams.  This shouldn’t come as a surprise.  On a slower surface, ATPers earn fewer free points than usual on serve, finding themselves more frequently in rallies.  Take away those one- or two-shot rallies that the men’s game is known for, and the UFE disparity starts to shrink.

While we can’t account for all service winners and forced error returns, we can take aces out of the equation. So far, we’ve only seen unforced errors as a percentage of all points. Take UFEs as a percentage of all non-ace points, and the difference between men’s and women’s error rates decreases.
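The adjustment itself is a change of denominator. Here is a minimal sketch with hypothetical match totals; the function name and the numbers are illustrative only, not taken from the slam data.

```python
def ufe_rate_excluding_aces(unforced_errors, total_points, aces):
    """Unforced errors as a percentage of points that did not end in an ace."""
    non_ace_points = total_points - aces
    if non_ace_points <= 0:
        raise ValueError("need at least one non-ace point")
    return 100.0 * unforced_errors / non_ace_points

# Hypothetical match totals, for illustration only.
points, aces, ufes = 200, 16, 66
print(f"UFEs per point:         {100.0 * ufes / points:.1f}%")
print(f"UFEs per non-ace point: {ufe_rate_excluding_aces(ufes, points, aces):.1f}%")
```

The more aces in the sample, the more the rate rises once they are removed, which is why the adjustment moves the men's numbers more than the women's.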

In other words, now we’re starting to look at what happens when the serve is returnable:

Slam             ATP UFE%  WTA UFE%  WTA/ATP
Australian Open     39.6%     46.2%     1.17
French Open         35.6%     38.3%     1.08
Wimbledon           21.2%     25.9%     1.22
US Open             36.1%     41.3%     1.14

R16 and later:
Slam             ATP UFE%  WTA UFE%  WTA/ATP
Australian Open     39.6%     42.8%     1.08
French Open         35.3%     36.0%     1.02
Wimbledon           22.7%     25.6%     1.13
US Open             34.9%     38.3%     1.10

In most of these cases, we’re down to a couple of points per set.  If we were able to sort out service winners and perhaps forced error returns, we would almost surely see even more minor differences.

There’s no doubt that men hit harder serves and are, on average, more likely to win a point without having to hit a second ball.  But if we’re comparing the characteristics of women’s tennis, it doesn’t seem right to give the men credit for not hitting as many unforced errors when some of the already modest difference is due to the dominance of the serve.

Quibbles

This entire analysis depends on the unforced error stat, which I don’t much care for.  It is hugely dependent on the scorer, and there’s no widespread agreement in the sport on what exactly it means.

However, if we want to challenge a widely-held belief about unforced errors, there’s not really any way around using unforced errors, is there?

The best we can do to eliminate scorer’s biases is to compare only within single events.  The same person isn’t counting unforced errors at every US Open match, but each scorer probably works both men’s and women’s matches.  At a given venue, every scorer might even go through the same training program.

Even with that consideration, there is the strong possibility that scorers make adjustments–consciously or unconsciously–depending on the gender of players on court.  If unforced errors are shots that a player should have made but didn’t, a lot hinges on your interpretation of the word “should.”  It may be that some shots would be called unforced errors in a men’s match, but forced errors in a women’s match.  To the extent that’s the case, it’s awfully difficult to compare the genders using a stat that itself differs depending on gender.

On the other hand, scorers are presumably tennis fans, and they’ve heard the same conventional wisdom everyone else has.  If you believe that women hit more unforced errors than men do, perhaps you call borderline women’s shots unforced and borderline men’s shots forced.  In that case, scorers might be unwittingly amplifying the gender difference, not reducing it.

Given the difficulties of collecting data from hundreds of matches on different continents spread across many months, I doubt any non-automated method of counting unforced errors would address all of these issues.  For now, we have to take the official unforced error counts as the best available representation of reality and draw conclusions accordingly.

Whatever the limitations of the data, and whatever the other differences between the genders on a tennis court, unforced error counts are not nearly the distinguishing factor that they’ve been made out to be.

Analytics That Aren’t: Why I’m Not Excited about SAP in Tennis

It’s not analytics, it’s marketing.

The Grand Slams (with IBM) and now the WTA (with SAP) are claiming to deliver powerful analytics to tennis fans.  And it’s certainly true that IBM and SAP collect way more data than the tours would without them.  But what happens to that data?  What analytics do fans actually get?

Based on our experience after several years of IBM working with the Slams and Hawkeye operating at top tournaments, the answers aren’t very promising.  IBM tracks lots of interesting stats, makes some shiny graphs available during matches, and the end result of all this is … Keys to the Match?

Once matches are over and the performance of the Keys to the Match is (blessedly) forgotten, all that data goes into a black hole.

Here’s the message: IBM collects the data. IBM analyzes the data. IBM owns the data. IBM plasters their logo and their “Big Data” slogans all over anything that contains any part of the data. The tournaments and tours are complicit in this: IBM signs a big contract, makes their analytics part of their marketing, and the tournaments and tours consider it a big step forward for tennis analysis.

Sometimes, marketing-driven analytics can be fun.  It gives some fans what they want–counts of forehand winners, or average first-serve speeds. But let’s not fool ourselves. What IBM offers isn’t advancing our knowledge of tennis. In fact, it may be strengthening the same false beliefs that analytical work should be correcting.

SAP: Same Story (So Far)

Early evidence suggests that SAP, in its partnership with the WTA, will follow exactly the same model:

SAP will provide the media with insightful and easily consumable post-match notes which offer point-by-point analysis via a simple point tracker, highlight key events in the match, and compare previous head-to-head and 2013 season performance statistics.

“Easily consumable” is code for “we decide what the narratives are, and we come up with numbers to amplify those narratives.”

Narrative-driven analytics are just as bad as–and perhaps more insidious than–marketing-driven analytics, which are simply useless. The amount of raw data generated in a tennis match is enormous, which is why TV broadcasts give us the same small tidbits of Hawkeye data: distance run during a point, average rally hit point, and so on. So, under the weight of all those possibilities, why not just find the numbers that support the prevailing narrative? The media will cite those numbers, the fans will feel edified, and SAP will get its name dropped all over the place.

What we’re missing here is context.  Take this SAP-generated stat from a writeup on the WTA site:

The first promising sign for Sharapova against Kanepi was her rally hit point. Sharapova made contact with the ball 76% of the time behind the baseline compared to 89% for her opponent. It doesn’t matter so much what the percentage is – only that it is better than the person standing on the other side of the net.

Is that actually true? I don’t think anyone has ever published any research on whether rally hit point correlates with winning, though it seems sensible enough. In any case, these numbers are crying out for more context.  Is 76% good for Maria? How about keeping her opponent behind the baseline 89% of the time? Is the gap between 76% and 89% particularly large on the WTA? Does Maria’s rally hit point in one match tell us anything about her likely rally hit point in her next match?  After all, the article purports to offer “keys to match” for Maria against her next opponent, Serena Williams.

Here’s another one:

There is a lot to be said for winning the first point of your own service game and that rung true for Sharapova in her quarterfinal. When she won the opening point in 11 of her service games she went on to win nine of those games.

Is there any evidence that winning your first point is more valuable than, say, winning your second point?  Does Sharapova typically have a tough time winning her opening service point?  Is Kanepi a notably difficult returner on the deuce side, or early in games?  “There is a lot to be said” means, roughly, that “we hear this claim a lot, and SAP generated this stat.”

In any type of analytical work, context is everything.  Narrative-driven analytics strip out all context.

The alternative

IBM, SAP, and Hawkeye are tracking a huge amount of tennis data.  For the most part, the raw data is inaccessible to researchers.  The outsiders who are most likely to provide the context that tennis stats so desperately need just don’t have the tools to evaluate these narrative-driven offerings.

Other sporting organizations–notably Major League Baseball–make huge amounts of raw data available.  All this data makes fans more engaged, not less. It’s simply another way for the tours to get fans excited about the game. Statheads–and the lovely people who read their blogs–buy tickets too.

So, SAP, how about it?  Make your branded graphics for TV broadcasts. Provide your easily consumable stats for the media.  But while you’re at it, make your raw data available for independent researchers. That’s something we should all be able to get excited about.