September 2013 - Heavy Topspin

Saving Time With One-Ad and Short-Set Formats

Earlier this week, a USTA advisory group proposed some major changes to the collegiate “dual match” format so that it would better fit three-hour television windows. Most of those involved in college tennis hated the proposal, especially since it virtually eliminates doubles.

Now, the Intercollegiate Tennis Association has issued a counterproposal, a sort of compromise that attempts to shorten the time required by dual matches while retaining the importance of the doubles point. (As usual, Colette Lewis has the scoop. Click that link for her take.)

The ITA’s suggestions boil down to this:

Each game follows the “one-ad” format–after deuce is reached the second time, a single point is played to decide the game.
Both singles and doubles sets will be played to 5, with a tiebreak played at 5-all. Currently, doubles sets are played to 7, while singles are played to the conventional 6.

Does this proposal have a chance of solving the problem it aims to address? How much time will these changes save?

One-ad scoring

On Tuesday, I shared my findings that “no-ad” scoring–that is, the format used by WTA and ATP doubles, and one often proposed by those hoping to shorten any level of tennis–can be expected to reduce the length of the average match by about 10%.

(Today, I’m sticking with percentages, since college matches may last considerably longer than pro matches. In an ATP Challenger match, that typical 10% reduction amounts to eight minutes; in college, it might be as much as twice as long.)

Of course, “one-ad” scoring will not have as much of an impact. In ATP Challenger matches this year, 11% of games have gone to a second deuce. Those second-deuce games, which one-ad would reduce to nine points, averaged 11.8 points. Thus, going to one-ad would reduce the number of points by six or seven per match. The effect is roughly half that of no-ad. On a percentage basis, it might be expected to cut match length by 5%.
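The one-ad arithmetic is easy to reproduce. Here is a minimal sketch in Python, using the Challenger figures above plus the roughly 22-game average for a best-of-three match that the no-ad analysis uses. The function name is my own, for illustration only.

```python
def points_saved_per_match(share_of_games, avg_points, capped_points, games_per_match):
    """Points removed per match when long games are capped at `capped_points`.

    share_of_games:  fraction of games that reach the capped situation
    avg_points:      average length (in points) of those games today
    capped_points:   length of those games under the new rule
    """
    saved_per_game = share_of_games * (avg_points - capped_points)
    return saved_per_game * games_per_match

# One-ad: a game that reaches a second deuce (4-4 in points) ends after one
# more point, so it lasts nine points in total.
one_ad = points_saved_per_match(0.11, 11.8, 9, 22)
print(f"One-ad saves about {one_ad:.1f} points per match")  # about 6.8
```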

Recognizing the limitations of ATP Challenger data, I also researched the same numbers using 2013 women’s ITF tournaments, running the gamut from 10K to 100K tournaments. The numbers are roughly the same; two-deuce games go to 11.9 points, and the overall effect would be a little more than seven points per match. In the aggregate, we might be looking at a time reduction of 6% instead of 5%.

While one-ad is a creative compromise between purists and reformers, it probably isn’t enough to get the job done. A dual match that would otherwise last three and a half hours would be cut by ten or twelve minutes.

Short sets

At first glance, the proposal to stop sets at five instead of six (or seven, in the case of doubles) seems like a tweak more cosmetic than anything else. Because the doubles matches are played simultaneously and the singles matches are played simultaneously, dual matches often leave several individual matches abandoned. The matches that would go to a third-set tiebreak rarely get that far. The relatively quick 6-4 6-2 contests are more likely to count toward the end result.

Still, let’s ignore the simultaneous format for a moment. How many matches is this tweak likely to affect?

The only singles sets that will be consistently shortened by this proposal are those that currently reach tiebreaks. A tiebreak is roughly equivalent to two games, so playing a tiebreak at 5-5 requires about the same amount of time as reaching 7-5.

Now using ITF women’s 10K’s as our sample, we can find that singles matches average roughly one tiebreak per six matches. Using the shorthand of “one tiebreak equals two games,” that means that the time savings is about one game per three matches. The average match lasts about twenty games, so that’s one game saved per sixty, or a bit less than 2%. I suppose it might help shorten dual matches in which several simultaneous singles are all going to tiebreaks, but at the aggregate level, short singles sets are less than half as effective as one-ad scoring.
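The same back-of-the-envelope estimate for short singles sets, as a short Python sketch. It assumes, as above, that each tiebreak played at 5-5 instead of 6-6 saves about two game-equivalents.

```python
tiebreaks_per_match = 1 / 6      # ITF women's 10K sample: ~1 tiebreak per six matches
games_saved_per_tiebreak = 2     # tiebreak at 5-5 instead of 6-6 saves ~2 game-equivalents
games_per_match = 20             # average singles match length in games

games_saved = tiebreaks_per_match * games_saved_per_tiebreak   # ~1/3 game per match
pct_saved = games_saved / games_per_match                      # ~1/60 of a match
print(f"{games_saved:.2f} games saved per match, or {pct_saved:.1%}")  # 0.33 ... 1.7%
```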

Estimating the time-saving effect of short doubles sets is more difficult, because I don’t have any raw data on first-to-7 doubles sets.

Certainly, cutting two games from the total would be expected to have more than double the effect of one. Instead of simply avoiding 7-6 sets in singles, we’re avoiding 8-6 and 8-7 sets in doubles. Let’s say, for the sake of argument, that the time savings in these close sets is triple that of singles matches, or 6%.

Since doubles sets are cut to first-to-six instead of first-to-seven, lopsided sets will get a little faster, too. For instance, a 7-3 set will be cut to 6-3 or 6-2. Without knowing the distribution of various doubles scorelines, this is all guesswork, but if the lopsided sets are reduced in time by 12-15% and closer sets are less common, we might guess that the time saved in doubles is, on average, about 10%.

However, that’s 10% of a shorter length of time. Given that doubles points are generally shorter, and that doubles is one set as opposed to two or three sets in singles, the weighted average of time savings is probably about 4.5%.
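The post doesn't state the weights behind that 4.5% figure. As an assumption, suppose doubles accounts for roughly a third of a dual match's playing time. This Python sketch blends the two estimates under that assumption, and also backs out the doubles share that would yield exactly 4.5%.

```python
def blended_savings(doubles_share, doubles_pct, singles_pct):
    """Time-weighted average savings across the doubles and singles portions."""
    return doubles_share * doubles_pct + (1 - doubles_share) * singles_pct

DOUBLES_PCT = 0.10     # guessed savings from shorter doubles sets
SINGLES_PCT = 1 / 60   # savings from tiebreaks at 5-5 in singles (~1.7%)

# Assumption (not from the data): doubles takes about a third of the playing time.
print(f"{blended_savings(1/3, DOUBLES_PCT, SINGLES_PCT):.2%}")   # 4.44%

# Doubles share of playing time that would produce exactly 4.5% overall.
implied_share = (0.045 - SINGLES_PCT) / (DOUBLES_PCT - SINGLES_PCT)
print(f"implied doubles share: {implied_share:.0%}")             # 34%
```

In other words, the 4.5% guess is consistent with doubles taking up a bit more than a third of the clock.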

Impact of the ITA proposal

Start with the assumption (however questionable, as I discussed Tuesday) of an average dual match length of three and a half hours. By cutting each set to first-to-five, we can expect a reduction of 4.5%, or perhaps nine minutes. That leaves us with 3:21.

Cut off another 5% to 6% for one-ad scoring, and we save another 10 to 12 minutes. Best case scenario, the overall length comes down to 3:09, for a total savings of just over twenty minutes.
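Chaining the two reductions is simple arithmetic. Here is a Python sketch starting from the assumed 3.5-hour baseline, with the one-ad cut applied to the already-shortened match, as in the paragraphs above.

```python
def minutes_to_clock(minutes):
    """Format a duration in minutes as H:MM."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}:{mins:02d}"

baseline = 210                              # assumed average dual match: 3.5 hours
after_short_sets = baseline * (1 - 0.045)   # short sets: -4.5%
print(f"after short sets: {minutes_to_clock(after_short_sets)}")   # 3:21

for one_ad_pct in (0.05, 0.06):             # one-ad scoring: -5% to -6%
    final = after_short_sets * (1 - one_ad_pct)
    print(f"one-ad {one_ad_pct:.0%}: {minutes_to_clock(final)}, "
          f"saving {baseline - final:.0f} minutes")
```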

Is that good enough? With on-court warmups eliminated, as both the USTA Advisory Group and the ITA have proposed, it might cut a half hour from each dual match. Still, that leaves plenty of dual matches on the wrong side of three hours. Fine with me, and fine with a whole lot of people in college tennis, but probably not good enough for the USTA and its broadcast partners.

Time-Saving Shenanigans and the Effect of No-Ad

Yesterday, Colette Lewis reported on another set of possible rule changes for college tennis. The goals, as always, are to shorten matches, increase television coverage, and systematically ignore the well-informed preferences of those most closely involved with the game.

Colette does a better job of explaining the limitations of the proposed format than I would, so I encourage you to go read her post.

So that we’re all on the same page, let’s summarize the most recent dual-match format:

Dual matches–meetings between two schools–have six players to a side. There are three doubles and six singles matches. The combined result of the doubles matches is worth one point, and every singles match is worth one point, for a total of seven. The first school to four points wins.
First, three doubles matches are played simultaneously. Each is a single set, first to seven, win by two, and a tiebreak is played at 8-8. If one school wins two of the three doubles matches while the other is still in progress, the third is abandoned.
Next, all six singles matches are played simultaneously. Each singles match is best of three tiebreak sets. Once one school has accumulated four points (including the doubles point), the remaining matches are abandoned, and the contest is over.

And here’s the new version:

Singles first. The singles format stays the same, with six simultaneous matches.
If and only if a team does not accumulate four points in the singles, a compressed version of the doubles is played: the three doubles matches are reduced to 10-point super-tiebreaks. (This time last year, we were debating the merits of those as third sets.)

The proposed alternative would certainly save time. It would also effectively destroy doubles as an important part of college tennis.

At last year’s NCAA Men’s Team Championships, 44 of the 63 dual matches were decided by a score of 4-0 or 4-1. While it’s impossible to know how the abandoned singles matches would have turned out, it’s safe to assume that almost all of those meetings–along with many of the 4-2 outcomes and a few of the 4-3’s–would have been decided before any doubles was necessary.

Since length is such an important part of these debates, I ran some numbers to see what else might be done.

No-ad, no-overtime

The most popular device for speeding up tennis is “no-ad” scoring. You’re probably familiar with it, as both the ATP and WTA tours use it for doubles. Once a game reaches 40-40, the receiving team decides whether to play a final point in the deuce or ad court, and the outcome of that point decides the game.

So, how much time does it save? To get a rough idea of the answer, I looked at roughly 3600 ATP Challenger singles matches from this year. (It’s not the most relevant dataset, I realize, but better “available” than “ideal.”)

In those matches, 24.2% of games went to deuce. Those games averaged 9.7 points each, meaning that a switch to the no-ad format would save 2.7 points per deuce game. Overall, such a change would save about 0.65 points per game across the board. The average best-of-three-sets match lasts about 22 games, so switching to no-ad scoring would reduce the number of points in the typical match by 14 or 15.

At the ATP level, each additional point within a game–that is, one that doesn’t add to the number of changeovers or set breaks–adds about 33 seconds to the length of the match. So the switch to no-ad scoring would shorten the length of an average match by about eight minutes.

Switching doubles to no-ad would have a lesser effect, because the matches are already shorter. Figuring an average match length of 10 to 12 games, that’s another four minutes saved.
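Here are the no-ad numbers in one place, as a Python sketch. Like the estimate above, it applies the singles deuce rate and the 33-second marginal point time to doubles as well. That is an assumption, not a measured doubles figure.

```python
DEUCE_SHARE = 0.242        # share of Challenger games that reached deuce
DEUCE_GAME_POINTS = 9.7    # average points in those games
NO_AD_DEUCE_POINTS = 7     # 3-3 in points, then one deciding point
SECONDS_PER_POINT = 33     # marginal time of an extra point within a game

saved_per_game = DEUCE_SHARE * (DEUCE_GAME_POINTS - NO_AD_DEUCE_POINTS)   # ~0.65 points

def minutes_saved(games_per_match):
    """Minutes cut from a match of the given length under no-ad scoring."""
    return saved_per_game * games_per_match * SECONDS_PER_POINT / 60

print(f"points saved per game: {saved_per_game:.2f}")
print(f"singles (22 games): {minutes_saved(22):.1f} min")
print(f"doubles (10-12 games): {minutes_saved(10):.1f}-{minutes_saved(12):.1f} min")
```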

The impact is a bit more ambiguous than I’ve made it out to be, because no-ad scoring makes service breaks more common. If the server has a 65% chance of winning a point (typical for male tour pros), he or she has only a 65% chance of holding from deuce in the no-ad format. The server’s chances might be a bit worse, assuming the returner chooses the side which favors him or her. In an ad game, that same server has a 77.5% chance of holding from deuce.
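The 77.5% figure follows from a standard result. From deuce, the server must win two straight points before losing two straight, and a split pair returns the game to deuce, so the hold probability is p²/(p² + q²). This Python sketch compares the two formats. The no-ad version ignores any edge the returner gains by choosing the court.

```python
def hold_from_deuce_ad(p):
    """P(server wins an ad-scoring game from deuce), given point-win probability p.

    The server needs two points in a row before the returner gets two in a row;
    split pairs return to deuce, so the answer is p^2 / (p^2 + q^2).
    """
    q = 1 - p
    return p * p / (p * p + q * q)

def hold_from_deuce_no_ad(p):
    """No-ad: one deciding point. Ignores any returner edge from picking the court."""
    return p

p = 0.65
print(f"ad: {hold_from_deuce_ad(p):.1%}, no-ad: {hold_from_deuce_no_ad(p):.1%}")
# ad: 77.5%, no-ad: 65.0%
```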

It would take a much more in-depth simulation (informed by much more college-specific assumptions) to know the impact of that difference. Some additional breaks would speed up matches, making 6-0 and 6-1 outcomes more likely, while others would push sets to tiebreaks.

But college tennis takes longer

So far, I’ve been forced to use numbers from the pros to evaluate proposals for collegiates. Somewhere along the line, the numbers don’t add up.

According to the advisory group responsible for the “hide-doubles-in-the-attic” proposal, the average dual match time at last year’s NCAA championships was over three and a half hours. What’s taking so long?

On the ATP tour, the average best-of-three match is just over 90 minutes. Doubles generally moves more quickly, so the first-to-seven matches should be less than half as long. Plus, since dual matches are often decided while the longest singles matches are still going, the matches that are actually completed in a dual should, on average, be shorter than college matches in general.

Thus, even accounting for less serve dominance, longer rallies, and assorted factors such as the absence of ballkids and the higher number of lets (remember, these matches are played on adjacent courts), how are we getting so far beyond the magic three-hour time frame?

One explanation is simply poor data collection and analysis. The numbers the advisory group cites are from last year’s team championships, a particularly small sample. And by using an average, and not a median, one or two very long matches can skew the numbers–especially with such a small sample. The five-hour dual matches are surely beyond saving, so why give them so much weight?
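To see how much a couple of marathons can move an average, here is a Python sketch using made-up durations. These are not real NCAA numbers.

```python
from statistics import mean, median

# Hypothetical dual-match durations in minutes (illustrative only, not real data):
# most finish around three hours, plus two marathons.
durations = [170, 175, 180, 185, 190, 195, 200, 300, 310]

print(f"mean:   {mean(durations):.0f} min")    # 212, over three and a half hours
print(f"median: {median(durations):.0f} min")  # 190, just over three hours
```

With this small sample, two long matches pull the mean more than 20 minutes above the median.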

An alternative explanation is that college tennis really is that much slower, in which case many of the numbers I cited above don’t tell the whole story. Are there far more deuce games in college than the 24.2% on the Challenger tour? Are interminable, 15- to 20-point deuce games much more common? Do points take considerably longer?

If so, the effect of moving to no-ad scoring would be greater than the twelve-minute conclusion I reached above. Twelve minutes is a little less than one-tenth of my estimate for equivalent ATP matches, assuming a 90-minute average singles match and 45 minutes for doubles. So if dual matches are really lasting three hours and 40 minutes, the equivalent time reduction would be considerably larger: close to 20 minutes.
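Here is the proportional scaling as a Python sketch. It assumes the no-ad savings grow in proportion to total match time, which is itself a simplification.

```python
ATP_BASELINE_MIN = 90 + 45     # assumed ATP singles (90) plus doubles (45) minutes
NO_AD_SAVINGS_MIN = 12         # singles (~8) plus doubles (~4) estimate

share = NO_AD_SAVINGS_MIN / ATP_BASELINE_MIN     # just under one-tenth
college_dual_min = 3 * 60 + 40                   # a 3:40 dual match
print(f"share: {share:.1%}, scaled savings: {share * college_dual_min:.1f} min")
# share: 8.9%, scaled savings: 19.6 min
```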

Purists may hate no-ad scoring, but given a choice between losing 15-point deuce games and losing college doubles, I’d ditch dramatic deuce games in a second.

Are There More Five-Setters in Davis Cup?

There’s no denying that Davis Cup gives us some of the most dramatic moments on the men’s tennis calendar. It’s easy, then, to fall prey to some mistaken conventional wisdom, such as the canard that upsets are much more common in the international competition.

(In fact, upsets are only more common if Amir Weintraub is playing.)

Even if the favorites usually win, what about hard-fought matches? Is it possible that any given Davis Cup match is more likely to go the distance than a Grand Slam match?

It sounds good, but no, the frequency of five-setters (and even four-setters, for that matter) is steady regardless of context. Since 2003, 18.7% of Grand Slam matches have gone five sets, while just 17.5% of best-of-five Davis Cup rubbers have gone that far.

There are differences among levels of Davis Cup, as we might expect. 19.9% of best-of-five World Group rubbers go five, and 20.3% of World Group playoff rubbers go five. But neither of these numbers stands out compared to some subsets of Slam matches. 19.6% of second-round matches at majors reach a fifth set, while 20.3% of fourth-rounders and 20.8% of quarterfinal matches do so.

Here is the complete breakdown by set length:

LEVEL        MATCHES  3 SETS  4 SETS  5 SETS  
Davis Cup       2496   56.6%   25.9%   17.5%  
Grand Slams     5453   51.2%   30.1%   18.7%  

DAVIS CUP                                              
World Group      473   51.0%   29.2%   19.9%  
WG Playoffs      261   52.9%   26.8%   20.3%  
Group 1          688   54.9%   27.5%   17.6%  
Group 2         1074   61.0%   23.3%   15.7%  

GRAND SLAMS                                            
F                 44   40.9%   40.9%   18.2%  
SF                88   51.1%   29.5%   19.3%  
QF               173   52.0%   27.2%   20.8%  
R16              340   49.4%   30.3%   20.3%  
R32              686   51.5%   30.2%   18.4%  
R64             1368   49.0%   31.4%   19.6%  
R128            2754   52.5%   29.4%   18.1%
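As a consistency check on the table, the overall five-set rates should be the match-weighted averages of the rows beneath them. This Python sketch recomputes both from the table's own figures. Small differences come from rounding in the per-row percentages.

```python
# (matches, five-set %) copied from the rows of the table above
slam_rounds = {
    "F": (44, 18.2), "SF": (88, 19.3), "QF": (173, 20.8), "R16": (340, 20.3),
    "R32": (686, 18.4), "R64": (1368, 19.6), "R128": (2754, 18.1),
}
davis_levels = {
    "World Group": (473, 19.9), "WG Playoffs": (261, 20.3),
    "Group 1": (688, 17.6), "Group 2": (1074, 15.7),
}

def weighted_pct(rows):
    """Match-weighted average of a percentage column."""
    total = sum(n for n, _ in rows.values())
    return sum(n * pct for n, pct in rows.values()) / total

print(f"Grand Slams: {weighted_pct(slam_rounds):.1f}%")   # table: 18.7%
print(f"Davis Cup:   {weighted_pct(davis_levels):.1f}%")  # table: 17.5%
```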

There are good reasons why we believe Davis Cup five-setters to be so much more common. At the World Group level, there are never many matches going on, so if two players reach a fifth set–especially if it is the day’s second rubber, after other ties have finished play for the day–it is global tennis news. It’s easy to recall Dudi Sela’s five-set battles against Vasek Pospisil and Kei Nishikori in the 2011 and 2012 World Group playoffs, but how many of us paid a moment’s attention to Sela’s four-hour clash with Andrey Kuznetsov in the first round of this year’s US Open?

Further, the Davis Cup atmosphere leaves the impression that every match is gripping, even when it isn’t. Janko Tipsarevic beat Pospisil in straight sets yesterday, but thanks to the pair of tiebreaks and the electricity of the Serbian home crowd, we’ll remember that match differently than a typical 7-6 6-2 7-6 victory at a Grand Slam.

Fortunately, fan enjoyment isn’t measured in sets. There is plenty to get excited about–especially the weekend of World Group playoffs–even if upsets and five-set matches aren’t any more frequent than usual.

Can Rafael Nadal Win Seventeen Slams?

Rafael Nadal just won his thirteenth Grand Slam. He’s 27 years old. If he wins four more, he’ll match Roger Federer. If he wins five more, he’ll set a new all-time record. (Assuming, of course, that Roger is done. No guarantees there.)

Can Rafa do it? I think he can, and while he is one of a kind, there are some historical precedents that suggest he will.

Before diving into the numbers, there’s the argument I’ve always used in favor of Nadal piling up plenty of Slams. They hold the French Open every year. Each clay season that Rafa is healthy and playing like Rafa, he’s probably going to add at least one title to the list.

He’ll be just shy of his 28th birthday at the 2014 French Open, meaning that if he keeps winning every match he plays at Roland Garros, he’ll have four more French Open titles right about the time he turns 31, at the 2017 French Open. With what seems like half the tour playing quality tennis at age 30, and with Rafa aggressively skipping events this year, it’s easy to imagine him winning four more Slams on clay.

Or seven more. As long as he can stay healthy enough to play on his favorite surface, one gets the sense it’s up to him.

Setting aside Rafa’s historic dominance in Paris, a look at other modern-era players who have piled up Grand Slam titles suggests that 27 or 28 is hardly the end of the road.

In the last 40 years, only 25 Slam titles have gone to players over the age of 27 and a half. But an awful lot of those 25 titles have been claimed by players who–like Nadal–had already put together quite the resume.

In search of precedents, I looked at the six other players who have won the most Grand Slams in the Open era. To make the list, you need at least eight. Of those, only Bjorn Borg failed to win at least one after his 27th birthday. (Borg, of course, was basically out of tennis at 26.) The other five each won at least three when they were older than Nadal is now.

Federer won four of his 17 from the age of 27 and 10 months. Pete Sampras won three from the age of 27 and 11 months. Ivan Lendl won three of his eight slams from the age of 27 and a half. Jimmy Connors had only won five of his eight slams by the age of 29 and 10 months. Andre Agassi won five of his eight slams after turning 29.

In other words, the players who came closest to matching Nadal’s level of achievement had plenty left in the tank for the last few years of their careers. And most of these players accomplished these feats in eras when 30 year olds weren’t nearly as successful as they are right now.

Average these six players, and we find that they won 23% of their slams after turning 27. (More, if Federer wins another.) If Nadal matches that number, he’ll have won four more, tying Roger’s all-time record.
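The "four more" figure comes from solving a simple equation. If Nadal finishes with T titles and 23% of them come from here on, then T − 13 = 0.23·T. This Python sketch reproduces that arithmetic. Like the paragraph above, it treats all 13 current titles as "before" and every future title as "after".

```python
current_slams = 13
late_share = 0.23   # share of career Slams the comparison group won after turning 27

# Assumption: every future title counts as "late" and all 13 current ones do not.
# T - 13 = 0.23 * T  =>  T = 13 / (1 - 0.23)
final_total = current_slams / (1 - late_share)
more = final_total - current_slams
print(f"projected total: {final_total:.1f}, i.e. about {more:.1f} more")
# projected total: 16.9, i.e. about 3.9 more
```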

It seems likely that Rafa will defy–and improve upon–history in at least one way: by showing that Roland Garros can be won by an “old” player. Since 1974, only four French Open titlists have won the tournament while older than Nadal is now. Lendl and Federer did it at age 27, Agassi won at 29, and Andres Gomez won at 30. Every other Slam has had more winners in this age bracket.

But with Nadal’s performance this year, dominating on both clay and hard courts, it seems foolish to point to any precedent that suggests he might soon falter at the French.

While there’s no such thing as guaranteed Grand Slam titles–surely Novak Djokovic would have something to say about that, even in Paris–the evidence strongly points to at least a few more for the King of Clay. And as the newly-minted King of North American Hard, Nadal is well positioned to win five more and claim the all-time record.

US Open Point-by-Point Stats Recap

As regular readers know, I’m working on a system to track every shot in a tennis match and then generate meaningful data based on the results. Once I hammer out a few final bugs, I’ll introduce that system publicly. Then, with my interactive Excel doc–and at least a little bit of practice–you can chart matches as well.

In the meantime, I’ve added another set of tables to each one of the point-by-point recaps. My system allows (but does not require) the tracking of each shot’s direction, which seems particularly valuable in the case of a tactical baseline matchup like Monday’s final. Follow the link to the men’s final stats, and then click either of the “shot direction” links. I’ve broken down each player’s shots into crosscourt, down the middle, down the line, inside-out, and inside-in, then broken down each specific shot type (e.g. “forehand inside-out”) and shown the results of that shot.

At this point, the numbers are little more than a basis for conversation and speculation. Except for Serena Williams and Victoria Azarenka, I don’t have stats on more than two matches for any individual player. In time, however, I expect to amass a fair amount of raw data on the top-ranked men and women, and from there, we might really be able to learn something.

In the meantime, here is a list of all the point-by-point stat summaries available from the US Open.

Men:

Women:

Bonus:

Cincinnati final: Azarenka-Serena (thanks Amy!)

Simpler, Better Keys to the Match

Italian translation at settesei.it

If you watched the US Open or visited its website at any point in the last two weeks, you surely noticed the involvement of IBM. Logos and banner ads were everywhere, and even usually-reliable news sites made a point of telling us about the company’s cutting-edge analytics.

Particularly difficult to miss were the IBM “Keys to the Match,” three indicators per player per match. The name and nature of the “keys” strongly imply some kind of predictive power: IBM refers to its tennis offerings as “predictive analytics” and endlessly trumpets its database of 41 million data points.

Yet, as Carl Bialik wrote for the Wall Street Journal, these analytics aren’t so predictive.

It’s common to find that the losing player met more “keys” than the winner did, as was the case in the Djokovic–Wawrinka semifinal. Even when the winner captured more keys, some of these indicators sound particularly irrelevant, such as “average less than 6.5 points per game serving,” the one key that Rafael Nadal failed to meet in yesterday’s victory.

According to one IBM rep, their team is looking for “unusual” statistics, and in that they succeeded. But tennis is a simple game, and unless you drill down to components and do insightful work that no one has ever done in tennis analytics, there are only a few stats that matter. In their quest for the unusual, IBM’s team missed out on the predictive.

IBM vs generic

IBM offered keys for 86 of the 127 men’s matches at the US Open this year. In 20 of those matches, the loser met as many or more of the keys as the winner did. On average, the winner of each match met 1.13 more IBM keys than the loser did.
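Both measures used here are easy to compute once you have each player's key count. Here is a Python sketch. The match data is hypothetical and only illustrates the calculation.

```python
from typing import List, Tuple

def evaluate_keys(matches: List[Tuple[int, int]]) -> Tuple[float, int]:
    """Score a set of match keys.

    Each match is (keys met by winner, keys met by loser). Returns the average
    margin (winner minus loser) and the number of matches in which the loser
    met as many keys as the winner or more.
    """
    margins = [w - l for w, l in matches]
    avg_margin = sum(margins) / len(margins)
    loser_not_behind = sum(1 for m in margins if m <= 0)
    return avg_margin, loser_not_behind

# Hypothetical key counts for five matches (three keys per player), illustration only.
sample = [(3, 1), (2, 2), (3, 0), (1, 2), (2, 1)]
print(evaluate_keys(sample))   # (1.0, 2)
```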

This is IBM’s best performance of the year so far. At Wimbledon, winners averaged 1.02 more keys than losers, and in 24 matches, the loser met as many or more keys as the winner. At Roland Garros, the numbers were 0.98 and 21, and at the Australian Open, the numbers were 1.08 and 21.

Without some kind of reference point, it’s tough to know how good or bad these numbers are. As Carl noted: “Maybe tennis is so difficult to analyze that these keys do better than anyone else could without IBM’s reams of data and complex computer models.”

It’s not that difficult. In fact, IBM’s millions of data points and scores of “unusual” statistics are complicating what could be very simple.

I tested some basic stats to discover whether there were more straightforward indicators that might outperform IBM’s. (Carl calls them “Sackmann Keys;” I’m going to call them “generic keys.”) It is remarkable just how easy it was to create a set of generic keys that matched, or even slightly outperformed, IBM’s numbers.

Unsurprisingly, two of the most effective stats are winning percentage on first serves, and winning percentage on second serves. As I’ll discuss in future posts, these stats–and others–show surprising discontinuities. That is to say, there is a clear level at which another percentage point or two makes a huge difference in a player’s chances of winning a match. These measurements are tailor-made for keys.

For a third key, I tried first-serve percentage. It doesn’t have nearly the same predictive power as the other two statistics, but it has the benefit of having no clear correlation with them. You can have a high first-serve percentage but a low rate of first-serve or second-serve points won, and vice versa. And contrary to some received wisdom, there does not seem to be some high level of first-serve percentage where more first serves is a bad thing. It’s not linear, but the more first serves you put in the box, the better your odds of winning.

Put it all together, and we have three generic keys:

Winning percentage on first-serve points better than 74%
Winning percentage on second-serve points better than 52%
First-serve percentage better than 62%
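Checking the three generic keys against a player's box score takes only a few lines. Here is a Python sketch. The field names are my own. It treats every point without a made first serve as a second-serve point, so double faults count as second-serve points lost. The optional `first_won_min` argument is a hypothetical hook for the kind of player-specific adjustment discussed in this post. It is not part of the keys as listed.

```python
from dataclasses import dataclass

# Thresholds from the three generic keys (ATP, grass/hard/indoor hard).
FIRST_WON_MIN = 0.74   # share of first-serve points won
SECOND_WON_MIN = 0.52  # share of second-serve points won
FIRST_IN_MIN = 0.62    # first-serve percentage

@dataclass
class ServeStats:
    service_points: int
    first_serves_in: int
    first_serve_points_won: int
    second_serve_points_won: int

def generic_keys_met(s: ServeStats, first_won_min: float = FIRST_WON_MIN) -> int:
    """Count how many of the three generic keys one player met in a match."""
    second_serve_points = s.service_points - s.first_serves_in
    first_won = s.first_serve_points_won / s.first_serves_in
    second_won = s.second_serve_points_won / second_serve_points
    first_in = s.first_serves_in / s.service_points
    return sum([first_won > first_won_min,
                second_won > SECOND_WON_MIN,
                first_in > FIRST_IN_MIN])

# Hypothetical match line: 80 service points, 50 first serves in (62.5%),
# 38 first-serve points won (76%), 15 of 30 second-serve points won (50%).
example = ServeStats(80, 50, 38, 15)
print(generic_keys_met(example))                      # 2
print(generic_keys_met(example, first_won_min=0.79))  # 1, with a stricter first-serve bar
```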

These numbers are based on the last few years of ATP results on every surface except for clay. For simplicity’s sake, I grouped together grass, hard, and indoor hard, even though separating those surfaces might yield slightly more predictive indicators.

For those 86 men’s matches at the Open this year with IBM keys, the generic keys did a little bit better. Using my indicators–the same three for every player–the loser met as many or more keys 16 times (compared to IBM’s 20) and the winner averaged 1.15 more keys (compared to IBM’s 1.13) than the loser. Results for other slams (with slightly different thresholds for the different surface at Roland Garros) netted similar numbers.

A smarter planet

It’s no accident that the simplest, most generic possible approach to keys provided better results than IBM’s focus on the complex and unusual. It also helps that the generic keys are grounded in domain-specific knowledge (however rudimentary), while many of the IBM keys, such as average first serve speeds below a given number of miles per hour, or set lengths measured in minutes, reek of domain ignorance.

Indeed, comments from IBM’s reps suggest that marketing is more important than accuracy. In Carl’s post, a rep was quoted as saying, “It’s not predictive,” despite the large and brightly-colored announcements to the contrary plastered all over the IBM-powered US Open site. “Engagement” keeps coming up, even though engaging (and unusual) numbers may have nothing to do with match outcomes, and much of the fan engagement I’ve seen is negative.

Then again, maybe the old saw is correct: It’s all good publicity as long as they spell your name right. And it’s not hard to spell “IBM.”

Better keys, more insight

Amid such a marketing effort, it’s easy to lose sight of the fact that the idea of match keys is a good one. Commentators often talk about hitting certain targets, like 70% of first serves in. Yet to my knowledge, no one had done the research.

With my generic keys as a first step, this path could get a lot more interesting. While these single numbers are good guides to performance on hard courts, several extensions spring to mind.

Mainly, these numbers could be improved by making player-specific adjustments. 74% of first-serve points is adequate for an average returner, but what about a poor returner like John Isner? His average first-serve winning percentage this year is nearly 79%, suggesting that he needs to come closer to that number to beat most players. For other players, perhaps a higher rate of first serves in is crucial for victory. Or perhaps their thresholds vary particularly dramatically by surface.

In future posts, I’ll delve into more detail regarding these generic keys and investigate ways in which they might be improved. Outperforming IBM is gratifying, but if our goal is really a “smarter planet,” there is a lot more research to pursue.

Rafael Nadal d. Novak Djokovic: Recap and Detailed Stats

There are a lot of words that can be used to describe Novak Djokovic, but “sloppy” usually isn’t one of them. Despite plenty of brilliance from the Serbian, he made far too many mistakes to win today. Of course, the man on the other side of the net, Rafael Nadal, may be the best in the game at forcing his opponent to attempt low-percentage shots out of pure desperation.

This morning, I predicted that, in order to win the match, Nadal would need to serve well, piling up more quick service points than usual, as Djokovic is a master of neutralizing the server’s advantage. Give him a few shots, and it doesn’t matter who delivered the serve or how well they hit it.

That isn’t what happened. Nadal won fewer than one in five service points on or before his second shot. (Djokovic did a little better by that metric, but at 21%, not by much.) Instead, Rafa won the way Novak usually does: by neutralizing his opponent’s serve.

Rafa won 45% of return points today, a mark he has never before reached against Djokovic on hard courts. Even more importantly, he won return points at the same rate when Djokovic was serving at 30-30 or later. Djokovic won what would normally be an impressive number of return points: 38%. In recent years on hard courts, that was always enough to beat the Spaniard.

It was a different kind of hard-court match today, one that was decided in grueling rallies. 20% of points played today reached at least ten strokes, and Rafa won 59% of them. Of points that finished more quickly, Djokovic simply gave away too many. By my unofficial (and rather strict) count, he hit over 60 unforced errors, more than double Nadal’s total.

Too many of those sloppy shots came at crucial moments. A bad forehand miss on a mid-court sitter gave Nadal set point in the third set, which Rafa converted on the first try. Serving down a break in the fourth at 1-4, Djokovic quickly went up 30-0, then missed his second shot on three straight points to give Nadal another break point. At 30-0 in that game, it was possible to imagine Novak clawing his way back. Once the double break was sealed, the match was over.

Djokovic showed plenty of brilliance, especially in the second and third sets, and contributed to some incredible tennis moments, including ten rallies that exceeded 20 shots. Indeed, Djokovic converted a break chance by claiming the best of those, a 54-stroke slugfest in the second set. He didn’t go quietly until that dreadful game at 1-4.

By beating Djokovic at his own game, Nadal solidified his status as the most dominant player on hard courts. His undefeated record on the surface this year didn’t leave that in much doubt, but it had been three years since he won a hard-court Grand Slam. Assuming he stays healthy, even Rafa might agree that he heads to Australia as the player to beat.

Here are the complete point-by-point stats from the match.

Here is a complete win-probability graph, as well.

Djokovic-Nadal XXXVII: The (Actual) Keys to the Match

Both Rafael Nadal and Novak Djokovic have had easy routes to the US Open final. Neither was tested before the semifinals, and neither has yet played a top-eight opponent. Yet both were pushed further than expected in their last matches. Djokovic nearly lost in another tough five-setter against Stanislas Wawrinka, and Nadal looked almost human at times, spraying errors in his match with Richard Gasquet.

For all that, the field is down to the final two. They’ve played 36 times before, with Nadal leading the career matchup 21-15. On hard courts, it is the 18th meeting, with Djokovic leading 11-6. It is their eleventh encounter at a Grand Slam; Rafa has won seven of the previous ten, while they’ve split their two previous US Open finals.

Based on the most relevant pieces of this head-to-head–the last seven Djokovic-Nadal matches on hard courts, dating back to the 2010 US Open–we can identify some clear trends that tell us what to watch for, and what each player must do to seal the US Open title.

The key: Rafa’s service games

Of these last seven hard-court matches, Nadal has won three and Djokovic has won four. If we could find some statistical indicators that each player reached when they won and failed to accomplish when they lost, we might be on to something. Think of it like IBM’s Keys to the Match, but with actual predictive value.

Sure enough, there are plenty of indicators that fit the bill, and they almost all center on Nadal’s serve:

In four of the matches, Nadal hit aces on fewer than 5% of his service points. In the other three, he hit aces on at least 7%. He lost all four of the former, and won all three of the latter.
In four of the matches, Nadal won fewer than 70% of his first-serve points. In the other three, he won at least 71%. He lost all four of the former, and won all three of the latter.
In three of the matches, Nadal won fewer than 47% of his second-serve points. In the other four, he won at least 56%. He lost all three of the former, and won all but one (the 2011 Indian Wells final) of the latter.
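The first two keys above share one property: a single cutoff on a stat splits the matches into all wins on one side and all losses on the other. A minimal Python sketch of that check follows, assuming each match is reduced to one "higher is better" stat plus a win/loss flag. The function name `separating_threshold` and the sample ace rates are hypothetical, made up only to illustrate the test; they are not the actual figures from these seven matches.

```python
from typing import Optional, Sequence, Tuple


def separating_threshold(
    values: Sequence[float], won: Sequence[bool]
) -> Optional[Tuple[float, float]]:
    """Check whether a "higher is better" stat cleanly splits wins from losses.

    Returns (highest value in a loss, lowest value in a win) when every win
    has a strictly higher value than every loss; otherwise returns None.
    """
    if len(values) != len(won):
        raise ValueError("values and won must have the same length")
    wins = [v for v, w in zip(values, won) if w]
    losses = [v for v, w in zip(values, won) if not w]
    if not wins or not losses:
        return None  # nothing to separate without both outcomes
    if max(losses) < min(wins):
        return max(losses), min(wins)
    return None


# Hypothetical ace rates for seven matches (illustrative, NOT the real numbers).
ace_rate = [0.03, 0.04, 0.045, 0.02, 0.07, 0.09, 0.08]
nadal_won = [False, False, False, False, True, True, True]
print(separating_threshold(ace_rate, nadal_won))  # prints (0.045, 0.07)
```

Under this strict rule, the second-serve indicator would not qualify: Nadal lost one of the four matches (the 2011 Indian Wells final) in which he won at least 56% of second-serve points, and the check returns `None` for that kind of near-miss.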

We can sum up the importance of Nadal’s service games from a more Djokovic-centered perspective:

In three of the matches, Djokovic won no more than 33% of return points. In the other four, he won at least 37% of return points. Care to guess which matches he won?

Djokovic’s service non-indicators

The numbers are not nearly so clear for Djokovic’s service games. In the two meetings when Novak hit the most aces, Rafa won. Djokovic made 62% or more of his first serves in only four of the matches, and Rafa won three of them. (These are starting to sound like some of the more inane of the IBM keys.)

Generally, winning 65% of first-serve points is good enough for Novak to beat Nadal, except in last month’s match in Canada, when he won 71% of first-serve points and lost in a third-set tiebreak. In Djokovic’s worst second-serve performance of the seven matches, the 2011 US Open final, he won barely 44% of those points, yet won the match.

Of course, this doesn’t mean that Djokovic’s service stats don’t matter. It’s no accident that Novak’s first-serve percentages were much higher in the three sets he won against Wawrinka than in the two sets he lost. Rather, Djokovic’s serve just isn’t as potentially dominant as Nadal’s is.

For example, in Saturday’s semifinals, Nadal won 36% of his service points on or before his second shot, while Djokovic won only 24% of his service points that way. Nadal’s number isn’t staggeringly high (by comparison, both Kevin Anderson and Marcos Baghdatis topped 40% in that category in their second-round match), but it’s a number he can reach only when serving well. When he isn’t earning those cheap, quick points against Djokovic, Novak takes away the server’s advantage, threatening to break in almost every service game.

By contrast, Djokovic–like Victoria Azarenka–doesn’t consistently earn that type of advantage on serve. Sure, he gets some free points that way, but in general, he takes the slight advantage that serving confers and uses that as an edge in a longer rally. In the semifinal against Wawrinka, his average service point–including aces and unreturnables–lasted more than five shots.
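Both of the serve measures mentioned here, the share of service points won on or before the server’s second shot and the average length of a service point, can be computed from point-by-point rally lengths. The sketch below assumes a hypothetical record in which the rally length counts every shot including the serve, so the server’s second shot is shot three; the class and function names are illustrative, and real charting data may count shots differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServicePoint:
    # Hypothetical record: shots in the point, counting the serve as shot 1,
    # so the server hits shots 1, 3, 5, ...
    rally_length: int
    server_won: bool


def quick_win_share(points: List[ServicePoint], max_shots: int = 3) -> float:
    """Fraction of ALL service points the server won within max_shots shots.

    With max_shots=3, this counts points won on or before the server's second shot.
    """
    if not points:
        return 0.0
    quick = sum(1 for p in points if p.server_won and p.rally_length <= max_shots)
    return quick / len(points)


def average_rally_length(points: List[ServicePoint]) -> float:
    """Mean shots per service point, including aces and unreturned serves."""
    if not points:
        return 0.0
    return sum(p.rally_length for p in points) / len(points)


# Made-up sample of four service points, for illustration only.
sample = [
    ServicePoint(1, True),   # ace
    ServicePoint(3, True),   # serve, return, server winner
    ServicePoint(7, False),
    ServicePoint(9, True),
]
print(quick_win_share(sample), average_rally_length(sample))  # prints 0.5 5.0
```

Dividing by all service points, not just points won, matches the way the percentages above are phrased (“36% of his service points”).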

Getting one number for Novak

Individually, Djokovic’s service stats don’t tell us much. But if we consolidate them into one number–Nadal’s return points won–we get a somewhat clearer idea of what beating Novak requires. In the three matches where Nadal failed to win 34% of return points, he lost. In the two matches where he won at least 42% of return points, he won.

But if you’re counting, you’ve surely noted that I left out two matches. In Montreal last month, Nadal won only 34.7% of return points, and won. In the 2011 US Open final, he won 41.7% of return points, yet lost. Djokovic can be so effective in his own return games–or simply unbeatable when given break point opportunities, as he was that day–that even a masterful return performance like the one Nadal displayed in that final isn’t always good enough.

So Novak’s numbers just aren’t as indicative as his opponent’s. Instead, keep your eyes on Rafa’s serve statistics. Despite the many long, gut-busting rallies we can expect this afternoon, Nadal has this match–like his previous hard-court meetings with the world #1–on his own racquet.

US Open Final: Serena Williams d. Victoria Azarenka: Recap and Detailed Stats

Today’s final was Serena Williams‘s for the taking. She didn’t seize it as boldly as she might have, but she performed just well enough to overcome both the windy conditions and a reliably dogged opponent in Victoria Azarenka.

When Serena is playing as well as she did during the third set, it’s tough to see how she ever loses. But today we saw an excellent illustration of both her assets and her liabilities. If her opponent can hang around in rallies, there will be enough errors to swing some matches in the other direction. Most of the WTA rank and file can’t absorb her pace and stick around long enough to reap the benefits of those errors, but Vika can.

And when Azarenka is playing her best, as she did on occasion throughout this match, she can attack one of Serena’s less penetrating shots, creating opportunities for her own winners. A player with a bigger serve would create those openings with her serve; Vika must try to do so within each rally.

By the numbers, it’s a bit of a miracle that Vika forced a third set. Twice in the second set, Serena served for the match and was broken. It was a testament to Azarenka’s stubbornness, always putting one more ball back in play, forcing Serena to overcome both the pressure and the wind. In that second set, Williams had a hard time doing that.

It was the wind–and Serena’s difficulty dealing with it–that kept this match going as long as it did. While it made life difficult for both players at times, especially when playing on the right side of the chair, Serena struggled much more. She never really adjusted to the conditions, setting up early and taking big swings when the wind was likely to move the ball a bit too much for that. Many of Serena’s errors–especially her 33 unforced errors on the backhand side alone–can be attributed to that sloppiness.

By the third set, the wind had settled down and so had Serena. Azarenka provided some help with two crucial double faults in the fourth game of the set, including one on break point. It wasn’t her first poorly timed double fault of the match–four of her five came at 30-30 or later–but this one was the beginning of the end. Unlike in the second set, Serena didn’t let up. She consolidated the break by holding to love, with an unreturnable, two aces, and a running backhand lob winner.

I wrote this morning that Azarenka’s chances hinged on her serve. She won 54.5% of her service points, a bit less than she did against Serena in Cincinnati, but better than she did in each of her last three matches in New York. Had she limited her double faults to less important moments, 54.5% might well have been enough.

In the end, Serena was simply too strong. Vika is the very best on tour at what she does, negating the advantage of huge weapons like Serena’s, but that style leaves her very little margin for error against Serena. Today, that margin wasn’t quite enough for her to pull off the upset.

Here are the point-by-point-based serve, return, and shot-type stats for the match.

Does Azarenka Have a Chance?

The last two times Serena Williams and Victoria Azarenka met on hard courts, Azarenka came out on top. As much confidence as that might give her going into today’s final, it might be the only evidence suggesting she’s likely to win.

Today’s match will come down to Vika’s ability to hold serve, and while she has moved quickly through her last two rounds, she has yet to show that she can serve well enough to hold off the onslaught that is Serena’s return game.

In the semifinal against Flavia Pennetta, she lost more service points than she won, and was broken in five of her nine service games. Against Daniela Hantuchova, she lost 47% of her service points, suffering three service breaks. Playing Ana Ivanovic, she lost more than half of her service points, and was broken seven times.

While each of those players had a nice tournament, this is not exactly a Hall of Fame lineup that has reduced Azarenka’s service games to coin flips. None brings anywhere near the weaponry to the return game that Serena does. And Serena is considerably more difficult to break back.

These numbers make it all the more surprising that the last meeting between these two players ended in Vika’s favor. We have detailed data from that most recent matchup: Azarenka managed to win 55% of her service points (the same figure she held Serena to) and landed 11 of 12 serves on game points, winning nine of them.

Another promising data point is last year’s US Open final, in which Serena managed to win only 44% of Azarenka’s service points. In both of these recent contests, the difference between Vika’s first-serve and second-serve success rates was tiny–in New York last year, it was a mere two percentage points–suggesting that she needs only a slight edge at the beginning of a rally to win the point.

Azarenka has the ability to step up her game for the big matches, so the question she’ll have to answer today is: Can she serve more effectively than she has all tournament? If she does, even at the modest level she reached in Cincinnati, we’re in for a very competitive afternoon of tennis.

—

Check out this final preview from Tom Perrotta, in which everyone agrees that Vika will raise her level today.

If you missed it yesterday, I wrote recaps of both men’s semifinals. Djokovic-Wawrinka here, and Nadal-Gasquet here. In those posts you can find links to my point-by-point based stats for both matches.

Finally, don’t miss this piece from Carl Bialik, in which he looks at IBM’s not-very-predictive “predictive analytics,” otherwise known as their Keys to the Match. Next week, I’ll offer a closer look at the details of the better-performing “Sackmann Keys,” which, it turns out, have much more value for tennis analysis than merely showing up the folks at IBM.