The Geriatric Australian Open

You’ve probably heard about the steady aging of professional tennis.  In both the men’s and women’s games, fewer teenagers than ever are winning important matches, and more and more thirty-somethings are remaining at the top of the game.

My favorite illustration: 25 years ago, the oldest man in the Australian Open draw was Johan Kriek, about two months short of his 31st birthday when the tournament began.  This year, 24 men in the main draw are older.

A total of 33 men in the singles draw have reached their fourth decade, only the third time in tournament history that the number has exceeded 20.  If lucky loser Stephane Robert replaces the injured Gilles Simon, we’ll have 34 thirty-somethings, tied with the all-time record, set in 2012.

Even without Simon’s withdrawal, we already have a record for average age in the men’s draw.  That figure this year is 27 years and 126 days, 80 days more than the previous record, set last year.  (Replacing Simon with Robert would add another 11 days to the average.) The new record also marks the seventh consecutive year that the average age of the men’s singles draw has increased.

While the age of the women’s draw isn’t quite record-setting, the rise of thirty-somethings in the women’s game has been even more rapid.  Only 13 years ago, in 2001, Els Callens was the only woman over the age of 30 in the draw (she was a mere 156 days past her 30th birthday).  This year, there are a record-high 15 players over the age of 30 in the women’s singles draw.

The 2012 Aussie Open women’s field remains the oldest on record, at 24 years and 321 days.  This year’s draw–at 24 years and 292 days–is close enough that, had 16-year-old Ana Konjuh lost her third-round qualifying match to Olga Savchuk, ten years her senior, we would be looking at a new record.

Long-term trends and the folly of forecasting

By just about any metric you might devise, the game has gotten steadily older for about 25 years.  As with any trend in the news, this one has led too many commentators–both casual and more academic–to claim that this is a permanent trend, or that “you’ll never see another teenage tennis champ.”

Protip: Never put your money on “never.”

What these arguments often fail to account for is that, for about twenty years after the inception of the pro game in the late 1960s, the sport–both men’s and women’s–consistently got younger.  When the 2012 Wimbledon men’s draw broke that event’s record for average age, the record it was breaking was from 1968.

Sure, there are plenty of possible explanations for the steady age decline of the 1970s and 1980s, just as there are many for the current increase.  And there are probably hard limits at either extreme that prevent the age of the game from swinging too far in either direction.

In any case, we’re not in the middle of an infinite rise in ages any more than we were amid an endless decline in 1985.  Twenty years from now, the 2014 Aussie Open data points could be a meaningless step on this upward path or an important inflection point in another shift in the game.  We’re unlikely to see a teenage Slam champ next year, or the year after that, but is it really possible to make a sensible case that, in six years, today’s 12-year-olds will be helpless against today’s 24-year-olds?

What we can be confident about is what has happened, and even without accounting for the return of Pat Rafter, this year’s Melbourne field represents yet another data point in the aging of elite-level tennis.

Detailed stats: Lots of great things are happening with the Match Charting Project. Several people have stepped forward and started contributing to the project already this year, and we’re up to 144 matches in the database.  From Day One in Australia: Bencic vs Date-Krumm, Venus vs Makarova, and Errani vs Goerges.  I hope you’ll join in the fun.

Winners and Losers in the 2014 Australian Open Men’s Draw

Every draw carries with it plenty of luck, but even by Grand Slam standards, this year’s Australian Open men’s singles draw seems a bit lopsided.  The top half makes possible a Rafael Nadal-Roger Federer semifinal, at least if Federer gets past Andy Murray and Nadal beats the likes of Bernard Tomic.

While Novak Djokovic is seeded below Nadal, he gets the benefit of a projected semifinal matchup with David Ferrer.  A more substantial challenge may arise one round earlier, as a possible quarterfinal opponent is Stanislas Wawrinka, who took Djokovic to a fifth set twice in the last four Grand Slams.

As I’ve done in the past, let’s quantify each player’s draw luck.  Using my forecast, combined with a forecast generated by randomizing the bracket, we can see who were the biggest winners and losers in yesterday’s draw ceremony.

The algorithmic approach is most useful in confirming our suspicions about the draw luck of the top players.  Djokovic and Ferrer, the top seeds in the bottom half, definitely came out ahead.  While Djokovic had a respectable 28.0% chance of winning the tournament in the randomized projection, he has a 33.7% chance given the way the draw turned out.  In terms of expected ranking points, the draw gave him a 10.7% boost, from an expectation of 747 points to one of 827 points.  In percentage terms, Ferrer’s expectation jumped even more, from 312 to 368 (18.0%).

Nadal, however, had the worst draw luck of the top ten seeds.  Before the bracket was arranged, he had a 30.7% chance of winning the title, with an expectation of 763 ranking points.  Once the draw was set, his title chances fell to 24.9% and his point expectation dropped to 662.  No one else in the top ten lost more than 7% of their expected ranking points on draw day; Nadal lost 13%.

It doesn’t take an algorithm, though, to identify the draw’s worst losers.  They’re placed where you’ll always find them: right next to the top two seeds.  In the randomized projection, Tomic had a 58% chance of winning his first-round match and a 27% chance of reaching the third round.  In reality, though, he’ll play Nadal first.  His slight chance of earning a place in the second round gives him an expectation of 29 ranking points (10 of which he earns simply by showing up).  In the random projection, his ranking point expectation was 75.

Lukas Lacko, the unlucky man who will play Djokovic in the first round, didn’t suffer quite so much, if only because his expectations weren’t as high in the first place.  Before the draw, he could expect 48 ranking points and a 15% chance of reaching the third round.  Now, his projection is a mere 24 ranking points, one of the worst in the entire draw.

The luckiest players are always those who had little chance of progressing far in the draw, but managed to draw someone equally inept.  At the Australian Open, the four luckiest guys have yet to be identified: all are qualifiers.  The luckiest man of all will be the one who is placed in the topmost qualifying spot, opposite Lucas Pouille.  At this stage, my rating system doesn’t think much of the Frenchman, so it is likely that the qualifier will be the heavy favorite entering that match.

In the randomized projection, each qualifier has a 29% chance of winning his first match and a 6% chance of winning his second, for a weighted average of 32 ranking points.  The man who plays Pouille, however, will enter the field with an expectation of 55 ranking points.  Other qualifiers with nearly the same happy outcome will be those who draw Federico Delbonis, Julian Reister, and Jan Hajek in the opening round.
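For readers who want to see the arithmetic behind an "expected ranking points" figure, here is a minimal sketch.  It is not the forecast model used above.  The points scale is my assumption about the Grand Slam table (only the 10 points for a first-round loss is stated in this post), and the function names are hypothetical.

```python
# Points earned by number of matches won at a Slam (0 wins = first-round loss).
# ASSUMPTION: only the 10-point first-round figure appears in the post;
# the rest is the scale I believe was in use at the time.
POINTS_BY_WINS = [10, 45, 90, 180, 360, 720, 1200, 2000]


def expected_points(reach_probs):
    """Expected ranking points for one player.

    reach_probs must hold 7 values: reach_probs[i] is the chance of
    winning at least i + 1 matches (so the last one is the title chance).
    """
    reach = [1.0] + list(reach_probs) + [0.0]
    # Chance of winning exactly `wins` matches is reach[wins] - reach[wins + 1].
    return sum(
        pts * (reach[wins] - reach[wins + 1])
        for wins, pts in enumerate(POINTS_BY_WINS)
    )


def pct_change(pre, post):
    """Percentage of pre-draw points gained (+) or lost (-) on draw day."""
    return 100.0 * (post - pre) / pre


# Draw-luck change for the top two seeds, from the expectations quoted above.
print(round(pct_change(763, 662), 1))  # Nadal
print(round(pct_change(747, 827), 1))  # Djokovic
```

Because the published expectations are rounded to whole points, recomputing a change this way can differ from the published percentage by a tenth of a point for some players.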

Here are the pre-draw and post-draw expected ranking points of the men’s seeds, along with the percentage of pre-draw points they gained or lost:

Player                 Seed  Pre  Post  Change  
Rafael Nadal           1     763   662  -13.2%  
Novak Djokovic         2     747   827   10.7%  
David Ferrer           3     312   368   18.0%  
Andy Murray            4     473   488    3.1%  
Juan Martin Del Potro  5     421   393   -6.6%  
Roger Federer          6     411   397   -3.4%  
Tomas Berdych          7     264   317   20.2%  
Stanislas Wawrinka     8     290   279   -3.9%  

Player                 Seed  Pre  Post  Change
Richard Gasquet        9     186   186    0.1%  
Jo Wilfried Tsonga     10    151   187   23.8%  
Milos Raonic           11    223   234    5.0%  
Tommy Haas             12    207   222    7.5%  
John Isner             13    176   196   11.2%  
Mikhail Youzhny        14    190   193    1.5%  
Fabio Fognini          15    101    81  -19.3%  
Kei Nishikori          16    172   135  -21.6%  

Player                 Seed  Pre  Post  Change
Tommy Robredo          17     71    61  -13.4%  
Gilles Simon           18    116    95  -18.3%  
Kevin Anderson         19     80   107   33.9%  
Jerzy Janowicz         20     99   154   55.3%  
Philipp Kohlschreiber  21    125   132    6.2%  
Grigor Dimitrov        22    136   122  -10.1%  
Ernests Gulbis         23    125   107  -14.1%  
Andreas Seppi          24     94    49  -47.8%  

Player                 Seed  Pre  Post  Change
Gael Monfils           25    147   101  -31.4%  
Feliciano Lopez        26    100    80  -20.7%  
Benoit Paire           27     94    89   -5.5%  
Vasek Pospisil         28     82    81   -0.9%  
Jeremy Chardy          29    111   126   13.7%  
Dmitry Tursunov        30    101    80  -21.0%  
Fernando Verdasco      31    106   105   -0.8%  
Ivan Dodig             32    104   106    1.8%

Men, Women, and Unforced Errors

Italian translation at settesei.it

If you’ve ever suffered through a debate about the relative merits of men’s and women’s tennis, you’ve probably heard the assertion that women’s tennis is sloppier–“riddled with unforced errors,” perhaps.  Maybe you’ve even made that claim yourself, which is understandable, given how often some version of it crops up, unchallenged, in tennis commentary.

But is it really true?  Do WTA matches feature so many more unforced errors than ATP matches? Unforced errors were counted at most slam matches last year, so we can find out.

Let’s start with the most recent results.  In men’s matches at the 2013 US Open, 33.2% of points ended in an unforced error.  Play may have tightened up just a bit in the final week: In the round of 16 and later, 32.9% of points ended in UFEs.

Women’s matches did, in fact, feature a higher rate of unforced errors. Considering the entire tournament, 39.7% of points ended that way, while in the fourth round and later, the rate dropped to 36.7%.

So yes, there are more unforced errors in the women’s game.  There are similar gaps between ATP and WTA error rates at Wimbledon and the Australian Open, and while the difference on the French Open clay is smaller, it is still present.

Eyeballing errors

However, these aren’t massive differences.  Using the US Open numbers, we can calculate that WTA points ended in UFEs about 20% more often than ATP points.  In the last four rounds of the tournament, when more people are watching closely and drawing conclusions, that difference drops to 11.7%.

Without a scorebook in hand, that gap may well be too small to spot.  In a typical set of, say, 60 points, an ATP pairing averaged 20 UFEs, against 24 for a typical WTA matchup.  That’s one extra unforced error every other game–if that.  Looking at the four final rounds, the difference drops to 20 UFEs in a men’s set against 22 in a women’s set.  Two extra errors a set.
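The per-set figures are simple to reproduce.  This is a small sketch using the US Open rates quoted earlier in this post; the 60-point set is the same round number used above, and the function names are mine.

```python
POINTS_PER_SET = 60  # the "typical set" assumed in the text

# US Open 2013 unforced-error rates quoted earlier (share of all points).
FULL_EVENT = {"ATP": 0.332, "WTA": 0.397}
R16_ONWARD = {"ATP": 0.329, "WTA": 0.367}


def ufes_per_set(rate, points=POINTS_PER_SET):
    """Unforced errors expected in a set of `points` points."""
    return rate * points


def relative_gap(atp_rate, wta_rate):
    """How much more often WTA points end in a UFE, as a fraction."""
    return wta_rate / atp_rate - 1


for label, rates in (("full event", FULL_EVENT), ("R16 onward", R16_ONWARD)):
    atp = ufes_per_set(rates["ATP"])
    wta = ufes_per_set(rates["WTA"])
    gap = relative_gap(rates["ATP"], rates["WTA"])
    print(f"{label}: ATP {atp:.1f}, WTA {wta:.1f} UFEs per set, gap {gap:.1%}")
```

Rounded to whole errors, that gives 20 against 24 for the full event and 20 against 22 for the later rounds.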

The divide is real, but it hardly seems substantial enough to represent a major difference in the quality of play or in the viewing experience.

Here are the numbers for the entire field at all four 2013 slams, followed by the rates in the final 16:

Slam             ATP UFE%  WTA UFE%  WTA/ATP  
Australian Open  36.2%        44.4%     1.22  
French Open      33.6%        37.0%     1.10  
Wimbledon        19.1%        24.6%     1.29  
US Open          33.2%        39.7%     1.20

R16 and later:                                           
Slam             ATP UFE%  WTA UFE%  WTA/ATP  
Australian Open  36.4%        41.1%     1.13  
French Open      33.9%        34.9%     1.03  
Wimbledon        20.5%        24.4%     1.19  
US Open          32.9%        36.8%     1.12

Don’t read too much into the contrasts between one slam and another–what’s important here is how the same set of scorers, in the same conditions, are judging men’s and women’s matches.  Wimbledon, especially, is known for its, shall we say, unique approach to counting unforced errors.

Instead, a power gap

The French Open rates are by far the closest of those at the four slams.  This shouldn’t come as a surprise.  On a slower surface, ATPers earn fewer free points than usual on serve, finding themselves more frequently in rallies.  Take away those one- or two-shot rallies that the men’s game is known for, and the UFE disparity starts to shrink.

While we can’t account for all service winners and forced error returns, we can take aces out of the equation.  So far, we’ve only seen unforced errors as a percentage of all points.  Take UFEs as a percentage of all non-ace points, and the difference between men’s and women’s error rates decreases.

In other words, now we’re starting to look at what happens when the serve is returnable:

Slam             ATP UFE%  WTA UFE%  WTA/ATP  
Australian Open  39.6%        46.2%     1.17  
French Open      35.6%        38.3%     1.08  
Wimbledon        21.2%        25.9%     1.22  
US Open          36.1%        41.3%     1.14  

R16 and later:                                
Slam             ATP UFE%  WTA UFE%  WTA/ATP  
Australian Open  39.6%        42.8%     1.08  
French Open      35.3%        36.0%     1.02  
Wimbledon        22.7%        25.6%     1.13  
US Open          34.9%        38.3%     1.10
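The adjustment behind the tables above is a rescaling: divide the UFE rate by the share of points that were not aces.  The sketch below shows that step and also runs it backwards to get the ace rate each pair of table entries implies.  Those implied ace rates are my own back-calculation from rounded figures, not numbers reported in this post, and the function names are hypothetical.

```python
def non_ace_ufe_rate(ufe_rate, ace_rate):
    """UFEs as a share of non-ace points, given both as shares of all points."""
    return ufe_rate / (1.0 - ace_rate)


def implied_ace_rate(ufe_rate_all, ufe_rate_non_ace):
    """Ace rate that would turn the all-points rate into the non-ace rate."""
    return 1.0 - ufe_rate_all / ufe_rate_non_ace


# US Open, full field: (UFE share of all points, UFE share of non-ace points)
us_open = {"ATP": (0.332, 0.361), "WTA": (0.397, 0.413)}

for tour, (all_pts, non_ace) in us_open.items():
    print(f"{tour}: implied ace rate {implied_ace_rate(all_pts, non_ace):.1%}")
```

On these numbers the men's implied ace rate is roughly double the women's, which is why removing aces narrows the gap rather than widening it.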

In most of these cases, we’re down to a couple of points per set.  If we were able to sort out service winners and perhaps forced error returns, we would almost surely see even more minor differences.

There’s no doubt that men hit harder serves and are, on average, more likely to win a point without having to hit a second ball.  But if we’re comparing the characteristics of men’s and women’s tennis, it doesn’t seem right to give the men credit for not hitting as many unforced errors when some of the already modest difference is due to the dominance of the serve.

Quibbles

This entire analysis depends on the unforced error stat, which I don’t much care for.  It is hugely dependent on the scorer, and there’s no widespread agreement in the sport on what exactly it means.

However, if we want to challenge a widely-held belief about unforced errors, there’s not really any way around using unforced errors, is there?

The best we can do to eliminate scorer’s biases is to compare only within single events.  The same person isn’t counting unforced errors at every US Open match, but each scorer probably works both men’s and women’s matches.  At a given venue, every scorer might even go through the same training program.

Even with that consideration, there is the strong possibility that scorers make adjustments–consciously or unconsciously–depending on the gender of players on court.  If unforced errors are shots that a player should have made but didn’t, a lot hinges on your interpretation of the word “should.”  It may be that some shots would be called unforced errors in a men’s match, but forced errors in a women’s match.  To the extent that’s the case, it’s awfully difficult to compare the genders using a stat that itself differs depending on gender.

On the other hand, scorers are presumably tennis fans, and they’ve heard the same conventional wisdom everyone else has.  If you believe that women hit more unforced errors than men do, perhaps you call borderline women’s shots unforced and borderline men’s shots forced.  In that case, scorers might be unwittingly amplifying the gender difference, not reducing it.

Given the difficulties of collecting data from hundreds of matches on different continents spread across many months, I doubt any non-automated method of counting unforced errors would address all of these issues.  For now, we have to take the official unforced error counts as the best available representation of reality and draw conclusions accordingly.

Whatever the limitations of the data, and whatever the other differences between the genders on a tennis court, unforced error counts are not nearly the distinguishing factor that they’ve been made out to be.

Analytics That Aren’t: Why I’m Not Excited about SAP in Tennis

It’s not analytics, it’s marketing.

The Grand Slams (with IBM) and now the WTA (with SAP) are claiming to deliver powerful analytics to tennis fans.  And it’s certainly true that IBM and SAP collect way more data than the tours would without them.  But what happens to that data?  What analytics do fans actually get?

Based on our experience after several years of IBM working with the Slams and Hawkeye operating at top tournaments, the answers aren’t very promising.  IBM tracks lots of interesting stats, makes some shiny graphs available during matches, and the end result of all this is … Keys to the Match?

Once matches are over and the performance of the Keys to the Match is (blessedly) forgotten, all that data goes into a black hole.

Here’s the message: IBM collects the data. IBM analyzes the data. IBM owns the data. IBM plasters their logo and their “Big Data” slogans all over anything that contains any part of the data. The tournaments and tours are complicit in this: IBM signs a big contract, makes their analytics part of their marketing, and the tournaments and tours consider it a big step forward for tennis analysis.

Sometimes, marketing-driven analytics can be fun.  It gives some fans what they want–counts of forehand winners, or average first-serve speeds. But let’s not fool ourselves. What IBM offers isn’t advancing our knowledge of tennis. In fact, it may be strengthening the same false beliefs that analytical work should be correcting.

SAP: Same Story (So Far)

Early evidence suggests that SAP, in its partnership with the WTA, will follow exactly the same model:

SAP will provide the media with insightful and easily consumable post-match notes which offer point-by-point analysis via a simple point tracker, highlight key events in the match, and compare previous head-to-head and 2013 season performance statistics.

“Easily consumable” is code for “we decide what the narratives are, and we come up with numbers to amplify those narratives.”

Narrative-driven analytics are just as bad as–and perhaps more insidious than–marketing-driven analytics, which are simply useless.  The amount of raw data generated in a tennis match is enormous, which is why TV broadcasts give us the same small tidbits of Hawkeye data: distance run during a point, average rally hit point, and so on.  So, under the weight of all those possibilities, why not just find the numbers that support the prevailing narrative? The media will cite those numbers, the fans will feel edified, and SAP will get its name dropped all over the place.

What we’re missing here is context.  Take this SAP-generated stat from a writeup on the WTA site:

The first promising sign for Sharapova against Kanepi was her rally hit point. Sharapova made contact with the ball 76% of the time behind the baseline compared to 89% for her opponent. It doesn’t matter so much what the percentage is – only that it is better than the person standing on the other side of the net.

Is that actually true? I don’t think anyone has ever published any research on whether rally hit point correlates with winning, though it seems sensible enough. In any case, these numbers are crying out for more context.  Is 76% good for Maria? How about keeping her opponent behind the baseline 89% of the time? Is the gap between 76% and 89% particularly large on the WTA? Does Maria’s rally hit point in one match tell us anything about her likely rally hit point in her next match?  After all, the article purports to offer “keys to match” for Maria against her next opponent, Serena Williams.

Here’s another one:

There is a lot to be said for winning the first point of your own service game and that rung true for Sharapova in her quarterfinal. When she won the opening point in 11 of her service games she went on to win nine of those games.

Is there any evidence that winning your first point is more valuable than, say, winning your second point?  Does Sharapova typically have a tough time winning her opening service point?  Is Kanepi a notably difficult returner on the deuce side, or early in games?  “There is a lot to be said” means, roughly, that “we hear this claim a lot, and SAP generated this stat.”

In any type of analytical work, context is everything.  Narrative-driven analytics strip out all context.

The alternative

IBM, SAP, and Hawkeye are tracking a huge amount of tennis data.  For the most part, the raw data is inaccessible to researchers.  The outsiders who are most likely to provide the context that tennis stats so desperately need just don’t have the tools to evaluate these narrative-driven offerings.

Other sporting organizations–notably Major League Baseball–make huge amounts of raw data available.  All this data makes fans more engaged, not less. It’s simply another way for the tours to get fans excited about the game. Statheads–and the lovely people who read their blogs–buy tickets too.

So, SAP, how about it?  Make your branded graphics for TV broadcasts. Provide your easily consumable stats for the media.  But while you’re at it, make your raw data available for independent researchers. That’s something we should all be able to get excited about.

The 2014 Coach Smackdown

On the heels of the announcement that Boris Becker will coach Novak Djokovic, today we learned that Stefan Edberg will be part of Roger Federer’s team for the first ten weeks of the season.  There will be more men’s Grand Slam champions in Australian Open coaching boxes than in the singles draw.

We’ve probably wrenched all possible commentary out of the head-to-head matchups of today’s slate of top players, so why not turn to their coaches instead?  Steve Tignor got us started:

I put together a list of 15 coaches and advisors, including Becker, Edberg, and Ivan Lendl, along with such names as Juan Carlos Ferrero, Goran Ivanisevic, and Michael Chang.  Many of them never played each other, since not all of their careers overlapped, but many of them did.

Becker, Edberg, and Lendl figure most prominently in these matchups, while Chang, Ivanisevic, and Sergi Bruguera also played plenty of matches against their fellow coaches.

Novak’s new coach barely edges out Andy Murray’s coach as the king of his generation of advisors.  His 66-38 record against these 14 colleagues is slightly better than Lendl’s 47-28.  In eight of ten head-to-heads, Becker came out even or better. But one of the two exceptions, as Tignor pointed out, is the matchup against Lendl, which the Czech leads 11-10.  If coaches can possibly accomplish such a thing, this pair might make Djokovic-Murray matches a little more interesting.

The other unfavorable head-to-head of Becker’s is my favorite quirky stat of the lot.  Twice in April 1993, when Becker was ranked fourth in the world, Franco Davin defeated him.  That’s a slightly better record for Davin than Juan Martin del Potro’s 3-11 record against Djokovic.

Here’s the whole set of head-to-heads.  Don’t worry–in a few days the regular season will be back in full swing.

New Ranking Maps and Charts

I’m excited to share with you a couple of new features I’ve been working on for TennisAbstract.com.

First is an interactive ranking map:

[Image: interactive ranking map]

The above map shows the geographic concentration of teenagers in the WTA top 1000.  Click through to the full-size map, and you can mouse over any country to find out how many players they have in that category.

More importantly, you can customize the map in a variety of ways.  Choose from either the ATP or WTA rankings, decide how deep you’d like to go in the rankings, and if you’d like, limit the age range.  It’s a great way to see which countries are most dominant on each tour, and it’s also an opportunity to visually investigate which nations are likely to hold that power in the near future.

Next is an interactive ranking history chart:

[Image: interactive ranking history chart]

This chart shows ranking points for the big four over the past three years.  Again, if you click through to the full-size chart, you’ll get more features: mouse over any line to see the date and the player’s ranking points on that date.

Like the map, the ranking chart is fully interactive.  You can select anywhere from one to four players–for now, only in the ATP top 100–choose a timeframe, and select either ranking or ranking points.

One option I want to call your attention to is one of the timeframes: “Year-end (by age).”  Here, instead of dates, the horizontal axis shows ages.  For instance, this graph shows the big four’s year-end rankings at each age.

Enjoy!

The 5 Biggest Comebacks of the 2013 ATP Season

Everybody loves a big comeback, but some of the best come-from-behind wins on the ATP tour this year were such unheralded matchups that they’ve already fallen out of the spotlight.  While everyone else ranks Nadal-Djokovic matches in their year-end lists, let’s look at the five matches in which the winner had to climb out of the biggest hole.

To do this, I ranked every match this season by Comeback Factor (CF), a stat that identifies the lowest ebb in the match for the eventual winner.  If a player breaks serve to open the match and sails to victory, his chance of winning never falls below 50%.  But if he goes down a set and a break, his odds fall much lower.  If the latter player comes back to win, his CF is much higher.
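The exact CF formula isn't spelled out here, so treat the following as a sketch of the underlying idea only: take the eventual winner's win probability after every point, find its minimum, and rank matches by how low that minimum went.  The match labels and probability series are invented for illustration.

```python
def lowest_ebb(winner_win_probs):
    """Smallest win probability the eventual winner had at any point."""
    return min(winner_win_probs)


def rank_comebacks(matches):
    """Match labels ordered from the deepest hole to the shallowest."""
    return sorted(matches, key=lambda label: lowest_ebb(matches[label]))


# HYPOTHETICAL point-by-point series for three matches (winner's perspective).
matches = {
    "Match A": [0.50, 0.35, 0.10, 0.002, 0.40, 1.0],  # faced match points
    "Match B": [0.50, 0.70, 0.90, 1.0],               # led from the start
    "Match C": [0.50, 0.20, 0.05, 0.60, 1.0],         # down a set and a break
}

for label in rank_comebacks(matches):
    print(label, f"{lowest_ebb(matches[label]):.1%}")
```

A player who leads from the first game never drops below his starting probability, so a match like "Match B" lands at the bottom of any such ranking.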

1. Indian Wells Masters R64: Gilles Simon d. Paolo Lorenzi 6-3 3-6 7-5 (win probability graph)

Lorenzi went up a double break in the final set by winning the first four games on the trot.  Simon held twice to force the Italian to serve for it at 5-2.  Lorenzi went up 40-15 in that service game, earning two match points, before losing four points in a row and dropping serve.  At 5-4, Simon broke him to 15, then broke again to love to seal the final set, 7-5.

At 5-2 40-15 in the 3rd set, Lorenzi’s chance of winning was about 99.8%, the highest recorded in a match this year by a player who didn’t end up winning.

http://www.youtube.com/watch?v=AZVObyMMKbs

2. Queen’s Club R64: Ivan Dodig d. James Ward 6-7(8) 7-6(2) 7-6(2) (win probability graph)

Dodig fought back from nearly the same hole that Simon found himself in, but did so in the second set instead of the third.  Ward won the first set in a tight tiebreak, then earned an early break in the second.  He held on until he served at 5-3, when he reached 40-15.  Dodig won the next four points to erase the break, improving his probability of winning from 0.5% to 21.1%.

Amazingly, the scenario repeated itself in the third set after Dodig won the second in a tiebreak.  Ward went up a break and served for the match again at 5-4, but failed to generate another match point.  The Croatian won a pair of points from 30-30 in that game, then sealed the match in yet another tiebreak.

http://www.youtube.com/watch?v=xvCuh0YvRow

Dodig wasn’t so lucky a couple of months later, when he nearly upset Juan Martin del Potro in Montreal.  In this year’s 7th-biggest comeback, Delpo came back from a double-break hole in the third set to deny Dodig a place in the third round.

3. Madrid Masters R64: Mikhail Youzhny d. Fabio Fognini 7-6(4) 2-6 7-6(5) (win probability graph)

Fognini never had the double break that led to such disaster for Lorenzi and Ward, but he did have something neither of those men did: a triple match point.  At 3-3 in the deciding set, Fognini broke the Russian then consolidated, leading to a chance to serve for the match at 5-4.  After winning his first three points for a 40-0 advantage, his win probability climbed as high as 99.1%.

It wouldn’t go any higher.  Youzhny won 12 of the next 13 points, breaking the Italian, holding his own serve to love, then earning two match points of his own on the Fognini serve before Fabio gathered himself sufficiently to force a tiebreak.  Fognini kept up his streakiness to the end, claiming a minibreak to open the tiebreak, dropping five points in a row, and fighting back to 5-5 before finally losing the match.

http://www.youtube.com/watch?v=v4N6ZILPb0k

4. Roland Garros R32: Tommy Robredo d. Gael Monfils 2-6 6-7(5) 6-2 7-6(3) 6-2 (win probability graph)

Monfils won the first two sets, which you would think put Robredo at enough of a disadvantage.  But the Spaniard’s lowest ebb didn’t come until the fourth set.  He lost serve in the seventh game, and after fighting off a match point at 3-5, he needed to break serve just to stay alive.

The Frenchman went up 40-15, earning two more match points and a win probability of 98.9%.  Robredo won four straight points to get back on serve, easily held, and even challenged Monfils’s own serve (to 0-30) before landing in a tiebreak.  He won that breaker and, compared to the fourth set, won the fifth with ease.

5. Australian Open QF: David Ferrer d. Nicolas Almagro 4-6 4-6 7-5 7-6(4) 6-2 (win probability graph)

After Robredo beat Monfils, he faced Almagro in the 4th round and Ferrer in the quarters.  Coincidentally, those are the two men who, at the Australian Open, gave 2013 its fifth-biggest comeback.

As Robredo did in his comeback, Ferrer dropped the first two sets.  Unlike his countryman, he found himself in the most danger in the third set.  Almagro broke in the seventh game of the third set and reached 5-4, an opportunity to serve for the match.  But here, history (or something) got in the way. Almagro reached his highest chance of winning, 98.7%, at 15-0, before Ferrer fought his way to 15-40.  Almagro got back to deuce, but Ferrer won the game.

Almagro earned more chances to serve for the match, but his odds of winning would never again be so high.  After breaking in yet another seventh game, Nico served for it at 5-4 and again at 6-5.  At 6-5, he reached 15-0 and a win probability of 97.4%, but from that point on, it was all Ferrer.

http://www.youtube.com/watch?v=im3sgZTVP0M

Match Charting Project: Update, Tutorial, Tracking, Tools

Since I announced the Match Charting Project last week, the response has been tremendous.  More than one thousand of you read the post, more than one hundred people downloaded the match charting spreadsheet, and several people have already charted matches, helping build what is already a very useful resource.

We’re nearing 100 charted matches.  Here’s the full list.  A couple of notable recent additions are this year’s Wimbledon men’s final (thanks Verity!), and the 2009 French Open match in which Soderling upset Nadal (thanks Amy!).

New spreadsheet version

I’ve added functionality to note serve-and-volley points, using the plus sign (“+”) after the serve notation.  (I’ve added a bit more detail in the instructions sheet to help explain it.)  It’s optional, but it would be very useful information to have, and if you want to track serve-and-volley attempts this way, you’ll need the newest version of the spreadsheet.  Download it by clicking on the link.

Match charting tutorial

To give you an idea of what match charting is all about, I recorded my screen while charting the first few games of a match.  While it’s not the most captivating entertainment, it demonstrates how I set up my screen, and it may help you make sense out of the notation system we’re using.

Tracking

I maintain two versions of the list of charted matches–by date, or by player. If you’d like to chart a match that isn’t on those lists and is more than a couple of weeks old, you can be almost certain that no one else is working on it. But if you’d like to do a current match, or you just want to make sure, email me to check before you begin. Once you’ve completed your first match, I’ll invite you to a Google doc where charters “claim” matches to avoid duplication.

Charting tools

Here are some tips and tricks that might help you chart a little more effectively.

I find it more convenient to watch video files that are stored on my hard drive–that way, I can work without an internet connection, or survive a weak wireless connection.  You can download YouTube videos using KeepVid, and you can download videos from many other sites with Jaksta.

Once you’ve downloaded a video file, I highly recommend using mplayer to view it.  The killer feature here is that it allows you to speed up or slow down playback.  When you’re starting out, you might want to go as slow as 50% or 60%.  As you get better, you can speed up.  Another great mplayer feature for charting purposes is the ability to skip forward or backward ten seconds or one minute.  It’s a very effective way to rewind and watch a point again if you missed it.  You can also quickly skip through changeovers, or even through long delays between points, if you’re charting that sort of player.

Finally, if you’re watching videos in fullscreen, you might want to try the 4t Tray Minimizer.  It allows you to pin any program on top, so for instance, if you want to watch TennisTV in fullscreen but keep the spreadsheet on top, it makes that possible.

If you have any questions or suggestions, please email me or leave them in the comments.  Thanks for all your interest so far!

The Luck of the Tiebreak, 2013 Edition

Another year, another new set of tiebreak masters.

Despite the conventional wisdom, very few players demonstrate any kind of consistent tiebreak skill over and above their regular, non-tiebreak tennis playing ability.  In other words, while someone like Novak Djokovic is bound to win well over half of the tiebreaks he plays–after all, he’s better than almost everyone he faces–there’s no secret sauce that allows him to win any more than his usual skill level would suggest.

Nowhere is this more evident than in this year’s top tiebreak performers.  For each player, I calculated the likelihood of winning each tiebreak he played this year, given his typical rates of serve and return points won.  Comparing those expectations to actual results gives us a ranked list of the players who most exceeded and most underperformed expectations.  At the top of the list are names like Roberto Bautista Agut, Dmitry Tursunov, Marin Cilic, and Leonardo Mayer.

Maybe Bautista Agut is a clutch monster just waiting for recognition, but it’s more likely he just had a few bounces go his way.  Cilic is an excellent example: While he won 34% more tiebreaks than expected this year (11, against an expected 8.2), 2013 was only the second season of the last six in which the Croat exceeded expectations in tiebreaks.  Whether tiebreak performance is clutch skill or simply luck, the numbers show that it isn’t persistent.
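For readers who want to experiment with this kind of expectation, here is a minimal sketch in Python.  It is not my actual algorithm.  It assumes every point is independent, and that a player wins a fixed share of points on his own serve (`p_serve`) and on his opponent’s serve (`p_return`).  Both of those names, and the function names, are made up for illustration.

```python
from functools import lru_cache


def tiebreak_win_prob(p_serve, p_return):
    """Chance that player A wins a first-to-7, win-by-2 tiebreak.

    Assumes independent points.  p_serve is A's chance of winning a point
    on his own serve, p_return his chance on the opponent's serve; both
    should be strictly between 0 and 1.  A serves the first point.
    """

    def a_is_serving(points_played):
        # Serve order in a tiebreak: A, B, B, A, A, B, B, ...
        return ((points_played + 1) // 2) % 2 == 0

    @lru_cache(maxsize=None)
    def win_from(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6, points come in pairs with one serve each.  A wins
            # the tiebreak by taking both points of a pair before B does.
            both = p_serve * p_return
            neither = (1 - p_serve) * (1 - p_return)
            return both / (both + neither)
        p = p_serve if a_is_serving(a + b) else p_return
        return p * win_from(a + 1, b) + (1 - p) * win_from(a, b + 1)

    return win_from(0, 0)


def expected_tiebreak_wins(tiebreaks):
    """Sum the win probabilities over (p_serve, p_return) pairs,
    one pair per tiebreak played."""
    return sum(tiebreak_win_prob(ps, pr) for ps, pr in tiebreaks)


# Hypothetical season: three tiebreaks against opponents of varying strength.
season = [(0.68, 0.36), (0.64, 0.33), (0.70, 0.40)]
print(round(expected_tiebreak_wins(season), 2))
```

Summing the per-tiebreak probabilities over a season gives an expected number of tiebreaks won, which can then be compared with the number a player actually won.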

However, as I’ve noted before, a very few players do consistently outperform tiebreak expectations.  They tend to be players who find themselves in tiebreaks often, and their success may be because they manage to maintain their serve at its usual level.

John Isner and Roger Federer are the usual suspects.  Isner won 20% more tiebreaks this year than expected, in line with his numbers in 2011 and 2012.  (In 2009 and 2010, he was even better.)  Federer beat expectations by 10%, avoiding his first neutral-or-worse season since 2003 by winning a pair of breakers against tough opponents at the Tour Finals in London.

With another year’s worth of data in the books, we can safely add one more active player to this elite group.  Rafael Nadal was fifth overall this year, winning 23% more tiebreaks than expected.  Nadal hovered around the neutral level until 2008, winning almost exactly as many breakers as his overall skill level would suggest.  But since then, he has had only good tiebreak seasons.  No other player besides Isner and Federer has posted more than four better-than-expected tiebreak seasons in the last six.

For the rest of the ATP, it’s best to look at these numbers as indexes of luck.  The men at the top will probably have to win more non-tiebreak sets next year to maintain their ranking, while the guys at the bottom can expect a modest boost with just a little less bad luck.  That is, unless they play too many tiebreaks against John Isner.

The complete list of 2013 tiebreak performance is below.  ‘TB’ is the number of tiebreaks played, ‘TBWon’ is the number won, and ‘TBExp’ is the number my algorithm expects the player to win.  ‘TBOE’ is “Tiebreaks Over Expectations,” the number of tiebreaks a player actually won minus the number my algorithm expects him to win.  ‘TBOR’ is a rate version of the same stat, calculated by dividing TBOE by the total number of tiebreaks played.  TBOE rewards players like Isner who play lots of tiebreaks and play them well, while TBOR identifies those who have been particularly lucky in whatever number of tiebreaks they contested.

Player                  TB  TBWon  TBExp  TBOE    TBOR  
Roberto Bautista Agut   21     16   10.3   5.7   27.0%  
Dmitry Tursunov         21     16   10.4   5.6   26.8%  
Marin Cilic             15     11    8.2   2.8   18.7%  
Leonardo Mayer          15      9    6.8   2.2   14.9%  
Rafael Nadal            25     18   14.6   3.4   13.6%  
Gilles Simon            25     16   12.7   3.3   13.0%  
Ivo Karlovic            29     18   14.8   3.2   11.1%  
John Isner              53     36   30.1   5.9   11.1%  
Andy Murray             23     16   13.5   2.5   11.0%  
Fabio Fognini           23     14   11.7   2.3   10.0%  
Juan Martin Del Potro   33     21   17.7   3.3   10.0%  
Benoit Paire            29     17   14.3   2.7    9.3%  
Philipp Kohlschreiber   33     19   15.9   3.1    9.3%  
Jerzy Janowicz          26     15   12.9   2.1    8.2%  
Jarkko Nieminen         27     14   11.9   2.1    7.9%  
Bernard Tomic           30     16   13.7   2.3    7.6%  
Julien Benneteau        24     14   12.4   1.6    6.9%  
Alexandr Dolgopolov     21     11    9.6   1.4    6.8%  
Ernests Gulbis          23     13   11.5   1.5    6.4%  
Tommy Haas              26     16   14.4   1.6    6.3%  
Jeremy Chardy           21     12   10.7   1.3    6.0%  
Roger Federer           25     15   13.6   1.4    5.4%  
Grega Zemlja            19     10    9.0   1.0    5.3%  
Feliciano Lopez         24     14   12.9   1.1    4.4%  
Jo Wilfried Tsonga      30     17   15.8   1.2    4.2%  
Ryan Harrison           15      7    6.4   0.6    4.1%  
Tommy Robredo           24     14   13.1   0.9    3.8%  
Novak Djokovic          28     19   17.9   1.1    3.8%  
Lleyton Hewitt          16      9    8.4   0.6    3.5%  
Daniel Brands           19     10    9.4   0.6    3.4%  
Fernando Verdasco       24     14   13.5   0.5    1.9%  
David Ferrer            21     12   11.8   0.2    1.0%  
Kei Nishikori           16      9    8.9   0.1    0.9%  
Martin Klizan           15      7    6.9   0.1    0.9%  
Kevin Anderson          35     19   19.1  -0.1   -0.2%  
Marinko Matosevic       16      9    9.1  -0.1   -0.4%  
Mikhail Youzhny         23     11   11.4  -0.4   -1.8%  
Milos Raonic            36     19   19.7  -0.7   -1.9%  
Sam Querrey             31     15   15.6  -0.6   -2.1%  
Stanislas Wawrinka      32     17   17.7  -0.7   -2.3%  
Florian Mayer           18      8    8.4  -0.4   -2.4%  
Gael Monfils            27     13   13.7  -0.7   -2.5%  
Igor Sijsling           19      9    9.5  -0.5   -2.6%  
Andreas Seppi           19      9    9.5  -0.5   -2.8%  
Denis Istomin           24     11   11.8  -0.8   -3.2%  
Richard Gasquet         29     15   16.0  -1.0   -3.4%  
Daniel Gimeno Traver    18      7    7.6  -0.6   -3.5%  
Vasek Pospisil          24     11   11.9  -0.9   -3.6%  
Tomas Berdych           34     17   18.6  -1.6   -4.7%  
Victor Hanescu          24     10   11.2  -1.2   -5.2%  
Ivan Dodig              27     12   13.5  -1.5   -5.7%  
Robin Haase             24     10   11.4  -1.4   -5.9%  
Albert Ramos            16      7    7.9  -0.9   -5.9%  
Benjamin Becker         18      7    8.1  -1.1   -5.9%  
Horacio Zeballos        20      7    8.2  -1.2   -6.2%  
Jurgen Melzer           19      8    9.4  -1.4   -7.4%  
Nicolas Almagro         34     17   19.5  -2.5   -7.5%  
Lukas Rosol             15      6    7.3  -1.3   -8.9%  
Evgeny Donskoy          17      6    7.7  -1.7  -10.2%  
Alejandro Falla         15      6    7.6  -1.6  -10.9%  
Grigor Dimitrov         22      9   11.5  -2.5  -11.4%  
Marcos Baghdatis        20      6    9.5  -3.5  -17.4%  
Carlos Berlocq          18      7   10.2  -3.2  -17.5%  
Juan Monaco             15      5    7.7  -2.7  -18.3%  
Janko Tipsarevic        19      5    8.7  -3.7  -19.5%  
Edouard Roger Vasselin  19      4    8.2  -4.2  -22.3%
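
The TBOE and TBOR columns above can be reproduced from the first three numeric columns.  Here is a short Python sketch; the function name is my own invention for this example, and because the table’s TBExp values are rounded to one decimal place, the results may differ from the published TBOR in the last digit.

```python
def tiebreaks_over_expectation(played, won, expected):
    """Return (TBOE, TBOR) for one player.

    TBOE = tiebreaks won minus tiebreaks expected.
    TBOR = TBOE divided by tiebreaks played.
    """
    if played <= 0:
        raise ValueError("played must be positive")
    tboe = won - expected
    tbor = tboe / played
    return tboe, tbor


# (player, TB, TBWon, TBExp), copied from the table above.
rows = [
    ("Roberto Bautista Agut", 21, 16, 10.3),
    ("John Isner", 53, 36, 30.1),
    ("Edouard Roger Vasselin", 19, 4, 8.2),
]

for player, played, won, expected in rows:
    tboe, tbor = tiebreaks_over_expectation(played, won, expected)
    print(f"{player:<24}{tboe:6.1f}{tbor:8.1%}")
```

Isner’s line shows why both stats are useful: his TBOE is the largest in the table because he played 53 tiebreaks, while his TBOR is well behind that of Bautista Agut, who played only 21.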

The Match Charting Project

Tennis needs better stats.  Now you can help.

Since the US Open, I’ve been developing a system to chart matches.  With a bit of practice, anyone can use this system to note the type and direction of every shot in a match–serve direction, return direction and depth, shot patterns, error types, error directions, and more.  A single charted match generates an enormous amount of data.

The true potential of match charting lies in the bigger picture.  So far, we have nearly 50 matches in the books–mostly from ATP events this fall.  Even with this relatively small subset of matches, I’ve been able to do some interesting research, such as analyzing how quickly Novak Djokovic can neutralize a server’s advantage, and evaluating the wisdom of the drop shot.

The more matches, the more players, the more surfaces, the better.  Want to join the fun?

I hope you do, and the off-season is a great time to start.  It will take you a couple of matches to get comfortable with the system, so charting recorded matches, with the ability to rewind and watch points multiple times, is the best way to get started.  There are hundreds, if not thousands, on YouTube, with plenty more available through other sources such as ESPN3 and TennisTV.

I’ve created an interactive spreadsheet to make the process as easy as possible. Download it here.  The fields highlighted in yellow are yours.  The first several rows are for general information about the match.  As you chart each point, the spreadsheet will automatically update the score and create an additional row for the next point.

Once you download and open the spreadsheet, click over to the “Instructions” tab.  There, you’ll find detailed instructions on the process.  It will take some time to understand all the details of how the system works, and then it will take you a match or two to get the hang of entering all that data.  Pretty soon, you’ll find that you’re comfortably charting points in real time.

In the next week or two, I’ll try to put together some additional training material.  However, if you’d like to get started right away, there’s nothing stopping you.  Once you finish charting a match, send the completed spreadsheet back to me (my email address is in the spreadsheet), and I’ll run it through my program to generate detailed stats for that match.

In addition to the interactive spreadsheet itself, you may find it helpful to see a couple of completed charted matches, perhaps following along while watching the matches:

(Sorry, those two YouTube videos have been removed due to copyright claims. You can still download the completed spreadsheets. At some point, I’ll try to find charted matches with YouTube videos that are unlikely to be taken down, and post those here instead.)

What I love about this project is that we don’t need thousands of matches for it all to be worthwhile.  (Though I won’t complain when we accumulate thousands of matches!)  Every charted match we can add to the database contributes to our understanding of those two players and professional tennis as a whole.

I sincerely hope you’ll contribute.

Update: I’ve posted a few updates, tips, and tools here.