May 2018 - Heavy Topspin

Podcast Episode 27: Roland Garros Preview

Episode 27 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, will help get you ready for the 2018 French Open. We talk through the men’s and women’s draws, with a focus on the unpredictability of the women’s field, the towering presence of Rafael Nadal in the men’s, and the big-name floaters lurking in both brackets.

We also touch on Mike Bryan’s new doubles partner, Serena/Venus in the women’s doubles, and the long-delayed suspension of Nicolas Kicker for match fixing. Thanks for listening!

(Note: this week’s episode is about 48 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index with links, thanks to FBITennis:

Roland Garros Women’s Draw (and Serena v Sharapova)	1:08
How much do head-to-head records matter?	5:09
Top Quarter Analysis: Simona Halep as a weak favorite	8:39
Second Quarter Analysis: Muguruza, Sleeper?	12:21
Third Quarter Analysis: Potential Ostapenko v Svitolina Quarterfinal	13:51
Fourth Quarter Analysis: Kvitova favorite, but Jeff has a “Super Dark Horse”	15:36
Jeff’s and Carl’s Picks on the Women’s Side	17:40
Roland Garros Men’s Draw: Nadal’s Tournament to Win	19:57
Do you take Kyle Edmund for your Fantasy Tennis Team?	24:56
Or Lucas Pouille?	27:44
Third Quarter Analysis: Djokovic’s quarter	29:22
Fourth Quarter Analysis: Zverev over Thiem, and can Wawrinka or Nishikori surprise?	31:55
Draw Luck on the Men’s Side	34:58
Men’s Doubles: Bryan/Bryan Streak Broken	37:19
Women’s Doubles: Williams Sisters	39:17
Kicker Kicked to the Curb; Match Fixing	40:48
On Demand Video of Roland Garros Qualies	45:38

Unseeded Serena and the Roland Garros Draw

In a wide-open women’s field at this year’s French Open, it seems fitting that one of the most dangerous players in the draw isn’t even seeded. Serena Williams has played only four matches–none of them on clay–since returning to tour after giving birth. As such, her official WTA ranking is No. 453, and her current match-play level is anyone’s guess.

Because her ranking is low, she needed to use the ‘special ranking’ rule to enter the tournament, and the rule doesn’t apply to seedings. (I’m not going to dive further into the debate about how the rule should work–I’ve written a lot about the rule in the past.) As an unseeded player, she could have drawn anyone in the first round; in that sense, she was a bit lucky to end up opposite another unseeded player, Kristyna Pliskova. Her wider draw section is manageable as well, with a likely second-round match against 17th seed Ashleigh Barty and a possible third-rounder with 11th seed Julia Goerges. If she makes it to the round of 16, we’ll probably be treated to a big-hitting contest between Serena and Karolina Pliskova or Maria Sharapova.

According to my Elo-based forecast, a best guess about the level of post-pregnancy Serena is that she’s the 7th best overall player in the field, and 9th best on clay. That gives her about a 40% chance of winning her first three matches and reaching the second week, a 6.2% chance of making it to the final, and a 3.1% chance of adding yet another major title to her haul.

What if she were seeded? Seeds are a clear advantage for players who receive them, as a seeding protects against facing other top contenders until later rounds. By simulating the tournament with Serena seeded, we can get a sense of how much the WTA’s rule (and the French Federation’s decision not to seed her) impacts her chances.
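
To make the idea of simulating a draw concrete, here is a minimal sketch rather than the forecast code itself. It assumes the standard Elo win-probability formula and a tiny eight-player field; the player names, ratings, and draw orders are all hypothetical and chosen only to show how bracket position changes a player's odds.

```python
import random


def win_prob(elo_a, elo_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def simulate(draw, ratings, target, n_sims=20000, seed=0):
    """Estimate target's chances of reaching the final and winning the title.

    draw is a list of names in bracket order (length a power of two):
    positions 0 and 1 meet in round one, 2 and 3 meet, and so on.
    """
    rng = random.Random(seed)
    finals = titles = 0
    for _ in range(n_sims):
        alive = list(draw)
        while len(alive) > 1:
            if len(alive) == 2 and target in alive:
                finals += 1
            # Pair off neighbours and keep one winner per match.
            alive = [
                a if rng.random() < win_prob(ratings[a], ratings[b]) else b
                for a, b in zip(alive[0::2], alive[1::2])
            ]
        if alive[0] == target:
            titles += 1
    return finals / n_sims, titles / n_sims


# Hypothetical field: one favorite, one contender, six lower-rated players.
ratings = {"Favorite": 2200, "Contender": 2000}
ratings.update({f"P{i}": 1700 for i in range(1, 7)})

# Same players, two draws: the contender meets the favorite in round one,
# or is placed in the opposite half (roughly what a seeding guarantees).
tough_draw = ["Contender", "Favorite", "P1", "P2", "P3", "P4", "P5", "P6"]
kind_draw = ["Contender", "P1", "P2", "P3", "Favorite", "P4", "P5", "P6"]

tough = simulate(tough_draw, ratings, "Contender")
kind = simulate(kind_draw, ratings, "Contender")
print("tough draw: final %.3f, title %.3f" % tough)
print("kind draw:  final %.3f, title %.3f" % kind)
```

A full-size version would use a 128-player draw and, for the unseeded scenario, re-randomize the bracket on every simulation.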

Seeded 7th: Let’s imagine a bizarre world in which my Elo ratings were used for tournament seedings. In that case, Serena would be seeded 7th, knocking Caroline Garcia down to 8th and sending current 32nd seed Alize Cornet into the unseeded pool. That would be a clear advantage: 50/50 odds of reaching the fourth round, a 9% chance of playing in the final, and a 4.4% shot at the title, compared to 3.1% in reality.

Seeded 1st: If seeds were assigned based on protected ranking, Serena would be the top seed. You can’t get much more of an advantage than that: The top seed is protected from playing either of the other top-four seeds until the semifinals, for instance. (It’s no insurance against a meeting with 28th seed Sharapova, but Serena, of all people, isn’t worried about that.) Moving from 7th to 1st would give her another boost, but it’s a modest one: As the top seed, her chances of sticking around for the second week would still be 50/50, with 10.1% and 4.7% odds of reaching the final and winning the title, respectively.

Here’s a summary of Serena’s chances in the various seeding scenarios. The final column is “expected points”–a weighted average of the number of WTA ranking points she is expected to collect, given her likelihood of reaching each round.

Scenario     R16  Final  Title  ExpPts  
Actual     39.8%   6.2%   3.1%     273  
Unseeded*  34.4%   6.2%   3.0%     259  
Seeded 7   50.3%   9.0%   4.4%     356  
Seeded 1   50.5%  10.1%   4.7%     371

* the ‘unseeded’ scenario represents Serena’s chances as an unseeded entrant, given a random draw. She got a little lucky, avoiding top players until the 4th round, though her chances of making the final end up the same.
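
As a rough illustration of the “expected points” column, the sketch below weights the points on offer for each finishing position by the chance of finishing there. The points table is my assumption of the 2018 WTA Grand Slam scale, and the intermediate round probabilities are hypothetical; only the R16, final, and title figures come from the table above.

```python
# Assumed 2018 WTA Grand Slam points for a player whose run ends in each round
# ("W" means she won the title).
ROUNDS = ["R128", "R64", "R32", "R16", "QF", "SF", "F", "W"]
POINTS = {"R128": 10, "R64": 70, "R32": 130, "R16": 240,
          "QF": 430, "SF": 780, "F": 1300, "W": 2000}


def expected_points(reach):
    """Weighted average of ranking points.

    reach maps each round to the probability of getting at least that far,
    so the chance of finishing exactly in a round is reach[round] minus
    the chance of reaching the next one.
    """
    total = 0.0
    for i, rnd in enumerate(ROUNDS):
        p_here = reach[rnd]
        p_next = reach[ROUNDS[i + 1]] if i + 1 < len(ROUNDS) else 0.0
        total += POINTS[rnd] * (p_here - p_next)
    return total


# R16, F and W are the "Actual" row; the other values are made up.
serena = {"R128": 1.0, "R64": 0.75, "R32": 0.55, "R16": 0.398,
          "QF": 0.22, "SF": 0.12, "F": 0.062, "W": 0.031}
print(round(expected_points(serena)))
```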

Seeds matter, though there’s only so much they can do. If Serena really is at a barely-top-ten level, she’s a long shot for the title regardless of whether there’s a number next to her name. If my model grossly underestimates her and she’s back at previous form–let’s not forget, she made the final the last time she played here, and won the title the year before that–then the rest of the field will once again look like a bunch of flies for her to swat away, regardless of which numbers they have next to their names.

Podcast Episode 26: Reasons for Optimism

Episode 26 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, starts with the superlative performances of Elina Svitolina, Maria Sharapova, Alexander Zverev, and Novak Djokovic in Rome, and considers how they cause us to revise our estimates of those players. For Djokovic, Sharapova, and Serena Williams, we compare their current Elo ratings–including penalties for time off–to our perceptions of their current levels.

We also talk about the pros and cons of transitioning to a “weak era,” as well as the potential role of fatigue when tomorrow’s opponents don’t get the same amount of rest today. Thanks for listening!

(Note: this week’s episode is about 62 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index with links, thanks to FBITennis:

Zverev v Nadal (Rome Final)	0:58
Is Nadal declining?	4:37
Zverev’s narrow margins (tiebreak success)	7:15
Nostalgia sidebar: Honoring Ferrer’s clay prowess	10:40
Svitolina’s tremendous success in finals	11:47
WTA Rome final takeaways, if any	15:11
Halep’s not-tremendous (recent) results in finals…and Wozniacki too	17:15
Is Novak back? ELO says he only sorta left.	22:34
Djokovic’s “good” losses	28:30
Roland Garros favorites; comparing conventional wisdom and analytics	30:04
A “weak era” is a good thing! Maybe.	35:13
Serena a true #11 on clay; Sharapova a true #9 on clay (ELO, adjusted)	44:02
Will Ostapenko defend her French Open title? “Shock heuristic” says no.	53:03
The effect of fatigue between matches	55:22

New Match Charting Project Excel Template

For the first time in more than two years, I’m ready to release a new, substantially improved version of the MatchChart Excel template, the platform on which volunteers log matches for the Match Charting Project.

New in version 0.2.0:

Color-coding of the players, as well as game-ending and set-ending rows, to make it easier to keep track of where you are in a match;
A new shot code for drop volleys, to differentiate them from other volleys;
Total points won and total points shown in the MatchStats tab;
Options to handle certain now-rare match formats, such as tiebreaks at 8-all (as in some 1970s Wimbledons) and matches with no tiebreaks at all.

If you’re already an experienced contributor, just click here to download the new version. Take a quick look through the Instructions tab, as I’ve highlighted the relevant changes.

If you’re new to the MCP, please take a look at my Quick Start Guide, after which you can give the new template a spin.

Handling Injuries and Absences With Tennis Elo

Italian translation at settesei.it

For the last year or so, every mention of my ATP and WTA Elo ratings has required some sort of caveat. Ratings don’t change while players are absent from the tour, so Serena Williams, Novak Djokovic, Andy Murray, Maria Sharapova, and Victoria Azarenka were all stuck at the top of their tour’s Elo rankings. When their layoffs started, they were among the best, and even a smattering of poor results (or a near season’s worth, in the case of Sharapova) isn’t enough to knock them too far down the list.

This is contrary to common sense, and it’s very different from how the official ATP and WTA rankings treat these players. Common sense says that returning players probably aren’t as good as they were before a long break. The official rankings are harsher, removing players entirely after a full year away from the tour. Serena probably isn’t the best player on tour right now (as Elo insisted during her time off), but she’s also much more of a threat than her WTA ranking of No. 454 implies. We must be able to do better.

Before we fix the Elo algorithm, let’s take a moment to consider what “better” means. Fans tend to get worked up about rankings and seedings, as if a number confers value on the player. The official rankings are, by design, backward-looking: They measure players based on their performance over the last 52 weeks, weighted by how the tour prioritizes events. (They are used in a forward-looking way, for tournament seedings, but the system is not designed to be predictive of future results.) In this way, the official rankings say, “this is how well she has played for the last year.” Whatever her ability or potential, Serena (along with Vika, Murray, and Djokovic) hasn’t posted many positive results this year, and her ranking reflects that.

Elo, on the other hand, is designed to be predictive. Out of necessity, it can only use past results, but it uses those results in a way to best estimate how well a player is competing right now–our best proxy for how someone will play tomorrow, or next week. Elo ratings–even the naive ones that said Serena and Novak are your current No. 1s–are considerably better at predicting match outcomes than are the official rankings. For my purposes, that’s the definition of “better”–ratings that offer more accurate forecasts and, by extension, the best approximation of each player’s level right now.

The time-off penalty

When players leave the tour for very long, they return–at least on average, and at least temporarily–at a lower level. I identified every layoff of eight weeks or longer in ATP history, taken by a player with an Elo rating of 1900 or above*. In their first matches back on tour, their pre-break Elo overestimated their chances of winning by about 25%. It varies a bit by the amount of time off: eight- to ten-week breaks resulted in an overestimation around 17%, while 30- to 52-week breaks meant Elo overestimated a player’s chances by nearly 50% upon return. There are exceptions to every rule, like Roger Federer at the 2017 Australian Open, and Rafael Nadal, who won 14 matches in a row after his two-month break this season, but in general, players are worse when they come back.

* I used the cutoff of 1900 because, below that level, some players are alternating between the ATP and Challenger tours. My Elo algorithm doesn’t include challenger results, so for lower-rated players, it’s not clear which timespans are breaks, and which are series of challenger events. Also, the eight-week threshold doesn’t count the offseason, so an eight-week layoff might really mean ~16 weeks between events, with the break including the offseason.

Translated into Elo terms, an eight-week break results in a drop of 100 Elo points, and a not-quite-one-year break, like Andy Murray’s current injury layoff, means a drop of 150 points. Making that adjustment results in an immediate improvement in Elo’s predictiveness for the first match after a layoff, and a small improvement in predictiveness for the first 20 matches after a break.
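
A minimal sketch of such a penalty, using only the two anchor points given here (about 100 points at eight weeks, about 150 at just under a year). The straight-line interpolation between them and the cap at 52 weeks are assumptions made for illustration, not the actual curve.

```python
def layoff_penalty(weeks_off):
    """Elo points to subtract after a layoff of the given length.

    Anchors from the text: 8 weeks -> 100 points, ~52 weeks -> 150 points.
    Linear interpolation and the 52-week cap are assumptions.
    """
    if weeks_off < 8:
        return 0.0  # shorter absences are not treated as layoffs
    capped = min(weeks_off, 52)
    return 100.0 + 50.0 * (capped - 8) / (52 - 8)


for weeks in (4, 8, 26, 52):
    print(weeks, round(layoff_penalty(weeks), 1))
```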

Incorporating uncertainty

Elo is designed to always provide a “best estimate”–when a player is new on tour, we give him a provisional rating of 1500, and then adjust the rating after each match, depending on the result, the quality of the opponent, and how many matches our player has contested. That provisional 1500 is a completely ignorant guess, so the first adjustment is a big one. Over time, the size of a player’s Elo adjustments goes down, because we learn more about him. If a player loses his first-ever match to Joao Sousa, the only information we have is that he’s probably not as good as Sousa, so we subtract a lot of points. If Alexander Zverev loses to Sousa after more than 150 career matches, including dozens of wins over superior players, we’ll still dock Zverev a few points, but not as many, because we know so much more about him.
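
The shrinking adjustment can be sketched as follows. The expected-score formula is standard Elo; the particular k-factor curve and its constants are assumptions chosen for illustration, as are the ratings in the example.

```python
def expected_score(r_player, r_opp):
    """Standard Elo expectation that the player beats the opponent."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_player) / 400.0))


def k_factor(matches_played, k0=250.0, offset=5.0, shape=0.4):
    """Multiplier that shrinks as a player's match count grows.

    The functional form and constants are assumed, not taken from the post.
    """
    return k0 / (matches_played + offset) ** shape


def update_rating(r_player, r_opp, won, matches_played):
    """Return the player's new rating after one match."""
    result = 1.0 if won else 0.0
    return r_player + k_factor(matches_played) * (result - expected_score(r_player, r_opp))


# Hypothetical ratings: the same loss costs a newcomer far more than a veteran.
opponent = 1700
print(update_rating(1500, opponent, False, matches_played=0))
print(update_rating(1500, opponent, False, matches_played=150))
```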

But after a layoff, we are a bit less certain that what we knew about a player is still relevant. Djokovic is a great example right now. If he lost six out of nine matches (as he did between the Australian Open fourth round and Madrid) without missing any time beforehand, we’d know it was a slump, but most of us would expect him to snap out of it. Elo would reduce his rating, but he’d remain near the top. Since he missed the second half of last season, however, we’re more skeptical–perhaps he’ll never return to his former level. Other cases are even more clear-cut, as when a player returns from injury without being fully healed.

Thus, after a layoff, it makes sense to alter how much we adjust a player’s Elo ratings. This isn’t a new idea–it’s the core concept behind Glicko, another chess rating system that expands on Elo. Over the years, I’ve tinkered with Glicko quite a bit, looking for improvements that apply to tennis, without much success. Changing the multiplier that determines rating adjustments (known as the k factor) doesn’t improve the predictiveness of tennis Elo on its own, but combined with the post-layoff penalties I described above, it helps a bit.

The nitty-gritty: After a layoff, I increase the multiplier by a factor of 1.5, and then gradually reduce it back to 1x over the next 20 matches. The flexible multiplier slightly improves the accuracy of Elo ratings for those 20 matches, though the difference is minor compared to the effect of the initial penalty.
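
In code, the flexible multiplier might look like the sketch below. The 1.5x starting point and the 20-match window come from the description above; the linear shape of the decay is an assumption.

```python
def layoff_multiplier(matches_since_return, boost=1.5, window=20):
    """Factor applied to the usual k factor after a layoff.

    Starts at `boost` for the first match back and falls to 1.0 once
    `window` matches have been played. Linear decay is assumed.
    """
    if matches_since_return >= window:
        return 1.0
    return boost - (boost - 1.0) * matches_since_return / window


for n in (0, 5, 10, 20, 40):
    print(n, layoff_multiplier(n))
```

The returned factor would simply scale whatever k factor the rating update already uses for that player.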

No more caveats*

* I thought it would be funny to put an asterisk after “no more caveats.”

Post-layoff penalties and flexible multipliers end up bringing down the current Elo ratings of the players who are in the middle of long breaks or have recently come back from them, giving us ranking tables that come closer to what we expect–and should do a better job of predicting the outcome of upcoming matches. These changes to the algorithm also have minor effects on the ratings of other players, because everyone’s rating depends on the rating of all of his or her opponents. So Taro Daniel’s Elo bounce from defeating Djokovic in Indian Wells doesn’t look quite as good as it did before I implemented the penalty.

On the ATP side, the new algorithm knocks Djokovic down to 3rd in overall Elo, Murray to 6th, Jo-Wilfried Tsonga to 21st, and Stan Wawrinka to 24th. That’s still quite high for Novak considering what we’ve seen this year, but remember that the Elo algorithm only knows about his on-court performances: A six-month break followed by a half-dozen disappointing losses. The overall effect is about a 200-point drop from his pre-layoff level; the “problem” is that his Elo a year ago reflected how jaw-droppingly good he had recently been.

The WTA results match my intuition even better than I hoped. Serena falls to 7th, Sharapova to 18th, and Azarenka to 23rd. Because of the flexible multiplier, a few early wins for Williams will send her quickly back up the rankings. Like Djokovic, she rates so high in part because of her stratospheric Elo rating before her time off. For her part, Sharapova still rates higher by Elo than she does in the official rankings. Despite the penalty for her one-year drug suspension, the algorithm still treats her prior success as relevant, even if that relevance fades a bit more every week.

Elo is always an approximation, and given the wide range of causes that will sideline a player, not to mention the spectrum of strategies for returning to the tour, any rating/forecasting system is going to have a harder time with players in that situation. That said, these improvements give us Elo ratings that do a better job of representing the current level of players who have missed time, and they will allow us to make superior predictions about matches and tournaments involving those players.

Under the hood

If you’re interested in some technical details, keep reading.

Before making these adjustments, the Brier score for Elo-based predictions of all ATP matches since 1972 was about 0.20. For all matches that involved at least one player with an Elo of 1900 or better, it was 0.17. (Not only are 1900+ players better, their ratings tend to be based on more data, which at least partly explains why the predictions are better. The lower the Brier score, the better.)
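
For readers unfamiliar with the metric, the Brier score is the mean squared difference between the forecast probability and what actually happened (1 for a win, 0 for a loss). A minimal sketch, with made-up forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical forecasts for one player's win chances in four matches,
# followed by whether that player actually won.
forecasts = [0.80, 0.65, 0.55, 0.30]
outcomes = [1, 1, 0, 0]
print(brier_score(forecasts, outcomes))
```

A forecaster who says 50% for every match scores exactly 0.25, which is a useful baseline for reading the figures here.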

For the population of about 500 “first matches” after layoffs for qualifying players, the Brier score before these changes was 0.192. After implementing the penalty, it improved to 0.173.

For the 2nd through 20th post-comeback matches, the Brier score for the original algorithm was 0.195. After adding the penalty, it was 0.191, and after making the multiplier flexible, it fell a bit more to 0.190. (Additional increases to the post-layoff multiplier had negative results, pushing the Brier score back to about 0.195 when the 2nd-match multiplier was 2x.) I realize that’s a tiny change, and it very possibly won’t hold up in the future. But in looking at various notable players over the course of their comebacks, that’s the option that generated results that looked the most intuitively accurate. Since my intuition matched the best Brier score (however minuscule the difference), it seems like the best option.

Finally, a note on players with multiple layoffs. If someone misses six months, plays a few matches, then misses another two months, it doesn’t seem right to apply the penalty twice. There aren’t a lot of instances to use for testing, but the limited sample confirms this. My solution: If the second layoff is within two years of the previous comeback, combine the length of the two layoffs (here: eight months), find the penalty for a break of that length, and then apply the difference between that penalty and the previous one. Usually, that results in second-layoff penalties of between 10 and 50 points.
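
The combined-layoff rule can be sketched like this. The penalty curve is the same assumed linear interpolation between 100 points at eight weeks and 150 at 52 weeks; the function and parameter names are hypothetical.

```python
def layoff_penalty(weeks_off):
    """Assumed penalty curve: 100 points at 8 weeks, rising linearly to 150 at 52."""
    if weeks_off < 8:
        return 0.0
    capped = min(weeks_off, 52)
    return 100.0 + 50.0 * (capped - 8) / (52 - 8)


def second_layoff_penalty(first_weeks, second_weeks, years_since_comeback):
    """Penalty for a second layoff that follows an earlier one.

    Within two years of the previous comeback, charge only the difference
    between the penalty for the combined absence and the one already applied.
    Otherwise treat the second layoff on its own.
    """
    if years_since_comeback > 2:
        return layoff_penalty(second_weeks)
    combined = layoff_penalty(first_weeks + second_weeks)
    return max(0.0, combined - layoff_penalty(first_weeks))


# Roughly the example above: six months off (26 weeks), a few matches,
# then another two months (9 weeks).
print(round(second_layoff_penalty(26, 9, years_since_comeback=0.1), 1))
```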

Podcast Episode 25: Number Ones, Present and Future

Episode 25 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, recaps the week’s events in Madrid, and starts with a dissection of the top clay-court contenders in the women’s game. We consider what it means for Simona Halep to be a “weak” No. 1, compare the present era to other periods in WTA, and evaluate the promise of some of Halep’s main competition, including Karolina Pliskova and Petra Kvitova.

On the men’s side, we look at the latest triumph for Alexander Zverev, the continuing clay-court prowess of Dominic Thiem (and a couple of other guys with one-handed backhands), and issue another set of optimistic forecasts for Juan Martin del Potro and Fabio Fognini.

Thanks for listening!

(Note: this week’s episode is about 1 hour, 12 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index, thanks to FBITennis:

Is Halep a weak #1?	1:04
Should Halep take more risks on her serve?	12:54
Ka. Pliskova’s ability to dominate for an extended period	17:19
Kvitova’s shot at #1 this year	20:27
Historical fluctuations in WTA #1s	23:14
Serena’s impending return	26:13
Can you still “play your way into” WTA grand slams?	30:42
Seeding Serena at Wimbledon	33:43
Adapting ELO to account for long layoffs (see link for more)	37:05
Sasha Zverev and Thiem on clay	45:00
Return of the One-Handed Backhand (on clay)	51:48
Shapovalov finds a second dimension (maybe)	55:46
Unusual Head-to-Heads on the ATP Tour	57:48
Favorites in Rome	1:00:53
Match Charting Project Milestones	1:07:54

The Last 156 Men’s Grand Slam Finals

I’m proud to report a big new milestone for the Match Charting Project! We’ve completed the set of men’s Grand Slam finals back to 1980, something that I’ve aspired to since the early days of the project. It has drawn on a lot of effort from many contributors, especially Edo, who is responsible for a huge part of this accomplishment.

Here’s the complete list of charted slam finals, with links to the shot-by-shot data for each match.

From 1980 to this year’s Australian Open, that’s 152 consecutive men’s finals. I went on a bit of a spree last week, which extended the set back to 1979 and upped the total to 156. We’ve got a few earlier slam finals in the database as well, though there’s a limit to how much more we’ll be able to achieve: Before the late 1970s, video quality and availability decrease sharply.

For researchers, as well as those interested in tennis history, this is valuable stuff, made even more useful by its completeness. With the exception of a handful of missing points here and there, the Match Charting Project now includes a wealth of data for the entirety of all of these matches: serve direction, shot types, strategic choices (like serve-and-volleying), and much more, all in a standard format.

It’s particularly satisfying to check off the last few items on a list. (In this case, the final missing pieces were 1987 Roland Garros and the 1981 Australian Open.) Even though 156 matches is a small fraction of the nearly 4,000 contests tracked as part of the MCP, the subset’s completeness means that we can study it without worrying about the non-random nature of video availability and fan interest. If you want to look into, for instance, how net play has changed at Wimbledon over the last four decades, we’ve got the entire run.

In that vein, we are working on several other noteworthy subsets: Masters 1000 finals, 2018 tour-level finals, meetings between members of the big four, and finals played by members of the big four, among others.

We’re getting close to the complete run of women’s slam finals, as well. We’re up to 137 of the 152 since 1980, and have them all from 1999-present. We haven’t been able to find video for the rest, most notably the 1998 US Open (Davenport-Hingis), 1994 Roland Garros (Graf-Pierce), 1994 US Open (Sanchez-Graf), and 1991 Australian Open (Seles-Novotna). The complete list is here, and the remainder date from 1980-86. If you can help us find any of these, please let me know!

As always, if you find this project interesting, please contribute. Our 2.3 million shots worth of detailed data didn’t appear by magic–we rely on volunteers to chart matches, and I hope you’ll join our ranks. Here’s why I think you should, and here’s how you can get started.

The Unique Late-Career Surge of Mihaela Buzarnescu

The newest member of the WTA top 32 got there the hard way. Mihaela Buzarnescu, who achieved her latest career-high ranking with a run to the final of last week’s Prague event, where she lost a three-setter to Petra Kvitova, made her professional debut 14 years ago. Despite a dose of junior success, including a junior doubles title at the 2006 US Open, she didn’t crack the top 100 until last October.

This isn’t how tennis career trajectories are supposed to work. Yes, the game is getting older and stars are extending their careers, but Buzarnescu’s year-long winning spree, in which she has climbed from outside the top 400 to inside the top 40, began after her 29th birthday. The closer we look at what the Romanian has achieved, and the age at which she’s doing so, the more unusual it appears.

The oldest top 100 debuts

Since the beginning of the 1987 season, 630 women have debuted in the top 100. Their average age, on the Monday they reached the ranking threshold, is just under 20 years and 6 months. Only 29 of the 630–less than five percent–broke into the top 100 after their 26th birthday.

Only 14 players did so after turning 27:

Player                 Debut  Age (Y)  Age (D)  Peak Rank  
Tzipi Obziler       20070219       33      306         75  
A. Villagran Reami  19880801       31      359         99  
Mihaela Buzarnescu  20171016       29      165         32  
Julie Ditty         20071105       28      305         89  
Eva Bes Ostariz     20010716       28      183         90  
Mashona Washington  20040719       28       49         50  
Maureen Drake       19990201       27      317         47  
Tatjana Maria       20150406       27      241         46  
Hana Sromova        20051107       27      211         87  
Laura Siegemund     20150914       27      193         27  
Flora Perfetti      19960708       27      160         54  
Louise Allen        19890227       27       51         83  
Kristina Barrois    20081020       27       20         57  
Iryna Bremond       20111017       27       11         93

Buzarnescu doesn’t quite top this list, but she is certainly a more consequential force on tour than either of the women who debuted at a more advanced age. Tzipi Obziler fought her way through the lower levels of the game for just as long as Buzarnescu did, though she never cracked the top 70. Adriana Villagran Reami played a limited schedule; she may have had the skills to play top-100 tennis long before the ranking table made it official, but she was never a tour regular.

The most comparable player to Buzarnescu is Laura Siegemund, who reached a double-digit ranking a few years ago, and has since climbed as high as No. 27. Of the oldest top-100 debutants, though, very few have continued to ascend the rankings as far as Buzarnescu and Siegemund have.

Here are the oldest top-100 debuts of players who went on to crack the top 32:

Player                      Debut  Age (Y)  Age (D)  Peak  
Mihaela Buzarnescu       20171016       29      165    32  
Laura Siegemund          20150914       27      193    27  
Sybille Bammer           20050822       25      117    19  
Shinobu Asagoe           20000710       24       12    21  
Manon Bollegraf          19880215       23      310    29  
Johanna Konta            20140623       23       37     4  
Anne Kremer              19981019       23        2    18  
Lesia Tsurenko           20120528       22      364    29  
Kveta Peschke            19980420       22      286    26  
Petra Cetkovska          20071022       22      256    25  
Tathiana Garbin          20000214       22      229    22  
Li Na                    20041004       22      221     2  
Mara Santangelo          20040202       22      219    27  
Ginger Helgeson Nielsen  19910325       22      192    29  
Casey Dellacqua          20070806       22      176    26

Here’s an indication of just how young women’s tennis is: The 9th-oldest top-100 debutant on this list achieved her feat before her 23rd birthday. Put another way: Of the 107 women to break into the top 100 after their 23rd birthday, only eight went on to a ranking of No. 32 or better. By comparison, about one-third of all top-100 players peak at a ranking in the top 32. In this category, Buzarnescu is charting entirely new territory.

Making up for lost time

The last six months or so have been a whirlwind for the Romanian, as she has gone from a fringe tour player that no one had ever heard of, to a solid tour regular that … well, most fans still don’t know much about. Many players need some time to adjust to the higher level of competition and spend months, even years, stagnating in the rankings. Buzarnescu, on the other hand, has barely stopped to take a breath.

It took 203 days from her top-100 debut last October to her latest career-high at No. 32 on Monday. Siegemund, by comparison, needed 315 days; Sybille Bammer took 574 days; Roberta Vinci, who eventually cracked the top ten, required 2,520 days, or nearly seven years. The average player who reaches the top 32 needs two and a half years between her first appearance in the top 100 and clearing the higher bar.
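
The ascent figure is plain date arithmetic on ranking dates. A small sketch: the debut date is the one in the tables above, while the career-high date of May 7, 2018 is my assumption for the Monday after the Prague final.

```python
from datetime import date, datetime


def parse_ranking_date(yyyymmdd):
    """Parse the YYYYMMDD strings used in the Debut column."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def ascent_days(debut, top32_date):
    """Days between a top-100 debut and the first top-32 ranking."""
    return (top32_date - debut).days


debut = parse_ranking_date("20171016")   # Buzarnescu's debut, from the table
career_high = date(2018, 5, 7)           # assumed ranking date
print(ascent_days(debut, career_high))   # 203
```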

Buzarnescu’s climb doesn’t fit the mold of older debuts. Her climb has more in common with those of teenage sensations. Again since 1987, here are the 20 quickest ascents:

Player              Age (Y)  Age (D)  Peak  Ascent Days  
Jennifer Capriati        14       11     1            0  
Anke Huber               15      266     4           49  
Agnes Szavay             18      164    13           77  
Lindsay Davenport        16      238     1          112  
Naoko Sawamatsu          17       31    14          119  
Clarisa Fernandez        20      265    26          133  
Maria Sharapova          16       58     1          133  
Serena Williams          16       52     1          133  
Miriam Oremans           20      145    25          140  
Venus Williams           16      301     1          147  
Sofia Arvidsson          21      223    29          154  
Leila Meskhi             19      308    12          168  
Tatiana Golovin          16       22    12          175  
Eugenie Bouchard         19       42     5          189  
Martina Hingis           14       31     1          189  
Ana Ivanovic             16      361     1          196  
Conchita Martinez        16      107     2          196  
Mihaela Buzarnescu       29      165    32          203  
Darya Kasatkina          18      137    11          203  
Ashleigh Barty           20      316    16          210

The player Buzarnescu knocked out of this top 20: Kim Clijsters. Buzarnescu is the only woman on the list to have cracked the top 100 after her 22nd birthday, yet here she is, climbing from No. 101 to No. 32 in less time than 92% of her peers.

Common sense suggests that Buzarnescu can climb only so much higher: Most players don’t set new career highs in their 30s, especially those who have such a short track record of tour-level success. On the other hand, she has adapted quickly, recording her first top ten win, over Jelena Ostapenko, in February and taking a set from Kvitova in Saturday’s final.

What’s more, she’ll reap the benefits of seeds at many events, probably including Roland Garros and Wimbledon. Having proven that she can defeat top 50 players–she holds a 6-7 career record against them–her new status as a top-32 player means she’ll get plenty of opportunities to rack up points against a less-daunting brand of competition. After more than a decade of fighting steeply uphill battles, she has finally–improbably–earned a place among the game’s elite. Now all she has to do is keep winning.

Podcast Episode 24: Figuring Out Who Figures Out Clay

Episode 24 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, is all about clay. We use last week’s results as a starting point, talking up the runs of Frances Tiafoe, Alexander Zverev, Hyeon Chung, Elise Mertens, and Mihaela Buzarnescu, and discuss the various factors–from teenage training to split-second anticipation–that translate into better results on dirt.

Thanks for listening!

(Note: this week’s episode is about 1 hour, 13 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index, thanks to FBITennis:

Frances Tiafoe’s Success on Clay	1:00
Resistance of Some Players to Clay Events	6:24
Tiafoe’s Unorthodox Style	15:36
Hyeon Chung on Clay	25:14
Alexander Zverev’s Long-Term Outlook	32:18
WTA Saturday to Sunday Turnarounds	37:12
Elise Mertens and Scheduling Prowess	44:38
Disparity between Hard/Clay Results for Kiki Bertens	47:00
Drop Shots as a Weapon on Clay	50:15
Hidden Tangibles: Footwork and Anticipation	53:20
Evaluating Tennis Prospects	1:00:50
MCP Data: Bertens’ Drop Shots	1:08:25
Ways to Measure Scheduling Prowess	1:09:05
Carl’s Lightning Round	1:11:00

Rafael Nadal and the Greatest Single-Tournament Performances

Italian translation at settesei.it

In the last two weeks, Rafael Nadal won his 11th title at both Monte Carlo and Barcelona. His career records at those two events, along with his ten Roland Garros championships, reflect a level of dominance never before seen on a single surface. They have to be considered among the greatest achievements in tennis history, and perhaps in all of sport.

The tennis fan in me is content to speculate about whether anyone will ever stop him. The analyst wants to dig deeper: Has Nadal’s performance at one of the tournaments been even better than the rest? How do these single-event records compare to other exploits, such as Roger Federer’s trophy haul at Wimbledon, or Bjorn Borg’s nearly-undefeated career at the French Open?

Barcelona by the numbers

Let’s start with Barcelona. Since 2005–we’ll ignore his 2003 appearance as a 16-year-old wild card–he has played the event 13 times, winning 11 of them. That’s a won-loss record of 57-2.

Usually, I would calculate the probability of a player winning so many tournaments in that many chances, then come up with a tiny percentage that would represent his odds of achieving such a feat. That would miss the mark here. Instead, I want to look at the problem from the opposite perspective: In order to win so many titles, how good must Nadal be?

We already know that Rafa is the best of all time on clay, in general. Using the Elo rating system, his peak surface-specific rating–that is, Elo calculated using only results on clay courts–is over 2,500, better than anyone else on clay … or anyone else on any surface. (Nadal’s current clay-specific Elo is around 2,400, and the closest things he has to rivals on the surface right now, Dominic Thiem and Kei Nishikori, sit at about 2,190 and 2,150. Stefanos Tsitsipas’s rating is 1,865.) Since Rafa has posted his best results at these three events, it stands to reason that his tournament-specific levels are even higher.

Here, then, is the method we can use to figure that out. First, for each year he entered Barcelona, determine his path to the title. (For the 11 titles, that’s easy; for the other two, we use the players he would have faced had he kept winning.) Using each opponent’s clay court Elo rating at the time of the match, we can determine the odds that various hypothetical (and dominant) players would have progressed through the draw and won the title.

Here is Nadal’s path to the 2018 title, showing each player’s pre-match clay court Elo*, along with the odds that Rafa (given his own current rating) would beat him:

Round  Opponent                 Opp Elo  p(Rafa W)  
R32    Roberto Carballes Baena     1767      97.3%  
R16    Guillermo Garcia Lopez      1769      97.2%  
QF     Martin Klizan               1894      94.5%  
SF     David Goffin                2079      84.5%  
F      Stefanos Tsitsipas          1900      94.3%

* from this point on, the clay court Elos I use are 50/50 blends of clay-specific Elo–that is, a rating calculated only with clay court results–and overall Elo. The blended rating is the one that has proven best at predicting match outcomes. Nadal is the all-time leader in this category as well, with a 50/50 clay Elo that peaked around 2,510.

Given those five single-match probabilities, the odds that Nadal would win the tournament were just over 70%. That’s dominant, but it’s not 11-out-of-13 dominant.
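The arithmetic behind that figure is simply the product of the five single-match probabilities in the table, which treats each match as an independent event. A minimal Python sketch (the variable names are illustrative, not taken from the original analysis):

```python
from math import prod

# Single-match win probabilities from the 2018 Barcelona table above,
# in round order: R32, R16, QF, SF, F.
match_probs = [0.973, 0.972, 0.945, 0.845, 0.943]

# Assumption: matches are independent, so the chance of winning all
# five is the product of the individual probabilities.
title_prob = prod(match_probs)

print(f"p(title) = {title_prob:.1%}")  # p(title) = 71.2%
```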

What if Rafa were underrated by Elo, at least in Barcelona? Here is the probability that a player at various Elo ratings would have beaten the five opponents that he faced last week:

Clay Elo  p(2018 Title)  
2200              41.2%  
2250              50.4%  
2300              59.1%  
2350              66.9%  
2400              73.6%  
2450              79.3%  
2500              83.9%  
2550              87.6%  
2600              90.5%
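
These figures can be reproduced, to within rounding, with the standard Elo expected-score formula (a logistic curve on a 400-point scale) applied to the five opponent ratings listed earlier. The sketch below assumes that formula and assumes the matches are independent; the function names are illustrative, and this is not necessarily the exact code behind the table.

```python
# Blended clay Elo ratings of Nadal's five 2018 Barcelona opponents,
# taken from the path table above.
OPPONENT_ELOS = [1767, 1769, 1894, 2079, 1900]

def elo_win_prob(rating, opp_rating):
    """Standard Elo expected score: logistic curve on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def title_prob(rating, opponent_elos):
    """Chance of beating every opponent in turn, assuming independence."""
    p = 1.0
    for opp in opponent_elos:
        p *= elo_win_prob(rating, opp)
    return p

# Print one row per hypothetical rating, 2200 through 2600.
for rating in range(2200, 2601, 50):
    print(f"{rating}  {title_prob(rating, OPPONENT_ELOS):6.1%}")
```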

It turns out that this year’s title path was one of the weakest since 2005. It is roughly equivalent to the players Nadal needed to defeat in 2006 (with Nicolas Almagro in the semis and Tommy Robredo in the final), and a bit tougher than last year’s route, which didn’t feature a top-50 player until Thiem in the final. The toughest was his hypothetical path in 2015, when he lost to Fabio Fognini in the second round. Had he progressed, he would have faced David Ferrer in the semis and Nishikori in the final.

Once we figure out the quality of Rafa’s opponents (and would-have-been opponents, for the two years he lost early), we can work out the odds that any player–given those paths–would have won the tournament each year.

If we assume that Rafa’s average level since 2005 is the same as his current level–a clay Elo of around 2,400–the odds that he would have won 11 Barcelona titles in 13 tries are 13.0%. We don’t have the luxury of replaying those 13 tournaments in a few thousand alternate universes, so it’s not entirely clear what to make of that number–was Rafa lucky? would he do it again, given the chance? is he actually way better than an Elo level of 2,400 in Barcelona?

I don’t know the answer to those questions; all we know is what happened. To compare (un)decimas (and related accomplishments by other players), we’re going to look at the Elo level that would have resulted in the achievement at least 50% of the time. In other words, how good would Nadal have to have been to give himself a 50/50 chance at winning 11 Barcelona titles in 13 tries?

At various clay Elo levels, here are the odds that Rafa would have completed the Barcelona undécima:

Clay Elo  p(11 of 13)  
2300             1.0%  
2350             4.6%  
2400            13.0%  
2450            28.0%  
2500            47.2%  
2550            64.2%  
2600            77.7%  
2650            87.3%  
2700            93.1%

Thus, a player with a clay Elo of about 2,505 would have had a 50% chance of matching Nadal’s feat at his home tournament. To put it another way: At this event, over a span of 14 years, he has played at a level roughly equal to his career peak, which, incidentally, is the best clay Elo rating ever achieved by an ATP player.
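
One way to sketch the whole procedure in Python: compute a title probability for each year's path, combine the 13 yearly probabilities into the chance of at least 11 titles, then bisect on rating until that chance hits 50%. Several things here are assumptions rather than details from the post: the standard 400-point Elo formula, independence between matches and between years, and the "at least 11" reading (the table values keep rising with rating, which fits "at least" better than "exactly"). Only the 2018 path is published above, so the sketch repeats it 13 times as placeholder data, and its output will not match the 2,505 figure, which depends on the real and often tougher yearly draws.

```python
def elo_win_prob(rating, opp_rating):
    """Standard Elo expected score on a 400-point scale (assumed here)."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def title_prob(rating, path):
    """Chance of winning every match on one year's path."""
    p = 1.0
    for opp in path:
        p *= elo_win_prob(rating, opp)
    return p

def prob_at_least(k, probs):
    """P(at least k successes) when trial i succeeds with probs[i]."""
    dist = [1.0]  # dist[j] = P(exactly j titles so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)  # no title this year
            nxt[j + 1] += mass * p      # title this year
        dist = nxt
    return sum(dist[k:])

def rating_for_even_odds(k, paths, lo=1500.0, hi=3500.0, tol=0.01):
    """Bisect for the rating giving a 50% chance of at least k titles."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        odds = prob_at_least(k, [title_prob(mid, path) for path in paths])
        if odds < 0.5:
            lo = mid  # too weak: raise the rating
        else:
            hi = mid  # strong enough: lower the rating
    return (lo + hi) / 2.0

# PLACEHOLDER DATA: the 2018 path repeated 13 times. The real analysis
# would use 13 different paths, one per year entered.
path_2018 = [1767, 1769, 1894, 2079, 1900]
paths = [path_2018] * 13

print(round(rating_for_even_odds(11, paths)))
```

Bisection works because the chance of winning at least 11 titles rises steadily as the hypothetical rating increases, so there is exactly one rating at which it crosses 50%.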

Comparing las (un)decimas

I hope that my method makes sense and seems like a reasonable way of quantifying a rare feat. Algorithm in hand, we can compare Nadal’s Barcelona record with his efforts in Monte Carlo and Paris.

Monte Carlo

Rafa has entered 14 times since 2005 (again, excluding his 2003 appearance) and won 11. That’s a bit less impressive than 11-of-13, but the competition level is much higher. Only last year’s tournament, in which the opposing finalist was Albert Ramos, is in the same league as most of the Barcelona draws.

Sure enough, the Monte Carlo undécima is a lot more impressive. To have a 50% chance of winning 11 titles in 14 attempts, a player would need a clay Elo of about 2,595, almost 100 points higher than the comparable number for Barcelona, and well above the level any player has ever achieved, even at their peak.

Roland Garros

At the French Open, Nadal has entered 13 times, winning 10. The field is even more challenging than in Monte Carlo, but on the other hand, the five-set format gives a greater edge to favorites, lessening the chance of an underdog scoring an upset with two magical sets.

The Roland Garros 10-of-13 is not quite as eye-popping as the record at Monte Carlo. The clay Elo required to give a player a 50% chance of matching Nadal’s French Open feat is “only” around 2,570–still better than any player has ever attained, but a bit short of the comparable mark for Monte Carlo.

But wait … what about 2016? Rafa won two rounds and then withdrew from his third-rounder against Marcel Granollers. I don’t know whether that should count, but at least for argument’s sake, we should run the numbers without it, treating Nadal’s French Open record as 10 titles in 12 appearances, not 13. In that case, the clay Elo that would give a player a 50/50 shot at matching the record is 2,595–the same as the Monte Carlo number.

At the moment, Monte Carlo appears to be the tournament where Nadal has played his very best. With another French Open a few weeks away, though, that answer may be temporary.

Rafa vs other record holders

A few other players have racked up impressive totals at single events. Wikipedia has a convenient list, and a few accomplishments stand out: Federer’s tallies at Wimbledon, Basel, and Halle, Guillermo Vilas’s eight titles in Buenos Aires, and Borg’s six French Open titles in only eight appearances.

Let’s have a look at how they compare, ranked by the surface-specific Elo rating that would give a player a 50% chance of equaling the feat:

Player   Tourney          Wins  Apps  50% Elo  
Nadal    Monte Carlo        11    14     2595  
Nadal    French Open*       10    12     2595  
Nadal    French Open        10    13     2570  
Borg     French Open**       6     7     2550  
Nadal    Barcelona          11    13     2505  
Borg     French Open         6     8     2475  
Vilas    Buenos Aires***     8    10     2285  
Federer  Wimbledon           7    18     2285  
Federer  Halle               8    15     2205  
Federer  Basel               8    15     2180

* excluding 2016

** excluding 1973, when Borg was 16 years old, and lost in the fourth round

*** excluding 1969-71, both because Vilas was very young, and due to sketchy data

The only single-event achievement that ranks with Nadal’s is Borg’s record at Roland Garros–and even then, only when we don’t consider Borg’s loss there as a 16-year-old. Federer’s records in Wimbledon, Halle, and Basel are impressive, but fail to rate as highly because he has entered those tournaments so many times. Federer didn’t appear on tour ready to win everything on his chosen surface, the way Rafa did, and those early losses are part of the reason that his records at these tournaments rate so low.

We never needed any numbers to know that Nadal’s accomplishments at his three favorite tournaments are among the best of all time. With these results, though, we can see just how dominant he has been, and how few achievements in tennis history can even compare. The scary thing: A month from now, I may need to come back and update this post with even more eye-popping numbers. The greatest show on clay courts isn’t over yet.