Heavy Topspin – Page 58 – The Tennis Abstract blog

Podcast Episode 86: A New Documentary on Guillermo Vilas and the No. 1 Ranking

Episode 86 of the Tennis Abstract Podcast features Jeff and co-host Carl Bialik, of the Thirty Love podcast, discussing the new Netflix doc Guillermo Vilas: Settling the Score.

The Argentine star was a multi-slam winner in the 1970s, yet he never reached the top of the official ATP ranking list. The film covers journalist Eduardo Puppo’s quest to prove that Vilas deserved to be #1. Over the course of the episode, we ponder the importance of the top ranking, the vagaries of the ATP ranking algorithm, how Elo rates Vilas’s peak years, and the ATP’s response to Vilas’s case for the top spot. We didn’t love the documentary, but the issues it raises are fun to debate.

Fans of the TA podcast will also want to check out Dangerous Exponents, the new Covid-19 podcast that Carl Bialik and I are doing. Episode 3 will be available later today.

Thanks for listening!

(Note: this week’s episode is about 48 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Welcome to 1967

Last week, I finished* adding complete** 1967 women’s results to the Tennis Abstract site. I’ll talk about those asterisks in a bit, but for the moment I’d prefer to revel in how cool this is.

The “Open Era” starts in 1968, and in the near-decade since I launched TA, I took that year as my starting point. Along the way I added men’s slams and Davis Cup back to the beginning, but they’re buried on the site as an afterthought. I can’t imagine that anyone uses the site for amateur-era results.

Even late 60s and 70s results were spotty for women. I initially built my database from the results published on the WTA and ITF websites, neither of which is (how to put this mildly?) primarily focused on the thoroughness and accessibility of its historical data. Add in the mistakes and omissions that come from building my own database from scratch, and you end up with a lot of gaps.

A more complete Tennis Abstract

A few weeks ago, I started filling in those gaps by adding about 20 missing tournaments with a Chris Evert–Martina Navratilova match. That head-to-head is now complete. Soon it will be “more than complete,” as I add various exhibitions that don’t count in the official tally. From there, I used various sources (more on that below) to fill in the remaining gaps of top-level Open Era women’s tennis back to 1968. The result is about 50 full tournaments per year, sometimes more, with various bonuses like Federation Cup and a lot of grand slam qualifying.

The further back I went and the more I stumbled on stories about the women’s game at the beginning of the Open Era, the more I wanted to know. 1968 is an important year, but a lot of tennis was unchanged from 1967 to 1968–almost all of the same players excelled, on the same surfaces and mostly at the same events. It seems a little silly to have a statistical record that starts smack in the middle of all-time-great careers like those of Billie Jean King and Margaret Court.

Into the unknown

One of the most incredible online tennis resources is one you’ve probably never heard of. On the “Blast From the Past” section of tennisforum.com, a group of contributors have assembled an unparalleled collection of women’s match results going back to the 1800s. They’ve dredged up results and tournament information from old annuals, newspapers, and just about any other source you can imagine.

The disadvantage of their forum-based, text-based format is that it is only awkwardly searchable. (Just to be clear, I am not taking anything away from their outstanding efforts.) The forum approach does allow for a certain kind of serendipity, and I’m sure I’m not the only one who has lost hours scrolling, reviewing results, reading the tournament recaps and anecdotes collected there. But it precludes the kind of serendipity made possible by sites like Baseball Reference and Tennis Abstract, where you see one result, get curious about a player, click the player’s name, and find yourself looking at a whole new list of unfamiliar scores and stats.

The further back in history we go, the more I want that kind of serendipity. Now, Tennis Abstract has that for 1967, and soon it will go back further still.

Okay then: 1967

The site now includes results from over 100 events in 1967, from familiar names like Rome and Queen’s Club to lesser tournaments such as the Pan-American Games (held that year in Winnipeg) and the Soviet Championships in Tbilisi. I don’t have complete data for every draw–some are missing a handful of first-rounders, and others have only the final round or two. All told, the database now includes almost 2,300 matches from that single year. By comparison, there were about 3,000 tour-level WTA matches in 2019.

Since there was no formal “tour” in 1967, there’s no official definition of what’s “in” or “out.” A match is a match. I didn’t include every single event with some kind of data available, but I did import the entire main draw of any tournament with even a single “big-name” player, using a fairly broad definition of that term. (1969 Wimbledon champ Ann Jones may make me regret that decision. She played a lot of tennis.) Because the various circuits were more fractured, that means more events: There were many weeks with three or four tournaments each, and a couple with five.

Creating records for those 2,300 matches meant adding almost 300 players who weren’t in my database. The majority of those are early-round losers in small events, women who didn’t seriously pursue tennis. But where I had a full name, I did at least a cursory search for each one, turning up a noted Spanglish poet, the “first grunter,” a squash Hall of Famer, and Marat Safin’s mom.

100 events sounded like a lot until I started working on 1966. I have a provisional list of 160 tournaments to include from that year. Even with all those caveats on the meanings of “finished” and “complete,” this is going to take a while.

Diving in

Here are direct links to 1967 results for a few players:

If you go to the main page for one of those players (for example, here’s Peaches Bartkowicz), you’ll find a cool addition that all the new 60s and 70s data has made possible: women’s Elo ratings back to the end of 1967. Player pages for women who played at least 20 matches in a season include their year-end ratings and rankings, including surface-specific figures.

Here is a very provisional overall top 10 for year-end 1967:

Rank  Player               Elo  
1     Billie Jean King  2221.3  
2     Virginia Wade     2114.9  
3     Nancy Richey      2113.2  
4     Judy Dalton       2083.3  
5     Ann Jones         2042.7  
6     Lesley Bowrey     2018.8  
7     Kerry Reid        2006.0  
8     Francoise Durr    2005.4  
9     Rosie Casals      1940.4  
10    Annette Du Plooy  1926.8

I say provisional because there’s so much left to add. (You know, the entire history of tennis prior to 1967.) At the moment, the algorithm doesn’t know anything about any of the players prior to January 1st, 1967. As it learns more, each player’s rating will be different at that point, and the year-end results will be tweaked as well. That goes for all Elo ratings and rankings throughout the 60s and 70s. The broad strokes will remain constant, but the exact numbers will change, and sometimes players will swap positions. As I add more data, King, Court, and Richey (among others) keep creeping up the all-time list.
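To see why the ratings are sensitive to where the data begins, here is a minimal sketch of a generic Elo update. It is not the exact Tennis Abstract algorithm: the K-factor of 32, the starting rating of 1500, and the players and results are all assumptions made up for illustration.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expectation: chance that `rating` beats `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def rate(matches, k: float = 32.0, start: float = 1500.0) -> dict:
    """Process (winner, loser) pairs in order and return the final ratings.

    k and start are assumed values, chosen only for illustration.
    """
    ratings = {}
    for winner, loser in matches:
        # A player the algorithm has never seen gets the default rating.
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        gain = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + gain
        ratings[loser] = r_l - gain
    return ratings


# Hypothetical players and results, not real 1966-67 data.
season_1967 = [("A", "B"), ("A", "C")]
season_1966 = [("B", "C"), ("B", "C")]

cold_start = rate(season_1967)                  # knows nothing before 1967
with_history = rate(season_1966 + season_1967)  # earlier results added

print(round(cold_start["B"], 1), round(with_history["B"], 1))
```

In the cold-start run, player B enters 1967 at the default rating. Once the earlier season is included, B arrives with credit for two wins and finishes the year with a higher number, even though the 1967 results are identical.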

As for the project as a whole, I have no idea how far I’ll get. While fascinating, it’s a time-consuming project, and the further into history we go, the less information is available on players beyond the all-time greats. Still, every small step back in time improves the accessibility of this period of women’s tennis data, which includes some of the most important players in the history of the sport.

About those sources

I’ve mentioned tennisforum’s Blast From the Past, which is truly essential. Another exhaustive source for match results starting 1968 is John Dolan’s book, Women’s Tennis 1968-84. Wikipedia has oddly spotty coverage: the Italian Wikipedia is good for tournament data, while the French Wikipedia seems to cover more players. (For Swedish players, Swedish Wikipedia is awesome. All that time spent learning Norwegian is finally paying off.) English Wikipedia is disappointingly lacking in comparison.

Podcast Episode 85: Author Steven Blush on 1970s World Team Tennis

Episode 85 of the Tennis Abstract Podcast features Jeff with guest Steven Blush, author of the recent book Bustin’ Balls: World Team Tennis 1974-78: Pro Sports, Pop Culture, and Progressive Politics.

We talk about how drastically WTT has changed from the early days, the crucial importance of Billie Jean King and the 1973 Battle of the Sexes, and how WTT fit into the 1970s cultural milieu. As Steven tells it, the original WTT was revolutionary, even “proto-woke,” with a place for everyone, setting men and women on equal footing, and welcoming everyone from Black NBA star John Lucas to (eventually) transgender trailblazer Renee Richards. This is an in-depth look at a neglected but fascinating part of tennis history.

I had a great time recording this episode, so I hope you’ll give it a listen. And, of course, Steven’s book makes the perfect Christmas gift for the tennis fan in your life.

Fans of the TA podcast will also want to check out Dangerous Exponents, the new Covid-19 podcast that Carl Bialik and I are doing. We released episode 2 yesterday.

(Note: this week’s episode is about 63 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Podcast Episode 84: Daniil Medvedev, Dominic Thiem, and a Tactically Brilliant Future

Episode 84 of the Tennis Abstract Podcast features Jeff and co-host Carl Bialik, of the Thirty Love podcast, celebrating a fantastic championship match to end the 2020 ATP season.

We discuss Medvedev’s tactical savvy and physical versatility, along with over- and under-rated parts of Thiem’s game. Also on the agenda:

Are Medvedev and Thiem a clear “second group” behind Djokovic and Nadal but ahead of the rest of the pack?
Will Medvedev have a better career than Alexander Zverev or Andrey Rublev?
What constitutes tactical perfection? How could we measure it?
Are we biased toward all-around players when listing the strongest tacticians?
Is it possible for a 30-stroke rally to be tactically strong?

Finally, if you like hearing us talk about stuff, you’ll be glad to know that we’re launching a non-tennis podcast called Dangerous Exponents, on all things (well, some things) Covid-19. We recorded a pilot episode today, which should be available shortly. I’ll post it here and on Twitter when it’s released.

Thanks for listening!

(Note: this week’s episode is about 43 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

The Next Five Years, According To a (Dumb) Grand Slam Crystal Ball

Last year, I introduced a bare-bones model that predicts men’s grand slam results for the next five years. It takes a minimum of inputs: a player’s age, and his number of major semi-finals, finals, and titles in the last two years. Despite leaving out so much additional data, the model explains a lot of the variation among players, achieving most of what a more complex algorithm would, but with nothing more than basic arithmetic.

A bit further down, I’ll introduce a similar model for women’s grand slam results. First, let’s look at the revised numbers for the men. Keep in mind that these are not career slam forecasts, but only slams in the next five years. That’s good enough for the Big Three, but it probably doesn’t tell the whole story for, say, Stefanos Tsitsipas.

Player              Projected Slams  
Novak Djokovic                  2.5  
Rafael Nadal                    2.1  
Dominic Thiem                   2.0  
Alexander Zverev                0.9  
Stefanos Tsitsipas              0.6  
Daniil Medvedev                 0.6  
Matteo Berrettini               0.3  
Lucas Pouille                   0.1  
Diego Schwartzman               0.1

A few other players (notably Roger Federer) reached a semi-final in the last two years, but because of their age, the model forecasts zero slams. Also keep in mind that Wimbledon was not played this year, so there was a bit less data to work with.* The sum of the forecasts is a mere 9.2 slams, out of a possible 20. In some previous years, the model predicted as many as 15 titles for the players it took into consideration. Because today’s top players are so old, they aren’t expected to dominate much of the 2021-25 calendar, leaving room for new contenders to emerge.

* My original post describes the forecasting algorithm as counting results from “the last four slams” and “the previous four slams.” We could account for the three-slam 2020 season by following those steps literally, giving greater weights to the last four slams (the 2019 US Open plus the three 2020 slams), and giving lesser (but still non-zero) weights to the four slams before that. I rejected that approach because (a) it would give an awful lot of weight to the US Open, and (b) the relative lack of 2020 data reflects higher-than-usual uncertainty, which ought to show up in the forecasts, as well. Thus, only seven slams were taken into account for 2021-25 predictions, instead of the usual eight.

Interestingly, the 2020 season has barely budged the predicted career totals for the Big Three. Numbers I published immediately after last year’s US Open forecast Rafael Nadal for 22.5 career slams: his (then) 19 plus 3.5 more. Now he has 20, and the model pegs him for another 2.1. Novak Djokovic was slated for a career total of 19.5: 3.5 more on top of his then-total of 16. He’s still penciled in for 19.5: 17 plus another 2.5 in the future. Federer didn’t have reason to expect much a year ago, and it’s no better now.

The women’s model

It turns out that a similar back-of-the-envelope approach gives good approximations of future slam totals for WTA stars, as well. The weights are a bit different, the average peak age is one year sooner, and the age adjustment is slightly smaller, but the idea is essentially the same.

Here’s how to calculate the number of expected major titles for your favorite player:

Start with zero points
Add 20 points for each slam semi-final reached in the last 12 months
Add 20 points for each slam final reached in the last 12 months
Add 80 points for each slam title won in the last 12 months
Add 10 points for each slam semi-final reached in the previous 12 months
Add 10 points for each slam final reached in the previous 12 months
Add 40 points for each slam title won in the previous 12 months
If the player is older than 26 (at the time of the next slam), subtract 7 points for each year she is older than 26
If the player is younger than 26, add 7 points for each year she is younger than 26
Divide the sum by 100

To take a simple example, consider Iga Swiatek. For her recent French Open title, she gets 20 points for the semi, 20 points for the final, and 80 points for the title. She will still be 19 when the Australian Open rolls around, so we add another 49 points: 7 years younger than 26, times 7 points per year. Her projected total is (20 + 20 + 80 + 49) / 100 = 1.69.
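For anyone who would rather let a computer do the arithmetic, the steps above can be sketched in a few lines of Python. The names `SlamRecord` and `projected_slams` are made up for this example, and flooring the result at zero is an assumption (the forecasts are described as "zero slams" for older players, but the steps don't spell that out).

```python
from dataclasses import dataclass


@dataclass
class SlamRecord:
    """Slam results within one 12-month window."""
    semis: int = 0   # semi-finals reached (a finalist also counts here)
    finals: int = 0  # finals reached (a champion also counts here)
    titles: int = 0  # titles won


def projected_slams(age_at_next_slam: int,
                    last_12: SlamRecord,
                    previous_12: SlamRecord) -> float:
    """Projected slam titles over the next five years (women's weights)."""
    points = 0
    # The most recent 12 months count double.
    points += 20 * last_12.semis + 20 * last_12.finals + 80 * last_12.titles
    points += 10 * previous_12.semis + 10 * previous_12.finals + 40 * previous_12.titles
    # 7 points per year younger than 26 (negative when older than 26).
    points += 7 * (26 - age_at_next_slam)
    # Assumption: a negative total is reported as zero slams.
    return max(0.0, points / 100)


swiatek = projected_slams(
    age_at_next_slam=19,
    last_12=SlamRecord(semis=1, finals=1, titles=1),
    previous_12=SlamRecord(),
)
print(round(swiatek, 2))  # 1.69
```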

Here are the results for all of the women who reached a major semi-final in 2019 or 2020 and are projected to win more than zero slams between 2021 and 2025:

Player               Projected Slams  
Naomi Osaka                      2.0  
Sofia Kenin                      1.9  
Iga Swiatek                      1.7  
Bianca Andreescu                 1.0  
Ashleigh Barty                   0.9  
Amanda Anisimova                 0.6  
Simona Halep                     0.6  
Marketa Vondrousova              0.6  
Nadia Podoroska                  0.4  
Garbine Muguruza                 0.3  
Belinda Bencic                   0.3  
Jennifer Brady                   0.3  
Elina Svitolina                  0.2  
Petra Kvitova                    0.1  
Victoria Azarenka                0.1

These forecasts sum to 11.0 slams, more than the men’s total. That’s largely because so many recent women’s champions are younger, giving the model more reason to be optimistic about them. It still leaves plenty of room for other players to earn some hardware in the next half-decade, which makes sense. The WTA has featured a non-stop succession of breakout young stars for the past few years, and with players like Aryna Sabalenka, Elena Rybakina, and Cori Gauff in the mix, there’s no shortage of talent to keep the carousel turning.

And then there’s Serena Williams. The model projects her for zero slams, despite her three semi-finals and two finals in the last two years. The reason is her age: The algorithm expects players to steadily decline from age 27 onwards, so the age penalty by age 39 is harsh. On one hand, that makes sense: we’re forecasting the results of events that will mostly take place when she’s in her 40s. On the other hand, a player who had so much success at age 37 is probably a good bet to break the mold at 39, as well. Were this a more fully-developed model, we’d probably be smart to tinker with the age adjustment to reflect the reality that Williams is a much better bet to win a major title than Nadia Podoroska.

We could go on all day. For every variable that these forecasts take into account, there are a dozen more that have some plausible claim to relevance. But this simple approach gets us surprisingly far in telling the future–a future in which the men’s all-time grand slam race keeps getting more complicated, and the women’s game continues to feature a wide array of promising young stars.

Not All Twenties Are Created Equal

The top of the all-time men’s grand slam ranking just got even more crowded. With his 13th Roland Garros title, Rafael Nadal has matched Roger Federer at the top of the list by securing his 20th major title. Novak Djokovic, Nadal’s final obstacle en route to the historic mark, remains within shouting distance with 17 slams.

The Roger-Rafa tie has spurred another (interminable, unresolvable) round of the (interminable, unresolvable) GOAT debate. Of course there’s much more to determining the best ever than the slam count. But the slam count is a big part of the conversation. If we’re going to keep doing this, we ought to at least recognize that not all major titles are created equal. And by extension, not all collections of twenty major titles are equivalent.

We all have intuitions about the difficulty of how a particular draw shakes out, with its typical mix of good and bad fortune. Nadal was lucky that he missed a few dangerous opponents in the early rounds, luckier still that he didn’t have to face Dominic Thiem in the semi-final, and unfortunate that he had to face down the next-best player in the draw, Djokovic, in the final. As it turned out, it didn’t really matter, but I think most of us would agree that Nadal’s achievement–staggering as it is–would look even better had he faced more than two players ranked in the top 70.

Stop dithering and start calculating

I’ve written about this before, and I’ve established a metric to quantify those intuitions. Take the surface-weighted Elo rating of each of a player’s opponents, and determine the probability that an average slam champion would beat those players. After a couple of steps to normalize the results, we end up with a single number for the path to each slam title. The larger the result, the more difficult the path, and an average slam works out to 1.0.

Nadal’s path was easier than the historical average. Aside from Djokovic, none of his opponents would have had more than an 8% chance of knocking out an average slam champion on clay. The exact result is 0.64, which is easier than almost nine-tenths of majors in the Open Era. Rafa has had three easier paths to his major titles, including the 2017 US Open, which scored only 0.33. That’s the easiest US Open, Wimbledon, or Roland Garros in a half-century.

Of course, he’s had his share of difficult paths, such as 2012 Roland Garros (1.36), when he faced several clay specialists and a peak-level Djokovic. Federer and Djokovic have gotten their own shares of lucky and unlucky draws over the years–that’s why we need a metric. You might have a better memory for this kind of thing than I do, but I don’t think any of us can weigh 57 majors with 7 opponents each and work out any meaningful results in our heads.

The tally

Sum up the difficulty of the title paths for these 57 slams, and here are the results:

Player    Slams  Avg Score  Total  
Nadal        20       0.95   19.0  
Djokovic     17       1.06   18.1  
Federer      20       0.89   17.9  
                                   
Player     Easy     Medium   Hard  
Nadal         7          8      5  
Djokovic      5          5      7  
Federer       9         10      1

The first table shows each player’s average score for the paths to his major titles, and the total number of “adjusted slams” that gives them. Nadal is in the lead with 19, and Djokovic and Federer follow in a near-tie, just above and below 18.

You might be surprised to see the implication that this is a slightly weak era, with average scores a bit below 1.0. That wasn’t the case a few years ago, but there has only been one above-average title path since 2016. The Big Three-or-Four has generally stayed out of each other’s way since then, and even when they do clash, as they did yesterday, the leading contenders for quarter-final or semi-final challenges failed to make it that far. The average score of the last 15 slam title paths is a mere 0.73, while the 16 before that (spanning 2013-16) averaged 1.20.

The second table paints with a broader brush, classifying all Open Era slam titles into thirds: “easy,” “medium” and “hard” paths to the championship. Anything below 0.89 rates as “easy,” anything above 1.14 is marked as “hard,” with the remainder left as “medium.”

Djokovic is the leader in hard slams, with 7 of his 17 meriting that classification. Federer has racked up 10 medium slams, including several that score above 1.0, but only one that cleared the bar for the “hard” category. Nadal’s mix is more balanced.
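The easy/medium/hard buckets are simple thresholds on the path score, which can be sketched as follows. The function name `classify_path` is invented for this example; the cutoffs (0.89 and 1.14) and the sample scores come from the text above.

```python
def classify_path(score: float,
                  easy_below: float = 0.89,
                  hard_above: float = 1.14) -> str:
    """Bucket a title-path difficulty score (1.0 = average slam)."""
    if score < easy_below:
        return "easy"
    if score > hard_above:
        return "hard"
    return "medium"


# Scores quoted in the text for three of Nadal's titles.
for label, score in [("2020 Roland Garros", 0.64),
                     ("2017 US Open", 0.33),
                     ("2012 Roland Garros", 1.36)]:
    print(label, score, classify_path(score))
```

A score of exactly 1.0, the average slam, lands in the "medium" bucket, as does anything from 0.89 to 1.14 inclusive.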

Go yell at someone else

Hopefully these numbers have given you some new ammunition for your next Twitter fight. Some of you will froth at the mouth while insisting that players can’t control who they play. You’re right, but it doesn’t really matter. We can’t start giving out GOAT points for things that players didn’t do, like beat Thiem in the 2020 French Open semi-finals. All three of these guys were or are good enough at various points to have beaten some of the opponents they didn’t have to face. There are other approaches we could take to the GOAT debate that incorporate peak Elo ratings and longevity at various levels, but that’s not what we’re talking about when we count slams.

If we are going to focus so much on the slam count, we might as well acknowledge that Nadal’s 20 is better than Federer’s 20, and Djokovic’s 17 is awfully close to both of them.

What Happens to the Pace of Play Without Fans, Challenges, or Towelkids?

The COVID-19 pandemic has forced some experimentation on the US Open ahead of schedule. After just a couple of years at marginal events such as the NextGen Finals, Hawkeye’s live line-calling system is taking over (on most courts) for human line judges. Another NextGen-tested innovation, requiring players to fetch their own towels, has also arrived for social distancing reasons.

Automated line-calling and towel-fetching pale in comparison to the biggest change for the bubble slam: no fans. The biggest stars now get to experience what has long been de rigueur for qualifiers and challengers: high-stakes competition with no one in the stands watching.

All of these changes come not long after the US Open (and a few other tournaments) finally adopted a serve clock. I’ve written ad nauseam* about the effect of the serve clock, which is nominally designed to speed up play, but in practice has slowed it down. The problem is that chair umpires start the clock when they announce the score, which is not always immediately after the preceding point. The bigger the crowd, the more serious the discrepancy, as noisy fans tend to delay announcements from the chair.

* Incidentally, this is also the Latin term for a long game with many deuces.

Therefore, the pace of play should be faster with no fans, right? Use of the Hawkeye live system also eliminates challenges, which should speed things up a little more. The counteracting force is the time it takes players to fetch their towels. It would be nice to evaluate each of these effects in isolation*, but most of the data we have comes from matches with all of these changes at once.

* No pun intended.

The net effect

The most straightforward measurement of pace of play is seconds per point, where we simply take the official match time and divide by the total number of points. It’s an approximate measure, because official match time includes changeovers, medical timeouts, and all sorts of other delays which have nothing to do with how long it takes for players to get themselves to the line and hit a serve. It also captures a bit of first serve percentage (second serve points take more time) and rally length (longer rallies take more time), although these factors mostly wash out, especially when comparing pace of play at the same tournament from one year to the next.
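As a rough sketch of that measurement, with invented match times and point counts: the per-match figure is just minutes times 60 divided by points. Pooling total time over total points for the event-level number is an assumption here, since averaging the per-match figures would be another reasonable choice.

```python
def seconds_per_point(match_minutes: float, total_points: int) -> float:
    """Official match time (in minutes) divided by points played, in seconds."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return match_minutes * 60.0 / total_points


def event_seconds_per_point(matches) -> float:
    """Pooled pace for an event: total seconds over total points.

    `matches` is an iterable of (minutes, points) pairs.
    """
    matches = list(matches)
    total_minutes = sum(minutes for minutes, _ in matches)
    total_points = sum(points for _, points in matches)
    return seconds_per_point(total_minutes, total_points)


# Hypothetical matches: (official minutes, total points).
sample = [(90, 135), (120, 160)]
print(seconds_per_point(90, 135))                  # 40.0
print(round(event_seconds_per_point(sample), 1))   # 42.7
```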

The following graph shows seconds per point for all Cincinnati (and “Cincinnati”) main draw men’s singles matches each year since 2000:

(I’m looking only at pace of play for men’s matches because I don’t have match time for women before 2016. Lame, I know.)

Over the 21-year span, the average time per point is just under 40 seconds, and before 2020, the yearly average exceeded 42 seconds only once. This year, Cinci clocked in at a whopping 44.6 seconds per point, more than three standard deviations above the 2000-2017 (that is, pre-serve clock) average. The pace has gradually slowed down over the years for reasons unrelated to the serve clock, so it’s probably overstating things a bit to say that the effect of the bubble is 3 SD, but it’s clear that 2020 was slow.
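The "standard deviations above average" comparison works like this. The baseline values below are placeholders rather than the real 2000-2017 yearly averages, and using the sample standard deviation (`statistics.stdev`) instead of the population version is an assumption.

```python
from statistics import mean, stdev


def z_score(value: float, baseline: list) -> float:
    """How many sample standard deviations `value` sits above the baseline mean."""
    return (value - mean(baseline)) / stdev(baseline)


# Placeholder yearly averages (seconds per point), NOT the real data.
baseline_years = [38.0, 39.0, 40.0, 41.0]

z = z_score(44.6, baseline_years)
print(round(z, 2))
```

With a real baseline that trends upward over time, the standard deviation partly reflects the trend itself, which is one reason to treat the exact figure with some caution.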

But wait, what about

All four of this year’s men’s semi-finalists are rather deliberate, so you might think that the slow average pace is due in part to the mix of players who won a lot of matches. That’s what I thought too, but it’s not so. (It helps to remember that more than half of a tournament’s matches are in the first two rounds, even with some first-round byes, so we’re guaranteed a decent mix of players for calculations like this, no matter who advances.)

First, I re-did the seconds-per-point calculations above, but excluded all matches with Novak Djokovic or Rafael Nadal, two guys who win a lot of matches and are known to play slowly. It didn’t really matter. I won’t bother to print a second graph, because it looks essentially the same as the one above.

Another approach is to consider the average pace of play for each player in the draw, and compare his seconds per point in Cincinnati to his seconds per point at other events. If every man played at the same speed in Cincinnati that he did on average in 2019, the average seconds per point at the 2020 Cinci event would have been 41.3. That’s just barely above the 2019 Cinci figure of 41.0, and of course it is far below the actual rate of 44.6 seconds per point. The mix of players can’t account for 2020’s glacial pace.
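One way to sketch that player-mix adjustment is as a weighted average: take each player's baseline pace from the previous season and weight it by how many points he played at the event. The weighting scheme and the numbers in the example are assumptions for illustration; only the 44.6 and 41.3 figures come from the text.

```python
def expected_pace(entries) -> float:
    """Points-weighted average of baseline seconds per point.

    `entries` is an iterable of (baseline_seconds_per_point, points_played),
    one pair per player in the draw.
    """
    entries = list(entries)
    total_points = sum(points for _, points in entries)
    if total_points <= 0:
        raise ValueError("no points played")
    return sum(baseline * points for baseline, points in entries) / total_points


# Hypothetical two-player draw: a quick player who played 300 points
# and a slow one who played 100.
print(expected_pace([(38.0, 300), (44.0, 100)]))  # 39.5

# Figures from the text: actual 2020 pace vs. what the player mix predicts.
unexplained = round(44.6 - 41.3, 1)
print(unexplained)  # 3.3 seconds per point not explained by who was playing
```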

But why?

I hope you’re with me thus far that the pace of play in the 2020 Cincinnati men’s event was very slow. It seems reasonable to assume that the US Open will be the same, because the conditions and rules are identical.

The simplest explanation is that players are spending extra time fetching their own towels.*

* No, you’re a towel.

It’s true–walking to and from the towel takes time. But it’s not the whole story. At the typical non-bubble rate of 40 seconds per point (again, including changeovers and other delays), there are plenty of points where the umpire delays calling the score and the server ends up taking longer than the rulebook-permitted 25 seconds without getting called for a time violation. So if the average is now pushing 45 seconds, there must be a lot of points like that.

Anecdotally, there definitely are such points. In the Cincinnati semi-final, I noticed one instance in which Roberto Bautista Agut used more than 40 seconds before serving. He’s not the only offender: All four men’s semi-finalists (among many others) occasionally used more than 25 seconds. My impression was that, ironically, Djokovic was the speediest of the four.

Chair umpires are using their discretion to act as if there are fans making noise. After long points, they often wait to call the score, and even when they announce the score immediately, they hold off several more seconds before starting the clock. In one glaring instance in the Lexington final, the umpire waited a full 17 seconds after the previous point ended before the clock showed 0:25. The broadcast camera angles at the National Tennis Center made it hard to measure the same thing for Cincinnati matches, but given the length of time between points and the dearth of time violation penalties, there must have been other delays in the range of 15 to 20 seconds.

With no fans delaying play, and no tactical challenges to force a delay, a slow pace is something that the umpire can control. Yes, towel-fetching takes time, but if the 25-second clock starts immediately and it is enforced, players will make it back to the line in time–matches at the NextGen Finals were generally brisk. But apparently, enforcing the rulebook-standard pace is not something that the officials are willing to do. We’re two years into the great tennis serve-clock experiment, and the game just keeps getting slower.

How Should We Value the Masters and Premier Titles in the Bubble?

Tennis is back, but plenty of top players are still at home–or crashing out in the early rounds of their first tournament in months. While the ATP “Cincinnati” Masters event delivered the expected winner in Novak Djokovic, the Serb never had to face a top-ten opponent. The same was true of Victoria Azarenka, who won the WTA Premier tournament with the benefit of Naomi Osaka’s withdrawal in the final round, and without playing a top-tenner on her way there.

The tennis world’s “asterisk” talk has mostly focused on the US Open, since most people care about slams and don’t care about anything else. But judging from these easy paths to the two Cincinnati titles, should we be talking asterisk about the event just passed?

Novak’s 35th, but not (quite) his easiest

Last week, I explained why I thought the asterisk talk was premature, if not wrong. The field doesn’t matter, because the player who wins the title faces only a handful of players. The presence of, say, Rafael Nadal doesn’t have much to do with the difficulty of winning the title unless the eventual winner has to go through Rafa. If the champion’s opponents are very good, the path to the title is hard; if they are relatively weak, the path to the title is easy. Keep in mind I’m using the terms “good” and “weak” in theoretical terms. On paper, Djokovic was fortunate that his semi-final and final opponents were ranked 12th and 30th, respectively, and his title path was “easy.” As it happened, he was forced to work hard for both wins.

We now know that the title paths of the Cincinnati champions were relatively easy. But just how weak were they?

I calculate the difficulty of a path-to-the-title by determining the probability that the average Masters champion on that surface would beat the opponents that the champion faced. By using the “average Masters champion,” we are taking the skill level of the actual champ out of the equation, and looking only at the quality of his opposition. The resulting numbers vary wildly, from 2.5%–the odds that a typical Masters champion would have beaten the players that Jo-Wilfried Tsonga defeated to win the 2014 Canada Masters–to 61.2%–the chances that an average titlist would have beaten the players that confronted Nikolay Davydenko at the 2006 Paris Masters.
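For readers who want to see the mechanics, here is a minimal sketch of that kind of calculation. It assumes the standard Elo win-probability formula and treats each match as independent. The rating used for the "average champion" and the opponent ratings are made-up placeholders, not values from the Tennis Abstract database, and the real calculation may differ in its details.

```python
def elo_win_prob(rating, opp_rating):
    """Standard Elo expectation that a player rated `rating` beats `opp_rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def title_path_odds(champ_rating, opp_ratings):
    """Chance of beating every listed opponent in a row.

    Matches are treated as independent, so the per-match
    probabilities are simply multiplied together.
    """
    odds = 1.0
    for opp in opp_ratings:
        odds *= elo_win_prob(champ_rating, opp)
    return odds


# Hypothetical inputs, for illustration only.
AVG_CHAMP_ELO = 2050                     # assumed "average Masters champion"
path = [1750, 1800, 1850, 1900, 1950]    # assumed ratings of five opponents

print(f"Path-to-the-title odds: {title_path_odds(AVG_CHAMP_ELO, path):.1%}")
```

Because the champion's rating is held fixed, the result changes only when the opponents change, which is the point of the exercise.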

Novak’s number this week was 40.5%. In other words, an average hard-court Masters champion would have a four-in-ten shot at beating the five guys that fate threw in Djokovic’s path. That’s the 11th easiest Masters title since 1990:

Title Odds  Tournament       Winner             
61.2%       2006 Paris       Nikolay Davydenko  
50.5%       2012 Paris       David Ferrer       
49.8%       2000 Paris       Marat Safin        
48.3%       2004 Paris       Marat Safin        
47.0%       1999 Paris       Andre Agassi       
44.5%       2013 Shanghai    Novak Djokovic     
43.3%       2002 Madrid      Andre Agassi       
42.9%       2005 Paris       Tomas Berdych      
41.4%       2009 Canada      Andy Murray        
41.3%       2017 Paris       Jack Sock          
40.5%       2020 Cincinnati  Novak Djokovic     
39.6%       2011 Shanghai    Andy Murray        
39.1%       2019 Canada      Rafael Nadal       
37.9%       2008 Rome        Novak Djokovic     
36.2%       2007 Cincinnati  Roger Federer

Unless we’re prepared to put a permanent asterisk next to the Paris Masters, we should hold off on cheapening this year’s Cincinnati title. Surprisingly, Djokovic’s path was even easier at the 2013 Shanghai Masters. He had to face two top-ten opponents in the final rounds (Tsonga and Juan Martin del Potro), but Elo didn’t think that highly of them at the time.

Azarenka: asterisk squared

Evaluating the WTA title is trickier. Part of the problem is the small number of “Premier Mandatory” events, and the fact that two of them (Indian Wells and Miami) have substantially larger draws, and are thus that much harder to win. The even bigger issue is how to think about Azarenka’s final-round walkover.

Let’s start with the numbers. If we consider the five opponents that Vika defeated on court and calculate the odds that an average WTA Premier (not just Premier Mandatory) champion would beat them, her path-to-the-title number is 20.7%. If we add Osaka to the mix, on the theory that Azarenka should get credit for beating her, the resulting number is 7.4%.
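Those two figures also tell us how much a hypothetical Osaka final is worth. A quick back-of-the-envelope check, assuming the path odds are a simple product of independent match probabilities (the variable names are just for illustration):

```python
# Figures quoted in the text, as fractions.
on_court_odds = 0.207    # the five opponents Azarenka beat on court
with_osaka_odds = 0.074  # the same path plus a win over Osaka

# If path odds multiply match by match, the ratio is the implied chance
# that an average Premier champion beats Osaka.
implied_vs_osaka = with_osaka_odds / on_court_odds

print(f"Implied odds against Osaka: {implied_vs_osaka:.1%}")
```

Since both inputs are rounded, the result is only approximate: a bit better than one-in-three for the average Premier champion against Osaka.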

Compared to the ATP numbers above, those sound pretty good. But the devil lies in the tournament-category details–the average WTA Premier event is much weaker than a marquee (dare I say “premier”?) tour stop like Cincinnati. Here’s how the Cinci title-paths stack up for the last dozen years:

Title Odds  Year  Winner
20.7%       2020  Victoria Azarenka  (W/O Osaka)  
7.4%        2020  Victoria Azarenka  (d. Osaka)   
7.3%        2016  Karolina Pliskova             
5.5%        2010  Kim Clijsters                 
5.5%        2012  Li Na                         
5.3%        2015  Serena Williams               
4.5%        2011  Maria Sharapova               
4.3%        2014  Serena Williams               
4.2%        2017  Garbine Muguruza              
3.9%        2019  Madison Keys                  
2.9%        2013  Victoria Azarenka             
2.0%        2009  Jelena Jankovic               
1.3%        2018  Kiki Bertens

20.7% is respectable for a run-of-the-mill Premier–in fact, Vika’s 2016 Brisbane title was almost exactly the same, at 20.8%. But Cincinnati reliably offers tougher competition. Even if we factor in the difficulty of beating Osaka, Azarenka’s path was (barely) the easiest at the event since the Premier-level designation came into being.

Yay, nay, meh

I’ll reiterate a main point from my last article about the US Open asterisk debate: There’s no simple yes or no answer when it comes to whether a title should “count.” (That’s assuming that you even think there are circumstances under which a title should be formally discounted.) Long before the COVID-19 pandemic messed with everything, there were titles–even at the grand slam level–that were a lot easier to win than others.

Djokovic’s championship falls squarely within the usual continuum, even if it will go down as one of his least challenging. Azarenka’s is tougher to define, but more because of Osaka’s withdrawal than because of the weakness of the field. The level of competition, despite missing many top players, was plenty good enough to offer Azarenka a path to the title that was comparable to at least one recent Cinci championship, and plenty of other top-tier events.

With that in mind, I’ll leave you with a couple of predictions. First: the US Open champions will face relatively easy paths to their titles, but like Djokovic’s, they will fall on the established continuum. And second: by the end of the fortnight, you’ll hope to never hear the word “asterisk” again.

How Sports are (Analytically) Different in the Bubble

Most of the world’s major sports have resumed, or will pick up again soon, in some form or other. But a lot is different, with most leagues forming one or more bubbles, often excluding fans, limiting travel, and tweaking things like officiating rules to better maintain social distance.

Many of these changes have second-order effects. For instance, the “Cincinnati” tennis event requires that players fetch their own towels–which probably slows down play–but has no fans–which could accelerate it. We’ll soon have enough data to draw some preliminary conclusions about the overall effect of the new rules on pace of play.

Some of the issues that arise when a league moves into a bubble apply across sports, like home-court advantage. With that in mind, I’m gathering evidence of how sports are playing differently in our time of social distance. I’ll try to keep this post updated as we learn more. The comments are open, so you can contribute any demonstrated effects that I haven’t listed here. (Or similar effects in other sports.) You can also tweet at me.

Baseball

So far, home-field advantage is almost non-existent. Historically, home teams win about 54% of games.

Basketball

NBA offenses can’t stop scoring. Refs are calling more fouls, and fewer off-court distractions get in the way of making shots.

The WNBA is showing the effects of a league full of fresh legs, and has displayed a record-setting pace of play. And despite playing on the same court every night, there is a marked home-court advantage.

Hockey

Fighting is up! Lucky NHLers–most of us don’t go to work where it’s culturally acceptable to hit people.

Soccer

Home-field advantage is reduced, but it still exists, even behind closed doors. A recent paper (summary / PDF) notes that refs have been more lenient than usual toward away teams. That tallies with long-held conventional wisdom that home-advantage stems from officiating bias, which is driven by noisy, partisan crowds.

Speaking of officiating, refs were more likely to grant penalty kicks, but despite the quieter environment, penalties aren’t converted any more often.

For more detail on home-field advantage in various leagues since the restart, here is a valuable Twitter thread from @recspecs730.

Tennis

I’m keeping tabs on whether match results are less predictable than usual. (They are, but we haven’t really seen enough to be sure.) Other than that, it’s still speculation. We’ll know more after “Cincinnati,” and much more after the US Open.

US Open Asterisk Talk is Premature. It Might be Flat-Out Wrong.

Many high-profile players will be missing from the 2020 US Open. Rafael Nadal opted out of the abbreviated North American swing, and Roger Federer will miss the rest of the season due to injury. More than half of the WTA top ten is skipping Flushing Meadows as well. The thinned-out fields increase the odds that a few remaining favorites, such as Novak Djokovic and Serena Williams, add another major trophy to their collection.

As a result, pundits and fans are discussing whether the 2020 US Open deserves an “asterisk.” The idea is that, because of the depleted fields, this slam is worth less than others, so much so that the history books* should note the relative meaninglessness of this year’s titles.

* Nobody buys history books anymore, so we’re really talking** about a page on the US Open website, and a never-ending edit war on Wikipedia.

** Yes, I see the irony.

From what I’ve seen, people are thinking about this the wrong way. Yes, a weak field makes it easier–in theory–to win the tournament. It’s certainly true that the 2020 champions won’t have to go through Nadal or Ashleigh Barty to get their hardware. But the field isn’t what matters.

The field isn’t what matters

I repeated that on purpose, because it’s that important. The winner of a grand slam must get through seven matches. The difficulty of securing the title depends almost entirely on his or her opponents in those seven matches. Each main draw consists of 128 players, but 120 of them are mostly irrelevant.

I say “mostly” because I can foresee some objections. Sometimes a player can compete so hard in a loss that they weaken their opponent for the next round. Take the 2009 Madrid Masters, in which Nadal needed four hours to defeat Djokovic in the semi-final, then lost to Federer in the final. We could say that Djokovic’s presence was relevant, even though Federer won the title without playing him. That sort of thing happens, though probably not as much as you think. Even when it does, it needn’t be a top-tier player who wears out their opponent in an early round.

Another objection is that a depleted field affects seedings. For instance, Serena’s current WTA ranking is 9th, an unenviable position going into most slams. The 9th seed lines up for a fourth-round match with a top-eight player, meaning that she could face four top-eight players en route to the title. But with all the absences, Williams will instead be seeded third, behind only Karolina Pliskova and Sofia Kenin.

I’m not dismissing these concerns out of hand. They do matter a bit. But they only matter insofar as they affect the way the tournament plays out. The difference between the difficulties facing the 3rd and 9th seeds could be enormous … or it could be nothing, especially if the draw is riddled with early upsets.

Difficulty is a continuum

Even if you grant some credence to the objections above (or others that I haven’t mentioned), I hope you’ll agree that the most meaningful obstacles standing between a player and a grand slam title are the seven opponents he or she will need to overcome.

If those seven opponents are, on average, very strong, we would say that the player faced a particularly tough path to a slam title. Take Stan Wawrinka’s 2014 Australian Open title: he beat both Djokovic and Nadal at a time when those two were dominating the game. If the collective skill level of the seven opponents doesn’t amount to much–at least by grand slam standards–we’d say it was an easy path. For example, Federer clinched the 2006 Australian Open despite facing only a single player ranked in the top 20, and none in the top four.

We can quantify path difficulty in a variety of ways. One approach that will be useful here is to calculate the odds that an average slam champion would beat those seven opponents. The difference between easy and hard championships is enormous. The typical major titlist (that is, someone with an Elo rating around 2100) would have had a 3.3% chance of beating the seven men that Wawrinka drew in Melbourne the year that he won. Only two slam paths have ever been tougher: Mats Wilander’s routes to the 1982 and 1985 French Open titles. By contrast, the average slam champion would have had a 51% chance of going 7-0 when faced by Federer’s 2006 Australian Open draw.

The extreme “easy” draw is fifteen times easier than the extreme “hard” draw. Fifteen times! You can find plenty of champions for any approximate level of difficulty in between those extremes. The typical slam champ would’ve had a 10% chance of doing what Djokovic did in progressing through seven rounds at the 2011 US Open. Same in New York in 2012. Andy Murray’s 2016 Wimbledon path would have given the average champion a 20% chance. The 2018 Roland Garros draw was manageable for Rafael Nadal, and a typical major titlist would have had a 30% chance of securing those seven match wins.
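To put the two extremes on a per-match footing, here is a rough sketch using only the 3.3% and 51% figures quoted above. It assumes seven independent matches and asks what single win probability, repeated seven times, would produce each path number. That "equivalent" per-match figure is my own illustrative framing, not a number from the underlying ratings.

```python
hard_path = 0.033  # average champion vs. Wawrinka's 2014 Australian Open draw
easy_path = 0.51   # average champion vs. Federer's 2006 Australian Open draw
ROUNDS = 7


def per_match_equivalent(path_odds, rounds=ROUNDS):
    """Constant per-match win probability that would yield `path_odds`
    over `rounds` independent matches (a geometric mean)."""
    return path_odds ** (1.0 / rounds)


ratio = easy_path / hard_path

print(f"Easy path is {ratio:.1f}x more likely than the hard one")
print(f"Hard path, per match: {per_match_equivalent(hard_path):.1%}")
print(f"Easy path, per match: {per_match_equivalent(easy_path):.1%}")
```

Seen this way, the hard path is like winning seven straight matches at a bit over 60% apiece, while the easy one is like seven matches at around 90% each.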

None of this is to say that any of those players did or didn’t “deserve” their titles. Federer didn’t choose his 2006 Melbourne opponents any more than Wawrinka selected his foes eight years later. The trophy is the same, and in many important ways, their achievements are the same–both of the Swiss stars swept away all of their opponents, who in turn were the best performers (at least during those fortnights) of the players who showed up.

Asterisks for everybody

Here’s another thing 2006 Roger and 2014 Stan had in common: Almost all of the best players in the world participated in the tournaments that they ultimately won. (I say “almost” because defending champion Marat Safin was injured and missed the 2006 Aussie Open.) The “field” was effectively the same, but to win the titles, one player cruised through a two-week cakewalk and the other needed to put together one of the most impressive final weeks of the modern era.

Tennis fans have collectively decided that each major title counts as “one.” It doesn’t have to be that way: We could give more “slam points” for achievements like Wawrinka’s and grant fewer for the easy ones. Most people don’t like this idea, and I admit that it sounds a bit weird. I’m not advocating it for general use, though it is an interesting concept that I’ve pursued in a number of earlier articles, showing that Djokovic’s majors are–on average–more impressive than Nadal’s, which in turn have been tougher than Federer’s. Weighting majors by difficulty results in some changes in the order of the all-time grand slam list, ensuring that fans of all players hate me because I wrote some code and played with some spreadsheets.*

* With, I admit, malice aforethought.

Adjusting slam counts for difficulty is, in a sense, asterisking every slam title. The tricky draws get an acknowledgement of their difficulty, and the ones that opened up get tweaked to account for their ease. It’s a continuum, not a simple up-and-down decision between normal slams and abnormal slams.

The 2020 US Open champions will probably have title paths that sit in the easier half of that continuum. But even that modest claim is far from guaranteed.

Let’s say Venus Williams recaptures her vintage form and wins the title, beating 3rd seed Serena in the quarter-finals, 2nd seed Kenin in the semis, and top seed Pliskova in the title match. (It doesn’t matter if the surprise winner is Venus–it could be any lower-ranked player, though Venus seems more plausible than most.) An average slam champion would beat those three players in succession about 37% of the time. 37% is already lower odds than about 20% of women’s slam draws in the last 45 years. (Kenin’s Australian Open title rated 39%.)

37% for Venus’s hypothetical title isn’t even the whole story–four more rounds of journeywomen would knock the number down to around 26%–harder than one-third of women’s slam draws. Add in another tricky opponent or two–maybe Cori Gauff, or Petra Kvitova in the fourth round–and suddenly the path to the 2020 US Open women’s championship is just as hard as the typical slam.
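The step from 37% to 26% can be unpacked with a little arithmetic. The sketch below assumes independent matches and the standard Elo win-probability formula. It works backward from the two quoted figures to what the four "journeywoman" rounds imply, so the outputs are rough inferences rather than numbers taken from the ratings themselves.

```python
import math

top_three_odds = 0.37  # average champion vs. Serena, Kenin, and Pliskova
full_path_odds = 0.26  # the same, plus four early-round opponents
EXTRA_ROUNDS = 4

# Combined chance of getting through the four early rounds.
early_rounds_odds = full_path_odds / top_three_odds

# Equivalent constant win probability for each of those rounds.
per_round = early_rounds_odds ** (1 / EXTRA_ROUNDS)


def elo_gap(win_prob):
    """Elo rating difference implied by a win probability (standard formula)."""
    return 400 * math.log10(win_prob / (1 - win_prob))


print(f"Early rounds combined: {early_rounds_odds:.1%}")
print(f"Per early round:       {per_round:.1%}")
print(f"Implied Elo gap:       {elo_gap(per_round):.0f} points")
```

In other words, the early opponents in this scenario are the kind that an average slam champion beats about nine times out of ten, and four of them in a row still shave a meaningful chunk off the title odds.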

It’s even easier to illustrate how the 2020 US Open men’s title could be as difficult as many other slams. By the numbers, simply upsetting Djokovic (simply! ha!) is more difficult than it was to defeat all seven of Federer’s opponents at the 2006 Australian Open. That’s right: Six withdrawals and one win over Novak wouldn’t be the easiest slam victory in the last 15 years. Tack on six actual wins, including a few against strong opponents, and the result is a seven-match path that stands up against the typical non-pandemic slam.

Ironically, the player who could win the title with the weakest possible draw is Djokovic. It would be odd to claim that any of Novak’s accomplishments should be asterisked, but it does make things much simpler when he doesn’t have to beat himself.

Masked competitiveness

Once again, the field doesn’t really matter. When we focus on the players who are in New York instead of the few dozen who aren’t, we see that the ingredients are in place for a couple of respectable paths to US Open titles. Wilander’s and Wawrinka’s marks are probably safe, but it’s more than possible that the winners will have faced competition equivalent to that of the average slam champ.

At the very least, we don’t know any better until the tail end of the second week. Until then, asterisk talk is premature. After that, it will probably be moot.