Podcast Episode 90: Joshua Robinson on Global Sports (and Tennis) in a Tough Pandemic Year

In Episode 90 of the Tennis Abstract Podcast, Jeff and Carl welcome Joshua Robinson (@joshrobinson23), European sports reporter for the Wall Street Journal and co-author of the book The Club: How the English Premier League Became the Wildest, Richest, Most Disruptive Force in Sports. Josh first joined me for an episode just over a year ago, back in December 2019, and it’s great to get another round of his insights. If you haven’t read his book, I highly recommend it, even if you’re not a soccer fan.

In this episode, we run the gamut of Covid-in-sports topics, including the fate of the 2020/21 Tokyo Olympics, the outlook for athletes who want to jump the vaccine queue, the miraculously completed Tour de France, how Wimbledon’s response to the pandemic might have been the best of all, and what to expect in international sports once vaccines are widely available. Josh has written about most of these subjects, and I encourage you to browse his archives at the WSJ website.

We also touch on a few non-Covid questions, like what Slovenian sports can teach the rest of the world, and the role of the underhand serve. We close with a few words about our late friend and colleague, Tom Perrotta.

Thanks for listening!

Also, one last reminder: Next week we’ll be talking about our first book club pick, A Handful of Summers by Gordon Forbes. Let us know if you have thoughts about the book, questions for us to discuss on the show, or suggestions for future book club selections.

Fans of the TA podcast will also want to check out Dangerous Exponents, the new Covid-19 podcast that Carl and I are doing. Today we released episode 8, about issues with the global vaccine rollout.

(Note: this week’s episode is about 59 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

What Happens to the Pace of Play Without Fans, Challenges, or Towelkids?

The COVID-19 pandemic has forced some experimentation on the US Open ahead of schedule. After just a couple of years at marginal events such as the NextGen Finals, Hawkeye’s live line-calling system is taking over (on most courts) for human line judges. Another NextGen-tested innovation, requiring players to fetch their own towels, has also arrived for social distancing reasons.

Automated line-calling and towel-fetching pale in comparison to the biggest change for the bubble slam: no fans. The biggest stars now get to experience what has long been de rigueur for qualifiers and challengers: high-stakes competition with no one in the stands watching.

All of these changes come not long after the US Open (and a few other tournaments) finally adopted a serve clock. I’ve written ad nauseam* about the effect of the serve clock, which is nominally designed to speed up play, but in practice has slowed it down. The problem is that chair umpires start the clock when they announce the score, which is not always immediately after the preceding point. The bigger the crowd, the more serious the discrepancy, as noisy fans tend to delay announcements from the chair.

* Incidentally, this is also the Latin term for a long game with many deuces.

Therefore, the pace of play should be faster with no fans, right? Use of the Hawkeye live system also eliminates challenges, which should speed things up a little more. The counteracting force is the time it takes players to fetch their towels. It would be nice to evaluate each of these effects in isolation*, but most of the data we have comes from matches with all of these changes at once.

* No pun intended.

The net effect

The most straightforward measurement of pace of play is seconds per point, where we simply take the official match time and divide by the total number of points. It’s an approximate measure, because official match time includes changeovers, medical timeouts, and all sorts of other delays which have nothing to do with how long it takes for players to get themselves to the line and hit a serve. It also captures a bit of first serve percentage (second serve points take more time) and rally length (longer rallies take more time), although these factors mostly wash out, especially when comparing pace of play at the same tournament from one year to the next.
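
As a minimal sketch of that calculation: the function name and the sample match below are illustrative assumptions, not figures from my data, and I am assuming the official match time is recorded in minutes.

```python
def seconds_per_point(match_minutes: float, total_points: int) -> float:
    """Official match time divided by the number of points played, in seconds."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return match_minutes * 60 / total_points


# Hypothetical match: 95 minutes on the official clock, 142 points played.
pace = seconds_per_point(95, 142)
print(f"{pace:.1f} seconds per point")
```

Because changeovers and other stoppages are baked into the match time, the result is best read as a relative measure for comparing events, not as the literal gap between points.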

The following graph shows seconds per point for all Cincinnati (and “Cincinnati”) main draw men’s singles matches each year since 2000:

(I’m looking only at pace of play for men’s matches because I don’t have match time for women before 2016. Lame, I know.)

Over the 21-year span, the average time per point is just under 40 seconds, and before 2020, the yearly average exceeded 42 seconds only once. This year, Cinci clocked in at a whopping 44.6 seconds per point, more than three standard deviations above the 2000-2017 (that is, pre-serve clock) average. The pace has gradually slowed down over the years for reasons unrelated to the serve clock, so it’s probably overstating things a bit to say that the effect of the bubble is 3 SD, but it’s clear that 2020 was slow.
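
For readers who want to run the same kind of check, here is a rough sketch of the standard-deviation comparison. The baseline list is made up purely for illustration (it is not the real 2000-2017 Cincinnati series), and the choice of the sample standard deviation is my assumption.

```python
from statistics import mean, stdev


def z_score(value: float, baseline: list[float]) -> float:
    """How many sample standard deviations `value` sits above the baseline mean."""
    return (value - mean(baseline)) / stdev(baseline)


# HYPOTHETICAL yearly averages (seconds per point), standing in for 2000-2017.
hypothetical_baseline = [
    38.1, 38.4, 38.0, 38.9, 39.2, 38.7, 39.5, 39.1, 39.8,
    40.2, 39.6, 40.5, 40.1, 40.9, 40.4, 41.2, 40.8, 41.5,
]

# 44.6 is the 2020 figure quoted in the text.
print(f"z = {z_score(44.6, hypothetical_baseline):.2f}")
```

With a series that trends upward, as this one does, the standard deviation around a flat mean overstates how unusual a late-period value is, which is the caveat raised above.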

But wait, what about

All four of this year’s men’s semi-finalists are rather deliberate, so you might think that the slow average pace is due in part to the mix of players who won a lot of matches. That’s what I thought too, but it’s not so. (It helps to remember that more than half of a tournament’s matches are in the first two rounds, even with some first-round byes, so we’re guaranteed a decent mix of players for calculations like this, no matter who advances.)

First, I re-did the seconds-per-point calculations above, but excluded all matches with Novak Djokovic or Rafael Nadal, two guys who win a lot of matches and are known to play slowly. It didn’t really matter. I won’t bother to print a second graph, because it looks essentially the same as the one above.

Another approach is to consider the average pace of play for each player in the draw, and compare his seconds per point in Cincinnati to his seconds per point at other events. If every man played at the same speed in Cincinnati that he did on average in 2019, the average seconds per point at the 2020 Cinci event would have been 41.3. That’s just barely above the 2019 Cinci figure of 41.0, and of course it is far below the actual rate of 44.6 seconds per point. The mix of players can’t account for 2020’s glacial pace.
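
One way to sketch that player-mix adjustment is to weight each player's baseline pace by how many points he played at the event. The weighting scheme and the three entries below are assumptions for illustration; they are not the actual 2020 field or necessarily the exact method behind the 41.3 figure.

```python
def expected_pace(entries: list[tuple[float, int]]) -> float:
    """Points-weighted mean of each player's baseline seconds per point.

    Each entry is (baseline seconds per point, points played at the event).
    """
    total_points = sum(points for _, points in entries)
    if total_points == 0:
        raise ValueError("no points played")
    return sum(baseline * points for baseline, points in entries) / total_points


# HYPOTHETICAL field: three players with different baselines and workloads.
field = [(44.0, 600), (39.5, 450), (41.0, 300)]
print(f"{expected_pace(field):.1f} seconds per point expected from the player mix")
```

If the observed event average is well above this expected figure, the gap cannot be blamed on which players happened to win a lot of matches.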

But why?

I hope you’re with me thus far that the pace of play in the 2020 Cincinnati men’s event was very slow. It seems reasonable to assume that the US Open will be the same, because the conditions and rules are identical.

The simplest explanation is that players are spending extra time fetching their own towels.*

* No, you’re a towel.

It’s true–walking to and from the towel takes time. But it’s not the whole story. At the typical non-bubble rate of 40 seconds per point (again, including changeovers and other delays), there are plenty of points where the umpire delays calling the score and the server ends up taking longer than the rulebook-permitted 25 seconds without getting called for a time violation. So if the average is now pushing 45 seconds, there must be a lot of points like that.

Anecdotally, there definitely are such points. In the Cincinnati semi-final, I noticed one instance in which Roberto Bautista Agut used more than 40 seconds before serving. He’s not the only offender: All four men’s semi-finalists (among many others) occasionally used more than 25 seconds. My impression was that, ironically, Djokovic was the speediest of the four.

Chair umpires are using their discretion to act as if there are fans making noise. After long points, they often wait to call the score, and even when they announce the score immediately, they hold off several more seconds before starting the clock. In one glaring instance in the Lexington final, the umpire waited a full 17 seconds after the previous point ended before the clock showed 0:25. The broadcast camera angles at the National Tennis Center made it hard to measure the same thing for Cincinnati matches, but given the length of time between points and the dearth of time violation penalties, there must have been other delays in the range of 15 to 20 seconds.

With no fans delaying play, and no tactical challenges to force a delay, a slow pace is something that the umpire can control. Yes, towel-fetching takes time, but if the 25-second clock starts immediately and it is enforced, players will make it back to the line in time–matches at the NextGen Finals were generally brisk. But apparently, enforcing the rulebook-standard pace is not something that the officials are willing to do. We’re two years into the great tennis serve-clock experiment, and the game just keeps getting slower.

How Should We Value the Masters and Premier Titles in the Bubble?

Tennis is back, but plenty of top players are still at home–or crashing out in the early rounds of their first tournament in months. While the ATP “Cincinnati” Masters event delivered the expected winner in Novak Djokovic, the Serb never had to face a top-ten opponent. The same was true of Victoria Azarenka, who won the WTA Premier tournament with the benefit of Naomi Osaka’s withdrawal in the final round, and without playing a top-tenner on her way there.

The tennis world’s “asterisk” talk has mostly focused on the US Open, since most people care about slams and don’t care about anything else. But judging from these easy paths to the two Cincinnati titles, should we be talking asterisk about the event just passed?

Novak’s 35th, but not (quite) his easiest

Last week, I explained why I thought the asterisk talk was premature, if not wrong. The field doesn’t matter, because the player who wins the title faces only a handful of players. The presence of, say, Rafael Nadal doesn’t have much to do with the difficulty of winning the title unless the eventual winner has to go through Rafa. If the champion’s opponents are very good, the path to the title is hard; if they are relatively weak, the path to the title is easy. Keep in mind that I’m using “good” and “weak” in a theoretical sense. On paper, Djokovic was fortunate that his semi-final and final opponents were ranked 12th and 30th, respectively, and his title path was “easy.” As it happened, he was forced to work hard for both wins.

We now know that the title paths of the Cincinnati champions were relatively easy. But just how weak were they?

I calculate the difficulty of a path-to-the-title by determining the probability that the average Masters champion on that surface would beat the opponents that the champion faced. By using the “average Masters champion,” we are taking the skill level of the actual champ out of the equation, and looking only at the quality of his opposition. The resulting numbers vary wildly, from 2.5%–the odds that a typical Masters champion would have beaten the players that Jo Wilfried Tsonga defeated to win the 2014 Canada Masters–to 61.2%–the chances that an average titlist would have beaten the players that confronted Nikolay Davydenko at the 2006 Paris Masters.
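
Here is a minimal sketch of that calculation using the standard Elo win-probability formula. The champion rating and the opponent ratings are placeholders I chose for illustration, and multiplying the match probabilities assumes the matches are independent.

```python
from math import prod


def elo_win_prob(rating: float, opponent: float) -> float:
    """Standard Elo expectation that `rating` beats `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


def path_odds(champion_rating: float, opponent_ratings: list[float]) -> float:
    """Probability of beating every opponent in turn, assuming independent matches."""
    return prod(elo_win_prob(champion_rating, r) for r in opponent_ratings)


# PLACEHOLDER values: an "average champion" rating and five opponents.
AVG_CHAMPION_ELO = 2100.0
opponents = [1700.0, 1800.0, 1850.0, 1900.0, 2000.0]
print(f"{path_odds(AVG_CHAMPION_ELO, opponents):.1%}")
```

Holding the champion rating fixed is what removes the actual winner's skill from the number: two players who beat the same five opponents get the same path difficulty.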

Novak’s number this week was 40.5%. In other words, an average hard-court Masters champion would have a four-in-ten shot at beating the five guys that fate threw in Djokovic’s path. That’s the 11th easiest Masters title since 1990:

Title Odds  Tournament       Winner             
61.2%       2006 Paris       Nikolay Davydenko  
50.5%       2012 Paris       David Ferrer       
49.8%       2000 Paris       Marat Safin        
48.3%       2004 Paris       Marat Safin        
47.0%       1999 Paris       Andre Agassi       
44.5%       2013 Shanghai    Novak Djokovic     
43.3%       2002 Madrid      Andre Agassi       
42.9%       2005 Paris       Tomas Berdych      
41.4%       2009 Canada      Andy Murray        
41.3%       2017 Paris       Jack Sock          
40.5%       2020 Cincinnati  Novak Djokovic     
39.6%       2011 Shanghai    Andy Murray        
39.1%       2019 Canada      Rafael Nadal       
37.9%       2008 Rome        Novak Djokovic     
36.2%       2007 Cincinnati  Roger Federer

Unless we’re prepared to put a permanent asterisk next to the Paris Masters, we should hold off on cheapening this year’s Cincinnati title. Surprisingly, Djokovic’s path was even easier at the 2013 Shanghai Masters. He had to face two top-ten opponents in the final rounds (Tsonga and Juan Martin del Potro), but Elo didn’t think that highly of them at the time.

Azarenka: asterisk squared

Evaluating the WTA title is trickier. Part of the problem is the small number of “Premier Mandatory” events, and the fact that two of them (Indian Wells and Miami) have substantially larger draws, and are thus that much harder to win. The even bigger issue is how to think about Azarenka’s final-round walkover.

Let’s start with the numbers. If we consider the five opponents that Vika defeated on court and calculate the odds that an average WTA Premier (not just Premier Mandatory) champion would beat them, her path-to-the-title number is 20.7%. If we add Osaka to the mix, on the theory that Azarenka should get credit for beating her, the resulting number is 7.4%.
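
Those two figures also imply how an average Premier champion would be expected to fare against Osaka alone. This is a back-of-the-envelope sketch that works from the rounded percentages above and assumes the path number is a simple product of match probabilities.

```python
# Path odds quoted in the text (rounded).
path_without_osaka = 0.207  # five on-court wins
path_with_osaka = 0.074     # the same five wins plus a win over Osaka

# If the path is a product of match probabilities, dividing isolates the final.
implied_final_win_prob = path_with_osaka / path_without_osaka
print(f"Implied chance of beating Osaka: {implied_final_win_prob:.0%}")
```

In other words, the hypothetical final alone would have been tougher than a coin flip for a typical titlist, which is why the walkover changes the number so much.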

Compared to the ATP numbers above, those sound pretty good. But the devil lies in the tournament-category details–the average WTA Premier event is much weaker than a marquee (dare I say “premier”?) tour stop like Cincinnati. Here’s how the Cinci title-paths stack up for the last dozen years:

20.7%       2020  Victoria Azarenka  (W/O Osaka)  
7.4%        2020  Victoria Azarenka  (d. Osaka)   
7.3%        2016  Karolina Pliskova             
5.5%        2010  Kim Clijsters                 
5.5%        2012  Li Na                         
5.3%        2015  Serena Williams               
4.5%        2011  Maria Sharapova               
4.3%        2014  Serena Williams               
4.2%        2017  Garbine Muguruza              
3.9%        2019  Madison Keys                  
2.9%        2013  Victoria Azarenka             
2.0%        2009  Jelena Jankovic               
1.3%        2018  Kiki Bertens

20.7% is respectable for a run-of-the-mill Premier–in fact, Vika’s 2016 Brisbane title was almost exactly the same, at 20.8%. But Cincinnati reliably offers tougher competition. Even if we factor in the difficulty of beating Osaka, Azarenka’s path was (barely) the easiest at the event since the Premier-level designation came into being.

Yay, nay, meh

I’ll reiterate a main point from my last article about the US Open asterisk debate: There’s no simple yes or no answer when it comes to whether a title should “count.” (That’s assuming that you even think there are circumstances under which a title should be formally discounted.) Long before the COVID-19 pandemic messed with everything, there were titles–even at the grand slam level–that were a lot easier to win than others.

Djokovic’s championship falls squarely within the usual continuum, even if it will go down as one of his least challenging. Azarenka’s is tougher to define, but more because of Osaka’s withdrawal than because of the weakness of the field. The level of competition, despite missing many top players, was plenty good enough to offer Azarenka a path to the title that was comparable to at least one recent Cinci championship, and plenty of other top-tier events.

With that in mind, I’ll leave you with a couple of predictions. First: the US Open champions will face relatively easy paths to their titles, but like Djokovic’s, they will fall on the established continuum. And second: by the end of the fortnight, you’ll hope to never hear the word “asterisk” again.

How Sports are (Analytically) Different in the Bubble

Most of the world’s major sports have resumed, or will pick up again soon, in some form or other. But a lot is different, with most leagues forming one or more bubbles, often excluding fans, limiting travel, and tweaking things like officiating rules to better maintain social distance.

Many of these changes have second-order effects. For instance, the “Cincinnati” tennis event requires that players fetch their own towels–which probably slows down play–but has no fans–which could accelerate it. We’ll soon have enough data to draw some preliminary conclusions about the overall effect of the new rules on pace of play.

Some of the issues that arise when a league moves into a bubble apply across sports, like home-court advantage. With that in mind, I’m gathering evidence of how sports are playing differently in our time of social distance. I’ll try to keep this post updated as we learn more. The comments are open, so you can contribute any demonstrated effects that I haven’t listed here. (Or similar effects in other sports.) You can also tweet at me.

Baseball

So far, home-field advantage is almost non-existent. Historically, home teams win about 54% of games.

Basketball

NBA offenses can’t stop scoring. Refs are calling more fouls, and fewer off-court distractions get in the way of making shots.

The WNBA is showing the effects of a league full of fresh legs, and has displayed a record-setting pace of play. And despite playing on the same court every night, there is a marked home-court advantage.

Hockey

Fighting is up! Lucky NHLers–most of us don’t go to work where it’s culturally acceptable to hit people.

Soccer

Home-field advantage is reduced, but it still exists, even behind closed doors. A recent paper (summary / PDF) notes that refs have been more lenient than usual toward away teams. That tallies with long-held conventional wisdom that home-advantage stems from officiating bias, which is driven by noisy, partisan crowds.

Speaking of officiating, refs were more likely to grant penalty kicks, but despite the quieter environment, penalties aren’t converted any more often.

For more detail on home-field advantage in various leagues since the restart, here is a valuable Twitter thread from @recspecs730.

Tennis

I’m keeping tabs on whether match results are less predictable than usual. (They are, but we haven’t really seen enough to be sure.) Other than that, it’s still speculation. We’ll know more after “Cincinnati,” and much more after the US Open.

US Open Asterisk Talk is Premature. It Might be Flat-Out Wrong.

Many high-profile players will be missing from the 2020 US Open. Rafael Nadal opted out of the abbreviated North American swing, and Roger Federer will miss the rest of the season due to injury. More than half of the WTA top ten is skipping Flushing Meadows as well. The thinned-out fields increase the odds that a few remaining favorites, such as Novak Djokovic and Serena Williams, add another major trophy to their collection.

As a result, pundits and fans are discussing whether the 2020 US Open deserves an “asterisk.” The idea is that, because of the depleted fields, this slam is worth less than others, so much so that the history books* should note the relative meaninglessness of this year’s titles.

* Nobody buys history books anymore, so we’re really talking** about a page on the US Open website, and a never-ending edit war on Wikipedia.

** Yes, I see the irony.

From what I’ve seen, people are thinking about this the wrong way. Yes, a weak field makes it easier–in theory–to win the tournament. It’s certainly true that the 2020 champions won’t have to go through Nadal or Ashleigh Barty to get their hardware. But the field isn’t what matters.

The field isn’t what matters

I repeated that on purpose, because it’s that important. The winner of a grand slam must get through seven matches. The difficulty of securing the title depends almost entirely on his or her opponents in those seven matches. Each main draw consists of 128 players, but 120 of them are mostly irrelevant.

I say “mostly” because I can foresee some objections. Sometimes a player can compete so hard in a loss that they weaken their opponent for the next round. Take the 2009 Madrid Masters, in which Nadal needed four hours to defeat Djokovic in the semi-final, then lost to Federer in the final. We could say that Djokovic’s presence was relevant, even though Federer won the title without playing him. That sort of thing happens, though probably not as much as you think. Even when it does, it needn’t be a top tier player who wears out their opponent in an early round.

Another objection is that a depleted field affects seedings. For instance, Serena’s current WTA ranking is 9th, an unenviable position going into most slams. The 9th seed lines up for a fourth-round match with a top-eight player, meaning that she could face four top-eight players en route to the title. But with all the absences, Williams will instead be seeded third, behind only Karolina Pliskova and Sofia Kenin.

I’m not dismissing these concerns out of hand. They do matter a bit. But they only matter insofar as they affect the way the tournament plays out. The difference between the difficulties facing the 3rd and 9th seeds could be enormous … or it could be nothing, especially if the draw is riddled with early upsets.

Difficulty is a continuum

Even if you grant some credence to the objections above (or others that I haven’t mentioned), I hope you’ll agree that the most meaningful obstacles standing between a player and a grand slam title are the seven opponents he or she will need to overcome.

If those seven opponents are, on average, very strong, we would say that the player faced a particularly tough path to a slam title. Take Stan Wawrinka’s 2014 Australian Open title: he beat both Djokovic and Nadal at a time when those two were dominating the game. If the collective skill level of the seven opponents doesn’t amount to much–at least by grand slam standards–we’d say it was an easy path. For example, Federer clinched the 2006 Australian Open despite facing only a single player ranked in the top 20, and none in the top four.

We can quantify path difficulty in a variety of ways. One approach that will be useful here is to calculate the odds that an average slam champion would beat those seven opponents. The difference between easy and hard championships is enormous. The typical major titlist (that is, someone with an Elo rating around 2100) would have had a 3.3% chance of beating the seven men that Wawrinka drew in Melbourne the year that he won. Only two slam paths have ever been tougher: Mats Wilander’s routes to the 1982 and 1985 French Open titles. By contrast, the average slam champion would have had a 51% chance of going 7-0 when faced by Federer’s 2006 Australian Open draw.

The extreme “easy” draw is fifteen times easier than the extreme “hard” draw. Fifteen times! You can find plenty of champions for any approximate level of difficulty in between those extremes. The typical slam champ would’ve had a 10% chance of doing what Djokovic did in progressing through seven rounds at the 2011 US Open. Same in New York in 2012. Andy Murray’s 2016 Wimbledon path would have given the average champion a 20% chance. The 2018 Roland Garros draw was manageable for Rafael Nadal, and a typical major titlist would have had a 30% chance of securing those seven match wins.
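
To put those extremes on a per-match footing, here is a quick sketch that uses only the 3.3% and 51% figures quoted above. The geometric-mean framing is my own illustration, not part of the original calculation.

```python
def per_match_average(path_odds: float, matches: int = 7) -> float:
    """Geometric-mean win probability per match implied by a full-path figure."""
    return path_odds ** (1 / matches)


hard_path = 0.033  # Wawrinka, 2014 Australian Open
easy_path = 0.51   # Federer, 2006 Australian Open

print(f"Easy path is {easy_path / hard_path:.1f}x more likely to be completed")
print(f"Hard path: about {per_match_average(hard_path):.0%} per match")
print(f"Easy path: about {per_match_average(easy_path):.0%} per match")
```

A gap of roughly thirty percentage points per match, compounded over seven rounds, is what produces the fifteen-fold difference.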

None of this is to say that any of those players did or didn’t “deserve” their titles. Federer didn’t choose his 2006 Melbourne opponents any more than Wawrinka selected his foes eight years later. The trophy is the same, and in many important ways, their achievements are the same–both of the Swiss stars swept away all of their opponents, who in turn were the best performers (at least during those fortnights) of the players who showed up.

Asterisks for everybody

Here’s another thing 2006 Roger and 2014 Stan had in common: Almost all of the best players in the world participated in the tournaments that they ultimately won. (I say “almost” because defending champion Marat Safin was injured and missed the 2006 Aussie Open.) The “field” was effectively the same, but to win the titles, one player cruised through a two-week cakewalk and the other needed to put together one of the most impressive final weeks of the modern era.

Tennis fans have collectively decided that each major title counts as “one.” It doesn’t have to be that way: We could give more “slam points” for achievements like Wawrinka’s and grant fewer for the easy ones. Most people don’t like this idea, and I admit that it sounds a bit weird. I’m not advocating it for general use, though it is an interesting concept that I’ve pursued in a number of earlier articles, showing that Djokovic’s majors are–on average–more impressive than Nadal’s, which in turn have been tougher than Federer’s. Weighting majors by difficulty results in some changes in the order of the all-time grand slam list, ensuring that fans of all players hate me because I wrote some code and played with some spreadsheets.*

* With, I admit, malice aforethought.

Adjusting slam counts for difficulty is, in a sense, asterisking every slam title. The tricky draws get an acknowledgement of their difficulty, and the ones that opened up get tweaked to account for their ease. It’s a continuum, not a simple up-and-down decision between normal slams and abnormal slams.

The 2020 US Open champions will probably have title paths that sit in the easier half of that continuum. But even that modest claim is far from guaranteed.

Let’s say Venus Williams recaptures her vintage form and wins the title, beating 3rd seed Serena in the quarter-finals, 2nd seed Kenin in the semis, and top seed Pliskova in the title match. (It doesn’t matter if the surprise winner is Venus–it could be any lower-ranked player, though Venus seems more plausible than most.) An average slam champion would beat those three players in succession about 37% of the time. 37% is already lower odds than about 20% of women’s slam draws in the last 45 years. (Kenin’s Australian Open title rated 39%.)

37% for Venus’s hypothetical title isn’t even the whole story–four more rounds of journeywomen would knock the number down to around 26%–harder than one-third of women’s slam draws. Add in another tricky opponent or two–maybe Cori Gauff, or Petra Kvitova in the fourth round–and suddenly the path to the 2020 US Open women’s championship is just as hard as the typical slam.
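
The way extra rounds erode the number can be sketched as a running product. The 37% starting point comes from the text; the per-match probabilities for the early rounds and for a tougher fourth-round opponent are hypothetical values chosen to land near the figures quoted.

```python
from math import prod


def extend_path(path_odds: float, extra_match_probs: list[float]) -> float:
    """Multiply an existing path figure by the win probabilities of added matches."""
    return path_odds * prod(extra_match_probs)


top_three_seeds = 0.37  # from the text: Serena, Kenin, Pliskova in succession

# HYPOTHETICAL: four journeywomen the average champion beats about 91.5% of the time.
with_early_rounds = extend_path(top_three_seeds, [0.915] * 4)
print(f"Seven-match path: {with_early_rounds:.0%}")

# HYPOTHETICAL: swap one of those for a tricky opponent at 75%.
with_tricky_r16 = extend_path(top_three_seeds, [0.915] * 3 + [0.75])
print(f"With a tougher fourth round: {with_tricky_r16:.0%}")
```

Even opponents who lose nine times out of ten chip away at the total, and a single dangerous early-round draw moves it a good deal further.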

It’s even easier to illustrate how the 2020 US Open men’s title could be as difficult as many other slams. By the numbers, simply upsetting Djokovic (simply! ha!) is more difficult than it was to defeat all seven of Federer’s opponents at the 2006 Australian Open. That’s right: Six withdrawals and one win over Novak wouldn’t be the easiest slam victory in the last 15 years. Tack on six actual wins, including a few against strong opponents, and the result is a seven-match path that stands up against the typical non-pandemic slam.

Ironically, the player who could win the title with the weakest possible draw is Djokovic. It would be odd to claim that any of Novak’s accomplishments should be asterisked, but it does make things much simpler when he doesn’t have to beat himself.

Masked competitiveness

Once again, the field doesn’t really matter. When we focus on the players who are in New York instead of the few dozen who aren’t, we see that the ingredients are in place for a couple of respectable paths to US Open titles. Wilander’s and Wawrinka’s marks are probably safe, but it’s more than possible that the winners will have faced competition equivalent to that of the average slam champ.

At the very least, we don’t know any better until the tail end of the second week. Until then, asterisk talk is premature. After that, it will probably be moot.

The Post-Covid WTA is Drifting Back to Normal

In the two latest WTA events, we saw a mix of the expected and the unusual. Simona Halep, the heavy favorite in Prague, wound up with the title despite a couple of demanding three-setters in her first two rounds. The week’s other tournament, in Lexington, failed to follow the script. Serena Williams and Aryna Sabalenka, the big hitters at the top and bottom of the bracket, combined for three wins, with four unseeded players making up the semi-final field.

Last week I pointed out that Palermo–the tour’s initial comeback event–was so unpredictable that you would’ve been better off to treat each match as a coin flip than to use pre-layoff player strength ratings (such as Elo) to forecast outcomes. Such an upset-ridden event isn’t unheard of, even in pandemic-free times, but it is suggestive that the WTA rank-and-file haven’t quite returned to their usual form.

Prague and Lexington give us three times as much data to work with. Plus, we might theorize that Prague would be a little more predictable because so many players in that field also took part in the Palermo event, meaning that they have a little more recent match experience. While our sample of 93 main draw matches is still flimsy, it brings us a little closer to understanding how well traditional forecasts will handle this unusual time.

A thorny Brier patch

The metric I’m using to quantify predictability–or to put it another way, the validity of pre-layoff player ratings–is Brier Score, which takes into account both raw accuracy (did the forecast pick the right player to win?) and confidence level (was the forecast too strong, too weak, or just right?). Tour-level Brier Scores are usually in the range of 0.21, while a score of 0.25 means the predictions were no better than coin flips. A lower score represents more accurate predictions.
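
For reference, here is a minimal sketch of the metric for two-player matches: the mean squared difference between the forecast for one player and the result (1 if that player won, 0 if not). The function name and the toy inputs are illustrative.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between win probabilities and 0/1 results."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Coin-flip forecasts score 0.25 no matter who wins.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25

# Perfectly confident, perfectly correct forecasts score 0.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
```

That fixed 0.25 for coin flips is why it serves as the benchmark: any set of forecasts scoring above it would have been better replaced by a shrug.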

Here are the Brier Scores for Palermo, Lexington, and Prague, along with the average of the three, and the average of all WTA International events (on all surfaces) since 2017. (The scores are based on forecasts generated from my Elo ratings.) We might expect the first round to be different, since players are particularly rusty at that stage, so I’ve also broken out first round (“R32 Brier”) matches for each of the tournaments and averages in the table.

Tournament    Brier  R32 Brier  
Palermo       0.268      0.295  
Lexington     0.226      0.170  
Prague        0.212      0.247  
Comeback Avg  0.235      0.237  
Intl Avg      0.217      0.213
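
The "Comeback Avg" row can be reproduced as a simple mean of the three tournament scores, as this short check shows. A simple mean matches a match-weighted one only if the events are the same size; that holds here if each event had 31 main draw matches, which is what the 93-match total suggests.

```python
from statistics import mean

# Scores copied from the table above: Palermo, Lexington, Prague.
overall = [0.268, 0.226, 0.212]
first_round = [0.295, 0.170, 0.247]

comeback_overall = round(mean(overall), 3)
comeback_first_round = round(mean(first_round), 3)
print(comeback_overall, comeback_first_round)
```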

As we saw last week, the Palermo results truly defied expectations. More than half of the matches were upsets (according to my Elo ratings), with a particularly unpredictable first round.

That didn’t last. The Prague first round rated 0.247–just barely better than coin flips–but the messiness didn’t last beyond the first couple of days. The event’s overall Brier Score was 0.212, slightly better than the average WTA International. In other words, this group of 32 women, only recently returned from a months-long break, delivered results that were roughly as predictable as we would expect in the middle of a normal season.

The Lexington numbers are a bit more difficult to make sense of, but like Prague’s, they point to a post-coronavirus world that isn’t all that weird. The opening round closely followed the script, with a Brier Score of 0.170. Of the last 115 WTA International events, only 22 were more predictable. The forecast accuracy didn’t last, in large part because of Serena’s loss at the hands of Shelby Rogers. The rating for the entire tournament was 0.226, less predictable than usual, but much better than random guessing and closer to tour average than to the assumption-questioning Palermo numbers.

Revised estimates

We’re still early in the process of evaluating what to expect from players after the COVID-19 layoff. As more tournaments take place, we can identify whether players become more predictable with more matches under their belts. (Perhaps the Prague participants who skipped Palermo were more difficult to forecast, although Halep is an obvious counterexample.)

At this point, anything is possible. It could be that we will steadily drift back to business as usual. On the other hand, the new social-distancing-oriented rules–with few or no fans on site, nightlife limited to Netflix, players fetching their own towels, and new variations of on-court coaching–might work to the advantage of some women and the disadvantage of others. If that’s the case, Elo ratings will go through a novel period of adjustment as they shift to reflect which players thrive on the post-corona tour.

It’s too early to do much more than speculate about something as significant as that. But in the last week, we’ve seen forecasts go from wildly wrong (in Palermo) to not half bad (in Lexington and Prague). We’ve gained some confidence that for all the things that have obviously changed since March, our approach to player ratings may be one thing that largely remains the same.

Did Palermo Show the Signs of a Five-Month Pandemic Layoff?

Are tennis players tougher to predict when they haven’t played an official match for almost half a year? Last week’s WTA return-to-(sort-of)-normal in Palermo gave us a glimpse into that question. In a post last week I speculated that results would be tougher than usual to forecast for a while, necessitating some tweaks to my Elo algorithm. The 31 main draw matches from Sicily allow us to run some preliminary tests.

At first glance, the results look a bit surprising. Only two of the eight seeds reached the semifinals, and the ultimate champion was the unseeded Fiona Ferro. Two wild cards reached the quarters. Is that notably weird for a WTA International-level event? It doesn’t seem that strange, so let’s establish a baseline.

Palermo the unpredictable

My go-to metric for “predictability” is Brier Score, which measures the accuracy of percentage forecasts. It’s nice to pick the winner, but it’s more important to assign the right level of probability. If you say that 100 matches are all 60/40 propositions, your favorites should win 60 of the 100 matches. If they win 90, you weren’t nearly confident enough; if they win 50, you would’ve been better off flipping a coin. Brier Score encapsulates those notions into a single number, the lower the better. Roughly speaking, my Elo forecasts for ATP and WTA matches hover a bit above 0.2.
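For readers who want to see the arithmetic, here is a minimal Python sketch of a Brier Score calculation, using the 60/40 example above. The function name and the toy outcome lists are illustrative assumptions, not code from my forecasting system.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 results.

    forecasts: probability assigned to the favorite in each match
    outcomes:  1 if the favorite won, 0 if the underdog won
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one match")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# 100 matches, every one forecast as a 60/40 proposition (toy data).
sixty_forty = [0.6] * 100
favorites_win_60 = [1] * 60 + [0] * 40
favorites_win_90 = [1] * 90 + [0] * 10
favorites_win_50 = [1] * 50 + [0] * 50

print(round(brier_score(sixty_forty, favorites_win_60), 3))  # 0.24
print(round(brier_score(sixty_forty, favorites_win_50), 3))  # 0.26, worse than a coin flip

# When favorites win 90 of 100, a more confident forecast scores far better.
print(round(brier_score(sixty_forty, favorites_win_90), 3))  # 0.18
print(round(brier_score([0.9] * 100, favorites_win_90), 3))  # 0.09
```

The last two lines show the "not nearly confident enough" case: the 60% forecasts are on the right side of every pick, but 90% forecasts would have cut the score in half.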

From 2017 through March 2020, the 975 completed matches at clay-court WTA International events had a collective Brier Score of 0.223. First round matches were a tiny bit more predictable, with R32’s scoring 0.219.

Palermo was a roller-coaster by comparison. The 31 main-draw matches combined for a Brier Score of 0.268. Of the 32 other events I considered, only last year’s Prague tourney was higher, generating a 0.277 mark.

The first round was more unpredictable still, at 0.295. On the other hand, the combination of a smaller per-event sample and the wide variety of first-round fields means that several tournaments were wilder for the first few days. 9 of the 32 others had a first-round Brier Score above 0.250, with four of them scoring higher–that is, worse–than Palermo did.

The Brier Score of shame

I mentioned the 0.250 mark because it is a sort of Brier Score of shame. Let’s say you’re predicting the outcome of a series of coin flips. The smart pick is 50/50 every time. It’s boring, but forecasting something more extreme just means you’re even more wrong half the time. If you set your forecast at 50% for a series of random events with a 50/50 chance of occurring, your Brier Score will be … 0.250.

Another way to put it is this: If your Brier Score is higher than 0.250, you would’ve been better off predicting that every match was 50/50. All the fancy forecasting went to waste.
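The 0.250 figure falls out of a short calculation. The Python sketch below (function name is my own, purely for illustration) computes the expected Brier Score when you make the same forecast for every one of a series of fair coin flips.

```python
def expected_brier_on_coin_flips(forecast):
    """Expected Brier Score for a constant forecast on true 50/50 events.

    Half the time the event happens (outcome 1), half the time it
    doesn't (outcome 0), so we average the two squared errors.
    """
    if not 0.0 <= forecast <= 1.0:
        raise ValueError("forecast must be a probability between 0 and 1")
    error_when_event_happens = (forecast - 1) ** 2
    error_when_event_fails = (forecast - 0) ** 2
    return 0.5 * error_when_event_happens + 0.5 * error_when_event_fails


for forecast in (0.5, 0.6, 0.75, 0.9):
    print(forecast, round(expected_brier_on_coin_flips(forecast), 4))
# 0.5  -> 0.25
# 0.6  -> 0.26
# 0.75 -> 0.3125
# 0.9  -> 0.41
```

Any move away from 50% raises the expected score, which is why 0.250 works as the line between useful forecasts and wasted effort.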

In Palermo, 17 of the 31 matches went the way of the underdog, at least according to my Elo formula. The Brier Scores were on the shameful side of the line. My earlier post–which advocated moderating all forecasts, at least a bit–didn’t go far enough. At least so far, the best course would’ve been to scrap the algorithm entirely and start flipping that coin.

Moderating the moderation

All that said, I’m not quite ready to throw away my Elo ratings. (At the moment, they pick Simona Halep and Aryna Sabalenka, my two favorite players, to win in Prague and Lexington. So there’s that.) 31 matches is a small sample, far from adequate to judge the accuracy of a system designed to predict the outcome of thousands of matches each year. As I mentioned above, Elo failed even worse at Prague last year, but because that tournament didn’t follow several months of global shutdowns, it wouldn’t have even occurred to me to treat it as more than a blip.

This time, a week full of forecast-busting surprises could well be more than a blip. Treating players as if they have exactly the abilities they had in March is probably the wrong way to do things, and it could be a very wrong way of doing things. We’ll triple the size of our sample in the next week, and expand it even more over the next month. It won’t help us pick winners right now, but soon we’ll have a better idea of just how unpredictable the post-COVID-19 tennis world really is.

Elo, Meet COVID-19

Tennis is back, and no one knows quite what to expect. Unpredictability is the new normal at both the macro level–will the US Open be a virus-ridden disaster?–and the micro level–which players will come back stronger or weaker? While I plead ignorance on the macro issues, estimating player abilities is more in my line.

Thanks to global shutdowns, every professional player has spent almost five months away from ATP, WTA, and ITF events–“official” tournaments. Some pros, such as those who didn’t play in the few weeks before the shutdowns began, or who are opting not to compete at the first possible opportunity, will have sat out seven or eight months by the time they return to court. Exhibition matches have filled some of the gap, but not for every player.

Half a year is a long time without any official matches. Or, from the analyst’s perspective: It’s tough to predict a player’s performance without any data from the last six months.

Increased uncertainty

Let’s start with the obvious. All this time off means that we know less about each player’s current ability level than we did before the shutdown, back when most pros were competing every week or two. Back in March, my Elo ratings put Dominic Thiem in 5th place, with a rating of ~2050, and David Goffin in 15th, with a rating of ~1900. Those numbers gave Thiem a 70% chance of winning a head-to-head.
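The 70% figure is what the standard Elo expectation formula gives for a 150-point gap. Here is a minimal Python sketch, assuming the conventional logistic curve with a 400-point scale and using the approximate ratings quoted above. The function name is illustrative.

```python
def elo_win_probability(rating_a, rating_b, scale=400.0):
    """Probability that player A beats player B under the standard Elo formula.

    Assumption: the usual base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Approximate March ratings from the text.
thiem, goffin = 2050, 1900
p_thiem = elo_win_probability(thiem, goffin)
print(f"Thiem over Goffin: {p_thiem:.0%}")  # Thiem over Goffin: 70%
```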

What about now? Both men have played in exhibitions, but can we be confident that their levels are the same as they were in March? Or that they’ve risen or fallen roughly the same amount? To me, it’s obvious that we can’t be as sure. Whenever our confidence drops, our predictions should move toward the “naive” prediction of a 50/50 coin flip. A six-month coronavirus layoff isn’t that severe, so it doesn’t mean that Thiem is no longer the favorite against Goffin, but it does mean our prediction should be closer to 50% than it was before.

So, 60%? Maybe 65%? Or 69%? I can’t answer that–yet, anyway.
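One way to get a feel for those candidate numbers is to ask how much the rating gap would have to shrink to produce each one. The sketch below inverts the standard Elo formula (400-point scale, which is an assumption about the curve, and the function name is hypothetical) to convert a win probability back into a rating gap.

```python
import math


def gap_for_probability(p, scale=400.0):
    """Rating gap that yields win probability p under the standard Elo curve."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))


for p in (0.70, 0.69, 0.65, 0.60):
    gap = gap_for_probability(p)
    print(f"{p:.0%} favorite -> gap of about {gap:.0f} Elo points")
```

Under those assumptions, dropping from 70% to 60% means cutting the effective gap roughly in half, to about 70 points, while 69% trims it by less than ten points.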

The (injury) layoff penalty

My Elo ratings already incorporate a layoff penalty, which I introduced here. The idea is that if a player misses a substantial amount of time (usually due to injury, but possibly because of suspension, pregnancy, or other reasons), they usually play worse when they come back. But it’s tough to predict how much worse, and players regain their form at different rates.

Thus, the tweak to the rating formula has two components:

  • A one-time penalty based on the amount of time missed (more time off = bigger penalty)
  • A temporarily increased k-factor (the part of the formula that determines how much each match increases or decreases a player’s rating) to account for the initial uncertainty. After an injury, the k-factor increases by a bit more than 50%, and steadily declines back to the typical k-factor over the next 20 matches.
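The two components above can be sketched in a few lines of Python. This is only an illustration: the penalty size is left as an input because it depends on time missed, the 0.5 boost stands in for "a bit more than 50%," and the straight-line decline over 20 matches is my simplifying assumption about what "steadily" means. All names are hypothetical.

```python
def apply_layoff_penalty(rating, penalty_points):
    """One-time rating deduction applied when a player returns from a layoff."""
    if penalty_points < 0:
        raise ValueError("penalty_points must be non-negative")
    return rating - penalty_points


def k_multiplier(matches_since_return, boost=0.5, decay_matches=20):
    """Multiplier on the usual k-factor after a layoff.

    Assumptions: the boost starts at +50% and declines linearly to zero
    over the next `decay_matches` matches.
    """
    if matches_since_return < 0:
        raise ValueError("matches_since_return must be non-negative")
    if matches_since_return >= decay_matches:
        return 1.0
    remaining_share = 1.0 - matches_since_return / decay_matches
    return 1.0 + boost * remaining_share


# A player rated 1900 returns with a 100-point penalty (placeholder size).
print(apply_layoff_penalty(1900, 100))  # 1800
for n in (0, 5, 10, 20):
    print(n, k_multiplier(n))  # 1.5, 1.375, 1.25, 1.0
```

The multiplier would be applied to whatever k-factor the rating system normally uses for that player, so early post-layoff results move the rating more than usual.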

Not an injury

A six-month coronavirus layoff is not an injury. (At least, not for players who haven’t lost practice time due to contracting COVID-19 or picking up other maladies.) So the injury-penalty algorithm can’t be applied as-is. But we can take away two ideas from the injury penalty:

  • If we generate those closer-to-50% forecasts by shifting certain players’ ratings downward, the penalty should be less than the injury penalty. (The minimum injury penalty is 100 Elo points for a non-offseason layoff of eight or nine weeks.)
  • The temporarily increased k-factor is a useful tool to handle the type of uncertainty that surrounds a player’s ability level after a layoff.

The injury-penalty framework is useful because it has been validated by data. We can look at hundreds of injury (and other) layoffs in modern tennis history and see how players fared upon return. And the numbers I use in the Elo formula are based on exactly that. We don’t have the same luxury with the last six months, because a tour-wide shutdown of this length is unprecedented.

Not an offseason, but…

The closest thing we have to a half-year shutdown in existing tennis data is the offseason. The sport’s winter break is much shorter, and it isn’t the same for every player. Yet some of the dynamics are the same: Many players fill their time with exhibitions, others sit on the beach, some let injuries recover, others work particularly hard to improve their games, and so on.

Here’s a theory, then: The first few weeks of each season should be less predictable than average.

Fact check: False! For the years 2010-19, I labeled each match according to how many previous matches the two players had contested that year. If it was both players’ first match, the label was 1. If it was one player’s 15th match and the other’s 21st, the label was the average, 18. Then, I calculated the Brier Score–a measure of prediction accuracy–of the Elo-generated predictions for the matches with each label.
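The labeling procedure can be sketched as follows. This Python example assumes each match is a tuple of (player A, player B, forecast that A wins, 1 if A won else 0), listed in chronological order within a single season. The data layout, function name, and toy matches are all hypothetical, and the raw average is kept as the label without rounding.

```python
from collections import defaultdict


def brier_by_match_label(matches):
    """Group matches by the average of the two players' season match counts
    and return the Brier Score for each label.

    matches: iterable of (player_a, player_b, forecast_a, a_won) tuples,
             in chronological order within one season.
    """
    season_counts = defaultdict(int)
    squared_errors = defaultdict(list)

    for player_a, player_b, forecast_a, a_won in matches:
        # This match is each player's next match of the season.
        season_counts[player_a] += 1
        season_counts[player_b] += 1
        label = (season_counts[player_a] + season_counts[player_b]) / 2
        squared_errors[label].append((forecast_a - a_won) ** 2)

    return {
        label: sum(errors) / len(errors)
        for label, errors in sorted(squared_errors.items())
    }


# Toy season: two first-round matches, then the winners' rivals meet.
toy_matches = [
    ("A", "B", 0.7, 1),  # both players' 1st match -> label 1.0
    ("C", "D", 0.6, 0),  # both players' 1st match -> label 1.0
    ("A", "C", 0.5, 1),  # both players' 2nd match -> label 2.0
]
print(brier_by_match_label(toy_matches))
```

With a full season of results, plotting the returned dictionary gives one point per label, and a moving average over neighboring labels smooths out the match-to-match noise.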

The lower the Brier Score, the better. If my theory were right, we would see the highest Brier Scores for the first few matches of the season, followed by a decrease. Not exactly!

The jagged blue line shows the Brier Scores for each individual label (match 1, match 2, match 23, etc), while the orange line is a 5-match moving average that aims to represent the overall trend.

There’s not a huge difference throughout the season (which is reassuring), but the early-season trend is the opposite of what I predicted. Maybe the women, with their slightly longer offseason, will make me feel better?

No such luck. Again, the match-to-match variation in prediction accuracy is very small, and there’s no sign of early-season uncertainty.

I will not be denied

Despite disproving my own theory, I still expect to see an unpredictable couple of post-pandemic months. The regular offseason is something that players are accustomed to, and there is conventional wisdom in the game surrounding how to best use that time. And it’s two months, not five to seven. In addition, there are many other things that will make tour life more challenging–or different, at the very least–in 2020, such as limited crowds, social distancing protocols, and scheduling uncertainty. Some players will better handle those challenges than others, but it won’t necessarily be the strongest players who respond the best.

So my Elo ratings will, for the time being, incorporate a small penalty and a temporarily increased k-factor. (Something more like 69% for Thiem-Goffin, not 60%.) I haven’t finished the code yet, in large part because handling the two different types of layoffs–coronavirus and the usual injuries, etc–makes things very complicated. If you’re watching closely, you’ll see some minor tweaks to the numbers before the “Cincinnati” tournament in a few weeks.

There is a right answer

It’s clear from what I’ve written so far that any attempt to adjust Elo ratings for the COVID-19 layoff is a bit of a guessing game. But it won’t always be that way!

By the end of the year, we’ll know the right answer: just how unpredictable results turned out to be in the early going. Just as I’ve calculated penalties and k-factor adjustments for injury layoffs based on historical data, we will be able to do the same with match results from the second half of 2020. To be more precise, we’ll be able to work out a class of right answers, because one adjustment to the Elo formula will give us the best Brier Score, while another will best represent the gap between Novak Djokovic and Rafael Nadal, while others could target different goals.
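As a rough illustration of what "the best Brier Score" search could look like, here is a deliberately simplified Python sketch. Instead of fitting a per-player penalty and k-factor schedule, it tries a handful of shrink factors applied to the pre-layoff rating gap and keeps the one with the lowest Brier Score. The shrink-factor approach, the function names, and the toy data are all assumptions made for the example, not the adjustment I will end up using.

```python
def win_probability(rating_gap, scale=400.0):
    """Standard Elo win probability for the higher-rated player."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / scale))


def brier_for_shrink(matches, shrink):
    """Brier Score when every pre-layoff rating gap is multiplied by `shrink`.

    matches: list of (rating_gap, favorite_won) pairs, favorite_won is 1 or 0.
    """
    errors = [
        (win_probability(gap * shrink) - favorite_won) ** 2
        for gap, favorite_won in matches
    ]
    return sum(errors) / len(errors)


def best_shrink(matches, candidates):
    """Candidate shrink factor with the lowest Brier Score."""
    return min(candidates, key=lambda s: brier_for_shrink(matches, s))


candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

# Toy data: 150-point favorites who win only half the time.
chaotic_results = [(150, 1), (150, 0)] * 10
print(best_shrink(chaotic_results, candidates))  # 0.0, i.e. flip a coin

# Toy data: 150-point favorites who always win.
orderly_results = [(150, 1)] * 20
print(best_shrink(orderly_results, candidates))  # 1.0, i.e. no adjustment
```

A different objective, such as matching the gap between two specific players, would need its own scoring function in place of `brier_for_shrink`, which is why there is a class of right answers and not just one.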

The ultimate after-the-fact COVID-19 Elo-formula adjustment won’t help you win more money betting on tennis, but it will give us more insight into how the coronavirus layoff affected players after so much time off, and how quickly they returned to pre-layoff form. We’ll understand a little bit more about the game, even if we desperately hope never to have reason to apply the newly-won knowledge.