Maybe, Finally, The Next Generation is Here

Italian translation at settesei.it

Alexander Zverev is winning Masters titles. Stefanos Tsitsipas is beating top ten players. Denis Shapovalov, Frances Tiafoe, and even Alex De Minaur are making life more difficult for ATP veterans.

For most of the last decade, the story of men’s tennis has been the degree to which the game is getting older. Even now, thirty-somethings hold half of the places in the top ten. Wave after wave of hyped prospects have failed to take over the sport, settling in for a long fight to the top.  On Monday, Juan Martin del Potro, once hailed as the man who would topple the Big Four, will reach a new career-best ranking of No. 3 … six weeks away from his 30th birthday.

At last, though, men’s tennis appears to be getting younger. Teenagers Shapovalov, Tiafoe, and De Minaur are rising just as some of the game’s crustiest vets are on their way out: 36-year-olds David Ferrer and Julien Benneteau are calling it quits this year, tumbling in the rankings alongside the likes of Feliciano Lopez and Ivo Karlovic.

The result is that the average age of the ATP top 50 is falling–something it hasn’t done for a really, really long time. The following graph shows the average age of the top 50 at the end of every season since 1983, plus–the rightmost data point–the mean age of the current top 50:

At the end of 2017, the average age was 29.0 years; it has since fallen to 27.75. That drop of 1.25 years is bigger than any single-year swing (up or down) in the last 35 years. As the graph shows, there were plenty of “down” years in the late 1990s and early 2000s, but none of them had even half the magnitude of the current drop.

There’s still an enormous gap between the current state of affairs and the days when men’s tennis was young. If we expand our view to the top 100, this year’s shift is less dramatic–with Ferrer, Benneteau, Lopez and others ranked between 51 and 100, that average still sits at 28.1 years, only about seven months younger than the corresponding number at the end of last season. But even that weaker evidence of a youth movement points in the same direction: 28.1 years is the youngest the top 100 has been since 2012.

Barring fundamental changes in rules or equipment, we’re unlikely to return to the teenage-driven game of the early 1990s. But after a decade of waiting, watching, and wondering, we can see some cracks in the greatest generation of men’s tennis. And finally, there’s a group of young players ready to take advantage.

The Cost of a Double Fault

We all know that double faults aren’t good, but it’s less clear just how bad they are. Over the course of an entire match, a single point here or there doesn’t seem to matter too much, especially when a double fault creeps in at a harmless moment, like 40-love. Yet many missed second serves are far more costly. Let’s try to quantify the impact of tennis’s most enervating outcome.

To do this, we need to think in terms of win probability. In each match, a player wins a certain percentage of service points and a certain percentage of return points. If those rates are sufficiently dominating–say, Mihaela Buzarnescu’s 65% of service points won and 59% of return points won in last week’s San Jose final–the player’s chance of winning the match is 100%. No matter how unlucky or unclutch she was, those percentages result in a win. But in a close contest (often called a “lottery match”), in which both players win about 50% of points, the result is heavily influenced by clutch play and luck. In Buzarnescu’s tour de force, flipping the result of a single point would be meaningless. But in a tight match, like the Wimbledon semifinal between John Isner and Kevin Anderson, a single point could mean the difference between a spot in the championship match and an early flight home.

My aim, then, is to measure the average win probability impact of a double fault. To take another example, consider last week’s Washington quarter-final between Andrea Petkovic and Belinda Bencic. Bencic won nearly 51% of total points–59% of her service points and 42% on return–but lost in a third-set tiebreak. Those serve and return components were enough to give her a 56.3% chance of winning the match: claiming more than half of total points usually results in victory, but so close to 50%, there’s plenty of room for things to go the other way.
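
The post does not spell out how serve and return rates are turned into a match win probability. One common approach, sketched here as an assumption rather than as the method behind the figures quoted in this post, treats every point as independent, with one fixed chance of winning a point on serve and another on return, and builds up from game to set to a best-of-three match. The function names are illustrative.

```python
from functools import lru_cache


def game_win_prob(p):
    """Chance the server wins a game, winning each point with probability p."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)  # win two straight points before the opponent does
    # Win to love, 15, or 30, or reach deuce (20 orderings) and win from there.
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * deuce


def tiebreak_win_prob(s, r):
    """Chance of winning a first-to-7 tiebreak; s, r = point win rates on serve, on return."""

    @lru_cache(maxsize=None)
    def f(a, b):
        if a == 6 and b == 6:
            # From 6-6, each pair of points has one point on each player's serve.
            return s * r / (s * r + (1 - s) * (1 - r))
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        n = a + b  # points played so far; our player serves point 0, then two each
        serving = n == 0 or ((n - 1) // 2) % 2 == 1
        p = s if serving else r
        return p * f(a + 1, b) + (1 - p) * f(a, b + 1)

    return f(0, 0)


def set_win_prob(s, r):
    """Chance of winning a tiebreak set."""
    hold = game_win_prob(s)            # win own service game
    brk = 1.0 - game_win_prob(1 - r)   # opponent wins serve points at rate 1 - r

    @lru_cache(maxsize=None)
    def f(a, b):
        if a == 6 and b == 6:
            return tiebreak_win_prob(s, r)
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        g = hold if (a + b) % 2 == 0 else brk  # our player serves the first game
        return g * f(a + 1, b) + (1 - g) * f(a, b + 1)

    return f(0, 0)


def match_win_prob(s, r):
    """Chance of winning a best-of-three match of tiebreak sets."""
    w = set_win_prob(s, r)
    return w * w * (3 - 2 * w)  # win 2-0, or 2-1 in either order


# Bencic's rates from the Washington quarter-final: 59% on serve, 42% on return.
bencic_estimate = match_win_prob(0.59, 0.42)
```

Because these assumptions may differ from the model used for this post, the output need not match the quoted 56.3% exactly; the point is only to show how two point-level rates imply a match-level probability.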

I refer to this match because double faults played a huge role. Bencic tallied 12 double faults in 105 service points, a rate of 11.4%, more than double the WTA tour average of 5.1%. Had she avoided those 12 double faults and won those points at the same rate as her other 93 service points, she would have ended up with a much more impressive service-points-won rate of 67%. Combined with her 42% rate of return points won, that implies an 87% chance of winning the match–more than 30 percentage points higher than her actual figure! Roughly speaking, each of her 12 double faults cost her a 2.5% chance (30% divided by 12) of winning the match.
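
The arithmetic in this example can be checked in a few lines. The inputs are the figures quoted above; the variable names are illustrative, and the two win probabilities (56.3% and 87%) are taken as given rather than recomputed.

```python
service_points = 105
double_faults = 12
serve_points_won = 0.59 * service_points  # 59% of 105, roughly 62 points

df_rate = double_faults / service_points  # about 0.114

# Replace the double faults with points won at the same rate as the other 93.
non_df_points = service_points - double_faults
counterfactual_rate = serve_points_won / non_df_points  # about 0.67

# Win probabilities quoted in the text, before and after the replacement.
actual_win_prob = 0.563
counterfactual_win_prob = 0.87
cost_per_df = (counterfactual_win_prob - actual_win_prob) / double_faults

print(f"DF rate {df_rate:.1%}, counterfactual serve rate {counterfactual_rate:.0%}")
print(f"cost per double fault {cost_per_df:.2%}")
```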

A double fault rate above 10% is unusual, but a cost of 2.5% per offense is not. When we run this algorithm across the breadth of the ATP and WTA tours, we find that the cost of double faults adds up fast.

Tour averages

Using the method I’ve described above–replacing double faults with average non-double-fault service points–and taking the average of all tour-level matches in 2017 and 2018 through last week’s tournaments, we find that the average WTA double fault costs a player 1.83% of a win. Put another way, every 55 additional double faults subtracts one match from the win column and adds one to the loss column.

In the men’s game, the equivalent number is 1.99% of a win. The slightly bigger figure is due to the fact that men, on average, win more service points, so the difference between a double fault and a successful service offering is greater.

There is, however, an alternative way we could approach this. By comparing double faults to all other service points, we’re trading a lot of the double faults for first serve outcomes. We might be more interested in knowing how a player would fare if his or her second serve were bulletproof–still eliminating double faults, but replacing them specifically with second serves instead of a generic mix of service points.

In that case, the algorithm remains very similar. Instead of replacing double faults with non-double-fault serve points, we replace them with non-double-fault second serve points. Then the cost of a double fault is a little bit less, because second serve points result in fewer points won than service points overall. The second-serve numbers are 1.61% per double fault in the women’s game and 1.70% per double fault in the men’s game. For the remainder of this post, I’ll stick with the generic service points, but one approach is not necessarily better than the other; they simply measure different things.
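
To put the four per-double-fault costs on the same footing as the "55 double faults per match" figure, each can be inverted. This is only a restatement of the numbers above; the dictionary layout and function name are illustrative.

```python
# Cost of one double fault, as a fraction of a win, from the figures above.
costs = {
    ("WTA", "any service point"): 0.0183,
    ("ATP", "any service point"): 0.0199,
    ("WTA", "second serve point"): 0.0161,
    ("ATP", "second serve point"): 0.0170,
}


def double_faults_per_match(cost_per_df):
    """How many extra double faults add up to one full match lost."""
    return 1.0 / cost_per_df


for (tour, baseline), cost in costs.items():
    n = double_faults_per_match(cost)
    print(f"{tour}, replaced by {baseline}: {n:.0f} double faults per match")
```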

Building a player-specific stat

Odious as double faults are, they are not completely avoidable. Very few players are able to sustain a double fault rate below 2%, and tour averages are around twice that. Since the beginning of 2017, the ATP average has been about 3.9%, and the WTA average roughly 5.1%, as we saw above.

We can measure players by considering their match-by-match double fault rates compared to tour average. In Bencic’s unfortunate case, her 12 double faults were 6.7 more than a typical player would’ve committed in the same number of service points. In contrast, in the same match, Petkovic recorded only 3 double faults in 102 service points, 2.2 double faults fewer than an average player would have.

We know that each WTA double fault affects a player’s chances of winning the match by 1.83%, so compared to an average service performance, Bencic’s excessive service errors cost her about a 12% chance of winning (6.7 times 1.83%), while Petkovic’s stinginess increased her own odds by about 4% (2.2 times 1.83%).
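
The per-match calculation is simple enough to write down directly. A minimal sketch, using the WTA averages quoted above; the function and constant names are illustrative.

```python
WTA_DF_RATE = 0.051       # tour-average double fault rate quoted above
WTA_COST_PER_DF = 0.0183  # share of a win lost per double fault


def df_cost(double_faults, service_points,
            tour_rate=WTA_DF_RATE, cost_per_df=WTA_COST_PER_DF):
    """Win probability gained (+) or lost (-) relative to a tour-average server."""
    expected = tour_rate * service_points  # double faults an average player would hit
    excess = double_faults - expected
    return -excess * cost_per_df


bencic = df_cost(12, 105)   # about -0.12
petkovic = df_cost(3, 102)  # about +0.04

# A longer-term figure is the sum over a player's matches,
# each given as (double_faults, service_points).
season_cost = sum(df_cost(d, n) for d, n in [(12, 105), (3, 102)])
```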

Repeat the process for every one of a player’s matches, and you can assemble a longer-term statistic. Let’s start with the WTA players who, since the start of last season, have cost themselves the most matches (“DF Cost”–negative numbers are bad), along with those who have most improved their lot by avoiding double faults:

Player                   DF%  DF Cost  
Kristina Mladenovic     7.7%    -3.84  
Daria Gavrilova         7.9%    -3.77  
Jelena Ostapenko        7.7%    -3.58  
Petra Kvitova           8.1%    -3.01  
Camila Giorgi           8.3%    -2.63  
Oceane Dodin           10.2%    -2.51  
Donna Vekic             7.0%    -1.91  
Venus Williams          6.7%    -1.71  
Coco Vandeweghe         6.4%    -1.60  
Aliaksandra Sasnovich   6.7%    -1.55  
…                                      
Agnieszka Radwanska     2.3%     1.27  
Sloane Stephens         2.1%     1.43  
Caroline Wozniacki      3.2%     1.43  
Barbora Strycova        3.5%     1.47  
Elina Svitolina         3.9%     1.48  
Simona Halep            3.5%     1.53  
Qiang Wang              2.6%     1.54  
Anastasija Sevastova    3.1%     1.57  
Carla Suarez Navarro    2.1%     1.67  
Caroline Garcia         3.6%     1.82

And the same for the men:

Player                  DF%  DF Cost  
Benoit Paire           6.2%    -4.51  
Ivo Karlovic           5.8%    -3.63  
Fabio Fognini          5.0%    -2.38  
Denis Shapovalov       6.3%    -2.26  
Grigor Dimitrov        5.1%    -2.25  
Gael Monfils           5.0%    -2.22  
David Ferrer           5.2%    -2.06  
Jeremy Chardy          5.3%    -2.00  
Fernando Verdasco      4.8%    -1.94  
Jack Sock              4.8%    -1.73  
…                                     
Roger Federer          2.1%     0.88  
Tomas Berdych          2.9%     0.89  
Juan Martin del Potro  2.8%     0.93  
Albert Ramos           3.1%     0.97  
Pablo Carreno Busta    2.2%     1.07  
Richard Gasquet        2.6%     1.12  
John Isner             2.6%     1.23  
Dusan Lajovic          1.9%     1.23  
Denis Istomin          1.9%     1.23  
Philipp Kohlschreiber  2.5%     1.24

Situational double faults

These aggregate numbers have the potential to hide a lot of information. They consider only two things about each match: how many double faults a player committed, and how close the match was. This statistic would treat Bencic the same whether she hit nine of her double faults at 40-love, or nine of her double faults in the third-set tiebreak. Yet the latter would have a colossally greater impact.

While this is an important limitation to keep in mind, it appears that double faults are distributed relatively randomly. That is, most players do not hit a majority of their double faults in particularly high- or low-leverage situations. The player lists displayed above show both the most basic stat–double fault percentage–along with my more complex approach. For players with at least 20 matches since the beginning of last season, double fault rate is very highly correlated with the match-denominated cost of double faults. (For men, r^2 = 0.752, and for women, r^2 = 0.789.) In other words, most of the variance in double fault cost can be explained by the number of double faults, leaving little room for other factors, such as the importance of the situation when double faults are committed.
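
For readers who want to check the correlation on their own data, r^2 between double fault rate and DF Cost can be computed as below. The six sample rows are copied from the WTA table above purely to show the call; because they come from the extremes of the list rather than from every player with 20 or more matches, the result will not reproduce the 0.789 figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Illustration only: a few (DF%, DF Cost) rows from the WTA table.
df_pct = [7.7, 7.9, 8.1, 2.3, 2.1, 3.6]
df_cost = [-3.84, -3.77, -3.01, 1.27, 1.43, 1.82]
sample_r2 = r_squared(df_pct, df_cost)
```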

That said, there’s plenty of room for additional analysis into those specific situations. Instead of taking a match-level look at win probability, as I have here, one could identify the point score of every single one of a player’s double faults, and see how each event affected the win probability of that match. I suspect that, for most players, that would amount to a whole lot of extra complexity for not a lot of added insight, but perhaps there are some players who are uniquely able to land their second serve when it matters most, or particularly prone to double faults at key moments. This match-level look has made it clear how costly double faults can be, and it’s possible that for some players, missed serves are even more damaging than that.

How Servers Respond To Double Faults

Italian translation at settesei.it

In the professional game, double faults are quite rare. They sometimes reflect a momentary lapse in concentration, and can negatively impact a server’s confidence. Players are sometimes particularly careful after losing a point to a double fault, taking some speed off their next delivery, or aiming closer to the middle of the box.

Let’s dig into some data from last year’s grand slams to see what players do–and how it affects their results–immediately after double faults. IBM’s Slamtracker provided point-by-point data for most 2017 grand slam singles matches, including serve speed and direction, and the available matches give us about 5,000 double faults to work with. (I’ve organized the data and made it freely available here.)

For each server in each match, I’ve tallied their results on points immediately following double faults. (That means that we exclude after-double-fault points when the double fault ended the game.) Then, for each player, I compared those results with match-long averages. Because double faults are so unusual, and because we only have this data for the majors, the sample isn’t adequate to tell us much about individual players. But for tour-wide analyses, it’s more than enough.
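
The tallying step can be sketched as follows. The record layout is hypothetical (the keys 'game', 'server', 'server_won' and 'double_fault' are not the Slamtracker column names), and the sample points are invented solely to show the call.

```python
from collections import defaultdict


def after_df_summary(points):
    """Compare each server's overall point-win rate with the rate on points
    that immediately follow one of their own double faults.

    `points` is a chronological list of dicts with hypothetical keys:
    'game' (any per-game id), 'server', 'server_won', 'double_fault'.
    """
    # Per server: points served, points won, after-DF points, after-DF points won.
    tally = defaultdict(lambda: [0, 0, 0, 0])
    prev = None
    for pt in points:
        t = tally[pt["server"]]
        t[0] += 1
        t[1] += int(pt["server_won"])
        follows_df = (
            prev is not None
            and prev["double_fault"]
            and prev["game"] == pt["game"]      # a game-ending double fault has no follow-up
            and prev["server"] == pt["server"]  # guards against serve changes in tiebreaks
        )
        if follows_df:
            t[2] += 1
            t[3] += int(pt["server_won"])
        prev = pt
    return {
        server: (won / served, after_won / after if after else None)
        for server, (served, won, after, after_won) in tally.items()
    }


# Invented sample: server A double faults twice, the second time to end the game.
sample = [
    {"game": 1, "server": "A", "server_won": True, "double_fault": False},
    {"game": 1, "server": "A", "server_won": False, "double_fault": True},
    {"game": 1, "server": "A", "server_won": True, "double_fault": False},
    {"game": 1, "server": "A", "server_won": False, "double_fault": True},
    {"game": 2, "server": "B", "server_won": True, "double_fault": False},
]
summary = after_df_summary(sample)
```

Each server maps to a pair: the overall rate, and the after-double-fault rate (or None when there were no such points).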

Serve points won: As we’ll see in a moment, men and women have different overall tendencies on the point following a double fault. But by the most important measure of simply winning the next point, gender plays little part. Men, who in this sample win 65.1% of service points, fall just over one percentage point to 64.0% on the point following a double fault. Women, who average 57.8% of service points won, drop even more, to 56.1% after a double.

First serve percentage: I expected that servers become more conservative immediately after a double fault. For women, that hypothesis is correct: In these matches, they land 63.3% of their first serves, while after a double fault, that number jumps to 65.4%. On the other hand, men don’t seem to change their approach very much. On average, they make 62.3% of their first offerings, a number that barely changes, to 62.5%, after double faults.

First serve points won: Here is additional evidence that women become more conservative after double faults, while men do not. In general, women win 63.7% of their first serve points, but just after a double fault, that number drops to 62.9%. For men, there is a decrease in first serve points won, but it is almost as small as their difference in first serve percentage: 72.7% overall, 72.4% after a double fault.

First serve speed: With serve speed, we run into a limitation of the Slamtracker data, which gives us speed only for those serves that go in. So when we look at the average speed of first serves, we’re excluding attempts that miss the box. Even with that caveat, the data keeps pointing in the same direction. Contrary to my “conservative” hypothesis, men serve a bit faster than usual after a double fault–183.3 km/h following doubles, versus 182.8 km/h in general. Women do seem to change their tactics, dropping from an average speed of 155.5 km/h to a post-double-fault pace of 152.2 km/h.

First serve direction: Slamtracker divides serve direction into five categories: wide, body-wide, body, body-center, and center. After a double fault, men are less likely than usual to hit a wide serve (24.1% to 25.8%), and those serves get split roughly evenly between the body and center categories. The difference in body serves is most striking: They account for only 3.5% of first serves overall, but 4.4% of post-double first serves. This may be the one way in which men opt for the conservative path, by maintaining speed but giving themselves a wider margin of error.

Women move many of their after-double-fault serves toward the middle of the box. On average, over 44% of serves are classified as either “wide” or “center,” but immediately after a double fault, that number drops below 41%. It’s not a huge difference, but like all of the other tendencies we’ve seen in the women’s game, it suggests that for many players, caution creeps in immediately after missing a second serve.

Tactics

As usual, it’s difficult to move from these sorts of findings to any sort of tactical advice. Even the first data point, that both men and women win fewer service points than usual right after they’ve double faulted, can be interpreted in multiple ways. By one reading, players may be serving too conservatively, missing out on the benefits of big first serves. On the other hand, if confidence is an issue, perhaps serving more aggressively would just result in more misses.

When in doubt, we have to trust that the players and coaches know what they’re doing–they’ve honed these tradeoffs through decades of experience and thousands of hours of match play. For fans, these numbers add to our understanding of the conclusions that players have reached. For the pros, perhaps a more detailed look at what happens after a double fault would help tweak their own strategies, both bouncing back from their own double faults and taking advantage of the lapses in concentration of their opponents.

Podcast Episode 27: Roland Garros Preview

Episode 27 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, will help get you ready for the 2018 French Open. We talk through the men’s and women’s draws, with a focus on the unpredictability of the women’s field, the towering presence of Rafael Nadal in the men’s, and the big-name floaters lurking in both brackets.

We also touch on Mike Bryan’s new doubles partner, Serena/Venus in the women’s doubles, and the long-delayed suspension of Nicolas Kicker for match fixing. Thanks for listening!

(Note: this week’s episode is about 48 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index with links, thanks to FBITennis:

Roland Garros Women’s Draw (and Serena v Sharapova) 1:08
How much do head-to-head records matter? 5:09
Top Quarter Analysis: Simona Halep as a weak favorite 8:39
Second Quarter Analysis: Muguruza, Sleeper? 12:21
Third Quarter Analysis: Potential Ostapenko v Svitolina Quarterfinal 13:51
Fourth Quarter Analysis: Kvitova favorite, but Jeff has a “Super Dark Horse” 15:36
Jeff’s and Carl’s Picks on the Women’s Side 17:40
Roland Garros Men’s Draw:  Nadal’s Tournament to Win 19:57
Do you take Kyle Edmund for your Fantasy Tennis Team? 24:56
Or Lucas Pouille? 27:44
Third Quarter Analysis: Djokovic’s quarter 29:22
Fourth Quarter Analysis:  Zverev over Thiem, and can Wawrinka or Nishikori surprise? 31:55
Draw Luck on the Men’s Side 34:58
Men’s Doubles:  Bryan/Bryan Streak Broken 37:19
Women’s Doubles:  Williams Sisters 39:17
Kicker Kicked to the Curb; Match Fixing 40:48
On Demand Video of Roland Garros Qualies 45:38

Unseeded Serena and the Roland Garros Draw

In a wide-open women’s field at this year’s French Open, it seems fitting that one of the most dangerous players in the draw isn’t even seeded. Serena Williams has played only four matches–none of them on clay–since returning to tour after giving birth. As such, her official WTA ranking is No. 453, and her current match-play level is anyone’s guess.

Because her ranking is low, she needed to use the ‘special ranking’ rule to enter the tournament, and the rule doesn’t apply to seedings. (I’m not going to dive further into the debate about how the rule should work–I’ve written a lot about the rule in the past.) As an unseeded player, she could have drawn anyone in the first round; in that sense, she was a bit lucky to end up opposite another unseeded player, Kristyna Pliskova. Her wider draw section is manageable as well, with a likely second-round match against 17th seed Ashleigh Barty and a possible third-rounder with 11th seed Julia Goerges. If she makes it to the round of 16, we’ll probably be treated to a big-hitting contest between Serena and Karolina Pliskova or Maria Sharapova.

According to my Elo-based forecast, a best guess about the level of post-pregnancy Serena is that she’s the 7th best overall player in the field, and 9th best on clay. That gives her about a 40% chance of winning her first three matches and reaching the second week, a 6.2% chance of making it to the final, and a 3.1% chance of adding yet another major title to her haul.

What if she were seeded? Seeds are a clear advantage for players who receive them, as a seeding protects against facing other top contenders until later rounds. By simulating the tournament with Serena seeded, we can get a sense of how much the WTA’s rule (and the French Federation’s decision not to seed her) impacts her chances.

Seeded 7th: Let’s imagine a bizarre world in which my Elo ratings were used for tournament seedings. In that case, Serena would be seeded 7th, knocking Caroline Garcia down to 8th and sending current 32nd seed Alize Cornet into the unseeded pool. That would be a clear advantage: 50/50 odds of reaching the fourth round, a 9% chance of playing in the final, and a 4.4% shot at the title, compared to 3.1% in reality.

Seeded 1st: If seeds were assigned based on protected ranking, Serena would be the top seed. You can’t get much more of an advantage than that: The top seed is protected from playing any of the other top-four seeds until the semifinals, for instance. (It’s no insurance against a meeting with 28th seed Sharapova, but Serena, of all people, isn’t worried about that.) Moving from 7th to 1st would give her another boost, but it’s a modest one: As the top seed, her chances of sticking around for the second week would still be 50/50, with 10.1% and 4.7% odds of reaching the final and winning the title, respectively.

Here’s a summary of Serena’s chances in the various seeding scenarios. The final column is “expected points”–a weighted average of the number of WTA ranking points she is expected to collect, given her likelihood of reaching each round.

Scenario     R16  Final  Title  ExpPts  
Actual     39.8%   6.2%   3.1%     273  
Unseeded*  34.4%   6.2%   3.0%     259  
Seeded 7   50.3%   9.0%   4.4%     356  
Seeded 1   50.5%  10.1%   4.7%     371

* the ‘unseeded’ scenario represents Serena’s chances as an unseeded entrant, given a random draw. She got a little lucky, avoiding top players until the 4th round, though her chances of making the final end up the same.
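
The “expected points” column is a probability-weighted average, which can be sketched as below. Two things here are assumptions: the point values are taken to be the WTA's 2018 Grand Slam allocations, and the probabilities in the example are invented placeholders, since the table reports only three of the rounds.

```python
# Points for a player whose tournament ends at each stage (assumed 2018 WTA
# Grand Slam values): lose in R128, R64, R32, R16, QF, SF, final; win the title.
SLAM_POINTS = [10, 70, 130, 240, 430, 780, 1300, 2000]


def expected_points(reach, points=SLAM_POINTS):
    """`reach[i]` is the chance of reaching stage i: R128 (always 1.0), R64,
    R32, R16, QF, SF, final, and finally the chance of winning the title."""
    total = 0.0
    for i, pts in enumerate(points):
        next_stage = reach[i + 1] if i + 1 < len(reach) else 0.0
        total += (reach[i] - next_stage) * pts  # chance the run ends exactly here
    return total


# Placeholder probabilities, for illustration only.
hypothetical_reach = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.06, 0.03]
example = expected_points(hypothetical_reach)
```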

Seeds matter, though there’s only so much they can do. If Serena really is at a barely-top-ten level, she’s a long shot for the title regardless of whether there’s a number next to her name. If my model grossly underestimates her and she’s back at previous form–let’s not forget, she made the final the last time she played here, and won the title the year before that–then the rest of the field will once again look like a bunch of flies for her to swat away, regardless of which numbers they have next to their names.

Podcast Episode 26: Reasons for Optimism

Episode 26 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, starts with the superlative performances of Elina Svitolina, Maria Sharapova, Alexander Zverev, and Novak Djokovic in Rome, and considers how they cause us to revise our estimates of those players. For Djokovic, Sharapova, and Serena Williams, we compare their current Elo ratings–including penalties for time off–to our perceptions of their current levels.

We also talk about the pros and cons of transitioning to a “weak era,” as well as the potential role of fatigue when tomorrow’s opponents don’t get the same amount of rest today. Thanks for listening!

(Note: this week’s episode is about 62 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index with links, thanks to FBITennis:

Zverev v Nadal (Rome Final) 0:58
Is Nadal declining? 4:37
Zverev’s narrow margins (tiebreak success) 7:15
Nostalgia sidebar:  Honoring Ferrer’s clay prowess 10:40
Svitolina’s tremendous success in finals 11:47
WTA Rome final takeaways, if any 15:11
Halep’s not-tremendous (recent) results in finals…and Wozniacki too 17:15
Is Novak back? ELO says he only sorta left. 22:34
Djokovic’s “good” losses 28:30
Roland Garros favorites; comparing conventional wisdom and analytics 30:04
A “weak era” is a good thing!  Maybe. 35:13
Serena a true #11 on clay; Sharapova a true #9 on clay (ELO, adjusted) 44:02
Will Ostapenko defend her French Open title? “Shock heuristic” says no. 53:03
The effect of fatigue between matches 55:22

New Match Charting Project Excel Template

For the first time in more than two years, I’m ready to release a new, substantially improved version of the MatchChart Excel template, the platform on which volunteers log matches for the Match Charting Project.

New in version 0.2.0:

  • Color-coding of the players, as well as game-ending and set-ending rows, to make it easier to keep track of where you are in a match;
  • A new shot code for drop volleys, to differentiate them from other volleys;
  • Total points won and total points shown in the MatchStats tab;
  • Options to handle certain now-rare match formats, such as tiebreaks at 8-all (as in some 1970s Wimbledons) and matches with no tiebreaks at all.

If you’re already an experienced contributor, just click here to download the new version. Take a quick look through the Instructions tab, as I’ve highlighted the relevant changes.

If you’re new to the MCP, please take a look at my Quick Start Guide, after which you can give the new template a spin.

Handling Injuries and Absences With Tennis Elo

Italian translation at settesei.it

For the last year or so, every mention of my ATP and WTA Elo ratings has required some sort of caveat. Ratings don’t change while players are absent from the tour, so Serena Williams, Novak Djokovic, Andy Murray, Maria Sharapova, and Victoria Azarenka were all stuck at the top of their tour’s Elo rankings. When their layoffs started, they were among the best, and even a smattering of poor results (or a near season’s worth, in the case of Sharapova) isn’t enough to knock them too far down the list.

This is contrary to common sense, and it’s very different from how the official ATP and WTA rankings treat these players. Common sense says that returning players probably aren’t as good as they were before a long break. The official rankings are harsher, removing players entirely after a full year away from the tour. Serena probably isn’t the best player on tour right now (as Elo insisted during her time off), but she’s also much more of a threat than her WTA ranking of No. 454 implies. We must be able to do better.

Before we fix the Elo algorithm, let’s take a moment to consider what “better” means. Fans tend to get worked up about rankings and seedings, as if a number confers value on the player. The official rankings are, by design, backward-looking: They measure players based on their performance over the last 52 weeks, weighted by how the tour prioritizes events. (They are used in a forward-looking way, for tournament seedings, but the system is not designed to be predictive of future results.) In this way, the official rankings say, “this is how well she has played for the last year.” Whatever her ability or potential, Serena (along with Vika, Murray, and Djokovic) hasn’t posted many positive results this year, and her ranking reflects that.

Elo, on the other hand, is designed to be predictive. Out of necessity, it can only use past results, but it uses those results in a way to best estimate how well a player is competing right now–our best proxy for how someone will play tomorrow, or next week. Elo ratings–even the naive ones that said Serena and Novak are your current No. 1s–are considerably better at predicting match outcomes than are the official rankings. For my purposes, that’s the definition of “better”–ratings that offer more accurate forecasts and, by extension, the best approximation of each player’s level right now.

The time-off penalty

When players leave the tour for very long, they return–at least on average, and at least temporarily–at a lower level. I identified every layoff of eight weeks or longer in ATP history, taken by a player with an Elo rating of 1900 or above*. In their first matches back on tour, their pre-break Elo overestimated their chances of winning by about 25%. It varies a bit by the amount of time off: eight- to ten-week breaks resulted in an overestimation around 17%, while 30- to 52-week breaks meant Elo overestimated a player’s chances by nearly 50% upon return. There are exceptions to every rule, like Roger Federer at the 2017 Australian Open, and Rafael Nadal, who won 14 matches in a row after his two-month break this season, but in general, players are worse when they come back.

* I used the cutoff of 1900 because, below that level, some players are alternating between the ATP and Challenger tours. My Elo algorithm doesn’t include challenger results, so for lower-rated players, it’s not clear which timespans are breaks, and which are series of challenger events. Also, the eight-week threshold doesn’t count the offseason, so an eight-week layoff might really mean ~16 weeks between events, with the break including the offseason.

Translated into Elo terms, an eight-week break results in a drop of 100 Elo points, and a not-quite-one-year break, like Andy Murray’s current injury layoff, means a drop of 150 points. Making that adjustment results in an immediate improvement in Elo’s predictiveness for the first match after a layoff, and a small improvement in predictiveness for the first 20 matches after a break.
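
To get a feel for what a 100- or 150-point drop means, the penalty can be run through the standard Elo win-expectancy formula. This sketch assumes the conventional 400-point logistic scale, which the post does not confirm for these ratings, and the 2000-point ratings are hypothetical.

```python
def elo_expected(rating, opp_rating):
    """Standard Elo win expectancy on the conventional 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def penalized_rating(rating, penalty):
    """Rating after subtracting a time-off penalty."""
    return rating - penalty


# Two hypothetical players, both rated 2000 before one of them takes a break.
before = elo_expected(2000, 2000)                                 # 0.5
after_8_weeks = elo_expected(penalized_rating(2000, 100), 2000)   # about 0.36
after_a_year = elo_expected(penalized_rating(2000, 150), 2000)    # about 0.30
```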

Incorporating uncertainty

Elo is designed to always provide a “best estimate”–when a player is new on tour, we give him a provisional rating of 1500, and then adjust the rating after each match, depending on the result, the quality of the opponent, and how many matches our player has contested. That provisional 1500 is a completely ignorant guess, so the first adjustment is a big one. Over time, the size of a player’s Elo adjustments goes down, because we learn more about him. If a player loses his first-ever match to Joao Sousa, the only information we have is that he’s probably not as good as Sousa, so we subtract a lot of points. If Alexander Zverev loses to Sousa after more than 150 career matches, including dozens of wins over superior players, we’ll still dock Zverev a few points, but not as many, because we know so much more about him.

But after a layoff, we are a bit less certain that what we knew about a player is still relevant. Djokovic is a great example right now. If he lost six out of nine matches (as he did between the Australian Open fourth round and Madrid) without missing any time beforehand, we’d know it was a slump, but most of us would expect him to snap out of it. Elo would reduce his rating, but he’d remain near the top. Since he missed the second half of last season, however, we’re more skeptical–perhaps he’ll never return to his former level. Other cases are even more clear-cut, as when a player returns from injury without being fully healed.

Thus, after a layoff, it makes sense to alter how much we adjust a player’s Elo ratings. This isn’t a new idea–it’s the core concept behind Glicko, another chess rating system that expands on Elo. Over the years, I’ve tinkered with Glicko quite a bit, looking for improvements that apply to tennis, without much success. Changing the multiplier that determines rating adjustments (known as the k factor) doesn’t improve the predictiveness of tennis Elo on its own, but combined with the post-layoff penalties I described above, it helps a bit.

The nitty-gritty: After a layoff, I increase the multiplier by a factor of 1.5, and then gradually reduce it back to 1x over the next 20 matches. The flexible multiplier slightly improves the accuracy of Elo ratings for those 20 matches, though the difference is minor compared to the effect of the initial penalty.
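As a rough sketch, the flexible multiplier could look like the function below. The 1.5x starting value and the 20-match window come from the paragraph above; the straight-line decay and the names `layoff_multiplier` and `adjusted_k` are assumptions, since the exact shape of the "gradual" reduction isn't spelled out.

```python
def layoff_multiplier(matches_since_return, boost=1.5, window=20):
    """Extra factor applied to the usual k after a layoff.

    Starts at `boost` and falls to 1.0 once `window` matches have been played.
    The linear decay is an assumption made for this sketch.
    """
    if matches_since_return >= window:
        return 1.0
    return boost - (boost - 1.0) * matches_since_return / window


def adjusted_k(base_k, matches_since_return):
    """Scale a player's normal k by the post-layoff multiplier."""
    return base_k * layoff_multiplier(matches_since_return)


# Multiplier at a few points in a comeback.
for n in (0, 5, 10, 20):
    print(n, layoff_multiplier(n))
```

A different decay curve (exponential, say) would fit the same description; the linear version is just the simplest one to reason about.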

No more caveats*

* I thought it would be funny to put an asterisk after “no more caveats.”

Post-layoff penalties and flexible multipliers end up bringing down the current Elo ratings of the players who are in the middle of long breaks or have recently come back from them, giving us ranking tables that come closer to what we expect–and should do a better job of predicting the outcome of upcoming matches. These changes to the algorithm also have minor effects on the ratings of other players, because everyone’s rating depends on the rating of all of his or her opponents. So Taro Daniel’s Elo bounce from defeating Djokovic in Indian Wells doesn’t look quite as good as it did before I implemented the penalty.

On the ATP side, the new algorithm knocks Djokovic down to 3rd in overall Elo, Murray to 6th, Jo-Wilfried Tsonga to 21st, and Stan Wawrinka to 24th. That’s still quite high for Novak considering what we’ve seen this year, but remember that the Elo algorithm only knows about his on-court performances: A six-month break followed by a half-dozen disappointing losses. The overall effect is about a 200-point drop from his pre-layoff level; the “problem” is that his Elo a year ago reflected how jaw-droppingly good he had recently been.

The WTA results match my intuition even better than I hoped. Serena falls to 7th, Sharapova to 18th, and Azarenka to 23rd. Because of the flexible multiplier, a few early wins for Williams will send her quickly back up the rankings. Like Djokovic, she rates so high in part because of her stratospheric Elo rating before her time off. For her part, Sharapova still rates higher by Elo than she does in the official rankings. Despite the penalty for her one-year drug suspension, the algorithm still treats her prior success as relevant, even if that relevance fades a bit more every week.

Elo is always an approximation, and given the wide range of causes that will sideline a player, not to mention the spectrum of strategies for returning to the tour, any rating/forecasting system is going to have a harder time with players in that situation. That said, these improvements give us Elo ratings that do a better job of representing the current level of players who have missed time, and they will allow us to make superior predictions about matches and tournaments involving those players.

Under the hood

If you’re interested in some technical details, keep reading.

Before making these adjustments, the Brier score for Elo-based predictions of all ATP matches since 1972 was about 0.20. For all matches that involved at least one player with an Elo of 1900 or better, it was 0.17. (Not only are 1900+ players better, their ratings tend to be based on more data, which at least partly explains why the predictions are better. The lower the Brier score, the better.)
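For readers who haven't met the metric, here is a small sketch of how a Brier score is computed for win/loss forecasts. The function name and the sample numbers are illustrative only, not data from the ratings themselves.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and results.

    `forecasts` holds win probabilities for one player in each match;
    `outcomes` holds 1 if that player won and 0 if he lost.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up forecasts for three matches; the favorite lost the third.
print(brier_score([0.8, 0.6, 0.7], [1, 1, 0]))
```

A forecaster who says 50% for every match scores exactly 0.25, so that is the natural yardstick for the 0.20 and 0.17 figures above.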

For the population of about 500 “first matches” after layoffs for qualifying players, the Brier score before these changes was 0.192. After implementing the penalty, it improved to 0.173.

For the 2nd through 20th post-comeback matches, the Brier score for the original algorithm was 0.195. After adding the penalty, it was 0.191, and after making the multiplier flexible, it fell a bit more to 0.190. (Additional increases to the post-layoff multiplier had negative results, pushing the Brier score back to about 0.195 when the 2nd-match multiplier was 2x.) I realize that’s a tiny change, and it very possibly won’t hold up in the future. But in looking at various notable players over the course of their comebacks, that’s the option that generated results that looked the most intuitively accurate. Since my intuition matched the best Brier score (however minuscule the difference), it seems like the best option.

Finally, a note on players with multiple layoffs. If someone misses six months, plays a few matches, then misses another two months, it doesn’t seem right to apply the penalty twice. There aren’t a lot of instances to use for testing, but the limited sample confirms this. My solution: If the second layoff is within two years of the previous comeback, combine the length of the two layoffs (here: eight months), find the penalty for a break of that length, and then apply the difference between that penalty and the previous one. Usually, that results in second-layoff penalties of between 10 and 50 points.
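The bookkeeping for a second layoff can be sketched as follows. The penalty curve itself is not restated here, so `toy_penalty` is a made-up placeholder, and the function and parameter names are hypothetical; only the combine-then-subtract logic and the two-year window come from the paragraph above.

```python
def second_layoff_penalty(first_months, second_months, months_between, penalty_for):
    """Penalty to apply after a second layoff.

    `penalty_for` maps a layoff length in months to a points penalty.
    `months_between` is the time from the first comeback to the second layoff.
    """
    if months_between > 24:
        # Far enough apart to treat the second break as a fresh layoff.
        return penalty_for(second_months)
    combined = penalty_for(first_months + second_months)
    # Only charge what the first penalty did not already cover.
    return combined - penalty_for(first_months)


def toy_penalty(months):
    """Placeholder curve for illustration only; NOT the real penalty."""
    return 100 + 10 * months


# Six months out, a few matches, then two more months out.
print(second_layoff_penalty(6, 2, months_between=1, penalty_for=toy_penalty))
```

Whether "within two years" includes a gap of exactly 24 months is a guess in this sketch.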

Podcast Episode 25: Number Ones, Present and Future

Episode 25 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, recaps the week’s events in Madrid, and starts with a dissection of the top clay-court contenders in the women’s game. We consider what it means for Simona Halep to be a “weak” No. 1, compare the present era to other periods in WTA history, and evaluate the promise of some of Halep’s main competition, including Karolina Pliskova and Petra Kvitova.

On the men’s side, we look at the latest triumph for Alexander Zverev, the continuing clay-court prowess of Dominic Thiem (and a couple of other guys with one-handed backhands), and issue another set of optimistic forecasts for Juan Martin del Potro and Fabio Fognini.

Thanks for listening!

(Note: this week’s episode is about 1 hour, 12 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Update: Episode index, thanks to FBITennis:

Is Halep a weak #1? 1:04
Should Halep take more risks on her serve? 12:54
Ka. Pliskova’s ability to dominate for an extended period 17:19
Kvitova’s shot at #1 this year 20:27
Historical fluctuations in WTA #1s 23:14
Serena’s impending return 26:13
Can you still “play your way into” WTA grand slams? 30:42
Seeding Serena at Wimbledon 33:43
Adapting Elo to account for long layoffs (see link for more) 37:05
Sasha Zverev and Thiem on clay 45:00
Return of the One-Handed Backhand (on clay) 51:48
Shapovalov finds a second dimension (maybe) 55:46
Unusual Head-to-Heads on the ATP Tour 57:48
Favorites in Rome 1:00:53
Match Charting Project Milestones 1:07:54

The Last 156 Men’s Grand Slam Finals

I’m proud to report a big new milestone for the Match Charting Project! We’ve completed the set of men’s Grand Slam finals back to 1980, something I’ve aspired to since the early days of the project. It has drawn on a lot of effort from many contributors, especially Edo, who is responsible for a huge part of this accomplishment.

Here’s the complete list of charted slam finals, with links to the shot-by-shot data for each match.

From 1980 to this year’s Australian Open, that’s 152 consecutive men’s finals. I went on a bit of a spree last week, which extended the set back to 1979 and upped the total to 156. We’ve got a few earlier slam finals in the database as well, though there’s a limit to how much more we’ll be able to achieve: Before the late 1970s, video quality and availability decrease sharply.

For researchers, as well as those interested in tennis history, this is valuable stuff, made even more useful by its completeness. With the exception of a handful of missing points here and there, the Match Charting Project now includes a wealth of data for the entirety of all of these matches: serve direction, shot types, strategic choices (like serve-and-volleying), and much more, all in a standard format.

It’s particularly satisfying to check off the last few items on a list. (In this case, the final missing pieces were 1987 Roland Garros and the 1981 Australian Open.) Even though 156 matches is a small fraction of the nearly 4,000 contests tracked as part of the MCP, the subset’s completeness means that we can study it without worrying about the non-random nature of video availability and fan interest. If you want to look into, for instance, how net play has changed at Wimbledon over the last four decades, we’ve got the entire run.

In that vein, we are working on several other noteworthy subsets: Masters 1000 finals, 2018 tour-level finals, meetings between members of the big four, and finals played by members of the big four, among others.

We’re getting close to the complete run of women’s slam finals, as well. We’re up to 137 of the 152 since 1980, and have them all from 1999-present. We haven’t been able to find video for the rest, most notably the 1998 US Open (Davenport-Hingis), 1994 Roland Garros (Graf-Pierce), 1994 US Open (Sanchez-Graf), and 1991 Australian Open (Seles-Novotna). The complete list is here, and the remainder date from 1980-86. If you can help us find any of these, please let me know!

As always, if you find this project interesting, please contribute. Our 2.3 million shots worth of detailed data didn’t appear by magic–we rely on volunteers to chart matches, and I hope you’ll join our ranks. Here’s why I think you should, and here’s how you can get started.