Forecasting Future Felix With ATP Aging Patterns

Italian translation at settesei.it

It’s been an exceptional six weeks for Felix Auger-Aliassime. He broke into the top 100 with a runner-up performance on clay in Rio de Janeiro, won two matches each at Sao Paulo and Indian Wells (including an upset of Stefanos Tsitsipas), and raced to a semi-final at the Miami Masters, the youngest player ever to make the final four of that event. Four months away from his 19th birthday, his ranking is up to 33rd in the world, and he has few points to defend until June.

Felix is the youngest man in the top 100, and he’s reaching milestones early enough to draw comparisons with some of the best young players in the sport’s history. Will he follow in the footsteps of past wunderkinds such as Rafael Nadal and Lleyton Hewitt? To answer that question, let’s take a look at typical ATP aging patterns, what they say about when players hit their peaks, and what they can show us about the fate of the best 18-year-olds.

The standard curve

Last week, I looked at WTA aging curves and found that women tend to peak around age 23 or 24, an age that has not changed even as the sport has gotten older. I also discovered that there is a surprisingly modest gap–about 70 Elo points–between 18-year-old performance and a woman’s peak level. The men’s results are different.

To calculate the average ATP aging curve, I found over 700 players who were born between 1960 and 1989 and played at least 20 tour-level, tour qualifying, or challenger-level matches in each of five seasons. Overall, peak age was 25, though the difference from age 24 to 27 is only a few Elo points, so small as to be negligible.

As the tour has gotten older, the men’s peak age has also increased. Among the nearly 300 players born between 1980 and 1989, peak age is 26-27, with ages 28 and 29 also within 10 Elo points of the age 26-27 peak. Plenty of players are peaking at older ages, and many of those who aren’t are remaining close to their best levels into their late twenties. The peak age could end up higher still–a few of the players in the 1980-89 cohort turn 30 this year, and could conceivably still improve on their career bests.

The following graph shows the trajectory of the average player (with peak year-end Elo set to 1,850) born in the 1960s and the pattern of the average player born in the 1980s:

It’s a long ascent from the performance level at age 18 to the typical peak, especially for more recent players. There’s even a hefty bit of selection bias that inflates the measured level of 18-year-olds, since only about 10% of the players in the overall sample qualified for a year-end Elo rating when they were 18. The ones who did were, in general, the best of the bunch.

Felix forward

Through the Miami semi-final, Auger-Aliassime’s Elo rating is 1,848. The average player in the entire dataset who played at least 20 matches in their age-18 season went on to add another 281 Elo points to their rating between the end of their age-18 season and their peak. In the narrower, more recent cohort of 1980-89 births, the players with year-end ratings as 18-year-olds improved their Elos by a whopping 369 points before reaching their peaks.

Adding either of those numbers to Felix’s current rating gives us quite the rosy forecast:

Cohort   Current  Increase  Proj. Peak  
1960-89     1848       281        2129  
1980-89     1848       369        2217

There’s a bit of sleight of hand in how I’m doing this, since my study uses players’ year-end ratings, and I’m using Felix’s rating in April. However, there’s no natural law that says one artificial 12-month span is better than another, and Felix’s current age of 18.6 is roughly in the middle of the ages of the year-end 18-year-olds with whom I’m comparing him.

An Elo rating of 2,129 would be good enough for fourth place on the current list, behind only the big three. The rating of 2,217 is better than any of the big three can boast at the moment, and would be the fourth-best peak year-end rating among active players, again trailing only the big three. (And Andy Murray, if you consider him active.) Only 15 Open era players have managed year-end Elo peaks above 2,217.

No comparisons

It’s tough to say whether this method, of finding the typical difference between 18-year-old and peak Elo ratings, is adequate to handle the extremes. Some players peak earlier than average, and it stands to reason that the best young talents are more likely to do so. Boris Becker posted a whopping 2,212 Elo rating at the end of his age-18 season, which didn’t leave much room for improvement. He gained another 90 points before the end of his age-19 season, which was his career best.

Becker’s career path is not particularly helpful to our effort to forecast Felix’s, in part because the German was so unique, and also because his experience reflects such a different era. But even among less unique players, there are few useful comparables. No one born since 1987 managed a better age-18 Elo rating than Felix’s 1,848, and only a handful of active or recently-retired players even reached 1,750 by that age.

Lacking the data for a more precise approach, let’s repeat what I did for Bianca Andreescu last week, and see how the nearest 18-year-old comparisons fared. Of the players whose age-18 year-end Elos were closest to Felix’s 1,848, here are the 10 above him and the 10 below him on the list:

Player               BirthYr  18yo Elo  Incr  Peak Elo  
Stefan Edberg           1966      1916   350      2266  
John McEnroe            1959      1912   496      2408  
Guillermo Coria         1982      1909   145      2055  
Pat Cash                1965      1907   151      2058  
G. Perez Roldan         1969      1884    41      1925  
Andy Murray             1987      1878   465      2343  
Roger Federer           1981      1871   487      2359  
Thomas Enqvist          1974      1865   216      2081  
Rafael Nadal            1986      1862   452      2314  
Jim Courier             1970      1849   283      2132  
…                                                       
Jimmy Brown             1965      1834     0      1834  
Andy Roddick            1982      1815   291      2106  
Aaron Krickstein        1967      1812   246      2058  
Yannick Noah            1960      1812   299      2112  
Fabrice Santoro         1972      1805    85      1890  
Andreas Vinciguerra     1981      1803    16      1819  
Novak Djokovic          1987      1792   645      2436  
Sergi Bruguera          1971      1790   265      2055  
Thomas Muster           1967      1788   329      2117  
Dominik Hrbaty          1978      1779   133      1913

The average increase among this group is 270 Elo points, close to the overall average for players who qualified for a year-end Elo rating at age 18. The youngest members of this list are encouraging: the big four, Andy Roddick, and Andreas Vinciguerra. Most promising youngsters would happily take a two-in-three shot at having a career at the level of the big four.
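
The arithmetic behind the two projections and the 270-point average is easy to check. The snippet below is only a sketch: the variable names are mine, and the numbers are copied from the tables above.

```python
# "Incr" column for the 20 players in the comparison table above,
# in the order listed (10 above Felix, then 10 below).
increases = [350, 496, 145, 151, 41, 465, 487, 216, 452, 283,
             0, 291, 246, 299, 85, 16, 645, 265, 329, 133]

avg_increase = sum(increases) / len(increases)
print(avg_increase)  # 269.75, which rounds to the 270 quoted in the text

# Cohort-based projections: current rating plus the cohort's average gain.
felix_current = 1848
cohort_gain = {"1960-89": 281, "1980-89": 369}
projections = {cohort: felix_current + gain for cohort, gain in cohort_gain.items()}
print(projections)  # {'1960-89': 2129, '1980-89': 2217}
```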

Perhaps the best comparison for Felix is a player who didn’t quite make that list, Alexander Zverev. The 21-year-old German posted a year-end Elo of 1,768 as an 18-year-old, and had already boosted that number by more than 300 points by the end of his 2018 campaign. Zverev is only an approximate comparison, he’s just a single data point, and we don’t know where he’ll end up, but his experience is a decade more recent than those of Novak Djokovic, Murray, and Nadal.

Forecasting the career performance of young tennis players is an inexact science, at best. Potential outcomes for Auger-Aliassime range from teenage flameout to double-digit major winner. Based on the limited information he’s given us so far, the latter seems within reach. What we know for sure is that he’s playing better tennis than any 18-year-old we’ve seen in a decade. If that’s not reason for optimism, I don’t know what is.

Nick Kyrgios is More Predictable Than We Think

Italian translation at settesei.it

There is a persistent belief among tennis fans and commentators that some players are particularly inconsistent. For today’s purposes, I’m talking about match-to-match results, the players who have a knack for upsetting higher-ranked opponents but are also particularly susceptible to losses against weaker players. We have a range of words for this, like unpredictable, dangerous, tricky, and the preferred term for Nick Kyrgios: mercurial.

So far in 2019, Kyrgios has provided a perfect example of the inconsistent type. After early losses to Jeremy Chardy and Radu Albot, he bounced back to win last week’s ATP 500 in Acapulco, knocking out Rafael Nadal, Stan Wawrinka, John Isner, and Alexander Zverev. There’s no question that the Australian possesses more talent than his ranking would suggest. This is a guy who has yet to crack the top ten, but holds a .500 record in completed matches against the Big 3, a feat managed by no other active player (minimum 5 matches, excepting Nadal and Novak Djokovic themselves).

He sounds inconsistent. His results look unpredictable. But compared to the uncertainty that comes with every tennis match between highly-ranked professionals, how does he stack up? As my headline suggests, it’s not as clear-cut as it seems.

Measuring predictability

Consider the opposite type, a player who reliably beats lower-ranked opponents and usually loses against his betters. Roberto Bautista Agut has this type of reputation. As we’ll see, the numbers bear it out, notwithstanding his Doha upset of Djokovic a couple of months ago. If someone really is so predictable, that should show up in a comparison of his pre-match forecasts to his results. For a Bautista Agut type, the forecasts would be particularly accurate, while for a Kyrgios type, the forecasts would be much less reliable.

We already have a metric for this. Brier Score measures the accuracy of forecasts, considering not just how often predictions proved correct, but how close they came. For instance, after Kyrgios beat Zverev in Saturday’s Acapulco final, those prognosticators who gave the Aussie a 90% chance of winning were “more” correct than those who gave him a 60% shot. On the other hand, too much confidence runs the risk of a worse Brier Score–if you’re always giving tennis favorites a 90% chance of winning, you’ll often be wrong. Brier Score is the average of the squared differences between the pre-match forecast (e.g. 90%, expressed as 0.9) and the result (1 or 0, depending on whether the pick was correct).

Brier Scores for ATP forecasting hover around the 0.2 mark. A lower Brier Score is better, representing less difference between prediction and results, so if you can come in much lower than 0.2, you should be making money betting on matches. If you’re much higher than 0.2, you might as well be flipping a coin. If we use random, 50/50 pre-match predictions, the resulting Brier Score is 0.25.
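
To make the definition concrete, here is a minimal sketch of the calculation in Python. The function name and the toy inputs are my own, and this is not the code behind the numbers in this post. Forecasts are the probability given to the pick, and outcomes are 1 if the pick won, 0 if not.

```python
def brier_score(forecasts, outcomes):
    """Average squared difference between forecast probabilities and 0/1 results."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# One correct pick, made with 90% confidence versus 60% confidence:
print(round(brier_score([0.9], [1]), 3))  # 0.01
print(round(brier_score([0.6], [1]), 3))  # 0.16

# Coin-flip forecasts score 0.25 no matter how the matches turn out:
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
```

The more confident correct pick earns the lower (better) score, and the 50/50 forecaster always lands on 0.25, which is why that number serves as the coin-flip baseline.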

Brier-gios

If a player is truly unpredictable, the Brier Score for his matches should approach the 0.25 mark, and it should definitely exceed the tour-typical 0.2. To measure the reliability of pre-match forecasts for Kyrgios and other players, I used my surface-weighted Elo ratings for every completed tour-level main draw match since 2000 and generated percentage forecasts for each one. By this method, Zverev had a 67.4% probability of winning the Acapulco final.
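
For readers who haven’t worked with Elo forecasts, the conversion from two ratings to a win probability usually looks like the sketch below. It uses the standard Elo logistic formula with a 400-point scale. The surface weighting mentioned above is not reproduced here, and the function names and example ratings are hypothetical.

```python
import math

def elo_win_probability(rating_a, rating_b):
    """Standard Elo expected score: chance that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def rating_gap_for_probability(p):
    """Inverse of the formula above: the Elo edge implied by a win probability p."""
    return 400.0 * math.log10(p / (1.0 - p))

# Equal ratings give a coin flip.
print(elo_win_probability(1900, 1900))  # 0.5

# A 67.4% favorite is, under this formula, roughly 126 Elo points better.
print(round(rating_gap_for_probability(0.674)))  # 126
```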

So far in 2019, Kyrgios does look truly unpredictable. The Brier Score of his ten match results is 0.318, meaning that we’d have done better by simply flipping a coin to forecast the result of each of his matches. Even if we retroactively increase his chances of winning each match to account for the fact that he’s playing better than his Elo rating predicted, the Brier Score is 0.277, still worse than coin flips.

On the other hand, it’s just ten matches. Several other players have 2019 Brier Scores well over the 0.25 threshold, including Frances Tiafoe, Joao Sousa, Juan Ignacio Londero, and Felix Auger-Aliassime. In a handful of tournaments, you’ll always get a few oddball results, either because of marked improvements (as is likely with Auger-Aliassime) or extreme good or bad luck. Unless we’re willing to say that Sousa and Londero are remarkably unpredictable players, we shouldn’t draw the same conclusion based on Kyrgios’s last ten matches.

What you predict is what you get

The Brier Score for Elo-based forecasts of Kyrgios’s career matches at tour level is 0.219. That’s higher–and thus less predictable–than average, but not by that much. Of the 280 players with at least 100 tour-level matches this century, Kyrgios ranks 84th, more reliable than 30% of his peers. In 2017, his results were quite unpredictable, with a Brier Score of 0.244, but in 2015 and 2016 they generated a more pedestrian 0.210, and last year they looked downright predictable, at 0.177.

The Australian may be quite unpredictable in tactics, point-to-point performance, or on-court behavior, but his results just aren’t that unusual. The following table shows the 15 most unpredictable active players, as measured by Brier Score, along with Kyrgios, followed by the 15 most predictable active players:

Player                 Matches  Brier  
Lucas Pouille              189  0.247  
Andrey Rublev              106  0.245  
Benoit Paire               377  0.239  
Ivo Karlovic               650  0.239  
Stefanos Tsitsipas         100  0.232  
Karen Khachanov            154  0.231  
Peter Gojowczyk            102  0.231  
Federico Delbonis          225  0.227  
Marius Copil               108  0.227  
Damir Dzumhur              173  0.227  
Ernests Gulbis             420  0.226  
Pablo Cuevas               338  0.226  
Mischa Zverev              297  0.226  
Joao Sousa                 323  0.226  
Borna Coric                210  0.226  
...                                       
Nick Kyrgios               191  0.219  
...                                       
Matthew Ebden              171  0.188  
David Goffin               344  0.188  
Marin Cilic                684  0.186  
Richard Gasquet            770  0.183  
Tomas Berdych              911  0.182  
Milos Raonic               448  0.178  
David Ferrer              1048  0.177  
Jo-Wilfried Tsonga         600  0.175  
Roberto Bautista Agut      384  0.172  
Kei Nishikori              517  0.167  
Juan Martin Del Potro      560  0.160  
Andy Murray                802  0.146  
Roger Federer             1350  0.121  
Novak Djokovic             951  0.117  
Rafael Nadal              1060  0.114 

Lucas Pouille’s results have been almost impossible to forecast. The Brier Score generated by his 2018 results was nearly 0.3, suggesting it would have been smarter to calculate a forecast and then bet against it! Ivo Karlovic also shows up among the less reliable players, though it’s not clear whether that’s due to his unusual game style. Isner, the only decent parallel we have, is as reliable as the tour in general, with a career Brier Score of 0.201. Reilly Opelka, the other towering ace machine in the ATP top 100, has defied the odds so far in 2019, but he hasn’t yet amassed enough data to draw any conclusions.

At the other end of the spectrum, the most reliable players are many of the best. That adds up: A dominant player not only wins most of the matches he should, but his performance also allows us to make more aggressive forecasts. Nadal often enters matches with a 90% or better probability of winning, and confident predictions like that–as long as the player converts them into wins–are what generate the lowest Brier Scores.

Consistent consistency results

We all tend to read too much into unusual results. Kyrgios has given us plenty of those, and we’ve repaid the favor by making him out to be even more of a wild card than he is. A couple of weeks ago, I took on a similar question and found that ATPers don’t really “play their way in” to tournaments, earning better or worse results in different rounds. This isn’t quite the same issue, but it all comes back to similar truths: Existing forecasts are pretty good, there’s always going to be a lot of randomness in the results, and the stories we invent to account for the randomness don’t really explain much at all.

Kyrgios is an immensely interesting player–I joked in yesterday’s podcast that readers should prepare themselves for a ten-part series–and digging into his point-by-point stats could reveal characteristics that are unique among tour players. That is still true. But at the match level, the likelihood that his contests will end in upsets isn’t unique at all–even if he is the proud new owner of a sombrero that says otherwise.

Belinda Bencic Won a Historically Difficult Title, Just Not Last Week

Italian translation at settesei.it

Belinda Bencic is back among the WTA elites. Last week in Dubai, she won her first Premier-level title since 2015, knocking out four top-ten players in the process. They were hardly dominant victories, with all four going to deciding sets and two of the four culminating in final-set tiebreaks, but there is no question that the 21-year-old Swiss is once again a threat at the tour’s biggest events.

Her string of top-ten victories leaves us to wonder how her title stacks up against similar feats in the past. Most relevant is the path Bencic took to her last Premier title, the 2015 Canadian Open. Four years ago in Toronto, she defeated four members of the top six, including then-top-ranked Serena Williams in the semi-final and Simona Halep in the championship match. Even the two lower-ranked opponents she faced that week were dangerous players then ranked in the top 25, Eugenie Bouchard and Sabine Lisicki. Those two presented more serious challenges than Bencic’s first two matches last week against Lucie Hradecka and Stefanie Voegele.

Spoiler alert: Toronto was the tougher path. It wasn’t the most difficult of all time, but it’s in the conversation. Bencic’s Dubai title surely wasn’t easy, but it wasn’t quite as unusual as last weekend’s press made it out to be.

Quantifying path difficulty

This is something we’ve done before. I’ve written several articles comparing the quality of opposition faced in slams, particularly as it applies to the ATP’s big three. It’s more complicated to compare all WTA events, in part because there are so many different levels of tournament, and the categorizations have changed over the years. But we can wave some of that aside for today’s purposes.

Here’s the simple algorithm to measure the difficulty of a player’s path to a title:

  • Pick a standard Elo rating for the type of tournament won. (In this case, we’re using 1900 for hard-court wins. We’d use lower numbers for clay and grass, but it gets complicated, and it’s more practical for today’s purposes to focus solely on hard-court events.)
  • Find the surface-weighted Elo of each opponent she played in the tournament.
  • For each opponent, calculate the odds that a player with the standard Elo rating would beat her, given the opponent’s Elo rating.
  • Calculate the difficulty for each match as one minus the odds in the previous step.
  • Sum the single-match difficulties.
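
The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used for the tables in this post: I’m reading “odds” as the standard Elo win probability for a player rated 1900, and the opponent ratings in the example are invented.

```python
STANDARD_HARD_COURT_ELO = 1900  # the standard rating named in the first step

def win_probability(rating_a, rating_b):
    """Standard Elo expected score for a player rated A against one rated B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def path_difficulty(opponent_elos, standard_elo=STANDARD_HARD_COURT_ELO):
    """Sum, over all opponents, of one minus the standard player's win odds."""
    return sum(1.0 - win_probability(standard_elo, opp) for opp in opponent_elos)

# Hypothetical six-match draw; these ratings are made up for illustration.
hypothetical_draw = [1750, 1800, 1950, 2000, 2050, 2100]
print(round(path_difficulty(hypothetical_draw), 2))

# Sanity check: six opponents rated exactly 1900 are six coin flips.
print(path_difficulty([1900] * 6))  # 3.0
```

Because each match contributes between 0 and 1, a longer path can only add difficulty, which is why the number of matches played matters so much in the comparisons that follow.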

In the grand slam exercises I’ve done in the past, I’ve taken a final step of normalizing the results so that an average major title is exactly 1.0. Here, the idea of ‘average’ is more nebulous, so we’ll leave our results un-normalized.

The average difficulty of a hard-court title (excluding majors and year-end championships) is about 1.8. Bencic’s 2015 Toronto run was 3.64, and her path last week was 3.01.

It’s hotter in Miami (and Indian Wells)

One of the variables that influences path difficulty is number of matches. Bencic played six last week (as she did at the 2015 Canadian Open), but the top eight seeds played only five. At Indian Wells and Miami, the top 32 seeds play up to six matches, but those might be expected to present more challenges than Bencic’s six in Dubai, since the round-of-64 opponent has already won a match.

Certainly it has turned out that way. Here are the top ten most difficult hard-court WTA title paths since 2000:

Year  Event          Winner             Matches  Difficulty  
2010  Miami          Kim Clijsters            6        3.80  
2011  Miami          Victoria Azarenka        6        3.78  
2007  Miami          Serena Williams          6        3.65  
2015  Canadian Open  Belinda Bencic           6        3.64  
2012  Indian Wells   Victoria Azarenka        6        3.59  
2018  Cincinnati     Kiki Bertens             6        3.54  
2000  Miami          Martina Hingis           6        3.46  
2002  Miami          Serena Williams          6        3.45  
2008  Miami          Serena Williams          6        3.37  
2013  Miami          Serena Williams          6        3.35

Seven of the ten are from Miami, an event with a grand-slam-like field. Indian Wells is similar, but featured a weaker draw for most of the 21st century because Serena and Venus Williams chose not to play there. Bencic’s Toronto run is one of only two in the top ten outside of the March sunshine swing. The other is Kiki Bertens’s path to last year’s Cincinnati title, in which she also defeated Halep, Petra Kvitova, and Elina Svitolina, albeit not quite in the same order as Bencic did last week.

Also hot in Dubai

I calculated title difficulty for about 600 hard-court champions going back to 2000. Bencic’s Dubai path doesn’t register among the very most challenging, but it still stands above most of the pack. Here are the next 25 toughest routes, including every path rated a 3.0 or above:

Year  Event         Winner              Matches  Difficulty  
2016  Wuhan         Petra Kvitova             6        3.32  
2000  Indian Wells  Lindsay Davenport         6        3.32  
2014  Beijing       Maria Sharapova           6        3.30  
2008  Olympics      Elena Dementieva          6        3.27  
2009  Indian Wells  Vera Zvonareva            6        3.27  
2007  Indian Wells  Daniela Hantuchova        6        3.23  
2002  Filderstadt   Kim Clijsters             5        3.23  
2013  Beijing       Serena Williams           6        3.21  
2018  Doha          Petra Kvitova             6        3.18  
2002  Los Angeles   Chanda Rubin              5        3.18  
2000  Los Angeles   Serena Williams           5        3.16  
2009  Miami         Victoria Azarenka         6        3.15  
2003  Miami         Serena Williams           6        3.13  
2002  Indian Wells  Daniela Hantuchova        6        3.10  
2018  Wuhan         Aryna Sabalenka           6        3.08  
2008  Indian Wells  Ana Ivanovic              6        3.08  
2012  Tokyo         Nadia Petrova             6        3.08  
2010  Sydney        Elena Dementieva          5        3.06  
2010  Indian Wells  Jelena Jankovic           6        3.03  
2000  Sydney        Venus Williams            6        3.02  
2000  Sydney        Amelie Mauresmo           4        3.02  
2019  Dubai         Belinda Bencic            6        3.01  
2009  Tokyo         Maria Sharapova           6        3.00  
2002  San Diego     Venus Williams            5        3.00  
2001  Sydney        Martina Hingis            4        2.99

There’s Belinda again, at 32nd overall. Historically, the February tournaments in the Gulf haven’t been the toughest on the calendar, at least compared with Indian Wells, Miami, and Sydney. Yet Kvitova took an even more difficult path to the title last year in Doha. (Dubai and Doha trade tournament levels each year. As a Premier 5, Doha was worth more points in 2018; Dubai took over the status and was worth more points in 2019.) She also plowed through four top-ten opponents, and she needed to beat 33rd-ranked Agnieszka Radwanska just to earn a place in the round of 16.

Strong but weaker

Again, Bencic’s Dubai title was an impressive feat. But as we’ve seen, it pales in comparison with her previous Premier title. I suppose she might have won anyway if faced with more difficult competition, but that pair of third-set tiebreaks suggests she was pushed to the limit as it was.

While the current WTA field is extremely deep, packed with very good players, the lack of one historically great superstar (or more!) shows up in the Elo ratings. Of the 35 champions shown in the two tables above, 12 had to beat a player with a surface-weighted rating of 2240 or higher, and 12 more needed to get past an opponent rated 2100 or above. Bencic’s toughest task last week was Halep, at 2054. While it isn’t easy to knock off several consecutive foes in the 2000 range, it’s not the same as including one victory over a superstar like Serena, Venus, Maria Sharapova, or Victoria Azarenka at her peak.

At the 2015 Canadian Open, Bencic counted Serena among the vanquished. Maybe in another four years, when the Swiss is due for her next odds-defying Premier title, she’ll face down a couple of new young superstars and earn a place at the top of this list.

Forecasting the Davis Cup Finals

It took more than a year to decide on a new format, but barely a week to make the draw. With 12 countries qualifying for the inaugural Davis Cup Finals in home-and-away ties earlier this month, the field of 18 is set. Using the ITF’s own system to rank countries, the 18 teams were divided into three “pots,” then assigned to the six round-robin groups that will kick off the tournament this November in Madrid.

The new format sounds complicated, but as round-robin events go, it’s easy enough to understand. Each of the six round-robin groups will send a winning team to the quarter-finals. Two second-place sides will also advance to the final eight, as determined by matches won, then sets won, and so on as necessary, until John Isner and Ivo Karlovic stand back to back to determine which one is really taller. From that point, it’s an eight-team knock-out tournament.

Here are the groups, as determined by yesterday’s draw, with seeded countries indicated:

  • Group A: France (1), Serbia, Japan
  • Group B: Croatia (2), Spain, Russia
  • Group C: Argentina (3), Germany, Chile
  • Group D: Belgium (4), Australia, Colombia
  • Group E: Great Britain (5), Kazakhstan, Netherlands
  • Group F: United States (6), Italy, Canada

The ITF ranking system considers the last four years of Davis Cup results, so Spain’s brief exit from the World Group makes the seedings a bit wonky. As it turns out, a top team (Croatia) will have to deal with an early tie against the Spaniards, and the entire Group B trio constitutes a group of death. Russia would be an up-and-coming squad in any format, and it is clearly the most dangerous of the six lowest-ranked sides.

Madrid to Monte Carlo

Last week, I introduced a more accurate, predictive rating system for Davis Cup, involving surface-specific Elo ratings for the players likely to compete. Those rankings put Spain at the top, Croatia second, Russia fifth, and fourth-seeded Belgium 14th in the 18-team field.

Now that we have a draw, we can use those ratings to run Monte Carlo simulations of the entire Davis Cup Finals. As in my post last week, I’m estimating that singles players have a 75% chance of playing at any given opportunity and doubles players have an 85% chance. Those are just guesses–there’s no data involved in this step. Surely some teams are more fragile than others, perhaps because their stars are particularly susceptible to injury or just uninterested in the next event. I’ve excluded Andy Murray, but for the moment, I’m keeping Novak Djokovic and Alexander Zverev in the mix.

(We’re using Elo ratings for each individual player, which means the simulation is telling us what would be likely to happen if it were played today. Things will change between now and November, even if every eligible player shows up. A proper forecast that takes the time lag into account would probably give a slight boost for younger teams [whose players will have nine months to mature] and a penalty for older ones [who are more likely to be hit by injury]. And overall, it would shift all of the championship probabilities a bit toward the mean.)
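
For a sense of how a Monte Carlo simulation of one round-robin group works, here is a heavily simplified sketch. It is not the model described above: each tie is decided by a single Elo-style probability between team ratings rather than by individual singles and doubles rubbers, player availability is ignored, ties on wins are broken at random instead of by matches and sets, and the team names and ratings are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def win_probability(rating_a, rating_b):
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def simulate_group(ratings, rng):
    """Play one round robin and return the group winner."""
    wins = {team: 0 for team in ratings}
    for a, b in combinations(ratings, 2):
        p_a = win_probability(ratings[a], ratings[b])
        winner = a if rng.random() < p_a else b
        wins[winner] += 1
    best = max(wins.values())
    leaders = [team for team, w in wins.items() if w == best]
    # Simplification: a tie on wins is broken at random.
    return rng.choice(leaders)

def group_win_chances(ratings, n_sims=100_000, seed=0):
    """Share of simulated round robins won by each team."""
    rng = random.Random(seed)
    counts = Counter(simulate_group(ratings, rng) for _ in range(n_sims))
    return {team: counts[team] / n_sims for team in ratings}

# Hypothetical three-team group; the ratings are invented.
hypothetical_group = {"Team A": 2000, "Team B": 1950, "Team C": 1800}
chances = group_win_chances(hypothetical_group, n_sims=20_000)
for team, chance in chances.items():
    print(f"{team}: {chance:.1%}")
```

A full version would repeat this for all six groups, pick the two best runners-up, and then play out the eight-team knock-out bracket inside the same loop.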

Here are the results of 100,000 simulations of the draw, with percentages given for each country’s chance of winning their group, then reaching each of the knock-out rounds:

Country  Group     QF     SF      F      W  
ESP      46.1%  59.1%  41.9%  30.3%  19.3%  
FRA      54.2%  66.6%  40.6%  25.1%  14.6%  
AUS      74.5%  84.4%  46.0%  23.8%  12.1%  
USA      53.0%  65.5%  36.8%  19.7%  10.4%  
CRO      31.0%  43.0%  27.2%  17.8%   9.8%  
GER      52.5%  67.9%  39.7%  17.6%   7.7%  
RUS      22.9%  33.1%  19.5%  12.0%   6.1%  
SRB      33.0%  47.9%  24.1%  12.6%   6.0%  
GBR      66.8%  78.7%  35.9%  12.5%   4.4%  
ARG      39.7%  56.6%  28.6%  10.4%   3.8%  
ITA      24.3%  35.9%  14.6%   5.5%   2.1%  
CAN      22.7%  33.4%  13.1%   4.9%   1.8%  
JPN      12.8%  19.5%   7.2%   2.8%   0.9%  
BEL      20.3%  32.0%   8.5%   2.1%   0.6%  
NED      21.7%  35.5%   8.6%   1.7%   0.3%  
CHI       7.8%  12.9%   3.4%   0.6%   0.1%  
KAZ      11.5%  19.0%   3.2%   0.5%   0.1%  
COL       5.1%   8.9%   1.2%   0.1%   0.0%

Spain is our clear favorite, despite their path through the group of death. Five teams have a better chance of winning their group and reaching the quarters than the Spaniards do, but their chances in the single-elimination rounds make the difference. At the other extreme, Australia seems to be the biggest beneficiary of draw luck. My rankings put them sixth, and they landed in a group with Belgium (the lowest-rated seed) and Colombia (the weakest team in the field). Their good fortune makes them the most likely country to reach the final four, even if Spain and France have a better chance of advancing to the championship tie.

Less randomness, more Spain

What if we run the simulation one step earlier in the process? That is to say, ignore yesterday’s draw and see what each country’s chances were before their round-robin assignments were determined. For this simulation, we’ll keep the ITF’s seeds, so Spain is still a floater. Here’s how it looked ahead of the ceremony:

Country  Group     QF     SF      F      W  
ESP      63.0%  75.9%  52.9%  35.0%  22.6%  
FRA      56.8%  70.8%  43.9%  25.7%  14.5%  
CRO      55.5%  69.4%  42.2%  25.1%  13.5%  
USA      51.3%  65.6%  38.5%  19.8%  10.0%  
AUS      48.3%  62.9%  34.8%  17.7%   8.5%  
RUS      40.6%  53.5%  30.2%  15.8%   7.9%  
SRB      42.9%  55.8%  28.3%  13.5%   5.9%  
GER      42.0%  55.7%  27.3%  12.5%   5.4%  
ARG      35.9%  49.1%  20.9%   7.9%   2.8%  
ITA      33.6%  47.1%  19.2%   7.2%   2.5%  
GBR      34.9%  48.3%  20.3%   7.5%   2.5%  
CAN      24.5%  35.5%  14.3%   5.3%   1.9%  
JPN      19.8%  29.4%  10.6%   3.6%   1.1%  
BEL      20.9%  30.4%   7.5%   1.8%   0.4%  
NED       9.5%  15.5%   3.5%   0.7%   0.1%  
CHI       7.9%  13.3%   2.6%   0.4%   0.1%  
KAZ       8.4%  14.1%   2.1%   0.3%   0.0%  
COL       4.3%   7.5%   1.1%   0.2%   0.0%

With the “group of death” out of the picture, Croatia jumps from fifth to third, swapping places with Australia. The defending champs lost the most from the draw, while Spain suffered a bit as well.
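
Simulating the tournament before the draw means generating a fresh draw on every run. A rough sketch of that step is below. The first pot is the list of seeds given earlier. The second and third pots are my inference from the order in which the groups were listed, so treat them as an assumption, and the function names are hypothetical. The second helper ignores pots entirely and assigns teams to groups at random.

```python
import random

# Pot 1 is the six seeded nations. Pots 2 and 3 are inferred from the
# order of the group listing above (an assumption, not an official list).
POT_1 = ["France", "Croatia", "Argentina", "Belgium", "Great Britain", "United States"]
POT_2 = ["Serbia", "Spain", "Germany", "Australia", "Kazakhstan", "Italy"]
POT_3 = ["Japan", "Russia", "Chile", "Colombia", "Netherlands", "Canada"]

def draw_from_pots(pots, rng):
    """Shuffle each pot, then give every group one team from each pot."""
    shuffled = [rng.sample(pot, len(pot)) for pot in pots]
    return [list(group) for group in zip(*shuffled)]

def draw_at_random(teams, group_size, rng):
    """Ignore rankings: shuffle all teams and cut them into equal groups."""
    order = rng.sample(teams, len(teams))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

rng = random.Random(2019)
for label, group in zip("ABCDEF", draw_from_pots([POT_1, POT_2, POT_3], rng)):
    print(f"Group {label}: {', '.join(group)}")
```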

Elo in charge

Another variation is to ignore the ITF rankings and generate the entire draw based on my Elo-based ratings. In this case, the top six seeds would be Spain, Croatia, France, USA, Russia, and Australia, in that order. Argentina and Great Britain would fall to the middle group, and Belgium would drop to the bottom third. Here’s how that simulation looks:

Country  Group     QF     SF      F      W  
ESP      71.6%  82.8%  57.3%  38.0%  24.1%  
FRA      64.6%  77.6%  45.8%  26.7%  14.4%  
CRO      63.1%  76.3%  45.8%  25.6%  13.6%  
USA      59.7%  73.3%  41.1%  20.2%  10.2%  
RUS      58.6%  71.2%  37.0%  19.7%   9.5%  
AUS      57.7%  71.4%  37.7%  17.7%   8.8%  
SRB      37.1%  53.0%  26.1%  12.1%   5.3%  
GER      35.3%  52.3%  24.5%  10.9%   4.6%  
ARG      28.0%  44.2%  17.5%   6.4%   2.2%  
ITA      27.4%  43.6%  16.9%   6.2%   2.1%  
GBR      27.0%  43.1%  16.5%   6.0%   2.0%  
CAN      26.7%  41.8%  16.0%   5.8%   2.0%  
JPN      15.9%  23.6%   8.1%   2.6%   0.8%  
BEL       9.4%  15.1%   3.9%   0.9%   0.2%  
NED       6.5%  10.8%   2.3%   0.5%   0.1%  
CHI       5.3%   9.0%   1.8%   0.3%   0.1%  
KAZ       3.2%   5.8%   0.9%   0.1%   0.0%  
COL       3.1%   5.2%   0.8%   0.1%   0.0%

The big winners in the Elo scenario are the Russians, who gain a seed and avoid a round-robin encounter with either Spain or Croatia. Australia gets a seed as well, but the benefit of protection from the powerhouses isn’t as valuable as the luck that shone on the Aussies in the actual draw.

Imagine a world with no rankings

Finally, let’s see what happens if we ignore the rankings altogether. It would be unusual for the tournament to take such an approach, but if there’s ever a time to have a tennis event with no seedings, this is it. The existing rankings are far too dependent on years-old results, leaving young teams at a disadvantage. And my system, while more accurate, doesn’t quite feel appropriate either. It is based on individual player ratings, and this is a team event.

Whatever the likelihood of a ranking-free draw in the Davis Cup future, here’s what a simulation looks like with completely random assignment of nations into round-robin groups:

Country  Group     QF     SF      F      W  
ESP      62.8%  75.4%  52.4%  34.8%  22.5%  
FRA      54.8%  68.6%  42.6%  25.0%  13.9%  
CRO      53.4%  67.2%  41.0%  23.6%  13.0%  
USA      48.8%  62.9%  35.9%  19.1%   9.7%  
RUS      47.9%  61.0%  34.8%  18.5%   9.3%  
AUS      47.1%  61.1%  34.1%  17.6%   8.5%  
SRB      41.5%  54.3%  28.0%  13.5%   6.1%  
GER      40.3%  53.6%  26.7%  12.3%   5.3%  
ARG      31.9%  44.9%  18.8%   7.2%   2.6%  
ITA      31.5%  44.2%  18.6%   7.1%   2.5%  
GBR      30.7%  43.4%  17.6%   6.5%   2.3%  
CAN      30.4%  42.7%  17.4%   6.4%   2.2%  
JPN      25.9%  36.4%  13.5%   4.6%   1.4%  
BEL      17.2%  25.9%   7.2%   1.8%   0.4%  
NED      12.5%  20.0%   4.6%   0.9%   0.2%  
CHI      10.4%  16.9%   3.5%   0.6%   0.1%  
KAZ       7.0%  11.8%   1.9%   0.3%   0.0%  
COL       5.9%   9.7%   1.5%   0.2%   0.0%

Round-robin formats do a decent job of surfacing the best teams, so the fully random approach doesn’t give us wildly different results than the seeded simulations. The main effect of the no-seed version is to give the weakest sides a slightly better chance at advancing past the group stage, since there is a better chance for them to avoid strong round-robin competition.

Madrid or Maldives redux

Some top players are likely to skip the event. Zverev has said he’ll be in the Maldives, and Djokovic has hinted he may miss the tournament as well. The new three-rubber format means that teams will suffer a bit less from the absence of a singles star, assuming he isn’t also one of the best doubles options. Still, both Germany and Serbia would much rather head to the party with a top-three singles player on their side.

Here are the results of the initial simulation–based on the actual draw–but without Djokovic or Zverev:

Country  Group     QF     SF      F      W  
ESP      46.5%  59.5%  44.0%  33.2%  21.3%  
FRA      68.2%  79.3%  49.6%  30.6%  17.8%  
AUS      74.3%  84.5%  46.1%  24.2%  12.6%  
USA      53.4%  66.2%  37.5%  20.4%  10.8%  
CRO      30.3%  42.5%  28.4%  19.6%  10.8%  
RUS      23.2%  33.6%  21.1%  13.8%   7.0%  
GBR      67.0%  79.0%  40.9%  14.6%   5.2%  
ARG      52.1%  66.9%  35.5%  12.9%   4.9%  
GER      36.4%  52.3%  23.3%   7.2%   2.2%  
ITA      24.2%  35.9%  14.5%   5.7%   2.2%  
CAN      22.4%  33.2%  13.4%   5.2%   2.0%  
JPN      19.4%  31.7%  11.5%   4.8%   1.6%  
BEL      20.5%  32.4%   8.6%   2.3%   0.6%  
SRB      12.4%  21.1%   6.0%   1.9%   0.5%  
NED      21.6%  35.5%   9.8%   2.0%   0.4%  
CHI      11.4%  18.5%   4.9%   0.9%   0.2%  
KAZ      11.3%  19.1%   3.8%   0.5%   0.1%  
COL       5.2%   9.0%   1.2%   0.2%   0.0%

Germany’s chances of winning the inaugural Pique Cup would fall from 7.7% to 2.2%, and Serbia’s odds drop from 6.0% to 0.5%. Argentina and France, the seeded teams sharing groups with Germany and Serbia, respectively, would be the biggest gainers from such high-profile absences.

Anybody’s game

I’ve been skeptical of the new Davis Cup, and while I remain unconvinced that it’s an improvement, I find myself getting excited for the weeklong tennis hootenanny in Madrid. These simulations were even more encouraging. As always, the ranking and seeding isn’t the way I’d do it, but in this format, the differences are minimal. The event format will give us a chance to see plenty of tennis from every qualifying nation, and the high level of competition from most of these countries ensures that most teams have a shot at going all the way.

Top Seed Upsets in ATP 250s

Italian translation at settesei.it

In a typical week, no one would notice if Fabio Fognini, Karen Khachanov, and Lucas Pouille combined to go 0-3. This week is different, as those three men held the top seeds at the ATP events in Cordoba, Sofia, and Montpellier. After their first-round byes, each of them lost in the second round, to Aljaz Bedene, Matteo Berrettini, and Marcos Baghdatis, respectively. Two of the top seeds at least pushed their opponents to three sets, while Fognini lasted only 71 minutes.

This is not the first time a trio of number one seeds have suffered first-match upsets in the same week. Amazingly, it’s not even the first such occurrence in this very week on the calendar. Two years ago, when the South American event was played in Quito, the results were the same: top seeds Marin Cilic, Ivo Karlovic, and Dominic Thiem all failed to win a match. Thiem’s vanquisher, Nikoloz Basilashvili, even extended the streak the following week, heading to Memphis and handing Karlovic his second straight second-round ouster.

Predictable upsets?

Focusing on these losses, it’s natural to wonder whether top seeds are particularly fragile in this sort of tournament. There’s certainly a logic to it. The number one seed at an ATP 250 is usually ranked in the top 20, and is the sort of player who might have considered taking the week off. He knows that more ranking points are available at slams and Masters, so winning a smaller event isn’t his highest priority. His opponent, on the other hand, is competing every chance he gets, and the points on offer at a smaller event could make a big difference in his standing. Further, he has already played–and won–his first-round match, so he might be performing better than usual, or the conditions might suit him particularly well.

Let’s put it to the test. Since 2010, not counting this week’s carnage, I found 267 non-Masters events at which a top seed got a first-round bye and completed his second-round match. (Additionally, there have been three retirements and one withdrawal; only one of those resulted in a loss for the top seed.) The number one seeds had a median rank of 10, and the underdogs had a median rank of 89. Based on my surface-weighted Elo ratings at the time of each match, the favorites should have won 81.5% of the time. That’s better than this week’s trio of top-seeded losers, who were 64% (Fognini), 80% (Khachanov), and 69% (Pouille) favorites.

As it happened, the unseeded challengers were more successful than expected. The favorites won only 76.8% of those matches–a rate low enough that there is only a 3% probability it is due to chance alone. It’s not an overwhelming effect–certainly not enough that we should have predicted this week’s results–but it seems that a few of the top seeds are showing up unmotivated and a handful of the underdogs are playing better than expected.
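A back-of-the-envelope version of that significance check can be sketched with a plain binomial model. This is a simplification, not the calculation behind the 3% figure: it assumes every favorite had the same 81.5% chance (in reality each match has its own Elo forecast), and the win count is reconstructed from the rounded 76.8% rate.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_matches = 267          # second-round matches played by top seeds
expected_rate = 0.815    # average Elo forecast for the favorites
# Assumption: wins are reconstructed from the rounded 76.8% figure.
observed_wins = round(0.768 * n_matches)

# One-sided tail: how often would favorites win this few matches (or fewer)
# if the Elo forecasts were right on average?
p_value = binom_cdf(observed_wins, n_matches, expected_rate)
print(f"{observed_wins} wins in {n_matches} matches, one-sided p = {p_value:.3f}")
```

A more faithful version would use the individual match forecasts rather than a single average rate.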

Riding the wave

What about the underdog winners? Once they’ve defeated the top seed, how many capitalize on the opportunity? Berrettini came back to beat Fernando Verdasco in his quarter-final match today, while Baghdatis and Bedene play later. My forecasts believe that, of the three, Bedene has the best chance of claiming a title, though still less than a one-in-five shot at doing so.

In our subset of 267 matches, the underdog won 66 of them. More than half the time, though, that was the end of the run. 38 of the 66 (58%) fell in the quarter-finals. Another 17 lost in the semis. Whatever works so well for these underdogs in the second round disappears afterward. In the 105 matches contested by these 66 men in the quarter-finals and beyond, Elo thinks they should have won 44.9% of them. Instead, they managed only 42.3%.

There’s still a bit of hope. Five men knocked out the top seed in the second round and went on to win the entire tournament. One of them pulled off an upset we’ve already mentioned: Victor Estrella, who knocked out Karlovic and went on to hoist the trophy in Quito two years ago. Maybe there’s some magic in week six. This week’s trio of underdogs would surely love to think so.

Picking Favorites With Better Davis Cup Rankings

Yesterday, the ITF announced the seedings for the first new-look Davis Cup Finals, to be held in Madrid this November. The 18-country field was completed by the 12 home-and-away ties contested last weekend. Those 12 winners will join France, Croatia, Spain, and USA (last year’s semi-finalists) along with the two wild cards, recent champions Argentina and Great Britain.

The six nations who skipped the qualifying round will make up five of the top six seeds. (Spain is 7th, while Belgium, who had to qualify, is 4th.) The preliminary round of the November event will feature six round-robin groups of three, each consisting of one top-six seed, a second country ranked 7-12, and a third ranked 13-18. Seeding really matters, as a top position (deserved or not!) guarantees that a side will avoid dangerous opponents like last year’s finalists France and Croatia. Even the difference between 12 and 13 could prove decisive, as a 7-through-12 spot ensures that a nation will steer clear of the always-strong Spaniards, who are seeded 7th.
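The group structure described above is easy to sketch in code. The following is a minimal illustration, assuming each of the three seeding tiers is shuffled independently and one team from each tier goes into each group; the function name `draw_groups` is hypothetical, and the real ceremony may impose additional placement rules.

```python
import random

# Official seeds 1 through 18, as listed in the rankings table in this post.
ITF_SEEDS = ["FRA", "CRO", "ARG", "BEL", "GBR", "USA",
             "ESP", "SRB", "AUS", "ITA", "GER", "KAZ",
             "CAN", "JPN", "COL", "NED", "RUS", "CHI"]

def draw_groups(seeded, rng):
    """Split 18 seeded teams into three pots of six, then build six groups
    containing one team from each pot."""
    pots = [seeded[i:i + 6] for i in range(0, 18, 6)]
    # Shuffle each pot independently (sample without replacement).
    shuffled = [rng.sample(pot, len(pot)) for pot in pots]
    # zip(*...) pairs the i-th team of every pot into group i.
    return [list(group) for group in zip(*shuffled)]

groups = draw_groups(ITF_SEEDS, random.Random(2019))
for label, group in zip("ABCDEF", groups):
    print(label, group)
```

Passing a different ordering of the same 18 teams (for instance, sorted by Elo) produces an alternative seeded draw with no other changes.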

The seeds are based on the Davis Cup’s ranking system, which relies entirely on previous Davis Cup results. While the formula is long-winded, the concept is simple: A country gets more points for advancing further each season, and recent years are worth the most. The last four years of competition are taken into consideration. It’s not how I would do it, but the results aren’t bad. Four or five of the top six seeds will field strong sides, and one of the exceptions–Great Britain–would have done so had Andy Murray’s hip cooperated. Spain is obviously misranked, but given the limitations of the Davis Cup ranking system, it’s understandable, as the 2011 champions spent 2015 and 2016 languishing outside the World Group.

We can do better

The Davis Cup rankings have several flaws. First, they rely heavily on a lot of old results. If we’re interested in how teams will compete in November, it doesn’t matter how well a side fared three or four years ago, especially if some of their best players are no longer in the mix. Second, they don’t reflect the change in format. Until last year, doubles represented one rubber in a best-of-five-match tie. A good doubles pair helped, but it wasn’t particularly necessary. Now, there are only two singles matches alongside the doubles rubber. The quality of a nation’s doubles team is more important than it used to be.
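To see why the doubles rubber carries more weight in a three-rubber tie, here is a small sketch of the arithmetic. It assumes the three rubbers are independent and that the doubles is played third; the function names and the sample probabilities are illustrative, not taken from any forecast.

```python
def tie_win_probability(p_s1: float, p_s2: float, p_d: float) -> float:
    """Chance of winning at least two of three independent rubbers
    (two singles, then doubles)."""
    sweep_singles = p_s1 * p_s2
    split_then_doubles = (p_s1 * (1 - p_s2) + (1 - p_s1) * p_s2) * p_d
    return sweep_singles + split_then_doubles

def decisive_doubles_probability(p_s1: float, p_s2: float) -> float:
    """Chance the singles rubbers are split, so the doubles decides the tie."""
    return p_s1 * (1 - p_s2) + (1 - p_s1) * p_s2

# Two coin-flip singles rubbers: a strong doubles pair lifts the tie odds
# from 50% to 60%.
print(tie_win_probability(0.5, 0.5, 0.5))
print(tie_win_probability(0.5, 0.5, 0.7))
print(decisive_doubles_probability(0.7, 0.7))
```

When the same side is favored in both singles rubbers, the split probability falls below one half, so the doubles decides a minority of ties but settles every tie it does decide.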

Let’s see what happens to the rankings when we generate a more forward-looking rating system. Using singles and doubles Elo, I’m going to make a few assumptions:

  • Each country’s top two singles players have a 75% chance of participating (due to the possibility of injury, fatigue, or indifference), and if either one doesn’t take part, the country’s third-best player will replace him.
  • Same idea for doubles, but the top two doubles players have an 85% chance of showing up, to be replaced by the third-best doubles player if necessary.
  • The three matches are equally important. (This isn’t technically true–the third match is likely to be necessary less than half the time, though when it does decide the tie, it is twice as important as the other two matches.)
  • Andy Murray won’t play.

Those assumptions allow us to combine the singles and doubles Elo ratings of the best players of each nation. The result is a weighted rating for each side, one that has a lot of bones to pick with the official Davis Cup rankings.
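One way such a combination could work is sketched below. This is an illustration under stated assumptions, not the exact formula behind the table: each lineup slot takes the expected rating of its starter or the third-best replacement, the doubles rating is the average of the two doubles slots, and the three rubbers are weighted equally. It also ignores the case where both starters are missing at once. All function names and ratings are hypothetical.

```python
def expected_slot_rating(starter: float, backup: float, p_available: float) -> float:
    """Expected rating for one lineup slot: the starter plays with
    probability p_available, otherwise the backup steps in."""
    return p_available * starter + (1 - p_available) * backup

def team_rating(singles, doubles, p_singles=0.75, p_doubles=0.85) -> float:
    """singles, doubles: Elo ratings sorted best-first, at least three each.
    Returns the equally weighted average of two singles rubbers and one
    doubles rubber."""
    s1 = expected_slot_rating(singles[0], singles[2], p_singles)
    s2 = expected_slot_rating(singles[1], singles[2], p_singles)
    d1 = expected_slot_rating(doubles[0], doubles[2], p_doubles)
    d2 = expected_slot_rating(doubles[1], doubles[2], p_doubles)
    doubles_rating = (d1 + d2) / 2   # assumption: a pair is rated by its average
    return (s1 + s2 + doubles_rating) / 3

# Hypothetical ratings for an imaginary nation.
print(round(team_rating([2000, 1900, 1800], [1900, 1800, 1700]), 1))
```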

Forward-looking rankings

The following table shows the 18 countries at the Davis Cup finals along with the 12 losing qualifiers. For each team, I’ve listed their Davis Cup ranking, and their finals seed (if applicable). To demonstrate my results, I’ve shown each nation’s weighted Elo rank and rating and their hard-court Elo rank and rating. The table is sorted by hard-court Elo:

Country  DC Rank  Seed  Elo Rank   Elo  sElo Rank  sElo  
ESP            7     7         1  1936          1  1891  
CRO            2     2         2  1898          2  1849  
FRA            1     1         3  1880          3  1845  
USA            6     6         4  1876          4  1835  
RUS           21    17         7  1855          5  1827  
AUS            9     9         5  1857          6  1820  
SRB            8     8         8  1849          7  1808  
GER           11    11         6  1855          8  1799  
AUT           16              10  1800          9  1766  
ARG            3     3         9  1803         10  1755  
GBR            5     5        11  1796         11  1750  
SUI           24              14  1763         12  1749  
ITA           10    10        12  1780         13  1745  
CAN           14    13        13  1777         14  1744  
JPN           17    14        15  1735         15  1719  
BEL            4     4        17  1688         16  1673  
CZE           13              16  1712         17  1661  
NED           19    16        18  1685         18  1643  
BRA           28              20  1659         19  1638  
IND           20              21  1652         20  1621  
SVK           29              22  1645         21  1617  
CHI           22    18        19  1682         22  1609  
KAZ           12    12        26  1582         23  1574  
COL           18    15        24  1597         24  1551  
SWE           15              27  1570         25  1542  
BIH           27              28  1552         26  1540  
POR           26              23  1610         27  1535  
HUN           23              25  1583         28  1533  
UZB           25              29  1491         29  1489  
CHN           30              30  1468         30  1465

Spain is the comfortable favorite, regardless of whether we look at overall Elo or hard-court Elo. When the draw is conducted, we’ll see which top-six seed is unlucky enough to end up with the Spaniards in their group, and whether the hosts will remain the favorite.

The biggest mismatch between the Davis Cup rankings and my Elo-based approach is in our assessment of the Russian squad. Daniil Medvedev is up to sixth in my singles Elo ratings, with Karen Khachanov at 10th. Those ratings might be a little aggressive, but as it stands, Russia is the only country with two top-ten Elo singles players. Spain is close, with Rafael Nadal ranked 2nd and Roberto Bautista Agut 11th, and the hosts have the additional advantage of a deep reservoir of doubles talent from which to choose.

In the opposite direction, my rankings do not forecast good things for the Belgians. David Goffin has fallen out of the Elo top 20, and there are no superstar doubles players to pick up the slack. In a just world, Spain and Belgium will land in the same round-robin group–preferably one with the Russians as well.

Madrid or Maldives

The results I’ve shown assume that every top singles player has the same chance of participating. That’s certainly not the case, with high-profile stars like Alexander Zverev telling the press that they’ll be spending the week on holiday in the Maldives. Some teams are heavily dependent on one singles player who could make or break their chances with a decision or an injury.

As it stands, Germany is 8th in the surface-weighted Elo. If we take Zverev entirely out of the mix, they drop to a tie for 14th with Japan. It’s something the German side would prefer to avoid, but it’s not catastrophic, partly because the Germans were never among the favorites, and partly because Zverev could play only one singles rubber per tie and the doubles replacements are competent.

Even more reliant on a single player is the Serbian side, which qualified last weekend without the help of their most dangerous threat, Novak Djokovic. With Djokovic, the Serbs rank 7th–a case where my surface Elo ratings almost agree with the official rankings. But without the 15-time major winner, the Serbs fall down to a tie with Belgium in 16th place. While the Serbs are unlikely to take home the trophy regardless, Novak would make a huge difference.

The draw will take place next Thursday. We’ll check back then to see which sides have the best forecasts, nine months out from the showdown in Madrid.

Novak Djokovic and the Narrowing Slam Race

Italian translation at settesei.it

It doesn’t take a statistician, or even a spreadsheet, to recognize that the 2019 Australian Open wasn’t Novak Djokovic’s most difficult path to a major title. We can debate whether the straight-set win over Rafael Nadal in the final was due to Djokovic’s utter dominance or a subpar performance from (a possibly still recovering) Rafa. But there’s more to a grand slam title than the final, and the only top-18 opponent Novak faced in the first six rounds was Kei Nishikori, who retired after 52 minutes.

On the traditional grand slam leaderboard, quality of competition doesn’t matter. Roger Federer has 20, Nadal has 17, and now Djokovic has 15. As I’ve written before, the race is closer than that, since Nadal’s and Djokovic’s opponents have, on average, been stronger than Federer’s. My metric for “adjusted slams” estimates the likelihood that a typical major titlist would defeat the specific seven opponents that a player faced, based on their surface-weighted Elo at the time of the match. (I’ve also used this approach for Masters titles.) The explanation is a mouthful, but the underlying idea is simple: Some majors represent greater achievements than others, both because some eras offer stiffer competition and because some draws are particularly daunting.

A slam title against an average level of competition is worth exactly 1. Tougher paths are worth more than 1, and easier draws are worth less. Here is the current leaderboard, with each player’s raw tally, average difficulty rating of their titles, and adjusted total:

Player          Slams  Avg Diff  Adj Slams  
Roger Federer      20      0.88       17.7  
Rafael Nadal       17      1.01       17.1  
Novak Djokovic     15      1.11       16.6 
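To make the difficulty rating concrete, here is a minimal sketch of the idea. It uses the standard Elo win-probability formula on a 400-point scale and multiplies the seven match probabilities together; the normalization (dividing a baseline path probability by the path in question) is an assumption chosen so that an average path scores 1, and every rating below is invented for illustration.

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that player A beats player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def path_probability(titlist_elo: float, opponent_elos) -> float:
    """Chance that a player with titlist_elo beats every opponent in turn."""
    prob = 1.0
    for opp in opponent_elos:
        prob *= elo_win_prob(titlist_elo, opp)
    return prob

def difficulty(titlist_elo: float, opponent_elos, baseline_prob: float) -> float:
    """Assumed normalization: harder paths (lower probability) score above 1."""
    return baseline_prob / path_probability(titlist_elo, opponent_elos)

TYPICAL_TITLIST = 2200  # hypothetical rating of a typical major champion
average_path = [1600, 1700, 1800, 1850, 1950, 2050, 2100]  # hypothetical
tough_path = [1650, 1750, 1850, 1950, 2100, 2200, 2250]    # hypothetical

baseline = path_probability(TYPICAL_TITLIST, average_path)
print(round(difficulty(TYPICAL_TITLIST, average_path, baseline), 2))
print(round(difficulty(TYPICAL_TITLIST, tough_path, baseline), 2))
```

Summing these per-title values over a career gives an adjusted total in the spirit of the last column above.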

(The numbers in this post do not all precisely agree with those I’ve published in the past, because I’ve improved the accuracy of my Elo-based rating system. All three of the players have seen their adjusted slam totals decrease, because the improved Elo algorithm eliminates some of the Elo “inflation” that overvalued recent achievements.)

These three guys have often had to go through each other, but Djokovic has had the toughest paths of all. The average difficulty of his first 12 majors was 1.2, higher than all but three of Rafa’s titles, one of Roger’s, and two of those won by Pete Sampras. Only recently has he been able to boost his total without quite so much of a challenge. His Australian Open title was worth 0.84 majors, only the fourth of his titles against a below-average set of opponents. It was, however, tougher than Wimbledon or the US Open last year, which were worth 0.77 and 0.65, respectively.

It’s unlikely, of course, that the current leaderboard–adjusted or otherwise–will be the final reckoning among these three men. But on the adjusted list, they will probably remain tightly packed. Because the rest of the pack has weakened, with Andy Murray and Stan Wawrinka no longer regular features of the second week, major titles aren’t what they used to be. Early in the decade, it wasn’t uncommon for a player to beat multiple members of the big four en route to a title and add at least 1.2 to his adjusted tally.

In 2018, slam difficulty was barely half of that recent peak level:

Year    Avg Diff  
2002        0.73  
2003        0.65  
2004        0.82  
2005        0.95  
2006        0.77  
2007        0.93  
2008        1.05  
2009        1.00  
2010        0.95  
2011        1.19  
2012        1.23  
2013        1.22  
2014        1.28  
2015        1.12  
2016        1.27  
2017        0.91  
2018        0.69

This could all change, especially if Djokovic wins a Roland Garros title by upsetting Nadal. (Nothing generates high competition-adjusted numbers like beating Nadal on clay.) But it’s more likely that these three men will have to keep incrementing their totals by 0.6s and 0.7s. While that could be enough to put Rafa or Novak on top by the end of 2019, it won’t give anyone a commanding lead. It’s a good thing that there’s a lot more to the GOAT debate than slam totals, because slam totals–when properly adjusted for the difficulty of achieving them–make it awfully hard to pick a winner.

Identifying Underrated Players With Minor League Elo

I’ve just revised my published Elo ratings (men, women) to better reflect the performance of players who mostly compete at the (men’s) ATP Challenger and (women’s) ITF levels. Previously, my Elo ratings used only tour-level main-draw matches. For top players, it makes very little difference–not only do Novak Djokovic and Simona Halep play no matches at the lower levels, they rarely encounter opponents who spend much time there. But for the second tier of players, the effect can be substantial.

The Elo system rates players according to the quality of their opponents. Beat a good player with a high rating, and your own rating will jump by a healthy margin. Beat a weakling, and your rating will inch up a tiny bit. Essentially, Elo looks at each result and asks, “Based on this new result, how much do we need to adjust our earlier rating?” When Bianca Andreescu upset Caroline Wozniacki in Auckland last week, the system responded by upping Andreescu’s rating by quite a bit, and by penalizing Wozniacki more than for the typical loss. After a more predictable result, like Djokovic’s defeat of Damir Dzumhur, ratings barely move.
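The mechanics described above can be sketched with the textbook Elo update. This is the generic formula rather than the exact variant used for the published ratings: the fixed K-factor of 32 and the ratings in the example are assumptions for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation that `rating` beats `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

def update(winner: float, loser: float, k: float = 32):
    """Return (new_winner, new_loser) after one match.
    The less expected the result, the larger the adjustment."""
    delta = k * (1 - expected_score(winner, loser))
    return winner + delta, loser - delta

# Hypothetical ratings: an upset moves both players a lot...
print(update(1600, 2000))
# ...while a predictable result barely moves either.
print(update(2000, 1600))
```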

It’s important to understand the basic mechanics of the system, but the main takeaway for most fans is that Elo just works. The algorithm generates more accurate player ratings (and resulting match forecasts) than the official ATP and WTA rankings, among other attempts to rank players. Now, you can see Elo rankings for a much wider range of players.

One of my main uses of Elo ratings is identifying players whose official rankings haven’t caught up to reality. For instance, a few months ago I noted when Daniil Medvedev moved into the Elo top ten, even though he has yet to crack that threshold on the official list. Most players who reach the top ten on the Elo table eventually do the same in the ATP or WTA rankings. Another two current examples are Aryna Sabalenka and Ashleigh Barty, considered by Elo to be two of the top three women on tour right now, even though neither is in the top ten of the WTA rankings. That may be too aggressive, and the margins at the top of the women’s list are tiny right now, but it is a clear signal that these women’s results bear watching. (We talked about this on the most recent Tennis Abstract podcast.)

Now that we have unified Elo lists that cover more players, let’s dig deeper. For each tour, let’s find the players currently outside the official top 100 who are rated the highest by the more sophisticated formula. First, the ATP:

Player                  ATP Rank  Elo Rank  
David Ferrer                 124        36  
Thanasi Kokkinakis           145        62  
Miomir Kecmanovic            126        66  
Jack Sock                    105        77  
Reilly Opelka                102        84  
Ricardas Berankis            107        86  
Marcos Baghdatis             122        87  
Gilles Muller                137        88  
Daniel Evans                 190        89  
Viktor Troicki               201        90  
Horacio Zeballos             182        92  
Jared Donaldson              115        94  
Mikael Ymer                  196        95  
Egor Gerasimov               157       100  
Lloyd Harris                 119       102  
Tommy Paul                   195       104  
Guillermo Garcia Lopez       101       106  
Felix Auger Aliassime        106       108  
Alexei Popyrin               149       109  
Dudi Sela                    240       114

One thing that pops out from the list is the number of veterans. Elo ratings are “stickier” than ATP rankings, since the official system works with only 52 weeks’ worth of results. Elo ratings make constant adjustments, but quality performances–even when they are more than 52 weeks old–continue to affect current ratings for some time. David Ferrer has had a hard time staying healthy enough to compete at his former level, but according to Elo, he remains fairly dangerous when he is able to take the court.

Fortunately the list isn’t all veterans. Elo suggests that younger players such as Thanasi Kokkinakis, Miomir Kecmanovic, Mikael Ymer, and Tommy Paul are better than their current rankings indicate.

The WTA list is even more laden with veterans, players who are still competing at a high level, if not as frequently as they used to:

Player                WTA Rank  Elo Rank  
Lucie Safarova             105        39  
Coco Vandeweghe            100        40  
Shuai Peng                 129        43  
Svetlana Kuznetsova        106        50  
Sara Errani                114        52  
Varvara Lepchenko          134        80  
Laura Siegemund            110        84  
Kristyna Pliskova          101        96  
Anna Kalinskaya            167        97  
Viktorija Golubic          104        98  
Ivana Jorovic              117        99  
Marie Bouzkova             120       103  
Kateryna Bondarenko        140       104  
Sachia Vickery             123       105  
Veronika Kudermetova       111       107  
Sabine Lisicki             198       109  
Vitalia Diatchenko         131       112  
Yanina Wickmayer           126       113  
Nao Hibino                 115       114  
Danielle Lao               169       115

Part of the reason so few prospects appear on this list is my decision to exclude ITF $25Ks. For example, up-and-coming 18-year-old Kaja Juvan, who knocked out Yanina Wickmayer in Australian Open qualifying today, hasn’t played nearly enough matches at higher levels to appear on my Elo list. But last year, she was 29-7 at ITF $25Ks, and won her last ten matches at that level.

Another issue is that the most promising women tend to climb into the top 100 more quickly. Another 18-year-old, Dayana Yastremska, rocketed up the rankings with a tour-level title in Hong Kong last fall. She sits at No. 59 on the WTA table, but after 13 top-100 wins in 2018, Elo is even more optimistic, placing her at No. 27, just ahead of Maria Sharapova and Venus Williams.

I’ll continue to update these expanded Elo ratings weekly and use them to generate forecasts for every tour-level and Challenger event. Enjoy!

Daniil Medvedev’s Leading Elo Indicator

Italian translation at settesei.it

It is shaping up to be a breakthrough season for 22-year-old Russian Daniil Medvedev. His Tokyo title two weeks ago was his first at the ATP 500 level and his third on the season, after earlier triumphs in Sydney and Winston-Salem. The run in Japan was a particularly notable step, since he knocked out three top-20 players along the way. He had only four top-20 victories in the entire season leading up to Tokyo, and two of those were against the slumping Jack Sock.

His ATP ranking is rising alongside his results. The Winston-Salem title moved him into the top 40, and the Tokyo trophy resulted in a leap to No. 22. After a first-round win in Shanghai last week, Medvedev crept to his current career-high of No. 21. With a couple of wins in Moscow this week, he could overtake Milos Raonic and reach the top 20.

The improvement on the ATP ranking table is nothing next to the Russian’s race to the top of the Elo list. Last Monday, with the Japanese title in the books, Medvedev rose to No. 8 on my men’s Elo ranking. Since then, he has dropped two places but remains in the top ten, ahead of Marin Cilic, Kevin Anderson, and a host of others who outrank him on the official ATP list.

Given the discrepancy, what do we believe? Is Medvedev inside the top 10 or outside the top 20? Is Elo a leading indicator–that is to say, an early-warning signal for future ATP ranking milestones–or a misleading one? Elo is designed to be forward-looking, tuned to forecast upcoming match outcomes and weighting wins and losses based on the quality of the opponent. The official rankings explicitly consider a year’s worth of results, with no adjustments for quality of competition. In theory, Elo should be the better of the two measures for predicting longer-term results, but that assumes the algorithm works well, and that it doesn’t overreact to short-term successes. Let’s take a look at past differences between the two systems and see what the future might hold for the 22-year-old.

Precedents

Since 1988, 102 men have debuted in the ATP top ten. A slightly larger number, 113, have shown up in the top ten of my Elo ratings. There’s a very substantial overlap between the two, with 94 names appearing in both categories. Thus, 8 players have reached the ATP top ten without clearing the Elo threshold, while 19 have rated a spot in the Elo top ten without convincing the ATP computer to agree.

Here are the eight ATP top-tenners whose Elos have never merited the same status:

Player               ATP Top Ten Debut  ATP Top Ten Weeks  
Jonas Svensson              1991/03/25                  5  
Nicolas Massu               2004/09/13                  2  
Radek Stepanek              2006/07/10                 12  
Jurgen Melzer               2011/01/31                 14  
Juan Monaco                 2012/07/23                  8  
Kevin Anderson              2015/10/12                 31  
Pablo Carreno Busta         2017/09/11                 17  
Lucas Pouille               2018/03/19                  1

A few of these players could still make progress on the Elo list, especially Kevin Anderson, who is currently 11th, a minuscule five points behind Medvedev.

Here is the longer list of Elo top-ten players without any weeks in the official top ten:

Player                 Elo Top Ten Debut  Elo Top Ten Weeks  
Carl Uwe Steeb                1989/05/22                  3  
Andrei Cherkasov              1990/12/11                  1  
Goran Prpic                   1991/05/20                  1  
David Wheaton                 1991/07/08                  9  
Jerome Golmard                1999/05/03                  2  
Dominik Hrbaty                2001/01/15                  2  
Jan Michael Gambill           2001/04/06                  6  
Nicolas Escude                2002/02/25                  4  
Younes El Aynaoui             2002/05/20                  2  
Paul Henri Mathieu            2002/10/14                  8  
Agustin Calleri               2003/05/19                  2  
Taylor Dent                   2003/10/06                 10  
Andrei Pavel                  2004/05/10                  2  
Robby Ginepri                 2005/10/24                  1  
Ivo Karlovic                  2007/11/12                  3  
Roberto Bautista Agut         2016/02/22                  1  
Nick Kyrgios                  2016/03/04                 62  
Stefanos Tsitsipas            2018/08/13                  3  
Daniil Medvedev               2018/10/08                  2

* I define ‘weeks’ a little differently for Elo ratings, as ratings are generated only for those weeks with an ATP-level tournament or Davis Cup tie.

Most of these guys came very close to cracking the ATP top ten. For example, David Wheaton’s peak ranking was No. 12. With the exception of Nick Kyrgios, no one spent more than ten weeks in the Elo top ten without eventually reaching the same standard according to the ATP formula. This list shows that it’s possible to have a brief peak that cracks the Elo top ten but doesn’t last long enough to reflect the kind of success that the official ranking system was designed to reward. About one in six players with a top-ten Elo rating never reached the ATP top ten, though as we can see, the odds of remaining an Elo-only star fall quickly with each additional week in the top ten.
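The counts behind these two lists, and the "one in six" figure, follow from simple subtraction. Here is a minimal sketch that uses only the totals quoted in this article; the variable names are my own.

```python
# Counts quoted in the text (top-ten debuts since 1988).
atp_top_ten = 102   # players who debuted in the ATP top ten
elo_top_ten = 113   # players who debuted in the Elo top ten
both = 94           # players who appear on both lists

atp_only = atp_top_ten - both   # ATP top ten, never Elo top ten
elo_only = elo_top_ten - both   # Elo top ten, never ATP top ten

# Share of Elo top-tenners who never reached the ATP top ten.
elo_only_share = elo_only / elo_top_ten

print(atp_only, elo_only, round(elo_only_share, 3))  # 8 19 0.168
```

A share of about 0.168 is very close to 1/6 (0.167), which is where "about one in six" comes from.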

Kyrgios is a perfect example of the differences between the two approaches to player ranking. The Australian has recorded a number of high-profile upsets, which are the fastest way to climb the Elo list. But knocking out the second-ranked player in the world, as Kyrgios did to Novak Djokovic at Indian Wells last year, doesn’t have much impact on the ATP ranking when it happens in the fourth round. Usually, a player who can oust the elites will start piling up wins in a form that the official computer will appreciate. But Kyrgios, unlike just about every player in history with his talent, hasn’t done that.
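To illustrate why upsets move an Elo rating so quickly, here is a minimal sketch of the textbook Elo update. It is not necessarily the exact formula behind the ratings discussed here: the K-factor of 32 and the three ratings are assumptions chosen purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo win probability for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gain(winner: float, loser: float, k: float = 32.0) -> float:
    """Points the winner gains; k=32 is an assumed, illustrative K-factor."""
    return k * (1.0 - expected_score(winner, loser))


# Hypothetical ratings, not taken from any real player.
underdog, favorite, journeyman = 1900.0, 2200.0, 1600.0

upset_gain = rating_gain(underdog, favorite)      # beating a much stronger player
routine_gain = rating_gain(underdog, journeyman)  # beating a much weaker player

print(round(upset_gain, 1), round(routine_gain, 1))
```

Under these assumptions the upset is worth roughly 27 points and the routine win roughly 5. Note that the round of the tournament never enters the calculation, whereas the official ranking awards points by round reached.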

In short, Elo will always elevate a few players to top-ten status even if they’ll never deserve the same treatment from the ATP formula. It’s too early to say whether Medvedev fits that mold. But where Elo really excels is identifying top players before the ATP computer does. Of the 94 cases since 1988 in which a man debuted in both top tens, Elo was first to anoint the player a top-tenner in 76 of them–better than 80%. The official rankings were first 10 times, and the two systems tied in the other eight instances. On average, players reached the Elo top ten about 32 weeks before the ATP top ten.

Here are the 11 most extreme gaps in which Elo got there first, along with the top-ten debuts of the Big Four:

Player               ATP Debut   Elo Debut  Week Diff  
Mariano Puerta      2005/07/25  2000/06/12        267  
Marc Rosset         1995/07/10  1990/11/05        244  
Fernando Gonzalez   2006/04/24  2002/10/07        185  
Guillermo Canas     2005/05/09  2002/08/05        144  
Mikhail Youzhny     2007/08/13  2004/11/15        143  
Gaston Gaudio       2004/06/07  2002/04/29        110  
Richard Gasquet     2007/07/09  2005/06/20        107  
Tomas Berdych       2006/10/23  2004/10/11        106  
Robin Soderling     2009/10/19  2007/10/08        106  
Mark Philippoussis  1999/03/29  1997/03/24        105  
Jack Sock           2017/11/06  2016/01/18         94  
                                                       
Player               ATP Debut   Elo Debut  Week Diff  
Roger Federer       2002/05/20  2001/02/19         65  
Andy Murray         2007/04/16  2006/08/21         34  
Novak Djokovic      2007/03/19  2006/07/31         33  
Rafael Nadal        2005/04/25  2005/02/21          9

And in case you’re curious, here are the ten cases in which the ATP computer beat Elo to the punch:

Player              ATP Debut   Elo Debut  Week Diff  
Stan Wawrinka      2008/05/12  2010/10/25        128  
David Ferrer       2006/01/30  2007/05/28         69  
Janko Tipsarevic   2011/11/14  2012/05/13         26  
Rainer Schuettler  2003/06/09  2003/08/25         11  
Tommy Robredo      2006/05/08  2006/07/24         11  
Fernando Verdasco  2009/02/02  2009/04/06          9  
Albert Costa       1997/04/21  1997/05/26          5  
Nicolas Almagro    2011/04/25  2011/05/22          4  
John Isner         2012/03/19  2012/04/15          4  
Jiri Novak         2002/10/14  2002/10/21          1
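The Week Diff column in the tables above appears to be the gap between the two debut dates, expressed in weeks. A short sketch, assuming rounding to the nearest week and using a sign convention of my own (positive when Elo got there first), reproduces three of the rows:

```python
from datetime import date


def week_diff(first: date, second: date) -> int:
    """Weeks from `first` to `second`, rounded to the nearest whole week."""
    return round((second - first).days / 7)


# Debut dates copied from the tables above: (Elo debut, ATP debut).
debuts = {
    "Roger Federer": (date(2001, 2, 19), date(2002, 5, 20)),
    "Rafael Nadal": (date(2005, 2, 21), date(2005, 4, 25)),
    "Stan Wawrinka": (date(2010, 10, 25), date(2008, 5, 12)),
}

for player, (elo_debut, atp_debut) in debuts.items():
    # Positive: Elo got there first. Negative: the ATP computer did.
    print(player, week_diff(elo_debut, atp_debut))
```

This gives 65 for Federer, 9 for Nadal, and -128 for Wawrinka, matching the magnitudes in the tables.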

The 32-week average difference is suggestive. As I’ve noted, Elo ratings are optimized to forecast the near future, so at least in theory, they reflect each player’s level right now. The ATP algorithm tallies each man’s performance over 52 weeks, with equal weight given to the first and last weeks in that timeframe. Setting aside improvement and decline due to age, that means the ATP computer is telling us how each player was performing, on average, 26 weeks ago. If Medvedev continues to oust top-20 players on a regular basis and claims another 500-level title or two, he could well be 26 or 32 weeks away from a top-ten debut.
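The 26-week figure is just the average age of the results inside an equally weighted 52-week window. A minimal sketch, assuming each week's results are dated at the midpoint of that week:

```python
# Assumption: every result inside the 52-week window counts equally,
# and anything older counts for nothing.
window = 52
ages = [week + 0.5 for week in range(window)]  # midpoint of each weekly bucket
weights = [1.0] * window

# Weighted average age, in weeks, of the results behind an ATP ranking.
average_age = sum(a * w for a, w in zip(ages, weights)) / sum(weights)

print(average_age)  # 26.0
```

A rating that weighted recent results more heavily would have a lower average age, which is one way to think about why Elo tends to react sooner.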

Elo isn’t designed to make long-term forecasts–the tools needed to do so, for the most part, have yet to be invented. And the system occasionally gives high ratings to players who don’t sustain them for very long. But in general, a superlative Elo rating is a sign that a similar mark on the ATP ranking list isn’t far behind. So far, Kyrgios has managed to defy the odds, but the smart money still points to an eventual ATP top-ten debut for Medvedev.

The Rosy Forecast of Aryna Sabalenka’s Elo Rating

Italian translation at settesei.it

It’s been almost two weeks since Aryna Sabalenka’s last title, and the next one is starting to feel overdue. With all due respect to Naomi Osaka’s ascent, the Belarusian is the hottest rising star on the women’s tour right now, with two titles in the last two months, plus two more finals earlier in the season. The 20-year-old is 8-4 against the top ten this year, with wins over Caroline Wozniacki, Petra Kvitova, Elina Svitolina, and Karolina Pliskova.

It takes time for all of these wins to show up in the WTA rankings. Sabalenka nudged into the top 20 after winning New Haven in August, and rose as high as 11th last Monday, though she is set to fall back to 14th after failing to defend her title in Tianjin this week. While the official ranking is a lagging indicator, Elo ratings react more quickly, especially to high-profile upsets like the ones Sabalenka has been recording almost every week.

Sabalenka’s Elo rating has rocketed to the top of the list. Through last week’s matches, she sits at second overall, behind Simona Halep, but closer to Halep than to third-place Wozniacki. After knocking out Caroline Garcia in Beijing last week, she briefly took over the Elo top spot before handing it back after her quarter-final loss to Qiang Wang. Still, an overall ranking of #2 is a lot more suggestive of future stardom than the WTA computer’s report of #11.

When Elo looks at hard-court matches alone, it is even more optimistic, putting Sabalenka at the very top of the list. Elo would narrowly favor the Belarusian in a hard-court match against Halep and, assuming the draw treated both players equally, would make Sabalenka the early favorite for the 2019 Australian Open title.

What should we make of this? Is it time to appoint Sabalenka the next superstar, or ought we treat Elo ratings with more circumspection? Let’s take a look at players who have topped the Elo list in the past to get a better idea.

Since 1984, only 29 women (including Sabalenka) have reached the #1 or #2 spot on the overall Elo list. Nineteen of them got to #1 in the official rankings. Here are the other ten, with their peak official rankings:

Player               Peak WTA Ranking
Petra Kvitova                       2
Conchita Martinez                   2
Jana Novotna                        2
Agnieszka Radwanska                 2
Elina Svitolina                     3
Gabriela Sabatini                   3
Elena Dementieva                    3
Samantha Stosur                     4
Johanna Konta                       4
Aryna Sabalenka                    11

This is pretty good company. Svitolina could still reach #1, and several of the others were expected to attain even greater heights than they did. The only warning sign here is Johanna Konta, and even she isn’t the best comp for a young star, as she didn’t crack the top two until close to her 26th birthday.

The group of women who have ranked #1 on the hard-court specific Elo ranking table is even more select. Sabalenka is only the 17th player since 1984 to head the list, and 14 of the 17 have topped the official rankings as well. The only other exceptions are Svitolina and Konta.

If there’s ever a good time to anoint a 14th-ranked player the future of the sport, I’d say this is it. Elo isn’t perfect, and it’s possible that the algorithm has overreacted to a series of upsets in a season packed full of them. But if the system has made a mistake, it’s one that it doesn’t make very often. Sabalenka has only won four main-draw matches at majors, so maybe that 2019 Australian Open title is too much to ask. But in the long term, one grand slam title might be a mere harbinger of even greater things to come.