What I Should’ve Known About Playing Styles and Upsets

In the podcast Carl Bialik and I recorded yesterday, I mentioned a pet theory I’ve had for a while, that upsets are more likely in matches between players with contrasting styles. The logic is fairly simple. If you have two counterpunchers going at it, the better counterpuncher will probably win. If two big servers face off, the better big server should have no problem. But if a big server plays a counterpuncher … then, all bets are off.

We’ve seen Rafael Nadal struggle against the likes of John Isner and Dustin Brown, and we’ve seen big servers neutralized by their opposites, as in Marin Cilic’s 1-6 record against Gilles Simon. There are upsets when similar styles clash, as well, but as untested theories go, this one is appealing and not obviously flawed.

Then, to kick off the 2019 Australian Open, Reilly Opelka knocked out Isner. Playing styles don’t come much more evenly matched, and the veteran was the heavy favorite. It was a perfect example of the kind of match I would expect to follow the script, yet the underdog came out on top. They played four tiebreaks and there were only two breaks of serve, but Opelka didn’t even need the Australian Open’s new fifth-set 10-point tiebreak. While it’s just one match, of course, it suggested that I ought to look more closely at my assumptions.

After a couple of hours playing with data this afternoon, my theory is no longer untested … and it turned out to be flawed. Fortunately, it isn’t just another negative result. Playing style is related to upset likelihood, but not in the way I predicted.

Measuring predictability

Let me explain how I tested the idea, and we’ll work our way to the results. First, I used Match Charting Project data to calculate aggression score for every ATP player with at least 10 charted matches since 2010. Aggression score is, essentially, the percentage of shots that end the point (by winner, unforced error, or inducing a forced error), and it will serve as our proxy for playing style. That gives us a group of 106 players, from the conservative Simon and Yoshihito Nishioka with aggression scores around 13%, to the freewheeling Brown and Ivo Karlovic, with scores nearing 30%. I divided those 106 players into quartiles (by number of matches, not number of players, so each quartile contains between 21 and 31 players) so we could see how each general playing style fares against the others. Here are the groups:

(Aggression score conflates two things: big serving/big hitting and tactical aggression. Isner is sometimes not particularly aggressive, but because of his size and serve skill, he is able to end points so frequently that, statistically, he appears to be extremely aggressive. Accordingly, I’ll refer to “big servers” and “aggressive players” interchangeably, even though in reality, there are plenty of differences between the two groups.)

Limiting our view to these 106 men, I found just over 11,000 matches to evaluate and divided them into groups based on which quartiles the two players fell into. Each of the ten possible subsets of matches, like Q1 vs Q2, or Q4 vs Q4, contains at least 400 examples.

For every match, I used surface-adjusted Elo ratings to determine the likelihood that the favorite would win. That gives us pre-match odds that aren’t quite as accurate as what sportsbooks might offer, though they’re close.
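
As a rough illustration of how a pair of ratings becomes a pre-match probability, here is a minimal sketch using the standard Elo expectation formula on a 400-point scale. The surface adjustment isn't spelled out above, so the function simply assumes the ratings passed in have already been adjusted; the function names and example ratings are hypothetical.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that player A beats player B.

    Assumes both ratings are already surface-adjusted; how that
    adjustment is done is not covered here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def favorite_odds(rating_a: float, rating_b: float) -> float:
    """Win probability of whichever player is the favorite."""
    p = elo_win_probability(rating_a, rating_b)
    return max(p, 1.0 - p)


# Hypothetical ratings, for illustration only
print(round(favorite_odds(1850, 1750), 3))
```

Under this formula, a 100-point rating gap makes the favorite roughly a 64% bet, and a 400-point gap works out to about 91%.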

Those pre-match odds are key to determining whether certain groups are more predictable than others. If there are 100 matches in which the favorite is given a 60% chance of winning, and the favorites win 70 of them, we’d say that the results were more predictable than expected. If the favorites win only 50, the results were less predictable.

Goodbye, pet theory

For the matches in each of the ten quartile-vs-quartile subsets, I calculated the average favorite’s chance of winning (“Fave Odds”), then compared that to the frequency with which the favorites went on to win (“Fave Win%”). The table below shows the results, along with the relationship between those two numbers (“Ratio”). A ratio of 1.0 means that matches within the subset are exactly as predictable as expected; higher ratios mean that the favorites were even better bets than the odds gave them credit for, and lower ratios indicate more upsets than expected.

Matchup   Matches  Fave Odds  Fave Win%  Ratio
Q1 vs Q1      412      71.1%      75.2%   1.06
Q1 vs Q2     1072      69.5%      70.6%   1.02
Q1 vs Q3     1382      69.7%      68.6%   0.98
Q1 vs Q4     1187      69.7%      70.0%   1.00
Q2 vs Q2      612      70.2%      69.9%   1.00
Q2 vs Q3     1616      68.8%      67.8%   0.99
Q2 vs Q4     1434      68.8%      67.4%   0.98
Q3 vs Q3      886      66.7%      60.3%   0.90
Q3 vs Q4     1685      67.3%      66.8%   0.99
Q4 vs Q4      791      67.1%      61.2%   0.91
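
The Ratio column is just the observed favorite win rate divided by the average pre-match favorite probability. A minimal sketch, with a hypothetical function name and the 100-match example from earlier as input:

```python
def predictability_ratio(fave_probs, fave_won):
    """Observed favorite win rate divided by the average favorite probability.

    fave_probs: pre-match win probability of the favorite, one per match
    fave_won:   True/False (or 1/0) for whether the favorite won each match
    """
    expected = sum(fave_probs) / len(fave_probs)
    actual = sum(fave_won) / len(fave_won)
    return actual / expected


# 100 matches with a 60% favorite: 70 favorite wins vs. only 50
probs = [0.60] * 100
print(round(predictability_ratio(probs, [1] * 70 + [0] * 30), 2))  # above 1: more predictable
print(round(predictability_ratio(probs, [1] * 50 + [0] * 50), 2))  # below 1: more upsets
```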

There’s a striking finding here: The largest ratio, marking the most predictable bucket of matches, is for the most conservative pairs of players, while the smallest ratios, pointing to the most frequent upsets, are for pairs of aggressive players (Q3 vs Q3 and Q4 vs Q4).

Before analyzing the relationship, let’s check one more thing. The very best players aren’t evenly divided throughout the quartiles, since Q1 has two of the big four. Elo-based match predictions–one of the building blocks of these results–are tougher to get right for the best players and the most uneven matchups, so we need to be careful whenever the elites might be influencing our findings. Therefore, let’s look at the same numbers, but this time for only those matches in which the favorite has a 50% to 70% chance of winning. This way, we exclude many of the best players’ matchups and all of their more lopsided contests:

Matchup   Matches  Fave Odds  Fave Win%  Ratio
Q1 vs Q1      196      59.5%      62.8%   1.05
Q1 vs Q2      604      59.8%      60.6%   1.01
Q1 vs Q3      731      59.7%      58.1%   0.97
Q1 vs Q4      663      59.9%      60.6%   1.01
Q2 vs Q2      322      59.0%      54.7%   0.93
Q2 vs Q3      931      59.8%      59.8%   1.00
Q2 vs Q4      822      59.7%      57.2%   0.96
Q3 vs Q3      544      59.5%      55.0%   0.92
Q3 vs Q4     1024      59.5%      58.2%   0.98
Q4 vs Q4      493      59.3%      55.0%   0.93

We discard about 40% of our sample, but the predictability trend remains generally the same. In both the overall sample and the narrower 50%- to 70%-favorite subset, the strongest relationship I could find was between the predictability ratio and the quartile of the less aggressive player. In other words, a counterpuncher is likely to have more predictable results–regardless of whether he faces a big server, a fellow counterpuncher, or anyone in between–than a more aggressive player.
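
One way to see that relationship is to pool the rows of the full-sample table by the quartile of the less aggressive player and recompute the ratio for each pool. The sketch below does that from the rounded figures in the table, weighting by match count. It's an approximation for illustration, not necessarily how the relationship was originally measured.

```python
from collections import defaultdict

# (less aggressive quartile, more aggressive quartile, matches, fave odds, fave win rate)
# Values copied from the full-sample table above.
ROWS = [
    (1, 1, 412, 0.711, 0.752),
    (1, 2, 1072, 0.695, 0.706),
    (1, 3, 1382, 0.697, 0.686),
    (1, 4, 1187, 0.697, 0.700),
    (2, 2, 612, 0.702, 0.699),
    (2, 3, 1616, 0.688, 0.678),
    (2, 4, 1434, 0.688, 0.674),
    (3, 3, 886, 0.667, 0.603),
    (3, 4, 1685, 0.673, 0.668),
    (4, 4, 791, 0.671, 0.612),
]


def ratio_by_less_aggressive_quartile(rows):
    """Match-weighted predictability ratio, grouped by the lower quartile."""
    expected = defaultdict(float)  # expected favorite wins per group
    actual = defaultdict(float)    # observed favorite wins per group
    for q_low, _q_high, matches, odds, win_rate in rows:
        expected[q_low] += matches * odds
        actual[q_low] += matches * win_rate
    return {q: actual[q] / expected[q] for q in sorted(expected)}


for quartile, ratio in ratio_by_less_aggressive_quartile(ROWS).items():
    print(f"Q{quartile}: {ratio:.3f}")
```

Pooled this way, the ratio falls steadily as the less aggressive player's quartile rises from Q1 to Q4.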

Back to basics

My initial theory is clearly wrong. I expected to find that Q1 vs Q1 matches were more predictable than average, and I was right. But by my logic, I also guessed that Q4 vs Q4 matches would go according to script, and that other pairings, like Q1 vs Q4, would be more upset-prone. I would have done better had I let the neighbor’s cat make my predictions for me.

Instead, we find that matches with more aggressive players are more likely to result in surprises. That doesn’t sound so groundbreaking, and it’s something I should’ve seen coming. Big servers tend to hold serve more often and break serve less frequently, meaning that their matches end with narrower margins, opening the door for luck to play a larger role, especially when sets and matches are determined by tiebreaks.

After all this, you might be thinking that I’ve squandered my afternoon, plus another few minutes of your attention, arriving at something obvious and unremarkable. I agree that it’s not that exciting to proclaim that big servers are more influenced by luck. But there’s still a useful–even surprising–discovery buried here.

Exponential upset potential

We know that the most one-dimensional players are more subject than others to the ups and downs of luck, thanks to the narrow margins of tiebreaks. For a man who rarely breaks serve, no match is a guaranteed win; for a man who rarely gets broken, no opponent is impossible to beat. However, I would have expected that the unpredictability of big servers was already incorporated into our match predictions, via the Elo ratings of the big servers. If a player has unusually random results, we’d expect his rating to drift toward tour average. That’s one reason that it’s very difficult for poor returners to reach the very top of the rankings.

But apparently, that isn’t quite right. The randomness-driven Elo ratings of our big servers do a nearly perfect job of predicting match outcomes against counterpunchers, and they’re only a little bit too confident against the more middle-of-the-road players in Q2 and Q3. Against each other, though, upsets run rampant. That extremely volatile fraction of results–the tiebreak-packed outcomes when the biggest servers face off–only accounts for part of these players’ ratings.

We’re accustomed to getting unpredictable results from the most aggressive players, with their big serves, inconsistent returns, and short rallies. Today’s findings give us a better idea of when these do and do not occur. Against counterpunchers, things aren’t so unpredictable after all. But when big servers play each other, we expect the unexpected–and the results are even more unpredictable than that.

Daniil Medvedev’s Leading Elo Indicator

Italian translation at settesei.it

It is shaping up to be a breakthrough season for 22-year-old Russian Daniil Medvedev. His Tokyo title two weeks ago was his first at the ATP 500 level and his third on the season, after earlier triumphs in Sydney and Winston-Salem. The run in Japan was a particularly notable step, since he knocked out three top-20 players along the way. He had only four top-20 victories in the entire season leading up to Tokyo, and two of those were against the slumping Jack Sock.

His ATP ranking is rising alongside his results. The Winston-Salem title moved him into the top 40, and the Tokyo trophy resulted in a leap to No. 22. After a first-round win in Shanghai last week, Medvedev crept to his current career-high of No. 21. With a couple of wins in Moscow this week, he could overtake Milos Raonic and reach the top 20.

The improvement on the ATP ranking table is nothing next to the Russian’s race to the top of the Elo list. Last Monday, with the Japanese title in the books, Medvedev rose to No. 8 on my men’s Elo ranking. Since then, he has dropped two places but remains in the top ten, ahead of Marin Cilic, Kevin Anderson, and a host of others who outrank him on the official ATP list.

Given the discrepancy, what do we believe? Is Medvedev inside the top 10 or outside the top 20? Is Elo a leading indicator–that is to say, an early-warning signal for future ATP ranking milestones–or a misleading one? Elo is designed to be forward-looking, tuned to forecast upcoming match outcomes and weighting wins and losses based on the quality of the opponent. The official rankings explicitly consider a year’s worth of results, with no adjustments for quality of competition. In theory, Elo should be the better of the two measures for predicting longer-term results, but that assumes the algorithm works well, and that it doesn’t overreact to short-term successes. Let’s take a look at past differences between the two systems and see what the future might hold for the 22-year-old.

Precedents

Since 1988, 102 men have debuted in the ATP top ten. A slightly larger number, 113, have shown up in the top ten of my Elo ratings. There’s a very substantial overlap between the two, with 94 names appearing in both categories. Thus, 8 players have reached the ATP top ten without clearing the Elo threshold, while 19 have rated a spot in the Elo top ten without convincing the ATP computer to agree.

Here are the eight ATP top-tenners whose Elos have never merited the same status:

Player               ATP Top Ten Debut  ATP Top Ten Weeks
Jonas Svensson              1991/03/25                  5
Nicolas Massu               2004/09/13                  2
Radek Stepanek              2006/07/10                 12
Jurgen Melzer               2011/01/31                 14
Juan Monaco                 2012/07/23                  8
Kevin Anderson              2015/10/12                 31
Pablo Carreno Busta         2017/09/11                 17
Lucas Pouille               2018/03/19                  1

A few of these players could still make progress on the Elo list, especially Kevin Anderson, who is currently 11th, a minuscule five points behind Medvedev.

Here is the longer list of Elo top-ten players without any weeks in the official top ten:

Player                 Elo Top Ten Debut  Elo Top Ten Weeks  
Carl Uwe Steeb                1989/05/22                  3  
Andrei Cherkasov              1990/12/11                  1  
Goran Prpic                   1991/05/20                  1  
David Wheaton                 1991/07/08                  9  
Jerome Golmard                1999/05/03                  2  
Dominik Hrbaty                2001/01/15                  2  
Jan Michael Gambill           2001/04/06                  6  
Nicolas Escude                2002/02/25                  4  
Younes El Aynaoui             2002/05/20                  2  
Paul Henri Mathieu            2002/10/14                  8  
Agustin Calleri               2003/05/19                  2  
Taylor Dent                   2003/10/06                 10  
Andrei Pavel                  2004/05/10                  2  
Robby Ginepri                 2005/10/24                  1  
Ivo Karlovic                  2007/11/12                  3  
Roberto Bautista Agut         2016/02/22                  1  
Nick Kyrgios                  2016/03/04                 62  
Stefanos Tsitsipas            2018/08/13                  3  
Daniil Medvedev               2018/10/08                  2

* I define ‘weeks’ a little differently for Elo ratings, as ratings are generated only for those weeks with an ATP-level tournament or Davis Cup tie.

Most of these guys came very close to cracking the ATP top ten. For example, David Wheaton’s peak ranking was No. 12. With the exception of Nick Kyrgios, no one spent more than ten weeks in the Elo top ten without eventually reaching the same standard according to the ATP formula. This list shows that it’s possible to have a brief peak that cracks the Elo top ten but doesn’t last long enough to reflect the kind of success that the official ranking system was designed to reward. About one in six players with a top-ten Elo rating never reached the ATP top ten, though as we can see, the odds of remaining an Elo-only star fall quickly with each additional week in the top ten.

Kyrgios is a perfect example of the differences between the two approaches to player ranking. The Australian has recorded a number of high-profile upsets, which are the fastest way to climb the Elo list. But knocking out the second-ranked player in the world, as Kyrgios did to Novak Djokovic at Indian Wells last year, doesn’t have much impact on the ATP ranking when it happens in the fourth round. Usually, a player who can oust the elites will start piling up wins in a form that the official computer will appreciate. But Kyrgios, unlike just about every player in history with his talent, hasn’t done that.

In short, Elo will always elevate a few players to top-ten status even if they’ll never deserve the same treatment from the ATP formula. It’s too early to say whether Medvedev fits that mold. But where Elo really excels is identifying top players before the ATP computer does. Of the 94 cases since 1988 in which a man debuted in both top tens, Elo was first to anoint the player a top-tenner in 76 of them–better than 80%. The official rankings were first 10 times, and the two systems tied in the other eight instances. On average, players reached the Elo top ten about 32 weeks before the ATP top ten.

Here are the 11 most extreme gaps in which Elo got there first, along with the top-ten debuts of the Big Four:

Player               ATP Debut   Elo Debut  Week Diff  
Mariano Puerta      2005/07/25  2000/06/12        267  
Marc Rosset         1995/07/10  1990/11/05        244  
Fernando Gonzalez   2006/04/24  2002/10/07        185  
Guillermo Canas     2005/05/09  2002/08/05        144  
Mikhail Youzhny     2007/08/13  2004/11/15        143  
Gaston Gaudio       2004/06/07  2002/04/29        110  
Richard Gasquet     2007/07/09  2005/06/20        107  
Tomas Berdych       2006/10/23  2004/10/11        106  
Robin Soderling     2009/10/19  2007/10/08        106  
Mark Philippoussis  1999/03/29  1997/03/24        105  
Jack Sock           2017/11/06  2016/01/18         94  
                                                       
Player               ATP Debut   Elo Debut  Week Diff  
Roger Federer       2002/05/20  2001/02/19         65  
Andy Murray         2007/04/16  2006/08/21         34  
Novak Djokovic      2007/03/19  2006/07/31         33  
Rafael Nadal        2005/04/25  2005/02/21          9
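
The Week Diff column is the gap between the two debut dates, counted in whole weeks. A small sketch that reproduces it for the Big Four rows, using Python's standard `datetime` module (the helper name is hypothetical):

```python
from datetime import datetime


def weeks_between(atp_debut: str, elo_debut: str) -> int:
    """Whole weeks from the Elo top-ten debut to the ATP top-ten debut."""
    fmt = "%Y/%m/%d"
    atp = datetime.strptime(atp_debut, fmt).date()
    elo = datetime.strptime(elo_debut, fmt).date()
    return (atp - elo).days // 7


# (player, ATP debut, Elo debut) taken from the table above
BIG_FOUR = [
    ("Roger Federer", "2002/05/20", "2001/02/19"),
    ("Andy Murray", "2007/04/16", "2006/08/21"),
    ("Novak Djokovic", "2007/03/19", "2006/07/31"),
    ("Rafael Nadal", "2005/04/25", "2005/02/21"),
]

for player, atp, elo in BIG_FOUR:
    print(player, weeks_between(atp, elo))
```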

And in case you’re curious, the ten cases in which the ATP computer beat Elo to the punch:

Player              ATP Debut   Elo Debut  Week Diff  
Stan Wawrinka      2008/05/12  2010/10/25        128  
David Ferrer       2006/01/30  2007/05/28         69  
Janko Tipsarevic   2011/11/14  2012/05/13         26  
Rainer Schuettler  2003/06/09  2003/08/25         11  
Tommy Robredo      2006/05/08  2006/07/24         11  
Fernando Verdasco  2009/02/02  2009/04/06          9  
Albert Costa       1997/04/21  1997/05/26          5  
Nicolas Almagro    2011/04/25  2011/05/22          4  
John Isner         2012/03/19  2012/04/15          4  
Jiri Novak         2002/10/14  2002/10/21          1

The 32-week average difference is suggestive. As I’ve noted, Elo ratings are optimized to forecast the near future, so at least in theory, they reflect each player’s level right now. The ATP algorithm tallies each man’s performance over 52 weeks, with equal weight given to the first and last weeks in that timeframe. Setting aside improvement and decline due to age, that means the ATP computer is telling us how each player was performing, on average, 26 weeks ago. If Medvedev continues to oust top-20 players on a regular basis and claims another 500-level title or two, he could well be 26 or 32 weeks away from a top-ten debut.

Elo isn’t designed to make long-term forecasts–the tools needed to do so, for the most part, have yet to be invented. And the system occasionally gives high ratings to players who don’t sustain them for very long. But in general, a superlative Elo rating is a sign that a similar mark on the ATP ranking list isn’t far behind. So far, Kyrgios has managed to defy the odds, but the smart money still points to an eventual ATP top-ten debut for Medvedev.

The Rosy Forecast of Aryna Sabalenka’s Elo Rating

Italian translation at settesei.it

It’s been almost two weeks since Aryna Sabalenka’s last title, and the next one is starting to feel overdue. With respect to Naomi Osaka’s ascent, the Belarussian is the hottest rising star on the women’s tour right now, with two titles in the last two months, plus two more finals earlier in the season. The 20-year-old is 8-4 against the top ten this year, with wins over Caroline Wozniacki, Petra Kvitova, Elina Svitolina, and Karolina Pliskova.

It takes time for all of these wins to show up in the WTA rankings. Sabalenka nudged into the top 20 after winning New Haven in August, and rose as high as 11th last Monday, though she is set to fall back to 14th after failing to defend her title in Tianjin this week. While the official ranking is a lagging indicator, Elo ratings react more quickly, especially to high-profile upsets like the ones Sabalenka has been recording almost every week.

Sabalenka’s Elo rating has rocketed to the top of the list. Through last week’s matches, she sits at second overall, behind Simona Halep, but closer to Halep than to third-place Wozniacki. After knocking out Caroline Garcia in Beijing last week, she briefly took over the Elo top spot before handing it back after her quarter-final loss to Qiang Wang. Still, an overall ranking of #2 is a lot more suggestive of future stardom than the WTA computer’s report of #11.

When Elo looks at hard court matches alone, it is even more optimistic, putting Sabalenka at the very top of the list. Elo would narrowly favor the Belarussian in a hard-court match against Halep and, assuming the draw treated both players equally, would make Sabalenka the early favorite for the 2019 Australian Open title.

What should we make of this? Is it time to appoint Sabalenka the next superstar, or ought we treat Elo ratings with more circumspection? Let’s take a look at players who have topped the Elo list in the past to get a better idea.

Since 1984, only 29 women (including Sabalenka) have reached the #1 or #2 spot on the overall Elo list. 19 of them got to #1 in the official rankings. Here are the other ten:

Player               Peak  
Petra Kvitova           2  
Conchita Martinez       2  
Jana Novotna            2  
Agnieszka Radwanska     2  
Elina Svitolina         3  
Gabriela Sabatini       3  
Elena Dementieva        3  
Samantha Stosur         4  
Johanna Konta           4  
Aryna Sabalenka        11

This is pretty good company. Svitolina could still reach #1, and several of the others were expected to attain even greater heights than they did. The only warning sign here is Johanna Konta, who isn’t the best comp for a young star, as she didn’t crack the top two until close to her 26th birthday.

The group of women who have ranked #1 on the hard-court specific Elo ranking table is even more select. Sabalenka is only the 17th player since 1984 to head the list, and 14 of the 17 have topped the official rankings as well. The only other exceptions are Svitolina and Konta.

If there’s ever a good time to anoint a 14th-ranked player the future of the sport, I’d say this is it. Elo isn’t perfect, and it’s possible that the algorithm has overreacted to a series of upsets in a season packed full of them. But if the system has made a mistake, it’s one that it doesn’t make very often. Sabalenka has only won four main-draw matches at majors, so maybe that 2019 Australian Open title is too much to ask. But in the long term, one grand slam title might be a mere harbinger of even greater things to come.

Forecasting the 2018 Laver Cup

Italian translation at settesei.it

It’s that time of year again: group selfies in suits, dodgy Davis Cup excuses, and a reminder that it takes more than six continents just to equal Europe. That’s right, it’s Laver Cup.

Last year, I worked out a forecast of the event, walking through a variety of ways in which captains Bjorn Borg and John McEnroe could use their rosters and ultimately predicting a 16-8 win for Team Europe. As it happened, both captains intelligently deployed their stars, and the result was 15-9. This year, the competitors are a little different and the home court has moved from Prague to Chicago, but the format remains the same.

Let’s start with a look at the rosters. I’ve included two additional players for reference: Juan Martin del Potro, who was scheduled to play for Team World but withdrew; and Pierre Hugues Herbert, the doubles specialist Borg hasn’t realized he needs. Each player is shown alongside his surface-weighted singles Elo rating and surface-weighted doubles “D-Lo” rating:

EUROPE                       Singles Elo  Doubles D-Lo  
Novak Djokovic                      2137          1667  
Roger Federer*                      2097          1700  
Alexander Zverev                    1971          1690  
David Goffin                        1960          1582  
Grigor Dimitrov                     1928          1719  
Kyle Edmund                         1780          1542  
                                                        
WORLD                        Singles Elo  Doubles D-Lo  
Kevin Anderson                      1914          1692  
Nick Kyrgios                        1910          1668  
John Isner                          1887          1800  
Diego Sebastian Schwartzman         1814          1540  
Frances Tiafoe                      1772          1544  
Jack Sock                           1724          1925  
                                                        
ALSO                                                    
Juan Martin Del Potro               2062          1678  
Pierre Hugues Herbert               1691          1890

* Federer has played very little tour-level doubles for a long time. Last year I estimated his D-Lo at 1650; he played rather well last year, so I’ll bump him up to 1700 this time around.

Especially with Delpo on the sidelines, Europe looks to dominate the singles. The doubles leans in World’s favor, largely because Jack Sock is so good, especially in comparison with guys who have focused on singles.

Format review

Let’s do a quick refresher on the format. Laver Cup takes place over three days, each of which has three singles matches and one doubles match. Each player must play singles at least once, and no doubles pairing can repeat itself. Day 1 matches are worth one point each, day 2 matches are worth 2 points each, and day 3 matches are worth 3 points each. If there’s a 12-12 tie at the end of day 3, a single doubles set–in which a previously-used team may compete–will decide it all.

Given that format, the best way for the captains to use their rosters is to stick their three worst singles players on day 1 duty, then use their best three on both day 2 and day 3. For doubles, they should use their best doubles player every day, with the best partner on day 3, next best on day 2, and third best on day 1. As I’ve mentioned, Borg and McEnroe came close to this last year, although Borg didn’t use Rafael Nadal (his best doubles player) in day 3 doubles, and he generally overused Tomas Berdych. Both decisions are understandable, as Nadal may not have been physically able to play every possible match, and Berdych was in front of a Czech crowd.
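
To make the usage pattern above concrete, here is a rough sketch that builds each team's schedule from the roster table and computes the exact distribution of Europe's points. Several pieces are assumptions of this sketch rather than details of the actual forecast: singles opponents are paired by rank order, a doubles team's strength is the average of the two D-Lo ratings, every match uses the standard Elo formula, and all 12 matches are played. It won't reproduce the forecast numbers exactly.

```python
from collections import defaultdict


def elo_prob(a: float, b: float) -> float:
    """Standard Elo expectation that side A beats side B."""
    return 1.0 / (1.0 + 10 ** ((b - a) / 400.0))


# (name, singles Elo, doubles D-Lo) from the roster table
EUROPE = [("Djokovic", 2137, 1667), ("Federer", 2097, 1700), ("Zverev", 1971, 1690),
          ("Goffin", 1960, 1582), ("Dimitrov", 1928, 1719), ("Edmund", 1780, 1542)]
WORLD = [("Anderson", 1914, 1692), ("Kyrgios", 1910, 1668), ("Isner", 1887, 1800),
         ("Schwartzman", 1814, 1540), ("Tiafoe", 1772, 1544), ("Sock", 1724, 1925)]


def schedule(team):
    """Map each day to (three singles Elos, doubles team rating)."""
    singles = sorted((p[1] for p in team), reverse=True)
    dlo = sorted((p[2] for p in team), reverse=True)
    best, partners = dlo[0], dlo[1:4]
    plan = {}
    for day in (1, 2, 3):
        # Worst three singles players on day 1, best three on days 2 and 3
        day_singles = singles[3:] if day == 1 else singles[:3]
        # Best partner on day 3, next best on day 2, third best on day 1
        partner = partners[3 - day]
        plan[day] = (day_singles, (best + partner) / 2)  # averaging is an assumption
    return plan


def europe_points_distribution(europe, world):
    """Exact probability of each Europe point total (matches on day d are worth d)."""
    eu, wo = schedule(europe), schedule(world)
    dist = {0: 1.0}
    for day in (1, 2, 3):
        probs = [elo_prob(e, w) for e, w in zip(eu[day][0], wo[day][0])]
        probs.append(elo_prob(eu[day][1], wo[day][1]))
        for p in probs:
            nxt = defaultdict(float)
            for pts, pr in dist.items():
                nxt[pts + day] += pr * p
                nxt[pts] += pr * (1 - p)
            dist = dict(nxt)
    return dist


dist = europe_points_distribution(EUROPE, WORLD)
p_europe_outright = sum(pr for pts, pr in dist.items() if pts > 12)
p_tie = dist.get(12, 0.0)
print(f"Europe wins outright: {p_europe_outright:.1%}, 12-12 tie: {p_tie:.1%}")
```

Because the point totals are built up match by match, the same function can be rerun with a swapped roster to see how one substitution shifts the distribution.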

Now that we know the captains will act in a reasonably savvy way, we can forecast the second edition with a little more confidence than the inaugural one.

The forecast

Nadal’s absence this year will hurt the Europe squad on both singles and doubles. Combined with a small step backward for Federer’s singles game, this year’s Laver Cup figures to be closer than last year. Recall that my forecast a year ago called for a 16-8 Europe victory, and the result was 15-9.

Assuming optimal usage, the 2018 forecast gives Europe a 67.6% chance of winning, with a most likely final score of 14-10. There’s a nearly one-in-ten shot that we’ll see a 12-12 tie, in which the superior doubles capabilities of Team World give them the edge, with a 70.7% probability of winning the tie-breaking set.

Were del Potro not so fragile, this could get even more interesting. Swap out Frances Tiafoe for the Tower of Tandil, and Europe’s chances fall to 56.8%, with a most likely final score of 13-11.

Nothing McEnroe could have done, short of going to medical school a few decades ago, could have put the Argentine on his team this week. But Borg has less of an excuse for failing to maximize the potential of his team. Unlike World, with its world-beating doubles specialist, Europe has a stunning singles roster that rarely takes to the doubles court. As we’ve seen, one doubles player can take the court three times, plus the potential 12-12 tie-breaking set. The specialist would need to play singles only once, on the low-leverage first day.

The obvious choice is Pierre Hugues Herbert, a top-five doubles player with the ability to play respectable singles as well. The Frenchman would be considerably more valuable than Kyle Edmund, who is a better singles player, but not good enough to be of much help to an already loaded side. (I made a similar point last year and illustrated it with Herbert’s partner, Nicolas Mahut. Since then, Herbert has taken the lead over Mahut in both singles and doubles Elo ratings.)

When we sub in Herbert for Edmund, the simulation spits out the best result yet for Europe. Against the actual World team (that is, no Delpo), the hypothetical Europe squad would have a 74.6% chance of winning, with the likely final score between 14-10 and 15-9. Herbert and a mediocre partner would still be the underdogs in a tie-breaking final set against Sock and John Isner, but the presence of a legitimate doubles threat would narrow the odds to about 58/42.

We won’t get to see either Delpo or Herbert in Chicago this year, but we can expect a slightly more competitive Laver Cup than last year. Add in home court advantage, and the result is even less of a foregone conclusion. It’s no match for last week’s Davis Cup World Group play-offs, but I suspect it’ll make for more compelling viewing this weekend than the final rounds in Metz and St. Petersburg.

Two Servebots and Zero Tiebreaks

Isner had energy to burn since he never needed to count to seven.

Italian translation at settesei.it

There have been plenty of upsets at this year’s US Open, but they all pale in comparison with the surprise that John Isner and Milos Raonic delivered Sunday night in their fourth round match. Isner won, 3-6 6-3 6-4 3-6 6-2, failing to hold twice and breaking Raonic’s serve four times. Rarely has a tiebreak seemed so assured, and the two big men didn’t even get close.

In five previous meetings, Isner and Raonic have been more likely to deliver two tiebreaks than only one, and most of their matches were best-of-three, not the grand slam best-of-five format. In 13 previous sets, they had played 9 tiebreaks. In the last year, 45% of Isner’s sets have reached 6-6, while nearly a quarter of the Canadian’s have. One or the other of these guys is responsible for the longest match in history, the longest ever major semi-final, and the longest match in Olympics history. They are really, really good at holding serve, and really not-so-good at breaking.

Great expectations

The likelihood that Isner and Raonic would play a tiebreak depends on some basic assumptions. If Raonic served like he has for the last 52 weeks, that’s a service-point won percentage (SPW) of 72.8%, which is equivalent to holding 93% of the time. If we use Isner’s actual SPW from the match of 74.3%, that translates to a hold rate of 94.4%. If we choose Isner’s SPW from his previous meetings with Raonic of a whopping 76.5%, that gives us an implied hold rate of 96%. Those all sound high but, as we’ll see, the difference between them ends up affecting the probability quite a bit.

I’m going to run the numbers using three sets of assumptions:

  1. The head-to-head. In five matches (four of them on hard courts, the fifth at Wimbledon this year), Isner won 76.5% of service points, while Raonic won 71.4%. That’s equivalent to hold rates of 96.0% and 91.7%, respectively.
  2. The last 52 weeks (adjusted). Across all surfaces, going back to last year’s US Open, Isner has won 73.6% of service points, against Raonic’s 72.8%. Those numbers, however, are against average opponents. Both players, and especially Isner, have below-par return games. If we adjust each player’s SPW for the other player’s rate of return points won (RPW), we get 75.5% for Isner and 78.5% for Raonic. In game-level terms, those are hold rates of 95.3% and 97.1%.
  3. The match itself. On Sunday night, Isner won 74.3% of service points and Raonic won 68.8%. Using these numbers doesn’t give us a true prediction, since we couldn’t have known them ahead of time. But maybe, if we used every scrap of information available to us and put them all together in a really smart way, we could have gotten close to the true number. Those rates translate to hold percentages of 94.4% for Isner and 88.5% for Raonic.
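
The SPW-to-hold conversions above are consistent with the usual closed-form model of a service game, which assumes every point is won by the server independently with the same probability. A minimal sketch under that assumption (the function name is hypothetical):

```python
def hold_probability(spw: float) -> float:
    """Chance of holding serve if each service point is won with probability spw.

    Assumes points are independent and identically distributed.
    """
    p, q = spw, 1.0 - spw
    # Win to love, to 15, or to 30 (server takes 4 points before deuce)
    before_deuce = p ** 4 * (1 + 4 * q + 10 * q ** 2)
    # Reach 3-3 in points (deuce), then win two points in a row at some stage
    reach_deuce = 20 * p ** 3 * q ** 3
    win_from_deuce = p ** 2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


for name, spw in [("Isner H2H", 0.765), ("Raonic H2H", 0.714),
                  ("Isner adjusted", 0.755), ("Raonic adjusted", 0.785)]:
    print(f"{name}: {hold_probability(spw):.1%}")
```

Small changes in SPW move the hold rate noticeably at this end of the scale, which is why the three sets of assumptions diverge once they are compounded over a whole set.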

Not enough tiebreaks

Apparently, the betting odds for at least one tiebreak in the match set the probability around 95%. That turns out to be in line with my predictions, though the specific assumptions affect the result quite a bit.

I’ve calculated a few likelihoods using each set of assumptions. The first, “p(No brk),” is the probability that the two men would simply hold serve for 12 games. It’s not the only way to reach a tiebreak, but it accounts for most of the possibilities. Next, “p(TB)” is the result of a Monte Carlo simulation to show the odds that any given set would result in a tiebreak. “eTB” is the expected number of tiebreaks if we knew that Isner and Raonic would play five sets. Finally, “p(1+ TB)” is the chance that the match would have at least one tiebreak in five sets.

Model   JI Hld  MR Hld  p(No brk)   p(TB)   eTB  p(1+ TB)  
H2H      96.0%   91.7%      46.5%   51.3%   2.6     97.3%  
Last52   95.3%   97.1%      62.8%   65.3%   3.3     99.5%  
Match    94.4%   88.5%      34.0%   41.2%   2.1     93.0%
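
Three of the columns follow from simple arithmetic, which the sketch below spells out. It assumes games and sets are independent, and it takes p(TB) as an input from the table rather than re-running the Monte Carlo simulation. The function names are illustrative.

```python
def p_no_break(hold_a: float, hold_b: float) -> float:
    """Chance that both players hold all six of their service games, reaching 6-6."""
    return (hold_a * hold_b) ** 6


def expected_tiebreaks(p_tb: float, sets: int = 5) -> float:
    """Expected number of tiebreaks if all `sets` sets are played."""
    return sets * p_tb


def p_at_least_one_tiebreak(p_tb: float, sets: int = 5) -> float:
    """Chance of one or more tiebreaks, treating each set as independent."""
    return 1.0 - (1.0 - p_tb) ** sets


# H2H row: hold rates and p(TB) copied from the table.
print(round(p_no_break(0.960, 0.917), 3))
print(round(expected_tiebreaks(0.513), 1))
print(round(p_at_least_one_tiebreak(0.513), 3))
```

With the H2H inputs this matches the 46.5%, 2.6, and 97.3% in the first row; the other rows can be checked the same way.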

Given how the big men played on Sunday, it isn’t unthinkable that they never got to 6-6. In large part because Isner’s return game brought Raonic’s SPW under 70%, each set had “only” a 41.2% chance of going to a tiebreak, and there was a 7% chance that a five-setter would have none. The other two sets of assumptions, though, point to the sort of tiebreak certainty reflected in the betting market … and just about anyone who has ever seen these two guys play tennis.

Perhaps the strangest aspect of all of this is that, in six previous matches at this year’s Open, Isner and Raonic combined for seven tiebreaks–at least one in five of their six matches–before their anticlimactic encounter. Knowing Isner, this is a blip, not a trend, and he’s sure to give us a breaker or two in his quarter-final against Juan Martin del Potro. His tournament record will likely show one or two tiebreaks in every match … except for the one against his fellow servebot. This must be why we stick with tennis: Every match has the potential to surprise us, even if we never really wanted to watch it.

Did Rafael Nadal Almost Lose a Set to David Ferrer?

Italian translation at settesei.it

In David Ferrer’s final grand slam, the draw gods handed him a doozy of a first-round assignment in Rafael Nadal. Ferrer has struggled all year, and no one seriously expected him to improve on his 6-24 career record against the King of Clay. In the end, he didn’t: Ferrer was forced to retire midway through the second set with a calf injury. But before his final Flushing exit, he gave Rafa a bit of a scare.

Nadal won the first set, 6-3. The second set was a bit messier: Ferrer broke to love in the opening game, Rafa broke him back in the next, and a bit later, Ferrer broke again to take a 3-2 lead. He maintained that one-break advantage until he physically couldn’t continue. Leading 4-3 and serving the next game, he was two holds away from leveling the match.

Does that mean Nadal “almost” lost the set? People on the internet argue about these things, and while I don’t understand why, I do love a good probability question. If it overlaps with semantics (yay semantics!), that’s a bonus.

Let’s forget the word choice for now and reframe the question: Ignoring the injury, what were Ferrer’s chances of winning the set? If we assume that both players were equal, it’s a simple thing to plug into my win probability model and–ta da!–we find that from 4*-3, Ferrer had a roughly 85% chance of winning the set.

But wait: I can already hear the Rafa fans screaming at me, these two players aren’t exactly equal. In the 102 points the Spanish duo played on Monday night, Ferrer won 38% on return and Nadal won 47%. For an entire five-set match, those rates work out to a 93% chance of Rafa winning. Maybe that’s not quite high enough, but it’s in the ballpark. Using those figures, Ferrer’s chance of hanging on to win the second set drops significantly, to 57.5%. When you’re winning barely half of your service points, your odds of securing a pair of holds are worse than a coin flip. Had Ferrer won the set, it’s more likely that he would’ve needed to either break Rafa again or come through in a tiebreak.
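
For readers who want to see the shape of such a calculation, here is a minimal game-level sketch of a set win probability from a given score. It is not the win probability model used above: it assumes each player holds serve with a fixed probability, treats the tiebreak as a single number (`p_tiebreak`, a placeholder), and all names are hypothetical.

```python
from functools import lru_cache


def set_win_probability(games_a, games_b, a_serving, hold_a, hold_b, p_tiebreak=0.5):
    """Chance that player A wins the set from the given game score.

    hold_a and hold_b are each player's chance of holding serve.
    p_tiebreak is A's chance of winning a tiebreak at 6-6 (assumed, not estimated).
    """

    @lru_cache(maxsize=None)
    def win(a, b, a_serves):
        if a >= 6 and a - b >= 2:
            return 1.0  # A has already won the set (6-4 or better, or 7-5)
        if b >= 6 and b - a >= 2:
            return 0.0  # B has already won the set
        if a == 6 and b == 6:
            return p_tiebreak
        # A wins the next game by holding, or by breaking B's serve.
        p_game = hold_a if a_serves else 1.0 - hold_b
        return (p_game * win(a + 1, b, not a_serves)
                + (1.0 - p_game) * win(a, b + 1, not a_serves))

    return win(games_a, games_b, a_serving)


# Serving at 4-3, with illustrative hold rates for the server and his opponent.
print(set_win_probability(4, 3, True, hold_a=0.80, hold_b=0.80))
```

The hold rates in the usage line are made up. Plugging in rates derived from each player’s service points won would bring the sketch closer to the scenario discussed here, though a point-level model will not give identical answers.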

That’s a pretty big difference between our two initial estimates. 85% sounds good enough to qualify for “almost” (though one study quantifies the meaning of “almost” at above 90%), but 57.5% does not.

That doesn’t quite settle it, though. The win probability model takes all notions of streakiness out of the equation. According to the formula, there are no patches of good or bad play, no dips in motivation, no extra energy to finish off a set, etc. I’m not convinced any of those exist in any systematic manner, but it’s tough to settle the question either way. Therefore, if we have the ability to use data from real-life matches, we should.

And here, we can. Let’s start with Nadal. Going back to late 2011, I was able to identify 69 sets in which Rafa was returning down a break at 4-3. (There are probably more; my point-by-point dataset isn’t exhaustive, but the missing matches are mostly random, so the 69 should be representative of the last several years.) Of those 69, he came back to win 21, almost exactly 30%.

Ferrer has been more solid than Nadal’s opponents. (It helps that Ferrer only had to face Rafa once, while Nadal’s opponents had to face him every time.) I found 122 sets in which Ferrer served at 4-3, leading by a break. He went on to win the set 109 of those times, or about 89%.

The 89% figure is definitely too high for our purposes: Not only was Ferrer a better player, on average, between 2012 and today, than he is now, but he also had the benefit of facing weaker opponents than Nadal in almost all of those 122 sets. 89%–not far from the theoretical 85% we started with–is a grossly optimistic upper limit.

Even if we take the average of Nadal’s and Ferrer’s real-life results–roughly 90% conversions for Ferru and 70% for Rafa’s opponents–80% is still overshooting the mark. As we’ve established, Ferrer’s numbers refer to a stronger version of the Spaniard, while Rafa is still near the level of his last half-decade. Even 80%, then, is overstating the chances that Nadal would’ve lost a set.

That leaves us with a range between 57%, which assumes Nadal would keep winning nearly half of Ferrer’s service points, and 80%, which is based on the experience of both players over the last several years. Ultimately, any final figure comes down to what we think about Ferrer’s level right now–not as good as it was even a couple of years ago, but at the same time, good enough to come within two games of taking a set from the top-ranked player in the world.

It would take a lot more work to come up with a more precise estimate, and even then, we’d still be stuck not only trying to establish Ferrer’s current ability level, but also his ability level in that set. Just as the word “almost” refers to a range of probabilities, I’m happy to call it a day with my own range. Taking all of these calculations together, we might settle on a narrower field of, say, 65-70%, or about two in three. There’s a good chance a healthy Ferrer would have taken that set from his long-time tormentor, but it was far from a sure thing … or even, given the usual meaning of the word, an “almost” sure thing.

Unseeded Serena and the Roland Garros Draw

In a wide-open women’s field at this year’s French Open, it seems fitting that one of the most dangerous players in the draw isn’t even seeded. Serena Williams has played only four matches–none of them on clay–since returning to tour after giving birth. As such, her official WTA ranking is No. 453, and her current match-play level is anyone’s guess.

Because her ranking is low, she needed to use the ‘special ranking’ rule to enter the tournament, and the rule doesn’t apply to seedings. (I’m not going to dive further into the debate about how the rule should work–I’ve written a lot about the rule in the past.) As an unseeded player, she could have drawn anyone in the first round; in that sense, she was a bit lucky to end up opposite another unseeded player, Kristyna Pliskova. Her wider draw section is manageable as well, with a likely second-round match against 17th seed Ashleigh Barty and a possible third-rounder with 11th seed Julia Goerges. If she makes it to the round of 16, we’ll probably be treated to a big-hitting contest between Serena and Karolina Pliskova or Maria Sharapova.

According to my Elo-based forecast, a best guess about the level of post-pregnancy Serena is that she’s the 7th best overall player in the field, and 9th best on clay. That gives her about a 40% chance of winning her first three matches and reaching the second week, a 6.2% chance of making it to the final, and a 3.1% chance of adding yet another major title to her haul.

What if she were seeded? Seeds are a clear advantage for players who receive them, as a seeding protects against facing other top contenders until later rounds. By simulating the tournament with Serena seeded, we can get a sense of how much the WTA’s rule (and the French Federation’s decision not to seed her) impacts her chances.

Seeded 7th: Let’s imagine a bizarre world in which my Elo ratings were used for tournament seedings. In that case, Serena would be seeded 7th, knocking Caroline Garcia down to 8th and sending current 32nd seed Alize Cornet into the unseeded pool. That would be a clear advantage: 50/50 odds of reaching the fourth round, a 9% chance of playing in the final, and a 4.4% shot at the title, compared to 3.1% in reality.

Seeded 1st: If seeds were assigned based on protected ranking, Serena would be the top seed. You can’t get much more of an advantage than that: The top seed is protected from playing either of the other top-four seeds until the semifinals, for instance. (It’s no insurance against a meeting with 28th seed Sharapova, but Serena, of all people, isn’t worried about that.) Moving from 7th to 1st would give her another boost, but it’s a modest one: As the top seed, her chances of sticking around for the second week would still be 50/50, with 10.1% and 4.7% odds of reaching the final and winning the title, respectively.

Here’s a summary of Serena’s chances in the various seeding scenarios. The final column is “expected points”–a weighted average of the number of WTA ranking points she is expected to collect, given her likelihood of reaching each round.

Scenario     R16  Final  Title  ExpPts  
Actual     39.8%   6.2%   3.1%     273  
Unseeded*  34.4%   6.2%   3.0%     259  
Seeded 7   50.3%   9.0%   4.4%     356  
Seeded 1   50.5%  10.1%   4.7%     371

* the ‘unseeded’ scenario represents Serena’s chances as an unseeded entrant, given a random draw. She got a little lucky, avoiding top players until the 4th round, though her chances of making the final end up the same.
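
The “expected points” column is a weighted average, and the mechanics can be sketched as follows. The table only shows three of the round-by-round probabilities, so this sketch does not reproduce the ExpPts figures; the probabilities and points in the usage lines are toy values, and the function name is hypothetical.

```python
def expected_points(reach_probs, points_table):
    """Weighted average of ranking points over every possible finish.

    reach_probs[k] is the chance of reaching round k, with reach_probs[0] = 1.0
    for the first round and the final entry being the chance of winning the title.
    points_table[k] is the award for losing in round k; the final entry is the
    award for winning the title.
    """
    if len(reach_probs) != len(points_table):
        raise ValueError("need one points value per entry in reach_probs")
    last = len(reach_probs) - 1
    total = 0.0
    for k in range(last):
        # Chance of reaching round k but going no further.
        p_exit_here = reach_probs[k] - reach_probs[k + 1]
        total += p_exit_here * points_table[k]
    total += reach_probs[last] * points_table[last]
    return total


# Toy two-round event: 50% to reach the final, 25% to win it.
print(expected_points([1.0, 0.5, 0.25], [10, 20, 40]))
```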

Seeds matter, though there’s only so much they can do. If Serena really is at a barely-top-ten level, she’s a long shot for the title regardless of whether there’s a number next to her name. If my model grossly underestimates her and she’s back at previous form–let’s not forget, she made the final the last time she played here, and won the title the year before that–then the rest of the field will once again look like a bunch of flies for her to swat away, regardless of which numbers they have next to their names.

Handling Injuries and Absences With Tennis Elo

Italian translation at settesei.it

For the last year or so, every mention of my ATP and WTA Elo ratings has required some sort of caveat. Ratings don’t change while players are absent from the tour, so Serena Williams, Novak Djokovic, Andy Murray, Maria Sharapova, and Victoria Azarenka were all stuck at the top of their tour’s Elo rankings. When their layoffs started, they were among the best, and even a smattering of poor results (or a near season’s worth, in the case of Sharapova) isn’t enough to knock them too far down the list.

This is contrary to common sense, and it’s very different from how the official ATP and WTA rankings treat these players. Common sense says that returning players probably aren’t as good as they were before a long break. The official rankings are harsher, removing players entirely after a full year away from the tour. Serena probably isn’t the best player on tour right now (as Elo insisted during her time off), but she’s also much more of a threat than her WTA ranking of No. 454 implies. We must be able to do better.

Before we fix the Elo algorithm, let’s take a moment to consider what “better” means. Fans tend to get worked up about rankings and seedings, as if a number confers value on the player. The official rankings are, by design, backward-looking: They measure players based on their performance over the last 52 weeks, weighted by how the tour prioritizes events. (They are used in a forward-looking way, for tournament seedings, but the system is not designed to be predictive of future results.) In this way, the official rankings say, “this is how well she has played for the last year.” Whatever her ability or potential, Serena (along with Vika, Murray, and Djokovic) hasn’t posted many positive results this year, and her ranking reflects that.

Elo, on the other hand, is designed to be predictive. Out of necessity, it can only use past results, but it uses those results in a way to best estimate how well a player is competing right now–our best proxy for how someone will play tomorrow, or next week. Elo ratings–even the naive ones that said Serena and Novak are your current No. 1s–are considerably better at predicting match outcomes than are the official rankings. For my purposes, that’s the definition of “better”–ratings that offer more accurate forecasts and, by extension, the best approximation of each player’s level right now.

The time-off penalty

When players leave the tour for very long, they return–at least on average, and at least temporarily–at a lower level. I identified every layoff of eight weeks or longer in ATP history, taken by a player with an Elo rating of 1900 or above*. In their first matches back on tour, their pre-break Elo overestimated their chances of winning by about 25%. It varies a bit by the amount of time off: eight- to ten-week breaks resulted in an overestimation around 17%, while 30- to 52-week breaks meant Elo overestimated a player’s chances by nearly 50% upon return. There are exceptions to every rule, like Roger Federer at the 2017 Australian Open, and Rafael Nadal, who won 14 matches in a row after his two-month break this season, but in general, players are worse when they come back.

* I used the cutoff of 1900 because, below that level, some players are alternating between the ATP and Challenger tours. My Elo algorithm doesn’t include challenger results, so for lower-rated players, it’s not clear which timespans are breaks, and which are series of challenger events. Also, the eight-week threshold doesn’t count the offseason, so an eight-week layoff might really mean ~16 weeks between events, with the break including the offseason.

Translated into Elo terms, an eight-week break results in a drop of 100 Elo points, and a not-quite-one-year break, like Andy Murray’s current injury layoff, means a drop of 150 points. Making that adjustment results in an immediate improvement in Elo’s predictiveness for the first match after a layoff, and a small improvement in predictiveness for the first 20 matches after a break.
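
To make the penalty concrete, here is a rough sketch. Only two anchor points are given above (eight weeks costs 100 points, a not-quite-one-year break costs 150), so the straight line between them, the 52-week cap, and the function names are all assumptions for illustration. The win probability function is the standard Elo expectation formula.

```python
def layoff_penalty(weeks_off: float) -> float:
    """Elo points to subtract after a layoff (linear interpolation is assumed)."""
    if weeks_off < 8:
        return 0.0  # shorter gaps are not treated as layoffs
    weeks = min(weeks_off, 52.0)  # assumed cap at about one year
    return 100.0 + (weeks - 8.0) * (150.0 - 100.0) / (52.0 - 8.0)


def elo_win_probability(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# A 2200-rated player returns from 30 weeks off to face a 2000-rated opponent.
before = elo_win_probability(2200, 2000)
after = elo_win_probability(2200 - layoff_penalty(30), 2000)
print(round(before, 3), round(after, 3))
```

The ratings in the usage lines are invented; the point is only that the forecast for the first match back is computed from the penalized rating rather than the pre-break one.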

Incorporating uncertainty

Elo is designed to always provide a “best estimate”–when a player is new on tour, we give him a provisional rating of 1500, and then adjust the rating after each match, depending on the result, the quality of the opponent, and how many matches our player has contested. That provisional 1500 is a completely ignorant guess, so the first adjustment is a big one. Over time, the size of a player’s Elo adjustments goes down, because we learn more about him. If a player loses his first-ever match to Joao Sousa, the only information we have is that he’s probably not as good as Sousa, so we subtract a lot of points. If Alexander Zverev loses to Sousa after more than 150 career matches, including dozens of wins over superior players, we’ll still dock Zverev a few points, but not as many, because we know so much more about him.

But after a layoff, we are a bit less certain that what we knew about a player is still relevant. Djokovic is a great example right now. If he lost six out of nine matches (as he did between the Australian Open fourth round and Madrid) without missing any time beforehand, we’d know it was a slump, but most of us would expect him to snap out of it. Elo would reduce his rating, but he’d remain near the top. Since he missed the second half of last season, however, we’re more skeptical–perhaps he’ll never return to his former level. Other cases are even more clear-cut, as when a player returns from injury without being fully healed.

Thus, after a layoff, it makes sense to alter how much we adjust a player’s Elo ratings. This isn’t a new idea–it’s the core concept behind Glicko, another chess rating system that expands on Elo. Over the years, I’ve tinkered with Glicko quite a bit, looking for improvements that apply to tennis, without much success. Changing the multiplier that determines rating adjustments (known as the k factor) doesn’t improve the predictiveness of tennis Elo on its own, but combined with the post-layoff penalties I described above, it helps a bit.

The nitty-gritty: After a layoff, I increase the multiplier by a factor of 1.5, and then gradually reduce it back to 1x over the next 20 matches. The flexible multiplier slightly improves the accuracy of Elo ratings for those 20 matches, though the difference is minor compared to the effect of the initial penalty.
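
A sketch of the flexible multiplier might look like the following. The 1.5x starting point and the 20-match window come from the description above; the linear shape of the decay, the `base_k` input, and the function names are assumptions, since the text only says the multiplier is reduced “gradually.”

```python
def k_multiplier(matches_since_return: int, boost: float = 1.5, window: int = 20) -> float:
    """Multiplier on the usual k factor; 0 means the first match back."""
    if matches_since_return >= window:
        return 1.0
    # Assumed linear slide from `boost` down to 1x over `window` matches.
    return boost - (boost - 1.0) * matches_since_return / window


def updated_rating(rating, opponent, won, base_k, matches_since_return):
    """One Elo update with the post-layoff multiplier applied to base_k."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    k = base_k * k_multiplier(matches_since_return)
    return rating + k * ((1.0 if won else 0.0) - expected)


# Same win over an equal opponent, first match back vs. twenty matches later.
print(updated_rating(2000, 2000, True, base_k=20, matches_since_return=0))
print(updated_rating(2000, 2000, True, base_k=20, matches_since_return=20))
```

The larger early adjustment is what lets a few strong results pull a returning player’s rating back up quickly.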

No more caveats*

* I thought it would be funny to put an asterisk after “no more caveats.”

Post-layoff penalties and flexible multipliers end up bringing down the current Elo ratings of the players who are in the middle of long breaks or have recently come back from them, giving us ranking tables that come closer to what we expect–and should do a better job of predicting the outcome of upcoming matches. These changes to the algorithm also have minor effects on the ratings of other players, because everyone’s rating depends on the rating of all of his or her opponents. So Taro Daniel’s Elo bounce from defeating Djokovic in Indian Wells doesn’t look quite as good as it did before I implemented the penalty.

On the ATP side, the new algorithm knocks Djokovic down to 3rd in overall Elo, Murray to 6th, Jo-Wilfried Tsonga to 21st, and Stan Wawrinka to 24th. That’s still quite high for Novak considering what we’ve seen this year, but remember that the Elo algorithm only knows about his on-court performances: A six-month break followed by a half-dozen disappointing losses. The overall effect is about a 200-point drop from his pre-layoff level; the “problem” is that his Elo a year ago reflected how jaw-droppingly good he had recently been.

The WTA results match my intuition even better than I hoped. Serena falls to 7th, Sharapova to 18th, and Azarenka to 23rd. Because of the flexible multiplier, a few early wins for Williams will send her quickly back up the rankings. Like Djokovic, she rates so high in part because of her stratospheric Elo rating before her time off. For her part, Sharapova still rates higher by Elo than she does in the official rankings. Despite the penalty for her one-year drug suspension, the algorithm still treats her prior success as relevant, even if that relevance fades a bit more every week.

Elo is always an approximation, and given the wide range of causes that will sideline a player, not to mention the spectrum of strategies for returning to the tour, any rating/forecasting system is going to have a harder time with players in that situation. That said, these improvements give us Elo ratings that do a better job of representing the current level of players who have missed time, and they will allow us to make superior predictions about matches and tournaments involving those players.

Under the hood

If you’re interested in some technical details, keep reading.

Before making these adjustments, the Brier score for Elo-based predictions of all ATP matches since 1972 was about 0.20. For all matches that involved at least one player with an Elo of 1900 or better, it was 0.17. (Not only are 1900+ players better, their ratings tend to be based on more data, which at least partly explains why the predictions are better. The lower the Brier score, the better.)
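
For reference, the Brier score is the average squared gap between the forecast probability and what actually happened (1 for a win, 0 for a loss). A minimal sketch, with a hypothetical function name and toy inputs:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Three forecasts for the favorite, who won twice and lost once.
print(brier_score([0.8, 0.7, 0.6], [1, 1, 0]))
```

A forecaster who says 50% for every match scores 0.25, which is the baseline that figures like 0.20 and 0.17 should be measured against.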

For the population of about 500 “first matches” after layoffs for qualifying players, the Brier score before these changes was 0.192. After implementing the penalty, it improved to 0.173.

For the 2nd through 20th post-comeback matches, the Brier score for the original algorithm was 0.195. After adding the penalty, it was 0.191, and after making the multiplier flexible, it fell a bit more to 0.190. (Additional increases to the post-layoff multiplier had negative results, pushing the Brier score back to about 0.195 when the 2nd-match multiplier was 2x.) I realize that’s a tiny change, and it very possibly won’t hold up in the future. But in looking at various notable players over the course of their comebacks, that’s the option that generated results that looked the most intuitively accurate. Since my intuition matched the best Brier score (however minuscule the difference), it seems like the best option.

Finally, a note on players with multiple layoffs. If someone misses six months, plays a few matches, then misses another two months, it doesn’t seem right to apply the penalty twice. There aren’t a lot of instances to use for testing, but the limited sample confirms this. My solution: If the second layoff is within two years of the previous comeback, combine the length of the two layoffs (here: eight months), find the penalty for a break of that length, and then apply the difference between that penalty and the previous one. Usually, that results in second-layoff penalties of between 10 and 50 points.
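
The repeat-layoff rule can be sketched like this. The penalty curve itself is passed in as `penalty_for`, because its exact shape isn’t spelled out here; the toy curve in the usage lines, the full penalty applied outside the two-year window, and all names are assumptions.

```python
def second_layoff_penalty(first_weeks, second_weeks, weeks_since_comeback,
                          penalty_for, window_weeks=104):
    """Penalty for a second layoff, given the one already applied for the first.

    penalty_for maps a layoff length in weeks to an Elo penalty.
    """
    if weeks_since_comeback > window_weeks:
        # Outside the two-year window: treat it as a fresh layoff (assumed).
        return penalty_for(second_weeks)
    # Inside the window: price the combined absence, then charge only the
    # part not already deducted after the first layoff.
    combined = penalty_for(first_weeks + second_weeks)
    return combined - penalty_for(first_weeks)


# Toy penalty curve, for illustration only.
def toy_penalty(weeks):
    return 100 + weeks


# Six months off, a short comeback, then two more months off.
print(second_layoff_penalty(26, 9, weeks_since_comeback=6, penalty_for=toy_penalty))
```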

Forecasting the Laver Cup

Italian translation at settesei.it

This weekend brings us the first edition of the Laver Cup, a star-studded three-day affair that pits Europe against the rest of the world. The European team features Roger Federer and Rafael Nadal, and even though several other elites from the continent are missing due to injury, the European team is still much stronger on paper.

Here are the current rosters, along with each competitor’s weighted hard court Elo rating and rank among active players:

EUROPE                  Elo Rating  Elo Rank  
Roger Federer                 2350         2  
Rafael Nadal                  2225         4  
Alexander Zverev              2127         7  
Tomas Berdych                 2038        14  
Marin Cilic                   2029        15  
Dominic Thiem                 1995        17  
                                              
WORLD                   Elo Rating  Elo Rank  
Nick Kyrgios                  2122         8  
John Isner                    1968        22  
Jack Sock                     1951        23  
Sam Querrey                   1939        25  
Denis Shapovalov              1875        36  
Frances Tiafoe                1574       153  
Juan Martin del Potro*        2154         5

*del Potro has withdrawn. I’ve included his singles Elo rating and rank to emphasize how damaging his absence is to the World squad.

“Weighted” surface Elo is the average of overall (all-surface) Elo and surface-specific Elo. The 50/50 split is a much better predictor of match outcomes than either number on its own.
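
In code, the weighting and the resulting match forecast are only a few lines. This is a sketch: the 50/50 average is as described above, the win probability uses the standard Elo formula, and the function names and the overall/surface inputs in the first usage line are made up.

```python
def weighted_surface_elo(overall: float, surface: float) -> float:
    """50/50 blend of all-surface Elo and surface-specific Elo."""
    return 0.5 * overall + 0.5 * surface


def elo_win_probability(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Blending two hypothetical ratings for one player.
print(weighted_surface_elo(2000, 2100))

# The table's ratings are already weighted, so they can be compared directly:
# Kyrgios (2122) against Isner (1968).
print(round(elo_win_probability(2122, 1968), 3))
```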

Nick Kyrgios can hang with anybody on a hard court. But despite some surface-specific skills represented by the American contingent, every other member of the World team rates lower than every member of team Europe. This isn’t a good start for the rest of the world.

What about doubles? Here are the D-Lo (Elo for doubles) ratings and rankings for all twelve participants, plus Delpo:

EUROPE                  D-Lo rating  D-Lo rank  
Rafael Nadal                   1895          4  
Tomas Berdych                  1760         28  
Marin Cilic                    1676         76  
Roger Federer**                1650         90  
Alexander Zverev               1642         99  
Dominic Thiem                  1521        185  
                                                
WORLD                   D-Lo rating  D-Lo rank  
Jack Sock                      1866          8  
John Isner                     1755         29  
Nick Kyrgios                   1723         45  
Sam Querrey                    1715         49  
Denis Shapovalov**             1600        130  
Frances Tiafoe                 1546        166  
Juan Martin del Potro*         1711         55

** Federer hasn’t played tour-level doubles since 2015, and Shapovalov hasn’t done so at all. These numbers are my best guesses, nothing more.

Here, the World team has something of an edge. While both sides feature an elite doubles player–Rafa and Jack Sock–the non-European side is a bit deeper, especially if they keep Denis Shapovalov and last-minute Delpo replacement Frances Tiafoe on the sidelines. Only one-quarter of Laver Cup matches are doubles (plus a tie-breaking 13th match, if necessary), so it still looks like team Europe is the heavy favorite.

The format

The Laver Cup will take place in Prague over three days (starting Friday, September 22nd), and consist of four matches each day: three singles and one doubles. Every match is best-of-three sets with ad scoring and a 10-point super-tiebreak in place of the third set.

On the first day, the winner of each match gets one point; on the second day, two points, and on the third day, three points. That’s a total of 24 points up for grabs, and if the twelve matches end in a 12-12 deadlock, the Cup will be decided with a single doubles set.
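
The scoring is easy to express in code, which also makes it simple to play the Cup out many times. The sketch below assumes all twelve matches are played, takes Europe’s chance in each match as an input rather than deriving it from ratings, and uses hypothetical names throughout.

```python
import random

DAY_POINTS = (1, 2, 3)   # points per match on days one, two, and three
MATCHES_PER_DAY = 4      # three singles and one doubles


def simulate_cup(europe_win_probs, decider_prob, rng):
    """Play one Cup; europe_win_probs holds Europe's chance in each of 12 matches."""
    europe = world = 0
    for i, p in enumerate(europe_win_probs):
        points = DAY_POINTS[i // MATCHES_PER_DAY]
        if rng.random() < p:
            europe += points
        else:
            world += points
    if europe == world:
        # 12-12 deadlock: a single doubles set decides the Cup.
        winner = "Europe" if rng.random() < decider_prob else "World"
    else:
        winner = "Europe" if europe > world else "World"
    return winner, europe, world


def europe_title_odds(europe_win_probs, decider_prob=0.5, runs=20000, seed=1):
    """Share of simulated Cups won by Europe."""
    rng = random.Random(seed)
    wins = sum(simulate_cup(europe_win_probs, decider_prob, rng)[0] == "Europe"
               for _ in range(runs))
    return wins / runs


# Illustrative only: Europe a 60% favorite in every match.
print(europe_title_odds([0.6] * 12))
```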

All twelve participants must play at least one singles match, and no one can play more than two. At least four members of each squad must play doubles, and no doubles pairing can be repeated, except in the case of a tie-breaking doubles set.

Got it? Good.

Optimal strategy

The rules require that three players on each side will contest only one singles match while the other three will enter two each. A smart captain would, health permitting, use his three best players twice. Since matches on days two and three count for more than matches on day one, it also makes sense that captains would use their best players on the final two days.

(There are some game-theoretic considerations I won’t delve into here. Team World could use better players on day one in hopes of racking up points against the lesser members of team Europe, or could drop hints that they will do so, hoping that the European squad would move its better players to day one. As far as I can tell, neither team can change their lineup in response to the other side’s selections, so the opportunities for this sort of strategizing are limited.)

In doubles, the ideal roster deployment strategy would be to use the team’s best player in all three matches. He would be paired with the next-best player on day three, the third-best on day two, and the fourth-best on day one. Again, this is health permitting, and since all of these guys are playing singles, fatigue is a factor as well. My algorithm thus far would use Nadal five times–twice in singles and three times in doubles–and I strongly suspect that isn’t going to happen.

The forecast

Let’s start by predicting the outcome of the Cup if both captains use their roster optimally, even if that’s a longshot. I set up the simulation so that each day’s singles competitors would come out in random order–if, say, Querrey, Shapovalov, and Tiafoe play for team World on day one, we don’t know which of them will play first, or which European opponent each will face. So each run of the simulation is a little different.

As usual, I used Elo (and D-Lo) to predict the outcome of specific matchups. Because of the third-set super-tiebreak, and because it’s an exhibition, I added a bit of extra randomness to every forecast, so if the algorithm says a player has a 60% chance of winning, we knock it down to around 57.5%. When I dug into IPTL results last winter, I discovered that exhibition results play surprisingly true to expectations, and I suspect players will take Laver Cup a bit more seriously than they do IPTL.
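
One simple way to add that kind of randomness is to pull every forecast part of the way toward 50%. The sketch below is an assumption, not the documented method: a linear shrink with a factor of 0.75 happens to turn 60% into 57.5%, matching the single example above, and the function name is hypothetical.

```python
def exhibition_adjusted(p: float, shrink: float = 0.75) -> float:
    """Pull a win probability toward 50%; shrink=1.0 leaves it unchanged."""
    return 0.5 + shrink * (p - 0.5)


# The example from the text: a 60% favorite becomes roughly 57.5%.
print(round(exhibition_adjusted(0.60), 3))
```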

Our forecast–again, assuming optimal player usage–says that Europe has an 84.3% chance of winning, and the median point score is 16-8. There’s an approximately 6.5% chance that we’ll see a 12-12 tie, and when we do, Europe has a slender 52.4% edge.

If Delpo were participating, he would increase the World team’s chances by quite a bit, reducing Europe’s likelihood of victory to 75.5% and narrowing the most probable point score to 15-9.

What if we relax the “optimal usage” restriction? I have no idea how to predict what captains John McEnroe and Bjorn Borg will do, but we can randomize which players suit up for which matches to get a sense of how much influence they have. If we randomize everything–literally, just pick a competitor out of a hat for each match–Europe comes out on top 79.7% of the time, usually winning 15-9. There’s a 7.6% chance of a tie-breaking 13th match, and because the World team’s doubles options are a bit deeper, they win a slim majority of those final sets. (When we randomize everything, there’s a slight risk that we violate the rules, perhaps using the same doubles pairing twice or leaving a player on the bench for all nine singles matches. Those chances are very low, however, so I didn’t tackle the extra work required to avoid them entirely.)

We can also tweak roster usage by team, in case it turns out that one captain is much savvier than the other. (Or if a star like Nadal is unable to play as much as his team would like.) The best-case scenario for our World team underdogs is that McEnroe chooses the best players for each match and Borg does not. Assuming that only European players are chosen from a hat, the probability that the favorites win falls all the way to 63.1%, and the typical gap between point totals narrows all the way to 13-11. The chance of a tie rises to 10%.

On the other hand, it’s possible that Borg is better at utilizing his squad. After all, it doesn’t take an 11-time grand slam winner to realize that Federer and Nadal ought to be on court when the stakes are the highest. This final forecast, with random roster usage from team World and ideal choices from Borg, gives Europe a whopping 92.3% chance of victory, and median point totals of 17 to 7. The World team would have only a 4% shot at reaching a deadlock, and even then, the Europeans win two-thirds of the tiebreakers.

There we have it. The numbers bear out our expectation that Europe is the heavy favorite, and they give us a sense of the likely margin of victory. Tiafoe and Shapovalov might someday be part of a winning Laver Cup side, but it looks like they’ll have to wait a few years before that happens.

Update: One more thing… What about doubles specialists? Both captains have two discretionary picks to use on players regardless of ranking. Most great doubles players are much worse at singles, but as we’ve seen, a player can be relegated to a lone one-point singles match on day one, and as a doubles player, he can have an effect on three different matches, totaling six points.

Sure enough, swapping out Dominic Thiem (a very weak doubles player for whom indoor hard courts are less than ideal) for Nicolas Mahut would have increased Europe’s chances of winning from 84.3% to 88.5%. On the slight chance that the Cup stayed tight through the final doubles match and into a tiebreaker, the doubles team of Mahut-Nadal (however unorthodox that sounds) would be among the best that any captain could put on the court.

There’s even more room for improvement on the World side, especially with del Potro out. At the moment, the third-highest rated hard court player by D-Lo is Marcelo Melo, who would be a major step down in singles but a huge improvement on most of the potential partners for Sock in doubles. If we give him a singles Elo of 1450 and put him on the roster in place of Tiafoe and pit the resulting squad against the original Europe team (with Thiem, not Mahut), it almost makes up for the loss of Delpo–World’s chances of winning increase from 15.7% to 19.3%.

Unfortunately, Borg and McEnroe may have missed their chance to eke out extra value from their six-man rosters–this is a trick that will only work once. If both teams made this trade, Mahut-for-Thiem and Melo-for-Tiafoe, each side’s win probability would go back to near where it started: 85.8% for Europe. That’s a boost over where we started (84.3%), just because Mahut is better suited for the competition than Melo is, as an elite doubles specialist who is also credible on the singles court. No one available to the World team (except for Sock, who is already on the roster) fits the same profile on a hard court. Vasek Pospisil comes to mind, though he has taken a step back from his peaks in both singles and doubles. And on clay, Pablo Cuevas would do nicely, but on a faster surface, he would represent only a marginal improvement over the doubles players already playing for team World.

Maybe next year.

 

Measuring the Impact of Wimbledon’s Seeding Formula

Italian translation at settesei.it

Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass court Grand Slam grants seeds to the top 32 players in each tour’s rankings, and then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.

This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.

Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.
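The 75% figure is a simple count: the No. 5 seed lands in exactly one of the four quarters, so three of the four top seeds avoid him. Here is a minimal Python sketch of that arithmetic, assuming the No. 5 seed is equally likely to be drawn into each quarter. The function name is mine, for illustration only.

```python
import random

def easier_path_rate(trials=100_000, seed=0):
    """Estimate how often one top-four seed avoids the No. 5 seed's quarter.

    Assumption: the No. 5 seed is equally likely to land in any of the
    four quarters of the draw.
    """
    rng = random.Random(seed)
    my_quarter = 0  # by symmetry, any top-four seed's quarter will do
    avoided = sum(rng.randrange(4) != my_quarter for _ in range(trials))
    return avoided / trials

exact = 3 / 4                  # three of the four quarters lack the No. 5 seed
estimate = easier_path_rate()  # simulated check of the same number
print(exact, round(estimate, 3))
```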

Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.
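To make the idea of a weighted surface rating concrete, here is a rough Python sketch. It assumes an even 50/50 blend (my reading of "averaging") and uses the standard Elo formula to turn two ratings into a win probability. The function names and the sample ratings are invented for illustration.

```python
def weighted_surface_elo(surface_elo, overall_elo, surface_weight=0.5):
    """Blend surface-specific and overall Elo into one rating.

    Assumption: equal weights, taken from the word "averaging" in the text.
    """
    return surface_weight * surface_elo + (1 - surface_weight) * overall_elo

def win_probability(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical ratings, not taken from any real player
gelo = weighted_surface_elo(surface_elo=2000.0, overall_elo=2100.0)
print(gelo)                           # 2050.0
print(win_probability(gelo, 1950.0))  # chance of beating a 1950-rated opponent
```

A 400-point gap corresponds to roughly 10-to-1 odds under this formula, so the gaps between players near the top of a rating list translate into fairly modest edges in any single match.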

Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:

1   Novak Djokovic         2296.5  
2   Andy Murray            2247.6  
3   Roger Federer          2246.8  
4   Rafael Nadal           2101.4  
5   Juan Martin Del Potro  2037.5  
6   Kei Nishikori          2035.9  
7   Milos Raonic           2029.4  
8   Jo Wilfried Tsonga     2020.2  
9   Alexander Zverev       2010.2  
10  Marin Cilic            1997.7  
11  Nick Kyrgios           1967.7  
12  Tomas Berdych          1967.0  
13  Gilles Muller          1958.2  
14  Richard Gasquet        1953.4  
15  Stanislas Wawrinka     1952.8  
16  Feliciano Lopez        1945.3

We might quibble with some of these positions–the algorithm knows nothing about whatever is plaguing Djokovic, for one thing–but in general, gElo does a better job of reflecting surface-specific ability level than other systems.

The forecasts

Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, minus known withdrawals such as David Goffin and Pablo Carreno Busta; that list doesn’t differ much from the guys who will ultimately make up the field. Then, for each seeding method, we randomly generate a hundred thousand draws, simulate those brackets, and tally up the winners.
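For readers who want to see the mechanics, here is a simplified Python sketch of that procedure. It is not the exact code behind the numbers in this post. It assumes seeds are placed in tiers (1 and 2 in opposite halves, 3-4 drawn into the remaining quarters, then 5-8, 9-16, and 17-32 drawn into the open slots), and it applies the standard Elo formula to every match with no best-of-five adjustment. The function names, the field, and the ratings are all made up for illustration.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def make_draw(seeded, unseeded, rng):
    """Build a 128-slot bracket from 32 seeds (in seed order) and 96 others."""
    sections = [None] * 32  # 32 four-player sections, one seed in each
    top_four = [seeded[0]] + rng.sample(seeded[2:4], 2) + [seeded[1]]
    for quarter, player in enumerate(top_four):
        sections[8 * quarter] = player
    for quarter, player in enumerate(rng.sample(seeded[4:8], 4)):
        sections[8 * quarter + 4] = player
    for eighth, player in enumerate(rng.sample(seeded[8:16], 8)):
        sections[4 * eighth + 2] = player
    for sixteenth, player in enumerate(rng.sample(seeded[16:32], 16)):
        sections[2 * sixteenth + 1] = player
    rest = rng.sample(unseeded, len(unseeded))  # shuffled copy of the unseeded
    draw = []
    for i, seed_player in enumerate(sections):
        draw.append(seed_player)
        draw.extend(rest[3 * i:3 * i + 3])
    return draw

def play_bracket(draw, ratings, rng):
    """Simulate one single-elimination bracket and return the champion."""
    alive = list(draw)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[::2], alive[1::2]):
            p_a = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]

def title_odds(seed_order, ratings, trials, seed=0):
    """Tally champions over many random draws for one seeding order."""
    rng = random.Random(seed)
    seeded, unseeded = seed_order[:32], seed_order[32:]
    wins = Counter(
        play_bracket(make_draw(seeded, unseeded, rng), ratings, rng)
        for _ in range(trials)
    )
    return {player: wins[player] / trials for player in seed_order}

# Invented field: P1 is the strongest player, P128 the weakest
field = [f"P{i}" for i in range(1, 129)]
ratings = {name: 2300 - 5 * i for i, name in enumerate(field, start=1)}
odds = title_odds(field, ratings, trials=2000)
print(sorted(odds.items(), key=lambda kv: kv[1], reverse=True)[:5])
```

To compare seeding methods, keep the ratings fixed and change only the order of the list passed as seed_order. The ratings decide who wins each match, while the order decides who can meet whom and when.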

Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:

Player              ATP     W%  Wimb     W%  gElo     W%  
Andy Murray           1  23.6%     1  24.3%     2  24.1%  
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%  
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%  
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%  
Roger Federer         5  21.1%     3  22.4%     3  22.4%  
Marin Cilic           6   1.3%     7   1.0%    10   1.0%  
Milos Raonic          7   2.0%     6   1.6%     7   1.7%  
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%  
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%  
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%

Again, gElo is probably too optimistic on Djokovic–at least the betting market thinks so–but the point here is the differences between systems. Federer gets a slight bump for entering the top four, and Wawrinka–who gElo really doesn’t like–loses a big chunk of his modest title hopes by falling out of the top four.

The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:

Player              ATP    SF%  Wimb    SF%  gElo    SF%  
Andy Murray           1  58.6%     1  64.1%     2  63.0%  
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%  
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%  
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%  
Roger Federer         5  49.6%     3  64.0%     3  63.2%  
Marin Cilic           6  13.6%     7  11.1%    10  10.3%  
Milos Raonic          7  17.3%     6  14.0%     7  15.2%  
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%  
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%  
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%

There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, but even Djokovic and Murray see a benefit, because Federer is no longer a possible quarterfinal opponent. Once again, the biggest negative effect falls on Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.
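The size of each shift is easier to see as a difference. This short Python snippet takes the semifinal odds from the table above (ATP seeding versus Wimbledon seeding) and prints the change for each player in percentage points. The variable names are mine.

```python
# Semifinal odds (%) copied from the table above: (ATP seeding, Wimbledon seeding)
sf_odds = {
    "Andy Murray": (58.6, 64.1),
    "Rafael Nadal": (34.4, 39.2),
    "Stanislas Wawrinka": (13.2, 7.7),
    "Novak Djokovic": (66.1, 71.1),
    "Roger Federer": (49.6, 64.0),
    "Marin Cilic": (13.6, 11.1),
    "Milos Raonic": (17.3, 14.0),
    "Dominic Thiem": (7.1, 5.4),
    "Kei Nishikori": (15.5, 14.5),
    "Jo Wilfried Tsonga": (14.0, 13.1),
}

# Change in percentage points when moving from ATP to Wimbledon seeding
deltas = {name: round(wimb - atp, 1) for name, (atp, wimb) in sf_odds.items()}

for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20}{delta:+.1f}")
```

Within this top ten, only the four players seeded inside the Wimbledon top four come out ahead.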

Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.

That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.