Forecasting Archives - Page 2 of 12

The Next Five Years, According To a (Dumb) Grand Slam Crystal Ball

Last year, I introduced a bare-bones model that predicts men’s grand slam results for the next five years. It takes a minimum of inputs: a player’s age, and his number of major semi-finals, finals, and titles in the last two years. Despite leaving out so much additional data, the model explains a lot of the variation among players, achieving most of what a more complex algorithm would, but with nothing more than basic arithmetic.

A bit further down, I’ll introduce a similar model for women’s grand slam results. First, let’s look at the revised numbers for the men. Keep in mind that these are not career slam forecasts, but only slams in the next five years. That’s good enough for the Big Three, but it probably doesn’t tell the whole story for, say, Stefanos Tsitsipas.

Player              Projected Slams  
Novak Djokovic                  2.5  
Rafael Nadal                    2.1  
Dominic Thiem                   2.0  
Alexander Zverev                0.9  
Stefanos Tsitsipas              0.6  
Daniil Medvedev                 0.6  
Matteo Berrettini               0.3  
Lucas Pouille                   0.1  
Diego Schwartzman               0.1

A few other players (notably Roger Federer) reached a semi-final in the last two years, but because of their age, the model forecasts zero slams. Also keep in mind that Wimbledon was not played this year, so there was a bit less data to work with.* The sum of the forecasts is a mere 9.2 slams, out of a possible 20. In some previous years, the model predicted as many as 15 titles for the players it took into consideration. Because today’s top players are so old, they aren’t expected to dominate much of the 2021-25 calendar, leaving room for new contenders to emerge.

* My original post describes the forecasting algorithm as counting results from “the last four slams” and “the previous four slams.” We could account for the three-slam 2020 season by following those steps literally, giving greater weights to the last four slams (the 2019 US Open plus the three 2020 slams), and giving lesser (but still non-zero) weights to the four slams before that. I rejected that approach because (a) it would give an awful lot of weight to the US Open, and (b) the relative lack of 2020 data reflects higher-than-usual uncertainty, which ought to show up in the forecasts, as well. Thus, only seven slams were taken into account for 2021-25 predictions, instead of the usual eight.

Interestingly, the 2020 season has barely budged the predicted career totals for the Big Three. Numbers I published immediately after last year’s US Open forecast Rafael Nadal for 22.5 career slams: his (then) 19 plus 3.5 more. Now he has 20, and the model pegs him for another 2.1. Novak Djokovic was slated for a career total of 19.5: 3.5 more on top of his then-total of 16. He’s still penciled in for 19.5: 17 plus another 2.5 in the future. Federer didn’t have reason to expect much a year ago, and it’s no better now.

The women’s model

It turns out that a similar back-of-the-envelope approach gives good approximations of future slam totals for WTA stars, as well. The weights are a bit different, the average peak age is one year sooner, and the age adjustment is slightly smaller, but the idea is essentially the same.

Here’s how to calculate the number of expected major titles for your favorite player:

Start with zero points
Add 20 points for each slam semi-final reached in the last 12 months
Add 20 points for each slam final reached in the last 12 months
Add 80 points for each slam title won in the last 12 months
Add 10 points for each slam semi-final reached in the previous 12 months
Add 10 points for each slam final reached in the previous 12 months
Add 40 points for each slam title won in the previous 12 months
If the player is older than 26 (at the time of the next slam), subtract 7 points for each year she is older than 26
If the player is younger than 26, add 7 points for each year she is younger than 26
Divide the sum by 100

To take a simple example, consider Iga Swiatek. For her recent French Open title, she gets 20 points for the semi, 20 points for the final, and 80 points for the title. She will still be 19 when the Australian Open rolls around, so we add another 49 points: 7 years younger than 26, times 7 points per year. Her projected total is (20 + 20 + 80 + 49) / 100 = 1.69.
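For readers who would rather run the arithmetic than do it by hand, here is a minimal Python sketch of the steps above. The function and constant names are mine, and the floor at zero is an assumption: the post only says that some players are forecast for zero slams, not how negative totals are handled.

```python
# Point weights from the steps above: (semi-final, final, title).
RECENT_WEIGHTS = (20, 20, 80)     # last 12 months
PREVIOUS_WEIGHTS = (10, 10, 40)   # the 12 months before that
PEAK_AGE = 26
POINTS_PER_YEAR = 7


def projected_slams(recent, previous, age_at_next_slam):
    """Projected major titles over the next five years (women's model).

    recent and previous are (semi-finals, finals, titles) counts.
    A title run counts in all three columns, as in the Swiatek example.
    """
    points = sum(w * n for w, n in zip(RECENT_WEIGHTS, recent))
    points += sum(w * n for w, n in zip(PREVIOUS_WEIGHTS, previous))
    # Positive adjustment below age 26, negative adjustment above it.
    points += POINTS_PER_YEAR * (PEAK_AGE - age_at_next_slam)
    # Assumption: a negative total is reported as zero slams.
    return max(points, 0) / 100


swiatek = projected_slams(recent=(1, 1, 1), previous=(0, 0, 0), age_at_next_slam=19)
print(f"{swiatek:.2f}")  # 1.69
```

The Swiatek call reproduces the (20 + 20 + 80 + 49) / 100 calculation from the example.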

Here are the results for all of the women who reached a major semi-final in 2019 or 2020 and are projected to win more than zero slams between 2021 and 2025:

Player               Projected Slams  
Naomi Osaka                      2.0  
Sofia Kenin                      1.9  
Iga Swiatek                      1.7  
Bianca Andreescu                 1.0  
Ashleigh Barty                   0.9  
Amanda Anisimova                 0.6  
Simona Halep                     0.6  
Marketa Vondrousova              0.6  
Nadia Podoroska                  0.4  
Garbine Muguruza                 0.3  
Belinda Bencic                   0.3  
Jennifer Brady                   0.3  
Elina Svitolina                  0.2  
Petra Kvitova                    0.1  
Victoria Azarenka                0.1

These forecasts sum to 11.0 slams, more than the men’s total. That’s largely because so many of the recent women’s champions are younger, giving the model more reason to be optimistic about them. It still leaves plenty of room for other players to earn some hardware in the next half-decade, which makes sense. The WTA has featured a non-stop succession of breakout young stars for the past few years, and with players like Aryna Sabalenka, Elena Rybakina, and Cori Gauff in the mix, there’s no shortage of talent to keep the carousel turning.

And then there’s Serena Williams. The model projects her for zero slams, despite her three semi-finals and two finals in the last two years. The reason is her age: The algorithm expects players to steadily decline from age 27 onwards, so the age penalty by age 39 is harsh. On one hand, that makes sense: we’re forecasting the results of events that will mostly take place when she’s in her 40s. On the other hand, a player who had so much success at age 37 is probably a good bet to break the mold at 39, as well. Were this a more fully-developed model, we’d probably be smart to tinker with the age adjustment to reflect the reality that Williams is a much better bet to win a major title than Nadia Podoroska.

We could go on all day. For every variable that these forecasts take into account, there are a dozen more that have some plausible claim to relevance. But this simple approach gets us surprisingly far in telling the future–a future in which the men’s all-time grand slam race keeps getting more complicated, and the women’s game continues to feature a wide array of promising young stars.

The Post-Covid WTA is Drifting Back to Normal

In the two latest WTA events, we saw a mix of the expected and the unusual. Simona Halep, the heavy favorite in Prague, wound up with the title despite a couple of demanding three-setters in her first two rounds. The week’s other tournament, in Lexington, failed to follow the script. Serena Williams and Aryna Sabalenka, the big hitters at the top and bottom of the bracket, combined for three wins, with four unseeded players making up the semi-final field.

Last week I pointed out that Palermo–the tour’s initial comeback event–was so unpredictable that you would’ve been better off to treat each match as a coin flip than to use pre-layoff player strength ratings (such as Elo) to forecast outcomes. Such an upset-ridden event isn’t unheard of, even in pandemic-free times, but it is suggestive that the WTA rank-and-file haven’t quite returned to their usual form.

Prague and Lexington give us three times as much data to work with. Plus, we might theorize that Prague would be a little more predictable because so many players in that field also took part in the Palermo event, meaning that they have a little more recent match experience. While our sample of 93 main draw matches is still flimsy, it brings us a little closer to understanding how well traditional forecasts will handle this unusual time.

A thorny Brier patch

The metric I’m using to quantify predictability–or to put it another way, the validity of pre-layoff player ratings–is Brier Score, which takes into account both raw accuracy (did the forecast pick the right player to win?) and confidence level (was the forecast too strong, too weak, or just right?). Tour-level Brier Scores are usually in the range of 0.21, while a score of 0.25 means the predictions were no better than coin flips. A lower score represents more accurate predictions.
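To make the metric concrete, here is a short Python sketch of the standard Brier Score calculation: the average squared gap between each forecast probability and what actually happened. The forecasts and results in the example are made up for illustration; they are not real match data or output from my Elo ratings.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and results.

    forecasts: probability given to player A in each match (0 to 1).
    outcomes: 1 if player A won that match, 0 if she lost.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need one outcome per forecast, and at least one match")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical four-match sample: three favorites win, one loses.
forecasts = [0.8, 0.6, 0.7, 0.55]
outcomes = [1, 0, 1, 1]

print(round(brier_score(forecasts, outcomes), 3))   # 0.173
print(round(brier_score([0.5] * 4, outcomes), 3))   # 0.25, the coin-flip benchmark
```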

Here are the Brier Scores for Palermo, Lexington, and Prague, along with the average of the three, and the average of all WTA International events (on all surfaces) since 2017. (The scores are based on forecasts generated from my Elo ratings.) We might expect the first round to be different, since players are particularly rusty at that stage, so I’ve also broken out first round (“R32 Brier”) matches for each of the tournaments and averages in the table.

Tournament    Brier  R32 Brier  
Palermo       0.268      0.295  
Lexington     0.226      0.170  
Prague        0.212      0.247  
Comeback Avg  0.235      0.237  
Intl Avg      0.217      0.213

As we saw last week, the Palermo results truly defied expectations. More than half of the matches were upsets (according to my Elo ratings), with a particularly unpredictable first round.

That unpredictability didn’t last. The Prague first round rated 0.247–just barely better than coin flips–but the messiness faded after the first couple of days. The event’s overall Brier Score was 0.212, slightly better than the average WTA International. In other words, this group of 32 women, only recently returned from a months-long break, delivered results that were roughly as predictable as we would expect in the middle of a normal season.

The Lexington numbers are a bit more difficult to make sense of, but like Prague’s, they point to a post-coronavirus world that isn’t all that weird. The opening round closely followed the script, with a Brier Score of 0.170. Of the last 115 WTA International events, only 22 were more predictable. The forecast accuracy didn’t last, in large part because of Serena’s loss at the hands of Shelby Rogers. The rating for the entire tournament was 0.226, less predictable than usual, but much better than random guessing and closer to tour average than to the assumption-questioning Palermo numbers.

Revised estimates

We’re still early in the process of evaluating what to expect from players after the COVID-19 layoff. As more tournaments take place, we can identify whether players become more predictable with more matches under their belts. (Perhaps the Prague participants who skipped Palermo were more difficult to forecast, although Halep is an obvious counterexample.)

At this point, anything is possible. It could be that we will steadily drift back to business as usual. On the other hand, the new social-distancing-oriented rules–with few or no fans on site, nightlife limited to Netflix, players fetching their own towels, and new variations of on-court coaching–might work to the advantage of some women and the disadvantage of others. If that’s the case, Elo ratings will go through a novel period of adjustment as they shift to reflect which players thrive on the post-corona tour.

It’s too early to do much more than speculate about something as significant as that. But in the last week, we’ve seen forecasts go from wildly wrong (in Palermo) to not half bad (in Lexington and Prague). We’ve gained some confidence that for all the things that have obviously changed since March, our approach to player ratings may be one thing that largely remains the same.

Did Palermo Show the Signs of a Five-Month Pandemic Layoff?

Are tennis players tougher to predict when they haven’t played an official match for almost half a year? Last week’s WTA return-to-(sort-of)-normal in Palermo gave us a glimpse into that question. In a post last week I speculated that results would be tougher than usual to forecast for a while, necessitating some tweaks to my Elo algorithm. The 31 main draw matches from Sicily allow us to run some preliminary tests.

At first glance, the results look a bit surprising. Only two of the eight seeds reached the semifinals, and the ultimate champion was the unseeded Fiona Ferro. Two wild cards reached the quarters. Is that notably weird for a WTA International-level event? It doesn’t seem that strange, so let’s establish a baseline.

Palermo the unpredictable

My go-to metric for “predictability” is Brier Score, which measures the accuracy of percentage forecasts. It’s nice to pick the winner, but it’s more important to assign the right level of probability. If you say that 100 matches are all 60/40 propositions, your favorites should win 60 of the 100 matches. If they win 90, you weren’t nearly confident enough; if they win 50, you would’ve been better off flipping a coin. Brier Score encapsulates those notions into a single number, the lower the better. Roughly speaking, my Elo forecasts for ATP and WTA matches hover a bit above 0.2.

From 2017 through March 2020, the 975 completed matches at clay-court WTA International events had a collective Brier Score of 0.223. First round matches were a tiny bit more predictable, with R32’s scoring 0.219.

Palermo was a roller-coaster by comparison. The 31 main-draw matches combined for a Brier Score of 0.268. Of the 32 other events I considered, only last year’s Prague tourney was higher, generating a 0.277 mark.

The first round was more unpredictable still, at 0.295. On the other hand, the combination of a smaller per-event sample and the wide variety of first-round fields means that several tournaments were wilder for the first few days. Nine of the 32 others had a first-round Brier Score above 0.250, with four of them scoring higher–that is, worse–than Palermo did.

The Brier Score of shame

I mentioned the 0.250 mark because it is a sort of Brier Score of shame. Let’s say you’re predicting the outcome of a series of coin flips. The smart pick is 50/50 every time. It’s boring, but forecasting something more extreme just means you’re even more wrong half the time. If you set your forecast at 50% for a series of random events with a 50/50 chance of occurring, your Brier Score will be … 0.250.

Another way to put it is this: If your Brier Score is higher than 0.250, you would’ve been better off predicting that every match was 50/50. All the fancy forecasting went to waste.
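The arithmetic behind the 0.250 line is easy to check. The sketch below assumes the simplest possible setup: every match gets the same forecast for the favorite. The helper name is mine, and the scenarios are the hypothetical 60/40 cases from earlier in the post, not real results.

```python
def constant_forecast_brier(forecast, favorite_win_rate):
    """Brier Score when every match gets the same forecast for the favorite.

    forecast: probability given to the favorite in every match.
    favorite_win_rate: share of matches the favorite actually won.
    """
    error_if_win = (1 - forecast) ** 2   # squared error when the favorite wins
    error_if_loss = forecast ** 2        # squared error when the favorite loses
    return favorite_win_rate * error_if_win + (1 - favorite_win_rate) * error_if_loss


# A 50/50 forecast scores 0.25 no matter how the matches turn out.
print(round(constant_forecast_brier(0.5, 0.5), 3))   # 0.25
print(round(constant_forecast_brier(0.5, 0.9), 3))   # 0.25

# A 60/40 forecast beats the coin only if favorites win often enough.
print(round(constant_forecast_brier(0.6, 0.6), 3))   # 0.24
print(round(constant_forecast_brier(0.6, 0.5), 3))   # 0.26
```

The last line is the shameful case: call every match 60/40, watch the favorites win only half the time, and you end up above 0.250.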

In Palermo, 17 of the 31 matches went the way of the underdog, at least according to my Elo formula. The Brier Scores were on the shameful side of the line. My earlier post–which advocated moderating all forecasts, at least a bit–didn’t go far enough. At least so far, the best course would’ve been to scrap the algorithm entirely and start flipping that coin.

Moderating the moderation

All that said, I’m not quite ready to throw away my Elo ratings. (At the moment, they pick Simona Halep and Aryna Sabalenka, my two favorite players, to win in Prague and Lexington. So there’s that.) 31 matches is a small sample, far from adequate to judge the accuracy of a system designed to predict the outcome of thousands of matches each year. As I mentioned above, Elo failed even worse at Prague last year, but because that tournament didn’t follow several months of global shutdowns, it wouldn’t have even occurred to me to treat it as more than a blip.

This time, a week full of forecast-busting surprises could well be more than a blip. Treating players as if they have exactly the abilities they had in March is probably the wrong way to do things, and it could be a very wrong way of doing things. We’ll triple the size of our sample in the next week, and expand it even more over the next month. It won’t help us pick winners right now, but soon we’ll have a better idea of just how unpredictable the post-COVID-19 tennis world really is.

How Much Will the ATP Cup Raise for Australian Bushfire Relief?

Yesterday, the ATP announced that it would make a sizeable donation to the Australian Red Cross:

Each ace served across the @ATPCup at all three venues will deliver $100 to the @RedCrossAU bushfire disaster relief and recovery efforts.

With more than 1500 aces expected to be served, the tournament contribution is expected to exceed $150,000.
— ATPCup (@ATPCup) January 2, 2020

Several players, including Nick Kyrgios, have made additional pledges of their own that extend across the several tournaments of the Australian summer. (Kyrgios’s pledge started the ball rolling, a rare instance of the tour following the lead of its most controversial star.)

How much?

The ATP offered an estimate of 1,500 aces. This is the first edition of the ATP Cup, not to mention the first men’s tour event in Perth, so we can’t simply check how many aces there were last year. Complicating things even further, we don’t know who will play for each nation on each day of the tournament, or which countries will advance to the knockout stages.

In other words, any ace prediction is going to be approximate.

Start with the basics. The ATP Cup will encompass 129 matches. That’s 43 ties, with two singles rubbers and one doubles rubber each. As in the new Davis Cup finals, many doubles rubbers are likely to be “dead,” so all 43 will probably not be played. In Madrid, 21 of the 25 doubles matches were played*, so let’s say that doubles will be skipped at the same rate in Australia, giving us 36 doubles matches.

* one of the four matches I’ve excluded was a 1-0 retirement, which for the purpose of ace counting–not to mention common sense–is effectively unplayed.

The average ace counts in best-of-three matches across the entire tour last year were 12 per singles match and 7 per doubles match. That gives us 1,284 for the 122 total contests we expect to see over the course of the event.

But we can do better. There are more aces on hard courts by a healthy margin. Over the 2019 season, the average best-of-three hard-court singles match returned 15 aces, while doubles matches featured about half as many. That works out to a projected total of 1,542, 20% higher than where we started, and quite close to the ATP’s estimate.
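Here is the back-of-the-envelope math as a small Python sketch. The names are mine. One assumption worth flagging: the 1,542 total is reproduced with 7 aces per hard-court doubles match, so that is the figure used below; taking "half as many" literally as 7.5 would give 1,560 instead.

```python
TIES = 43
SINGLES_PER_TIE = 2
DOUBLES_PLAYED_SHARE = 21 / 25   # share of doubles rubbers played in Madrid
DONATION_PER_ACE = 100           # dollars, per the ATP announcement


def projected_aces(aces_per_singles, aces_per_doubles):
    """Total aces for the event, given average aces per match."""
    singles_matches = TIES * SINGLES_PER_TIE               # 86
    doubles_matches = round(TIES * DOUBLES_PLAYED_SHARE)   # 36.12 rounds to 36
    return singles_matches * aces_per_singles + doubles_matches * aces_per_doubles


tour_wide = projected_aces(12, 7)
hard_court = projected_aces(15, 7)   # assumption: 7 aces per doubles match

print(tour_wide)                       # 1284
print(hard_court)                      # 1542
print(hard_court * DONATION_PER_ACE)   # 154200
```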

While we don’t have much data on the surface in Perth, we have years’ worth of results from Brisbane and Sydney. Brisbane was one of the ace-friendliest surfaces on tour, while Sydney was at the other end of the spectrum. The figures have also varied from year to year, even controlling for the changing mix of players. Whether we look at one year or a longer time span, the average ace rates in Brisbane and Sydney combine to something in the neighborhood of the tour-wide rate.

Complicating factors

The record-setting temperatures in Australia are likely to nudge ace rates upwards. But the mix of players makes things considerably more difficult to forecast.

One challenge is the extreme range between the best players in the event (Rafael Nadal and Novak Djokovic) and the weakest, like Moldova’s 818th-ranked Alexander Cozbinov. Not only are underdogs like Cozbinov likely to see their typical ace rates plummet against higher-quality competition, they will probably struggle to keep matches competitive. The shorter the match, the fewer aces. Ironically, Cozbinov fought Steve Darcis for over three hours on the first day of play, but even at that length, only 2 of his 116 service points went for aces. He and Darcis combined for a below-average total of 10.

Another difficulty is one that would arise in predicting the total aces at any tournament. Overall ace counts depend heavily on who advances to the later rounds. The Spanish team of Nadal, Roberto Bautista Agut, and Pablo Carreno Busta is likely to do well despite relatively few first-serve fireworks. But if Canada reprises its Davis Cup Finals success, the top-line combination of Denis Shapovalov and Felix Auger-Aliassime could give us six rounds of stratospheric serving stats. The American duo of John Isner and Taylor Fritz could do the same, though their odds of advancing took a dire turn after a day-one loss to Norway. At least Isner has already done his part, tallying 33 aces in a three-set loss to Casper Ruud.

As I write this, day one is not quite in the books. The first ten completed singles matches worked out to 16 aces each, slightly above the hard-court tour average. Outliers propped up that number: thanks to Isner and Kyrgios, the Isner-Ruud and Kyrgios-Struff matches produced 37 and 35 aces, respectively. The three completed doubles matches have averaged just over 6 aces each, a bit below tour average.

This is all a long way of saying, surprise! The ATP’s estimate isn’t bad at all. A full simulation of each matchup and the event as a whole would give us more precision, but barring that, 1,500 aces and $150,000 looks like a pretty good bet. Philanthropists should line up behind the big-hitting teams from Australia, Canada, and the USA, or at least cheer for an above-average number of free points off the serve of Rafael Nadal.

Rethinking Match Results as Probabilities

You don’t have to watch tennis for long before hearing a commentator explain that matches can be decided by the slimmest of margins. It’s common for a match winner to tally only 51% or 52% of the total points played. Dozens of times each year, players go even further, triumphing despite winning fewer than half of points. Novak Djokovic did just that in the 2019 Wimbledon final, claiming only 204 points to Roger Federer’s 218.

It’s right to look at results like Djokovic-Federer and conclude that many matches are decided by slim margins or that performance on certain points is crucial. Indeed, players occasionally win matches while winning as few as 47% of points.

Still, it’s possible to take the “slim margins” claim too far. 51% sounds like a narrow margin, as does 53%. In many endeavors, sporting and otherwise, 55% represents a near-tie, and even 60% or 65% suggests that there isn’t much to separate the two sides. Not so in tennis, especially in the serve-centered men’s game. However it sounds, 60% represents a one-sided contest, and 65% is a blowout verging on embarrassment. In 2019, only three ATP tour matches saw one player win more than 70% of total points.

Answer a different question

For several reasons, total points won is an imperfect measure of one player’s superiority, even in a single match. One flaw is that it is usually stuck in that range between 35% and 65%, incorrectly implying that all tennis matches are relatively close contests. Another drawback is that not all 55% rates (or 51%s, or 62%s) are created equal. The longer the match, the more information we gain about the players. For a specific format, like best-of-three, a longer match usually requires closely-matched players to go to tiebreaks or a third set. But if we want to compare matches across different formats (like best-of-three and best-of-five), the length of the match doesn’t necessarily tell us anything. Best-of-five matches are longer because of the rules, not because of any characteristics of the players.

The solution is to think in terms of probabilities. Given the length of a match, and the percentage of points won by each player, what is the probability that the winner was the better player?

To answer that question, we use the binomial distribution, and consider the likelihood that one player would win as many points as he did if the players were equally matched. If we flipped a fair coin 100 times, we would expect the number of heads to be around 50, but not that it will always be exactly 50. The binomial distribution tells us how often to expect any particular number of heads: 49, 50, or 51 are common, 53 is a bit less common, 55 even less so, 40 or 60 quite uncommon, and so on. For any number of heads, there’s some probability that it is entirely due to chance, and some probability that it occurs because the coin is biased.

Here’s how that relates to a tennis match. We start the match pretending that we know nothing about the players, assuming that they are equal. The number of points is analogous to the number of coin flips–the more points, the more likely the player who wins the most is really better. The number of points won by the victor corresponds to the number of heads. If the winner claims 60% of points, we can be pretty sure that he really is better, just as a tally of 60% heads in 100 or more flips would indicate that the coin is probably biased.

More than just 59%

The binomial distribution helps us convert those intuitions into probabilities. Let’s look at an example. The 2019 Roland Garros final was a fairly one-sided affair. Rafael Nadal took the title, winning 58.6% of total points played (116 of 198) over Dominic Thiem, despite dropping the second set. If Nadal and Thiem were equally matched, the probability that Nadal would win so many points is barely 1%. Thus, we can say that there is a 99% probability that Nadal was–on the day, in those conditions, and so on–the better player.

No surprises there, and there shouldn’t be. Things get more interesting when we alter the length of the match. The two other 2019 ATP finals in which one player won about 58.6% of points were both claimed by Djokovic. In Paris, he won 58.7% of points (61 of 104) against Denis Shapovalov, and in Tokyo, he accounted for 58.3% (56 of 96) in his defeat of John Millman. Because they were best-of-three instead of best-of-five, those victories took about half as long as Nadal’s, so our confidence that Djokovic was the better player–while still high!–shouldn’t be quite as close to 100%. The binomial distribution says that those likelihoods are 95% and 94%, respectively.
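For anyone who wants to check these figures, here is a minimal Python sketch. The exact calculation is my reading of the method, not something spelled out above: I take the probability that the winner was the better player to be one minus the chance that a fair coin would produce at least as many "wins" in the same number of points. Under that assumption, the three finals come out at the percentages quoted. The function name is mine.

```python
from math import comb


def prob_winner_better(points_won, total_points):
    """Chance that evenly matched players would give the winner FEWER points.

    Equivalent to 1 - P(X >= points_won) for X ~ Binomial(total_points, 0.5).
    """
    # Exact count of the equally likely outcomes with fewer points won.
    fewer = sum(comb(total_points, k) for k in range(points_won))
    return fewer / 2 ** total_points


finals = {
    "Nadal d. Thiem, Roland Garros": (116, 198),
    "Djokovic d. Shapovalov, Paris": (61, 104),
    "Djokovic d. Millman, Tokyo": (56, 96),
}
for match, (won, total) in finals.items():
    print(f"{match}: {prob_winner_better(won, total):.0%}")
```

The same share of points produces a lower probability in the shorter matches, which is the whole point: fewer coin flips, less certainty.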

The winner of the average tour-level ATP match in 2019 won 55% of total points–the sort of number that sounds close, even as attentive fans know it really isn’t. When we convert every match result into a probability, the average likelihood that the winner was the better player is 80%. The latter number not only makes more intuitive sense–fewer results are clustered in the mid 50s, with numbers spread out from 15% to 100%–but it considers the length of the match, something that old-fashioned total-points-won ignores.

Why does this matter?

You might reasonably think that anyone who cared about quantifying match results already has these intuitions. You already know that 55% is a tidy win, 60% is an easy one, and that the length of the match means those numbers should be treated differently depending on context. Ranking points and prize money are awarded without consideration of this sort of trivia, so what’s the point of looking for an alternative?

I find this potentially valuable as a way to represent margin of victory. It seems logical that any player rating system–such as my Elo ratings–should incorporate margin of victory, because it’s tougher to execute a blowout than it is a narrow win. Put another way, someone who wins 59% of points against Thiem is probably better than someone who wins 51% of points against Thiem, and it would make sense for ratings to reflect that.

Some ratings already incorporate margin of victory, including the one introduced recently by Martin Ingram, which I discussed with him on a recent podcast. But many systems–again, including my Elo ratings–do not. Over the years, I’ve tested all sorts of potential ways to incorporate margin of victory, and have not found any way to consistently improve the predictiveness of the ratings. Maybe this is the one that will work.

Leverage and lottery matches

I’ve already hinted at one limitation to this approach, one that affects most other margin-of-victory metrics. Djokovic won only 48.3% of points in the 2019 Wimbledon final, a match he managed to win by coming up big in more important moments than Federer did. Recasting margin of victory in terms of probabilities gives us more 80% results than 55% results, but it also gives us more 25% results than 48% results. According to this approach, there is only a 24% chance that Djokovic was the better player that day. While that’s a defensible position–remember the 218 to 204 point gap–it’s also a bit uncomfortable.

Using the binomial distribution as I’ve described above, we completely ignore leverage, the notion that some points are more valuable than others. While most players aren’t consistently good or bad in high-leverage situations, many matches are decided entirely by performance in those key moments.

One solution would be to incorporate my concept of Leverage Ratio, which compares the importance of the points won by each player. I’ve further combined Leverage Ratio with Dominance Ratio, a metric closely related to total points won, into a single number I call DR+, or adjusted Dominance Ratio. It’s possible to win a match with a DR below 1.0, which means winning fewer return points than your opponent did, something that often happens when total points won is below 50%. But when DR is adjusted for leverage, it’s extremely uncommon for a match winner to end up with a DR+ below 1.0. Djokovic’s DR in the Wimbledon final was 0.87, and his DR+ was 0.97, one of the very few instances in which a winner’s adjusted figure stayed below 1.0.

It would be impossible to fix the binomial distribution approach in the same way I’ve “fixed” DR. We can’t simply multiply 65%, or 80%, or whatever, by Leverage Ratio, and expect to get a sensible result. We might not even be interested in such an approach. Calculating Leverage Ratio requires access to a point-by-point log of the match–not to mention a hefty chunk of win-probability code–which makes it extremely time-consuming to compute, even when the necessary data is available.

For now, leverage isn’t something we can fix. It is only something that we can be aware of, as we spot confusing margin-of-victory figures like Djokovic’s 24% from the Wimbledon final.

Rethinking, fast and slow

As with many of the metrics I devise, I don’t really expect wide adoption. If the best application of this approach is to create a component that improves Elo ratings, then that’s a useful step forward, even if it goes no further.

The broader goal is to create metrics that incorporate more of our intuitions. Just because we’ve grown accustomed to the quirks of the tennis scoring system, a universe in which 52% is close and 54% is not, doesn’t mean we can’t do better. Thinking in terms of probabilities takes more effort, but it almost always nets more insight.

Podcast Episode 80: Martin Ingram on Predicting Match Outcomes, Bayesian Style

Episode 80 of the Tennis Abstract Podcast features Martin Ingram (@xenophar), author of a recent academic paper, A point-based Bayesian hierarchical model to predict the outcome of tennis matches.

If you’re interested in learning more about what goes into a forecasting system, this one’s for you. We start with a discussion of the advantages as well as the limitations of the common “iid” assumption, that points are independent and identically distributed. Martin’s model, which relies on the iid assumption, incorporates each player’s serve and return skill, in addition to surface preferences and tournament-specific characteristics. In our conversation, he explains how it works, and why this sort of model is able to provide reasonable forecasts even with limited data.

That’s just the beginning. Martin suggests several possible additions to his model, and we close by considering the importance of domain knowledge in this sort of statistical work.

Thanks for listening!

(Note: this week’s episode is about 65 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Andreescu, Medvedev, and the Future According to Elo

With the US Open title added to her 2019 trophy haul, Bianca Andreescu is finally a member of the WTA top 10, debuting at fifth on the ranking table. Daniil Medvedev, the breakout star of the summer on the men’s side, only cracked the ATP top 10 after Wimbledon. He’s now up to fourth. The official ranking algorithms employed by the tours take some time to adjust to the presence of new stars.

Elo, on the other hand, reacts quickly. While the ATP and WTA computers assign points based on a year’s worth of results (rounds reached, not opponent quality), Elo gives the most weight to recent accomplishments, with even greater emphasis placed on surprising outcomes, like upsets of top players. If your goal in using a ranking system is to predict the future, Elo is better: Elo-based forecasts significantly outperform predictions based on ATP and WTA ranking points.
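For readers who want to see why upsets count for more, here is a minimal sketch of a textbook Elo update. The logistic formula with a 400-point scale is the standard one; the constant K-factor of 32 and the function names are assumptions for illustration, not necessarily the settings behind the ratings discussed here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo win probability for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both players' new ratings after one match.

    k=32 is an illustrative assumption, not the value used for any
    particular published rating list.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: a 1800 underdog against a 2100 favorite.
underdog, favorite = 1800.0, 2100.0
gain_upset = update(underdog, favorite, a_won=True)[0] - underdog
gain_expected = update(favorite, underdog, a_won=True)[0] - favorite
print(round(gain_upset, 1), round(gain_expected, 1))
```

With these made-up numbers the underdog gains roughly 27 points for the upset, while the favorite gains only about 5 for the expected win. That asymmetry is what lets Elo promote a new star after a handful of big results.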

Andreescu’s first Premier-level title came at Indian Wells in March, when she beat two top-ten players, Elina Svitolina and Angelique Kerber, in the semi-final and final. The WTA computer reacted by moving her up from 60th to 24th on the official list. Elo already saw Andreescu as a more formidable force after her run to the final in Auckland, so after Indian Wells, the algorithm moved her up to seventh. Three more wins in Miami, and the Canadian teen cracked the Elo top five.

Tennis fans are accustomed to the slow adjustments of the ranking system, so seeing a “(22)” or a “(15)” next to Andreescu’s name at Roland Garros and the US Open wasn’t particularly jarring. And there’s something to be said for withholding judgment, since tennis has had its share of teenage flashes in the pan. But Elo is usually right. The betting market heavily favored Serena Williams in the US Open final, but Elo saw the Canadian as the superior player, giving her a slight edge. After the latest seven match wins in New York, the algorithm rates Andreescu as the best player on tour, very narrowly edging out Ashleigh Barty. Would you dare disagree?

The launching (Ar)pad

When Medvedev first reached the top ten on the Elo list last October, I ran some numbers to compare the two ranking systems. Most players who earn a spot in the Elo top ten eventually make their way into the ATP top ten as well, but Elo is almost always first. On average, the algorithm picks top-tenners more than a half-year sooner than the tour’s computer. The 23-year-old Russian is a good example: He reached eighth place on the Elo list last October, but didn’t match that mark in the ATP rankings for another 10 months, after reaching the Montreal final.

Andreescu closed the gap faster than Medvedev did, needing a more typical six months to progress from Elo top-tenner to a single-digit WTA ranking. It may not take much longer before her Elo and WTA rankings converge at the top of both lists.

We no longer need Elo to tell us that Andreescu and Medvedev are likely to keep winning matches at the highest level. But having acknowledged the accuracy with which Elo glimpses the future, it’s worth looking at which players are likely to follow in their footsteps.

After the US Open, Elo’s boldest claim regards Matteo Berrettini, whom it ranks sixth. The ATP computer places him 13th, and he only made one brief stop this summer inside the top 20. The Flushing semi-finalist has been inside the Elo top 10 since mid-June, and the algorithm currently puts him ahead of such better-established young players as Alexander Zverev and Stefanos Tsitsipas.

The women’s Elo list doesn’t feature any similar surprises in the top 10, but that hardly means it agrees with the WTA computer. Karolina Muchova, currently at a career-high WTA ranking of 43rd, is 23rd on the Elo table. Two veteran threats, Victoria Azarenka and Venus Williams, are also marooned outside the official top 40, but Elo sees them as 18th and 28th best on tour, respectively. In terms of predictiveness, quality is more important than quantity, so a limited schedule isn’t necessarily seen as a drawback. Elo is also optimistic about Sofia Kenin, rating her 13th, compared to her official WTA standing of 20th.

Half a year from now, I’d bet Berrettini’s official ranking is closer to 6 than to 13, and that Muchova’s position is closer to 23 than 43. It’s impossible to tell the future, but if we’re interested in looking ahead, Elo gives us a six-month head start on the official rankings. We’ll have to wait and see whether the rest of the women’s tour can keep Andreescu away from the top spot for that long.

Monkeying Around With Rafael Nadal’s 19 Grand Slams

The gap is closing. With his marathon victory last night in the US Open final over Daniil Medvedev, Rafael Nadal is up to 19 career major titles, second only to Roger Federer, who holds 20. Lurking in third place is Novak Djokovic, with 16, who was favored at Flushing Meadows this year, but retired due to injury in the fourth round.

Just two weeks ago, Djokovic seemed to be the biggest threat to Federer’s place atop the leaderboard. Now, with Nadal only one back and Djokovic dealing with another round of physical problems, Rafa has the momentum. Federer, now 38 years old, appears increasingly unlikely to pad his own total.

In an attempt to foresee the future of the grand slam leaderboard, I built a straightforward algorithm last month to predict future major titles. In the spirit of baseball’s “Marcel” projection system, it aims to be so simple that a monkey could do it. It uses the bare minimum of inputs: final-four performance at the last two years’ worth of slams, and age. It trades some optimization in favor of simplicity and ease of understanding. The result is pretty darn good. You can review the algorithm itself and look at how it would have performed in the past in my earlier article here.

Solve for RN = 19 + x

Before the US Open, the algorithm seemed tailor-made to aggravate as many fanbases as possible. It predicted that, over the next five years, Djokovic would win four more majors, Nadal two more, and Federer none, leaving the big three in a tie.

One more slam in the books, and the numbers have changed. Here is the revised forecast, reflecting both Nadal’s 19th slam and his rosier outlook after adding another title to his list of recent results:

Player          Slams  Forecast  Total  
Rafael Nadal       19       3.5   22.5  
Roger Federer      20       0.3   20.3  
Novak Djokovic     16       3.5   19.5

Rafa is in line to improve his total by at least three slams. By the time he’s done, perhaps he will have left Djokovic and Federer in the dust, and we’ll be speculating about whether he’ll catch Serena Williams … or Margaret Court.

More forecasts

My basic algorithm allows us to generate future slam forecasts for any player with at least one major semi-final in the last two years. Keep in mind that I’m not forecasting career slam totals–I’m looking ahead to only the next five years. For the big three, I’m assuming we don’t need to worry about 2025 and beyond.

We have current projections for 18 players:

Player                 Forecast  
Novak Djokovic              3.5  
Rafael Nadal                3.5  
Daniil Medvedev             0.8  
Dominic Thiem               0.7  
Stefanos Tsitsipas          0.6  
Matteo Berrettini           0.5  
Hyeon Chung                 0.4  
Lucas Pouille               0.3  
Kyle Edmund                 0.3  
Roger Federer               0.3  
Grigor Dimitrov             0.1  
Marco Cecchinato            0.1  
Marin Cilic                 0.0  
Juan Martin del Potro       0.0  
Roberto Bautista Agut       0.0  
Kevin Anderson              0.0  
Kei Nishikori               0.0  
John Isner                  0.0

Most of these guys have only a single recent semi-final to their name, and the only thing to separate them is their age. It seems logical to be more optimistic about the future slam performance of Stefanos Tsitsipas (age 21) than that of Roberto Bautista Agut (age 31), even though the algorithm sees their results so far–one semi-final appearance in the last 12 months–as identical.

Five years means 20 slams, and you might notice that the above table doesn’t get close to accounting for all of them. The projections add up to 10.8 majors, leaving plenty of room for players who haven’t even qualified for the list–Alexander Zverev and Felix Auger-Aliassime come to mind. At the 2024 US Open, we’re sure to look back at our late-2019 prognostications and laugh.

Federer will keep his spot at the top of the game’s most important leaderboard for at least four more months. Djokovic will probably be the top pick in Melbourne, so Roger could well enjoy nine more months as the only 20-slam man. But you won’t need an algorithm–even a simple one–to identify the favorite at Roland Garros next year. Organized men’s tennis lasted over a century without a 20-time major champion. In less than a year, we could have two.

GOAT Races: Forecasting Future Slams With a Monkey

After Novak Djokovic won his 16th career major at Wimbledon this year, more attention than ever focused on the all-time grand slam race. Roger Federer has 20, Rafael Nadal has 18, and Djokovic is–by far–the best player in the world on the surface of the next two slams. This is anybody’s ballgame.

Forecasting tennis is hard, and that’s just if you’re trying to pick the results of tomorrow’s matches. Players improve and regress seemingly at random, making it difficult to predict what the ranking table will look like only a few months from now. Fans love to speculate about which of the big three will, in the end, win the most slams, but there are an awful lot of unknowns to contend with.

One can imagine some way to construct a crystal ball to get these numbers in a rigorous way. Consider each player’s age, his likely career length, his chances of injury, his recent performance at each of the four slams, his current ranking, the quality of the field on each surface, and probably more, and maybe you could come up with some plausible numbers. Or… what if we skip most of that, and build the simplest model possible?

Enter the monkey

Baseball statheads are familiar with the Marcel projection system, named after a fictional monkey because it “uses as little intelligence as possible.” Just three years of results and an age adjustment. It isn’t perfect, and there are plenty of “obvious” improvements that it leaves on the table. But as in tennis, baseball stats are noisy. For most purposes, a “basic” forecasting system is as good as a complicated one, and over the years, Marcel has outperformed a lot of models that are considerably more complex.

Let’s apply primate logic to slam predictions. First, I’m going to slightly re-cast the question to something a bit more straightforward. Instead of forecasting “career” slam results, we’re going to focus on major titles over the next five years. (That should cover the big three, anyway.) And in keeping with Marcel, we’ll use just a few inputs: slam semi-finals, finals, and titles for the last three years, plus age. Actually, we’re going to lop off a bit of the monkey’s brain right away, because slam results from three years ago aren’t that predictive. So our list of inputs is even shorter: two years of slam semi-finals, finals, and titles, plus age.

The resulting model is pretty good! For players who have reached a major semi-final in any of the last eight slams, it predicts 40% of the variation in next-five-years slam titles. Without building the hyper-complex, optimal model, we don’t know exactly how good that is, but for a forecast that extends so far into the future, capturing almost half of the player-to-player variation in slam results sounds good to me. Think of all the things we don’t know about the slams in 2022, let alone 2024: who is still playing, who gets hurt, who has improved enough to contend, which prospects have come out of nowhere, and so on. Point being, the best model is going to miss a lot, so we shouldn’t set our standards too high.

Follow the monkey

The two-years-plus-age algorithm is so simple that you can literally do it on the back of an envelope. For any player, count his semi-final appearances (won or lost), final appearances (won or lost), and titles at the last four slams, then do the same for the previous four. Then note his age at the start of the next major. Start with zero points, then follow along:

add 15 points for each semi-final appearance in the last four slams
add 30 points for each final appearance in the last four slams
add 90 points for each title in the last four slams
add 6 points for each semi-final appearance in the previous four slams
add 12 points for each final appearance in the previous four slams
add 36 points for each title in the previous four slams
if the player is older than 27, subtract 8 points for each year he is older than 27
if the player is younger than 27, add 8 points for each year he is younger than 27
divide the sum by 100

That’s it! Let’s try Djokovic. In the last four majors, he’s won three titles and made one more semi-final. In the four before that, he won one title. He’ll enter the US Open at 32 years of age. Here goes:

+60 (15 points for each of his four semi-finals in the last four slams)
+90 (30 points for each of his three finals in the last four slams)
+270 (90 points for each of his three titles in the last four slams)
+6 (6 points for his 2018 Wimbledon semi-final)
+12 (12 points for his 2018 Wimbledon final)
+36 (36 points for his 2018 Wimbledon title)
-40 (Novak is 32, so we subtract 8 points for each of the 5 years he is older than 27)

Add it all up, and you get 434. Divide by 100, and we’re predicting 4.34 more slams for Novak.
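The steps above can be sketched as a short function. The function and argument names are mine, and the floor at zero is an assumption inferred from the fact that older players with weak recent results are reported as forecast to win zero slams.

```python
def monkey_forecast(recent, previous, age):
    """Next-five-years slam forecast from the back-of-the-envelope steps.

    recent / previous: (semi_finals, finals, titles) counted over the last
    four slams and the four before that. A title also counts as a final
    and as a semi-final, as in the worked example.
    """
    weights_recent = (15, 30, 90)
    weights_previous = (6, 12, 36)
    points = sum(w * n for w, n in zip(weights_recent, recent))
    points += sum(w * n for w, n in zip(weights_previous, previous))
    points += 8 * (27 - age)  # bonus below age 27, penalty above it
    # Assumption: negative totals are reported as zero slams.
    return max(points, 0) / 100


# Djokovic's inputs from the worked example.
djokovic = monkey_forecast(recent=(4, 3, 3), previous=(1, 1, 1), age=32)
print(djokovic)  # 4.34
```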

Next-level GOAT trolling

I promise, I went about this project solely as a disinterested analyst. I just wanted to know how accurate a bare-bones long-term slam forecast could be. My goal was not to make you tear your hair out. But hey, you were probably going to lose your hair anyway.

Here is the number of slams that the model predicts for the big three between the 2019 US Open and 2024 Wimbledon:

Djokovic: 4.34
Nadal: 2.22
Federer: 0.26

You probably don’t need me to do the math for the next step, but you know I can’t not do it. Projected career totals:

Djokovic: 20.34
Federer: 20.26
Nadal: 20.22

Or, since we live in a world where you can’t win fractional majors:

Djokovic: 20
Federer: 20
Nadal: 20

Ha.

Back to the model

Djokovic’s forecast of 4.34 is quite high, in keeping with a player who has won three of the last four majors. For each year since 1971, I calculated a slam prediction for every player who had made a major semi-final in the previous two years–a total of more than 800 forecasts. Only 14 of those forecasts were higher than 4.34, and several of those belonged to the big three. Here are the top ten:

Year  Player         Age   Predicted  Actual     
2008  Roger Federer   26        6.38       5     
2007  Roger Federer   25        5.86       7     
2016  Novak Djokovic  28        5.20       6  *  
2005  Roger Federer   23        4.91      11     
2011  Rafael Nadal    24        4.89       5     
2006  Roger Federer   24        4.86      10     
2017  Novak Djokovic  29        4.79       4  *  
2012  Novak Djokovic  24        4.68       8     
1989  Mats Wilander   24        4.65       0     
1988  Ivan Lendl      27        4.56       2

* actual slam counts that could still increase

All of these predictions are based on data available at the beginning of the named year. So the top row, 2008 Federer, is the forecast for Federer’s 2008-12 title count, based on his 2006-07 performance and his age entering the 2008 Australian. Had the model existed back then, it would have guessed he’d win a half-dozen slams in that time period. He came close, winning five.

There will be plenty of noise at the extreme ends of any model like this. At the beginning of 2005, the algorithm pegged Federer to win “only” five of the next twenty majors. Instead, he won 11. I can’t imagine any data-based system would have been so optimistic as to guess double digits. On the flip side, the 1989 edition of the monkey would’ve been nearly as hopeful for Mats Wilander, who was coming off a three-slam campaign. Sadly for the Swede, a gang of youngsters overtook him and he never made another major final.

Let’s also take a look at the next 10 rosiest forecasts, plus the current guesstimate for Djokovic:

Year  Player          Age  Predicted  Actual     
2010  Roger Federer    28       4.48       2     
1981  Bjorn Borg       24       4.47       1     
1996  Pete Sampras     24       4.47       6     
1975  Jimmy Connors    22       4.45       2     
Curr  Novak Djokovic   32       4.34       0  *  
1980  Bjorn Borg       23       4.28       3     
2013  Novak Djokovic   25       4.24       7     
2009  Roger Federer    27       4.20       4     
1995  Pete Sampras     23       4.16       7     
2009  Rafael Nadal     22       4.12       8     
1979  Bjorn Borg       22       4.09       5

Plenty more noise here, with outcomes between 0 and 8 slams. Still, the average result of the 10 other predictions on this list is 4.5 slams, right in line with our forecast for Novak.

Missing slams…

The model expects that the big three will win around seven of the next twenty slams. You might reasonably wonder: What about the other thirteen?

The monkey only considers players with a slam semi-final in the last eight majors, so the forecasts shouldn’t add up to 20. There’s a chance that the champions in 2023 and 2024 aren’t yet on our radar, and many young names of interest to pundits these days, like Alexander Zverev, Felix Auger-Aliassime, and Daniil Medvedev, haven’t yet reached the final four of a major. Here are the players for whom we can make predictions:

Player                 Predicted Slams  
Novak Djokovic                    4.34  
Rafael Nadal                      2.22  
Dominic Thiem                     0.71  
Stefanos Tsitsipas                0.63  
Hyeon Chung                       0.38  
Lucas Pouille                     0.31  
Kyle Edmund                       0.30  
Roger Federer                     0.26  
Juan Martin del Potro             0.19  
Marco Cecchinato                  0.06  
----------------                  ----  
TOTAL                             9.40

(The five other players with semi-final appearances since the 2017 US Open are forecast to win zero slams.)

Yeah, I know, Lucas Pouille and Hyeon Chung aren’t really better bets to win a slam than Federer is. But they are (relatively) young, and the model recognizes that many players who reach slam semi-finals early in their careers are able to build on that success.

More to the point, we’re leaving a lot of majors on the table. If the overall forecast is correct, that list of players will win fewer than half of the next 20 slams, leaving at least ten championships to players who have yet to win a major quarter-final.

…and age

Remember, I retro-forecasted every five-year period back to 1971-75. Over the 44 five-year spans starting each season between 1971 and 2014, the model typically predicted that the players it knew about–the ones who had reached slam semi-finals in the last two years–would win 13 of the next 20 slams. In fact, those on-the-radar players combined to win an average of 12 majors in the ensuing five-year spans.

Only in the last few years has the total number of predicted slams fallen below 10. The culprit is age: Recall that every forecast has an age adjustment, and we subtract 8 points (0.08 slams) for each year a player is older than 27. That’s a 0.4-slam penalty for the 32-year-old Djokovic, about 0.5 for the 33-year-old Nadal, and nearly 0.9 slams erased from the 38-year-old Federer’s future tally. Thus, the model predicts that the big three are fading, and there aren’t many youngsters (like Pouille and Chung) on the list to compensate.

How you interpret these big three forecasts in light of the “missing” slams depends on a couple of factors:

Has the aging curve for superstars changed? Is 30 the new 25; 32 the new 27?
Will the next few generations of players soon be good enough to topple the big three?

There’s plenty of evidence that the aging curve has changed, that we should expect more from 30-somethings these days than we did in the 1980s and 1990s. That would close much of the gap. Let’s say we set the new peak age at 31, four years later than the men’s Open Era average of 27. That would add 0.32 slams to every player’s forecast, possibly adding one more slam to each of the big three’s forecasted totals. Overall, it would add a bit more than three slams to the total of the previous table, putting that number close to the historical average of 13.
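The arithmetic behind that adjustment is quick to check. This sketch shifts the peak age for the ten players in the table above; it ignores the five players currently forecast at zero, whose numbers might also rise, so treat the result as a rough lower bound.

```python
# Forecasts from the table above, computed with a peak age of 27.
forecasts = {
    "Novak Djokovic": 4.34,
    "Rafael Nadal": 2.22,
    "Dominic Thiem": 0.71,
    "Stefanos Tsitsipas": 0.63,
    "Hyeon Chung": 0.38,
    "Lucas Pouille": 0.31,
    "Kyle Edmund": 0.30,
    "Roger Federer": 0.26,
    "Juan Martin del Potro": 0.19,
    "Marco Cecchinato": 0.06,
}

OLD_PEAK, NEW_PEAK, SLAMS_PER_YEAR = 27, 31, 0.08

# Moving the peak four years later adds the same bonus to every player.
shift = (NEW_PEAK - OLD_PEAK) * SLAMS_PER_YEAR
original_total = sum(forecasts.values())
shifted_total = sum(f + shift for f in forecasts.values())

print(round(shift, 2), round(original_total, 2), round(shifted_total, 2))
# 0.32 9.4 12.6
```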

Shifting the age adjustment doesn’t disentangle the big three, though, because it affects them all equally. It just means a three-way tie at 21 is a bit more likely than a three-way tie at 20.

The second question is the more important–and less predictable–one. It’s hard enough to know how well a single player will be competing in three, four, or five years. (Or, sometimes, tomorrow.) But even if we could puzzle out that problem, we’d be left with the still more difficult task of predicting the level of competition. Entering the 2003 season, the monkey would have opined that the then-current crop of stars–men who made slam semis in 2001 and 2002–would account for a combined 13 majors between 2003 and 2007. That included 2.5 for Lleyton Hewitt, plus one apiece for Thomas Johansson, Albert Costa, Pete Sampras, Marat Safin, David Nalbandian, and Juan Carlos Ferrero. Those seven men won only two. The entire group of 20 players who merited forecasts entering the 2003 Australian Open won only three.

We’ll probably never establish exactly how strong that group was in comparison with other eras. What we know for sure is that none of those men were as good as Federer in 2003-05, and by the end of the five-year span, they’d been shunted aside by Nadal as well. (Only Nalbandian ranked in the 2007 year-end top ten.) The generation of Zverev/Tsitsipas/Auger-Aliassime/etc won’t be as good as peak Big Four, but the course of the next 20 slams will depend a lot more on those players than it will on the (relatively) more predictable career trajectories of Djokovic, Federer, and Nadal.

So we’re left with a stack of known unknowns and error bars wider than a shanked Federer backhand. But based on what we do know, the top of the all-time slam leaderboard is going to get even more crowded. At least, that’s what the monkey says.

New Feature: Forecasting the Next Major

I’ve added a pair of new pages to Tennis Abstract, both of which will be updated weekly: early forecasts for the next major, one for the men’s field and one for the women’s.

I know many of you are avid followers of the ATP and WTA forecasts accessible each week from the Tennis Abstract front page. We’re still several weeks from the US Open, but it’s interesting to see how the men’s and women’s fields are shaping up for that tournament, as well.

Each week, I’ll generate an updated report by constructing a hypothetical 128-player field, consisting of the top 128 players in the official rankings. Of course, that isn’t exactly what the field will look like, but it would be a fool’s errand to predict qualifiers at this point. And for the purposes of simulating the top of the draw, where most of the interest is, the specific players making up the last 20 or 30 names in the bracket don’t have too much of an effect.

Then we run 100,000 simulations of the 128-player field, using the most current surface-weighted Elo ratings. It’s the same way that I run my live forecasts. The only difference is that some of the player ratings will change between now and then. The US Open forecast a month from now will probably be better than anything we come up with today, but especially for the top names in each field, we have a pretty good sense of their relative strength at this point.
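For the curious, the simulation step might look something like the sketch below. It is a toy version under loud assumptions: an eight-player field with made-up ratings, the textbook Elo win probability rather than surface-weighted ratings, a fresh random draw on every run instead of a seeded bracket, and 10,000 runs instead of 100,000.

```python
import random
from collections import Counter


def win_prob(rating_a, rating_b):
    """Textbook Elo probability that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def play_bracket(field, ratings, rng):
    """Play one single-elimination draw; adjacent entries meet each round."""
    alive = list(field)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[::2], alive[1::2]):
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]


def title_odds(ratings, n_sims=10_000, seed=0):
    """Estimate each player's title chance by repeated simulation."""
    rng = random.Random(seed)
    field = list(ratings)  # field size must be a power of two
    wins = Counter()
    for _ in range(n_sims):
        rng.shuffle(field)  # assumption: a new random draw each run
        wins[play_bracket(field, ratings, rng)] += 1
    return {player: wins[player] / n_sims for player in ratings}


# Hypothetical ratings, for illustration only.
ratings = {"A": 2200, "B": 2150, "C": 2050, "D": 2000,
           "E": 1950, "F": 1900, "G": 1850, "H": 1800}
odds = title_odds(ratings)
for player, chance in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(player, f"{chance:.1%}")
```

Counting how often each player reaches a given round, rather than only who wins, would produce round-by-round figures like the second-round and third-round chances quoted further down.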

The early men’s US Open forecast shows a field that is just about as lopsided as you’d expect. Novak Djokovic is the favorite, at about 35%, which is often the degree to which my forecasts favor the best man in a hard-court major field. Roger Federer is a close second, at 29%, with Rafael Nadal coming third, at 18%. Dominic Thiem and Kei Nishikori are the only other men above 2%, and only five more–including Juan Martin del Potro, who is injured and will not play–have better than a 1-in-100 chance.

The women’s forecast looks very different. Ashleigh Barty is a strong favorite, with a 25% chance of claiming the title, despite her early exit at Wimbledon. Simona Halep is next at 14%, and after Karolina Pliskova, Petra Kvitova, and Elina Svitolina, defending champ Naomi Osaka comes in sixth with a 1-in-20 shot. Twelve women have a 2% or better chance of winning, and seven more are at 1% or above, including the probably-unseeded Victoria Azarenka.

The early forecasts also give us another way of keeping tabs on probable seedings, as players make their final attempts to break into the top 32 before the bracket is set. On the women’s side, Maria Sakkari looks to be the least deserving of protected draw placement, with only a 58% chance of advancing to the second round and a mere 32% shot of living up to her seed and reaching the final 32.

Still, those numbers are better than the ones facing Laslo Djere, a player who may hang on to a seed on the strength of some solid clay-court performances. He has only a one-in-three chance of winning his first match, and less than a 10% shot of reaching the third round. For both Sakkari and Djere, the seeds are among the few advantages they have. If they fall out of the top 32 before the US Open draw ceremony, their chances will fall even further.

I hope you enjoy these new reports. I’ll update them every Monday, and when the US Open is behind us, we can use them to get a head start on the road to Melbourne.