Podcast Episode 80: Martin Ingram on Predicting Match Outcomes, Bayesian Style

Episode 80 of the Tennis Abstract Podcast features Martin Ingram (@xenophar), author of a recent academic paper, A point-based Bayesian hierarchical model to predict the outcome of tennis matches.

If you’re interested in learning more about what goes into a forecasting system, this one’s for you. We start with a discussion of the advantages as well as the limitations of the common “iid” assumption, that points are independent and identically distributed. Martin’s model, which relies on the iid assumption, incorporates each player’s serve and return skill, in addition to surface preferences and tournament-specific characteristics. In our conversation, he explains how it works, and why this sort of model is able to provide reasonable forecasts even with limited data.

That’s just the beginning. Martin suggests several possible additions to his model, and we close by considering the importance of domain knowledge in this sort of statistical work.

Thanks for listening!

(Note: this week’s episode is about 65 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

An Introduction to Tennis Elo

Elo is a superior rating system to the ranking formulas used by the ATP and WTA. If you’ve spent much time reading this blog or listening to the podcast, you’ve probably heard me say that many times. But unless you’ve been exposed to Elo before, or done some research on your own, you might think of it as a sort of “magic” system. It’s worth digging in to understand better how it works.

The basic algorithm

The principle behind any Elo system is that each player’s rating is an estimate of their strength, and each match (or tournament) allows us to update that estimate. If a player wins, her rating goes up; if she loses, it goes down.

Where Elo excels is in determining the amount by which a rating should increase or decrease. There are two main variables that are taken into account: How many matches are already in the system (that is, how much confidence we have in the pre-match rating), and the quality of the opponent.

If you think about it for a moment, you’ll see that these two variables are a good approximation of how we already think about player strength. The more we already know about a player, the less we will change our opinion based on one match. Novak Djokovic’s round-robin loss to Dominic Thiem in London was a surprise, but only the most apocalyptic Djokovic fans saw the result as a disaster that should substantially change our estimate of his playing ability. Similarly, we adjust our opinion based on opponent quality. A loss to Thiem is disappointing, but a loss to, say, Marco Cecchinato is more concerning. The Elo system incorporates those natural intuitions.
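To make those two intuitions concrete, here is a minimal sketch of a generic Elo update. The function names and the shrinking k schedule (`scale`, `offset`, `shape`) are illustrative assumptions, not the exact values behind my ratings; the point is only that the adjustment gets smaller as a player's match count grows, and that it depends on how surprising the result was given the opponent.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Pre-match win probability implied by the rating gap."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


def k_factor(matches_played: int, scale: float = 250.0,
             offset: float = 5.0, shape: float = 0.4) -> float:
    """Hypothetical schedule: the more matches on record, the smaller the update."""
    return scale / (matches_played + offset) ** shape


def update(rating: float, opp_rating: float, won: bool, matches_played: int) -> float:
    """New rating after one match: move toward the result, scaled by k."""
    actual = 1.0 if won else 0.0
    surprise = actual - expected_score(rating, opp_rating)
    return rating + k_factor(matches_played) * surprise


# A veteran's rating barely moves after one loss; a newcomer's moves a lot.
veteran_drop = 2000 - update(2000, 1900, won=False, matches_played=300)
newcomer_drop = 2000 - update(2000, 1900, won=False, matches_played=5)
```

With any schedule of this shape, a loss to a higher-rated opponent costs fewer points than a loss to a lower-rated one, because the expected score was lower to begin with.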

Elo rating ranges

Traditionally, a player is given an Elo rating of 1500 when he enters the system–before any results come in. That number is completely arbitrary. All that matters is the difference between player ratings, so if we started each competitor with 0, 100, or 888, the end result of those differences would remain the same.

When I began calculating Elo ratings, I kept with tradition and started every player with 1500. Since then, I’ve expanded my view to Challengers (and the women’s ITF equivalent) and tour-level qualifying. Starting each new player at those levels with 1500 points would have re-scaled the entire system, which would have been confusing. Instead, I replaced 1500 with a number in the low 1200s (it depends a bit on tournament level and gender) so that the ratings would remain approximately the same.

At the moment, the ATP and WTA top-ranked players are Rafael Nadal and Ashleigh Barty, at 2203 and 2123, respectively. The best players are often in this range, and the very best often approach 2500. According to the most recent version of my algorithm, Djokovic’s peak was 2470, and Serena Williams’s best was 2473.

The 2000-point mark is a good rule of thumb to separate the elites from the rest. At the moment, six men and seven women have ratings that high. 16 men and 18 women have Elo ratings of at least 1900, and a rating of 1800 is roughly equivalent to a place in the top 50.

Era comparisons and Elo inflation

Once we attach a single peak rating to every player, it’s only natural to start comparing across eras. While it’s always fun to do so, I’m not sure any rating system allows for useful cross-era comparisons in tennis. Elo doesn’t, either.

What you can do with Elo is compare how each player fared against her competition. In 1990, Helena Sukova achieved a rating of 2123–exactly the same as Barty’s today. That doesn’t mean that Sukova then was as good as Barty is now. But it does mean that their performance relative to their peers was similar. The second tier of players was considerably weaker thirty years ago, so in a sense it was easier to achieve such a rating. At the time, Sukova’s rating was only good for 11th place, far behind Steffi Graf’s 2600.

Thus, Elo doesn’t allow you to rank players across eras unless you are confident that the level of competition was similar–or unless you have some other way of dealing with that issue, a minefield that many researchers have tried to cross, with little success.

A related issue is Elo inflation or deflation, which can also complicate cross-era comparisons. Every time a match is played, the winner and loser effectively “trade” some of their points, so the total number of Elo rating points in the system doesn’t change. However, every time a new player enters the system, the total number of points increases. And whenever a player retires, the total number of points decreases.

It would be nice if additions and subtractions canceled each other out, but for many competitions that use Elo, they don’t. Additions tend to outweigh subtractions, so Elo ratings increase over time. That doesn’t appear to be the case with my tennis ratings, at least in part because of the penalty I’ve introduced for injury absences, but it does serve as a reminder that the number of points in the system changes over time, for reasons unrelated to the strength of the top players. (I’ll have more to say about the absence penalty below.)

Elo predictions

Elo gives us a rating for every player, and we’re getting a sense of what we can and can’t do with them.

One of the main purposes of any rating system is to predict the outcome of matches–something that Elo does better than most others, including the ATP and WTA rankings. The only input necessary to make a prediction is the difference between two players’ ratings, which you can then plug into the following formula:

1 – (1 / (1 + (10^((difference) / 400))))

If we wanted to forecast a rematch of the last match of the Davis Cup Finals, we would take the Elo ratings of Nadal and Denis Shapovalov (2203 and 1947), find the difference (256), and plug it into the formula, for a result of 81.4%, Nadal’s chance of winning. If we used the negative difference (-256), we’d get 18.6%, Shapovalov’s odds of scoring the upset.
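For anyone who wants to check the arithmetic, the formula translates directly into a few lines of Python. The function name is my own choice for this sketch; the ratings are the ones quoted above.

```python
def elo_win_probability(rating_diff: float) -> float:
    """Win probability for the player whose rating advantage is rating_diff."""
    return 1 - 1 / (1 + 10 ** (rating_diff / 400))


nadal, shapovalov = 2203, 1947
p_nadal = elo_win_probability(nadal - shapovalov)   # difference of +256
p_shapo = elo_win_probability(shapovalov - nadal)   # difference of -256
print(f"Nadal {p_nadal:.1%}, Shapovalov {p_shapo:.1%}")
# Nadal 81.4%, Shapovalov 18.6%
```

The two probabilities always sum to 100%, so only one of them needs to be computed.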

My version of tennis Elo is based on the most common match format, best-of-three matches. In a best-of-five match, the favorite has a better chance of winning. The math for converting best-of-three to best-of-five is a bit complicated, but for those interested, I’ve posted some code. The point is that an adjustment must be made. If the Nadal-Shapovalov rematch happens at the best-of-five Australian Open, Rafa’s 81.4% edge will increase to 86.7%.
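One simple way to approximate that conversion, which is not necessarily what the posted code does, is to assume every set is won independently with the same probability. Under that assumption we can back out the set-win probability implied by the best-of-three forecast and then re-use it for best-of-five. The function names here are hypothetical.

```python
def best_of_three(s: float) -> float:
    """Match-win probability over best-of-three, given set-win probability s."""
    return s ** 2 * (3 - 2 * s)


def best_of_five(s: float) -> float:
    """Match-win probability over best-of-five, given set-win probability s."""
    return s ** 3 * (10 - 15 * s + 6 * s ** 2)


def bo3_to_bo5(p_match: float) -> float:
    """Convert a best-of-three forecast to best-of-five (assumes iid sets)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: best_of_three is increasing in s
        mid = (lo + hi) / 2
        if best_of_three(mid) < p_match:
            lo = mid
        else:
            hi = mid
    return best_of_five((lo + hi) / 2)


p_bo5 = bo3_to_bo5(0.814)
```

For the Nadal-Shapovalov forecast of 81.4%, this set-level shortcut lands very close to the 86.7% quoted above. A point-level model would be more faithful to how matches are actually played, but the direction and rough size of the adjustment are the same.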

Adjusting Elo for surface

For most sports, we could stop here. A match is a match, with only minor variations. In tennis, though, ratings and predictions should vary quite a bit based on surface.

My solution is a bit complicated. For each player, I maintain four separate Elo ratings: overall, hard court only, clay court only, and grass court only. I don’t differentiate between outdoor and indoor hard. For instance, Thiem’s ratings are 2066 overall, 1942 on hard, 2031 on clay, and 1602 on grass. (Surface ratings tend to be lower: Thiem’s clay rating is third-best on tour, miles ahead of everyone except for Nadal and Djokovic.)

These single-surface ratings tell us how we would rank players if we simply threw away results on every other surface. That’s not realistic, though. Single-surface ratings aren’t great at predicting match results. A better solution is to take a 50/50 blend of single-surface and overall ratings. If we wanted to predict Thiem’s chances in a clay-court match, we’d use a half-and-half mix of his 2066 overall rating and his 2031 clay-court rating. My weekly Elo reports show the single-surface ratings as “HardRaw” (and so on), and the blended ratings as “hElo,” “cElo,” and “gElo.”
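The blend itself is just a weighted average. A minimal sketch, with a function name and `surface_weight` parameter that are my own labels for illustration:

```python
def blended_rating(overall: float, surface: float, surface_weight: float = 0.5) -> float:
    """Mix a single-surface rating with the overall rating."""
    return surface_weight * surface + (1 - surface_weight) * overall


# Thiem on clay: 2066 overall, 2031 clay-only
thiem_celo = blended_rating(2066, 2031)
print(thiem_celo)  # 2048.5
```

The blended number, not the raw single-surface rating, is what goes into the win-probability formula for a match on that surface.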

There is no natural law that dictates a 50/50 blend. Every adjustment I’ve made to the basic Elo algorithm is determined solely by what works. (More on that below.) Initially, I suspected that a blend between single-surface and overall ratings would be appropriate, because a player’s success on one surface has some correlation with his success on others. I expected the blend to be different for each surface–perhaps using a higher percentage of the overall rating for grass, because there are fewer matches on the surface. In the end, my testing showed that 50/50 worked for each surface.

Non-adjustments

Ask some tennis fans which tournaments or matches matter more–for rankings, for GOAT debates, whatever–and you can find yourself with a long, detailed list of what factors determine greatness. Maybe slams are more important than masters and premiers, though those are less important than tour finals and the Olympics, and of course finals are key, plus head-to-heads against certain players… you get the idea.

Elo provides for such adjustments. A coefficient usually referred to as the “k factor” allows us to give greater weight to certain matches. It’s common in Elo ratings for other sports, for example by using a higher k factor for postseason than regular season games. However, I’ve tested all sorts of different k factors for the likely types of “important” matches, and I’ve yet to find a tweak to the system that consistently improves its ability to predict match outcomes.

The absence penalty

There’s one exception. When players miss substantial amounts of time, I reduce their rating, and then increase the k factor for several matches after their return. I’ve explained more of the details in a previous post.

These steps are a logical extension of the Elo framework, especially when you consider our usual mental adjustments when a player misses time. If a player is injured for a few months, we never know quite what to expect when she returns. Maybe she’s as strong as ever; maybe she’s still a step slow. Perhaps she’ll return to normal quickly; she might never fully return to form. An extended absence raises a lot of questions. An injured player rarely returns in better form than when she left, while many players are worse upon return, giving us an average post-injury performance level that is worse than before the absence.

Therefore, when a player first returns, our estimate must be that she is worse. However, a few strong early results should be weighted more heavily–hence the higher k factor. The k factor reflects the fact that, immediately after an absence, we aren’t as confident as usual in our estimate.

The algorithm gets complicated, but the logic is simple. It’s basically just an attempt to work out a rigorous version of statements like, “I don’t know how well he’ll play when he comes back, but I’ll be watching closely.”

One side benefit of the absence penalty is that it counteracts Elo’s natural tendency toward ratings inflation. While more players enter the system than leave it, adding to the total number of available points, the penalty removes some points without re-allocating them to other players.

Validating Elo and adjustments

I’ve mentioned “testing” a few times, and I started this article with a claim that Elo is superior to the official ranking systems. What does that mean, and how do we know?

The simplest way to compare rating systems is a metric called “accuracy,” which counts correct predictions. There were 50 singles matches at the Davis Cup finals, and Elo picked the winner in 36 of them, for an accuracy rating of 72%. The ATP rankings picked the winner (in the sense that the higher-ranked player won the match) in 30 of them, for an accuracy rating of 60%. In this tiny experiment, Elo trounced the official rankings. Elo is also considerably better over the course of the entire season.

A better metric for this purpose is Brier score, which takes into account the confidence of each forecast. We saw earlier that Elo gives Nadal an 81.4% chance of beating Shapovalov. If Nadal ends up winning, 81.4% is a “better” forecast than, say, 65%, but it’s a “worse” forecast than 90%. Brier score takes the squared distance between the forecast (81.4%) and the result (0% or 100%, depending on the winner), and averages those numbers for all forecasted matches. It rewards aggressive forecasts that prove correct, but because it uses squared distance, it severely punishes predictions that are aggressive but wrong.
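The calculation is short enough to write out. This is a minimal sketch with a function name of my own choosing; forecasts are probabilities for one designated player, and outcomes are 1 if that player won and 0 otherwise.

```python
def brier_score(forecasts, outcomes):
    """Mean squared distance between forecasts (0 to 1) and results (0 or 1)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Single-match scores if Nadal wins (outcome = 1): lower is better
if_nadal_wins = {f: brier_score([f], [1]) for f in (0.65, 0.814, 0.90)}

# Single-match scores if Shapovalov pulls the upset (outcome = 0)
if_nadal_loses = {f: brier_score([f], [0]) for f in (0.65, 0.814, 0.90)}
```

If Nadal wins, the 90% forecast scores best and the 65% forecast worst. If he loses, the order flips, and the 90% forecast takes a penalty of 0.81, far larger than anything it could have gained by being right.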

A more intuitive way to think about what Brier score is getting at is to imagine that Nadal and Shapovalov play 100 matches in a row. (Or, more accurately but less intuitively, imagine that 100 identical Nadals play simultaneous matches against 100 identical Shapovalovs.) A forecast of 81.4% means that we would expect Nadal to win 81 or 82 of those matches. If Nadal ends up winning 90, the forecast wasn’t Rafa-friendly enough. We’ll never get 100 simultaneous matches like this, but we do have thousands of individual matches, many of which share the same predictions, like a 60% chance of the favorite winning. Brier score aggregates all of those prediction-and-result pairs and spits out a number to tell us how we’re doing.

It’s tough to forecast the result of individual tennis matches. Any system, no matter how sophisticated, is going to be wrong an awful lot of the time. In many cases, the “correct” forecast is barely better than no forecast at all, if the evidence suggests that the competitors are equally matched. Thus, “accuracy” is of limited use–it’s more important to have the right amount of confidence than to simply pick winners.

All of this is to say: My Elo ratings have a much lower (better) Brier score than predictions derived from ATP and WTA rankings. Elo forecasts aren’t quite as good as betting odds, or else I’d be spending more time wagering and less time writing about rating systems.

Brier score is also the measure that tells us whether a certain adjustment–such as surface blends, injury absences, or tournament type–constitutes an improvement to the system. Assessing an injury penalty lowers the Brier score of the overall set of Elo forecasts, so we keep it. Decreasing the k factor for first-round matches has no effect, so we skip it.

Additional resources

My current Elo ratings: ATP | WTA

Extending Elo to doubles

… and mixed doubles

Code for tennis Elo (in R, not written by me)

A good introduction to Brier score

* * *


How to (Partly) Fix the Davis Cup Finals

This is a guest post by Sébastien Rannaud.

There was plenty to criticize about the new-look Davis Cup Finals. Fans and pundits alike took aim at the atmosphere, the one-sided home support for Spain, the horrendous app and website, the lack of TV coverage, and the sleep-defying scheduling.

But perhaps the biggest controversy concerned something more arcane: Canada’s walkover in a dead doubles rubber against the United States. Why? The organizers gave the United States a double bagel win (6-0, 6-0), which padded its percentages in the Group F standings, thus increasing its chances of qualifying for the knockout stage as one of the two “best runners-up Nations” in round robin play.

To determine how the runners-up from each group are ranked against each other, the following order applies:

  1. Highest percentage of matches won
  2. Highest percentage of sets won
  3. Highest percentage of games won
  4. The Nations’ positions on the Davis Cup Rankings of the Monday of the week of the Finals

As you can see, that double bagel win for the US padded their stats in criteria #1 through #3.

Other tournaments, such as the ATP and WTA Finals, use these criteria, but they don’t have walkovers, because they rely on substitute players in case of injury. The Davis Cup Finals is a different beast altogether, because of the “dead rubber” in round robin play. There are no incentives, sporting or financial, to play and win that match if you’ve already clinched your place in the quarter-finals, as Canada did before its doubles match against the US.

Odd constraints

This convoluted format is mainly due to two major factors. First, the Davis Cup Finals is comprised of 18 nations. Why use such a random number, when the knockout stage only involves eight nations? The only possible solution is to give wildcards to runner-up teams to complete the eight-team draw, hence the complicated tie-breaking procedure.

The second factor is that the tournament is played over a seven-day span. The organizers (Kosmos Group and ITF) would rather have a two-week timeslot for the event, but for now, seven days is the most they could get considering the not-so-ideal timeslot. If it is necessary to have three rounds in the knockout stage (quarter-finals, semi-finals, final), then you’re left with very limited round robin play, which explains the tiny three-team groups, playing only two ties each.

Such a small number of matches ensures that the tie breaking rules will come into play, making every match–including every doubles rubber–extremely important.  Therefore, when a team decides to forfeit its doubles match, rules need to be in place to ensure that the team benefitting from the walkover doesn’t have an unfair advantage over second-place teams from other groups.

Journalists, pundits and Twitter users have critiqued this major flaw in the format, but few have considered possible solutions. Let’s consider some of the adjustments that could be made and whether they could work within the tournament’s constraints.

The first solutions: Dead rubber tweaks

Let’s assume that the organizers would allow all dead rubbers to be skipped. In some cases, fans would buy tickets for only two matches, not three. The organizers would have to adjust the ticket prices somehow to reflect that likelihood, if they want to show fairness and respect to the ticket buyers.

Scenario A:

  • Same as current format (18 teams, 3-round knockout stage)
  • Dead rubber policy: walkover from clinching team. Winning team gets 1 point, but match does NOT count towards % of matches won, % of sets won, and % of games won

The team getting stomped on in the first two singles matches would not get the opportunity in the doubles match to make up for its bad percentages in the prior singles matches, while the winning team would be rewarded with keeping its near-perfect percentages. It is a system based on results, so it’d be difficult for a losing team to argue that it’s unfair to them, especially considering the fact that it gets to rest and go to bed earlier, on the eve of its do-or-die tie the next day against the other nation in the group.

Scenario B:

  • Same as current format (18 teams, 3-round knockout stage)
  • Dead rubber policy: walkover from clinching team. Winning team gets 1 point, but with a score of 6-4, 4-6, 6-4 counting towards % of matches won, % of sets won, and % of games won

Let’s say the two singles matches were lost in straight sets. The team benefitting from the walkover goes from 0% of sets won to 29% of sets won. That seems reasonable and much less extreme than a 6-0, 6-0 score.
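The set arithmetic behind that 29% can be sketched as follows, alongside the same calculation for the 6-0, 6-0 walkover score used this year. The variable names are hypothetical, and the example assumes exactly the situation described: two singles rubbers lost in straight sets, followed by the awarded doubles score.

```python
# Sets won and played by the team that receives the walkover
singles_sets_won, singles_sets_played = 0, 4  # two straight-set singles losses

# Scenario B: walkover recorded as 6-4, 4-6, 6-4 (2 of 3 sets won)
scenario_b = (singles_sets_won + 2) / (singles_sets_played + 3)

# This year's rule: walkover recorded as 6-0, 6-0 (2 of 2 sets won)
double_bagel = (singles_sets_won + 2) / (singles_sets_played + 2)

print(f"Scenario B: {scenario_b:.0%}, double bagel: {double_bagel:.0%}")
# Scenario B: 29%, double bagel: 33%
```

The gap in set percentage is modest. The larger difference would show up in the games-won tiebreaker, where a 6-0, 6-0 score adds twelve games won and none lost, while 6-4, 4-6, 6-4 adds sixteen won and fourteen lost.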

Scenario C:

  • Same as current format (18 teams, 3-round knockout stage)
  • Dead rubber policy: doubles match must be played. Bonus prize money ($100,000) will be given to the two players winning the match

We can assume that a clinching team would play its “second tier” players for the doubles rubber. These players would have a six-figure incentive to win the rubber–even at 4:00 AM–a serious motivation for doubles players who compete for smaller prize pools than singles players throughout the year. Because there would be only a few dead rubbers each year, it wouldn’t be that much more costly for tournament organizers.

More solutions: 16 teams

Scenario D:

  • Round robin: 16 teams split into 4 groups; 3 ties played each
  • 8 teams qualify for knockout stage of 3 rounds (quarters, semis, final)
  • Dead rubber policy: winning team gets 1 point, but match does NOT count towards % of matches won, % of sets won, and % of games won

By playing three ties in the round robin stage, the dead rubber would likely only happen in the third tie, meaning teams would have already played between six and eight tennis matches (singles and doubles) before the dead rubber occurs. The weight of this forfeited match would be no more than one-seventh (14.3%) of the total matches played in the round robin stage. That’s less important than in the current round robin format of two ties, in which the forfeited match counts for one-sixth (16.7%). Moreover, by having groups of four nations, all four teams could play their ties at the same time, meaning that some teams would start the doubles rubber without knowing whether they had yet clinched their quarter-final spot.

Unfortunately, this scenario simply cannot work within the existing seven-day limit, because it would result in both finalists playing a total of six ties over seven days (or between 12 and 18 tennis matches). That is excessively grueling, especially for countries such as Canada and Russia, who essentially competed this year with two-man teams. That is simply not going to fly, especially for elites such as Nadal and Djokovic, who could have played up to five matches the previous week in the ATP Finals.

Scenario E:

  • Round robin: 16 teams split into 4 groups; 3 ties played each
  • 4 teams qualify for knockout stage of 2 rounds (semis, final)
  • Dead rubber policy: winning team gets 1 point, but match does NOT count towards % of matches won, % of sets won, and % of games won

By shortening the knockout stage, we get back to the much more palatable number of five ties in seven days. The upside is that the dead doubles rubber would be of even less importance than in the prior scenario, since only the group winning teams would qualify for the knockout stage. The current tiebreaking procedure wouldn’t even matter since the group winning team would likely qualify on ties won and matches won alone.

Tradeoffs

However, solving one issue just raises others.

First, knockout ties are much more compelling for fans than round robin ties. In some cases, the last round robin tie has almost the same “do or die” quality as a quarter-finals tie, but on average, there is less drama. Which leads us to the second issue: teams ranked third or fourth in the group prior to the final round robin tie might already be mathematically eliminated from qualifying for the knockout stage. You could even end up with the third-place team and the fourth-place team playing each other in the last, meaningless “dead tie”–a new term for the tennis glossary that we can only hope never needs to be used. 

While a dead tie would be unlikely, the downside risk is enormous. It’s difficult to imagine how depressing this six-hour tie would feel in the stadium, especially in a neutral venue for both teams with few fans on-site. The ITF/Kosmos Group would be forced to assume that these teams would be professional enough to play the tie, at least out of respect for the few hundred fans who show up. But even an 84-shot rally couldn’t salvage such a spectacle.

The only way to solve this would be to add incentives for teams stuck in these dead ties. In a 16-team tournament, you could give each runner-up team a direct entry for the following year’s Davis Cup Finals (in addition to the four group winning teams). Teams battling for third place in the group would be rewarded with the home court advantage in the March qualifying tie. Teams finishing last in the group would get the “away” tie in March or fall to a lower tier in the Davis Cup zone groups. With those incentives, the doubles rubber would usually retain some interest.

For the ITF and the Kosmos Group, cutting back from 18 to 16 teams would be much more complicated than tweaking the tiebreaker rules. With all the problems of this year’s Finals, the dead rubber policy probably isn’t on top of anyone’s to-do list. However, if they stay idle, more teams like Canada and Australia will exploit the loophole, and some day, a team will advance to the quarter-finals because of that double bagel win, leading to a public relations nightmare for the event organizers–not to mention a gut punch for the team that goes home early. 

Sport is only compelling so long as fans perceive an underlying level of fairness. The Davis Cup Finals narrowly skirted disaster this year, calling the format into question for attentive followers. Let’s hope that in the next 12 months, they figure out how to fix it.

Sébastien Rannaud is a pension actuary living in Montreal, Canada. You can find him on Twitter at @morggo.

Podcast Episode 79: Paul Timmons on the Broken Structure of Pro Tennis

Episode 79 of the Tennis Abstract Podcast features Paul Timmons (@PaulT_Tennis), author of the My Tennis Adventures blog, about many of the problems facing professional tennis–and some obvious fixes that no one seems interested in making.

We start with the failures of the ITF to provide a logical structure for up-and-coming players, including gender inequality that makes it much more difficult for women to make a living at the equivalent of the ATP Challenger level. We discuss how some national federations are centralizing when they should be localizing, and how match-fixing is inevitable when live data provides so much of the sport’s revenue. We also touch on several up-and-coming players, the likely next men’s major winner, and why the Davis Cup Finals–for all its flaws–is superior to the upcoming ATP Cup.

Thanks for listening!

(Note: this week’s episode is about 65 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

The Likelihood of Live Doubles Rubber in the New Davis Cup

In the new Davis Cup Finals format, each country-versus-country tie consists of three matches: two singles and one doubles. The singles rubbers are played first, so it’s possible that the doubles rubber will be “dead”–irrelevant to the result of the tie.

The Davis Cup Finals organizers aimed to make the doubles matter more, by using tiebreakers (based on sets and games won) to determine which sides advance from the round-robin phase to the knock-out rounds. It may have helped keep dead doubles rubbers interesting at first, but by the final days of the round-robin stage, teams that automatically qualified for the knock-out rounds had no remaining incentive to play doubles. Canada gave the United States a walkover, and Australia retired after one game. This was probably inevitable, but it isn’t ideal. Fans would presumably prefer to watch more tennis, and unfinished matches could wreak havoc with the tiebreaker system.

There are a lot of possible ways to restructure the event–so many that I’m not going to explore that topic today. Since dead doubles rubbers are inevitable, I’d instead like to look at how often we should expect them to occur and, given that they will occur, whether that truly sidelines doubles in comparison with singles.

Live doubles

This topic was prompted by a question ahead of this week’s podcast:

The most extreme way of handling dead doubles rubbers is simply not to play them. If we went that route, how many doubles matches would we see?

At the Davis Cup Finals last week, there were 25 ties: 18 in the round-robin stage, and 7 knock-out ties. 12 of the 25 featured a live doubles rubber: 7 of the 18 round-robin ties, and 5 of the 7 knock-outs. Using Luke’s proposed methodology, that’s roughly what we’d expect. The average tie (across all stages) had a 43% chance of reaching a deciding doubles rubber, suggesting that 11 doubles matches would matter.

Here is a list of the 25 ties, along with the probability that the two sides would split the singles rubbers. I’ve also shown whether the doubles rubber turned out to be necessary. Elo ratings didn’t do a very good job predicting which ties would require a doubles decider, even though they do give us a good estimate of how often the doubles will make the difference.

Tie                  Decider Odds  Decider Actual  
Semi: GBR vs ESP            56.2%             YES  
Quarter: SRB vs RUS         54.3%             YES  
Semi: RUS vs CAN            53.3%             YES  
RR: FRA vs SRB              52.5%              NO  
RR: ARG vs GER              51.6%              NO  
RR: USA vs CAN              51.4%              NO  
RR: ITA vs CAN              50.0%              NO  
Quarter: GBR vs GER         50.0%              NO  
RR: GBR vs KAZ              49.8%             YES  
RR: ESP vs RUS              49.4%             YES  
Quarter: AUS vs CAN         49.4%             YES  
RR: USA vs ITA              48.7%             YES  
RR: BEL vs AUS              46.1%              NO  
RR: KAZ vs NED              46.0%             YES  
RR: CRO vs RUS              45.7%              NO  
RR: GER vs CHI              44.2%             YES  
RR: ARG vs CHI              43.6%              NO  
RR: FRA vs JPN              43.4%             YES  
Final: CAN vs ESP           40.8%              NO  
RR: GBR vs NED              37.5%             YES  
RR: BEL vs COL              36.2%              NO  
Quarter: ARG vs ESP         34.6%             YES  
RR: SRB vs JPN              26.1%              NO  
RR: AUS vs COL              10.4%              NO  
RR: CRO vs ESP               7.3%              NO

Only a few ties were near-guarantees of a singles sweep. Even with a fairly deep 18-team draw, most countries were able to bring two solid singles players, while few sides featured more than one singles elite.
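The 43% average and the expectation of roughly 11 live doubles rubbers can be reproduced from the table. The sketch below also shows one way to get a single tie's decider probability from the two singles forecasts, assuming the rubbers are independent; the function name and its inputs are my own labels, and the per-tie singles forecasts themselves are not listed here.

```python
def split_probability(p_first: float, p_second: float) -> float:
    """Chance the singles rubbers are split 1-1.

    p_first and p_second are one side's win probabilities in the two
    singles rubbers, assumed independent.
    """
    return p_first * (1 - p_second) + (1 - p_first) * p_second


# Decider odds (%) for the 25 ties, copied from the table above
decider_odds = [
    56.2, 54.3, 53.3, 52.5, 51.6, 51.4, 50.0, 50.0, 49.8, 49.4,
    49.4, 48.7, 46.1, 46.0, 45.7, 44.2, 43.6, 43.4, 40.8, 37.5,
    36.2, 34.6, 26.1, 10.4, 7.3,
]

average_odds = sum(decider_odds) / len(decider_odds) / 100  # about 0.43
expected_deciders = sum(decider_odds) / 100                  # about 10.8
```

Note that a decider probability above 50% is possible only when each side is favored in one of the two singles rubbers, which is why the ties at the top of the table involve teams with contrasting strengths at the first and second singles positions.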

A decade of context

This wasn’t just a fluke. I went through all World Group ties (not including the Playoff round) from 2010-18, and identified the two best singles players who appeared on court for each side. Using their Elo ratings at the time of the contest for the new best-of-three-sets format, I estimated how often we would get a deciding doubles rubber.

Across those 135 ties, the average likelihood of a doubles decider was 41%, only a bit lower than the observed rate this year. Barring some radical shift in the geography of global tennis, that gives us a pretty good idea of how frequently we should expect to see a two-match singles sweep in the new Davis Cup format.

How much does doubles matter?

When doubles matches are live, they are particularly important. Each singles rubber has a great deal of influence on each side’s chances of winning the three-match tie, but once the doubles rubber is in play, it has all the influence.

Think of this in terms of leverage, the concept I usually use for in-match shifts from one point or game to the next. Imagine two identical sides, and consider their chances of winning at each step of the process. Each side has a 50% chance of winning each rubber, which means:

  • Each side has a 50% chance of winning the tie.
  • Whichever side wins the first rubber will have a 75% chance of winning the tie.
  • If the two sides split the singles rubbers, each side will once again have a 50% chance of winning the tie.

Now consider the leverage of each match from the perspective of the first side:

  • If they win the first singles rubber, their chances of winning the tie improve to 75%. Otherwise, they fall to 25%. That’s a leverage value of 75% – 25% = 50%.
  • Assume they win the first singles rubber. If they win the second, they win the tie–a probability of 100%. If they lose, it falls to 50%. Again, that’s a leverage value of 100% – 50% = 50%. (If they lose the first rubber, the math is the same, just with probabilities of 50% and 0% instead of 100% and 50%.)
  • If there is a deciding doubles rubber, the pre-match probability of winning the tie is 50%. Win the doubles, and the likelihood increases to 100%; lose it, and the probability is 0%. That’s a leverage value of 100% – 0% = 100%.
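The same arithmetic can be written as a short Python sketch. The function names are my own, and the default of 50% per rubber matches the two-identical-sides example above. Change `p` to see how the leverage shifts when one side is stronger.

```python
def tie_win_prob(wins, losses, p=0.5):
    """Chance of winning a best-of-three-rubber tie from the current score.

    p is the side's chance of winning any single rubber (assumed constant).
    """
    if wins == 2:
        return 1.0
    if losses == 2:
        return 0.0
    return (p * tie_win_prob(wins + 1, losses, p)
            + (1 - p) * tie_win_prob(wins, losses + 1, p))


def leverage(wins, losses, p=0.5):
    """Tie win probability after winning the next rubber, minus after losing it."""
    return tie_win_prob(wins + 1, losses, p) - tie_win_prob(wins, losses + 1, p)


print(leverage(0, 0))  # first singles rubber
print(leverage(1, 0))  # second singles rubber, after winning the first
print(leverage(0, 1))  # second singles rubber, after losing the first
print(leverage(1, 1))  # deciding doubles rubber
```

With equal sides, the three singles cases all come out to 0.5 and the doubles decider to 1.0, matching the bullets above.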

Maybe you think this is excessively formal and long-winded, and you might be right. The point is, given two equal sides, a live doubles rubber is twice as important as either singles rubber. Plenty of other sports have similar features, in which certain players appear infrequently but at key moments. Consider baseball closers, who don’t pitch in every game and appear only late in tight games. Or NFL kickers, who only take part in a handful of plays each game, but have the potential to score on many of them.

Theory and reality

In the sample framework I’ve just laid out, the doubles rubber will be live exactly 50% of the time, and it is twice as important as each singles rubber. That isn’t exactly how it works out in real life, since the doubles rubber is only decisive a little more than 40% of the time.

Still, when the doubles rubber matters, it is always make-or-break–or, in my terms above, it has a leverage value of 100%.

I’m happy to leave dead doubles rubbers unplayed. Doubles specialists might be unhappy with such a decision, and I fear the wrath of Davis Cup traditionalists. However, this way of thinking about what’s at stake might soften the blow. In a 16- to 18-team Davis Cup structure, the teams are typically balanced enough that the doubles rubber is necessary almost half the time. And when it is, the oft-unsung doubles specialists get to play a match that is–literally!–twice as important as each ratings-grabbing singles rubber.

The Speed of Every Surface, 2019 Edition

Fans are constantly talking about surface speed … I have written a lot about surface speed … yet somehow, I haven’t published complete surface speed numbers for three years. Time to remedy that.

If you’re interested in the long-form explanation of how these numbers work, what their limitations are, and so on, check out my post from three years ago. I’ll give a brief overview here, as well:

I rate the playing speed of every ATP surface using ace rate as a proxy for surface characteristics. Ace rate doesn’t tell the whole story, of course, but as you’ll see, it’s a pretty good first- or second-order approximation. For each tournament, I look at the ace rates in every match, and control for the servers and returners in those matches. (The ace rate for every John Isner match will be high, but that doesn’t necessarily mean the surface is fast.) I say “playing speed” because ace rate depends on a wide range of variables (heat, humidity, balls, etc.), so it reflects how the court “plays”–not anything inherent about the physical makeup of the surface itself.

The main advantages of this approach are that it is simple to understand (more aces = higher rating!), and that we can calculate it with limited information–data that is available for ATP matches back to the early 1990s. Court Pace Index and other Hawkeye-based metrics surely have a lot more to add, but they require much more sophisticated tools–tools that federations and tours aren’t about to share with lowly fans like us.
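To make the idea of “controlling for the servers and returners” concrete, here is a rough Python sketch. The exact adjustment is laid out in the earlier post, not here, so treat everything below as an assumption: the multiplicative expected-ace formula, the field names, and the numbers are all illustrative rather than the method behind the ratings in this post.

```python
def expected_ace_rate(server_rate, returner_rate, tour_avg):
    """Assumed expectation: scale the server's usual ace rate by how the
    returner compares to tour average at allowing aces."""
    return server_rate * returner_rate / tour_avg


def surface_rating(serve_records, tour_avg):
    """Actual aces divided by expected aces across a tournament.

    1.0 means the event played like a tour-average surface; higher is faster.
    """
    actual = sum(r["aces"] for r in serve_records)
    expected = sum(
        r["serve_points"]
        * expected_ace_rate(r["server_rate"], r["returner_rate"], tour_avg)
        for r in serve_records
    )
    return actual / expected


# Hypothetical records: one per player per match
records = [
    {"aces": 15, "serve_points": 100, "server_rate": 0.10, "returner_rate": 0.08},
    {"aces": 6, "serve_points": 90, "server_rate": 0.05, "returner_rate": 0.08},
]
print(round(surface_rating(records, tour_avg=0.08), 2))
```

Because the big server's aces are measured against his own high baseline, a tournament full of Isner matches doesn't automatically rate as fast.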

A tour-average surface rates 1.0. The usual range for tour events is between 0.50 (slow clay) and 1.50 (fast hard or grass). The following table shows the 2017-19 speed ratings for all tour events on the 2019 calendar, including the Davis Cup Finals:

Tournament        Surface  2019 Ace%  2019  2018  2017  
Chengdu              Hard      14.8%  1.57  1.05  1.16  
Antalya             Grass      14.6%  1.47  1.25  1.74  
Tour Finals          Hard      11.7%  1.31  1.12  0.75  
Marseille            Hard      11.7%  1.29  1.21  1.34  
Newport             Grass      12.7%  1.27  0.87  0.76  
Australian Open      Hard      12.9%  1.27  1.16  1.14  
Brisbane             Hard      13.3%  1.26  1.35  0.99  
Atlanta              Hard      14.3%  1.25  1.01  0.86  
Shanghai             Hard      13.0%  1.24  1.17  1.53  
Sao Paulo            Clay       9.8%  1.24  0.89  0.92  
Halle               Grass      12.8%  1.23  1.16  1.18  
Stuttgart           Grass      14.5%  1.23  1.42  1.27  
Sofia                Hard      11.1%  1.21  1.14  1.33  
Antwerp              Hard      11.2%  1.21  1.25  1.06  
Davis Cup Finals     Hard      11.9%  1.20              
Metz                 Hard      13.5%  1.20  1.51  1.34  
Paris Bercy          Hard      11.9%  1.19  1.06  1.03  
Montpellier          Hard      13.4%  1.17  1.13  1.11  
Vienna               Hard      11.4%  1.16  1.16  0.98  
New York             Hard      17.0%  1.16  1.05        
                                                        
Tournament        Surface  2019 Ace%  2019  2018  2017  
Winston Salem        Hard      12.1%  1.15  1.01  1.07  
Basel                Hard      14.2%  1.14  1.03  0.77  
Beijing              Hard      11.6%  1.12  1.03  0.91  
Washington           Hard      15.5%  1.11  0.99  1.11  
Moscow               Hard      13.5%  1.11  1.21  1.45  
Delray Beach         Hard      13.9%  1.10  0.98  0.97  
Doha                 Hard      10.0%  1.10  0.88  1.02  
St. Petersburg       Hard       8.4%  1.09  1.13  0.80  
Stockholm            Hard      11.2%  1.08  1.03  1.05  
Tokyo                Hard      11.6%  1.08  1.34  1.18  
London              Grass      12.8%  1.07  1.25  1.20  
Auckland             Hard      10.7%  1.06  1.17  1.11  
Pune                 Hard      14.8%  1.05  0.99        
Cincinnati           Hard      11.6%  1.04  0.98  1.22  
Canada               Hard      10.8%  1.03  1.17  0.97  
Dubai                Hard       8.4%  1.02  1.04  0.91  
Eastbourne          Grass      13.2%  0.99  0.94  1.00  
Wimbledon           Grass      10.5%  0.99  1.14  1.03  
Sydney               Hard       9.3%  0.98  1.25  1.10  
Zhuhai               Hard       6.9%  0.97              
                                                        
Tournament        Surface  2019 Ace%  2019  2018  2017  
Marrakech            Clay       8.4%  0.97  0.62  0.77  
US Open              Hard      10.2%  0.97  0.98  0.96  
's-Hertogenbosch    Grass      10.2%  0.95  0.99  0.89  
Cordoba              Clay       6.9%  0.94              
Rotterdam            Hard       8.0%  0.90  1.13  1.09  
Lyon                 Clay       9.9%  0.90  0.89  0.85  
Gstaad               Clay       5.6%  0.88  1.16  0.92  
Acapulco             Hard      11.1%  0.86  1.03  0.92  
Miami Masters        Hard       9.5%  0.86  0.78  0.84  
Los Cabos            Hard       6.5%  0.85  0.80  1.28  
Geneva               Clay       6.6%  0.81  1.04  0.85  
Bastad               Clay       7.1%  0.80  0.72  0.88  
Kitzbuhel            Clay       6.3%  0.77  0.84  1.02  
Hamburg              Clay       7.7%  0.76  0.69  1.02  
Indian Wells         Hard       7.6%  0.76  0.84  1.03  
Houston              Clay       9.2%  0.75  0.81  0.94  
Madrid               Clay       7.0%  0.71  0.84  0.89  
Roland Garros        Clay       7.0%  0.71  0.72  0.76  
Rome                 Clay       7.0%  0.69  0.69  0.85  
Munich               Clay       7.0%  0.67  0.74  0.99  
                                                        
Tournament        Surface  2019 Ace%  2019  2018  2017  
Umag                 Clay       5.6%  0.65  0.78  0.61  
Rio de Janeiro       Clay       5.9%  0.63  0.71  0.68  
Budapest             Clay       7.0%  0.62  0.62  0.59  
Barcelona            Clay       5.6%  0.59  0.57  0.55  
Estoril              Clay       4.7%  0.54  0.58  0.53  
Buenos Aires         Clay       3.9%  0.52  0.65  0.88  
Monte Carlo          Clay       4.7%  0.50  0.56  0.50

The Tour Finals played as fast as it has in years (the 2014-16 ratings ranged from 0.89 to 1.06), suggesting either that the organizers finally laid down a proper hard court, or that 15 matches is an insufficient sample. (It certainly isn’t ideal, and the same can be said for 28- and 32-draw tourneys.)

The Davis Cup Finals played more like a typical indoor hard court. At the other extreme, Indian Wells was particularly slow this year, even by its own clay-like standards. At least at a few events, surface speed convergence may have slowed down.

Podcast Episode 78: The Davis Cup Finals

In Episode 78 of the Tennis Abstract Podcast, I am joined by Peter Wetz, making his third appearance on the show. Peter and I take a deep dive into the first edition of the new Davis Cup Finals, talking about Rafael Nadal’s dominance in both singles and doubles, the surprise heroics of Vasek Pospisil, and why the #2 singles players may be the key to a side’s success.

We also take a close look at the format, considering various ways the tournament–and especially the qualification rules–could be tweaked, and what effect that will have on the doubles. Almost every aspect of the event has been controversial, but we can’t help but admit that it made for entertaining tennis that we’re looking forward to seeing again next year.

Thanks for listening!

(Note: this week’s episode is about 75 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Tramlines and Wide Groundstrokes

The NextGen Finals are played on an unusual court, in that the surface is marked only for singles matches, leaving out the “tramlines” that define the doubles alleys. Virtually all tennis events include doubles as well, so this is rarely an option. The ATP has skipped tramlines at season-ending events before, but at the end of the 2010s, the singles-only court is exclusive to the NextGen Finals.

One might reasonably wonder whether the unique paint job has any effect on play:

I discussed this on a recent podcast with Erik Jonsson, and we tentatively concluded that tennis pros (even young ones) with thousands of hours of playing experience shouldn’t be affected by a tweak to the appearance of the court. But why speculate when we can look at some data?

The Match Charting Project, my volunteer-driven effort to log shot-by-shot records of professional tennis matches, notes various details about errors–forced or unforced, and “type”–net, deep, wide, or wide-and-deep. MCP contributors didn’t immediately take to the NextGen Finals–before this week, the 2018 final was the only charted match out of the 6,600 matches in the dataset–but 2019 was different. We now have shot-by-shot stats for 8 of the 15 matches played in Milan last week. (Big thanks to Carrie, who took charge of Alex de Minaur’s entire run to the final.)

Quantifying wide errors

We’re interested in the frequency of wide errors, which isn’t quite as simple as it sounds. I chose to focus only on groundstrokes, and I also excluded forced errors–shots on which the player might not have much control of the direction of the ball.

Here are three metrics we could use for the frequency of wide errors:

  • Wide errors per point
  • Wide errors per unforced error
  • Wide errors per “makeable” groundstroke–that is, groundstrokes that were either unforced errors or put in play

Wide errors per point is probably too crude, but it does have the advantage of simplicity. Wide errors per unforced error might have some value, telling us in what direction a player was most aggressive. The last, wide errors per makeable groundstroke, is probably the best representation of what we’re looking for, as it tells us how frequently a player tried to hit a shot and it went wide.
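For concreteness, here is how the three rates could be computed from one player's tallies in a charted match. This is a Python sketch with hypothetical counts and field names of my own. It is not the Match Charting Project's actual data format. The counts cover groundstrokes only and leave out forced errors, in line with the choices described above.

```python
def wide_error_rates(points, unforced_errors, wide_errors, groundstrokes_in_play):
    """Return the three wide-error metrics for one player.

    points                -- total points played in the match
    unforced_errors       -- unforced groundstroke errors of any type
    wide_errors           -- unforced groundstroke errors that landed wide
    groundstrokes_in_play -- groundstrokes that landed in
    """
    # "Makeable" groundstrokes: either an unforced error or put in play
    makeable = unforced_errors + groundstrokes_in_play
    return {
        "wide_per_point": wide_errors / points,
        "wide_per_ufe": wide_errors / unforced_errors,
        "wide_per_makeable_gs": wide_errors / makeable,
    }


# Hypothetical tallies, not taken from any charted match
rates = wide_error_rates(points=200, unforced_errors=30,
                         wide_errors=6, groundstrokes_in_play=370)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

To aggregate across matches, sum the counts first and then divide, so that long matches carry proportionally more weight than short ones.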

Here are de Minaur’s numbers for his five 2019 NextGen matches, along with his hard-court aggregates from 28 other charted matches in the last two years:

          Wide / Pt  Wide / UFE  Wide / GS  
NextGen        2.7%        1.5%      21.7%  
ATP Hard       3.0%        1.4%      21.4%

At least for Alex, the tramlines don’t seem to make much of a difference.

Let’s look at a slightly larger group of players. We have eight matches, which means 16 single-player match records, including at least one for each of the eight guys who qualified for Milan. Here are the three wide-error rates for the NextGen Finals matches, along with the same players’ wide-error rates for other charted hard court matches in the last two years:

          Wide / Pt  Wide / UFE  Wide / GS  
NextGen        3.2%        1.8%      19.5%  
ATP Hard       3.2%        1.8%      23.1%

For our first two metrics, there is absolutely no effect. Tramlines or no tramlines, wide errors mark the end of 3.2% of points, and 1.8% of total unforced errors. (The 3.2% figure is per player, meaning that 6.4% of points were ended with a wide error.)

The third metric, though, is more interesting. On tour, these players make a wide error on 23.1% of their “makeable” groundstrokes. That number dropped by more than one-seventh, to 19.5%, on the tramline-free court in Milan. At the same time, the overall rate of unforced errors (not just wide errors) increased compared to the same players’ efforts on hard courts at other events.

Deep mind

I see two possible explanations for such a substantial drop. First, we don’t have much data, and maybe it’s just a fluke of a small sample. Some of the difference can be traced to Ugo Humbert, who didn’t make a single wide error in his one charted NextGen Finals match. (Humbert’s usual wide-error rates are close to average.) Without a lot more matches played on tramline-free surfaces–not to mention charts of those matches–we won’t be able to draw a firm conclusion.

Second, it could be a real effect stemming from some aspect of the conditions in Milan. The lack of tramlines really might, as Lisa puts it, “focus the mind.”

Compared to other innovations trialed at the NextGen Finals, the singles-only court gets very little press. But unlike, say, the towel rack or the shot clock, it might just have a small effect on play.

Podcast Episode 77: Erik Jonsson on Swedish Tennis and the NextGen Finals

Episode 77 of the Tennis Abstract Podcast is an interview with Erik Jonsson (@erktennis), of Tennisportalen and the Source podcast, about last week’s ATP NextGen Finals, which included up-and-coming Swedish star Mikael Ymer. We talk about the stunning rise of Jannik Sinner, the progress shown by Alex de Minaur, and we consider the advantages and disadvantages of a whole slew of the rule innovations that are employed at the NextGen event in Milan.

We also delve into Mikael Ymer’s potential, whether older brother Elias could still become a top-100 player, and if there is any reason why so many prominent umpires hail from Sweden. Finally, we chat about Erik’s Tennis Hipster Handbook, and we wonder whether it’s possible to follow tennis anymore without Twitter.

Thanks for listening!

(Note: this week’s episode is about 65 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Andreescu, Medvedev, and the Future According to Elo

With the US Open title added to her 2019 trophy haul, Bianca Andreescu is finally a member of the WTA top 10, debuting at fifth on the ranking table. Daniil Medvedev, the breakout star of the summer on the men’s side, only cracked the ATP top 10 after Wimbledon. He’s now up to fourth. The official ranking algorithms employed by the tours take some time to adjust to the presence of new stars.

Elo, on the other hand, reacts quickly. While the ATP and WTA computers assign points based on a year’s worth of results (rounds reached, not opponent quality), Elo gives the most weight to recent accomplishments, with even greater emphasis placed on surprising outcomes, like upsets of top players. If your goal in using a ranking system is to predict the future, Elo is better: Elo-based forecasts significantly outperform predictions based on ATP and WTA ranking points.
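To see why upsets move the needle so much, consider a bare-bones Elo update in Python. This is a generic sketch, not the exact Tennis Abstract implementation: the fixed K-factor of 32 is an assumption for illustration (in practice K can shrink as a player accumulates matches), and the ratings are invented.

```python
def expected_score(rating, opponent_rating):
    """Pre-match win probability implied by the two ratings."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def updated_rating(rating, opponent_rating, won, k=32):
    """New rating after one match; k is an assumed, fixed K-factor."""
    result = 1.0 if won else 0.0
    return rating + k * (result - expected_score(rating, opponent_rating))


# Hypothetical 1800-rated player beating a stronger and a weaker opponent
print(round(updated_rating(1800, 2100, won=True) - 1800, 1))  # gain for the upset
print(round(updated_rating(1800, 1500, won=True) - 1800, 1))  # gain for the expected win
```

The less likely the win, the bigger the jump, which is how a run of top-ten upsets can push a newcomer up the Elo list within weeks. A points table that only counts rounds reached has no equivalent mechanism.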

Andreescu’s first Premier-level title came at Indian Wells in March, when she beat two top-ten players, Elina Svitolina and Angelique Kerber, in the semi-final and final. The WTA computer reacted by moving her up from 60th to 24th on the official list. Elo already saw Andreescu as a more formidable force after her run to the final in Auckland, so after Indian Wells, the algorithm moved her up to seventh. Three more wins in Miami, and the Canadian teen cracked the Elo top five.

Tennis fans are accustomed to the slow adjustments of the ranking system, so seeing a “(22)” or a “(15)” next to Andreescu’s name at Roland Garros and the US Open wasn’t particularly jarring. And there’s something to be said for withholding judgment, since tennis has had its share of teenage flashes in the pan. But Elo is usually right. The betting market heavily favored Serena Williams in the US Open final, but Elo saw the Canadian as the superior player, giving her a slight edge. After the latest seven match wins in New York, the algorithm rates Andreescu as the best player on tour, very narrowly edging out Ashleigh Barty. Would you dare disagree?

The launching (Ar)pad

When Medvedev first reached the top ten on the Elo list last October, I ran some numbers to compare the two ranking systems. Most players who earn a spot in the Elo top ten eventually make their way into the ATP top ten as well, but Elo is almost always first. On average, the algorithm picks top-tenners more than a half-year sooner than the tour’s computer. The 23-year-old Russian is a good example: He reached eighth place on the Elo list last October, but didn’t match that mark in the ATP rankings for another 10 months, after reaching the Montreal final.

Andreescu closed the gap faster than Medvedev did, needing a more typical six months to progress from Elo top-tenner to a single-digit WTA ranking. It may not take much longer before her Elo and WTA rankings converge at the top of both lists.

We no longer need Elo to tell us that Andreescu and Medvedev are likely to keep winning matches at the highest level. But having acknowledged the accuracy with which Elo glimpses the future, it’s worth looking at which players are likely to follow in their footsteps.

After the US Open, Elo’s boldest claim regards Matteo Berrettini, whom it ranks sixth. The ATP computer places him at 13th, and he only made one brief stop this summer inside the top 20. The Flushing semi-finalist has been inside the Elo top 10 since mid-June, and the algorithm currently puts him ahead of such better-established young players as Alexander Zverev and Stefanos Tsitsipas.

The women’s Elo list doesn’t feature any similar surprises in the top 10, but that hardly means it agrees with the WTA computer. Karolina Muchova, currently at a career-high WTA ranking of 43rd, is 23rd on the Elo table. Two veteran threats, Victoria Azarenka and Venus Williams, are also marooned outside the official top 40, but Elo sees them as 18th and 28th best on tour, respectively. In terms of predictiveness, quality is more important than quantity, so a limited schedule isn’t necessarily seen as a drawback. Elo is also optimistic about Sofia Kenin, rating her 13th, compared to her official WTA standing of 20th.

Half a year from now, I’d bet Berrettini’s official ranking is closer to 6 than to 13, and that Muchova’s position is closer to 23 than 43. It’s impossible to tell the future, but if we’re interested in looking ahead, Elo gives us a six-month head start on the official rankings. We’ll have to wait and see whether the rest of the women’s tour can keep Andreescu away from the top spot for that long.