Rethinking Match Results as Probabilities

You don’t have to watch tennis for long before hearing a commentator explain that matches can be decided by the slimmest of margins. It’s common for a match winner to tally only 51% or 52% of the total points played. Dozens of times each year, players go even further, triumphing despite winning fewer than half of points. Novak Djokovic did just that in the 2019 Wimbledon final, claiming only 204 points to Roger Federer’s 218.

It’s right to look at results like Djokovic-Federer and conclude that many matches are decided by slim margins or that performance on certain points is crucial. Indeed, players occasionally win matches while winning as few as 47% of points.

Still, it’s possible to take the “slim margins” claim too far. 51% sounds like a narrow margin, as does 53%. In many endeavors, sporting and otherwise, 55% represents a near-tie, and even 60% or 65% suggests that there isn’t much to separate the two sides. Not so in tennis, especially in the serve-centered men’s game. However it sounds, 60% represents a one-sided contest, and 65% is a blowout verging on embarrassment. In 2019, only three ATP tour matches saw one player win more than 70% of total points.

Answer a different question

For several reasons, total points won is an imperfect measure of one player’s superiority, even in a single match. One flaw is that it is usually stuck in that range between 35% and 65%, incorrectly implying that all tennis matches are relatively close contests. Another drawback is that not all 55% rates (or 51%s, or 62%s) are created equal. The longer the match, the more information we gain about the players. Within a specific format, like best-of-three, a longer match usually means that closely matched players went to tiebreaks or a third set. But if we want to compare matches across different formats (like best-of-three and best-of-five), the length of the match doesn’t necessarily tell us anything. Best-of-five matches are longer because of the rules, not because of any characteristics of the players.

The solution is to think in terms of probabilities. Given the length of a match, and the percentage of points won by each player, what is the probability that the winner was the better player?

To answer that question, we use the binomial distribution and consider the likelihood that one player would win as many points as he did if the players were equally matched. If we flipped a fair coin 100 times, we would expect the number of heads to be around 50, but we wouldn’t expect exactly 50 every time. The binomial distribution tells us how often to expect any particular number of heads: 49, 50, or 51 are common, 53 is a bit less common, 55 even less so, 40 or 60 quite uncommon, and so on. For any number of heads, there’s some probability that it is entirely due to chance, and some probability that it occurs because the coin is biased.
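To make the coin-flip picture concrete, here is a minimal Python sketch of the binomial distribution for a fair coin. The function name `heads_probability` is just an illustrative choice, and the sketch uses only the standard library.

```python
from math import comb


def heads_probability(heads: int, flips: int = 100) -> float:
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    # comb(flips, heads) counts the sequences with that many heads;
    # each of the 2 ** flips possible sequences is equally likely.
    return comb(flips, heads) / 2 ** flips


for heads in (50, 53, 55, 60):
    print(f"{heads} heads: {heads_probability(heads):.4f}")
```

Even the single most likely outcome, exactly 50 heads, turns up only about 8% of the time, and the probabilities fall away steadily as the count moves further from 50.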

Here’s how that relates to a tennis match. We start the match pretending that we know nothing about the players, assuming that they are equal. The number of points is analogous to the number of coin flips–the more points, the more likely the player who wins the most is really better. The number of points won by the victor corresponds to the number of heads. If the winner claims 60% of points, we can be pretty sure that he really is better, just as a tally of 60% heads in 100 or more flips would indicate that the coin is probably biased.

More than just 59%

The binomial distribution helps us convert those intuitions into probabilities. Let’s look at an example. The 2019 Roland Garros final was a fairly one-sided affair. Rafael Nadal took the title, winning 58.6% of total points played (116 of 198) over Dominic Thiem, despite dropping the second set. If Nadal and Thiem were equally matched, the probability that Nadal would win so many points is barely 1%. Thus, we can say that there is a 99% probability that Nadal was–on the day, in those conditions, and so on–the better player.

No surprises there, and there shouldn’t be. Things get more interesting when we alter the length of the match. The two other 2019 ATP finals in which one player won about 58.6% of points were both claimed by Djokovic. In Paris, he won 58.7% of points (61 of 104) against Denis Shapovalov, and in Tokyo, he accounted for 58.3% (56 of 96) in his defeat of John Millman. Because they were best-of-three instead of best-of-five, those victories took about half as long as Nadal’s, so our confidence that Djokovic was the better player–while still high!–shouldn’t be quite as close to 100%. The binomial distribution says that those likelihoods are 95% and 94%, respectively.
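As a rough sketch of how these figures can be computed, the Python below treats every point as a fair coin flip and asks how often an evenly matched player would win fewer points than the actual winner did. That "strictly fewer" convention and the function name `prob_winner_better` are assumptions made for illustration; the exact calculation behind the percentages quoted here may differ in detail.

```python
from math import comb


def prob_winner_better(points_won: int, total_points: int) -> float:
    """Chance that a player with a 50% shot at every point wins fewer than
    `points_won` of `total_points` (an assumed reading of the method)."""
    ways_to_win_fewer = sum(comb(total_points, k) for k in range(points_won))
    return ways_to_win_fewer / 2 ** total_points


# Point totals quoted in the text above.
matches = {
    "Nadal d. Thiem, Roland Garros": (116, 198),
    "Djokovic d. Shapovalov, Paris": (61, 104),
    "Djokovic d. Millman, Tokyo": (56, 96),
}

for label, (won, total) in matches.items():
    share = won / total
    print(f"{label}: {share:.1%} of points, {prob_winner_better(won, total):.1%}")
```

All three winners took nearly the same share of points, but the longer Roland Garros final produces a result closer to 100% than the two best-of-three finals do, and under this convention the three results should land near the 99%, 95%, and 94% quoted above.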

The winner of the average tour-level ATP match in 2019 won 55% of total points–the sort of number that sounds close, even as attentive fans know it really isn’t. When we convert every match result into a probability, the average likelihood that the winner was the better player is 80%. The latter number not only makes more intuitive sense–fewer results are clustered in the mid-50s, with numbers spread out from 15% to 100%–but it also considers the length of the match, something that old-fashioned total-points-won ignores.

Why does this matter?

You might reasonably think that anyone who cares about quantifying match results already has these intuitions. You already know that 55% is a tidy win, 60% is an easy one, and that the length of the match means those numbers should be treated differently depending on context. Ranking points and prize money are awarded without consideration of this sort of trivia, so what’s the point of looking for an alternative?

I find this potentially valuable as a way to represent margin of victory. It seems logical that any player rating system–such as my Elo ratings–should incorporate margin of victory, because it’s tougher to execute a blowout than it is a narrow win. Put another way, someone who wins 59% of points against Thiem is probably better than someone who wins 51% of points against Thiem, and it would make sense for ratings to reflect that.

Some ratings already incorporate margin of victory, including the one introduced recently by Martin Ingram, which I discussed with him on a recent podcast. But many systems–again, including my Elo ratings–do not. Over the years, I’ve tested all sorts of potential ways to incorporate margin of victory, and have not found any way to consistently improve the predictiveness of the ratings. Maybe this is the one that will work.

Leverage and lottery matches

I’ve already hinted at one limitation to this approach, one that affects most other margin-of-victory metrics. Djokovic won only 48.3% of points in the 2019 Wimbledon final, a match he managed to win by coming up big in more important moments than Federer did. Recasting margin of victory in terms of probabilities gives us more 80% results than 55% results, but it also gives us more 25% results than 48% results. According to this approach, there is only a 24% chance that Djokovic was the better player that day. While that’s a defensible position–remember the 218 to 204 point gap–it’s also a bit uncomfortable.
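Here is a hedged Python sketch of the same coin-flip calculation applied to the Wimbledon final. The exact convention behind the 24% figure isn't spelled out, so the sketch computes two plausible versions: the chance that an evenly matched player wins strictly fewer than 204 of the 422 points, and the chance that he wins 204 or fewer. The function name `prob_fewer_points` and its `inclusive` flag are illustrative assumptions.

```python
from math import comb


def prob_fewer_points(points_won: int, total_points: int, inclusive: bool = False) -> float:
    """Chance that a 50/50 player wins fewer than `points_won` points
    (or at most `points_won`, when `inclusive` is True)."""
    upper = points_won + 1 if inclusive else points_won
    return sum(comb(total_points, k) for k in range(upper)) / 2 ** total_points


won, total = 204, 204 + 218  # Djokovic's points, then all points played

strict = prob_fewer_points(won, total)
at_most = prob_fewer_points(won, total, inclusive=True)
print(f"Share of points won: {won / total:.1%}")
print(f"Fewer than {won}: {strict:.1%}")
print(f"{won} or fewer: {at_most:.1%}")
```

The two conventions bracket the 24% figure. Either way, a winner who took 48.3% of points ends up well below 50% on this scale, which is the uncomfortable part.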

Using the binomial distribution as I’ve described above, we completely ignore leverage, the notion that some points are more valuable than others. While most players aren’t consistently good or bad in high-leverage situations, many matches are decided entirely by performance in those key moments.

One solution would be to incorporate my concept of Leverage Ratio, which compares the importance of the points won by each player. I’ve further combined Leverage Ratio with Dominance Ratio, a metric closely related to total points won, into a single number I call DR+, or adjusted Dominance Ratio. It’s possible to win a match with a DR below 1.0, which means winning fewer return points than your opponent did, something that often happens when total points won is below 50%. But when DR is adjusted for leverage, it’s extremely uncommon for a match winner to end up with a DR+ below 1.0. Djokovic’s DR in the Wimbledon final was 0.87, and his DR+ was 0.97, one of the very few instances in which a winner’s adjusted figure stayed below 1.0.

It would be impossible to fix the binomial distribution approach in the same way I’ve “fixed” DR. We can’t simply multiply 65%, or 80%, or whatever, by Leverage Ratio, and expect to get a sensible result. We might not even be interested in such an approach. Calculating Leverage Ratio requires access to a point-by-point log of the match–not to mention a hefty chunk of win-probability code–which makes it extremely time-consuming to compute, even when the necessary data is available.

For now, leverage isn’t something we can fix. It is only something that we can be aware of, as we spot confusing margin-of-victory figures like Djokovic’s 24% from the Wimbledon final.

Rethinking, fast and slow

As with many of the metrics I devise, I don’t really expect wide adoption. If the best application of this approach is to create a component that improves Elo ratings, then that’s a useful step forward, even if it goes no further.

The broader goal is to create metrics that incorporate more of our intuitions. Just because we’ve grown accustomed to the quirks of the tennis scoring system, a universe in which 52% is close and 54% is not, doesn’t mean we can’t do better. Thinking in terms of probabilities takes more effort, but it almost always nets more insight.
