## Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s not far-fetched to ask which of these models performs better and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the US Open 2016: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP ranking, we use a specific formula (see footnote 1), and for Pinnacle, which is one of the biggest tennis bookmakers, we calculate the implied probabilities from the provided odds, minus the overround (see footnote 2).
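As a rough sketch of how the two non-Elo forecasts could be derived: the first function strips the overround by rescaling the inverse decimal odds so they sum to 1, which is one common approach and only an assumption about what was done here. The second function is the formula from footnote 1. The function names and the example odds are hypothetical.

```python
def implied_probabilities(odds_a, odds_b):
    """Turn two-way decimal odds into probabilities that sum to 1.

    Assumption: the overround is removed by proportional rescaling.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    book_total = raw_a + raw_b  # exceeds 1 by the bookmaker's overround
    return raw_a / book_total, raw_b / book_total


def ranking_points_forecast(points_a, points_b, e=0.85):
    """Footnote 1: P(a) = a^e / (a^e + b^e), with e = 0.85 for ATP singles."""
    return points_a**e / (points_a**e + points_b**e)


# Hypothetical decimal odds for players A and B
p_a, p_b = implied_probabilities(1.50, 2.70)
print(f"Market:  A {p_a:.1%}, B {p_b:.1%}")

# Hypothetical ranking points for players A and B
print(f"Ranking: A {ranking_points_forecast(2000, 1000):.1%}")
```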

Next, we simply compare forecasts with reality for each model by asking: if player A was predicted to be the winner ($P(a) > 0.5$), did he really win the match? Doing that for each match and each model (ignoring retirements and walkovers) gives the following results.

```
Model      % correct
Pinnacle   76.92%
538        75.21%
TA         74.36%
ATP        72.65%
Riles      70.09%
```

What we see here is the percentage of predictions that were actually right. The betting model (based on Pinnacle’s odds) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.
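The "% correct" figure is simple enough to sketch in a few lines. The function name and the four example matches below are hypothetical, and a forecast of exactly 50% is treated here as a pick for player B, which is an arbitrary choice.

```python
def accuracy(forecasts, outcomes):
    """Share of matches in which the side given more than 50% actually won.

    forecasts: P(player A wins) for each match
    outcomes:  True if player A won, False otherwise
    """
    hits = sum((p > 0.5) == won for p, won in zip(forecasts, outcomes))
    return hits / len(forecasts)


# Hypothetical forecasts and results for four matches
forecasts = [0.80, 0.35, 0.60, 0.55]
outcomes = [True, False, False, True]
print(f"{accuracy(forecasts, outcomes):.2%}")
```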

However, just looking at the percentage of correctly called matches does not tell the whole story. There are more granular metrics for investigating the performance of a prediction model. Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, we want 70% forecasts to come true in exactly 70% of cases. Resolution measures how much the forecasts differ from the overall average. The rationale here is that just using the expected average values for forecasting will lead to a reasonably well-calibrated set of predictions, but it will not be as useful as a method that manages the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) the forecasts are, the better.

In the following table we sort the predictions into bins by forecast probability and show what percentage of the predictions in each bin were correct. This also enables us to calculate Calibration and Resolution measures for each model.

```
Model     50-59%  60-69%  70-79%  80-89%  90-100%  Cal   Res   Brier
538       53%     61%     85%     80%     91%      .003  .082  .171
TA        56%     75%     78%     74%     90%      .003  .072  .182
Riles     56%     86%     81%     63%     67%      .017  .056  .211
ATP       50%     73%     77%     84%     100%     .003  .068  .185
Pinnacle  52%     91%     71%     77%     95%      .015  .093  .172
```

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that for the Riles model only 67% of the 90-100% forecasts were correct, can be explained by small sample size (only three in that case). However, there are still two interesting cases, the 60-69% bins for Riles and Pinnacle, where sample size is better and which caught my interest. Both the Riles and Pinnacle models seem to be strongly underconfident (to a statistically significant degree) with their 60-69% predictions. In other words, these probabilities should have been higher because, in reality, these forecasts came true 86% and 91% of the time (see footnote 3). For the betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this might be a starting point for tweaking the model.

The last three columns show Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better). The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle (for this subset of data) perform essentially equally well. Then there is a slight gap until the model of Tennis Abstract and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution, hence ranking fifth in this analysis.
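For readers who want to reproduce these three columns, here is a minimal sketch of the standard decomposition, Brier = Calibration - Resolution + Uncertainty. The table is consistent with that identity if the uncertainty term is about 0.25 (for FiveThirtyEight, .003 - .082 + .250 = .171). The binning scheme, function name, and example data are assumptions for illustration, not necessarily what was used for the table.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes, n_bins=10):
    """Return (calibration, resolution, uncertainty, brier).

    forecasts: probabilities in [0, 1]; outcomes: 1 if the event happened, else 0.
    Forecasts are grouped into equal-width bins (an assumed scheme). The identity
    brier = calibration - resolution + uncertainty is exact only when all
    forecasts inside a bin are identical; otherwise it is an approximation.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))

    calibration = resolution = 0.0
    for members in bins.values():
        k = len(members)
        mean_forecast = sum(p for p, _ in members) / k
        hit_rate = sum(o for _, o in members) / k
        calibration += k * (mean_forecast - hit_rate) ** 2
        resolution += k * (hit_rate - base_rate) ** 2
    calibration /= n
    resolution /= n

    uncertainty = base_rate * (1 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n
    return calibration, resolution, uncertainty, brier


# Hypothetical forecasts and outcomes
forecasts = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1]
outcomes = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
cal, res, unc, brier = brier_decomposition(forecasts, outcomes)
print(f"Cal {cal:.3f}  Res {res:.3f}  Unc {unc:.3f}  Brier {brier:.3f}")
```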

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If a forecast line is above the black line, the forecasts are underconfident; in the opposite case, they are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job in preventing overestimations even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I have shown some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing how well the models do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

This is a guest article by me, Peter Wetz. I am a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

#### Footnotes

1. $P(a) = a^e / (a^e + b^e)$ where $a$ are player A’s ranking points, $b$ are player B’s ranking points, and $e$ is a constant. We use $e = 0.85$ for ATP men’s singles.

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

## Forecasting Davis Cup Doubles

One of the most enjoyable aspects of Davis Cup is the spotlight it shines on doubles. At ATP events, doubles matches are typically relegated to poorly-attended side courts. In Davis Cup, doubles gets a day of its own, and crowds turn out in force. Even better, the importance of Davis Cup inspires many players who normally skip doubles to participate.

Because singles specialists are more likely to play doubles, and because most Davis Cup doubles teams are not regular pairings, forecasting these matches is particularly difficult. In the past, I haven’t even tried. But now that we have D-Lo–Elo ratings for doubles–it’s a more manageable task.

To my surprise, D-Lo is even more effective with Davis Cup than it is with regular-season tour-level matches. D-Lo correctly predicts the outcome of about 65% of tour-level doubles matches since 2003. For Davis Cup World Group and World Group Play-Offs in that time frame, D-Lo is right 70% of the time. To put it another way, this is more evidence that Davis Cup is about the chalk.

What’s particularly odd about that result is that D-Lo itself isn’t that confident in its Davis Cup forecasts. For ATP events, D-Lo forecasts are well-calibrated, meaning that if you look at 100 matches where the favorite is given a 60% chance of winning, the favorite will win about 60 times. For the Davis Cup forecasts, D-Lo thinks the favorite should win about 60% of the time, but the higher-rated team ends up winning 70 matches out of 100.

Davis Cup’s best-of-five format is responsible for part of that discrepancy. In a typical ATP doubles match, the no-ad scoring and third-set tiebreak introduce more luck into the mix, making upsets more likely. A matchup that would result in a 60% forecast in the no-ad, super-tiebreak format translates to a 64.5% forecast in the best-of-five format. That accounts for about half the difference: Davis Cup results are less likely to be influenced by luck.

The other half may be due to the importance of the event. For many players, regular-season doubles matches are a distant second priority to singles, so they may not play at a consistent level from one match to the next. In Davis Cup, however, it’s a rare competitor who doesn’t give the doubles rubber 100% of their effort. Thus, we appear to have quite a few matches in which D-Lo picks the winner, but since it uses primarily tour-level results, it doesn’t realize how heavily the winner should have been favored.

Incidentally, home-court advantage doesn’t seem to play a big role in doubles outcomes. The hosting side has won 52.6% of doubles matches, an edge which could have as much to do with hosts’ ability to choose the surface as it does with screaming crowds and home cooking. This isn’t a factor that affects D-Lo forecasts, as the system’s predictions are as accurate when it picks the away side as when it picks the home side.

#### Forecasting Argentina-Croatia doubles

Here are the D-Lo ratings for the eight nominated players this weekend. The asterisks indicate those players who are currently slated to contest tomorrow’s doubles rubber:

```
Player                 Side  D-Lo
Juan Martin del Potro  ARG   1759
Leonardo Mayer         ARG   1593  *
Federico Delbonis      ARG   1540
Guido Pella            ARG   1454  *

Ivan Dodig             CRO   1856  *
Marin Cilic            CRO   1677
Ivo Karlovic           CRO   1580
Franco Skugor          CRO   1569  *
```

As it stands now, Croatia has a sizable advantage. Based on the D-Lo ratings of the currently scheduled doubles teams, the home side has a 189-point edge, which converts to a 74.8% probability of winning. But remember, that’s the chance of winning a no-ad, super-tiebreak match, with all the luck that entails. In best-of-five, that translates to a whopping 83.7% chance of winning.
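The 189-point edge and the 74.8% figure can be reproduced with a short sketch. It assumes a team is rated at the average of its two players’ D-Lo ratings and that the usual 400-point logistic Elo formula applies; both assumptions are consistent with the numbers quoted here, but they are inferences rather than a description of the actual D-Lo code. The further conversion to a best-of-five probability is not covered.

```python
def elo_win_probability(rating_diff):
    """Standard Elo expectation for the side that is rating_diff points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def team_rating(player_1, player_2):
    """Assumption: a doubles team is rated at the mean of its players' D-Lo."""
    return (player_1 + player_2) / 2.0


croatia = team_rating(1856, 1569)    # Dodig / Skugor
argentina = team_rating(1593, 1454)  # Mayer / Pella
edge = croatia - argentina
p_short_format = elo_win_probability(edge)  # no-ad, super-tiebreak format
print(f"{edge:.0f}-point edge -> {p_short_format:.1%}")
```

Swapping Cilic (1677) in for Skugor gives the 243-point advantage mentioned in the next paragraph.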

Making matters worse for Argentina, it’s likely that Croatia could improve their side. Argentina could increase their odds of winning the doubles rubber by playing Juan Martin del Potro, but given Delpo’s shaky physical health, it’s unlikely he’ll play all three days. Marin Cilic, on the other hand, could very well play as much as possible. A Cilic-Ivan Dodig pairing would have a 243-point advantage over Leonardo Mayer and Guido Pella, which translates to an 89% chance of winning a best-of-five match. Even Mayer’s Davis Cup heroics are unlikely to overcome a challenge of that magnitude.

Given the likelihood that Pella will sit on the bench for every meaningful singles match, it’s easy to wonder if there is a better option. Sure enough, in Horacio Zeballos, Argentina has a quality doubles player sitting at home. The two-time Grand Slam doubles semifinalist has a current D-Lo rating of 1758, almost identical to del Potro’s. Paired with Mayer, Zeballos would bring Argentina’s chances of upsetting a Dodig-Franco Skugor team to 43%. Zeballos-Mayer would also have a 32% chance of defeating Dodig-Cilic.

#### A full Argentina-Croatia forecast

With the doubles rubber sorted, let’s see who is likely to win the 2016 Davis Cup. Here are the Elo- and D-Lo-based forecasts for each currently-scheduled match, shown from the perspective of Croatia:

```
Rubber                      Forecast (CRO)
Cilic v Delbonis                     90.8%
Karlovic v del Potro                 15.8%
Dodig/Skugor v Mayer/Pella           83.7%
Cilic v del Potro                    36.3%
Karlovic v Delbonis                  75.8%
```

Elo still believes Delpo is an elite-level player, which is why it makes him the favorite in the pivotal fourth rubber against Cilic. The system is less positive about Federico Delbonis, who it ranks 68th in the world, against his #41 spot on the ATP computer.

These match-by-match forecasts imply a 74.2% probability that Croatia will win the tie. That’s more optimistic than the betting market which, a few hours before play begins, gives Croatia about a 65% chance.
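One way to turn five rubber forecasts into a tie forecast is to treat the rubbers as independent and add up the probability of winning at least three. This is a sketch under that independence assumption, with a hypothetical function name. Using the rounded figures from the table, it comes out at roughly 74%, close to but not exactly the 74.2% quoted above, so the original calculation presumably used unrounded inputs or differed in some detail.

```python
def tie_win_probability(rubber_probs, wins_needed=3):
    """P(winning at least wins_needed rubbers), treating rubbers as independent."""
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in rubber_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1 - p)   # this rubber is lost
            updated[k + 1] += mass * p     # this rubber is won
        dist = updated
    return sum(dist[wins_needed:])


# Croatia's forecast in each rubber, from the table above
croatia = [0.908, 0.158, 0.837, 0.363, 0.758]
print(f"Croatia wins the tie: {tie_win_probability(croatia):.1%}")
```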

However, most of the tweaks we could make would move the needle further toward a Croatia victory. Delpo’s body may not allow him to play two singles matches at full strength, and the gap in singles skill between him and Mayer is huge. Croatia could improve their doubles chances if Cilic plays. And if there is a home-court or surface advantage, it would probably work against the South Americans.

Even more likely than a Croatian victory is a 1-1 split of the first two matches. If that happens, everything will hang in the balance tomorrow, when the world tunes in to watch a doubles match.

## Forecasting the 2016 ATP World Tour Finals

Andy Murray is the #1 seed this week in London, but as I wrote for The Economist, Novak Djokovic likely remains the best player in the world. According to my Elo ratings, he would have a 63% chance of winning a head-to-head match between the two. And with the added benefit of an easier round-robin draw, the math heavily favors Djokovic to win the tournament.

Here are the results of a Monte Carlo simulation of the draw:

```
Player        SF      F      W
Djokovic   95.3%  73.9%  54.6%
Murray     86.3%  58.3%  29.7%
Nishikori  60.4%  24.9%   7.8%
Raonic     50.9%  16.3%   3.3%
Wawrinka   29.4%   7.8%   1.6%
Monfils    33.2%   8.7%   1.4%
Cilic      23.9%   5.8%   1.1%
Thiem      20.7%   4.1%   0.5%
```

I don’t think I’ve ever seen a player favored so heavily to progress out of the group stage. Murray’s 86% chance of doing so is quite high in itself; Novak’s 95% is otherworldly. His head-to-heads against the other players in his group are backed up by major differences in Elo points–Dominic Thiem is a lowly 15th on the Elo list, given only a 7.4% chance of beating the Serb.

If Milos Raonic is unable to compete, Djokovic’s chances climb even higher. Here are the probabilities if David Goffin takes Raonic’s place in the bracket:

```
Player        SF      F      W
Djokovic   96.8%  75.2%  55.4%
Murray     86.2%  60.7%  30.6%
Nishikori  60.7%  26.3%   8.1%
Monfils    47.7%  12.4%   1.8%
Wawrinka   29.3%   8.5%   1.7%
Cilic      23.8%   6.2%   1.1%
Thiem      29.5%   5.8%   0.7%
Goffin     26.0%   4.9%   0.5%
```

The luck of the draw was on Novak’s side. I ran another simulation with Djokovic and Murray swapping groups. Here, Djokovic is still heavily favored to win the tournament, but Murray’s semifinal chances get a sizable boost:

```
Player        SF      F      W
Djokovic   92.8%  75.1%  54.9%
Murray     90.9%  58.1%  29.8%
Nishikori  58.4%  26.9%   7.5%
Raonic     52.3%  14.3%   3.3%
Wawrinka   26.9%   8.4%   1.6%
Monfils    35.3%   7.5%   1.4%
Cilic      21.9%   6.2%   1.0%
Thiem      21.6%   3.4%   0.5%
```

Elo rates Djokovic so highly that he is favored no matter what the draw. But the draw certainly helped.

#### Doubles!

I’ve finally put together a sufficient doubles dataset to generate Elo ratings and tournament forecasts for ATP doubles. While I’m not quite ready to go into detail, I can say that, by using the Elo algorithm and rating players individually, the resulting forecasts outperform the ATP rankings about as much as singles Elo ratings do.

Here is the forecast for the doubles event at the World Tour Finals:

```
Team               SF      F      W
Herbert/Mahut   76.4%  49.5%  32.1%
Bryan/Bryan     68.7%  36.8%  19.9%
Kontinen/Peers  55.7%  29.1%  13.8%
Dodig/Melo      58.4%  28.1%  13.2%
Murray/Soares   48.3%  20.8%   8.6%
Lopez/Lopez     37.7%  16.4%   6.2%
Klaasen/Ram     30.2%  11.9%   4.0%
Huey/Mirnyi     24.6%   7.3%   2.2%
```

This distribution is more like what round-robin forecasts usually look like, without a massive gap between the top of the field and the rest. Pierre-Hugues Herbert and Nicolas Mahut are the top rated team, followed closely by Bob Bryan and Mike Bryan. Max Mirnyi was, at his peak, one of the highest Elo-rated doubles players, but his pairing with Treat Huey is the weakest of the bunch.

The men’s doubles bracket has some legendary names, along with some players–like Herbert and Henri Kontinen–who may develop into all-time greats, but it has no competitors who loom over the rest of the field like Murray and Djokovic do in singles.

## Elo-Forecasting the WTA Tour Finals in Singapore

With the field of eight divided into two round-robin groups for the WTA Tour Finals in Singapore, we can play around with some forecasts for this event. I’ve updated my Elo ratings through last week’s tournaments, and the first thing that jumps out is how different they are from the official rankings.

Here’s the Singapore field:

```
EloRank  Player                Elo  Group
2        Maria Sharapova      2296    RED
4        Simona Halep         2181    RED
6        Garbine Muguruza     2147  WHITE
8        Petra Kvitova        2136  WHITE
9        Angelique Kerber     2129  WHITE
15       Lucie Safarova       2051  WHITE
21       Flavia Pennetta      2004    RED
```

Serena Williams (#1 in just about every imaginable ranking system) chose not to play, but if Elo ruled the day, Belinda Bencic, Venus Williams, and Victoria Azarenka would be playing this week in place of Agnieszka Radwanska, Lucie Safarova, and Flavia Pennetta.

Anyway, we’ll work with what we’ve got. Maria Sharapova is, according to Elo, a huge favorite here. The ratings translate into a forecast that looks like this:

```
Player                  SF  Final  Title
Maria Sharapova      83.7%  61.1%  43.6%
Simona Halep         60.8%  35.4%  15.9%
Garbine Muguruza     59.4%  25.7%  11.3%
Petra Kvitova        55.2%  23.0%   9.8%
Angelique Kerber     53.1%  21.7%   8.8%
Lucie Safarova       32.3%   9.7%   3.1%
Flavia Pennetta      18.1%   6.0%   1.4%
```
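To give a feel for how ratings become round-robin probabilities, here is a minimal Monte Carlo sketch for the White group, using the four Elo ratings listed above. It assumes the standard 400-point Elo formula and, as a loud simplification, breaks ties on wins at random rather than by the tour’s real tiebreak rules, so the output should only be expected to fall in the same general range as the SF column. The function names, trial count, and seed are hypothetical.

```python
import random
from itertools import combinations

# Elo ratings from the Singapore field table (White group only)
WHITE_GROUP = {
    "Muguruza": 2147,
    "Kvitova": 2136,
    "Kerber": 2129,
    "Safarova": 2051,
}


def elo_win_probability(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def simulate_group(ratings, rng):
    """Play one round robin and return the two players who advance."""
    wins = {name: 0 for name in ratings}
    for a, b in combinations(ratings, 2):
        p_a = elo_win_probability(ratings[a], ratings[b])
        winner = a if rng.random() < p_a else b
        wins[winner] += 1
    # Simplification: players level on wins are separated at random.
    order = sorted(ratings, key=lambda name: (wins[name], rng.random()), reverse=True)
    return order[:2]


def semifinal_odds(ratings, trials=20_000, seed=1):
    """Estimate each player's chance of finishing in the group's top two."""
    rng = random.Random(seed)
    counts = {name: 0 for name in ratings}
    for _ in range(trials):
        for name in simulate_group(ratings, rng):
            counts[name] += 1
    return {name: counts[name] / trials for name in ratings}


odds = semifinal_odds(WHITE_GROUP)
for name, p in odds.items():
    print(f"{name:<10}{p:.1%}")
```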

If Sharapova is really that good, the loser in today’s draw was Simona Halep. The top seed would typically benefit from having the second seed in the other group, but because Garbine Muguruza recently took over the third spot in the rankings, Pova entered the draw as a dangerous floater.

However, these ratings don’t reflect the fact that Sharapova hasn’t completed a match since Wimbledon. They don’t decline with inactivity, so Pova’s rating is the same as it was the day after she lost to Serena back in July. (My algorithm also excludes retirements, so her attempted return in Wuhan isn’t considered.)

With as little as we know about Sharapova’s health, it’s tough to know how to tweak her rating. For lack of any better ideas, I revised her Elo rating to 2132, right between Petra Kvitova and Angelique Kerber. At her best, Sharapova is better than that, but consider this a way of factoring in the substantial possibility that she’ll play much, much worse–or that she’ll get injured and her matches will be played by Carla Suarez Navarro instead. The revised forecast:

```
Player                  SF  Final  Title
Simona Halep         69.9%  40.9%  24.0%
Garbine Muguruza     59.4%  31.5%  16.5%
Maria Sharapova      57.6%  29.5%  14.5%
Petra Kvitova        55.6%  28.4%  14.4%
Angelique Kerber     52.5%  26.3%  13.2%
Lucie Safarova       32.6%  12.9%   4.9%
Flavia Pennetta      24.7%   8.3%   2.7%
```

If this is a reasonably accurate estimate of Sharapova’s current ability, the Red group suddenly looks like the right place to be. Because Elo doesn’t give any particular weight to Grand Slams, it suggests that the official rankings far overestimate the current level of Safarova and Pennetta. The weakness of those two makes Halep a very likely semifinalist and also means that, in this forecast, the winner of the tournament is more likely (54% to 46%) to come from the White group.

Without Serena, and with Sharapova’s health in question, there are simply no dominant players in the field this week. If nothing else, these forecasts illustrate that we’d be foolish to take any Singapore predictions too seriously.

## Forecasting the Effects of Performance Byes in Beijing

To the uninitiated, the WTA draw in Beijing this week looks a little strange. The 64-player draw includes four byes, which were given to the four semifinalists from last week’s event in Wuhan. So instead of empty places in the bracket next to the top four seeds, those free passes go to the 5th, 10th, and 15th seeds, along with one unseeded player, Venus Williams.

“Performance byes”–those given to players based on their results the previous week, rather than their seed–have occasionally featured in WTA draws over the last few years. If you’re interested in their recent history, Victoria Chiesa wrote an excellent overview.

I’m interested in measuring the benefit these byes confer on the recipients–and the negative effect they have on the players who would have received those byes had they been awarded in the usual way. I’ve written about the effects of byes before, but I haven’t contrasted different approaches to awarding them.

This week, the beneficiaries are Garbine Muguruza, Angelique Kerber, Roberta Vinci, and Venus Williams. The top four seeds–the women who were atypically required to play first-round matches–were Simona Halep, Petra Kvitova, Flavia Pennetta, and Agnieszka Radwanska.

To quantify the impact of the various possible formats of a 64-player draw, I used a variety of tools: Elo to rate players and predict match outcomes, Monte Carlo tournament simulations to consider many different permutations of each draw, and a modified version of my code to “reseed” brackets. While this is complicated stuff under the hood, the results aren’t that opaque.

Here are three different types of 64-player draws that Beijing might have employed:

1. Performance byes to last week’s semifinalists. This gives a substantial boost to the players receiving byes, and compared to any other format, has a negative effect on top players. Not only are the top four seeds required to play a first-round match, they are a bit more likely to play last week’s semifinalists, since the byes give those players a better chance of advancing.
2. Byes to the top four seeds. The top four seeds get an obvious boost, and everyone else suffers a bit, as they are that much more likely to face the top four.
3. No byes: 64 players in the draw instead of 60. The clear winners in this scenario are the players who wouldn’t otherwise make it into the main draw. Unseeded players (excluding Venus) also benefit slightly, as the lack of byes means that top players are less likely to advance.

Let’s crunch the numbers. For each of the three scenarios, I ran simulations based on the field without knowing how the draw turned out. That is, Kvitova is always seeded second, but she doesn’t always play Sara Errani in the first round. This approach eliminates any biases in the actual draw. To simulate the 64-player field, I added the four top-ranked players who lost in the final round of qualifying.

To compare the effects of each draw type on every player, I calculated “expected points” based on their probability of reaching each round. For instance, if Halep entered the tournament with a 20% chance of winning the event with its 1,000 ranking points, she’d have 200 “expected points,” plus her expected points for the higher probabilities (and lower number of points) of reaching every round in between. It’s simply a way of combining a lot of probabilities into a single easier-to-understand number.
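Here is a small sketch of that calculation. Only the 1,000 points for the title comes from the text above; the points for the other rounds and the probabilities in the example are illustrative assumptions, as are the names. With a 20% chance of the title, the title alone contributes 200 expected points, and the earlier rounds add the rest.

```python
# Points by final result. Only the 1,000 for the title is taken from the
# article; the other values are illustrative assumptions.
POINTS = {"R64": 10, "R32": 65, "R16": 120, "QF": 215, "SF": 390, "F": 650, "W": 1000}
ROUNDS = ["R64", "R32", "R16", "QF", "SF", "F", "W"]


def expected_points(reach):
    """Expected ranking points from round-by-round probabilities.

    reach[r] is the probability of reaching round r, with "W" meaning the
    title. A player with a first-round bye has reach["R32"] == 1.0.
    """
    total = 0.0
    for this_round, next_round in zip(ROUNDS, ROUNDS[1:]):
        p_exit_here = reach[this_round] - reach[next_round]
        total += p_exit_here * POINTS[this_round]
    return total + reach["W"] * POINTS["W"]


# Hypothetical probabilities for a top seed without a bye
reach = {"R64": 1.0, "R32": 0.9, "R16": 0.75, "QF": 0.6, "SF": 0.45, "F": 0.3, "W": 0.2}
print(f"Expected points: {expected_points(reach):.1f}")
```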

Here are the expected points in each draw scenario (plus the actual Beijing draw) for the top four players, the four players who received performance byes, plus a couple of others (Belinda Bencic and Caroline Wozniacki) who rated particularly highly:

```
Player               Seed  PerfByes  TopByes  NoByes  Actual
Simona Halep            1       323      364     330     341
Petra Kvitova           2       276      323     290     291
Venus Williams                  247      216     218     279
Belinda Bencic         11       255      249     268     254
Garbine Muguruza        5       243      202     210     227
Angelique Kerber       10       260      224     235     227
Caroline Wozniacki      8       208      203     205     199
Flavia Pennetta         3       142      177     144     195
Agnieszka Radwanska     4       185      233     192     188
Roberta Vinci          15       120       91      94      90
```

As expected, the top four seeds are expected to reap far more points when given first-round byes. It’s most noticeable for Pennetta and Radwanska, who would enjoy a 20% boost in expected points if given a first-round bye. Oddly, though, the draw worked out very favorably for Flavia–Elo gave her a 95% chance of beating her first-round opponent Xinyun Han, and her draw steered her relatively clear of other dangerous players in subsequent rounds.

Similarly, the performance byes are worth a 15 to 30% advantage in expected points to the players who receive them. Vinci is the biggest winner here, as we would generally expect from the player most likely to suffer an upset without the bye.

Like Pennetta, Venus was treated very well by the way the draw turned out. The bye already gave her an approximately 15% boost compared to her expectations without a bye, and the draw tacked another 13% onto that. Both the structure of the draw and some luck on draw day made her the event’s third most likely champion, while the other scenarios would have left her in fifth.

All byes–conventional or unconventional–work to the advantage of some players and against others. However they are granted, they tend to work in favor of those who are already successful, whether that success is over the course of a year or a single week.

Performance byes are easy enough to defend: They give successful players a bit more rest between two demanding events, and from the tour’s perspective, they make it a little more likely that last week’s best players won’t pull out of this week’s tourney. And if all byes tend to make the rich a little richer, at least performance byes open the possibility of benefiting different players than usual.

## The Pervasive Role of Luck in Tennis

No matter what the scale, from a single point to a season-long ranking–even to a career–luck plays a huge role in tennis. Sometimes good luck and bad luck cancel each other out, as is the case when two players benefit from net cord winners in the same match. But sometimes luck spawns more of the same, giving fortunate players opportunities that, in turn, make them more fortunate still.

Usually, we refer to luck only in passing, as one possible explanation for an isolated phenomenon. It’s important that we examine these instances in conjunction with each other to get a better sense of just how much of a factor luck can be.

#### Single points

Usually, we’re comfortable saying that the results of individual points are based on skill. Occasionally, though, something happens to give the point to an undeserving player. The most obvious examples are points heavily influenced by a net cord or a bad bounce off an uneven surface, but there are others.

Officiating gets in the way, too. A bad call that the chair umpire doesn’t overturn can hand a point to the wrong player. Even if the chair umpire (or Hawkeye) does overrule a bad call, it can result in the point being replayed–even if one player was completely in control of the point.

We can go a bit further into the territory of “lucky shots,” including successful mishits, or even highlight-reel tweeners that a player could never replicate. While the line between truly lucky shots and successful low-percentage shots is an ambiguous one, we should remember that in the most extreme cases, skill isn’t the only thing determining the outcome of the point.

#### Lucky matches

More than 5% of matches on the ATP tour this year have been won by a player who failed to win more than half of points played. Another 25% were won by a player who failed to win more than 53% of points–a range that doesn’t guarantee victory.

Depending on what you think about clutch and momentum in tennis, you might not view some–or even any–of those outcomes as lucky. If a player converts all five of his break point opportunities and wins a match despite only winning 49% of total points, perhaps he deserved it more. The same goes for strong performance in tiebreaks, another cluster of high-leverage points that can swing a match away from the player who won more points.

But when the margins are so small that executing at just one or two key moments can flip the result–especially when we know that points are themselves influenced by luck–we have to view at least some of these tight matches as having lucky outcomes. We don’t have to decide which is which, we simply need to acknowledge that some matches aren’t won by the better player, even if we use the very loose definition of “better player that day.”

#### Longer-term luck

Perhaps the most obvious manifestation of luck in tennis is in the draw each week. An unseeded player might start his tournament with an unwinnable match against a top seed or with a cakewalk against a low-ranked wild card. Even seeded players can be affected by fortune, depending on which unseeded players they draw, along with which fellow seeds they will face at which points in the tournament.

Another form of long-term luck–which is itself affected by draw luck–is what we might call “clustering.” A player who goes 20-20 on a season by winning all of his first-round matches and losing all of his second-round matches will not fare nearly as well in terms of rankings or prize money as someone who goes 20-20 by winning only 10 first-round matches, but reaching the third round every time he does.

Again, this may not be entirely luck–this sort of player would quickly be labeled “streaky,” but combined with draw luck, he might simply be facing players he can beat in clusters, instead of getting easy first-rounders and difficult second-rounders.

#### The Matthew effect

All of these forms of tennis-playing fortune are in some way related. The sociologist Robert Merton coined the term “Matthew effect”–alternatively known as the principle of cumulative advantage–to refer to situations where one entity with a very small advantage will, by the very nature of a system, end up with a much larger advantage.

The Matthew effect applies to a wide range of phenomena, and I think it’s instructive here. Consider the case of two players separated by only a few points in the rankings–a margin that could have come about by pure luck: for instance, when one player won a match by walkover. One of these players gets the 32nd seed at the Australian Open and the other is unseeded.

These two players–who are virtually indistinguishable, remember–face very different challenges. One is guaranteed two matches against unseeded opponents, while the other will almost definitely face a seed before the third round, perhaps even a high seed in the first. The unseeded player might get lucky, either in his draw or in his matches, cancelling out the effect of the seeding, but it’s more likely that the seeded player will walk away from the tournament with more points, solidifying the higher ranking–that he didn’t earn in the first place.

Making and breaking careers

The Matthew effect can have an impact on an even broader scale. Today’s tennis pros have been training and competing from a young age, and most of them have gotten quite a bit of help along the way, whether it’s the right coach, support from a national federation, or well-timed wild cards.

It’s tough to quantify things like the effect of a good or bad coach at age 15, but wild cards are a more easily understood example of the phenomenon. The unlucky unseeded player I discussed above at least got to enter the tournament. But when a Grand Slam-hosting federation decides which promising prospect gets a wild card, it’s all or nothing: One player gets a huge opportunity (cash and ranking points, even if they lose in the first round!) while the other one gets nothing.

This, in a nutshell, is why people like me spend so much time on our hobby horses ranting about wild cards. It isn’t the single tournament entry that’s the problem, it’s the cascading opportunities it can generate. Sure, sometimes it turns into nothing–Ryan Harrison’s career is starting to look that way–but even in those cases, we never hear about the players who didn’t get the wild cards, the ones who never had the chance to gain from the cumulative advantage of a small leg up.

Why all this luck matters

If you’re an avid tennis fan, most of this isn’t news to you. Sure, players face good and bad breaks, they get good and bad draws, and they’ve faced uneven challenges along the way.

By discussing all of these types of fortune in one place, I hope to emphasize just how much luck plays a part in our estimate of each player at any given time. It’s no accident that mid-range players bounce around the rankings so much. Some of them are truly streaky, and injuries play a part, but much of the variance can be explained by these varying forms of luck. The #30 player in the rankings is probably better than the #50 player, but it’s no guarantee. It doesn’t take much misfortune–especially when bad luck starts to breed more opportunities for bad luck–to tumble down the list.

Even if many of the forms of luck I’ve discussed are truly skill-based and, say, break point conversions are a matter of someone playing better that day, the evidence generally shows that major rises and falls in things like tiebreak winning percentage and break point conversion rates are temporary–they don’t persist from year to year. That may not be properly classed as luck, but if we’re projecting the rankings a year from now, it might as well be.

While match results, tournament outcomes, and the weekly rankings are written in stone, the way that players get there is not nearly so clear. We’d do well to accept that uncertainty.

## Unlikely Davis Cup Finalists and an Early Forecast for Ghent

Among nations that have reached Davis Cup finals, neither Great Britain nor Belgium quite fits the mold.

The fortunes of the UK team depend almost entirely on Andy Murray. If you have to choose one player, you couldn’t do much better, but it’s hardly a strategy with lots of room for error. While the Belgian team is a bit more balanced, it doesn’t boast the sort of superstar singles player that most successful nations can send into battle.

Thanks to injury and apathy, the Brits and the Belgians haven’t defeated the level of competition usually required of Davis Cup finalists. Belgium hasn’t had to face any singles player better than Leonardo Mayer, and the only top-ten singles player to show up against Britain was Gilles Simon.

Measured by season-best singles rankings, these are two of the weakest Davis Cup finalists in the modern era [1]. The last time a finalist didn’t have two top-50 singles players was 1987, when the Indian team snuck past the Australians in the semifinals, only to be trounced by a powerhouse Swedish side in the final. This year, neither side has two top-50 players [2].

It’s even worse for the Belgians: David Goffin, their best singles player, has never topped 14th in the rankings. Only three times since 2000 has a nation reached the final without a top-ten player, and to find a side that won the Davis Cup without a top-tenner, we must go back to 1996, when the French team, headed by Arnaud Boetsch and Cedric Pioline, claimed the Cup.

Even when a nation reaches the final without a top-ten singles player, they typically have another singles player in the same range. Yet Belgium’s Steve Darcis has only now crept back into the top 60.

Despite a widespread belief that you can throw logic out the window in the riot that is Davis Cup, the better players still tend to win. Here are Elo-rating-based predictions for the four probable rubbers on clay:

• Murray d. Darcis (94.3%)
• Goffin d. GBR-2 (90.1%)
• Murray d. Goffin (86.7%)
• Darcis d. GBR-2 (78.1%)
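
To give a feel for what these percentages mean in rating terms, here is a minimal sketch that converts a win probability into the Elo gap it implies, and back again. It assumes the standard Elo logistic curve with a 400-point scale; the exact formula behind the forecasts above, including any clay or best-of-five adjustment, is not stated here, so the function names and the `scale` parameter are illustrative only.

```python
import math


def elo_win_prob(rating_a, rating_b, scale=400.0):
    """Probability that player A beats player B under the standard Elo curve.

    The 400-point scale is an assumption, not something the article specifies.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def implied_rating_gap(p, scale=400.0):
    """Inverse of elo_win_prob: the rating gap that yields win probability p."""
    return scale * math.log10(p / (1.0 - p))


# Forecasts quoted in the list above (favorite's win probability).
forecasts = {
    "Murray d. Darcis": 0.943,
    "Goffin d. GBR-2": 0.901,
    "Murray d. Goffin": 0.867,
    "Darcis d. GBR-2": 0.781,
}

for label, p in forecasts.items():
    gap = implied_rating_gap(p)
    print(f"{label}: {p:.1%} implies a gap of about {gap:.0f} Elo points")
```

Under that assumption, a 94.3% forecast corresponds to a gap of nearly 500 rating points, which is one way to see how lopsided the Murray-Darcis rubber looks on paper.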

Predicting the outcome of any doubles match–let alone a best-of-five-setter with players yet to be determined, probably including one very good but low-ranked doubles player in Andy Murray–is beyond me. But based on the Murray brothers’ performance against Australia and the Belgians’ lack of a true doubles specialist, the edge has to go to Britain–let’s say 65%.

If we accept these individual probabilities, Great Britain has a 65.2% chance of winning the Davis Cup. That doesn’t take into account home court advantage, which will probably be a factor and favor the Belgians [3].
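
The arithmetic behind a figure like this can be sketched as follows. The snippet treats the five rubbers as independent and asks how likely Britain is to win at least three of them, using the four singles forecasts and the 65% doubles guess quoted above. Independence, the assumption that the reverse singles are played as listed, and the helper name `prob_at_least` are all mine, not part of the original forecast.

```python
def prob_at_least(win_probs, needed):
    """Probability of winning at least `needed` of a set of independent matches.

    Builds the distribution of the number of wins one match at a time.
    """
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            nxt[wins] += mass * (1.0 - p)  # this match is lost
            nxt[wins + 1] += mass * p      # this match is won
        dist = nxt
    return sum(dist[needed:])


# Great Britain's chance in each rubber, taken from the figures above.
gb_rubbers = {
    "Murray vs Darcis": 0.943,
    "GBR-2 vs Goffin": 1 - 0.901,
    "Doubles": 0.650,
    "Murray vs Goffin": 0.867,
    "GBR-2 vs Darcis": 1 - 0.781,
}

gb_tie = prob_at_least(list(gb_rubbers.values()), needed=3)
print(f"Great Britain wins the tie: {gb_tie:.1%}")
```

Counting all five rubbers, even though a tie is decided once one side has three wins, gives the same answer, because dead rubbers cannot change who got to three first. With the rounded inputs shown here the result comes out at roughly 65%; a small difference from the figure in the text is what you would expect from rounding the individual forecasts.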

It’s a huge opportunity for the Brits, but it’s still quite a chance for Belgium, which hasn’t been this close to the Davis Cup for a century. After all, the Cup is inscribed with country names, not judgments about each nation’s easy path to the final.