Sebastian Ofner and ATP Debuts

This is a guest post by Peter Wetz.

Sebastian Ofner, the still relatively young Austrian, received some media attention this June when he qualified for the Wimbledon main draw at his first attempt and even reached the round of 32 by beating Thomaz Bellucci and Jack Sock. Therefore, some people, including me, had an eye on the 21-year-old when he made his ATP tour debut* at Kitzbuhel a few weeks later, where he was awarded a wild card.

Stunningly, Ofner made it into the semifinals despite having drawn top seed Pablo Cuevas in the second round. Cuevas, who admittedly seems to be out of form lately (or possibly is just regressing to his mean), had a 79% chance of reaching the quarterfinal when the draw came out, according to First Ball In’s forecast.

Let’s look at the numbers to contextualize Ofner’s achievement. How deep do players go when making their debut at ATP level? How often would we expect to see what Ofner did in Kitzbuhel?

The following table shows the results of ATP debutants with different types of entry into the main draw (WC = wild card, Q = qualifier, Direct = direct acceptance, All = WC + Q + Direct). Each figure is the share of debutants who reached at least that round. The data covers tournaments since 1990.

Round  WC      Q       Direct  All
R16    14.51%  26.73%  24.46%  21.77%
QF      2.39%   6.39%   4.32%   4.64%
SF      0.51%   2.30%   2.16%   1.59%
F       0.17%   0.64%   0.72%   0.46%
W       0.17%   0.26%   0.72%   0.27%

Since 1990 there have been 1507 ATP debuts: 586 wild cards (39%), 782 qualifiers (52%) and 139 direct acceptances (9%). Given these numbers, we would expect a wild card debutant to get to the semifinal (or further) about once every 9 years. In other words, it is a once-in-a-decade feat. In fact, in the 28 years of data, only Lleyton Hewitt (Adelaide 1998), Michael Ryderstedt (Stockholm 2004) and Ernests Gulbis (St. Petersburg 2006) accomplished what Ofner did. Only Hewitt went on to win the tournament.
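The "every 9 years" estimate can be reproduced from the figures quoted above. The following is only a minimal sketch: the function name is made up for illustration, and it assumes wild card debuts are spread evenly over the 28 seasons.

```python
# Figures quoted in the post.
WILD_CARD_DEBUTS = 586   # wild card debuts since 1990
YEARS = 28               # seasons covered by the data
SF_RATE = 0.0051         # share of wild card debutants reaching the semifinal

def years_between_events(n_debuts, n_years, rate):
    """Expected number of years between two such runs.

    Assumes debuts are spread evenly across seasons (a simplification).
    """
    debuts_per_year = n_debuts / n_years
    expected_runs_per_year = debuts_per_year * rate
    return 1.0 / expected_runs_per_year

years = years_between_events(WILD_CARD_DEBUTS, YEARS, SF_RATE)
implied_semifinalists = WILD_CARD_DEBUTS * SF_RATE

print(f"One semifinal run every {years:.1f} years")
print(f"Implied number of semifinalists: {implied_semifinalists:.1f}")
```

The implied count of roughly three semifinalists lines up with the three names listed above.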

More than half of the debutants of all entry types who reached the final won the tournament. In absolute terms, 4 of the 7 debutants who reached a final went on to win it. (Due to the small sample size, it is perfectly possible that this is just noise in the data.)

If we exclude rounds starting from the semifinals because of small sample sizes, qualifiers outperform direct acceptances. This may be the result of qualifiers having already played two or three matches and having already become accustomed to the conditions, making it easier for them than it is for debutants who got accepted directly into the main draw. But to really prove this, more investigation is needed.

For now we know that what Sebastian Ofner has achieved rarely happens. We should also know that by no means is his feat a predictor of future greatness.

* I define Kitzbuhel as Ofner’s ATP tour debut because Grand Slam events are run by the ITF. However, Grand Slam statistics, such as match wins, are included in ATP statistics.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Putting the Antalya Draw Into Perspective

This is a guest post by Peter Wetz.

When the pre-Wimbledon grass court tournament in Antalya was announced by the ATP in May 2016, some people were scratching their heads: Which top players would be willing to play in Antalya, Turkey one week ahead of Wimbledon? Even more so, because one week earlier two events are played in London and Halle, the latter being considerably closer to London. If a player wanted to participate in Antalya, he would have to fly from Halle (or London) to Antalya and then back to London for Wimbledon, not an ideal itinerary.

A glance at the entry list confirms the doubts: After Dominic Thiem, the only top 10 player entered in the event, there are just three other men (Paolo Lorenzi, Viktor Troicki and Fernando Verdasco) ranked within the top 40. Only three (Thiem, Verdasco, and Lorenzi) of the 28 players who were directly accepted into the main draw will be seeded at Wimbledon.

But how weak is the field really compared to others? Of course there are countless ways to measure the strength of a draw, but for a quick and dirty approach we will simply look at two measures, that is, the last direct acceptance (LDA) and the mean rank of quarterfinalists.

The LDA is the rank of the last player who gained direct entrance into a tournament’s main draw, excluding lucky losers, qualifiers and special exempts. Comparing the last direct acceptance of the Antalya draw (86, Radu Albot) to all other ATP Tour level events with a draw size of 32 or 28 players, it turns out that Antalya is at the 39th percentile. This means that 39% of the other tournaments have a better (lower) or equal LDA and that 61% have a worse (higher) LDA. The following image shows a percentile plot of LDAs of tournaments since 2012, highlighting this week’s event in Antalya:

The fact that the LDA compares well against the other tournaments tells us that despite the lack of top ranked seeds, the field seems to be more dense at the bottom. Not that bad after all?
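The percentile used for the LDA comparison can be sketched as follows. This is an illustration only: the function name and the list of LDA values are hypothetical, not the real tournament data behind the 39th percentile.

```python
def percentile_of(value, others):
    """Share (in percent) of other tournaments with a lower or equal LDA.

    A lower LDA means a stronger cut-off, so a low percentile
    indicates a comparatively strong field at the bottom.
    """
    at_or_below = sum(1 for x in others if x <= value)
    return 100.0 * at_or_below / len(others)

# Hypothetical LDAs of ten other 28/32-draw events (made-up values).
sample_ldas = [60, 70, 80, 86, 90, 95, 100, 110, 120, 130]

antalya_lda = 86  # Radu Albot's rank, as quoted in the post
pct = percentile_of(antalya_lda, sample_ldas)
print(f"Antalya LDA percentile in this toy sample: {pct:.0f}")
```

The same function applies to the mean rank of quarterfinalists, where a high percentile points to a weak final eight.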

Let us take a look at the mean rank of the eight players who made it into the quarterfinals. Choosing quarterfinalists limits the calculation to the players who were able to perform well at the event, winning at least one, and usually two, matches. This should reduce some of the noise in the data that would be otherwise included due to lucky first round wins.

The mean rank of the quarterfinalists at the Antalya Open 2017 is 109. Out of the 726 tournaments since 2000 with 32 or 28 player draws which were considered in this analysis, only 35 tournaments had a higher mean rank of players at the quarterfinal stage. With nine out of those 35 tournaments, the Hall of Fame Tennis Championships at Newport–which takes place each year after Wimbledon–stands out from the pack. As the following plot shows, the Antalya Open is at the 95th percentile in this category. This seems to be more aligned with what we would have expected.

To provide some context, the following table lists the 10 tournaments with the worst mean rank of quarterfinalists.

#  Tournament           Mean QF Rank
1  Newport '10          240
2  Newport '01          197
3  Delray Beach '16     191
4  Moscow '13           166
5  Newport '11          166
6  Newport '07          165
7  's-Hertogenbosch '09 164
8  Newport '08          163
9  Gstaad '14           156
10 Amsterdam '01        152
...
36 Antalya '17          109

The seeds are to blame for this: Of the eight seeds, only Verdasco managed to win a match. The other seven went winless. We have to go back as far as 1983’s Tel Aviv tournament to find a draw where only one seed won a match. In Tel Aviv, however, the third seed Colin Dowdeswell won three matches in all, whereas Fernando Verdasco crashed out in the second round. By the way, Tel Aviv 1983 marks the first title of the then 16 years and 2 months old Aaron Krickstein, still the youngest player to win a singles title on the ATP Tour. A draw in which only two out of eight seeds win their first match comes along about once per year. The last time this happened was at the 2016 Brasil Open, where only Pablo Cuevas and Federico Delbonis won matches as seeds.

Despite the presence of only one top 30 player in this year’s Antalya draw, the middle and bottom of the field looked surprisingly solid, as we saw when considering the last direct acceptance. However, if we take into account the development of the tournament and calculate the mean rank of quarterfinalists, it becomes clear that the field got progressively weaker. Still, there have been worse draws in the past and there will doubtless be worse draws in future. Maybe even in the not too distant future, if we take a glance at this year’s Newport entry list.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Podcast Episode 14: Wimbledon Preview

Episode 14 of the Tennis Abstract Podcast is Carl Bialik’s and my Wimbledon preview! We highlight the favorites, the overrated, and the underrated, along with a look at some of the most intriguing matchups. Along the way, we talk about the difficulty of making grass court forecasts, and speculate about how players’ consistency changes with age. Enjoy!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

The Men Are Old, and The Best Men Are Even Older

It’s been one of the main talking points in men’s tennis for years now: The sport is getting older. Every year, a bigger slice of Grand Slam draws is taken up by thirty-somethings, and now, all of the big four have entered their fourth decade.

I don’t want to belabor the point. But my interest was piqued by an observation from commentator Chris Fowler this week:

When we talk about the sport getting older, this is what we really mean — the best guys are getting up in years.

When we calculate the average age of a draw, or the number of 30-somethings, we weight every player equally. Democratic as it is, it gives most of the weight to guys who are looking for flights home before middle Sunday. As substantial as the overall age shift has been over the last decade, the shift at the top of the game has been even more dramatic.

To quantify the shift, I calculated what I’ll call the “projected winner age” (PWA) of every Wimbledon men’s field from 1991 to 2017. This captures in one number the notion that Fowler is hinting at. We take a weighted average of all 128 men in the main draw, weighted by their chances of winning the tournament, as determined by grass-court Elos at the start of the event.
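As a rough sketch, the PWA is simply a probability-weighted mean of ages. The four-player field below is invented for illustration; the real calculation uses all 128 players and title odds derived from grass-court Elo.

```python
def projected_winner_age(ages, title_probs):
    """Average age weighted by each player's chance of winning the title."""
    total = sum(title_probs)  # normalize in case the odds do not sum to 1
    return sum(a * p for a, p in zip(ages, title_probs)) / total

# Hypothetical field: older players hold most of the title chances.
ages = [35, 31, 30, 22]
title_probs = [0.4, 0.3, 0.2, 0.1]

plain_average = sum(ages) / len(ages)
pwa = projected_winner_age(ages, title_probs)
print(f"Average age: {plain_average:.1f}, projected winner age: {pwa:.1f}")
```

When the favorites are older than the rest of the field, the PWA sits above the plain average, which is the pattern described here.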

For example, last year’s Wimbledon men’s draw had an average age of 28.5 years, but a projected winner age of 30.0. We don’t yet know the exact average age of this year’s draw (it looks to be about the same, maybe a tiny bit younger), but we can already say that the PWA is 31.4.

An observer a decade ago would’ve thought such a number was insane. Here are the average ages and PWAs for the last 27 Wimbledon men’s events:

As recently as 2011, there wasn’t much difference between average age and PWA. Until 2015, the difference had never been greater than two years. Now, the difference is almost three years, and the point of comparison–average age–is nearly at its own all-time high.

A lot of this, of course, is thanks to the big four. Even as the aging curve has shifted, allowing for late bloomers such as Stan Wawrinka, the biggest stars of the late ’00s–Roger Federer and Rafael Nadal–have declined even less than the revised aging curve would imply. In a sport hungry for new winners, we might have to settle for winners who are newly in their 30s.

Podcast Episode 13: Kvitova vs Barty, Serena vs McEnroe, and Federer vs The Field

In Episode 13 of the Tennis Abstract Podcast, Carl Bialik and I start with Petra Kvitova’s first title back from injury, and her chances as a floater in the Wimbledon draw, along with other returning grass-court threats Victoria Azarenka and Sabine Lisicki.

We move on to what we hope is a sensible, fact-based discussion of John McEnroe’s comments about how Serena Williams would fare on the men’s tour. It runs from about 19:00 to 50:00 — if you want to skip it, we understand. Finally, we talk Federer’s title in Halle, and how much his Wimbledon chances are aided by Wimbledon’s seeding formula, which moves him into the top four. (For more on that, see my post from earlier today.)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Measuring the Impact of Wimbledon’s Seeding Formula

Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass court Grand Slam grants seeds to the top 32 players in each tour’s rankings, and then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.

This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.

Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.
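The 75% figure follows from a simple count, sketched below under the assumption that the No. 5 seed is equally likely to be drawn into any of the four quarters. The names are illustrative only.

```python
from fractions import Fraction

QUARTERS = [1, 2, 3, 4]  # quarter k is headed by top-four seed k

def easier_path_probability(seed):
    """Chance that the No. 5 seed lands in someone else's quarter.

    Assumes a uniform draw of the No. 5 seed across the four quarters.
    """
    favourable = [q for q in QUARTERS if q != seed]
    return Fraction(len(favourable), len(QUARTERS))

for seed in QUARTERS:
    print(f"Seed {seed}: {float(easier_path_probability(seed)):.0%}")
```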

Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.

Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:

1   Novak Djokovic         2296.5  
2   Andy Murray            2247.6  
3   Roger Federer          2246.8  
4   Rafael Nadal           2101.4  
5   Juan Martin Del Potro  2037.5  
6   Kei Nishikori          2035.9  
7   Milos Raonic           2029.4  
8   Jo Wilfried Tsonga     2020.2  
9   Alexander Zverev       2010.2  
10  Marin Cilic            1997.7  
11  Nick Kyrgios           1967.7  
12  Tomas Berdych          1967.0  
13  Gilles Muller          1958.2  
14  Richard Gasquet        1953.4  
15  Stanislas Wawrinka     1952.8  
16  Feliciano Lopez        1945.3

We might quibble with some of these positions–the algorithm knows nothing about whatever is plaguing Djokovic, for one thing–but in general, gElo does a better job of reflecting surface-specific ability level than other systems.

The forecasts

Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, except for known withdrawals such as David Goffin and Pablo Carreno Busta, which doesn’t differ much from the list of guys who will ultimately make up the field. Then, for each seeding method, we randomly generate a hundred thousand draws, simulate those brackets, and tally up the winners.
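To make the procedure concrete, here is a much smaller toy version: an eight-player draw with four seeds, using gElo ratings quoted in this post. It is a sketch under stated assumptions, not the simulation behind the tables below. The win probability uses the standard Elo formula (an assumption), the draw layout and function names are invented, and the eight-man field is chosen only because these players' gElo ratings appear above.

```python
import random
from collections import Counter

# gElo ratings quoted in the post for eight players (a toy field).
GELO = {
    "Djokovic": 2296.5, "Murray": 2247.6, "Federer": 2246.8,
    "Nadal": 2101.4, "Nishikori": 2035.9, "Raonic": 2029.4,
    "Cilic": 1997.7, "Wawrinka": 1952.8,
}

def win_prob(rating_a, rating_b):
    """Standard Elo expectation that A beats B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def make_draw(seed_order, rng):
    """Eight-slot draw: seeds 1 and 2 at opposite ends, seeds 3 and 4
    in the inner quarters (randomly assigned), everyone else shuffled."""
    seeds, rest = seed_order[:4], list(seed_order[4:])
    inner = [seeds[2], seeds[3]]
    rng.shuffle(inner)
    rng.shuffle(rest)
    # Slots 0-1, 2-3, 4-5 and 6-7 form the four quarters.
    return [seeds[0], rest[0], rest[1], inner[0],
            inner[1], rest[2], rest[3], seeds[1]]

def play(draw, rng):
    """Play out a single-elimination bracket; adjacent slots meet."""
    players = draw
    while len(players) > 1:
        players = [
            a if rng.random() < win_prob(GELO[a], GELO[b]) else b
            for a, b in zip(players[0::2], players[1::2])
        ]
    return players[0]

def title_odds(seed_order, n_sims=20000, seed=1):
    """Share of simulated tournaments won by each player."""
    rng = random.Random(seed)
    wins = Counter(play(make_draw(seed_order, rng), rng)
                   for _ in range(n_sims))
    return {p: wins[p] / n_sims for p in seed_order}

# Seeding orders taken from the ATP and Wimbledon columns of the post.
ATP_ORDER = ["Murray", "Nadal", "Wawrinka", "Djokovic",
             "Federer", "Cilic", "Raonic", "Nishikori"]
WIMBLEDON_ORDER = ["Murray", "Djokovic", "Federer", "Nadal",
                   "Wawrinka", "Raonic", "Cilic", "Nishikori"]

atp_odds = title_odds(ATP_ORDER)
wimbledon_odds = title_odds(WIMBLEDON_ORDER)
for name in ATP_ORDER:
    print(f"{name:10s} {atp_odds[name]:6.1%} {wimbledon_odds[name]:6.1%}")
```

The numbers from this toy field will not match the full 128-player forecasts, but the mechanism is the same: only the seeding order changes between runs, so any difference in a player's odds comes from where the seeding places him in the draw.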

Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:

Player              ATP     W%  Wimb     W%  gElo     W%  
Andy Murray           1  23.6%     1  24.3%     2  24.1%  
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%  
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%  
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%  
Roger Federer         5  21.1%     3  22.4%     3  22.4%  
Marin Cilic           6   1.3%     7   1.0%    10   1.0%  
Milos Raonic          7   2.0%     6   1.6%     7   1.7%  
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%  
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%  
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%

Again, gElo is probably too optimistic on Djokovic–at least the betting market thinks so–but the point here is the differences between systems. Federer gets a slight bump for entering the top four, and Wawrinka–who gElo really doesn’t like–loses a big chunk of his modest title hopes by falling out of the top four.

The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:

Player              ATP    SF%  Wimb    SF%  gElo    SF%  
Andy Murray           1  58.6%     1  64.1%     2  63.0%  
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%  
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%  
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%  
Roger Federer         5  49.6%     3  64.0%     3  63.2%  
Marin Cilic           6  13.6%     7  11.1%    10  10.3%  
Milos Raonic          7  17.3%     6  14.0%     7  15.2%  
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%  
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%  
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%

There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, even Djokovic and Murray see a benefit because Federer is no longer a possible quarterfinal opponent. Once again, we see the biggest negative effect to Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.

Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.

That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.

Unpredictable Bounces, Predictable Results

These days, the grass court season is the awkward stepchild of the tennis calendar. It takes place almost entirely within a single country’s borders, lasts barely a month, and often suffers from the absence of top players who prefer to rest after the French Open.

The small number of grass court events makes the surface problematic for analysts, as well. The surface behaves differently than hard or clay courts and rewards certain playing styles, so it’s reasonable to assume that many players will be particularly good or bad on grass. But with 90% of tour-level matches contested on other surfaces, many players don’t have much of a track record with which we can assess their grass-court prowess.

I was surprised, then, to find that grass court results are rather predictable. Elo-based forecasts of ATP grass court matches are almost as accurate as hard court predictions and considerably more effective than clay court forecasts. Even when we use “pure” surface forecasts–that is, predicting matches using ratings which draw only on results from that surface–grass court forecasts are a bit better than clay court predictions.

I started with a dataset of the roughly 50,000 ATP matches from 2000 through last week, excluding retirements and withdrawals. As a benchmark, I used official ATP rankings to make predictions for each of those matches. 66.6% of them were right, and the Brier score for ATP rankings over that span is .210. (Brier score measures the accuracy of a set of forecasts by averaging the squared error of each individual forecast, so a lower number is better. To put tennis-specific Brier scores in context, in 2016, ATP rankings had a .208 Brier score, and aggregate betting odds had a .189 Brier score.)
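For readers who want to compute these two measures themselves, here is a minimal sketch. The four forecasts are invented; each is the probability assigned to one player, and the outcome is 1 if that player won and 0 otherwise.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 results."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def favorite_win_rate(forecasts, outcomes):
    """Share of matches in which the side given more than 50% actually won."""
    correct = sum(1 for p, o in zip(forecasts, outcomes) if (p > 0.5) == bool(o))
    return correct / len(forecasts)

# Hypothetical forecasts for four matches (made-up numbers).
forecasts = [0.8, 0.6, 0.7, 0.9]
outcomes = [1, 0, 1, 1]

score = brier_score(forecasts, outcomes)
hit_rate = favorite_win_rate(forecasts, outcomes)
print(f"Brier score: {score:.3f}, favorites won: {hit_rate:.0%}")
```

A forecaster who always says 50% scores exactly .250, which is why values near .200 represent real but modest predictive power.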

Let’s break that down by surface and compare the performance of ATP rankings, Elo, and surface-specific Elo. “F%” is the percentage of matches won by the favorite–as determined by that system–and “Br” is Brier score:

Surface  ATP F%  ATP Br  Elo F%  Elo Br  sElo F%  sElo Br  
Hard      67.3%   0.207   68.0%   0.205    68.5%    0.202  
Clay      66.1%   0.211   67.1%   0.211    67.0%    0.213  
Grass     66.0%   0.215   67.6%   0.207    68.5%    0.207

All three rating systems do best on hard courts, and for good reason: official rankings and overall Elo are more heavily weighted toward hard court results than they are clay or grass. Surface-specific Elo does best on hard courts for a similar reason: more data.

Already, though, we can see the unexpected divergence of clay and grass courts, especially with surface-specific Elo. It’s possible to explain overall Elo’s better performance on grass courts by the presumed similarity between hard and grass–if a player excels on one, he’s probably good on the other, even if he’s horrible on clay. But that doesn’t explain sElo doing better on grass than on clay. There are 3.3 times as many tour-level matches on clay as on grass, so even allowing for the fact that players choose schedules to suit their surface preferences, almost everyone is going to have more results on dirt than on turf. More data should give us better results, but not here.

We can improve our forecasts even more by blending surface-specific ratings with overall ratings. After testing a wide range of possible mixes, it turns out that equally weighting Elo and sElo provides close to the best results. (The differences between, say, 60/40 and 50/50 are extremely small on all surfaces, so even where 60/40 is a bit better, I prefer to keep it simple with a half-and-half mix.) Here are the results for weighted surface Elos for all three surfaces:

Surface      F%      Br
Hard      68.6%   0.202  
Clay      68.0%   0.207  
Grass     69.8%   0.196

Now grass courts are the most predictable of the major surfaces! Even when we use a weighted average of Elo and sElo, grass court forecasts rely on less data than those of the other surfaces–the surface-specific half of the grass court forecasts uses less than one-third the match results of clay court predictions and less than one-fifth the results of hard court forecasts. In fact, we can do at least as well–and perhaps a tiny bit better–with even less data: A 50/50 weighting of grass-specific Elo and hard-specific Elo is just as accurate as the half-and-half mix of grass-specific and overall Elo.
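The half-and-half blend can be sketched as follows. The ratings are hypothetical, the function names are invented, and the win probability uses the standard Elo formula, which is an assumption rather than something spelled out in this post.

```python
def blended_rating(overall, surface, surface_weight=0.5):
    """Weighted mix of overall Elo and surface-specific Elo (50/50 by default)."""
    return surface_weight * surface + (1.0 - surface_weight) * overall

def win_prob(rating_a, rating_b):
    """Standard Elo expectation that A beats B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Hypothetical players: A is stronger overall, B is the grass specialist.
player_a = blended_rating(overall=2000.0, surface=1900.0)
player_b = blended_rating(overall=1900.0, surface=2050.0)

print(f"Blended ratings: A={player_a:.0f}, B={player_b:.0f}")
print(f"Chance that A beats B on grass: {win_prob(player_a, player_b):.1%}")
```

Swapping the overall rating for a hard-court rating gives the grass/hard mix mentioned above, with the same function.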

Regardless of the exact formula, it remains striking that we can predict ATP grass court results so accurately from such limited data. Even if one-third of ATP events were played on grass, I still wouldn’t have been surprised if grass court results turned out to be the least predictable. The more a surface favors the server–and it’s hardest to break on grass–the tighter the scoreline will tend to be, introducing more randomness into the end result. Despite that structural tendency, we’re able to pick winners as successfully on grass as on the more common surfaces.

Here’s my theory: Even though there aren’t many grass court events, the conditions at those few tournaments are quite consistent. Altitude is roughly sea level, groundskeepers follow the lead of the staff at Wimbledon, and rain clouds are almost always in sight. Compare that homogeneity to the variety of hard courts or clay courts. The high-altitude hard courts in Bogota are nothing like the slow ones in Indian Wells. The “clay” in Houston is only nominally equal to the crushed brick of Roland Garros. While grass courts are almost identical to each other, clay courts are nearly as different from each other as they are from other surfaces.

It makes sense that ratings based on a uniform surface would be more accurate than ratings based on a wide range of surfaces, and it’s reassuring to find that the limited available data doesn’t cancel out the advantage. This research also suggests a further path to better forecasts: grouping hard and clay matches by a more precise measure of surface speed. If 10% of tour matches are sufficient to make accurate grass court predictions, the same may be true of the slowest one-third of clay courts. More data is almost always better, but sometimes, precisely targeted data is best of all.