Episode 14 of the Tennis Abstract Podcast is Carl Bialik’s and my Wimbledon preview! We highlight the favorites, the overrated, and the underrated, along with a look at some of the most intriguing matchups. Along the way, we talk about the difficulty of making grass court forecasts, and speculate about how players’ consistency changes with age. Enjoy!
It’s been one of the main talking points in men’s tennis for years now: The sport is getting older. Every year, a bigger slice of Grand Slam draws is taken up by thirty-somethings, and now, every member of the big four has entered his fourth decade.
I don’t want to belabor the point. But my interest was piqued by an observation from commentator Chris Fowler this week:
— Chris Fowler (@cbfowler) June 27, 2017
When we talk about the sport getting older, this is what we really mean — the best guys are getting up in years.
When we calculate the average age of a draw, or the number of 30-somethings, we weight every player equally. Democratic as it is, it gives most of the weight to guys who are looking for flights home before middle Sunday. As substantial as the overall age shift has been over the last decade, the shift at the top of the game has been even more dramatic.
To quantify the shift, I calculated what I’ll call the “projected winner age” (PWA) of every Wimbledon men’s field from 1991 to 2017. This captures in one number the notion that Fowler is hinting at. We take a weighted average of all 128 men in the main draw, weighted by their chances of winning the tournament, as determined by grass-court Elos at the start of the event.
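To make the weighting concrete, here is a minimal Python sketch of the PWA arithmetic. It assumes the title probabilities have already been computed elsewhere (for instance, by simulating the draw with grass-court Elos). The function name and the three-player field are hypothetical and purely illustrative.

```python
def projected_winner_age(ages, title_probs):
    """Average age, weighted by each player's chance of winning the title."""
    if len(ages) != len(title_probs):
        raise ValueError("ages and title_probs must have the same length")
    total = sum(title_probs)
    if total <= 0:
        raise ValueError("title probabilities must sum to a positive number")
    # Dividing by the total keeps the result sensible even when simulated
    # probabilities do not sum to exactly 1.
    return sum(age * p for age, p in zip(ages, title_probs)) / total


# Hypothetical three-player field (made-up numbers, not real data).
ages = [35.0, 30.0, 22.0]
title_probs = [0.5, 0.4, 0.1]

pwa = projected_winner_age(ages, title_probs)
plain_average = sum(ages) / len(ages)
```

In this toy field the plain average age is 29.0, while the PWA is 31.7, because the older players carry most of the title probability.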
For example, last year’s Wimbledon men’s draw had an average age of 28.5 years, but a projected winner age of 30.0. We don’t yet know the exact average age of this year’s draw (it looks to be about the same, maybe a tiny bit younger), but we can already say that the PWA is 31.4.
As recently as 2011, there wasn’t much difference between average age and PWA. Until 2015, the difference had never been greater than two years. Now, the difference is almost three years, and the point of comparison–average age–is near its own all-time high.
A lot of this, of course, is thanks to the big four. Even as the aging curve has shifted, allowing for late bloomers such as Stan Wawrinka, the biggest stars of the late ’00s–Roger Federer and Rafael Nadal–have declined even less than the revised aging curve would imply. In a sport hungry for new winners, we might have to settle for winners who are newly in their 30s.
In Episode 13 of the Tennis Abstract Podcast, Carl Bialik and I start with Petra Kvitova’s first title back from injury, and her chances as a floater in the Wimbledon draw, along with other returning grass-court threats Victoria Azarenka and Sabine Lisicki.
We move on to what we hope is a sensible, fact-based discussion of John McEnroe’s comments about how Serena Williams would fare on the men’s tour. It runs from about 19:00 to 50:00 — if you want to skip it, we understand. Finally, we talk Federer’s title in Halle, and how much his Wimbledon chances are aided by Wimbledon’s seeding formula, which moves him into the top four. (For more on that, see my post from earlier today.)
Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass court Grand Slam grants seeds to the top 32 players in each tour’s rankings, and then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.
This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.
Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.
Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.
Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:
 1  Novak Djokovic          2296.5
 2  Andy Murray             2247.6
 3  Roger Federer           2246.8
 4  Rafael Nadal            2101.4
 5  Juan Martin Del Potro   2037.5
 6  Kei Nishikori           2035.9
 7  Milos Raonic            2029.4
 8  Jo Wilfried Tsonga      2020.2
 9  Alexander Zverev        2010.2
10  Marin Cilic             1997.7
11  Nick Kyrgios            1967.7
12  Tomas Berdych           1967.0
13  Gilles Muller           1958.2
14  Richard Gasquet         1953.4
15  Stanislas Wawrinka      1952.8
16  Feliciano Lopez         1945.3
We might quibble with some of these positions–the algorithm knows nothing about whatever is plaguing Djokovic, for one thing–but in general, gElo does a better job of reflecting surface-specific ability level than other systems.
Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, except for known withdrawals such as David Goffin and Pablo Carreno Busta, which doesn’t differ much from the list of guys who will ultimately make up the field. Then, for each seeding method, we randomly generate a hundred thousand draws, simulate those brackets, and tally up the winners.
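A stripped-down version of that simulation loop might look like the Python sketch below. It is not the code behind the numbers in this post. It assumes the standard logistic Elo win probability on a 400-point scale, and it protects only the top four seeds (No. 1 and No. 2 anchor the outer quarters, No. 3 and No. 4 are drawn into the middle two), whereas a real Wimbledon draw places 32 seeds. All function names are hypothetical.

```python
import random
from collections import Counter


def elo_win_prob(rating_a, rating_b):
    """Standard logistic Elo estimate of the chance that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def make_draw(seeds, unseeded, rng):
    """Bracket order with four seeds in separate quarters (a simplification)."""
    size = len(seeds) + len(unseeded)
    quarter = size // 4
    middle = [seeds[2], seeds[3]]
    rng.shuffle(middle)  # seeds 3 and 4 land in a random middle quarter
    quarter_heads = [seeds[0], middle[0], middle[1], seeds[1]]
    rest = list(unseeded)
    rng.shuffle(rest)
    draw = []
    for head in quarter_heads:
        draw.append(head)
        for _ in range(quarter - 1):
            draw.append(rest.pop())
    return draw


def play_bracket(draw, ratings, rng):
    """Play out a single-elimination bracket and return the champion."""
    alive = list(draw)
    while len(alive) > 1:
        winners = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p_a = elo_win_prob(ratings[a], ratings[b])
            winners.append(a if rng.random() < p_a else b)
        alive = winners
    return alive[0]


def title_odds(ratings, seeds, n_sims=10000, seed=0):
    """Share of simulated tournaments won by each player."""
    rng = random.Random(seed)
    unseeded = [p for p in ratings if p not in seeds]
    wins = Counter()
    for _ in range(n_sims):
        draw = make_draw(seeds, unseeded, rng)
        wins[play_bracket(draw, ratings, rng)] += 1
    return {p: wins[p] / n_sims for p in ratings}
```

Calling `title_odds` once per seeding method, with the same `ratings` dictionary but a different `seeds` list each time, isolates the effect of the seeding formula. The field size must be a power of two and `seeds` must hold exactly four names.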
Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:
Player              ATP     W%  Wimb     W%  gElo     W%
Andy Murray           1  23.6%     1  24.3%     2  24.1%
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%
Roger Federer         5  21.1%     3  22.4%     3  22.4%
Marin Cilic           6   1.3%     7   1.0%    10   1.0%
Milos Raonic          7   2.0%     6   1.6%     7   1.7%
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%
Again, gElo is probably too optimistic on Djokovic–at least the betting market thinks so–but the point here is the differences between systems. Federer gets a slight bump for entering the top four, and Wawrinka–who gElo really doesn’t like–loses a big chunk of his modest title hopes by falling out of the top four.
The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:
Player              ATP    SF%  Wimb    SF%  gElo    SF%
Andy Murray           1  58.6%     1  64.1%     2  63.0%
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%
Roger Federer         5  49.6%     3  64.0%     3  63.2%
Marin Cilic           6  13.6%     7  11.1%    10  10.3%
Milos Raonic          7  17.3%     6  14.0%     7  15.2%
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%
There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, but even Djokovic and Murray see a benefit because Federer is no longer a possible quarterfinal opponent. Once again, we see the biggest negative effect on Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.
Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.
That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.
These days, the grass court season is the awkward stepchild of the tennis calendar. It takes place almost entirely within a single country’s borders, lasts barely a month, and often suffers from the absence of top players who prefer to rest after the French Open.
The small number of grass court events makes the surface problematic for analysts, as well. The surface behaves differently than hard or clay courts and rewards certain playing styles, so it’s reasonable to assume that many players will be particularly good or bad on grass. But with 90% of tour-level matches contested on other surfaces, many players don’t have much of a track record with which we can assess their grass-court prowess.
I was surprised, then, to find that grass court results are rather predictable. Elo-based forecasts of ATP grass court matches are almost as accurate as hard court predictions and considerably more effective than clay court forecasts. Even when we use “pure” surface forecasts–that is, predicting matches using ratings which draw only on results from that surface–grass court forecasts are a bit better than clay court predictions.
I started with a dataset of the roughly 50,000 ATP matches from 2000 through last week, excluding retirements and withdrawals. As a benchmark, I used official ATP rankings to make predictions for each of those matches. 66.6% of them were right, and the Brier score for ATP rankings over that span is .210. (Brier score measures the accuracy of a set of forecasts by averaging the squared error of each individual forecast, so a lower number is better. To put tennis-specific Brier scores in context, in 2016, ATP rankings had a .208 Brier score, and aggregate betting odds had a .189 Brier score.)
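For readers who want to check the arithmetic, here is a minimal Python sketch of the two accuracy measures used in this post. It assumes each forecast is the probability that an arbitrarily chosen "player A" wins, and each outcome is 1 if A won and 0 otherwise. The function names are hypothetical, and skipping exact 50/50 forecasts when counting favorites is my own simplification.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def favorite_win_rate(forecasts, outcomes):
    """Share of matches won by the player the forecast favored."""
    # Exact 50/50 forecasts have no favorite, so they are skipped here.
    picks = [(p, o) for p, o in zip(forecasts, outcomes) if p != 0.5]
    if not picks:
        raise ValueError("no matches with a clear favorite")
    correct = sum(1 for p, o in picks if (p > 0.5) == (o == 1))
    return correct / len(picks)


# Two hypothetical matches: a 70% favorite who won, a 60% favorite who lost.
example_brier = brier_score([0.7, 0.6], [1, 0])
example_rate = favorite_win_rate([0.7, 0.6], [1, 0])
```

A forecaster who calls every match a coin flip scores exactly 0.25, which gives some scale to figures like .210 and .189.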
Let’s break that down by surface and compare the performance of ATP rankings, Elo, and surface-specific Elo. “F%” is the percentage of matches won by the favorite (as determined by that system), and “Br” is Brier score:
Surface  ATP F%  ATP Br  Elo F%  Elo Br  sElo F%  sElo Br
Hard      67.3%   0.207   68.0%   0.205    68.5%    0.202
Clay      66.1%   0.211   67.1%   0.211    67.0%    0.213
Grass     66.0%   0.215   67.6%   0.207    68.5%    0.207
All three rating systems do best on hard courts, and for good reason: official rankings and overall Elo are more heavily weighted toward hard court results than they are clay or grass. Surface-specific Elo does best on hard courts for a similar reason: more data.
Already, though, we can see the unexpected divergence of clay and grass courts, especially with surface-specific Elo. It’s possible to explain overall Elo’s better performance on grass courts by the presumed similarity between hard and grass–if a player excels on one, he’s probably good on the other, even if he’s horrible on clay. But that doesn’t explain sElo doing better on grass than on clay. There are 3.3 times as many tour-level matches on clay as on grass, so even allowing for the fact that players choose schedules to suit their surface preferences, almost everyone is going to have more results on dirt than on turf. More data should give us better results, but not here.
We can improve our forecasts even more by blending surface-specific ratings with overall ratings. After testing a wide range of possible mixes, it turns out that equally weighting Elo and sElo provides close to the best results. (The differences between, say, 60/40 and 50/50 are extremely small on all surfaces, so even where 60/40 is a bit better, I prefer to keep it simple with a half-and-half mix.) Here are the results for weighted surface Elos for all three surfaces:
Surface     F%     Br
Hard     68.6%  0.202
Clay     68.0%  0.207
Grass    69.8%  0.196
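The blend itself is simple enough to sketch in a few lines of Python. This assumes the mix is applied to the ratings themselves (averaging the two Elo numbers) and that the blended ratings are then turned into a forecast with the standard logistic Elo formula on a 400-point scale. The function names and the `surface_weight` parameter are hypothetical.

```python
def blended_rating(overall_elo, surface_elo, surface_weight=0.5):
    """Weighted mix of overall and surface-specific Elo (default 50/50)."""
    if not 0.0 <= surface_weight <= 1.0:
        raise ValueError("surface_weight must be between 0 and 1")
    return surface_weight * surface_elo + (1.0 - surface_weight) * overall_elo


def elo_win_prob(rating_a, rating_b):
    """Standard logistic Elo estimate of the chance that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical players: A is stronger on this surface than overall,
# B is the other way around (made-up ratings).
rating_a = blended_rating(overall_elo=2000.0, surface_elo=2100.0)
rating_b = blended_rating(overall_elo=2050.0, surface_elo=1950.0)
forecast_a = elo_win_prob(rating_a, rating_b)
```

Setting `surface_weight=0.6` gives the 60/40 variant mentioned above, and passing a hard-court Elo in place of `overall_elo` gives a two-surface blend.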
Now grass courts are the most predictable of the major surfaces! Even when we use a weighted average of Elo and sElo, grass court forecasts rely on less data than those of the other surfaces–the surface-specific half of the grass court forecasts uses less than one-third the match results of clay court predictions and less than one-fifth the results of hard court forecasts. In fact, we can do at least as well–and perhaps a tiny bit better–with even less data: A 50/50 weighting of grass-specific Elo and hard-specific Elo is just as accurate as the half-and-half mix of grass-specific and overall Elo.
Regardless of the exact formula, it remains striking that we can predict ATP grass court results so accurately from such limited data. Even if one-third of ATP events were played on grass, I still wouldn’t have been surprised if grass court results turned out to be the least predictable. The more a surface favors the server–and it’s hardest to break on grass–the tighter the scoreline will tend to be, introducing more randomness into the end result. Despite that structural tendency, we’re able to pick winners as successfully on grass as on the more common surfaces.
Here’s my theory: Even though there aren’t many grass court events, the conditions at those few tournaments are quite consistent. Altitude is roughly sea level, groundskeepers follow the lead of the staff at Wimbledon, and rain clouds are almost always in sight. Compare that homogeneity to the variety of hard courts or clay courts. The high-altitude hard courts in Bogota are nothing like the slow ones in Indian Wells. The “clay” in Houston is only nominally equal to the crushed brick of Roland Garros. While grass courts are almost identical to each other, clay courts are nearly as different from each other as they are from other surfaces.
It makes sense that ratings based on a uniform surface would be more accurate than ratings based on a wide range of surfaces, and it’s reassuring to find that the limited available data doesn’t cancel out the advantage. This research also suggests a further path to better forecasts: grouping hard and clay matches by a more precise measure of surface speed. If 10% of tour matches are sufficient to make accurate grass court predictions, the same may be true of the slowest one-third of clay courts. More data is almost always better, but sometimes, precisely targeted data is best of all.
In Episode 12 of the Tennis Abstract Podcast, Carl Bialik and I get excited about grass season, both for the glories of the surface itself and for the great players it has brought back, including Roger Federer and Victoria Azarenka. We talk Fed’s surprise loss, the arcane (but worthwhile) Wimbledon seeding formula, the returning WTA stars who will threaten on grass, David Goffin’s injury, and the debut Challenger title for 16-year-old Felix Auger-Aliassime.
We had a lot of fun recording this one. Hope you enjoy it as well!
If you picked up only two stats about surprise Roland Garros champion Jelena Ostapenko, you probably heard that, first, her average forehand is faster than Andy Murray’s, and second, she hit 299 winners in her seven French Open matches. I’m not yet sure how much emphasis we should put on shot speed, and I instinctively distrust raw totals, but even with those caveats, it’s hard not to be impressed.
Compared to the likes of Simona Halep, Timea Bacsinszky, and Caroline Wozniacki, the last three women she upset en route to her maiden title, Ostapenko was practically playing a different game. Her style is more reminiscent of fellow Slam winners Petra Kvitova and Maria Sharapova, who don’t construct points so much as they destruct them. What I’d like to know, then, is how Ostapenko stacks up against the most aggressive players on the WTA tour.
Thankfully we already have a metric for this: Aggression Score, which I’ll abbreviate as AGG. This stat requires that we know three things about every point: How many shots were hit, who won it, and how. With that data, we figure out what percentage of a player’s shots resulted in winners, unforced errors, or her opponent’s forced errors. (Technically, the denominator is “shot opportunities,” which includes shots a player didn’t manage to hit after her opponent hit a winner. That doesn’t affect the results too much.) For today’s purposes, I’m calculating AGG without a player’s serves–both aces and forced return errors–so we’re capturing only rally aggression.
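As a rough illustration, rally-only AGG can be computed from a handful of per-match counts. The Python sketch below reflects my reading of the definition above, with the denominator taken as shots the player hit plus opponent winners she never got a racket on. The function name, argument names, and sample counts are all hypothetical, and serves are assumed to be excluded from every count before the function is called.

```python
def rally_aggression(winners, unforced_errors, induced_forced_errors,
                     shots_hit, opponent_winners):
    """Share of rally shot opportunities that ended the point."""
    # Points the player ended herself: winners, her own unforced errors,
    # and forced errors she drew from her opponent.
    point_ending_shots = winners + unforced_errors + induced_forced_errors
    # Shot opportunities include balls she could not reach after an
    # opponent's winner.
    shot_opportunities = shots_hit + opponent_winners
    if shot_opportunities <= 0:
        raise ValueError("need at least one shot opportunity")
    return point_ending_shots / shot_opportunities


# Made-up match totals, not taken from the Match Charting Project.
agg = rally_aggression(winners=20, unforced_errors=25,
                       induced_forced_errors=15,
                       shots_hit=280, opponent_winners=20)
```

With these invented totals, 60 point-ending shots over 300 opportunities gives an AGG of 0.2, roughly the middle of the typical range.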
The typical range of this version of AGG is between 0.1–very passive–and 0.3–extremely aggressive. Based on the nearly 1,600 women’s matches in the Match Charting Project dataset, Kvitova and Julia Goerges represent the aggressive end, with average AGGs around .275. We only have four Samantha Crawford matches in the database, but early signs suggest she could outpace even those women, as her average is at .312. At the other end of the spectrum, Madison Brengle is at 0.11, with Wozniacki and Sara Errani at 0.12. In the Match Charting data, there are single-day performances that rise as high as 0.44 (Serena Williams over Errani at the 2013 French Open) and fall as low as 0.06. In the final against Ostapenko, Halep’s aggression score was 0.08, half of her average of 0.16.
Context established, let’s see where Ostapenko fits in, starting with the Roland Garros final. Against Halep, her AGG was a whopping .327. That’s third highest of any player in a major final, behind Kvitova at Wimbledon in 2014 (.344) and Serena at the 2007 Australian Open (.328). (We have data for every Grand Slam final back to 1999, and most of them before that.) Using data from IBM Pointstream, which encompasses almost all matches at Roland Garros this year, Ostapenko’s aggression in the final was 7th-highest of any match in the tournament–out of 188 player-matches with the necessary data–behind two showings from Bethanie Mattek Sands, one each from Goerges, Madison Keys, and Mirjana Lucic … and Ostapenko’s first-round win against Louisa Chirico. It was also the third-highest recorded against Halep out of more than 200 Simona matches in the Match Charting dataset.
You get the picture: The French Open final was a serious display of aggression, at least from one side of the court. That level of ball-bashing was nothing new for the Latvian, either. We have charting data for her last three matches at Roland Garros, along with two matches from Charleston and one from Prague this clay season. Of those six performances, Ostapenko’s lowest AGG was .275, against Wozniacki in the Paris quarters. Her average across the six was .303.
If those recent matches indicate what we’ll see from her in the future, she will likely score as the most aggressive rallying player on the WTA tour. Because she played less aggressively in her earlier matches on tour, her career average still trails those of Kvitova and Goerges, but not by much–and probably not for long. It’s scary to consider what might happen as she gets stronger; we’ll have to wait and see how her tactics evolve, as well.
The Match Charting Project contains at least 15 matches on 62 different players–here is the rally-only aggression score for all of them:
PLAYER                    MATCHES  RALLY AGG
Julia Goerges                  15      0.277
Petra Kvitova                  57      0.277
Jelena Ostapenko               17      0.271
Madison Keys                   35      0.261
Camila Giorgi                  17      0.257
Sabine Lisicki                 19      0.246
Caroline Garcia                15      0.242
Coco Vandeweghe                17      0.238
Serena Williams               108      0.237
Laura Siegemund                19      0.235
Anastasia Pavlyuchenkova       17      0.230
Danka Kovinic                  15      0.223
Kristina Mladenovic            28      0.222
Na Li                          15      0.218
Maria Sharapova                73      0.217
Eugenie Bouchard               52      0.214
Ana Ivanovic                   46      0.211
Garbine Muguruza               57      0.210
Lucie Safarova                 29      0.209
Karolina Pliskova              42      0.207
Elena Vesnina                  20      0.207
Venus Williams                 46      0.205
Johanna Konta                  31      0.205
Monica Puig                    15      0.203
Dominika Cibulkova             38      0.198
Martina Navratilova            25      0.197
Steffi Graf                    39      0.196
Anastasija Sevastova           17      0.194
Samantha Stosur                19      0.193
Sloane Stephens                15      0.190
Ekaterina Makarova             23      0.189
Lauren Davis                   16      0.186
Heather Watson                 16      0.185
Daria Gavrilova                20      0.183
Justine Henin                  28      0.183
Kiki Bertens                   15      0.181
Monica Seles                   18      0.179
Svetlana Kuznetsova            28      0.174
Timea Bacsinszky               28      0.174
Victoria Azarenka              55      0.170
Andrea Petkovic                24      0.166
Roberta Vinci                  23      0.164
Barbora Strycova               16      0.163
Belinda Bencic                 31      0.163
Jelena Jankovic                24      0.162
Alison Riske                   15      0.161
Angelique Kerber               83      0.161
Flavia Pennetta                23      0.160
Simona Halep                  218      0.160
Carla Suarez Navarro           31      0.159
Martina Hingis                 15      0.157
Chris Evert                    20      0.152
Darya Kasatkina                18      0.148
Elina Svitolina                46      0.141
Yulia Putintseva               15      0.137
Alize Cornet                   18      0.136
Agnieszka Radwanska            90      0.130
Annika Beck                    16      0.126
Monica Niculescu               25      0.124
Caroline Wozniacki             62      0.122
Sara Errani                    23      0.121
(A few of the match counts differ slightly from what you’ll find on the MCP home page. I’ve thrown out a few matches with too much missing data or in formats that didn’t play nice with the script I wrote to calculate aggression score.)