Is Jelena Ostapenko More Than the Next Iva Majoli?

Winning a Grand Slam as a teenager–or, in the case of this year’s French Open champion Jelena Ostapenko, a just-barely 20-year-old–is an impressive feat. But it isn’t always a guarantee of future greatness. Plenty of all-time greats launched their careers with Slam titles at age 20 or younger, but three of the players who won their debut major at ages closest to Ostapenko’s serve as cautionary tales in the opposite direction: Iva Majoli, Mary Pierce, and Gabriela Sabatini. Each of these women was within three months of her 20th birthday when she won her first title, and of the three, only Pierce ever won another.

However, simply comparing her age to that of previous champions understates the Latvian’s achievement. Women’s tennis has gotten older over the last two decades: The average age of a women’s singles entrant in Paris this year was 25.6, a few days short of the record established at Roland Garros and Wimbledon last year. That’s two years older than the average player 15 years ago, and four years older than the typical entrant three decades ago. Headed into the French Open this year, there were only five teenagers ranked in the top 100; at the end of 2004, the year of Maria Sharapova’s and Svetlana Kuznetsova’s first major victories, there were nearly three times as many.

Thus, it doesn’t seem quite right to group Ostapenko with previous 19- and 20-year-old first-time winners. Instead, we might consider the Latvian’s “relative age”—the difference between her and the average player in the draw—of 5.68 years younger than the field. When I introduced the concept of relative age last week, it was in the context of Slam semifinalists, and in every era, there have been some very young players reaching the final four who burned out just as quickly. The same isn’t true of women who went on to win major titles.
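As a rough illustration of the definition, relative age is just the field's average age minus the player's age. The sketch below uses a made-up five-player field rather than real draw data, and the function name `relative_age` is purely illustrative.

```python
from statistics import mean

def relative_age(player_age, field_ages):
    """Years by which a player is younger than the average entrant.

    Positive values mean the player is younger than the field.
    """
    return mean(field_ages) - player_age

# Hypothetical five-player draw, ages in years (NOT real tournament data).
field = [19.9, 24.0, 26.5, 28.1, 29.5]
print(round(relative_age(19.9, field), 2))
```

Because the field average includes the player herself, a very young champion pulls the average down slightly; with a 128-player draw that effect is negligible.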

In the last thirty years, only two players have won a major with a greater relative age than Ostapenko: Sharapova, who was 6.66 years younger than the 2004 Wimbledon field, and Martina Hingis, who won three-quarters of the Grand Slam in 1997 at age 16, between 6.3 and 6.6 years younger than each tournament’s average entrant. The rest of the top five emphasizes Ostapenko’s elite company, including Monica Seles (5.29, at the 1990 French Open) and Serena Williams (5.26, at the 1999 US Open).

Each of those four women went on to reach the No. 1 ranking and win at least five majors–an outrageously optimistic forecast for Ostapenko, who, even after winning Roland Garros, is ranked outside the top ten. By relative age, Majoli, Pierce, and Sabatini are poor comparisons for Saturday’s champion–Majoli and Pierce were only three years younger than the fields they overcame, and Sabatini was only two years younger than the average entrant. By comparison, Garbine Muguruza, who won last year’s French Open at age 22, was two and a half years younger than the field.

Which is it, then? Unfortunately I don’t have the answer to that, and we probably won’t have a better idea for several more years. For most of the Open Era, until about ten years ago, the average age on the women’s tour fluctuated between 21 and 23. Thus, for the overall population of first-time major champions, actual age and relative age are very highly correlated. It’s only with the last decade’s worth of debut winners that the numbers meaningfully diverge. For Ostapenko and Muguruza–and perhaps Victoria Azarenka and Petra Kvitova–we have yet to see what their entire career trajectory will look like. To build a bigger sample to test the hypothesis, we’ll need a few more young first-time Slam winners, something we may finally see with Sharapova and Williams out of the way.

For more post-French Open analysis, here’s my Economist piece on Ostapenko and projecting major winners in the long term. Also at the Game Theory blog, I wrote about Rafael Nadal and his absurd dominance on clay in a fast-court-friendly era.

Finally, check out Carl Bialik’s and my extra-long podcast, recorded Monday, with tons of thoughts on the winners and the fields in general.

First Meetings in Grand Slam Finals

The 2017 Roland Garros final is crammed with firsts for 20-year-old Latvian Jelena Ostapenko. Playing in only her eighth major, she had never before reached the round of 16, let alone the final two. Her opponent, Simona Halep, has been here before–she lost the 2014 French Open final to Maria Sharapova–but the two women have one first in common: Halep and Ostapenko have never played each other.

Slam finals are usually reserved for an elite group, and that select few tends to play each other quite a bit. Since 1980, women’s major finalists have had an average of 12 previous meetings. The veteran Australian Open finalists this year, Serena Williams and Venus Williams, had faced off 27 times before their clash in Melbourne.

That makes the Halep-Ostapenko debut meeting an unusual one, but the situation is not unheard of. The 2012 Roland Garros final was the first match between Sharapova and Sara Errani (they’ve since played five more). Overall, there have been four first meetings in women’s major finals in the last 35 years:

```
Slam     Winner           Finalist
2012 RG  Maria Sharapova  Sara Errani
2009 US  Kim Clijsters    Caroline Wozniacki
2007 W   Venus Williams   Marion Bartoli
1988 RG  Steffi Graf      Natalia Zvereva
```

(There were probably a few more before that, but my database is missing a lot of matches from the mid-1970s, so I don’t know for sure.)
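For readers who want to reproduce a list like this, here is a minimal sketch of the bookkeeping. It assumes a chronologically ordered list of matches, each a dict with the hypothetical keys `event`, `slam`, `round`, `winner`, and `loser`; those names are illustrative and do not describe any particular database.

```python
from collections import defaultdict

def first_meeting_slam_finals(matches):
    """Return Slam finals in which the two players had never met before.

    `matches` must be sorted chronologically. The dict keys used here
    are hypothetical, chosen only for this illustration.
    """
    meetings = defaultdict(int)  # unordered pair -> matches played so far
    found = []
    for m in matches:
        pair = frozenset((m["winner"], m["loser"]))
        if m["slam"] and m["round"] == "F" and meetings[pair] == 0:
            found.append((m["event"], m["winner"], m["loser"]))
        meetings[pair] += 1
    return found

# Toy data with made-up players A, B, C.
toy = [
    {"event": "Tour stop", "slam": False, "round": "SF", "winner": "A", "loser": "B"},
    {"event": "Slam 1", "slam": True, "round": "F", "winner": "A", "loser": "B"},
    {"event": "Slam 2", "slam": True, "round": "F", "winner": "C", "loser": "A"},
]
print(first_meeting_slam_finals(toy))
```

A gap in the match history makes a pair look like first-time opponents when they are not, which is why incomplete data from earlier decades makes the older finals hard to classify.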

In all of these cases, the established star defeated the upstart, which bodes well for Halep. On the other hand, the Romanian doesn’t quite measure up to the previous four winners, all of whom had won a Grand Slam title before their final on this list.

First meetings in Grand Slam finals are a bit more common in the men’s game, though it’s been nearly a decade since the last one. We’ll probably wait quite a bit longer, too. Rafael Nadal and Stanislas Wawrinka will play for the 19th time on Sunday, and of the 45 possible pairings in the current top ten, only Kei Nishikori and Alexander Zverev have yet to face off. The next highest-ranked pair without a head-to-head is Andy Murray and Jack Sock which, come to think of it, would make for an interesting Wimbledon final next month.

The last debut clash on such a big stage was the 2008 Australian Open, between Novak Djokovic and Jo Wilfried Tsonga. It was the eighth in the last 35 years:

```
Slam     Winner            Finalist
2008 AO  Novak Djokovic    Jo Wilfried Tsonga
2003 US  Andy Roddick      Juan Carlos Ferrero
1997 RG  Gustavo Kuerten   Sergi Bruguera
1997 AO  Pete Sampras      Carlos Moya
1996 W   Richard Krajicek  Malivai Washington
1986 RG  Ivan Lendl        Mikael Pernfors
1985 W   Boris Becker      Kevin Curren
1984 AO  Mats Wilander     Kevin Curren
```

Before 1982, most first-meeting finals took place at the Australian Open, which at that time usually featured a weaker draw than the other Slams. For instance, the 1979 final was played by Guillermo Vilas and John Sadri. While Vilas is among the all-time greats, Sadri never advanced beyond the fourth round of any other major–where he might have encountered Vilas more often.

One thing seems certain: It won’t be the last meeting for Halep and Ostapenko. All of the pairs I’ve listed played at least once after their Slam final, and with the exception of Wilander-Curren, each one played at least twice more. Halep is only 25, so if she remains near the top of the game and Ostapenko continues climbing the ranks, the pair could aim to match Graf and Zvereva, who met 20 more times after the 1988 French Open final. The loser of today’s match will want to avoid Zvereva’s fate, though: In those 20 matches, the Belarussian won only once.

Bouncing Back From a Marathon Third Set

In this year’s edition of the French Open, we’ve already seen two women’s matches charge past the 6-6 mark in the third set. On Sunday, Madison Brengle outlasted Julia Goerges 13-11 in the decider, and yesterday, Kristina Mladenovic overcame Jennifer Brady 9-7 in the final set. Marathon three-setters aren’t as gut-busting as the five-set equivalent on the men’s tour, yet they still require players to go beyond the usual limit of a tour match.

Do marathon three-setters affect the fortunes of those players that move on to the next round? Back in 2012, I published a study showing that men who win marathon five-setters (that is, matches that go to 8-6 or longer) win fewer than 30% of their following matches, a rate far worse than what we would expect, given the quality of their next opponents. It seems likely that long three-setters wouldn’t have the same effect, especially since many top women are willing to play five-setters themselves.

The numbers bear out the intuition. From 2001 to the 2017 Australian Open, there have been 185 marathon three-setters in Grand Slam main draws, and the winners of those matches have gone on to win 42.2% of their next contests. That’s more than the equivalent number for men, and it’s even better than it sounds.

Players who need to go deep into a third set to vanquish an early-round opponent are, on average, weaker than those who win in straight sets, so many of the marathon women would already be considered underdogs in their next matches. Using sElo–surface-specific Elo, which I recently introduced–we see that these 185 marathon women would have been expected to win only 44.0% of their following matches. There may be a real effect here, but it is a minor one, especially compared to the fortunes of players who struggle through marathon five-setters.
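The comparison between actual and expected results can be sketched as follows. This assumes the standard logistic Elo formula with a 400-point scale; whether sElo uses exactly these constants is an assumption on my part, and the input layout (rating, opponent rating, result) is hypothetical.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that A beats B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def actual_vs_expected(next_matches):
    """Compare observed and Elo-expected win rates.

    `next_matches` is a list of (player_rating, opponent_rating, won)
    tuples, one per follow-up match; this layout is illustrative only.
    """
    n = len(next_matches)
    expected = sum(elo_win_prob(a, b) for a, b, _ in next_matches) / n
    actual = sum(1 for _, _, won in next_matches if won) / n
    return actual, expected

# Made-up ratings: two evenly matched follow-up contests, one won.
print(actual_vs_expected([(1800, 1800, True), (1800, 1800, False)]))
```

If the actual rate sits well below the expected rate, that gap is the candidate "hangover" effect; if the two are close, the winners were simply underdogs to begin with.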

I ran the same algorithm for women’s Slam matches that ended at 7-6, 7-5, and 6-4 or 6-3 in the final set. Since only the US Open uses the third-set tiebreak format, the available sample for that score is limited, which may explain a slightly wacky result. For the other scores, we see numbers that are roughly similar to the marathon findings. Winners tend to be underdogs against their next opponents, but there is little, if any, hangover effect:

```
3rd Set Score  Sample  Next W%  Next ExpW%
Marathons         185    42.2%       44.0%
7-6                56    48.2%       42.2%
7-5               232    43.1%       42.7%
6-4 / 6-3         421    41.6%       43.2%
```

In short: A long match often tells us something about the winner’s chances against her next foe, but it’s something that we already knew. The tight three-setter itself–marathon or otherwise–has little effect on her chances later on. That’s good news for Mladenovic, who will be back on court tomorrow against Sara Errani, an opponent likely to give her another grueling workout.

Measuring the Performance of Tennis Prediction Models

With the recent buzz about Elo rankings in tennis, both at FiveThirtyEight and here at Tennis Abstract, comes the ability to forecast the results of tennis matches. It’s not far-fetched to ask which of these models performs better and, even more interesting, how they fare compared to other ‘models’, such as the ATP ranking system or betting markets.

For this admittedly limited investigation, we collected the (implied) forecasts of five models for the 2016 US Open: FiveThirtyEight, Tennis Abstract, Riles, the official ATP rankings, and the Pinnacle betting market. The first three models are based on Elo. To infer forecasts from the ATP ranking, we use a specific formula (see footnote 1), and for Pinnacle, one of the biggest tennis bookmakers, we calculate the implied probabilities from the quoted odds, minus the overround (see footnote 2).
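Both conversions fit in a few lines. The ranking-points function follows the formula given in footnote 1. The odds function assumes decimal odds and removes the overround by proportional normalization, which is one common approach; the text does not say which method was actually used, so treat that part as an assumption. All function names are illustrative.

```python
def atp_points_win_prob(points_a, points_b, e=0.85):
    """P(a) = a^e / (a^e + b^e), with a and b the players' ranking points."""
    a, b = points_a ** e, points_b ** e
    return a / (a + b)

def implied_probabilities(odds_a, odds_b):
    """Turn two-way decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker's margin (overround) is removed by
    scaling both raw probabilities proportionally.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    book = raw_a + raw_b  # exceeds 1 by the overround
    return raw_a / book, raw_b / book

# Made-up inputs for illustration only.
print(atp_points_win_prob(5000, 1000))
print(implied_probabilities(1.50, 2.70))
```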

Next, we simply compare forecasts with reality for each model, asking: if player A was predicted to be the winner ($P(a) > 0.5$), did he really win the match? Doing that for each match and each model (ignoring retirements and walkovers), we arrive at the following results.

```
Model     % correct
Pinnacle  76.92%
538       75.21%
TA        74.36%
ATP       72.65%
Riles     70.09%
```

What we see here is the percentage of predictions that were actually right. The betting model (based on the odds of Pinnacle) comes out on top, followed by the Elo models of FiveThirtyEight and Tennis Abstract. Interestingly, the Elo model of Riles is outperformed by the predictions inferred from the ATP ranking. Since there are several parameters that can be used to tweak an Elo model, Riles may still have some room left for improvement.
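The share of correct calls is simple to compute. The sketch below assumes each forecast is stored as $P(a)$ for player A together with a 0/1 outcome (1 if A won); that layout and the function name are illustrative.

```python
def share_correct(forecasts, outcomes):
    """Fraction of matches in which the predicted favorite won.

    forecasts: P(a) for player A in each match.
    outcomes:  1 if player A won, 0 otherwise.
    Exact coin-flip forecasts (P(a) == 0.5) are skipped.
    """
    calls = [(p > 0.5) == bool(o)
             for p, o in zip(forecasts, outcomes) if p != 0.5]
    return sum(calls) / len(calls)

# Made-up forecasts: three of four favorites win.
print(share_correct([0.7, 0.6, 0.4, 0.55], [1, 0, 0, 1]))
```

With a sample of 117 predictions, each additional correct call moves the figure by less than one percentage point (Pinnacle's 76.92% is consistent with 90 of 117), so small gaps between models should not be over-read.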

However, just looking at the percentage of correctly called matches does not tell the whole story. In fact, there are more granular metrics to investigate the performance of a prediction model. Calibration, for instance, captures the ability of a model to provide forecast probabilities that are close to the true probabilities. In other words, in an ideal model, we want 70% forecasts to come true in exactly 70% of cases. Resolution measures how much the forecasts differ from the overall average. The rationale is that simply forecasting the expected average will yield a reasonably well-calibrated set of predictions, but it will not be as useful as a method that achieves the same calibration while taking current circumstances into account. In other words, the more extreme (and still correct) forecasts are, the better.

In the following table we categorize the set of predictions into bins of different probabilities and show what percentage of the predictions were correct per bin. This also enables us to calculate Calibration and Resolution measures for each model.

```
Model     50-59%  60-69%  70-79%  80-89%  90-100%  Cal   Res   Brier
538       53%     61%     85%     80%     91%      .003  .082  .171
TA        56%     75%     78%     74%     90%      .003  .072  .182
Riles     56%     86%     81%     63%     67%      .017  .056  .211
ATP       50%     73%     77%     84%     100%     .003  .068  .185
Pinnacle  52%     91%     71%     77%     95%      .015  .093  .172
```

As we can see, the predictions are not always perfectly in line with what the corresponding bin would suggest. Some of these deviations, for instance the fact that only 67% of the Riles model’s 90-100% forecasts were correct, can be explained by small sample size (only three forecasts in that case). However, two cases with larger samples caught my interest: both the Riles and Pinnacle models seem to be strongly (and statistically significantly) underconfident with their 60-69% predictions. In other words, these probabilities should have been higher, because in reality these forecasts came true 86% and 91% of the time, respectively (see footnote 3). For betting aficionados, the fact that Pinnacle underestimates the favorites here may be really interesting, because it could reveal some value, as punters would say. For the Riles model, this might be a starting point for tweaking the model.

In the last three columns Calibration (the lower the better), Resolution (the higher the better), and the Brier score (the lower the better) are shown. The Brier score combines Calibration and Resolution (and the uncertainty of the outcomes) into a single score for measuring the accuracy of predictions. The models of FiveThirtyEight and Pinnacle (for the used subset of data) perform essentially equally well. Then there is a slight gap until the model of Tennis Abstract and the ATP ranking model come in third and fourth, respectively. The Riles model performs worst in terms of both Calibration and Resolution and hence ranks fifth in this analysis.
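For readers who want to compute these measures themselves, here is a minimal sketch of the Brier score and its usual decomposition (Brier = Calibration - Resolution + Uncertainty). The ten equal-width bins and the function names are assumptions made for illustration, not necessarily the exact procedure behind the table. The identity holds exactly only when all forecasts within a bin are identical; otherwise it is an approximation.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast and 0/1 outcome."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def brier_decomposition(forecasts, outcomes, n_bins=10):
    """Return (calibration, resolution, uncertainty) using binned forecasts."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        # Clamp so that a forecast of exactly 1.0 lands in the top bin.
        bins[min(int(f * n_bins), n_bins - 1)].append((f, o))
    calibration = resolution = 0.0
    for members in bins.values():
        k = len(members)
        mean_forecast = sum(f for f, _ in members) / k
        hit_rate = sum(o for _, o in members) / k
        calibration += k * (mean_forecast - hit_rate) ** 2
        resolution += k * (hit_rate - base_rate) ** 2
    return calibration / n, resolution / n, base_rate * (1 - base_rate)

# Made-up forecasts that happen to be perfectly calibrated.
f = [0.75] * 4 + [0.25] * 4
o = [1, 1, 1, 0, 0, 0, 0, 1]
cal, res, unc = brier_decomposition(f, o)
print(brier_score(f, o), cal - res + unc)
```

The figures in the table appear consistent with an uncertainty term of about 0.25, as one would expect when each player is equally likely to be labeled "player A": for FiveThirtyEight, .003 - .082 + .250 = .171.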

To conclude, I would like to show a common visual representation that is used to graphically display a set of predictions. The reliability diagram compares the observed rate of forecasts with the forecast probability (similar to the above table).

The closer one of the colored lines is to the black line, the more reliable the forecasts are. If a forecast line is above the black line, the forecasts are underconfident; if it is below, they are overconfident. Given that we only investigated one tournament and therefore had to work with a low sample size (117 predictions), the big swings in the graph are somewhat expected. Still, we can see that the model based on ATP rankings does a really good job of avoiding overestimation, even though it is known to be outperformed by Elo in terms of prediction accuracy.

To sum up, this analysis shows how different predictive models for tennis can be compared with each other in a meaningful way. Moreover, I hope I could show some of the areas where a model is good and where it’s bad. Obviously, this investigation could go into much more detail by, for example, comparing how well the models do for different kinds of players (e.g., based on ranking), different surfaces, etc. This is something I will save for later. For now, I’ll try to get my sleeping patterns accustomed to the schedule of play for the Australian Open, and I hope you can do the same.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Footnotes

1. $P(a) = a^e / (a^e + b^e)$ where $a$ are player A’s ranking points, $b$ are player B’s ranking points, and $e$ is a constant. We use $e = 0.85$ for ATP men’s singles.

2. The betting market in itself is not really a model, that is, the goal of the bookmakers is simply to balance their book. This means that the odds, more or less, reflect the wisdom of the crowd, making it a very good predictor.

3. As an example, one instance where Pinnacle was underconfident and all other models were more confident is the R32 encounter between Ivo Karlovic and Jared Donaldson. Pinnacle’s implied probability for Karlovic to win was 64%. The other models (except the also underconfident Riles model) gave 72% (ATP ranking), 75% (FiveThirtyEight), and 82% (Tennis Abstract). As it turned out, Karlovic won in straight sets. One factor at play here might be that this was the US Open, where more US citizens are likely to be confident about the US player Jared Donaldson and hence place a bet on him. As a consequence, to balance the book, Pinnacle will lower the odds on Donaldson, which results in higher odds (and a lower implied probability) for Karlovic.

Can Nick Kyrgios Win a Grand Slam?

Today’s breaking news? Former Wimbledon finalist Mark Philippoussis thinks that Nick Kyrgios can win the Australian Open. Hey, it’s almost the offseason. We take our news wherever we can get it.

Still, it’s an interesting question. Is it possible for such a volatile, one-dimensional player to string together seven wins on one of the biggest stages in the sport? Philippoussis–not the most versatile of players himself–reached two Slam finals. A big serve can take you far.

Last year, I published a post investigating the “minimum viable return game,” the level of return success that a player would need to maintain in order to reach the highest echelon of men’s tennis. It’s rare to finish a season in the top ten without winning at least 38% of return points, though a few players, including Milos Raonic, have managed it. When I wrote that article, Kyrgios’s average for the previous 52 weeks was a measly 31.7%, almost in the territory of John Isner and Ivo Karlovic.

Kyrgios has improved since then. In 2016, he won 35.4% of return points, almost equal to Raonic’s 35.9%–and most would agree that Milos had an excellent year. Philippoussis’s career mark was only 34.9%, though Kyrgios would be lucky to play as many tournaments on grass and carpet as Philippoussis did. Still, a sub-36% rate of return points won isn’t usually good enough in today’s game: Raonic was only the third player since 1991 (along with Pete Sampras and Goran Ivanisevic) to finish a season in the top five with such a low rate.

Then again, Philippoussis didn’t say anything about finishing in the top five. The “minimum viable Slam-winning return game” might be different. Looking at all Grand Slam champions back to 1991, here are the lowest single-tournament rates of return points won:

```
Year  Slam             Player               RPW%
2001  Wimbledon        Goran Ivanisevic    31.1%
1996  US Open          Pete Sampras        32.8%
2009  Wimbledon        Roger Federer       33.7%
2002  US Open          Pete Sampras        35.6%
2000  Wimbledon        Pete Sampras        36.6%
2014  Australian Open  Stan Wawrinka       37.0%
1998  Wimbledon        Pete Sampras        37.2%
1991  Wimbledon        Michael Stich       37.4%
2000  US Open          Marat Safin         37.5%
```

Wimbledon is well-represented here, as we might expect. Not so for Kyrgios’s home Slam: Stan Wawrinka’s 2014 Australian Open title is the only time it appears in the top 20, even though it has played very fast in recent years. Every other Melbourne titlist won at least 39.5% of return points. As with year-end top-ten finishes, 38% is a reasonable rule of thumb for the minimum viable level, though on rare occasions, it is possible to come in below that.

The bar is set: Can Kyrgios clear it? 18 months ago, when Kyrgios’s 52-week return-points-won average was below 32%, the obvious answer would have been negative. His current mark above 35% makes the question a more interesting one. To win a Slam, he’ll probably need to return better, but only for seven matches.

The Australian has enjoyed one seven-match streak–in fact, a nine-match run–that would be more than good enough. Combining his title in Marseille and his semifinal showing in Dubai this February, Kyrgios played almost nine matches (he retired with a back injury in the last one) while winning a whopping 41.5% of return points. At 42 of the last 104 Slams, the champion has won return points at a lower rate.

However, February was an aberration. To approximate Kyrgios’s success over the length of a Slam, I looked at his return points won over every possible streak of ten matches. (Most of his matches have been best-of-three, so ten matches is about the same number of points as a Slam title run.) Aside from the streaks involving Marseille and Dubai this year, he has never topped 37% for that length of time.
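The ten-match streak calculation can be sketched as a sliding window. This version pools points across the window (total return points won divided by total return points played) rather than averaging per-match percentages; that choice, like the input layout, is an assumption for illustration.

```python
def rolling_rpw(matches, window=10):
    """Return-points-won rate over every consecutive run of `window` matches.

    `matches` is a chronological list of
    (return_points_won, return_points_played) tuples.
    """
    rates = []
    for start in range(len(matches) - window + 1):
        chunk = matches[start:start + window]
        won = sum(w for w, _ in chunk)
        played = sum(p for _, p in chunk)
        rates.append(won / played)
    return rates

# Made-up match totals: nine ordinary matches, then two strong ones.
toy = [(30, 100)] * 9 + [(50, 100)] * 2
print(rolling_rpw(toy))        # one rate per ten-match window
print(max(rolling_rpw(toy)))   # best ten-match stretch
```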

There’s always hope for improvement, especially for a mercurial 21-year-old in a sport dominated by older men. But the evidence is against him here, as well. Research by falstaff78 suggests that players do not substantially improve their return statistics as they mature. That may seem counterintuitive, since some players clearly do develop their skills. However, as players get better, they go deeper in tournaments and alter their schedules, changing the mix of opponents they face. Two years ago, Kyrgios faced seven top-20 players. This year he played 18. Raonic, who represents an optimistic career trajectory for Kyrgios, faced 26 this season.

Against the top 20–the sorts of Grand Slam opponents a player has to beat to get from the fourth round to the trophy ceremony–Kyrgios has won less than 30% of his career return points. Even Raonic, who has yet to win a Slam himself, has done better, and won 32.6% of return points against top-20 opponents this year.

There’s little doubt that Kyrgios has the serve to win Grand Slams. And once the Big Four retire, I suppose someone will have to win the majors. But even in weak eras, you need to break serve, and at Slams, you typically need to do so many times, and against very high-quality opponents. The evidence we have so far strongly implies that Kyrgios, like Philippoussis before him, will struggle to triumph at a Slam.

Shot-by-Shot Stats for 261 Grand Slam Finals (and More?)

One of my favorite subsets of the Match Charting Project is the ongoing effort–in huge part thanks to Edo–to chart all Grand Slam finals, men’s and women’s, back to 1980. We’re getting really close, with a total of 261 Slam finals charted, including:

• every men’s Wimbledon and US Open final all the way back to 1980;
• every men’s Slam final since 1989 Wimbledon;
• every women’s Slam final back to 2001, with a single exception.

The Match Charting Project gathers and standardizes data that, for many of these matches, simply didn’t exist before. These recaps give us shot-by-shot breakdowns of historically important matches, allowing us to quantify how the game has changed–at least at the very highest level–over the last 35 years. A couple of months ago, I did one small project using this data to approximate surface speed changes–that’s just the tip of the iceberg in terms of what you can do with this data. (The dataset is also publicly available, so have fun!)

We’ve got about 30 Slam finals left to chart, and you might be able to help. As always, we are actively looking for new contributors to the project to chart matches (here’s how to get started, and why you should, and you don’t have to chart Slam finals!), but right now, I have a different request.

We’ve scoured the internet, from YouTube to Youku to torrent trackers, to find video for all of these matches. While I don’t expect any of you to have the 1980 Teacher-Warwick Australian Open final sitting around on your hard drive, I’ve got higher hopes for some of the more recent matches we’re missing.

If you have full (or nearly full) video for any of these matches, or you know of a (preferably free) source where we can find them, please–please, please!–drop me a line. Once we have the video, Edo or I will do the rest, and the project will become even more valuable.

There are several more finals from the 1980s that we’re still looking for. Here’s the complete list.

The Grass is Slowing: Another Look at Surface Speed Convergence

A few years ago, I posted one of my most-read and most-debated articles, called The Mirage of Surface Speed Convergence. Using the ATP’s data on ace rates and breaks of serve going back to 1991, it argued that surface speeds aren’t really converging, at least to the extent we can measure them with those two tools.

One of the most frequent complaints was that I was looking at the wrong data–surface speed should really be quantified by rally length, spin rate, or any number of other things. As is so often the case with tennis analytics, we have only so much choice in the matter. At the time, I was using all the data that existed.

Thanks to the Match Charting Project–with a particular tip of the cap to Edo Salvati–a lot more data is available now. We have shot-by-shot stats for 223 Grand Slam finals, including over three-fourths of Slam finals back to 1980. While we’ll never be able to measure anything like ITF Court Pace Rating for surfaces thirty years in the past, this shot-by-shot data allows us to get closer to the truth of the matter.

Sure enough, when we take a look at a simple (but until recently, unavailable) metric such as rally length, we find that the sport’s major surfaces are playing a lot more similarly than they used to. The first graph shows a five-year rolling average* for the rally length in the men’s finals of each Grand Slam from 1985 to 2015:

* since some matches are missing, the five-year rolling averages each represent the mean of anywhere from two to five Slam finals.
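A rolling average that tolerates missing finals can be sketched as below. It assumes a trailing five-year window and requires at least two available finals, in line with the note above; whether the graph used a trailing or a centered window is not stated, so that detail is an assumption, as are the names used.

```python
def rolling_mean(values_by_year, year, window=5, min_count=2):
    """Mean of the available values in the `window` years ending at `year`.

    `values_by_year` maps year -> average rally length for that final;
    years with no charted final are simply absent from the dict.
    Returns None when fewer than `min_count` finals are available.
    """
    years = range(year - window + 1, year + 1)
    available = [values_by_year[y] for y in years if y in values_by_year]
    if len(available) < min_count:
        return None
    return sum(available) / len(available)

# Made-up rally lengths with gaps (not real match data).
toy = {2001: 3.0, 2003: 5.0}
print(rolling_mean(toy, 2004))  # averages the two finals in 2000-2004
print(rolling_mean(toy, 2008))  # too few finals in 2004-2008
```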

Over the last decade and a half, the hard-court and grass-court slams have crept steadily upward, with average rally lengths now similar to those at Roland Garros, traditionally the slowest of the four Grand Slam surfaces. The movement is most dramatic in the Wimbledon grass, which for many years saw an average rally length of a mere two shots.

For all the advantages of rally length and shot-by-shot data, there’s one massive limitation to this analysis: It doesn’t control for player. (My older analysis, with more limited data per match, but for many more matches, was able to control for player.) Pete Sampras contributed to 15 of our data points, but none on clay. Andres Gomez makes an appearance, but only at Roland Garros. Until we have shot-by-shot data on multiple surfaces for more of these players, there’s not much we can do to control for this severe case of selection bias.

So we’re left with something of a chicken-and-egg problem. Back in the early 90s, when Roland Garros finals averaged almost six shots per point and Wimbledon finals averaged barely two shots per point, how much of the difference was due to the surface itself, and how much to the fact that certain players reached the final? The surface itself certainly doesn’t account for everything–in 1988, Mats Wilander and Ivan Lendl averaged over seven shots per point at the US Open, and in 2002, David Nalbandian and Lleyton Hewitt topped 5.5 shots per point at Wimbledon.

Still, outliers and selection bias aside, the rally length convergence we see in the graph above reflects a real phenomenon, even if it is amplified by the bias. After all, players who prefer short points win more matches on grass because grass lends itself to short points, and in an earlier era, “short points” meant something more extreme than it does today.

The same graph for women’s Grand Slam finals shows some convergence, though not as much:

Part of the reason that the convergence is more muted is that there’s less selection bias. The all-surface dominance of a few players–Chris Evert, Martina Navratilova, and Steffi Graf–means that, if only by historical accident, there is less bias than in men’s finals.

We still need a lot more data before we can make confident statements about surface speeds in 20th-century tennis. (You can help us get there by charting some matches!) But as we gather more information, we’re able to better illustrate how the surfaces have become less unique over the years.