How Much Does Height Matter in Men’s Tennis?

Italian translation at settesei.it

Clearly, height matters. On average, tall players can serve faster and more effectively than can shorter players. And usually, short players who succeed on tour do so by returning and moving better than their taller colleagues. The conventional wisdom is that height is an advantage, but only up to a point. An inch or two above six feet (a range between 185 and 190 cm) is good, but much more than that is too much. No player above 6’4″ (193 cm, Marat Safin) has ever reached No. 1 in the ATP rankings.

While 5’7″ (170 cm) Diego Schwartzman’s surprise run to the US Open quarterfinals has brought this issue to the forefront, pundits and fans talk about it all the time. This is a topic crying out for some basic data analysis, yet as is too often the case in tennis, some really simple work is missing from the conversation. Let’s try to fix that.

When I say “basic,” I really mean it. We all know that tall men hit more aces than short men. But how many? How strong is the relationship between height and, say, first serve points won? In this post, I’ll show the relationship between height and each of nine different stats, from overall records to serve- and return-specific numbers.

For my dataset, I took age-25 seasons from 1998 to 2017 in which the player completed at least 30 tour-level matches. (I used only one season per player so that the best players with the longest careers wouldn’t be weighted too heavily.) That gives us 156 player-seasons, from Hicham Arazi and Greg Rusedski in 1998 up to Schwartzman and Jack Sock in 2017. There aren’t very many players at the extremes, so I lumped together everyone 5’8″ (173 cm) and below and did the same with everyone 6’5″ (196 cm) and above. I also grouped players standing 5’10” with those at 5’9″, because there were only four 5’10” guys in the dataset.

That gives us nine “height levels”: one per inch from 5’8″ to 6’5″ with the exception of 5’10”. (The ATP website displays heights in meters, but its database must record and/or store them in inches, because every height translates to something close to an integer height in inches. For example, no player is listed at 174 cm, or 5’8.5″.) Some individual heights are certainly exaggerated, as tends to happen when male athletes and their organizations report them, but we have to make do with the information available, and we may assume that the exaggerations are fairly consistent.
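A minimal Python sketch of that grouping follows. The function names are hypothetical, chosen for illustration, and this is not necessarily how the original analysis was coded. It assumes heights arrive as whole inches.

```python
def height_level(inches: int) -> int:
    """Map a listed height (whole inches) to one of the nine groups."""
    if inches <= 68:   # 5'8" and below are lumped together
        return 68
    if inches >= 77:   # 6'5" and above are lumped together
        return 77
    if inches == 70:   # the few 5'10" players are folded into 5'9"
        return 69
    return inches


def to_cm(inches: int) -> int:
    """Convert inches to the nearest centimeter for axis labels."""
    return round(inches * 2.54)


# Every height from 5'4" to 6'11" collapses into nine levels.
levels = sorted({height_level(h) for h in range(64, 84)})
print(len(levels), to_cm(levels[0]), to_cm(levels[-1]))
```

The two printed endpoints correspond to the 173 cm and 196 cm groups used on the horizontal axis of the graphs.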

Let’s start with the most basic building block of tennis, the match win. There is a reasonably strong relationship here, although the group of players at 6’1″ is nearly as good as the tallest subset. In each of these graphs, height is given on the horizontal axis in centimeters, from 173 (the 5’8″ and below group) up to 196 (the 6’5″ and higher group).

There is a similar, albeit slightly weaker, relationship when we look at the level of single points. Since a small difference in points results in a larger difference in matches won (at the extreme, winning 55% of points translates to nearly a 100% chance of winning the match) this isn’t a surprise. At the match level, r^2 = 0.38, and at the point level, below, r^2 = 0.27:

(If you’re wondering how all of the averages are above 50%, it’s because the sample is limited to player-seasons with at least 30 matches. A fair number of those matches are against players who aren’t tour regulars, and the regulars–the guys in this sample–win a hefty proportion of those matches.)
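For readers who want to check figures like these, r^2 for a simple linear relationship is the square of the Pearson correlation. The sketch below computes it in plain Python. The inputs are made-up toy numbers, not values from this dataset, and whether the published figures were computed on group averages or on individual player-seasons is an assumption left open here.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Toy data only: four x values and four y values with a loose upward trend.
toy_x = [1, 2, 3, 4]
toy_y = [1, 3, 2, 4]
print(r_squared(toy_x, toy_y))
```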

Serve stats

Now we get to confirm our main assumptions. Taller players are better servers, and the gap is enormous, ranging from 60% of service points won for the shortest players up to nearly 70% for the tallest:

As strong as that relationship is (r^2 = 0.81), the relationship between height and ace rate is stronger still, at r^2 = 0.83:

Aces don’t tell the whole story–the stat with the strongest correlation to height is first serve points won (r^2 = 0.92) as you can see here:

But this is where things start to get interesting. Nearly every inch makes a player more effective on the first serve, but opponents are able to negotiate tall players’ second serves much more successfully. There remains a modest relationship with height (r^2 = 0.18), but it is the weakest of all the stats presented here:

It’s nice to be tall, as anyone who has seen John Isner casually spin a second-serve ace out of the reach of an unlucky opponent can attest. But except in the tallest category, height doesn’t confer much of a second-serve advantage. Players standing 6’4″ (193 cm) win about as many second-serve points as do players at 5’9″ (175 cm). That doesn’t mean that the second serves of the shorter players are just as good–they probably aren’t–but that shorter players tend to possess other skills that they can leverage in second-serve points, which usually last longer. For the purposes of today’s overview, it doesn’t really matter why short players are able to negate the advantage of height on second serve points, just that they are clearly able to do so.

Return stats

We wouldn’t be having this conversation–and David Ferrer wouldn’t be headed to a likely place in the Hall of Fame–if the inverse relationship between height and return effectiveness weren’t nearly as strong as the positive one between height and serving prowess. “Nearly” is the key word here. The relationship between height and overall return points won is almost as strong (r^2 = 0.74) as that of height and overall service points won, but not quite:

Schwartzman is doing more than his part to hold up the left side of that trendline: He is both the shortest player in the top 50 and the best returner. On first serve points, however, there’s only so much the returner can do, so while shorter players still have an advantage, it is less substantial. The relationship here is a bit weaker, at r^2 = 0.63:

It follows, then, that the relationship between height and second-serve return points won must be stronger, at r^2 = 0.77:

The overall and first-serve return point graphs make clear just how much worse the tallest players are than the rest of the pack. The graphs exaggerate it a bit, because I’ve grouped players from 6’5″ all the way up to 6’11”, and the Isners of the sport are considerably less effective than players such as Marin Cilic. Still, we find plenty of confirmation for the conventional wisdom that a height of 6’2″ or 6’3″ (188 cm to 190 cm) allows for players to remain effective on both sides of the ball, while a small increase from there can be a disadvantage.

A note on selection bias

It’s easy to lapse into shorthand and say something like, “shorter players are better returners.” More precisely, what we mean is, “of the players who have become tour regulars, shorter players are better returners.” They have to be, because it is nearly impossible for them to be top-tier servers. If they’ve cracked the top 50, they must have developed a world-class return game. The shorter the player, the more likely this is true.

The same logic is considerably weaker if we descend a couple rungs lower on the ladder of tennis skill. In collegiate tennis, it’s still an advantage to be tall–as Isner can attest–but a player such as 5’10” Benjamin Becker can serve as well as nearly all the competition he will face at that level.

One more note on selection bias

My choice to use each player’s age-25 season might understate the ability of either short or tall players. It is possible that certain playing styles result in earlier or later peaks, meaning that while tall players could be better at age 25, shorter players may be superior at age 28. There are anecdotes that support the argument in both directions, so I don’t think it’s a major issue, but it is one worthy of additional study.

Further reading

A guest post on this blog earlier this year posed the question, Are Taller Players the Future of Tennis?

I didn’t mention serve speed in the above, but here’s a quick study of the fastest serves and their correlation with height.

Podcast Episode 14: Wimbledon Preview

Episode 14 of the Tennis Abstract Podcast is Carl Bialik’s and my Wimbledon preview! We highlight the favorites, the overrated, and the underrated, along with a look at some of the most intriguing matchups. Along the way, we talk about the difficulty of making grass court forecasts, and speculate about how players’ consistency changes with age. Enjoy!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

The Men Are Old, and The Best Men Are Even Older

Italian translation at settesei.it

It’s been one of the main talking points in men’s tennis for years now: The sport is getting older. Every year, a bigger slice of Grand Slam draws is taken up by thirty-somethings, and now, every member of the big four has entered his fourth decade.

I don’t want to belabor the point. But my interest was piqued by an observation from commentator Chris Fowler this week:

When we talk about the sport getting older, this is what we really mean — the best guys are getting up in years.

When we calculate the average age of a draw, or the number of 30-somethings, we weight every player equally. Democratic as it is, it gives most of the weight to guys who are looking for flights home before middle Sunday. As substantial as the overall age shift has been over the last decade, the shift at the top of the game has been even more dramatic.

To quantify the shift, I calculated what I’ll call the “projected winner age” (PWA) of every Wimbledon men’s field from 1991 to 2017. This captures in one number the notion that Fowler is hinting at. We take a weighted average of all 128 men in the main draw, weighted by their chances of winning the tournament, as determined by grass-court Elos at the start of the event.

For example, last year’s Wimbledon men’s draw had an average age of 28.5 years, but a projected winner age of 30.0. We don’t yet know the exact average age of this year’s draw (it looks to be about the same, maybe a tiny bit younger), but we can already say that the PWA is 31.4.
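As a rough illustration of the calculation, PWA is just an average of ages weighted by title probability. The sketch below uses a hypothetical three-player field with invented ages and odds, purely to show the arithmetic. The real version would use all 128 players and Elo-derived title chances.

```python
def projected_winner_age(ages, title_probs):
    """Average age weighted by each player's chance of winning the title."""
    total = sum(title_probs)  # normalize in case the odds don't sum to 1
    return sum(a * p for a, p in zip(ages, title_probs)) / total


# Hypothetical field: invented ages and title odds, for illustration only.
ages = [35, 30, 22]
title_probs = [0.5, 0.3, 0.2]

plain_average = sum(ages) / len(ages)
pwa = projected_winner_age(ages, title_probs)
print(round(plain_average, 1), round(pwa, 1))
```

When the oldest players are also the favorites, as in this toy field, PWA lands above the plain average, which is the pattern described for recent Wimbledon draws.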

An observer a decade ago would’ve thought such a number was insane. Here are the average ages and PWAs for the last 27 Wimbledon men’s events:

As recently as 2011, there wasn’t much difference between average age and PWA. Until 2015, the difference had never been greater than two years. Now, the difference is almost three years, and the point of comparison–average age–is near its own all-time high.

A lot of this, of course, is thanks to the big four. Even as the aging curve has shifted, allowing for late bloomers such as Stan Wawrinka, the biggest stars of the late ’00s–Roger Federer and Rafael Nadal–have declined even less than the revised aging curve would imply. In a sport hungry for new winners, we might have to settle for winners who are newly in their 30s.

Podcast Episode 13: Kvitova vs Barty, Serena vs McEnroe, and Federer vs The Field

In Episode 13 of the Tennis Abstract Podcast, Carl Bialik and I start with Petra Kvitova’s first title back from injury, and her chances as a floater in the Wimbledon draw, along with other returning grass-court threats Victoria Azarenka and Sabine Lisicki.

We move on to what we hope is a sensible, fact-based discussion of John McEnroe’s comments about how Serena Williams would fare on the men’s tour. It runs from about 19:00 to 50:00 — if you want to skip it, we understand. Finally, we talk Federer’s title in Halle, and how much his Wimbledon chances are aided by Wimbledon’s seeding formula, which moves him into the top four. (For more on that, see my post from earlier today.)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Measuring the Impact of Wimbledon’s Seeding Formula

Italian translation at settesei.it

Unlike every other tournament on the tennis calendar, Wimbledon uses its own formula to determine seedings. The grass court Grand Slam grants seeds to the top 32 players in each tour’s rankings, and then re-orders them based on its own algorithm, which rewards players for their performance on grass over the last two seasons.

This year, the Wimbledon seeding formula has more impact on the men’s draw than usual. Seven-time champion Roger Federer is one of the best grass court players of all time, and though he dominated hard courts in the first half of 2017, he still sits outside the top four in the ATP rankings after missing the second half of 2016. Thanks to Wimbledon’s re-ordering of the seeds, Federer will switch places with ATP No. 3 Stan Wawrinka and take his place in the draw as the third seed.

Even with Wawrinka’s futility on grass and the shakiness of Andy Murray and Novak Djokovic, getting inside the top four has its benefits. If everyone lives up to their seed in the first four rounds (they won’t, but bear with me), the No. 5 seed will face a path to the title that requires beating three top-four players. Whichever top-four guy has No. 5 in his quarter would confront the same challenge, but the other three would have an easier time of it. Before players are placed in the draw, top-four seeds have a 75% chance of that easier path.

Let’s attach some numbers to these speculations. I’m interested in the draw implications of three different seeding methods: ATP rankings (as every other tournament uses), the Wimbledon method, and weighted grass-court Elo. As I described last week, weighted surface-specific Elo–averaging surface-specific Elo with overall Elo–is more predictive than ATP rankings, pure surface Elo, or overall Elo. What’s more, weighted grass-court Elo–let’s call it gElo–is about as predictive as its peers for hard and clay courts, even though we have less grass-court data to go on. In a tennis world populated only by analysts, seedings would be determined by something a lot more like gElo and a lot less like the ATP computer.

Since gElo ratings provide the best forecasts, we’ll use them to determine the effects of the different seeding formulas. Here is the current gElo top sixteen, through Halle and Queen’s Club:

1   Novak Djokovic         2296.5  
2   Andy Murray            2247.6  
3   Roger Federer          2246.8  
4   Rafael Nadal           2101.4  
5   Juan Martin Del Potro  2037.5  
6   Kei Nishikori          2035.9  
7   Milos Raonic           2029.4  
8   Jo Wilfried Tsonga     2020.2  
9   Alexander Zverev       2010.2  
10  Marin Cilic            1997.7  
11  Nick Kyrgios           1967.7  
12  Tomas Berdych          1967.0  
13  Gilles Muller          1958.2  
14  Richard Gasquet        1953.4  
15  Stanislas Wawrinka     1952.8  
16  Feliciano Lopez        1945.3

We might quibble with some of these positions–the algorithm knows nothing about whatever is plaguing Djokovic, for one thing–but in general, gElo does a better job of reflecting surface-specific ability level than other systems.

The forecasts

Next, we build a hypothetical 128-player draw and run a whole bunch of simulations. I’ve used the top 128 in the ATP rankings, except for known withdrawals such as David Goffin and Pablo Carreno Busta, which doesn’t differ much from the list of guys who will ultimately make up the field. Then, for each seeding method, we randomly generate a hundred thousand draws, simulate those brackets, and tally up the winners.
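To give a feel for the mechanics, here is a scaled-down sketch: an eight-player, fully random (unseeded) draw using the top eight gElo ratings listed above. It relies on the standard Elo expected-score formula with a 400-point scale, which is an assumption about the exact forecast model. The function names are hypothetical, and it omits the seeding rules that are the whole point of the real comparison.

```python
import random
from collections import Counter


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_bracket(players, ratings, rng):
    """Play out one single-elimination bracket; players are in draw order."""
    field = list(players)
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            a_wins = rng.random() < elo_win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        field = next_round
    return field[0]


def title_odds(ratings, n_draws, seed=0):
    """Generate random draws, simulate each once, and tally the champions."""
    rng = random.Random(seed)
    names = list(ratings)
    wins = Counter()
    for _ in range(n_draws):
        rng.shuffle(names)  # unseeded draw: every bracket slot is random
        wins[simulate_bracket(names, ratings, rng)] += 1
    return {name: wins[name] / n_draws for name in ratings}


# Top eight gElo ratings from the list above (field size must be a power of two).
ratings = {
    "Novak Djokovic": 2296.5,
    "Andy Murray": 2247.6,
    "Roger Federer": 2246.8,
    "Rafael Nadal": 2101.4,
    "Juan Martin Del Potro": 2037.5,
    "Kei Nishikori": 2035.9,
    "Milos Raonic": 2029.4,
    "Jo Wilfried Tsonga": 2020.2,
}
odds = title_odds(ratings, 20000)
for name, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {p:.1%}")
```

Comparing seeding methods would mean replacing the shuffle with a draw generator that places seeds according to each method, then running the same tally for each.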

Here are the ATP top ten, along with their chances of winning Wimbledon using the three different seeding methods:

Player              ATP     W%  Wimb     W%  gElo     W%  
Andy Murray           1  23.6%     1  24.3%     2  24.1%  
Rafael Nadal          2   6.1%     4   5.7%     4   5.5%  
Stanislas Wawrinka    3   0.8%     5   0.5%    15   0.4%  
Novak Djokovic        4  34.1%     2  35.4%     1  34.8%  
Roger Federer         5  21.1%     3  22.4%     3  22.4%  
Marin Cilic           6   1.3%     7   1.0%    10   1.0%  
Milos Raonic          7   2.0%     6   1.6%     7   1.7%  
Dominic Thiem         8   0.4%     8   0.3%    17   0.2%  
Kei Nishikori         9   1.9%     9   1.7%     6   1.9%  
Jo Wilfried Tsonga   10   1.6%    12   1.4%     8   1.5%

Again, gElo is probably too optimistic on Djokovic–at least the betting market thinks so–but the point here is the differences between systems. Federer gets a slight bump for entering the top four, and Wawrinka–who gElo really doesn’t like–loses a big chunk of his modest title hopes by falling out of the top four.

The seeding effect is a lot more dramatic if we look at semifinal odds instead of championship odds:

Player              ATP    SF%  Wimb    SF%  gElo    SF%  
Andy Murray           1  58.6%     1  64.1%     2  63.0%  
Rafael Nadal          2  34.4%     4  39.2%     4  38.1%  
Stanislas Wawrinka    3  13.2%     5   7.7%    15   6.1%  
Novak Djokovic        4  66.1%     2  71.1%     1  70.0%  
Roger Federer         5  49.6%     3  64.0%     3  63.2%  
Marin Cilic           6  13.6%     7  11.1%    10  10.3%  
Milos Raonic          7  17.3%     6  14.0%     7  15.2%  
Dominic Thiem         8   7.1%     8   5.4%    17   3.8%  
Kei Nishikori         9  15.5%     9  14.5%     6  15.7%  
Jo Wilfried Tsonga   10  14.0%    12  13.1%     8  14.0%

There’s a lot more movement here for the top players among the different seeding methods. Not only do Federer’s semifinal chances leap from 50% to 64% when he moves inside the top four, even Djokovic and Murray see a benefit because Federer is no longer a possible quarterfinal opponent. Once again, we see the biggest negative effect to Wawrinka: A top-four seed would’ve protected a player who just isn’t likely to get that far on grass.

Surprisingly, the traditional big four are almost the only players out of all 32 seeds to benefit from the Wimbledon algorithm. By removing the chance that Federer would be in, say, Murray’s quarter, the Wimbledon seedings make it a lot less likely that there will be a surprise semifinalist. Tomas Berdych’s semifinal chances improve modestly, from 8.0% to 8.4%, with his Wimbledon seed of No. 11 instead of his ATP ranking of No. 13, but the other 27 seeds have lower chances of reaching the semis than they would have if Wimbledon stopped meddling and used the official rankings.

That’s the unexpected side effect of getting rankings and seedings right: It reduces the chances of deep runs from unexpected sources. It’s similar to the impact of Grand Slams using 32 seeds instead of 16: By protecting the best (and next best, in the case of seeds 17 through 32) from each other, tournaments require that unseeded players work that much harder. Wimbledon’s algorithm took away some serious upset potential when it removed Wawrinka from the top four, but it made it more likely that we’ll see some blockbuster semifinals between the world’s best grass court players.

Unpredictable Bounces, Predictable Results

Italian translation at settesei.it

These days, the grass court season is the awkward stepchild of the tennis calendar. It takes place almost entirely within a single country’s borders, lasts barely a month, and often suffers from the absence of top players who prefer to rest after the French Open.

The small number of grass court events makes the surface problematic for analysts, as well. The surface behaves differently than hard or clay courts and rewards certain playing styles, so it’s reasonable to assume that many players will be particularly good or bad on grass. But with 90% of tour-level matches contested on other surfaces, many players don’t have much of a track record with which we can assess their grass-court prowess.

I was surprised, then, to find that grass court results are rather predictable. Elo-based forecasts of ATP grass court matches are almost as accurate as hard court predictions and considerably more effective than clay court forecasts. Even when we use “pure” surface forecasts–that is, predicting matches using ratings which draw only on results from that surface–grass court forecasts are a bit better than clay court predictions.

I started with a dataset of the roughly 50,000 ATP matches from 2000 through last week, excluding retirements and withdrawals. As a benchmark, I used official ATP rankings to make predictions for each of those matches. 66.6% of them were right, and the Brier score for ATP rankings over that span is .210. (Brier score measures the accuracy of a set of forecasts by averaging the squared error of each individual forecast, so a lower number is better. To put tennis-specific Brier scores in context, in 2016, ATP rankings had a .208 Brier score, and aggregate betting odds had a .189 Brier score.)
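Both benchmark numbers are easy to compute once each match has a forecast. The sketch below assumes every forecast is expressed as the probability that an arbitrarily labeled "player A" wins, with the outcome coded 1 if A won and 0 otherwise. The four forecasts are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def favorite_win_rate(forecasts, outcomes):
    """Share of matches won by whichever player the forecast favored.

    A forecast of exactly 0.5 has no favorite; this sketch counts it as a miss.
    """
    hits = sum(
        1
        for p, o in zip(forecasts, outcomes)
        if (p > 0.5 and o == 1) or (p < 0.5 and o == 0)
    )
    return hits / len(forecasts)


# Invented forecasts for four matches, for illustration only.
forecasts = [0.7, 0.6, 0.8, 0.55]
outcomes = [1, 0, 1, 1]
print(brier_score(forecasts, outcomes), favorite_win_rate(forecasts, outcomes))
```

For reference, a forecaster who always says 50% scores exactly 0.25, so the tennis figures quoted here sit in a fairly narrow band below that.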

Let’s break that down by surface and compare the performance of ATP rankings, Elo, and surface-specific Elo (sElo). “F%” is the percentage of matches won by the favorite, as determined by that system, and “Br” is Brier score:

Surface  ATP F%  ATP Br  Elo F%  Elo Br  sElo F%  sElo Br  
Hard      67.3%   0.207   68.0%   0.205    68.5%    0.202  
Clay      66.1%   0.211   67.1%   0.211    67.0%    0.213  
Grass     66.0%   0.215   67.6%   0.207    68.5%    0.207

All three rating systems do best on hard courts, and for good reason: official rankings and overall Elo are more heavily weighted toward hard court results than toward clay or grass results. Surface-specific Elo does best on hard courts for a similar reason: more data.

Already, though, we can see the unexpected divergence of clay and grass courts, especially with surface-specific Elo. It’s possible to explain overall Elo’s better performance on grass courts by the presumed similarity between hard and grass–if a player excels on one, he’s probably good on the other, even if he’s horrible on clay. But that doesn’t explain sElo doing better on grass than on clay. There are 3.3 times as many tour-level matches on clay as on grass, so even allowing for the fact that players choose schedules to suit their surface preferences, almost everyone is going to have more results on dirt than on turf. More data should give us better results, but not here.

We can improve our forecasts even more by blending surface-specific ratings with overall ratings. After testing a wide range of possible mixes, it turns out that equally weighting Elo and sElo provides close to the best results. (The differences between, say, 60/40 and 50/50 are extremely small on all surfaces, so even where 60/40 is a bit better, I prefer to keep it simple with a half-and-half mix.) Here are the results for weighted surface Elos for all three surfaces:

Surface      F%      Br  
Hard      68.6%   0.202  
Clay      68.0%   0.207  
Grass     69.8%   0.196
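The blend itself is a one-line weighted average of the two ratings. In the sketch below, the function name and the example ratings are hypothetical, and `surface_weight` is the share given to the surface-specific rating (0.5 for the half-and-half mix).

```python
def blended_rating(overall_elo, surface_elo, surface_weight=0.5):
    """Weighted average of surface-specific and overall Elo ratings."""
    return surface_weight * surface_elo + (1.0 - surface_weight) * overall_elo


# Hypothetical player: strong overall, weaker on grass.
overall, grass = 2100.0, 2000.0
print(blended_rating(overall, grass))        # half-and-half mix
print(blended_rating(overall, grass, 0.6))   # heavier surface weighting
```

For this invented player the two mixes differ by only ten rating points, which hints at why nearby weightings produce nearly identical forecasts.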

Now grass courts are the most predictable of the major surfaces! Even when we use a weighted average of Elo and sElo, grass court forecasts rely on less data than those of the other surfaces–the surface-specific half of the grass court forecasts uses less than one-third the match results of clay court predictions and less than one-fifth the results of hard court forecasts. In fact, we can do at least as well–and perhaps a tiny bit better–with even less data: A 50/50 weighting of grass-specific Elo and hard-specific Elo is just as accurate as the half-and-half mix of grass-specific and overall Elo.

Regardless of the exact formula, it remains striking that we can predict ATP grass court results so accurately from such limited data. Even if one-third of ATP events were played on grass, I still wouldn’t have been surprised if grass court results turned out to be the least predictable. The more a surface favors the server–and it’s hardest to break on grass–the tighter the scoreline will tend to be, introducing more randomness into the end result. Despite that structural tendency, we’re able to pick winners as successfully on grass as on the more common surfaces.

Here’s my theory: Even though there aren’t many grass court events, the conditions at those few tournaments are quite consistent. Altitude is roughly sea level, groundskeepers follow the lead of the staff at Wimbledon, and rain clouds are almost always in sight. Compare that homogeneity to the variety of hard courts or clay courts. The high-altitude hard courts in Bogota are nothing like the slow ones in Indian Wells. The “clay” in Houston is only nominally equal to the crushed brick of Roland Garros. While grass courts are almost identical to each other, clay courts are nearly as different from each other as they are from other surfaces.

It makes sense that ratings based on a uniform surface would be more accurate than ratings based on a wide range of surfaces, and it’s reassuring to find that the limited available data doesn’t cancel out the advantage. This research also suggests a further path to better forecasts: grouping hard and clay matches by a more precise measure of surface speed. If 10% of tour matches are sufficient to make accurate grass court predictions, the same may be true of the slowest one-third of clay courts. More data is almost always better, but sometimes, precisely targeted data is best of all.

Podcast Episode 12: Fed’s Match Points, Vika’s Return, and Wimbledon’s Seeding Formula

In Episode 12 of the Tennis Abstract Podcast, Carl Bialik and I get excited about grass season, both for the glories of the surface itself and for the great players it has brought back, including Roger Federer and Victoria Azarenka. We talk Fed’s surprise loss, the arcane (but worthwhile) Wimbledon seeding formula, the returning WTA stars who will threaten on grass, David Goffin’s injury, and the debut Challenger title for 16-year-old Felix Auger-Aliassime.

We had a lot of fun recording this one. Hope you enjoy it as well!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Just How Aggressive is Jelena Ostapenko?

Italian translation at settesei.it

If you picked up only two stats about surprise Roland Garros champion Jelena Ostapenko, you probably heard that, first, her average forehand is faster than Andy Murray’s, and second, she hit 299 winners in her seven French Open matches. I’m not yet sure how much emphasis we should put on shot speed, and I instinctively distrust raw totals, but even with those caveats, it’s hard not to be impressed.

Compared to the likes of Simona Halep, Timea Bacsinszky, and Caroline Wozniacki, the last three women she upset en route to her maiden title, Ostapenko was practically playing a different game. Her style is more reminiscent of fellow Slam winners Petra Kvitova and Maria Sharapova, who don’t construct points so much as they destruct them. What I’d like to know, then, is how Ostapenko stacks up against the most aggressive players on the WTA tour.

Thankfully we already have a metric for this: Aggression Score, which I’ll abbreviate as AGG. This stat requires that we know three things about every point: How many shots were hit, who won it, and how. With that data, we figure out what percentage of a player’s shots resulted in winners, unforced errors, or her opponent’s forced errors. (Technically, the denominator is “shot opportunities,” which includes shots a player didn’t manage to hit after her opponent hit a winner. That doesn’t affect the results too much.) For today’s purposes, I’m calculating AGG without a player’s serves–both aces and forced return errors–so we’re capturing only rally aggression.
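In code, the rally-only version boils down to a single ratio. The sketch below assumes the per-match totals have already been counted from the charting data. The field names and the sample numbers are hypothetical, not drawn from any charted match.

```python
from dataclasses import dataclass


@dataclass
class RallyTotals:
    """Per-player totals for one match, excluding serves."""
    winners: int
    unforced_errors: int
    forced_errors_induced: int  # errors the opponent was forced into
    shot_opportunities: int     # shots hit, plus shots she couldn't reach


def aggression_score(t: RallyTotals) -> float:
    """Share of shot opportunities that ended the point, one way or another."""
    point_ending = t.winners + t.unforced_errors + t.forced_errors_induced
    return point_ending / t.shot_opportunities


# Invented totals for illustration only.
sample = RallyTotals(winners=30, unforced_errors=25,
                     forced_errors_induced=10, shot_opportunities=250)
print(aggression_score(sample))
```

Note that unforced errors count toward aggression just as winners do: the stat measures how often a player tries to end the point, not how often she succeeds.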

The typical range of this version of AGG is between 0.1–very passive–and 0.3–extremely aggressive. Based on the nearly 1,600 women’s matches in the Match Charting Project dataset, Kvitova and Julia Goerges represent the aggressive end, with average AGGs around .275. We only have four Samantha Crawford matches in the database, but early signs suggest she could outpace even those women, as her average is at .312. At the other end of the spectrum, Madison Brengle is at 0.11, with Wozniacki and Sara Errani at 0.12. In the Match Charting data, there are single-day performances that rise as high as 0.44 (Serena Williams over Errani at the 2013 French Open) and fall as low as 0.06. In the final against Ostapenko, Halep’s aggression score was 0.08, half of her average of 0.16.

Context established, let’s see where Ostapenko fits in, starting with the Roland Garros final. Against Halep, her AGG was a whopping .327. That’s third highest of any player in a major final, behind Kvitova at Wimbledon in 2014 (.344) and Serena at the 2007 Australian Open (.328). (We have data for every Grand Slam final back to 1999, and most of them before that.) Using data from IBM Pointstream, which encompasses almost all matches at Roland Garros this year, Ostapenko’s aggression in the final was 7th-highest of any match in the tournament–out of 188 player-matches with the necessary data–behind two showings from Bethanie Mattek Sands, one each from Goerges, Madison Keys, and Mirjana Lucic … and Ostapenko’s first-round win against Louisa Chirico. It was also the third-highest recorded against Halep out of more than 200 Simona matches in the Match Charting dataset.

You get the picture: The French Open final was a serious display of aggression, at least from one side of the court. That level of ball-bashing was nothing new for the Latvian, either. We have charting data for her last three matches at Roland Garros, along with two matches from Charleston and one from Prague this clay season. Of those six performances, Ostapenko’s lowest AGG was .275, against Wozniacki in the Paris quarters. Her average across the six was .303.

If those recent matches indicate what we’ll see from her in the future, she will likely score as the most aggressive rallying player on the WTA tour. Because she played less aggressively in her earlier matches on tour, her career average still trails those of Kvitova and Goerges, but not by much–and probably not for long. It’s scary to consider what might happen as she gets stronger; we’ll have to wait and see how her tactics evolve, as well.

The Match Charting Project contains at least 15 matches on 62 different players–here is the rally-only aggression score for all of them:

PLAYER                    MATCHES  RALLY AGG  
Julia Goerges                  15      0.277  
Petra Kvitova                  57      0.277  
Jelena Ostapenko               17      0.271  
Madison Keys                   35      0.261  
Camila Giorgi                  17      0.257  
Sabine Lisicki                 19      0.246  
Caroline Garcia                15      0.242  
Coco Vandeweghe                17      0.238  
Serena Williams               108      0.237  
Laura Siegemund                19      0.235  
Anastasia Pavlyuchenkova       17      0.230  
Danka Kovinic                  15      0.223  
Kristina Mladenovic            28      0.222  
Na Li                          15      0.218  
Maria Sharapova                73      0.217  
                                              
PLAYER                    MATCHES  RALLY AGG  
Eugenie Bouchard               52      0.214  
Ana Ivanovic                   46      0.211  
Garbine Muguruza               57      0.210  
Lucie Safarova                 29      0.209  
Karolina Pliskova              42      0.207  
Elena Vesnina                  20      0.207  
Venus Williams                 46      0.205  
Johanna Konta                  31      0.205  
Monica Puig                    15      0.203  
Dominika Cibulkova             38      0.198  
Martina Navratilova            25      0.197  
Steffi Graf                    39      0.196  
Anastasija Sevastova           17      0.194  
Samantha Stosur                19      0.193  
Sloane Stephens                15      0.190  
                                              
PLAYER                    MATCHES  RALLY AGG  
Ekaterina Makarova             23      0.189  
Lauren Davis                   16      0.186  
Heather Watson                 16      0.185  
Daria Gavrilova                20      0.183  
Justine Henin                  28      0.183  
Kiki Bertens                   15      0.181  
Monica Seles                   18      0.179  
Svetlana Kuznetsova            28      0.174  
Timea Bacsinszky               28      0.174  
Victoria Azarenka              55      0.170  
Andrea Petkovic                24      0.166  
Roberta Vinci                  23      0.164  
Barbora Strycova               16      0.163  
Belinda Bencic                 31      0.163  
Jelena Jankovic                24      0.162  
                                              
PLAYER                    MATCHES  RALLY AGG  
Alison Riske                   15      0.161  
Angelique Kerber               83      0.161  
Flavia Pennetta                23      0.160  
Simona Halep                  218      0.160  
Carla Suarez Navarro           31      0.159  
Martina Hingis                 15      0.157  
Chris Evert                    20      0.152  
Darya Kasatkina                18      0.148  
Elina Svitolina                46      0.141  
Yulia Putintseva               15      0.137  
Alize Cornet                   18      0.136  
Agnieszka Radwanska            90      0.130  
Annika Beck                    16      0.126  
Monica Niculescu               25      0.124  
Caroline Wozniacki             62      0.122  
Sara Errani                    23      0.121

(A few of the match counts differ slightly from what you’ll find on the MCP home page. I’ve thrown out a few matches with too much missing data or in formats that didn’t play nice with the script I wrote to calculate aggression score.)

Is Jelena Ostapenko More Than the Next Iva Majoli?

Italian translation at settesei.it

Winning a Grand Slam as a teenager–or, in the case of this year’s French Open champion Jelena Ostapenko, as a just-barely 20-year-old–is an impressive feat. But it isn’t always a guarantee of future greatness. Plenty of all-time greats launched their careers with Slam titles at age 20 or later, but three of the players who won their debut major at ages closest to Ostapenko’s serve as cautionary tales in the opposite direction: Iva Majoli, Mary Pierce, and Gabriela Sabatini. Each of these women was within three months of her 20th birthday when she won her first major, and of the three, only Pierce ever won another.

However, simply comparing her age to that of previous champions understates the Latvian’s achievement. Women’s tennis has gotten older over the last two decades: The average age of a women’s singles entrant in Paris this year was 25.6, a few days short of the record established at Roland Garros and Wimbledon last year. That’s two years older than the average player 15 years ago, and four years older than the typical entrant three decades ago. Headed into the French Open this year, there were only five teenagers ranked in the top 100; at the end of 2004, the year of Maria Sharapova’s and Svetlana Kuznetsova’s first major victories, there were nearly three times as many.

Thus, it doesn’t seem quite right to group Ostapenko with previous 19- and 20-year-old first-time winners. Instead, we might consider the Latvian’s “relative age”—the difference between her and the average player in the draw—of 5.68 years younger than the field. When I introduced the concept of relative age last week, it was in the context of Slam semifinalists, and in every era, there have been some very young players reaching the final four who burned out just as quickly. The same isn’t true of women who went on to win major titles.
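For readers who want to reproduce the idea, here is a minimal sketch of the relative-age calculation as described above: the average age of the draw minus the player's age. The function name and the five-player field are made up for illustration; they are not real draw data, and this is not necessarily how the original numbers were computed.

```python
from statistics import mean


def relative_age(player_age, field_ages):
    """Years by which a player is younger than the average entrant.

    Positive values mean the player is younger than the field;
    negative values mean she is older.
    """
    return mean(field_ages) - player_age


# Hypothetical field of five entrants (ages in years), for illustration only.
hypothetical_field = [19.9, 24.0, 26.5, 28.1, 29.5]

# The youngest entrant in this toy field is about 5.7 years below the average.
print(round(relative_age(19.9, hypothetical_field), 2))
```

With a real draw, `field_ages` would hold every entrant's age at the start of the tournament, so the same player can have a different relative age at each event.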

In the last thirty years, only two players have won a major with a greater relative age than Ostapenko: Sharapova, who was 6.66 years younger than the 2004 Wimbledon field, and Martina Hingis, who won three-quarters of the Grand Slam in 1997 at age 16, between 6.3 and 6.6 years younger than each tournament’s average entrant. The rest of the top five emphasizes Ostapenko’s elite company, including Monica Seles (5.29, at the 1990 French Open) and Serena Williams (5.26, at the 1999 US Open).

Each of those four women went on to reach the No. 1 ranking and win at least five majors–an outrageously optimistic forecast for Ostapenko, who, even after winning Roland Garros, is ranked outside the top ten. By relative age, Majoli, Pierce, and Sabatini are poor comparisons for Saturday’s champion–Majoli and Pierce were only three years younger than the fields they overcame, and Sabatini was only two years younger than the average entrant. By comparison, Garbine Muguruza, who won last year’s French Open at age 22, was two and a half years younger than the field.

Which is it, then? Unfortunately I don’t have the answer to that, and we probably won’t have a better idea for several more years. For most of the Open Era, until about ten years ago, the average age on the women’s tour fluctuated between 21 and 23. Thus, for the overall population of first-time major champions, actual age and relative age are very highly correlated. It’s only with the last decade’s worth of debut winners that the numbers meaningfully diverge. For Ostapenko and Muguruza–and perhaps Victoria Azarenka and Petra Kvitova–we have yet to see what their entire career trajectory will look like. To build a bigger sample to test the hypothesis, we’ll need a few more young first-time Slam winners, something we may finally see with Sharapova and Williams out of the way.

For more post-French Open analysis, here’s my Economist piece on Ostapenko and projecting major winners in the long term. Also at the Game Theory blog, I wrote about Rafael Nadal and his absurd dominance on clay in a fast-court-friendly era.

Finally, check out Carl Bialik’s and my extra-long podcast, recorded Monday, with tons of thoughts on the winners and the fields in general.

Podcast Episode 11: The French Open in Review

In Episode 11 of the Tennis Abstract Podcast, Carl Bialik and I celebrate the 2017 Roland Garros with a lot to say about the perils of forecasting. Could we have seen Rafa’s resurgence coming? What are the career prospects for Latvian sensation Jelena Ostapenko and women’s runner-up Simona Halep? And what on earth are we supposed to conclude about Andy Murray right now?

It’s a super-sized episode, clocking in just under 80 minutes … and we still couldn’t fit everything in. Enjoy!

Click to listen, subscribe on iTunes, find us on Stitcher, or use our feed to get updates on your favorite podcast software.