Uncategorized – Page 2 – Heavy Topspin

Tanking: A Model

The logic behind tanking a part of a tennis match–deliberately playing with less than maximum effort–is simple. If you have fallen behind early in the first set, you could choose to take it easy for the rest of the set. You probably would’ve lost the set anyway, and having semi-rested for several games, you’ll have more mental and physical energy to draw upon for the rest of the match.

By the end of this post, we’ll have some idea how useful that extra energy must be to make tanking worthwhile. It will take a few steps to get there.

The scenario

Consider some sample numbers to make this more concrete. Take two evenly matched men, each of whom wins 70% of his service points. Maybe they are powerful–though not one-dimensional–servers on a reasonably fast surface. Winning seven out of ten service points means that nine out of ten games are holds of serve, so in our hypothetical match, breaks are at a premium.
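The step from 70% of service points to roughly 90% of service games can be checked with the standard closed-form result for a single game. The sketch below assumes every point is won independently with the same probability, which is a modeling simplification, and the function name is illustrative rather than taken from the code behind this post.

```python
def hold_probability(p: float) -> float:
    """Chance the server wins a game if each point is won independently with probability p."""
    q = 1.0 - p
    # Win the game to love, 15, or 30: four points won with 0, 1, or 2 points lost first.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each), then win two points in a row before the returner does.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


print(round(hold_probability(0.70), 3))  # about 0.90
```

Under these assumptions a 70% server holds about nine times in ten, in line with the figure quoted above.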

Imagine that the match opens with one of those rare breaks. Given the 90% hold rate for both players, the man who got his nose in front has improved to an 83% chance of winning the set. In the simplest formulation, the player who has fallen behind faces two options for the balance of the set:

Continue playing at his usual level despite the low chance of winning, or
Take it easy, as the set is probably lost.
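A figure close to 83% can be reproduced with a short recursion over game scores. This is a sketch under stated assumptions, not the author's own code: games are independent, both players hold 90% of the time, a tiebreak at 6-6 is a coin flip between equal players, and the player who broke in the opening game serves next. The function name and parameters are hypothetical.

```python
from functools import lru_cache


def set_win_prob(hold_a, hold_b, a_games=0, b_games=0, a_serves_next=True, tiebreak_a=0.5):
    """Chance that player A wins a tiebreak set from the given game score."""

    @lru_cache(maxsize=None)
    def p(a, b, a_serving):
        if a == 6 and b == 6:
            return tiebreak_a  # assumed tiebreak win probability for A
        if a >= 6 and a - b >= 2:
            return 1.0  # A has won the set (6-4 or better, or 7-5)
        if b >= 6 and b - a >= 2:
            return 0.0  # B has won the set
        # A wins the next game by holding serve or by breaking B.
        win_game = hold_a if a_serving else 1.0 - hold_b
        return (win_game * p(a + 1, b, not a_serving)
                + (1.0 - win_game) * p(a, b + 1, not a_serving))

    return p(a_games, b_games, a_serves_next)


# Leader is up 1-0 after breaking in the opening game and serves next.
print(round(set_win_prob(0.9, 0.9, 1, 0, True), 3))  # roughly 0.83
```

With equal hold rates and no early break, the same function returns 50%, which is a useful sanity check on the recursion.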

The tank

In the continue-as-usual scenario, our early front-runner has an 83% chance of winning the set. If both players continue playing at the same level for the duration of a best-of-five-sets match, that translates to a 62% chance of winning the match, leaving our player who decided not to tank with a 38% chance. (I’m using best-of-five because in a longer match, it’s more likely that a player can recover from losing the first set. That makes tanking a more plausible strategy.)
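The conversion from a first-set probability to a match probability is simple enough to sketch. The version below assumes that every set after the first is a 50/50 proposition, which is what "continue playing at the same level" implies for two evenly matched players. The helper names are illustrative.

```python
from math import comb


def at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_win_prob(first_set: float, later_set: float = 0.5, sets_to_win: int = 3) -> float:
    """Match win chance given P(win first set) and a constant P(win each later set)."""
    remaining = 2 * sets_to_win - 2  # at most four more sets in a best-of-five
    # Imagining that all remaining sets are played does not change who wins the match.
    after_winning_first = at_least(sets_to_win - 1, remaining, later_set)
    after_losing_first = at_least(sets_to_win, remaining, later_set)
    return first_set * after_winning_first + (1 - first_set) * after_losing_first


leader = match_win_prob(0.83)
print(round(leader, 2), round(1 - leader, 2))  # 0.62 0.38
```

The `sets_to_win` argument can be set to 2 to see how much less room for recovery a best-of-three match leaves.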

To evaluate the take-it-easy scenario, we need to pile on more assumptions. How much worse does a tanking player play? You will probably disagree with my estimates of the point-level costs and benefits of tanking, which is fine. I don’t have strong opinions about them, and they don’t matter much to the conclusions below. Consider these numbers just one illustration of the model. As soon as the trailing player decides to take it easy, let’s say his numbers fall to the following:

20% return points won (instead of 30%)
65% serve points won (instead of 70%)

That’s not a very good player–picture an unmotivated Nick Kyrgios. Down a break after the first game and playing a newly lackadaisical brand of tennis, he has a mere 1.3% chance of coming back to win the set. We’re simplifying quite a bit here, in large part because a player could always decide midway through the set to pause the tank, perhaps raising his game if he reaches 15-30 or better on his opponent’s serve. But again, this is just a model, and one I’m trying to keep from getting too complex.

The trade-off

The tanking player has, according to these assumptions, chosen to decrease his chance of winning the first set from 17% to a tick above 1%. If he received no benefit from conserving energy and both players returned to their 90% hold rate at the beginning of the next set, the tanking player’s chances of winning the match have fallen from 38% to 32%.

Clearly that’s not the whole story. A player who chooses to conserve energy at the expense of their immediate fortunes must assume that there are benefits coming later.

To further simplify, let’s assume that the tanking player loses the first set. Here are his chances of winning the match based on a few possible post-tank levels he could sustain:

70% serve points won (SPW), 30% return points won (RPW): 31.3% (no benefit from tanking)
71% SPW, 32% RPW: 46.3%
72% SPW, 34% RPW: 61.9%
73% SPW, 36% RPW: 75.8%
74% SPW, 38% RPW: 93.3%

Remember that our tanking player has only a 38% chance of winning the match after sustaining the opening-game break, so the second scenario, in which his level improves to 71% SPW and 32% RPW, represents an improvement. That would be hardly noticeable over the course of three or four more sets. If the remainder of the match spanned 200 more points, it would mean winning 103 of them, instead of 100. If conserving energy early on confers even bigger benefits, it starts to look like a no-brainer.
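The 103-versus-100 comparison follows from simple arithmetic if half of the remaining points are played on each player's serve. That even split is an assumption, as is the function name in this sketch.

```python
def expected_points_won(spw: float, rpw: float, total_points: int, serve_share: float = 0.5) -> float:
    """Expected points won given serve/return win rates and the share of points played on serve."""
    serve_points = total_points * serve_share
    return_points = total_points - serve_points
    return spw * serve_points + rpw * return_points


baseline = expected_points_won(0.70, 0.30, 200)
boosted = expected_points_won(0.71, 0.32, 200)
print(round(baseline), round(boosted))  # 100 103
```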

Complications

Of course, it’s never this simple. The leading player might realize that his opponent was tanking and conserve some energy himself. The tanking player could have a hard time resuming his usual level (or better) at the right moment. Some points are more important than others, so the difference between 100 and 103 might not matter. Most matches are best-of-three, and giving up on the opening set in a shorter match is much more dangerous.

Those qualifications shouldn’t stop us from considering what tanking has to offer. While players don’t tank sets as often as they used to, there’s surely some energy-conservation benefit, and extra energy must have some value for the remainder of the match, right? I have no idea whether that value is equivalent to one point per hundred or something much higher or lower, but surely it’s possible that in some situations, it’s worth it.

The illustration I’ve used shows that the value of the extra energy doesn’t have to be that substantial to make tanking a plausible tactic. The small margins that determine the outcome of tennis matches mean that we’ll rarely recognize when a player is taking advantage of a tank, but those margins also mean that a small edge could be enough to make it worthwhile.

—

All calculations of game, set, and match probabilities are based on my publicly-available code.

The Most Predictable Woman in Tennis

Italian translation at settesei.it

Caroline Wozniacki is set in her habits. In the eight service games of her first round match in Charleston against Laura Siegemund last week, she followed a strict pattern: wide serve on the first point, T serve on the second, T on the third, and wide on the fourth. Aside from two missed first serves that weren’t classified as “wide” or “T”, that’s 30 points. Wozniacki served in her preferred direction on all 30. From the fifth point in each game, her choices were closer to random.

This is nothing new for the Danish former No. 1. Against Monica Niculescu in the Miami third round, she had 11 service games. In the first four points of each, she followed the exact pattern: wide/T/T/wide. 44 service points, and zero deviations from the first-serve script. The Match Charting Project (MCP) has logged over 2,600 WTA matches, and no other player has ever gone an entire match without varying their first-four-point serve direction. Wozniacki has done so 17 times.

Measuring serve predictability

Just how extreme is Caro’s reliability, and how much does she differ from the competition? Let’s take a look.

I classified each first serve as either “wide” or “T.” MCP coding provides for three categories (wide, body, and T), and where a serve is coded as “body,” I used the returner’s first shot as an indication of the serve direction. That’s not perfect, because some returners will run around a weak serve, but it gets us pretty close. I excluded unreturned body serves and body serve faults. Here is Caro’s percentage of wide serves for each point of over 1,000 charted service games:

Point  Wide%  
1st    82.8%  
2nd    17.4%  
3rd    16.7%  
4th    78.5%  
5th    52.3%  
6th    46.8%  
deuce  48.0%  
ad     50.6%

Wozniacki only varies her first serve direction on the first four points about once every five deliveries. If we convert the first four rates (82.8%, 17.4%, 16.7%, and 78.5%) to the frequency with which she hit her favored serve (82.8%, 82.6%, 83.3%, 78.5%), we get an average–call it FSP, for First Serve Predictability–of 81.8%. Only two other women with at least ten charted matches, Kateryna Kozlova and Justine Henin, exceed 70%, and Henin’s repetition has more to do with her preference for the T serve in all situations.
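The FSP calculation can be written in a few lines. This sketch reads "favored serve" as whichever direction the player chooses more than half the time on a given point, which matches the conversion described above. The function name is illustrative.

```python
def first_serve_predictability(wide_rates):
    """Average frequency of the favored first-serve direction across the given points."""
    # On each point the favored direction is wide if wide% > 50%, otherwise T.
    favored = [max(w, 1 - w) for w in wide_rates]
    return sum(favored) / len(favored)


# Wozniacki's wide-serve rates on the first four points of a service game.
caro = [0.828, 0.174, 0.167, 0.785]
print(round(first_serve_predictability(caro), 3))  # 0.818
```

A server who picks wide or T at random on every point would score 50% on this measure, and one who never deviates from a script would score 100%.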

Amazingly, Caro’s overall numbers obscure just how often she uses the pattern these days. The MCP has 52 Wozniacki matches dating from the beginning of 2017, and that more recent subset gives us an FSP of 94.0%. I suspect that the more extreme number is a better representation of Woz’s tendencies, because the more recent data includes a broader selection of matches, including contests against weaker opponents. The MCP is not a random sample, and older matches tend to be more notable ones involving higher-quality opponents.

Wozniacki’s not-really-peers

Let’s take a look at some of the other women who are more predictable than average. The median WTAer with at least 10 charted matches in the MCP dataset has an FSP of about 58%, meaning that they might prefer one direction to the other, or that they often aim for a right-hander’s backhand, but that they vary the first serve delivery quite a bit.

Here are the 20 who change direction the least. For each player, the following table shows the frequency with which they hit a wide serve on each of the first four points, their FSP on the first four points–FSP(1-4)–and their FSP on points from the fifth onward, FSP(5+).

Player         1st  2nd  3rd  4th  FSP(1-4)  FSP(5+)  
Wozniacki      83%  17%  17%  79%       82%      52%  
Kozlova        60%  35%  10%  73%       72%      64%  
Henin          38%  11%  57%  25%       71%      66%  
Vikhlyantseva  92%  46%  38%  63%       68%      54%  
Petkovic       74%  72%  36%  38%       68%      58%  
Vondrousova    15%  63%  30%  54%       68%      68%  
Brengle        82%  67%  53%  68%       67%      56%  
Clijsters      86%  32%  61%  52%       67%      56%  
Stephens       76%  21%  53%  46%       65%      62%  
Voegele        71%  35%  59%  34%       65%      60%  
Dementieva     76%  54%  71%  60%       65%      60%  
Dodin          58%  14%  43%  43%       65%      64%  
Li Na          28%  33%  52%  33%       65%      56%  
Kerber         43%  78%  56%  67%       65%      64%  
Doi            21%  60%  64%  56%       65%      63%  
Vandeweghe     35%  35%  62%  66%       65%      55%  
A Beck         59%  24%  45%  33%       64%      61%  
Sanchez V      43%  77%  42%  65%       64%      64%  
Buzarnescu     19%  39%  58%  46%       64%      59%  
Sevastova      73%  58%  37%  60%       64%      55%

Only two servers, Kozlova and Natalia Vikhlyantseva, follow the general principle of Wozniacki’s wide/T/T/wide pattern. Many of these players, like Henin, prefer wide or T serves at all times, and others, including Andrea Petkovic and Coco Vandeweghe, often opt for one type of serve on the first two points and another on the next two. It’s tough to see much in the patterns among these players, especially since most of them are closer to the median level of predictability than they are to Wozniacki’s extreme consistency.

I included the final column, FSP(5+), to illustrate another aspect of Caro’s uniqueness. While she closely follows her script for the first four points, she reverts to almost 50/50 wide and T serves after that–even in the more extreme 2017-present subset of matches. Many of the other players on this list do not. Angelique Kerber, for instance, is a near Woz-level lock to go wide in the ad court late in games. She hits wide first serves more than 80% of the time at 40-30 or 30-40, and 73% of the time at AD-40 or 40-AD. Henin also stuck with her preferences on higher-leverage points.

Equilibrium

For whatever reason, Wozniacki is comfortable with this pattern, and is confident that it works. Or, at least, that it doesn’t work against her. It’s not a secret–the sequence came to my attention after Siegemund’s coach pointed it out during an on-court coaching visit in Charleston.

Tennis is full of decisions like this: when to follow a pattern, and how often to vary things to keep an opponent from getting too comfortable. On this week’s podcast, Carl and I speculated about how often a player would need to deploy an underarm serve in order to force a returner out of position. If Wozniacki’s tendencies are any indication, the answer is: not very often. The mere fact that Caro could serve the other direction was apparently enough to prevent Niculescu or Siegemund from pouncing on her first serves, even if Woz stuck to the script from the first game to the last.

I realize I’ve left a lot of questions unanswered. Does Caro win more first serve points when she varies her delivery more? Does she follow any similar patterns with her second serve? Does she use the results of the first four points to help decide the direction of the following points? Are there particular types of players who force her to mix things up–as Madison Keys did in the Charleston final, with her aggressive return tactics?

Keep an eye on this space–maybe I’ll be able to offer some answers. In the meantime, I hope you derive some extra enjoyment the next time you watch a Wozniacki match, knowing in advance where her next serve will go. Or, perhaps, you’ll witness one of the rare occasions when the most predictable woman in tennis goes off-script.

Thanks to Kees for charting the Siegemund match, passing along the on-court coaching conversation, and providing the impetus for this post.

WTA Aging Patterns and Bianca Andreescu’s Future

Italian translation at settesei.it

Bianca Andreescu is really good, right now. Still a few months away from her 19th birthday, she has collected her first Premier Mandatory title, beaten a few top-ten players (including Angelique Kerber twice), and climbed to 7th in the Elo ratings. She is the only teenager in the WTA top 30 and one of only five in the top 100.

The burning question about Andreescu isn’t how good she is, it’s how good she could become. It’s easy to look at the best 18-year-old in the game and imagine her becoming the best 19-year-old, best 20-year-old, and so on, until she’s at her peak age and she’s the best player in the world, period. As the sport in general has gotten older, teenage champions have become rarer, so she seems all the more destined for success. But it isn’t that simple: Prospects get injured, opponents learn how to beat them, they peak early and fizzle out. Tennis history is littered with teen starlets who failed to reach their potential.

Building an aging curve

Let’s start with the basics. What is the trajectory of the typical WTA career? Answering that question requires a whole slew of assumptions, so keep in mind that this is approximate. I found every player born between 1960 and 1989* who played at least five full** seasons, a total of about 500 players. For each one, I calculated her year-end Elo for every full season she played, as well as the difference between that year’s Elo and her peak year-end Elo.

* I wish we knew more about players born in the 1990s, since their experience is most relevant to today’s teens, but many of them have yet to reach their peaks, whenever that will be.

** I’ve defined a full season very broadly, as 20 or more completed matches at the ITF $50K level or higher.

For every player, then, we have an idea of how they aged. To get our bearings, let’s look at a couple of players with unique aging trajectories: Martina Navratilova and Venus Williams:

(Martina’s peak was about 50 Elo points higher than Venus’s, but I set them equal to each other for the purpose of this graph.)

Venus peaked at age 21 and had her last all-time-great-level season at 23, while Martina’s peak came at age 30. There’s more than one way to amass a Hall of Fame career, and it’s important to keep in mind that “average” aging patterns hide a lot of more extreme possibilities.

The usual route

When we take Venus’s and Martina’s trajectories and average them with the other 500-or-so players in our dataset, here’s what we get:

The most common peak age is 24, with 23 a very close second. In the above graph, I set peak Elo at 1,820, the average peak Elo of the players I looked at, but the absolute number isn’t important. The typical player who completes a full season at age 18 is about 70 Elo points away from her peak. There isn’t much downward movement in the 20s; at age 30, those players who are still active are only 43 Elo points below their peak.

There’s a poison pill in that last sentence that is difficult to avoid when analyzing aging patterns–we only know what happens to those players who are still active. That’s even more troublesome for young players. Venus, for instance, improved 211 Elo points between her year-end finish as an 18-year-old and her best year-end rating. Kerber, on the other hand, wasn’t even good enough to show up in the ratings until she was 19. If we were able to estimate Kerber’s level at that age, it would probably be very low. Thus, forecasting an 18-year-old using this dataset may understate the degree to which a player can improve.

Changing times

Using the numbers above, we can make a baseline estimate. Those players who had year-end Elo ratings as 18-year-olds typically improved about 70 more points before hitting their peak. Through her Indian Wells title, Andreescu is rated at 2,017, giving us an estimated peak of 2,087. That’s good enough for 2nd place on the current list and just inside the top 50 of all time (as measured by the player’s best year-end Elo). Still, that seems a bit modest–it doesn’t represent much of an additional improvement for a player who has come so far in just a few months.

The forecast is slightly more optimistic if we narrow our view to players born in the 1980s. It seems like a reasonable thing to do, because Andreescu is facing an era with older competition, more like the last decade than, say, the one faced by players born in the 1960s. Our dataset shrinks to about 200 players, and those players do show a bigger gap between their 18-year-old Elo rating and their career peak. The difference is about 83 points, giving Bianca a revised estimated peak of 2,100–exactly even with Simona Halep, who currently tops the list, and around the 40th best of all time.

The biggest difference between the overall aging curve and the curve for players born in the 1980s isn’t the timing of the peak, it’s the duration. I looked at several age cohorts, and the typical WTA peak is always at 23 or 24 years old. But there’s more to it than that. Take a look at the trajectory of players born in the 1960s compared to those born in the 1980s:

For the more recent generation of players, there is little difference between age 23 and 28 or 29. Even into the early 30s, those players who stick around are competing almost as well as they did at their peak.

Bespoke for Bianca

Aging patterns in women’s tennis have changed, so it’s important to look at a relevant era when there’s enough data to do so. But what if that’s not the best way to narrow our view? As I’ve noted, the average peak Elo of the 500 players in our dataset is 1820. Bianca is already 200 points higher than that. What if the best players are qualitatively different as well as quantitatively superior?

Here are 20 players whose year-end Elo ratings at age 18 were similar to Andreescu’s current rating, the ten closest who were higher and the ten closest who were lower:

Player                     Birth Year  18yo Elo  Peak Elo  
Jelena Dokic                     1983      2110      2110  
Conchita Martinez                1972      2085      2191  
Arantxa Sanchez Vicario          1971      2084      2314  
Hana Mandlikova                  1962      2071      2160  
Iva Majoli                       1977      2067      2067  
Belinda Bencic                   1997      2066      2066  
Caroline Wozniacki               1990      2059      2194  
Lindsay Davenport                1976      2053      2353  
Nicole Vaidisova                 1989      2043      2121  
Manuela Maleeva Fragniere        1967      2035      2059  
---                                                        
Mary Pierce                      1975      2008      2161  
Ana Ivanovic                     1987      1994      2133  
Victoria Azarenka                1989      1986      2270  
Anke Huber                       1974      1980      2072  
Magdalena Maleeva                1975      1961      2024  
Agnieszka Radwanska              1989      1957      2116  
Mary Joe Fernandez               1971      1955      2110  
Anna Kournikova                  1981      1954      2020  
Kathy Rinaldi Stunkel            1967      1947      1947  
Justine Henin                    1982      1946      2411

Both halves of the list include some of the greatest of all time: Arantxa Sanchez Vicario, Lindsay Davenport, Victoria Azarenka, and Justine Henin. Yet several of these players failed to build on their early-career peaks, such as Jelena Dokic and (so far, at least) Belinda Bencic.

The average 18-year-old year-end Elo of these 20 players is 2,018, virtually the same as Andreescu’s post-Indian Wells level. The average peak year-end Elo of these 20 players is 2,145, a 127-point improvement and a more optimistic forecast than anything we’ve seen so far. That rating would put her a tick above Ana Ivanovic at her best, a bit below Hana Mandlikova at hers, and just inside the 30 greatest of all time.
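The two averages can be recomputed directly from the table of comparable players. The ratings below are transcribed from that table; the variable names are illustrative.

```python
from statistics import mean

# (year-end Elo at age 18, peak year-end Elo), transcribed from the table above.
comparables = {
    "Jelena Dokic": (2110, 2110),
    "Conchita Martinez": (2085, 2191),
    "Arantxa Sanchez Vicario": (2084, 2314),
    "Hana Mandlikova": (2071, 2160),
    "Iva Majoli": (2067, 2067),
    "Belinda Bencic": (2066, 2066),
    "Caroline Wozniacki": (2059, 2194),
    "Lindsay Davenport": (2053, 2353),
    "Nicole Vaidisova": (2043, 2121),
    "Manuela Maleeva Fragniere": (2035, 2059),
    "Mary Pierce": (2008, 2161),
    "Ana Ivanovic": (1994, 2133),
    "Victoria Azarenka": (1986, 2270),
    "Anke Huber": (1980, 2072),
    "Magdalena Maleeva": (1961, 2024),
    "Agnieszka Radwanska": (1957, 2116),
    "Mary Joe Fernandez": (1955, 2110),
    "Anna Kournikova": (1954, 2020),
    "Kathy Rinaldi Stunkel": (1947, 1947),
    "Justine Henin": (1946, 2411),
}

avg_age18 = mean(elo18 for elo18, _ in comparables.values())
avg_peak = mean(peak for _, peak in comparables.values())
print(round(avg_age18), round(avg_peak))  # 2018 2145
```

Swapping in a different set of comparable players, for example only those born in the 1980s, is a one-line change to the dictionary.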

This is heady stuff for a teenager, but after watching her ascent this year, it’s tough to bet against her. And as long as Kerber is in the draw, apparently, we can expect Andreescu to keep winning.

Belinda Bencic Won a Historically Difficult Title, Just Not Last Week

Italian translation at settesei.it

Belinda Bencic is back among the WTA elites. Last week in Dubai, she won her first Premier-level title since 2015, knocking out four top-ten players in the process. They were hardly dominant victories, with all four going to deciding sets and two of the four culminating in final-set tiebreaks, but there is no question that the 21-year-old Swiss is once again a threat at the tour’s biggest events.

Her string of top-ten victories leaves us to wonder how her title stacks up against similar feats in the past. Most relevant is the path Bencic took to her last Premier title, the 2015 Canadian Open. Four years ago in Toronto, she defeated four members of the top six, including then-top-ranked Serena Williams in the semi-final and Simona Halep in the championship match. Even the two lower-ranked opponents she faced that week were dangerous players then ranked in the top 25, Eugenie Bouchard and Sabine Lisicki. Those two presented more serious challenges than Bencic’s first two matches last week against Lucie Hradecka and Stefanie Voegele.

Spoiler alert: Toronto was the tougher path. It wasn’t the most difficult of all time, but it’s in the conversation. Bencic’s Dubai title surely wasn’t easy, but it wasn’t quite as unusual as last weekend’s press made it out to be.

Quantifying path difficulty

This is something we’ve done before. I’ve written several articles comparing the quality of opposition faced in slams, particularly as it applies to the ATP’s big three. It’s more complicated to compare all WTA events, in part because there are so many different levels of tournament, and the categorizations have changed over the years. But we can wave some of that aside for today’s purposes.

Here’s the simple algorithm to measure the difficulty of a player’s path to a title:

Pick a standard Elo rating for the type of tournament won. (In this case, we’re using 1900 for hard-court wins. We’d use lower numbers for clay and grass, but it gets complicated, and it’s more practical for today’s purposes to focus solely on hard-court events.)
Find the surface-weighted Elo rating of each opponent she played in the tournament.
For each opponent, calculate the odds using the standard Elo rating and the opponent’s Elo rating.
Calculate the difficulty for each match as one minus the odds in the previous step.
Sum the single-match difficulties.
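The steps above can be sketched in a few lines. This is an illustration, not the code used for the tables in this post: it reads "odds" as the win probability from the conventional 400-point logistic Elo formula, which is an assumption, and the six opponent ratings in the example are hypothetical apart from the 2054 figure quoted later for Halep.

```python
def win_probability(own_elo: float, opp_elo: float) -> float:
    """Conventional Elo expectation: chance that `own_elo` beats `opp_elo` (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((opp_elo - own_elo) / 400.0))


def path_difficulty(opponent_elos, standard_elo: float = 1900.0) -> float:
    """Sum over opponents of one minus the standard player's chance of winning each match."""
    return sum(1.0 - win_probability(standard_elo, opp) for opp in opponent_elos)


# Hypothetical six-match draw; only the last rating (2054) appears in the text.
example_path = [1750, 1800, 1900, 1950, 2000, 2054]
print(round(path_difficulty(example_path), 2))
```

An opponent rated exactly 1900 contributes 0.5 to the total, so a six-match path against standard-level opposition would score 3.0.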

In the grand slam exercises I’ve done in the past, I’ve taken a final step of normalizing the results so that an average major title is exactly 1.0. Here, the idea of ‘average’ is more nebulous, so we’ll leave our results un-normalized.

The average difficulty of a hard-court title (excluding majors and year-end championships) is about 1.8. Bencic’s 2015 Toronto run was 3.64, and her path last week was 3.01.

It’s hotter in Miami (and Indian Wells)

One of the variables that influences path difficulty is the number of matches. Bencic played six last week (as she did at the 2015 Canadian Open), but the top eight seeds played only five. At Indian Wells and Miami, the top 32 seeds play up to six matches, but those might be expected to present more challenges than Bencic’s six in Dubai, since the round-of-64 opponent has already won a match.

Certainly it has turned out that way. Here are the top ten most difficult hard-court WTA title paths since 2000:

Year  Event          Winner             Matches  Difficulty  
2010  Miami          Kim Clijsters            6        3.80  
2011  Miami          Victoria Azarenka        6        3.78  
2007  Miami          Serena Williams          6        3.65  
2015  Canadian Open  Belinda Bencic           6        3.64  
2012  Indian Wells   Victoria Azarenka        6        3.59  
2018  Cincinnati     Kiki Bertens             6        3.54  
2000  Miami          Martina Hingis           6        3.46  
2002  Miami          Serena Williams          6        3.45  
2008  Miami          Serena Williams          6        3.37  
2013  Miami          Serena Williams          6        3.35

Seven of the ten are from Miami, an event with a grand-slam-like field. Indian Wells is similar, but featured a weaker draw for most of the 21st century because Serena and Venus Williams chose not to play there. Bencic’s Toronto run is one of only two in the top ten outside of the March sunshine swing. The other is Kiki Bertens’s path to last year’s Cincinnati title, in which she also defeated Halep, Petra Kvitova, and Elina Svitolina, albeit not in quite the same order as Bencic did last week.

Also hot in Dubai

I calculated title difficulty for about 600 hard-court champions going back to 2000. Bencic’s Dubai path doesn’t register among the very most challenging, but it still stands above most of the pack. Here are the next 25 toughest routes, including every path rated a 3.0 or above:

Year  Event         Winner              Matches  Difficulty  
2016  Wuhan         Petra Kvitova             6        3.32  
2000  Indian Wells  Lindsay Davenport         6        3.32  
2014  Beijing       Maria Sharapova           6        3.30  
2008  Olympics      Elena Dementieva          6        3.27  
2009  Indian Wells  Vera Zvonareva            6        3.27  
2007  Indian Wells  Daniela Hantuchova        6        3.23  
2002  Filderstadt   Kim Clijsters             5        3.23  
2013  Beijing       Serena Williams           6        3.21  
2018  Doha          Petra Kvitova             6        3.18  
2002  Los Angeles   Chanda Rubin              5        3.18  
2000  Los Angeles   Serena Williams           5        3.16  
2009  Miami         Victoria Azarenka         6        3.15  
2003  Miami         Serena Williams           6        3.13  
2002  Indian Wells  Daniela Hantuchova        6        3.10  
2018  Wuhan         Aryna Sabalenka           6        3.08  
2008  Indian Wells  Ana Ivanovic              6        3.08  
2012  Tokyo         Nadia Petrova             6        3.08  
2010  Sydney        Elena Dementieva          5        3.06  
2010  Indian Wells  Jelena Jankovic           6        3.03  
2000  Sydney        Venus Williams            6        3.02  
2000  Sydney        Amelie Mauresmo           4        3.02  
2019  Dubai         Belinda Bencic            6        3.01  
2009  Tokyo         Maria Sharapova           6        3.00  
2002  San Diego     Venus Williams            5        3.00  
2001  Sydney        Martina Hingis            4        2.99

There’s Belinda again, at 32nd overall. Historically, the February tournaments in the Gulf haven’t been the toughest on the calendar, at least compared with Indian Wells, Miami, and Sydney. Yet Kvitova took an even more difficult path to the title last year in Doha. (Dubai and Doha trade tournament levels each year. As a Premier 5, Doha was worth more points in 2018; Dubai took over the status and was worth more points in 2019.) She also plowed through four top-ten opponents, and she needed to beat 33rd-ranked Agnieszka Radwanska just to earn a place in the round of 16.

Strong but weaker

Again, Bencic’s Dubai title was an impressive feat. But as we’ve seen, it pales in comparison with her previous Premier title. I suppose she might have won anyway if faced with more difficult competition, but that pair of third-set tiebreaks suggests she was pushed to the limit as it was.

While the current WTA field is extremely deep, packed with very good players, the lack of one historically great superstar (or more!) shows up in the Elo ratings. Of the 35 champions shown in the two tables above, 12 had to beat a player with a surface-weighted rating of 2240 or higher, and 12 more needed to get past an opponent rated 2100 or above. Bencic’s toughest task last week was Halep, at 2054. While it isn’t easy to knock off several consecutive foes in the 2000 range, it’s not the same as including one victory over a superstar like Serena, Venus, Maria Sharapova, or Victoria Azarenka at her peak.

At the 2015 Canadian Open, Bencic counted Serena among the vanquished. Maybe in another four years, when the Swiss is due for her next odds-defying Premier title, she’ll face down a couple of new young superstars and earn a place at the top of this list.

Around the Net, Issue 2

Around the Net is my attempt to provide a clearinghouse for tennis analytics on the web. Each week, you’ll find a summary of recent articles, podcasts, papers, and data sources, as well as trivia and the occasional bit of interesting non-tennis content. If you would like to suggest something for a future issue, drop me a line.

Articles and Papers

Aliaksandra Sasnovich’s Bagel Recipe (hiddengameoftennis.com)
Assessing the Fit of a Serve Prediction Model (on-the-t.com)
Gaussian Process Priors for Dynamic Paired Comparison Modelling (arxiv.org)
Dominic Thiem, Tennys Sandgren, and Playing Your Way In (tennisabstract.com/blog)

Data

Match Charting Project: The dataset has grown by 60 matches in the last week, from 5,083 to 5,143. Highlights include the 100th charted Petra Kvitova match, making her the 7th woman to become so well represented. We’ve also continued filling out the historical record of grand slam semi-finals, including a 1981 clash between Jimmy Connors and Bjorn Borg.

Trivia

Last week’s New York semi-final between John Isner and Reilly Opelka set plenty of records, the number of which is probably limited only by our imaginations. First, their 59 tiebreak points tied a best-of-three record. Unsurprisingly, Isner (and Jeremy Chardy) held the previous record as well.
The Isner-Opelka tilt also set the record for most aces (81) in a best-of-three match–breaking another of Isner’s marks–and was also the first best-of-three match in which both players hit at least 37 aces.
Marco Cecchinato has somehow won three tour-level titles (and reached a Roland Garros semi-final!) with only 33 tour-level match wins. By contrast, Julien Benneteau won 273 tour-level matches but nary a title.
Since 2008, Fabio Fognini has played at least part of the South American golden swing every year but one. But 2019 was the first time he suffered three straight first-round exits, despite entering each event as a top-two seed.

Beyond the net

New book by Christie Aschwanden, Good To Go: What the Athlete in All of Us Can Learn from the Strange Science of Recovery
New book by Kirk Goldsberry, SprawlBall: A Visual Tour of the New Era of the NBA
The Sloan Sports Analytics Conference is next weekend, and research papers have been announced. No tennis, but a broad mix of other sports, including curling.

Thanks to Peter, Jeff, and Carl for help with this week’s issue.

Forecasting the Davis Cup Finals

It took more than a year to decide on a new format, but barely a week to make the draw. With 12 countries qualifying for the inaugural Davis Cup Finals in home-and-away ties earlier this month, the field of 18 is set. Using the ITF’s own system to rank countries, the 18 teams were divided into three “pots,” then assigned to the six round-robin groups that will kick off the tournament this November in Madrid.

The new format sounds complicated, but as round-robin events go, it’s easy enough to understand. Each of the six round-robin groups will send a winning team to the quarter-finals. Two second-place sides will also advance to the final eight, as determined by matches won, then sets won, and so on as necessary, until John Isner and Ivo Karlovic stand back to back to determine which one is really taller. From that point, it’s an eight-team knock-out tournament.
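To make the second-place rule concrete, here is a minimal Python sketch of picking the two best runners-up. The team names and records are made up for illustration, and only the first two tiebreakers named above (matches won, then sets won) are implemented. Anything deeper in the tiebreak chain is left out.

```python
def best_runners_up(runners_up, slots=2):
    """Rank second-place teams by matches won, then sets won (both descending)."""
    ranked = sorted(
        runners_up,
        key=lambda team: (team["matches_won"], team["sets_won"]),
        reverse=True,
    )
    return [team["name"] for team in ranked[:slots]]


# Hypothetical second-place records from the six groups (not real results).
runners_up = [
    {"name": "Team A", "matches_won": 4, "sets_won": 9},
    {"name": "Team B", "matches_won": 3, "sets_won": 8},
    {"name": "Team C", "matches_won": 4, "sets_won": 10},
    {"name": "Team D", "matches_won": 2, "sets_won": 6},
    {"name": "Team E", "matches_won": 3, "sets_won": 7},
    {"name": "Team F", "matches_won": 4, "sets_won": 8},
]

qualifiers = best_runners_up(runners_up)
print(qualifiers)  # ['Team C', 'Team A']
```

Three of the invented teams are level on matches won, so sets won decides which two go through.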

Here are the groups, as determined by yesterday’s draw, with seeded countries indicated:

Group A: France (1), Serbia, Japan
Group B: Croatia (2), Spain, Russia
Group C: Argentina (3), Germany, Chile
Group D: Belgium (4), Australia, Colombia
Group E: Great Britain (5), Kazakhstan, Netherlands
Group F: United States (6), Italy, Canada

The ITF ranking system considers the last four years of Davis Cup results, so Spain’s brief exit from the World Group makes the seedings a bit wonky. As it turns out, a top team (Croatia) will have to deal with an early tie against the Spaniards, and the entire Group B trio constitutes a group of death. Russia would be an up-and-coming squad in any format, and it is clearly the most dangerous of the six lowest-ranked sides.

Madrid to Monte Carlo

Last week, I introduced a more accurate, predictive rating system for Davis Cup, involving surface-specific Elo ratings for the players likely to compete. Those rankings put Spain at the top, Croatia second, Russia fifth, and fourth-seeded Belgium 14th in the 18-team field.

Now that we have a draw, we can use those ratings to run Monte Carlo simulations of the entire Davis Cup ~~carnival~~ Finals. As in my post last week, I’m estimating that singles players have a 75% chance of playing at any given opportunity and doubles players have an 85% chance. Those are just guesses–there’s no data involved in this step. Surely some teams are more fragile than others, perhaps because their stars are particularly susceptible to injury or just uninterested in the next event. I’ve excluded Andy Murray, but for the moment, I’m keeping Novak Djokovic and Alexander Zverev in the mix.

(We’re using Elo ratings for each individual player, which means the simulation is telling us what would be likely to happen if it were played today. Things will change between now and November, even if every eligible player shows up. A proper forecast that takes the time lag into account would probably give a slight boost for younger teams [whose players will have nine months to mature] and a penalty for older ones [who are more likely to be hit by injury]. And overall, it would shift all of the championship probabilities a bit toward the mean.)
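For readers who want to see the mechanics, here is a stripped-down Python sketch of the group-stage part of such a simulation. It is not the code behind the numbers in this post. The ratings are invented, each doubles pairing is treated as a single rating, availability is drawn fresh for every rubber, and standings ties are broken at random. All of those are simplifying assumptions on my part. The 75% and 85% availability figures and the three-rubber tie are the only pieces taken from the text.

```python
import random


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def pick_available(ratings, p_available, rng):
    """Walk down a depth chart (best first); each option shows up with p_available.
    Falls back to the last option if nobody ahead of it is available."""
    for rating in ratings[:-1]:
        if rng.random() < p_available:
            return rating
    return ratings[-1]


def simulate_tie(team_a, team_b, rng):
    """One tie: two singles rubbers and one doubles rubber. True if team_a wins."""
    wins_a = 0
    for _ in range(2):
        a = pick_available(team_a["singles"], 0.75, rng)
        b = pick_available(team_b["singles"], 0.75, rng)
        wins_a += rng.random() < elo_win_prob(a, b)
    a = pick_available(team_a["doubles"], 0.85, rng)
    b = pick_available(team_b["doubles"], 0.85, rng)
    wins_a += rng.random() < elo_win_prob(a, b)
    return wins_a >= 2


def simulate_group(teams, rng):
    """Round-robin among the teams; returns the name of the group winner.
    Standings ties are broken at random, which is a simplification."""
    names = list(teams)
    tie_wins = {name: 0 for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            winner = a if simulate_tie(teams[a], teams[b], rng) else b
            tie_wins[winner] += 1
    best = max(tie_wins.values())
    return rng.choice([n for n in names if tie_wins[n] == best])


def group_win_probs(teams, n_sims=10_000, seed=0):
    """Monte Carlo estimate of each team's chance of winning the group."""
    rng = random.Random(seed)
    counts = {name: 0 for name in teams}
    for _ in range(n_sims):
        counts[simulate_group(teams, rng)] += 1
    return {name: c / n_sims for name, c in counts.items()}


# Made-up ratings, NOT the ones behind the tables in this post.
teams = {
    "Strong": {"singles": [2100, 1950, 1800], "doubles": [1900, 1750]},
    "Middle": {"singles": [1950, 1850, 1750], "doubles": [1850, 1700]},
    "Weak": {"singles": [1800, 1700, 1650], "doubles": [1700, 1650]},
}
probs = group_win_probs(teams)
print(probs)
```

Extending this to the full event would mean adding the second-place qualifiers and the knock-out rounds, then counting how often each country reaches each stage.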

Here are the results of 100,000 simulations of the draw, with percentages given for each country’s chance of winning their group, then reaching each of the knock-out rounds:

Country  Group     QF     SF      F      W  
ESP      46.1%  59.1%  41.9%  30.3%  19.3%  
FRA      54.2%  66.6%  40.6%  25.1%  14.6%  
AUS      74.5%  84.4%  46.0%  23.8%  12.1%  
USA      53.0%  65.5%  36.8%  19.7%  10.4%  
CRO      31.0%  43.0%  27.2%  17.8%   9.8%  
GER      52.5%  67.9%  39.7%  17.6%   7.7%  
RUS      22.9%  33.1%  19.5%  12.0%   6.1%  
SRB      33.0%  47.9%  24.1%  12.6%   6.0%  
GBR      66.8%  78.7%  35.9%  12.5%   4.4%  
ARG      39.7%  56.6%  28.6%  10.4%   3.8%  
ITA      24.3%  35.9%  14.6%   5.5%   2.1%  
CAN      22.7%  33.4%  13.1%   4.9%   1.8%  
JPN      12.8%  19.5%   7.2%   2.8%   0.9%  
BEL      20.3%  32.0%   8.5%   2.1%   0.6%  
NED      21.7%  35.5%   8.6%   1.7%   0.3%  
CHI       7.8%  12.9%   3.4%   0.6%   0.1%  
KAZ      11.5%  19.0%   3.2%   0.5%   0.1%  
COL       5.1%   8.9%   1.2%   0.1%   0.0%

Spain is our clear favorite, despite their path through the group of death. Five teams have a better chance of winning their group and reaching the quarters than the Spaniards do, but their chances in the single-elimination rounds make the difference. At the other extreme, Australia seems to be the biggest beneficiary of draw luck. My rankings put them sixth, and they landed in a group with Belgium (the lowest-rated seed) and Colombia (the weakest team in the field). Their good fortune makes them the most likely country to reach the final four, even if Spain and France have a better chance of advancing to the championship tie.

Less randomness, more Spain

What if we run the simulation one step earlier in the process? That is to say, ignore yesterday’s draw and see what each country’s chances were before their round-robin assignments were determined. For this simulation, we’ll keep the ITF’s seeds, so Spain is still a floater. Here’s how it looked ahead of the ceremony:

Country  Group     QF     SF      F      W  
ESP      63.0%  75.9%  52.9%  35.0%  22.6%  
FRA      56.8%  70.8%  43.9%  25.7%  14.5%  
CRO      55.5%  69.4%  42.2%  25.1%  13.5%  
USA      51.3%  65.6%  38.5%  19.8%  10.0%  
AUS      48.3%  62.9%  34.8%  17.7%   8.5%  
RUS      40.6%  53.5%  30.2%  15.8%   7.9%  
SRB      42.9%  55.8%  28.3%  13.5%   5.9%  
GER      42.0%  55.7%  27.3%  12.5%   5.4%  
ARG      35.9%  49.1%  20.9%   7.9%   2.8%  
ITA      33.6%  47.1%  19.2%   7.2%   2.5%  
GBR      34.9%  48.3%  20.3%   7.5%   2.5%  
CAN      24.5%  35.5%  14.3%   5.3%   1.9%  
JPN      19.8%  29.4%  10.6%   3.6%   1.1%  
BEL      20.9%  30.4%   7.5%   1.8%   0.4%  
NED       9.5%  15.5%   3.5%   0.7%   0.1%  
CHI       7.9%  13.3%   2.6%   0.4%   0.1%  
KAZ       8.4%  14.1%   2.1%   0.3%   0.0%  
COL       4.3%   7.5%   1.1%   0.2%   0.0%

With the “group of death” out of the picture, Croatia jumps from fifth to third, swapping places with Australia. The defending champs lost the most from the draw, while Spain suffered a bit as well.

Elo in charge

Another variation is to ignore the ITF rankings and generate the entire draw based on my Elo-based ratings. In this case, the top six seeds would be Spain, Croatia, France, USA, Russia, and Australia, in that order. Argentina and Great Britain would fall to the middle group, and Belgium would drop to the bottom third. Here’s how that simulation looks:

Country  Group     QF     SF      F      W  
ESP      71.6%  82.8%  57.3%  38.0%  24.1%  
FRA      64.6%  77.6%  45.8%  26.7%  14.4%  
CRO      63.1%  76.3%  45.8%  25.6%  13.6%  
USA      59.7%  73.3%  41.1%  20.2%  10.2%  
RUS      58.6%  71.2%  37.0%  19.7%   9.5%  
AUS      57.7%  71.4%  37.7%  17.7%   8.8%  
SRB      37.1%  53.0%  26.1%  12.1%   5.3%  
GER      35.3%  52.3%  24.5%  10.9%   4.6%  
ARG      28.0%  44.2%  17.5%   6.4%   2.2%  
ITA      27.4%  43.6%  16.9%   6.2%   2.1%  
GBR      27.0%  43.1%  16.5%   6.0%   2.0%  
CAN      26.7%  41.8%  16.0%   5.8%   2.0%  
JPN      15.9%  23.6%   8.1%   2.6%   0.8%  
BEL       9.4%  15.1%   3.9%   0.9%   0.2%  
NED       6.5%  10.8%   2.3%   0.5%   0.1%  
CHI       5.3%   9.0%   1.8%   0.3%   0.1%  
KAZ       3.2%   5.8%   0.9%   0.1%   0.0%  
COL       3.1%   5.2%   0.8%   0.1%   0.0%

The big winners in the Elo scenario are the Russians, who gain a seed and avoid a round-robin encounter with either Spain or Croatia. Australia gets a seed as well, but the benefit of protection from the powerhouses isn’t as valuable as the luck that shone on the Aussies in the actual draw.

Imagine a world with no rankings

Finally, let’s see what happens if we ignore the rankings altogether. It would be unusual for the tournament to take such an approach, but if there’s ever a time to have a tennis event with no seedings, this is it. The existing rankings are far too dependent on years-old results, leaving young teams at a disadvantage. And my system, while more accurate, doesn’t quite feel appropriate either. It is based on individual player ratings, and this is a team event.

Whatever the likelihood of a ranking-free draw in the Davis Cup future, here’s what a simulation looks like with completely random assignment of nations into round-robin groups:

Country  Group     QF     SF      F      W  
ESP      62.8%  75.4%  52.4%  34.8%  22.5%  
FRA      54.8%  68.6%  42.6%  25.0%  13.9%  
CRO      53.4%  67.2%  41.0%  23.6%  13.0%  
USA      48.8%  62.9%  35.9%  19.1%   9.7%  
RUS      47.9%  61.0%  34.8%  18.5%   9.3%  
AUS      47.1%  61.1%  34.1%  17.6%   8.5%  
SRB      41.5%  54.3%  28.0%  13.5%   6.1%  
GER      40.3%  53.6%  26.7%  12.3%   5.3%  
ARG      31.9%  44.9%  18.8%   7.2%   2.6%  
ITA      31.5%  44.2%  18.6%   7.1%   2.5%  
GBR      30.7%  43.4%  17.6%   6.5%   2.3%  
CAN      30.4%  42.7%  17.4%   6.4%   2.2%  
JPN      25.9%  36.4%  13.5%   4.6%   1.4%  
BEL      17.2%  25.9%   7.2%   1.8%   0.4%  
NED      12.5%  20.0%   4.6%   0.9%   0.2%  
CHI      10.4%  16.9%   3.5%   0.6%   0.1%  
KAZ       7.0%  11.8%   1.9%   0.3%   0.0%  
COL       5.9%   9.7%   1.5%   0.2%   0.0%

Round-robin formats do a decent job of surfacing the best teams, so the fully random approach doesn’t give us wildly different results than the seeded simulations. The main effect of the no-seed version is to give the weakest sides a slightly better chance at advancing past the group stage, since there is a better chance for them to avoid strong round-robin competition.

Madrid or Maldives redux

Some top players are likely to skip the event. Zverev has said he’ll be in the Maldives, and Djokovic has hinted he may miss the tournament as well. The new three-rubber format means that teams will suffer a bit less from the absence of a singles star, assuming he isn’t also one of the best doubles options. Still, both Germany and Serbia would much rather head to the party with a top-three singles player on their side.

Here are the results of the initial simulation–based on the actual draw–but without Djokovic or Zverev:

Country  Group     QF     SF      F      W  
ESP      46.5%  59.5%  44.0%  33.2%  21.3%  
FRA      68.2%  79.3%  49.6%  30.6%  17.8%  
AUS      74.3%  84.5%  46.1%  24.2%  12.6%  
USA      53.4%  66.2%  37.5%  20.4%  10.8%  
CRO      30.3%  42.5%  28.4%  19.6%  10.8%  
RUS      23.2%  33.6%  21.1%  13.8%   7.0%  
GBR      67.0%  79.0%  40.9%  14.6%   5.2%  
ARG      52.1%  66.9%  35.5%  12.9%   4.9%  
GER      36.4%  52.3%  23.3%   7.2%   2.2%  
ITA      24.2%  35.9%  14.5%   5.7%   2.2%  
CAN      22.4%  33.2%  13.4%   5.2%   2.0%  
JPN      19.4%  31.7%  11.5%   4.8%   1.6%  
BEL      20.5%  32.4%   8.6%   2.3%   0.6%  
SRB      12.4%  21.1%   6.0%   1.9%   0.5%  
NED      21.6%  35.5%   9.8%   2.0%   0.4%  
CHI      11.4%  18.5%   4.9%   0.9%   0.2%  
KAZ      11.3%  19.1%   3.8%   0.5%   0.1%  
COL       5.2%   9.0%   1.2%   0.2%   0.0%

Germany’s chances of winning the inaugural Pique Cup would fall from 7.7% to 2.2%, and Serbia’s odds drop from 6.0% to 0.5%. Argentina and France, the seeded teams sharing groups with Germany and Serbia, respectively, would be the biggest gainers from such high-profile absences.

Anybody’s game

I’ve been skeptical of the new Davis Cup, and while I remain unconvinced that it’s an improvement, I find myself getting excited for the weeklong tennis hootenanny in Madrid. These simulations were even more encouraging. As always, the ranking and seeding isn’t the way I’d do it, but in this format, the differences are minimal. The event format will give us a chance to see plenty of tennis from every qualifying nation, and the high level of competition from most of these countries ensures that most teams have a shot at going all the way.

Is Doubles As Entertaining As We Think?

For as long as I’ve been following tennis, there’s been a tension between the amount of doubles available to watch and the amount of doubles that fans say they want to watch. In-person spectators flock to doubles matches at grand slams and aficionados pass around GIFs of the most outrageous, acrobatic doubles points. Yet broadcasters almost always stick with singles, leaving would-be viewers chasing down online streams, often illegal ones.

There are some good reasons for that, foremost among them the marquee drawing power of the best singles players. Broadcasters are convinced that their audiences would rather watch a Fed/Rafa/Serena/Pova blowout than a potentially more entertaining one-on-one contest between unknowns, let alone a doubles match. And they’re probably right–at least, they’ve got ratings numbers to back them up. So we’re left with a small population of hipster doubles fans, confident that two-on-two is the good stuff, even if most of us rarely watch it.

It’s probably impossible to quantify entertainment value, but that doesn’t mean we shouldn’t try. What can the numbers tell us about the watchability of doubles?

Hip to be rectangular

There’s plenty of room for a diversity of preferences–one fan’s Monfils may be another fan’s Isner. But there are some general principles that seem to define entertaining tennis for most spectators. Winners are better than errors, for one. Long rallies are better than short ones, at least within reason. And you can never go wrong with more net play.

If net play were the only criterion, doubles would beat singles easily. But what about other factors? I started wondering about this while researching a recent post on gender differences in mixed doubles, when I came across a match in which every rally was four shots or fewer. For every brilliant reflex half-volley, doubles features a hefty dose of big serving and tactically high-risk returning. Especially in men’s doubles, that translates into a lot of team conferences and not very much shotmaking.

Let’s see some numbers. For each of the five main events at the 2019 Australian Open–men’s and women’s singles, men’s and women’s doubles, and mixed doubles–here is the average rally length, the percentage of points ended in three shots or less, and the percentage of points that required at least ten shots:

Event            Avg Rally  <=3 Shots  10+ Shots  
Men's Singles          3.2      72.6%       5.1%  
Women's Singles        3.4      67.9%       5.4%  
Men's Doubles          2.5      81.6%       1.1%  
Women's Doubles        2.9      76.7%       2.4%  
Mixed Doubles          2.8      74.0%       1.8%

There's a family resemblance in these numbers, but it's clear that doubles points are shorter. Men's doubles is the most extreme, at 2.5 shots per point. By comparison, only 8% of the men's singles matches in the Match Charting Project database have an average rally length lower than that. More than four out of every five men's doubles points end by the third shot, and with barely one in one hundred points lasting to ten shots, you'd be lucky to sit through an entire match and see more than one such exchange.
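For anyone who wants to compute the same three columns from their own point-by-point data, here is a minimal Python sketch. It assumes you already have one shot count per point. The sample list is invented, and the function and key names are mine rather than anything from the Match Charting Project.

```python
def rally_summary(rally_lengths):
    """Average rally length plus the share of short (three shots or fewer)
    and long (ten shots or more) points, as percentages."""
    n = len(rally_lengths)
    return {
        "avg_rally": sum(rally_lengths) / n,
        "pct_3_or_fewer": 100.0 * sum(1 for r in rally_lengths if r <= 3) / n,
        "pct_10_plus": 100.0 * sum(1 for r in rally_lengths if r >= 10) / n,
    }


# Hypothetical shot counts for ten points.
sample = [1, 1, 2, 2, 3, 3, 3, 4, 6, 10]
summary = rally_summary(sample)
print(summary)
```

How serves, returns, and error shots are counted toward rally length is a charting convention, so the thresholds here only mean something once that convention is fixed.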

Quantity and quality

Shorter points are the nature of the format. Even recreational players can find it hard to keep the ball in play when half of each team is patrolling the net, looking for an easy putaway. Short-rally tennis can still be entertaining, as long as the quality of play offsets the unfavorable watching-to-waiting ratio.

I've mentioned my perception that men's doubles features a lot of unreturned serves. The numbers suggest that I spoke too soon. For the five events, here is the percentage of points in which the return doesn't come back in play:

Event            Unret%  
Men's Singles     31.7%  
Women's Singles   24.3%  
Men's Doubles     32.1%  
Women's Doubles   21.6%  
Mixed Doubles     29.3%

For men, singles and doubles are about the same. Perhaps the singles servers are a bit stronger, but the doubles returners are taking more chances, trying to avoid feeding weak returns to aggressive netmen. With women, you're more likely to see a return in play in a doubles match than in singles. Unless you're a connoisseur of powerful serves, you'll probably find higher rates of returns in play to be more enjoyable to watch.

The same applies to winners, compared to unforced errors. (Forced errors are a bit tricky--sometimes they are as exciting and indicative of quality as a winner; other times they're just an out-of-position unforced error.) Let's see what fraction of points end in various ways, for each of the five events:

Event            Unforced%  Forced%  Winner%  
Men's Singles        25.6%    16.2%    21.3%  
Women's Singles      28.9%    16.0%    23.4%  
Men's Doubles        12.8%    17.2%    29.9%  
Women's Doubles      20.9%    18.0%    32.1%  
Mixed Doubles        14.5%    17.0%    29.5%

Here, doubles is the clear winner. For both men and women, more doubles points than singles points end in winners, and fewer points end in unforced errors. Some of that reflects the much higher rate of net play, since it's easier to execute an unreturnable shot from just a few feet behind the net. There are a few more forced errors in doubles, perhaps representing failed attempts to handle volleys that almost went for winners, but no matter how we interpret them, the difference in forced errors is not enough to offset the differences in winners and unforced errors.

The hipsters weren't wrong

The numbers aren't as conclusive as I expected them to be. Yes, doubles points are shorter, but not so much so that the format is reduced to only serving and returning. (Though some men's matches are close.) As usual, our data has limitations, but the information available for each point suggests that there's plenty of high-quality, entertaining tennis to be seen on doubles courts, even if it's usually limited to four or five shots at a time.

Top Seed Upsets in ATP 250s

Italian translation at settesei.it

In a typical week, no one would notice if Fabio Fognini, Karen Khachanov, and Lucas Pouille combined to go 0-3. This week is different, as those three men held the top seeds at the ATP events in Cordoba, Sofia, and Montpellier. After their first-round byes, each of them lost in the second round, to Aljaz Bedene, Matteo Berrettini, and Marcos Baghdatis, respectively. At least two of the top seeds pushed their opponents to three sets, while Fognini lasted only 71 minutes.

This is not the first time a trio of number one seeds have suffered first-match upsets in the same week. Amazingly, it’s not even the first such occurrence in this very week on the calendar. Two years ago, when the South American event was played in Quito, the results were the same: top seeds Marin Cilic, Ivo Karlovic, and Dominic Thiem all failed to win a match. Thiem’s vanquisher, Nikoloz Basilashvili, even extended the streak the following week, heading to Memphis and handing Karlovic his second straight second-round ouster.

Predictable upsets?

Focusing on these losses, it’s natural to wonder whether top seeds are particularly fragile in this sort of tournament. There’s certainly a logic to it. The number one seed at an ATP 250 is usually ranked in the top 20, and is the sort of player who might have considered taking the week off. He knows that more ranking points are available at slams and Masters, so winning a smaller event isn’t his highest priority. His opponent, on the other hand, is competing every chance he gets, and the points on offer at a smaller event could make a big difference in his standing. Further, he has already played–and won–his first-round match, so he might be performing better than usual, or the conditions might suit him particularly well.

Let’s put it to the test. Since 2010, not counting this week’s carnage, I found 267 non-Masters events at which a top seed got a first-round bye and completed his second-round match. (Additionally, there have been three retirements and one withdrawal; only one of those resulted in a loss for the top seed.) The number one seeds had a median rank of 10, and the underdogs had a median rank of 89. Based on my surface-weighted Elo ratings at the time of each match, the favorites should have won 81.5% of the time. That’s better than this week’s trio of top-seeded losers, who were 64% (Fognini), 80% (Khachanov), and 69% (Pouille) favorites.

As it happened, the unseeded challengers were more successful than expected. The favorites won only 76.8% of those matches–a rate low enough that there is only a 3% probability it is due to chance alone. It’s not an overwhelming effect–certainly not enough that we should have predicted this week’s results–but it seems that a few of the top seeds are showing up unmotivated and a handful of the underdogs are playing better than expected.
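One rough way to sanity-check a figure like that 3% is a binomial tail probability. The sketch below treats all 267 matches as if the favorite had the same 81.5% chance, which is a simplification, since each match really has its own Elo-based probability. It also assumes 76.8% of 267 rounds to 205 favorite wins.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_matches = 267
p_favorite = 0.815                          # average Elo-based expectation
observed_wins = round(0.768 * n_matches)    # 205 favorite wins (assumed rounding)

# Chance of seeing this few favorite wins (or fewer) if Elo were exactly right.
p_value = binom_cdf(observed_wins, n_matches, p_favorite)
print(observed_wins, round(p_value, 3))
```

Under those assumptions the tail probability lands in the same low-single-digit-percent neighborhood as the figure quoted above. A version that uses each match's own probability would be more faithful.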

Riding the wave

What about the underdog winners? Once they’ve defeated the top seed, how many capitalize on the opportunity? Berrettini came back to beat Fernando Verdasco in his quarter-final match today, while Baghdatis and Bedene play later. My forecasts believe that, of the three, Bedene has the best chance of claiming a title, though still less than a one-in-five shot at doing so.

In our subset of 267 matches, the underdog won 66 of them. More than half the time, though, that was the end of the run. 38 of the 66 (58%) fell in the quarter-finals. Another 17 lost in the semis. Whatever works so well for these underdogs in the second round disappears afterward. In the 105 matches contested by these 66 men in the quarter-finals and beyond, Elo thinks they should have won 44.9% of them. Instead, they managed only 42.3%.
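The 105-match total follows directly from the counts in that paragraph, and a few lines of Python make the bookkeeping explicit. The variable names are mine. The last two lines simply turn the two percentages into match counts.

```python
upset_winners = 66
lost_in_qf = 38
lost_in_sf = 17

quarter_finals = upset_winners               # every upset winner played a quarter-final
semi_finals = quarter_finals - lost_in_qf    # survivors of the quarter-finals
finals = semi_finals - lost_in_sf            # survivors of the semi-finals
total_matches = quarter_finals + semi_finals + finals

expected_wins = 0.449 * total_matches        # what Elo forecast
actual_wins = 0.423 * total_matches          # what the underdogs managed

print(semi_finals, finals, total_matches)
print(round(expected_wins - actual_wins, 1))
```

The gap between forecast and reality works out to fewer than three matches across the whole sample, so it is a small effect.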

There’s still a bit of hope. Five men knocked out the top seed in the second round and went on to win the entire tournament. One of those came in the week we’ve already mentioned: Victor Estrella, who knocked out Karlovic and went on to hoist the trophy in Quito two years ago. Maybe there’s some magic in week six. This week’s trio of underdogs would surely love to think so.

Bianca Andreescu’s Very, Very Good Week

Italian translation at settesei.it

WTA fans have grown accustomed to watching teenagers blast their elders off the court, but nobody expected this. 18-year-old Bianca Andreescu, ranked just outside the top 150, qualified for the season-opening Auckland event with three victories, overpowered Timea Babos in the first round, and then proceeded to knock out two former WTA No. 1s, Caroline Wozniacki and Venus Williams. She advances to the semi-final in just her fifth tour-level main draw and will jump at least a few dozen places in the rankings.

What makes Andreescu’s feat so notable is the pedigree of her opponents. Sure, Wozniacki was dealing with physical issues and Williams isn’t quite the unstoppable force she used to be, but fringe players like the Canadian teenager don’t knock out multiple former No. 1s very often.

Going back to 1984, I found just over 2,000 matches in which a top-ranked or former top-ranked player lost. Over 300 players have recorded a win against such an opponent, and elite players have accumulated a lot of these upsets. Serena Williams has beaten No. 1s or former No. 1s over 100 times, and Venus has done so 65 times, including her first-round win over Victoria Azarenka this week.

Andreescu’s achievement in Auckland was the 171st time (again, since 1984) that a player beat two or more such opponents at the same tournament, so we’ve seen it happen about five times per season. It has become more frequent in recent years, at least in part because there are so many former top-ranked players on tour, giving would-be giant-killers more opportunities. Most of the players who beat multiple No. 1s are themselves elite players: Serena accounts for 26 of the 171 tournaments, and Venus for another 9. Andreescu was the 71st different woman to pull off the feat.

At just over 18.5 years of age, the Canadian is one of the youngest players to beat multiple former No. 1s at the same event. She’s a bit older than Belinda Bencic was when she knocked out Serena, Wozniacki, and Ana Ivanovic in Toronto in 2015, but before that we need to go back to the 2006 French Open to find a woman who recorded similar upsets at an earlier age. Here is the full list of such feats accomplished at or before Andreescu’s age:

Event                 Player              Age  
1997 French Open      Martina Hingis     16.7  
1998 Key Biscayne     Anna Kournikova    16.8  
1998 Berlin           Anna Kournikova    16.9  
2006 French Open      Nicole Vaidisova   17.1  
2004 Wimbledon        Maria Sharapova    17.2  
1999 Indian Wells     Serena Williams    17.4  
1999 Key Biscayne     Serena Williams    17.5  
1987 Key Biscayne     Steffi Graf        17.7  
1988 Boca Raton       Gabriela Sabatini  17.8  
1999 Manhattan Beach  Serena Williams    17.9  
2005 Miami            Maria Sharapova    17.9  
1999 US Open          Serena Williams    17.9  
2015 Toronto          Belinda Bencic     18.4  
1996 Tokyo            Iva Majoli         18.5  
2019 Auckland         Bianca Andreescu   18.5

She wouldn’t be the first player on this list to flame out before taking a place among the all-time greats, but in general, that’s good company for an 18-year-old qualifier.

Andreescu stands out even more when we consider that she is ranked far outside the top 100. (At least for another few days.) Of the 171 occasions when a player knocked out two current or former No. 1s, none had done so with such a low ranking. The only other player to accomplish such a thing while outside the top 100 was Louisa Chirico, who beat Azarenka and Ivanovic at the 2016 Madrid event. The Canadian’s career-best week is only the 13th time that a player beat two such opponents while ranked outside the top 40, and a few of those instances came when a typically great player’s ranking was recovering from time away:

Event                 Player              Age  Rank  
2019 Auckland         Bianca Andreescu   18.5   152  
2016 Madrid           Louisa Chirico     20.0   130  
2003 French Open      Nadia Petrova      21.0    76  
2017 Madrid           Eugenie Bouchard   23.2    60  
2007 Istanbul         Aravane Rezai      20.2    59  
2010 Australian Open  Maria Kirilenko    23.0    58  
2009 Beijing          Shuai Peng         23.7    53  
2014 Montreal         CoCo Vandeweghe    22.7    51  
2007 Beijing          Shuai Peng         21.7    49  
2005 Paris            Dinara Safina      18.8    48  
2015 Doha             Victoria Azarenka  25.6    48  
2018 Indian Wells     Naomi Osaka        20.4    44  
2014 Dubai            Venus Williams     33.7    44

Two shocking upsets are no guarantee of future success, but the demonstrated ability to defeat such elite veterans is probably more indicative of future success than winning a handful of ITF $25K titles (as she has) or lifting trophies for multiple junior grand slam doubles championships (as she did). On a tour already full of promising young stars, it took Andreescu only 48 hours to establish herself as one of the WTA teenagers most worth watching.

Measuring the Impact of Break Points

Yesterday I dove deep into tiebreak luck. I explained that while better players tend to win more tiebreaks, there’s no special tiebreak skill that causes certain players to perform better at the end of sets than they do at other stages of the match. Therefore, if a player has a long stretch of excellent or dismal tiebreak results, we should discard the tempting hypothesis that he or she possesses some special tiebreak talent and assume that he or she will post more average results in the future.

The same is true of break points. In any given season, you can find players who win or lose a disproportionate number of break points, and it’s tempting to point to mental strength by way of explanation. Yet more often than not, the unusual results disappear, along with any convincing case that we’ve identified a notably steely or flimsy tennis brain.

To quantify those over- and underachievements, I’ve attempted to measure the number of break points converted compared to the “expected” number, where the expectation is defined by how often the player wins return points. (It’s a bit more complicated than looking up a player’s single-season return-points-won (RPW) rate. Instead, we consider their RPW for each match, and weight the matches according to how many break point opportunities occurred in the match.) For example, Gael Monfils converted 146 of his 317 break point chances last year, good for a 46.1% conversion rate. That far outstrips his weighted RPW of 38.7%. He claimed 23 more break points than expected, or an excess of 19%. Parallel to my approach with tiebreaks, I’ve named those stats, so the counting stat is Break Points Over Expectation (BPOE) and the rate stat is Break Points Overperformance Rate (BPOR).

(On average, returners win slightly fewer break points than non-break points. I’ve adjusted the “expected” level downward by 1.4% to account for this.)
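Here is a minimal Python sketch of the two stats, under my reading of the definitions above. The function names and the dictionary layout are my own. The post doesn't say whether the quoted 38.7% already includes the 1.4% adjustment, so the sketch just takes an expected rate as given. Plugging in 38.7% as-is comes out close to the Monfils figures.

```python
def weighted_rpw(matches):
    """Return-points-won rate, weighting each match by its break point chances."""
    total_chances = sum(m["bp_chances"] for m in matches)
    return sum(m["rpw"] * m["bp_chances"] for m in matches) / total_chances


def break_point_stats(chances, won, expected_rate):
    """BPOE (break points won minus expected) and BPOR (won divided by expected)."""
    expected = chances * expected_rate
    return won - expected, won / expected


# Two invented matches, to show the weighting: the match with more
# break point chances pulls the weighted rate toward its own RPW.
toy_matches = [
    {"rpw": 0.40, "bp_chances": 10},
    {"rpw": 0.30, "bp_chances": 5},
]
toy_rate = weighted_rpw(toy_matches)

# Monfils 2018, using the figures quoted in the text (38.7% assumed final).
bpoe, bpor = break_point_stats(chances=317, won=146, expected_rate=0.387)
print(round(bpoe, 1), round(bpor, 2))
```

The small difference from the table's 23.4 presumably comes from the rounded 38.7% input.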

Monfils was an outlier, the only player in 2018 to exceed +20 BPOE, and the only player with 40-plus matches to post a BPOR of more than 15%. Yet there was little in his past performance that would have told us what was coming. From 2009 to 2017, he had three negative seasons, two years indistinguishable from neutral, and four above average. Over the entire span, he won break points less than one percent more often than expected. The Frenchman’s pressure-point success in 2018 could be thanks to some newfound mental strength, but if history is any guide, he won’t continue to display whatever mix of luck and nerves led him to post his circuit-leading figures.

Here are the best and worst break point performances, by BPOE, posted by ATPers with at least 20 tour-level matches last year:

Player                 Chances  Won   BPOE  BPOR  
Gael Monfils               317  146   23.4  1.19  
Mackenzie McDonald         252  116   19.0  1.20  
Michael Mmoh               129   63   16.9  1.37  
Malek Jaziri               298  134   16.2  1.14  
Pierre-Hugues Herbert      297  126   16.1  1.15  
Adrian Mannarino           318  136   14.1  1.12  
Ricardas Berankis          235  103   13.8  1.15  
Sam Querrey                290  118   13.8  1.13  
Martin Klizan              313  139   13.5  1.11  
Jan-Lennard Struff         272  118   13.4  1.13  
                                                  
Marton Fucsovics           414  162  -11.5  0.93  
Filip Krajinovic           238   86  -11.8  0.88  
Evgeny Donskoy             239   79  -11.9  0.87  
Stan Wawrinka              217   66  -11.9  0.85  
Aljaz Bedene               303  108  -12.9  0.89  
John Isner                 308   85  -13.0  0.87  
Mischa Zverev              347  123  -14.1  0.90  
Marin Cilic                568  209  -18.1  0.92  
Joao Sousa                 484  176  -21.6  0.89  
Novak Djokovic             617  246  -21.7  0.92

It’s striking to see Novak Djokovic at the bottom of the list, nearly as bad or unlucky as Monfils was good or fortunate. Yet Novak’s story is surprisingly similar to Gael’s. From 2009 to 2017, his overall BPOR was 0.997–almost precisely neutral–and he posted nearly as many positive seasons as negative ones.

Yep, it’s random

To give more player-specific examples would only belabor the point: A player’s performance on break points (independent of his overall return-point skill) has no relationship from one year to the next. I identified 700 pairs of consecutive player-seasons between 2009 and 2018 (for example, Djokovic’s 2017 and 2018) and found that the correlation between the two seasons was effectively zero (r^2 = 0.002).

Here’s one more illustration of the point. This table shows the ten players who recorded the highest 2017 BPOR figures of those men who played at least 20 ATP matches in both 2017 and 2018. The right-most column shows what they did the following year:

Player             2017 BPOR  2018 BPOR  
Damir Dzumhur           1.16       1.05  
Alexander Zverev        1.15       1.02  
Nicolas Kicker          1.15       1.04  
Peter Gojowczyk         1.14       0.92  
Dusan Lajovic           1.13       1.04  
Mikhail Kukushkin       1.13       0.94  
Mischa Zverev           1.13       0.90  
John Isner              1.12       0.87  
Andrey Rublev           1.12       0.96  
Thiago Monteiro         1.12       1.17  
AVERAGE                 1.14       0.99

Only Thiago Monteiro continued to be successful enough to maintain a place amid the tour leaders; John Isner’s follow-up campaign was so different that he registered as one of the tour’s worst in 2018. Taken together, five of 2017’s top ten ended 2018 below average, and the ten men combined for a BPOR just a bit worse than neutral. This is all just another way of saying we’re looking at something indistinguishable from chance.
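The summary figures in that paragraph can be rechecked directly from the table. The short sketch below only re-derives the 2018 average and the count of below-neutral seasons from the rounded values shown; the variable names are my own.

```python
# (2017 BPOR, 2018 BPOR) pairs copied from the table above.
top_ten_2017 = {
    "Damir Dzumhur": (1.16, 1.05),
    "Alexander Zverev": (1.15, 1.02),
    "Nicolas Kicker": (1.15, 1.04),
    "Peter Gojowczyk": (1.14, 0.92),
    "Dusan Lajovic": (1.13, 1.04),
    "Mikhail Kukushkin": (1.13, 0.94),
    "Mischa Zverev": (1.13, 0.90),
    "John Isner": (1.12, 0.87),
    "Andrey Rublev": (1.12, 0.96),
    "Thiago Monteiro": (1.12, 1.17),
}

# Keep only the follow-up season for each player.
next_year = [after for _, after in top_ten_2017.values()]

mean_2018 = sum(next_year) / len(next_year)
below_neutral = sum(1 for value in next_year if value < 1.0)

print(f"2018 mean BPOR: {mean_2018:.2f}")
print(f"Below 1.00 in 2018: {below_neutral} of {len(next_year)}")
```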

Putting a price tag on good fortune

We’ve established that break point performance in the present has nothing to tell us about break point performance in the future. But as I pointed out in yesterday’s post about tiebreaks, that very lack of predictiveness has value.

Monfils’s BPOE of +23 helped him rack up more victories in 2018 than he otherwise would have. His break point results probably boosted his ranking and prize money tally. Reverting to neutral break point performance won’t knock him off tour, but assuming he continues to serve and return at the same level he did last year, a more pedestrian BPOE could hurt his cause. But how much?

Yesterday I suggested that two additional tiebreaks are equal to one additional win. Break points are a bit more complicated–clearly a single break point is not as valuable as an entire tiebreak, both because it is a single point and because it rarely offers the player a chance to finish off an entire set or match. On the other hand, break points are more numerous, and figures like Monfils’s +23 and Djokovic’s -21 are more extreme than the most unexpected tiebreak performances.

Measuring high-leverage points

The key to measuring the impact of break points is the general concept of win probability, and the more specific notion of leverage. (Leverage is often referred to as volatility or importance; these are all the same basic idea.) Win probability is simply a measure of each player’s chances of winning the match at any given stage. Leverage is an index of how much a single point can affect that probability. Say two equal players embark on a new match. Before the first ball is struck, each has a 50% chance of emerging victorious. If winning the first point increases the server’s chance of winning to 51% while losing it decreases his probability to 49%, we would say that the leverage of the first point is 2%–the difference between the win probabilities that would result from winning or losing the point.

The more crucial the point, the higher the leverage. The typical point is well below 5%, but a truly high-pressure moment, like 5-6 in a third-set tiebreak, can be as high as 50%.
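To make the definition concrete, here is a rough sketch of a win probability model and the leverage it implies. It is an illustration under my own assumptions, not the model behind the numbers in this post: every point is independent, each player wins a fixed share of his service points, matches are best-of-three with a tiebreak at 6-6, and who serves first in a later set is ignored. The names `game_prob`, `tiebreak_prob`, `make_model`, and `leverage` are hypothetical, and the model only handles points in regular games, not points inside a tiebreak.

```python
from functools import lru_cache


def game_prob(p, a=0, b=0):
    """Chance the server wins the game from a-b (server's points listed first)."""
    q = 1.0 - p
    if a >= 4 and a - b >= 2:
        return 1.0
    if b >= 4 and b - a >= 2:
        return 0.0
    if a >= 3 and b >= 3:
        deuce = p * p / (p * p + q * q)
        if a == b:
            return deuce
        return p + q * deuce if a > b else p * deuce
    return p * game_prob(p, a + 1, b) + q * game_prob(p, a, b + 1)


def tiebreak_prob(p_a, p_b, a_first):
    """Chance A wins a first-to-7 tiebreak; p_a and p_b are serve-point win rates."""
    @lru_cache(maxsize=None)
    def t(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b and a >= 6:
            both_a = p_a * (1 - p_b)   # A wins the next two points
            both_b = (1 - p_a) * p_b   # B wins the next two points
            return both_a / (both_a + both_b)
        # Serve order: one point for the first server, then two each.
        first_server_up = ((a + b + 1) // 2) % 2 == 0
        a_serves = a_first if first_server_up else not a_first
        w = p_a if a_serves else 1 - p_b
        return w * t(a + 1, b) + (1 - w) * t(a, b + 1)
    return t(0, 0)


def make_model(p_a, p_b, sets_to_win=2):
    """Build win_prob(sets_a, sets_b, games_a, games_b, pts_a, pts_b, a_serving)."""
    hold_a = game_prob(p_a)        # A wins his own service game
    break_a = 1 - game_prob(p_b)   # A wins B's service game

    @lru_cache(maxsize=None)
    def set_prob(ga, gb, a_serving):
        if ga == 7 or (ga == 6 and gb <= 4):
            return 1.0
        if gb == 7 or (gb == 6 and ga <= 4):
            return 0.0
        if ga == 6 and gb == 6:
            return tiebreak_prob(p_a, p_b, a_serving)
        g = hold_a if a_serving else break_a
        return (g * set_prob(ga + 1, gb, not a_serving)
                + (1 - g) * set_prob(ga, gb + 1, not a_serving))

    @lru_cache(maxsize=None)
    def match_prob(sa, sb):
        if sa == sets_to_win:
            return 1.0
        if sb == sets_to_win:
            return 0.0
        s = set_prob(0, 0, True)   # simplification: first server of a set is ignored
        return s * match_prob(sa + 1, sb) + (1 - s) * match_prob(sa, sb + 1)

    def win_prob(sa, sb, ga, gb, pa, pb, a_serving):
        if a_serving:
            g = game_prob(p_a, pa, pb)
        else:
            g = 1 - game_prob(p_b, pb, pa)
        s = (g * set_prob(ga + 1, gb, not a_serving)
             + (1 - g) * set_prob(ga, gb + 1, not a_serving))
        return s * match_prob(sa + 1, sb) + (1 - s) * match_prob(sa, sb + 1)

    return win_prob


def leverage(win_prob, sa, sb, ga, gb, pa, pb, a_serving):
    """Win probability if A wins the next point minus win probability if he loses it."""
    won = win_prob(sa, sb, ga, gb, pa + 1, pb, a_serving)
    lost = win_prob(sa, sb, ga, gb, pa, pb + 1, a_serving)
    return won - lost


# Two equal players who each win 65% of service points (an arbitrary choice).
wp = make_model(0.65, 0.65)
opening_point = leverage(wp, 0, 0, 0, 0, 0, 0, False)  # B serving at 0-0
break_point = leverage(wp, 0, 0, 0, 0, 3, 2, False)    # B serving at 30-40
print(f"opening point: {opening_point:.3f}, break point: {break_point:.3f}")
```

Changing the two serve-point inputs to unequal values shows how quickly leverage shrinks once one player is a clear favorite, which is the issue the next paragraphs deal with.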

Win probability stats depend a great deal on the inputs you choose, so there’s no single mathematically correct leverage measurement at any given moment. If you think two players are equal, your estimate of the win probability at the start of the match is very different than if you think one of the competitors is a heavy favorite. Those judgements affect the leverage of every point as well. Still, for aggregates of large numbers of matches–say, an entire season–we can get a general idea of the value of break points.

Necessary assumptions

If we make the simple but clearly wrong assumption that all players are equal, the leverage of the average point on the ATP tour last year was 4.6%, and the leverage of the average break point on tour last year was 10.5%. Those numbers are useful as a starting point, but they are clearly too high; when we accept that most matches are not contested between players of equal skill, we realize that any given break point isn’t quite that important–if Djokovic fails to convert one against Monteiro, he’ll remain almost certain to win the match.

One alternative approach is to assume that each player’s skill level is represented exactly by their performance in a given match. So if Djokovic plays Monteiro and wins 80% of service points, while Monteiro wins only 60%, we could calculate the win probability and leverage of every point using those numbers. Using that method, we get a leverage of 2.9% on the average point and 6.5% on the average break point.

The second assumption is also not exactly right, but it probably gets closer to the truth than the first. Keeping in mind that it’s an approximation, let’s use a break point leverage of 7.5%, a bit above the second estimate of 6.5% and well below the first estimate of 10.5%. That figure means that, on average, changing the result of a single break point affects the win probability of a single match by 7.5%. Another way of thinking about it–the one most relevant to the task at hand–is that winning a break point instead of losing it is equivalent to winning 7.5% (or about one-thirteenth) of a match.

Break points are (fractions of) wins

Returning to the concept of BPOE, we can now say that 13 additional break points are equivalent to one additional win. Monfils’s 2018 tally of +23 was good for almost two extra victories over the course of the season, and Djokovic’s count of -21 would, on average, cost him 1.5 matches. Given the multitude of other factors influencing each man’s performance, it’s unreasonable to expect either player’s won-loss record in 2019 to bounce back so predictably and precisely. (Especially since it’s impossible to win 1.5 matches.) But in the unlikely event that all else is equal, we should expect those advantages and disadvantages to disappear in the new season.

The range of minus-21 to plus-23 break points is a decent representation of how extreme break point luck can be. Since 2009, only four players have posted single-season numbers above +23, including the most extreme BPOE of +34, accumulated by Damir Dzumhur in 2017. (Dzumhur was hit hard by the ensuing reverse in fortune: His 2017 tour-level record was 37-24, but in 2018, when his BPOE fell to a still-lucky +8, his record dropped to 25-31.) At the opposite extreme, Dominic Thiem suffered from a tally of 28 break points below expectations in 2015. A year later, he bounced back to minus-5, and his ranking improved from 19th to 9th. Despite the roller-coaster descents and climbs of Dzumhur and Thiem, the range of the break-point-luck effect appears to be about five wins, from about minus-2 wins at the low end to plus-3 for the players most favored by fortune.
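The conversion from break points to wins is simple enough to spell out. This sketch just multiplies each BPOE figure quoted above by the 7.5% working estimate of break point leverage; the function name `bpoe_to_wins` is my own, and the result is only as good as that approximation.

```python
BREAK_POINT_LEVERAGE = 0.075  # working approximation from the text, not a measured constant


def bpoe_to_wins(bpoe, leverage=BREAK_POINT_LEVERAGE):
    """Convert break points over expectation into an approximate number of wins."""
    return bpoe * leverage


# BPOE figures quoted in the text.
seasons = {
    "Monfils 2018": 23,
    "Djokovic 2018": -21,
    "Dzumhur 2017": 34,
    "Thiem 2015": -28,
}

for label, bpoe in seasons.items():
    print(f"{label}: {bpoe:+d} break points, roughly {bpoe_to_wins(bpoe):+.2f} wins")
```

Passing a different `leverage` value, such as the 6.5% or 10.5% estimates discussed earlier, shows how sensitive the wins figure is to that choice.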

For most players in most seasons, however, break point luck is little more than a rounding error. And while it’s easy to get sucked into the measurements I’ve laid out, that’s the most important point of all: Just like there’s no special tiebreak factor, there’s no reason to think that certain players are somehow better at break points than others. The better a player’s return game, the more break points he’ll convert. Anything beyond that will eventually regress to the mean. And for players with extremely strong or weak break point performances, that regression is likely to have effects that extend to the overall won-loss record, ranking, and beyond.