Nick Kyrgios is More Predictable Than We Think

Italian translation at settesei.it

There is a persistent belief among tennis fans and commentators that some players are particularly inconsistent. For today’s purposes, I’m talking about match-to-match results, the players who have a knack for upsetting higher-ranked opponents but are also particularly susceptible to losses against weaker players. We have a range of words for this, like unpredictable, dangerous, tricky, and the preferred term for Nick Kyrgios: mercurial.

So far in 2019, Kyrgios has provided a perfect example of the inconsistent type. After early losses to Jeremy Chardy and Radu Albot, he bounced back to win last week’s ATP 500 in Acapulco, knocking out Rafael Nadal, Stan Wawrinka, John Isner, and Alexander Zverev. There’s no question that the Australian possesses more talent than his ranking would suggest. This is a guy who has yet to crack the top ten, but holds a .500 record in completed matches against the Big 3, a feat managed by no other active player (minimum 5 matches, excepting Nadal and Novak Djokovic themselves).

He sounds inconsistent. His results look unpredictable. But compared to the uncertainty that comes with every tennis match between highly-ranked professionals, how does he stack up? As my headline suggests, it’s not as clear-cut as it seems.

Measuring predictability

Consider the opposite type, a player who reliably beats lower-ranked opponents and usually loses against his betters. Roberto Bautista Agut has this type of reputation. As we’ll see, the numbers bear it out, notwithstanding his Doha upset of Djokovic a couple of months ago. If someone really is so predictable, that should show up in a comparison of his pre-match forecasts to his results. For a Bautista Agut type, the forecasts would be particularly accurate, while for a Kyrgios type, the forecasts would be much less reliable.

We already have a metric for this. Brier Score measures the accuracy of forecasts, considering not just how often predictions proved correct, but how close they came. For instance, after Kyrgios beat Zverev in Saturday’s Acapulco final, those prognosticators who gave the Aussie a 90% chance of winning were “more” correct than those who gave him a 60% shot. On the other hand, too much confidence runs the risk of a worse Brier Score–if you’re always giving tennis favorites a 90% chance of winning, you’ll often be wrong. Brier Score is the average of the squared difference between the pre-match forecast (e.g. 90%) and the result (1 or 0, depending on whether the pick was correct).

Brier Scores for ATP forecasting hover around the 0.2 mark. A lower Brier Score is better, representing less difference between prediction and results, so if you can come in much lower than 0.2, you should be making money betting on matches. If you’re much higher than 0.2, you might as well be flipping a coin. If we use random, 50/50 pre-match predictions, the resulting Brier Score is 0.25.
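To make the definition concrete, here is a minimal sketch of the calculation. The function name and the sample forecasts are illustrative assumptions, not the code behind the numbers in this post. Each forecast is the probability given to one player, and the outcome is 1 if that player won and 0 otherwise, which is equivalent to scoring "the pick" as described above.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 results."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


# One match, won by the player each forecaster favored:
confident = brier_score([0.9], [1])  # squared error of a 0.1 miss
cautious = brier_score([0.6], [1])   # squared error of a 0.4 miss

# Coin-flip forecasts cost 0.25 per match no matter who wins.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])

print(confident, cautious, coin_flip)
```

The confident forecaster is penalized about 0.01 and the cautious one about 0.16, while the coin-flipper lands on exactly 0.25 regardless of results.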

Brier-gios

If a player is truly unpredictable, the Brier Score for his matches should approach the 0.25 mark, and it should definitely exceed the tour-typical 0.2. To measure the reliability of pre-match forecasts for Kyrgios and other players, I used my surface-weighted Elo ratings for every completed tour-level main draw match since 2000 and generated percentage forecasts for each one. By this method, Zverev had a 67.4% probability of winning the Acapulco final.
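For readers who want to see how a pair of ratings becomes a percentage, the sketch below uses the standard logistic Elo formula with a 400-point scale. That formula is an assumption here: the post does not spell out how the surface-weighted ratings are blended, and the function names are hypothetical.

```python
import math


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for(prob):
    """Rating advantage that corresponds to a given win probability."""
    return 400.0 * math.log10(prob / (1.0 - prob))


# Rating edge implied by a 67.4% forecast, under the assumed formula.
gap = rating_gap_for(0.674)

# Round trip: the baseline of 2000 is arbitrary, only the gap matters.
check = elo_win_prob(2000 + gap, 2000)

print(round(gap, 1), round(check, 3))
```

Only the difference between the two ratings matters, so the same forecast applies whether the players are rated 2000 and 1900 or 1700 and 1600.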

So far in 2019, Kyrgios does look truly unpredictable. The Brier Score of his ten match results is 0.318, meaning that we’d have done better by simply flipping a coin to forecast the result of each of his matches. Even if we retroactively increase his chances of winning each match to account for the fact that he’s playing better than his Elo rating predicted, the Brier Score is 0.277, still worse than coin flips.

On the other hand, it’s just ten matches. Several other players have 2019 Brier Scores well over the 0.25 threshold, including Frances Tiafoe, Joao Sousa, Juan Ignacio Londero, and Felix Auger Aliassime. In a handful of tournaments, you’ll always get a few oddball results, either because of marked improvements (as is likely with Auger Aliassime) or extreme good or bad luck. Unless we’re willing to say that Sousa and Londero are remarkably unpredictable players, we shouldn’t draw the same conclusion based on Kyrgios’s last ten matches.

What you predict is what you get

The Brier Score for Elo-based forecasts of Kyrgios’s career matches at tour level is 0.219. That’s higher–and thus less predictable–than average, but not by that much. Of the 280 players with at least 100 tour-level matches this century, Kyrgios ranks 84th, more reliable than 30% of his peers. In 2017, his results were quite unpredictable, with a Brier Score of 0.244, but in 2015 and 2016 they generated a more pedestrian 0.210, and last year they looked downright predictable, at 0.177.

The Australian may be quite unpredictable in tactics, point-to-point performance, or on-court behavior, but his results just aren’t that unusual. The following table shows the 15 most unpredictable active players, as measured by Brier Score, along with Kyrgios, followed by the 15 most predictable active players:

Player                 Matches  Brier  
Lucas Pouille              189  0.247  
Andrey Rublev              106  0.245  
Benoit Paire               377  0.239  
Ivo Karlovic               650  0.239  
Stefanos Tsitsipas         100  0.232  
Karen Khachanov            154  0.231  
Peter Gojowczyk            102  0.231  
Federico Delbonis          225  0.227  
Marius Copil               108  0.227  
Damir Dzumhur              173  0.227  
Ernests Gulbis             420  0.226  
Pablo Cuevas               338  0.226  
Mischa Zverev              297  0.226  
Joao Sousa                 323  0.226  
Borna Coric                210  0.226  
...                                       
Nick Kyrgios               191  0.219  
...                                       
Matthew Ebden              171  0.188  
David Goffin               344  0.188  
Marin Cilic                684  0.186  
Richard Gasquet            770  0.183  
Tomas Berdych              911  0.182  
Milos Raonic               448  0.178  
David Ferrer              1048  0.177  
Jo Wilfried Tsonga         600  0.175  
Roberto Bautista Agut      384  0.172  
Kei Nishikori              517  0.167  
Juan Martin Del Potro      560  0.160  
Andy Murray                802  0.146  
Roger Federer             1350  0.121  
Novak Djokovic             951  0.117  
Rafael Nadal              1060  0.114 

Lucas Pouille’s results have been almost impossible to forecast. The Brier Score generated by his 2018 results was nearly 0.3, suggesting it would have been smarter to calculate a forecast and then bet against it! Ivo Karlovic also shows up among the less reliable players, though it’s not clear whether that’s due to his unusual game style. Isner, the only decent parallel we have, is as reliable as the tour in general, with a career Brier Score of 0.201. Reilly Opelka, the other towering ace machine in the ATP top 100, has defied the odds so far in 2019, but he hasn’t yet amassed enough data to draw any conclusions.

At the other end of the spectrum, the most reliable players are many of the best. That adds up: A dominant player not only wins most of the matches he should, but his performance also allows us to make more aggressive forecasts. Nadal often enters matches with a 90% or better probability of winning, and confident predictions like that–as long as the player converts them into wins–are what generate the lowest Brier Scores.
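A short calculation shows why heavy favorites drag Brier Scores down. If a forecast of p is perfectly calibrated, so that the favorite really does win with probability p, the expected squared error works out to p(1 - p). The sketch below assumes that perfect calibration, which real forecasts only approximate.

```python
def expected_brier(p):
    """Expected squared error of forecast p when the favorite truly wins with probability p."""
    # Favorite wins (probability p): error is (1 - p).
    # Favorite loses (probability 1 - p): error is p.
    return p * (1 - p) ** 2 + (1 - p) * p ** 2  # simplifies to p * (1 - p)


for p in (0.5, 0.67, 0.8, 0.9):
    print(f"forecast {p:.0%}: expected Brier {expected_brier(p):.3f}")
```

A well-calibrated 90% forecast carries an expected score of 0.09, against 0.25 for a true toss-up, so a player who is routinely a 90% favorite will look "predictable" even though he is upset just as often as the forecast says he should be.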

Consistent consistency results

We all tend to read too much into unusual results. Kyrgios has given us plenty of those, and we’ve repaid the favor by making him out to be even more of a wild card than he is. A couple of weeks ago, I took on a similar question and found that ATPers don’t really “play their way in” to tournaments, earning better or worse results in different rounds. This isn’t quite the same issue, but it all comes back to similar truths: Existing forecasts are pretty good, there’s always going to be a lot of randomness in the results, and the stories we invent to account for the randomness don’t really explain much at all.

Kyrgios is an immensely interesting player–I joked in yesterday’s podcast that readers should prepare themselves for a ten-part series–and digging into his point-by-point stats could reveal characteristics that are unique among tour players. That is still true. But at the match level, the likelihood that his contests will end in upsets isn’t unique at all–even if he is the proud new owner of a sombrero that says otherwise.

Podcast Episode 51: 100 For Federer, and 50.4% for Kyrgios

Episode 51 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, starts with a discussion of Roger Federer’s 100th title, and the chances that he’ll reach Jimmy Connors’s record of 109. We also consider the odds that Rafael Nadal and Novak Djokovic will join him in the triple-digit title club.

We continue with a lot of Nick Kyrgios talk, attempting to make sense of his ability to win matches despite losing the majority of points, as well as his abilities to unsettle opponents and cause commentators to say questionable things. We also do a bit of an Indian Wells preview to close out the hour.

Finally: My apologies for the poor editing of last week’s episode. You won’t run into any similar issues with this one.

Thanks for listening!

(Note: this week’s episode is about 62 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Around the Net, Issue 3

Around the Net is my attempt to provide a clearinghouse for tennis analytics on the web. Each week, you’ll find a summary of recent articles, podcasts, papers, and data sources, as well as trivia and the occasional bit of interesting non-tennis content. If you would like to suggest something for a future issue, drop me a line.

Articles

Multimedia

Data

  • Match Charting Project: The dataset has grown by more than 50 matches in the last week, from 5,143 to 5,194. Highlights include the 100th charted Elina Svitolina match, all of last week’s tour-level finals, and several classic Pete Sampras Wimbledon matches, which round out our complete set of his semi-finals and finals at the All-England Club.

Trivia

  • Nick Kyrgios beat Rafael Nadal in Acapulco, but he didn’t exactly play better than the Spaniard. Nadal’s dominance ratio (DR) in the match was 1.36, higher than in any loss of his career. (Usually a DR larger than 1.0 corresponds with a win.)
  • The Acapulco upset means that Kyrgios improved his record in completed matches against the Big Three to 6-6. Only three other players (plus Djokovic and Nadal) have won at least half of their matches against the famous trio, minimum five matches. Kyrgios joins Alex Corretja, Yevgeny Kafelnikov, and Dominik Hrbaty.
  • Nadal wasn’t the only unlucky loser this week. Henri Kontinen and John Peers lost their first round doubles match in Dubai to Raja/Nedunchezhiyan despite winning 59% of total points–20 more than their opponents.
  • Gael Monfils discovered that a well-timed exclamation can give his forehand a bit of extra juice.
  • Last week in Bergamo, 17-year-old Jannik Sinner won his first Challenger title, becoming the youngest-ever champion from Italy, the first born in 2001, and the youngest since Alexander Zverev won his first Challenger at Braunschweig in 2014.
  • Felix Auger-Aliassime is a bit older, but by reaching the final in Rio de Janeiro, he became the first 2000-born player to crack the ATP top 100.

Beyond the Net

Thanks to Peter for help with this week’s issue.

The Best Draw That Money Can Buy

Italian translation at settesei.it

Last week featured two events on the WTA calendar. First, both chronologically and by every conceivable ranking except for “most Hungarian,” was the Dubai Open, a Premier 5 event offering over $500,000 and 900 ranking points for the winner. The other was the Hungarian Open in Budapest, a WTA International tournament with $43,000 and 280 ranking points going to the champion. No top player would seriously consider going to Budapest, even before considering potential appearance fees and WTA incentives.

Fifteen of the top twenty ranked women went to Dubai, and the top seed in Budapest, defending champ Alison Van Uytvanck, was ranked 50th. Every Budapest entrant ranked in the top 72 got a top-eight seed, including a couple of players who would have needed to play qualifying just to earn a place in the Dubai main draw.

The rewards offered by the Dubai event and supported by the structure of the WTA tour make this an easy scheduling decision for many players. But at some point, if the rest of the field is zigging toward the Gulf, might it be better to zag toward Central Europe? Van Uytvanck would have been an underdog to reach even the third round of the richer event, yet she defended her title in Budapest. Marketa Vondrousova, who would have been stuck in Dubai qualifying, reached the Hungarian Open final. Opting for the smaller stage almost definitely proved the wise choice for those two women. Did other, better-ranked players leave money or ranking points on the table?

Motivations

Scheduling decisions depend on a lot of factors. Some women might prefer to play the event with the highest-quality field, both to test themselves against the best and to give themselves an opportunity for the circuit’s richest prizes. Others might head for the marquee events because of their doubles prowess: Timea Babos was part of the top-seeded doubles team in Dubai, but was the lowest-ranked direct entry in singles. Still others might choose to play closer to home or at tournaments they’ve enjoyed in the past.

For all that, ranking points should come first, with prize money also among the top considerations. Ranking points determine one’s ability to enter future events and to remain on tour. Prize money covers the vast expense of bankrolling a traveling support staff.

Dubai-versus-Budapest offers a fairly “pure” experiment, because both are played on similar surfaces and neither event is in the middle of a mini-circuit of events in a single region. Yes, Dubai immediately follows Doha, but that trip requires a flight, and most players headed back to Europe or North America after the tournament. Opting for one event over the other doesn’t substantially complicate anyone’s travel plans, as it would for an ATPer to mix and match destinations from the South American golden swing and the simultaneous European indoor circuit.

Revealed preferences

Let’s see which of the two main factors played a bigger role in scheduling decisions last week. To determine each player’s options, I tried to reconstruct as much as possible what information each woman had at her disposal six weeks earlier, on January 7th, when entry applications and stated preferences for Dubai and Budapest were due. I used the January 7th rankings to project how a player would be seeded at either event, and Elo ratings as of that date to forecast how far she would advance in each draw.

The major difficulty of this kind of simulation is the composition of the draws themselves. From our vantage point after the events, we know who opted for each draw as well as which players were unable to compete. In early January, none but the best-connected players would have known which of their peers would head in which direction, and no one at all could have known that Caroline Wozniacki would be a late withdrawal from Dubai, or that a viral illness would knock Kirsten Flipkens out of the Hungarian Open. Still, the resulting 2019 draws were very similar to what players could have predicted based on the player fields in 2018. So to simulate each player’s options, we’ll use the fields as they turned out to be.
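The general shape of such a simulation can be sketched in a few lines. This is a simplified illustration, not the code used for the forecasts here. It assumes a field whose size is a power of two, a fully random draw with no seeding or byes, and the standard Elo win-probability formula. The player names and ratings are hypothetical.

```python
import random


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_matches_won(ratings, player, rng):
    """Play one random single-elimination draw; return matches won by `player`."""
    field = list(ratings)
    rng.shuffle(field)  # unseeded random draw, a simplification
    wins = 0
    while len(field) > 1:
        survivors = []
        for a, b in zip(field[0::2], field[1::2]):
            a_wins = rng.random() < elo_win_prob(ratings[a], ratings[b])
            winner = a if a_wins else b
            survivors.append(winner)
            if winner == player:
                wins += 1
        field = survivors
    return wins


def round_distribution(ratings, player, trials=10000, seed=0):
    """Estimate the probability of winning exactly 0, 1, 2, ... matches."""
    rng = random.Random(seed)
    n_rounds = len(ratings).bit_length() - 1  # field size must be a power of two
    counts = [0] * (n_rounds + 1)
    for _ in range(trials):
        counts[simulate_matches_won(ratings, player, rng)] += 1
    return [c / trials for c in counts]


# Hypothetical eight-player field; every name and rating is made up.
field = {"P1": 1950, "P2": 1900, "P3": 1880, "P4": 1850,
         "P5": 1820, "P6": 1800, "P7": 1780, "P8": 1750}
print(round_distribution(field, "P1"))
```

The returned list gives the estimated chance of each finish, from a first-round loss up to the title, which is the input needed to weigh one event against another.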

Let’s start with Carla Suarez Navarro, the highest-ranked woman (at the January 7th entry deadline) who wasn’t seeded in Dubai. She ended up reaching the quarter-finals at the Premier event, in part because Kristina Mladenovic did her the favor of ousting Naomi Osaka from that section of the draw. For her efforts, Suarez Navarro grabbed 190 ranking points and almost $60,000. She would have needed to win the Budapest title to garner more points. And with a champion’s purse of “only” $43,000 in Hungary, she would have needed to rob a bank to improve on her Dubai prize money check.

However, that isn’t what Suarez Navarro should have anticipated taking home from Dubai. Sure, she should be optimistic about her own potential, but smart scheduling demands some degree of realism. I ran simulations of both the Dubai tournament (before the draw was made, so she doesn’t always end up in Osaka’s quarter) and the Budapest event with the Spaniard as the top seed and the rest of the field (minus last-in Arantxa Rus) unchanged. These forecasts suggest that Suarez Navarro only had a 12% chance of reaching the Dubai quarters, and that her expected ranking points in the Gulf were much lower:

Event     Points  Prize Money  
Dubai         76     $28.121   
Budapest     111     $15.384

(prize money in thousands of USD)

In all of these simulations, I’ve calculated points and prize money as weighted averages. Suarez Navarro had a 37% chance of a first-round loss, so that’s a 37% chance of one ranking point and first-round-loser prize money. And so on, for all of the possible outcomes at each event. For the Spaniard, her expected ranking points were nearly 50% higher as the top seed in Budapest. But because the Dubai prize pot is so much larger, her expected check was almost twice as big at the tournament she chose.
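As a sketch of that weighted average: multiply the probability of each finish by what that finish pays, then add up the results. The round probabilities below are hypothetical, chosen only so that the first-round-loss figure matches the 37% mentioned above, and they are not meant to reproduce the table. The points schedule is also an assumption: only the 1 point for a first-round loss and the 280 for the title are confirmed in this post.

```python
def expected_value(round_probs, payouts):
    """Probability-weighted average payout across all possible finishes."""
    if len(round_probs) != len(payouts):
        raise ValueError("one payout per possible finish is required")
    if abs(sum(round_probs) - 1.0) > 1e-9:
        raise ValueError("finish probabilities must sum to 1")
    return sum(p * v for p, v in zip(round_probs, payouts))


# Finishes in order: lose R32, lose R16, lose QF, lose SF, lose F, win title.
# Hypothetical probabilities; only the 0.37 comes from the text above.
finish_probs = [0.37, 0.28, 0.17, 0.10, 0.05, 0.03]

# Assumed points schedule for a 32-draw International event.
points = [1, 30, 60, 110, 180, 280]

expected_points = expected_value(finish_probs, points)
print(round(expected_points, 2))
```

Expected prize money works the same way, with each round's check swapped in for its points.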

Consistent incentives

The total purse in Dubai was more than eleven times bigger than the prize money on offer in Hungary, while the points differed by only a factor of three. Thus, it’s no surprise that Suarez Navarro’s incentives are representative of those faced by many more women. I ran the same simulations for 26 more players: All of the competitors who gained direct entry into Dubai but were unseeded, plus Bernarda Pera, who would have been seeded in Budapest but instead played qualifying in the Gulf.

The following table shows each player’s expected points and prize money for Dubai (D-Pts and D-Prize), along with the corresponding figures for Budapest (B-Pts and B-Prize):

Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Dominika Cibulkova           96   $36.794     130   $18.291   
Lesia Tsurenko               84   $31.528     119   $16.695   
Carla Suarez Navarro         76   $28.121     111   $15.384   
Aliaksandra Sasnovich        75   $27.920     111   $15.364   
Dayana Yastremska            72   $26.716     107   $14.803   
Anastasia Pavlyuchenkova     72   $26.590     106   $14.721   
Barbora Strycova             67   $24.809     102   $14.096   
Donna Vekic                  66   $24.143     100   $13.717   
Katerina Siniakova           63   $23.157      95   $13.062   
Ekaterina Makarova           58   $21.543      90   $12.265   
                                                              
Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Petra Martic                 57   $21.019      88   $11.960   
Su Wei Hsieh                 54   $19.863      84   $11.396   
Belinda Bencic               53   $19.813      84   $11.372   
Ajla Tomljanovic             53   $19.530      82   $11.181   
Shuai Zhang                  49   $18.350      77   $10.416   
Sofia Kenin                  46   $17.109      72    $9.659   
Ons Jabeur                   45   $17.077      71    $9.624   
Viktoria Kuzmova             45   $17.009      70    $9.432   
Alize Cornet                 44   $16.823      69    $9.280   
Saisai Zheng                 40   $15.436      62    $8.307   
                                                              
Player                    D-Pts   D-Prize   B-Pts   B-Prize   
Vera Lapko                   37   $14.618      57    $7.695   
Mihaela Buzarnescu           36   $14.465      56    $7.548   
Alison Riske                 35   $14.309      55    $7.445   
Kristina Mladenovic          34   $13.910      51    $6.969   
Timea Babos                  32   $13.354      48    $6.572   
Yulia Putintseva             32   $13.407      48    $6.484   
Bernarda Pera*               25   $11.830      36    $5.061

Every single player could have expected more points in Budapest and more money in Dubai. The ratios are all similar to Suarez Navarro’s. The one possible exception is Pera (hence the asterisk). My simulation assumed she came through qualifying to make the main draw, and calculated only her expected points and prize money from main draw matches. Yet simply qualifying for the main draw is worth 30 ranking points, plus whatever points a player earns by winning main draw matches. Pera was no lock to qualify, but she was favored, and usually a couple of lucky loser spots make the main draw even more achievable. It’s possible that if we ran all those scenarios, Pera is the one player for whom Dubai offered better hopes of prize money and points.

Loss aversion and game theory

It’s no accident that Van Uytvanck was one of the few players to choose the high-points, low-prize money route. She was defending 280 points from last year’s Hungarian Open, meaning that opting for a bigger check in Dubai would have a negative impact on her ranking. The thought of losing a couple hundred ranking points has a greater influence on behavior than the chance of gaining the same amount for a player who has few to defend.

For the majority of women who will face the same decision in 2020 without many points to defend, what should they do? Assuming, as I do, that they and their coaches will all carefully study this article, what happens if more top-70 players decide to chase ranking points and flock to the smaller event?

If the Budapest field gets stronger, each entrant’s expected points and prize money will decrease; if Dubai’s field weakens, each player there can anticipate a better chance of more points and even more money. As the entry system is currently structured, in which each player must state their preferences without knowledge of their peers’ choices, we can’t count on reaching an equilibrium. Even if every single player aimed solely to maximize ranking points, there wouldn’t be enough information available to reliably make the right choice. It’s conceivable, though unlikely, that a Budapest could attract a stronger field and end up offering lower expected prize money checks and ranking points.

But don’t fret, dear readers and schedule optimizers. There are external factors and there always will be. And in this case, virtually all of those factors pull players to the bigger money event. (Even Hungarian heroine Babos skipped her home tournament.) At least a half-dozen of the players listed above are doubles elites, making it likely they’ll choose the Premier event. Others–probably many others–will go where the money is, because they like money.

Even those who don’t play doubles and don’t like money will chase the biggest available pot of ranking points, not entirely unlike the way people play the lottery. The WTA offers a very limited set of opportunities to earn 900 points in a single week. You can get close to 900 points with three International championships, but there’s a finite number of weeks on the annual schedule–not to mention a limited number of matches in each player’s body! Lots of people stock up on lottery tickets despite unfavorable odds, and players will continue to enter higher-profile events even if their expected points are higher on smaller stages. The chance of a prestigious title, however slim, doesn’t show up in a purely actuarial calculation.

The success of Belinda Bencic–expected Dubai points, 53; expected Budapest points, 84; actual Dubai points, 900–will keep players chasing the big prizes. That’s good news for level-headed would-be optimizers. Those players willing to forgo the skyscrapers, the shopping malls, and the prize money next year aren’t about to lose this opportunity. Budapest will almost certainly remain a better option for players who want to improve their ranking.

Belinda Bencic Won a Historically Difficult Title, Just Not Last Week

Italian translation at settesei.it

Belinda Bencic is back among the WTA elites. Last week in Dubai, she won her first Premier-level title since 2015, knocking out four top-ten players in the process. They were hardly dominant victories, with all four going to deciding sets and two of the four culminating in final-set tiebreaks, but there is no question that the 21-year-old Swiss is once again a threat at the tour’s biggest events.

Her string of top-ten victories leaves us to wonder how her title stacks up against similar feats in the past. Most relevant is the path Bencic took to her last Premier title, the 2015 Canadian Open. Four years ago in Toronto, she defeated four members of the top six, including then-top-ranked Serena Williams in the semi-final and Simona Halep in the championship match. Even the two lower-ranked opponents she faced that week were dangerous players then ranked in the top 25, Eugenie Bouchard and Sabine Lisicki. Those two presented more serious challenges than Bencic’s first two matches last week against Lucie Hradecka and Stefanie Voegele.

Spoiler alert: Toronto was the tougher path. It wasn’t the most difficult of all time, but it’s in the conversation. Bencic’s Dubai title surely wasn’t easy, but it wasn’t quite as unusual as last weekend’s press made it out to be.

Quantifying path difficulty

This is something we’ve done before. I’ve written several articles comparing the quality of opposition faced in slams, particularly as it applies to the ATP’s big three. It’s more complicated to compare all WTA events, in part because there are so many different levels of tournament, and the categorizations have changed over the years. But we can wave some of that aside for today’s purposes.

Here’s the simple algorithm to measure the difficulty of a player’s path to a title:

  • Pick a standard Elo rating for the type of tournament won. (In this case, we’re using 1900 for hard-court wins. We’d use lower numbers for clay and grass, but it gets complicated, and it’s more practical for today’s purposes to focus solely on hard-court events.)
  • Find the surface-weighted Elo rating of each opponent she played in the tournament.
  • For each opponent, calculate the odds that a player with the standard Elo rating would beat a player with the opponent’s Elo rating.
  • Calculate the difficulty for each match as one minus the odds in the previous step.
  • Sum the single-match difficulties.
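The steps above can be sketched as follows. The win-probability function is the standard Elo formula, which is an assumption about how the "odds" are computed. The sample path is hypothetical apart from the 1900 benchmark and the 2054 rating the post gives for Halep.

```python
STANDARD_ELO = 1900  # benchmark rating for hard-court titles, per the list above


def elo_win_prob(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def path_difficulty(opponent_elos, standard=STANDARD_ELO):
    """Sum of (1 - win probability) for a benchmark player against each opponent."""
    return sum(1.0 - elo_win_prob(standard, opp) for opp in opponent_elos)


# Hypothetical six-match path; only the 2054 figure appears in the article.
sample_path = [1650, 1700, 1950, 2000, 2054, 2000]
print(round(path_difficulty(sample_path), 2))

# Sanity check: six opponents rated exactly at the benchmark score 3.0,
# because each match is a coin flip worth 0.5.
print(path_difficulty([1900] * 6))
```

Because each match adds between 0 and 1 to the total, longer paths can score higher, which is why the number of matches played matters when comparing titles.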

In the grand slam exercises I’ve done in the past, I’ve taken a final step of normalizing the results so that an average major title is exactly 1.0. Here, the idea of ‘average’ is more nebulous, so we’ll leave our results un-normalized.

The average difficulty of a hard-court title (excluding majors and year-end championships) is about 1.8. Bencic’s 2015 Toronto run was 3.64, and her path last week was 3.01.

It’s hotter in Miami (and Indian Wells)

One of the variables that influences path difficulty is number of matches. Bencic played six last week (as she did at the 2015 Canadian Open), but the top eight seeds played only five. At Indian Wells and Miami, the top 32 seeds play up to six matches, but those might be expected to present more challenges than Bencic’s six in Dubai, since the round-of-64 opponent has already won a match.

Certainly it has turned out that way. Here are the top ten most difficult hard-court WTA title paths since 2000:

Year  Event          Winner             Matches  Difficulty  
2010  Miami          Kim Clijsters            6        3.80  
2011  Miami          Victoria Azarenka        6        3.78  
2007  Miami          Serena Williams          6        3.65  
2015  Canadian Open  Belinda Bencic           6        3.64  
2012  Indian Wells   Victoria Azarenka        6        3.59  
2018  Cincinnati     Kiki Bertens             6        3.54  
2000  Miami          Martina Hingis           6        3.46  
2002  Miami          Serena Williams          6        3.45  
2008  Miami          Serena Williams          6        3.37  
2013  Miami          Serena Williams          6        3.35

Seven of the ten are from Miami, an event with a grand-slam-like field. Indian Wells is similar, but featured a weaker draw for most of the 21st century because Serena and Venus Williams chose not to play there. Bencic’s Toronto run is one of only two in the top ten outside of the March sunshine swing. The other is Kiki Bertens’s path to last year’s Cincinnati title, in which she also defeated Halep, Petra Kvitova, and Elina Svitolina, albeit not in quite the same order as Bencic did last week.

Also hot in Dubai

I calculated title difficulty for about 600 hard-court champions going back to 2000. Bencic’s Dubai path doesn’t register among the very most challenging, but it still stands above most of the pack. Here are the next 25 toughest routes, including every path rated a 3.0 or above:

Year  Event         Winner              Matches  Difficulty  
2016  Wuhan         Petra Kvitova             6        3.32  
2000  Indian Wells  Lindsay Davenport         6        3.32  
2014  Beijing       Maria Sharapova           6        3.30  
2008  Olympics      Elena Dementieva          6        3.27  
2009  Indian Wells  Vera Zvonareva            6        3.27  
2007  Indian Wells  Daniela Hantuchova        6        3.23  
2002  Filderstadt   Kim Clijsters             5        3.23  
2013  Beijing       Serena Williams           6        3.21  
2018  Doha          Petra Kvitova             6        3.18  
2002  Los Angeles   Chanda Rubin              5        3.18  
2000  Los Angeles   Serena Williams           5        3.16  
2009  Miami         Victoria Azarenka         6        3.15  
2003  Miami         Serena Williams           6        3.13  
2002  Indian Wells  Daniela Hantuchova        6        3.10  
2018  Wuhan         Aryna Sabalenka           6        3.08  
2008  Indian Wells  Ana Ivanovic              6        3.08  
2012  Tokyo         Nadia Petrova             6        3.08  
2010  Sydney        Elena Dementieva          5        3.06  
2010  Indian Wells  Jelena Jankovic           6        3.03  
2000  Sydney        Venus Williams            6        3.02  
2000  Sydney        Amelie Mauresmo           4        3.02  
2019  Dubai         Belinda Bencic            6        3.01  
2009  Tokyo         Maria Sharapova           6        3.00  
2002  San Diego     Venus Williams            5        3.00  
2001  Sydney        Martina Hingis            4        2.99

There’s Belinda again, at 32nd overall. Historically, the February tournaments in the Gulf haven’t been the toughest on the calendar, at least compared with Indian Wells, Miami, and Sydney. Yet Kvitova took an even more difficult path to the title last year in Doha. (Dubai and Doha trade tournament levels each year. As a Premier 5, Doha was worth more points in 2018; Dubai took over the status and was worth more points in 2019.) She also plowed through four top-ten opponents, and she needed to beat 33rd-ranked Agnieszka Radwanska just to earn a place in the round of 16.

Strong but weaker

Again, Bencic’s Dubai title was an impressive feat. But as we’ve seen, it pales in comparison with her previous Premier title. I suppose she might have won anyway if faced with more difficult competition, but that pair of third-set tiebreaks suggests she was pushed to the limit as it was.

While the current WTA field is extremely deep, packed with very good players, the lack of one historically great superstar (or more!) shows up in the Elo ratings. Of the 35 champions shown in the two tables above, 12 had to beat a player with a surface-weighted rating of 2240 or higher, and 12 more needed to get past an opponent rated 2100 or above. Bencic’s toughest task last week was Halep, at 2054. While it isn’t easy to knock off several consecutive foes in the 2000 range, it’s not the same as including one victory over a superstar like Serena, Venus, Maria Sharapova, or Victoria Azarenka at her peak.

At the 2015 Canadian Open, Bencic counted Serena among the vanquished. Maybe in another four years, when the Swiss is due for her next odds-defying Premier title, she’ll face down a couple of new young superstars and earn a place at the top of this list.

Podcast Episode 50: Easy Draws, Tough Draws, and the Difficulty of Forecasting

Episode 50 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, grapples with familiar questions of forecasting, and the difficulty of incorporating various levels of achievement into our predictions. Belinda Bencic won a major title after plowing through some of the WTA’s strongest competition, while four ATP finalists–Laslo Djere, Felix Auger-Aliassime, Radu Albot, and Daniel Evans–enjoyed breezier paths to trophy ceremonies and sharp rises on the ranking table.

We also talk a bit about the return of Federer and Nadal in Dubai and Acapulco, respectively, possible explanations for the (apparent) weakness in clay-court ATP 500s, our latest evaluation of Casper Ruud, and the lack of tennis analytics at this week’s Sloan conference.

Thanks for listening!

(Note: this week’s episode is about 64 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Around the Net, Issue 2

Around the Net is my attempt to provide a clearinghouse for tennis analytics on the web. Each week, you’ll find a summary of recent articles, podcasts, papers, and data sources, as well as trivia and the occasional bit of interesting non-tennis content. If you would like to suggest something for a future issue, drop me a line.

Articles and Papers

Multimedia

Data

  • Match Charting Project: The dataset has grown by 60 matches in the last week, from 5,083 to 5,143. Highlights include the 100th charted Petra Kvitova match, making her the 7th woman to become so well represented. We’ve also continued filling out the historical record of grand slam semi-finals, including a 1981 clash between Jimmy Connors and Bjorn Borg.

Trivia

  • Last week’s New York semi-final between John Isner and Reilly Opelka set plenty of records, the number of which is probably limited only by our imaginations. First, their 59 tiebreak points tied a best-of-three record. Unsurprisingly, Isner (and Jeremy Chardy) held the previous record as well.
  • The Isner-Opelka tilt also set the record for most aces (81) in a best-of-three match–breaking another of Isner’s marks–and was also the first best-of-three match in which both players hit at least 37 aces.
  • Marco Cecchinato has somehow won three tour-level titles (and reached a Roland Garros semi-final!) with only 33 tour-level match wins. By contrast, Julien Benneteau won 273 tour-level matches but nary a title.
  • Since 2008, Fabio Fognini has played at least part of the South American golden swing every year but one. But 2019 was the first time he suffered three straight first-round exits, despite entering each event as a top-two seed.

Beyond the net

Thanks to Peter, Jeff, and Carl for help with this week’s issue.

Dominic Thiem, Tennys Sandgren, and Playing Your Way In

Dominic Thiem is one of the best clay-court players on earth, with eight titles and a Roland Garros final to his credit. But his impressive track record wasn’t worth much last night, when he lost his opening-round match in Rio de Janeiro. The straight-set defeat to 90th-ranked Laslo Djere calls to mind other first-match failures, such as Thiem’s loss to Martin Klizan last summer in Kitzbuhel, or his truly gobsmacking upset at the hands of 222nd-ranked Ramkumar Ramanathan on grass in Antalya two years ago.

It’s also not the first time this season that a top seed has proven unable to live up to their billing. Two weeks ago, the No. 1 seeds in three different ATP events all lost their first matches. I dug a bit deeper and discovered that top seeds underperform by a modest amount at these smaller tournaments. Rio is technically a higher-profile event, but the result is the same: An elite player at a non-mandatory event, heading home early.

You’ll hear all sorts of theories for this sort of thing. In ATP 250s, when top seeds get a bye, it’s possible that the elites are in danger because their opponents have played their way into form. At any optional event, it’s possible that the top seeds are not particularly motivated, making the trip for a quick appearance fee and nothing more. Finally, there’s the old saw that some competitors need to get used to their surroundings. In other words, they need to “play their way in” to the tournament. It’s this last theory that I’d like to investigate.

Present and prepared

If a player needs time to get comfortable, we would expect him to underperform in the first round, and possibly continue playing below average to a lesser extent in the second round. The flip side of that is that the player would need to overperform in later rounds–if he didn’t, the earlier underperformance wouldn’t be below average, it would just be bad. These under- and over-performances are effects we can quantify.

Let’s start with Thiem. I went through his career results at the ATP level and broke his matches into several categories (some overlapping), like first match, second match, first match at a non-mandatory event, second-or-later match, finals, and so on. For each of those categories, I tallied up his results and compared them to expectations (Expected Wins, or “ExpWins” in the table), based on what Elo forecasted at the time. Here are Thiem’s results:

Category     Matches  ExpWins  Wins  
1st              141     94.3    94  
1st (small)       84     52.9    54  
1st/2nd          238    151.3   151  
2nd               97     59.9    60  
2nd+             203    117.7   118  
3rd               58     34.9    35  
3rd+             106     60.7    61  
4th               32     18.5    19  
Finals            17     10.2    10

The Austrian has been almost comically predictable. In 84 non-mandatory tournaments through last week, Elo expected that he would win his first match 53 times. He won 54. In all tournaments, he has won his first match 94 times, exactly in line with the Elo estimation. In the nine categories shown here, his performance was never more than 1.1 matches better or worse than expected. If he’s playing his way into tournaments, he’s doing it in a way that doesn’t show up in the results.
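For anyone who wants to build a table like this one, here is a minimal sketch of the bookkeeping. It assumes each match already comes with a pre-match win probability; the standard Elo formula is shown as one way to get that number. The field names, ratings, and results below are made up for illustration, and this is not necessarily how the table above was generated.

```python
from collections import defaultdict


def elo_win_prob(rating, opp_rating):
    """Standard Elo expectation on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def tally_by_category(matches):
    """Sum forecasts and results per category.

    Each match is a dict with hypothetical keys:
      'categories': labels such as '1st' or '2nd+' (a match may carry several)
      'win_prob':   pre-match forecast for the player, between 0 and 1
      'won':        True if the player won
    """
    totals = defaultdict(lambda: {"matches": 0, "exp_wins": 0.0, "wins": 0})
    for match in matches:
        for category in match["categories"]:
            row = totals[category]
            row["matches"] += 1
            row["exp_wins"] += match["win_prob"]  # expected wins = sum of forecasts
            row["wins"] += int(match["won"])
    return dict(totals)


# Made-up ratings and results, purely for illustration.
sample_matches = [
    {"categories": ["1st", "1st/2nd"],
     "win_prob": elo_win_prob(2100, 1900), "won": True},
    {"categories": ["2nd", "1st/2nd", "2nd+"],
     "win_prob": elo_win_prob(2100, 2100), "won": False},
]
sample_table = tally_by_category(sample_matches)
for category, row in sample_table.items():
    print(f"{category:<10}{row['matches']:>4}{row['exp_wins']:>8.1f}{row['wins']:>5}")
```

Because the categories overlap, a single match can add to several rows at once, which is why the row totals in a table like this don't sum to the player's career match count.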

What about Tennys?

Thiem has suffered some rough early-round upsets, but over the course of his career, he’s usually ended up on the winning side. Maybe we’d do better to focus on a true feast-or-famine player, someone who more often loses his first-round encounters, but is dangerous when he advances further.

A great recent example of such a player is Tennys Sandgren. The American raced to the quarter-finals of last year’s Australian Open, reached a final in Houston, and won a title in Auckland to start the 2019 season. Other than that, he rarely turns up on the tennis fan’s radar. He acknowledged his inconsistency on a recent Thirty Love podcast, explaining from a player’s perspective why he thinks his results are so erratic. Like Thiem, he lost easily in an opening match last night, winning only four games against Reilly Opelka in Delray Beach.

Sandgren’s round-by-round results are less predictable than Thiem’s, but even for such an apparently extreme example of the go-big-or-go-home-early phenomenon, the numbers don’t offer much support. Because Sandgren has played fewer tour events than Thiem, I included his Challenger results before separating his matches into the same categories:

Category     Matches  ExpWins  Wins  
1st              124     64.7    62  
1st (small)      113     60.2    60  
1st/2nd          186     96.4    98  
2nd               62     31.7    36  
2nd+             120     60.3    63  
3rd               35     17.3    15  
4th               15      7.3     9  
Finals             8      4.2     3

The American has underperformed a bit in his first matches and beaten expectations in his second rounders, but the effect disappears after two matches are in the books. In any case, none of the over- or under-performances are even close to statistically significant. His extra first-match losses have about a one-in-three probability of happening by chance, and his bonus second-match wins would occur about one time in six. There could be something interesting going on here, but the effects are small, and it’s very likely that we’re seeing nothing more than randomness.
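One rough way to put a number on "happening by chance" is a binomial tail. The sketch below makes a simplifying assumption: every match in a category is treated as having the same win probability, namely expected wins divided by matches. A more faithful version would use each match's own forecast, and nothing here should be read as the exact method behind the figures quoted above. The function names are my own.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k wins in n matches, each won with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def tail_probability(matches, exp_wins, wins):
    """One-sided chance of a result at least as far from expectation as observed.

    Assumption: all matches share the average forecast exp_wins / matches.
    """
    p = exp_wins / matches
    if wins >= exp_wins:
        outcomes = range(wins, matches + 1)  # this many wins or more
    else:
        outcomes = range(0, wins + 1)        # this many wins or fewer
    return sum(binom_pmf(matches, k, p) for k in outcomes)


# Sandgren's rows from the table above.
first_match = tail_probability(124, 64.7, 62)
second_match = tail_probability(62, 31.7, 36)
print(f"1st: {first_match:.2f}   2nd: {second_match:.2f}")
```

Under that simplification, both results should land in the same neighborhood as the one-in-three and one-in-six odds described above, nowhere near the 5% level usually required to call something significant.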

Positive results, anyone?

So far, we’ve investigated two players who seemed likely to over- or under-perform in certain groups of matches. Yet we found nothing. The “playing your way in” theory will surely survive this blog post, but let’s make sure there aren’t players who embody it, even if Thiem and Sandgren don’t.

I went through the same steps for the other 98 men in this week’s top 100, grouping their matches into categories, tallying up Elo-based expected wins and actual wins, and calculating the probability that their results–above or below expectations–are due to chance. The result is 1,043 player-categories, from Novak Djokovic’s finals to Pedro Sousa’s first matches. (The number of player-categories isn’t a round number because not every player has matches in every category, like 6th matches or finals.)

Of those 1,043 player-categories, only 29 meet the usual standard of statistical significance, in that there is less than a 5% chance they can be explained by randomness. A familiar example is Gael Monfils’s record in finals. Even with last week’s title in Rotterdam, his eight wins are outweighed by 21 losses. But such cases are extremely rare. Since fewer than 3% of the player-categories meet the 5% threshold, it’s wrong to say that these categories represent real trends (like, perhaps, a psychological basis for Monfils’s inability to win tournaments). When we test over one thousand groups of matches, dozens of them should look like outliers.
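The arithmetic behind that last point is short enough to spell out. This sketch takes the 1,043 player-categories and the 5% threshold from the text, and assumes, as an idealization, that each test has exactly a 5% chance of flagging a pure-noise result.

```python
n_tests = 1043   # player-categories examined
alpha = 0.05     # the usual significance threshold
observed = 29    # player-categories that cleared it

# If nothing but randomness were at work, about alpha * n_tests
# categories would still look "significant".
expected_by_chance = n_tests * alpha
observed_share = observed / n_tests

print(f"expected by chance: {expected_by_chance:.0f}")
print(f"observed: {observed} ({observed_share:.1%} of all tests)")
```

With roughly 52 false alarms expected from noise alone, finding only 29 is no evidence that any particular one of them reflects a real tendency.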

In other words, there’s no statistical support for the claim that certain players are more or less effective in certain rounds. It’s always possible that a very small number of guys have certain characteristics along these lines, but among the 29 player-categories with particularly unlikely results, only Monfils’s finals record fits any kind of narrative I’ve heard before. Richard Gasquet has won 120 times–11 more than expected–in first matches at non-mandatory events. That overperformance is just as unlikely as Monfils’s letdown in finals, so maybe we should be talking about how assiduously he prepares for the start of each tournament, no matter the stakes?

It’s always possible that the top men do, in fact, play their way into tournaments. But based on this evidence, it’s only the case if everyone rounds their way into form at approximately the same rate. Maybe first rounders are lower in quality than semi-finals. But if we’re interested in predicting outcomes–even Thiem’s first-round results against journeymen–we’d do better to ignore the theories. Opening matches just aren’t that unique, even for the players who think they are.

Podcast Episode 49: The New York Open, Surprise Finalists, and Clay Court Tactics

Episode 49 of the Tennis Abstract Podcast, with Carl Bialik of the Thirty Love podcast, focuses on Carl’s time at the second edition of the ATP New York Open. We discuss the breakout performances of Reilly Opelka and Brayden Schnur, whether professional men’s doubles is entertaining, and if a black tennis court is necessarily as fast as it looks.

We also talk clay courts: another title for late-blooming Marco Cecchinato, the difficulty of measuring the impact of surface speed, and the potential for clay court tactics like wide ad-court kickers and aggressive drop shots.

Thanks for listening!

(Note: this week’s episode is about 58 minutes long; in some browsers the audio player may display a different length. Sorry about that!)

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Around the Net, Issue 1

Around the Net is my attempt to provide a clearinghouse for tennis analytics on the web. Each week, you’ll find a summary of recent articles, podcasts, papers, and data sources, as well as trivia and the occasional bit of interesting non-tennis content. If you would like to suggest something for a future issue, drop me a line.

Articles

Podcasts

Data

Trivia

Miscellaneous

Thanks to Peter, Jeff, and Carl for help with this week’s issue.