Podcast Episode 9: Roland Garros Preview and the Value of Surface-Specific Forecasts

In Episode 9 of the Tennis Abstract Podcast, Carl Bialik and I preview the upcoming fortnight at Roland Garros. We discuss possible threats to Rafael Nadal, likely beneficiaries of some wide-open sections of the draw, and a number of lesser-known names worth watching in Paris.

We apologize for the sound quality this week–due to personal commitments, we had to improvise a bit, and it was either a podcast with subpar sound quality or no podcast at all. I think it’s still very listenable, but if you’re sensitive to that sort of thing, you may disagree. In any case, thanks for listening!

Click to listen, subscribe on iTunes, find us on Stitcher, or use our feed to get updates on your favorite podcast software.

The Steadily Less Predictable WTA

One of the talking points throughout the 2017 WTA season has been the unpredictability of the field. With the absence of Serena Williams, Victoria Azarenka, and until recently, Petra Kvitova and Maria Sharapova, there is a dearth of consistently dominant players. Many of the top remaining players have been unsteady as well, due to some combination of injury (Simona Halep), extreme surface preferences (Johanna Konta), and good old-fashioned regression to the mean (Angelique Kerber).

Before Kiki Bertens’s win yesterday in Nurnberg, no top seed had won a WTA event. Last week, Stephanie Kovalchik went into more detail, quantifying how seeds have failed to meet expectations and suggesting that the official WTA ranking system–the algorithm that determines which players get those seeds–has failed.

There are plenty of problems with the WTA ranking system, especially if you expect it to have predictive value–that is, if you want it to properly reflect the performance level of players right now. Kovalchik is correct that the rankings have done a particularly poor job this year identifying the best players. However, there’s something else going on: According to much more accurate algorithms, the WTA is more chaotic than it has been for decades.

Picking winners

Let’s start with a really basic measurement: picking winners. Through Rome, there had been more than 1100 completed WTA matches. The higher-ranked player won 62.4% of those. Since 1990, the ranking system has picked the winner of 67.9% of matches, and topped 70% during several years in the 1990s. It never fell below 66% until 2014, and this year’s 62.4% is the worst in the 28-year time frame under consideration.
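To make the measurement concrete, here is a minimal sketch of how the "picking winners" rate can be computed. The record layout (winner's rank, loser's rank) and the sample values are made up for illustration; they are not the actual WTA data.

```python
def ranking_accuracy(matches):
    """Share of matches won by the better-ranked player.

    matches: iterable of (winner_rank, loser_rank); a lower number is a
    better ranking. Matches between equally ranked players are skipped.
    """
    decided = [(w, l) for w, l in matches if w != l]
    correct = sum(1 for w, l in decided if w < l)
    return correct / len(decided)


# Hypothetical sample, not real results.
sample = [(3, 17), (45, 8), (12, 60), (101, 33), (5, 6)]
print(f"{ranking_accuracy(sample):.1%}")  # 60.0%
```

The same function works for any rating system once each match is reduced to "which player did the system prefer".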

Elo does a little better. It rates players by the quality of their opponents, meaning that draw luck is taken out of the equation, and does a better job of estimating the ability level of players like Serena and Sharapova, who for various reasons have missed long stretches of time. Since 1990, Elo has picked the winner of 68.6% of matches, falling to an all-time low of 63.1% so far in 2017.

For a big improvement, we need surface-specific Elo (sElo). An effective surface-based system isn’t as complicated as I expected it to be. By generating separate rankings for each surface (using only matches on that surface), sElo has correctly predicted the winner of 76.2% of matches since 1990, almost cracking 80% back in 1992. Even sElo is baffled by 2017, falling to its lowest point, 71.0%.
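The idea of keeping a separate rating table per surface can be sketched as follows. This is an illustration under stated assumptions, not the actual sElo implementation: the constant K of 32 and the 1500 starting rating are assumed values, and the real system may weight matches differently.

```python
from collections import defaultdict

K = 32        # assumed constant update factor
START = 1500  # assumed rating for a player with no history on a surface


def expected(r_a, r_b):
    """Standard Elo win probability for a player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def run_selo(matches):
    """matches: chronological (surface, winner, loser) tuples.

    Returns the per-surface rating tables and the share of matches in
    which the pre-match favorite won (even-rated matches are not scored).
    """
    ratings = defaultdict(lambda: defaultdict(lambda: START))
    picks = correct = 0
    for surface, winner, loser in matches:
        table = ratings[surface]  # only this surface's history counts
        p_win = expected(table[winner], table[loser])
        if p_win != 0.5:
            picks += 1
            correct += p_win > 0.5
        table[winner] += K * (1 - p_win)
        table[loser] -= K * (1 - p_win)
    return ratings, (correct / picks if picks else None)


# Hypothetical players and results.
sample = [("clay", "A", "B"), ("clay", "A", "B"),
          ("hard", "B", "A"), ("clay", "B", "A")]
ratings, accuracy = run_selo(sample)
print(accuracy)                               # 0.5
print(ratings["hard"]["B"], ratings["hard"]["A"])  # 1516.0 1484.0
```

Because the hard-court table never sees the clay results, player A can be the favorite on clay and the underdog on hard courts at the same time.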

(sElo for all three major surfaces is now shown on the Tennis Abstract Elo ratings report.)

This graph shows how effectively the three algorithms picked winners. It’s clear that sElo is far better, and the graph also shows that some external factor is driving the predictability of results, affecting the accuracy of all three systems to a similar degree:

Brier scores

We see a similar effect if we use a more sophisticated method to rate the WTA ranking system against Elo and sElo. The Brier score of a collection of predictions measures not only how accurate they are, but also how well calibrated they are–that is, a player forecast to win a matchup 90% of the time really does win nine out of ten, not six out of ten, and vice versa. Brier scores average the square of the difference between each prediction and its corresponding result. Because it uses the square, very bad predictions (for instance, that a player has a 95% chance of winning a match she ended up losing) far outweigh more pedestrian ones (like a player with a 95% chance going on to win).

In 2017 so far, the official WTA ranking system has a Brier score of .237, compared to Elo of .226 and sElo of .187. Lower is better, since we want a system that minimizes the difference between predictions and actual outcomes. All three numbers are the highest of any season since 1990. The corresponding averages over that time span are .207 (WTA), .202 (Elo), and .164 (sElo).
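For reference, the Brier score calculation described above fits in a few lines. The forecasts in this sketch are invented for illustration. One useful benchmark: a system that gives every player a 50% chance always scores .250, so lower numbers mean the forecasts carry real information.

```python
def brier_score(forecasts):
    """Mean squared difference between forecast and result.

    forecasts: iterable of (probability given to a player, outcome),
    where outcome is 1 if that player won and 0 if she lost.
    """
    forecasts = list(forecasts)
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# A 95% favorite who loses is penalized far more than one who wins.
print(round((0.95 - 0) ** 2, 4))  # 0.9025
print(round((0.95 - 1) ** 2, 4))  # 0.0025

# Coin-flip forecasts always score 0.25, whatever the results.
print(brier_score([(0.5, 1), (0.5, 0)]))  # 0.25
```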

As with the simpler method of counting correct predictions, we see that Elo is a bit better than the official ranking, and both of the surface-agnostic methods are crushed by sElo, even though the surface-specific method uses considerably less data. (For instance, the clay-specific Elo ignores hard and grass court results entirely.) And just like the results of picking winners, we see that the differences in Brier scores of the three methods are fairly consistent, meaning that some other factor is causing the year-to-year differences:

The takeaway

The WTA ranking system has plenty of issues, but its unusually bad performance this year isn’t due to any quirk in the algorithm. Elo and sElo are structured completely differently–the only thing they have in common with the official system is that they use WTA match results–and they show the same trends in both of the above metrics.

One factor affecting the last two years of forecasting accuracy is the absence of players like Serena, Sharapova, and Azarenka. If those three played full schedules and won at their usual clip, there would be quite a few more correct predictions for all three systems, and perhaps there would be fewer big upsets from the players who have tried to replace them at the top of the game.

But that isn’t the whole story. A bunch of no-brainer predictions don’t affect Brier score very much, and the presence of heavily-favored players also makes it more likely that massively surprising results occur, such as Serena’s loss to Madison Brengle, or Sharapova’s ouster at the hands of Eugenie Bouchard. Many unexpected results are completely independent of the top ten, like Marketa Vondrousova’s recent title in Biel.

While some of the year-to-year differences in the graphs above are simply noise, the last several years look much more like a meaningful trend. It could be that we are seeing a large-scale changing of the guard, with young players (and their low rankings) regularly upsetting established stars, while the biggest names in the sport are spending more time on the sidelines. Upsets may also be somewhat contagious: When one 19-year-old aspirant sees a peer beating top-tenners, she may be more confident that she can do the same.

Whatever influences have given us the WTA’s current state of unpredictability, we can see that it’s not just a mirage created by a flawed ranking system. Upsets are more common now than at any other point in recent memory, whichever algorithm you use to pick your favorites.

Podcast Episode 8: Zverev’s Title, Emerging WTA Favorites, and a New Match Format

In Episode 8 of the Tennis Abstract Podcast, Carl Bialik and I survey the men’s and women’s fields in Rome and consider what last week’s top-tier events have to tell us about Roland Garros. We touch on Alexander Zverev’s maiden Masters title, the mixed signals of Dominic Thiem’s and Novak Djokovic’s tournaments, the rise of Elina Svitolina, and the continued relevance of Venus Williams.

We also have even more to say about wild cards (not Sharapova’s, I promise!) and dive into the potential of the best-of-five, first-to-four-games format set to debut at the NextGen ATP event in November.

Thanks for listening!

Click to listen, subscribe on iTunes, find us on Stitcher, or use our feed to get updates on your favorite podcast software.

Podcast Episode 7: Champion Simona, King Rafa, and Memories of Pico

In Episode 7 of the Tennis Abstract Podcast, Carl Bialik and I cover a lot of ground, from Simona Halep’s Madrid title and Kristina Mladenovic’s recent outspokenness, to Rafael Nadal’s unbeaten streak and Dominic Thiem’s rising status as a clay-court contender, along with the inevitability that someone born in the 1990s will eventually win a big ATP title.

Thanks for listening!

Click to listen, subscribe on iTunes, find us on Stitcher, or use our feed to get updates on your favorite podcast software.

Dominic Thiem played Davis Cup in Barcelona. Sort of…

This is a guest post by Peter Wetz.

Last week Dominic Thiem fought his way into the finals of the Barcelona Open by winning against Kyle Edmund, Daniel Evans, Yuichi Sugita, and Andy Murray. Three of these four players play for the same flag and Thiem won against each of them. Thiem is not exactly a champion of the current Davis Cup format–he has opted out of playing for Austria several times and has a rather poor record of 2-3 when he does compete–but in Barcelona he has, at least, shown that he can beat several players from the same country over a short amount of time. And that’s what Davis Cup is about, right?

In this post my goal is to put this statistical hiccup into some context. It is not the first time the Austrian defeated three players of the same nationality at one event: In 2016 at Buenos Aires Thiem already beat three players from Spain. However, given that Spanish players appear much more frequently in draws than Britons do, I will take a closer look.

Before last week, there had been only three tournaments since 1990 where a single player faced three players from Great Britain, and only one of those players won each encounter. The following table shows the three tournaments and each of the matches in which that player faced a Briton. Before Thiem, Wally Masur was the only player since 1990 to defeat three players from Great Britain in a single tournament. Thiem remains the only player to achieve this at a tournament outside Great Britain.

Tournament     Round Winner        Loser           Score
'93 Manchester R32   Wally Masur   Ross Matheson   6-4 6-4
'93 Manchester R16   Wally Masur   Chris Wilkinson 6-3 6-7(4) 6-3
'93 Manchester QF    Wally Masur   Jeremy Bates    6-4 6-3

'97 Nottingham R32   Karol Kucera  Martin Lee      6-1 6-1
'97 Nottingham SF    Karol Kucera  Tim Henman      6-4 2-6 6-4
'97 Nottingham F     Greg Rusedski Karol Kucera    6-4 7-5

'01 Nottingham R32   Martin Lee    Lee Childs      6-4 5-7 6-0
'01 Nottingham R16   Martin Lee    Arvind Parmar   6-4 6-3
'01 Nottingham QF    Greg Rusedski Martin Lee      6-3 6-2

Obviously, there are not many chances to face three Britons in a single tournament. And when one of those opponents is likely to be Andy Murray, a player’s chances of beating all three are even slimmer.

Let’s broaden the perspective a bit and take a look at how often a player defeated three (or more) players from the same country without looking only at Great Britain. The following table displays the results of this analysis. The first column contains the country, the second column (3W) shows how often a player defeated three players of this country, the third column (3WL) shows how often a player defeated two players of this country and then lost to a player of the same country, and so on.

Country  3W  3WL  4W  4WL  5W  5WL
USA      119 179  19  30   1   4
ESP      98  157  17  18   3   2
FRA      28  45   5   2    1   0
ARG      22  26   5   3    0   0
GER      15  18   1   1    0   0
AUS      13  9    0   0    0   0
SWE      9   16   1   0    0   0
CZE      4   5    0   0    0   0
NED      4   4    0   0    0   0
RUS      4   3    0   0    0   0
ITA      2   3    1   0    0   0
BRA      1   3    1   0    0   0
GBR      1   2    0   0    0   0
CHI      1   1    0   0    0   0
SUI      1   1    0   0    0   0
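A count like the one in the table above can be sketched as follows. This is not the code used for the post. It assumes knockout draws (so a player loses at most once per event), a made-up record layout, and that each label counts exactly that many opponents from the country. The sample rows come from the Manchester and Nottingham tables earlier in the post, with the non-British players' nationalities (AUS, SVK) filled in.

```python
from collections import Counter, defaultdict


def same_country_runs(matches):
    """matches: (tournament, winner, winner_country, loser, loser_country).

    Returns {country: Counter} with labels such as '3W' (beat three
    players from that country) or '3WL' (beat two, then lost to a third).
    """
    wins = Counter()    # (tournament, player, opponent country) -> wins
    losses = Counter()  # same key -> losses
    for tourney, winner, w_ctry, loser, l_ctry in matches:
        wins[(tourney, winner, l_ctry)] += 1
        losses[(tourney, loser, w_ctry)] += 1
    table = defaultdict(Counter)
    for key in set(wins) | set(losses):
        faced = wins[key] + losses[key]
        if faced >= 3:
            label = f"{faced}W" if losses[key] == 0 else f"{faced}WL"
            table[key[2]][label] += 1
    return table


rows = [
    ("93 Manchester", "Wally Masur", "AUS", "Ross Matheson", "GBR"),
    ("93 Manchester", "Wally Masur", "AUS", "Chris Wilkinson", "GBR"),
    ("93 Manchester", "Wally Masur", "AUS", "Jeremy Bates", "GBR"),
    ("97 Nottingham", "Karol Kucera", "SVK", "Martin Lee", "GBR"),
    ("97 Nottingham", "Karol Kucera", "SVK", "Tim Henman", "GBR"),
    ("97 Nottingham", "Greg Rusedski", "GBR", "Karol Kucera", "SVK"),
]
result = same_country_runs(rows)
print(dict(result["GBR"]))
```

On these six rows the GBR entry holds one 3W (Masur) and one 3WL (Kucera).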

As we could have imagined, USA, ESP, and FRA come out on top here, simply because for years they have had the highest density of players in the rankings. These are also the only countries whose players a single opponent has faced five times at one tournament. According to the data at hand, no one has ever faced six or more players from the same country at a single event. The following table shows the most recent occasion for each of the three countries with an entry in the 5W column above.

Tournament    Round Winner        Loser             Score
'91 Charlotte R32   Jaime Yzaga   Chris Garner      7-6 6-3
'91 Charlotte R16   Jaime Yzaga   Jimmy Brown       6-4 6-4
'91 Charlotte QF    Jaime Yzaga   Michael Chang     7-6 6-1
'91 Charlotte SF    Jaime Yzaga   M. Washington     7-5 6-2
'91 Charlotte F     Jaime Yzaga   Jimmy Arias       6-3 7-5
                                                 
'07 Lyon      R32   Sebastien Gr. Rodolphe Cadart   6-3 6-2
'07 Lyon      R16   Sebastien Gr. Fabrice Santoro   4-6 6-1 6-2
'07 Lyon      QF    Sebastien Gr. Julien Benneteau  6-7 6-2 7-6
'07 Lyon      SF    Sebastien Gr. Jo Tsonga         6-1 6-2
'07 Lyon      F     Sebastien Gr. Marc Gicquel      7-6 6-4
                                                  
'08 Valencia  R32   David Ferrer  Ivan Navarro      6-3 6-4
'08 Valencia  R16   David Ferrer  Pablo Andujar     6-3 6-4
'08 Valencia  QF    David Ferrer  Fernando Verdasco 6-3 1-6 7-5
'08 Valencia  SF    David Ferrer  Tommy Robredo     2-6 6-2 6-3
'08 Valencia  F     David Ferrer  Nicolas Almagro   4-6 6-2 7-6

Finally, we take a look at the big four. Did they ever eliminate three or more players from the same country in a single tournament? Yes, they did. In 2014 Roger Federer beat three Czech players in Dubai. In 2005, 2008, and 2013 he beat three German players in Halle. In 2009 Andy Murray beat three Spanish players in Valencia. In 2007 Novak Djokovic beat three Spanish players in Estoril. In 2013 Rafael Nadal beat three Argentinian players in both Acapulco and Sao Paulo. In 2015 he even beat four Argentinian players in Buenos Aires. And there are many other examples where Rafa beat three of his countrymen at the same tournament.

We can see that this happens fairly often, especially against players from the country hosting the tournament, because more of them appear in the draw thanks to wild cards and qualifying. If we exclude these cases, Federer’s streak in Dubai stands out, as does Thiem’s streak in Barcelona.

Peter Wetz is a computer scientist interested in racket sports and data analytics based in Vienna, Austria.

Podcast Episode 6: Djokovic Therapy, More WTA Chaos, and Fivers in Roehampton

In Episode 6 of the Tennis Abstract Podcast, Carl Bialik and I discuss Novak Djokovic’s new coaching situation, consider the prospects of last week’s title winners (Marin Cilic and Alexander Zverev among them), and continue to watch Maria Sharapova in her return to tour. We also lament Wimbledon’s decision to charge admission for qualies and rejoice in an American or two who can play on clay.

Thanks for listening!

Click to listen, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Albert Ramos’s Record-Setting Doubles Futility

Last week, we learned that Albert Ramos is not very good at doubles. In Barcelona, he lost his first-round doubles match, running his losing streak to 21 straight and his career tour-level record to an astonishing 14-79.

Ramos hasn’t won a doubles match since Marrakech last year, so he has fallen off the doubles ranking list entirely. Elo isn’t so kind: Of the 268 players with at least one tour-level doubles match since 2014, Ramos ranks dead last, with an Elo rating of 1260, 130 points behind the second worst, Paul-Henri Mathieu, and 240 points below the default rating of 1500 given to a player when he first arrives on tour. If two players with Ramos’s rating were to play an elite team like Kontinen/Peers, Elo would give the Ramos team little more than a 2% chance of winning.
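To see what a 2% chance means in rating terms, here is a small sketch using the standard Elo expectation formula. It assumes a team's rating is simply the average of its two players' ratings, which may not be exactly how the doubles ratings are combined. The elite team's rating is not given above, so the sketch works backwards from the 2% figure instead of guessing it.

```python
import math


def elo_win_prob(rating, opp_rating):
    """Standard Elo win probability for `rating` against `opp_rating`."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


def gap_for_prob(p):
    """Rating deficit at which the weaker side wins with probability p."""
    return 400 * math.log10((1 - p) / p)


# A 2% chance corresponds to a deficit of roughly 676 points.
print(round(gap_for_prob(0.02)))  # 676

# Two 1260-rated players against a default-rated (1500) team.
print(round(elo_win_prob(1260, 1500), 3))  # 0.201
```

Put differently, even a team of two newcomers at the default rating would be a four-to-one favorite against a pair rated 1260.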

It turns out that the Barcelona loss was a notable one, setting the mark for the longest tour-level doubles losing streak since 2000. Here is the list:

PLAYER               LOSSES     YEARS  
Albert Ramos             21   2016-17*  
Florent Serra            20   2008-10  
Lars Burgsmuller         18   2001-03  
Ryan Sweeting            17   2010-12  
Mikhail Kukushkin        17   2014-16  
Gael Monfils             16   2012-15  
Jack Waite               16   2001-02  
Mikhail Youzhny          16   2002-03  
Luke Jensen              15   2000-02  
Ratiwatana brothers      15   2008-09  
Taylor Dent              15   2001-04

* active streak

My database isn’t as complete before 2000, so I can’t confidently say whether there were longer streaks earlier in ATP history.
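Finding streaks like these is a simple scan over a player's results in date order. The sketch below assumes each player's tour-level doubles results have already been reduced to a chronological string of W and L characters. The sample strings are invented.

```python
def longest_losing_streak(results):
    """results: chronological string of 'W' and 'L'.

    Returns (longest streak of losses, length of the streak still active
    at the end of the record; 0 if the last match was a win).
    """
    longest = current = 0
    for r in results:
        current = current + 1 if r == "L" else 0
        longest = max(longest, current)
    return longest, current


print(longest_losing_streak("WLLWLLLL"))  # (4, 4)
print(longest_losing_streak("LLLWL"))     # (3, 1)
```

The second value is what separates an active streak, like the starred entry above, from one that has already ended.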

Among active players, Ramos’s run of futility stands far above the pack. There are 14 players with active streaks of 8 or more tour-level losses, though as you’ll see, I’m defining “active” quite broadly:

PLAYER                STREAK  START  
Albert Ramos              21   2016  
Lukas Lacko               13   2012  
James Ward                11   2010  
Marinko Matosevic         11   2014  
Jimmy Wang                11   2006  
Zhe Li                    11   2010  
Omar Awadhy               10   2002  
Jose Rubin Statham        10   2006  
Mikhail Youzhny           10   2015  
Paul Henri Mathieu         9   2016  
Juan Monaco                9   2015  
Lucas Pouille              8   2016  
Andre Begemann             8   2016  
Daniel Gimeno Traver       8   2015

Many of the players on this list are attempting comebacks from injury or trying to rebuild their rankings to enter more ATP events, so few of them are likely to threaten Ramos’s mark. If he continues on tour, Mathieu may have the best chance: He has racked up five different losing streaks of 8 or more matches, including a 12-loss stretch between 2002 and 2005.

One of the things that makes Ramos’s streak so remarkable is that he has continued to enter doubles draws so frequently, playing both singles and doubles in 20 of his 31 events. Some of his peers have had poor doubles seasons, but few of them have kept trying so assiduously. Here are the 15 players with the worst doubles winning percentages in the last 52 weeks, minimum 10 matches:

PLAYER                   MATCHES  WINS  WIN PERC  
Albert Ramos                  20     0      0.0%  
Jiri Vesely                   10     1     10.0%  
Alexander Bury                13     2     15.4%  
Taylor Fritz                  11     2     18.2%  
Gilles Simon                  11     2     18.2%  
Benoit Paire                  16     3     18.8%  
Inigo Cervantes Huegun        10     2     20.0%  
Lucas Pouille                 15     3     20.0%  
Hans Podlipnik Castillo       13     3     23.1%  
Paolo Lorenzi                 33     8     24.2%  
Marcos Baghdatis              12     3     25.0%  
Adrian Mannarino              15     4     26.7%  
Andreas Seppi                 15     4     26.7%  
Joao Sousa                    30     8     26.7%  
Neal Skupski                  17     5     29.4%

Paolo Lorenzi might be a bit better than his position on this list makes him look: Over the last year, he has partnered Ramos four times, more than any other player.

Then again, Lorenzi has struggled with plenty of doubles partners. Here are the least successful doubles players since 2000, minimum 50 matches:

PLAYER              MATCHES  WINS  WIN PERC  
Albert Ramos             93    14     15.1%  
Robby Ginepri            97    21     21.6%  
Gilles Simon            151    33     21.9%  
Gael Monfils             92    21     22.8%  
Adrian Mannarino         58    14     24.1%  
Benoit Paire             93    23     24.7%  
Paul Henri Mathieu      105    26     24.8%  
Jack Waite               68    17     25.0%  
Florent Serra            72    18     25.0%  
Santiago Giraldo         99    27     27.3%  
Aleksandar Kitinov       88    24     27.3%  
Marinko Matosevic        61    17     27.9%  
Bernard Tomic            63    18     28.6%  
Younes El Aynaoui        56    16     28.6%  
Paolo Lorenzi           104    30     28.8%

Ramos, once again, is in a league of his own. Beyond him and Robby Ginepri, the list is dominated by a surprising number of Frenchmen, including Florent Serra, who outranks several of his countrymen, but appeared earlier with the 20-match losing streak that Ramos finally overtook.

Ironically, since Ramos’s losing streak has coincided with career-best success on the singles circuit, he will find it easier than ever to enter doubles draws. With the press that comes with the streak, however, potential partners may finally think twice before signing up with the worst tour-level doubles player of their generation.