The Five-Set Advantage

Italian translation at settesei.it

Last night, the heavily-favored Janko Tipsarevic won his first-round match against Guillaume Rufin despite dropping the first two sets.  Had Rufin taken the first two sets against Janko in Cincinnati, Monte Carlo, or just about anywhere else on the ATP tour, he would’ve scored his first top-ten scalp.

Other seeds have similar stories.  Milos Raonic, Marin Cilic, Gilles Simon, and Alexandr Dolgopolov all would be headed home had their matches been judged on the first three sets.  Only two seeds had the opposite experience: Juan Monaco and Tommy Haas were each up two sets to love before losing their next three.

Simply (if tongue-twistingly) put, the five-set format favors favorites.

In all grand slam first rounds since 1991, seeds have come back from 0-2 or 1-2 down against unseeded players 125 times, while seeds have squandered 2-0 or 2-1 advantages only 71 times.  Just looking at those 32 matches per slam, that’s almost one upset averted per tournament.  The US Open draw would look awfully different right now if Tipsarevic, Raonic, Cilic, Simon, and Dolgopolov were among the first-round losers, even if Haas and Monaco replaced them in the second round.

Set theory

These numbers shouldn’t surprise us, since longer formats should do a better job of revealing the better player.  There are reasons why the baseball World Series is best-of-7 instead of a single game and the final sets of singles matches aren’t super-tiebreaks.  The difference between best-of-3 and best-of-5 isn’t quite so simple–fitness and mental strength play a part–but from a purely mathematical perspective, there should be fewer upsets in best-of-5s than best-of-3s.

Take Raonic for example.  My numbers (which don’t differentiate between 3-set and 5-set matches–shame on me) gave him approximately a 70% chance of beating Santiago Giraldo.  If 70% is his probability of winning a three-set match and sets are independent (more on that in a minute), that number implies a 63.7% chance of winning any given set.  A 63.7% chance of winning a set translates into a 74.4% shot at winning a best-of-five.
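The Raonic arithmetic can be sketched in a few lines of Python.  This is only an illustration under the same independence assumption, not the code behind my projections, and the function names are invented for the example.  It solves for the per-set probability that produces a 70% best-of-three win, then plugs that into the best-of-five formula.

```python
def best_of_three(p):
    """Match win probability if every set is won independently with probability p."""
    q = 1.0 - p
    return p**2 * (1 + 2 * q)  # win 2-0, or 2-1 (two possible orders)


def best_of_five(p):
    """Same assumption, but first to three sets."""
    q = 1.0 - p
    return p**3 * (1 + 3 * q + 6 * q**2)  # 3-0, 3-1 (3 orders), 3-2 (6 orders)


def set_prob_from_best_of_three(match_prob, tol=1e-10):
    """Invert best_of_three by bisection (it is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if best_of_three(mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_set = set_prob_from_best_of_three(0.70)  # the 70% figure quoted for Raonic
print(f"set: {p_set:.1%}, best-of-five: {best_of_five(p_set):.1%}")
```

The printed values should round to the 63.7% and 74.4% quoted above.  The same two functions work for any favorite; only the 0.70 input changes.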

A four- or five-point increase doesn’t radically change the complexion of the tournament, but it does make a difference.  My original numbers suggested that we could expect 20 or 21 first-round upsets.  If we adjust my odds in the manner I described for Raonic, the likely number of upsets falls to 18.

The most important implication here is the effect it has on the chances that top players reach the final rounds.  Earlier this week a commenter took me to task for my unintuitively low probabilities that Federer and Djokovic would reach the semifinals.  Obviously, if you give an overwhelming favorite a boost in every round, as the five-set format does, the cumulative effect is substantial.  For the top seeds, it can halve their probability of losing against a much lower-ranked opponent.

For Federer, adjusting the odds to reflect the theoretical advantage of the best-of-five format raises his chances of reaching the semis from 52.5% to over 65%.  Djokovic’s numbers are almost identical.
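To see how the per-round boost compounds, here is a rough sketch.  The five per-round probabilities are hypothetical numbers chosen for illustration, not Federer’s actual odds from my system, and the helper names are made up.  Each best-of-three probability is converted to a best-of-five probability under the independent-sets assumption, and the rounds are multiplied together.

```python
import math


def best_of_three(p):
    q = 1.0 - p
    return p**2 * (1 + 2 * q)


def best_of_five(p):
    q = 1.0 - p
    return p**3 * (1 + 3 * q + 6 * q**2)


def to_best_of_five(match_prob, tol=1e-10):
    """Find the set probability implied by a best-of-three win probability
    (bisection), then return the corresponding best-of-five win probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if best_of_three(mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return best_of_five((lo + hi) / 2)


# Hypothetical best-of-three win probabilities for a top seed, rounds 1-5.
rounds_bo3 = [0.90, 0.85, 0.80, 0.75, 0.70]
rounds_bo5 = [to_best_of_five(m) for m in rounds_bo3]

reach_bo3 = math.prod(rounds_bo3)  # chance of winning all five rounds
reach_bo5 = math.prod(rounds_bo5)

print(f"first-round loss: {1 - rounds_bo3[0]:.1%} -> {1 - rounds_bo5[0]:.1%}")
print(f"reach the semis:  {reach_bo3:.1%} -> {reach_bo5:.1%}")
```

With these made-up inputs, the first-round loss probability drops to roughly half of its best-of-three value, and the chance of surviving five rounds rises noticeably, which is the shape of the Federer adjustment described above.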

Dependent outcomes

Everything I’ve said so far seems intuitively sound, with one caveat.  Earlier I mentioned the assumption that sets are independent.  That is, a player has the same chance of winning a particular set no matter what the outcome of the previous sets–there is no “hangover effect” based on what has come before.

Tennis players, even professionals, aren’t robots, so the assumption probably isn’t completely valid.  Sometimes frustration with one’s own performance, the environment, or line calls can carry over into the next set and give one’s opponent an advantage.  Perhaps more importantly, the result of one set sometimes reveals that pre-match expectations were wrong in the first place.  Had David Nalbandian played this week instead of withdrawing, no number of sets would reveal that he was a better player–his health would prevent him from playing at his usual level.

Another related caveat is that beyond a certain match length, the outcome is no longer dependent on the same skills.  When Michael Russell played Yuichi Sugita in the Wimbledon qualifying round, the two men looked equal for four sets.  In the fifth, Russell’s fitness gave him an advantage that didn’t exist in the first couple of hours.  In this case, an estimate of Russell’s probability of winning a set against Sugita may be independent of previous outcomes, but it is not the same for every set.

These allowances aside, there is little doubt that favorites are more likely to win best-of-five matches than best-of-threes.  Whether you want to watch the entire thing … that’s another story.

2012 US Open Men’s Projections

Here are my pre-tournament odds for the 2012 US Open.  For some background reading, follow the links for more on my player rating system, current rankings, and more on how I simulate tournaments.

I’ve made one tweak to the algorithm (for men only) since last posting odds.  As many of you have noticed, I seem to underestimate the chances that the very best players will progress through the draw.  Some analysis of past results showed that this is correct, so for now, there’s a bit of a band-aid in the system, boosting the odds of the current top ten in a way that reflects how they’ve outperformed my projections in the past.

Still, Federer and Djokovic both have well under 30% chances of winning the Open, and fall just short of 50% between them.  My rankings give Djokovic a very slight edge despite Federer’s big season, and the tournament draw, which places Murray in Federer’s half, firmly tilts the scales in the Serb’s favor.

    Player                    R64    R32    R16        W  
1   Roger Federer           90.6%  84.0%  74.0%    23.2%  
    Donald Young             9.4%   5.4%   2.5%     0.0%  
    Maxime Authom           32.9%   2.3%   0.7%     0.0%  
    Bjorn Phau              67.1%   8.3%   3.7%     0.0%  
    Albert Ramos            50.1%  15.1%   1.7%     0.0%  
    Robby Ginepri           49.9%  14.8%   1.7%     0.0%  
    Rui Machado             15.1%   5.5%   0.4%     0.0%  
25  Fernando Verdasco       84.9%  64.6%  15.4%     0.3%  

    Player                    R64    R32    R16        W  
23  Mardy Fish              77.1%  50.6%  33.9%     1.3%  
    Go Soeda                22.9%   8.8%   3.3%     0.0%  
    Nikolay Davydenko       88.6%  39.4%  21.4%     0.2%  
    Guido Pella             11.4%   1.2%   0.1%     0.0%  
    Ivo Karlovic            67.5%  34.2%  14.7%     0.1%  
    Jimmy Wang              32.5%  10.9%   3.0%     0.0%  
    Michael Russell         35.7%  16.2%   5.4%     0.0%  
16  Gilles Simon            64.3%  38.6%  18.1%     0.3%  

    Player                    R64    R32    R16        W  
11  Nicolas Almagro         52.9%  33.6%  20.2%     0.3%  
    Radek Stepanek          47.1%  28.5%  16.5%     0.2%  
    Nicolas Mahut           48.7%  18.2%   8.6%     0.0%  
    Philipp Petzschner      51.3%  19.6%   9.5%     0.0%  
    Blaz Kavcic             45.9%  15.3%   4.8%     0.0%  
    Flavio Cipolla          54.1%  19.8%   6.9%     0.0%  
    Jack Sock               19.8%   7.7%   1.9%     0.0%  
22  Florian Mayer           80.2%  57.2%  31.6%     0.5%  

    Player                    R64    R32    R16        W  
27  Sam Querrey             64.9%  51.7%  27.6%     0.7%  
    Yen-Hsun Lu             35.1%  23.9%   9.3%     0.1%  
    Ruben Ramirez Hidalgo   31.4%   4.8%   0.8%     0.0%  
    Somdev Devvarman        68.6%  19.6%   5.5%     0.0%  
    Denis Istomin           62.4%  23.8%  11.8%     0.1%  
    Jurgen Zopp             37.6%  10.2%   3.8%     0.0%  
    David Goffin            28.7%  14.8%   6.9%     0.0%  
6   Tomas Berdych           71.3%  51.3%  34.3%     1.7%  

    Player                    R64    R32    R16        W  
3   Andy Murray             87.6%  76.3%  63.9%    13.7%  
    Alex Bogomolov Jr.      12.4%   6.3%   2.7%     0.0%  
    Hiroki Moriya           22.9%   1.8%   0.4%     0.0%  
    Ivan Dodig              77.1%  15.7%   7.8%     0.1%  
    Thomaz Bellucci         65.9%  29.0%   6.6%     0.1%  
    Pablo Andujar           34.1%   9.9%   1.4%     0.0%  
    Robin Haase             31.9%  15.6%   3.0%     0.0%  
30  Feliciano Lopez         68.1%  45.5%  14.1%     0.3%  

    Player                    R64    R32    R16        W  
24  Marcel Granollers       63.8%  37.7%  19.2%     0.2%  
    Denis Kudla             36.2%  16.4%   6.3%     0.0%  
    Lukas Lacko             46.7%  20.6%   8.4%     0.0%  
    James Blake             53.3%  25.2%  10.8%     0.1%  
    Paul-Henri Mathieu      45.6%  14.3%   5.9%     0.0%  
    Igor Andreev            54.4%  19.2%   8.7%     0.0%  
    Santiago Giraldo        30.9%  16.5%   7.7%     0.0%  
15  Milos Raonic            69.1%  50.0%  33.0%     1.0%  

    Player                    R64    R32    R16        W  
12  Marin Cilic             70.6%  56.4%  31.1%     0.9%  
    Marinko Matosevic       29.4%  18.6%   6.5%     0.0%  
    Daniel Brands           70.6%  20.5%   6.0%     0.0%  
    Adrian Ungur            29.4%   4.5%   0.7%     0.0%  
    Tim Smyczek             53.1%  15.1%   5.8%     0.0%  
    Bobby Reynolds          46.9%  12.1%   4.3%     0.0%  
    Guido Andreozzi          5.7%   0.9%   0.1%     0.0%  
17  Kei Nishikori           94.3%  71.9%  45.6%     1.7%  

    Player                    R64    R32    R16        W  
32  Jeremy Chardy           84.1%  55.5%  23.6%     0.3%  
    Filippo Volandri        15.9%   4.3%   0.7%     0.0%  
    Tatsuma Ito             44.6%  16.6%   4.5%     0.0%  
    Matthew Ebden           55.4%  23.6%   7.3%     0.0%  
    Martin Klizan           42.3%   8.7%   3.2%     0.0%  
    Alejandro Falla         57.7%  14.7%   6.4%     0.0%  
    Karol Beck              16.7%   8.2%   3.2%     0.0%  
5   Jo-Wilfried Tsonga      83.3%  68.5%  51.2%     3.9%  

    Player                    R64    R32    R16        W  
8   Janko Tipsarevic        81.6%  69.4%  49.7%     1.9%  
    Guillaume Rufin         18.4%  10.4%   3.8%     0.0%  
    Brian Baker             40.9%   7.1%   1.8%     0.0%  
    Jan Hajek               59.1%  13.1%   4.5%     0.0%  
    Grega Zemlja            55.9%  22.5%   8.1%     0.0%  
    Ricardo Mello           44.1%  15.5%   4.7%     0.0%  
    Cedrik-Marcel Stebe     39.2%  21.6%   8.2%     0.0%  
29  Viktor Troicki          60.8%  40.4%  19.2%     0.2%  

    Player                    R64    R32    R16        W  
19  Philipp Kohlschreiber   54.1%  32.9%  16.2%     0.3%  
    Michael Llodra          45.9%  26.1%  11.9%     0.2%  
    Grigor Dimitrov         54.9%  23.7%   9.8%     0.1%  
    Benoit Paire            45.1%  17.4%   6.4%     0.0%  
    Mikhail Kukushkin       46.2%  14.5%   6.0%     0.0%  
    Jarkko Nieminen         53.8%  18.3%   8.2%     0.1%  
    Xavier Malisse          33.7%  19.2%   9.6%     0.1%  
9   John Isner              66.3%  48.0%  31.9%     1.6%  

    Player                    R64    R32    R16        W  
13  Richard Gasquet         82.1%  51.9%  27.6%     0.9%  
    Albert Montanes         17.9%   5.3%   1.3%     0.0%  
    Jurgen Melzer           82.7%  39.6%  18.1%     0.3%  
    Bradley Klahn           17.3%   3.1%   0.5%     0.0%  
    Steve Johnson           35.5%   5.3%   1.1%     0.0%  
    Rajeev Ram              64.5%  15.4%   4.7%     0.0%  
    Ernests Gulbis          27.6%  18.4%   7.6%     0.0%  
21  Tommy Haas              72.4%  60.9%  39.1%     2.5%  

    Player                    R64    R32    R16        W  
28  Mikhail Youzhny         68.2%  49.4%  22.9%     0.6%  
    Gilles Muller           31.8%  17.4%   5.2%     0.0%  
    Tobias Kamke            48.9%  15.9%   4.2%     0.0%  
    Lleyton Hewitt          51.1%  17.2%   4.6%     0.0%  
    Igor Sijsling           69.4%  17.1%   7.3%     0.0%  
    Daniel Gimeno-Traver    30.6%   4.0%   1.0%     0.0%  
    Kevin Anderson          27.6%  18.3%   9.8%     0.1%  
4   David Ferrer            72.4%  60.6%  44.9%     3.9%  

    Player                    R64    R32    R16        W  
7   Juan Martin Del Potro   70.1%  55.3%  45.2%     4.6%  
    David Nalbandian        29.9%  18.4%  12.2%     0.3%  
    Benjamin Becker         48.9%  12.7%   7.0%     0.0%  
    Ryan Harrison           51.1%  13.6%   7.7%     0.1%  
    Lukasz Kubot            71.1%  38.8%  11.8%     0.1%  
    Leonardo Mayer          28.9%  10.0%   1.5%     0.0%  
    Tommy Robredo           31.0%  11.8%   2.1%     0.0%  
26  Andreas Seppi           69.0%  39.5%  12.5%     0.1%  

    Player                    R64    R32    R16        W  
20  Andy Roddick            89.4%  57.3%  36.9%     1.1%  
    Rhyne Williams          10.6%   2.0%   0.4%     0.0%  
    Carlos Berlocq          23.0%   5.2%   1.5%     0.0%  
    Bernard Tomic           77.0%  35.5%  19.7%     0.3%  
    Edouard Roger-Vasselin  44.4%  14.4%   4.3%     0.0%  
    Fabio Fognini           55.6%  21.1%   7.3%     0.0%  
    Guillermo Garcia-Lopez  38.8%  22.5%   8.9%     0.0%  
10  Juan Monaco             61.2%  41.9%  21.0%     0.4%  

    Player                    R64    R32    R16        W  
14  Alexandr Dolgopolov     61.8%  36.8%  19.6%     0.3%  
    Jesse Levine            38.2%  18.1%   7.7%     0.0%  
    Marcos Baghdatis        67.8%  34.5%  17.2%     0.2%  
    Matthias Bachinger      32.2%  10.6%   3.5%     0.0%  
    Steve Darcis            59.5%  23.6%  10.8%     0.1%  
    Malek Jaziri            40.5%  12.6%   4.6%     0.0%  
    Sergiy Stakhovsky       28.8%  14.1%   5.8%     0.0%  
18  Stanislas Wawrinka      71.2%  49.8%  30.9%     0.8%  

    Player                    R64    R32    R16        W  
31  Julien Benneteau        64.7%  43.7%   9.6%     0.3%  
    Olivier Rochus          35.3%  18.7%   2.8%     0.0%  
    Dennis Novikov          34.1%   9.6%   1.0%     0.0%  
    Jerzy Janowicz          65.9%  28.1%   4.4%     0.0%  
    Rogerio Dutra Silva     39.5%   2.5%   0.6%     0.0%  
    Teymuraz Gabashvili     60.5%   5.4%   1.9%     0.0%  
    Paolo Lorenzi            6.4%   3.6%   1.2%     0.0%  
2   Novak Djokovic          93.6%  88.6%  78.5%    26.5%

2012 US Open Women’s Projections

Here are my pre-tournament odds for the 2012 US Open.  For some background reading, follow the links for more on my player rating system, current rankings, and more on how I simulate tournaments.

    Player                         R64    R32    R16        W  
1   Victoria Azarenka            92.6%  83.5%  70.0%    12.5%  
    Alexandra Panova              7.4%   3.2%   1.0%     0.0%  
    Barbora Zahlavova Strycova   46.8%   6.0%   2.1%     0.0%  
    Kirsten Flipkens             53.2%   7.3%   2.7%     0.0%  
    Su-Wei Hsieh                 56.4%  24.1%   5.4%     0.0%  
    Magdalena Rybarikova         43.6%  16.0%   2.9%     0.0%  
    Virginie Razzano             41.4%  22.8%   5.2%     0.0%  
28  Jie Zheng                    58.6%  37.1%  10.6%     0.2%  

    Player                         R64    R32    R16        W  
18  Julia Goerges                80.7%  66.0%  37.5%     0.8%  
    Kristyna Pliskova            19.3%  10.1%   2.6%     0.0%  
    Mandy Minella                50.2%  12.0%   3.0%     0.0%  
    Olivia Rogowska              49.8%  11.9%   2.9%     0.0%  
    Stephanie Foretz Gacon       43.0%   7.4%   2.0%     0.0%  
    Anna Tatishvili              57.0%  12.4%   4.0%     0.0%  
    Sorana Cirstea               40.3%  30.8%  16.9%     0.2%  
16  Sabine Lisicki               59.7%  49.4%  31.2%     0.8%  

    Player                         R64    R32    R16        W  
9   Na Li                        85.7%  75.7%  41.9%     4.6%  
    Heather Watson               14.3%   8.0%   1.6%     0.0%  
    Lesia Tsurenko               45.0%   6.6%   1.0%     0.0%  
    Casey Dellacqua              55.0%   9.7%   1.8%     0.0%  
    Samantha Crawford            14.0%   0.5%   0.0%     0.0%  
    Laura Robson                 86.0%  14.2%   3.6%     0.0%  
    Victoria Duval                0.9%   0.1%   0.0%     0.0%  
23  Kim Clijsters                99.1%  85.3%  50.1%     5.5%  

    Player                         R64    R32    R16        W  
31  Varvara Lepchenko            66.9%  44.1%  15.7%     0.0%  
    Mathilde Johansson           33.1%  16.1%   3.7%     0.0%  
    Anastasia Rodionova          55.9%  23.4%   5.9%     0.0%  
    Julia Cohen                  44.1%  16.4%   3.5%     0.0%  
    Edina Gallovits-Hall         44.2%   7.1%   2.7%     0.0%  
    Stefanie Voegele             55.8%  10.8%   4.7%     0.0%  
    Petra Martic                 25.5%  17.6%  10.7%     0.0%  
7   Samantha Stosur              74.5%  64.5%  53.0%     2.1%  

    Player                         R64    R32    R16        W  
3   Maria Sharapova              86.5%  77.7%  67.0%     9.3%  
    Melinda Czink                13.5%   7.9%   4.1%     0.0%  
    Lourdes Dominguez Lino       48.9%   6.9%   3.0%     0.0%  
    Sesil Karatantcheva          51.1%   7.4%   3.3%     0.0%  
    Timea Bacsinszky             70.8%  19.4%   2.5%     0.0%  
    Mallory Burdette             29.2%   3.9%   0.3%     0.0%  
    Lucie Hradecka               38.0%  27.1%   5.7%     0.0%  
27  Anabel Medina Garrigues      62.0%  49.6%  14.1%     0.1%  

    Player                         R64    R32    R16        W  
19  Nadia Petrova                67.0%  36.1%  19.5%     0.2%  
    Jarmila Gajdosova            33.0%  12.0%   4.4%     0.0%  
    Simona Halep                 49.9%  25.8%  12.8%     0.1%  
    Iveta Benesova               50.1%  26.1%  13.0%     0.1%  
    Alexandra Cadantu            21.3%   4.5%   1.0%     0.0%  
    Aleksandra Wozniak           78.7%  37.7%  18.7%     0.2%  
    Melanie Oudin                30.9%  13.9%   5.3%     0.0%  
15  Lucie Safarova               69.1%  43.9%  25.4%     0.4%  

    Player                         R64    R32    R16        W  
11  Marion Bartoli               78.4%  46.4%  28.9%     1.2%  
    Jamie Hampton                21.6%   6.4%   2.1%     0.0%  
    Romina Oprandi               24.5%   7.1%   2.3%     0.0%  
    Andrea Petkovic              75.5%  40.2%  23.9%     0.7%  
    Kristina Mladenovic          37.5%   7.2%   1.4%     0.0%  
    Marina Erakovic              62.5%  17.6%   4.9%     0.0%  
    Daniela Hantuchova           48.8%  36.5%  17.6%     0.4%  
17  Anastasia Pavlyuchenkova     51.2%  38.7%  18.9%     0.5%  

    Player                         R64    R32    R16        W  
25  Yanina Wickmayer             82.8%  64.6%  26.3%     0.6%  
    Julia Glushko                17.2%   7.3%   1.2%     0.0%  
    Pauline Parmentier           45.4%  11.9%   2.1%     0.0%  
    Michaella Krajicek           54.6%  16.2%   3.3%     0.0%  
    Nicole Gibbs                 23.5%   1.7%   0.3%     0.0%  
    Alize Cornet                 76.5%  15.0%   5.8%     0.0%  
    Polona Hercog                15.7%   9.2%   3.9%     0.0%  
5   Petra Kvitova                84.3%  74.0%  57.1%     6.9%  

    Player                         R64    R32    R16        W  
8   Caroline Wozniacki           85.1%  72.5%  52.5%     4.1%  
    Irina-Camelia Begu           14.9%   7.6%   2.4%     0.0%  
    Silvia Soler-Espinosa        57.0%  12.3%   4.3%     0.0%  
    Alla Kudryavtseva            43.0%   7.6%   2.2%     0.0%  
    Tsvetana Pironkova           68.7%  48.3%  22.8%     0.5%  
    Camila Giorgi                31.3%  16.4%   5.2%     0.0%  
    Ayumi Morita                 37.6%  10.7%   2.6%     0.0%  
26  Monica Niculescu             62.4%  24.5%   8.0%     0.0%  

    Player                         R64    R32    R16        W  
22  Francesca Schiavone          55.4%  41.9%  18.9%     0.2%  
    Sloane Stephens              44.6%  31.9%  12.9%     0.1%  
    Akgul Amanmuradova           52.9%  14.2%   3.3%     0.0%  
    Tatjana Malek                47.1%  12.0%   2.5%     0.0%  
    Kimiko Date-Krumm            29.2%   5.8%   1.8%     0.0%  
    Sofia Arvidsson              70.8%  25.3%  13.4%     0.1%  
    Elina Svitolina              13.8%   4.5%   1.4%     0.0%  
12  Ana Ivanovic                 86.2%  64.4%  45.8%     1.6%  

    Player                         R64    R32    R16        W  
14  Maria Kirilenko              67.6%  50.9%  31.6%     0.6%  
    Chanelle Scheepers           32.4%  19.5%   8.6%     0.0%  
    Agnes Szavay                 16.2%   1.4%   0.2%     0.0%  
    Greta Arn                    83.8%  28.2%  11.2%     0.0%  
    Galina Voskoboeva            59.1%  30.2%  15.0%     0.1%  
    Arantxa Rus                  40.9%  17.3%   7.1%     0.0%  
    Andrea Hlavackova            30.0%  11.4%   4.0%     0.0%  
24  Klara Zakopalova             70.0%  41.1%  22.3%     0.2%  

    Player                         R64    R32    R16        W  
32  Shuai Peng                   57.6%  25.3%   5.2%     0.1%  
    Elena Vesnina                42.4%  15.8%   2.6%     0.0%  
    Ekaterina Makarova           80.0%  52.4%  14.9%     0.8%  
    Eleni Daniilidou             20.0%   6.5%   0.8%     0.0%  
    Mirjana Lucic                35.6%   3.0%   0.8%     0.0%  
    Maria Jose Martinez Sanchez  64.4%   8.7%   3.4%     0.0%  
    Coco Vandeweghe               8.2%   4.1%   1.3%     0.0%  
4   Serena Williams              91.8%  84.2%  70.9%    26.1%  

    Player                         R64    R32    R16        W  
6   Angelique Kerber             88.6%  65.7%  48.5%     6.0%  
    Anne Keothavong              11.4%   3.3%   1.0%     0.0%  
    Bethanie Mattek-Sands        30.1%   6.3%   2.4%     0.0%  
    Venus Williams               69.9%  24.7%  13.9%     0.4%  
    Johanna Konta                42.7%  10.2%   1.6%     0.0%  
    Timea Babos                  57.3%  16.6%   3.3%     0.0%  
    Olga Govortsova              18.2%   8.4%   1.4%     0.0%  
29  Tamira Paszek                81.8%  64.8%  27.9%     1.1%  

    Player                         R64    R32    R16        W  
21  Christina McHale             75.7%  61.4%  41.8%     1.0%  
    Kiki Bertens                 24.3%  14.2%   6.0%     0.0%  
    Olga Puchkova                39.7%   7.9%   2.4%     0.0%  
    Irina Falconi                60.3%  16.5%   6.5%     0.0%  
    Vera Dushevina               68.3%  27.2%  10.4%     0.0%  
    Nastassja Burnett            31.7%   7.5%   1.7%     0.0%  
    Garbine Muguruza             36.1%  20.5%   7.9%     0.0%  
10  Sara Errani                  63.9%  44.9%  23.3%     0.2%  

    Player                         R64    R32    R16        W  
13  Dominika Cibulkova           73.9%  54.9%  35.7%     1.3%  
    Johanna Larsson              26.1%  13.2%   5.2%     0.0%  
    Bojana Jovanovski            44.2%  13.1%   4.8%     0.0%  
    Mona Barthel                 55.8%  18.8%   8.0%     0.0%  
    Vania King                   54.1%  25.1%  11.3%     0.1%  
    Yaroslava Shvedova           45.9%  19.7%   8.2%     0.0%  
    Urszula Radwanska            45.1%  23.7%  10.9%     0.1%  
20  Roberta Vinci                54.9%  31.5%  15.9%     0.2%  

    Player                         R64    R32    R16        W  
30  Jelena Jankovic              60.9%  40.0%  14.4%     0.2%  
    Kateryna Bondarenko          39.1%  21.7%   6.0%     0.0%  
    Lara Arruabarrena-Vecino     25.7%   5.6%   0.7%     0.0%  
    Shahar Peer                  74.3%  32.6%   9.1%     0.0%  
    Ksenia Pervak                47.1%  10.4%   4.7%     0.0%  
    Carla Suarez Navarro         52.9%  12.6%   6.1%     0.0%  
    Nina Bratchikova             11.3%   4.2%   1.4%     0.0%  
2   Agnieszka Radwanska          88.7%  72.7%  57.6%     6.7%

The Slam No One Misses

Italian translation at settesei.it

By now you’ve heard: Rafael Nadal will miss the US Open.  It’s hardly a surprise, as Rafa hasn’t played a match since Wimbledon, and his knee has kept him off the tour for long periods in the past.

What is remarkable is the rarity of a top player missing the Open.  Despite its position near the end of the ATP schedule, after eight months of grueling tennis in which every player picks up his share of nagging injuries, New York gets a better turnout from top-10 players than any of the other three slams.

In fact, Nadal is only the third top-three player since 1991 to skip Flushing.  In 1999, #1-ranked Pete Sampras couldn’t play, and in 2004, it was #3-ranked Guillermo Coria who stayed home.  In the tournament’s last 21 editions, a top-ten player has missed the event only ten times.

It’s interesting to speculate as to why top players manage to show up in Flushing at a rate unmatched elsewhere.  Surely the event doesn’t have more cachet than Wimbledon.  Certainly the multiple shifts of surface throughout the spring and summer test every player’s mental and physical stamina.  Perhaps the longish break between Wimbledon and the Open allows players to take time off if they need it.  Most men play Canada and Cincinnati, but as we’ve seen this year, plenty of guys are willing to miss either one, meaning that only a serious injury keeps one out of the New York draw.

Defying conventional wisdom even further, the slam with the second-best turnout among top players is the French, not Wimbledon.  Since 1991, only 13 top-tenners have missed Roland Garros, and three of those were Boris Becker.

Wimbledon may be synonymous with the sport of tennis, but it is a distant third, with 25 top-tenners missing from the last 22 draws.  Here the no-shows are more logical: Alex Corretja three times, Marcelo Rios twice, Sergi Bruguera four times.  In the late 1990s, some guys simply didn’t consider the All-England Club a must.

Australia is a bit further back in fourth, with 29 top-tenners who didn’t play.  Melbourne does seem to have the least cachet of the four big events, but the tide may be turning.  Since 2006, only one top-ten player, Nikolay Davydenko in 2009, failed to make an appearance.

It may seem that absences from Grand Slams are random, driven by accidents such as major injuries that can happen at any time.  Any single absence surely does look that way.  There are larger forces at work, however–the value associated with certain tournaments, the demands of the schedule leading to physical breakdowns at some times and not others–that are not random.  In one more way, Rafael Nadal is proving himself a unique player, missing the most unmissable slam on the ATP calendar.

The Unbreakable and Record-Setting Cincinnati Finalists

When Roger Federer and Novak Djokovic met in the Cincinnati final on Sunday, they represented a unique event in tennis history: Neither one had been broken.  Four matches each, no breaks of serve.

That’s not just a Masters-level record, it’s a first for the ATP tour, at least since 1991, the time span for which point-level stats are available.  That’s over 1500 tournaments, including nearly 200 Masters events.

It’s very rare to even come close.  Of the 195 Masters tournaments for which data is available, only four pairs of finalists entered the title match with three or fewer breaks.  Djokovic leads the pack: When he met Rafael Nadal in the 2011 Miami final, Nadal had been broken once, Djokovic not at all.  When Djokovic and Federer met in the 2007 Montreal final, each player had only been broken once.  The Miami achievement is particularly notable because each player had won five pre-final matches, compared to only four each in Cincinnati and Montreal.

Federer set some records on his own, as well.  By holding his serve against Djokovic, he made it through an entire Masters tournament without suffering a break.  That’s the first time it has ever happened at this level.  Eight other times the winner has only been broken once–twice that winner was Federer, including Cincinnati two years ago.  Ten additional times, the winner was only broken twice–and Roger is responsible for three of those.

At lower-level tournaments, it’s somewhat more common–the winner of a non-Masters event has made it through without losing serve a total of 17 times.  Surprise, surprise: Two of those are Federer, at Doha in 2005 and Halle in 2008.  Four other men have done it twice: Andy Roddick, Joachim Johansson, Richard Krajicek, and Ivan Ljubicic.  Milos Raonic did it earlier this year in Chennai.

Federer set at least one more record last week, and it might be the most impressive of all.  He only faced three break points all week–the lowest known total at a Masters tournament.  The previous record was four, set by Andre Agassi at the 2002 Madrid Masters.  Fed’s total in Cinci was only the 10th ever in single digits–and Roger is now responsible for four of those top ten results.

At lower-level events, Fed’s mark has been bettered a couple of times.  At the 2007 Memphis tournament, Tommy Haas claimed the trophy without facing a single break point.  At San Jose this year, Raonic faced only two break points, though Tobias Kamke converted one of them.  Two other players–Andy Murray at 2009 Queen’s Club and Roddick at Lyon in 2005–got through an event facing only three break points.

No breaks, and record-settingly few break points. If hard courts are truly becoming slower, it seems that someone forgot to tell Roger.

The Implications of the 10-Point Tiebreak

Italian translation at settesei.it

I’m not sure how we got here, but we now live in a world where a lot of people consider a 10-point tiebreak equivalent to a set.  Apparently it’s more fan-friendly and better for television.  And of course it’s faster.

Whatever its practical uses, it’s obvious that the first-to-10 breaker isn’t the same as a set.  I’ll leave the moral debate to others; let’s take a statistical approach.

In general, the more points (or games, or sets) required to win a match, the more likely it is that the better player wins.  Some commentators have taken to calling the 10-point breakers “shootouts,” and for good reason.  Reduce the number of points required to win, and you increase the role played by luck.

Of course, sometimes a shootout is the best idea.  You’ve got to end a match somehow, and when players end up equal after two sets, four sets, or four sets and twelve games, it’s all the more likely that luck will have to intervene.  But the structure of the match determines just how much luck is permitted to play a part.

To compare a 10-point tiebreak with the set it replaces, we need to know how much more luck it introduces into the game.  For that, we need an example to work with.

Take two players: Player A wins 70% of points on serve, and Player B wins 67% of points on serve.  Playing best of three tiebreak sets, Player A has a 63.9% chance of winning the match.

If A and B split sets, A’s probability of winning falls to 59.3%.  In other words, the shorter time frame makes it more likely that B gets lucky, or is able to put together an unusually good run of play long enough to win the match.

If the match is decided by a 10-point tiebreak, however, A’s probability of winning falls all the way to 56.0%, erasing more than one-third of the favorite’s edge in the third set.  In fact, the 10-point breaker is barely more favorable to A than a typical 7-pointer, in which A would have a 55.1% chance.

(If you like playing around with this stuff, see my python code to calculate tiebreak odds.)

Somehow I don’t think anyone would advocate replacing the deciding set with a 7-point tiebreak.  Yet a 10-point tiebreak is much closer to its 7-point cousin than it is to a full set.

Adding a few more points doesn’t resolve the discrepancy, either.  To maintain Player A’s 59.3% chance of winning, the third set would have to be replaced by a 26-point tiebreak.  But that, I’m sure, wouldn’t attract many new advertisers.
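For readers who want to check the tiebreak figures, here is a minimal sketch.  It is separate from the code linked above and makes its own assumptions: every point is independent, Player A serves first, and serve follows the usual tiebreak rotation (one point, then two each).  The function names are invented for the example, and the 59.3% set figure is taken from the post rather than computed.

```python
from functools import lru_cache


def tiebreak_win_prob(serve_a, serve_b, target):
    """Chance that A wins a first-to-`target`, win-by-two tiebreak.

    serve_a / serve_b: probability each player wins a point on own serve.
    Assumes independent points and that A serves the first point.
    """
    x = serve_a        # A wins a point on A's serve
    y = 1.0 - serve_b  # A wins a point on B's serve
    # From any tied score the next two points are one on each serve, so
    # A wins from there with probability P(win both) / P(points not split).
    tied = x * y / (x * y + (1 - x) * (1 - y))

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == target:          # only reachable with b <= target - 2
            return 1.0
        if b == target:
            return 0.0
        if a == b == target - 1:
            return tied
        n = a + b                # points played so far
        a_serving = ((n + 1) // 2) % 2 == 0
        p = x if a_serving else y
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def smallest_target_reaching(serve_a, serve_b, goal, max_target=60):
    """Shortest tiebreak length whose win probability for A is at least `goal`."""
    for target in range(2, max_target + 1):
        if tiebreak_win_prob(serve_a, serve_b, target) >= goal:
            return target
    return None


p7 = tiebreak_win_prob(0.70, 0.67, 7)
p10 = tiebreak_win_prob(0.70, 0.67, 10)
print(f"7-point: {p7:.1%}, 10-point: {p10:.1%}")
# 0.593 is the rounded set probability quoted in the post.
print("target needed:", smallest_target_reaching(0.70, 0.67, 0.593))
```

The two probabilities should land close to the 55.1% and 56.0% quoted above.  Because 59.3% is a rounded input, the search result may differ from 26 by a point or so.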

Does Cincinnati Matter in Flushing?

After months of clay and grass tournaments, the best players on tour are finally competing on hard courts.  For many, Cincinnati is the extent of their North American hard court preparation leading up to the US Open.  No matter who wins this week, we’ll be tempted to anoint him the favorite in New York.  Should we?

Traditionally, Cincinnati features one of the strongest draws of the ATP season.  Because it is the only tournament scheduled two weeks before the US Open, players preparing for the slam have no alternatives, and it still leaves them a week off.  This year’s draw, missing three top 10 players due to injury, is an aberration.

It’s no surprise, then, that the list of winners in Cincinnati is particularly impressive.  19 of the last 20 champions have career peak rankings of 1 or 2.  (The black sheep in the group is Thomas Enqvist, who “only” reached #4.)  Not only do the best in the world show up to play, they show up to win.

More than some warmups, Cincinnati seems to tell us who is in form.  Let’s see if it tells us who is going to win the Open.

Since 1991, there have been four seasons when the same man lifted the trophy in Cincinnati and New York: Pat Rafter in 1998, Andy Roddick in 2003, and Roger Federer in 2005 and 2007.  Five more times, the Cincinnati winner reached the US Open final.  Not counting 1999, when Pete Sampras didn’t compete in Flushing, the Cincinnati champion has failed to reach the US Open round of 16 only twice in the last 21 years.

So, the Cincinnati winner has won the US Open about 20% of the time, and reached the final another 25%.  Sounds good, though not as good as we’d expect from the top seed.  On the other hand, Cincinnati winners aren’t always the top seed in New York, so we can’t expect them to perform at that level.

In fact, the Cincinnati winner has been the top seed in Flushing only six times.  On average, the Cinci champion has been seeded 4th in New York.  Compared to the performance we’d expect from a #4 seed, a 20% shot at winning the tournament, along with a nearly 1-in-2 chance of reaching the final, is extremely good.

Since 1991, #4 seeds at the US Open haven’t performed nearly so well during the final weekend as have Cincinnati champions.  Both groups have a roughly 6-in-10 chance of reaching the semis (#4 seeds: 57.1%, Cinci winners: 60%), but the #4 seeds have won only half of their semifinals, for a 28.5% chance of reaching the final, compared to the 45% of Cincinnati titlists.

The biggest difference is where it matters most: the final itself.  Cincinnati winners have won almost half of their US Open finals, taking 4 titles from 9 finals in 20 attempts, as we’ve seen.  But #4 seeds have won only 2 titles.  It’s not a huge sample, but if we expand our view to consider all four slams since 1991, the performance of #4 seeds stays about the same.
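The round-by-round comparison is easier to see as conversion rates. This sketch uses only the counts and percentages quoted above, and the variable names are my own.

```python
# Cincinnati champions at the US Open since 1991 (figures from the text).
cinci_attempts = 20
cinci_titles = 4
cinci_finals = cinci_titles + 5      # four titles plus five losing finals
cinci_semi_rate = 0.60

# #4 seeds at the US Open since 1991 (percentages from the text).
seed4_semi_rate = 0.571
seed4_final_rate = 0.285

cinci_final_rate = cinci_finals / cinci_attempts

# How often each group converts a semifinal into a final.
cinci_semi_to_final = cinci_final_rate / cinci_semi_rate
seed4_semi_to_final = seed4_final_rate / seed4_semi_rate

# How often Cincinnati champions convert a final into a title.
cinci_final_to_title = cinci_titles / cinci_finals

print(f"Cinci winners reach the final:   {cinci_final_rate:.0%}")
print(f"Cinci winners, semi -> final:    {cinci_semi_to_final:.0%}")
print(f"#4 seeds, semi -> final:         {seed4_semi_to_final:.0%}")
print(f"Cinci winners, final -> title:   {cinci_final_to_title:.0%}")
```

Both groups reach the semifinals about equally often, so the whole gap comes from what happens once they get there.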

Much to my surprise, it seems that Cincinnati results do have something to say about the final rounds in Flushing.  This week’s winner isn’t exactly a lock to triumph in New York, but his performance in Ohio will tell us to expect that much more from him at the US Open.

How Good is Brian Baker?

In his remarkable comeback this year, Brian Baker has already recorded two top-20 scalps, along with seven other victories against players in the top 100.  In the same span of six months, he’s also lost to a player barely inside the top 400, and suffered another six defeats against guys outside the top 100.

This is inconsistency of historic magnitude.  The list of players he’s beaten may actually be more impressive than the list of those who have beaten him!  Adding to the confusion, we don’t have any other recent results from him.  We can’t just wave our hands and point to his 2011 performance level as an accurate indicator of his current level.

One measurement of player ability, the ATP ranking system, places him at #78, a number that seems just as ridiculous when he’s beating Philipp Kohlschreiber at a Masters event as when he’s losing to Maxime Authom at a challenger.  But overall, the ATP estimate doesn’t seem too far-fetched.  It’s certainly better than what jrank (my rating system) spits out.  That algorithm doesn’t know what to do with such a limited track record, so it places him far outside the top 100.

We can do better.  As we’ll see, Baker’s results suggest he belongs on the cusp of the top 50.

Uniquely limited results

Imagine a completely unknown player is given a wild card into a major event.  We don’t know where he came from or who he might have beaten in the past.  He’s a completely blank slate.  If we wanted to estimate his ability level, we would have to wait until we got some results.

If that player won an opening-round match against the 17th-best player in the world, our best guess would be that he is better than #17, but we wouldn’t know how much.  If he lost that opening-round match, we would assume he is worse than #17.  We might use statistics from that match to estimate how much better or worse than #17.

As our unknown kept playing more matches, we would update our estimate, using additional data as it came in.

(You might protest that in the early going, we should regress our estimate to the mean, since if some random guy came out of nowhere, he probably isn’t one of the 16 best tennis players in the world–there was a reason he was nowhere.  And, in such a real-world scenario, you would be right.  But in such a case, what is the mean?  If a baseball player is called up from Triple-A, an intelligent observer, such as a scout or team executive, considers him at least marginally MLB-level, so we would regress our estimate to the level of marginal MLB players.  But if a player receives a wild card into a tennis tournament, what do we know?)

Few tennis players in history have come closer to this unknown than Brian Baker.  Sure, everyone has to start somewhere, but usually “somewhere” is a long string of futures tournaments, followed by an even longer string of challengers.  By the time a player bags his first top-20 scalp, we have lots and lots of data to work with.

When other players were racking up several dozen matches every year, Brian Baker was rehabbing injuries and coaching college tennis.  We can only judge him based on a small number of recent results.  And those results are particularly contradictory.

Working backward

Intuitively, it’s tough to accept that a single player has beaten a bunch of good players and lost to several weaker ones.  No matter how good that guy is, such a set of outcomes is unlikely.

But how unlikely?  That question is the key to estimating Baker’s current level.

Rather than assuming Baker is playing at a certain level (like that of #78) and scratching our heads at his inconsistency, we can work backwards–take his results and determine the likelihood that he is playing at various levels.

For instance, we could assume that Baker is #5 in the world.  If so, some of his results would be very predictable (like the two wins against Blake Strode) and others would be particularly jarring.  We could go further and calculate the probability that the #5 player in the world would amass Baker’s specific match record.  Those odds, of course, are vanishingly small.

If you repeat the process for every possible ranking, you get a probability that #5, or #12, or #77 would win the matches Baker has won and lose the matches he has lost.  One of those probabilities will be higher than the others, and that’s our best guess of how highly we should regard the American.
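Here is a rough sketch of that search. Everything specific in it is an assumption made for illustration: the `win_prob` formula and its exponent are a hypothetical stand-in for whatever model converts two ability levels into a win probability, and the match list is made up rather than Baker's actual record.

```python
import math


def win_prob(points_a, points_b, exponent=0.85):
    """Hypothetical model: chance a player with points_a beats one with points_b."""
    a, b = points_a**exponent, points_b**exponent
    return a / (a + b)


def log_likelihood(points, results):
    """Log-probability that a player at this level produces exactly these results."""
    total = 0.0
    for opponent_points, won in results:
        p = win_prob(points, opponent_points)
        total += math.log(p if won else 1 - p)
    return total


def best_level(results, candidates):
    """Candidate level under which the observed results are most likely."""
    return max(candidates, key=lambda pts: log_likelihood(pts, results))


# Made-up record: (opponent's ranking points, did our player win?)
made_up_results = [
    (1900, True), (1500, True), (700, True), (650, True),
    (600, False), (350, False), (90, False),
]
candidates = range(100, 3001, 10)
estimate = best_level(made_up_results, candidates)
print(f"most likely level: about {estimate} points")
```

Working in log-probabilities keeps the product of many small per-match probabilities from underflowing, and it does not change which candidate comes out on top.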

(If you’re interested in methodology, click “Continue Reading” below.)

Using this method, we discover that Baker has played at the level of someone with about 820 ATP ranking points, putting him around #54, in a tight pack with Grigor Dimitrov, Gilles Muller, Alejandro Falla, and Lukas Lacko.  With every match he plays, we can continue to fine-tune our estimate.

There are many factors we need to ignore to do an analysis like this, largely because of the limited data that led us to the topic in the first place.  Many of Baker’s worst results have come on hard courts; perhaps he will prove over a longer period to be stronger on clay and grass.  If his ability level has changed over the last six months, as seems very likely, this approach fails to take it into consideration.

But because of the unique nature of Baker’s comeback, which makes it difficult to assume anything about his ability level, this approach allows us to make a reasonably good guess.  And with such a strange mix of great wins and rough losses, a good guess is all we can hope for.

Continue reading How Good is Brian Baker?

Tommy Haas: Old and Winning

For all the talk of 30-somethings at the top of the modern men’s game, tennis players decline quickly.  30 may be the new 20, but 35 is still the same old 35, and 35-year-old tennis players are usually found on the champions tour, the doubles court, or national television.

Yet Tommy Haas, aged 34 years and 5 months, is enjoying a resurgence, having reached three finals in the last two months–on three different surfaces.  He’s one of the hottest players on tour of any age.

34-year-olds don’t do things like that.  In the last ten years, players 34 and older have accounted for fewer than 1% of wins on the ATP tour.  From 2008 to 2011, all 34-year-olds–combined–won a total of 17 tour-level matches.  In the five months since his birthday, Haas has won 22.

To find a point of comparison, we need to go back five years, to the 2007 campaign of Fabrice Santoro, and slightly earlier, to Andre Agassi’s 2004 season.  Agassi at 34 was better than Haas at 34, winning 37 tour-level matches and reaching two grand slam quarterfinals.  Agassi was the best “old” player since Jimmy Connors and the only man in the discussion since the 1970s.

Yet already, Haas is among the best 34-and-overs in ATP history.  His 22 wins since his 34th birthday are good for 28th on the all-time list, ahead of Fred Stolle and just behind Roy Emerson.  But that understates Haas’s accomplishment.  With the exceptions of Santoro, Agassi, and Connors (whose 178 wins-past-34 are good for 2nd on the all-time list, behind Ken Rosewall), everyone on the list retired more than 20 years ago.

Comparisons to Haas’s contemporaries do a better job of illustrating how unusual he is.  The only two older men to have won a match on tour this year are Arnaud Clement and Ruben Ramirez Hidalgo, neither of whom is a factor anywhere but the challenger tour.  The other 34-year-old to win some matches this season is hyper-fit warrior Michael Russell, who took advantage of the weak draws in Atlanta and Los Angeles.

As long as he stays healthy, Haas is far from finished.  According to Jrank, he’s the 11th-best hard court player in the game right now. He may not have another grand slam final ahead of him, as Agassi did at the same age, but he has more wins in his future than most players a decade his junior.

The Hangover Effect of a Marathon Fifth Set

Italian translation at settesei.it

Marathon sets are again the talk of tennis.  We won’t soon forget Roger Federer’s 19-17 third-set win over Juan Martin Del Potro … or Roger’s weak performance in the match that followed.

The unusual Olympic format–best-of-three, no final-set tiebreak–brought several issues to the fore.  Should best of three be enough for slams?  It certainly gave us plenty of dramatics last week.  And is it finally time to end the no-tiebreak madness?  For all of the occasional drama, do we really need to see even more service holds in John Isner matches?

Peter Bodo makes the case for a marathon-free world:

[M]y main reason for embracing the final-set tiebreaker is not the obvious one that would be cited by most time-sensitive television producers. The real problem with deuce sets is that when a match goes as long as Federer v. Delpo or even Jo-Wilfried Tsonga v. Milos Raonic (that one went 25-23, for Tsonga) the reward for the winner’s heroic feat is almost always a quick subsequent loss.

As Bodo goes on to illustrate, this seems anecdotally true.  But who cares about anecdotes?  This is a testable hypothesis.

As we’ll see, there is a noticeable hangover effect when a player has fought through a marathon fifth set.  But the alternative–a fifth-set tiebreak–produces nearly the same hangover.

There have been 146 marathon fifth sets–matches in which the final set reached 6-6–in Grand Slam tennis since the beginning of 2001.  The record of those 146 winners in their next round is dreadful: 43-103, or 29.5%.  It’s even worse than that, actually.  Four times, two marathon men went on to play each other, so four of those wins were inevitable.

However, that isn’t the end of the story.  To prove that fifth-set marathons significantly weaken their winners, we need to establish two things: (1) They had a decent shot at beating their next opponents anyway, and (2) if a fifth-set tiebreak were played, their chances would have been better.

Post-marathon underdogs

The first issue is a bit sneaky.  If a player has to go deep into the fifth set to win in the early rounds, he’s hardly a dominating presence in the draw.  Consider the extreme case of Yen Hsun Lu, who, in 2010, beat Andy Roddick in a 9-7 fifth set, advancing to play Novak Djokovic in the Wimbledon quarters.  Sure, Lu was tired, but what were the odds of an upset even if he had beaten Roddick in three?  Top players rarely need five hours to push through an early-round opponent.

To quantify this, we can turn to jrank-driven predictions.  Using these measures of each player’s ability level at the time of the match, we can estimate the actual chances of our 146 marathon men.

The marathon men would have been underdogs in their next match no matter what.  On average, each one had a 43.4% chance of winning, meaning that of the 146 matches, they should have won 63 of them.  Even adjusting for their underdog status, they seem to have suffered from their marathons–they won 43 of those matches, barely two-thirds of the number that they “should” have won.

Almost-but-not-quite marathons

We’ve established that once a player enters the uncharted territory beyond 6-6, his chances of winning the next match are substantially weakened.  But surely the fatigue didn’t set in right at the moment the chair umpire called “6-6.”  Even if the fifth set is a bagel, simply playing five sets of professional-level tennis is exhausting, and might impact one’s performance a day or two later.

The most relevant set of matches for comparison are US Open five-setters that went to a final-set tiebreak.  Since 2001, we have 40 of those.  In their next matches, the winners of the almost-marathons went a dismal 11-29 (27.5%)–worse than the marathon men!

Compared to their expectations, though, they did a bit better.  Those forty men, on average, had a 38% chance of winning their next matches, meaning we would expect them to win about 15 of the 40.  Relative to the predictions we would have made at the time, this small sample of fifth-set-tiebreak winners outperformed the marathon men, but just barely.

For a bigger sample, we can turn to the slightly shorter–but still epic–matches that end 7-5 in the fifth.  Of the 95 such matches since 2001, the 7-5 winners went on to win 49, or 51.6% of their next matches!  This despite the fact they were collective underdogs, expected to win only 48%, or 46 of those matches.
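Putting the three groups side by side makes the comparison concrete. This sketch uses the counts and average win probabilities quoted above. As a rough check on how surprising each record is, it also computes a binomial tail, which assumes (for simplicity, and not quite accurately) that every player in a group had the group's average chance of winning.

```python
from math import comb

# group name: (matches, actual wins, average pre-match win probability)
groups = {
    "marathon (past 6-6)": (146, 43, 0.434),
    "fifth-set tiebreak": (40, 11, 0.38),
    "7-5 in the fifth": (95, 49, 0.48),
}


def binom_cdf(k, n, p):
    """Chance of k or fewer wins in n matches, each won with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


summary = {}
for name, (n, wins, avg_p) in groups.items():
    expected = n * avg_p
    summary[name] = {
        "expected": expected,
        "ratio": wins / expected,              # actual wins per expected win
        "tail": binom_cdf(wins, n, avg_p),     # chance of doing this badly or worse
    }
    print(f"{name}: expected {expected:.1f}, actual {wins}, "
          f"ratio {wins / expected:.2f}")
```

The ratio column is the fair comparison across groups, since it already accounts for how big an underdog each set of winners was.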

What now?

Since the 7-5 group performed so differently in their next matches, it’s tempting to speculate why they did so.  My best guess: If a player manages a break before the set goes 6-6, he’s relatively fresh, physically and mentally.  The sort of player who can break at 5-5 or 6-5 is one who can come back a day or two later and plow through another three or four hard-fought sets.

By contrast, matches that get to 6-6–whether they end in a tiebreak or not–are usually battles of attrition.  Think Isner-Mahut: The longer it lasted, the less likely either player could challenge the other’s serve.  That brand of tennis had set in before 6-6 in the fifth: If one of the players pulled out a 7-4 tiebreak, it wouldn’t say much about his fitness or mental stamina, simply that someone is bound to get lucky for a point or two.

Based on the limited data we have, there just isn’t much difference between the after-effects of fifth-set marathons and fifth-set tiebreaks.  In both cases, the marathon men weren’t going to be favored anyway, and their fatigue hurts them even more.  Changing format to fifth-set tiebreaks would have little effect on future outcomes–it would just make those matches a bit more dependent on a lucky bounce.