The Case for Novak Djokovic … and Roger Federer … and Rafael Nadal

By winning the US Open last weekend and increasing his career total to ten Grand Slams, Novak Djokovic has pushed himself even further into conversations about the greatest of all time. At the very least, his 2015 season is shaping up to be one of the best in tennis history.

A recent FiveThirtyEight article introduced Elo ratings into the debate, showing that Djokovic’s career peak–achieved earlier this year at the French Open–is the highest of anyone’s, just above 2007 Roger Federer and 1980 Bjorn Borg. In implementing my own Elo ratings, I’ve discovered just how close those peaks are.

Here are my results for the top 15 peaks of all time [1]:

Player                 Year   Elo  
Novak Djokovic         2015  2525  
Roger Federer          2007  2524  
Bjorn Borg             1980  2519  
John McEnroe           1985  2496  
Rafael Nadal           2013  2489  
Ivan Lendl             1986  2458  
Andy Murray            2009  2388  
Jimmy Connors          1979  2384  
Boris Becker           1990  2383  
Pete Sampras           1994  2376  
Andre Agassi           1995  2355  
Mats Wilander          1984  2355  
Juan Martin del Potro  2009  2352  
Stefan Edberg          1988  2346  
Guillermo Vilas        1978  2325

A one-point gap is effectively nothing: It means that peak Djokovic would have a 50.1% chance of beating peak Federer. The 35-point gap separating Novak from peak Rafael Nadal is considerably more meaningful, implying that the better player has a 55% chance of winning.
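For reference, those percentages are what the usual Elo expected-score formula gives. The post does not spell the formula out, so the base-10 logistic with a 400-point scale below is an assumption, though it agrees with the rule of thumb given later in the post that a 100-point gap means 64%. Here $\Delta$ is simply the rating gap between the two players.

```latex
\[
P(\text{favorite wins}) = \frac{1}{1 + 10^{-\Delta/400}}
\]
\[
\Delta = 1:\; P \approx 0.501, \qquad \Delta = 35:\; P \approx 0.550
\]
```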

Surface-specific Elo

If we limit our scope to hard-court matches, Djokovic is still a very strong contender, but Fed’s 2007 peak is clearly the best of all time:

Player          Year  Hard Ct Elo  
Roger Federer   2007         2453  
Novak Djokovic  2014         2418  
Ivan Lendl      1989         2370  
Pete Sampras    1997         2356  
Rafael Nadal    2014         2342  
John McEnroe    1986         2332  
Andy Murray     2009         2330  
Andre Agassi    1995         2326  
Stefan Edberg   1987         2285  
Lleyton Hewitt  2002         2262

Ivan Lendl and Pete Sampras make much better showings on this list than on the overall ranking. Still, they are far behind Fed and Novak–the roughly 100-point difference between peak Fed and peak Pete is equivalent to a 64% probability that the higher-rated player would win.

On clay, I’ll give you three guesses who tops the list–and your first two guesses don’t count. It isn’t even close:

Player           Year  Clay Ct Elo  
Rafael Nadal     2009         2550  
Bjorn Borg       1982         2475  
Novak Djokovic   2015         2421  
Ivan Lendl       1988         2408  
Mats Wilander    1984         2386  
Roger Federer    2009         2343  
Jose Luis Clerc  1981         2318  
Guillermo Vilas  1982         2316  
Thomas Muster    1996         2313  
Jimmy Connors    1980         2307

Borg was great, but Nadal is in another league entirely. Though Djokovic has pushed Nadal out of many greatest-of-all-time debates–at least for the time being–there’s little doubt that Rafa is the greatest clay court player of all time, and likely the most dominant player in tennis history on any single surface.

Djokovic is well back of both Nadal and Borg, but in his favor, he’s the only player ranked in the top three for both major surfaces.

The survivor

As the second graph in the 538 article shows, Federer stands out as the greatest player of all time at his age. Most players have retired long before their 34th birthday, and even those who stick around aren’t usually contesting Grand Slam finals. In fact, Federer’s Elo rating of 2393 after his US Open semifinal win against Stanislas Wawrinka last week would rank as the seventh-highest peak of all time (sixth if we set aside his own 2007 mark), behind Lendl and just ahead of Andy Murray.

Here are the top ten Elo peaks for players aged 34 and over:

Player         Age   34+ Elo  
Roger Federer  34.1     2393  
Jimmy Connors  34.1     2234  
Andre Agassi   35.3     2207  
Rod Laver      36.6     2207  
Ken Rosewall   37.4     2195  
Tommy Haas     35.3     2111  
Arthur Ashe    35.7     2107  
Ivan Lendl     34.1     2054  
Andres Gimeno  35.0     2035  
Mark Cox       34.0     2014

The 160-point gap between Federer and Jimmy Connors implies that 34-year-old Fed would win about 70% of the time against 34-year-old Connors. No one has ever sustained this level of play–or anything close to it–for this long.

At the risk of belaboring the point, similar arguments can be made for 33-year-old Fed, all the way to 30-year-old Fed. At almost any stage in the last four years, Federer has been better than any player in history at that age [2]. Djokovic has matched many of Roger’s career accomplishments so far, especially on clay, but it would be truly remarkable if he maintained a similar level of play through the end of the decade.

Current Elo ratings

While it’s not really germane to today’s subject, I’ve got the numbers, so let’s take a look at the current ATP Elo ratings. Since Elo is new to most tennis fans, I’ve included columns to indicate each player’s chances of beating Djokovic and of beating the current #10, Milos Raonic, based on their rating. As a general rule, a 100-point gap translates to a 64% chance of winning for the favorite, a 200-point gap implies 76%, and a 500-point gap is equivalent to 95%.

Rank  Player                  Elo  Vs #1  Vs #10  
1     Novak Djokovic         2511      -     91%  
2     Roger Federer          2386    33%     84%  
3     Andy Murray            2332    26%     79%  
4     Kei Nishikori          2256    19%     71%  
5     Rafael Nadal           2256    19%     71%  
6     Stan Wawrinka          2186    13%     62%  
7     David Ferrer           2159    12%     58%  
8     Tomas Berdych          2148    11%     56%  
9     Richard Gasquet        2128    10%     54%  
10    Milos Raonic           2103     9%       -  
11    Gael Monfils           2084     8%     47%  
12    Jo-Wilfried Tsonga     2083     8%     47%  
13    Marin Cilic            2081     8%     47%  
14    Kevin Anderson         2074     7%     46%  
15    John Isner             2035     6%     40%  
16    David Goffin           2027     6%     39%  
17    Grigor Dimitrov        2021     6%     38%  
18    Gilles Simon           2005     5%     36%  
19    Jack Sock              1994     5%     35%  
20    Roberto Bautista Agut  1986     5%     34%  
21    Philipp Kohlschreiber  1982     5%     33%  
22    Tommy Robredo          1963     4%     31%  
23    Feliciano Lopez        1955     4%     30%  
24    Nick Kyrgios           1951     4%     29%  
25    Ivo Karlovic           1949     4%     29%  
26    Jeremy Chardy          1940     4%     28%  
27    Alexandr Dolgopolov    1940     4%     28%  
28    Bernard Tomic          1936     4%     28%  
29    Fernando Verdasco      1932     3%     27%  
30    Fabio Fognini          1925     3%     26%
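
The Vs #1 and Vs #10 columns can be reproduced from the ratings alone. The sketch below assumes the conventional Elo expected-score formula (base 10, 400-point scale), which the post does not state explicitly but which matches the 100/200/500-point rule of thumb above. The names `win_probability`, `vs_columns` and `RATINGS` are made up for this illustration, and only a few rows of the table are included.

```python
def win_probability(rating: float, opponent: float) -> float:
    """Conventional Elo expected score (400-point, base-10 scale; assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


# A handful of ratings copied from the table above.
RATINGS = {
    "Novak Djokovic": 2511,
    "Roger Federer": 2386,
    "Andy Murray": 2332,
    "Milos Raonic": 2103,
    "Gael Monfils": 2084,
}


def vs_columns(ratings, top="Novak Djokovic", tenth="Milos Raonic"):
    """Return {player: (pct vs #1, pct vs #10)}; None marks a player's own column."""
    rows = {}
    for player, elo in ratings.items():
        vs_top = None if player == top else round(100 * win_probability(elo, ratings[top]))
        vs_tenth = None if player == tenth else round(100 * win_probability(elo, ratings[tenth]))
        rows[player] = (vs_top, vs_tenth)
    return rows


# Print the two percentage columns for each sample player.
for player, (vs_top, vs_tenth) in vs_columns(RATINGS).items():
    print(f"{player:<16} {vs_top!s:>5} {vs_tenth!s:>5}")
```

Because the columns depend only on rating differences, adding the same constant to every rating would leave them unchanged.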

Notes:

  1. These numbers don’t precisely agree with 538’s, or with either of two other recent sets of ratings. Some of the discrepancy seems to be due to including or excluding retirements and withdrawals–both 538 and I are excluding them, but when I included retirements (though not withdrawals), Federer and Djokovic swapped places at the top of the list.
  2. 538’s graph shows Lendl ahead at age 30 and Connors with a slight edge briefly around age 32.

22 thoughts on “The Case for Novak Djokovic … and Roger Federer … and Rafael Nadal”

  1. All this comparing players across eras is a lot of fun and all, but I don’t think you can stress enough how meaningless these comparisons really are.

    There really is no way to adjust the ratings to take account of the changing overall level of skill in the game, and there’s all sorts of other issues too. The ratings only really mean something in comparison to other players playing at the same time.

    There’s always some completely indefensible fiddle in order to keep the top guys’ elo ratings at roughly the same level across eras, and then comparisons are obviously ridiculous.

      1. I will have a look at that, but the problem with elo comparisons across eras is simply that, not only can you add a constant offset to everyone’s ratings and not change anything, you can also add any function of time to the ratings and not change anything.

        So right off the bat elo just cannot tell if average standards are changing over time. Since standards almost certainly are changing you ought to add something, but I couldn’t tell you what it should be.

        What you’re really comparing is some level of dominance across eras, but even that concept has problems. If you have changing numbers of players/matches in your model then the spread of elo ratings also changes – you have to decide who the ‘average’ player is, or some sort of reference player, again I couldn’t tell you how to do that in an unbiased way.

      2. There are a number of arbitrary-seeming choices in there: the k-factor gets smaller the more matches a player has played; the k-factors are also player-specific, so the rating changes are no longer zero-sum; players start with a 1500 rating, which is somewhere near the average (at least initially)…

        Also it is only using the atp+gs results (no qualifiers or challengers) which I believe skews the results in odd ways for those players on the edge of being atp tour regulars – their average opponent in included matches is better than them because you don’t often see the matches where they win.

        I did a quick analysis of how year-end ratings are changing over time and the average is rising. There’s a few effects that are partially cancelling out that cause that – players retiring and taking their points with them causes the average to go down, but those fringe players losing a couple of matches and never being seen again cause the average to go up (especially given that they start at 1500 and immediately drop a ton of points due to their big k-factors).

        Unless you find a satisfactory way to adjust for these things you can’t compare across eras.
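
To make the k-factor point concrete, here is a minimal sketch of an Elo update in which each player's K shrinks with the number of matches played. The decay schedule in `k_factor` and its constants are placeholders chosen for illustration, not values confirmed anywhere in this post or thread; the 400-point expected-score formula is likewise the conventional one, assumed here. Only the 1500 starting rating comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Player:
    rating: float = 1500.0  # starting rating mentioned in the thread
    matches: int = 0


def expected_score(rating: float, opponent: float) -> float:
    """Conventional Elo win expectancy (400-point, base-10 scale; assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def k_factor(matches_played: int, k0: float = 250.0,
             offset: float = 5.0, shape: float = 0.4) -> float:
    """Hypothetical schedule: K shrinks as a player's match count grows."""
    return k0 / (matches_played + offset) ** shape


def record_match(winner: Player, loser: Player):
    """Update both players in place; return (winner_change, loser_change)."""
    p_win = expected_score(winner.rating, loser.rating)
    winner_change = k_factor(winner.matches) * (1.0 - p_win)
    loser_change = -k_factor(loser.matches) * (1.0 - p_win)
    winner.rating += winner_change
    loser.rating += loser_change
    winner.matches += 1
    loser.matches += 1
    return winner_change, loser_change


newcomer = Player()                           # 1500, no matches yet
veteran = Player(rating=2000.0, matches=400)  # established player
gain, loss = record_match(newcomer, veteran)
print(f"newcomer {gain:+.1f}, veteran {loss:+.1f}, net {gain + loss:+.1f}")
```

With a single shared K the two changes would cancel exactly. Here the newcomer's large K means he gains far more than the veteran loses, so points enter the pool, which is the non-zero-sum effect described above.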

  2. I know you weren’t crazy about my idea of scaling average ELO to zero, but my other point (using 500 in the exponential denominator) would allow you to eliminate the last two columns in the chart altogether and anyone could calculate any player’s approximate percentage against any other player just by dividing the ELO difference by 10 and adding to 50.

    I’ll stop now. 🙂

    1. I hear you. Since my goal is to turn this into something with predictive value, there’s a lot of work to be done — surface adjustments, handling players who have missed a lot of time, and newbies, as you mentioned on Twitter. I don’t know what the end result will look like, but I expect it’ll involve some rescaling, especially to handle players at lower levels.
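
The divide-by-10 shortcut from this thread can be checked numerically. The sketch below reads "using 500 in the exponential denominator" as replacing the conventional 400-point scale with 500 in the logistic formula; that reading is an interpretation, and the function names are invented for this example.

```python
def win_probability(gap: float, scale: float = 400.0) -> float:
    """Elo-style win chance for the higher-rated player, given the rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


def rule_of_thumb(gap: float) -> float:
    """The shortcut from the comment: 50% plus one point per 10 rating points."""
    return 0.50 + gap / 1000.0


# Compare the 500-scale logistic curve with the linear shortcut.
for gap in (50, 100, 200, 300):
    exact = win_probability(gap, scale=500.0)
    print(f"gap {gap:3d}: logistic {exact:.1%}, shortcut {rule_of_thumb(gap):.1%}")
```

Under these assumptions the shortcut stays within about two percentage points of the logistic value for gaps up to 300, then drifts further off as the straight line keeps climbing while the curve flattens.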

  3. very nice analysis and thanks for sharing the code.
    Is jrank based on Elo-like ratings (http://tennisabstract.com/jrank/atp.html)? Or are you planning to replace it with something like Elo?
    Also, for jrank, I thought that if two players have x and y points, the winning probability of the former is x/(x+y), so basically the ratio matters [intuition coming from logistic regression].
    For Elo, it seems the difference matters. So do they (closely) match each other after taking a logarithm (and a change of origin and scale)?

    1. jrank is somewhat like elo, in that it gives players points based on who they beat (and lost to), instead of when they did so. It’s a lot more complicated though, and some of that complexity is probably unnecessary. I’ve yet to test Elo vs Jrank for predictiveness (to see how similar they are, or to see which is more predictive), and before I do that, there are many things to do that might improve Elo, particularly with surface adjustments.

      Yep, you’re right about ratio vs difference.
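
One way to see the ratio-versus-difference correspondence, as a sketch: suppose a player's points in the ratio model map to an Elo-style rating through $x = 10^{R_x/400}$. Whether jrank's points actually behave this way is not something the thread establishes, and the 400-point scale is the conventional Elo choice, assumed here.

```latex
\[
x = 10^{R_x/400},\quad y = 10^{R_y/400}
\;\Longrightarrow\;
\frac{x}{x+y} = \frac{1}{1 + y/x} = \frac{1}{1 + 10^{-(R_x - R_y)/400}}
\]
\[
R_x - R_y = 400\,\log_{10}\frac{x}{y}
\]
```

So under that mapping, a ratio of points and a difference of ratings describe the same win probability, up to the choice of origin and scale.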

  4. Hey do you have a surface specific ELO for

    1. Benjamin Becker on hard courts
    2. Daniel Gimeno Traver on clay
    3. Radek Stepanek on grass?

    Kinda wondering how close to 1500 they will be, because they have (over the last couple of years) stats that closely resemble the average stats on those surfaces.

    1. Becker hard: 1677
      DGT clay: 1715

      Haven’t run grass #s, don’t know how meaningful they would be given how few matches there are.

      Using only tour-level data, 1500 is essentially replacement level, since there are so many players entering the league for the first time (or first few). Becker is 9-2 against players outside the top 100 since the beginning of last year, and some of those guys are even above 1500.

      Looking at who’s at 1500, it looks like the 150-200 range in the rankings, with some tweaks for surface. (Pospisil is 1350 on clay.) In the overall ratings, you have guys like Moriya, Millot, Gombos, Daniel, and Lindell around 1500.

    1. I don’t doubt the theoretical improvement of TrueSkill or WHR, but in practice, is an improvement of roughly .005 worth the additional computation time and complexity?

      Also, I probably wouldn’t say ELO is “not very good” if the basis for that statement is that True Skill and WHR are only a half a percentage point better.

      1. My point was not to say that “ELO is not very good” but that “ELO is not very good for historical comparisons”.
        By that I mean that if you try to answer the question “Who is the best player in the last 50 years ?” ELO may not be the best tool.

        ELO is obviously a good tool to make fast estimations of win probabilities of players in the same time era.
        But I think that the ELO rating of Connors in 1980 can’t be compared to the ELO rating of Djokovic in 2015 simply because 1 ELO point in 1980 doesn’t have the same value as 1 ELO point today.

        For Chess, several alternatives to ELO have been proposed that may improve historical comparisons:
        * EDO : http://www.edochess.ca/
        * Glicko : http://www.glicko.net/research/glicko.pdf
        * WHR
        * TrueSkill / Trueskill through time etc.

        I like your blog very much; it is a fantastic place for a fan of tennis stats like me.
        My comment was not intended to doubt the quality of your analysis, it was just a comment 🙂

        1. To clarify, the reply to your original comment came from me (Jeff M), not Jeff Sackmann, who runs Tennis Abstract. Sorry for the confusion. I should start using a different handle.

          Anyway, my point is that I don’t see ELO as significantly worse than the other systems you cited. In the WHR paper, it looks like none are more than a half percent better, but all seem more complicated to generate. Not that I’m a defender of ELO, but it might be fairer to say “none of the systems are very good for historical comparisons.”

  5. Hello Jeff,

    As an ELO newbie, was curious about a few things without going into the actual computation – would be grateful if you could clarify.

    1. Does the ELO rating change depending on a tournament i.e. if it is a slam vs Masters vs 500s vs 250s etc ? Or if player A beats player B the net positive score for A would be exactly the same irrespective of the tournament?

    2. Does the number of sets matter i.e if its a best of 5 or a best of 3?

    3. Does the score matter – a straight set win vs a deciding set; do games matter i.e does 6-1,6-1 score more than 6-4,7-5?

    4. Finally, why do you use the term “peak ELO” which suggests a value at a point in time as opposed to a total / average ELO over a period? Is average ELO indeed a better way of measuring career success?

    Thanks

    1. 1. no – tourney level doesn’t matter.

      2. in this calculation, no. Bialik and Morris tested giving 5-setters more weight, but they say it didn’t make the results more predictive.

      3. in this version, no. They also tested some variations of this, and apparently it didn’t help.

      4. At any given time, a player’s Elo reflects their results over a period of time. After a win, it goes up; after a loss, it goes down, and the amount depends on the quality of the opponent. Peak vs career is ultimately a religious debate — you could argue that the best player is the one who sustained the highest level over a single year, or three years, or five years, or ten years, or any number of other permutations. None of them are final.

  6. Hi Jeff,
    Thanks for putting this article together – amazing work! I have a few questions about the Elo computation that I was hoping you could help me with!
    1) How did you compute initial Elo scores? I imagine that you must begin with data from a specific year, say 1968. I also know that you must have a 1500 point average. Do you then allocate points based on ATP rank; what formula, if any, do you use? From experience, I know that higher ranked players (1,2,3) have a bigger points spread than the lower ranked players (1200,1300 ranks). Do you account for this in any way when allocating initial points, or are the points normally distributed when allocated to players?
    2) Do you reset Elo ratings at the beginning of each year or do you keep the points going?
    3) How do you deal with players that are injured or take a year or more off the courts? Do you delete the data for these players? How do you compute the Elo when they come back?
    4) How do you see the Elo ratings being applied to doubles players and tournaments (as a side note, do you have any databases on doubles matches?)
    Thanks so much!

    1. everyone starts in 1968 with 1500 points. no, not reset each year. I did something very similar to elo for doubles for a Tennis Magazine article last year; might revisit it soon. Nope, no doubles data.

    1. Carl and Ben tried that when they first published Elo ratings on fivethirtyeight, and they say that counting sets or games won doesn’t improve the predictiveness of the resulting ratings.
