Indian Wells Projections

If you’ve found your way here from the Wall Street Journal, welcome! If you don’t know what I’m talking about, go read what Carl Bialik has to say in today’s paper, and in an online follow-up.

I’ve developed a fairly sophisticated algorithm to predict the outcome of tennis matches. It seeks to remedy some of the flaws in the current ranking system and to do a better job of forecasting which players will perform well at certain times, on certain surfaces, and against certain opponents.

In the past, I’ve written about the predictiveness of ATP ranking points, which are pretty darn good for all their flaws. By just about any standard, however, my system is better. It’s not perfect (far, far from it), but it does give you a valid second opinion on a player’s abilities at any given time.

The components

My algorithm does several things that traditional ranking points do not.  Here are a few of the components:

  • Points are awarded based on the quality of opponents, not on the round or tournament. Thus, beating Mikhail Youzhny in the quarterfinals in Moscow is worth the same as beating him in the semifinals of Indian Wells. Losing to a low-ranked player counts against you more than losing to Roger Federer.
     
  • These points, and everything else, are adjusted for surface.  Beating Federer counts for more on hard courts than on clay; beating Juan Carlos Ferrero is the opposite.
     
  • The algorithm generates a set of overall rankings, and it also generates two sets of surface-specific rankings, one for clay courts, one for everything else.  (There isn’t enough data on indoor hard courts or grass courts to treat them separately from any other type of fast court.)  So for Indian Wells, I’m using the hard-court rankings.  Of course, this drastically impacts the chances of many players.
     
  • The points awarded for any tournament are also based on how recent the event was.  Beating Andy Murray last week is more relevant than beating him last year.  Thus, Milos Raonic does better in my rankings (24th overall) than in the ATP rankings (37th).  Sure, it would help if Raonic had played more ATP-level events last year, but my algorithm recognizes that February results count for more than wins from last June.
     
  • My system considers matches from the last two years, not just one year, as the ATP rankings do.  This and the ‘recency’ adjustment remedy what I consider to be the most ridiculous part of the ATP ranking system.  A player can fall dozens of spots in the rankings simply because a tournament result “falls off.”  
     
     So, a match from 51 weeks ago tells us a lot about a player’s current skill level, but a match from 53 weeks ago does not?  In my system, both are counted; a match from 51 weeks ago counts for about 55-60% of the value of a match from last week, while a match from a few weeks earlier counts for a little less.
     
  • Grand Slams count for a bit more, but not a lot more. The main reason is that the winner of a five-setter is more likely to be the more skilled player than the winner of a three-setter. A couple of bad bounces in a tiebreak can turn a three-setter against you, but it’s awfully hard to win a five-setter on luck.
     
  • There is a bit of home-court advantage in tennis, though with the increasing use of the challenge system (which limits officiating bias), it seems to be shrinking. It still exists, and my system accounts for it.
     
  • For whatever reason, it appears that qualifiers and wild cards do worse in ATP main draw matches than my system would otherwise expect.  So they are penalized a small amount.
     
  • Finally, there is a head-to-head component.  It turns out that the head-to-head component can’t improve that much on the rankings-based algorithm, but it does have some value.  So I do consider the history of each matchup, giving a slight edge to the player who has won more matches in the past.  (Depending, of course, on how long ago it was, what surface the matches were on, and so on.)
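The recency weighting and the two-year window can be illustrated with a small calculation. The post doesn’t say what shape the decay curve has, so this sketch *assumes* a simple exponential decay, calibrated so that a match from 51 weeks ago is worth 57.5% (the midpoint of the quoted 55-60%) of a match from last week, and it drops matches older than 104 weeks. The constant names and `recency_weight` are hypothetical, not part of the author’s system.

```python
import math

# HYPOTHETICAL sketch: the post does not say which decay curve it uses.
# ASSUMPTION: exponential decay, calibrated so a match from 51 weeks ago is
# worth 57.5% (midpoint of the quoted 55-60%) of a match from last week.
WINDOW_WEEKS = 104        # matches from the last two years are counted
REFERENCE_WEEKS = 1       # "a match from last week"
CALIBRATION_WEEKS = 51
CALIBRATION_RATIO = 0.575

# Solve exp(-k * (51 - 1)) == 0.575 for the per-week decay rate k.
K = -math.log(CALIBRATION_RATIO) / (CALIBRATION_WEEKS - REFERENCE_WEEKS)


def recency_weight(weeks_ago: float) -> float:
    """Weight of a match relative to one played last week (0 outside the window)."""
    if weeks_ago < 0 or weeks_ago > WINDOW_WEEKS:
        return 0.0
    return math.exp(-K * (weeks_ago - REFERENCE_WEEKS))


if __name__ == "__main__":
    for w in (1, 51, 53, 104, 105):
        print(f"{w:>3} weeks ago: {recency_weight(w):.3f}")
```

Under these assumptions, a match from 53 weeks ago still counts for about 56% of last week’s match, rather than vanishing the way a result does when it falls out of a one-year window, and a match from two years ago still carries roughly a third of the weight.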
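The Grand Slam reasoning can be made concrete with a textbook simplification: treat every set as an independent event that the better player wins with the same probability p. That assumption is made only for illustration and is not part of the author’s model (real sets are neither independent nor identical), but it shows why a best-of-five result says more about skill than a best-of-three one. The function name `match_win_prob` is hypothetical.

```python
from math import comb


def match_win_prob(p: float, sets_to_win: int) -> float:
    """P(win match) if each set is won independently with probability p.

    ASSUMPTION: sets are independent and identically distributed, a
    simplification used only for illustration.
    The winner takes the final set; the opponent takes k sets
    (0 <= k < sets_to_win) somewhere among the earlier sets_to_win - 1 + k sets.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    q = 1.0 - p
    n = sets_to_win
    return sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n))


if __name__ == "__main__":
    for p in (0.55, 0.60, 0.70):
        bo3 = match_win_prob(p, 2)  # best of three: first to two sets
        bo5 = match_win_prob(p, 3)  # best of five: first to three sets
        print(f"set win prob {p:.2f}: best-of-3 {bo3:.3f}, best-of-5 {bo5:.3f}")
```

For example, a player who wins 60% of sets wins a best-of-three match about 64.8% of the time but a best-of-five match about 68.3% of the time, so the longer format lets the stronger player’s edge show through more reliably.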

Whew!

Thanks for reading this far.

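As I post this, a few matches have already been played, but these numbers were generated this morning, after the full draw was released. The table shows the probability that each player reaches each round of the tournament; the W column is the chance of winning the title. Seeded players have first-round byes, which is why they show 100% in the R64 column. I’ll have a little more to say at the bottom.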

Player            R64   R32   R16    QF    SF     F     W 
(1)Nadal         100% 94.6% 78.3% 56.3% 40.1% 24.1% 13.0% 
(q)De Voest       54%  3.1%  0.8%  0.1%  0.0%  0.0%  0.0% 
Riba              46%  2.3%  0.5%  0.1%  0.0%  0.0%  0.0% 
(q)Sweeting       42%  8.4%  0.8%  0.1%  0.0%  0.0%  0.0% 
Granollers        58% 17.2%  2.0%  0.5%  0.1%  0.0%  0.0% 
(27)Monaco       100% 74.4% 17.7%  7.5%  2.9%  0.8%  0.2% 
(19)Baghdatis    100% 86.1% 52.9% 21.3% 11.3%  4.7%  1.6% 
(q)Devvarman      43%  5.0%  1.0%  0.1%  0.0%  0.0%  0.0% 
Mannarino         57%  8.9%  2.2%  0.2%  0.0%  0.0%  0.0% 
(q)Cipolla        28%  4.0%  0.7%  0.1%  0.0%  0.0%  0.0% 
Malisse           72% 22.1%  6.6%  1.5%  0.4%  0.1%  0.0% 
(15)Tsonga       100% 73.9% 36.7% 12.2%  5.9%  2.0%  0.6% 

(11)Almagro      100% 81.5% 51.0% 22.4%  7.8%  2.7%  0.8% 
(q)Russell        45%  8.1%  2.0%  0.3%  0.0%  0.0%  0.0% 
Anderson          55% 10.4%  3.1%  0.6%  0.1%  0.0%  0.0% 
Istomin           41% 13.1%  4.6%  1.0%  0.2%  0.0%  0.0% 
Nieminen          59% 24.4%  9.3%  2.8%  0.6%  0.1%  0.0% 
(23)Montanes     100% 62.5% 30.2% 10.8%  3.1%  0.8%  0.2% 
(28)Simon        100% 73.1% 27.2% 14.5%  4.6%  1.4%  0.4% 
Schuettler        40%  8.3%  1.2%  0.3%  0.0%  0.0%  0.0% 
Haase             60% 18.7%  4.0%  1.3%  0.2%  0.0%  0.0% 
(q)Matosevic      29%  2.7%  0.6%  0.1%  0.0%  0.0%  0.0% 
Karlovic          71% 12.7%  5.0%  1.8%  0.4%  0.1%  0.0% 
(6)Ferrer        100% 84.6% 61.9% 44.1% 22.2% 10.8%  4.4% 

(4)Soderling     100% 89.0% 71.0% 46.8% 27.3% 15.8%  7.6% 
Phau              37%  3.0%  0.9%  0.2%  0.0%  0.0%  0.0% 
Berrer            63%  8.0%  3.4%  0.9%  0.2%  0.0%  0.0% 
(q)Smyczek        48% 10.5%  1.1%  0.2%  0.0%  0.0%  0.0% 
Marchenko         52% 13.4%  1.5%  0.3%  0.0%  0.0%  0.0% 
(32)Kohlsch.     100% 76.1% 22.0%  7.7%  2.3%  0.6%  0.1% 
(20)Dolgopolov   100% 68.8% 24.4%  8.9%  2.8%  0.9%  0.3% 
Hanescu           39% 10.5%  1.8%  0.3%  0.0%  0.0%  0.0% 
Seppi             61% 20.8%  4.9%  1.1%  0.2%  0.0%  0.0% 
Stepanek          30% 12.1%  6.7%  2.3%  0.8%  0.2%  0.1% 
(PR)Del Potro     70% 46.4% 35.6% 20.8% 11.1%  6.1%  2.9% 
(14)Ljubicic     100% 41.6% 26.5% 10.6%  4.4%  1.7%  0.5% 

(9)Verdasco      100% 86.2% 60.7% 23.2% 10.1%  4.2%  1.3% 
(WC)Berankis      52%  7.4%  2.2%  0.3%  0.0%  0.0%  0.0% 
(q)Bogomolov      48%  6.3%  1.7%  0.2%  0.0%  0.0%  0.0% 
Tipsarevic        71% 34.2% 12.2%  3.3%  0.9%  0.2%  0.0% 
Kamke             29%  8.2%  1.7%  0.2%  0.0%  0.0%  0.0% 
(21)Querrey      100% 57.6% 21.5%  5.8%  1.5%  0.4%  0.1% 
(25)Robredo      100% 70.8% 16.9%  7.6%  2.2%  0.6%  0.1% 
Zverev            62% 20.9%  2.9%  0.8%  0.1%  0.0%  0.0% 
(q)Ebden          38%  8.3%  0.8%  0.2%  0.0%  0.0%  0.0% 
(q)Young          37%  2.2%  0.6%  0.1%  0.0%  0.0%  0.0% 
Starace           63%  6.3%  2.6%  0.7%  0.1%  0.0%  0.0% 
(5)Murray        100% 91.4% 76.3% 57.7% 35.6% 21.5% 11.1% 

(8)Roddick       100% 84.9% 63.0% 43.4% 21.7%  8.7%  3.9% 
(WC)Blake         63% 11.3%  4.5%  1.4%  0.3%  0.0%  0.0% 
(q)Guccione       37%  3.8%  1.1%  0.2%  0.0%  0.0%  0.0% 
Ram-Hidalgo       34%  5.1%  0.5%  0.1%  0.0%  0.0%  0.0% 
Mello             66% 16.4%  2.7%  0.6%  0.1%  0.0%  0.0% 
(30)Isner        100% 78.4% 28.1% 12.6%  3.6%  0.8%  0.2% 
(18)Gasquet      100% 73.4% 34.8% 14.2%  4.6%  1.2%  0.3% 
Cuevas            72% 22.8%  6.7%  1.7%  0.3%  0.0%  0.0% 
Andujar           28%  3.9%  0.5%  0.1%  0.0%  0.0%  0.0% 
Benneteau         46% 16.1%  7.1%  2.3%  0.6%  0.1%  0.0% 
Lopez             54% 18.9%  9.0%  3.1%  0.8%  0.2%  0.0% 
(10)Melzer       100% 65.0% 41.9% 20.4%  8.2%  2.7%  0.9% 

(16)Troicki      100% 82.3% 40.1% 10.5%  4.3%  1.1%  0.3% 
(q)Bopanna        30%  3.1%  0.3%  0.0%  0.0%  0.0%  0.0% 
(WC)Tomic         70% 14.6%  3.1%  0.3%  0.1%  0.0%  0.0% 
Giraldo           55% 14.6%  6.0%  1.0%  0.3%  0.0%  0.0% 
Gim-Traver        45% 10.9%  3.8%  0.6%  0.1%  0.0%  0.0% 
(24)Llodra       100% 74.5% 46.7% 15.8%  7.1%  2.2%  0.7% 
(31)Gulbis       100% 56.7% 12.5%  6.0%  2.3%  0.6%  0.1% 
Hewitt            75% 37.3%  7.5%  3.7%  1.4%  0.4%  0.1% 
Lu                25%  6.0%  0.6%  0.1%  0.0%  0.0%  0.0% 
Mayer             66% 12.7%  7.2%  3.8%  1.6%  0.4%  0.1% 
Golubev           34%  3.7%  1.5%  0.5%  0.1%  0.0%  0.0% 
(3)Djokovic      100% 83.6% 70.8% 57.7% 42.5% 24.8% 15.4% 

(7)Berdych       100% 84.1% 64.8% 33.2% 12.6%  5.6%  2.3% 
Kukushkin         48%  7.6%  2.8%  0.5%  0.1%  0.0%  0.0% 
Kubot             52%  8.3%  3.1%  0.5%  0.1%  0.0%  0.0% 
De Bakker         48% 20.6%  5.3%  1.3%  0.2%  0.0%  0.0% 
Becker            52% 21.9%  5.9%  1.5%  0.2%  0.1%  0.0% 
(26)Bellucci     100% 57.4% 18.1%  4.9%  0.9%  0.2%  0.0% 
(17)Cilic        100% 81.7% 37.2% 20.7%  6.6%  2.6%  1.0% 
Gabashvili        49%  9.6%  1.5%  0.3%  0.0%  0.0%  0.0% 
Serra             51%  8.7%  1.2%  0.3%  0.0%  0.0%  0.0% 
Davydenko         84% 49.6% 32.8% 21.0%  8.7%  4.4%  2.1% 
Fognini           16%  3.5%  1.1%  0.3%  0.1%  0.0%  0.0% 
(12)Wawrinka     100% 47.0% 26.2% 15.5%  5.2%  2.2%  0.9% 

(13)Fish         100% 64.5% 41.9% 13.0%  6.4%  2.7%  1.1% 
(WC)Raonic        81% 33.0% 17.9%  4.3%  1.7%  0.6%  0.2% 
Ilhan             19%  2.5%  0.6%  0.0%  0.0%  0.0%  0.0% 
(WC)Harrison      26%  5.7%  1.0%  0.1%  0.0%  0.0%  0.0% 
Chardy            74% 32.1% 12.0%  2.4%  0.8%  0.2%  0.1% 
(22)Garcia-Lopez 100% 62.2% 26.6%  5.9%  2.3%  0.8%  0.2% 
(29)Chela        100% 59.2%  7.7%  2.6%  0.7%  0.2%  0.0% 
Petzschner        66% 30.5%  3.4%  1.1%  0.3%  0.0%  0.0% 
Brown             34% 10.3%  0.7%  0.1%  0.0%  0.0%  0.0% 
Andreev           41%  3.0%  1.4%  0.4%  0.1%  0.0%  0.0% 
Nishikori         59%  6.4%  3.7%  1.4%  0.4%  0.1%  0.0% 
(2)Federer       100% 90.6% 83.1% 68.7% 52.4% 36.7% 24.5%
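Once every possible matchup has a win probability, a table like this follows mechanically from the bracket: a player’s chance of reaching the next round is the chance of reaching the current one times the chance of beating whoever comes out of the neighboring section of the draw. The sketch below shows that standard propagation under the simplifying assumption that match outcomes are independent. It is not the author’s code; `round_probabilities`, `logistic_win_prob`, and the toy ratings are hypothetical, and the Elo-style curve merely stands in for the rankings-based model described above.

```python
from typing import Callable, Dict, List

WinProb = Callable[[str, str], float]


def round_probabilities(draw: List[str], win_prob: WinProb) -> Dict[str, List[float]]:
    """Probability that each player reaches each round of a knockout draw.

    draw: players in bracket order; its length must be a power of two.
          Positions 0-1 meet in round one, the winners of 0-1 and 2-3 meet next, etc.
    win_prob(a, b): HYPOTHETICAL match model giving P(a beats b).
    Returns {player: [1.0 (in draw), P(win round 1), ..., P(win title)]}.
    ASSUMPTION: match outcomes are independent of one another.
    """
    n = len(draw)
    if n == 0 or n & (n - 1):
        raise ValueError("draw size must be a power of two")
    probs: Dict[str, List[float]] = {player: [1.0] for player in draw}
    block = 1  # size of each sub-bracket whose winner advances this round
    while block < n:
        # Snapshot last round's values so this round's updates don't leak in.
        current = {player: probs[player][-1] for player in draw}
        for start in range(0, n, 2 * block):
            top = draw[start:start + block]
            bottom = draw[start + block:start + 2 * block]
            for side, opponents in ((top, bottom), (bottom, top)):
                for player in side:
                    # P(advance) = P(got here) * sum over possible opponents of
                    # P(opponent got here) * P(player beats that opponent).
                    beat = sum(current[o] * win_prob(player, o) for o in opponents)
                    probs[player].append(current[player] * beat)
        block *= 2
    return probs


def logistic_win_prob(ratings: Dict[str, float]) -> WinProb:
    """Toy Elo-style stand-in for a real match model (not the author's)."""
    def p(a: str, b: str) -> float:
        return 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400))
    return p


if __name__ == "__main__":
    toy = {"A": 2100, "B": 1900, "C": 2000, "D": 1950}  # made-up ratings
    table = round_probabilities(["A", "B", "C", "D"], logistic_win_prob(toy))
    for name, row in table.items():
        print(name, " ".join(f"{x:6.1%}" for x in row))
```

Seeds who received first-round byes could be handled by padding the draw with placeholder entries that the match model always has losing.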

You’ll probably notice right off that Federer and Djokovic have the best chances of winning. Indeed, they are the top two players on hard courts, according to my rankings. Yes, Nadal has won the slams lately, but he has also lost to a few players he shouldn’t have (Baghdatis, Melzer, Garcia-Lopez) in the recent past. I personally wouldn’t put money on Federer over Nadal in the final, but my algorithm disagrees.

A few other players my system likes are Juan Martin Del Potro, Nikolay Davydenko, and Marcos Baghdatis. It rewards some players for their wins over top-ranked opponents. It likes Del Potro both because of his strong record in the last few weeks and because the algorithm still counts his torrid summer of 2009, leading up to his U.S. Open win.

One more thing, and then I’ll shut up for now. Very few of the first-round matches stray beyond a 70/30 split. Even Tomic-Bopanna is 70/30, and Bopanna barely plays singles. The narrow margins are partly because no top players are involved in the first round, but they also show the depth of the men’s game: even someone ranked outside the top 150, like Flavio Cipolla, has a decent chance of advancing.

Of course, Flavio doesn’t have quite the same odds against Tsonga, and you can tell from Nadal’s second-round odds that neither Pere Riba nor Rik de Voest stands much of a chance against him.

Enjoy the tennis … and the numbers.

12 thoughts on “Indian Wells Projections”

  1. Hi Jeff, have you taken the age of the players into consideration while making predictions?

    1. I haven’t. Age would certainly matter if I were predicting the results of, say, the entire 2011 or 2012 season. But I’m not so sure that age makes a difference if we’re just looking at what will happen *this week*.

      It is on my list, though, so I’ll be checking to see whether an age adjustment improves the predictiveness of the rankings.
