Roger Federer’s Impressive but Not-Entirely-Relevant Dominance of the Istanbul Field

Roger Federer has faced 14 of the 27 other players in this week’s Istanbul field, and owns a career record of 59-1 against them. His one loss came to Jurgen Melzer, while more than half of his win total is thanks to his decade-long dominance of Mikhail Youzhny (16-0) and Jarkko Nieminen (14-0).

It’s rare that players of Federer’s stature contest such small events, so we don’t expect to see such lopsided head-to-heads very often. In fact, if we limit our view to events where a player faced at least 10 of the other entrants, it is only the 17th time since 1980 that someone has entered an event with a won-loss percentage of 95% or better against the field.
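The selection rule in the previous paragraph can be sketched as a filter over head-to-head records. This is a minimal sketch that assumes a hypothetical dictionary of (wins, losses) per opponent. It is not the code behind the post's numbers.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input shape: opponent name -> (wins, losses) from the
# entrant's point of view. Real head-to-heads would come from a results
# database; this structure is only for illustration.
HeadToHead = Dict[str, Tuple[int, int]]


def record_against_field(h2h: HeadToHead, field: Iterable[str]) -> Tuple[int, int, int]:
    """Return (opponents_faced, wins, losses) against the other entrants."""
    faced = wins = losses = 0
    for opponent in field:
        won, lost = h2h.get(opponent, (0, 0))
        if won + lost > 0:  # only count opponents the entrant has actually played
            faced += 1
            wins += won
            losses += lost
    return faced, wins, losses


def is_dominant_entry(faced: int, wins: int, losses: int,
                      min_faced: int = 10, min_pct: float = 0.95) -> bool:
    """Apply the cutoffs: at least 10 opponents faced and 95%+ of matches won."""
    played = wins + losses
    return faced >= min_faced and played > 0 and wins / played >= min_pct


# Federer entering Istanbul, using the summary figures from the post.
print(is_dominant_entry(faced=14, wins=59, losses=1))
```

With 14 opponents faced and a 59-1 record (about 98.3%), Federer clears both cutoffs.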

Federer himself represents two of the previous 16 times this has happened. The most notable of them is 2008 Estoril. He had previously faced 14 of the other players in the draw, and had never lost to any of them in 46 meetings. There are only four other instances of players undefeated against a field, all between 1980 and 1984 and in far fewer matches.

The most eye-grabbing of those early-80s accomplishments was Ivan Lendl’s record entering the 1980 Taipei event. He had faced 15 of the men in the draw, posting a record of 24-0 up to that point. Lendl’s name is the most common on the list, having entered tournaments with a 95% won-loss record against the field on four different occasions, highlighted by a 79-4 mark against the other competitors at Stratton Mountain in 1988.

Federer won the 2008 title in Estoril and Lendl claimed the 1980 trophy in Taipei, but Lendl was ousted in the second round of the 1988 Stratton Mountain event. Federer has also demonstrated that a stratospheric record against the field is no guarantee of success.

After Estoril, Roger’s second-best record entering an event was in Gstaad in 2013. He held a 73-3 record against the field, with each of the three losses coming against a different opponent. He lost his opening-round match in straight sets to Daniel Brands. His record against the field of the previous week’s Hamburg event was nearly perfect as well at 137-8, but Federico Delbonis stopped him in the semifinals there.

Rafael Nadal can tell a similar story. His best record against a field was in Santiago two years ago, coming back from injury. He had lost only 1 of 28 career matches against the other players in the draw. That week, Horacio Zeballos doubled Rafa’s loss count.

In fact, of the 16 times that a player went into an event with a 95% or better record against the field, the favorite won only six of them. Expanding the sample to records of 90% or better, the dominant player won 30 of 72 titles. Neither mark is as good as we’d expect if the historically great players continued to win matches at a 95% or 90% clip. In practice, head-to-head records just aren’t as predictive as they seem to be.
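To put numbers on that expectation: suppose a favorite won each match independently at a fixed rate. The chance of winning a title would then be that rate raised to the number of matches needed. This simplified model assumes four or five matches per title, since draw sizes vary. Under it, a 95% match winner should take the title roughly 77 to 81% of the time. A 90% match winner should take it roughly 59 to 66% of the time. Both are well above the observed 6 of 16 (about 38%) and 30 of 72 (about 42%).

```python
def title_probability(match_win_rate: float, matches_needed: int) -> float:
    """Chance of winning every match, assuming independent matches at one fixed rate."""
    return match_win_rate ** matches_needed


# (titles won, entries) for each cutoff, as reported in the post.
observed = {0.95: (6, 16), 0.90: (30, 72)}

for rate, (titles, entries) in observed.items():
    # Four or five matches per title is an assumption; draw sizes vary.
    expected = ", ".join(
        f"{n} matches: {title_probability(rate, n):.0%}" for n in (4, 5)
    )
    print(f"{rate:.0%} per match -> expected {expected}; "
          f"observed {titles}/{entries} ({titles / entries:.0%})")
```

The independence assumption is the weak point. As the next paragraph suggests, the events where these players stumbled were often not ordinary weeks.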

As is evident from some of the examples I’ve given, there are mitigating circumstances for many of these losses, and they aren’t entirely random. These days, when a player enters an event that seems below him, there’s a reason for it. Nadal rarely plays 250s; he was doing so to work his way back into match form. Federer rarely seeks out smaller events on clay; he was experimenting with a new racket.

This week, there’s no reason why Fed shouldn’t perform at his usual level–at least his usual level for clay–and win the four matches he needs to claim yet another title. But if he suffers his second loss against the players gathered in Istanbul this week, it won’t be quite as much of a shock as that 59-1 record implies.

Match Charting Project: More Matches, More Data, New Spreadsheet

The Match Charting Project keeps growing, and starting today, even more of the data is available for anyone who wants it. Several new contributors have helped us pass the 750-match milestone, having added an average of two matches per day since I first published the raw data.

New spreadsheet

The Match Charting spreadsheet now does a lot more. As you chart each point, the document updates stats for the match–both total and set-by-set. You’ll find the same stats you see on television (aces, double faults, winners, unforced errors, etc.) along with some that are a little less common, like winning percentage by rally length and most consecutive points won.

In other words, as you chart the match, you’ll have access to many of the same stats that commentators do. Here’s what it looks like:

[Image: screenshot of the updated MatchChart spreadsheet]

If you’ve hesitated to try charting because you couldn’t see what was in it for you, I hope this changes the calculation a bit.

Click here to download the MatchChart template.
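Two of the less common stats the spreadsheet reports are most consecutive points won and winning percentage by rally length. Both are easy to compute from a list of points. This is a minimal Python sketch with two assumptions. First, each point is recorded as a hypothetical (winner, rally_length) pair. Second, rallies are grouped into buckets of 1-3, 4-6, 7-9, and 10+ shots. The spreadsheet's own formulas may count shots or group rallies differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical point record: (winner, rally_length), where winner is 1 or 2
# and rally_length is a shot count (how shots are counted is an assumption).
Point = Tuple[int, int]

# Assumed rally-length buckets; None means "no upper bound".
BUCKETS: Sequence[Tuple[int, Optional[int]]] = ((1, 3), (4, 6), (7, 9), (10, None))


def longest_streak(points: List[Point], player: int) -> int:
    """Most consecutive points won by `player`."""
    best = current = 0
    for winner, _ in points:
        current = current + 1 if winner == player else 0
        best = max(best, current)
    return best


def win_pct_by_rally_length(points: List[Point], player: int) -> Dict[str, float]:
    """Share of points won by `player` within each rally-length bucket."""
    won: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for winner, length in points:
        for lo, hi in BUCKETS:
            if length >= lo and (hi is None or length <= hi):
                label = f"{lo}+" if hi is None else f"{lo}-{hi}"
                total[label] += 1
                won[label] += int(winner == player)
                break
    return {label: won[label] / total[label] for label in total}


points = [(1, 1), (1, 5), (2, 12), (1, 3)]  # toy data, not a real match
print(longest_streak(points, 1), win_pct_by_rally_length(points, 1))
```

Both stats update with a single pass over the points, which is why they can refresh as each new point is charted.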

New data

About a month ago, I published the point-by-point data from all charted matches. In raw form, it’s a bit daunting, and it contains more than many interesting research projects need.

Today, I added 15 different aggregate stats files for men, and another 15 for women. These contain the data that is shown in each charted match report. For instance, if you find it interesting that Simona Halep hit 14% of her backhands down the line in the Indian Wells final, you can take a look in the ShotDirection stats file and compare that number with the results from Halep’s other charted matches, or all matches in the database as a whole.

You can find these files (along with the updated raw data for 760+ matches) by clicking here.
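Comparing one match's number against a player's other matches and the whole database is a simple grouping exercise. The sketch below assumes a CSV with hypothetical columns `player`, `match_id`, `bh_dtl` (backhands hit down the line), and `bh_total` (all backhands). The real ShotDirection file's layout may differ, so check its header before adapting this.

```python
import csv
from typing import Dict, List, Optional

Row = Dict[str, str]  # csv.DictReader yields string values


def load_rows(path: str) -> List[Row]:
    """Read every row of a stats CSV into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def pooled_rate(rows: List[Row]) -> Optional[float]:
    """Down-the-line share of backhands, pooled across the given rows."""
    dtl = sum(int(r["bh_dtl"]) for r in rows)
    total = sum(int(r["bh_total"]) for r in rows)
    return dtl / total if total else None


def compare_match(rows: List[Row], player: str, match_id: str) -> Dict[str, Optional[float]]:
    """One match's rate next to the player's other matches and the full database."""
    mine = [r for r in rows if r["player"] == player]
    return {
        "this_match": pooled_rate([r for r in mine if r["match_id"] == match_id]),
        "player_other_matches": pooled_rate([r for r in mine if r["match_id"] != match_id]),
        "all_matches": pooled_rate(rows),
    }

# Usage (hypothetical file name and match id):
# rows = load_rows("ShotDirection.csv")
# print(compare_match(rows, "Simona Halep", "<match id>"))
```

Pooling shot counts, rather than averaging per-match percentages, keeps a short match from counting as much as a long one.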

Chart some matches

If you haven’t already, now is a great time to start charting professional matches and contributing to the project. An enormous number of matches are televised and streamed, and as the database of charted matches grows, there’s more and more useful context to all the data we’re generating.

You can start by jumping into the ‘Instructions’ tab of the new MatchChart spreadsheet, or for other tips, you can start with my blog post introducing the project.

Free ATP and WTA Results and Stats Databases

The vast majority of my men’s and women’s tennis results and stats databases are now free for anyone who wants to use them.

ATP Results and Stats:

  • Tour-level results back to 1968, with tons of data on both players in each match (age, handedness, country, rank), and matchstats from 1991-present.
  • Almost a decade of tour-level qualifying matches, with matchstats for the last few years.
  • Challenger results back to 1991, with matchstats for almost the last ten years.
  • Futures (and Satellite) results back to 1991.
  • Linked biographical and rankings data (introduced here).

WTA Results:

  • Tour-level results back to 1968, with the same player data as in the ATP files.
  • Tour-level qualifying matches.
  • Over 220,000 ITF main-draw matches.

Click the links to access the files. Enjoy!

Free ATP and WTA Ranking Databases

More data!

Today I’ve made available my entire ATP and WTA ranking databases through the end of the 2014 season. In addition, you’ll find my complete player tables, which include birthdate, country, and handedness for every player who has ever been ranked or played a tour-level match. (Plus thousands more players, who are included in the database for other reasons.)

This is all the data you need to research a wide range of topics, like the rise and fall of certain countries in the rankings and the changing ages of the top 10, 50, and 100.
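As an example of the age research mentioned above, here is a minimal sketch. It joins one week's rankings to birthdates from the player table and averages the ages of the top N. The input shapes are assumptions for illustration, not the files' actual column layout. The sketch expects (rank, player_id) pairs and a player_id-to-birthdate dictionary.

```python
from datetime import date
from typing import Dict, List, Tuple


def age_in_years(birthdate: date, on: date) -> float:
    """Approximate age in years, using 365.25 days per year."""
    return (on - birthdate).days / 365.25


def average_age_top_n(ranking: List[Tuple[int, str]],
                      birthdates: Dict[str, date],
                      ranking_date: date, n: int) -> float:
    """Mean age of players ranked 1..n on one ranking date.

    `ranking` holds (rank, player_id) pairs; `birthdates` maps player_id to
    birthdate, as a joined player table would. Players without a known
    birthdate are skipped (an assumption; missing data could bias results).
    """
    ages = [age_in_years(birthdates[pid], ranking_date)
            for rank, pid in ranking
            if rank <= n and pid in birthdates]
    if not ages:
        raise ValueError("no players with known birthdates in the top n")
    return sum(ages) / len(ages)
```

Running this for every ranking date and for n = 10, 50, and 100 would give the kind of age trend lines described above.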

This is the third major dataset I’ve published this week, and more is on the way.

ATP rankings are here, and WTA rankings are here. Enjoy!

Raw Data From The Match Charting Project

In the last year and a half, dozens of contributors and I have amassed detailed shot-by-shot records of nearly 700 professional matches. You can see the full list here, or a menu sorted by player here.

I refer to this as The Match Charting Project, and I hope you’ll consider contributing as well. Using a straightforward text notation system, we record shot type, shot direction, return depth, error types, and more. The more matches, the more interesting the results. The project made up part of my presentation at the Sloan Sports Analytics Conference last month, which included some very preliminary findings on player tendencies.

Now, you can dig into the raw data yourself. I’ve posted all of the user-submitted match charts in one place, in a standardized format for anyone who wants to mess around with it.
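A first step with the raw charts is usually to turn each point's code string into a few summary fields. This is a minimal sketch, and the notation details are assumptions to check against the charting instructions, not a specification. It assumes a leading serve digit (4 wide, 5 body, 6 T, 0 unknown). It assumes one lowercase letter per shot type, with digits and other letters (such as error codes) ignored. It assumes a final `*`, `#`, or `@` for a winner, forced error, or unforced error. Serve faults and second serves are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed codes; verify against the Match Charting Project's instructions.
SHOT_LETTERS = set("fbrsvzopuylmhijktq")  # assumed subset of shot-type letters
SERVE_DIRECTIONS = {"4": "wide", "5": "body", "6": "T", "0": "unknown"}
ENDINGS = {"*": "winner", "#": "forced error", "@": "unforced error"}


@dataclass
class ParsedPoint:
    serve_direction: Optional[str]
    rally_length: int  # serve plus rally shots, under this sketch's counting convention
    ending: Optional[str]


def parse_point(code: str) -> ParsedPoint:
    """Summarize one point's code string (single in-play serve only)."""
    code = code.strip()
    serve = SERVE_DIRECTIONS.get(code[:1])
    rally = code[1:] if serve else code
    shots = sum(ch in SHOT_LETTERS for ch in rally)  # digits and error letters ignored
    return ParsedPoint(serve, (1 if serve else 0) + shots, ENDINGS.get(code[-1:]))


print(parse_point("4f8b3f1*"))
```

Counting only shot letters keeps the parser short. The tradeoff is that it depends entirely on the assumed letter set being disjoint from the error codes.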

Enjoy!


Point-by-Point Data From the Last 17 Grand Slams

I’ve been doing a lot of griping lately about the state of tennis data, so I figured now was a good time to start doing something about it.

I’ve just released point-by-point data for most Grand Slam singles matches back to 2011. Beyond the basic point sequence–which is valuable in and of itself–you’ll find serve speed, winner type, and for a few of the slams, rally length for each point.

More detailed notes on the data are available at that link. Enjoy, and if working with it turns up any interesting findings, please let me know.
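A first pass at the point-by-point files might tally serve points won, average serve speed, and average rally length for one player. The field names below (`server`, `winner`, `speed_kmh`, `rally`) are hypothetical stand-ins for whatever the released files call those columns. Blank strings are assumed to mark missing values, since rally length is only available for some slams.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize_serving(points: List[Dict[str, str]], player: str) -> Dict[str, Optional[float]]:
    """Serve points won, mean serve speed, and mean rally length on `player`'s serve."""
    served = [p for p in points if p["server"] == player]
    if not served:
        return {"serve_points_won": None, "avg_speed": None, "avg_rally": None}
    won = sum(p["winner"] == player for p in served)
    # Skip blank fields (assumed to mean "not recorded").
    speeds = [float(p["speed_kmh"]) for p in served if p.get("speed_kmh")]
    rallies = [int(p["rally"]) for p in served if p.get("rally")]
    return {
        "serve_points_won": won / len(served),
        "avg_speed": mean(speeds) if speeds else None,
        "avg_rally": mean(rallies) if rallies else None,
    }
```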

Sloan Conference Presentation on Tennis Analytics

Last weekend at the Sloan Sports Analytics Conference in Boston, I gave a talk, “First Service: The Advent of Actionable Tennis Analytics.” The presentation was in three parts:

  1. The sorry state of tennis data
  2. Schedule optimization (based in part on this blog post)
  3. The Match Charting Project (more about that in this post, among others)

The conference video-recorded all presentations, and I understand that video will be posted on the Sloan site. When it becomes available, I’ll post a link here.

In the meantime, many people have asked for my slide deck: First Service.

Also, Jim Pagels wrote a brief piece for Forbes drawing on my talk, which you can read here.

Who Do You Love, Racket Ralliers?


Many of you probably know by now: Last week, Ben Rothenberg and I launched Racket Rally, a stock-market-style fantasy tennis game. We were overwhelmed by the initial response, getting well over 2,000 signups in only a few days before play began at the Australian Open. If you haven’t joined in yet, we’d still love to have you–you can start building the perfect portfolio for Indian Wells and beyond.

With so much user data, it’s interesting to see which players are most popular among Racket Rally members.

For the uninitiated, here’s how it works. Each member starts with a budget of $100,000. She can spend that money on shares of any player in the top 300 (along with a few injury-protected players), at prices equal to their ATP or WTA ranking points. Last week, Richard Gasquet had 1,350 ATP ranking points, so you could buy one share of Gasquet for $1,350, two shares for $2,700, and so on, up to a maximum of 50 shares or $40,000, whichever comes first.
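The purchase limits reduce to taking the smallest of three caps. This minimal sketch treats the 50-share and $40,000 limits as a cap on one player's holding. The `budget_left` input is a hypothetical stand-in for a member's remaining cash.

```python
def max_shares(price: int, budget_left: int,
               share_cap: int = 50, dollar_cap: int = 40_000) -> int:
    """Largest number of shares of one player a member can hold.

    price: the player's ranking points (one share costs that many dollars).
    budget_left: cash remaining in the member's portfolio (hypothetical input).
    """
    if price <= 0:
        return 0
    return min(share_cap, dollar_cap // price, budget_left // price)


shares = max_shares(1_350, 100_000)
print(shares, shares * 1_350)
```

For Gasquet at 1,350 points, the dollar cap binds first: 29 shares cost $39,150, and a 30th would push the total past $40,000. The 50-share cap binds first for anyone priced under $800 (40,000 / 50).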

Each week, sales are limited, so the perfect portfolio isn’t necessarily optimized for the Australian Open. Since users are stuck with many of their players from week to week, their choices reflect both short-term and long-term expectations.

The numbers

Before the Australian Open began, 1,739 members had purchased shares of at least three players–a reasonable cutoff to define active users who built portfolios. They bought over 63,000 shares of 375 different players, spending just short of 169,000,000 fake Racket Rally dollars.

The most popular player, by almost every measurement, was Novak Djokovic. More than half of users (992) bought at least one share of Novak, and the same is true of Roger Federer, who appears in 875 portfolios. Here’s the rest of the top ten:

Kei Nishikori      764  
Maria Sharapova    716  
Serena Williams    708  
Andy Murray        697  
Simona Halep       639  
Milos Raonic       571  
Karolina Pliskova  557  
Nick Kyrgios       517

Interesting mix, huh? Pliskova is the big surprise, and shows the savviness of at least 500 users. Pliskova reached the final in Sydney last week, so her ranking has since gone up, meaning that members who purchased shares last week got her at a discount. Kyrgios is a more Melbourne-optimized choice, as it’s reasonable to expect Nick to perform well at his home slam.

When we switch our focus to shares purchased, many of the same names remain near the top, but the order changes quite a bit. Users bought 2,412 total shares of Kyrgios, most of any player in the game. Pliskova is right behind him, at 1,990. An unexpected name comes in third: 1,921 shares of Viktor Troicki were picked up, presumably by users who think he will return to something much closer to his pre-suspension form.

Here are the other 15 players who garnered enough interest for users to amass at least 1,000 shares each:

Andy Murray         1732  
Novak Djokovic      1723  
Roger Federer       1636  
Bernard Tomic       1563  
Kei Nishikori       1435  
Maria Sharapova     1366  
Borna Coric         1329  
Serena Williams     1292  
Venus Williams      1205  
Thanasi Kokkinakis  1173  
Simona Halep        1158  
Garbine Muguruza    1130  
Vasek Pospisil      1108  
Milos Raonic        1100  
David Goffin        1048

When we turn to total dollars invested–or, to look at it another way, percentage of portfolio allotted–top players take center stage. Djokovic, Federer, Serena, Sharapova, and Murray make up the top five, while Petra Kvitova and Rafael Nadal make their first appearance in a top ten.

The differences among dollars spent are enormous. Members spent nearly $20 million (more than 10% of all in-game currency invested) on Djokovic, $16 million on Federer, and just over $10 million each on Serena and Sharapova. Ten players are over the $5 million mark, 22 over $2 million, and 30 over $1 million.

Plenty of notable players drew another order of magnitude less. Bethanie Mattek-Sands, the best Racket Rally investment as of this writing, is held in only 49 portfolios, for a total of $120,000. Carina Witthoeft, the unheralded German who has reached the third round, appears in only nine portfolios, for a total of $44,000. One lonely user took a chance on Evgeniya Rodina (5 shares for $2,375); members spent more money on at least 20 players who aren’t even in the Melbourne main draw.

It may be that not every share purchase was based entirely on interest or potential. Seventy-six players, most of them out of action this week, are held in only one portfolio. I suspect that the member who spent $146 on one share of Anastasia Grymalska had about $146 left in his or her portfolio when that choice was made.

In the near future, I’ll put together a page on the Racket Rally website to show all of this data on a weekly basis. It will also be fascinating to see which players are the most traded each week.

You Can’t Win Over Our Aussie Sam Stosur … Or Can You?

With the possible exception of the first movement of Schubert’s Bb major piano sonata (D960), the greatest work of art to emerge from the western musical tradition is, of course, “Sam vs OVAs.”

After the six or seven hundredth time through this song, I untied my dancing shoes, put my tennis statistician hat back on, and wondered: Is the conventional wisdom valid? Is it true that players whose names end in “ova” can’t win over Sam Stosur?

Let’s delve into the database and find out.

“Sam vs OVAs” lists 24 potential opponents: 23 Ovas and 1 Galina Voskoboeva. Stosur has faced 21 of the 24 in her career, missing only Nina Bratchikova, Barbora Zahlavova Strycova, and Kristyna Pliskova. (For the record, Sam has faced Kristyna’s sister Karolina, losing in their one meeting.)

Sure enough, Ovas usually don’t win over Sam Stosur. The Aussie owns winning records against 13, has losing records against 7, and is even with 1, Yaroslava Shvedova.

Despite all those positive head-to-heads, the numbers aren’t so rosy upon closer inspection. Only 5 of 21 truly “can’t win” over Sam Stosur–Sam has lost at least once to 16 others. (That group of 16 includes Anastasia Rodionova and Jarmila Gajdosova, so the song is correct in those cases.) And while she has a positive aggregate record against the players in the song–holding at 56-52 as we head into the 2015 season–it is heavily weighed down by poor performances against Maria Sharapova (2-14), Petra Kvitova (1-7), and Lucie Safarova (2-9).

However, in Sam’s defense, the song’s lyricist didn’t cherry-pick in her favor. Stosur has faced 36 Ovas (plus Voskoboeva) in her career, 16 of whom weren’t named in the song. Against those players, she is undefeated against 10, and her overall record is a slightly better 15-12. Take out her abysmal 0-6 mark against Nicole Vaidisova, and you could put together a compelling (if biased) case that, as we have been led to believe, Ovas can’t win over Sam Stosur.
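A quick check of the arithmetic, using only the records quoted above. Removing Sharapova, Kvitova, and Safarova turns 56-52 (about 52%) into 51-22 (about 70%). Dropping Vaidisova turns 15-12 (about 56%) into 15-6 (about 71%).

```python
from typing import Tuple

Record = Tuple[int, int]  # (wins, losses) from Stosur's side


def win_pct(record: Record) -> float:
    wins, losses = record
    return wins / (wins + losses)


song_total = (56, 52)                 # vs. the 21 song opponents she has faced
toughest = [(2, 14), (1, 7), (2, 9)]  # Sharapova, Kvitova, Safarova
others_total = (15, 12)               # vs. the 16 Ovas the song leaves out
vaidisova = (0, 6)

# Subtract the three worst head-to-heads from the song aggregate.
song_trimmed = (song_total[0] - sum(won for won, _ in toughest),
                song_total[1] - sum(lost for _, lost in toughest))
# Subtract Vaidisova from the non-song aggregate.
others_trimmed = (others_total[0] - vaidisova[0],
                  others_total[1] - vaidisova[1])

for label, record in [("song Ovas", song_total),
                      ("song Ovas minus Sharapova, Kvitova, Safarova", song_trimmed),
                      ("other Ovas", others_total),
                      ("other Ovas minus Vaidisova", others_trimmed)]:
    print(f"{label}: {record[0]}-{record[1]} ({win_pct(record):.1%})")
```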

As my mother always taught me, a song can only reach its true potential once you thoroughly fact-check it. With that in mind, let’s listen again!

The Match Charting Project: One Year On

Just over a year ago, I launched the Match Charting Project, a collaborative effort to track every shot of as many professional matches as possible. Many of you have contributed, and a few of you have given more time to the project than I could have ever hoped. Thank you.

To make the MCP possible, I devised a relatively simple notation system, tracking every type of shot and its direction, along with an Excel document to make recording each point easier. Earlier this year, I beefed up the stats generated for each match, showing not only hundreds of rates and totals for each player, but also player and tour averages for comparison.

The project has recently passed a number of milestones, and even more are coming soon. The database now includes at least one match for every player in the ATP and WTA top 100. There’s depth as well as breadth: 18 players (10 men and 8 women) are represented with at least 10 matches each.

The WTA portion of the database just passed 200 total matches, and by the end of the year, the combined total will cross the 500-match mark. Earlier this year, I hesitated to pursue too much research using this dataset because it was too small and biased toward a few players of interest, but those reservations can increasingly be put to bed.

Frequently on this site, I have reason to vent my frustration with the state of data collection in tennis, and an excellent recent article illustrates how, in many ways, the state of the art is no more advanced than it was thirty years ago. If the professional tours won’t even release all the data they have, let alone lead the way in improving the state of analytics in the game, it’s up to us–the fans–to do better.

The Match Charting Project is one way to do that. Every additional match added to the database increases our knowledge of a specific matchup, of a pair of players, of surface tendencies, and of the sport as a whole. We’ll probably never be able to chart every tour-level match, but as the first (almost) 500 matches have shown, the database doesn’t have to be complete to be extremely valuable.

If you’ve already contributed, thank you. If you’re interested in contributing, start here.