Rafael Nadal’s Wide-Open Monte Carlo Draw

Italian translation at settesei.it

This afternoon, Rafael Nadal will take on Albert Ramos for a chance at his tenth Monte Carlo Masters title. Since 2005, Nadal has faced the best clay-court players in the sport and, with very few exceptions, beaten them all.

Yet this year, Nadal’s path to the trophy has been remarkably easy. The three top seeds–Andy Murray, Novak Djokovic, and Stan Wawrinka–all lost early, leaving Nadal to face David Goffin in the semifinals and Ramos (who ousted Murray) in the final. Goffin, at No. 13, was Rafa’s highest-ranked opponent, followed by Alexander Zverev, at No. 20, whom Nadal crushed in the third round.

When we run the numbers, we’ll see that this competition isn’t just weak: It’s the weakest faced by any Masters titlist in recent history. I’ll get into the mechanics and show you some numbers in a minute.

First, a disclaimer. By saying a draw is weak, I’m not arguing that the title “means less” or is somehow less deserved. It’s not in any way a reflection on the player. For all we know, Rafa would’ve cruised through the draw had he faced the toughest possible opponent in every round. The only thing a weak draw tells us about the champion is how to forecast his future. Had Nadal beaten multiple top-ten players this week, we might be more confident predicting future success for him than we are now, after he has beaten up on a bunch of players we already suspected he’d have no problem with.

Back to the numbers. To measure the difficulty of a player’s draw, I used jrank–my own surface-adjusted rating system, roughly similar to Elo–at the time of each Masters event back to 2002. For each tournament, I found the jrank of each player the titlist defeated, and calculated the likelihood that a typical Masters winner would beat that group of players.

That’s a mouthful, so let’s walk through an example. In the last 15 years, the median Masters winner was ranked No. 3, with a jrank (for the surface of the tournament) of about 4700, good for fourth at the moment. A 4700-rated player would have an 85.7% chance of beating Ramos, a 75.7% chance of defeating Goffin, and 87.3%, 68.4%, and 88.7% chances of knocking out Diego Schwartzman, Zverev, and Kyle Edmund, respectively. Multiply those together, and our average Masters winner would have a 34.3% chance of claiming the trophy, given that competition.
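The multiplication step can be sketched in a few lines of Python. The per-match probabilities are the ones quoted above; the function name `path_ease` is only an illustrative label, and the conversion from jrank to a win probability is left out because it isn't spelled out here.

```python
from math import prod

# Win probabilities for a 4700-rated player against each opponent,
# as quoted in the text (final first, then the earlier rounds).
win_probs = {
    "Ramos": 0.857,
    "Goffin": 0.757,
    "Schwartzman": 0.873,
    "Zverev": 0.684,
    "Edmund": 0.887,
}

def path_ease(probs):
    """Chance of winning every match, treating the matches as independent."""
    return prod(probs)

ease = path_ease(win_probs.values())
print(f"Path Ease: {ease:.1%}")
```

With these rounded inputs the product comes out a hair above the 34.3% quoted, presumably because the per-match percentages shown are themselves rounded.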

I’m using a hypothetical average Masters winner so that we measure the level of competition against a constant level. It doesn’t matter whether 2017 Nadal, peak Nadal, or someone else entirely played that series of opponents. If Djokovic had faced the same five players, we’d want the numbers to come out the same.

Here are the ten easiest paths to a Masters title since 2002, measured by this algorithm:

Year  Event                Winner          Path Ease  
2017  Monte Carlo Masters  Rafael Nadal*       34.3%  
2016  Shanghai Masters     Andy Murray         33.0%  
2011  Shanghai Masters     Andy Murray         30.8%  
2013  Madrid Masters       Rafael Nadal        30.8%  
2012  Paris Masters        David Ferrer        30.4%  
2010  Monte Carlo Masters  Rafael Nadal        27.3%  
2012  Canada Masters       Novak Djokovic      25.8%  
2014  Madrid Masters       Rafael Nadal        25.3%  
2016  Paris Masters        Andy Murray         24.7%  
2010  Rome Masters         Rafael Nadal        24.6%

* pending; extremely likely

The average ‘Path Ease’ is 15.6%, and as we’ll see in a moment, some players have had it much, much harder. In Shanghai last year, Murray certainly did not: His draw turned out much like Rafa’s this week, complete with Goffin along the way and a three-named Spaniard in the final–in his case, Roberto Bautista Agut.

Here are the ten most difficult paths:

Year  Event                 Winner              Path Ease  
2007  Madrid Masters        David Nalbandian         4.1%  
2007  Paris Masters         David Nalbandian         6.2%  
2014  Canada Masters        Jo-Wilfried Tsonga       6.6%  
2011  Rome Masters          Novak Djokovic           6.6%  
2009  Madrid Masters        Roger Federer            7.0%  
2010  Canada Masters        Andy Murray              7.7%  
2004  Cincinnati Masters    Andre Agassi             7.9%  
2007  Canada Masters        Novak Djokovic           8.0%  
2009  Indian Wells Masters  Rafael Nadal             8.0%  
2002  Canada Masters        Guillermo Canas          8.4%

Those of us who remember the end of David Nalbandian’s 2007 season won’t be surprised to see him atop this list. In Madrid, he beat Nadal, Djokovic, and Roger Federer in the final three rounds, and in Paris, he knocked out Federer and Nadal again, along with three other top-16 players. Making his paths even more difficult, he didn’t earn a first-round bye in either event.

Given that Monte Carlo is the one non-mandatory Masters event, I expected that, over the years, it would prove to have the weakest competition. That was wrong. Entering this week, Monte Carlo is only fourth-easiest of the nine current 1000-series events. Indian Wells–which requires at least six victories for a title, unlike most of the others, which require only five–has been the toughest, while Miami, which also requires six wins, is closer to the middle of the pack:

Event         Avg Path Ease  
Indian Wells          12.8%  
Canada                14.3%  
Rome                  14.6%  
Miami                 15.3%  
Cincinnati            15.7%  
Monte Carlo*          16.5%  
Madrid**              16.7%  
Paris                 16.8%  
Shanghai              21.5%

* through 2016; ** hard- and clay-court eras included

Finally, seeing the presence of Nadal, Djokovic, and Murray on the list of easiest title paths raises another question. How have the big four’s levels of competition differed at the Masters events?

Player          Titles  Avg Path Ease  
Roger Federer       26          14.6%  
Novak Djokovic      30          16.1%  
Rafael Nadal        28          16.7%  
Andy Murray         14          18.1%

not including 2017 Monte Carlo

Federer has had the most difficult paths, followed by Djokovic, Nadal, and then Murray. Assuming Rafa wins today, his number will tick up to 17.3%.
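That update is just a running average. A quick sketch, using Nadal's figures from the table and the pending 34.3% for this week (the function name is illustrative):

```python
def updated_average(old_avg, n_titles, new_value):
    """Average after adding one more title's Path Ease to n_titles earlier ones."""
    return (old_avg * n_titles + new_value) / (n_titles + 1)

# Nadal: 28 titles averaging 16.7%, plus a 34.3% path in Monte Carlo.
new_avg = updated_average(16.7, 28, 34.3)
print(f"{new_avg:.1f}%")  # prints 17.3%
```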

To reach ten titles at a single event, as Nadal is on the brink of doing in Monte Carlo, requires one to thrive regardless of draw luck. Rafa’s path to the trophy last year was tougher than any of his previous Monte Carlo campaigns, rating a Path Ease of 9.1%, almost difficult enough to show up on the top ten list displayed above. His 2008 title was no cakewalk either–a typical Masters winner would have only a 10.0% chance of coming through that draw successfully.

This year, Rafa’s luck has decidedly changed. To no one’s surprise, the best clay court player in history is taking full advantage.

The Proud Tradition of Americans Skipping Monte Carlo

Italian translation at settesei.it

The Monte Carlo Masters is unique among the ATP’s 1000-series events. The stakes are high, but attendance isn’t mandatory, so while most of the game’s top players show up, a few take the week off. No group has skipped Monte Carlo as consistently as players from the U.S.A.

This year, six U.S. players had rankings that would’ve gotten them into the Monte Carlo main draw, where winning a single match earns you 45 ranking points and just over €28,000 in prize money. Five of those players–including John Isner, who reached the third round two years ago and won a pair of tough Davis Cup matches at the same venue–opted out. All five played the 250-level Houston tournament last week instead. Only Ryan Harrison made the trip to Europe–losing in the opening round, as Carl Bialik and I safely predicted on this week’s podcast.

Choosing the low-stakes event on home soil isn’t the wise choice, but it’s nothing new. Since 2006, Americans have made only seven appearances in a Monte Carlo main draw: Isner twice, plus Harrison, Sam Querrey, Donald Young, Steve Johnson, and Denis Kudla, who qualified in 2015. From 2006 to 2016, 7 of the 11 Monte Carlo draws were entirely USA-free. In the same time span, Houston draws have featured 35 Americans ranked in the top 60–all players who probably would have earned direct entry in the higher-stakes clay event, as well.

For a player like Isner or Jack Sock, an April schedule can handle both tournaments. Four of the seven Americans who went to Monte Carlo played Houston as well, including Querrey in 2008, when he lost in the first round in Houston but reached the final eight in Monte Carlo.

Most U.S. players, including just about everyone I’ve mentioned so far, would much rather play on hard courts than on clay. (The Houston surface is more conducive to aggressive, first-strike tennis than is the Monte Carlo dirt, one of the slowest surfaces on the calendar.) However, as Isner and Querrey have shown, a one-dimensional power game can succeed on a slow court, even if it looks nothing like the strategy of a traditional clay specialist.

Isner, in particular, has racked up plenty of points on the surface. While he’d much rather play on home soil, he has twice reached the fourth round at the French Open and pushed none other than Rafael Nadal to a deciding set in both Paris and Monte Carlo. Sock is also a threat on the surface, having won nearly two-thirds of his tour-level matches on clay. Many of those wins came in Houston, but like Isner, he took a set from Nadal in Europe on the surface the Spaniard typically dominates.

Even if the top Americans had little chance of going deep in Monte Carlo, one wonders what the additional time on the surface would do for the rest of their clay season. Most will show up for Madrid and Rome, and all of them will play Roland Garros. It’s a bit of a chicken-and-egg question–do Americans avoid the dirt because they suck on clay, or do they suck because they avoid it?–but it couldn’t hurt to play on the more traditional European surface against elite-level opponents.

The difference in rewards between a 250 like Houston and a Masters 1000 like Monte Carlo makes it likely that the risk of playing in unfamiliar territory would pay off, as it did for Querrey in his one trip and for Isner two years ago. And I suspect that the rewards would stretch beyond the immediate shot at a bigger payday: If someone like Sock invested more time in developing his clay-court game now, he could become a legitimate threat at a faster clay tournament (such as the Madrid Masters) in a few years. It’s probably too late for the likes of Querrey, but the next generation of U.S. men’s stars would do well to break with tradition and give themselves more chances to excel on the dirt.

Podcast Episode 3: Champions Young and Old

In Episode 3 of the Tennis Abstract Podcast, Carl Bialik and I discuss Francesca Schiavone and Marketa Vondrousova, the WTA titlists from last week at opposite ends of their careers. We speculate on the future of Borna Coric–another one of last week’s titlists–as well.

Also on this week’s podcast: David Goffin’s steady improvement, future Davis Cup dynasties, problems with fixing the international team competition, the Italian Open’s decision not to give Schiavone a wild card, and Ryan Harrison’s surprising trip to Monte Carlo while the rest of the U.S. contingent stays at home.

Thanks for listening, and for tolerating our production “values” as we figure out the best way to do this. It will get better!

Listen here, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Podcast Episode 2: Doubles, Wild Cards, and Megastars

In the second episode of the Tennis Abstract Podcast, Carl Bialik and I give some much-deserved top billing to doubles, especially new ATP No. 1 Henri Kontinen and Elo doubles favorite Jack Sock.

We also cover the role of megastars in tennis, and the benefits and challenges they offer to the sport’s promoters. As we discuss, big names may be key to expanding the appeal of doubles, and they are the one major argument for the continuing existence of wild cards–on whichever side of the Maria Sharapova debate you find yourself.

Listen here, subscribe on iTunes, or use our feed to get updates on your favorite podcast software.

Little Data, Big Potential

This is a guest post by Carl Bialik.

I had more data on my last 30 minutes of playing tennis than I’d gotten in my first 10 years of playing tennis  — and it just made me want so much more.

Ben Rothenberg and I had just played four supertiebreakers, after 10 minutes of warmup and before a forehand drill. And for most of that time — all but a brief break while PlaySight staff showed the WTA’s Micky Lawler the system — 10 PlaySight cameras were recording our every move and every shot: speed, spin, trajectory and whether it landed in or out. Immediately after every point, we could walk over to the kiosk right next to the net to watch video replays and get our stats. The tennis sure didn’t look professional-grade, but the stats did: spin rate, net clearance, winners, unforced errors, net points won.

Later that night, we could go online and watch and laugh with friends and family. If you’re as good as Ben and I are, laugh you will: As bad as we knew the tennis was by glancing over to Dominic Thiem and Jordan Thompson on the next practice court, it was so much worse when viewed on video, from the kind of camera angle that usually yields footage of uberfit tennis-playing pros, not uberslow tennis-writing bros.

This wasn’t the first time I’d seen video evidence of my take on tennis, an affront to aesthetes everywhere. Though my first decade and a half of awkward swings and shoddy footwork went thankfully unrecorded, in the last five years I’d started to quantify my tennis self. First there was the time my friend Alex, a techie, mounted a camera on a smartphone during our match in a London park. Then in Paris a few years later, I roped him into joining me for a test of Mojjo, a PlaySight competitor that used just one camera — enough to record video later published online, with our consent and to our shame. Last year, Tennis Abstract proprietor Jeff Sackmann and I demo-ed a PlaySight court with Gordon Uehling, founder of the company.

With PlaySight and Mojjo still only in a handful of courts available to civilians, that probably puts me — and Alex, Jeff and Ben — in the top 5 or 10 percent of players at our level for access to advanced data on our games. (Jeff may object to being included in this playing level, but our USPS Tennis Abstract Head2Head suggests he belongs.) So as a member of the upper echelon of stats-aware casual players, what’s left once I’m done geeking out on the video replays and RPM stats? What actionable information is there about how I should change my game?

Little data, modest lessons

After reviewing my footage and data, I’m still looking for answers. Just a little bit of tennis data isn’t much more useful than none.

Take the serve, the most common shot in tennis. In any one set, I might hit a few dozen. But what can I learn from them? Half are to the deuce court, and half are to the ad court. And almost half of the ones that land in are second serves. Even with my limited repertoire, some are flat while others have slice. Some are out wide, some down the T and some to the body — usually, for me, a euphemism for “I missed my target.”

Playsight groundstroke report

If I hit only five slice first serves out wide to the deuce court, three went in, one was unreturned and I won one of the two ensuing rallies, what the hell does that mean? It doesn’t tell me a whole lot about what would’ve happened if I’d gotten a chance to try that serve once more that day against Ben — let alone what would happen the next time we played, when he had his own racquet, when we weren’t hitting alongside pros and in front of confused fans, with different balls on a different surface without the desert sun above us, at a different time of day when we’re in different frames of mind. And the data says even less about how that serve would have done against a different opponent.
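To put a rough number on how little five serves can say, here is a sketch using the Wilson score interval, a standard way to bound a proportion estimated from a small sample. Treating each serve as an independent trial is a simplifying assumption, and the function name is just for illustration.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Three of five slice first serves out wide landed in.
low, high = wilson_interval(3, 5)
print(f"Plausible in-rate: {low:.0%} to {high:.0%}")
```

The range runs from roughly one in four to nearly nine in ten, which is another way of saying the sample tells me very little.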

That’s the serve, a shot I’ll hit at least once on about half of points in any match. The story’s even tougher for rarer shots, like a backhand drop half volley or a forehand crosscourt defensive lob, shots so rare they might come up once or twice every 10 matches.

More eyes on the court

It’s cool to know that my spinniest forehand had 1,010 RPM (I hit pretty flat compared to Jack Sock’s 3,337 RPM), but the real value I see is in the kind of data collected on that London court: the video. PlaySight doesn’t yet know enough about me to know that my footwork was sloppier than usual on that forehand, but I do, and it’s a good reminder to get moving quickly and take small steps. And if I were focusing on the ball and my own feet, I might have missed that Ben leans to his backhand side instead of truly split-stepping, but if I catch him on video I can use that tendency to attack his forehand side next time.

Playsight video with shot stats

Video is especially useful for players who are most focused on technique. As you might have gathered, I’m not, but I can still get a tactical edge from studying patterns that PlaySight doesn’t yet identify.

Where PlaySight and its ilk could really drive breakthroughs is by combining all of the data at their disposal. The company’s software knows about only one of the thousands of hours I’ve spent playing tennis in the last five years. But it has tens of thousands of hours of tennis in its database. Even a player as idiosyncratic as me should have a doppelganger or two in a data set that big. And some of them must’ve faced an opponent like Ben. Then there are partial doppelgangers: women who serve like me even though all of our other shots are different; or juniors whose backhands resemble mine (and hopefully are being coached into a new one). Start grouping those videos together — I’m thinking of machine learning, clustering and classifying — and you can start building a sample of some meaningful size. PlaySight is already thinking this way, looking to add features that can tell a player, say, “Your backhand percentage in matches is 11 percent below other PlaySight users of a similar age/ability,” according to Jeff Angus, marketing manager for the company, who ran the demo for Ben and me.

The hardware side of PlaySight is tricky. PlaySight needs to install cameras and kiosks, weatherproof them when the court is outdoors, and protect them from human error and carelessness. It’s in a handful of clubs, and the number probably won’t expand much: The company is focusing more on the college game. Even when Alex and I, two players at the very center of PlaySight’s target audience among casual players, happened to book a PlaySight court recently in San Francisco, we decided it wasn’t worth the few minutes it would have taken at the kiosk to register — or, in my case, remember my password. The cameras stood watch, but the footage was forever lost.

Bigger data, big questions

I’m more excited by PlaySight’s software side. I probably will never play enough points on PlaySight courts for the company to tell me how to play better or smarter — unless I pay to install the system at my home courts. But if it gets cheaper and easier to collect decent video of my own matches — really a matter of a decent mount and protector for a smartphone and enough storage space — why couldn’t I upload my video to the company? And why couldn’t it find video of enough Bizarro Carls and Bizarro Carl opponents around the world to make a decent guess about where I should be hitting forehands?

There are bigger, deeper tennis mysteries waiting to be solved. As memorably argued by John McPhee in Levels of the Game, tennis isn’t so much one sport as dozens of different ones, each a different level of play united only by common rules and equipment. And a match between two players even from adjacent levels in his hierarchy typically is a rout. Yet tactically my matches aren’t so different from the ones I see on TV, or even from the practice set played by Thiem and Thompson a few feet from us. Hit to the backhand, disguise your shots, attack short balls and approach the net, hit drop shots if your opponent is playing too far back. And always, make your first serve and get your returns in.

So can a tactic from one level of the game even translate to one much lower? I’m no Radwanska and Ben is no Cibulkova, but could our class of play share enough similarity — mathematically, is Carl : Ben :: Aga : Pome — that what works for the pros works for me? If so, then medium-sized data on my style is just a subset of big data from analogous styles at every level of the game, and I might even find out if that backhand drop half volley is a good idea. (Probably not.)

PlaySight was the prompt, but it’s not the company’s job to fulfill product features only I care about. It doesn’t have to be PlaySight. Maybe it’s Mojjo, maybe Cizr. Or maybe it’s some college student who likes tennis and is looking for a machine-learning class. Hawk-Eye, the higher-tech, higher-priced, older competitor to PlaySight, has been slow to share its data with researchers and journalists. If PlaySight has figured out that most coaches value the video and don’t care much for stats, why not release the raw footage and stats to researchers, anonymized, who might get cracking on the tennis classification question or any of a dozen other tennis analysis questions I’ve never thought to ask? (Here’s a list of some Jeff and I have brainstormed, and here are his six big ones.) I hear all the time from people who like tennis and data and want to marry the two, not for money but to practice, to learn, to discover, and to share their findings. And other than what Jeff’s made available on GitHub, there’s not much data to share. (Just the other week, an MIT grad asked for tennis data to start analyzing.)

Sharing data with outside researchers “isn’t currently in the road map for our product team, but that could change,” Angus said, if sharing data can help the company make its data “actionable” for users to improve their games.

Maybe there aren’t enough rec players who’d want the data with enough cash to make such ventures worthwhile. But college teams could use every edge. Rising juniors have the most plastic games and the biggest upside. And where a few inches can change a pro career, surely some of the top women and men could also benefit from PlaySight-driven insights.

Yet even the multimillionaire ruling class of the sport is subject to the same limitations driven by the fractured nature of the sport: Each event has its own data and own systems. Even at Indian Wells, where Hawk-Eye exists on every match court, just two practice courts have PlaySight; the company was hoping to install four more for this year’s tournament and is still aiming to install them soon. Realistically, unless pros pay to install PlaySight on their own practice courts and play lots of practice matches there, few will get enough data to be actionable. But if PlaySight, Hawk-Eye or a rival can make sense of all the collective video out there, maybe the most tactical players can turn smarts and stats into competitive advantages on par with big serves and wicked topspin forehands.

PlaySight has already done lots of cool stuff with its tennis data, but the real analytics breakthroughs in the sport are ahead of us.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.

The Tennis Abstract Podcast

Today, Carl Bialik and I are excited to announce our new podcast, called–you guessed it–the Tennis Abstract Podcast.

In the inaugural episode, recorded Monday, we discuss the results from Indian Wells and Miami, with a special focus on the renewed relevance of Roger Federer and Rafael Nadal. We also take a wider look at the upcoming clay season for both the ATP and WTA tours.

We’re still working out all the kinks, so please forgive us for the bare-bones presentation and the occasional audio glitch. If all goes well, we’ll have a new episode most weeks, maybe even with fewer glitches to apologize for.

Anyway, we both really enjoyed recording this first episode, and we hope you like it as well!

Click here to listen to Episode 1.

We’ll soon be listed on iTunes and elsewhere. In the meantime, our xml feed — http://www.tennisabstract.com/podcast/feed.xml — may come in handy for those of you who would like to subscribe.

Cool Down Tennis

This is a guest post by Carl Bialik.

Imagine you’re named boss of tennis. Right after being sworn in by Rod Laver and Martina Navratilova, you’re handed an empty wall calendar. You make the schedule for 2018. What’s your first move?

Mine would be to move Indian Wells and Miami earlier in the calendar, and the Australian Open later, after the two U.S. Masters tournaments.

I never wanted this more than while sweating my way around the Indian Wells grounds in search of shade last month. I wasn’t alone. The only full sections of the main stadium during day sessions were the ones protected from the sun. Around the fan-friendly venue, there are plenty of seats in the shade — under tents, or in Adirondack chairs that shade-seeking people push ever closer to the screen as the sun shifts. The players can only wait for shade to slowly descend on the court. Jack Sock needed a towel holding 50 ice cubes to cool down.

Sweating in the grass

 

Sure, it was unusually hot at this year’s Indian Wells tournament. But the climatological averages are clear: It’s hot in the California desert and in the Florida sunshine in March, and in the antipodean summer in January. It’d be cooler in Indian Wells, Miami and Melbourne if the two Masters events moved two months earlier and led up to the year’s first Grand Slam in March. Each of the two-week events would be, on average, 4 to 10 degrees Fahrenheit cooler each year. (The precipitation would be about the same, so Miami men’s finalist Rafael Nadal might continue to bemoan humidity, request sawdust and show more than he’d planned beneath his shorts; while women’s champ Johanna Konta might keep having to change clothes midmatch because they’ve accumulated approximately five kilograms of sweat.)

I’m using the averages because I don’t want to make too much of an unseasonably hot Indian Wells, or too little of an unusually cold March in Miami. But the averages might understate the problem because it’s precisely the outliers we’re worried about. A nudge downward of a few degrees, on average, could translate into a big drop in the probability of an unbearably hot fortnight — say, from 25 percent to 5 percent.
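Here is a rough sketch of how that could work, assuming (purely for illustration) that the average temperature of a fortnight varies from year to year like a normal distribution. The 25 and 5 percent figures are the hypothetical ones above; the distributional assumption and the function name are not drawn from any climate data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def shift_needed(p_before, p_after):
    """Drop in the mean, in standard deviations, that cuts the chance of
    exceeding a fixed 'too hot' threshold from p_before to p_after."""
    z_before = STD_NORMAL.inv_cdf(1 - p_before)
    z_after = STD_NORMAL.inv_cdf(1 - p_after)
    return z_after - z_before

shift = shift_needed(0.25, 0.05)
print(f"Required cooling: {shift:.2f} standard deviations")
```

Under that assumption the required shift is a little under one standard deviation. If year-to-year swings in the fortnight average had a standard deviation of, say, 5 degrees Fahrenheit (an assumed figure, not a measured one), a cooling of about 5 degrees would be enough.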

Changing the tennis calendar would also mean less daylight. That wouldn’t be so good for the nickname Sunshine Double, but it’d be good for tennis. Until more tennis stadiums adopt overhanging partial roofs — but for sun, not for rain — shorter days mean less sun for fans to contend with and more reason to fill the seats. Plus, night tennis is exciting. The venues already have plenty of lights and evening sessions.

Scrambling the schedule would do more than cool down tennis. The three midyear majors’ proximity to each other helps the sport carry some momentum and mainstream buzz from one to the next. The Australian Open squanders all that in the four-month gap between its end and the start of the French Open. There’s even a month between the Aussie Open and the next big event.

The other three majors also get opening acts, to help players build up familiarity with the surface and for fans to build anticipation. The Australian Open gets two weeks at the start of the season — without so much as a 500 event on the men’s side.

The lack of buffer between the offseason and Melbourne also means it loses some players still recovering from the end of the previous season. That was the case this year with Juan Martin del Potro, who skipped this year’s first major after winning the Davis Cup with Argentina in November.

Imagine instead starting the season with Indian Wells and Miami — or Miami, then Indian Wells, while we’re scrambling things, for the convenience of travel from the sport’s power center of Europe — using the same courts and balls as Melbourne. Follow that month — or less, if one or both of the U.S. early-year Masters succumbs to the reality that they could be just a week — by Doha and Dubai, then Brisbane, Sydney and the like, before the main event in Melbourne at the start of March. We’d start the season with a real hard-court swing, ending with the first major.

From Australia, the tour could stay in the southern hemisphere. The swing through South America has a long history and a terrible spot on the current calendar. It was traditionally played on clay but some of its biggest events are moving to hard courts — first (North American) Acapulco, now, maybe, Rio, in search of Masters status — to the chagrin of Nadal and others. Too many players simply don’t think it’s worth it to compete on clay for a few weeks if that’s followed by a month of hard-court events. But move Indian Wells and Miami, and South American clay could move a month later in the calendar — while slightly tempering what Nadal bemoans as “too extreme” weather conditions by an average of 1 degree. The swing would give way seamlessly to Houston, Charleston and the European clay spell — which, by the way, would absorb Bucharest, Hamburg, Umag, Bastad and Gstaad from their awkward post-Wimbledon calendar slots. And no one would suggest Miami move to green clay.

We’d be left with a coherent calendar with five seasons of roughly equal length and importance, four with a major and one with the year-end finals: (1) Outdoor hard courts in the U.S., the Middle East and Oceania, followed by (2) clay in the Americas and Europe, (3) English and German grass (with Newport for those who want to visit the sport’s hall of fame), (4) North American and Asian outdoor hard courts, and (5) European indoor hard courts (absorbing the current winter events such as St. Petersburg and Rotterdam) culminating in wherever the tours’ multiplying year-end finals are calling home that year. And let’s play Davis Cup and Fed Cup at the same time — the tours acting in sync; what a concept! — on weekends at the edge of the five new seasons, giving hosts a wider range of sensible surfaces to choose from, and creating the option for combined venues if men and women from the same country are hosting the same round. (Prague in 2012 would’ve been tennis nirvana.) Or, hell, consider merging the events.

Could all this happen? Sure — if tennis power were centralized in a person or people who prioritize the overall good of the global game. Without a radical transformation of tennis, though, it’ll be slow going: It took years for the idea of lengthening the grass-court season by a week to become reality.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.

The Five Big Questions in Tennis Analytics

Italian translation at settesei.it

The fledgling field of tennis analytics can seem rather chaotic, with scores of mini-studies that don’t fit together in any obvious way. Some seem important but unfinished while others are entertaining but trivial.

Let me try to impose some structure on this project by classifying research topics into what I’ll call the Five Big Questions, each of which is really just an umbrella for hundreds more (like these). As we’ll see, there are really six categories, not five, which just goes to show: analytics is about more than just counting.

1. What’s the long-term forecast?

Beyond the realm of the next few tournaments, what does the evidence tell us about the future? This question encompasses everything from seasons to entire careers. What are the odds that Roger Federer reclaims the No. 1 ranking? How many Grand Slams will Nick Kyrgios win? How soon will Catherine Bellis crack the top ten?

The most important questions in this category are the hardest ones to answer: Given the limited data we have on junior players, what can we predict–and with what level of confidence–about their future? These are questions that national federations would love to answer, but they are far from the only stakeholders. Everyone from sponsors to tournaments to the players’ families themselves has an interest in picking future stars. Further, the better we can answer these questions, the more prepared we can be for the natural follow-ups. What can we (as families, coaches, federations, etc.) do to improve the odds that a player succeeds?

2. Who will win the next match?

The second question is also concerned with forecasting, and it is the subject that has received–by far–the most analytical attention. Not only is it fun and engaging to try to pick winners, there’s an enormous global industry with billions of dollars at stake trying to make more accurate forecasts.

As an analyst, I’m not terribly interested in picking winners for the sake of picking winners. More valuable is the quest to identify all of the factors that influence match outcomes, like the role of fatigue, or a player’s preference for certain conditions, or the specifics of a given matchup. Player rating systems fall into this category, and it’s important to remember they are only a tool for forecasting, not an end in themselves.

As a meta-question in this category, one might ask how accurate a set of forecasts could possibly become. Or, posed differently, how big a role does chance play in match outcomes?

3. When and why does the i.i.d. model break down?

A lot of sports analysis depends on the assumption that events are “independent and identically distributed”–i.e., factors like streakiness, momentum, and clutch are either nonexistent or impossible to measure. In tennis terms, the i.i.d. model might assume that a player converts break points at the same rate that she wins all ad-court points, or that a player holds serve while serving for the set just as often as he holds serve in general.

The conventional wisdom strongly disagrees, but it is rarely consistent. (“It’s hard to serve for the set” but “this player is particularly good when leading.”) This boils down to yet another set of forecasting questions. We might know that a player wins 65% of service points, but what are her chances of winning this point, given the context?

I suspect that thorough analysis will reveal plenty of small discrepancies between reality and the i.i.d. model, especially at the level of individual players. More than with the first two topics, the limited sample sizes for many specific contexts mean we must always be careful to distinguish actual effects from noise and look for long-term trends.
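To make the i.i.d. baseline concrete, here is a minimal Python sketch of what the model implies for a single service game. The function name `hold_probability` is made up for this illustration, and the only input is the server's overall point-win rate. The formula is the standard closed form for a game that must be won by two points, and it is valid only if every point is an independent trial with the same probability.

```python
def hold_probability(p: float) -> float:
    """Chance the server wins a game if every point is an independent
    trial won with probability p (the i.i.d. assumption)."""
    q = 1.0 - p
    # Win before deuce: four points won, with 0, 1, or 2 points lost first.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each), then win two straight points
    # before the returner does.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


# The 65% server from the example above.
print(round(hold_probability(0.65), 3))
```

Under these assumptions, a 65% server holds roughly 83% of the time, whether or not she is serving for the set. A persistent gap between that figure and her observed hold rate in a particular context is a candidate departure from the i.i.d. model, provided the sample is large enough to rule out noise.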

4. How good is that shot?

As more tennis data becomes available in a variety of formats, the focus of tennis analytics will become more granular. The Match Charting Project offers more than 3,000 matches’ worth of shot-by-shot logs. Even without the details of each shot–like court position, speed, and spin–we can start measuring the effectiveness of specific players’ shots, such as Federer’s backhand.

With more granular data on every shot, analysts will be able to be even more precise. Eventually we may know the effect of adding five miles per hour to your average forehand speed, or the value of hitting a shot from just inside the baseline instead of just behind. Some academics–notably Stephanie Kovalchik–have begun digging into this sort of data, and the future of this subfield will depend a great deal on whether these datasets ever become available to the public.

5. How effective is that tactic?

Analyzing a single shot has its limits. Aside from the serve, every shot in tennis has a context–and even serves usually form part of the backdrop for other shots. Many of the most basic tactical questions have yet to be quantified, such as the success rate of approaching to the backhand instead of the forehand.

As with the previous topic, the questions about tactics get a lot more interesting–and immensely more complicated–as soon as Hawk-Eye-type data is available. With enough location, speed, and spin data, we’ll be able to measure the positions from which approach shots are most successful, and the type (and direction) that is most effective from each position. We could quantify the costs and benefits of running around a forehand: How good does the forehand have to be to counteract the weaker court position that results?

We can scratch the surface of this subject with the Match Charting Project, but ultimately, this territory belongs to those with camera tracking data.

6. What is the ideal structure of the sport?

Like I said, there are really just five questions. Forecasting careers, matches, and points, and quantifying shots and tactics encompass, for me, the entire range of “tennis analytics.”

However, there are plenty of tennis-related questions that we might assign to the larger field of “business of sports.” How should prize money be distributed? What is the best way to structure the tour to balance the interests of veterans and newcomers? Are there too many top-level tournaments, or too few? What the hell should we do with Davis Cup, anyway?

Many of these issues are–for now–philosophical questions that boil down to preferences and gut instincts. Controlled experiments will always be difficult if only because of the time frames involved: If we change the Davis Cup format and it loses popularity, is it causation or just correlation? We can’t replicate the experiment. But despite the challenges, these are major questions, and analysts may be able to offer valuable insights.

Now … let’s get to work.

3,000 Matches!

Last week, the Match Charting Project hit an exciting milestone: 3,000 matches!

The MCP has been logging shot-by-shot records of professional matches for about two and a half years now, and in doing so, we’ve built an open dataset unlike anything else in the tennis world. We have detailed records of at least one match from almost every player in the ATP and WTA top 200s, and extensive data on the top players of each tour. Altogether, we’ve tracked 450,000 points and over 1.7 million shots.

The research that could be conducted using this data is almost inexhaustible, and we’ve barely scratched the surface. My work on Federer’s new-and-improved backhand was just one example of what the Match Charting Project has made possible.

One of the most valuable aspects of the project last year was the addition–spearheaded by Edo–of nearly all men’s and women’s Grand Slam finals back to 1980. (We’re still missing a handful of them–if you can help us find video, we’d be very grateful!) This year, we’ve taken on another challenge: All of the head-to-heads of the ATP Big Four. Already, we’ve covered the 37 meetings of Federer and Nadal (through yesterday’s Miami final), and we’re near the 75% mark for the 216 total matches contested among these four all-time-greats.

Meanwhile, we’re continuing to add a broad range of matches almost as soon as they happen, including over 20 each from Indian Wells and Miami, along with the occasional ITF and Challenger contest. While the data is skewed toward a handful of popular players, we’ve been careful to amass several matches for nearly every player of consequence on both tours.

If you’re interested in tennis analytics, I hope you’ll consider contributing to the project by charting matches. This data doesn’t magically collect itself, and like most volunteer-driven endeavors, a small number of contributors are responsible for a substantial percentage of the work. Even a single match is a useful addition, and the biggest risk you face is that you’ll get hooked.

Click here to find out how to get started.

Here’s to the next 3,000 matches!

Del Potro’s Draws and the Possible Persistence of Bad Luck

Tennis’s draw gods have not been kind to Juan Martin del Potro this year.

In Acapulco and Indian Wells, he drew Novak Djokovic as his second-match opponent. In Miami, Delpo got a third-rounder with Roger Federer. In each of the March Masters events, with 1,000 ranking points at stake, del Potro was handed the most difficult opponents for his first round against a fellow seed. Thanks in part to the resulting early exits, one of the most dangerous players on tour is still languishing outside of the top 30 in the ATP rankings.

When I wrote about the Indian Wells quarter of death–the section of the draw containing del Potro, Djokovic, Federer, Rafael Nadal, and Nick Kyrgios–I attempted to quantify the effect of the draw on each player’s expected ranking points. Before each player’s name was placed in the bracket, my model predicted that Delpo would earn about 150 ranking points–the weighted average of his likelihood of reaching the third round, the fourth round, and so on–and after the draw was conducted, his higher probability of a clash with Djokovic knocked that number down to just over 100. That negative effect was one of the worst of any player in the tournament.
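The "weighted average" behind an expected-points figure can be sketched in a few lines of Python. Everything in this example is hypothetical: the function name, the points table, and both sets of probabilities are invented for illustration and are not the model's actual inputs or outputs.

```python
def expected_points(exit_probs, points):
    """Weighted average of ranking points.

    exit_probs[i] is the probability the player's tournament ends at stage i;
    points[i] is the ranking points awarded for finishing there.
    """
    if abs(sum(exit_probs) - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")
    return sum(p * pts for p, pts in zip(exit_probs, points))


# Hypothetical points for seven possible finishes, earliest exit to title.
points = [10, 45, 90, 180, 360, 600, 1000]

# Hypothetical forecasts before and after a tough draw is revealed.
pre_draw = [0.30, 0.35, 0.15, 0.10, 0.05, 0.03, 0.02]
post_draw = [0.30, 0.45, 0.10, 0.07, 0.04, 0.02, 0.02]

print(expected_points(pre_draw, points))
print(expected_points(post_draw, points))
```

A harder early opponent moves probability from the later rounds toward an early exit, and because the later rounds carry most of the points, even a modest shift lowers the weighted average noticeably.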

The story in Miami is similar, if less extreme. Pre-draw, Delpo’s expected points were 183. Post draw: 155. In the four tournaments he has entered this year, he has been uniformly unlucky:

Tournament    Pre-Draw  Post-Draw  Effect  
Delray Beach      89.3       74.0  -17.1%  
Acapulco         121.5       97.1  -20.1%  
Indian Wells     154.6      102.5  -33.7%  
Miami            182.9      155.4  -15.0%  
TOTAL            548.2      429.0  -21.7%

*The numbers above for Indian Wells are slightly different than what I published in the Indian Wells article, since the simulations I ran for this post consider the entire 96-player field, not just the 64-player second round.
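The "Effect" column is the percentage change from pre-draw to post-draw expected points. The short Python check below recomputes it from the table's rounded values; the variable and function names are my own. Because the inputs are already rounded, the recomputed totals can differ from the published ones in the last digit.

```python
# (pre-draw, post-draw) expected points, copied from the table above.
rows = {
    "Delray Beach": (89.3, 74.0),
    "Acapulco": (121.5, 97.1),
    "Indian Wells": (154.6, 102.5),
    "Miami": (182.9, 155.4),
}


def draw_effect(pre, post):
    """Percentage change in expected points caused by the draw."""
    return (post - pre) / pre * 100.0


for name, (pre, post) in rows.items():
    print(f"{name:<13} {draw_effect(pre, post):+.1f}%")

total_pre = sum(pre for pre, _ in rows.values())
total_post = sum(post for _, post in rows.values())
print(f"Points lost: {total_pre - total_post:.1f}")
print(f"Overall effect: {draw_effect(total_pre, total_post):+.1f}%")
```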

The good news, as we’ll see, is that it’s virtually impossible for this degree of misfortune to continue. The bad news is that those 119 points are gone forever, and at Delpo’s current position in the ranking table, that disadvantage will affect his tournament seeds, which in turn will result in worse draws (earlier meetings with higher-ranked players, independent of luck) for at least another few weeks.

Before we go any further, let me review the methodology I’m using here. (If you’re not interested, skip this paragraph.) For “post-draw” expected points, I’m taking jrank-based forecasts–like the ones on the front page of Tennis Abstract–and using each player’s probability of reaching each round to calculate a weighted average of expected points. “Pre-draw” forecasts are much more computationally demanding. In Miami, for instance, Delpo could’ve faced any of the 64 unseeded players in the second round and been slated to meet any of the top eight seeds in the third round. For each tournament, I ran a Monte Carlo simulation with the tournament seeds, generating a new draw and simulating the tournament 100,000 times, then averaging those outcomes. So in the pre-draw forecast, Delpo had a one-eighth chance of getting Fed in the third round, a one-eighth chance of getting Kei Nishikori there, and so on.
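For readers who want to see the shape of a pre-draw simulation, here is a heavily simplified Python sketch. It is not the code behind the numbers in this post. It assumes an eight-player field with no seeds or byes, uses a generic Elo-style win probability as a stand-in for jrank forecasts, and relies on made-up ratings and a made-up points table. A real version would place seeds in their reserved bracket slots before randomizing the rest.

```python
import random


def win_prob(rating_a, rating_b):
    """Elo-style chance that A beats B (an assumed stand-in for jrank)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_once(ratings, points, rng):
    """Generate a random draw, play it out, return points earned per player."""
    bracket = list(ratings)
    rng.shuffle(bracket)  # a fresh draw: no seeding in this simplified sketch
    earned = {}
    round_idx = 0
    while len(bracket) > 1:
        winners = []
        for a, b in zip(bracket[0::2], bracket[1::2]):
            if rng.random() < win_prob(ratings[a], ratings[b]):
                winner, loser = a, b
            else:
                winner, loser = b, a
            earned[loser] = points[round_idx]
            winners.append(winner)
        bracket = winners
        round_idx += 1
    earned[bracket[0]] = points[round_idx]  # the champion
    return earned


def pre_draw_expected_points(ratings, points, n_sims=20000, seed=0):
    """Average each player's points over many randomly generated draws."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in ratings}
    for _ in range(n_sims):
        for name, pts in simulate_once(ratings, points, rng).items():
            totals[name] += pts
    return {name: total / n_sims for name, total in totals.items()}


# Hypothetical field (the size must be a power of two) and points table:
# first-round loser, semifinal loser, losing finalist, champion.
ratings = {"A": 2200, "B": 2100, "C": 2000, "D": 1950,
           "E": 1900, "F": 1850, "G": 1800, "H": 1750}
points = [0, 90, 150, 250]

expected = pre_draw_expected_points(ratings, points)
for name, value in expected.items():
    print(name, round(value, 1))
```

The post-draw figure comes from the same kind of calculation with the bracket held fixed, so the difference between the two isolates the effect of the draw itself.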

It seems clear that a 22%, 119-point rankings hit over the course of four tournaments is some seriously bad luck. Last year, there were about 750 instances of a player being seeded at an ATP tournament, and in fewer than 60 of those, the draw resulted in an effect of -22% or worse on the player’s expected ranking points. And that’s just one tournament! The odds that Delpo would get such a rough deal in all four of his 2017 tournaments are 1 in more than 20,000.
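The arithmetic behind that figure is short enough to check directly. This Python snippet takes the two counts from the paragraph above as rounded inputs and assumes that draw luck is independent from one tournament to the next; the variable names are my own.

```python
seeded_entries = 750  # "about 750" seeded entries last year
bad_draws = 60        # "fewer than 60", so treat this as an upper bound

# Chance of a -22% (or worse) draw effect at a single tournament.
p_single = bad_draws / seeded_entries

# Chance of that happening at four tournaments in a row, assuming independence.
p_four = p_single ** 4

print(f"Single tournament: {p_single:.1%}")
print(f"All four: 1 in {1 / p_four:,.0f}")
```

With these rounded inputs the result is a little over 1 in 24,000. Since 60 is an upper bound on the number of comparably bad draws, the true odds are longer still.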

Over the course of a full season, draw luck mostly evens out. It’s rare to see an effect of more than 10% in either direction. Last year, Thiemo de Bakker saw a painful difference of 18% between his pre-draw and post-draw expected points in 12 ATP events, but everyone else with at least that many tournaments fell between -11% and +11%, with three-quarters of players between -5% and +5%. Even when draw luck doesn’t balance itself out, the effect isn’t as bad as what Delpo has seen in 2017.

Del Potro’s own experience in 2016 is a case in point. His most memorable event of the season was the Olympics, where he drew Djokovic in the first round, so it’s easy to recall his year as being equally riddled with bad luck. But in his 12 other ATP events, the draw aided him in six–including a +34% boost at the US Open–and hurt him at the other six. Altogether, his 2016 ATP draws gave him a 5.9% advantage over his “pre-draw” expected points–a bonus of 17 ranking points. (I didn’t include the Olympics, since no ranking points were awarded there.)

Taken together, Delpo’s 2016-17 draws have deprived him of about 100 ranking points; getting them back would move him three spots up the ranking table. So even with a short stretch of extreme misfortune, draw luck hasn’t affected him that much. Last year’s most extreme case among elite players, Richard Gasquet, suffered a similar effect: His draws knocked down his expected take by 9%, or 237 points, a difference that would bump him up from #22 to #19 in this week’s ranking list.

There are many reasons to believe that del Potro is a much better player than his current ranking suggests, such as his Elo rating, which stands at No. 7. But his ATP ranking reflects his limited schedule and modest start last year much more than it does the vagaries of each week’s brackets. The chances are near zero that he will continue to draw the toughest player in each tournament’s field in the earliest possible round, so we’ll soon have a better idea of what exactly he is capable of, and where exactly he should stand in the rankings.