Little Data, Big Potential

This is a guest post by Carl Bialik.

I had more data on my last 30 minutes of playing tennis than I’d gotten in my first 10 years of playing tennis, and it just made me want so much more.

Ben Rothenberg and I had just played four supertiebreakers, after 10 minutes of warmup and before a forehand drill. And for most of that time — all but a brief break while PlaySight staff showed the WTA’s Micky Lawler the system — 10 PlaySight cameras were recording our every move and every shot: speed, spin, trajectory and whether it landed in or out. Immediately after every point, we could walk over to the kiosk right next to the net to watch video replays and get our stats. The tennis sure didn’t look professional-grade, but the stats did: spin rate, net clearance, winners, unforced errors, net points won.

Later that night, we could go online and watch and laugh with friends and family. If you’re as good as Ben and I are, laugh you will: As bad as we knew the tennis was by glancing over to Dominic Thiem and Jordan Thompson on the next practice court, it was so much worse when viewed on video, from the kind of camera angle that usually yields footage of uberfit tennis-playing pros, not uberslow tennis-writing bros.

This wasn’t the first time I’d seen video evidence of my take on tennis, an affront to aesthetes everywhere. Though my first decade and a half of awkward swings and shoddy footwork went thankfully unrecorded, in the last five years I’d started to quantify my tennis self. First there was the time my friend Alex, a techie, mounted a camera on a smartphone during our match in a London park. Then in Paris a few years later, I roped him into joining me for a test of Mojjo, a PlaySight competitor that used just one camera, enough to record video later published online, with our consent and to our shame. Last year, Tennis Abstract proprietor Jeff Sackmann and I demo-ed a PlaySight court with Gordon Uehling, founder of the company.

With PlaySight and Mojjo still only in a handful of courts available to civilians, that probably puts me — and Alex, Jeff and Ben — in the top 5 or 10 percent of players at our level for access to advanced data on our games. (Jeff may object to being included in this playing level, but our USPS Tennis Abstract Head2Head suggests he belongs.) So as a member of the upper echelon of stats-aware casual players, what’s left once I’m done geeking out on the video replays and RPM stats? What actionable information is there about how I should change my game?

Little data, modest lessons

After reviewing my footage and data, I’m still looking for answers. Just a little bit of tennis data isn’t much more useful than none.

Take the serve, the most common shot in tennis. In any one set, I might hit a few dozen. But what can I learn from them? Half are to the deuce court, and half are to the ad court. And almost half of the ones that land in are second serves. Even with my limited repertoire, some are flat while others have slice. Some are out wide, some down the T and some to the body — usually, for me, a euphemism for, I missed my target.

[Image: PlaySight groundstroke report]

If I hit only five slice first serves out wide to the deuce court, three went in, one was unreturned and I won one of the two ensuing rallies, what the hell does that mean? It doesn’t tell me a whole lot about what would’ve happened if I’d gotten a chance to try that serve once more that day against Ben, let alone what would happen the next time we played, when he had his own racquet, when we weren’t hitting alongside pros and in front of confused fans, with different balls on a different surface without the desert sun above us, at a different time of day when we’re in different frames of mind. And the data says even less about how that serve would have done against a different opponent.
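To put a rough number on how little five serves can say, here is a minimal sketch that computes a Wilson score interval for the share of serves that land in. The choice of interval, the 95 percent level and the 300-of-500 comparison are my assumptions for illustration, not anything PlaySight reports.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    spread = p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    half_width = z * math.sqrt(spread) / denom
    return center - half_width, center + half_width

# Three of five wide slice serves landed in (the case described above).
low, high = wilson_interval(3, 5)
print(f"3 of 5 in: plausible in-rate from {low:.0%} to {high:.0%}")

# Same 60 percent rate over a hypothetical 500 serves, for comparison.
low_big, high_big = wilson_interval(300, 500)
print(f"300 of 500 in: plausible in-rate from {low_big:.0%} to {high_big:.0%}")
```

With only five serves, the interval runs from roughly 23 to 88 percent, which is another way of saying the sample can't distinguish a shaky serve from a reliable one.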

That’s the serve, a shot I’ll hit at least once on about half of points in any match. The story’s even tougher for rarer shots, like a backhand drop half volley or a forehand crosscourt defensive lob, shots so rare they might come up once or twice every 10 matches.

More eyes on the court

It’s cool to know that my spinniest forehand had 1,010 RPM (I hit pretty flat compared to Jack Sock’s 3,337 RPM), but the real value I see is in the kind of data collected on that London court: the video. PlaySight doesn’t yet know enough about me to know that my footwork was sloppier than usual on that forehand, but I do, and it’s a good reminder to get moving quickly and take small steps. And if I were focusing on the ball and my own feet, I might have missed that Ben leans to his backhand side instead of truly split-stepping, but if I catch him on video I can use that tendency to attack his forehand side next time.

[Image: PlaySight video with shot stats]

Video is especially useful for players who are most focused on technique. As you might have gathered, I’m not, but I can still get a tactical edge from studying patterns that PlaySight doesn’t yet identify.

Where PlaySight and its ilk could really drive breakthroughs is by combining all of the data at their disposal. The company’s software knows about only one of the thousands of hours I’ve spent playing tennis in the last five years. But it has tens of thousands of hours of tennis in its database. Even a player as idiosyncratic as me should have a doppelganger or two in a data set that big. And some of them must’ve faced an opponent like Ben. Then there are partial doppelgangers: women who serve like me even though all of our other shots are different; or juniors whose backhands resemble mine (and hopefully are being coached into a new one). Start grouping those videos together (I’m thinking of machine learning, clustering and classifying) and you can start building a sample of some meaningful size. PlaySight is already thinking this way, looking to add features that can tell a player, say, “Your backhand percentage in matches is 11 percent below other PlaySight users of a similar age/ability,” according to Jeff Angus, marketing manager for the company, who ran the demo for Ben and me.
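The doppelganger idea can be sketched as a nearest-neighbor search over per-player shot profiles. Everything below is hypothetical: the player names, the three features and their values are invented for illustration, and nothing here reflects how PlaySight's software actually works.

```python
import math

# Hypothetical profiles: (avg forehand RPM, first-serve in rate, share of points at net).
PLAYERS = {
    "carl": (1000.0, 0.55, 0.10),
    "player_a": (1050.0, 0.57, 0.12),
    "player_b": (3300.0, 0.62, 0.08),
    "player_c": (950.0, 0.50, 0.30),
}

def standardize(profiles):
    """Rescale each feature to zero mean and unit variance so RPM doesn't dominate."""
    names = list(profiles)
    scaled_columns = []
    for column in zip(*(profiles[name] for name in names)):
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column)) or 1.0
        scaled_columns.append([(x - mean) / sd for x in column])
    return {name: tuple(col[i] for col in scaled_columns) for i, name in enumerate(names)}

def nearest_doppelgangers(target, profiles, k=2, dims=None):
    """Return the k players closest to target, comparing only the feature indices in dims."""
    scaled = standardize(profiles)
    dims = range(len(scaled[target])) if dims is None else dims

    def distance(name):
        return math.dist([scaled[target][i] for i in dims],
                         [scaled[name][i] for i in dims])

    others = [name for name in scaled if name != target]
    return sorted(others, key=distance)[:k]

# Full doppelgangers: compare every feature.
print(nearest_doppelgangers("carl", PLAYERS, k=2))
# Partial doppelgangers: compare the serve feature only (index 1).
print(nearest_doppelgangers("carl", PLAYERS, k=1, dims=(1,)))
```

Passing `dims` is one simple way to find the partial doppelgangers described above: players who match on the serve alone, whatever the rest of their game looks like. A real system would need far richer features pulled from video, and probably a proper clustering method rather than raw distances.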

The hardware side of PlaySight is tricky. It needs to install cameras and kiosks, weatherproofing them when the court is outdoors, and protect them from human error and carelessness. It’s in a handful of clubs, and the number probably won’t expand much: The company is focusing more on the college game. Even when Alex and I, two players at the very center of PlaySight’s target audience among casual players, happened to book a PlaySight court recently in San Francisco, we decided it wasn’t worth the few minutes it would have taken at the kiosk to register — or, in my case, remember my password. The cameras stood watch, but the footage was forever lost.

Bigger data, big questions

I’m more excited by PlaySight’s software side. I probably will never play enough points on PlaySight courts for the company to tell me how to play better or smarter — unless I pay to install the system at my home courts. But if it gets cheaper and easier to collect decent video of my own matches — really a matter of a decent mount and protector for a smartphone and enough storage space — why couldn’t I upload my video to the company? And why couldn’t it find video of enough Bizarro Carls and Bizarro Carl opponents around the world to make a decent guess about where I should be hitting forehands?

There are bigger, deeper tennis mysteries waiting to be solved. As memorably argued by John McPhee in Levels of the Game, tennis isn’t so much one sport as dozens of different ones, each a different level of play united only by common rules and equipment. And a match between two players even from adjacent levels in his hierarchy typically is a rout. Yet tactically my matches aren’t so different from the ones I see on TV, or even from the practice set played by Thiem and Thompson a few feet from us. Hit to the backhand, disguise your shots, attack short balls and approach the net, hit drop shots if your opponent is playing too far back. And always, make your first serve and get your returns in.

So can a tactic from one level of the game even translate to one much lower? I’m no Radwanska and Ben is no Cibulkova, but could our class of play share enough similarity (mathematically, is Carl : Ben :: Aga : Pome?) that what works for the pros works for me? If so, then medium-sized data on my style is just a subset of big data from analogous styles at every level of the game, and I might even find out if that backhand drop half volley is a good idea. (Probably not.)

PlaySight was the prompt, but it’s not the company’s job to fulfill product features only I care about. It doesn’t have to be PlaySight. Maybe it’s Mojjo, maybe Cizr. Or maybe it’s some college student who likes tennis and is looking for a machine-learning class project. Hawk-Eye, the higher-tech, higher-priced, older competitor to PlaySight, has been slow to share its data with researchers and journalists. If PlaySight has figured out that most coaches value the video and don’t care much for stats, why not release anonymized raw footage and stats to researchers, who might get cracking on the tennis classification question or any of a dozen other tennis analysis questions I’ve never thought to ask? (Here’s a list of some Jeff and I have brainstormed, and here are his six big ones.) I hear all the time from people who like tennis and data and want to marry the two, not for money but to practice, to learn, to discover, and to share their findings. And other than what Jeff’s made available on GitHub, there’s not much data to share. (Just the other week, an MIT grad asked for tennis data to start analyzing.)

Sharing data with outside researchers “isn’t currently in the road map for our product team, but that could change,” Angus said, if sharing data can help the company make its data “actionable” for users to improve their games.

Maybe there aren’t enough rec players who’d want the data with enough cash to make such ventures worthwhile. But college teams could use every edge. Rising juniors have the most plastic games and the biggest upside. And where a few inches can change a pro career, surely some of the top women and men could also benefit from PlaySight-driven insights.

Yet even the multimillionaire ruling class of the sport is subject to the same limitations driven by the fractured nature of the sport: Each event has its own data and own systems. Even at Indian Wells, where Hawk-Eye exists on every match court, just two practice courts have PlaySight; the company was hoping to install four more for this year’s tournament and is still aiming to install them soon. Realistically, unless pros pay to install PlaySight on their own practice courts and play lots of practice matches there, few will get enough data to be actionable. But if PlaySight, Hawk-Eye or a rival can make sense of all the collective video out there, maybe the most tactical players can turn smarts and stats into competitive advantages on par with big serves and wicked topspin forehands.

PlaySight has already done lots of cool stuff with its tennis data, but the real analytics breakthroughs in the sport are ahead of us.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.

Cool Down Tennis

This is a guest post by Carl Bialik.

Imagine you’re named boss of tennis. Right after being sworn in by Rod Laver and Martina Navratilova, you’re handed an empty wall calendar. You make the schedule for 2018. What’s your first move?

Mine would be to move Indian Wells and Miami earlier in the calendar, and the Australian Open later, after the two U.S. Masters tournaments.

I never wanted this more than while sweating my way around the Indian Wells grounds in search of shade last month. I wasn’t alone. The only full sections of the main stadium during day sessions were the ones protected from the sun. Around the fan-friendly venue, there are plenty of seats in the shade — under tents, or in Adirondack chairs that shade-seeking people push ever closer to the screen as the sun shifts. The players can only wait for shade to slowly descend on the court. Jack Sock needed a towel holding 50 ice cubes to cool down.

Sweating in the grass

Sure, it was unusually hot at this year’s Indian Wells tournament. But the climatological averages are clear: It’s hot in the California desert and in the Florida sunshine in March, and in the antipodean summer in January. It’d be cooler in Indian Wells, Miami and Melbourne if the two Masters events moved two months earlier and led up to the year’s first Grand Slam in March. Each of the two-week events would be, on average, 4 to 10 degrees Fahrenheit cooler each year. (The precipitation would be about the same, so Miami men’s finalist Rafael Nadal might continue to bemoan humidity, request sawdust and show more than he’d planned beneath his shorts; while women’s champ Johanna Konta might keep having to change clothes midmatch because they’ve accumulated approximately five kilograms of sweat.)

I’m using the averages because I don’t want to make too much of an unseasonably hot Indian Wells, or too little of an unusually cold March in Miami. But the averages might understate the problem because it’s precisely the outliers we’re worried about. A nudge downward of a few degrees, on average, could translate into a big drop in the probability of an unbearably hot fortnight — say, from 25 percent to 5 percent.
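The arithmetic behind that kind of drop can be sketched under a few assumptions of my own: that the fortnight's average high varies from year to year along a normal curve, that the spread is 6 degrees Fahrenheit, and that anything above 89 degrees counts as unbearable. None of these numbers comes from climate records; they're chosen only to show how a modest shift in the average thins out the hot tail.

```python
from statistics import NormalDist

def chance_too_hot(mean_f, sd_f, threshold_f):
    """Probability that a fortnight's average high exceeds the threshold."""
    return 1 - NormalDist(mean_f, sd_f).cdf(threshold_f)

# Hypothetical values, in degrees Fahrenheit.
SD_F = 6.0          # year-to-year spread of the fortnight's average high
THRESHOLD_F = 89.0  # cutoff for an "unbearably hot" fortnight

for mean_f in (85.0, 82.0, 79.0):
    p = chance_too_hot(mean_f, SD_F, THRESHOLD_F)
    print(f"average high {mean_f:.0f} F: {p:.0%} chance of an unbearable fortnight")
```

Under these made-up numbers, cooling the average by 6 degrees cuts the chance of an unbearable fortnight from about 25 percent to about 5 percent. Real temperature distributions aren't perfectly normal, so treat this as an illustration of the tail effect rather than a forecast.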

Changing the tennis calendar would also mean less daylight. That wouldn’t be so good for the nickname Sunshine Double, but it’d be good for tennis. Until more tennis stadiums adopt overhanging partial roofs (for sun, not for rain), shorter days mean less sun for fans to contend with and more reason to fill the seats. Plus, night tennis is exciting. The venues already have plenty of lights and evening sessions.

Scrambling the schedule would do more than cool down tennis. The three midyear majors’ proximity to each other helps the sport carry some momentum and mainstream buzz from one to the next. The Australian Open squanders all that in the four-month gap between its end and the start of the French Open. There’s even a month between the Aussie Open and the next big event.

The other three majors also get opening acts, to help players build up familiarity with the surface and for fans to build anticipation. The Australian Open gets two weeks at the start of the season — without so much as a 500 event on the men’s side.

The lack of buffer between the offseason and Melbourne also means it loses some players still recovering from the end of the previous season. That was the case this year with Juan Martin del Potro, who skipped the first major after winning the Davis Cup with Argentina in November.

Imagine instead starting the season with Indian Wells and Miami — or Miami, then Indian Wells, while we’re scrambling things, for the convenience of travel from the sport’s power center of Europe — using the same courts and balls as Melbourne. Follow that month — or less, if one or both of the U.S. early-year Masters succumbs to the reality that they could be just a week — by Doha and Dubai, then Brisbane, Sydney and the like, before the main event in Melbourne at the start of March. We’d start the season with a real hard-court swing, ending with the first major.

From Australia, the tour could stay in the southern hemisphere. The swing through South America has a long history and a terrible spot on the current calendar. It was traditionally played on clay but some of its biggest events are moving to hard courts — first (North American) Acapulco, now, maybe, Rio, in search of Masters status — to the chagrin of Nadal and others. Too many players simply don’t think it’s worth it to compete on clay for a few weeks if that’s followed by a month of hard-court events. But move Indian Wells and Miami, and South American clay could move a month later in the calendar — while slightly tempering what Nadal bemoans as “too extreme” weather conditions by an average of 1 degree. The swing would give way seamlessly to Houston, Charleston and the European clay spell — which, by the way, would absorb Bucharest, Hamburg, Umag, Bastad and Gstaad from their awkward post-Wimbledon calendar slots. And no one would suggest Miami move to green clay.

We’d be left with a coherent calendar with five seasons of roughly equal length and importance, four with a major and one with the year-end finals: (1) Outdoor hard courts in the U.S., the Middle East and Oceania, followed by (2) clay in the Americas and Europe, (3) English and German grass (with Newport for those who want to visit the sport’s hall of fame), (4) North American and Asian outdoor hard courts, and (5) European indoor hard courts (absorbing the current winter events such as St. Petersburg and Rotterdam) culminating in wherever the tours’ multiplying year-end finals are calling home that year. And let’s play Davis Cup and Fed Cup at the same time — the tours acting in sync; what a concept! — on weekends at the edge of the five new seasons, giving hosts a wider range of sensible surfaces to choose from, and creating the option for combined venues if men and women from the same country are hosting the same round. (Prague in 2012 would’ve been tennis nirvana.) Or, hell, consider merging the events.

Could all this happen? Sure — if tennis power were centralized in a person or people who prioritize the overall good of the global game. Without a radical transformation of tennis, though, it’ll be slow going: It took years for the idea of lengthening the grass-court season by a week to become reality.

Carl Bialik has written about tennis for fivethirtyeight.com and The Wall Street Journal. He lives and plays tennis in New York City and has a Tennis Abstract page.