Baker vs Hewitt Return Profile: In Extreme Detail

Here’s the trouble with jotting down the details of every single shot in a tennis match: When you’re done, you have details about every single shot in the tennis match.

If you saw my post yesterday presenting serve profiles for Federer and Zemlja, you already have some idea of what I’m talking about.  When you can chop up each player’s performance a thousand different ways, it seems like a waste to ignore any possibility.

Here we go again.

I charted tonight’s match between Brian Baker and Lleyton Hewitt, two of the more electric baseliners in today’s game.  Hewitt doesn’t have much of a serve, and while Baker can crush his share of aces, he’s rarely consistent enough to shut down his opponent’s return game.

Here’s all the data I could think to generate regarding their return games tonight.

(Seriously, click the link.  I’m only writing this post as an excuse to show off what’s on the other side of that link.)

Here are some tidbits of interest I’ve noted from the data:

  • Hewitt is remarkably consistent, winning about the same number of return points in the deuce and ad courts, and against all types of serves except for those down the T.  (As we saw yesterday, Federer got almost all of his aces down the T, and that is probably true for most players.  Thus, returners will look weak in that category.)
  • Baker didn’t take much advantage of shallow returns.  Hewitt won more than half of the points in which he failed to get the return past Baker’s service line.
  • While Baker did a better job of hitting deep returns (80% past the service line), he wasn’t nearly as successful (winning only 29% of points) when his returns fell in the service box.  That’s probably a credit to Hewitt more than a knock on Baker.
  • Neither player sliced or chipped returns unless they absolutely had to.  Baker sliced less than 10% of his returns, and Hewitt barely 5% of his.
  • Baker loves his down-the-line backhand.  His five down-the-line return winners accounted for half of his total return winners, and they also represent half of his down-the-line returns.

Go look at the tables, let your eyes adjust for a minute, and then tell me if you find anything else interesting.

Duval’s Triumph, Isner’s Breaks, Flushing’s Favorites

17-year-old Victoria Duval, she of six career tour-level matches, upset 2011 champion and 11th seed Samantha Stosur last night.  Leave it at that, and it sounds pretty impressive.

But it doesn’t quite convey how impressive the youngster’s path to the second round has been.  Duval is ranked just inside the top 300–not high enough to get into the qualifying tournament on that basis.  Armed with a wild card, she beat three players, each with considerably more experience than she has.

Reaching a Grand Slam main draw as a qualifying wild card is notable in and of itself.  The only one of this year’s nine qualie WCs to reach the main draw, she’s only the 16th woman to do so at the US Open since 1998 and only the 31st woman to do so at any Slam in that time frame.

As we now know, she didn’t stop there, and that sets her further apart.  Of the 30 women who previously accomplished the feat, only 11 went on to win a match in the main draw.  (Only one of those, Great Britain’s Karen Cross, at Wimbledon in 1997, won two main draw matches.)  And only one of those ladies–Yulia Fedossova, who qualified for the US Open in 2006–beat a seed.  Her victim was the much less imposing 25th seed, Anabel Medina Garrigues.

Every slam has its share of upsets, but this one goes far beyond that.  By beating a former champion and highly-seeded player, Duval did something no woman had done before.

Yesterday was a good day for American men, who went 5-2.  The only victims were Steve Johnson, who struggled with injuries, and junior champion Collin Altamirano, who no one could’ve expected would give Philipp Kohlschreiber much of a fight.

More notable than the simple fact of winning was the manner in which two US men did so.  In the battle of opposites, John Isner defeated Filippo Volandri, 6-0 6-2 6-3, and Donald Young knocked out Martin Klizan, 6-1 6-0 6-1.

Isner set all kinds of personal records in the process.  Not known for his return game–to put it mildly–Isner had never won a bagel set on hard courts.  In fact, until beating Adrian Mannarino in Newport last month, he had never won a set of professional tennis 6-0.

Next, also because of that not-so-pesky return game, Isner tends to lose quite a few games, even when he’s winning.  In best-of-five matches, he had never before won a match without dropping at least nine games.  (That was at the French in 2010, when he beat Andrey Golubev.)  Today he won while giving up only five.

Finally, to reach such a scoreline, Isner broke serve a total of six times.  That’s something else he’s never done before.  He’s broken five times on a handful of occasions, but never six, unless he did so in Davis Cup, for which stats are more difficult to come by.

Still, it seems likely that Klizan played worse than Volandri did.  As you might imagine, Klizan has never lost quite so comprehensively, though he did turn in a similarly abysmal performance in New York three years ago, when he lost to Juan Carlos Ferrero, 6-1 6-3 6-0.

For Young, it was only his third straight-set victory at a slam, regardless of lopsidedness.  And it was only the third time he won a Grand Slam match having earned his way into the main draw.  His other five wins–all at the US Open–came as a wild card.

Since we’re talking about all these Americans winning in New York, it seems like a great time to point you toward Colin Davy’s recent effort to quantify home-court advantage in tennis.  He finds that home-country players–both men and women–have a slight advantage that cannot be explained by other factors, to the tune of about 2%.

In building jrank, I’ve done some work along the same lines, and arrived at a similar number.  (On my old blog, I posted some very crude attempts, not controlling for things like surface, and claimed a much bigger effect.  I don’t think I’ve published the details of my more recent efforts.)

As Colin notes, it’s a small effect compared to other sports. (Isner’s love of the USA notwithstanding.)  To the extent home-court advantage in tennis stems from officiating bias–a common cause in other sports–the increasing use of Hawkeye would seem to lessen the effect.  And oddly, the practice of putting local players on main courts would turn out to be counterproductive.  By putting locals on Hawkeye courts, you’re taking away at least one slight advantage.

Colin also suggests comparing different stages of the tournament, which may reveal that umpires have a greater or lesser bias as the stakes get higher.  That test occurred to me for a different reason.  Travel-related fatigue is a major factor (again, something Colin acknowledges), but it is one that would likely lessen as the tournament goes on.  A player might still be jetlagged for his first-round match, but if he wins a couple of rounds, that effect is likely gone.

It’s an interesting field of study, one that is particularly tricky to separate from others–such as travel effects, surface preferences, venue familiarity, and so on.  As is so often the case in tennis, it is a topic that has been extensively hashed out for other sports, yet barely researched in ours.

In case you missed it yesterday afternoon, I tracked every point of the Federer-Zemlja match, and came up with some very detailed serve breakdowns for each player.  Check it out.

Federer vs Zemlja Serve Profile: In Extreme Detail

As I wrote last week, tennis needs more detailed statistics.  Most of all, we need them in an open format so that researchers can utilize all the data stored for every match.  No use in having Hawkeye cameras on every court if the data stays locked up.

I’m working on a system for charting matches and storing extremely detailed serve and shot information.  It will have to stay under wraps until I get a few more kinks worked out, but in the meantime, I want to show off some of what it can do.

Click here for more exhaustive serve data than you’ve probably ever seen before.

Today’s match wasn’t the most gripping that Roger Federer (or Grega Zemlja) ever played, but there’s still plenty of interesting stuff:

  • Roger won 85% of first-serve points. No surprise there.  More impressively, he won 60% of his first-serve points on or before his second shot.  (That’s “<=3W” in the tables.)
  • Fed went down the T with just under half his first serves (47%), but up-the-middle offerings accounted for 11 of his 12 aces.
  • Zemlja hit a shocking 27 serves into the net–almost half of his faults, and just over 20% of all of the serves he hit today.  (Watching the match, it felt like even more.)
  • Roger’s first serves were somewhat more dominant in the deuce court, as he lost only three first-serve points in that half, and won two-thirds of his first-serve points in the deuce court by his second shot.  In the small amount of data on offer today, he was noticeably weaker with his deuce court second serve, losing 5 of 12 second-serve points in that direction, compared to only 3 of 18 second-serve points to the ad court.
  • Zemlja fared better serving to the ad court today (64% of service points won to 56% in the deuce court), and was particularly deadly when he landed a serve wide in the ad court.  He won seven of the eight points that started that way, five of them with or before his second shot.

(If you didn’t click on the link the first time you saw it, now would be a good time.)

You get the idea, I hope.  With this much data, the sifting is as important as the collecting.  There are hundreds of data points we can generate just from tracking each player’s serve performance, and we can expect that most of them won’t have much to tell us.

And, of course, one match is just that–a small sample, fewer than 100 service points for each player.  While we can look at these tables and gain some insight into exactly how Roger was dominant today, it would be a mistake to draw much in the way of broader conclusions.

For that, we’ll need more matches, more data.  We’ll get there.

Contrasting Serves, Futile Slams, and (More) IBM Shortcomings

In most of his matches, John Isner makes his opponents look short and their serves look weak.  What happens, then, when his opponent really is short, with one of the weakest serves in the game?

Third up on grandstand today, Isner takes on Filippo Volandri, the man who sets records Isner will never reach.  Three years ago, the Italian failed to hit a single ace for 19 straight matches.  Volandri may not be as short as some players on tour–the ATP site lists him at six feet–but it’s more common for him to fail to hit an ace in a match than it is for him to hit one.

In the last year, Isner has hit nearly 19% of his first serves for aces, good for best among tour regulars.  In the top 50, the other extreme is represented by Nikolay Davydenko, whose rate is just under 3%.  Volandri–despite playing many weaker opponents on the Challenger tour–sits at 0.8%.

The good news for Big John is that the 31-year-old Volandri is a nonentity on hard courts, having not played on the surface since losing in the first round of the Australian. The bad news? He’ll have to hit a lot of returns today.

As my forecast very delicately predicted, Fernando Verdasco didn’t live up to his seed, losing to the barely-unseeded Ivan Dodig yesterday in five sets.  That’s the fourth slam this year in which he’s lost in a five-setter.

Verdasco, with his flashy talent and underwhelming results, comes in for his share of fan mockery.  But this is one time he doesn’t deserve it.  Out of the several dozen players who enter all four slams each year, almost all will lose four matches.  While it may be frustrating to lose in five, losing in five, all else equal, says better things about your game than losing in three.

One of those five-set losses this year was to Andy Murray at Wimbledon; the other two previous contests were against Janko Tipsarevic and Kevin Anderson.  Perhaps Fernando should have finished off at least one of those matches, but none of his four slam losses this year are nearly as groan-inducing as, say, Ernests Gulbis’s disaster yesterday against Andreas Haider-Maurer.  And his record is nothing compared to Marinko Matosevic’s streak of 11 losses in 11 slam appearances.

Verdasco is the sixth man in the Open era to complete this distinctive slam feat, and he’s not in bad company. Last year, Isner did it–and added an exclamation point with a five-set loss in Davis Cup.  Before that, the most recent were Fernando Gonzalez in 2006 and Tim Henman in 2000.  Not bad company.

Anyway, if you’re drawn to this unusual feat, don’t miss Steve Johnson’s first-round match with Tobias Kamke. It’s last on Court 13 today. Johnson is three-quarters of the way to the Fernando slam, losing all three of his matches at majors this year in five sets.  If he completes the set, it will be particularly impressive for at least one man: Kamke has won only two five-setters in his career.

As part of IBM’s ham-handed PR push leading up to another slam, the company gave analyst and coach Craig O’Shannessy some data.  He reported some results on both the ATP site and the New York Times Straight Sets blog.

This is a huge step up from the thinly-veiled advertisement I highlighted yesterday.  But it still, frustratingly, falls short.

One of the major points of Craig’s ATP piece is summarized at the beginning: “Most baseline points are a losing proposition,” and “Approaching the net is a goldmine.”  Later, he continues, “It seems amazing that players don’t venture forward more often to capitalize on the far higher winning percentage approaching offers over baseline play.”

Is this the data-driven, actionable advice I pleaded for last week? Not quite.

As I’m sure Craig would agree, opportunities to come to net aren’t always available, and they don’t arise in a vacuum.  Especially in today’s baseline-focused game, net points tend to occur when one player hits a particularly weak shot.  So if most net points end in victory for the player who approaches, is that because of the choice to come to net, or the weak shot that generated that opportunity?

Think about it probabilistically.  When Djokovic serves against Tsonga, let’s say he has a 75% chance of winning a first serve point.  If Tsonga hits a weak chip return in the middle of the court, allowing Novak to take several steps forward, we could figure that Djokovic’s chance of winning the point increases to 95%–perhaps higher.  When Novak puts away his second shot, he wins the point.  Formally speaking, his chance of winning jumps to 100%.

Now, in that example, what do you credit as the reason for Djokovic winning the point?  Landing a solid first serve, which gives him a 75% chance of winning instead of, say, 60%? A particularly good first serve, which forced the weak return?  Tsonga’s poor return? Or Novak’s “choice” to approach the net?

That final choice is laughable.  And this is the data he’s drawing from.  Aside from a few particularly aggressive players on tour, that’s the profile of a net point in 2013.
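To make the selection effect concrete, here is a minimal sketch in Python. Every number in it is hypothetical except the 95% figure borrowed from the Djokovic example above; the function name and the split of points into "short ball" and "neutral" situations are assumptions for illustration, not anything drawn from IBM's data. It shows how raw net-point and baseline-point win rates can differ enormously even when the decision to approach changes a player's chances by only a few points in either situation.

```python
def observed_win_rates(p_short, approach_rate, win):
    """Return (net win rate, baseline win rate) as raw stats would show them.

    p_short       -- share of points in which the opponent hits a short ball
    approach_rate -- dict: situation -> probability the player approaches
    win           -- dict: (situation, approached) -> probability of winning
    """
    share = {"short": p_short, "neutral": 1.0 - p_short}
    net_pts = net_wins = base_pts = base_wins = 0.0
    for situation, weight in share.items():
        a = approach_rate[situation]
        net_pts += weight * a
        net_wins += weight * a * win[(situation, True)]
        base_pts += weight * (1 - a)
        base_wins += weight * (1 - a) * win[(situation, False)]
    return net_wins / net_pts, base_wins / base_pts


# Hypothetical inputs: players approach mostly when handed a short ball.
approach_rate = {"short": 0.80, "neutral": 0.02}
win = {
    ("short", True): 0.95,     # figure used in the example above
    ("short", False): 0.90,    # assumed: the short ball does most of the work
    ("neutral", True): 0.45,   # assumed: approaching without an opening
    ("neutral", False): 0.50,  # assumed: ordinary baseline rally
}

net_rate, baseline_rate = observed_win_rates(0.15, approach_rate, win)
print(f"net points won:      {net_rate:.1%}")
print(f"baseline points won: {baseline_rate:.1%}")
```

With these made-up inputs, approaching never moves the win probability by more than five points within a situation, yet the aggregate net win rate lands far above the aggregate baseline win rate. The gap comes almost entirely from which points end up in each bucket.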

So, what’s the actionable advice here?  You probably shouldn’t approach the net without a reasonable opening, so … hit bigger serves to get more weak returns? Hit deep groundstrokes into corners? Take advantage of short balls?

These are the benefits we reap from “Big Data?”

IBM clearly wants to wow us with this stuff.  Yet the “findings” are so elementary as to be useless.  The solution is so simple: release the data, let fans and analysts innovate, and watch the quality of this work go through the roof.

Dodig’s Consistency, IBM’s Offensive, and Hopeless Wild Cards

Ivan Dodig just missed out on a seeding at this year’s US Open.  Ranked 37th when seeds were assigned, he had ascended as high as #35, largely on the strength of his fourth-round showing at Wimbledon.

While the Croatian could have drawn any seed as early as the first round, he got lucky, pulling 27th-seeded Fernando Verdasco.  My forecast underlines his fortune, giving him a 51% chance to advance to the round of 64, then roughly even odds again to make the round of 32 against (probably) Nikolay Davydenko–another player who fell just outside the seed cut.

Making the Dodig-Verdasco comparison more interesting is that in the last 52 weeks, the unseeded player has won more matches (38 to 29) with a higher winning percentage (58% to 56%).  What the Spaniard has done, however, is bunch his wins much more effectively than his first round opponent.  While Dodig achieved a career highlight with his R16 showing in London, Verdasco made the quarters.  Fernando reached the final in Bastad, and earlier in the year, won two matches at the Madrid Masters.

A telling comparison is that while Dodig has lost five opening-round matches in the last year, Verdasco has lost nine.  As Carl Bialik explained two years ago, consistency isn’t such a great thing in tennis.  Certainly, the ATP rankings–and the seedings that utilize them–prefer inconsistency.

You know there’s a Grand Slam in the offing when the PR pieces from IBM start to appear.  Last week, a particularly bald-faced plant showed up in the New York Times, a publication that–one fervently hopes–should know better.

This particular piece includes such hard-hitting journalism as, “The keys are updated during matches to track any shift in momentum, and they correlate well with the final outcome,” and “These extra features are likely to drive traffic to the event’s Web site, USOpen.org, and its various mobile versions.”

The Times should be embarrassed.  What makes this particularly frustrating to the statistically-oriented fan is that while IBM speaks the right language, the results of this effort to “fulfill fans’ desire for deeper knowledge” are so disappointing.

The much-vaunted Keys to the Match are frequently arbitrary, often bizarre.  In Kei Nishikori’s second-round match at Wimbledon, one of his “Keys” was to “Win between 71 and 89 of winners on the forehand side.”  He didn’t do that–whatever it means, exactly. He didn’t meet the goals set by his two other Keys, either, yet he won the match in straight sets.

Most frustrating to those of us who want actual analysis, the underlying data–to the extent it is available at all–is buried almost beyond the possibility of a fan’s use.  IBM–like Hawkeye–is collecting so much data, yet doing so little with it.

Lots of fans do desire more statistical insight. Much more. The raw material is increasingly collected, yet the deeper knowledge remains elusive.

Stay with me as I leap from one hobby-horse to another.

Wild cards cropped up as a topic of conversation last weekend, largely thanks to Lindsay Gibbs’s piece for Sports on Earth, in which Jose Higueras said, “If it was up to me, there would be no wild cards. Wild cards create entitlement for the kids. I think you should be in the draw if you actually are good enough to get in the draw.”

I don’t object to wild cards used as rewards, like the one that goes to the USTA Boys’ 18s champion, or the ones that the USTA awards based on Challenger performance in a set series of events.  There’s even a place for WCs as a way to get former greats into the draw. James Blake shouldn’t have gotten the deluge of free passes that he has received in the last few years, but it’s probably good for the sport to have him in more top-level events than he strictly deserves.

The problem stems from all the other wild cards, and not just from a player development perspective.  Are fans going to get that much enjoyment out of one or two matches from the likes of Rhyne Williams and Ryan Harrison, Americans who didn’t have a high enough ranking to make the cut?  Of the fourteen Americans in the men’s main draw, six were wild cards, and it would shock no one if those six guys failed to win a single match.

There are further effects, as well.  Because Williams, Harrison, Tim Smyczek, and Brian Baker were exempted from the qualifying tournament, fans seeking quality American tennis last week barely got to see any.  Donald Young–who has received far too many wild cards himself–was the only American to qualify, largely because the US players at the same level as the other would-be qualifiers didn’t have to compete.  The remaining Americans were in over their heads.

This leads me to a great alternative suggested by Juan José Vallejo on Twitter: Be liberal with free passes in qualifying, and take the opportunity to promote those early rounds much more.  At the Citi Open a few weeks ago, the crowds on Saturday and Sunday for qualifying were comparable to those Monday and Tuesday.  Because qualifying often falls on the weekend, the crowds are there.  But if they want to see Jack Sock play, they’ve got to come back Tuesday night (and spend a lot more money), and they’re much more likely to see him overmatched by a better, more experienced player.

Cut the entitlement, improve the quality of main draw play, and give the fans more chances to watch up-and-coming stars.  I wish there was a chance this would happen.

Halep’s Draw, Serena’s H2Hs, American Advancement

When the US Open Women’s draw was released on Friday, things looked awfully bright for Caroline Wozniacki.  With Maria Sharapova’s withdrawal, Sara Errani became the #4 seed, meaning that one spot in the semis belonged to Errani–or, more likely, someone who knocked her off along the way.

But Wozniacki is no lock herself.  11 of her last 12 losses have come to players outside the top 20.  She’ll have to do much better than that to take advantage of her position in the Errani quarter.

To find a dark horse for that semifinal spot, look no further than Wozniacki’s latest conqueror, Simona Halep.  Halep crushed Petra Kvitova yesterday in New Haven, marking her fourth title of the year on three (!) different surfaces.  In her last 38 matches, the only player to beat her in straight sets has been Serena Williams.

Halep’s path to the semifinal starts with Heather Watson and either Donna Vekic or Mariana Duque Marino, then a possible third-rounder with Maria Kirilenko, whom she has never played.  Errani would be her fourth-round opponent if she lives up to her seeding, though that section is completely up for grabs. Wozniacki–whom Halep beat on Friday in straight sets–is the presumptive quarterfinalist.

Strangely enough, Halep is one of the few players in the draw with a reason to fear Errani on hard courts.  In Miami this year, the Italian routed her 6-1 6-0.

Yesterday, when Serena Williams was asked about her rivalry with Victoria Azarenka, she said, “I think the head-to-head is close.”  It’s not: Serena has won 12 of their 15 meetings.  While Vika has won two of the last three–including each of the last two on hard courts–the American won the ten before that.

Given Serena’s dominance over the rest of the WTA, one might reasonably ask whether an 80% winning percentage actually does constitute “close” for the world #1.  Sure enough, there are few players who have topped that.

In her career, Serena has faced 42 different opponents at least five times.  Only 13 of those have won one-quarter or more of their meetings, and only five of those remain active.  To go even further, three of those five–Venus Williams, Nadia Petrova, and Francesca Schiavone–no longer figure to threaten Serena at all.

The remaining two players are Jelena Jankovic (4 wins in 10 meetings) and Samantha Stosur (3 wins in 9 meetings).  Jankovic wouldn’t face Serena until the semifinals, and Stosur until the finals, even in the unlikely event either player made it that far.
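The one-quarter test is simple arithmetic on the records quoted above. A minimal sketch in Python, using only the three head-to-heads given in this post (the variable and function names are just for illustration):

```python
# Head-to-head records against Serena as quoted in the post:
# opponent -> (opponent's wins, total meetings)
records = {
    "Victoria Azarenka": (3, 15),  # Serena has won 12 of 15
    "Jelena Jankovic": (4, 10),
    "Samantha Stosur": (3, 9),
}

THRESHOLD = 0.25  # "one-quarter or more of their meetings"


def win_share(wins, meetings):
    """Fraction of meetings won by the opponent."""
    return wins / meetings


shares = {name: win_share(w, m) for name, (w, m) in records.items()}
clears_bar = sorted(name for name, s in shares.items() if s >= THRESHOLD)

for name, s in shares.items():
    flag = "clears 25%" if s >= THRESHOLD else "below 25%"
    print(f"{name}: {s:.0%} ({flag})")
```

By this measure, Azarenka's 20% sits below the bar that Jankovic and Stosur clear.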

Of course, there are good players who have met Serena fewer than five times, including her possible fourth-round opponent, Sloane Stephens.  Of the 108 active players who have ever faced Williams, Sloane is one of only five who have won at least half of their meetings with her.

The three US women who qualified for the main draw pushed the total number of Americans on the women’s side to 19, the highest number since 2006.  Between those qualifiers and a few long-shot wild cards, most of the 19 will be gone a week from now.  But even accounting for plenty of attrition, the American force could continue to shine brighter than they have for nearly a decade.

Based on my draw forecast (which is in turn based on WTA rankings), we should expect to see between eight and nine US women in the second round.  Eight wouldn’t be terribly impressive–that mark was reached in both 2009 and 2011–but nine would represent a step forward, however incremental.  The last time nine or more American women reached the second round was when ten did so in 2005–and that accomplishment required 23 US players in the main draw.

My forecasts predict about four American women in the third round–equal to last year’s mark, and one short of 2011’s.  But if the home favorites can score a couple of upsets and get six women into the round of 32, it would be the first time since 2004, when eight US women made it that far.
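For readers wondering how a forecast produces a number like "between eight and nine," the expected count is just the sum of each player's probability of winning her first-round match. Here is a minimal sketch in Python. The probabilities are placeholders invented for illustration, not the actual forecast values, and the function name is hypothetical.

```python
def expected_advancers(win_probs):
    """Expected number of winners, by linearity of expectation."""
    return sum(win_probs)


# Placeholder first-round win probabilities for 19 players.
# These are NOT the real forecast numbers; they only illustrate the sum.
placeholder_probs = [
    0.95, 0.80, 0.70, 0.65, 0.60, 0.55, 0.50, 0.50, 0.45, 0.40,
    0.40, 0.35, 0.30, 0.30, 0.25, 0.20, 0.20, 0.15, 0.10,
]

expected = expected_advancers(placeholder_probs)
print(f"{len(placeholder_probs)} players, expected second-rounders: {expected:.2f}")
```

The sum holds regardless of whether the matches are independent, so it still works when two Americans draw each other in the first round.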

If the American women do make a strong showing, there’s an added bonus: It might help us ignore the plight of the American men.

Harrison’s Luck, Karlovic’s Danger, and Vesely’s Prep

When Ryan Harrison drew Rafael Nadal in the first round of the US Open, the reaction in the twitterverse was instantaneous and unanimous. A guy with horrible luck in Grand Slam draws just saw his luck get even worse.

Certainly, drawing one of the big four (or big seven?) means an almost guaranteed early exit.  Harrison could’ve drawn a seed ranked much lower, or better yet, one of the many anonymous characters required to fill up the 128-man field.

But has Ryan’s luck really been that bad?  In his previous twelve Slam appearances, Harrison has drawn a seed six times in the first round.  (An unseeded player has a one-in-three chance of pulling a seed, so he “should” have faced four seeds instead.)  Only one of those was a member of the big four–Andy Murray at the 2012 Australian–and two of them have been seeded 27th or worse.
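The one-in-three figure and the "should have faced four" expectation can be checked with a few lines of Python. This sketch assumes a 128-player draw with 32 seeds, each of whom opens against an unseeded player, and it treats the twelve draws as independent with Harrison unseeded every time. Those assumptions and the function name are mine, added for illustration.

```python
from math import comb

DRAW = 128
SEEDS = 32

# Each seed opens against one of the 96 unseeded players,
# so an unseeded player meets a seed with probability 32/96.
p_seed = SEEDS / (DRAW - SEEDS)


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


slams = 12
expected_seeds = slams * p_seed
p_six_or_more = prob_at_least(6, slams, p_seed)

print(f"chance of drawing a seed in round one: {p_seed:.3f}")
print(f"expected seeds in {slams} slams: {expected_seeds:.1f}")
print(f"chance of six or more: {p_six_or_more:.3f}")
```

Under those assumptions, drawing six or more seeds in twelve tries happens a bit less than one time in five: unlucky, but hardly extraordinary.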

The real complaint for Harrison’s supporters has been his second round draws.  In Melbourne this year and Wimbledon last year, he faced Novak Djokovic in the second round. One year ago in Flushing, his R64 opponent was Juan Martin del Potro.

Alright–that’s pretty bad luck.  But keep in mind that any unseeded player is very likely to face a seed in one of the first two rounds.  Harrison lucked into a slightly fortunate draw at Roland Garros this year, drawing Andrey Kuznetsov in the first then 19th-seeded John Isner in the second.

And of course, lucky or unlucky, there’s the question of whether Harrison is likely to beat anyone at a Grand Slam right now.  Ranked 97th, he’s one of the weakest players in the draw.  Given a luckier draw, there still wouldn’t be much hope that he would take advantage.

Yesterday Ivo Karlovic qualified for the US Open main draw, and again the twitterverse responded unanimously.  To paraphrase everyone: “He’s a dangerous floater. No one wants to see him in the first round.”

I can’t speak to the psychological preferences of players, so maybe that’s right–maybe no one wants to see him in their section. But at this point in his career, there’s little reason to fear Dr. Ivo.

In fact, I wrote about this specific issue almost two years ago: “Karlovic has shown himself far less likely than the average player to perform above or below his ranking.”

Aside from a victory over Kevin Anderson in the thin air of Bogota and two wins by retirement, the highest-ranked player Karlovic has beaten in the last year was (then) #40 Grigor Dimitrov in Zagreb–indoors. He hasn’t scored a complete-match win against a top-20 player since he played Kei Nishikori in Davis Cup 18 months ago.

It’s true, Karlovic has a very good chance of advancing past James Blake in his first main draw match.  But that says more about the 33-year-old Blake than it does about Ivo.

Diego Sebastian Schwartzman was so close to qualifying.  In yesterday’s final round, he took the first set from Albano Olivetti.  He saved a break in the third, went up a break for 4-2, but couldn’t close it out.

It would’ve been a remarkable achievement for the newly-minted 21-year-old.  He has built his ranking up to 131 entirely on the back of clay-court challengers.  In fact, he had played only eight career hard-court matches before this week, winning just two–both against fellow clay specialists in Melbourne qualifying this year.

For all that, Schwartzman would not have been the main draw contender with the least hard-court preparation this year!  That honor goes to Jiri Vesely, the 20-year-old Czech, who has not played a hard-court match since the Sarajevo (ice-rink) Challenger in March.

These two youngsters’ routes to success reveal an interesting quirk of the ATP schedule.  While clay-court events are a distinct minority at tour level, they make up a slight majority among Challengers.  Furthermore, it is easier to fill out a minor-league schedule with clay events because of the dearth of hard-court options in April and May.  For instance, in the ten-week span this year from 22 April to 1 July, there were only four hard-court challengers–in Johannesburg, Kun-Ming, Karshi, and Busan.

For his part, Vesely has had an outstanding season.  In March, he was ranked outside the top 200.  After three Challenger titles (and two more finals, with losses to Radek Stepanek and Florian Mayer), he sits comfortably inside the top 100, with no need to qualify in New York.

Despite his scheduling choices, Vesely isn’t hopeless on hard courts.  Two years ago, he reached the final in the US Open junior tournament and won in Melbourne.

For his first match on the surface in months, the youngster got a manageable first-round opponent in Denis Kudla.  The winner of that battle of counterpunching youngsters will likely go no further, thanks to a second-round date with Tomas Berdych.

Finally, my draw forecasts are up for both singles main draws. Men are here, and women are here.  With a little luck, they’ll update hourly throughout the tournament.

Five First-Round Men’s Qualifying Matches to Watch at the US Open

Why wait until next week to get excited about the US Open?  Qualifying rounds start tomorrow, and there is a ton of action all over the grounds as 128 men and 128 women fight for 16 spots in each main draw.  There’s more cash on the line than ever, so you can count on some very hard-fought contests for the right to stick around into next week.

1. Ivo Karlovic vs Mackenzie McDonald

You know Ivo.  Two weeks ago, you almost certainly didn’t know McDonald.  The UCLA commit’s pedestrian junior career didn’t prepare anyone for his victories over Nicolas Mahut and Steve Johnson in Cincinnati qualifying last week.  That’s right: The unranked 18-year-old made the main draw of last week’s Masters 1000 event, and the cannon-serving veteran did not.

I saw much of McDonald’s match against Johnson.  To the extent you can be a believer in a pint-sized player without any weapons, count me in.  He fought Johnson hard on every point, waiting until the older player made a mistake. That won’t work against most tour-level players, but it might do the trick against the Croatian.

They are third up on Court 11 today.

2. Jesse Huta Galung vs Florent Serra

Two years ago, Huta Galung qualified in Flushing and took a set from James Blake in the first round of main draw play.  It was something of a career highlight for the Dutchman, who has only won four main draw matches in his tour-level career.

Yet this year, he returns to New York on a tear.  He has a 29-7 record in Challengers this year, including wins in Cherbourg (as a 346th-ranked lucky loser), St. Brieuc, Scheveningen, and Tampere, along with a final in Meerbusch last week.  He broke into the top 100 for the first time with this week’s rankings, and he has almost no points to defend until Cherbourg comes along again at the end of next February.

I’ve long loved Huta Galung’s game–he’s a stylish player with plenty of variety who can move particularly well.  Even in a losing effort, he is enjoyable to watch.

His opener would have been on this list regardless of opponent, but Serra has the ability to turn this into one of the better matches of qualifying week–certainly one of the tougher tilts in the first round.  The 32-year-old is unlikely to recover the form that took him into the top 40 seven years ago, but remains a threat at the challenger level.

Look for this match on Wednesday’s schedule.

3. Evgeny Korolev vs Illya Marchenko

In contrast to the previous match, stylishness isn’t the word that comes to mind here.  Korolev is not just a slugger; he’s a ball-basher who has lost his way.  He broke into the top 100 as an 18-year-old, peaking inside the top 50, and had a double-digit ranking as recently as three years ago.  At the age of 25, he should be heading toward a new peak, but instead is languishing in Challengers, losing to … well, just about everybody.

Injuries have repeatedly derailed his progress, and since he has retired in two of his last three matches, it wouldn’t shock anyone if he didn’t complete this match, either.  But on a good day, he has an uncanny ability to smack groundstrokes to within inches of the baseline.  Though it’s never pretty, I’m always impressed.

Marchenko has a more well-rounded game, and despite never cracking the top 60, has the physical potential to return to that range.  His qualifying match against Christian Harrison in Washington a few weeks ago was one of the better displays I saw at that event.  But it was typical Illya.  He was the superior player, except on crucial points.  Marchenko’s last six losses have been three-setters, yet only against Harrison did he push the final set past 6-4.

These guys play third on Court 4 today.

4. Cedrik Marcel Stebe vs Malek Jaziri

(Hey, it’s my list. If you don’t like my choices, make your own list!)

Stebe dominated the 2011 Challenger tour, then kept his ranking just high enough throughout 2012 to earn a direct entry into last year’s US Open, where he beat Viktor Troicki in the first round.  Two weeks later he beat Lleyton Hewitt in Davis Cup, and it’s been all downhill from there.  Aside from the final at the Tallahassee Challenger in the spring, there’s little sign of the guy who charged into the top 100 barely out of his teens.

The 22-year-old lefty is too passive to have a natural home on hard courts, though he has registered some big wins on the surface, such as the ’11 Challenger Tour finals and that Troicki upset.  That makes Jaziri an ideal opponent for him.  The 29-year-old Tunisian has played a bit more on hard courts this summer, showing up at a couple of North American challengers and playing qualifying in Washington, but he’s a counterpunching dirtballer at heart.

It could make for some ugly tennis, or it could generate some entertaining scampering around the back of the court.  They’ll play tomorrow.

5. Mitchell Krueger vs Lucas Pouille

It wouldn’t be a qualifying preview without some of the youngest players in the draw.  With so many of the fringey Americans wildcarded into the main draw, US fans need to look deeper for local boys, and Krueger is a good place to start.  The 19-year-old had a single ranking point when he got a qualifying wild card last year (and won a round); he has now edged into the top 500.  While he hasn’t made a strong impression on his first trip around the North American Challenger circuit, he has scored two top-300 wins.

Pouille, also 19, is a bit more advanced, having won 10 matches at the Challenger level and above since the beginning of this year.  Many view him as a big part of the future of French tennis, and with a ranking on the cusp of the top 200, he should be heavily favored here.

But the outcome isn’t what matters here; neither player is likely to reach the main draw.  In a qualifying field full of guys 10 years older, these two are unquestionably on the way up.  They’ll be on the Wednesday schedule.

A few notes:

Toward Atomic Statistics

Italian translation at settesei.it

The other day, Roger Federer mentioned in a press conference that he’s “never been a big stat guy.”  And why would he be?  Television commentators and the reporters asking him post-match questions tend to harp on the same big-picture numbers, like break points converted and 2nd-serve points won.

In other words, statistics that look better when you’re winning points.  How’s that for cutting edge insight: You get better results when you win more points.  If I were in Fed’s position, I wouldn’t be a “big stat guy” either.

To the extent statistics have the potential to tell us about a particular player’s performance, we need to look at numbers that each player can control as much as possible.  Ace counts–though they are affected by returners to a limited extent–are an example of one of the few commonly-tracked stats that directly reflect an aspect of a player’s performance.  You can have a big serving day with not too many aces and a mediocre serving day with more, but for the most part, lots of aces means you’re serving well.  Lots of double faults means you’re not.

By contrast, think about points won on second serve, a favorite among the commentariat.  That statistic may weakly track second serve quality, but it also factors in the returner’s second serve returns, as well as both players’ performance in rallies that begin close to an even keel.  It provides fodder for discussion, but it certainly doesn’t offer anything actionable for a player, or an explanation of exactly what either player did well in the match.

Atomic statistics

Aces and double faults are a decent proxy for performance on serve.  (It would be nice to have unreturnables as well, since they have more in common with aces than they do with serves that are returned, however poorly.)

But what about every other shot?  What about specific strategies?

An obvious example of a base-level stat we should be counting is service return depth.  Yes, it’s affected by how well the opponent serves, but it refers to a single shot type, and one upon which the outcome of a match can hinge.  It can be clearly defined, and it’s actionable.  Fail to get a reasonable percentage of service returns past the service line, and a good player will beat you.  Put a majority of service returns in the backmost quarter of the court, and you’re neutralizing much of the server’s advantage.

Here are more atomic statistics with the same type of potential:

  • Percentage of service returns chipped or sliced.
  • Percentage of backhands chipped or sliced.
  • Serves (and other shots) hit into the net, as opposed to other types of errors.
  • Variety of direction on each shot, e.g. backhands down the line compared to backhands crosscourt and down the middle.
  • Net approaches.
  • Drop shot success rate (off of each wing).
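
To make the idea concrete, here is a minimal sketch of how two of these stats (return depth and slice percentage) could be counted from a shot-by-shot log.  The record format, the field names, and the `share` helper are all assumptions made up for illustration, and the sample rows are invented, not taken from any real match.

```python
# Hypothetical charting format: one dict per service return.
# "depth" and "stroke" are invented field names for this sketch.
# The rows below are made-up sample data, not from a real match.
charted_returns = [
    {"depth": "shallow", "stroke": "drive"},    # landed inside the service box
    {"depth": "deep", "stroke": "drive"},       # past the service line
    {"depth": "very_deep", "stroke": "slice"},  # backmost quarter of the court
    {"depth": "deep", "stroke": "drive"},
]


def share(rows, predicate):
    """Fraction of rows for which predicate(row) is true; None if there are no rows."""
    rows = list(rows)
    if not rows:
        return None
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Share of returns that cleared the service line.
past_service_line = share(charted_returns, lambda r: r["depth"] != "shallow")
# Share of returns that were chipped or sliced.
sliced = share(charted_returns, lambda r: r["stroke"] == "slice")

print(f"Returns past the service line: {past_service_line:.0%}")
print(f"Returns chipped or sliced: {sliced:.0%}")
```

The same `share` helper would work for any of the stats in the list, as long as the charting records whatever attribute the stat depends on.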

Two commonly-counted statistics, unforced errors and winners, have many characteristics in common with these atomic stats, but are insufficiently specific.  Sure, knowing a player’s ratio of winners to unforced errors for a match is some indication of how well he or she played, but what’s the takeaway? Federer needs to be less sloppy? He needs to hit more winners?  Once again, it’s easy to see why players aren’t clamoring to hear these numbers.  No baseball pitcher benefits from learning he should give up fewer runs, and no hockey goaltender from hearing that he needs to allow fewer goals.

Glimmers of hope

With full access to Hawkeye data, this sort of analysis (and much, much more) is within reach.  Even if Hawkeye material remains mostly impenetrable, the recent announcement from SAP and the WTA holds out hope for more granular tennis data.

In the meantime, we’ll have to count this stuff ourselves.

Raonic, del Potro, and the Importance of One Point

In last night’s Coupe Rogers match between Milos Raonic and Juan Martin del Potro, one point stands out from the rest.

Raonic won the first set, then Delpo broke early in the second.  With del Potro serving at 4-3, Raonic earned a break point with a winner at the net.  Replays clearly show that he touched the net.  Had the chair umpire seen it in real time, Delpo would have been awarded the point.

The Argentine never recovered, losing the next nine points and the match.

The net touch, and the point Milos didn’t deserve, was clearly a turning point in the match.  But how important was it, really?

If we assume that the two men were equal and that both players win 75% of service points (not true in Delpo’s case yesterday, but reasonable for two big servers on hard courts), here is a summary of Raonic’s probability of winning at various stages of the match:

  • After winning the first set: 75.0%
  • With Delpo serving 4-3, 00-00: 52.4%
  • With Delpo serving 4-3, 40-40: 53.9%
  • After winning the “touch” point: 58.9%
  • If Delpo had won that point: 51.8%
  • After winning the “touch” game: 75.0%
  • After holding serve for 5-4: 76.3%
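
For anyone who wants to tinker with numbers like these, here is a rough sketch of one way to compute them.  It is not necessarily the model behind the list above.  It assumes identical players who each win 75% of service points, a best-of-three match, and a tiebreak at 6-6 that is treated as a coin flip (which follows from the players being identical).  All function names are made up for this sketch.

```python
from functools import lru_cache

P_SERVE = 0.75  # assumption from the text: each player wins 75% of service points


def game_prob(p, server_pts=0, returner_pts=0):
    """P(server wins the game). Points are counted 0-3 (3 = 40);
    4-3 and 3-4 stand for advantage server and advantage returner."""
    if server_pts >= 3 and returner_pts >= 3:
        deuce = p * p / (p * p + (1 - p) ** 2)  # closed form from deuce
        if server_pts == returner_pts:
            return deuce
        # advantage server, or advantage returner
        return p + (1 - p) * deuce if server_pts > returner_pts else p * deuce
    if server_pts == 4:
        return 1.0
    if returner_pts == 4:
        return 0.0
    return (p * game_prob(p, server_pts + 1, returner_pts)
            + (1 - p) * game_prob(p, server_pts, returner_pts + 1))


@lru_cache(maxsize=None)
def set_prob(a_games, b_games, a_serves, p=P_SERVE):
    """P(player A wins the set) from a game score, at the start of a game."""
    if a_games == 6 and b_games == 6:
        return 0.5  # assumption: identical players, so the tiebreak is a coin flip
    if a_games >= 6 and a_games - b_games >= 2:
        return 1.0
    if b_games >= 6 and b_games - a_games >= 2:
        return 0.0
    hold = game_prob(p)
    a_wins_game = hold if a_serves else 1 - hold
    return (a_wins_game * set_prob(a_games + 1, b_games, not a_serves, p)
            + (1 - a_wins_game) * set_prob(a_games, b_games + 1, not a_serves, p))


def raonic_match_prob(r_games, d_games, raonic_serves,
                      server_pts=0, returner_pts=0, p=P_SERVE):
    """P(Raonic wins the match), given he won set one, from a second-set score."""
    server_wins = game_prob(p, server_pts, returner_pts)
    r_wins_game = server_wins if raonic_serves else 1 - server_wins
    r_wins_set = (r_wins_game * set_prob(r_games + 1, d_games, not raonic_serves, p)
                  + (1 - r_wins_game) * set_prob(r_games, d_games + 1, not raonic_serves, p))
    # Up a set: win this set, or lose it and win a coin-flip decider.
    return r_wins_set + (1 - r_wins_set) * 0.5


states = [
    ("Delpo serving 4-3, start of game", (3, 4, False)),
    ("Delpo serving 4-3, deuce", (3, 4, False, 3, 3)),
    ("Raonic wins the next point (break point)", (3, 4, False, 3, 4)),
    ("Delpo wins the next point (game point)", (3, 4, False, 4, 3)),
    ("Raonic breaks: 4-4, Raonic to serve", (4, 4, True)),
    ("Raonic holds for 5-4", (5, 4, False)),
]
for label, state in states:
    print(f"{label}: {raonic_match_prob(*state):.1%}")
```

Under these assumptions, the sketch lands within a few tenths of a percentage point of each figure in the list.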

The controversial point was, clearly, very important.  The difference between winning it and losing it was about 7 percentage points (58.9% versus 51.8%), a swing of a magnitude that doesn’t happen very often in a tennis match, especially outside of tiebreaks.

But the real story here is the next point.  Remember that under normal circumstances, del Potro is a huge server and Raonic does not have a strong return of serve.  (I say “normal circumstances” because somehow, Raonic won 50% of return points in this match.)

If a server is winning 75% of points on his own racquet, his probability of winning a game from break point down is still 67.5%.  There’s a 25% chance he’ll lose the game on the next point, of course, but a 75% chance he’ll get back to deuce, where his serve gives him a 90% chance of winning the game.
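
That arithmetic is easy to check.  Here is a short sketch, assuming every point on serve is won independently with the same probability; the function names are made up for illustration.

```python
def deuce_hold_prob(p):
    """P(server wins the game from deuce) when he wins each point with probability p.

    From deuce, the server either wins two straight points (p*p), loses two
    straight (q*q), or splits them and returns to deuce. Conditioning on the
    game ending gives p*p / (p*p + q*q).
    """
    q = 1 - p
    return p * p / (p * p + q * q)


def hold_from_break_point(p):
    """P(server wins the game from break point down): save it, then win from deuce."""
    return p * deuce_hold_prob(p)


p = 0.75  # the 75% figure used in the text
print(f"From deuce: {deuce_hold_prob(p):.1%}")
print(f"From break point down: {hold_from_break_point(p):.1%}")
```

With p = 0.75, the first function gives the 90% figure from deuce, and multiplying by the 75% chance of saving the break point gives 67.5%.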

The touch point increased Raonic’s chances of winning from 53.9% to 58.9%.  The next point upped his odds from 58.9% to 75.0%.  Which one do you think was more important?

Another way of looking at this is to consider what would’ve happened had there been no video replay, and no chance of del Potro spotting the touch and arguing with the umpire about it.  Normal Delpo would’ve stepped back to the line and hit a service winner.  Five minutes later he would’ve held serve again and the two men would’ve played a third set.

It’s easy to look back at this match and conclude that the net touch was the difference in the match.  But no: It was the reaction to the touch–the controversy itself–that had a much greater impact.