Djokovic the Favorite, Murray the Vulnerable, Smyczek the Last Hope?

Last night, Novak Djokovic cemented his status as the US Open favorite, all without doing a thing.

The 32-man draw has 19 seeds left, but only two others remain in Novak’s quarter, and those two–Tommy Haas and Mikhail Youzhny–play each other in the 3rd round.  Djokovic will face the shocking Joao Sousa in his third-rounder, followed by the winner of Tim Smyczek-Marcel Granollers in the fourth.

Novak’s quarterfinal threat was supposed to be Juan Martin del Potro, and that’s where the Serbian has really gained.  Lleyton Hewitt upset Delpo in a slipshod five-setter last night, making Djokovic’s most likely QF opponent Tommy Haas. While Haas has a recent win against the world #1, you have to figure he remains the preferred opponent.

These shifts in the draw mean that my forecast now gives Djokovic almost exactly double the chances of winning of his nearest competitor, Rafael Nadal.  Nadal, of course, has a much trickier path to the semifinals, likely having to go through both John Isner and Roger Federer.  Andy Murray has a more fortunate draw than that, but he’ll probably need to beat Tomas Berdych to earn a matchup with Djokovic.

Djokovic didn’t look dominant in his second-round win, but it was Murray who lost a set yesterday, to journeyman Argentine Leonardo Mayer, a 26-year-old who has yet to crack the top 50.  The defending champion recovered just fine, but is second-round weakness a sign of bad things to come?

The short answer is no.  Since 1991, seven US Open champions have been pushed to four or five sets in their second round match en route to the title, though none have suffered that fate since 2004, when eventual champ Federer dropped a set to Marcos Baghdatis.  Another three titlists lost at least one set in the first round.

However, few of those early-round challengers have been as anonymous as Mayer.  Besides Baghdatis, the most recent second-round threats have been Ivan Ljubicic and James Blake.  The last time an Open champion dropped a second-round set to such an anonymous figure was in 2000, when Marat Safin needed five sets to get past Gianluca Pozzi.

Also worth noting is that in Murray’s trio of notable victories–last year’s Olympics and US Open, plus this year’s Wimbledon–he has never dropped a set so early.  In fact, in London this summer, he won his first four matches in straights before battling through a five-setter against Fernando Verdasco.

Whatever else you might say about Verdasco, he’s a much more dangerous opponent than Leonardo Mayer.

American grinder Tim Smyczek scored the biggest win of his career yesterday with a five-set victory over Alex Bogomolov.  Smyczek has taken advantage of an easy draw (Bogie defeated Benoit Paire in the first round) to reach his first Grand Slam round of 32 in his fifth main draw appearance.

He has a rare opportunity to go even further, facing 43rd-ranked Marcel Granollers, also the beneficiary of a friendly draw thanks to Fabio Fognini‘s first-round loss.  Granollers has played 18 slams on hard and grass courts, never reaching the round of 16.

It’s a strange world when Smyczek is one of only three Americans–along with John Isner and Jack Sock–still alive.  Stranger still is the very real possibility that Tim will be the only man standing two days from now.  Sock faces Janko Tipsarevic, a winnable match but not one he’ll be favored in.  Isner is ranked higher than his next opponent, Philipp Kohlschreiber, but the German eliminated him in last year’s Open.

Smyczek, on the other hand, has nothing to lose.  Well, except for his pride, when he reaches the fourth round and suffers a triple-bagel at the hands of Novak Djokovic.

If you’re already worrying about not having enough matches to watch during week two, look no further than Colette Lewis’s thorough US Open Juniors preview, which lays out the contenders in both the boys’ and girls’ draws.

Anderson vs Baghdatis: In Extreme Detail

This one was fun.  When choosing to chart this match, I figured it was good for at least four sets, and that Kevin Anderson was likely to come out on top.  The typical Marcos Baghdatis performance this year has consisted of occasional glimpses of brilliance, mired in clunky decision-making and a pile of unforced errors.

Tonight we were treated to vintage Baghdatis, the version that packs stadiums with fans hoping to see some of his trademark electric shotmaking.  Anderson may not have brought his best game, but he hit a fair number of first serves that would have gone for cheap points against most players, while Baghdatis not only got them back, he quickly turned the point to his advantage.

In 12 service games, Anderson was broken six times, the most he has been broken on hard courts since a 2010 match against Sam Querrey.  (Really, Sam Querrey.)  And it was Baghdatis’s most dominant performance in a Slam match since the 2006 Australian, when he beat Denis Gremelmayr, 6-2 6-1 6-2.  I seem to recall the rest of that tournament going pretty well for Marcos, too.

If the same Baghdatis shows up for Sunday’s match against Stanislas Wawrinka, that third-rounder could be a highlight of the weekend.

In the meantime, enjoy all serve and return breakdowns for both players.

Almost every one of those tables illustrates some aspect of Baghdatis’s dominance tonight.

  • Anderson only won 43% of his serve points by his second shot.  Without a larger dataset to compare to, it’s tough to know just how bad that is, but look at it another way: More than half of the time, Anderson’s serve resulted in a prolonged rally.  That can’t be good.
  • It’s interesting to see that both players hit several aces in both directions, both wide and down the T.  This is in contrast to Federer‘s performance the other night, in which almost all of his aces were down the T.
  • Of Baghdatis’s 57 serve points, 37 were returnable.  Anderson won only nine of those points. Nine.  It’s almost pointless to break that down any further, because no subset of those return points is going to look good.
  • By contrast, Baghdatis won 30 of the 45 points in which Anderson hit a returnable serve.  He only hit five unforced errors on serve returns, and got 35 of those 45 returns past the service line.

In case you’re new to my serve and return breakdowns, here are the previous ones:

Lopsided Four-Setters, Orderly Doubles, and Sock’s Luck

On Wednesday, Guillermo Garcia-Lopez appeared to give Juan Martin del Potro quite the battle, taking him to four sets, with two tiebreaks along the way.  It wasn’t what anyone expected from Delpo’s first-round match against someone ranked outside the top 70.

Looking behind the scoreline, however, it becomes evident that the Argentine dominated the match.  Frequent HT commenter Tom Welsh pointed out that del Potro’s Dominance Ratio (DR) was 1.64, a mark that Delpo had not reached in his previous nine matches, and not since posting a 1.68 DR in a routine victory against Bernard Tomic in Washington.

Of course, a stat like DR, which considers the total number of return points won and service points lost, will not capture the ups and downs within a match.  What it does tell you is, over the course of the afternoon, how well both guys were playing.  And comparatively speaking, del Potro was playing much better.
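For anyone who wants to compute DR at home, here is a minimal sketch.  It assumes the usual definition (the share of return points won divided by the share of service points lost); the function name, its arguments, and the point totals in the usage line are made up for illustration and are not del Potro's actual numbers.

```python
def dominance_ratio(serve_pts, serve_pts_won, return_pts, return_pts_won):
    """Dominance Ratio: return points won % divided by service points lost %.

    Assumed definition; all four inputs are simple point counts for one player.
    """
    if serve_pts <= 0 or return_pts <= 0:
        raise ValueError("need at least one service point and one return point")
    serve_lost_rate = (serve_pts - serve_pts_won) / serve_pts
    return_won_rate = return_pts_won / return_pts
    if serve_lost_rate == 0:
        # Never lost a point on serve: the ratio is unbounded.
        return float("inf")
    return return_won_rate / serve_lost_rate


# Hypothetical totals: 75 of 100 service points won, 41 of 100 return points won.
print(round(dominance_ratio(100, 75, 100, 41), 2))  # 1.64
```

A DR above 1.0 means the player won return points at a higher rate than he lost service points.  Because it works from match totals only, it says nothing about which points those were.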

Delpo had previously played 29 matches in his career in which he finished with a DR between 1.6 and 1.7, and in all but one of those (a three-setter against Dudi Sela in Washington in 2008) he won in straight sets.

It turns out, though, that in Grand Slam play, dropping a set in the middle of an otherwise routine performance–as measured by DR–isn’t that uncommon.  While the average DR in a Slam four-setter is only 1.37, the winner has tallied a DR of 1.64 or better in more than 12% of Slam matches since 1991.

If there’s a takeaway here, it’s something we should already know.  In a tennis match–especially one with tiebreaks–some points are tremendously more important than others.  Garcia-Lopez saved 9 of 13 break points.  Take away one of those in the second set, and we’re not having this discussion.  Give Delpo one more of the first 12 points in the second-set tiebreak, and things could’ve turned out differently.  One well-timed, high-leverage point has the potential to overturn dozens of points worth of poor play.

Yesterday I mused on the chaos that is men’s doubles, and the Bryan brothers’ ability to rise above it.  Yesterday’s action was surprisingly unchaotic.

By the end of play yesterday, 15 of the 16 men’s doubles seeds had completed their first-round matches.  (Sixth seeds Edouard Roger-Vasselin and Rohan Bopanna play today.)  Of those 15, 10 reached the second round, including every top-seven seed who has played.

Compare that to men’s singles, in which 10 of 32 seeds crashed out in the first round.  For a more direct comparison, consider that 4 of the top 16 men’s singles seeds lost in the first round.  Arguably, the doubles players have a tougher task.  Since the field is made up of only 64 teams, the first round can be more challenging in doubles than in singles.

What makes the sticking power of these top seeds surprising is the number of good doubles players who aren’t part of seeded teams.  Because the game is less physically demanding, doubles specialists can play on to much more advanced ages than can singles players.  One of the teams that executed an upset yesterday, Jonathan Erlich and Andy Ram, was in 2008 ranked among the top few pairings in the world.  Further, plenty of singles players have proven themselves quite adept at doubles, but don’t play enough to amass much of a ranking.

Part of the reason why the seeds have progressed more-or-less intact is the US Open format of three full sets.  At other levels, the third-set match tiebreak essentially turns the contest into a coin flip.  Both the second- and fifth-seeded pairs were forced into a third set, and at an event with a ten-point tiebreak, the odds would’ve been much higher that one of them would be headed home.
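To put rough numbers on the coin-flip claim, here is a back-of-the-envelope sketch.  It assumes the better team wins every point with the same fixed probability, ignoring who is serving (a big simplification, especially in doubles), and it models the full deciding set as a tiebreak set.  The function names and the 52% figure are illustrative assumptions, not measured values.

```python
from functools import lru_cache


def race_win_prob(p, target):
    """Chance of winning a race to `target` units, win by two, when each
    unit is won independently with probability p."""
    q = 1.0 - p
    # From a tie one unit short of the target, two clear units are needed.
    from_tie = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == target - 1 and b == target - 1:
            return from_tie
        if a == target:
            return 1.0
        if b == target:
            return 0.0
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(0, 0)


def set_win_prob(p):
    """Chance of winning a tiebreak set, given point-win probability p."""
    g = race_win_prob(p, 4)   # a game: first to four points, win by two
    tb = race_win_prob(p, 7)  # tiebreak at 6-6: first to seven, win by two

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 6 and b == 6:
            return tb
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        return g * win(a + 1, b) + (1.0 - g) * win(a, b + 1)

    return win(0, 0)


p = 0.52  # hypothetical edge for the better team on any given point
print("ten-point match tiebreak:", round(race_win_prob(p, 10), 3))
print("full deciding set:       ", round(set_win_prob(p), 3))
```

Under these assumptions the full set gives the better team a noticeably bigger edge than the match tiebreak, simply because it contains more points.  With an even split of points (p = 0.5), both formats come out to exactly 50%.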

Jack Sock is playing only his fifth Grand Slam, and his first as a direct entry, having recently gotten his ranking into the top 100.  Part of the reason he was able to move into that rarefied air is his lucky path to the third round in last year’s US Open.

In 2012, his first-round draw was Florian Mayer, who retired in the middle of the third set.  That gave him a shot at the relatively weak Flavio Cipolla, who he beat in straight sets.  He gave Nicolas Almagro a scare in the third round but ultimately lost.  Still, he took home 90 ranking points instead of the 10 he would’ve collected had he lost to a healthy Mayer in the first round.

Defending those points, one might expect the young American to take a tumble in the rankings after the US Open.  After all, your typical 86th-ranked player doesn’t have much chance to reach the third round, let alone do so two years in a row.

But fortune has favored him again.  In the first round, he drew Philipp Petzschner, who retired in the middle of the third set.  (Sound familiar?)  Yesterday, he defeated the clay-court specialist qualifier Maximo Gonzalez, who did him the huge favor of knocking out Jerzy Janowicz in the first round.

It’s hard to imagine an easier route to a Slam round of 32.

At his site Betting Market Analytics, Michael Beuoy shows us the trajectory of Vicky Duval’s historic first-round upset, similar to some of the win-probability work I’ve done in the past.

Finally, more Duval: I charted her match last night, and have reams of data to show for it.

Hantuchova vs Duval: In Extreme Detail

Tonight I logged every point of the second-round match between Daniela Hantuchova and Vicky Duval.  It didn’t end up being very close, but Duval showed off some of the baseline skills that got her into the second round, while Hantuchova displayed the powerful serving and speed that kept her in the top 30 for so long.

Here is the complete breakdown. Tonight, we have both serves and returns.

Over the next few days, I’m hoping to come up with similar breakdowns for rally endings, shot types, and just about all the other numbers you can imagine crunching when you’ve charted every shot of a tennis match.  Stay tuned.  Maybe I’ll even try to make the presentation a little easier on the eyes.  (But don’t bet on it.)

Doubles Chaos, R2 Rigging, and the Threat of Watson

Today Bob Bryan and Mike Bryan open up their title defense in Flushing.  They’ve won four Grand Slams in a row, so winning this one would give them a calendar-year Slam, one of the few accomplishments they don’t already have in their pockets.

What makes this so impressive to me is the unpredictability of men’s doubles results, not to mention the utter chaos that reigns these days in the sport.  As I wrote after last year’s surprise Wimbledon results, men’s doubles is so heavily serve oriented that it often comes down to a tiebreak or two.  For most teams, that means that winning a tournament is roughly equivalent to guessing right on a series of coin flips.

For the Bryans to remain so dominant, they need to break serves that are rarely broken and win plenty of the tiebreaks that ensue when they don’t.  Roughly speaking, it’s as if John Isner stopped getting broken and improved his already impressive record in 7-6 sets.

Before the rain struck, yesterday provided a case in point of how good teams can easily suffer a bad loss.  Max Mirnyi and Horia Tecau make up one of the few teams that has remained together lately.  They aren’t unbeatable, but both are very good doubles players.  In their first-rounder yesterday, they lost in straight sets to Pablo Cuevas and Horacio Zeballos.  Yes, both of their opponents have strong doubles resumes, but Cuevas has been injured for what seems like years, and Zeballos was sick.  And neither plays nearly as much doubles as Mirnyi and Tecau do.

That sort of thing happens at every tournament.  We’ll see more of it in the next two days.  Somehow, it seems only the Bryans are immune.

Remember a couple years ago, when ESPN thought they discovered that the US Open was rigging the draw in favor of the top two seeds?  They weren’t, but tournament favorites have gotten a lot of easy first-round matches over the years.

While it’s surely just an accident, one can’t help but think about it when looking at the men’s second-round draw.  Each of the original big four is playing a virtual non-threat, as is David Ferrer.  Djokovic gets Benjamin Becker, Murray drew Leonardo Mayer, Federer gets Carlos Berlocq, and Nadal drew Rogerio Dutra Silva.

To find a second-round match with some interest, you have to look to sixth-seed Juan Martin del Potro, who drew Lleyton Hewitt.  Even eighth-seed Richard Gasquet gets off easy, drawing qualifier Stephane Robert.

Sure, Slam second rounds aren’t always filled with interest.  But there are plenty of unseeded players–like Hewitt, or even Lleyton’s victim yesterday, Brian Baker–who could make things interesting for a top seed.  Ivo Karlovic, Gael Monfils, and Marcos Baghdatis, frequently cited as floaters, will face lower-ranked seeds, while Bernard Tomic and Jack Sock have clear paths to the third round.

In other words, we can look forward to some more blowouts on the show courts.

Could IBM’s contribution to the US Open get any worse?  It seems that the corporate giant has a team working hard on just that.

For those hardy enough to venture to the company’s website, there is a blog post called–I kid you not–“What if Watson Showed Up at the US Open Tennis Championships?”

(They’re not talking about Heather.)

The answer is predictable: A bunch of amazing stuff will happen, what with the leveraging and the analytics and undoubtedly some synergies.  And predictive.

Aaron believes that cognitive technologies could utterly transform the US Open, from the way the technology responds to changes in demand for computing resources to the experiences of the fans, commentators and players. “Watson could bring a whole new level of engagement. It’s a cognitive agent that can improve the interactions between all of the people involved and between them and the event itself,” he says.

Ooh, cognitive agent!

He envisions augmenting Watson with predictive analytics technologies the sports events team has created for the US Open. In this future scenario, that technology would help commentators analyze and offer insights about matches with a level of accuracy never possible before.

We can only hope that IBM’s Watson team is completely different from IBM’s current tennis group.

On the subject of analytics–but I hope not embarrassingly bad ones–please check out my post last night with extremely detailed return profiles for Brian Baker and Lleyton Hewitt.  Return stats like you’ve never seen them before.

Baker vs Hewitt Return Profile: In Extreme Detail

Here’s the trouble with jotting down the details of every single shot in a tennis match: When you’re done, you have details about every single shot in the tennis match.

If you saw my post yesterday presenting serve profiles for Federer and Zemlja, you already have some idea of what I’m talking about.  When you can chop up each player’s performance a thousand different ways, it seems like a waste to ignore any possibility.

Here we go again.

I charted tonight’s match between Brian Baker and Lleyton Hewitt, two of the more electric baseliners in today’s game.  Hewitt doesn’t have much of a serve, and while Baker can crush his share of aces, he’s rarely consistent enough to shut down his opponent’s return game.

Here’s all the data I could think to generate regarding their return games tonight.

(Seriously, click the link.  I’m only writing this post as an excuse to show off what’s on the other side of that link.)

Here are some tidbits of interest I’ve noted from the data:

  • Hewitt is remarkably consistent, winning about the same number of return points in the deuce and ad courts, and against all types of serves except for those down the T.  (As we saw yesterday, Federer got almost all of his aces down the T, and that is probably true for most players.  Thus, returners will look weak in that category.)
  • Baker didn’t take much advantage of shallow returns.  Hewitt won more than half of the points in which he failed to get the return past Baker’s service line.
  • While Baker did a better job of hitting deep returns (80% past the service line), he wasn’t nearly as successful (winning only 29% of points) when his returns fell in the service box.  That’s probably a credit to Hewitt more than a knock on Baker.
  • Neither player sliced or chipped returns unless they absolutely had to.  Baker sliced less than 10% of his returns, and Hewitt barely 5% of his.
  • Baker loves his down-the-line backhand.  His five down-the-line return winners accounted for half of his total return winners, and they also represent half of his down-the-line returns.

Go look at the tables, let your eyes adjust for a minute, and then tell me if you find anything else interesting.

Duval’s Triumph, Isner’s Breaks, Flushing’s Favorites

17-year-old Victoria Duval, she of six career tour-level matches, upset 2011 champion and 11th seed Samantha Stosur last night.  Leave it at that, and it sounds pretty impressive.

But it doesn’t quite convey how impressive the youngster’s path to the second round has been.  Duval is ranked just inside the top 300–not high enough to get into the qualifying tournament on that basis.  Armed with a wild card, she beat three players, each with considerably more experience than she has.

Reaching a Grand Slam main draw as a qualifying wild card is notable in and of itself.  The only one of this year’s nine qualie WCs to reach the main draw, she’s only the 16th woman to do so at the US Open since 1998 and only the 31st woman to do so at any Slam in that time frame.

As we now know, she didn’t stop there, and that sets her further apart.  Of the 30 women who previously accomplished the feat, only 11 went on to win a match in the main draw.  (Only one of those, Great Britain’s Karen Cross, at Wimbledon in 1997, won two main draw matches.)  And only one of those ladies–Yulia Fedossova, who qualified for the US Open in 2006–beat a seed.  Her victim was the much less imposing 25th seed, Anabel Medina Garrigues.

Every slam has its share of upsets, but this one goes far beyond that.  By beating a former champion and highly-seeded player, Duval did something no woman had done before.

Yesterday was a good day for American men, who went 5-2.  The only victims were Steve Johnson, who struggled with injuries, and junior champion Collin Altamirano, who no one could’ve expected would give Philipp Kohlschreiber much of a fight.

More notable than the simple fact of winning was the manner in which two US men did so.  In the battle of opposites, John Isner defeated Filippo Volandri, 6-0 6-2 6-3, and Donald Young knocked out Martin Klizan, 6-1 6-0 6-1.

Isner set all kinds of personal records in the process.  Not known for his return game–to put it mildly–Isner had never won a bagel set on hard courts.  In fact, until beating Adrian Mannarino in Newport last month, he had never won a set of professional tennis 6-0.

Next, also because of that not-so-pesky return game, Isner tends to lose quite a few games, even when he’s winning.  In best-of-five matches, he had never before won a match without dropping at least nine games.  (That was at the French in 2010, when he beat Andrey Golubev.)  Today he won while giving up only five.

Finally, to reach such a scoreline, Isner broke serve a total of six times.  That’s something else he’s never done before.  He’s broken five times on a handful of occasions, but never six, unless he did so in Davis Cup, for which stats are more difficult to come by.

Still, it seems likely that Klizan played worse than Volandri did.  As you might imagine, Klizan has never lost quite so comprehensively, though he did turn in a similarly abysmal performance in New York three years ago, when he lost to Juan Carlos Ferrero, 6-1 6-3 6-0.

For Young, it was only his third straight-set victory at a slam, regardless of lopsidedness.  And it was only the third time he won a Grand Slam match having earned his way into the main draw.  His other five wins–all at the US Open–came as a wild card.

Since we’re talking about all these Americans winning in New York, it seems like a great time to point you toward Colin Davy’s recent effort to quantify home-court advantage in tennis.  He finds that home-country players–both men and women–have a slight advantage that cannot be explained by other factors, to the tune of about 2%.

In building jrank, I’ve done some work along the same lines, and arrived at a similar number.  (On my old blog, I posted some very crude attempts, not controlling for things like surface, and claimed a much bigger effect.  I don’t think I’ve published the details of my more recent efforts.)

As Colin notes, it’s a small effect compared to other sports. (Isner’s love of the USA notwithstanding.)  To the extent home-court advantage in tennis stems from officiating bias–a common cause in other sports–the increasing use of Hawkeye would seem to lessen the effect.  And oddly, the practice of putting local players on main courts would turn out to be counterproductive.  By putting locals on Hawkeye courts, you’re taking away at least one slight advantage.

Colin also suggests comparing different stages of the tournament, which may reveal that umpires have a greater or lesser bias as the stakes get higher.  That test occurred to me for a different reason.  Travel-related fatigue is a major factor (again, something Colin acknowledges), but it is one that would likely lessen as the tournament goes on.  A player might still be jetlagged for his first-round match, but if he wins a couple of rounds, that effect is likely gone.

It’s an interesting field of study, one that is particularly tricky to separate from others–such as travel effects, surface preferences, venue familiarity, and so on.  As is so often the case in tennis, it is a topic that has been extensively hashed out for other sports, yet barely researched in ours.

In case you missed it yesterday afternoon, I tracked every point of the Federer-Zemlja match, and came up with some very detailed serve breakdowns for each player.  Check it out.

Federer vs Zemlja Serve Profile: In Extreme Detail

As I wrote last week, tennis needs more detailed statistics.  Most of all, we need them in an open format so that researchers can utilize all the data stored for every match.  No use in having Hawkeye cameras on every court if the data stays locked up.

I’m working on a system for charting matches and storing extremely detailed serve and shot information.  It will have to stay under wraps until I get a few more kinks worked out, but in the meantime, I want to show off some of what it can do.

Click here for more exhaustive serve data than you’ve probably ever seen before.

Today’s match wasn’t the most gripping that Roger Federer (or Grega Zemlja) ever played, but there’s still plenty of interesting stuff:

  • Roger won 85% of first-serve points. No surprise there.  More impressively, he won 60% of his first-serve points on or before his second shot.  (That’s “<=3W” in the tables.)
  • Fed went down the T with just under half his first serves (47%), but up-the-middle offerings accounted for 11 of his 12 aces.
  • Zemlja hit a shocking 27 serves into the net–almost half of his faults, and just over 20% of all of the serves he hit today.  (Watching the match, it felt like even more.)
  • Roger’s first serves were somewhat more dominant in the deuce court, as he lost only three first-serve points in that half, and won two-thirds of his first-serve points in the deuce court by his second shot.  In the small amount of data on offer today, he was noticeably weaker with his deuce court second serve, losing 5 of 12 second-serve points in that direction, compared to only 3 of 18 second-serve points to the ad court.
  • Zemlja fared better serving to the ad court today (64% of service points won to 56% in the deuce court), and was particularly deadly when he landed a serve wide in the ad court.  He won seven of the eight points that started that way, five of them with or before his second shot.
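For a sense of how numbers like these fall out of charted points, here is a small sketch.  The record layout is hypothetical (it is not my charting format), and it rests on two assumptions: the serve counts as shot 1, so the server's second shot is shot 3, and the "by his second shot" figure is a share of all first-serve points.

```python
from dataclasses import dataclass


@dataclass
class ServePoint:
    first_serve_in: bool  # True if the point was played on a first serve
    rally_len: int        # shots in the point, counting the serve as shot 1
    server_won: bool


def first_serve_summary(points):
    """Share of first-serve points won, overall and by the server's second shot."""
    firsts = [pt for pt in points if pt.first_serve_in]
    if not firsts:
        return None
    won = [pt for pt in firsts if pt.server_won]
    # Server's second shot is shot 3, so "<=3W" means won with rally_len <= 3.
    quick = [pt for pt in won if pt.rally_len <= 3]
    return {
        "first_serve_points": len(firsts),
        "won_pct": len(won) / len(firsts),
        "won_by_second_shot_pct": len(quick) / len(firsts),
    }


# Made-up points, only to show the shape of the input.
sample = [
    ServePoint(True, 1, True),    # ace
    ServePoint(True, 3, True),    # winner on the server's second shot
    ServePoint(True, 6, True),    # longer rally, server wins
    ServePoint(True, 4, False),   # server loses
    ServePoint(False, 2, True),   # second-serve point, excluded here
]
print(first_serve_summary(sample))
```

The same filter-and-count pattern extends to deuce versus ad court or wide versus T, provided each record also carries the serve direction.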

(If you didn’t click on the link the first time you saw it, now would be a good time.)

You get the idea, I hope.  With this much data, the sifting is as important as the collecting.  There are hundreds of data points we can generate just from tracking each player’s serve performance, and we can expect that most of them won’t have much to tell us.

And, of course, one match is just that–a small sample, fewer than 100 service points for each player.  While we can look at these tables and gain some insight into exactly how Roger was dominant today, it would be a mistake to draw much in the way of broader conclusions.

For that, we’ll need more matches, more data.  We’ll get there.

Contrasting Serves, Futile Slams, and (More) IBM Shortcomings

In most of his matches, John Isner makes his opponents look short and their serves look weak.  What happens, then, when his opponent really is short, with one of the weakest serves in the game?

Third up on grandstand today, Isner takes on Filippo Volandri, the man who sets records Isner will never reach.  Three years ago, the Italian failed to hit a single ace for 19 straight matches.  Volandri may not be as short as some players on tour–the ATP site lists him at six feet–but it’s more common for him to fail to hit an ace in a match than it is for him to hit one.

In the last year, Isner has hit nearly 19% of his first serves for aces, good for best among tour regulars.  In the top 50, the other extreme is represented by Nikolay Davydenko, whose rate is just under 3%.  Volandri–despite playing many weaker opponents on the Challenger tour–sits at 0.8%.

The good news for Big John is that the 31-year-old Volandri is a nonentity on hard courts, having not played on the surface since losing in the first round of the Australian. The bad news? He’ll have to hit a lot of returns today.

As my forecast very delicately predicted, Fernando Verdasco didn’t live up to his seed, losing to the barely-unseeded Ivan Dodig yesterday in five sets.  That’s the fourth slam this year in which he’s lost in a five-setter.

Verdasco, with his flashy talent and underwhelming results, comes in for his share of fan mockery.  But this is one time he doesn’t deserve it.  Out of the several dozen players who enter all four slams each year, almost all will lose four matches.  While it may be frustrating to lose in five, losing in five, all else equal, says better things about your game than losing in three.

One of those five-set losses this year was to Andy Murray at Wimbledon; the other two previous contests were against Janko Tipsarevic and Kevin Anderson.  Perhaps Fernando should have finished off at least one of those matches, but none of his four slam losses this year are nearly as groan-inducing as, say, Ernests Gulbis‘s disaster yesterday against Andreas Haider-Maurer.  And his record is nothing compared to Marinko Matosevic‘s streak of 11 losses in 11 slam appearances.

Verdasco is the sixth man in the Open era to complete this distinctive slam feat, and he’s not in bad company. Last year, Isner did it–and added an exclamation point with a five-set loss in Davis Cup.  Before that, the most recent were Fernando Gonzalez in 2006 and Tim Henman in 2000.

Anyway, if you’re drawn to this unusual feat, don’t miss Steve Johnson‘s first-round match with Tobias Kamke. It’s last on Court 13 today. Johnson is three-quarters of the way to the Fernando slam, losing all three of his matches at majors this year in five sets.  If he completes the set, it will be particularly impressive for at least one man: Kamke has won only two five-setters in his career.

As part of IBM’s ham-handed PR push leading up to another slam, the company gave analyst and coach Craig O’Shannessy some data.  He reported some results on both the ATP site and the New York Times Straight Sets blog.

This is a huge step up from the thinly-veiled advertisement I highlighted yesterday.  But it still, frustratingly, falls short.

One of the major points of Craig’s ATP piece is summarized at the beginning: “Most baseline points are a losing proposition,” and “Approaching the net is a goldmine.”  Later, he continues, “It seems amazing that players don’t venture forward more often to capitalize on the far higher winning percentage approaching offers over baseline play.”

Is this the data-driven, actionable advice I pleaded for last week? Not quite.

As I’m sure Craig would agree, opportunities to come to net aren’t always available, and they don’t arise in a vacuum.  Especially in today’s baseline-focused game, net points tend to occur when one player hits a particularly weak shot.  So if most net points end in victory for the player who approaches, is that because of the choice to come to net, or the weak shot that generated that opportunity?

Think about it probabilistically.  When Djokovic serves against Tsonga, let’s say he has a 75% chance of winning a first serve point.  If Tsonga hits a weak chip return in the middle of the court, allowing Novak to take several steps forward, we could figure that Djokovic’s chance of winning the point increases to 95%–perhaps higher.  When Novak puts away his second shot, he wins the point.  Formally speaking, his chance of winning jumps to 100%.

Now, in that example, what do you credit as the reason for Djokovic winning the point?  Landing a solid first serve, which gives him a 75% chance of winning instead of, say, 60%? A particularly good first serve, which forced the weak return?  Tsonga’s poor return? Or Novak’s “choice” to approach the net?

That final choice is laughable.  And this is the data he’s drawing from.  Aside from a few particularly aggressive players on tour, that’s the profile of a net point in 2013.
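Here is the same argument as a toy calculation.  Every number in it is invented for illustration, and it is built so that approaching the net has no effect at all: the server's chance of winning depends only on whether the previous shot from his opponent was weak.  The function name and parameters are hypothetical.

```python
def net_approach_illusion(p_weak=0.20, p_approach_given_weak=0.70,
                          win_after_weak=0.95, win_after_normal=0.50):
    """Observed win rates with and without a net approach, when approaches
    only ever follow a weak shot and have no effect of their own."""
    # Points on which the player comes in: always preceded by a weak shot.
    approach_share = p_weak * p_approach_given_weak
    # Points on which he stays back: some after weak shots, most after normal ones.
    stay_back_weak = p_weak * (1.0 - p_approach_given_weak)
    stay_back_normal = 1.0 - p_weak

    win_when_approaching = win_after_weak
    win_when_staying_back = (
        stay_back_weak * win_after_weak + stay_back_normal * win_after_normal
    ) / (stay_back_weak + stay_back_normal)
    return approach_share, win_when_approaching, win_when_staying_back


share, at_net, at_baseline = net_approach_illusion()
print(f"approaches on {share:.0%} of points")
print(f"wins {at_net:.0%} when approaching, {at_baseline:.0%} when staying back")
```

Even with zero benefit from the approach itself, the net points look like a goldmine and the baseline points look like a slog.  The gap comes entirely from which points the player chose to come in on.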

So, what’s the actionable advice here?  You probably shouldn’t approach the net without a reasonable opening, so … hit bigger serves to get more weak returns? Hit deep groundstrokes into corners? Take advantage of short balls?

These are the benefits we reap from “Big Data”?

IBM clearly wants to wow us with this stuff.  Yet the “findings” are so elementary as to be useless.  The solution is so simple: release the data, let fans and analysts innovate, and watch the quality of this work go through the roof.

Dodig’s Consistency, IBM’s Offensive, and Hopeless Wild Cards

Ivan Dodig just missed out on a seeding at this year’s US Open.  Ranked 37th when seeds were assigned, he had ascended as high as #35, largely on the strength of his fourth-round showing at Wimbledon.

While the Croatian could have drawn any seed as early as the first round, he got lucky, pulling 27th-seeded Fernando Verdasco.  My forecast underlines his fortune, giving him a 51% chance to advance to the round of 64, then roughly even odds again to make the round of 32 against (probably) Nikolay Davydenko–another player who fell just outside the seed cut.

Making the Dodig-Verdasco comparison more interesting is that in the last 52 weeks, the unseeded player has won more matches (38 to 29) with a higher winning percentage (58% to 56%).  What the Spaniard has done, however, is bunch his wins much more effectively than his first-round opponent.  While Dodig achieved a career highlight with his R16 showing in London, Verdasco made the quarters there.  Fernando also reached the final in Bastad, and earlier in the year, won two matches at the Madrid Masters.

A telling comparison is that while Dodig has lost five opening-round matches in the last year, Verdasco has lost nine.  As Carl Bialik explained two years ago, consistency isn’t such a great thing in tennis.  Certainly, the ATP rankings–and the seedings that utilize them–prefer inconsistency.
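
A quick sketch of why the rankings reward bunched wins.  The points table below is the Grand Slam scale as I recall it for 2013, so treat the exact values as an assumption; the two players are hypothetical, each with four match wins across two Slams.

```python
# Ranking points by number of matches won at a Grand Slam.
# Values are the 2013 ATP scale as recalled; treat them as an assumption.
SLAM_POINTS = {0: 10, 1: 45, 2: 90, 3: 180, 4: 360, 5: 720, 6: 1200, 7: 2000}

def total_points(wins_per_event):
    """Sum ranking points over events, given match wins at each one."""
    return sum(SLAM_POINTS[wins] for wins in wins_per_event)

steady = [2, 2]   # hypothetical: two third-round exits, four wins in all
streaky = [4, 0]  # hypothetical: one quarterfinal, one first-round loss

steady_total = total_points(steady)
streaky_total = total_points(streaky)
print(steady_total, streaky_total)  # prints: 180 370
```

Because the points roughly double with each round, the same four wins are worth about twice as much when they come in a single run.  The early exit that pays for that run costs almost nothing.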

You know there’s a Grand Slam in the offing when the PR pieces from IBM start to appear.  Last week, a particularly bald-faced plant showed up in the New York Times, a publication that–one fervently hopes–should know better.

This particular piece includes such hard-hitting journalism as, “The keys are updated during matches to track any shift in momentum, and they correlate well with the final outcome,” and “These extra features are likely to drive traffic to the event’s Web site, USOpen.org, and its various mobile versions.”

The Times should be embarrassed.  What makes this particularly frustrating to the statistically-oriented fan is that while IBM speaks the right language, the results of this effort to “fulfill fans’ desire for deeper knowledge” are so disappointing.

The much-vaunted Keys to the Match are frequently arbitrary, often bizarre.  In Kei Nishikori’s second-round match at Wimbledon, one of his “Keys” was to “Win between 71 and 89 of winners on the forehand side.”  He didn’t do that–whatever it means, exactly. He didn’t meet the goals set by his two other Keys, either, yet he won the match in straight sets.

Most frustrating to those of us who want actual analysis, the underlying data–to the extent it is available at all–is buried almost beyond the possibility of a fan’s use.  IBM–like Hawkeye–is collecting so much data, yet doing so little with it.

Lots of fans do desire more statistical insight. Much more. The raw material is increasingly collected, yet the deeper knowledge remains elusive.

Stay with me as I leap from one hobby-horse to another.

Wild cards cropped up as a topic of conversation last weekend, largely thanks to Lindsay Gibbs’s piece for Sports on Earth, in which Jose Higueras said, “If it was up to me, there would be no wild cards. Wild cards create entitlement for the kids. I think you should be in the draw if you actually are good enough to get in the draw.”

I don’t object to wild cards used as rewards, like the one that goes to the USTA Boys’ 18s champion, or the ones that the USTA awards based on Challenger performance in a set series of events.  There’s even a place for WCs as a way to get former greats into the draw. James Blake shouldn’t have gotten the deluge of free passes that he has received in the last few years, but it’s probably good for the sport to have him in more top-level events than he strictly deserves.

The problem stems from all the other wild cards, and not just from a player development perspective.  Are fans going to get that much enjoyment out of one or two matches from the likes of Rhyne Williams and Ryan Harrison, Americans who didn’t have a high enough ranking to make the cut?  Of the fourteen Americans in the men’s main draw, six were wild cards, and it would shock no one if those six guys failed to win a single match.

There are further effects, as well.  Because Williams, Harrison, Tim Smyczek, and Brian Baker were exempted from the qualifying tournament, fans seeking quality American tennis last week barely got to see any.  Donald Young–who has received far too many wild cards himself–was the only American to qualify, largely because the US players at the same level as the other would-be qualifiers didn’t have to compete.  The remaining Americans were in over their heads.

This leads me to a great alternative suggested by Juan José Vallejo on Twitter: Be liberal with free passes in qualifying, and take the opportunity to promote those early rounds much more.  At the Citi Open a few weeks ago, the crowds on Saturday and Sunday for qualifying were comparable to those Monday and Tuesday.  Because qualifying often falls on the weekend, the crowds are there.  But if they want to see Jack Sock play, they’ve got to come back Tuesday night (and spend a lot more money), and they’re much more likely to see him overmatched by a better, more experienced player.

Cut the entitlement, improve the quality of main draw play, and give the fans more chances to watch up-and-coming stars.  I wish there were a chance this would happen.