Nine Degrees of Spencer Gore

In many ways, the early days of tennis seem impossibly ancient. It was a time of long skirts, wooden rackets, and underhand serves that were in no way tactical. Sometimes, though, the century and a half of lawn tennis feels like almost nothing.

After stumbling across a mention of a 1951 professional match between Bill Tilden and Pancho Gonzales, I took to Twitter:

The path from Tilden to Federer–or one of many other active players–requires only three intermediate steps.

If we expand the types of links we’re willing to consider, the connections are almost overwhelming. From the 1931 men’s champion of Black tennis, forbidden from entering the US National Championships, you can get to Svetlana Kuznetsova in only three steps:

If we stick to women’s singles, the paths are a bit longer, because fewer women played for as long as the likes of Tilden and Gonzales, especially in the amateur era. Yet it still only takes five steps to travel from 1908 US champion Maud Barger-Wallach to Venus Williams:

If you’ve ever played Six Degrees of Kevin Bacon, you know how addicting this kind of thing can be. And you can guess how productive I was at work today while mulling the kinds of paths that can be constructed between early tennis and the present.

But wait, there’s math!

Is my path from Tilden to Federer the optimal one? Could we construct a smaller set of connections between Barger-Wallach and Venus Williams? Like many pursuits that start out as time-wasters, this is a math problem that we can solve.

In a different domain, the Oracle of Bacon offers just that sort of solution, calculating the shortest path between Kevin Bacon and any actor, where each step is a film that “connects” any pair of cast members. For example, Serena Williams has a “Bacon number” of 3:

Academics have “Erdős numbers,” and you can see how baseball players are connected with the Oracle of Baseball at baseball-reference.com.

These solutions come from the field of graph theory, which includes many algorithms that address this sort of problem. (As well as real problems that are relevant to the real world.) Checking every possible path between actors, academics, or baseball players is extremely computationally intensive, so different techniques take varying approaches to trimming the number of paths worth investigating. One of these algorithms, breadth-first search, is efficient enough that it can identify the shortest route from a half-million tennis matches on my laptop in a few seconds.
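As a rough illustration of how breadth-first search finds a shortest chain of opponents, here is a minimal sketch. It is not the code behind the Oracle: the function names `build_graph` and `shortest_path` are made up for this example, and the toy match list holds only the head-to-head links from the Gore-to-Djokovic chain in this post, where the real dataset has around half a million matches.

```python
from collections import deque

# Toy data: only the head-to-head links named in the Gore-to-Djokovic chain.
MATCHES = [
    ("Spencer Gore", "Montague Hankey"),
    ("Montague Hankey", "Charles Lacy Sweet"),
    ("Charles Lacy Sweet", "George Lawrence Orme"),
    ("George Lawrence Orme", "Max Decugis"),
    ("Max Decugis", "Coco Gentien"),
    ("Coco Gentien", "Pancho Gonzales"),
    ("Pancho Gonzales", "Jimmy Connors"),
    ("Jimmy Connors", "Fabrice Santoro"),
    ("Fabrice Santoro", "Novak Djokovic"),
]


def build_graph(matches):
    """Map each player to the set of opponents he or she has faced."""
    graph = {}
    for a, b in matches:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return one shortest list of players from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # also serves as the "already visited" set
    queue = deque([start])
    while queue:
        player = queue.popleft()
        if player == goal:
            # Walk the chain of predecessors back to the start.
            path = []
            while player is not None:
                path.append(player)
                player = previous[player]
            return path[::-1]
        for opponent in graph[player]:
            if opponent not in previous:
                previous[opponent] = player
                queue.append(opponent)
    return None
```

Because the search expands outward one match at a time, the first time it reaches the target it has found a path with the fewest possible steps. Each player is visited at most once, which is why it stays fast on a large graph.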

Gore to Djokovic

Let’s see what this Oracle of Tennis can tell us. The first Wimbledon champion, in 1877, was Spencer Gore. He was no Pancho–he played The Championships only one more time. The Oracle will have some work to do to get from Gore’s corner of the graph to the modern era.

It turns out that the shortest path from Gore to Novak Djokovic–the first Wimbledon winner to the reigning titleholder–takes nine steps:

Spencer Gore vs Montague Hankey (1877 Wimbledon)

Hankey vs Charles Lacy Sweet (1883 Cirencester Park)

Sweet vs George Lawrence Orme (1884 Sussex County)

Orme vs Max Decugis (1901 French Covered)

Decugis vs Coco Gentien (1924 Coupe de Noel)

Gentien vs Pancho Gonzales (1949 Roland Garros)

Gonzales vs Jimmy Connors (1971-73, 4 meetings)

Connors vs Fabrice Santoro (1992 Vienna)

Santoro vs Novak Djokovic (2007-08, 2 meetings)

That isn’t the only nine-step path from Gore to Djokovic, but there are none shorter. Many of the most efficient routes involve the same players. Gore didn’t give us many opponents to choose from, so the relatively(!) long career of Montague Hankey is a common first step. And the final sequence of Pancho-to-Connors-to-Santoro-to-Djokovic (and many other present-day stars) is tough to beat.

Sutton to Raducanu

Historical women’s tennis data isn’t in quite as good shape as men’s–yet. Thanks to TennisArchives.com, we can scan hundreds of thousands of men’s results from the amateur years in addition to the usual Open Era records. I’ve pushed my dataset of historical women’s results back to 1917–a huge improvement over the state of affairs a year ago, but missing the first few decades of tournaments.

We can still reach quite far back. Two-time Wimbledon champ and winner of the 1904 US National Championships, May Sutton Bundy was part of a Southern California tennis dynasty and one of the greats of her era. After giving birth to four kids in the 1910s, she returned to competitive tennis and won singles titles as late as 1928.

So even though we don’t yet have her entire career record in the database, we can use the Oracle to link her to the present. It takes only seven steps to get from Sutton to 2021 US Open champ Emma Raducanu:

May Sutton Bundy vs Marion Zinderstein Jessup (1921 Seabright)

Jessup vs Betty Rosenquest Pratt (1943 Wilmington)

Pratt vs Christine Truman (1957-59, 3 meetings)

Truman vs Martina Navratilova (1973 Wimbledon)

Navratilova vs Ai Sugiyama (1993 Tokyo)

Sugiyama vs Stefanie Voegele (2006 Fed Cup)

Voegele vs Emma Raducanu (2021 US Open)

I don’t know what else to add–this was a weird day.

The Underserved First Point

Not all points are created equal. Ask around, and you’ll get a variety of opinions as to which points are most important. Break points, obviously, are key. Pundits are fond of 15-30.

Then there’s the first point of the game. It’s been conventional wisdom for a long time that the opening point holds disproportionate weight. In a previous study, I disproved that. Of course it’s valuable to move from 0-0 to 15-0, and no one likes to start a game by dropping to 0-15. But the first point doesn’t have any magical effect on the outcome of the game beyond simply adding to one or the other player’s tally.

Yet here I am, talking about the first point again. While there still isn’t any magic, the first point is going to the returner too often. With a slight change in tactics or focus, this is a rare analytical insight that pros may be able to use to win a few more service games.

Point by point

The balance between the server and returner varies a great deal depending on the point score. In men’s singles matches at the US Open between 2019 and 2021, servers won 63.6% of points in non-tiebreak games. Yet at 40-love, the server won 67.7%, and at ad-out, the server won only 59.6%.

The point scores that generated such extremes hint at what’s going on here. If a game has reached 40-love, the server is probably a good one. It’s not always the case, but if you look at all the 40-love games in a large dataset, you’ll get far more John Isner holds than Benoit Paire holds. The opposite applies to ad-out, a score that Isner rarely faces. Thus, the difference in point-by-point serve percentage isn’t (entirely) because of the point score–it’s because of the servers who get there.

Other differences are more prosaic. On average, servers win more deuce-court points than ad-court points. In the same three-year dataset, the difference was 64.2% to 62.9%. There’s no selection bias component here. The typical ATPer is simply stronger in that direction. Some players–particularly left-handers–break the mold, but most will favor the deuce side. Both Novak Djokovic and Roger Federer, for instance, win nearly two percentage points more often when serving to that court.

Unbiasing

Because scores like 40-love and ad-out aren’t randomly distributed among servers, we need to do a bit more work to figure out which scores really do favor the server. The trick here is to compare each service point to the rest of the server’s points in the same match. A point like 40-love has a ton of Isners and Opelkas in it, so we’ll end up comparing it to a lot of other Isner and Opelka points. And in fact, the average player who reaches 40-love wins 65.0% of their service points overall and 64.3% in the ad court (where the 40-love point is played), two numbers that are well above average.

Working through the same exercise for every point score gives us a list of “actual” serve points won, “expected” serve points won, and differences. The “actual” column tells us what really happened at that score, bias and all; “expected” tells us how often that particular set of players won service points during the entire matches in question; and the difference gives us a first look at where servers are over- or under-performing.
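The actual-versus-expected comparison can be sketched in a few lines. This is my reading of the method, not the original code: the record layout `(match, server, score, court, won)` and the function name `score_differences` are hypothetical, the sample rows are invented purely to exercise the function, and for simplicity the baseline uses all of the server's points in that match and court rather than excluding the point being evaluated.

```python
from collections import defaultdict

# Invented sample records: (match id, server, point score, court, server won?)
SAMPLE_POINTS = [
    ("m1", "A", "0-0", "deuce", True),
    ("m1", "A", "0-0", "deuce", False),
    ("m1", "A", "15-0", "ad", True),
    ("m1", "A", "0-15", "ad", True),
    ("m1", "B", "0-0", "deuce", False),
    ("m1", "B", "0-15", "ad", False),
    ("m1", "B", "0-0", "deuce", True),
    ("m1", "B", "15-0", "ad", True),
]


def score_differences(points):
    """Return {score: (actual, expected, difference)} as fractions."""
    # Baseline: each server's win rate per match and court (deuce/ad).
    baseline = defaultdict(lambda: [0, 0])  # key -> [points won, points played]
    for match, server, _score, court, won in points:
        tally = baseline[(match, server, court)]
        tally[0] += won
        tally[1] += 1

    # Per score: sum of actual wins, sum of baseline rates, and point count.
    by_score = defaultdict(lambda: [0, 0.0, 0])
    for match, server, score, court, won in points:
        wins, played = baseline[(match, server, court)]
        row = by_score[score]
        row[0] += won
        row[1] += wins / played
        row[2] += 1

    return {
        score: (actual / n, expected / n, actual / n - expected / n)
        for score, (actual, expected, n) in by_score.items()
    }
```

A positive difference means servers did better at that score than those same servers did in those same matches overall. A negative one means they did worse.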

The following table shows these numbers for each point score:

Score  Actual  Expected  Difference  
40-AD   59.6%     61.4%       -1.8%  
0-0     63.3%     64.6%       -1.3%  
15-0    62.7%     63.3%       -0.6%  
40-30   61.6%     62.2%       -0.6%  
15-30   62.3%     62.7%       -0.4%  
30-0    64.7%     65.1%       -0.3%  
40-40   62.6%     62.8%       -0.1%  
0-15    63.2%     63.3%       -0.1%  
                                     
Score  Actual  Expected  Difference  
40-15   64.6%     64.5%        0.0%  
30-15   62.8%     62.7%        0.1%  
AD-40   61.6%     61.4%        0.2%  
30-30   64.0%     63.6%        0.4%  
0-30    65.9%     65.2%        0.8%  
15-15   64.8%     64.0%        0.8%  
30-40   63.6%     62.2%        1.4%  
0-40    66.1%     64.7%        1.4%  
15-40   66.9%     64.5%        2.4%  
40-0    67.7%     64.3%        3.4%

The scores at the top of the table are the ones where servers win fewer points than we would expect. At the bottom of the list are those where the server seems to overperform.

Some of the results lend themselves to easy narratives. Servers really focus at 0-40 and 15-40, while returners know they have more break chances coming. 40-AD (ad-out) seems like a stressful time to serve, and the numbers back that up. Other results are a bit more baffling–shouldn’t 30-30 and 40-40 be the same, since they are logically equivalent? Why are servers performing so well at 30-40 if they ultimately struggle at 40-AD?

And to today’s topic: What about the first point? It ranks second only to 40-AD in how much the server underperforms, despite no obvious reason why it should lean one way or the other.

Second to none

When we consider a few more factors, this first-point underperformance has an even greater impact.

One useful way to measure the importance of a point is with win probability. Given any point score (or set/game/point score), combined with the likelihood that the server will win any given point, you can calculate the probability of a hold (or a match victory). If we assume that the server wins 64.2% of points, he’ll hold 81.6% of the time, so his win probability at the beginning of the game is 81.6%.

* 64.2% was the rate in non-tiebreak games at the 2021 US Open, while the overall rate for this 2019-21 dataset is a bit lower.
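For readers who want to check the 81.6% figure, here is the standard closed-form hold probability. It assumes, as a simplification, that every point is independent and that the server wins each one with the same probability $p$ (with $q = 1 - p$). The symbols are mine, not notation from the original analysis.

```latex
% Win to love, 15, or 30, plus reaching deuce (20 orderings of 3-3) and winning from there
P(\text{hold}) = p^4\left(1 + 4q + 10q^2\right) + 20\,p^3 q^3 \cdot \frac{p^2}{p^2 + q^2}

% With p = 0.642 and q = 0.358:
P(\text{hold}) \approx 0.631 + 0.243 \times 0.763 \approx 0.816
```

The last factor, $p^2 / (p^2 + q^2) \approx 0.763$, is the server's chance of winning the game from deuce.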

The next concept is volatility. A point’s volatility is determined by how much the result could swing the win probability. By winning the first point, the server’s win probability rises to 89.7%, the figure for such a server at 15-love. If he loses, it falls to 67.2%. The difference–22.5%–tells us how much is at stake in that single point.

In volatility terms, the first point isn’t particularly crucial. A 22.5% swing far outstrips, say, the 9.3% volatility at 30-love, but it pales next to the 76.3% volatility at 30-40. When the server faces break point, one swing of the racket can determine whether win probability drops to zero (because he loses the game), or bounces back north of 50% (because he gets back to deuce).
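The win probability and volatility figures above can be reproduced with a short recursive calculation. This is a sketch under the same simplifying assumption that every point is won independently with a fixed probability `p`. The function names `win_prob` and `volatility` are illustrative, and scores are counted in points won (0-0 is `(0, 0)`, 30-40 is `(2, 3)`).

```python
def win_prob(server_pts, returner_pts, p):
    """Probability the server holds from the given point score."""
    q = 1.0 - p
    # Game over: four or more points and a two-point lead.
    if server_pts >= 4 and server_pts - returner_pts >= 2:
        return 1.0
    if returner_pts >= 4 and returner_pts - server_pts >= 2:
        return 0.0
    # Deuce and advantage scores use the closed-form deuce probability.
    if server_pts >= 3 and returner_pts >= 3:
        deuce = p * p / (p * p + q * q)
        if server_pts == returner_pts:
            return deuce
        if server_pts > returner_pts:
            return p + q * deuce  # ad-in
        return p * deuce          # ad-out
    # Otherwise, weight the two possible next scores.
    return (p * win_prob(server_pts + 1, returner_pts, p)
            + q * win_prob(server_pts, returner_pts + 1, p))


def volatility(server_pts, returner_pts, p):
    """Swing in hold probability between winning and losing this point."""
    return (win_prob(server_pts + 1, returner_pts, p)
            - win_prob(server_pts, returner_pts + 1, p))


P_SERVE = 0.642  # the 2021 US Open rate quoted above
hold = win_prob(0, 0, P_SERVE)
first_point_swing = volatility(0, 0, P_SERVE)
break_point_swing = volatility(2, 3, P_SERVE)  # 30-40
```

With `p = 0.642`, this gives a hold probability of about 81.6%, a first-point swing of about 22.5%, and a 30-40 swing of about 76.3%. A few other scores may differ from the quoted figures in the last decimal place, presumably because of rounding in the inputs.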

What the first point of the game gives up in volatility, it wins back in volume. The stakes are never higher than at 40-AD, but at the US Open in the last few years, barely one-fifth of games ever get that far. By contrast, there’s a love-love kickoff in every single game.

By combining volatility and volume with the degree to which servers under- or over-perform, we can put together a top-level view of what players are gaining or losing at each point score.

Multipliers gone wild

In a tour de force of mathematical derring-do, I’m going to take these three numbers and multiply them together.

The “difference” from the previous table tells us how much better or worse players are serving at a specific point score, compared to their overall performance. If two differences are similar, the one that matters more is the one with higher volatility, right? So we multiply by volatility. And all else equal, the more often a situation occurs, the greater its impact on the end result. So we multiply by the number of occurrences in the dataset.

The final tally is volatility * occurrences * difference, cleverly dubbed “V*O*D” in the table below. The product of three percentages is tiny, so I’ve multiplied those figures by 10,000 to make the results easier to read.
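The arithmetic is simple enough to sketch directly. The helper name `vod` is made up for this example, and the inputs are a handful of rounded percentages from the tables in this post. Because the published figures were presumably computed from unrounded values, results from these inputs can differ from the table by a few tenths.

```python
def vod(volatility, occurrences, difference, scale=10_000):
    """Volatility * occurrences * difference, scaled up for readability.

    All three inputs are fractions (0.225 means 22.5%).
    """
    return volatility * occurrences * difference * scale


# (volatility, occurrences, difference) for a few scores, rounded as published.
ROWS = {
    "40-AD": (0.763, 0.22, -0.018),
    "0-0": (0.225, 1.00, -0.013),
    "30-40": (0.763, 0.25, 0.014),
    "15-40": (0.490, 0.24, 0.024),
}

scores = {score: vod(*values) for score, values in ROWS.items()}
ranked = sorted(scores, key=scores.get)  # most negative (worst for servers) first
```

From these rounded inputs, 0-0 comes out at about -29.2 and 40-AD at about -30.2 (the table, working from unrounded values, shows -29.9). Either way, the two scores sit side by side at the bottom.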

Here are the results:

Score  Volatility  Occurrences  Difference  V*O*D  
40-AD       76.3%          22%       -1.8%  -29.9  
0-0         22.5%         100%       -1.3%  -29.2  
15-30       44.9%          34%       -0.4%   -5.8  
15-0        16.5%          50%       -0.6%   -4.9  
40-30       23.8%          26%       -0.6%   -3.6  
40-40       42.5%          43%       -0.1%   -2.6  
0-15        33.2%          50%       -0.1%   -2.3  
30-0         9.3%          27%       -0.3%   -0.9  
                                                   
Score  Volatility  Occurrences  Difference  V*O*D  
40-15        8.5%          24%        0.0%    0.1  
30-15       20.7%          34%        0.1%    0.6  
AD-40       23.8%          22%        0.2%    1.1  
40-0         3.0%          16%        3.4%    1.7  
30-30       42.5%          32%        0.4%    5.9  
0-40        31.4%          16%        1.4%    7.1  
0-30        40.0%          27%        0.8%    8.2  
15-15       29.4%          46%        0.8%   11.0  
30-40       76.3%          25%        1.4%   26.3  
15-40       49.0%          24%        2.4%   28.2

With all factors taken into account, we see that servers are giving up about as much on the first point of the game as they are when faced with nerves at 40-AD. Two point scores also stick out at the other end of the spectrum, where 30-40 puzzlingly continues to be a time when servers find their best stuff.

Exploiting the mundane

The exact V*O*D numbers are far (far!) from natural laws, but when I ran the same algorithm on data from other grand slams, the contours were nearly the same. In the 2017 and 2018 US Opens, for instance, 40-AD and 0-0 were again the standout “underperforming” points, and 0-0 was the one that topped the list.

* I took a rudimentary look at this topic very early in the blog’s history, using data from 2011. 0-0 didn’t stick out to the same degree, but I didn’t control for the deuce/ad difference, as I have today. When accounting for deuce-court strength, 0-0 performance looks relatively worse.

All of which is to say: I can’t explain why this is a thing, but it sure looks like it’s a thing. And if it’s a thing, it looks like an opportunity for savvy players and coaches.

I’m perfectly happy to accept that servers struggle to maintain their focus (and perhaps their ability to surprise) at 40-AD. More importantly, I’m sure that players and coaches are very aware of the necessary mental gymnastics so deep in a game.

On the other hand, there’s no good reason that servers should underperform at the start of every game. In fact, I’d be more ready to accept the idea that servers would have the edge. The opponent hasn’t seen a serve for a few minutes (or more), and the server’s arm is (relatively) fresh. While it’s not a recipe for domination, it sounds like a recipe for a tiny edge that the server can build on.

That’s why I believe there’s something to be exploited here. Perhaps players–or at least some of them–are taking a bit off their first-point first serves, using the opening salvo as a mini-warmup. Maybe they are more willing to hit their second-best serve, or aim to the returner’s stronger side, as a tactical move to set up more effective serves later in the game. As I’ve said, I don’t know why the numbers are turning up this underperformance, but it’s clear there’s a gap to be closed.

There’s no magic in the first point, but there’s an awful lot of value. Players who serve up their best stuff at the beginning of the game are getting an edge that their peers ought to be developing, too.