{"id":844,"date":"2012-08-06T11:31:45","date_gmt":"2012-08-06T15:31:45","guid":{"rendered":"http:\/\/heavytopspin.com\/?p=844"},"modified":"2012-08-06T11:31:45","modified_gmt":"2012-08-06T15:31:45","slug":"the-tournament-simulation-reference","status":"publish","type":"post","link":"https:\/\/www.tennisabstract.com\/blog\/2012\/08\/06\/the-tournament-simulation-reference\/","title":{"rendered":"The Tournament Simulation Reference"},"content":{"rendered":"<p><a href=\"http:\/\/www.tennisabstract.com\/settesei\/2016\/10\/07\/guida-alle-simulazioni-predittive\/\"><em>Italian translation at settesei.it<\/em><\/a><\/p>\n<p>Among the more popular features of Heavy Topspin are my <a href=\"http:\/\/tennisabstract.com\/blog\/category\/forecasting\/\">tournament forecasts<\/a>, based on draw simulations. \u00a0It&#8217;s about time that I summarize how these work.<\/p>\n<p><strong>Monte Carlo simulations<\/strong><\/p>\n<p>To generate tournament predictions, we first need a way to predict the outcome of individual matches. \u00a0For that, I use jrank, <a title=\"The Official JRank\u00a0Reference\" href=\"http:\/\/tennisabstract.com\/blog\/2012\/05\/28\/the-official-jrank-reference\/\">which I&#8217;ve written about elsewhere<\/a>. \u00a0With numerical estimates of a player&#8217;s skill&#8211;not unlike ATP ranking points&#8211;we can calculate the probability that each player wins the match.<\/p>\n<p>Once those matchup probabilities are calculated, it&#8217;s a matter of &#8220;playing&#8221; the tournament thousands upon thousands of times. \u00a0Here, computers come in awfully handy.<\/p>\n<p>My code (<a href=\"https:\/\/gist.github.com\/781922\">a version of which is publicly available<\/a>) uses a random-number generator (RNG) to determine the winner of each match. 
\u00a0For instance, at the top of the Rogers Cup draw this week, <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=NovakDjokovic\">Novak Djokovic<\/a> gets a bye, after which he&#8217;ll play the winner of <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=BernardTomic\">Bernard Tomic<\/a>&#8217;s match with <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=MichaelBerrer\">Michael Berrer<\/a>. \u00a0My numbers give Tomic a 64% chance of beating Berrer. \u00a0To &#8220;play&#8221; that match in a simulated tournament, the RNG spits out a number between 0 and 1. \u00a0If the result is below .64, Tomic is the winner; if not, Berrer wins.<\/p>\n<p>The winner advances to &#8220;play&#8221; Djokovic. \u00a0The code determines Djokovic&#8217;s probability of beating whoever advances to play him, then generates a new random number to pick the winner. \u00a0Repeat the process 47 times&#8211;once for each match&#8211;and you&#8217;ve simulated the entire tournament.<\/p>\n<p>Each simulation, then, gives us a set of results. \u00a0Perhaps Tomic reaches the second round, losing to Djokovic, who then loses in the quarters to <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=JuanMartinDelPotro\">Juan Martin Del Potro<\/a>, who goes on to win the tournament. \u00a0 That&#8217;s one possibility&#8211;and it&#8217;s more likely than many alternatives&#8211;but it doesn&#8217;t tell the whole story.<\/p>\n<p>That&#8217;s why we do it thousands (or even millions) of times. \u00a0Over that many simulations, Delpo occasionally wins, but somewhat more often, Djokovic wins that quarterfinal showdown. \u00a0Tomic usually reaches the second round, but sometimes it&#8217;s Berrer into the second round. 
\u00a0All of these &#8220;usually&#8217;s&#8221; and &#8220;sometimes&#8217;s&#8221; are converted into percentages based on just how often they occur.<\/p>\n<p><strong>Probability adjustments<\/strong><\/p>\n<p>For any given pair of players, we don&#8217;t always expect the same outcome. \u00a0<a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=PabloAndujar\">Pablo Andujar<\/a> is almost always the underdog on hardcourts, but we expect him to beat most mid-packers on clay. \u00a0Players perform (a bit) better in their home country. \u00a0Qualifiers do worse than equivalent players who didn&#8217;t have to qualify.<\/p>\n<p>Thus, if we take last week&#8217;s Washington field and transplant it to the clay courts of Vina Del Mar, the numbers would change a great deal. \u00a0Americans and hard-court specialists would see their chances decrease, while Chileans and clay-courters would see theirs increase&#8211;just as conventional wisdom suggests would happen.<\/p>\n<p><strong>Simulation variations: Draw-independence<\/strong><\/p>\n<p>Some of the more interesting results come from messing around with the draw. \u00a0Every time a field is arranged into a bracket, there are winners and losers. \u00a0Whoever is drawn to face the top seed in the first round (or second, as Berrer and Tomic can attest) is probably unlucky, while somewhere else in the draw, a couple of lucky qualifiers get to play each other for a spot in the second round.<\/p>\n<p>That&#8217;s one of the reasons I sometimes run draw-independent simulations (DIS). \u00a0If we want to know how much the draw helped or hurt a player, we need to know how successful he was likely to be\u00a0<em>before<\/em>\u00a0he was placed in the draw. \u00a0(DISs are also handy if you know the likely field, but the draw isn&#8217;t yet set.)<\/p>\n<p>To run a draw-independent sim, we have to start one step earlier. 
\u00a0Instead of taking the draw as a given, we take the\u00a0<em>field<\/em> as a given, including the seedings if we know them. \u00a0Then we use the same logic as tournament officials will use in constructing the draw. \u00a0The #1 seed goes at the top, #2 at the bottom. \u00a0#3 and #4 are randomly placed in the remaining quarters. \u00a0#5 through #8 are randomly placed in the remaining eighths, and so on.<\/p>\n<p><em>(Update: I&#8217;ve published a python function, <a href=\"https:\/\/gist.github.com\/3609419\">reseeder()<\/a>, which generates random draws for any combination of number of seeds and field size that occurs on the ATP tour.)<\/em><\/p>\n<p><strong>Simulation variations: Seed-independence<\/strong><\/p>\n<p>We can take this even further to measure the beneficial effect of seeding. \u00a0Most of the time we take seeding for granted&#8211;we want the top two players in the world to clash only in the final, and so on. \u00a0But it can have a serious effect on a player&#8217;s chances of winning a tournament. \u00a0In Toronto this week, the top 16 seeds (along with, in all likelihood, a very lucky loser or two) <a href=\"http:\/\/tennisabstract.com\/blog\/2012\/07\/27\/who-benefits-from-byes\/\">get a bye straight into the second round<\/a>. \u00a0That helps!<\/p>\n<p>Even when there are no byes, seedings guarantee relatively easy matches for the first couple of rounds. \u00a0That may not make a huge difference for someone like Djokovic&#8211;he&#8217;ll cruise whether he draws a seeded <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=FlorianMayer\">Florian Mayer<\/a> or an unseeded <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=JeremyChardy\">Jeremy Chardy<\/a>. \u00a0But if you are Mayer, consider the benefits. 
\u00a0You&#8217;re barely better than some unseeded players, but you&#8217;re guaranteed to miss the big guns until the third round.<\/p>\n<p>This is why we talk so much about getting into the top 32 in time for slams. \u00a0When the big points and big money are on the line, you want those easy opening matches even more than usual. \u00a0There isn&#8217;t much separating <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=KevinAnderson\">Kevin Anderson<\/a> from <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=SamQuerrey\">Sam Querrey<\/a>, but if the US Open draw were held today, Anderson would get a seed and Querrey wouldn&#8217;t. \u00a0Guess who we&#8217;d be more likely to see in the third round!<\/p>\n<p>To run a seed-independent simulation: Instead of generating a logical draw, as we do with a DIS, generate a random draw, in which anyone can face anyone in the first round.<\/p>\n<p><strong>Measuring variations<\/strong><\/p>\n<p>If we compare forecasts based on the actual draw to draw-independent or seed-independent forecasts, we want to quantify the difference. \u00a0To do so, I&#8217;ve used two metrics: Expected Ranking Points (ERP) and Expected Prize Money (EPM).<\/p>\n<p>Both reduce an entire tournament&#8217;s worth of forecasts to one number per player. \u00a0If Djokovic has a 30% chance of winning this week in Toronto, that&#8217;s the probability he&#8217;ll take home 1,000 points. \u00a0If those were the only points on offer, his ERP would be 30% of 1,000, or 300.<\/p>\n<p>Of course, if Djokovic loses, he&#8217;ll still get some points. \u00a0To come up with his overall ERP, we consider his probability of losing the finals and the number of points awarded to the losing finalist, his probability of losing in the semis and the number of points awarded to semifinalists, and so on. 
\u00a0To calculate EPM, we use the same process, but with&#8211;you guessed it&#8211;prize money instead of ranking points.<\/p>\n<p>Both numbers allow us to see how much the draw helps or hurts a player. \u00a0For instance, before the French Open, I calculated that <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=RichardGasquet\">Richard Gasquet<\/a>&#8217;s EPM <a href=\"http:\/\/tennisabstract.com\/blog\/2012\/05\/25\/the-luck-of-the-2012-french-open-draw\/\">rose by approximately 25% thanks to a very lucky draw<\/a>.<\/p>\n<p>These numbers also help us analyze a player&#8217;s scheduling choices. \u00a0The very strong Olympics field and the much weaker Washington field last week created an odd situation: Lesser players were able to rack up far more points than their more accomplished colleagues. Even before the tournament, we could use the ERP\/EPM approach to see that <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=MardyFish\">Mardy Fish<\/a> <a title=\"Why More Players Should Have Skipped the\u00a0Olympics\" href=\"http:\/\/tennisabstract.com\/blog\/2012\/07\/20\/why-more-players-should-have-skipped-the-olympics\/\">could expect 177 points in Washington<\/a> while the far superior <a href=\"http:\/\/www.tennisabstract.com\/cgi-bin\/player.cgi?p=DavidFerrer\">David Ferrer<\/a> could expect only 159 in London.<\/p>\n<p>If you&#8217;ve read this far, you will probably enjoy the newest feature on <a href=\"http:\/\/tennisabstract.com\/\">TennisAbstract.com<\/a>&#8211;live-ish forecast updates for all ATP events. \u00a0Find links on the <a href=\"http:\/\/tennisabstract.com\/\">TA.com homepage<\/a>, or <a href=\"http:\/\/www.tennisabstract.com\/current\/2012RogersCup.html\">click straight to the Rogers Cup page<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Italian translation at settesei.it Among the more popular features of Heavy Topspin are my tournament forecasts, based on draw simulations. 
\u00a0It&#8217;s about time that I summarize how these work. Monte Carlo simulations To generate tournament predictions, we first need a way to predict the outcome of individual matches. \u00a0For that, I use jrank, which I&#8217;ve &hellip; <a href=\"https:\/\/www.tennisabstract.com\/blog\/2012\/08\/06\/the-tournament-simulation-reference\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The Tournament Simulation Reference<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[40,94],"tags":[],"class_list":["post-844","post","type-post","status-publish","format-standard","hentry","category-forecasting","category-reference"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/posts\/844","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/comments?post=844"}],"version-history":[{"count":0,"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/posts\/844\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/media?parent=844"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tennisabstract.com\/blog
\/wp-json\/wp\/v2\/categories?post=844"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tennisabstract.com\/blog\/wp-json\/wp\/v2\/tags?post=844"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}