As a type of cross-validation procedure, we fit the Bayesian latent variable model using only first innings data. Although this reduces the size of the dataset (by roughly 50%), it permits us to compare simulated results for the second innings with actual second innings results that were not used in determining the parameter estimates.
The model was fit using WinBUGS software, and posterior means were calculated for the 853 model parameters. Although the WinBUGS program requires two hours of computation, the parameter estimates, once obtained, can be reused as inputs to the simulation program. In Appendix A, we provide the WinBUGS code to emphasize the ease with which WinBUGS facilitates the implementation of latent variable models.
Another advantage of the Bayesian formulation concerns the use of parameter estimates in the simulation program. It is widely believed that the performances of batsmen and bowlers are not constant. For example, batsmen have good days and bad days, and this can be related to their health or any number of other reasons. In a Bayesian formulation, we need not use the same parameter estimates µ(1) and µ(2) (posterior means) for batsmen and bowlers over all matches. Instead, at the beginning of a match, the µ(1) and µ(2) values can be generated from their respective posterior distributions to reflect match-by-match variation in performance.
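The per-match draw described above can be sketched as follows. This is a minimal illustration, not the simulation program itself: it assumes the retained MCMC output is available as a list of joint draws, and the function name `draw_match_parameters` and the numerical values are hypothetical.

```python
import random

def draw_match_parameters(posterior_draws, rng):
    """Return one retained posterior draw to use for a single simulated match."""
    return rng.choice(posterior_draws)

# Hypothetical MCMC output for one player: each tuple is one joint draw of
# (mu1, mu2). These numbers are invented for illustration only.
posterior_draws = [(0.12, -0.30), (0.08, -0.25), (0.15, -0.33), (0.10, -0.28)]

# Option 1: a single fixed estimate (the posterior mean) used in every match.
posterior_mean = tuple(
    sum(draw[k] for draw in posterior_draws) / len(posterior_draws)
    for k in range(2)
)

# Option 2: a fresh draw at the start of each match, so that simulated
# performance varies from match to match.
rng = random.Random(2024)
per_match = [draw_match_parameters(posterior_draws, rng) for _ in range(5)]
```

Sampling whole draws, rather than each parameter separately, preserves any posterior dependence between the parameters.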
There are countless ways that one might test the adequacy of the model. In Table 2.3, we provide the estimated probabilities p_ijwbk for some batsman/bowler combinations at different states of a match. We have also included the expected number of runs per over for each combination. We present batting outcome probabilities when Alastair Cook of England is batting against Glenn McGrath of Australia, and against Nazmul Hossain of Bangladesh. At the beginning of a match (i.e. ball 1, 0 wickets), we observe that Cook scores 0 runs and 4 runs against McGrath with probabilities 0.681 and 0.078 respectively. These probabilities change to 0.626 and 0.100 respectively when Hossain is bowling. These changes are consistent with the general belief that McGrath is a better bowler than Hossain. We then investigate a situation where batsmen ought to become more aggressive (ball 271 when 2 wickets are lost). Indeed, the probability that Cook scores 0 runs decreases substantially to 0.338 and 0.285 depending on whether McGrath or Hossain is the bowler. We also note a curious result concerning the probability of scoring 4 runs. Even though batsmen are more aggressive on ball 271 with 2 wickets lost than at the beginning of a match (ball 1 with 0 wickets), the fielding restriction that is in place at the beginning of a match enables batsmen to score 4’s at a higher rate. This batting behaviour is observed in Table 2.3 and has been verified by looking at empirical data. In Table 2.3, we also investigate the case of ball 271 when 4 wickets are lost, which according to common knowledge should be a less aggressive batting situation than ball 271 when 2 wickets are lost. Accordingly, we observe that the probability of 0s increases and the probability of 1s decreases in the less aggressive situation.
Table 2.3: Batting probabilities p for various states and the expected number of runs per over E(R) where CM denotes the Cook/McGrath matchup and CH denotes the Cook/Hossain matchup.

State of the Match          Dismissal  Zero   One    Two    Three  Four   Six    E(R)
CM (ball 1, 0 wickets)      0.024      0.681  0.165  0.038  0.010  0.078  0.004  3.7
CM (ball 271, 2 wickets)    0.039      0.338  0.452  0.077  0.006  0.069  0.018  6.1
CM (ball 271, 4 wickets)    0.033      0.352  0.435  0.085  0.007  0.071  0.017  6.1
CH (ball 1, 0 wickets)      0.018      0.626  0.191  0.047  0.012  0.100  0.006  4.5
CH (ball 271, 2 wickets)    0.030      0.285  0.472  0.094  0.008  0.088  0.024  7.1
CH (ball 271, 4 wickets)    0.025      0.297  0.454  0.103  0.008  0.091  0.022  7.1
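As a rough check on the E(R) column, the expected runs per over can be approximated as six times the expected runs per ball implied by the outcome probabilities. The sketch below assumes a six-ball over with no extras (wides and no-balls), which the source does not state; the function name `runs_per_over` is hypothetical. Because the tabulated probabilities are rounded to three decimals, the result may differ from the tabulated E(R) in the last digit.

```python
# Runs credited for each batting outcome in Table 2.3.
RUNS = {"dismissal": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "six": 6}

def runs_per_over(probs, balls_per_over=6):
    """Expected runs per over, assuming every ball has the same outcome probabilities."""
    per_ball = sum(RUNS[outcome] * p for outcome, p in probs.items())
    return balls_per_over * per_ball

# Two rows copied from Table 2.3.
cm_ball1 = {"dismissal": 0.024, "zero": 0.681, "one": 0.165, "two": 0.038,
            "three": 0.010, "four": 0.078, "six": 0.004}
ch_ball271_w2 = {"dismissal": 0.030, "zero": 0.285, "one": 0.472, "two": 0.094,
                 "three": 0.008, "four": 0.088, "six": 0.024}

er_cm = runs_per_over(cm_ball1)       # about 3.64, tabulated as 3.7
er_ch = runs_per_over(ch_ball271_w2)  # about 7.08, tabulated as 7.1
```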
Consider again the matchup between the batsman Cook and the bowler Hossain. Suppose that Bangladesh has scored f = 250 runs in the first innings. Suppose further that ball b = 183 is about to be bowled and w = 3 wickets have been lost in the second innings (situation 2). From the Duckworth/Lewis table, we therefore have the proportion of resources lost x = 0.0027 due to the current ball and the proportion of resources lost y = 0.044 due to a wicket. This might be considered a “middle point” of the second innings since the proportion of resources used is R(w, b) = 0.4993, and therefore the proportion of resources remaining is r = 0.5007. In this case, the estimated parameters give E1(p) = 0.9723 and E2(p) = 0.00393. We now investigate the outcome probabilities p′_ijwbk when England has scored s = 127, 90, 60 runs. When s = 127, we have (f − s + 1)/r = 247.7 ≈ E1(p)/E2(p) = 247.5, so England is on pace to draw the match and p′ = p (i.e. no adjustment). When s = 90, Cook should become more aggressive (c = 0.768), and when s = 60, Cook should become even more aggressive (c = 0.648). The entries in Table 2.4 appear reasonable and support these tendencies.
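The pace comparison in this example can be reproduced with a few lines of arithmetic. The sketch below only evaluates the quantity (f − s + 1)/r for the three scores and compares it with the ratio E1(p)/E2(p); it does not compute the adjustment c, whose formula is not given in this passage. The function name `required_ratio` is hypothetical, and reading a larger required ratio as "behind pace" is our interpretation of the text.

```python
def required_ratio(f, s, r):
    """Runs still needed to win, (f - s + 1), scaled by the resources remaining r."""
    return (f - s + 1) / r

f, r = 250, 0.5007               # first innings total and resources remaining
neutral_ratio = 0.9723 / 0.00393 # E1(p) / E2(p) from the rounded values in the text

for s in (127, 90, 60):
    ratio = required_ratio(f, s, r)
    # At s = 127 the two ratios nearly coincide (no adjustment); at s = 90 and
    # s = 60 the required ratio is clearly larger, consistent with more
    # aggressive batting being called for.
    print(f"s = {s:3d}: required = {ratio:6.1f}, neutral = {neutral_ratio:6.1f}")
```

With the rounded inputs above, the neutral ratio evaluates to roughly 247.4 rather than the reported 247.5, presumably because E1(p) and E2(p) are themselves rounded.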
Table 2.4: Second innings batting probabilities p′ and the expected number of runs per over for the Cook/Hossain matchup when Bangladesh has scored f = 250 runs in the first innings. In the second innings, w = 3 wickets have been lost, ball b = 183 is about to be bowled and England has scored s runs.

England Runs (s)  Dismissal  Zero   One    Two    Three  Four   Six    Runs/Over
127               0.016      0.452  0.397  0.058  0.008  0.062  0.008  5.0
90                0.029      0.347  0.466  0.068  0.010  0.072  0.009  5.8
60                0.036      0.293  0.501  0.073  0.010  0.078  0.010  6.3

We now compare actual runs versus simulated runs. For this, we consider the 23 matches between Sri Lanka and India from November 1998 through March 2007 in which Sri Lanka batted first. The 23 matches consist of 15 matches from the original dataset (2001-2006) used for model fitting and 8 matches outside of the training period. We simulate 1000 first innings results for Sri Lanka based on representative batting and bowling orders employed during the time period. The resultant QQ plot comparing the actual runs and the simulated runs is given in Figure 2.2. We observe excellent agreement and we remark that satisfactory plots are also observed for other pairs of teams that we investigated. In comparing wickets taken, the actual results also compare favourably with the simulated results.
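One way to form the points of such a QQ plot is to pair each sorted actual total with the corresponding quantile of the simulated totals. The sketch below is an assumption about the construction, not a description of how Figure 2.2 was produced; the function name `qq_pairs` is hypothetical and the data are synthetic placeholders, not the Sri Lanka/India results.

```python
import random
import statistics

def qq_pairs(actual, simulated):
    """Pair each sorted actual value with the matching quantile of the simulated sample."""
    n = len(actual)
    # quantiles(..., n=n + 1) returns n cut points, one per actual observation.
    sim_quantiles = statistics.quantiles(simulated, n=n + 1, method="inclusive")
    return list(zip(sorted(actual), sim_quantiles))

# Synthetic placeholder data: 23 "actual" totals and 1000 "simulated" totals.
rng = random.Random(1)
actual_runs = [round(rng.gauss(240, 40)) for _ in range(23)]
simulated_runs = [round(rng.gauss(240, 40)) for _ in range(1000)]

pairs = qq_pairs(actual_runs, simulated_runs)
# Points lying close to the line y = x indicate agreement between the distributions.
max_gap = max(abs(a - q) for a, q in pairs)
```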
Figure 2.2: QQ plot corresponding to first innings runs for Sri Lanka batting against India. (Axes: simulated runs and actual runs, each spanning 150 to 300.)

We also investigate the effect of the second innings adjustment. The difficulty in this exercise is that given the number of first innings runs between two teams, replicate observations tend not to occur. Therefore, to address goodness-of-fit, we provide some evidence that the second innings adjustment p′ is an improvement over having the second innings team bat in a neutral fashion (i.e. p′ = p). Consider then simulated matches between Australia and the other 9 ICC teams where Australia is batting in the second innings and where the batting and bowling lineups resemble those used in the 2007 World Cup matches. We generate first innings runs for the other teams, and then second innings runs for Australia with the proposed batting adjustment p′ based on the target scores. The simulation is repeated for 1000 hypothetical matches for each of the 9 teams. We observe that Australia uses its full 50 overs in 8.5% of the simulated matches. The small percentage seems sensible since Australia rarely uses all 50 overs in matches that it wins. In matches that Australia loses, at some point when they are falling behind, they become desperate (aggressive), and typically consume all of their wickets before using the allotted 50 overs. When we repeat the simulation with neutral batting in the second innings (i.e. Australia behaves as it would in the first innings), Australia uses all 50 overs 13% of the time. To get a sense of the percentages using actual data, we look at all 83 matches from 2000 to 2006 where Australia batted second, and observe that Australia used the full 50 overs only 7% of the time. This suggests that there is merit in our modification of aggressiveness in second innings batting.
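To make the "full 50 overs" statistic concrete, the following toy simulator estimates how often an innings lasts all 300 balls. It is a deliberate simplification and not the simulator described in this chapter: the outcome weights are held constant for the whole innings (the model's probabilities depend on batsman, bowler, wickets and ball number), there is no target score or second innings adjustment, and extras are ignored. The weights and the function name `simulate_innings` are hypothetical.

```python
import random

OUTCOMES = ["W", 0, 1, 2, 3, 4, 6]  # "W" denotes a dismissal
# Hypothetical constant weights, chosen only for illustration.
WEIGHTS = [0.025, 0.500, 0.330, 0.060, 0.010, 0.065, 0.010]

def simulate_innings(rng, weights, max_balls=300, max_wickets=10):
    """Simulate one innings ball by ball; return (runs, wickets, balls bowled)."""
    runs = wickets = balls = 0
    while balls < max_balls and wickets < max_wickets:
        outcome = rng.choices(OUTCOMES, weights=weights)[0]
        balls += 1
        if outcome == "W":
            wickets += 1
        else:
            runs += outcome
    return runs, wickets, balls

rng = random.Random(7)
results = [simulate_innings(rng, WEIGHTS) for _ in range(1000)]
# Share of simulated innings that used every one of the 300 balls.
full_overs_share = sum(balls == 300 for _, _, balls in results) / len(results)
```

In a second innings version, the loop would also stop once the target is reached, which is why a winning side often finishes before its overs run out.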
2.6 Addressing questions via simulation
Having developed a simulator for ODI cricket matches, we face no limit on the number and type of questions that may be posed. The simulator is most useful in circumstances where empirical data are limited. In these cases, without a simulator, the best that one can do is to rely on hunches with respect to the questions of interest. In this section, we give a flavour of the types of questions that might be posed. We see these types of applications as being of value not only to cricket devotees but also to selection committees and team strategists. We note that each of the simulations described below requires less than one minute of computation.
2.6.1 Question 1.
Adam Gilchrist is often an opening batsman for Australia, and Australia has not played the West Indies often in recent history. We are interested in the probability of Gilchrist hitting a century as an opening batsman against the West Indies when Australia is batting in the first innings and the West Indies are using a bowling lineup from the 2007 World Cup. Based on 1000 first innings simulations, Gilchrist reaches a century 5.1% of the time. The result appears consistent with Gilchrist’s actual batting performances, where he made a century 8 times as an opening batsman in 138 first innings ODI matches (5.8%) throughout his career (1996-2008).
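The agreement between 5.1% and 5.8% can be judged against the Monte Carlo error of the simulated proportion. The sketch below uses the usual binomial standard error with a normal approximation, which is our choice and not something stated in the source; it also ignores the sampling variability of the empirical rate itself. The function name `mc_standard_error` is hypothetical.

```python
import math

def mc_standard_error(p_hat, n_sims):
    """Binomial standard error of a proportion estimated from n_sims independent simulations."""
    return math.sqrt(p_hat * (1 - p_hat) / n_sims)

p_sim, n_sims = 0.051, 1000  # simulated century rate and number of simulations
se = mc_standard_error(p_sim, n_sims)

# Approximate 95% interval for the simulated rate (normal approximation).
interval = (p_sim - 1.96 * se, p_sim + 1.96 * se)

empirical = 8 / 138          # career rate quoted in the text, about 5.8%
consistent = interval[0] <= empirical <= interval[1]
```

Under these assumptions the standard error is about 0.7 percentage points, so the empirical rate falls inside the approximate interval around the simulated rate.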
2.6.2 Question 2.
England has occasionally sent Alastair Cook and Matt Prior as opening batsmen. In other matches, they used Ian Bell and Michael Vaughan as opening batsmen. We are interested in the performance of the untested opening partnership of Alastair Cook and Ian Bell. More specifically, we consider the length of the partnership (i.e. the number of overs prior to losing the first wicket) for Cook and Bell when they are batting in the first innings against Pakistan, where Pakistan uses a bowling lineup comparable to the lineup used in their December 15, 2005 match against England. In Figure 2.3, we provide a histogram of the number of overs in the length of their partnership based on 1000 simulations. We observe that the median and the mean length of the partnership are 5 overs and 7.1 overs respectively. It appears very unlikely for Cook and Bell to have a partnership exceeding 20 overs.
Figure 2.3: Histogram of the length of the partnership (in overs) of the untested opening partnership of Alastair Cook and Ian Bell. (Horizontal axis: number of overs in partnership; vertical axis: frequency.)

2.6.3 Question 3.
Consider a match between New Zealand and Sri Lanka where Sri Lanka is batting in the second innings and New Zealand has scored an impressive 300 runs in the first innings. We are interested in the probability that Sri Lanka can overcome the barrier and win the match. Based on 1000 simulations, and taking batting/bowling lineups used in the 2007 World Cup matches, we observe that Sri Lanka wins 10.0% of the simulated matches. The result is consistent with Sri Lanka’s first innings performance in the current decade where Sri Lanka has scored over 300 runs in 9 out of 121 matches (i.e. 7.4% of the time).
2.6.4 Question 4.
Muttiah Muralitharan of Sri Lanka is a spin bowler and is widely regarded as one of the best bowlers in cricket. The question arises as to his value to the Sri Lankan team. Consider a match between Sri Lanka and India where India is batting in the first innings and India’s batting lineup is based on their 2007 World Cup team. We use a bowling lineup comparable to Sri Lanka’s bowling lineup in the 2007 World Cup where Muralitharan was a prominent bowler. Based on 1000 simulations, we observe that India scores 247 runs on average. When we make Muralitharan the only Sri Lankan bowler (which is of course against the rules), then India scores only 185 runs on average. Clearly, if every bowler on Sri Lanka were as good as Muralitharan, Sri Lanka would have a much better team. When we instead replace Muralitharan with Upul Chandana (a more typical bowler), then India scores 267 runs on average.
2.6.5 Question 5.
Here is a crazy question that surely few people have contemplated. What would happen if we reverse a team’s batting order? We consider a batting order used by Australia in the 2007 World Cup matches. Based on 1000 simulations against each of the other 9 teams using their 2007 World Cup bowling lineups, Australia produces on average 272 runs, losing 6.3 wickets in 48.8 overs during the first innings. This compares favourably with empirical data over the data collection period where Australia produces on average 273 runs, losing 6.8 wickets in 49.8 overs during the first innings. When we reverse the Australian batting lineup, and simulate 1000 matches against each of the other 9 teams, Australia produces on average 234 runs, losing 7.3 wickets in 48.4 overs during the first innings. A simple explanation for the difference in expected runs is that higher-scoring batsmen tend to be placed at the beginning of the lineup. When the batting order is reversed, they often do not get an opportunity to bat or they bat for shorter periods of time.
2.6.6 Question 6.
We now determine the probability that one team defeats another team. Using typical batting and bowling lineups taken from matches during the 2007 World Cup of Cricket, Table 2.5 provides estimated probabilities based on 10000 simulations between each pair of ODI teams. Accordingly, Australia is clearly the best team, and Bangladesh and Zimbabwe are the weakest teams. The probabilities for the other teams roughly agree with the authors’ beliefs although we note that the probabilities are sensitive to the choice of the batting and bowling lineups. Referring to the row and column averages, we observe that the probability of winning is nearly the same for first and second innings batting. This corresponds to the observations of de Silva and Swartz (1997) and provides a justification for the tuning parameter η introduced at the end of Section 2.4.
Table 2.5: Estimated probabilities of the row team defeating the column team where the row team corresponds to the team batting first. The final column gives the row averages, which correspond to the average probabilities of winning when batting in the first innings. The final row gives the average probabilities of winning when batting in the second innings.

Batting                            Batting Second
First    Aus   Bang  Eng   Ind   NZ    Pak   SA    SL    WI    Zimb  Average
Aus      -     0.88  0.69  0.65  0.72  0.81  0.58  0.64  0.74  0.96  0.74
Bang     0.14  -     0.29  0.23  0.34  0.42  0.19  0.25  0.35  0.77  0.33
Eng      0.29  0.75  -     0.43  0.54  0.61  0.36  0.43  0.54  0.89  0.54
Ind      0.33  0.75  0.54  -     0.57  0.65  0.40  0.46  0.57  0.89  0.57
NZ       0.30  0.81  0.50  0.44  -     0.64  0.40  0.46  0.59  0.92  0.56
Pak      0.16  0.58  0.31  0.27  0.37  -     0.22  0.28  0.37  0.78  0.37
SA       0.44  0.89  0.63  0.57  0.71  0.77  -     0.60  0.72  0.96  0.70
SL       0.37  0.81  0.58  0.52  0.61  0.70  0.46  -     0.63  0.92  0.62
WI       0.23  0.67  0.41  0.34  0.47  0.55  0.30  0.36  -     0.85  0.46
Zimb     0.05  0.33  0.12  0.10  0.15  0.21  0.08  0.12  0.16  -     0.15
Average  0.74  0.28  0.55  0.61  0.50  0.40  0.67  0.60  0.48  0.12
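The marginal averages in Table 2.5 can be recomputed from the body of the table. In the sketch below, the first innings average for a team is the mean of its row, and the second innings average is one minus the mean of its column; the latter assumes that every simulated match has a winner (no ties), which is our assumption. The names `ROWS` and `win_prob` are hypothetical.

```python
TEAMS = ["Aus", "Bang", "Eng", "Ind", "NZ", "Pak", "SA", "SL", "WI", "Zimb"]

# Each row lists P(row team wins batting first) against the nine other teams,
# in TEAMS order with the team itself skipped (values copied from Table 2.5).
ROWS = {
    "Aus":  [0.88, 0.69, 0.65, 0.72, 0.81, 0.58, 0.64, 0.74, 0.96],
    "Bang": [0.14, 0.29, 0.23, 0.34, 0.42, 0.19, 0.25, 0.35, 0.77],
    "Eng":  [0.29, 0.75, 0.43, 0.54, 0.61, 0.36, 0.43, 0.54, 0.89],
    "Ind":  [0.33, 0.75, 0.54, 0.57, 0.65, 0.40, 0.46, 0.57, 0.89],
    "NZ":   [0.30, 0.81, 0.50, 0.44, 0.64, 0.40, 0.46, 0.59, 0.92],
    "Pak":  [0.16, 0.58, 0.31, 0.27, 0.37, 0.22, 0.28, 0.37, 0.78],
    "SA":   [0.44, 0.89, 0.63, 0.57, 0.71, 0.77, 0.60, 0.72, 0.96],
    "SL":   [0.37, 0.81, 0.58, 0.52, 0.61, 0.70, 0.46, 0.63, 0.92],
    "WI":   [0.23, 0.67, 0.41, 0.34, 0.47, 0.55, 0.30, 0.36, 0.85],
    "Zimb": [0.05, 0.33, 0.12, 0.10, 0.15, 0.21, 0.08, 0.12, 0.16],
}

def win_prob(first, second):
    """Probability that `first` (batting first) defeats `second`."""
    opponents = [t for t in TEAMS if t != first]
    return ROWS[first][opponents.index(second)]

# Average probability of winning when batting first (row means).
first_avg = {t: sum(ROWS[t]) / 9 for t in TEAMS}

# Average probability of winning when batting second (one minus column means).
second_avg = {
    t: 1 - sum(win_prob(o, t) for o in TEAMS if o != t) / 9
    for t in TEAMS
}

# Overall averages across teams for the two batting positions.
overall_first = sum(first_avg.values()) / len(TEAMS)
overall_second = sum(second_avg.values()) / len(TEAMS)
```

Under this reading, the two overall averages differ by about one percentage point, in line with the remark that the probability of winning is nearly the same for first and second innings batting.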