A Strategy to Predict Association Football Players' Passing Skills
Jorge Tovar
Andrés Clavijo
Julián Cárdenas
Documentos CEDE
No. 63
Serie Documentos CEDE, 2017-63
ISSN 1657-7191 (electronic edition)
November 2017
© 2017, Universidad de los Andes, Facultad de Economía, CEDE. Calle 19A No. 1 – 37 Este, Bloque W.
Bogotá, D. C., Colombia. Telephone: 3394949, 3394999; extensions 2400, 2049, 2467
infocede@uniandes.edu.co http://economia.uniandes.edu.co
Impreso en Colombia – Printed in Colombia
The CEDE Working Papers series (Documentos de Trabajo CEDE) is circulated for discussion and dissemination purposes. The articles have not been peer reviewed or subjected to any formal evaluation by the CEDE staff. The content of this publication is protected by the international and national intellectual property laws in force; therefore its use, reproduction, public communication, transformation, distribution, rental, public lending, and importation, in whole or in part, in printed, digital, or any other format known or yet to be known, are prohibited and will be lawful only with the prior, express, written authorization of the author or rights holder. The limitations and exceptions to copyright will apply only insofar as they fall within so-called fair use (Usos Honrados), are previously and expressly established, do not cause serious and unjustified harm to the legitimate interests of the author or rights holder, and do not conflict with the normal exploitation of the work.
Universidad de los Andes | Vigilada Mineducación
A strategy to predict association football players’ passing skills
Jorge Tovar*
Andrés Clavijo†
Julián Cárdenas‡
Abstract
Transfers are big business in association football. This paper develops a
generalized additive mixed model that aids managers in predicting how
a football player is expected to perform in a new team. It does so by
using event‐level data from the Spanish and the Colombian football
leagues. Using passes as a performance proxy, the model exploits the
richness of the data to account for the difficulty of each pass attempt
performed by each player over an entire season. The model estimates
are then used to determine how a player transferred from the
Colombian league should perform in the Spanish league, taking into
account that teammates’ and rivals’ abilities are different in the latter.
Keywords: Generalized additive mixed models, football, sports forecasting, passing.
JEL: C53, Z21, Z22
* Associate professor, economics department, Universidad de Los Andes (Bogotá – Colombia). Email: jtovar@uniandes.edu.co. Website: http://economia.uniandes.edu.co/tovar
A Strategy to Predict the Passing Skill of Football Players
Jorge Tovar*
Andrés Clavijo†
Julián Cárdenas‡
Resumen (Abstract)
Player transfers are one of the largest items in the
multimillion-dollar football business. This paper develops a
generalized additive mixed model to predict how a football player
is expected to perform in a new team. Based on event-level data
from the Spanish and Colombian leagues, passes are used as a
proxy for performance. The model exploits the richness of the
data to control for the difficulty of each pass a player makes
over the course of a season. Once the model is estimated, it is
used to establish how a player transferred from Colombia to
Spain should perform, taking into account that he will find
new teammates and rivals there.
Keywords: Generalized additive mixed models, football, sports forecasting, passes.
JEL: C53, Z21, Z22.
* Associate professor, economics department, Universidad de Los Andes (Bogotá – Colombia). Email:
jtovar@uniandes.edu.co. Website: http://economia.uniandes.edu.co/tovar
Introduction
Transfers are a big business in association football.1 According to the FIFA 2017 Global Transfer
Market Report, in 2016 there were 14,591 international transfers for a total value of USD 4.79 billion.
The South American market is a major player in this scenario, representing 17% of the world's total
transfers. In 2016, there were 685 direct transfers from South America to European leagues for a total
of USD 101 million.2 Players move across countries and continents and are expected to perform
according to the investment that they represent. However, managers at their destination teams have
little quantitative information on how a player should perform given the new environment he faces. This
paper contributes to filling this gap.
To predict the performance of any given player in a new environment, we use detailed event‐level
data. The use of such data in football is not a new phenomenon. In fact, the analysis of detailed
match‐level data in football goes back at least to Reep and Benjamin (1968). Nevertheless, as stated by the
New York Times in 2010, “people in soccer have historically paid little attention to statistics, arguing that
the only way to judge players is to see them in action.”3 Consequently, the engagement of football clubs
with event‐level data differs widely from that observed in American sports, particularly baseball.
However, times have changed, and the use of sophisticated event‐level data has gained
importance in football over the past decade. Still, with very few exceptions (the Danish club
Midtjylland being perhaps the most remarkable), there have been no Moneyball cases in which data
analytics has successfully driven a team’s performance.4
The lack of successful experiences has much to do with the need to understand the analytics of
football better. Currently, a wide range of European football clubs use data analytics to complement
the traditional training and game process. The use of such data in other areas of the world is much
scarcer, mostly due to the lack of data and resources. Consequently, countries such as England understand
much better how their football performs, while others, Latin American countries in particular, know little
about the analytics behind their game.
1 Association football refers to what commonly is known as football in Europe and Latin America, and soccer in the
United States. In this paper, henceforth, we will refer to it simply as football.
2 A number of South American players are transferred within Europe or arrive in Europe from other regions.
3 The New York Times, visited September 4th, 2017:
http://www.nytimes.com/2010/07/09/sports/soccer/09soccerstats.html
4 Midtjylland is a team that won the 2014/15 Danish Superliga under the hypothesis that a football club could be
This paper adds to the literature (and to the football industry) by estimating and implementing a
model that offers a quantitative measure of how a player should perform when transferred to a new
league. It does so by evaluating the passing ability of football players in the case of a little‐studied
football competition. Specifically, it reviews a strong South American league recently ranked by the
International Federation of Football History and Statistics (IFFHS) as the second‐best league in 2016.5
The Colombian league might be by almost any standard inferior to the major European football leagues,
but the high position in the ranking does suggest that football beyond the traditional five associations is
worth studying.6 To evaluate how Colombian players should perform when moving to Spain, we need to
estimate the model for LaLiga, the Spanish first division football league, undoubtedly one of the top
tournaments in the world.
We estimate the probability of a pass being successful using a generalized additive mixed model
as proposed by Lin and Zhang (1999). This particular estimation technique, to the best of our knowledge,
has only been used in the past by Szczepański and McHale (2016) and McHale and Relton (2017), in both
cases using data for the English Premier League. The technique, ideal because it takes into account the
difficulty of any given pass, has never been implemented using data from Spain or any South American
league. The model has previously been used to understand players’ passing abilities and to use the
estimated probabilities as weights in a passing network. We test the validity of the model and,
having found that it accurately predicts players’ passing skills, move on to implement a practical
application.
The novelty of our paper lies in the model’s ability to predict how a player who participated in
the 2016‐I Colombian League should perform over the following season in another league, the 2016/17
Spanish LaLiga.7 The model generates predictions for all players who participated in the Colombian
league. That is, our strategy can benefit potential employers, as they can anticipate how a given player is
expected to perform in a new environment. In practice, during our sample period, three players
moved from Colombia to Spain. For each of these players, we analyze the passing ability
predictions in detail and compare them with how they actually performed.
5 IFFHS link visited September 5th, 2017: http://iffhs.de/strongest‐national‐league‐world‐2016‐spain‐since‐2010/
6 The top five leagues include the English, French, German, Italian and Spanish domestic competitions.
The use of event‐level data is relatively new, particularly for South America. There
have been detailed data‐based studies of South American football in the past (see, for instance, Flores et
al., 2012 or Tovar, 2014) but, to the best of our knowledge, no approach like the one implemented in this
paper has been used before. The relevance of the paper extends beyond academia, as the technique is of use
to the football industry. South America is the source of some of the best players in the football industry,
who, bred in local clubs, eventually end up playing in Europe’s top leagues. A better understanding
of the abilities of these players while playing in South America, together with predictions of their passing
performance abroad, will help European clubs optimize their investment.
Data
We use proprietary data provided by www.golyfutbol.com for the Colombian League and LaLiga, the
Spanish football league.8 In both cases, the data are available for one complete season. The
Spanish League, available for the 2016/17 season, is played under the traditional format of most
European leagues: a double round‐robin. The Colombian League, with a new champion every six
months, follows a single round‐robin schedule.9 The top eight teams then compete in knockout playoffs
(with home and away games) until a champion is declared. We use data for the first semester of 2016.
We have data available for 208 games in the Colombian League and 378 in the Spanish League.
The data provided follow all events such as passes, shots, goals, interceptions, and recoveries. They give us
the precise x,y coordinates of each event on the pitch and record the minute and second at which it occurred.
In that sense, the dataset is similar to the one used in Szczepański and McHale (2016). It differs, however, from the
detailed tracking data used in McHale and Relton (2017), which capture each player’s position ten times
per second. Using our data we identify when and where a player passes the ball, but we do not know
where the players (either teammates or opponents) surrounding the man with the ball are.
The key event used in the paper is the pass attempt. For Colombia, we have
20 teams, 536 players and 101,573 (correct and incorrect) passes. The corresponding figures for LaLiga
are 20, 535 and 234,827.
We expect the Spanish League to be superior in quality to its counterpart. The passing skill of
a player is arguably one of the most reliable indicators of player performance in a given game (see, for
instance, Anderson and Sally, 2013 or Tovar, 2014). Table 1 suggests that, measured by the number of
passes per game, the game is faster and precision is higher in Spain.
8 Data Factory originally collected the data.
Table 1
Passing ability: comparative across competitions
A primary objective of the paper is to control for the pass’s difficulty. It is relatively well‐known
that, on average, the chance of completing a pass differs by the player’s position. Defenders tend to face
little opposition, while forwards encounter stronger resistance as they are closer to the opposing goal.
Table 2
Passing accuracy
Table 2 shows that LaLiga players pass with greater accuracy than those playing in the
Colombian League regardless of the position. The raw figures in this section reveal the expected
strength of the Spanish League relative to its South American counterpart.
The Model
To estimate the probability that a player k, k = 1, …, K, successfully makes a pass, recall that our data
are defined at the event level, i.e., each observation is a pass attempt by player k in game g, g = 1, …, G, in
either of the two leagues considered.10 In other words, we observe a group of players’ actions
(passes) over time (within the same game), under the same conditions, but the success probabilities vary from
player to player. To deal with this type of data, we would need to estimate a generalized linear mixed
model (GLMM), which uses a parametric mean function to model covariate effects while adding random
effects to deal with overdispersion and correlation (Lin and Zhang, 1999).
10 We estimate a separate model for each competition.
Table 1
Competition | Percentage of successful passes | Number of passes per game
LaLiga 16/17 | 87.66% | 618
Colombian League 2016‐I | 84.08% | 475
Source: www.golyfutbol.com. Own calculations

Table 2
Competition | Goalkeeper | Defender | Midfielder | Forward
LaLiga 16/17 | 86.03% | 88.72% | 88.11% | 82.72%
Colombian League 2016‐I | 83.06% | 84.82% | 85.70% | 79.72%
The available data are precise enough to position each pass (whether successful or unsuccessful)
at a specific coordinate on the pitch. For covariates such as this, an appropriate functional form is
unlikely to be known in advance. The need to model this type of covariate nonparametrically calls for
the use of a generalized additive mixed model (GAMM), as developed by Lin and Zhang (1999) and
implemented for the English Premier League by Szczepański and McHale (2016) and McHale and
Relton (2017).
Let $o_i^k$ denote the outcome of the $i$th pass, attempted by player $k$, where $o_i^k = 1$ if the pass is
successful and $o_i^k = 0$ if the pass is unsuccessful. Assuming that the distribution of pass outcomes
follows a Bernoulli distribution with the probability of success represented by the inverse logit
function of the linear predictor $\eta_i$, we have:

$$o_i^k \sim \mathrm{Bernoulli}(p_i), \quad \text{where} \quad p_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}$$

The linear predictor $\eta_i$ is given by:

$$\eta_i = W_i \beta + Z_i b + s\left(x_i, x_{i,\mathrm{end}}, y_i, y_{i,\mathrm{end}}\right) \qquad [1]$$

where $W_i$ is a row vector of covariates with coefficient vector $\beta$, and $Z_i$ is a design matrix selecting
the elements of the random effects vector $b$ that correspond to the $i$th pass executed by a given player.
Lastly, $s$ is a smooth function used to model the coordinates of the passing event, i.e., origin and destination.
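As an illustration of equation [1], the following minimal Python sketch maps a linear predictor to a success probability through the inverse logit. It is only a sketch: the function names `inverse_logit` and `linear_predictor` are hypothetical, the coefficient vector is called `beta` by assumption, and all numeric inputs are made-up values, not estimates from the paper.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(w_row, beta, random_effects, smooth_value):
    """Compute eta_i = W_i * beta + Z_i * b + s(...).

    w_row and beta are equal-length sequences (the parametric part);
    random_effects holds the elements of b selected by Z_i
    (player, own team, opposition); smooth_value is the value of the
    smooth s at the origin and destination of the pass.
    """
    fixed = sum(w * b for w, b in zip(w_row, beta))
    return fixed + sum(random_effects) + smooth_value

# Made-up inputs: an intercept and a home indicator only.
eta = linear_predictor(w_row=[1.0, 1.0], beta=[1.5, 0.2],
                       random_effects=[0.1, 0.05, -0.15],
                       smooth_value=0.3)
p = inverse_logit(eta)  # predicted probability that the pass succeeds
```

Because the three random effects enter additively, a stronger opposition (a more negative opposition effect, under the sign convention assumed here) lowers `eta` and therefore `p` for the same player and location.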
The W matrix includes, in principle, the distance of the pass, whether it is a forward pass, game
time, the pass number in the current sequence of passes, the time the player had the ball before
passing, and an indicator variable for whether his team was playing at home or away.
The chosen variables are expected to capture various factors that influence the
probability of successfully completing a pass.11 The forward pass variable is an indicator of whether
or not the pass was intended to move the ball closer to the opposition goal line. The pass number in
the current sequence proxies for the pressure that the opposition players place on the player
attempting to pass the ball.
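The distance and forward-pass covariates can be derived from the origin and destination coordinates. The sketch below assumes the normalization described in footnote 11 (both axes expressed in units of the 105-meter pitch length, so the width runs to about 0.65) and assumes that x = 0 lies on the passing team's own goal line; the function name `pass_covariates` is hypothetical.

```python
import math

PITCH_LENGTH_M = 105.0  # standard pitch length quoted in footnote 11

def pass_covariates(x, y, x_end, y_end):
    """Return (distance in meters, forward indicator) for one pass.

    Assumes x in [0, 1] along the sideline and y in [0, 0.65] along the
    goal line, both in units of pitch length, with x = 0 at the passing
    team's own goal line (an assumption of this sketch).
    """
    dx = (x_end - x) * PITCH_LENGTH_M
    dy = (y_end - y) * PITCH_LENGTH_M
    distance = math.hypot(dx, dy)
    forward = 1 if x_end > x else 0  # ball moves towards the opposition goal line
    return distance, forward

# A pass from the center line, moving 10% of the pitch length forward.
distance, forward = pass_covariates(0.5, 0.3, 0.6, 0.3)
```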
Typically, the passing precision will increase if the player has the ball for a few extra seconds,
enough to control, look and pass. However, if the player retains the ball for too long, opposing
players will presumably increase pressure, making it harder to pass the ball successfully. Consequently,
for the time on the ball, we expect a quadratic relation.
There is a well‐known home advantage effect in football where teams playing at home tend to
perform better (Carmichael and Thomas, 2005). Given the overwhelming historical evidence of such an
effect, the home/away indicator variable is included despite the existence of recent research suggesting
that, under certain circumstances, the home advantage effect can disappear (Krumer and Lechner,
2017).12
The random effects vector b is composed of three elements: the passing ability of each player in
the sample, the ability of the passing player’s team, and the ability of the opposition to impede the pass
execution. In other words, it captures the individual’s passing skill, the team’s ability to facilitate the kth
player’s pass, and the capacity of the opposition to obstruct such a pass.
The last component in equation (1) is modeled as a tensor product smooth to deal with multiple
inputs, which in this case represent the origin and destination of each pass i in x,y coordinates. The
position on the pitch proxies for the pressure faced by the passing player and the receiving player, as
well as for the difficulty of the pass associated with its location on the pitch and its distance.
The results
We estimated the GAMM described in the previous section separately for each of the
competitions considered using the mgcv package (Wood, 2006) in R. The variables are included in each
specification depending on whether they are statistically significant or not.
11 The pitch is normalized between [0;1] along the sideline and between [0;0.65] along the goal line. This is based
on the standard size of a pitch, 105 meters long and 68 meters wide. Although not all pitches measure
exactly the same, we searched the Internet for the size of most of the pitches included in our datasets, and the vast
majority have roughly the same proportions.
12 Based on Bundesliga data, they find that the home advantage disappears when the game is in the middle of the
Table 3
Estimates of the parametric model (β elements in equation 1)
The results reported in Table 3 show the expected signs for all parametric model terms
contained in the vector β. We find that the probability of successfully passing the ball increases when
playing at home, as the game advances, and the longer the sequence of passes. The longer a player is in
control of the ball, the higher the chances of successfully passing, but only up to a point. As hypothesized, if
the player holds onto the ball for too long, the probability of correctly passing the ball decreases. Finally,
when statistically significant, the likelihood of success falls when the ball is passed forward. The distance
is not statistically significant, most likely because its effects are captured by the tensor product smooth term for
the origin and destination, which is strongly statistically significant.
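As a rough worked example of the quadratic relation, and assuming only the rounded estimates for time to execute the pass and its square reported in Table 3, the time on the ball at which the linear predictor peaks can be approximated. The symbols $t$, $\beta_t$, and $\beta_{t^2}$ are labels introduced here for illustration, and the two-decimal rounding of the estimates makes the results indicative only:

$$\frac{\partial \eta_i}{\partial t} = \beta_t + 2\beta_{t^2}\,t = 0 \;\Longrightarrow\; t^{*} = -\frac{\beta_t}{2\beta_{t^2}}$$

$$t^{*} \approx \frac{0.33}{2(0.05)} = 3.3 \quad \text{(first set of estimates)}, \qquad t^{*} \approx \frac{0.25}{2(0.03)} \approx 4.2 \quad \text{(second set of estimates)}$$

Both values are in whatever units time on the ball is recorded in the dataset. Beyond $t^{*}$, holding the ball longer lowers the predicted probability of a successful pass.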
Comparative analysis
To assess the model’s validity, we first estimate the predicted pass completion rate by
substituting the model estimates (fixed parameters, random effects, and the smooth function) into the
corresponding components of equation (1). The resulting pass completion rate for all passes attempted
is averaged and scaled over players to obtain the predicted passing rate for each player k.
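The per-player comparison can be sketched as follows. This is a simplified illustration that only averages pass-level predicted probabilities and outcomes by player; it does not reproduce any additional scaling the authors may apply, and the function name `per_player_rates` and the toy data are hypothetical.

```python
from collections import defaultdict

def per_player_rates(passes):
    """Average predicted probability and observed completion rate by player.

    passes: iterable of (player, predicted_probability, outcome) tuples,
    where outcome is 1 for a successful pass and 0 otherwise.
    Returns {player: (predicted_rate, observed_rate)}.
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # sum of p_hat, successes, attempts
    for player, p_hat, outcome in passes:
        acc = totals[player]
        acc[0] += p_hat
        acc[1] += outcome
        acc[2] += 1
    return {player: (s / n, o / n) for player, (s, o, n) in totals.items()}

# Toy data: two players, three passes in total.
toy_passes = [("A", 0.9, 1), ("A", 0.7, 0), ("B", 0.8, 1)]
rates = per_player_rates(toy_passes)
```

Plotting each player's predicted rate against his observed rate gives the kind of comparison shown in Figure 1.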
Variable | Estimate | Std. Error | z‐Value | P‐Value | Estimate | Std. Error | z‐Value | P‐Value
Home dummy | 0.18 | 0.02 | 10.26 | 0.00 | 0.16 | 0.02 | 6.46 | 0.00
Minute of the pass | 0.00 | 0.00 | 10.09 | 0.00 | 0.00 | 0.00 | 9.68 | 0.00
Number in the pass sequence | 0.02 | 0.00 | 7.11 | 0.00 | 0.03 | 0.00 | 7.50 | 0.00
Time to execute pass | 0.33 | 0.02 | 13.57 | 0.00 | 0.25 | 0.04 | 6.90 | 0.00
(Time to execute pass)² | ‐0.05 | 0.00 | ‐10.88 | 0.00 | ‐0.03 | 0.01 | ‐5.38 | 0.00
Forward pass | ‐0.11 | 0.03 | ‐3.40 | 0.00
Intercept | 1.77 | 0.06 | 29.95 | 0.00 | 1.38 | 0.07 | 19.72 | 0.00
Source: www.golyfutbol.com. Own calculations
Figure 1
Per player predicted vs. observed passing rate
(a) LaLiga (b) Colombian League
Source: www.golyfutbol.com. Own calculations.
Figure 1 compares the predicted with the observed pass completion rate. On average, the model
predicts with 90% accuracy for LaLiga and with 85% accuracy for the Colombian League. The predictive power
is in line with Szczepański and McHale’s (2016) estimates for the English Premier League.
The nonparametric additive function included in equation (1) controls for the origin and
destination coordinates of each pass, whether successful or unsuccessful. We visualize the observed
impact of the smooth function s, on the passing ability rate by fixing a point of origin and depicting the
predicted probability for each pass. Specifically, Figure 2 sets the pass’ source in the center of the pitch
and using the estimated linear predictor, determines the chances of successfully passing the ball for a
central midfielder.13
A central midfielder tends to be more accurate when passing towards his own half than towards
the opponents’ half, i.e., it is easier to pass the ball backward than forward. Figure 2 also highlights that
open passes tend to be more successful than passes towards the end zone, not a surprising result since
much of the opponents’ objective when not in possession is to prevent the ball from reaching their own goal.
Consequently, the chances of successfully passing the ball towards the opposing goal are slim.
Figure 2 also reveals some differences across competitions. Not surprisingly, the Spanish league
has a higher passing precision, mainly when passing backward or when opening the ball to the wings.
Remarkably, a player in the Colombian league has surprising difficulty passing the ball towards his own
goal: the predicted probability in this case, near the own goal, is smaller than in LaLiga. Similarly, an
open offensive pass in the Colombian League is expected to be less successful than one executed in
Spain.
13 Using all available events in the dataset (goals, assists, passes, recoveries, etc), a player is defined as a center
Figure 2
Value of the linear predictor by origin of the pass and player position
Center Midfield originating the pass from the center of the pitch
(a) LaLiga (b) Colombian League
Note: X represents the pitch’s length. Y, the width. X2 and Y2 are the destination coordinates of the
attempted pass by length and width. The origin represents the right corner of the player’s team own
goal line.
Source: www.golyfutbol.com. Own calculations.
One can conclude that a center midfielder with the ability to successfully pass towards the
penalty box (say the penalty spot) would be among the best (and presumably among the most
valuable) players in a given league. Indeed, according to our model, we find evidence that Toni Kroos
(Real Madrid) and Steven N’Zonzi (Sevilla) were the two most successful players in doing this in LaLiga
during the season considered. Correspondingly, for Colombia, the name that stands out is Mayer Candelo,
a locally well‐known name. Candelo, with 11 caps for Colombia, was 39 years old during our sample period.
He is arguably the last of the traditional South American playmakers in the Colombian league. Like
others in the past, Carlos “Pibe” Valderrama being the most renowned, Candelo is physically slow but mentally
fast. Such characteristics made it possible for him to perform at a high level at a relatively advanced
age.
The annex presents the linear predictor for additional positions (Figure A.1 and Figure A.2). A
center back in Colombia has a hard time passing successfully towards the opponents’ half. Similarly, a
Colombian center forward is very erratic when attempting anything beyond a horizontal pass.
Among the model’s strengths is its capacity to take into account that not all passes are made
equal. To calculate the ease of passing, we estimate the linear predictor setting each player’s random
coefficient to zero. The latter implies that the player‐specific passing abilities are excluded when
determining the probability of successfully passing the ball, resulting in the predicted pass completion
rate by an average player.
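A minimal sketch of the ease-of-pass calculation: each pass's linear predictor is evaluated with the player's random effect set to zero, converted to a probability, and the resulting distribution is summarized by its quartiles. The input values are made up and the function names `ease_of_pass` and `ease_quartiles` are hypothetical.

```python
import math
import statistics

def ease_of_pass(eta_without_player):
    """Success probability for an average player (player random effect = 0)."""
    return 1.0 / (1.0 + math.exp(-eta_without_player))

def ease_quartiles(etas_without_player):
    """25th, 50th and 75th percentiles of the ease of pass over a set of passes."""
    eases = [ease_of_pass(eta) for eta in etas_without_player]
    return statistics.quantiles(eases, n=4)

# Made-up linear predictors with the player's random effect removed.
example_etas = [0.2, 1.1, 1.8, 2.4, 3.0]
q1, median, q3 = ease_quartiles(example_etas)
```

A pass with a higher value is easier in the sense that an average player would be more likely to complete it.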
Figure 3
Ease of pass (average)
(a) LaLiga (b) Colombian League
Note: The horizontal dotted lines represent, from left to right, the 25, 50, and 75 percent quartiles.
Source: www.golyfutbol.com. Own calculations.
Figure 3 depicts the distribution of the probability of a successful pass by an average player.14
The further to the right the higher the likelihood of success, i.e., the easier the pass is. Similar to earlier
findings by Szczepański and McHale (2016), the distribution is highly skewed to the right. Comparatively
speaking there are significant differences between the Spanish and the South American competition as
passes in the latter tend to be easier than in LaLiga.
It is well‐known that passing success varies by the nominal position held by a given player.
Figure 4 depicts the predicted pass completion rate by player position. As expected, passes by forwards and
midfielders are less likely to succeed, particularly in Colombia.
Figure 4
Ease of pass (by player position)
(a) LaLiga (b) Colombian League
Note: The horizontal dotted lines represent the median
Source: www.golyfutbol.com. Own calculations.
This section’s findings can be summarized in Figure 5, which shows that, as hypothesized, LaLiga
players tend to have superior passing abilities to players in the Colombian League.
Figure 5
Predicting passing performance
The model has the capacity to predict the passing performance of any player when transferred
abroad. We derive some examples based on the estimates for the Colombian League and contrast the
prediction with actual performance. Specifically, we take the player’s estimates before moving to Spain
and, using those figures and given the team he is transferred to and the rivals he will face in Spain, we
calculate the predicted passing completion rate in his new Spanish team.
The three players considered competed in the Colombian League during the first semester of
2016 and moved to Spain for the 2016/17 season. Daniel Torres (midfielder) moved from Independiente
Medellín to Alavés, Marlos Moreno (forward) from Atlético Nacional to Deportivo de La Coruña, and
Rafael Santos Borré (forward) from Deportivo Cali to Villarreal. Note that the number of players that
moved from Colombia to Spain is irrelevant. The strength of the model is its ability to generate
predictions for any given player in the Colombian league.
The first step to determine their prospective passing performance in Spain consists of estimating
the linear predictor based on the Colombian model, setting to zero the random effects corresponding to
the own team’s ability to facilitate passes and the opposition’s ability to prevent them. These results are
averaged for each of the three players considered. The resulting predicted pass completion rate implies
that the expected passing skills depend solely on each player’s abilities, not on how his team aids his
performance or on how the rivals obstruct his passing intentions. Lastly, for each of the three players,
we add his team’s and the opponents’ random effects for each game he can potentially play during the
2016/2017 LaLiga. The result, transformed using the inverse logit function, is the predicted probability
considering the player’s skills and his new teammates’ and rivals’ abilities to facilitate and impede the
player's passes, respectively.
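The steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the random effects are assumed to enter the linear predictor additively, a more negative opposition effect is assumed to mean a stronger opposition, and every name and number below (`fixture_predictions`, the opponent labels, the effect values) is made up.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def fixture_predictions(player_only_eta, new_team_effect, opponent_effects):
    """Fixture-specific predicted pass completion probabilities.

    player_only_eta: the player's average linear predictor from the
        origin-league model, with team and opposition random effects
        set to zero.
    new_team_effect: destination team's random effect (ability to
        facilitate passes), taken from the destination-league model.
    opponent_effects: {opponent: random effect for that opponent's
        ability to impede passes}, from the destination-league model.
    """
    return {
        opponent: inverse_logit(player_only_eta + new_team_effect + effect)
        for opponent, effect in opponent_effects.items()
    }

# Made-up values for a hypothetical transferred player.
predictions = fixture_predictions(
    player_only_eta=2.0,
    new_team_effect=0.1,
    opponent_effects={"Opponent A": -0.3, "Opponent B": 0.0},
)
```

Under these assumptions the prediction is lowest against the opponent with the most negative effect, which mirrors how the fixture-specific probabilities vary with the rival faced.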
Figure 6 depicts the average fixture‐specific predicted probability calculated as described above.
There are two observations for Villarreal, Deportivo, and Alavés because the players considered played
in one of these teams. Torres has the highest predicted probability consistently while Marlos Moreno
has the lowest. The predicted passing performance is relatively weak against Barcelona, consistent with
passing of their opponents. It should come as no surprise that the order of the teams in Figure 6 follows
the x‐axis in Figure A.3, which represents the team’s ability to prevent the opponents’ passes. The
magnitude of the fixture‐specific prediction is determined by each player’s skills and the team’s ability to
facilitate passes, i.e., the y‐axis in Figure A.3.
Figure 6
Source: www.golyfutbol.com. Own calculations
The passing performance estimates in Figure 6 are the ones that a Spanish manager would
receive as a quantitative indicator of what he should expect from each player analyzed. During the
2016/17 season, Torres was on the pitch for an average of 74 minutes over the 20 games he played for
Alavés. Moreno averaged 45 minutes on the pitch over the 18 games he played, and Santos Borré was
fielded 16 times for an average of 24 minutes per game.
Figure 7 presents the results for games in which they participated and the observed pass
completion rate was less than one.15 The continuous line in the graph is the identity line,
indicating where the predicted probability of successfully passing the ball equals the observed pass
completion rate.
15 For a number of games they just played a few minutes during which they just had time to pass once or twice
Consider first the case of Daniel Torres. The model predicts, for the most part, a
fixture‐specific probability of 90.4%. Indeed, in most games he achieved around 90% pass effectiveness. Over
the first six matches, Torres completed the entire game in five of them. On matchday 1, Alavés played
Atlético de Madrid. The model predicted an 89.9% passing accuracy and Torres delivered a passing
precision rate of 87.5%. On matchday 3, playing away against the mighty FC Barcelona, the model
predicted a passing success rate of 86.8%. His observed performance was 79%. Later in the season,
when facing Barcelona in Vitoria (home game), the model predicted 89.7% passing accuracy. Torres’
observed rate was 88.8%. The Camp Nou, Barcelona’s stadium, seats nearly 100,000
spectators, the largest capacity in Europe and the second largest in the world. It could be argued that Torres,
relatively new to the Spanish league at the time of the match, was particularly impressed by the Camp
Nou’s atmosphere, leading to his relatively poor performance. His experience in Barcelona was useful
later in the season, not only when Barcelona visited Vitoria (Alavés’ home city) but also when Alavés
faced Real Madrid, the other Spanish giant, at home. The model predicted a 91.5% passing accuracy, pretty
close to the observed 92.7%. On average, Torres’ successful passing rate was as expected. His
unsurprising passing performance explains in great part why he was a starter in 17 games and why he
was kept in the Alavés squad for the following season.
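The gap between prediction and outcome for the four Torres fixtures quoted above can be tabulated with a few lines of Python. The figures are taken from the text; the fixture labels and variable names are only illustrative.

```python
# (fixture label, predicted %, observed %) as quoted in the text for Daniel Torres
torres = [
    ("Atlético de Madrid (matchday 1)", 89.9, 87.5),
    ("Barcelona (away, matchday 3)", 86.8, 79.0),
    ("Barcelona (home)", 89.7, 88.8),
    ("Real Madrid (home)", 91.5, 92.7),
]

# Observed minus predicted, in percentage points, per fixture.
gaps = {name: round(obs - pred, 1) for name, pred, obs in torres}

# Average gap across the four fixtures.
mean_gap = sum(obs - pred for _, pred, obs in torres) / len(torres)
```

Three of the four gaps are within about 2.5 percentage points of the prediction; the away game at Barcelona is the clear outlier.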
Figure 7
Note: The continuous diagonal line is the identity.
Source: www.golyfutbol.com. Own calculations.
The model, for Marlos Moreno, estimates an average successful passing probability of 87.5%.
The observed average passing completion rate was 92%. However, Moreno was a starter in only nine
15 minutes and made more than five pass attempts, the observed percentage of successful passes drops
to 85.6%. Indeed, Figure 7, which considers all games with a passing rate smaller than one, shows that
in games where Moreno actively participated, he underperformed for the most part. This, added to his
goalless record, explains much of why he was unable to succeed in La Coruña.
Considering only games where Santos Borré played more than 15 minutes, i.e., when he actively
participated in the game, the observed percentage of successful passes was 83.2%. He was a starter in just
two games, and his predicted passing accuracy was 88.3%. Figure 7 shows that whenever his passing rate was
below one, he underperformed. The latter and the fact that he scored only two goals explain in
part why Santos Borré had to move back to South America, specifically to River Plate (Argentina).
The analysis suggests that players who underperform the model’s predictions have trouble
making it into the starting eleven and remaining in the squad the following season. In our examples, only
Torres performed as the model expected. He played a large number of games, was part of the starting
eleven in most games where he played, and was part of the squad for the following season. Moreno and
Santos Borré, both of whom underperformed the model’s expectations, were dismissed at the end of the
season.
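The comparison made for each player can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the `GameLine` record, the function names and the three game lines are hypothetical, the observed rate is assumed to be pooled over pass attempts, and the expected rate is assumed to be an attempt-weighted mean of per-game predictions. The thresholds (more than 15 minutes, more than five pass attempts) are the ones mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One player-game record (hypothetical structure, for illustration only)."""
    minutes: int      # minutes played in the game
    attempts: int     # pass attempts in the game
    completed: int    # successful passes in the game
    predicted: float  # model-predicted success probability for that game


def active_games(games, min_minutes=15, min_attempts=5):
    """Keep games with MORE than min_minutes played and MORE than min_attempts passes."""
    return [g for g in games if g.minutes > min_minutes and g.attempts > min_attempts]


def observed_rate(games):
    """Pooled completion rate: successful passes over pass attempts (an assumption)."""
    attempts = sum(g.attempts for g in games)
    return sum(g.completed for g in games) / attempts if attempts else float("nan")


def expected_rate(games):
    """Attempt-weighted mean of the per-game predictions (an assumption)."""
    attempts = sum(g.attempts for g in games)
    return sum(g.predicted * g.attempts for g in games) / attempts if attempts else float("nan")


# Made-up game lines; they do not correspond to any player in the paper.
games = [
    GameLine(minutes=90, attempts=40, completed=34, predicted=0.88),
    GameLine(minutes=8, attempts=2, completed=2, predicted=0.90),    # brief cameo, dropped
    GameLine(minutes=30, attempts=12, completed=10, predicted=0.87),
]

active = active_games(games)
gap = observed_rate(active) - expected_rate(active)  # negative means underperformance
print(f"all games: {observed_rate(games):.3f}  active games: {observed_rate(active):.3f}  gap: {gap:+.3f}")
```

In this toy example the short cameo with a perfect passing rate inflates the overall figure, which is the same mechanism that makes Moreno's raw completion rate look better than his rate in games where he actively participated.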
Conclusions
This paper estimates a generalized additive mixed model using event‐level data from the
Spanish and the Colombian football leagues. The model accounts for the player’s characteristics as well
as for the ability of teammates to ease, and of the opposition to deter, the player’s passing effectiveness.
It also models, nonparametrically, the origin and destination coordinates of each pass. In other words,
the model takes into account various characteristics of the game to assess the difficulty of the pass.
Once the model estimates are available, we show that the model predicts passing ability accurately
in both leagues.
The model’s strength is its capacity to anticipate the passing performance of players when
transferred from one league to another. In an industry with millions of dollars at play, access to an
In practice, a manager will be interested in the passing performance of a player during the
following season, t+1, given the player’s performance in his home league during period t. To deliver the
predictions, the model also needs to be estimated using data from the destination league in period t.
This poses no challenge: the model is estimated over the same season for the origin and the
destination leagues. In our example, this means that the model is estimated with data from the first
semester of 2016 for the Colombian league and from the 2015/16 season for the Spanish league. The random
effects of the player’s destination team and of its opponents need to be extracted from the latter and
plugged into the linear predictor (as discussed in the text) using the estimates from the Colombian model. The
implicit assumption is that, in the destination league, a team’s ability to facilitate and prevent passes
does not change radically from one season to the next. For newly promoted teams, for which no random
effect predictions are available, the estimates used are those of the relegated teams, as done
previously in the literature.
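The plug-in step described above can be illustrated with a short Python sketch. It is a simplified illustration under several assumptions that are not taken from the paper: a logit link, random effects that enter the linear predictor additively, and a simple average of the relegated teams' effects as the fallback for promoted teams. All names (`predict_pass_success`, `team_effect`, "Team A", and so on) and all numbers are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a linear predictor to a probability (logit link is an assumption)."""
    return 1.0 / (1.0 + math.exp(-eta))


def team_effect(effects: dict, team: str, relegated: list) -> float:
    """Random effect of `team` in the destination league.

    For a newly promoted team with no estimate, fall back on the mean effect
    of the relegated teams (averaging is an assumption of this sketch).
    """
    if team in effects:
        return effects[team]
    fallback = [effects[t] for t in relegated if t in effects]
    if not fallback:
        raise KeyError(f"no random effect available for {team!r}")
    return sum(fallback) / len(fallback)


def predict_pass_success(eta_origin, facilitate, deter, team, opponent, relegated):
    """Predicted pass success after a transfer.

    eta_origin : part of the linear predictor from the origin-league model
                 (player and pass characteristics), excluding team effects.
    facilitate : destination-league effects for teammates easing passes.
    deter      : destination-league effects for opponents deterring passes
                 (a negative value lowers the predicted success probability).
    """
    eta = (eta_origin
           + team_effect(facilitate, team, relegated)
           + team_effect(deter, opponent, relegated))
    return inv_logit(eta)


# Made-up destination-league estimates for period t.
facilitate = {"Team A": 0.20, "Team B": -0.10, "Team C": -0.30}
deter = {"Team A": -0.25, "Team B": 0.05, "Team C": 0.15}
relegated = ["Team B", "Team C"]

# Player joins Team A and faces Team B.
p_known = predict_pass_success(1.8, facilitate, deter, "Team A", "Team B", relegated)
# Player joins newly promoted Team D (no estimate) and faces Team A.
p_promoted = predict_pass_success(1.8, facilitate, deter, "Team D", "Team A", relegated)
print(f"{p_known:.3f} {p_promoted:.3f}")
```

The sketch makes the implicit assumption of the procedure visible: the team effects are taken from period t and held fixed when predicting performance in t+1.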
The model has, of course, some limitations aside from the obvious data requirements in both
the origin and destination league. Passing is probably the most important skill that a football player
should have. Bill Shankly, the legendary Liverpool manager, used to say: “Above all, the main aim is that
everyone can control a ball and do the basic things in football. It’s control and pass … control and pass …
all the time”.16 However, it is not the only relevant indicator to determine if a player is performing as
expected. In that sense, our model is limited to one particular, but probably the most important,
performance proxy.
References
Anderson, C., and Sally, D. 2013. The Numbers Game. Why Everything You Know About Soccer is Wrong.
Penguin Books. New York.
Carmichael, F., and Thomas, D. 2005. “Home‐Field Effect and Team Performance: Evidence from English
Premiership Football” in Journal of Sports Economics, Vol. 6(3): 264‐281.
Flores, R., Forrest, D., and Tena, J.D. 2012. “Decision taking under pressure: Evidence on football
manager dismissals in Argentina and their consequences” in European Journal of Operational Research,
Vol. 222: 653‐662.
Krumer, A., and Lechner, M. 2017. “Midweek effect on Soccer Performance: Evidence from the German
Bundesliga” in Economic Inquiry (forthcoming).
Lin, X., and Zhang, D. 1999. “Inference in generalized additive mixed models by using smoothing
splines” in Journal of the Royal Statistical Society. Series B, Vol. 61(2): 381‐400.
McHale, I. and Relton, S. D., 2017. “Identifying key players in soccer teams using network analysis and
pass difficulty.” Mimeo.
Reep, C., and Benjamin, B., 1968. “Skill and Chance in Association Football” in Journal of the Royal Statistical Society. Series A (General), Vol. 131 (4): 581‐585.
Szczepański, L., and McHale, I., 2016. “Beyond completion rate: evaluating the passing ability of
footballers” in Journal of the Royal Statistical Society. Series A, Vol 179 (2): 513‐533.
Tovar, J. 2014. “Gasping for Air: Soccer players’ passing behavior at high altitude” in Journal of Quantitative Analysis in Sports 10(4): 411‐420.
Wilson, J. 2008. Inverting the pyramid: the history of football tactics. Orion books.
Wood, S. 2006. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC. Boca Raton.
Appendix
Figure A.1
Value of the linear predictor by origin of the pass and player position
Center back originating the pass from just outside his own team’s box
(a) LaLiga (b) Colombian League
Note: X represents the pitch’s length and Y its width. X2 and Y2 are the destination coordinates of the
attempted pass by length and width. The origin represents the right corner of the goal line of the
player’s own team.
A center back is defined as the player who, given all available events, plays in the team’s defense, between the right and left backs.
Figure A.2
Value of the linear predictor by origin of the pass and player position
Center forward originating the pass from just outside the middle of the opponents’ box
(a) LaLiga (b) Colombian League
Note: X represents the pitch’s length and Y its width. X2 and Y2 are the destination coordinates of the
attempted pass by length and width. The origin represents the right corner of the goal line of the
player’s own team.
A center forward is defined as the player who, given all available events, plays in the team’s offense,
between what would be the right and left wings.
Source: www.golyfutbol.com. Own calculations.
Figure A.3