
Conceived and designed the experiments: AH CM OR NH BS. Performed the experiments: AH CM OR NH BS. Analyzed the data: AH CM OR NH BS. Contributed reagents/materials/analysis tools: AH CM OR NH BS. Wrote the paper: AH CM OR NH BS.

The authors have declared that no competing interests exist.

Whether a coach dismissal during the season affects subsequent team performance has long been a subject of controversial scientific discussion. Here we find a clear-cut answer to this question by using a recently developed statistical framework for the team fitness and by analyzing the first two moments of the effect of a coach dismissal. With an unprecedentedly small statistical error, we show for the German soccer league that dismissing the coach within the season has basically no effect on the subsequent performance of a team. Changing the coach between two seasons has no effect either. Furthermore, an upper bound for the actual influence of the coach on the team fitness can be estimated. Beyond the immediate relevance of this result, this study may pave the way for analogous studies of the effect of managerial changes, e.g., in economic contexts.

Fred Everiss, who was responsible for the soccer team of West Bromwich Albion (UK), coached his team for 46 years (1902–1948) without interruption. This is probably the all-time world record for coaches in professional soccer. In Germany, Volker Finke is the record holder: he coached the professional soccer team of SC Freiburg for almost 16 years (1991–2007) without interruption (German record), even though relegation to the Second German soccer league forced his team to leave the Premier German soccer league (the so-called “Erste Bundesliga”, established in 1963) four times. Such loyalty, however, is very unusual in professional team sports. The usual response to a continuing series of lost matches is to dismiss and replace the coach mid-season. In the German Bundesliga, for example, the club “Eintracht Frankfurt” leads in mid-season coach dismissals (20 times in 47 years of the German Premier soccer league). Dismissed coaches are often hired by competitors that have themselves just dismissed their coach. Gyula Lorant and Joerg Berger, for example, are the most frequently dismissed head coaches in the German Bundesliga (six times each).

The reason to fire a coach mid-season

Already in 1964

Many of these studies focused on coach dismissals in professional soccer in different
national leagues. These studies disagree both in their final results and in their
research designs. Some of these results have to be questioned because of design
problems such as a sub-optimal choice of the performance criterion

Heuer and Rubner

Most importantly, a team's fitness remains almost constant throughout a
season. Variations within a season are due to temporary fluctuations (such as
weather conditions or red cards), whereas systematic variations mainly occur
between different seasons

We analyze the Premier German soccer league (as already mentioned, the so-called German “Erste Bundesliga”), which started in the season 1963/64. We consider all mid-season coach dismissals (CDs) in all 46 seasons up to 2008/09. In almost every season each team plays 34 games (the exceptions are the three seasons 1963/64, 1964/65 and 1991/92). The entire data set covers 14,018 games. Since several matches were postponed during the first decades of the Bundesliga (due to weather conditions etc.), it is essential to take into account the correct order of matches for each team. The key steps of our approach can be summarized as follows:

To quantify possible fitness variations due to the CD, we
require that the team plays at least m = 10 matches in that season both
before and after the CD, i.e.
10≤t_{CD}≤24, where t_{CD} is the match day
just before the CD. During the m = 10 matches
before the CD no other CD is allowed. Our final data set contains 154
of the 361 mid-season CDs in total. To a first approximation the CDs
are uniformly distributed in the interval 10≤t_{CD}≤24,
with an average value of around 17.
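The selection rule above can be expressed as a small filter. This is a minimal sketch, not the authors' code: the helper `eligible_cds` is hypothetical, and it assumes that the CD days of one team-season are given as a sorted list of match days t_CD and that a season has `n_matches = 34` matches (38 in 1991/92).

```python
def eligible_cds(cd_days, n_matches=34, m=10):
    """Return the mid-season CDs of one team-season that enter the analysis.

    Hypothetical helper. cd_days: sorted match days t_CD after which the
    coach was dismissed. A CD is kept if at least m matches are played before
    and after it (m <= t_CD <= n_matches - m) and no other CD happened during
    the m matches before it.
    """
    kept = []
    for i, t in enumerate(cd_days):
        if not m <= t <= n_matches - m:
            continue  # fewer than m matches before or after the CD
        # an earlier CD after match day s hands the team to a new coach at s + 1;
        # the dismissed coach led all of the last m matches only if s <= t - m
        if any(t - m < s < t for s in cd_days[:i]):
            continue
        kept.append(t)
    return kept


print(eligible_cds([12, 18]))  # the second CD follows the first too closely
```

As described in the text, the first of two closely spaced CDs is kept and the second one is dropped.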

To quantify the effect of a CD, we choose an appropriate control group.
For a specific CD event occurring after match day t_{CD} (by
construction t_{CD}≥10), we identify all events where another team, or
the same team, in any season displays a similar goal difference during
t_{CD} consecutive matches (more specifically, the difference of the goal
difference ΔG between control team and CD team lies in the interval
[−0.215, 0.185]) and still has at least 10 matches to play after this time
interval. The minor asymmetry of the selection interval for control teams
guarantees identical average values of ΔG for control and CD teams (CD teams
typically have a negative ΔG, so more candidate control teams lie above them
than below them); it simply reflects the Gaussian-type distribution of ΔG-values
around zero
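The control-group selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `select_controls` is hypothetical, ΔG is taken as the average goal difference per match (already home/away-corrected), and the candidate window is assumed to consist of the first t_CD matches of a season. The text only specifies t_CD consecutive matches.

```python
def select_controls(seasons, t_cd, cd_mean_gd, exclude=None,
                    m=10, lo=-0.215, hi=0.185):
    """Return the team-seasons that qualify as controls for one CD event.

    Hypothetical helper.
    seasons:    dict mapping (team, season) -> chronological list of per-match
                goal differences (assumed already home/away-corrected)
    t_cd:       match day after which the coach was dismissed
    cd_mean_gd: mean goal difference of the CD team over its first t_cd matches
    exclude:    key of the CD event itself, which must not be its own control
    """
    controls = []
    for key, gd in seasons.items():
        if key == exclude or len(gd) - t_cd < m:
            continue  # the CD event itself, or fewer than m matches left
        # difference of the goal difference (control team minus CD team)
        diff = sum(gd[:t_cd]) / t_cd - cd_mean_gd
        if lo <= diff <= hi:
            # teams that also dismissed their coach are deliberately kept,
            # so the control group is not biased toward good future results
            controls.append(key)
    return controls


seasons = {("A", "s1"): [0] * 34, ("B", "s1"): [1] * 34, ("C", "s1"): [0] * 20}
print(select_controls(seasons, t_cd=15, cd_mean_gd=0.1))
```

Team B is rejected because its goal difference is too far away, and team C because fewer than m = 10 matches remain after the window.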

Going beyond most previous studies, we have also corrected for the
home/away asymmetry

Our procedure implies some important methodological aspects that have to be kept in mind:

The value m = 10 was chosen such that the final result has the smallest statistical error. For a larger interval the number of usable CDs would be smaller; for a smaller interval the team fitness would be characterized less accurately.

In a few cases a new coach is already replaced by another coach within the m = 10 matches. Sometimes this is planned (in the case of a caretaker coach); sometimes it is the consequence of continued bad performance. As implied by our approach, in such cases we have included the first CD but not the second one, because otherwise we could not judge the team quality during the short period (less than m matches) between the first and the second CD. In any event, our setup implies that the results hold exactly for all CDs where the coach was active for at least m matches.

Previous studies (see above) have restricted the control group to teams which did not dismiss the coach during the relevant period. This, however, introduces a bias towards a more positive expectation because teams with a bad future performance tend to be excluded. To overcome this statistical problem it is essential to use unbiased control groups.

The identification of control teams via all t_{CD} matches before
the CD is motivated by our previous observation that the change of the
team fitness during the season is negligible, so that as many matches as
possible should be taken into account for the estimation of the team
fitness. However, based on the subsequent results we will conclude that
a minor modification of the selection process might be appropriate. This
is discussed further below.

We have also studied all cases where a coach was changed (as a regular change or a dismissal) during the summer break. This event is denoted as CC (change of coach). We have considered those 141 cases (starting in 1966/67) where the corresponding team played in the German Premier League both in the season before and in the season after the CC. We start somewhat later here in order to have enough seasons to estimate the team fitness before the CC (see below).

An important aspect of the CC analysis is the prediction of the expected
outcome of a season. If the goal difference in one season is ΔG(old), the
expected average fitness F(est) in the next season can be consistently
estimated via F(est) = c_{F} + d_{F} ΔG(old), where c_{F} and d_{F} are
calculated from a regression analysis for all teams that are not relegated.
An even better estimator is obtained by averaging (for all teams where this is
possible) the outcome over the previous three seasons with weighting factors
1.0, 0.7 and 0.5 for the determination of ΔG(old). These factors were
estimated by optimizing the prediction. If a team did not play in the
Bundesliga in the second and/or third last season, these seasons were simply
omitted from the calculation of ΔG(old). Note that our results are insensitive
to the specific choice of these weighting factors.

The temporal evolution of CD and CC events is explicitly shown in

In particular, we show the intra-season CD events after the
10th match and before the 25th match.

In

The time axis is shifted with respect to the time of the CD (occurring
directly after match day t_{CD}) to enable comparison of
different events. The average values for the prediction period are
included as solid lines. No effect of the CD is present within
statistical errors.

As already noted in the literature

Repeating this analysis for different values of m, i.e. different time intervals
to define the selection and prediction periods, supports the nil hypothesis
for all choices, albeit with larger statistical errors. An objective way to
judge the size of this effect is to compare the square of this maximum possible
improvement with the variance of the fitness distribution, which is 0.27. The
resulting ratio, (maximum possible improvement)^{2}/0.27, is 0.02. This again
clearly shows that any possible improvement is negligible. A different, standard
measure of effect size from the statistical literature yields a
similarly small value

This apparent improvement can be quantified by the ratio r_{ΔG} of the average
ΔG value in the prediction period and the ΔG value in the selection
period. For the control teams one empirically obtains
r_{ΔG} = 0.53. Previous work has developed a
general formula stating that r_{ΔG} is approximately equal to
c(f, t_{CD}) = 1/(1+f/t_{CD}) < 1 with f≈13. Taking into account the
distribution of t_{CD} values as well as its average value of 17, the relevant
factor here is c(13.5, 17)≈0.56, which is indeed close to
0.53. The slight variation of f reflects the difference between
<1/t_{CD}> and 1/<t_{CD}>.
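The formula can be evaluated directly. In this sketch `c` is a hypothetical helper name. It assumes that t_CD is uniformly distributed over 10 ≤ t_CD ≤ 24, which the text describes as approximately true. Averaging c(13, t_CD) over this assumed distribution gives practically the same value as c(13.5, 17). This illustrates why a slightly larger effective f appears when only the average t_CD is inserted.

```python
def c(f, t_cd):
    """Expected ratio r_ΔG = 1 / (1 + f / t_cd) between the mean ΔG of the
    prediction period and that of a selection period of t_cd matches."""
    return 1.0 / (1.0 + f / t_cd)


# Assumption: t_CD uniformly distributed over 10..24 (mean 17).
t_values = range(10, 25)
averaged = sum(c(13.0, t) for t in t_values) / len(t_values)

print(f"c(13.5, 17)   = {c(13.5, 17):.3f}")
print(f"<c(13, t_CD)> = {averaged:.3f}")
```

Both numbers come out at about 0.557, i.e. ≈0.56 as quoted above.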

Note that there is no gradual improvement during the m matches after the CD event. First, this result is consistent with the general observation that the team fitness does not change during the season. Second, it also implies that the cases where a caretaker coach is replaced after fewer than m = 10 matches do not yield a further significant positive (or negative) shift.

We have repeated the analysis restricted to the last 23 years of the Bundesliga. Here we find Δ(ΔG) = 0.08±0.06. Within the error bars this result agrees with that of the whole period and is thus again compatible with the nil hypothesis. Hence, there is no significant time dependence in the efficiency of CD events.

Interestingly, the CD teams play worse during the last two matches before the CD
event. One might therefore speculate that the CD at least helps to stop this
emerging negative streak. This hypothesis can be checked by selecting control
teams that also have two poor results at the end of the selection period (see
above for details). The results are shown in

Again no effect of the CD is present.

Furthermore, we checked that the CD events are not related to any effects of the home/away asymmetry. Since we have corrected for this asymmetry, no such effects should be present. Nevertheless, we explicitly checked that, within statistical noise, the numbers of home/away and away/home matches before the CD event are nearly equal, and that the fractions of two consecutive home matches and of two consecutive away matches before the CD event are both below 7%.

The results reported so far concern the average effect of a CD event. In
particular, they are still compatible with the hypothesis that the CD has a
positive effect for some teams and a negative effect for others. This can
be tested by analyzing the variance of the ΔG-values. Results are shown in

In analogy to

In practice one is particularly interested in points P rather than in the goal
difference ΔG. Because of the important implications of our results we have
repeated the same analysis as in

As seen in

Again no effect of the CD is present within statistical errors.

However, comparisons of the results for ΔG and P explicitly show that the
information content of the goal difference is by far superior. As shown in

A further important question concerns the motivation to dismiss a coach.
Naturally, unsatisfactory performance is expected to be the main reason. As
already discussed above, the CD teams play worse during the last two matches before
the CD. Thus, by selecting teams that dismiss their coach after match day t_{CD},
we systematically identify two matches where the teams just had particularly bad
luck. It is therefore consistent to exclude these two matches from the fitness
estimation of a team, because these two data points are biased. As a consequence,
the control teams should on average have the same ΔG for t−t_{CD}<−1.

This argument can be illustrated with a simple example. In the “dice throwing premier league” a coach is dismissed after two consecutive throws of 1. In principle, all teams have identical properties (average fitness 3.5). However, if the 10 matches before a CD event were analyzed exactly in analogy to our procedure, one would find an average fitness of 3.3. The reduction is due to the systematic inclusion of the final two results, both equal to 1, so the fitness estimate is lower than the true fitness of 3.5. Excluding the last two results before the CD from the analysis yields a fitness value of 3.6. Now the value is larger than the true fitness, because in our approach no second (1,1)-pair is allowed to occur during the 10 matches before the CD event. We conclude that a better fitness estimate is obtained if we omit the two matches before the CD event. However, since this estimate is slightly too optimistic, the optimal estimate lies between the two approaches (with and without the final two matches), as the example shows.
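The dice example can be reproduced with a small Monte Carlo sketch. The function `dice_league` is hypothetical and encodes one reading of the example. The fitness estimate averages all t_CD = 17 throws before the dismissal, 17 being the average t_CD. The dismissal is triggered by a (1,1)-pair in the last two throws, and no other (1,1)-pair may be completed during the preceding m = 10 matches. Under these assumptions the two estimates should come out close to the quoted 3.3 and 3.6.

```python
import random


def dice_league(n_samples=20000, t_cd=17, m=10, seed=1):
    """Monte Carlo version of the hypothetical 'dice throwing premier league'.

    Every team is a fair die (true fitness 3.5). A coach is dismissed right
    after two consecutive 1s. We condition on such a dismissal after match
    day t_cd and on no other (1, 1)-pair being completed during the m
    matches before it. Returns the mean fitness estimate computed with and
    without the final two throws.
    """
    rng = random.Random(seed)
    first_day = t_cd - m + 1  # first match day on which no pair may complete
    total_with = total_without = 0.0
    accepted = 0
    while accepted < n_samples:
        # throws are independent, so we fix the triggering pair and draw the rest
        pre = [rng.randint(1, 6) for _ in range(t_cd - 2)]
        seq = pre + [1, 1]
        # match day j (1-based) is seq[j - 1]; reject if a (1, 1)-pair is
        # completed on any day j in [first_day, t_cd - 1]
        if any(seq[j - 2] == 1 and seq[j - 1] == 1
               for j in range(first_day, t_cd)):
            continue
        accepted += 1
        total_with += sum(seq) / t_cd
        total_without += sum(pre) / len(pre)
    return total_with / n_samples, total_without / n_samples


with_last_two, without_last_two = dice_league()
print(f"with the last two throws:    {with_last_two:.2f}")
print(f"without the last two throws: {without_last_two:.2f}")
```

Changing `t_cd` or `m` shows how the size of both biases depends on the window lengths. The first estimate falls below 3.5 because of the two fixed 1s. The second estimate rises above 3.5 because 1s are suppressed in the constrained window.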

Adapting the choice of control teams to this condition (omission of the last two matches), the average value of ΔG in the selection period is −0.431 instead of −0.539. Correspondingly, the optimized set of control teams also plays better in the prediction period (−0.235 instead of −0.287). Thus the effect of the CD becomes slightly negative, Δ(ΔG) = −0.022±0.048, rather than Δ(ΔG) = 0.030±0.046 (as mentioned above). Our finding of a nil effect is thus further corroborated by this self-consistently modified procedure. As discussed in the previous paragraph, for general reasons the “true” value is expected to lie between the original estimate (0.030) and the new estimate (−0.022), and it therefore agrees even better with the nil hypothesis.
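The two estimates can be checked for consistency. This assumes that Δ(ΔG) denotes the difference between the mean ΔG of the CD teams and that of their control teams in the prediction period, with the selection-period means matched by construction. Since only the control group changes, both versions must imply the same prediction-period value for the CD teams, and they do:

```latex
\begin{align}
\overline{\Delta G}^{\,\mathrm{pred}}_{\mathrm{CD}}
  &= \overline{\Delta G}^{\,\mathrm{pred}}_{\mathrm{control}} + \Delta(\Delta G) \\
  &= -0.287 + 0.030 = -0.257
     \quad \text{(controls matched on all } t_{CD} \text{ matches)} \\
  &= -0.235 - 0.022 = -0.257
     \quad \text{(controls matched without the last two matches)}
\end{align}
```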

Beyond this triggering effect, the performance over the whole season is also expected to be unsatisfactory. To quantify this effect, we determine the expected number of points in a season, P(est), as well as the expected goal difference ΔG(est) for all CD teams, using the procedure introduced in the Methods section. The degree of frustration of a team can then be assessed by comparison with the actual outcome. For this comparison we choose the number of points, i.e. P(true) − P(est), since this observable is relevant for managerial decisions. Since the CD does not change the fitness of the team, we can use the outcome of the entire season to obtain optimal statistical accuracy. For an even more specific analysis, we additionally correlate the difference P(true) − P(est) with ΔG(est), the latter representing the fitness of a team. In this way we can distinguish between the motivations for a CD for good and for bad teams.

The results are displayed in

The solid line is the regression line. From this graph the motivation to dismiss a coach can be extracted.

We have repeated the analysis by evaluating the number of points at mid-season, i.e. at the average time of the coach dismissal. The graph looks similar, albeit with slightly smaller values for the number of points (because only half of the season has been played). The interpretation remains exactly the same.

Having found no signature of the in-season CDs, one may wonder whether changing the coach during the summer break, i.e. a CC, influences team performance. This question has two facets. First, independent of the quality of the coach, the mere act of changing the coach may cause a systematic shift in fitness. This shift may be positive (e.g. due to new stimulus in saturated structures) or negative (e.g. due to erosion of well-established team structures). Second, beyond this systematic effect, differences in coach quality might cause some teams to profit from the change and others to suffer from it (relative to the average). Whereas the systematic effect can be studied from the first moment of the corresponding performance distribution, the variance of this distribution contains additional information about the quality variation among coaches, as already discussed in the context of CDs.

Analogously to the CD analysis, we start by correlating P(true) − P(est) with
ΔG(est); see

The solid line is the regression line. From this graph the effect of changing a coach in the summer break can be extracted.

Next, we study the variance of ΔG(true) − ΔG(est) for the CC teams. From here on we restrict ourselves to the distribution of goal differences, because it has superior properties compared to the number of points. For the variance we obtain 0.197±0.026. The statistical error is smaller than in the CD analysis because we include information from a complete season rather than from just 10 matches. To identify the statistical contribution (due to random effects in a soccer match beyond the actual team fitness), we also determine the variance for all teams. We take the same seasons as for the CC teams and also require that the team played in the Bundesliga in the previous season (for the determination of ΔG(est)). Here the variance is 0.212±0.013. The difference of the variances is thus −0.015±0.029, so within the statistical error the variance of the CC teams does not differ from that of all teams. Note that a significant quality variation among the coaches would have resulted in a positive value of this difference. In any event, the hypothesis that all coaches have basically the same or similar quality (or that their quality is irrelevant for team performance), and that a CC has no direct effect, cannot be ruled out on the basis of more than 40 years of Bundesliga data.
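The quoted error of the variance difference is reproduced if the two errors are combined in quadrature. Doing so assumes that they are independent. This is an approximation, since the CC teams are a subset of all teams, and the text does not state how the errors were combined.

```latex
\begin{align}
\mathrm{Var}_{\mathrm{CC}} - \mathrm{Var}_{\mathrm{all}} &= 0.197 - 0.212 = -0.015, \\
\sigma_{\mathrm{diff}} &= \sqrt{0.026^{2} + 0.013^{2}} = \sqrt{0.000845} \approx 0.029.
\end{align}
```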

Taking into account the size of the statistical error, one may estimate the possible relevance of the specific coach for team performance. With an optimistic view, the maximum increase of the variance is −0.015+2×0.029≈0.04. This value has to be compared with the fitness variance of all teams in the Bundesliga, which is 0.27 (see above). Hence, with this optimistic estimate the relative contribution of the coach to the team fitness is 0.04/0.27, i.e. 15%. Most likely, however, this contribution is even smaller. This small value also reflects the fact that the group of coaches considered for hiring in the Bundesliga already fulfills high quality criteria, so that the quality variation within this group is quite small.
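Written out, the optimistic bound adds roughly two standard errors to the measured difference and relates the result to the total fitness variance:

```latex
\begin{align}
\Delta\mathrm{Var}_{\max} &\approx -0.015 + 2 \times 0.029 = 0.043 \approx 0.04, \\
\frac{\Delta\mathrm{Var}_{\max}}{\mathrm{Var}_{\mathrm{fitness}}} &\approx \frac{0.04}{0.27} \approx 0.15.
\end{align}
```

With the unrounded value 0.043 the ratio is about 0.16, so the bound of roughly 15% is only indicative.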

This work can support the results of some previous studies

Changing the coach during the summer break results in the same nil effect. Most
interestingly, even the variance of the corresponding distribution for teams that
change the coach between two seasons does not show any effect. The immediate
consequence is that the impact of coaches as “fitness producers” for the
teams is limited and most likely (on average) much smaller than 15% compared
with other factors (like the team wage bill