A summary of the key findings of the broader literature review is as follows:
- As highlighted earlier in this chapter, several leading authors and bodies have repeatedly called for robust policy evaluation in the field of business support policy. Examples include the OECD (2007), in its “OECD Framework for the Evaluation of SME and Entrepreneurship Policies and Programmes”, and the World Bank (2010), both of which make the case for the evaluation of support schemes.
- One of the key concerns with the evaluations, particularly the few long-term ones, is their use of self-reported impact. Such self-assessment, like other self-reported measures such as a firm’s satisfaction or its perceived difference and additionality of the programme, is likely to be highly subjective and to lead to both over- and underestimation of impact.
- Studies drawing on firms’ performance data, such as sales and employment growth, would be considered more reliable in that respect. Lambrecht and Pirnay (2005), assessing free consultancy days under an SME programme in the Walloon Region of Belgium, highlight the discrepancy between firms’ reported satisfaction (found to be favourable) and measurable performance effects: they find no significant impact of the programme on net job creation, turnover or financial indicators.
- A key constraint on evaluation is data availability and the long-term planning it requires. Data samples tend to be small and to cover specific, quite local geographies, which constrains the ability to generalise findings.
- The increasing availability of micro-data on firms means that only the details of treated firms are required (some controls and characteristics should be available); these can then be linked to performance data without any need to liaise with the firms. Data linking also enables the use of much larger samples.
- The review highlights some studies benefitting from data linking (all, by nature, longitudinal) for Denmark and New Zealand; however, Japan, for example, has also seen the use of data linking for business support evaluation (Motohashi, 2002).
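The data-linking approach described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical programme register that holds nothing but the identifiers of treated firms, and a hypothetical administrative panel of employment figures. All names, years and numbers are invented for illustration and are not drawn from any of the studies cited.

```python
# Hypothetical programme register: only the identifiers of treated firms are needed.
treated_firm_ids = {"F001", "F003"}

# Hypothetical administrative micro-data: (firm_id, year) -> employment.
admin_panel = {
    ("F001", 2005): 10, ("F001", 2008): 14,
    ("F002", 2005): 12, ("F002", 2008): 13,
    ("F003", 2005): 8,  ("F003", 2008): 9,
    ("F004", 2005): 20, ("F004", 2008): 20,
}


def link_performance(treated_ids, panel, start_year, end_year):
    """Attach a treatment flag and an employment change to every firm in the panel."""
    firm_ids = sorted({firm_id for firm_id, _ in panel})
    linked = []
    for firm_id in firm_ids:
        start = panel.get((firm_id, start_year))
        end = panel.get((firm_id, end_year))
        if start is None or end is None:
            continue  # firm not observed in both years; skipped in this sketch
        linked.append({
            "firm_id": firm_id,
            "treated": firm_id in treated_ids,
            "employment_change": end - start,
        })
    return linked


linked_records = link_performance(treated_firm_ids, admin_panel, 2005, 2008)
```

Because the untreated firms in the administrative data are linked in the same pass, the sketch also shows why data linking tends to yield larger samples: potential control firms come from the register rather than from a separate survey.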
There were also a number of observations of interest from identified short-term evaluations (even if the focus was on long-term evaluations for the review):
- Selection issues are mostly considered and treated (the latter not in the case of Lambrecht and Pirnay above). Storey (2000), with the “Six Steps to Heaven” framework, was very influential in pushing for this to become wider practice.
- Selection bias is often addressed through Heckman’s two-stage model, but many studies express an appetite for further methodological advances.
- Just as Heckman’s selection model has become part of the state-of-the-art standard in recent years, methodologies including difference-in-differences models, instrumental variables and propensity score matching for control purposes have all become widespread practice in quantitative business support evaluations.
- The number of interventions an individual firm has received, and its participation in other assistance programmes, is vital information (the attribution problem), but is difficult if not impossible to capture. The longer the timespan an evaluation covers, the more concern this causes. There are examples of studies, such as for New Zealand (MED 2009), where firm participation in a considerable number of schemes, additional to the one evaluated, was captured and controlled for.
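Of the methods listed above, the difference-in-differences logic is the simplest to illustrate. The following is a minimal sketch only, assuming a hypothetical outcome (employment) observed once before and once after an intervention for treated and untreated firms. The function name, field names and figures are invented for illustration, and the sketch does not reproduce the specification of any study cited here.

```python
from statistics import mean


def difference_in_differences(records):
    """Mean change among treated firms minus mean change among control firms.

    Each record is a dict with 'treated' (bool), 'before' and 'after' (numbers).
    """
    treated_changes = [r["after"] - r["before"] for r in records if r["treated"]]
    control_changes = [r["after"] - r["before"] for r in records if not r["treated"]]
    if not treated_changes or not control_changes:
        raise ValueError("need at least one treated and one control firm")
    return mean(treated_changes) - mean(control_changes)


# Hypothetical employment figures, invented for illustration.
firms = [
    {"treated": True, "before": 10, "after": 14},
    {"treated": True, "before": 8, "after": 10},
    {"treated": False, "before": 12, "after": 13},
    {"treated": False, "before": 20, "after": 21},
]

did_estimate = difference_in_differences(firms)
```

The estimate can only be read as a programme effect if treated and control firms would otherwise have followed the same trend. It also does nothing about the attribution problem noted above: participation in other schemes would need to be captured and controlled for separately.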
Figure 1 – Other issues with robust “sixth step” evaluations
Issues - other than time - with robust “sixth step” evaluations
Other problems with evaluation remain. A Step 6 evaluation under Storey’s Six Steps framework, with measurements taken at one or, preferably, multiple carefully justified points in time (assuming that is possible at all), will still be only the technically most robust evaluation: it can provide no better insights than what was measured. It is reasonable to assume that not all effects of policy are measurable.
Concerns also remain about whether the defined objectives appropriately reflect the expectations of both policy initiators and evaluators. Greene and Storey (2007) criticise what they refer to as the treatment of enterprise programmes as “black boxes”: in essence, an evaluation construct in which inputs such as the firm or individual are compared to outputs (growth, survival, etc.). This is done to estimate the additionality of the intervention in question, but it fails to account for the context in which the evaluation took place. Greene and Storey’s (2007) concern is that the expectations of those evaluating and those evaluated may diverge. On the one hand, evaluators’ objectives may be anything from attempting an objective and robust impact evaluation of the support provided to using the evaluation as a mere medium for establishing closer bonds with the evaluated. On the other hand, the evaluated may not provide objective inputs for fear of the consequences (e.g. if the evaluation’s results are feared to lead to the termination of the assessed support programme).
In addition, the financiers of an evaluation may have a vested interest in its outcome. Evaluations are often paid for by those who initiated a particular programme, and it is reasonable to assume that in many (political) circumstances the funders hope for a particular positive or negative outcome (which may also explain their interest in the evaluation being undertaken at all). Further, funders may not have articulated all programme objectives, so that these are neither measured nor assessed. Greene and Storey (2007) give the example of funders who say they are interested in programme outcomes, but do not mention that they were hoping to improve the managerial processes used by the programme.
In consequence, funders may then ignore the evaluation findings, claiming they are irrelevant or, worse, not delivering the “correct” message (van der Meer, 1999; Weiss, 1999; in Greene and Storey, 2007).
A further concern with programme evaluation, however robust, is its failure to address the relative merit of one programme over another. Whilst a specific intervention may yield results that are favourable relative to the counterfactual, that result cannot serve as confirmation that the programme is an appropriate (or indeed the most appropriate) intervention.