
In document La Congregación de la Granada (pages 40-43)

JOAQUÍN DE FIORE, AMADEO DE SILVA AND FRANCISCO DE PAULA

2.2. The three aetates and the three status

Two methods are used to describe the data. First, the usual descriptive statistics, such as the mean, median and standard deviation, as well as the 25th and 75th percentiles, are reported. These are supplemented by decision and classification trees to uncover which firms are more likely to pay lower BR or receive higher SBRR.

7.1.1 ARDx Data with no Modifications

The first part of this section focuses on the data before any major modifications, summarised in Table 7:1 (p. 189). It is worth noting that the data had already been merged and limited to one local unit. As discussed in the Research Design Chapter, this made the datasets manageable and allowed the analysis to be executed with fewer assumptions and imputations of the critical variables (including BR), which were not available at the local unit level.

7.1.1.1 Descriptive Statistics

According to Table 7:1 (p. 189), 530,923 unique companies with 6,230,916 observations were identified. On average, firms had approximately 14 employees and a turnover of £2,624,310. However, the standard deviations and percentiles suggested that multiple outliers were present. For instance, the average BR expense was £35,709, but the 75th percentile was only £12,863, meaning that three-quarters of businesses paid £12,863 or less.
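The gap between the mean (£35,709) and the 75th percentile (£12,863) is typical of a right-skewed distribution: a few very large bills pull the mean above the bill paid by three-quarters of firms. The sketch below reproduces this pattern with ten made-up bills (hypothetical values, not the ARDx data), using Python's standard `statistics` module.

```python
import statistics

# Hypothetical BR bills in £ for ten firms (illustrative, NOT the ARDx data):
# most firms pay modest amounts, a few large ratepayers pay a lot.
bills = [2_000, 3_500, 5_000, 6_000, 8_000, 9_500, 12_000, 15_000, 250_000, 900_000]

mean_bill = statistics.mean(bills)
median_bill = statistics.median(bills)
# quantiles(n=4) returns the three quartile cut points: [Q1, Q2, Q3].
q1, q2, q3 = statistics.quantiles(bills, n=4)

# Share of firms paying less than the mean bill.
share_below_mean = sum(b < mean_bill for b in bills) / len(bills)

print(f"mean={mean_bill:,.0f} median={median_bill:,.0f} Q3={q3:,.0f}")
print(f"{share_below_mean:.0%} of firms pay less than the mean")
# mean=121,100 median=8,750 Q3=73,750
# 80% of firms pay less than the mean
```

Because the median and the percentiles are insensitive to the few extreme bills, they describe the typical firm better than the mean in data like these.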

The radar chart in Figure 7:1 (p. 188) summarises the pattern of missing (not available, NA) values in the data. The administrative data and the variables derived from them (Herfindahl Index, PS, PD, employment and turnover) seem to have few or no missing values. Some variables, however, have a very low proportion of non-missing values. For instance, the variable indicating whether a company died during the period has around 0% of observations filled. This is expected, because each company has many observations and only the last one should record its death. However, just 0.16% of observations report foreign direct investment; this may be plausible, but the variable cannot be included in the analysis because there are too few non-missing values.
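Figure 7:1 plots, for each variable, the percentage of missing values. A minimal sketch of that calculation, assuming records are held as dictionaries with `None` marking an NA; the variable names and values are hypothetical and only mimic the pattern described above (a rarely filled death marker and FDI variable).

```python
# Hypothetical records; None marks a not-available value (NA).
records = [
    {"employment": 12, "turnover": 950, "death": None, "fdi": None},
    {"employment": 3, "turnover": 120, "death": None, "fdi": None},
    {"employment": 40, "turnover": 5100, "death": 1, "fdi": None},
    {"employment": 7, "turnover": None, "death": None, "fdi": 250},
]


def missing_share(rows, variable):
    """Percentage of rows where `variable` is missing (0% = fully observed)."""
    missing = sum(row.get(variable) is None for row in rows)
    return 100 * missing / len(rows)


shares = {var: missing_share(records, var) for var in records[0]}
for var, pct in shares.items():
    print(f"{var:<10} {pct:5.1f}% missing")
```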


Initially, the exact inputs to the production process were considered for the productivity analysis. However, coverage was insufficient because only some participants were asked to complete the more extensive survey containing those variables.

This extensive survey collected details of firms' spending. For instance, just ~0.34% of the whole sample answered the question about water expenditure. Coverage was therefore insufficient for productivity estimation with the exact inputs.

Figure 7:1 Missing values within the data. 0% indicates that a variable has no missing values and 100% that all of its values are missing.


Variable   N           Mean    SD       25th pct.   Median   75th pct.
Sector     6,230,916   See Figure 7:3
Region     6,230,916   See Figure 7:3
Capex      4,252,279   0.432   28.363   0           0        0.023

Table 7:1 Descriptive statistics of the data after minor cleaning; for continuous variables, the count (N), mean, median, standard deviation (SD), and 25th and 75th percentiles are provided.

7.1.1.2 Decision trees - turnover and employment

Poor targeting has often been mentioned as a drawback of SBRR and of BR in general. The focus of this section is therefore on employment, turnover, sector, region and labour productivity.

Figure 10:4 and Figure 10:5 (Appendix 10.4.1, pp. 276-278) broadly illustrate how these variables influenced the amount of BR paid by firms. They show complex interactions.

The root node shows that size was the most important factor among size, sector, year and labour productivity: large and medium-sized firms (in terms of turnover and employment) were separated from micro and small firms. Larger firms operating in the catering, production, retail and property sectors were often associated with higher reliefs than firms operating in other sectors.

Much more complex groupings were evident for small and micro firms. Small firms in London were associated with higher BR bills, and those with lower labour productivity seemed to receive higher bills than those with higher labour productivity. However, the highest BR bills were received by micro firms in the construction and other services sectors based in London; among these firms, higher labour productivity was related to higher bills.
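The root split described above can be illustrated with a single regression split, often called a decision stump: among all thresholds on one variable, choose the one that most reduces the sum of squared errors (SSE) of the outcome, with a minimum number of firms per node. This is a simplified sketch on hypothetical data, not the software used in the thesis; a full tree repeats the search over every variable and within each resulting node, and may add stopping rules (such as significance tests) that are not implemented here.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Best single threshold on x for predicting y (a 'decision stump').

    Observations with x <= threshold go to the left node. Each node must
    contain at least `min_leaf` observations. Returns
    (threshold, left_mean, right_mean), or None if no split reduces the SSE.
    """
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    pairs = sorted(zip(x, y))
    best, best_sse = None, sse(list(y))  # a split must beat the unsplit node
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_sse = total
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical firms: employment and annual BR bill (£ thousands).
employment = [1, 2, 3, 4, 60, 80, 120, 200]
bills = [1.0, 1.5, 1.0, 2.0, 20.0, 25.0, 30.0, 28.0]
print(best_split(employment, bills, min_leaf=2))  # (32.0, 1.375, 25.75)
```

Raising `min_leaf` plays the same role as a minimum terminal node size: with `min_leaf=5` there are too few firms for any split, and the function returns `None`.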

To supplement this tree, a random forest of 500 trees with no restrictions was estimated, which ranked the variables by importance in the following order: turnover, sector, employment, region, labour productivity and year. The model suggested that these variables explain ~9% of the variance.
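The ~9% figure can be read as one minus the ratio of the model's mean squared error to the variance of the outcome. A minimal sketch of that statistic follows; packages such as R's `randomForest` compute it from out-of-bag predictions, which this sketch does not attempt. The bills and predictions are hypothetical.

```python
def variance_explained(y_true, y_pred):
    """Share of the variance of y explained by the predictions: 1 - MSE / Var(y).

    Var(y) uses the population denominator n, so predicting the mean of y
    for every observation gives exactly 0.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    var_y = sum((y - mean_y) ** 2 for y in y_true) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    return 1 - mse / var_y


# Hypothetical BR bills and model predictions (not thesis output).
bills = [10, 20, 30, 40]
print(variance_explained(bills, [12, 18, 33, 37]))  # approximately 0.948
print(variance_explained(bills, [25, 25, 25, 25]))  # 0.0: no better than the mean
```

A value of about 0.09 therefore means the forest's predictions reduce the squared error by only about 9% compared with assigning every firm the average bill.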

7.1.2 After Imputation and Matching

After imputation, the summary statistics were generally lower. According to Figure 7:2 (p. 192), all key variables had a lower mean except investment, whose mean was 1% higher than before imputation (£5,130 to £5,220). This increase may reflect the time trend, because investment in the most recent year (2015) was missing. Lower means seem reasonable given the sampling procedure: smaller firms, which probably had lower values, were not always surveyed and were therefore more likely to have values imputed. The overall dispersion (SD) did not change much with imputation. It fell for GVA and rent (by 19% and 10%, respectively) but rose for materials and investment (by 28% and 1%, respectively). The quantiles further indicated that the imputed values were not extreme. However, for variables with many missing values, the quantiles were far lower after imputation. For instance, 53,724 values were missing for rent, and after imputation the 75th percentile fell by 72%. This seems reasonable, since many businesses owned their premises and would therefore not pay any rent, so more zeros should appear.
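Why imputing many zeros lowers the upper quantiles can be checked with a toy example. The sketch assumes, purely for illustration, that as many non-respondents as respondents are owner-occupiers assigned zero rent; the rents are hypothetical and this is not the imputation model used in the thesis.

```python
import statistics

# Hypothetical annual rents (£ thousands) for the firms that reported rent.
observed_rent = [5, 8, 12, 15, 20, 25, 30, 40]
# Hypothetical imputation: an equal number of non-respondents own their
# premises and are assigned zero rent.
imputed_rent = observed_rent + [0] * len(observed_rent)

# quantiles(n=4)[2] is the 75th percentile (third quartile).
q3_before = statistics.quantiles(observed_rent, n=4)[2]
q3_after = statistics.quantiles(imputed_rent, n=4)[2]
reduction = 1 - q3_after / q3_before
print(f"75th percentile: {q3_before} -> {q3_after} ({reduction:.0%} lower)")
```

The observed rents are unchanged; the 75th percentile falls only because the added zeros shift the position of the upper quartile within the larger sample.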

Maximum and minimum values are not reported because of the disclosure controls, but they are similar before and after the imputation.

546 firms that received SBRR at least twice (with some exceptions, as defined in the Research Design Chapter, Section 6.2.2.6.5.3.1) were matched to 546 other firms that had not received the relief but had similar characteristics in the year before the relief became available. Figure 7:3 compares the distribution by region and sector before and after matching and imputation. 3% of cases (or 1,092 firms) were present in the final matched sample. Scotland had proportionately 5% fewer observations than before matching, London and the South East 2% fewer, and the South West 1% fewer. Meanwhile, Wales (4%) and Yorkshire and Humberside (3%) had the largest proportionate increases, followed by the West Midlands (2%) and the East Midlands (1%). With regard to sectors, the largest number of firms was matched in the production sector, resulting in a proportionate increase of 29%. There was also some increase in wholesale (5%) and a marginal proportionate increase in construction (1%). This resulted in substantial decreases in the retail (9%) and other services (20%) sectors and a marginal decrease in the property (3%) sector.
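The matching procedure itself is defined in the Research Design Chapter. Purely to illustrate the idea of one-to-one matching without replacement, the sketch below assumes each firm is summarised by a single score measured in the year before the relief (for example, a propensity score; this is an assumption) and pairs every treated firm with the closest unused control. Firm identifiers, scores and the function name are hypothetical.

```python
def match_one_to_one(treated, controls):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a firm id to its pre-relief score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Process treated firms in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items()):
        if not available:
            break
        # Closest unused control; ties are broken by control id.
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        pairs.append((t_id, c_id))
        del available[c_id]  # without replacement: each control used once
    return pairs


treated = {"T1": 0.30, "T2": 0.55}
controls = {"C1": 0.10, "C2": 0.32, "C3": 0.50, "C4": 0.90}
pairs = match_one_to_one(treated, controls)
print(pairs)  # [('T1', 'C2'), ('T2', 'C3')]
```

With 546 treated firms each paired to one control, the matched sample contains 2 × 546 = 1,092 firms. A greedy match depends on the order in which treated firms are processed; optimal matching avoids this at a higher computational cost.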

Table 7:2 (p. 192) compares the unmatched, uncleaned data with the matched sample after all corrections and imputation. The sample leans towards larger firms, with mean employment and turnover about six times higher. Companies that reported larger investment, GVA and intermediate consumption were more likely to be included in this sample owing to how the surveys were conducted: all larger firms were surveyed, but smaller firms were only sampled, so they might not have the two or more observations needed both to derive the SBRR variable and to estimate the change. It is worth noting that other variables, such as investment and intermediate consumption, also had higher values.
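A quick check of the size shift, using the turnover means reported in Table 7:2 for the unmatched data and for the matched sample after imputation (both in the table's units):

```python
# Mean turnover from Table 7:2 (same units in both rows).
turnover_before = 2_624.31  # unmatched, uncleaned data
turnover_after = 14_539.24  # matched sample after imputation

turnover_ratio = turnover_after / turnover_before
print(f"Mean turnover is {turnover_ratio:.1f} times higher in the matched sample")
```

The ratio is about 5.5, of the same order as the roughly six-fold increase described above.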

Figure 7:2 (p. 192) illustrates how the values were imputed for the most complicated case, GVA, and is consistent with the descriptive statistics. The clear majority of the imputed data lies around the middle of the range, and only relatively few unusual values were imputed. These unusual values were usually close to other observed values.


Variable      Stage                   N           Mean         SD
…             After imputation        15,043      81.99        166.07
Turnover      Before                  6,225,925   2,624.31     174,971.50
              After 1 to 1 matching   15,952      14,767.65    54,046.51
              After imputation        15,043      14,539.24    53,594.72
Materials     Before                  341,127     3.00         443.32
              After 1 to 1 matching   4,673       7,453.21     25,590.58
              After imputation        15,047      6,133.42     26,554.34
GVA (basic)   Before                  528,763     2,282.48     134,916.80
              After 1 to 1 matching   6,268       342,588.30   46,263.02
              After imputation        15,047      340,632.60   43,151.42
Investment    Before                  4,252,279   0.43         28.36
              After 1 to 1 matching   14,025      1.97         12.72
              After imputation        15,047      2.11         21.36
Capital       Before                  4,114,287   1.74         130.29
              After 1 to 1 matching   14,025      10.40        46.25
              After imputation        15,047      9.32         126.76

Table 7:2 Key variables before matching and imputation, after matching and after imputation.

Figure 7:2 GVA real values (green) combined with imputed values (red); the scale had to be removed so that the firms could not be identified.[40]

[40] Several restrictions were imposed by the data owner. See Section 6.2.1 for more information.


Figure 7:3 Distribution of the data with regard to regions and sectors before matching (upper graphs) and after matching and imputation (lower graphs). Motor trades are excluded from the graphics due to security restrictions; see the Research Design Chapter (Section 6.1.3) for more information.

7.1.2.1 Decision trees - SBRR

Figure 10:6 and Figure 10:7 (Appendix 10.4.2, pp. 278-279), and the extract of these in Figure 7:4, show a tree for SBRR instead of BR, because matching and cleaning the data made it possible to estimate SBRR. Instead of explaining the amount of BR paid by firms, the aim was to predict the amount of SBRR from firms' characteristics. It is worth noting that the matched sample disproportionately boosted the number of firms receiving the relief. Additionally, the illustration was limited to splits significant at the 5% level and to terminal nodes of at least 100 cases. These limits were not imposed in the random forest estimations. To estimate the prediction error of the random forests, the dataset was split into a training part (75% of firms) and a test part (25% of firms). Classification errors of 17% and 36% were estimated on the training and test datasets, respectively, and the random forests achieved an R² of 35%. The output of the random forests is provided in Appendix 10.4.8.
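Splitting by firm rather than by observation keeps all years of a given firm in the same part, so the test error reflects firms the model has not seen. A minimal sketch, assuming each observation carries a firm identifier; the function names, seed and data are hypothetical. The helper `classification_error` shows how error rates such as the 17% and 36% above are defined (the share of misclassified observations).

```python
import random


def split_by_firm(firm_ids, train_share=0.75, seed=42):
    """Return (train_idx, test_idx) so that every firm is entirely in one part."""
    firms = sorted(set(firm_ids))
    rng = random.Random(seed)
    rng.shuffle(firms)
    n_train = round(train_share * len(firms))
    train_firms = set(firms[:n_train])
    train_idx = [i for i, f in enumerate(firm_ids) if f in train_firms]
    test_idx = [i for i, f in enumerate(firm_ids) if f not in train_firms]
    return train_idx, test_idx


def classification_error(y_true, y_pred):
    """Share of observations whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical panel: one entry per firm-year observation.
obs_firms = ["A", "A", "B", "B", "C", "D", "D", "D"]
train_idx, test_idx = split_by_firm(obs_firms)
```

A test error well above the training error, as reported here, is commonly seen when a forest fits the training firms more closely than new ones.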

Figure 7:4 Aggregated SBRR tree (n refers to the group's size). Size classes follow the EU definition given in Table 1:1 (Introduction and Background Chapter). The full version with size, year, region and labour productivity variables is available in Appendix 10.4.2.
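The caption refers to the EU size definition, and the text classifies firms separately by employment and by turnover. A sketch of these two one-dimensional classifications, assuming the headcount and turnover ceilings of Commission Recommendation 2003/361/EC in euros; Table 1:1 is not reproduced here and may state the thresholds differently (for example, in pounds), and the balance-sheet alternative is omitted.

```python
def size_by_employment(employees):
    """EU size class from headcount alone (micro < 10, small < 50, medium < 250)."""
    if employees < 10:
        return "micro"
    if employees < 50:
        return "small"
    if employees < 250:
        return "medium"
    return "large"


def size_by_turnover(turnover_eur_m):
    """EU size class from annual turnover alone, in millions of euros
    (ceilings: micro <= 2, small <= 10, medium <= 50)."""
    if turnover_eur_m <= 2:
        return "micro"
    if turnover_eur_m <= 10:
        return "small"
    if turnover_eur_m <= 50:
        return "medium"
    return "large"


# A firm can be "micro" by one measure and larger by the other.
print(size_by_employment(8), size_by_turnover(12.0))  # micro medium
```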

Unlike in the BR case, both the random forests and the decision trees found the year variable to be the most important. The random forests ranked the remaining variables by importance as follows: labour productivity, turnover, sector, region and employment. Here, both sector and region were found to be more influential than employment.
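The text does not state which importance measure was used. One common choice for random forests is permutation importance: shuffle one variable, re-predict, and measure how much the error increases. The hand-written sketch below (not scikit-learn's function of the same name) uses a hypothetical model that depends only on its first feature, so shuffling the second feature leaves the error unchanged.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, seed=0):
    """Increase in MSE after shuffling one feature column of X.

    predict: function mapping a list of feature rows (lists) to predictions.
    A larger increase means the model relies more on that variable.
    """
    baseline = mse(y, predict(X))
    shuffled = [row[column] for row in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [row[:column] + [value] + row[column + 1:] for row, value in zip(X, shuffled)]
    return mse(y, predict(X_perm)) - baseline


# Hypothetical fitted model that uses only feature 0 (e.g. a year index).
def predict(rows):
    return [2 * row[0] for row in rows]


X = [[i, i % 3] for i in range(20)]
y = [2 * i for i in range(20)]
print(permutation_importance(predict, X, y, column=0))  # positive
print(permutation_importance(predict, X, y, column=1))  # 0.0: feature unused
```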

More specifically, the outputs revealed mistargeting of SBRR. This is evident in Figure 7:4, where not only micro but also medium and large firms received substantial reliefs of up to 100%, and this misallocation increased particularly after 2009.


With regard to the timescale, the decision tree in Figure 10:7 (p. 279) showed that none of the firms received SBRR during 2000-2004. Between 2005 and 2009, the highest relief (~30%) was received by micro (by turnover) firms in the construction, retail and other services sectors in the North West, East of England and South West. Further branches for these sectors show that in Yorkshire, the East and West Midlands, London and the South East, large, medium or small firms were more likely to receive larger relief than micro firms (by employment).

Somewhat similar patterns, but with higher SBRR, were evident in the 2010-2015 data.

Larger reliefs were likely to be given to firms with a micro turnover. On average, firms with lower productivity were also likely to receive larger reliefs. However, 109 micro (by employment) firms in Yorkshire and London in the catering, production and retail sectors received lower SBRR than larger firms in the same circumstances.

However, micro firms by both turnover and employment received higher reliefs than other firms, although SBRR seemed to be highly dependent on region. For instance, micro firms in the North East, North West, Yorkshire, East Midlands and London received lower reliefs than those in other areas.

Regional effects were present but depended on other variables. This is understandable given regional differences in factors such as competition or the supply of premises.

Between 2005 and 2010, micro firms operating in the construction, retail, production and wholesale sectors received relief of 4% on average in London and the North East, just 1% in Wales and Scotland, and ~2% in other regions. The construction, retail and other service sectors deviated more substantially, with an average relief of 33% in the North West, East of England and South West, 8% in Wales, Scotland and the North East, and 26% in other regions (in the latter, only for firms with 10 or fewer employees). More recently (2010-2015), micro firms in construction, property and other services in the North East, North West, Yorkshire & Humberside, East Midlands and London received 58% SBRR, whilst similar firms in other regions received reliefs of ~36% on average. On the other hand, in the catering, production, retail and wholesale sectors, micro firms in Scotland, the North West, West Midlands, South East and South West reported reliefs of 26%, while similar firms in other regions reported 51% on average.

