Since this is a differential of force per unit length, the term actually refers to the weight of the circular cross-section of the tubular profile
+ CORROSION + MARINE GROWTH
6.7. Ultimate load analysis
6.7.1. Applied load cases
5. Data and Methods of Development
Over the last few decades, national and international agencies have produced large amounts of statistical data and composite indicators to measure progress and competitiveness across various domains. Even though these raw data allow for high-level evaluations, the data reported for comparison between countries are usually not systematically harmonised, which makes evaluations challenging and can hinder the usefulness of the available data. Hence, the data used for any study have to go through rigorous procedures before the development of an SCI. Such procedures are known as data treatment techniques: the data from different sources, angles and perspectives have to be carefully collected, studied, transformed, scaled and treated for anomalies, outliers or missing values, and then summarised for easy calculation, evaluation and/or visualisation. Skipping these important steps can lead to the production of “information poor” and “positively biased” indicators, which in turn can lead to indices that confuse, mislead and overwhelm public officials and citizens. The following sections explain the methods used in this thesis to construct the proposed UKCI. Unless stated otherwise, the notation adopted throughout this chapter is as follows.
$X_{ic}^{t}$: value of variable $i$ for country $c$ at time $t$, with $i = 1, \ldots, M$ and $c = 1, \ldots, N$
$X_{ic}^{\prime\,t}$: normalised value of indicator $i$
$w_{ri}$: weight associated with sub-indicator $i$, with $r = 1, \ldots, R$
$SCI_{c}^{t}$: value of the composite indicator for country $c$ at time $t$.
5.4.1 Multivariate Analysis
The goal of multivariate analysis is to investigate the inherent structure of the indicator set, to reveal how different variables change in relation to each other and how they are associated.
The first step in building an SCI is to decide whether the structure of the SCI is thoroughly described and whether the variables are suitable for measuring the phenomenon under investigation. This can be decided based on experts’ opinion or on the statistical structure of the dataset, so as to provide a sound and defensible dataset. For example, PCA or a measure of internal consistency (reliability) such as Cronbach’s Alpha can be used to investigate whether the chosen variables are statistically well balanced to form the desired composite indicator. If they are not, amending the set of variables might be considered (OECD, 2008a).
Principal Components Analysis
Principal Components Analysis (PCA) is a multivariate, input-reduction method. The goal of PCA is to reveal how different variables are associated and how they change in relation to each other (endogenous vs exogenous variables). PCA is useful when we have two or more variables and believe that there is some redundancy among them. In this case, redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, it should be reasonable to reduce the observed variables to a smaller number of principal components (“artificial variables”) without much loss of information. PCA reduces dimensionality by explaining the variance of the observed data through a few linear combinations of the original data (Jolliffe, 2002).
The objective of PCA is to take $m$ variables $x_1, x_2, \ldots, x_m$ and find linear combinations of these that produce uncorrelated principal components $P_1, P_2, \ldots, P_m$, as follows:
$$
\begin{aligned}
P_1 &= w_{11}x_1 + w_{12}x_2 + \cdots + w_{1m}x_m \\
P_2 &= w_{21}x_1 + w_{22}x_2 + \cdots + w_{2m}x_m \\
&\;\;\vdots \\
P_i &= w_{i1}x_1 + w_{i2}x_2 + \cdots + w_{ij}x_j + \cdots + w_{im}x_m \\
&\;\;\vdots \\
P_m &= w_{m1}x_1 + w_{m2}x_2 + \cdots + w_{mm}x_m
\end{aligned}
$$
The weights $w_{ij}$ (also called component or factor loadings) applied to the variables $x_j$ are chosen so that the principal components $P_i$ satisfy the following conditions:
- they are uncorrelated (orthogonal);
- the first principal component accounts for the maximum possible proportion of the variance of the set of x variables, the second accounts for the maximum of the remaining variance, and so on, until the last principal component absorbs all the variance not accounted for by the preceding components.
In brief, PCA involves finding the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, m$, of the sample covariance matrix $V$:
$$
V = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1m} \\
v_{21} & v_{22} & \cdots & v_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
v_{m1} & v_{m2} & \cdots & v_{mm}
\end{pmatrix} \tag{5.1}
$$
where the diagonal element $v_{ii}$ is the variance of $x_i$ and $v_{ij}$ is the covariance of variables $x_i$ and $x_j$. The sum of the diagonal values equals the sum of the eigenvalues of $V$. There are $m$ eigenvalues, which are the variances of the principal components. This means that the total variance of the principal components equals the total variance of the original variables:
$$
\lambda_1 + \lambda_2 + \cdots + \lambda_m = v_{11} + v_{22} + \cdots + v_{mm}. \tag{5.2}
$$
To prevent any single variable from dominating the others in the principal components, it is suggested to first normalise the variables using standardisation (z-score normalisation). As a result, all variables in the dataset have a mean of zero and a variance of one (Sarle, 1994). PCA was employed in this study to serve three purposes: first, to test whether the variables could be reduced; second, to reduce the number of indicators to a smaller subset; and third, to assess the possibility of filtering out the trivial components before the data are used. The trivial components usually act as noise and could stand in the way of obtaining sound and meaningful clustering results.
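The PCA steps described in this section (z-score standardisation, the sample covariance matrix, and its eigenvalues) can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `pca_eigen` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def pca_eigen(data):
    """Return (eigenvalues, loadings, V) for z-scored data.

    data: 2-D array-like, rows = observations (e.g. countries),
          columns = variables (indicators).
    """
    x = np.asarray(data, dtype=float)
    # z-score standardisation: zero mean and unit sample variance per column
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Sample covariance matrix V (m x m) of the standardised variables
    v = np.cov(z, rowvar=False)
    # V is symmetric, so eigh is appropriate; it returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(v)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    # Column k of the returned loadings holds the weights of component k
    return eigvals[order], eigvecs[:, order], v

# Hypothetical toy data: 5 observations of 3 indicators
toy = [[2.0, 4.1, 1.0],
       [3.0, 6.2, 0.5],
       [4.0, 7.9, 1.5],
       [5.0, 10.1, 0.7],
       [6.0, 12.0, 1.2]]

eigvals, loadings, v = pca_eigen(toy)
explained = eigvals / eigvals.sum()  # share of total variance per component
print(explained)
```

Because the variables are standardised, the diagonal of V consists of ones, so the eigenvalues sum to the number of variables, in line with the identity that the eigenvalues and the diagonal of V have the same total. Components with a very small share of explained variance are candidates for the "trivial" components mentioned above.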
5.4.2 Data Normalisations
Normalisation is usually used to transform different measurement units into the same unit, so that they form clearly comparable elements, and to avoid problems in mixing measurement units (e.g. money, talent, skills) (Freudenberg, 2003). For such cases it is recommended to use standardisation (z-score normalisation). As a result, all variables in the dataset have a mean of zero and a standard deviation of one (Sarle, 1994). However, given that the selected indicators use different score scales and units in the collected dataset, the data require transformation or adjustment to become comparable and to convert the different ranges of the indicators into a unified range. In this case the issue is therefore not only the use of different measurement units, but also the difference in score ranges. To unify the score ranges across the selected indicators, it is suggested to use the Min-Max normalisation technique, which takes all the different score ranges in the dataset and transforms each score to a value between 0 and 1, where the lowest (min) value is set to 0 and the highest (max) value is set to 1. This normalisation method can be expressed as follows:
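$$
x'_i = \frac{x_i - \min_A}{\max_A - \min_A} \times (\text{new}\,\max_A - \text{new}\,\min_A) + \text{new}\,\min_A \tag{5.3}
$$
where $x'_i$ is the normalised score, $x_i$ is the actual score, $\min_A$ and $\max_A$ are the minimum and maximum values of the score range within index $A$, and $\text{new}\,\min_A$ and $\text{new}\,\max_A$ are the bounds of the target range (here 0 and 1).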
In cases where a high value implies an inferior result (e.g., ICT price, corruption, tariff rate, unemployment), this study resorts to a normalisation formula that, in addition to converting the series into the [0, 1] range, inverts it, so that 0 implies the poorest and 1 the best possible performance:
$$
x'_i = \frac{x_i - \min_A}{\max_A - \min_A} \times (\text{new}\,\min_A - \text{new}\,\max_A) + \text{new}\,\max_A \tag{5.4}
$$
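The two Min-Max formulas can be sketched in a few lines of Python. This is an illustration only: the function name `min_max` and the sample scores are hypothetical, and the sketch assumes that the minimum and maximum are taken from the observed scores rather than from a theoretical scale range.

```python
def min_max(values, new_min=0.0, new_max=1.0, invert=False):
    """Rescale scores to [new_min, new_max].

    invert=False follows Equation (5.3); invert=True follows Equation (5.4),
    so that the highest raw score maps to new_min.
    Assumption: min and max are the observed extremes of `values`.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all scores are equal; the range is zero")
    if invert:
        # Swap the target bounds: high raw values become low normalised values
        return [(x - lo) / (hi - lo) * (new_min - new_max) + new_max
                for x in values]
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min
            for x in values]

# Hypothetical scores for three countries
scores = [2, 4, 6]
print(min_max(scores))               # higher is better
print(min_max(scores, invert=True))  # higher is worse, e.g. unemployment
```

With the default target range of 0 to 1, the inverted result is simply one minus the non-inverted result, which is a quick consistency check on the two formulas.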
5.4.3 Measures of Correlation and Association
To investigate the relation between numerical variables, the observed data can be tested using correlation and contingency analysis. These tests indicate whether the relation between variables is strong enough for the results to be considered significant. Two measures are summarised in the following sub-sections.
5.4.3.1 Pearson Correlation Coefficient
The Pearson correlation coefficient, $r$, can be used to test relations between different variables. It is calculated by dividing the covariance of two variables $x_1$ and $x_2$ by the product of their standard deviations:
$$
r_{x_1 x_2} = \frac{\mathrm{Cov}_{x_1 x_2}}{\sigma_{x_1} \times \sigma_{x_2}} \tag{5.5}
$$
$$
\mathrm{Cov}_{x_1 x_2} = \frac{\sum (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{n - 1} \tag{5.6}
$$
where $\sigma_{x_1}$ and $\sigma_{x_2}$ are the standard deviations of $x_1$ and $x_2$, $\bar{x}_1$ and $\bar{x}_2$ are their sample means, and $n$ is the number of observations.
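As an illustration of Equations (5.5) and (5.6), the following minimal sketch computes the sample covariance and the Pearson coefficient using only the Python standard library. The function names and the sample series are hypothetical.

```python
from math import sqrt

def covariance(x1, x2):
    """Sample covariance with the n - 1 denominator, as in Equation (5.6)."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two series of equal length with n >= 2")
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    return sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)

def pearson_r(x1, x2):
    """Pearson correlation coefficient, as in Equation (5.5)."""
    # The sample standard deviation is the square root of the sample variance,
    # and the variance is the covariance of a series with itself
    sd1 = sqrt(covariance(x1, x1))
    sd2 = sqrt(covariance(x2, x2))
    return covariance(x1, x2) / (sd1 * sd2)

# Hypothetical indicator values for four countries
a = [1, 2, 3, 4]
b = [2, 4, 6, 8]   # perfectly proportional to a
c = [8, 6, 4, 2]   # perfectly inversely related to a
print(pearson_r(a, b), pearson_r(a, c))
```

A value close to 1 or -1 indicates a strong linear relation, while a value near 0 indicates little or no linear relation.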
5.4.3.2 Chi-Square Based Measures
One way to determine whether there is a statistical relationship between two variables is to use the chi-square ($\chi^2$) test for independence. A cross-classification table is used to obtain the expected number of cases under the assumption of no relationship between the two variables. The value of the chi-square statistic then provides a test of whether or not there is a statistical relationship between the variables in the cross-classification matrix (Mantel, 1963). The procedure is summarised by the following formula:
$$
\chi^2 = \sum \frac{(O - E)^2}{E} \tag{5.7}
$$
where $O$ is the observed frequency and $E$ is the expected frequency. The expected frequency can be calculated using the following equation:
$$
E = \frac{\sum \text{Row} \times \sum \text{Column}}{\sum \text{Overall}} \tag{5.8}
$$
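Equations (5.7) and (5.8) can be sketched as follows for a cross-classification table given as a list of rows. The function name `chi_square` and the example tables are hypothetical; the sketch returns only the statistic and leaves the comparison against a critical value to the reader.

```python
def chi_square(table):
    """Chi-square statistic for a cross-classification table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Equation (5.8): expected count under no relationship
            expected = row_totals[i] * col_totals[j] / overall
            # Equation (5.7): accumulate (O - E)^2 / E over all cells
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2 x 2 tables of observed frequencies
related = [[10, 20],
           [20, 10]]
unrelated = [[10, 20],
             [20, 40]]   # rows are proportional, so O equals E in every cell
print(chi_square(related), chi_square(unrelated))
```

The statistic is zero when the observed counts match the expected counts exactly and grows as the table departs from independence.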
5.4.4 Cluster Analysis
Cluster Analysis (CA) is the process of finding similarities between homogeneous characteristics found in a dataset. Similar or distinct data points can then be mapped together based on the distance between them, where a large distance indicates a weak cluster and a small distance indicates strong similarity. The goal of CA is to decrease the dimensionality of a dataset by surfacing unseen similarities and dissimilarities. Cluster Analysis is useful in that regard and is utilised in different sections of this study. Distance measures include Euclidean (geometric vector space) and non-Euclidean measures; the most common are Euclidean because they perform well in multi-dimensional space. The distance between data points reflects the detected similarity or dissimilarity. For example, the distance between two points $(X_1, X_2)$ over $N_d$ dimensions can be calculated using the Euclidean Distance (ED) formula:
$$
D(X_1, X_2) = \sqrt{\frac{\sum_{i=1}^{N_d} (X_{1i} - X_{2i})^2}{N_d}} \tag{5.9}
$$
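A minimal sketch of Equation (5.9) is given below. It assumes that the sum of squared differences is divided by the number of dimensions before the square root is taken, which is how the equation is read here; the function name and the sample points are hypothetical.

```python
from math import sqrt

def euclidean_distance(x1, x2):
    """Distance between two points, averaged over the N_d dimensions.

    Assumption: the squared differences are divided by N_d under the root,
    following the reading of Equation (5.9) adopted above.
    """
    if len(x1) != len(x2) or not x1:
        raise ValueError("points must be non-empty and of equal dimension")
    nd = len(x1)
    return sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)) / nd)

# Hypothetical points in a two-dimensional indicator space
p = [0.0, 0.0]
q = [3.0, 4.0]
print(euclidean_distance(p, q))
print(euclidean_distance(p, p))  # identical points are at distance zero
```

Dividing by the number of dimensions keeps distances comparable when points are described by different numbers of indicators; without that division the expression reduces to the ordinary Euclidean distance.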