Advantages and Disadvantages of Biofuels


The aim of the work here is to find a model that correlates the various interactions between solute and solvent with the known solute descriptors. The general solvation equation (3.7) accounts for the various processes and interactions between solute and solvent that are possible when a gaseous solute dissolves.

Once the descriptors needed for the LSER equation have been found, a method must be set up to generate the coefficients in the equation; the method used is multiple linear regression analysis (MLRA). This is a common statistical technique and an extension of simple linear correlation, in which a series of dependent y values may be linearly related to an independent variable x.

For simplicity, consider a relationship between two variables. If the plot of y against x is linear, a straight line can be drawn through the points, and its equation can be written as

$$y = mx + c \tag{3.20}$$

where m is the slope of the line, c is the intercept on the y axis, and x is the explanatory variable used to determine the dependent value y. If the points in the plot are scattered, the best straight line may not be obvious, and whichever line is chosen will affect the predicted y values. In such cases the method of least squares is often used to decide which line to choose. It takes into account all the deviations between the observed and estimated values of the variable for a given line, squares them and adds them up. The least-squares criterion is that the best line is the one with the smallest sum of squared deviations. This is not difficult when only one variable is present, but complications arise as the number of variables increases, as in the case of the general solvation equation. Applying the least-squares method to many variables would be extremely laborious without the aid of a computer, but nowadays that is not a problem. The technique assumes that any errors lie in the y values. (This may not be entirely true, as in our case, where the explanatory variables are mostly […] one set of regressions is compared with another, any errors in the explanatory variables would in effect cancel out.)
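The least-squares criterion for a single explanatory variable can be sketched in Python as follows. This is a minimal illustration, not the procedure used in this work: the data points are made up, and the helper names `least_squares_line` and `sum_squared_deviations` are hypothetical. It uses the closed-form solution m = Sxy/Sxx and c = ȳ - m·x̄, which minimises the sum of squared deviations measured in y only, matching the assumption that the errors lie in the y values.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for y = m*x + c minimising the sum of squared
    vertical deviations (errors assumed to be in y only)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross-products and squares about the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c


def sum_squared_deviations(xs, ys, m, c):
    """Sum of squared deviations between observed y and the line's estimate."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Illustrative, made-up scattered points
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c = least_squares_line(xs, ys)  # m ≈ 1.99, c ≈ 0.05 for these points
```

Any other line, for example one with a slightly different slope or intercept, gives a larger value of `sum_squared_deviations` for the same points, which is exactly the least-squares criterion.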

Correlation measures the association between the variables, but it is regression that uses the explanatory variables to help explain the variation in the dependent variable. Regression thereby estimates the parameters of the model, provides a test of the model's validity, and allows confidence limits on the parameters to be calculated.
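With several explanatory variables, regression estimates one coefficient per variable plus a constant. The sketch below assumes NumPy and uses `numpy.linalg.lstsq`; the function name `fit_mlr`, the five descriptor columns and the coefficient values are all hypothetical, chosen only to show that least squares recovers the parameters of a model whose true coefficients are known. It is not the software used in the original work.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimates for y = c + b1*x1 + ... + bv*xv.

    X: (n, v) array of explanatory variables; y: (n,) dependent values.
    Returns the coefficient vector [c, b1, ..., bv].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones lets the regression estimate the constant c
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic check: y is built from known (hypothetical) coefficients plus
# small random noise, and the regression should recover them closely.
rng = np.random.default_rng(0)
n, v = 40, 5
X = rng.uniform(0.0, 2.0, size=(n, v))                 # hypothetical descriptor values
true_coef = np.array([-0.5, 0.8, 1.2, 3.5, 0.3, 2.0])  # [c, b1..b5], hypothetical
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 0.05, size=n)
coef = fit_mlr(X, y)
```

Without the noise term the fit reproduces `true_coef` to rounding error; with noise, the estimates scatter around the true values, which is why confidence limits on the parameters are needed.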

Correlation alone cannot measure how successful the relationship between the variables is, so other statistical measures are used: the standard deviation of the estimate, sd; the correlation coefficient, r; and the F statistic. The standard deviation is the square root of the sum of squares of the deviations of individual results from the mean, divided by one less than the number of results in the set:

$$sd = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \tag{3.21}$$
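Equation (3.21) translates directly into code. A minimal Python sketch, using made-up data and a hypothetical function name `standard_deviation`, cross-checked against `statistics.stdev` from the standard library, which uses the same n - 1 divisor:

```python
import math
import statistics


def standard_deviation(values):
    """Sample standard deviation per equation (3.21): square root of the sum
    of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up results
sd = standard_deviation(data)
check = statistics.stdev(data)  # standard-library equivalent, for comparison
```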

The standard deviation has the same units as the property being measured. It becomes a more reliable expression of precision as n gets larger; sd is therefore a means of assessing the reliability of the results, and is also used in judging the significance of deviant points. The correlation coefficient measures how successfully the dependent variable y is correlated against the independent variable x. Here we are interested in the regression as an explanation of the results in relation to the numerical range of the data sets, so the correlation obtained should take the standard deviation, sd, into account. These considerations are embodied in the equation for the correlation coefficient, r:

$$r = \left[1 - \frac{sd^2\,(n - 2)}{\sigma_y^2\, n}\right]^{1/2} \tag{3.22}$$

where $\sigma_y^2 = \sum (y - \bar{y})^2 / n$ and $\bar{y}$ is the mean, $\sum y / n$. The quantity $\sigma_y^2$ is the variance of the sample values of y. The equation shows that as $sd \to 0$, $r \to 1$, i.e. the correlation approaches perfection. Although r is often quoted, it is $r^2$ that is meaningful, because it gives the fraction of the variance of y that is ‘explained’ by the regression equation; it is more convenient to express it as a percentage. Thus when r = 0.90, the regression equation explains about 80% (0.90² = 0.81) of the variance. It is very important to consider the correlation coefficient in relation to the number of data points correlated. The correlation coefficient alone gives no statistically significant evidence of an association between x and y, i.e. it does not answer the question “could the observed relationship reasonably have occurred by chance alone?”. Tests can be applied to investigate the significance of the coefficient, and the test chosen depends on the assumption made about the distribution of errors. Student’s t-test assumes a normal distribution of errors. The test is carried out at a chosen confidence level, usually 95%, or 99% when a more stringent test is required; this sets the limits of the range of values accepted at that confidence level.
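The relationship between sd and r in equation (3.22), the percentage of variance explained, and a t-test on r can be sketched together in Python. This rests on two stated assumptions. First, sd is taken here as the standard deviation of the estimate, i.e. of the residuals about the fitted line with n - 2 degrees of freedom; under that reading equation (3.22) reduces to the familiar r² = 1 - (residual sum of squares)/(total sum of squares). Second, the t statistic t = r·√(n - 2)/√(1 - r²) is the standard textbook test of r against zero, swapped in for illustration since the text does not give a formula. All function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return m, y_mean - m * x_mean


def correlation_eq_3_22(xs, ys):
    """r from equation (3.22), assuming sd is the standard deviation of the
    residuals about the fitted line (n - 2 degrees of freedom).
    Equation (3.22) yields |r|; the sign is taken from the slope."""
    n = len(xs)
    m, c = fit_line(xs, ys)
    y_mean = sum(ys) / n
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    sd = math.sqrt(ss_res / (n - 2))
    var_y = sum((y - y_mean) ** 2 for y in ys) / n  # sigma_y^2, divisor n
    # max() guards against a tiny negative value from floating-point rounding
    r = math.sqrt(max(0.0, 1.0 - sd ** 2 * (n - 2) / (var_y * n)))
    return math.copysign(r, m)


def percent_variance_explained(r):
    """Fraction of the variance of y explained by the regression, as a percentage."""
    return 100.0 * r ** 2


def t_statistic_for_r(r, n):
    """t for testing r against zero with n - 2 degrees of freedom.
    Compare |t| with the tabulated two-tailed value at the chosen confidence
    level (e.g. 95%). Undefined when |r| = 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation_eq_3_22(xs, ys)
```

For r = 0.90, `percent_variance_explained` gives 81%, the "about 80%" quoted above. The t function also shows why n matters: the same r gives a larger t, and hence stronger evidence against chance, as the number of data points grows.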

In multiple linear regression analysis, the t-test is performed on the coefficient of each individual variable to test its significance: not all variables are always necessary, and a variable whose coefficient is not significant may be removed. Another significance test used in MLRA, however, is the F statistic, or Fisher statistic. This test accounts for the number of variables, v, and the number of data points, n. The F value gives an indication of the quality of the regression: the higher the value of F, the better the regression.

$$F = \frac{r^2\,(n - v - 1)}{v\,(1 - r^2)} \tag{3.23}$$

Here r is the correlation coefficient, n is the number of data points and v is the number of explanatory variables in the regression, so that n - v - 1 is the number of residual degrees of freedom. From the equation it can be seen that the main factors contributing to an improvement of the regression are n and r, because F increases as either of them increases; adding variables, by contrast, lowers F unless it raises r sufficiently.
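The F statistic can be computed from r, n and v after a least-squares fit. The sketch below assumes NumPy and uses the standard form of the regression F statistic, F = r²(n - v - 1)/[v(1 - r²)], where v counts the explanatory variables and a constant term is included in the fit. The function name `regression_f_statistic` and the synthetic data are hypothetical; the form is cross-checked against the equivalent analysis-of-variance ratio (regression mean square over residual mean square).

```python
import numpy as np


def regression_f_statistic(X, y):
    """Fit y = c + b1*x1 + ... + bv*xv by least squares and return (r, F),
    with F = r^2 (n - v - 1) / [v (1 - r^2)]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, v = X.shape
    design = np.column_stack([np.ones(n), X])  # constant term included
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    f = r2 * (n - v - 1) / (v * (1.0 - r2))
    return float(np.sqrt(r2)), f


# Synthetic, hypothetical data: 30 points, 3 explanatory variables
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(30, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(0.0, 0.3, size=30)
r, F = regression_f_statistic(X, y)
```

Because the total sum of squares splits into regression and residual parts when a constant is fitted, this expression is identical to (SS_reg / v) / (SS_res / (n - v - 1)), the usual Fisher ratio.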

In document Estimación de la Huella Hídrica de los cultivos de palma africana y maíz duro en la provincia de Los Ríos y caña de azúcar en la provincia del Guayas para la producción de biocombustibles (pages 58-64)