7. CONCLUSIONS
7.3. PARTIAL CONCLUSIONS REGARDING THE SIMULATION PROCESS
7.3.1. RecurDyn Conclusions
As already mentioned, the analysis of Functional Linear Regression was carried out using the Aemet weather data only. Datasets from other fields of research with intrinsic features similar to those of the weather data should be investigated, so as to involve more scientists in Functional Data Analysis (FDA). The R package fda by Ramsay et al. (2009) has been available for a long time and is the most popular package for FDA. Unfortunately, the package offers limited support for analyses involving: other types of basis functions (e.g. Gaussian basis); other model evaluation methods (e.g. the Penalized Maximum Likelihood method); and other model selection criteria (e.g. GIC, mAIC, GBIC). More packages need to be released to fill that gap.
When it comes to Functional Linear Regression models, one of the main assumptions is that the same chosen basis function is used to smooth the predictor and the covariates. This is not always a correct assumption, because different variables exhibit different stochastic paths. One of the challenges of violating that assumption is the computation of the matrices J_φψ, where φ(t) could be a Fourier basis function and ψ(t) a Gaussian basis function.
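To make the mixed-basis computation concrete, the following is a minimal sketch in base R. It assumes that J_φψ denotes the matrix of inner products ∫ φ_j(t) ψ_k(t) dt over the observation interval, and it approximates those integrals with the trapezoidal rule. The helper names fourier_basis and gaussian_basis, the interval [0, 1], the centres and the width are illustrative choices, not taken from the dissertation.

```r
# Fourier basis on [a, b]: a constant, then sine/cosine pairs (unnormalised).
# Hypothetical helper, written by hand so the sketch needs no extra package.
fourier_basis <- function(t, n_pairs, a, b) {
  omega <- 2 * pi / (b - a)
  cols <- list(rep(1, length(t)))
  for (k in seq_len(n_pairs)) {
    cols[[length(cols) + 1]] <- sin(k * omega * (t - a))
    cols[[length(cols) + 1]] <- cos(k * omega * (t - a))
  }
  do.call(cbind, cols)
}

# Gaussian basis with a common width (hypothetical helper).
gaussian_basis <- function(t, centers, width) {
  exp(-0.5 * outer(t, centers, "-")^2 / width^2)
}

# Fine grid and trapezoidal weights on [0, 1].
grid <- seq(0, 1, length.out = 2001)
step <- grid[2] - grid[1]
wts <- rep(step, length(grid))
wts[c(1, length(grid))] <- step / 2

Phi <- fourier_basis(grid, n_pairs = 2, a = 0, b = 1)             # 2001 x 5
Psi <- gaussian_basis(grid, seq(0, 1, length.out = 4), 0.2)       # 2001 x 4

# Assumed definition: J[j, k] approximates the integral of phi_j(t) * psi_k(t).
J <- t(Phi * wts) %*% Psi
dim(J)
```

With two different bases the matrix is rectangular in general (here 5 x 4), and a closed form is rarely available, which is why a quadrature rule is used in this sketch.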
Thus, there is still a vast unexplored area of research in Functional Data Analysis in general, and in Functional Regression Modeling specifically. This dissertation provides a first step in that direction.
R-Functions
In order to make the R code more readable, most of the repeated operations have been wrapped into functions that are used throughout the dissertation.
A.1 Matrices of Basis Functions and Model Selection
Gaussian_bsplines    Gaussian Basis functions with B-Splines
Description
This function is used to compute a matrix of Gaussian Basis functions with B-Splines. Its arguments are:
• tt is the vector of values {t1, . . . , tJ} ∈ T ;
• m is the number of basis functions applied to the function.
R-Code
Gaussian_bsplines = function(tt, m) {
  # Length of the observation interval
  range <- diff(range(tt))
  # Equally spaced knots, extended by three knot spacings on each side
  kn <- seq(min(tt) - (range/(m-3))*3, max(tt) + (range/(m-3))*3, by = range/(m-3))
  # Centres of the m Gaussian basis functions
  myu <- kn[3:(m+2)]
  # Width parameter; all entries are equal because the knots are equally spaced
  h <- diff(kn, lag = 2)/3
  B <- matrix(0, length(tt), (m))
  for (j in 1:m) {
    B[, j] <- exp(-0.5*(tt - myu[j])^2/(h[1]^2))
  }
  return(B)
}
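The loop over columns can also be written with outer(). The sketch below is an illustrative equivalent under the same knot layout (equally spaced knots, centres at knots 3 to m + 2, common width equal to two knot spacings divided by three). The name gaussian_equal_spaced and the example grid are hypothetical. Like the function above, it needs m > 3, because the knot spacing divides by m - 3.

```r
# Vectorised sketch of an equally spaced Gaussian basis (hypothetical name).
gaussian_equal_spaced <- function(tt, m) {
  d <- diff(range(tt)) / (m - 3)                   # knot spacing
  kn <- seq(min(tt) - 3 * d, max(tt) + 3 * d, by = d)
  centers <- kn[3:(m + 2)]                         # m centres
  width <- 2 * d / 3                               # same value as diff(kn, lag = 2)/3
  # outer() builds the length(tt) x m matrix of differences tt[i] - centers[j]
  exp(-0.5 * outer(tt, centers, "-")^2 / width^2)
}

tt <- seq(0, 1, length.out = 101)                  # illustrative grid
B <- gaussian_equal_spaced(tt, m = 7)
dim(B)
```

Each column is one Gaussian bump, so every entry lies in (0, 1] and reaches 1 only where a grid point coincides with a centre.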
Gaussian_kmeans    Gaussian Basis functions with K-Means
Description
This function is used to compute a matrix of Gaussian Basis functions using K-means. Its arguments are:
• tt is used to specify the vector of values {t1, . . . , tJ} ∈ T ;
• m is used to specify the number of basis functions applied to the function;
• nyu is used to specify the hyperparameter.
The clustering method used is the one developed by Hartigan and Wong (1979).
R-Code
Gaussian_kmeans = function(tt, m, nyu) {
  # Cluster the evaluation points into m groups
  k <- kmeans(tt, centers = m, algorithm = "Hartigan-Wong")
  # Cluster centres become the centres of the basis functions
  myu <- as.vector(k$centers)
  # Within-cluster sum of squares divided by cluster size, one width per cluster
  h <- k$withinss/k$size
  B <- matrix(0, length(tt), (m))
  for (j in 1:m) {
    B[, j] <- exp(-0.5*(tt - myu[j])^2/(h[j]*nyu))
  }
  return(B)
}
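As a usage-style illustration of the quantities involved, the sketch below reproduces the centres and widths step by step on an artificial grid. The grid, m = 5 and nyu = 2 are arbitrary choices made for the example. Because k-means starts from random centres, a seed is fixed here, and the clusters are sorted by centre only to make the columns easier to read; neither step is part of the function above.

```r
set.seed(1)                                   # k-means uses random starting centres
tt <- seq(0, 1, length.out = 101)             # illustrative evaluation points
m <- 5
nyu <- 2

k <- kmeans(tt, centers = m, algorithm = "Hartigan-Wong")

# kmeans() returns clusters in arbitrary order; sort by centre for readability.
ord <- order(k$centers)
centers <- as.vector(k$centers)[ord]
h <- (k$withinss / k$size)[ord]               # within-cluster mean squared deviation

# One column per cluster: exp(-0.5 * (t - centre_j)^2 / (h_j * nyu))
sq_dist <- outer(tt, centers, "-")^2
B <- exp(-0.5 * sweep(sq_dist, 2, h * nyu, "/"))
dim(B)
```

Larger values of nyu widen every basis function by the same factor. A cluster containing a single point would have a width of zero and make the corresponding column degenerate, so m should stay well below the number of distinct points.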
Bsplines_FDA    B-Splines Basis functions
Description
This function is used to generate a matrix of B-Splines Basis functions. It uses the R-package fda. Its arguments are:
• tt is used to specify the vector of values {t1, . . . , tJ} ∈ T ;
• m is used to specify the number of basis functions applied to the function;
• norder is used to specify the order of the B-Splines.
R-Code
Bsplines_FDA <- function(tt, m, norder = 4) {
  require(fda)
  # B-spline basis with m functions of the given order on the range of tt
  basis = create.bspline.basis(rangeval = range(tt), nbasis = m, norder = norder)
  # Evaluate every basis function at the points tt
  B <- eval.basis(evalarg = tt, basisobj = basis)
  return(B)
}
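How m and norder together determine a knot sequence can be sketched without fda, using splineDesign() from the splines package that ships with R. The sketch assumes equally spaced interior breakpoints and boundary knots repeated norder times, which leaves m - norder interior knots. It illustrates the construction only and is not a statement about what fda does internally; the grid and the values of m and norder are arbitrary.

```r
library(splines)

tt <- seq(0, 1, length.out = 101)     # illustrative evaluation points
m <- 8                                # number of basis functions
norder <- 4                           # order 4 = cubic B-splines

# Assumption: equally spaced breakpoints, m - norder of them strictly inside.
n_interior <- m - norder
breaks <- seq(min(tt), max(tt), length.out = n_interior + 2)
interior <- breaks[-c(1, length(breaks))]

# Boundary knots are repeated norder times, so length(knots) = m + norder.
knots <- c(rep(min(tt), norder), interior, rep(max(tt), norder))

B <- splineDesign(knots, tt, ord = norder)
dim(B)
range(rowSums(B))
```

The row sums illustrate the partition-of-unity property of B-splines: at every evaluation point the basis functions add up to one.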
Fourier_FDA    Fourier Basis functions
Description
This function is used to generate a matrix of Fourier Basis functions. It uses the R-package fda. Its arguments are:
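• tt is used to specify the vector of values {t1, . . . , tJ} ∈ T ;
• m is used to specify the number of basis functions; an even value is increased by one, so the number of basis functions used is always odd.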
R-Code
Fourier_FDA <- function(tt, m) {
  require(fda)
  # Use an odd number of basis functions: an even m is increased by one
  if ((m %% 2) == 0) { m <- m + 1 } else { m <- m }
  basis = create.fourier.basis(rangeval = range(tt), nbasis = m)
  # Evaluate every basis function at the points tt
  B <- eval.basis(evalarg = tt, basisobj = basis)
  return(B)
}
Pen_Max_Likelihood    Penalized Maximum Likelihood estimate
Description
This function is used to compute the Penalized Maximum Likelihood estimate. Its arguments are:
• B is used to specify the matrix of basis functions;
• n is used to specify the number of basis functions;
• lambda is used to specify log10(λ);
• y is used for the observed values; the code uses its first row, y[1, ], so y is expected to be a matrix.
R-Code
Pen_Max_Likelihood <- function(B, n, lambda, y) {
  # Second-order difference matrix D, with rows (1, -2, 1), and penalty K = D'D
  D <- matrix(0, (n-2), n)
  D[1, ] <- c(1, -2, 1, rep(0, (n-3)))
  for (i in 1:(n-4)) {
    D[(i+1), ] <- c(rep(0, i), 1, -2, 1, rep(0, (n-3)-i))
  }
  D[(n-2), ] <- c(rep(0, (n-3)), 1, -2, 1)
  K <- t(D) %*% D
  # lambda is given on the log10 scale
  lamda <- 10^(lambda)
  sigma <- 2
  sigma1 <- 1
  # NOTE: train.temp is not an argument; it must exist in the calling
  # environment, and its number of columns acts as the sample size here.
  # Iterate until the variance estimate stabilises
  while ((sigma - sigma1)^2 > 1e-7) {
    Binv <- solve(t(B) %*% B + ncol(train.temp)*(lamda)*(sigma)*K, diag(ncol(K)))
    w <- (Binv) %*% t(B) %*% y[1, ]
    sigma1 <- sigma
    sigma1 <- as.vector(sigma1)
    sigma <- (1/ncol(train.temp))*t(y[1, ] - B %*% w) %*% (y[1, ] - B %*% w)
    sigma <- as.vector(sigma)
  }
  list(lamda = lamda, sigma = sigma, K = K, w = w)
}
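The second-order difference matrix built row by row above can also be obtained with diff(diag(n), differences = 2), and the iteration can be written without reference to objects outside the function. The sketch below is an illustrative variant, not the dissertation's function: pen_ml_sketch, its argument names and the max_iter guard are hypothetical, the sample size is assumed to be length(y), and y is assumed to be a plain numeric vector. The simulated data and the Gaussian basis are likewise only for demonstration.

```r
# Two ways of building the (n - 2) x n second-order difference matrix.
n <- 6
D_loop <- matrix(0, n - 2, n)
for (i in 1:(n - 2)) D_loop[i, i:(i + 2)] <- c(1, -2, 1)
D_diff <- diff(diag(n), differences = 2)
all.equal(D_loop, D_diff)

# Hypothetical self-contained variant of the iterative estimate.
pen_ml_sketch <- function(B, y, log10_lambda, tol = 1e-7, max_iter = 100) {
  N <- length(y)                                   # assumed sample size
  K <- crossprod(diff(diag(ncol(B)), differences = 2))
  lam <- 10^log10_lambda
  sigma2 <- 2
  sigma2_old <- 1
  iter <- 0
  # Alternate between the coefficients w and the variance estimate sigma2.
  while ((sigma2 - sigma2_old)^2 > tol && iter < max_iter) {
    w <- solve(crossprod(B) + N * lam * sigma2 * K, crossprod(B, y))
    sigma2_old <- sigma2
    sigma2 <- sum((y - B %*% w)^2) / N
    iter <- iter + 1
  }
  list(w = as.vector(w), sigma2 = sigma2, K = K, iterations = iter)
}

# Demonstration on simulated data with an equally spaced Gaussian basis.
set.seed(42)
tt <- seq(0, 1, length.out = 60)
centers <- seq(-0.2, 1.2, by = 0.2)                # 8 illustrative centres
B <- exp(-0.5 * outer(tt, centers, "-")^2 / (0.4 / 3)^2)
y <- sin(2 * pi * tt) + rnorm(length(tt), sd = 0.1)

fit <- pen_ml_sketch(B, y, log10_lambda = -4)
length(fit$w)
```

Passing the sample size through y rather than through a global object keeps the function usable on any dataset, and the iteration cap guards against a loop that fails to settle.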