The Beal model is a variation of the Linear Dynamical System or Kalman filter,
which in turn is a subclass of Dynamic Bayesian Networks (DBN) used to model
time series data. DBNs use probabilities given a sequence of observed variables to
calculate the relationships between them. For the purposes of explanation, in this
example the observed variables can represent a sequence of expression
measurements for a single gene at a set of given time points on a microarray. At each
time point there are factors that can affect the measured expression levels of the gene
on the microarray such as poor RNA extraction or low mRNA levels. These factors
cannot be quantified directly and are therefore “hidden” from the user and can be
referred to as the hidden state. At each time point the hidden state variables impact
upon the observed expression values, therefore when modelling regulatory
relationships between genes, the hidden variables must be taken into account. The
Kalman filter captures this process of change in the hidden state from time point to
time point, which in turn impacts upon the observed expression values at each time
point (Kalman 1960). The basic Kalman filter model is as follows:
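$$x_t = A x_{t-1} + w_{\cdot}, \qquad w_{\cdot} \sim \mathrm{Gaussian}(0, Q)$$
$$y_t = C x_t + v_{\cdot}, \qquad v_{\cdot} \sim \mathrm{Gaussian}(0, R)$$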
where x represents a k-vector of hidden state variables that cannot be observed
directly but impact upon y, a p-vector of observed variables that can be measured. A
is a (k x k) transition matrix, which captures the process of change in the state of
the hidden variables over time, and C is a (p x k) observation matrix, which maps the
hidden state onto the observed variables at each time point. w and v represent the
state noise and the observed variable noise respectively, and Q and R are the
covariance matrices associated with them. The noise represents imperfections in the
data (in this case from a microarray) caused by random factors such as temperature,
vibrations or even dust specks on the laser. These noise variables are assumed to
occur randomly, irrespective of any time index, which is why w and v are followed by
a “.” to emphasise their independence from any time step. It is important to take
this noise into account when estimating the hidden state and observed variables, as
it invariably has an impact. Both noise terms are considered Gaussian; that is to
say, normally distributed with a mean of zero.
Because the model captures the process of change in the hidden state (and its
subsequent impact on the observed values) over a single time step, it is defined as
a 1st order Markov model. This is because it has a memory of 1: the probabilities of
the possible values of the next hidden state depend only on the values of the
current state.
In practical terms the model answers the following question:
“Given a set of observed variables and parameters what can be said of the hidden
state at time point t?” (Roweis and Ghahramani 1999).
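The question above is what a Kalman filter step answers numerically. As a minimal sketch, assuming a single hidden variable and a single observed variable (k = p = 1) so that A, C, Q and R reduce to the scalars a, c, q and r, the standard predict and update recursion could be written as follows. The function names and the starting values x0 and p0 are illustrative assumptions and are not taken from the Beal model.

```python
def kalman_step(x_est, p_est, y, a, c, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous estimate of the hidden state and its variance.
    y: the new observed value. Assumes c * p_pred * c + r > 0.
    """
    # Predict: propagate the previous estimate through the transition a.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: correct the prediction using the new observation y.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


def kalman_filter(ys, a, c, q, r, x0=0.0, p0=1.0):
    """Estimate the hidden state at every time point of the series ys."""
    x_est, p_est = x0, p0
    estimates = []
    for y in ys:
        x_est, p_est = kalman_step(x_est, p_est, y, a, c, q, r)
        estimates.append(x_est)
    return estimates
```

Calling `kalman_filter` on a list of expression measurements for one gene returns one hidden state estimate per time point; each estimate depends on the previous one only, which is the 1st order Markov property in practice.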
In the context of this chapter, the main aim is to determine the regulatory
relationships between genes over time. It is therefore important that the effect of
both the hidden and observed gene expression levels at each time point is not only
captured but also incorporated into the values at the next time point.
The Beal model incorporates the principles of the Kalman filter whilst extending it to
include matrix B, which captures the influence of the observed variables from a
previous time point on the current hidden state and matrix D, which captures the
influence of the observed variables from a previous time point on the current
observed variables. By including these matrices Beal is able to model the influence
of previous observed measurements back on to the current hidden state and observed
measurements. The Beal model is thus defined below:
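$$x_t = A x_{t-1} + B y_{t-1} + w_{\cdot}, \qquad w_{\cdot} \sim \mathrm{Gaussian}(0, Q)$$
$$y_t = C x_t + D y_{t-1} + v_{\cdot}, \qquad v_{\cdot} \sim \mathrm{Gaussian}(0, R)$$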
where x represents a set of hidden state variables and y represents a set of observed
variables. Matrix A is the transition matrix capturing the change in the hidden
state variables from one time point to the next. Matrix B represents the effect of
observed values from a previous time point on the current hidden state. C is the
observation matrix, which maps the current hidden state onto the observed variables,
and D is a matrix containing the effect of the previous observed values on the
current observed values. w and v symbolise the state and observed variable noise
respectively. By incorporating the
influence of observed variables from a previous time point into the values of the
current time point, the model acts as a feedback loop whereby the outputted observed
expression levels and hidden state variables at time t-1 are used as input values to
help determine the hidden state and explain the gene expression values at t. Whilst
the Beal model is still a 1st order Markov model the inclusion of matrices B and D
has allowed the capture of gene relationships which are higher than 1st order.
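The feedback loop described above can be illustrated with a short simulation. This is a sketch only: it assumes the update order x_t from (x_{t-1}, y_{t-1}) followed by y_t from (x_t, y_{t-1}), uses plain Python lists for the matrices, and draws independent noise with a single standard deviation per equation (that is, Q and R proportional to the identity), which is a simplifying assumption. All function and parameter names are hypothetical.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def beal_step(A, B, C, D, x_prev, y_prev, w, v):
    """One time step: previous observations feed back through B and D."""
    x = vadd(matvec(A, x_prev), matvec(B, y_prev), w)
    y = vadd(matvec(C, x), matvec(D, y_prev), v)
    return x, y


def simulate(A, B, C, D, x0, y0, steps, state_sd=0.0, obs_sd=0.0, seed=0):
    """Generate observed vectors y_0 .. y_steps (isotropic noise assumed)."""
    rng = random.Random(seed)
    x, y = list(x0), list(y0)
    ys = [y]
    for _ in range(steps):
        w = [rng.gauss(0.0, state_sd) for _ in x]
        v = [rng.gauss(0.0, obs_sd) for _ in y]
        x, y = beal_step(A, B, C, D, x, y, w, v)
        ys.append(y)
    return ys


# Hypothetical example: one hidden variable (k = 1), two genes (p = 2).
A = [[0.5]]
B = [[1.0, 0.0]]
C = [[1.0], [2.0]]
D = [[0.0, 0.0], [0.5, 0.0]]
trajectory = simulate(A, B, C, D, x0=[2.0], y0=[1.0, 4.0], steps=3)
```

With both noise levels left at zero the trajectory is deterministic, which makes it easy to check by hand how each matrix contributes to the next set of expression values.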
Until now the example has centred upon modelling the expression values of a
single gene over a given set of time points; however, the model is designed to
estimate the influences of a set of genes on one another over a given set of time
points. In this case all matrices (A, B, C and D) would contain values for a set of
genes at time t. The model can then be used to characterise both direct gene-gene
regulatory relationships and those that occur through the hidden state. For example,
to observe the direct effect of gene a at t-1 on gene b at t, one must look at matrix
element [D]ba. Thus to capture all the effects of the hidden and observed states of
one gene on another over a single time step, the matrices must be combined. A
function of the model to do this is shown below:
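$$y_t = C A x_{t-1} + (CB + D)\, y_{t-1} + C w_{\cdot} + v_{\cdot}$$
where y_t represents the observed expression levels at time t and CB + D combines
the influence of the hidden state on gene expression values (C), the effect of gene
levels from a previous time point on the current hidden state (B) and the effect of
gene levels from a previous time point on the current gene levels (D).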
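To make the combination of matrices concrete, the following sketch computes CB + D for small hypothetical matrices and reads off a single gene-gene effect. It assumes the convention used for [D]ba above, so that the entry in row b and column a is the effect of gene a at t-1 on gene b at t. The matrix values and function names are illustrative assumptions.

```python
def matmul(X, Y):
    """Matrix product of X and Y, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def matadd(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def combined_influence(B, C, D):
    """Return CB + D, the total one-step effect of y at t-1 on y at t."""
    return matadd(matmul(C, B), D)


# Hypothetical example: one hidden variable (k = 1), two genes (p = 2).
B = [[1.0, 0.0]]               # effect of previous gene levels on the hidden state
C = [[1.0], [2.0]]             # effect of the hidden state on gene levels
D = [[0.0, 0.0], [0.5, 0.0]]   # direct effect of previous gene levels

cbd = combined_influence(B, C, D)
# Effect of gene a (index 0) at t-1 on gene b (index 1) at t:
# 0.5 directly through D plus 2.0 * 1.0 through the hidden state.
effect_a_on_b = cbd[1][0]
```

In this example the direct effect in D alone would understate the influence of gene a on gene b, because most of it travels through the hidden state.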
Standard score (Z score) values from this matrix (CBDZ) will be used to determine
which genes show “significant” regulatory relationships. These are defined as genes
whose relationship scores differ from zero, the mean of the default normal
distribution. Values at or close to zero indicate that the genes involved do not have any
regulatory influence on one another. For genes to be considered as having regulatory
influences on one another, either directly or indirectly, their CBDZ values must be at
least 1.69 standard deviations away from the mean of zero, which equates to around
a 90% confidence that the values are significantly different.
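The thresholding rule can be sketched as follows. How the Z scores themselves are obtained from the fitted model is not covered here, so the example simply takes a precomputed matrix of scores; the values shown are hypothetical, and the row b, column a convention from [D]ba is assumed. The helper `two_sided_confidence` uses the standard normal distribution to show what coverage a given cut-off corresponds to.

```python
import math


def two_sided_confidence(z):
    """Probability that a standard normal value lies within +/- |z|."""
    return math.erf(abs(z) / math.sqrt(2.0))


def significant_pairs(z_scores, threshold=1.69):
    """List (regulator a, target b, z) for entries at least `threshold` from zero."""
    pairs = []
    for b, row in enumerate(z_scores):
        for a, z in enumerate(row):
            if abs(z) >= threshold:
                pairs.append((a, b, z))
    return pairs


# Hypothetical CBDZ scores for two genes.
z_scores = [[0.10, -2.30],
            [1.69, 0.40]]

hits = significant_pairs(z_scores)
coverage = two_sided_confidence(1.69)  # about 0.909, i.e. roughly 90%
```

Both positive and negative scores are retained, since a strongly negative score indicates a relationship just as much as a strongly positive one.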