In recent years, many researchers have devoted their work to studying the properties of different genome-scale data, resulting in many methods for reconstructing transcriptional regulatory networks. To understand the essential differences among these methods, it is important to review existing methods according to the data source or sources they use, in order to identify their merits and deficiencies so that improvements can be made systematically.
4.2.2.1 Methods for single data source
Microarray data are perhaps one of the most widely used data sources in this area of research. Much of the effort on reconstructing transcriptional networks has been spent on analysing microarray gene expression data alone [49, 113]. Among the earliest works, [50] is an influential paper based on Bayesian networks for gene network inference from gene expression data, with more recent perspectives in [49]. Further Bayesian approaches to inferring sparse graphical (Gaussian) models [54] were described in [38, 74].
In more recent years, two types of methods, dynamic Bayesian networks [11] and graphical Gaussian models, have accounted for a major part of the research. Dynamic Bayesian networks have been widely used in time-series data analysis to account for system dynamics [11, 65, 154]. For example, a dynamic Bayesian network approach based on a first-order auto-regressive model was applied to gene network reconstruction in [81]. However, inherent problems in dynamic Bayesian networks make them relatively ineffective for large-scale prediction, i.e., when there are many variables. Concern about the inefficiency of dynamic Bayesian networks has inspired a number of variant approaches, e.g. a fast “Bayesian-inspired” algorithm by Opgen-Rhein and Strimmer [102].
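For reference, a first-order auto-regressive model of the kind mentioned above can be written generically as follows. The symbols x_t, A, Sigma and epsilon_t are notation introduced here for illustration, the Gaussian noise term is an assumption, and the exact parameterisation used in [81] may differ.

```latex
\[
  \mathbf{x}_{t} = A\,\mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_{t},
  \qquad
  \boldsymbol{\varepsilon}_{t} \sim \mathcal{N}(\mathbf{0}, \Sigma),
  \qquad
  t = 2, \dots, T
\]
```

Here x_t is the vector of expression levels of the p genes at time point t and A is a p x p coefficient matrix. Under this reading, a non-zero entry a_ij suggests an edge from gene j at time t-1 to gene i at time t. The number of entries in A grows quadratically with the number of genes.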
Graphical Gaussian models are undirected graphical models well known for discriminating between direct and indirect correlation between variables. In essence, partial correlation is used as the mathematical foundation for detecting meaningful interactions. Partial correlation indicates a direct interaction between a pair of variables/genes by eliminating the effects of the remaining variables/genes [113]. Graphical Gaussian models have previously been applied to the reconstruction of gene networks by selecting significant partial correlation coefficients. Significant coefficients are indicative of direct interactions between genes and therefore represent edges in the network. As a breakthrough in solving the small sample problem in gene expression data, Schäfer and Strimmer [113] proposed a shrinkage estimation method for partial correlation and the use of the false discovery rate (FDR) for selecting significant partial correlation coefficients.
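The difference between direct and indirect correlation can be illustrated with a minimal sketch for three variables. The function names and the toy data below are invented for illustration. The sketch uses the plain first-order partial correlation formula, not the shrinkage estimator of [113].

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (invented): z drives both x and y; x and y have no direct link.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]   # z + [1, -1, 1, -1]
y = [2, 0, -2, 0]   # z + [1, -1, -1, 1]

marginal = pearson(x, y)
partial = partial_correlation(x, y, z)
print(f"marginal r(x, y)     = {marginal:.3f}")
print(f"partial  r(x, y | z) = {partial:.3f}")
```

In this toy example x and y are clearly correlated, yet their partial correlation given z is essentially zero, so a graphical Gaussian model would place no edge between them. With only three variables this first-order formula coincides with the full-order partial correlation. With many genes, partial correlations are instead obtained from the inverse of the covariance matrix, which is where the small sample problem arises.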
4.2.2.2 Methods for multiple data sources
However, a single data source is often not sufficient for accurate network modelling [11]. When more than one biological data source is available, integrative analysis is likely to offer significant advantages, and is currently the subject of ongoing research. By integrating multiple data types, one can expect false positives to be reduced and disparities between different levels of the system to be identified. Further, integration helps explain complex biological interactions at a higher level than using a single data source alone [11]. Computational techniques have evolved from the simplest voting model [142] to more sophisticated Naïve Bayesian networks [82, 120], and have progressively developed into substantially more complex and powerful systems [86, 127].
In the integration context, Bayesian methods offer a range of advantages over conventional statistical techniques that make them particularly appropriate for complex and noisy biological data. The Bayesian statistical paradigm is probabilistic in the sense that observations, parameters and hidden variables are treated together in a consistent manner. Consequently, various Bayesian methods for data integration have been explored for the reconstruction of regulatory networks [11, 19, 76, 82, 127, 137, 162].
Among the earliest attempts, [120] set up two probabilistic models, for gene expression and protein-protein interaction data respectively, that can only be solved when unified. Expression data were modelled with Naïve Bayesian networks to define a joint distribution as a product of probabilities of disjoint classes, while protein-protein interaction data were modelled by a binary Markov random field to represent connections between neighbouring variables.
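The idea behind naïve Bayesian integration of several data sources can be sketched as follows. Under the assumption that the sources are conditionally independent, the prior odds of an interaction are multiplied by one likelihood ratio per source. The function name and all numbers are hypothetical. This is a generic illustration, not the specific models of [82] or [120].

```python
import math


def posterior_edge_probability(prior_prob, likelihood_ratios):
    """Naive Bayes combination of independent evidence for one candidate edge.

    prior_prob        -- prior probability that the edge exists (0 < p < 1)
    likelihood_ratios -- one ratio P(evidence | edge) / P(evidence | no edge)
                         per data source
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    # Work in log-odds so that many sources can be combined stably.
    log_odds = math.log(prior_prob / (1.0 - prior_prob))
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values: a rare interaction supported by two data sources.
prior = 0.01
evidence = [12.0, 4.0]   # e.g. expression and binding likelihood ratios
print(posterior_edge_probability(prior, evidence))
```

A ratio above 1 raises the probability of the edge and a ratio below 1 lowers it, so a source that contradicts the others can offset a false positive from a single data type.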
Later, in [51], gene expression data and protein-DNA binding data were jointly considered to infer transcriptional regulatory networks for many selected yeast transcription factors. However, the different data types were not jointly modelled in a coherent framework, and the associated measurement errors were not explicitly considered. A more complicated integration system was presented by Liu et al. [86], in which the data were jointly modelled within a context-specific Bayesian framework for infinite mixture models. In their experiments, the method produced more functionally coherent transcriptional modules than two alternative algorithms, GRAM [5] and SAMBA [128].
Another type of approach uses one data source as prior knowledge to be integrated with another in a Bayesian context. For example, Bernard and Hartemink [11] set up dynamic Bayesian networks for modelling gene expression data, combined with transcription factor binding data as prior knowledge and the edge distribution assumption made in [119]. They improved the method in [61] by suggesting a new prior and by using dynamic Bayesian networks instead of Bayesian networks, so that the network can include cyclic structures. However, the experiment to validate this method was performed on a set of 25 genes with gene expression data consisting of 69 time points, which is far fewer genes than is usually required for network reconstruction nowadays. Sun et al. [127] treated transcription activity, as represented by expression, as a result of transcription factor binding. If the binding data show evidence of a regulatory relationship, then the relative binding intensity is used in modelling the expression of the target gene.
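The prior-knowledge strategy described above can be summarised generically with Bayes' rule. The notation G (network structure), D (expression data), e_ij (edge indicator) and pi_ij (prior edge probability derived from binding evidence) is introduced here for illustration. The edge-wise independent prior is an assumption of this sketch and not necessarily the prior used in [11] or [119].

```latex
\[
  P(G \mid D) \;\propto\; P(D \mid G)\, P(G),
  \qquad
  P(G) \;=\; \prod_{(i,j)} \pi_{ij}^{\,e_{ij}} \,\bigl(1 - \pi_{ij}\bigr)^{1 - e_{ij}}
\]
```

The likelihood P(D | G) is computed from the expression data, while the binding data enter only through the prior edge probabilities pi_ij. Strong binding evidence raises pi_ij, so the corresponding edge needs less support from the expression data to be included.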
Yet another common approach is to alternate between two data types during the computation, especially when the main task is to identify regulatory motifs [19, 76, 162]. The strategy involves first clustering gene expression data sets, then isolating the upstream regions of the clustered genes and analysing them for common cis-regulatory motifs. If the identified motifs correspond to known transcription factor binding sites, the regulatory network responsible for the observed transcription state can be inferred.
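The two-stage strategy above can be sketched in a deliberately simplified form. The greedy correlation clustering and the exact shared k-mer search below are stand-ins chosen for brevity, not the algorithms of [19, 76, 162]. All function names, gene names, expression profiles and sequences are invented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0


def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy clustering: a gene joins the first cluster whose first
    member (the seed) it correlates with at or above the threshold."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profiles[cluster[0]], profile) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


def shared_kmers(sequences, k):
    """Return the k-mers that occur in every sequence."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in sequences]
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Invented toy data: expression profiles and upstream regions.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 3, 2, 1]}
upstream = {"g1": "AACGTGACC", "g2": "TTCGTGAGG", "g3": "ATATATATA"}

for cluster in cluster_by_correlation(profiles):
    if len(cluster) > 1:
        motifs = shared_kmers([upstream[g] for g in cluster], k=5)
        print(cluster, sorted(motifs))
```

In practice the candidate motifs would then be compared against known transcription factor binding sites, and motif discovery would need to tolerate mismatches rather than require exact shared k-mers.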