Shauna Reckseidler-Zenteno, for their unwavering support and great mentorship throughout this thesis. The funding provided was not only conducive to my focus on research work but also helped in my national and international travel for disseminating research findings at various conferences. Lastly, to my colleagues in Professor Vive's research group, it was a wonderful experience to be part of the team and do my research side by side with you.
Pseudomonas aeruginosa is a Gram-negative organism that is ubiquitous in the ecosystem and resistant to antibiotics. Next, node impact techniques were used to infer a dozen genes as key orchestrators of the survival phenotype. Among these genes, PA0272 was identified as the root node in the learned network model.
INTRODUCTION
The second, i.e., the accessory genome, represents about 10% of the genome and varies from strain to strain. From these approaches, a third one, known as the hybrid method, arose, which couples the two other methods. Formula (2.2) below shows the number of DAGs as a super-exponential function of the number of variables.
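To give a sense of this growth, the following minimal sketch counts labeled DAGs with Robinson's recurrence. It is assumed here, not confirmed by the text, that Formula (2.2) is this recurrence or an equivalent; the function name `count_dags` is illustrative only.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, by Robinson's recurrence.

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Small cases only; the count grows far too fast to enumerate in practice.
for n in range(1, 6):
    print(n, count_dags(n))  # last line printed: 5 29281
```

Under this recurrence the count already exceeds 10^18 at ten variables, which is why exhaustive enumeration is not an option for hundreds of genes.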
In essence, one would first take advantage of the reduced number of network candidates (CPDAGs) in the search space for a faster selection of the best CPDAG model. In [37] it is noted that the BIC formula is the exact negative of the Minimum Description Length (MDL) score. A Bayesian network is one of the network models that can be used to reason under uncertainty based on the laws of probability.
On the other hand, the score-based approach examines all possible structures in the search space and assigns each a score that measures the "goodness" of the Bayesian network estimate for a given data set. This provides reasonable confidence in finding the optimal network, i.e., the most compact representation of the joint probability distribution over the 954 genes. The number of all possible search candidates is a function f(n) of the number of genes n, as in Formula (2.2).
The conditional probability tables in the definition of the model are used to determine the significance (thickness) of the arcs [51]. Arc Force, and with it Node Force, was adopted to evaluate how strongly each gene participates in the regulatory network of survival. DL(B) and DL(D|B) are the structural complexity of the network graph and the fit of the Bayesian network to the data, respectively.
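As an illustration of how the two terms DL(B) and DL(D|B) can be computed, the sketch below uses one common convention for discrete networks. It is an assumption, not taken from the text, that DL(B) is (log2 N)/2 bits per free parameter and that DL(D|B) is the negative log2-likelihood of the data under maximum-likelihood parameters. The function name and the data layout are hypothetical.

```python
from collections import Counter
from math import log2


def mdl_score(data, parents, arity):
    """Return (DL(B), DL(D|B)) in bits for a discrete Bayesian network.

    data:    list of dicts mapping variable name -> observed state
    parents: dict mapping variable name -> tuple of parent names
    arity:   dict mapping variable name -> number of states
    """
    n = len(data)
    dl_structure = 0.0
    dl_data = 0.0
    for var, pa in parents.items():
        # Free parameters: (states - 1) for each parent configuration.
        n_configs = 1
        for p in pa:
            n_configs *= arity[p]
        dl_structure += 0.5 * log2(n) * (arity[var] - 1) * n_configs

        # Negative log-likelihood using relative frequencies as parameters.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        for (config, _state), count in joint.items():
            dl_data -= count * log2(count / marginal[config])
    return dl_structure, dl_data


# Toy data (hypothetical): two binary variables that always agree.
rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
arity = {"A": 2, "B": 2}
with_arc = mdl_score(rows, {"A": (), "B": ("A",)}, arity)
without_arc = mdl_score(rows, {"A": (), "B": ()}, arity)
print(sum(with_arc), sum(without_arc))  # the lower total is preferred
```

In this toy case the arc from A to B costs extra structure bits but saves more data bits, so the connected network has the lower total description length.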
The data for this research are a representative collection of the expression levels of P. One of the original assumptions was that not all genes would participate in the survival phenotype. The next day, 100 μl of the overnight culture was subcultured into 3 ml of fresh LB and incubated for 3 h at 37°C and 250 rpm to an optical density (OD600) of 0.5 to obtain cells in the mid-log growth phase.
Data collection was based on the PAO1 mini-Tn5-luxCDABE transposon mutant library. This research started with an examination of the data and a review of the microbiological literature, especially on P.
LITERATURE REVIEW AND THEORETICAL FRAMEWORK
METHODOLOGY
Data integration: Comparison of the two datasets shows that they do not match perfectly in the genetic variable. Moreover, these components are particularly relevant to the two main dimensions of the data set (i.e., a large number of variables and a small sample size). Considering domains with a small number of variables, [37] suggests that grouping variables into generic classes such as symptoms or diseases is a practical approach to reduce the cardinality of the domain order, without resorting to very greedy heuristics.
This function is the source of the efficiency of informed search over uninformed search, in that it is guided by knowledge of where to look for potential solutions. Viewed as cost minimization, the core technique of hill climbing is a form of gradient descent, similar to the task of getting a ping-pong ball into the deepest crevice of a bumpy surface. For example, [45] showed experimentally that Tabu Search provides the best performance in terms of both the quality of the solution and the quality of the solution subspace.
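The difference between plain hill climbing and a tabu list can be sketched on a toy one-dimensional "bumpy surface". This is an illustration only: in structure learning the states would be candidate networks and the moves arc additions, deletions or reversals, whereas here the states are positions on a line, and the cost values, function names and tenure are all hypothetical.

```python
from collections import deque


def neighbors(i, n):
    """Adjacent positions on a line of n cells."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]


def hill_climb(cost, start):
    """Greedy descent: stop as soon as no neighbor is strictly cheaper."""
    current = start
    while True:
        best = min(neighbors(current, len(cost)), key=cost.__getitem__)
        if cost[best] >= cost[current]:
            return current
        current = best


def tabu_search(cost, start, tenure=3, max_iters=50):
    """Move to the best non-tabu neighbor, even uphill; keep the best state seen."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # recently visited states are forbidden
    for _ in range(max_iters):
        candidates = [j for j in neighbors(current, len(cost)) if j not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost.__getitem__)
        tabu.append(current)
        if cost[current] < cost[best]:
            best = current
    return best


surface = [5, 3, 4, 6, 2, 1, 0, 3]  # hypothetical costs; global minimum at index 6
print(hill_climb(surface, 0))   # 1: stuck in the first local minimum
print(tabu_search(surface, 0))  # 6: the tabu list forces the search over the bump
```

Forbidding recently visited states is what lets the search accept a temporarily worse move instead of settling in the first crevice it finds.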
For the parent set Pa(Gi), which as such is assigned to the gene Gi, the probability of the data is D. At the end of the iterative heuristic search, the optimal Bayesian network is returned as the result. In addition, survival experiments were performed for up to 3-4 months, with the cell concentration very close to that used at the beginning of the experiment (1 × 10^7 colony-forming units/milliliter).
Because of the aforementioned variability, some cells that were likely dead near the beginning of the time course appeared to transition and eventually adapt to being more quiescent. Any non-root nodes that appear to be crucial in maintaining the observed phenotype will also be considered as potential causes of survival. This last measure is a generalization of Claude Shannon's definition of information to an abstract example.
With SG denoting the set of genes with the strongest force, the set of viability genes is derived as: VG = RG. From the learned BN model, groups of strongly connected genes were computationally separated using Variable Clustering. It was then posited that hidden common causes were the underlying factors of the strong intra-cluster connections obtained in each gene cluster. This perspective led to the speculation of the existence of an overarching factor that would characterize the bacterium as a higher-level entity.
RESULTS, EVALUATIONS AND DISCUSSIONS
Due to the statistical challenge of a large number of variables and a small number of samples, one cannot distinguish between all possible models as the small amount of data is not sufficient to identify a single most likely model. Contingency Table Fit (CTF) is the measure of the degree of fit between the joint probability distribution of the network and the data. The deviation is calculated based on the difference between the average log-likelihoods of the network and of the data.
This shows that the optimal graph with 249 genes is a good representation of the dataset compared to the global fragmented network. Searching for root nodes in the network of Figure 4.3 identified a single node as the root (PA0272). Below in descending order (Table 4.1) are the top ten most influential nodes of the network according to their node strength score.
This particular gene, PA0272, is in fact the only one found to be a root node, that is, a node at the top of the network's hierarchy. A look at its functional description in the genomics knowledge base showed it to be a transcriptional regulator. According to the computational analysis, PA0272, a transcriptional regulator, is a root node in the learned network model.
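The search for root nodes amounts to finding nodes with no incoming arc. A minimal sketch follows, assuming the learned network is available as a list of (parent, child) arcs. Only the arc from PA0272 to algU is taken from the text; `geneA`, `geneB` and the function name are hypothetical.

```python
def root_nodes(arcs):
    """Return the nodes that appear in some arc but never as a child."""
    children = {child for _, child in arcs}
    nodes = {node for arc in arcs for node in arc}
    return sorted(nodes - children)


# Toy network; geneA and geneB are placeholders, not genes from the study.
arcs = [
    ("PA0272", "algU"),
    ("algU", "geneA"),
    ("PA0272", "geneB"),
    ("geneB", "geneA"),
]
print(root_nodes(arcs))  # ['PA0272']
```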
It turned on algU, which interestingly encodes a transcriptional regulator of the polysaccharide alginate and other polysaccharide genes involved in survival. One ml of the washed cells (concentration of 5 × 10^8 cfu/ml) was inoculated in 9 ml of sdH2O to a final concentration of approx. 5 × 10^7 cfu/ml. The bacterial quantification was performed by taking 100 µl of the sample and preparing 10-fold serial dilutions in a series of tubes followed by plating dilutions 10^-4, 10-.
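The dilution arithmetic can be checked with a short sketch. The stock concentration and volumes are those stated above. The colony count and the plated volume in the second call are hypothetical values chosen for illustration, since the text does not give them, and both function names are illustrative.

```python
def diluted_concentration(stock_cfu_per_ml, stock_ml, diluent_ml):
    """Concentration after mixing a stock volume into a diluent volume."""
    return stock_cfu_per_ml * stock_ml / (stock_ml + diluent_ml)


def cfu_per_ml(colonies, dilution, plated_ml):
    """Back-calculate the sample concentration from a plate count."""
    return colonies / (dilution * plated_ml)


# 1 ml of washed cells at 5 x 10^8 cfu/ml into 9 ml of water.
print(diluted_concentration(5e8, 1, 9))  # 50000000.0, i.e. 5 x 10^7 cfu/ml

# Hypothetical plate: 50 colonies from 0.1 ml of the 10^-4 dilution.
print(cfu_per_ml(50, 1e-4, 0.1))  # about 5 x 10^6 cfu/ml
```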
The decrease in survival is of interest because it occurs when the gene is mutated, i.e., when the gene is non-functional, which suggests that the gene is necessary for survival.
CONCLUSIONS AND FURTHER RESEARCH
This led to the choice of Bayesian networks due to their probabilistic nature to analyze this probabilistic phenomenon. The literature study continued with the learning and construction of Bayesian networks and heuristic search techniques. The graphical representation allows easy visual reading of the network of interactions between genes that regulate the expression of P genes.
This gene has a direct arc to algU, which interestingly is also a transcriptional regulator of the polysaccharide alginate and other polysaccharide genes involved in survival. Regarding the functional interaction, the Multiple Clustering technique produced 33 clusters (Figure A.1) on one side and 107 clusters (Figure A.2 and Table A.2) on the other, suggesting that these could be representations of the functional modules involved in the survival mechanism. This research recommends further comparative studies between the dendrograms of the 33-cluster and 107-cluster solutions to verify whether each cluster actually performs a function.
Here too, this study recommends further work to investigate whether Factor_0 can be a computational representation of the bacterium and to see whether a relationship can be inferred between its states and the experimental states of the P. The work in [55] is a good example of the power of ILP, which the authors used to design experiments and thereby build an autonomous scientist that discovered new knowledge about the functional genomics of yeast. This requires a prior network model B0 that specifies the gene expression distribution at time T0, and a transition network BT that specifies the transition model.
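The roles of the prior network and the transition network can be illustrated with the smallest possible case: a single two-state gene, for which the dynamic model reduces to a Markov chain. This is a sketch under that simplifying assumption; the probabilities and the function name are hypothetical.

```python
def unroll(prior, transition, steps):
    """Propagate a state distribution through `steps` time slices.

    prior:      distribution over states at T0 (role of the prior network)
    transition: transition[i][j] = P(state j at t+1 | state i at t)
                (role of the transition network)
    """
    dist = list(prior)
    for _ in range(steps):
        dist = [
            sum(dist[i] * transition[i][j] for i in range(len(dist)))
            for j in range(len(dist))
        ]
    return dist


prior = [0.5, 0.5]                       # hypothetical: P(low), P(high) at T0
transition = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical transition probabilities
print(unroll(prior, transition, 1))      # approximately [0.55, 0.45]
```

With several genes, each row of the transition model would instead be a conditional probability table over a gene's parents in the previous time slice.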
Stover, et al., "Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen," Nature, vol.
Lory, "Conservation of genome content and virulence determinants among clinical and environmental isolates of Pseudomonas aeruginosa," Proceedings of the National Academy of Sciences, vol.
Friedman, "Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data," Nature Genetics, vol.
Choi, "Clustering approaches for identifying gene expression patterns from DNA microarray data," Molecules and Cells, vol.
"A Theory of Inferred Causation." In Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference on.., KR'91, Cambridge, MA, April vol.
Mian, "Modeling Gene Expression Data using dynamic Bayesian networks," Technical Report, Division of Computer Science, University of California, Berkeley, CA, vol.