RESULTADOS - FACULTAD DE INGENIERÍA Y ARQUITECTURA

We have developed a probabilistic graphical model to study the domain structure visible in Hi-C contact maps. This model is based on a symmetric energy model where the interaction parameters come from the normalized entries of the contact matrix. Here the contact matrix is interpreted as a graph with N = n2 nodes each node having a feature variable. Already a model where the feature variable has just two values is sufficient

genome position [kbp] 0 1000 2000 3000 4000 0.0 0.035 0 100 200 300 0 20 40 60 80 diagon al distance [kbp] diagona l distance [kbp] A B

Figure 7.3: Application of our methodology to domain detection. A. Excerpt of the 45° anti- clockwise rotated Hi-C contact maps of the C. crescentus chromosome [48] and the resulting domain structure of both the approach of Le et al. using the directionality index and our probabilistic graphical model. Contrary to the directionality index approach our method does not yield domain boundaries or rather a contiguous domain structure (black), but a contour (dashed black) sepa- rating contact probabilities of a certain strength (depending on the coupling constant) from the background. This iso-strength contour allows to characterize the compartmentalization irrespec- tive of whether or not there are contiguous domains in the form of squares. B. Detailed view of the iso-strength contour yielded by our method. We define domain boundaries (red triangles) as minima of the contour (dashed blue) and characterize domains by Gaussians fitted to the contour between two neighboring domain boundaries. We differ between two types of minima of the contour: Minima marked as red triangles correspond to domain boundaries and minima marked as yellow triangles indicate a boundary within a domain, i.e. a nested domain structure.

7.5. Discussion 89

to identify synthetic domains. These domains incorporate partial noise as it would be expected in the actual contact maps. The domains themselves are set against a background of noise. The model is able to identify the noise through the average feature variable which is clearly distinct to the one in the domain. Within the domain, depending on the strength of the control parameter, the average value of the feature variable is homogeneous. This leads to the clear identification of the domain boundary as those nodes that have at least one of the nearest neighbors having a feature value different from the others. Using real Hi-C contact maps, we have showed that our method is able to identify domains like TADs or CIDs, loops and loop domains as well as multi-scale structures of complex shape.

ChrXIII ChrXIV ChrXV ChrXVI

0.000 0.004

0.001 0.002 0.003

Figure 7.4: Loop detection with our approach. Using Hi-C data of budding yeast cells arrested in G1 phase [136], we show that our method correctly identifies the pronounced trans interactions between the centromeres on chromosomes XIII to XVI appearing as dots of increased contact probability in the Hi-C contact map. The positions of chromosomes XIII to XVI are illustrated as gray bars below the heat map and the locations of centromeres on each chromosome are represented by black dots. Our method yields both closed contours around the loop locations and the previously discussed iso-strength contour along the diagonal; they are illustrated by red plus signs.

chr7:137Mb chr7:138Mb 0.000 0.006 0.000 0.008 0.000 0.014 0.00 0.02 A B C D

Figure 7.5: Multi-scale structure identification using our method. The contour that describes how a given Hi-C contact map is compartmentalized can be adjusted to a certain scale using the strength of the coupling via the control parameters of our model. With increasing coupling strength, the contour moves further away from the diagonal and detected structures become larger. Using an exemplary excerpt (the region between 137 and 138 Mbp of chromosome 7) of Hi-C data of human GM12878 B-lymphoblastoid cells with a resolution of 5 kbp [36], we show how our method identifies structures on different scales. A. On the lowest level the loop domain located at 137.65 Mbp is identified. B. The loop domain at 137.65 Mbp is now detected as a normal domain. Additionally, the loop at 137.45 Mbp is recognized. C, D. The previously detected structures become larger and further loops with lower signal intensity are identified.

Chapter 8 The Role of Loops on the Order of

Eukaryotes and Prokaryotes

References

The results presented in this chapter are published as and adapted from

• A. Hofmann and D.W. Heermann (2015), The role of loops on the order of

eukaryotes and prokaryotes. FEBS Letters, 589: 2958-2965. doi: 10.1016/j.feb-

slet.2015.04.021.

Chapter Summary

The study of the three-dimensional organization of chromatin has recently gained much focus in the context of novel techniques for detecting genome-wide contacts using next- generation sequencing. These chromosome conformation capture-based methods give a deep topological insight into the architecture of the genome inside the nucleus. Several recent studies observe a compartmentalization of chromatin interactions into spatially confined domains. This structural feature of interphase chromosomes is not only supported by conventional studies assessing the interaction data of millions of cells, but also by analysis on the level of a single cell. We first present and examine the different models that have been proposed to elucidate these topological domains in eukaryotes. Then we show that a model which relies on the dynamic formation of loops within domains can account for the experimentally observed contact maps. Interestingly, the topological domain structure is not only found in mammalian genomes, but also in bacterial chromosomes.

♦

8.1 Introduction

Mammalian interphase chromosomes are hierarchically organized [25, 72]. On the one hand, at the level of the nucleus, fluorescence in situ hybridization (FISH) and genome- wide chromosome conformation capture (3C) studies, such as Hi-C, have revealed an inter-chromosomal compartmentalization in the form of the formation of distinct chromosome territories [3, 24]. Individual chromosomes, on the other hand, also show a domain-like structure as observed in recent genome-wide high-resolution Hi-C and 5C studies [7, 8, 33]. These 3C-like studies indicate that eukaryotic genomes are partitioned, at the sub-megabase level, into discrete structural units with highly increased frequency of internal contacts, referred to under different terms, such as “topological domains” [7], “topologically associating domains” (TADs) [8] and “physical domains” [33]. We will stick to the term “topological domains” for these intra-chromosomal domains, within which the chromatin fiber preferentially interacts. This finding of a domain organization of individual chromosomes is not only supported by data stemming from 3C-like studies examining genomic interactions of a large population of cells, but also by an analysis of individual cells, the single-cell Hi-C methodology [59].

Besides the eukaryotic chromosomes of humans, mice and Drosophila melanogaster, bacterial chromosomes are also characterized by a hierarchical organization [137]. The

Escherichia coli chromosome consists of macrodomains on the megabase scale [46, 138],

which, in turn, are composed of topological domains on the smaller scale [139]. Recently, the circular chromosome of Caulobacter crescentus, as a further example, has been shown to be composed of topological domains with the help of an in-depth Hi-C analysis [48]. Taken together, these analogies to the organization in eukaryotes suggest that an intra- chromosomal domain structure is a fundamental building block of chromosome structure of organisms.

Although their important role in shaping the three-dimensional organization of the genome seems acknowledged, there remains the question how topological domains are

In document FACULTAD DE INGENIERÍA Y ARQUITECTURA (página 28-0)