
In this chapter, we studied models and algorithms to construct a family-free median from three extant genomes. We introduced problem FF-Median, which is a family-free generalization of the well-known mixed multichromosomal breakpoint median of three genomes. We then studied the complexity of problem FF-Median. In doing so, we reduced instances of the weighted independent set problem to instances of problem FF-Median, thereby proving NP-hardness of the latter. We then discussed a 0-1 linear program for its exact solution.

[Figure 4.9: three histograms. (a) count vs. marker length; (b) count vs. number of markers per gene; (c) count vs. number of genes per marker.]

Figure 4.9: Part (a) shows a histogram of the distribution of marker lengths in the assembled genome sequence of Rajaraman et al. [87]. The diagrams to the right show histograms of (b) markers associated with the same gene and of (c) genes associated with the same marker, respectively.

Whereas our model of family-free adjacencies, presented in the previous chapter, can tolerate effects of gene family evolution in the chromosomal gene order, our family-free median model can only resolve certain cases of gene duplication. It is generally susceptible to gene losses that occurred along the evolutionary paths between the three extant genomes under analysis and their common ancestor. However, there is no straightforward definition of a family-free median model that tolerates events of gene family evolution and at the same time permits the calculation of exact solutions within reasonable time. Therefore, we devised algorithm FFAdj-3G-H, a heuristic approach to obtaining family-free medians that tolerates the effects of gene family evolution. Our method is based on problem FF-Adjacencies for three genomes, which was introduced in the previous chapter. Further, algorithm FFAdj-3G-H relies on Tannier et al.’s algorithm [106] to obtain a median gene order.

The importance of accounting for events of gene duplication and loss in family-free analysis was shown in subsequent experiments on simulated datasets: FFAdj-3G-H performed considerably better than FF-Median in identifying positional orthologs and in reconstructing the true gene order of the median.

Lastly, we demonstrated the applicability of algorithm FFAdj-3G-H on biological datasets by reconstructing the gene order of protein coding genes of the black death from genomes of three extant Yersinia pestis strains. The four genomes are separated by only 650 years of evolution. We compared our results to those of Rajaraman et al. [87]. The outcome of the analysis is encouraging: the median reconstructed by FFAdj-3G-H shows reasonable similarity to the genome structure proposed by Rajaraman et al., although the latter used genomic markers for reconstruction, which were directly obtained from paleogenomic sequences of the black death.

Chapter 5. Family-free synteny

In the previous two chapters we described family-free models based on adjacencies. An adjacency is a simple proximity relation between two genes belonging to the same chromosomal sequence. However, gene orders become increasingly scrambled over longer evolutionary periods of time. When comparing two genomes that are distantly related, gene order analysis based on identifying pairs of conserved adjacencies may no longer be feasible. Yet, relaxed constraints of gene order conservation are still able to capture weaker, but nonetheless existing remnants of common ancestral gene order. In this chapter, we will study a relaxed proximity relation that allows us to identify conserved regions in two genomes based on the concept of common intervals. We present a practical approach that does not reconstruct one-to-one orthology assignments between genes. This simplification allows us to obtain fast, exact algorithms with polynomial running times. We subsequently evaluate our models and algorithms on a dataset of 93 bacterial genomes and compare their performance with that of a gene family-based method developed by Jahn [57]. The work presented here is published in [26] and [37].

5.1 Generalized adjacencies

In Section 3.2 we presented problem FF-Adjacencies, which does not allow genes located in-between conserved adjacencies. We now describe a parameterized model of generalized adjacencies that not only tolerates orthology assignments of genes in-between conserved adjacencies, but goes one step further by also allowing crossing pairs of “conserved adjacencies”, which we call conserved θ-adjacencies: Two gene extremities $g_1^a$ and $g_2^b$ in a genome $G$ form a θ-adjacency if at most $\theta - 1$ genes lie between them. Two θ-adjacencies $\{g_1^a, g_2^b\}$ in a genome $G$ and $\{h_1^a, h_2^b\}$ in a genome $H$ form a conserved θ-adjacency if their corresponding genes are similar, i.e.,

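Problem 7 (θ-Adjacencies) Given two genomes $G$, $H$, $\alpha \in [0, 1]$, and $\theta \in \mathbb{N}_{>0}$, find a matching $\mathcal{M}$ in gene similarity graph $B$ of $G$ and $H$ such that the following formula is maximized:

$$F_\theta^\alpha(\mathcal{M}) = \alpha \cdot \mathrm{adj}_\theta(\mathcal{M}) + (1 - \alpha) \cdot \mathrm{edg}(\mathcal{M}), \tag{5.1}$$

where

$$\mathrm{adj}_\theta(\mathcal{M}) = \sum_{\substack{\{\{g_1, h_1\}, \{g_2, h_2\}\} \subseteq \mathcal{M}, \\ \{g_1^a, g_2^b\} \in A_\theta(G_\mathcal{M}), \\ \{h_1^a, h_2^b\} \in A_\theta(H_\mathcal{M})}} s(g_1^a, g_2^b, h_1^a, h_2^b),$$

and $A_\theta(X)$ denotes the set of θ-adjacencies of genome $X$, for which holds that in any adjacency $\{x_1^a, x_2^b\} \in A_\theta(X)$, no more than $\theta - 1$ genes lie between $x_1$ and $x_2$ in genome $X$.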
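To make the objective of Problem 7 concrete, the following Python sketch evaluates it for a given matching. It is a simplified illustration under stated assumptions, not the implementation used in this work: each genome is a single linear chromosome of unsigned genes (extremities and orientation are ignored), the adjacency score s is taken to be the geometric mean of the two matched edge weights, and edg is taken to be the sum of the matched edge weights. The function names `theta_adjacencies`, `reduce_genome`, and `objective` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def theta_adjacencies(chromosome, theta):
    """Unordered gene pairs with at most theta - 1 genes between them."""
    pairs = set()
    for i, x in enumerate(chromosome):
        # Positions i+1 .. i+theta lie within distance theta of position i.
        for y in chromosome[i + 1 : i + theta + 1]:
            pairs.add(frozenset((x, y)))
    return pairs


def reduce_genome(chromosome, matched):
    """Keep only genes incident to a matching edge (the reduced genome)."""
    return [g for g in chromosome if g in matched]


def objective(G, H, sigma, matching, alpha, theta):
    """alpha * adj_theta(M) + (1 - alpha) * edg(M), under the assumptions above."""
    G_M = reduce_genome(G, {g for g, _ in matching})
    H_M = reduce_genome(H, {h for _, h in matching})
    adj_G = theta_adjacencies(G_M, theta)
    adj_H = theta_adjacencies(H_M, theta)

    adj = 0.0
    for (g1, h1), (g2, h2) in combinations(matching, 2):
        if frozenset((g1, g2)) in adj_G and frozenset((h1, h2)) in adj_H:
            # Assumed score: geometric mean of the two edge weights.
            adj += sqrt(sigma[g1, h1] * sigma[g2, h2])

    # Assumed definition: edg is the sum of matched edge weights.
    edg = sum(sigma[g, h] for g, h in matching)
    return alpha * adj + (1 - alpha) * edg


# Toy instance: genes g2 and g3 are matched to h3 and h2 (a local swap).
G = ["g1", "g2", "g3", "g4"]
H = ["h1", "h2", "h3", "h4"]
matching = [("g1", "h1"), ("g2", "h3"), ("g3", "h2"), ("g4", "h4")]
sigma = {edge: 1.0 for edge in matching}

print(objective(G, H, sigma, matching, alpha=0.5, theta=1))  # 2.5
print(objective(G, H, sigma, matching, alpha=0.5, theta=2))  # 4.5
```

In the toy instance, raising θ from 1 to 2 increases the number of conserved θ-adjacencies from one to five, because the local swap of two genes no longer breaks the neighbouring adjacencies.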

θ-Adjacencies have been described previously in the literature for gene family-based analysis [118] and are particularly useful for identifying gene clusters, which are small sets of genes that share an associated function and therefore remain locally preserved over longer periods of evolutionary time. Algorithm 1 can be easily adapted to find optimal solutions for Problem 7. Nevertheless, exact approaches quickly become computationally infeasible even for small θ. Moreover, the model itself can be criticized for its implicit handling of gene insertions and deletions, as well as its inability to account for unequal numbers of gene duplicates. Therefore, we will now study a broader definition of synteny and derive a family-free model that does not exhibit the described disadvantages of θ-adjacencies.