
promoter region of STIM and ORAI. Five of these interactions involve E2F1 and E2F4 on the promoter regions of STIM2 and ORAI1 (Table 4.6). Previous studies showed that transcription factors of the E2F family (E2F1-8) regulate many important genes and are involved in many biological processes, such as apoptosis, cell proliferation, differentiation, and the DNA damage response [191,192]. Furthermore, we identified E2F1 as an up-regulated gene in tumor samples that interacts with RUNX1 in the tumor-normal differential interaction network. Our results also show that E2F1 and E2F4 are co-expressed in tumor samples. We speculate that E2F1 and E2F4 regulate STIM2 and ORAI1, genes that also show a relation to breast cancer. In addition, we identified an NFKB1:RUNX1 interaction on the promoter region of ORAI2, a pair that was also found co-expressed in the turquoise module of tumor samples. NFKB1 is a member of the NFKB family of transcription factors [193]. Several studies noted that NFKB is among the Ca2+-sensitive transcription factors that are associated with STIM1 and ORAI1 in stimulating cell proliferation and differentiation [194–196]. On the other hand, two interactions involving the microphthalmia-associated transcription factor (MITF) were found on the promoter region of ORAI3 (MITF:ELF1 and MITF:SPI1). We found that MITF is co-expressed in the turquoise modules of both normal and tumor samples. A study by Carmit and co-workers noted that MITF functions as a master regulator of melanocyte development and as a melanoma oncogene [197]. Furthermore, Stanisz and co-workers described the role of STIM and ORAI in melanocytes and melanoma, which correlates significantly with the expression of MITF [163,198]. Overall, our findings suggest several roles of STIM and ORAI genes in normal and breast cancer tissues.

4.4 Conclusions

In this work, we identified 13 non-redundant transcription factors (TFs) targeting STIM and ORAI genes. We then found ten putative TF interactions on the promoter regions of STIM and ORAI genes, predicted by the STRING, prePPI, and mentha databases. Based on these interactions, we found 14 non-redundant TFs that act as bridge proteins. We then selected 63 non-redundant genes of interest for differential gene expression analysis and co-expression analysis of the breast invasive carcinoma dataset. In the differential expression analysis, we found 26 up-regulated genes, including ORAI1, ORAI2, and ORAI3, and 19 down-regulated genes, including STIM1 and STIM2. In the co-expression analysis, we found two significant modules (blue and turquoise) in normal samples and three significant modules (brown, blue, and turquoise) in tumor samples; the expression patterns of the individual modules may provide clues about their functions. Next, we identified 83 differential interactions in the normal-tumor comparison and 61 in the tumor-normal comparison. Finally, we identified CREBBP and ELK3 as hub genes in the normal-tumor network, and ELK3 in the tumor-normal network. Overall, our findings form an important basis for identifying TFs targeting STIM and ORAI genes and demonstrate the significant involvement of STIM and ORAI genes in breast cancer. In ongoing work, we are extending the gene enrichment analysis and applying the framework to datasets of diseases reported to be linked to STIM and ORAI genes, such as Alzheimer's disease (AD) and prostate cancer.


Chapter 5

Evaluation of the Protein Pocket Identification Tools on Protein-Ligand Complexes

My contribution was to design the research project, write the manuscript, and analyze the results together with the co-authors Zhao Yuan, Rahmad Akbar, and Volkhard Helms. Rahmad Akbar and I co-supervised Zhao Yuan, who performed the calculations.

Abstract

Binding pockets are regions on protein surfaces where substrates of enzymatic reactions, effector molecules, or co-factors may bind. Identifying these cavities is therefore often a prerequisite for structure-based drug design. Various computational methods have been developed to identify such sites on protein surfaces. In this work, we evaluated seven tools (DEPTH, DoGSiteScorer, Fpocket, GHECOM, IsoMif, PocketPicker, and ProACT2) on a dataset of 167 non-redundant protein-ligand complexes. We analyzed how well the predicted pocket-lining residues overlap with the residues that contact the ligand, and used this residue overlap to define a score that measures the predictive capability of the tools. Even though the tools predicted pockets of various sizes and shapes, five tools (DEPTH, GHECOM, DoGSiteScorer, Fpocket, and IsoMif) performed comparably in terms of average score. Always using the most suitable tool improved the average score by 28% over randomly selecting a tool. To support users in a pocket prediction scenario, we trained a random forest classifier that outputs a list of suitable tools for a given protein structure. This classifier should be useful for prioritizing tools for unknown proteins or proteins that are not contained in our dataset.
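The 28% figure compares always choosing the most suitable tool with choosing a tool at random. One way to compute such a comparison is sketched below in Python. It assumes a table of per-complex scores (rows are complexes, columns are tools) and assumes that the random-selection baseline is the expected score of a uniformly chosen tool, i.e. the per-complex mean over tools. The function name `best_vs_random` and the toy scores are hypothetical, and the exact procedure used in this chapter may differ.

```python
import numpy as np


def best_vs_random(scores):
    """Compare an oracle 'best tool per complex' choice with a random tool choice.

    scores: array-like of shape (n_complexes, n_tools) holding one score per
    tool and complex (hypothetical input layout).
    Returns (mean best score, mean expected random score, relative improvement).
    """
    s = np.asarray(scores, dtype=float)
    best = s.max(axis=1).mean()            # always pick the top-scoring tool
    random_choice = s.mean(axis=1).mean()  # expected score of a uniformly random tool
    return best, random_choice, (best - random_choice) / random_choice


# Toy example with two complexes and two tools (made-up numbers).
best, rand, gain = best_vs_random([[0.75, 0.25], [0.25, 0.75]])
print(best, rand, gain)  # 0.75 0.5 0.5
```

Taking the expected value over tools, rather than sampling one tool per complex, keeps the baseline deterministic and independent of any random seed.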


5.1 Introduction

Proteins play major roles in practically all cellular processes. They typically interact with small molecules (ligands), nucleic acids, or other proteins to perform a certain function. These interactions often occur at a particular site on the protein surface (binding site). As binding sites are often directly related to protein function, it is important to advance our understanding of these sites. The large collection of experimentally determined three-dimensional structures of protein-ligand complexes stored in the Protein Data Bank (PDB) allows us to study these binding sites. For instance, it has been shown that binding sites for small-molecule ligands tend to be rather hydrophobic with a few selected polar and charged residues [199–201] and tend to be found in deep pockets on the protein surface [53,202].

Based on these data, one can develop algorithms to identify cavities on protein surfaces that may accommodate bound ligands. Current algorithms of this kind fall into five categories: (i) geometric methods, which can be further grouped into the three subcategories grid system scanning, probe sphere filling, and alpha shape [49,50], (ii) energy-based methods, (iii) evolution-based methods, (iv) blind docking and molecular dynamics, and (v) combined approaches [48]. Grid system scanning projects a protein structure onto a three-dimensional grid of points and examines spatial overlaps on this grid. DEPTH [57,58], DoGSiteScorer [56], GHECOM [49], and PocketPicker [55] implement a grid system scanning approach. Probe sphere filling methods generate a set of probe spheres to fill cavities on protein surfaces; pockets are then defined as the regions containing the highest number of spheres, as in the tool IsoMif [63]. Alpha shape methods rely on alpha-shape theory and Voronoi tessellation to identify a pocket; Fpocket [68] is a representative of this class. Energy-based methods identify pockets using energetic criteria: cavities with the largest total interaction energies are defined as pockets. ProACT2 [69,70], for instance, is an energy-based method. Other tools employ further strategies. For example, Rate4Site [203] uses an evolution-based approach and MolSite [77] uses blind docking. As current pocket identification tools employ a wide range of strategies, it is of interest to compare and contrast their performance.
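To make the grid system scanning idea concrete, the Python sketch below projects atoms onto a regular grid, marks grid points inside atomic spheres as protein, and flags free grid points that have protein on both sides along each grid axis. It is a generic, simplified illustration and not the algorithm of DEPTH, DoGSiteScorer, GHECOM, or PocketPicker. The function name, the 1 Å spacing, the 2 Å padding, and the three-axis enclosure criterion are assumptions chosen for illustration.

```python
import itertools

import numpy as np


def grid_pocket_points(coords, radii, spacing=1.0, padding=2.0, min_enclosure=3):
    """Simplified grid-scanning cavity detection (illustrative only).

    coords: (n, 3) atom coordinates in Å; radii: (n,) atomic radii in Å.
    A free grid point counts as enclosed along an axis if protein-occupied
    grid points lie on both sides of it along that axis.
    """
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    axes = [np.arange(l, h + spacing, spacing) for l, h in zip(lo, hi)]
    points = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (nx, ny, nz, 3)

    # Mark grid points that fall inside any atomic sphere as protein-occupied.
    occupied = np.zeros(points.shape[:3], dtype=bool)
    for center, r in zip(coords, radii):
        occupied |= ((points - center) ** 2).sum(axis=-1) <= r * r

    # Count the axes along which each free point is flanked by protein on both sides.
    enclosure = np.zeros(occupied.shape, dtype=int)
    for axis in range(3):
        seen_before = np.logical_or.accumulate(occupied, axis=axis)
        seen_after = np.flip(
            np.logical_or.accumulate(np.flip(occupied, axis=axis), axis=axis), axis=axis
        )
        enclosure += seen_before & seen_after & ~occupied

    pocket_mask = ~occupied & (enclosure >= min_enclosure)
    return points[pocket_mask]


# Demo: a closed cubic shell of pseudo-atoms encloses a 5 x 5 x 5 block of grid points.
shell = [p for p in itertools.product(range(-3, 4), repeat=3) if max(map(abs, p)) == 3]
cavity = grid_pocket_points(shell, [0.6] * len(shell))
print(len(cavity))  # 125
```

Only the free points inside the shell are reported, because every exterior point lacks protein on at least one side. Practical tools refine this basic idea considerably, for instance by clustering the flagged points into individual pockets.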

Defining the correct pockets on protein surfaces is not an easy task [68,204]. Figure 5.1 schematically sketches three possible ways to define the pocket volume of a pacman-shaped surface cavity, shown in two dimensions. It is unclear which definition is correct or most useful. Here, we analyzed how well the constructed pockets overlap with the protein contacts made by small-molecule ligands in their X-ray conformations. One should add, as a word of caution, that native or synthetic ligands may either be smaller than surface pockets or extend beyond the pocket volume into the solvent. Figure 5.2 shows for an example system that the tools predict pockets of various sizes and shapes. Indeed, for some tools the ligand is not fully enclosed by the detected pocket, whereas other tools generated rather large pockets.
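The comparison between predicted pockets and ligand contacts can be illustrated with a small Python sketch. It assumes that ligand-contacting residues are those with any atom within a distance cutoff of any ligand atom (the 4.0 Å value is an assumption, not taken from this chapter) and reports three simple overlap measures: coverage, precision, and the Dice coefficient. The exact score used in this chapter may differ, and the function names and input format are hypothetical.

```python
import numpy as np


def contact_residues(ligand_xyz, protein_atoms, cutoff=4.0):
    """Return IDs of residues with any atom within `cutoff` Å of any ligand atom.

    ligand_xyz: (m, 3) ligand atom coordinates.
    protein_atoms: iterable of (residue_id, (x, y, z)) pairs (hypothetical format).
    cutoff: contact distance in Å (assumed value for illustration).
    """
    lig = np.asarray(ligand_xyz, dtype=float)
    contacts = set()
    for res_id, xyz in protein_atoms:
        dist = np.linalg.norm(lig - np.asarray(xyz, dtype=float), axis=1)
        if dist.min() <= cutoff:
            contacts.add(res_id)
    return contacts


def overlap_measures(predicted, reference):
    """Coverage, precision, and Dice overlap of predicted vs. contact residues."""
    predicted, reference = set(predicted), set(reference)
    shared = len(predicted & reference)
    coverage = shared / len(reference) if reference else 0.0
    precision = shared / len(predicted) if predicted else 0.0
    total = len(predicted) + len(reference)
    dice = 2 * shared / total if total else 0.0
    return coverage, precision, dice


# Toy example: a two-atom ligand and three protein atoms (made-up coordinates).
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
atoms = [("A:10", (3.0, 0.0, 0.0)), ("A:11", (6.0, 0.0, 0.0)), ("A:12", (0.0, 3.5, 0.0))]
ref = contact_residues(ligand, atoms)  # contains A:10 and A:12
print(overlap_measures({"A:10", "A:11", "A:13"}, ref))  # (0.5, 0.3333333333333333, 0.4)
```

Coverage rewards pockets that include all contact residues, while precision penalizes oversized pockets; reporting both separates the two behaviors described for Figure 5.2.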

Figure 5.1 Two-dimensional illustration of three possible ways to define the pocket volume of a pacman-shaped surface cavity.

Figure 5.2 Pockets (blue) identified by DEPTH (lining residues) (top panel left), GHECOM (top panel middle), Fpocket (top panel right), DoGSiteScorer (middle panel left), PocketPicker (middle panel middle), IsoMif (middle panel right), and ProACT2 (bottom panel) that overlap with the ligand HNT (red sticks) bound to human phenylethanolamine N-methyltransferase, PNMT (PDB ID: 2G70). The figures were generated using the PyMOL Molecular Graphics System [47].


Here, we compared the performance of seven tools using a set of quality metrics on a set of 167 protein-ligand complexes. We then computed a set of physico-chemical and geometric features for each ligand-bound pocket in the dataset. Correlation analysis between these pocket features and the quality metrics revealed only weak correlations between pocket features and the predictive performance of the tools. In general, we found comparable performance for five tools: DEPTH [57,58], GHECOM [49], DoGSiteScorer [56], Fpocket [68], and IsoMif [63].
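The correlation analysis between pocket features and quality metrics can be sketched as follows. The coefficient is not stated at this point, so this Python sketch assumes Spearman rank correlation. It computes the coefficient as the Pearson correlation of average ranks, which handles tied values. The function names, the feature names, and the toy values are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def feature_score_correlations(features, scores):
    """Correlate every pocket feature (name -> per-pocket values) with the scores."""
    return {name: spearman(vals, scores) for name, vals in features.items()}


# Toy data for four pockets (made-up values, hypothetical feature names).
features = {"volume": [120.0, 340.0, 210.0, 500.0], "hydrophobicity": [0.2, 0.4, 0.4, 0.1]}
scores = [0.3, 0.6, 0.5, 0.7]
print(feature_score_correlations(features, scores))
```

A rank-based coefficient is a reasonable default here because pocket features such as volume can be heavily skewed, but Pearson correlation would be an equally valid choice if the chapter used it.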

5.2 Material and Methods
