
Parts of this thesis work have been published in bioinformatics and computational biology journals, conferences and workshops, and other parts are in preparation for submission. The publications are listed below:

Related publications of Chapter 2:

Sumaiya Iqbal and Md Tamjidul Hoque. DisPredict: A Predictor of Disordered Protein using Optimized RBF Kernel. PLoS One, 2015, 10(10), e0141551, DOI: 10.1371/journal.pone.0141551.

Sumaiya Iqbal, Denson Smith, Avdesh Mishra, Md Nasrul Islam, Md Tamjidul Hoque. Disordered Protein Prediction by Spiders. CASP11 proceedings, 2014, pp. 215–216.

Sumaiya Iqbal, Md Nasrul Islam, Md Tamjidul Hoque. Improved Protein Disorder Predictor by Smoothing Output. IEEE Conference on Computer & Information Technology (ICCIT), 2014, pp. 110–115, DOI: 10.1109/ICCITechn.2014.7073113, Dhaka, Bangladesh.

Related publications of Chapter 3:

Sumaiya Iqbal, Avdesh Mishra and Md Tamjidul Hoque. Improved Prediction of Accessible Surface Area Results in Efficient Energy Function Application. Journal of Theoretical Biology, 2015, 380, pp. 380–391, DOI: 10.1016/j.jtbi.2015.06.012.

Avdesh Mishra, Sumaiya Iqbal and Md Tamjidul Hoque. Discriminate Protein Decoys from Native by using a Scoring Function based on Ubiquitous Phi and Psi Angles Computed for All Atom. Journal of Theoretical Biology, 2016, 398, pp. 112–121, DOI: 10.1016/j.jtbi.2016.03.029.

Avdesh Mishra, Sumaiya Iqbal and Md Tamjidul Hoque. An eclectic energy function to discriminate native from decoys. The 4th Annual LA Conference on Computational Biology and Bioinformatics, 2016, New Orleans, LA.

Related publications of Chapter 4:

Sumaiya Iqbal and Md Tamjidul Hoque. Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence alone for Structural Classification. PLoS One, 2016, 11(9), e0161452, DOI: 10.1371/journal.pone.0161452.

Sumaiya Iqbal and Md Tamjidul Hoque. Estimation of free energy contribution of protein residues for structure prediction from sequence. The Great Lakes Bioinformatics and the Canadian Computational Biology Conference (GLBIO/CCBC), 2016, Toronto, Canada.

Sumaiya Iqbal, Denson Smith and Md Tamjidul Hoque. Accurate identification of disordered protein residues using deep neural network. The 4th Annual LA Conference on Computational Biology and Bioinformatics, 2016, New Orleans, LA.

Related publications of Chapter 5:

Sumaiya Iqbal and Md Tamjidul Hoque. A Study of Disorder-to-Order Transition by Characterizing the Binding Partners using a Statistical Potential. Biophysical Journal, 2017, 112, p. 209a, DOI: 10.1016/j.bpj.2016.11.1153.

Sumaiya Iqbal and Md Tamjidul Hoque. Prediction of Peptide-Binding Residues of Receptor Proteins in a Complex. The 5th Annual Conference on Computational Biology and Bioinformatics, 2017, New Orleans, LA.

Sumaiya Iqbal and Md Tamjidul Hoque. Modeling Sequence Pattern of Peptide-Binding Domain Residue using Stacking. Submitted, 2017.

Sumaiya Iqbal and Md Tamjidul Hoque. PBRpredict-Suite: Learning the Residue Pattern of Peptide-Binding Domains from Sequence using Stacked Generalization. Submitted, 2017.

Miscellaneous:

Sumaiya Iqbal, Tamjidul Hoque. hGRGA: A Scalable Genetic Algorithm with Homologous Gene Schema Replacement. Swarm and Evolutionary Computation, 2017, 34, pp. 33–49.

Sumaiya Iqbal and Md Tamjidul Hoque. A homologous gene replacement based genetic algorithm. Genetic and Evolutionary Computation Conference (GECCO), 2016, Denver, CO.

Tamjidul Hoque, Sumaiya Iqbal. Genetic Algorithm based Improved Sampling for Protein Structure Prediction. International Journal of Bio-Inspired Computation, 2017, 9, pp. 129–140.

Chapter 2

DisPredict: A Predictor of Disordered Protein

A Framework using Optimized RBF Kernel SVM

Intrinsically disordered proteins (or unstructured proteins) constitute a unique class of the protein kingdom and have recently been recognized as key players in functional proteomics. Intrinsically disordered proteins or regions in proteins (IDPs/IDRs) lack a rigid three-dimensional (3D) structure under physiological conditions in vitro [2]. Nevertheless, IDPs, whether disordered along the full sequence or only in regions of it, carry out important biological functions despite their extremely flexible, essentially non-compact (or extended) structures. While the molecular recognition functions of IDPs/IDRs include pathways for cell division, signaling, recognition and regulation [19], the structural heterogeneity of IDPs is closely linked to the amyloid aggregation involved in critical human diseases such as cancer, Parkinson's disease, Alzheimer's disease and type II diabetes [20]. Accurate identification of IDPs therefore has significant implications for the proper annotation of protein function and for a better understanding of drug design against disorder-associated diseases. The rapid growth of protein sequence repositories [21] demands high-throughput computational techniques for identifying disordered residues from protein sequence, which makes this an important area of research in bioinformatics and computational biology.

In this chapter, we introduce our disorder predictor framework, called DisPredict (Disorder Predictor) [10], which classifies ordered and disordered residues from protein sequence alone. DisPredict employs a support vector machine (SVM) with a radial basis function (RBF) kernel. With an optimal set of RBF kernel parameters and a unique set of features, including several novel features for reliable characterization of protein structure, DisPredict yields promising performance both in order versus disorder (i.e., binary) classification and in per-residue probability prediction, specifically in terms of the Matthews Correlation Coefficient (MCC) and the Area Under the Receiver Operating Characteristic Curve (AUC).
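To make the kernel and the two evaluation measures concrete, the following minimal Python sketch implements their standard definitions. The function names, the label encoding (1 = disordered, 0 = ordered), the 0.5 decision threshold and the toy data are illustrative assumptions, not DisPredict's implementation; in DisPredict the kernel parameter is obtained by optimization rather than fixed by hand.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF (Gaussian) kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; assumed labels: 1 = disordered, 0 = ordered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]              # hypothetical per-residue annotations
    probs = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]   # hypothetical predicted disorder probabilities
    preds = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold
    print(round(mcc(labels, preds), 3))  # 0.333
    print(round(auc(labels, probs), 3))  # 0.778
```

MCC scores the thresholded binary labels, while AUC scores the ranking induced by the raw probabilities, which is why the two are reported side by side. The pairwise AUC loop is quadratic and meant only for illustration; a rank-based computation scales better on full datasets.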

DisPredict is evaluated using 10-fold cross-validation as well as on independent test datasets. Using data from multiple sources makes the predictor generic. Moreover, comparisons with other state-of-the-art approaches and case studies show that DisPredict is a useful tool with competitive performance. In addition to developing the predictor, we analyzed the structural features of experimentally annotated disordered and ordered regions of proteins using feature correlation plots. This experiment gave us insight into the overlap between the annotated ordered and disordered segments of proteins in their feature space. The result indicates possible noise in the annotation of disordered and ordered residues in the available databases and motivates the formulation of new characteristic features that separate disordered and ordered residues more clearly. The rest of the chapter is organized as follows.

• We begin with background information on intrinsically disordered proteins and their functions, and the motivation for developing a new predictor, in Section 2.1.

• Next, in Section 2.2, we review existing disordered protein predictors and outline our contribution.

• In Section 2.3, we describe the experimental materials, such as data sources, data collection and mining processes, input features used to train the predictor, and the criteria to evaluate and compare the predictor.

• Section 2.4 describes the first version of the predictor, DisPredict (version 1.0). In this thesis, by ‘DisPredict1.0’ or just by ‘DisPredict’, we refer to the first version of our disorder predictor.

• Section 2.5 presents the performance evaluation used to select the optimal window size and parameters, and compares the performance of DisPredict1.0 with existing predictors.

• The analyses of the results and datasets, as well as the feature correlation analysis, are presented in Section 2.6.

• In Section 2.7, we discuss an investigative strategy to improve on DisPredict1.0. Keeping the same framework and features, we added a post-processing step for the output probabilities generated by DisPredict1.0 to develop DisPredict1.1, which improves prediction accuracy.

• Finally, we conclude in Section 2.8 with future research directions.
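Section 2.7 describes the actual post-processing used for DisPredict1.1. Purely as an illustration, the sketch below assumes that the post-processing is a centered moving average over each protein's per-residue disorder probabilities; the function name, the window size and the handling of chain ends are hypothetical choices, and the thesis's scheme may differ. The intuition is that disordered residues tend to occur in contiguous regions, so averaging over sequence neighbors damps isolated spikes in the raw classifier output.

```python
from typing import List, Sequence


def smooth_probabilities(probs: Sequence[float], window: int = 5) -> List[float]:
    """Hypothetical smoothing: centered moving average of per-residue probabilities.

    Near the chain termini the window is truncated to the residues that exist,
    so every residue keeps a value in [0, 1].
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(probs)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = probs[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    raw = [0.2, 0.9, 0.3, 0.8, 0.85, 0.9]  # hypothetical raw SVM probabilities for one chain
    print([round(p, 3) for p in smooth_probabilities(raw, window=3)])
```

Truncating the window at the termini, rather than padding with a constant, avoids biasing the first and last residues toward an arbitrary value; a weighted kernel (for example, Gaussian weights) would be a natural variant of the same idea.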