The fields of bioinformatics and computational biology were used extensively to investigate questions about the biological composition, structure and function of the genes and proteins involved in this study. These approaches enable large-scale analysis (such as the WES and targeted NGS analyses, Sections 2.10.2 and 2.11.2), design (such as primer design, Section 2.4.1), prediction (such as software for predicting mutation pathogenicity, Section 2.14.2) and the retrieval of data from many disciplines. The bioinformatics tools used in this study are listed below.

2.14.1 Genetic, phenotypic and functional data sources

The basic information about the candidate genes, including genomic sequence, intron-exon structure, location of polymorphisms and amino acid conservation, was obtained using the UCSC Genome Browser (http://genome.ucsc.edu/), while information about disease phenotypes was collected from the Online Mendelian Inheritance in Man database (OMIM, http://www.ncbi.nlm.nih.gov/omim). Literature searches on techniques, genes and proteins were carried out using PubMed (http://www.ncbi.nlm.nih.gov/pubmed), GeneCards (http://www.genecards.org/), the Ensembl Genome Browser (http://www.ensembl.org/index.html) and the NCBI Gene database (http://www.ncbi.nlm.nih.gov/gene/).

2.14.2 Software for predicting mutation pathogenicity

A large number of in silico tools have been developed to predict the effect of an unclassified variant on protein function. These tools play a key role in prioritizing candidate causative mutations. Some of these tools are described below.

2.14.2.1 Polymorphism phenotyping v2 (PolyPhen-2)

PolyPhen-2 is a freely available, web-based program used to predict the possible impact of a non-synonymous variant on the stability and function of the protein (http://genetics.bwh.harvard.edu/pph2/index.shtml). This tool integrates the UCSC Genome Browser human genome annotations with the Vertebrate Genome Annotation (VEGA) database. The software estimates a probability score from a combination of structural properties, comparative evolutionary profiles, the differences between known damaging and non-damaging alleles, and the differences between human and vertebrate orthologues (Adzhubei et al., 2010). The prediction also takes into account the differences between human disease-causing mutations in the UniProt Knowledgebase (UniProtKB) (http://www.uniprot.org/help/uniprotkb) and common human non-synonymous single nucleotide polymorphisms (nsSNPs) with MAF > 1% and no disease-associated annotation. PolyPhen-2 scores range from 0 to 1.00 and are interpreted as qualitative predictions as follows: < 0.15 = benign, 0.15-0.85 = possibly damaging and 0.85-1.00 = probably damaging.
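
The score bands above can be written as a small lookup, which is convenient when PolyPhen-2 scores are post-processed in bulk. This is a minimal sketch that uses only the cut-offs stated in this section; the function name is hypothetical, the handling of the exact boundary values 0.15 and 0.85 is our assumption, and the label reported by the PolyPhen-2 server itself should take precedence.

```python
def classify_polyphen2(score: float) -> str:
    """Map a PolyPhen-2 score (0-1) to the qualitative bands quoted in this section.

    Boundary handling is an assumption: 0.15 counts as 'possibly damaging'
    and 0.85 as 'probably damaging'.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PolyPhen-2 scores lie between 0 and 1, got {score}")
    if score < 0.15:
        return "benign"
    if score < 0.85:
        return "possibly damaging"
    return "probably damaging"


# Hypothetical scores for three variants
for s in (0.02, 0.40, 0.97):
    print(f"{s:.2f} -> {classify_polyphen2(s)}")
```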

2.14.2.2 Sorting intolerant from tolerant (SIFT)

SIFT is a web-based program that classifies amino acid substitutions as tolerated or deleterious (http://sift.jcvi.org/). A probability matrix is calculated from the degree of conservation of amino acid residues in multiple sequence alignments of homologues with similar functions, collected using PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool). The software has a default cut-off threshold of 0.05: substitutions with SIFT scores below this threshold are predicted to be deleterious, whereas those with scores at or above it are regarded as tolerated (Ng and Henikoff, 2003).
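
The scoring idea can be illustrated on a single alignment column. This is a deliberately simplified sketch, not SIFT itself: it uses raw residue frequencies with a flat pseudocount (the value 0.1 is arbitrary), whereas SIFT also weights the aligned sequences and derives its pseudocounts from Dirichlet mixtures. What it shares with SIFT is the normalization by the most probable residue at the position and the 0.05 cut-off described above. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sift_like_scores(column: str, pseudocount: float = 0.1) -> dict[str, float]:
    """Simplified SIFT-style normalized probabilities for one alignment column.

    Each amino acid's probability is its count in the column plus a flat
    pseudocount, divided by the total; every probability is then divided by
    the largest one, so the most frequent residue scores 1.0.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # drop gaps
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    probabilities = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probabilities.values())
    return {aa: p / top for aa, p in probabilities.items()}


def sift_call(score: float, cutoff: float = 0.05) -> str:
    """Scores below the cut-off are predicted deleterious; all others tolerated."""
    return "deleterious" if score < cutoff else "tolerated"


# Hypothetical column: nine leucines and one isoleucine at the same position
scores = sift_like_scores("LLLLLLLLLI")
for aa in "LID":
    print(f"{aa}: {scores[aa]:.3f} {sift_call(scores[aa])}")
```

Here the conservative I substitution stays above the cut-off, while D, never observed at this position, falls below it.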

2.14.2.3 The BLOSUM62 matrix

The BLOSUM62 substitution matrix provides a score for every possible exchange of one amino acid for another (http://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt). The matrix is derived from about 2,000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. The classification of protein patterns into families depends mainly on the regions thought to be important for protein function (motifs), and the scores reflect how often each amino acid is substituted by another within the blocks of related proteins. In this “star-tree” score model, scores for non-synonymous amino acid substitutions range from -4 to +3. A score of -4 indicates a large change in properties when one of the two amino acids in question is exchanged for the other, which would be likely to alter protein function, so the substitution is highly unlikely to be benign. Conversely, a score of +3 indicates a substitution between two amino acids with very similar properties, which is therefore likely to be benign. The BLOSUM62 matrix should be used alongside other pathogenicity prediction tools because the data on which it is based are restricted to a subset of conserved domains (Henikoff and Henikoff, 1992).
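
Looking up a substitution is a simple table access once the matrix is loaded. The sketch below embeds the standard BLOSUM62 values (transcribed by us; check them against the NCBI file cited above before reuse) and a hypothetical helper, `blosum62_score`, that returns the score for a reference-to-variant amino acid change using one-letter codes.

```python
# Standard BLOSUM62 scores (Henikoff and Henikoff, 1992); rows and columns follow ORDER.
ORDER = "ARNDCQEGHILKMFPSTWYV"
ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Build a (from, to) -> score dictionary from the text table above.
BLOSUM62 = {
    (row_aa, col_aa): int(value)
    for row_aa, line in zip(ORDER, ROWS.strip().splitlines())
    for col_aa, value in zip(ORDER, line.split())
}


def blosum62_score(ref: str, alt: str) -> int:
    """Return the BLOSUM62 score for substituting ref by alt (one-letter codes)."""
    return BLOSUM62[(ref.upper(), alt.upper())]


# Conservative versus radical substitutions
for ref, alt in [("I", "V"), ("D", "E"), ("G", "W"), ("D", "L")]:
    print(f"{ref}->{alt}: {blosum62_score(ref, alt):+d}")
```

Off-diagonal values span -4 to +3, matching the range described above; the diagonal (identity) scores are larger and are not used for substitutions.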

2.14.2.4 Align-GVGD program

The Align-GVGD program (Align-Grantham Variation and Grantham Deviation) is a web server that classifies missense substitutions in genes of interest along a spectrum ranging from enriched neutral to enriched deleterious (http://agvgd.iarc.fr/index.php). The program combines protein multiple sequence alignments (in FASTA format) with the biophysical characteristics of amino acids. The biophysical variation at each alignment position is converted to a Grantham Variation (GV) score, and the biophysical distance of the missense substitution from that range of variation is expressed as a Grantham Deviation (GD) score. The prediction classes form a spectrum (C0, C15, C25, C35, C45, C55 and C65), with C65 the most likely to affect protein function and C0 the least likely (Tavtigian et al., 2006).
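
To make the GV/GD idea concrete, the sketch below computes Grantham distances and GV/GD values for one alignment column. It rests on our reading of Tavtigian et al. (2006): GV is the Grantham-weighted spread of composition, polarity and volume among the aligned residues, and GD is the Grantham-weighted distance of the substituting residue from that spread (zero along any property where it falls inside the observed range). The property values and weights are the constants of Grantham's original distance formula, supplied by us rather than taken from this section, and all function names are hypothetical. The mapping of GV/GD onto the C0-C65 classes is not reproduced, so the Align-GVGD server remains the reference for actual predictions.

```python
import math

# Grantham's properties per amino acid: (composition c, polarity p, volume v)
GRANTHAM = {
    "A": (0.00, 8.1, 31.0),  "R": (0.65, 10.5, 124.0), "N": (1.33, 11.6, 56.0),
    "D": (1.38, 13.0, 54.0), "C": (2.75, 5.5, 55.0),   "Q": (0.89, 10.5, 85.0),
    "E": (0.92, 12.3, 83.0), "G": (0.74, 9.0, 3.0),    "H": (0.58, 10.4, 96.0),
    "I": (0.00, 5.2, 111.0), "L": (0.00, 4.9, 111.0),  "K": (0.33, 11.3, 119.0),
    "M": (0.00, 5.7, 105.0), "F": (0.00, 5.2, 132.0),  "P": (0.39, 8.0, 32.5),
    "S": (1.42, 9.2, 32.0),  "T": (0.71, 8.6, 61.0),   "W": (0.13, 5.4, 170.0),
    "Y": (0.20, 6.2, 136.0), "V": (0.00, 5.9, 84.0),
}
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399  # weights for c, p and v
SCALE = 50.723  # puts distances on the scale of Grantham's published matrix


def weighted_distance(dc: float, dp: float, dv: float) -> float:
    return SCALE * math.sqrt(ALPHA * dc ** 2 + BETA * dp ** 2 + GAMMA * dv ** 2)


def grantham_distance(a: str, b: str) -> float:
    (ca, pa, va), (cb, pb, vb) = GRANTHAM[a], GRANTHAM[b]
    return weighted_distance(ca - cb, pa - pb, va - vb)


def outside(x: float, lo: float, hi: float) -> float:
    """Distance of x from the interval [lo, hi]; 0 if x lies inside it."""
    return max(lo - x, 0.0, x - hi)


def gv_gd(column: str, variant: str) -> tuple[float, float]:
    """Return (GV, GD) for one alignment column and a substituting residue."""
    residues = [aa for aa in column.upper() if aa in GRANTHAM]  # skip gaps
    if not residues:
        raise ValueError("alignment column contains no amino acids")
    cs, ps, vs = zip(*(GRANTHAM[aa] for aa in residues))
    # GV: spread of each property across the aligned residues
    gv = weighted_distance(max(cs) - min(cs), max(ps) - min(ps), max(vs) - min(vs))
    # GD: how far the variant residue falls outside that spread
    c, p, v = GRANTHAM[variant.upper()]
    gd = weighted_distance(outside(c, min(cs), max(cs)),
                           outside(p, min(ps), max(ps)),
                           outside(v, min(vs), max(vs)))
    return gv, gd


# Hypothetical columns: an invariant glycine and a hydrophobic position
print("G->R distance: %.0f" % grantham_distance("G", "R"))
print("GGGGG, R: GV=%.1f GD=%.1f" % gv_gd("GGGGG", "R"))
print("LIVM-, D: GV=%.1f GD=%.1f" % gv_gd("LIVM-", "D"))
```

An invariant column gives GV = 0, so any substitution there has a GD equal to its plain Grantham distance from the conserved residue.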

2.14.2.5 MAPP program

Multivariate Analysis of Protein Polymorphism (MAPP) is a missense prediction tool that can be downloaded and run locally (http://mendel.stanford.edu/SidowLab/downloads/MAPP/index.html). It predicts whether a substitution is likely to be deleterious by assessing how far the substituting amino acid deviates from the physicochemical properties, including polarity, volume and hydropathy, observed at that position in an alignment of related proteins (Stone and Sidow, 2005).

2.14.2.6 MutationTaster

MutationTaster is a fast web-based program (http://www.mutationtaster.org) used to evaluate different types of DNA sequence alterations: synonymous, non-synonymous, nonsense and frameshift. The software integrates various data sources such as HapMap, Ensembl, dbSNP and SwissProt/UniProt. For this study, the scripts were downloaded and integrated into the ANNOVAR software to run locally on a Unix-based system. Each prediction is given as either ‘disease-causing’ or ‘polymorphism’, together with a probability value indicating the confidence of the prediction (values close to 1 indicate the highest confidence) (Schwarz et al., 2014).

2.14.2.7 CADD score

Combined Annotation-Dependent Depletion (CADD) is a novel functional meta-annotation tool (http://cadd.gs.washington.edu/) that evaluates and scores the deleteriousness of single nucleotide substitutions and indel variants on a large scale (Kircher et al., 2014). CADD works as a framework that integrates 63 distinct annotations into a single metric, called the C-score, for each variant. Unlike other annotation tools, CADD does not rely solely on the conservation of amino acid residues but also uses functional genomic data such as DNase I hypersensitivity and transcription factor binding; protein-level scores such as PolyPhen, SIFT and Align-GVGD; expression levels in commonly studied cell lines; and exon-intron boundaries determined from transcript data. The C-score is calculated from a combination of all of these data. A scaled CADD score of 10 means that a variant is among the 10% most deleterious substitutions possible in the human genome, a scaled score of 20 places it in the top 1%, and a scaled score of 30 in the top 0.1%.
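
The three examples above follow a PHRED-like scale: the scaled score is -10·log10 of the variant's rank divided by the total number of ranked substitutions. The short sketch below converts in both directions; the function names are illustrative, and the ranking itself is produced by CADD, not by this code.

```python
import math


def top_fraction_from_scaled(scaled: float) -> float:
    """Fraction of ranked substitutions scoring at least as high as this scaled C-score."""
    return 10 ** (-scaled / 10)


def scaled_from_rank(rank: int, total: int) -> float:
    """Scaled C-score for a variant ranked `rank` (1 = most deleterious) out of `total`."""
    return -10 * math.log10(rank / total)


for s in (10, 20, 30):
    print(f"scaled C-score {s}: top {top_fraction_from_scaled(s):.1%}")
```

Each additional 10 points therefore corresponds to a tenfold smaller fraction of the genome-wide ranking.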

2.14.3 Splice site prediction tools

In silico splice prediction tools were used to interpret intronic and exonic mutations that may lead to splicing defects. Two web-based programs were used: the splice site prediction tool of the Berkeley Drosophila Genome Project (http://www.fruitfly.org/seq_tools/splice.html) and NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/) (Hebsgaard et al., 1996; Reese et al., 1997). Both are neural network-based programs that identify possible 5' (donor) and 3' (acceptor) splice sites. For each variant, the reference and variant sequences, each including the surrounding genomic sequence spanning two or more exons, were submitted to the programs separately. The predicted splice acceptor and donor sites and their confidence scores were then compared between the reference and variant sequences.
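
The comparison step can be made systematic by pairing the predicted sites from the two runs. The sketch below assumes the predictions have already been copied into dictionaries keyed by site type and position (a hypothetical layout; neither tool exports this structure directly) and that positions are comparable between the two sequences, which holds for substitutions but not for indels. The `min_change` cut-off and the example scores are arbitrary illustrative values.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (site type, position in the submitted sequence) -> confidence score
Predictions = Dict[Tuple[str, int], float]


def compare_splice_predictions(reference: Predictions, variant: Predictions,
                               min_change: float = 0.1) -> List[str]:
    """List splice sites lost, gained or changed by at least min_change in the variant."""
    report = []
    for site_type, position in sorted(set(reference) | set(variant)):
        ref_score = reference.get((site_type, position))
        alt_score = variant.get((site_type, position))
        if alt_score is None:
            report.append(f"{site_type} at {position}: lost (reference score {ref_score:.2f})")
        elif ref_score is None:
            report.append(f"{site_type} at {position}: gained (variant score {alt_score:.2f})")
        elif abs(alt_score - ref_score) >= min_change:
            report.append(f"{site_type} at {position}: {ref_score:.2f} -> {alt_score:.2f}")
    return report


reference = {("donor", 120): 0.98, ("acceptor", 45): 0.90}
variant = {("donor", 131): 0.62, ("acceptor", 45): 0.88}
for line in compare_splice_predictions(reference, variant):
    print(line)
```

In this made-up example the native donor is lost and a weaker cryptic donor appears, while the small drop at the acceptor is below the reporting cut-off.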

2.14.4 dbSNP, 1000 Genomes, the EVS server and the ExAC database

The Single Nucleotide Polymorphism database (dbSNP), maintained at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/SNP/), is a public-domain archive of a broad collection of simple genetic polymorphisms for a variety of organisms. The 1000 Genomes Project (http://www.1000genomes.org/), with contributing centres in the US, UK, China and Germany, provides a public catalogue of human variation and genotype data from over 1,000 anonymous individuals from around the world. The Exome Variant Server (EVS) (http://evs.gs.washington.edu/EVS/) and the Exome Aggregation Consortium (ExAC) database (http://exac.broadinstitute.org/) are two databases that collect variant frequencies in populations from multiple studies. EVS is based on WES data from 6,503 well-phenotyped individuals of various ethnicities, while ExAC includes a larger cohort of 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. Because data from individuals affected by severe paediatric disease have been removed from the ExAC shared datasets, these databases have frequently been used as control populations for calculating allele frequencies and for filtering out potentially benign variants that occur at a relatively high frequency (Song et al., 2016).
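
The filtering step described above can be sketched as a simple rule: discard a variant if any control database reports its allele frequency at or above a chosen cut-off. The 1% cut-off, the variant identifiers and the frequencies below are hypothetical and are not taken from this study; `is_rare` is an illustrative name.

```python
from typing import Dict, Optional


def is_rare(frequencies: Dict[str, Optional[float]], max_af: float = 0.01) -> bool:
    """True if no control database reports the allele at or above max_af.

    None means the variant is absent from that database and is not counted
    as evidence of a common allele.
    """
    return all(af < max_af for af in frequencies.values() if af is not None)


# Hypothetical variants with allele frequencies from each control database
variants = {
    "variant_1": {"1000G": 0.12, "EVS": 0.10, "ExAC": 0.11},
    "variant_2": {"1000G": None, "EVS": None, "ExAC": 0.00002},
    "variant_3": {"1000G": None, "EVS": 0.015, "ExAC": 0.004},
}
candidates = [name for name, freqs in variants.items() if is_rare(freqs)]
print(candidates)  # ['variant_2']
```

Taking the maximum frequency across databases is a conservative choice: a variant that is common in any one control population is removed.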

2.14.5 Protein bioinformatics tools

Interactive protein analysis servers were used to perform basic bioinformatics analyses of each candidate protein. The ExPASy Translate tool (http://web.expasy.org/translate/) was used to translate nucleotide sequences (DNA/RNA) into protein sequences. ClustalW (http://www.ebi.ac.uk/Tools/msa/clustalw2/) and Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) are fast web-based programs that were used for multiple sequence alignment of protein sequences. The NCBI reference sequences of interest and orthologous protein sequences, in FASTA format, were pasted into these programs. In the alignment output, sequences were ordered from top to bottom by degree of similarity, showing the conservation of the amino acid of interest and of the surrounding residues. Finally, the Protter tool (http://wlab.ethz.ch/protter/start/) was used to visualize proteoforms and predict protein sequence features (Omasits et al., 2014).
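
Both basic operations, translation and checking conservation at one alignment column, are small enough to sketch locally. This is an illustration, not a reimplementation of the ExPASy or Clustal servers: translation uses the standard genetic code (NCBI translation table 1) in a single forward frame, and `column_identity` is a hypothetical helper that reports the fraction of aligned sequences sharing the first sequence's residue. Clustal marks a fully conserved column with an asterisk, which corresponds to an identity of 1.0 here. The sequences are made up.

```python
# NCBI standard genetic code (translation table 1): codons enumerated with bases in TCAG order
BASES = "TCAG"
TABLE1_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: TABLE1_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate DNA or RNA in one forward frame (0, 1 or 2); '*' marks stop codons,
    'X' marks codons with ambiguous bases, and a trailing partial codon is ignored."""
    dna = seq.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def column_identity(aligned: list[str], column: int) -> float:
    """Fraction of aligned sequences carrying the first sequence's residue at a 0-based column."""
    reference_residue = aligned[0][column]
    return sum(seq[column] == reference_residue for seq in aligned) / len(aligned)


print(translate("ATGGCCAAGTAA"))  # MAK*
# Hypothetical aligned fragments: human first, then two orthologues
alignment = ["MKTAYIAK", "MKTAYIAR", "MRTAYLAK"]
print(f"{column_identity(alignment, 1):.2f}")
```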

2.14.6 Linkage analysis

Two-point linkage analysis was carried out using Superlink (http://bioinfo.cs.technion.ac.il/superlink-online/) (Silberstein et al., 2006). This method uses a Bayesian network model to compute likelihood scores for complex pedigrees, such as consanguineous pedigrees with multiple inbreeding loops. The software requires a pedigree (ped) file, describing the individuals to be analysed in each pedigree and their genotyping results, and a data (dat) file, describing the type of analysis required and the allele frequencies. The results are reported as logarithm of the odds (LOD) scores.
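
Superlink evaluates whole-pedigree likelihoods with Bayesian networks, which this sketch does not attempt. It only illustrates what a two-point LOD score measures in the simplest textbook case, where every meiosis is informative and phase is known: log10 of the likelihood at recombination fraction θ divided by the likelihood under free recombination (θ = 0.5). By convention, a LOD score of 3 or more is taken as significant evidence of linkage; for example, ten informative meioses with no recombinants give a maximum LOD of 10·log10 2 ≈ 3.01. The function name and the example family are hypothetical.

```python
import math


def two_point_lod(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ],
    i.e. the likelihood at recombination fraction theta relative to free recombination.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # any recombinant is impossible under complete linkage
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if non_recombinants:
        log_likelihood += non_recombinants * math.log10(1.0 - theta)
    return log_likelihood - (recombinants + non_recombinants) * math.log10(0.5)


# Hypothetical family: 10 informative meioses, 1 recombinant
for theta in (0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta={theta:.2f}  LOD={two_point_lod(1, 9, theta):.2f}")
```

Scanning θ and taking the maximum gives the two-point LOD score usually reported for a marker.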
