
The compilation of datasets included in the AFND encompasses the largest collection of immune gene frequency data in the world. As mentioned in Section 1.1, the design of databases involves a number of challenges such as the standardisation of formats, the use of an appropriate controlled vocabulary and the interaction with other external databases for data validation and data exchange.

Because the published data on immune gene frequencies span the last two decades and vary widely in scope, the first objective in the design of the AFND was to describe the population attributes and so generate a controlled vocabulary based on the terminology found in the literature. This process raised several problems, such as the standardisation of geographical regions and ethnic groups. During the compilation of datasets, the term that most appropriately described each population was selected. With the aim of formulating a standard terminology for reporting immune gene frequencies and population attributes, several groups from international organisations, such as the Immunogenomics Data Analysis Working Group (IDAWG) (Mack et al. 2009) and the European network of the HLA diversity for histocompatibility, clinical transplantation, epidemiology and population genetics (HLA-NET) (Sanchez-Mazas et al. 2010), are actively participating in the definition of the controlled vocabulary. It is expected that the definition of terms used in these immune gene databases can be finalised in the near future. One advantage of the model described in this chapter is that the physical schema can be modified flexibly, which would allow such changes to be incorporated.

13 Barker's notation, available in the PowerDesigner software, was developed by Richard Barker and collaborators as a method to represent Entity-Relationship Diagrams.

Chapter 2  Design of the AFND: datasets, schemas and methods

The second objective focussed on the validation of submitted data and the interaction with external databases such as IMGT/HLA and IPD-KIR. As shown in Section 2.2.3.2, the list of valid alleles is updated automatically whenever a new release becomes available, which helps to keep the data submitted by users uniform.
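The validation step described above can be illustrated with a minimal sketch. It is not the AFND implementation: the function names, the one-allele-per-line file format and the sample entries are assumptions made purely for illustration. The idea is that a list of valid allele names is rebuilt from each new release and submitted names are checked against it.

```python
# Hypothetical sketch: names, file format and sample data are illustrative
# assumptions, not code or data taken from the AFND.

def load_valid_alleles(lines):
    """Build a set of valid allele names from an iterable of text lines.

    Assumes one allele name per line; blank lines and lines starting
    with '#' are skipped.
    """
    valid = set()
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            valid.add(name)
    return valid


def validate_submission(alleles, valid_alleles):
    """Split submitted allele names into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for allele in alleles:
        cleaned = allele.strip()
        if cleaned in valid_alleles:
            accepted.append(cleaned)
        else:
            rejected.append(cleaned)
    return accepted, rejected


# Stand-in for the contents of a downloaded release file (assumed format).
release = ["# allele list (illustrative)", "A*01:01", "A*02:01", "B*07:02", ""]
valid = load_valid_alleles(release)

# "A*XX:YY" is a deliberately invalid placeholder name.
accepted, rejected = validate_submission(["A*02:01 ", "A*XX:YY", "B*07:02"], valid)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rebuilding the set from each release, rather than editing it by hand, is what would keep submissions consistent with the current nomenclature under these assumptions.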

Methodology used in the design of the AFND schema

The selection of a particular model in the design of databases has been a topic of controversy. Databases are generally constructed considering several factors such as the availability of data, the origins of the information and hardware requirements (Bornberg-Bauer & Paton 2002). Although it was once believed that object-oriented databases (OODBs) would replace relational databases (RDBs), at present 90% of biological datasets are still based on relational models (Hellerstein, Stonebraker & Hamilton 2007; Rob & Coronel 2008). Two of the main reasons for the delay in adopting OODBs are the complexity of migrating existing relational models and the difficulty of performing new implementations in existing OODBs. Thus, the RDB schema was used as the model for the implementation of the AFND.

The first target in the design of the AFND schema was to generate a model which could encompass all four polymorphic regions presented in this research (HLA, KIR, MIC and several cytokine gene polymorphisms). After a preliminary analysis of the content of the different polymorphisms, it was concluded that the construction of general tables for the storage of allele and haplotype frequency data was feasible (Figures A.2 and A.6). However, for genotype data, there was a need to generate a customised format to define the specific characteristics of the polymorphism of interest. This was the case for KIR genotypes. For cytokine frequencies, a separate table was designed because the format in which cytokine polymorphisms were represented in the literature differed significantly from that of HLA, KIR and MIC.
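The split between general and customised tables can be sketched as follows. This is only an illustration under stated assumptions: the table names, column names and frequency values are hypothetical and do not reproduce the actual AFND schema shown in Figures A.2 and A.6. The customised genotype table is omitted for brevity.

```python
import sqlite3

# Hypothetical sketch: table names, column names and values are illustrative
# assumptions, not the actual AFND schema or data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE population (
    population_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);

-- One general table holds allele frequencies for HLA, KIR and MIC;
-- the 'polymorphism' column distinguishes the regions.
CREATE TABLE allele_frequency (
    population_id INTEGER NOT NULL REFERENCES population(population_id),
    polymorphism  TEXT NOT NULL,
    locus         TEXT NOT NULL,
    allele        TEXT NOT NULL,
    frequency     REAL NOT NULL CHECK (frequency BETWEEN 0 AND 1)
);

-- Cytokine data are kept in a separate table because their published
-- format differs from that of the other regions.
CREATE TABLE cytokine_frequency (
    population_id INTEGER NOT NULL REFERENCES population(population_id),
    gene          TEXT NOT NULL,
    variant       TEXT NOT NULL,
    frequency     REAL NOT NULL CHECK (frequency BETWEEN 0 AND 1)
);
""")

conn.execute("INSERT INTO population VALUES (1, 'Example population')")
conn.executemany(
    "INSERT INTO allele_frequency VALUES (?, ?, ?, ?, ?)",
    [
        (1, "HLA", "A", "A*02:01", 0.25),      # made-up frequency
        (1, "KIR", "2DL1", "2DL1*001", 0.10),  # made-up frequency
    ],
)

# Count the stored records per polymorphic region.
rows = conn.execute(
    "SELECT polymorphism, COUNT(*) FROM allele_frequency "
    "GROUP BY polymorphism ORDER BY polymorphism"
).fetchall()
print(rows)
```

In a layout of this kind, a further region that fits the general format would need new rows in `allele_frequency` rather than a new table.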

The AFND as a generic framework for the collection of immune gene frequencies

The design of a generic model in the database schema permits the addition of other polymorphisms of interest with minimal implementation effort. The use of preconfigured scripts and modules can speed up the incorporation of new polymorphisms such as minor histocompatibility antigens (mHags), platelets, blood groups, etc. An ongoing development is a new section in the AFND to compile frequency data on mHags, which are of relevance in haematopoietic stem cell transplantation (Simpson et al. 2002). Furthermore, non-human species can be added to this schema, as some of the genes share a similar structure, e.g. Bovine Leukocyte Antigens (BoLA), Dog Leukocyte Antigens (DLA) and Feline Leukocyte Antigens (FLA), among others. This schema has assisted the implementation of a BoLA database prototype developed at the University of Liverpool, which was used as supporting material for an ongoing grant application to the Biotechnology and Biological Sciences Research Council (BBSRC).

Accuracy of data

Unfortunately, data available in the literature are not always accurate. In many cases, journal editors were contacted to discuss these issues. Although the AFND cannot guarantee the accuracy of the tissue typing of the individuals, more than 90% of the data on the website has been peer-reviewed and published. Thus, the AFND relies on the accuracy of data being verified by the reviewers of the journals and acts mainly as a source for compiling data. Future developments in the AFND will include the collection of raw data so that the website can assist researchers in the assessment of data quality. The module for the incorporation of raw data will be discussed in detail in Sections 8.2.4 and 8.3.1.
