
3.2.1 Introduction

The CALIBER8 (Clinical research using Linked Bespoke studies and Electronic health Records) platform is a series of linkages between UK electronic healthcare databases: the Clinical Practice Research Datalink (CPRD), a longitudinal primary care database; Hospital Episode Statistics (HES), a database of hospital admissions and procedures; the Myocardial Ischaemia National Audit Project (MINAP), a national acute coronary disease registry; and the Office for National Statistics (ONS), for cause-specific mortality and social deprivation data. The CALIBER dataset used in this PhD comprises approximately 2 million patients from 225 general practices in England that consented to linkage between 1997 and 2010.

3.2.2 Data linkage

CALIBER provides linkages between the anonymised datasets using encrypted CPRD patient identifiers, allowing a more complete insight into patients' medical journeys than any single data resource would provide (Figure 3.2). Individual data sources alone may not accurately capture the incidence of events, and different data sources collect different data.

Linkage of patient-level data from CPRD to other anonymised data sources is performed by a Trusted Third Party using NHS numbers, gender and date of birth. 58% of UK CPRD general practices currently consent to data linkage, and it has previously been demonstrated that CPRD-HES linked data are representative of the general UK population.172-174
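The matching step described above can be illustrated with a minimal sketch of exact (deterministic) matching on the three identifiers named in the text. The Trusted Third Party's actual algorithm is not described here, so the `Identifiers` type, the `link_records` function and the pseudonymised IDs below are all hypothetical and serve only to show the idea.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Identifiers(NamedTuple):
    """Identifiers named in the text; field names are assumptions."""
    nhs_number: str
    gender: str
    date_of_birth: date


def link_records(
    primary: Dict[str, Identifiers],
    secondary: Dict[str, Identifiers],
) -> List[Tuple[str, str]]:
    """Return (primary_id, secondary_id) pairs whose identifiers match exactly."""
    # Index the primary care records by their identifier triple.
    index: Dict[Identifiers, List[str]] = {}
    for pseudo_id, ident in primary.items():
        index.setdefault(ident, []).append(pseudo_id)

    # Look up each secondary record; only exact three-field matches are linked.
    links = []
    for sec_id, ident in secondary.items():
        for prim_id in index.get(ident, []):
            links.append((prim_id, sec_id))
    return sorted(links)


# Illustrative, made-up records (not real NHS numbers).
primary_care = {
    "P1": Identifiers("0000000001", "F", date(1950, 1, 1)),
    "P2": Identifiers("0000000002", "M", date(1960, 2, 2)),
}
hospital = {
    "H1": Identifiers("0000000001", "F", date(1950, 1, 1)),
    "H2": Identifiers("0000000002", "M", date(1961, 2, 2)),  # date of birth differs
}
print(link_records(primary_care, hospital))
```

Only the pair of pseudonymised IDs leaves the function, which mirrors the principle that researchers receive linked anonymised data rather than the identifiers themselves.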

3.2.3 High resolution phenotypes

In traditional cohorts, diseases and events are confirmed at the time of occurrence or retrospectively by checking hospital notes. Within CALIBER, high-quality disease phenotypes are instead developed from the linked data through collaborations between clinicians, epidemiologists and statisticians. The phenotype development process is outlined in Figure 3.3.

• In a test cohort, descriptive and explorative analyses are performed on the potential phenotype components to determine which are suitable for use in the algorithm
• A preliminary phenotype algorithm is formed, which is then tested and revised iteratively
• The revised phenotype algorithm is implemented, tested and revised accordingly to reach the final version of the algorithm
• The final phenotype is added to the CALIBER portal, including the codelists, metadata, code and programming scripts required to implement the algorithm given raw data
• The phenotype is made available to the EHR community, who provide feedback to help further enhance the algorithm

Phenotypes for diseases and medical conditions have been developed for primary care records using Read codes in CPRD and secondary care records using ICD-10 codes in HES. The high resolution disease phenotypes usually include categories which are determined by wording and clinical usage of Read and ICD terms. Phenotype categories can denote:

• Subtype (e.g. MI: STEMI, NSTEMI, unspecified; Cancer: metastases, anatomical sites)
• Severity (e.g. Renal disease: mild, moderate, severe)
• Status (e.g. History of; monitoring; possible diagnosis; confirmed diagnosis)
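One way to picture a categorised codelist is as a lookup from (dictionary, code) to a phenotype category. The sketch below is an assumption about the shape of such a codelist, not the CALIBER portal's format; the codes (`READ_A`, `ICD_A`, and so on) are placeholders, not real Read or ICD-10 codes.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class CodelistEntry(NamedTuple):
    dictionary: str  # "Read" (primary care) or "ICD-10" (secondary care)
    code: str        # placeholder code, NOT a real Read or ICD-10 code
    term: str        # human-readable term (hypothetical wording)
    category: str    # phenotype category: subtype, severity or status


# Hypothetical codelist for a myocardial infarction phenotype.
MI_CODELIST: List[CodelistEntry] = [
    CodelistEntry("Read", "READ_A", "ST elevation MI", "STEMI"),
    CodelistEntry("Read", "READ_B", "History of MI", "History of MI"),
    CodelistEntry("ICD-10", "ICD_A", "Non-ST elevation MI", "NSTEMI"),
    CodelistEntry("ICD-10", "ICD_B", "MI, unspecified", "Unspecified MI"),
]


def build_lookup(codelist: List[CodelistEntry]) -> Dict[Tuple[str, str], str]:
    """Index the codelist by (dictionary, code) for constant-time lookups."""
    return {(entry.dictionary, entry.code): entry.category for entry in codelist}


def categorise(
    lookup: Dict[Tuple[str, str], str], dictionary: str, code: str
) -> Optional[str]:
    """Return the phenotype category for a coded record, or None if not listed."""
    return lookup.get((dictionary, code))


lookup = build_lookup(MI_CODELIST)
print(categorise(lookup, "Read", "READ_A"))
print(categorise(lookup, "ICD-10", "READ_A"))  # same code, wrong dictionary
```

Keying on the dictionary as well as the code keeps Read and ICD-10 entries from colliding when both are stored in one codelist.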

For many diseases, CALIBER researchers have developed composite phenotypes which fully harness the data across the linked data sources. The composite phenotypes often comprise disease diagnoses in both primary and secondary care, procedures, test results and prescribed medications relevant to the disease. Such composite phenotypes maximise case ascertainment of a disease within the linked electronic health records.
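As a rough sketch of how a composite phenotype might combine sources, the example below treats a patient as a case if any linked source holds a qualifying code, and records the earliest qualifying date together with the sources that contributed. The `Event` type, the `ascertain_cases` function and the placeholder codes are hypothetical; real CALIBER algorithms are disease-specific and may apply further rules.

```python
from datetime import date
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set, Tuple


class Event(NamedTuple):
    patient_id: str
    source: str       # e.g. "CPRD", "HES", "MINAP", "ONS"
    code: str         # placeholder code, not a real clinical code
    event_date: date


def ascertain_cases(
    events: Iterable[Event],
    codelists: Dict[str, Set[str]],
) -> Dict[str, Tuple[date, FrozenSet[str]]]:
    """Map each case to (earliest qualifying date, sources that contributed)."""
    cases: Dict[str, Tuple[date, FrozenSet[str]]] = {}
    for ev in events:
        # Skip events whose code is not in the codelist for their source.
        if ev.code not in codelists.get(ev.source, set()):
            continue
        first_date, sources = cases.get(ev.patient_id, (ev.event_date, frozenset()))
        cases[ev.patient_id] = (
            min(first_date, ev.event_date),
            sources | {ev.source},
        )
    return cases


# Illustrative, made-up data.
codelists = {"CPRD": {"READ_A"}, "HES": {"ICD_A"}}
events = [
    Event("A", "CPRD", "READ_A", date(2005, 3, 1)),
    Event("A", "HES", "ICD_A", date(2004, 12, 24)),
    Event("B", "HES", "ICD_Z", date(2006, 1, 1)),  # code not in the codelist
]
print(ascertain_cases(events, codelists))
```

Here patient A is ascertained from both sources, with the hospital record supplying the earlier date, while patient B is not a case because the code is not on the codelist.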

Examples of phenotype development and validation in CALIBER have been published for myocardial infarction36 and atrial fibrillation.38 These studies demonstrate which codes are used from the various sources or how diagnoses may be inferred (e.g. from relevant biomarkers, prescriptions or procedures) and compare patient characteristics, risk factors or outcomes with those from traditional cohorts to confirm the validity of the phenotype.

3.2.4 CALIBER study approval

Studies of anonymised UK primary care data and linked data, such as CALIBER, are subject to approval from the Independent Scientific Advisory Committee (ISAC). ISAC approval is gained by submitting a protocol which outlines the study background and objectives; the data required, including rationale and definitions (e.g. Read and ICD-10 codelists) for the study population, exposures and endpoints; and a statistical analysis plan. The ISAC members (a multidisciplinary group of clinicians, statisticians, epidemiologists, health informaticians, data scientists and lay members) provide detailed feedback and advise whether the study protocol is approved or requires revisions and resubmission.

Lay summaries of studies approved by ISAC are available online. Published research articles which use UK primary care and linked data are required to report their approved ISAC protocol number. Any minor or major changes to an ISAC-approved study protocol are subject to re-review by ISAC.

3.2.5 CALIBER user tools

The CALIBER data portal [https://www.caliberresearch.org/portal] contains a comprehensive collection of all phenotypes and their codelists developed in CALIBER, spanning numerous cardiovascular and non-cardiovascular disease areas (Figure 3.4).

A series of R packages to support the use of CALIBER data has been developed by Dr Anoop Shah: CALIBERlookups, CALIBERcodelists and CALIBERdatamanage. These packages include dictionaries for ICD-10, Read and ONS codes, and functions to look up and generate codelists, map codelists between dictionaries and generally aid the management of large datasets.

3.2.6 CALIBER data management

The CALIBER data platform is managed by the Data Lab: a team of data scientists who manage and maintain the catalogue of disease phenotypes in the CALIBER portal, perform data extraction and assist cohort formation.

3.2.7 Strengths of the CALIBER platform and approach to phenotyping

The CALIBER platform has a number of advantages for researchers. A wide range of linked data is available for a representative sample of the English general population (2 million people registered at general practices, 1997-2010). The linkages between primary care, hospital admissions, disease registry and cause-of-death data provide a comprehensive overview of these patients' journeys through the healthcare system.

For researchers using EHRs, disease definitions can vary from study to study, which can make results harder to interpret, compare and replicate. Under the CALIBER programme, there is a standardised approach for developing disease phenotypes. The portal contains a large and growing number of validated and reproducible disease phenotypes available to researchers, thereby encouraging scientific replicability.
