3.3.1 Primary care data: Clinical Practice Research Datalink
The Clinical Practice Research Datalink (CPRD) contains pseudonymised primary care electronic patient records from UK general practices, currently covering 6.9% of the UK population (4.4 million registered patients) [129]. CPRD has been used extensively for epidemiological research, with over a thousand studies published across a wide range of disease areas [129].
History of the CPRD
The original name for what would become the CPRD was the ‘VAMP research databank’ (Value Added Medical Products). General practitioners were given free computers in return for contributing data to the research databank. The VAMP Medical clinical system used the Oxford Medical Information System (OXMIS) dictionary to encode clinical entries in order to minimise the use of hard disk space. Anonymised patient records were collected at regular intervals from the practices, initially on tape and later electronically. The text-based VAMP Medical software was replaced by Vision, which had a graphical user interface, and practices switched from OXMIS to the Read Clinical Terminology at varying dates in the 1990s.

Chapter 3. The CALIBER database

Figure 3.1: How a patient’s medical history may be recorded in the CALIBER data sources
The research database was renamed the General Practice Research Database (GPRD) in 1993 [129]. It was transferred to the Office for National Statistics and subsequently to the Medicines and Healthcare products Regulatory Agency (MHRA) around the year 2000. A subset of GPRD practices in England consented to record linkage with other datasets and were linked to Hospital Episode Statistics and the death registry. It was this subset that was further linked to MINAP to constitute the CALIBER dataset.
The recent change of name to ‘Clinical Practice Research Datalink’ reflects the aspiration of further linkages with other registries, clinical datasets and cohort studies in the UK [129]. The current CPRD contains only Read terms, with OXMIS terms having been converted to their Read equivalents. CPRD has now developed a system for collecting data from practices using EMIS, the most commonly used GP clinical system supplier in the UK [132].
Validity and representativeness of CPRD
Many studies using the CPRD have validated the coded diagnoses against anonymised paper records requested from the GP or against anonymised electronic free text. A systematic review of CPRD validation studies found that diagnoses were generally reliable [133]. However, because the database contains only information recorded during usual clinical care, clinical parameters or tests will not be recorded for individuals in whom they are not clinically indicated.
3.3. Source datasets
Patients can opt out of data collection by informing their GP, but the majority of patients in CPRD practices contribute data, and CPRD patients are broadly representative of the UK population in terms of age, sex [129] and ethnicity [134]. However, the practices in CPRD may not be representative of all practices in the UK by geographic distribution or size [135], and the quality of their data is also likely to be unrepresentative, as CPRD practices are given data recording guidelines and regular feedback on the completeness and quality of their data.
Structure of information recorded in CPRD
Information in the CPRD is recorded in a number of tables, which can be linked by the pseudonymised patient identifier in order to build up a complete picture of a patient’s healthcare experience.
Patients – one row per patient, with demographic details such as year of birth, date of death and registration dates.
Practices – one row per practice, giving details such as region of the UK and the date when the practice achieved a good standard of data completeness for research purposes (‘up-to-standard date’).
Consultations – each patient episode is considered a ‘consultation’ and all data are entered in consultations (face-to-face, telephone or administrative). This table allows diagnoses and prescriptions entered in the same consultation to be identified. It can also be useful for measuring healthcare utilisation and costs, and for linking to the staff table.
Staff – one row per staff member, with gender and role.
Events – there are a number of event tables, and a patient can have any number of events. Each event is linked to a single consultation and has an event date, a medical dictionary code (Read code) or product dictionary code (Multilex) and associated information.
Clinical – Read coded diagnoses entered by the GP, and additional data such as blood pressure measurements
Referrals – referrals to secondary care, with the indication recorded as a Read code
Immunisations – records of immunisations
Therapy – prescriptions
Test – results of laboratory tests, each with a Read code
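As a rough illustration of how these tables fit together, the Python sketch below groups event records by pseudonymised patient identifier and then by consultation, so that diagnoses and prescriptions entered in the same consultation can be paired. The field names (`patid`, `consid`, `eventdate`) and the codes in the example data are hypothetical stand-ins, not the exact layout of a CPRD extract.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    patid: int       # pseudonymised patient identifier (hypothetical field name)
    consid: int      # consultation the event was entered in (hypothetical field name)
    eventdate: date
    table: str       # "Clinical", "Referral", "Immunisation", "Therapy" or "Test"
    code: str        # Read code, or product code for Therapy records


def build_history(events):
    """Index events as history[patid][consid] -> list of events, in date order."""
    history = defaultdict(lambda: defaultdict(list))
    for ev in sorted(events, key=lambda e: (e.patid, e.eventdate, e.consid)):
        history[ev.patid][ev.consid].append(ev)
    return history


def diagnosis_drug_pairs(history, patid):
    """Pair each Clinical code with each Therapy code entered in the same consultation."""
    pairs = []
    for evs in history.get(patid, {}).values():
        diagnoses = [e.code for e in evs if e.table == "Clinical"]
        drugs = [e.code for e in evs if e.table == "Therapy"]
        pairs.extend((d, p) for d in diagnoses for p in drugs)
    return pairs


# Illustrative records only; the codes are placeholders.
events = [
    Event(1, 10, date(2005, 3, 1), "Clinical", "G30.."),
    Event(1, 10, date(2005, 3, 1), "Therapy", "P0001"),
    Event(1, 11, date(2005, 4, 2), "Test", "T0001"),
    Event(2, 20, date(2006, 1, 5), "Clinical", "H33.."),
]
history = build_history(events)
print(diagnosis_drug_pairs(history, 1))  # [('G30..', 'P0001')]
```

Keying first on the patient identifier mirrors the way the tables are linked; keying second on the consultation is what allows same-consultation diagnoses and prescriptions to be identified.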
The type of information in the Test and Clinical tables is specified by the ‘entity code’ of each record. This defines what additional data are included with the record; for example, a haemoglobin (blood test) result will have a value, units and a normal range reported, whereas a blood pressure record will contain systolic and diastolic measurements and a record of the Korotkoff sound. Entity codes usually, but not always, correspond to Read codes, and the Read code is usually the more specific of the two (there are thousands of Read codes but only a few hundred entity codes). The entity codes can also be thought of as referring to different information models, or ‘archetypes’, which I discuss further in the last chapter (section 10.1.2 on page 235).
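One way to picture the entity code is as a key that selects which information model applies to a record's generic data fields. The sketch below assumes a simple layout table; the entity names, field order and example values are hypothetical illustrations rather than the real CPRD definitions.

```python
# Hypothetical layouts: the meaning of each generic data field depends on the
# record's entity type. Real definitions would come from the data provider's documentation.
ENTITY_LAYOUTS = {
    "blood_pressure": ("systolic", "diastolic", "korotkoff"),
    "haemoglobin": ("value", "units", "normal_low", "normal_high"),
}


def decode_additional_data(entity, data_fields):
    """Attach field names to a record's generic data fields according to its entity type."""
    try:
        layout = ENTITY_LAYOUTS[entity]
    except KeyError:
        raise ValueError(f"unknown entity type: {entity!r}") from None
    if len(data_fields) != len(layout):
        raise ValueError(
            f"{entity} expects {len(layout)} fields, got {len(data_fields)}"
        )
    return dict(zip(layout, data_fields))


bp = decode_additional_data("blood_pressure", [140, 85, 5])
hb = decode_additional_data("haemoglobin", [13.5, "g/dL", 11.5, 16.5])
print(bp)           # {'systolic': 140, 'diastolic': 85, 'korotkoff': 5}
print(hb["units"])  # g/dL
```

Rejecting records whose field count does not match the layout is a deliberate choice in this sketch: a silent mismatch would otherwise attach the wrong meaning to a value.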
Read terms were designed for coding by GPs and incorporate synonyms that capture variation in the way doctors may express common diagnoses. Apart from diagnoses, the Read terminology includes codes for other categories of information such as history, examination findings, procedures and test results [136].
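Read version 2 codes, as used in CPRD, are five characters long, and the position of each character broadly reflects a level in the hierarchy, with ‘.’ padding the unused positions (for example, G30.., acute myocardial infarction, sits under G3..., ischaemic heart disease). Assuming that layout, a code list can be expanded by prefix matching, as in this sketch. The hierarchy is not perfectly reflected in the code strings everywhere, so an expanded list should still be reviewed term by term.

```python
def read_stem(code):
    """Strip the trailing '.' padding from a five-character Read v2 code."""
    if len(code) != 5:
        raise ValueError(f"expected a five-character Read code, got {code!r}")
    return code.rstrip(".")


def is_under(code, ancestor):
    """True if `code` is `ancestor` or lies below it (comparison is case-sensitive)."""
    return read_stem(code).startswith(read_stem(ancestor))


def expand(candidates, ancestors):
    """Keep the candidate codes that fall under any of the given ancestor codes."""
    return [c for c in candidates if any(is_under(c, a) for a in ancestors)]


codes = ["G3...", "G30..", "G301.", "G31..", "H33.."]
print(expand(codes, ["G30.."]))  # ['G30..', 'G301.']
```

The comparison is kept case-sensitive because upper- and lower-case characters are distinct in Read codes.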
Considerations when using general practice data
UK general practice databases have the advantage that individuals are registered with a general practitioner (GP) over a defined time period, and the majority of the population are registered. This provides a denominator and enables the database to be used for estimating population incidence and prevalence of diseases [137, 138]. These estimates are based on the assumption that the research database population is representative and the condition of interest is completely recorded in general practice data.
Patients can register permanently with only one GP at a time in the UK, so when constructing a general practice database cohort, patients with temporary registration should be excluded.
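To make the denominator concrete, the sketch below computes an incidence rate per 1,000 person-years from registration records. It assumes a common convention rather than a fixed CPRD rule: follow-up starts at the latest of the registration date, the practice up-to-standard date and the study start, and ends at the earliest of leaving the practice or death, the first diagnosis and the study end; temporary registrations contribute nothing. The `Registration` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Registration:
    start: date                          # start of current registration
    end: Optional[date]                  # transfer out or death; None if still registered
    permanent: bool                      # False for temporary registrations
    up_to_standard: date                 # practice up-to-standard date
    first_event: Optional[date] = None   # first recorded diagnosis, if any


def follow_up(reg, study_start, study_end):
    """Return (begin, finish) of eligible follow-up, or None if there is none."""
    if not reg.permanent:
        return None
    begin = max(reg.start, reg.up_to_standard, study_start)
    finish = min(d for d in (reg.end, reg.first_event, study_end) if d is not None)
    # Prevalent cases (diagnosed before follow-up begins) yield no window.
    return (begin, finish) if finish > begin else None


def incidence_per_1000_py(regs, study_start, study_end):
    cases, days = 0, 0
    for reg in regs:
        window = follow_up(reg, study_start, study_end)
        if window is None:
            continue
        begin, finish = window
        days += (finish - begin).days
        # A case is counted only if the first diagnosis is what ended follow-up.
        if reg.first_event is not None and reg.first_event == finish:
            cases += 1
    if days == 0:
        raise ValueError("no eligible person-time")
    return 1000 * cases / (days / 365.25)


study = (date(2000, 1, 1), date(2010, 1, 1))
regs = [
    Registration(date(1999, 1, 1), None, True, date(2000, 1, 1)),
    Registration(date(2003, 1, 1), None, False, date(2000, 1, 1)),  # temporary
    Registration(date(2005, 1, 1), None, True, date(2000, 1, 1),
                 first_event=date(2006, 1, 1)),
]
rate = incidence_per_1000_py(regs, *study)
print(round(rate, 1))  # 90.9
```

The estimate is only as good as the assumptions stated in the text: a representative population and complete recording of the condition in general practice.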
3.3.2 Hospital Episode Statistics
Hospital Episode Statistics (HES) is the database of hospital admissions in England [130]. It contains coded information about each admission, such as whether it was an emergency or elective admission, the specialty, dates of admission and discharge, and diagnoses (primary and secondary) coded using the International Classification of Diseases, 10th Revision (ICD-10) [139]. HES also contains records of procedures, coded using the Office of Population Censuses and Surveys Classification of Interventions and Procedures, version 4 (OPCS-4). Data are entered by clinical coding clerks at the end of an admission. Outpatient information is available in the main HES database but has not been linked in CALIBER.
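Because each admission carries a primary diagnosis and a variable number of secondary diagnoses, a phenotype definition has to state which diagnosis positions count. The sketch below assumes a simplified episode record whose first ICD-10 code is the primary diagnosis; the record layout is hypothetical, while I21 (acute myocardial infarction) and I22 (subsequent myocardial infarction) are standard ICD-10 categories.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Episode:
    admission_date: date
    diagnoses: list  # ICD-10 codes; index 0 is taken to be the primary diagnosis


MI_CATEGORIES = ("I21", "I22")  # acute and subsequent myocardial infarction


def has_diagnosis(episode, categories, primary_only=False):
    """Check for any code starting with one of `categories`, in the chosen positions."""
    codes = episode.diagnoses[:1] if primary_only else episode.diagnoses
    # Normalise "I21.4" and "i214" alike before the prefix test.
    return any(c.replace(".", "").upper().startswith(categories) for c in codes)


ep = Episode(date(2008, 5, 1), ["I50.0", "I21.4", "E11.9"])
print(has_diagnosis(ep, MI_CATEGORIES))                     # True
print(has_diagnosis(ep, MI_CATEGORIES, primary_only=True))  # False
```

Restricting to the primary position generally trades sensitivity for specificity, so the choice is best reported alongside any HES-derived count.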
HES provides anonymised datasets for research, with patients linked between admissions and hospitals by an algorithm involving NHS number, date of birth, sex, postcode, hospital code and local patient identifier (hospital number) [140]. Episodes that cannot be linked are treated as belonging to a different patient. The completeness of the linkage is higher for recent years, as a greater proportion of records have an NHS number (97% of inpatient activity in 2007–2008, compared to just 83% in 2000–2001).
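The consequence of a missing NHS number can be illustrated with a deliberately simplified deterministic linkage. This is not the HES algorithm cited above [140]; it only assumes that records with an NHS number are matched on NHS number, date of birth and sex, and that other records fall back to identifiers that are unique only within a hospital. All field names and values are hypothetical.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    nhs_number: Optional[str]
    date_of_birth: str
    sex: str
    postcode: str
    hospital: str
    local_id: str   # hospital number, unique only within one hospital


def match_key(r):
    if r.nhs_number:
        return ("nhs", r.nhs_number, r.date_of_birth, r.sex)
    # Without an NHS number the key includes the hospital, so the same person
    # admitted to two hospitals produces two different keys.
    return ("local", r.hospital, r.local_id, r.date_of_birth, r.sex, r.postcode)


def link(records):
    """Group record indices by match key; each group is treated as one patient."""
    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(match_key(r), []).append(i)
    return list(groups.values())


records = [
    Record("NHS-A", "1950-02-01", "F", "PC1", "H1", "111"),
    Record("NHS-A", "1950-02-01", "F", "PC1", "H2", "222"),
    Record(None, "1948-07-15", "M", "PC2", "H1", "333"),
    Record(None, "1948-07-15", "M", "PC2", "H2", "444"),  # same man, second hospital
]
print(link(records))  # [[0, 1], [2], [3]]
```

The woman with an NHS number is linked across both hospitals, whereas the man without one is split into two apparent patients, which is why linkage completeness tracks NHS number coverage.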
Numerous epidemiological studies have been performed using HES, and validation studies found that for severe vascular disease, including myocardial infarction, stroke and pulmonary embolism, HES records appeared to be both reliable and complete [141]. However, researchers must exercise caution because HES is an administrative dataset. Its utility for research is limited by the granularity of the data and the limitations of the ICD-10 coding system. For example, there are no specific codes for ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI), although the probability that the event is coded using particular codes differs between STEMI and NSTEMI (Table 4.3 on page 74). Investigations and minor procedures may be inconsistently recorded. Uncommon or unusual conditions such as recreational drug toxicity may not be consistently recorded [142].
3.3.3 Acute coronary syndrome registry: MINAP
MINAP (the Myocardial Ischaemia National Audit Project) is the national registry of acute coronary syndromes, to which all acute hospitals in England and Wales contribute data. The primary purpose of MINAP was to improve quality of care, but it has also become invaluable for research. MINAP comprises one entry per acute coronary syndrome admission, with 123 fields containing detailed clinical information such as coded electrocardiogram (ECG) findings, troponin results and interventions performed. Information is abstracted from clinical notes by audit nurses or clerks [9].
MINAP contains rich information on the clinical characteristics and treatment of patients with acute coronary syndromes; for example, it includes information on smoking status and ECG findings, which are not recorded electronically in most hospital systems. However, as it is a voluntary registry there may be some selection bias in the patients included, and this can vary by hospital. MINAP is thought to under-report non-ST-elevation myocardial infarction in particular, while some myocardial infarctions are included twice if the patient was transferred between hospitals during the acute episode [9].
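One simple way to handle duplicate entries arising from acute transfers is to collapse records for the same patient that start within a short interval of the previous record. The sketch below assumes a two-day window purely for illustration; the appropriate window, and the patient identifier used, would have to be chosen for the study.

```python
from datetime import date


def collapse_transfers(admissions, window_days=2):
    """Merge admissions for the same patient that start within `window_days`
    of the previous record; return one (patient_id, first_date) per event."""
    events = []
    last_seen = {}
    for patient_id, admitted in sorted(admissions):
        previous = last_seen.get(patient_id)
        last_seen[patient_id] = admitted  # chained transfers extend the same event
        if previous is not None and (admitted - previous).days <= window_days:
            continue  # treated as a transfer within the same acute event
        events.append((patient_id, admitted))
    return events


admissions = [
    (1, date(2009, 3, 1)),
    (1, date(2009, 3, 2)),   # transfer to a second hospital
    (1, date(2009, 9, 1)),   # a later, separate event
    (2, date(2010, 1, 1)),
]
print(collapse_transfers(admissions))
```

Comparing against the previous record rather than the first one means a chain of transfers (A to B to C) is still counted as a single event.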
The information in MINAP is not used for direct clinical care, but as a clinical audit it has been used to drive improvements in quality of care. Availability of MINAP data to researchers is governed by the MINAP Academic Group [9].
3.3.4 Death registry
The death registry for England and Wales curated by the Office for National Statistics (ONS) includes the date of death and the causes entered on the death certificate. A single underlying cause of death is allocated according to the WHO ICD-10 algorithm based on the information recorded on the death certificate, likely causal sequence and International Classification of Diseases selection rules [143].
Deaths in England and Wales have been coded using ICD-10 since 2001 [139] and using ICD-9 in previous years. The rules for selecting the underlying cause changed between ICD-9 and ICD-10, so causes of death from 2001 onwards are not directly comparable with those from previous years. However, the effect of these changes varied by cause of death; some causes, such as pneumonia, were particularly affected, but there was little effect on ischaemic heart disease and cerebrovascular disease [143].
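Because the coding system changed in 2001, a cause-of-death grouping that spans both periods needs a separate code range for each ICD version. The sketch below uses the standard ranges for ischaemic heart disease (ICD-10 I20–I25, ICD-9 410–414) and cerebrovascular disease (ICD-10 I60–I69, ICD-9 430–438), and assumes, following the text above, that the ICD version is determined by year alone. It maps codes only; it does not adjust for the change in selection rules.

```python
# Three-character category ranges (inclusive) for each ICD version.
CAUSE_GROUPS = {
    10: {
        "ischaemic_heart_disease": ("I20", "I25"),
        "cerebrovascular_disease": ("I60", "I69"),
    },
    9: {
        "ischaemic_heart_disease": ("410", "414"),
        "cerebrovascular_disease": ("430", "438"),
    },
}


def cause_group(code, year):
    """Return the broad group for an underlying-cause code, or None if ungrouped."""
    version = 10 if year >= 2001 else 9   # assumption: ICD-10 for all years from 2001
    category = code.replace(".", "").upper()[:3]
    for group, (low, high) in CAUSE_GROUPS[version].items():
        # Equal-length strings compare character by character, so this is a range test.
        if len(category) == 3 and low <= category <= high:
            return group
    return None


print(cause_group("I21.9", 2005))  # ischaemic_heart_disease
print(cause_group("410.9", 1998))  # ischaemic_heart_disease
print(cause_group("J18.9", 2005))  # None
```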
3.3.5 Deprivation index
The index of multiple deprivation is a composite measure of deprivation calculated using indicators in the following domains:
• Income deprivation
• Employment deprivation
• Health deprivation and disability
• Education, skills and training deprivation
• Barriers to housing and services
• Crime
• Living environment deprivation
The index is calculated for super output areas (postcode areas) and is available from the Office for National Statistics. In CALIBER it was linked to patient records via the postcode, but postcode is not included in the final pseudonymised CALIBER dataset [131].
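Deprivation is often analysed in quintiles of the index. The minimal sketch below assumes that areas are ranked from 1 (most deprived) to the number of areas, and uses a hypothetical postcode-to-area lookup to attach a quintile to each patient, so that the postcode itself need not be carried into the analysis dataset.

```python
def imd_quintile(rank, n_areas):
    """Quintile 1 = most deprived fifth, assuming rank 1 is the most deprived area."""
    if not 1 <= rank <= n_areas:
        raise ValueError(f"rank {rank} outside 1..{n_areas}")
    return (rank - 1) * 5 // n_areas + 1


def attach_quintiles(patient_postcodes, postcode_to_area, area_rank, n_areas):
    """Map patient id -> IMD quintile (None if the postcode cannot be located).
    The returned mapping contains no postcodes."""
    result = {}
    for patient_id, postcode in patient_postcodes.items():
        area = postcode_to_area.get(postcode)
        rank = area_rank.get(area)
        result[patient_id] = None if rank is None else imd_quintile(rank, n_areas)
    return result


# Hypothetical lookup tables with ten areas.
quintiles = attach_quintiles(
    {"p1": "AB1 2CD", "p2": "ZZ9 9ZZ"},
    {"AB1 2CD": "area-03"},
    {"area-03": 3},
    n_areas=10,
)
print(quintiles)  # {'p1': 2, 'p2': None}
```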