
I identified 15 relevant reviews in total (see Table 2.3). They were published between 2000 and 2013 and varied in methodological quality (see Appendix A3 for a summary of the reviews). Using the AMSTAR criteria, all the reviews apart from two were of moderate to high quality (see Table 2.3): two reviews scored four or below (low quality), five scored between five and seven (moderate quality), and eight scored eight or above (high quality) (see Appendix A3).

Table 2.3 Quality of identified reviews using the AMSTAR checklist

Review                          AMSTAR score   Quality
Van Herck et al., 2010          11             High
Witter et al., 2012             11             High
Reda et al., 2009               10             High
Houle et al., 2012              10             High
Hamilton et al., 2013           9              High
Gillam et al., 2012             9              High
Scott et al., 2011              9              High
Huang et al., 2013              8              High
Petersen et al., 2006           7              Moderate
De Bruin et al., 2011           6              Moderate
Eijkenaar, 2012                 6              Moderate
Chaix-Couturier et al., 2000    6              Moderate
Christianson et al., 2008       5              Moderate
Oxman and Fretheim, 2009        4              Low
Canavan et al., 2008            3              Low
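The banding rule described in the text (four or below low, five to seven moderate, eight and above high) can be checked against Table 2.3. The short Python sketch below is illustrative only: the dictionary simply restates the table, and `quality_band` is a hypothetical helper that encodes the cut-offs as described in the text, not part of the AMSTAR instrument itself.

```python
from collections import Counter

# Scores restated from Table 2.3.
AMSTAR_SCORES = {
    "Van Herck et al., 2010": 11,
    "Witter et al., 2012": 11,
    "Reda et al., 2009": 10,
    "Houle et al., 2012": 10,
    "Hamilton et al., 2013": 9,
    "Gillam et al., 2012": 9,
    "Scott et al., 2011": 9,
    "Huang et al., 2013": 8,
    "Petersen et al., 2006": 7,
    "De Bruin et al., 2011": 6,
    "Eijkenaar, 2012": 6,
    "Chaix-Couturier et al., 2000": 6,
    "Christianson et al., 2008": 5,
    "Oxman and Fretheim, 2009": 4,
    "Canavan et al., 2008": 3,
}


def quality_band(score: int) -> str:
    """Hypothetical helper: map an AMSTAR score to the band used in the text."""
    if score <= 4:
        return "Low"
    if score <= 7:
        return "Moderate"
    return "High"


# Count how many reviews fall into each band.
band_counts = Counter(quality_band(score) for score in AMSTAR_SCORES.values())
print(dict(band_counts))  # {'High': 8, 'Moderate': 5, 'Low': 2}
```

The counts (eight high, five moderate, two low, fifteen in total) match the breakdown reported in the text.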

I identified 326 primary studies (excluding duplicates), published between 1990 and 2013, from the 15 reviews, from other sources (e.g. bibliographies), and from updating the review by Van Herck and colleagues (2010) (see Figure 2.1). I screened out P4P studies not targeted at health service providers (12) and studies for which I could not obtain the full text (13). I assessed 301 studies for eligibility, excluding non-evaluation P4P studies (146)², descriptions of P4P schemes (36), and studies with unclear and/or poorly reported outcomes (12). In total, I identified 102 primary studies (including nine qualitative studies) (see Figure 2.1). Of the primary studies identified, only nine were randomised controlled trials (RCTs), 63 had an adequate control group (quasi-experimental), and 30 had no control group.

² Studies that did not specifically evaluate the effects of P4P on healthcare quality, cost, performance, or outcomes, e.g. implementation studies or studies exploring the uptake of P4P.

For the meta-analyses, only 36 studies were included (6 RCTs, 20 quasi-experimental studies, and 10 pre-post studies with no control group) (see Figure 2.1). The smaller number of studies in the meta-analyses reflects poor reporting of the data needed to convert the different reported outcome measures into a standard measure, which would have allowed as many studies as possible to be included (see section 2.3). Because of time constraints, the authors of the other studies were not contacted for more detailed information. Nevertheless, the included studies were considered to provide some indication of the nature of the available evidence.

Figure 2.1 Flow chart of identification of included studies

Reviews identified through database search (n = 15)
  Primary studies from the reviews (n = 414)
  Studies identified through the updated review by Van Herck and colleagues (2010) (n = 42)
Total number of primary studies identified (n = 456)
  Duplicates (n = 130)
Records screened (n = 326)
  Records excluded: full-text articles not found (n = 13); P4P schemes not targeted at health service providers (n = 12)
Full-text articles assessed for eligibility (n = 301)
  Full-text articles excluded: non-evaluations of P4P, e.g. studies on physician participation or views on P4P (n = 146); descriptions of P4P schemes (n = 38); outcomes evaluated not clear (n = 12)
Studies included in narrative review (n = 96), with numeric results on 213 process and 75 outcome measures; qualitative studies (n = 9)
  Studies excluded from meta-analyses due to poor and incomplete reporting of outcomes (n = 60)
Studies included in meta-analyses (n = 36)
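Because the flow chart is a chain of subtractions, its arithmetic can be checked directly. The following Python sketch uses only the counts shown in Figure 2.1; the variable names are illustrative and not taken from any screening software.

```python
# Counts copied from Figure 2.1; variable names are illustrative.
FROM_REVIEWS = 414
FROM_UPDATED_REVIEW = 42
DUPLICATES = 130
EXCLUDED_AT_SCREENING = {"full text not found": 13, "not targeted at providers": 12}
EXCLUDED_AT_FULL_TEXT = {"non-evaluations": 146, "scheme descriptions": 38, "unclear outcomes": 12}
NARRATIVE_QUANTITATIVE, QUALITATIVE = 96, 9
IN_META, EXCLUDED_FROM_META = 36, 60

# Walk down the flow chart, subtracting each set of exclusions.
identified = FROM_REVIEWS + FROM_UPDATED_REVIEW
screened = identified - DUPLICATES
assessed = screened - sum(EXCLUDED_AT_SCREENING.values())
included = assessed - sum(EXCLUDED_AT_FULL_TEXT.values())

print(identified, screened, assessed, included)  # 456 326 301 105
# The final boxes should partition the included studies.
print(included == NARRATIVE_QUANTITATIVE + QUALITATIVE)       # True
print(IN_META + EXCLUDED_FROM_META == NARRATIVE_QUANTITATIVE)  # True
```

With the figure's own exclusion counts, the chain reaches 456 identified, 326 screened, 301 assessed and 105 included studies, and 105 equals the 96 studies in the narrative review plus the 9 qualitative studies.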

2.4.2. Overview of evidence (narrative review)

In this section, drawing on the identified reviews and primary studies, I summarise the available evidence and highlight its problems. First, I summarise the evidence on the general effectiveness of P4P. Then, I summarise the effectiveness of P4P in the most common areas in which it is used in health care: smoking cessation and chronic disease management, pointing out variation in the effects of P4P on process and outcome measures. Afterwards, I summarise country-specific evidence, including the Quality and Outcomes Framework (QOF) in the UK and some evidence from low- and middle-income countries (LMICs), highlighting differences in results according to the rigour of evaluation. Finally, I summarise the available evidence on the cost-effectiveness of P4P, the sustainability of its effects, and its unintended consequences, highlighting the limited evidence base.

2.4.2.1. Evidence on general effectiveness

An early review by Chaix-Couturier and colleagues (2000) found that financial incentives could be used to reduce the use of health care resources by increasing compliance with practice guidelines, but that combinations of financial and non-financial incentives were more effective. Other systematic reviews that assessed the impact of financial incentives for health service providers on quality measures found mixed results. For example, Petersen et al. (2006) found that about half of the studies included in their review reported mixed results on the impact of financial incentives on quality measures. About 20% of their included studies showed no statistically significant effects on the indicators assessed, while another 20% showed a positive impact, and one study even reported negative effects of P4P on quality measures. Similarly, the review by Christianson et al. (2008) found improvements in some of the quality measures assessed, but the contribution of P4P was unclear because the financial incentives were typically implemented alongside other quality improvement efforts and there were no convincing comparison groups.

More recent reviews show similarly mixed results. The review by Scott et al. (2011) found positive but modest effects on a few measures of the quality of care provided by primary care physicians. Houle et al. (2012) also found that financial incentives modestly improved preventive activities, such as immunization rates, but there was little evidence that they were effective for other activities such as mammography referrals and cancer screening. A comprehensive systematic review by Van Herck and colleagues (2010), which assessed impact evaluations of P4P schemes as well as evidence on the impact of design choices and contextual mediators on the effectiveness of P4P, further illustrates this variation. They found that financial incentives produce the full spectrum of possible effects for specific targets, from absent or negligible to strongly beneficial, and that the effects of P4P are likely to depend on context.

Finally, the study by Basinga et al. (2011), which used an RCT design to assess the effects of P4P on maternal and child health services in Rwanda, found that P4P led to significant improvements in institutional deliveries and preventive care visits for children, but had no effect on prenatal care visits by pregnant women or on childhood immunizations.

Notably, despite the consistent evidence of varied and mixed effects of P4P, none of these reviews explored this variation using statistical methods.

2.4.2.2. Evidence on management of chronic diseases and smoking cessation (process vs. outcomes)

Two of the systematic reviews assessed the effect of financial incentives on chronic care. De Bruin et al. (2011) assessed the effectiveness of P4P schemes used to stimulate the delivery of chronic care through disease management (by health service providers) with regard to quality and costs. They found that most studies showed positive effects of P4P on healthcare quality. However, five of the eight P4P schemes were part of a larger set of interventions to improve quality of care, and it was not clear how much of the observed improvement could be attributed to P4P. The review by Huang et al. (2013) further showed that the effects of P4P on diabetes treatment and management (e.g. patients with records of total cholesterol or blood pressure) were inconsistent. The authors also found that process indicators, such as the recording of blood pressure and cholesterol levels, showed higher rates of improvement than intermediate outcome indicators, such as reductions in cholesterol and blood pressure.

Likewise, evidence on smoking cessation was mixed and suggests that P4P might be more effective for process measures than for outcomes. A review by Hamilton et al. (2013) assessing the impact of financial incentives for health service providers on smoking cessation found that P4P improved some process indicators, such as recording smoking status, advice, and referrals, but not outcome measures such as quit rates. Reda and colleagues (2009) found no evidence that P4P was effective for smoking cessation interventions (for either processes or outcomes).

A descriptive analysis of data extracted from the primary studies showed that P4P had a positive effect on 148 of the 213 reported process measures (69%), compared with 41 of the 75 reported outcome measures (55%) (see Appendix D1). These findings suggest that it remains unclear whether pay for performance leads to better patient outcomes (Huang et al., 2013, Hamilton et al., 2013).
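The two proportions can be recomputed from the counts given in the text. This Python sketch is a plain arithmetic check; the helper name `percent_positive` is illustrative, and no statistical test is performed.

```python
def percent_positive(positive: int, total: int) -> float:
    """Share of reported measures with a positive P4P effect, in percent."""
    return 100 * positive / total


# Counts as reported in the text (see Appendix D1).
process_pct = percent_positive(148, 213)
outcome_pct = percent_positive(41, 75)
print(f"process: {process_pct:.1f}%  outcome: {outcome_pct:.1f}%")
```

It prints about 69.5% for process measures and 54.7% for outcome measures.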

Some outcome measures depend on patient behaviour as well as on the quality of health care, so it may be more difficult for financial incentives to health service providers to have a positive impact on them. In addition, some incentivised processes may affect patient outcomes only indirectly or not at all. Process-of-care measures used in P4P schemes should therefore be chosen on the basis of robust evidence that improving these processes leads to improved health outcomes (Oxman and Fretheim, 2009a, Oxman and Fretheim, 2009b).

2.4.2.3. Rigour of evaluation

All the identified reviews reported large numbers of primary studies of poor evaluation quality (lacking adequate controls) (see Appendix A3). A number of reviews concluded that the evidence base was too weak (due to poor evaluation quality) to draw reliable and valid conclusions, or that evidence on the effect of financial incentives on healthcare was of limited validity (Chaix-Couturier et al., 2000, Witter et al., 2012, Gillam et al., 2012, Canavan et al., 2008). In the following paragraphs, I describe evidence on the rigour of evaluations using country-specific examples of P4P, such as the Quality and Outcomes Framework (QOF) in the UK and schemes in developing countries such as Rwanda.

The QOF is one of the largest and most extensively evaluated financial incentive programmes. The systematic review by Gillam et al. (2012) assessed the impact of the QOF on quality measures. They found that, in the first year of the programme, incentivised activities improved at a faster rate than the pre-intervention trend. They also found negative effects, such as worsening quality for non-incentivised conditions, a decline in the person-centeredness of consultations, and a decline in patients' satisfaction with continuity of care. The conclusions of this review were limited by the lack of adequate control groups in evaluations of the QOF. A few researchers have evaluated the QOF using a convincing control group. An example is the study by Serumaga et al. (2010), which assessed the impact of the QOF on the management and outcomes of hypertension using an interrupted time series design (a quasi-experimental design). The study found that improvements in hypertension management and outcomes reflected gradual improvements already under way before the introduction of P4P, and that P4P had no effect on hypertension management or outcomes. In contrast, retrospective and cross-sectional studies assessing the impact of the QOF on hypertension management and outcomes concluded that the introduction of P4P improved treatment and management (Ryan and Doran, 2012, Simpson et al., 2011).

Evaluations also suggest that P4P might be effective for some quality measures in LMICs, especially in Rwanda, Haiti, and Burundi (Oxman and Fretheim, 2009a, Witter et al., 2012, Canavan et al., 2008). However, the review by Oxman and Fretheim (2009a) showed that it was difficult to disentangle the effects of financial incentives from those of other quality improvement measures. Similarly, Canavan et al. (2008) found that P4P evaluations showed remarkable improvements in health indicators (utilization, coverage, and emergency referral) in Afghanistan, the Democratic Republic of Congo, Rwanda, and Haiti, but that the extent to which these improvements could be attributed to financial incentives was unclear because of confounding contextual factors, e.g. differences in infrastructure. Likewise, the review by Witter et al. (2012) of the effectiveness of P4P in LMICs found mixed results.

They found that P4P was effective for some quality measures but not others: the high- and moderate-quality studies included in this review reported improvements in some quality indicators but not in others. Two of the studies showed significant improvement for the intervention group, while two showed no significant difference.

2.4.2.4. Evidence on cost-effectiveness

The cost-effectiveness of these schemes should be central to the debate alongside their effectiveness, because implementing a P4P programme can be costly (relevant costs include the incentive payments, administrative costs, and monitoring and evaluation costs).

There were two published systematic reviews with an explicit focus on economic evaluations of P4P schemes. One of these (Emmert et al., 2012) considered the costs and consequences of P4P interventions and included nine studies. Of these nine studies, only three were considered full economic evaluations of good methodological quality, and these reported that P4P was not cost-effective. For example, the study by Nahra et al. (2006) assessed the cost-effectiveness of a P4P programme focused on improving the quality of heart care (process measures) in the hospital setting over four years. It found a cost per quality-adjusted life year (QALY) of £30,081, which was above the cost-effectiveness threshold of around £25,000 suggested by the National Institute for Health and Care Excellence (NICE) (McCabe et al., 2008).
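The comparison with the NICE threshold applies a simple decision rule: a scheme is judged cost-effective if its cost per QALY gained does not exceed the threshold. Equivalently, its net monetary benefit (threshold × QALYs gained − additional cost) is non-negative. The Python sketch below illustrates this with the figures quoted above; the function names are hypothetical, and the one-QALY case is an illustrative normalisation, not data from Nahra et al.

```python
# Sketch of the cost-per-QALY threshold rule; function names are hypothetical.

def is_cost_effective(cost_per_qaly: float, threshold: float) -> bool:
    """True if the cost per QALY gained is at or below the threshold."""
    return cost_per_qaly <= threshold


def net_monetary_benefit(qalys_gained: float, extra_cost: float, threshold: float) -> float:
    """Value of the QALYs gained at the threshold, minus the additional cost."""
    return threshold * qalys_gained - extra_cost


NAHRA_COST_PER_QALY = 30_081   # figure quoted in the text
NICE_THRESHOLD = 25_000        # approximate threshold quoted in the text

decision = is_cost_effective(NAHRA_COST_PER_QALY, NICE_THRESHOLD)
# Normalise to one QALY gained: the extra cost then equals the cost per QALY.
nmb_per_qaly = net_monetary_benefit(1.0, NAHRA_COST_PER_QALY, NICE_THRESHOLD)

print(decision)      # False
print(nmb_per_qaly)  # -5081.0
```

Both forms of the rule agree: the cost per QALY exceeds the threshold, so the net monetary benefit per QALY gained is negative.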

The other six studies, considered partial economic evaluations, showed mixed results on cost-effectiveness, and their methodological quality was too poor to support valid or strong conclusions.

A more recently published review by Meacock et al. (2013) focused on the cost-effectiveness of the Advancing Quality (AQ) incentive programme in England. They critiqued the narrow range of costs and outcomes considered in previous economic evaluations of P4P schemes before proposing a new, more comprehensive framework, which they applied to the AQ programme. Their findings suggest that the AQ programme was likely to have been a cost-effective use of resources during the 18-month study period, generating approximately 5,200 quality-adjusted life years and £4.4 million of savings from reduced length of hospital stay. An important question is whether these benefits and cost savings are sustainable. An impact evaluation of the AQ programme conducted by McDonald and colleagues (2014) found smaller mortality reductions in five clinical areas (acute myocardial infarction, heart failure, coronary artery bypass graft, pneumonia, and hip and knee replacement) in the long term (i.e. at 42 months) than at 12 months.

Some other reviews have looked at the cost-effectiveness of P4P schemes, but not comprehensively. Christianson et al. (2008) found two economic evaluations, which demonstrated that P4P was efficient; however, the methodology of these studies was not assessed. The review by Van Herck et al. (2010) also assessed the cost-effectiveness of P4P schemes. They found mixed results and varying methodological quality across eight economic evaluations, which mirrored the studies included in the Emmert et al. (2012) review (see Appendix A6 for the abstract of the review).

In summary, the evidence on the cost-effectiveness of P4P schemes is limited and mixed. Most of the studies were methodologically weak, apart from the recent study by Meacock et al. (2013), which so far sets the standard for cost-effectiveness studies of these schemes. The cost-effectiveness of these schemes should be questioned even when they appear to be effective, because it matters whether the observed benefits are worth the investment. Other alternatives might produce similar or greater improvements in outcomes at lower cost, or the same resources might be spent in other ways that produce greater total health or health equity benefits.

2.4.2.5. Evidence on sustainability of effect

Evidence on the long-term effects of P4P is even scarcer, and longer-term evaluations are needed to capture them. No reviews explicitly assessed the long-term effects of financial incentives or their effects after the incentives were removed. However, a few primary studies assessed the sustainability of the effect of financial incentives in a few countries.

Researchers explored the effect of removing financial incentives in the QOF (UK) on some clinical quality indicators: influenza immunisation, lithium treatment monitoring, blood pressure monitoring, cholesterol concentration monitoring, and blood glucose monitoring. They found that all the indicators appeared to remain stable after the incentives were removed, apart from influenza immunisation, which showed a statistically significant reduction (Kontopantelis et al., 2014).

Jha et al. (2012) assessed the long-term effect of the US Medicare Premier programme on patient outcomes. They compared 30-day mortality between hospitals implementing P4P and hospitals implementing public reporting alone for patients with acute myocardial infarction, congestive heart failure, or pneumonia, or who underwent coronary-artery bypass grafting (CABG), between 2003 and 2009. Mortality rates were similar at baseline in Premier and non-Premier hospitals, declined similarly in both, and remained similar after six years under the P4P scheme. Werner and colleagues (2011) also investigated the effects of the Premier programme over five years. They found that although performance (on some of the indicators) in the intervention group improved within the first three years of the study, the hospitals in the control group caught up with and matched them in the last two years. The authors suggest two possible explanations:

1. Participating hospitals may have had little room to improve their performance further in the last two years.

2. The incentive programmes may have led non-participating hospitals to change their practices. For example, hospitals in the control group may have anticipated that the incentive programme would be extended to them and therefore focused on improving their performance.

If the second explanation is true, P4P might have knock-on effects on non-participating hospitals or health facilities, which in turn might increase its cost-effectiveness.

2.4.2.6. Evidence on unintended consequences

Two reviews and a few primary studies reported unintended and adverse effects of financial incentives on health care (Van Herck et al., 2010, Petersen et al., 2006). These include gaming, cheating, cherry picking, and neglect of non-incentivised services (Mannion and Braithwaite, 2012, Van Herck et al., 2010, Petersen et al., 2006).

A study conducted by Gravelle et al. (2008) showed evidence of gaming in the QOF.

There is also evidence that health service providers in P4P schemes focus most of their attention on the monitored and evaluated health services, leading them to neglect other unevaluated but equally important aspects of health services (Cometto, 2008, Doran and Kontopantelis, 2013).

There is some evidence that P4P may worsen racial disparities in care (Karve et al., 2008, Alshamsan et al., 2012). For example, the study by Karve and colleagues (2008) showed that under the Medicare P4P scheme, African American patients were less likely than white patients to receive evidence-based therapies.

There is also some evidence that P4P may encourage health service providers to induce demand for incentivised services beyond what is clinically needed (Cometto, 2008, Canavan and Swai, 2008, Powell et al., 2012). This behaviour shifts the focus away from the patient, possibly exposing patients to the risk of iatrogenic injury, which could undermine their trust in the clinician (Cometto, 2008, Canavan and Swai, 2008, Powell et al., 2012). An example is the study by Powell et al. (2012) on the QOF, which showed that P4P put pressure on clinicians to achieve high performance scores on incentivised services. This led to actions to improve scores even when these were not necessarily in patients' best interests, shifting the focus away from patient concerns and patient service and making it difficult for patients to make informed decisions.

Lastly, evidence suggests that P4P may divert resources away from medically indigent communities or high-risk patients making health disparities worse (Friedberg et al., 2010). For example, a study exploring the impact of P4P on diabetes care in Taiwan found that P4P did indeed improve quality of care for enrolled patients. However, only

