The Problem with HealthGrades – Robert S Poston, MD

Compared to the paternalistic era of Marcus Welby, the public has become increasingly savvy about health and is now hungry to better understand the value of the therapies they receive, particularly for high-impact services like heart surgery. This new era has revealed to the public that not every hospital delivers heart surgery at the same quality, cost or degree of innovation – a profound concept. It would be a game changer if sufficient numbers of consumers were able to understand and act on these differences. Cardiac surgeons would then be driven to compete on the basis of quality and not just their skill in swaying referrals from other professionals. Because of the profits that cardiac surgery programs bring in, hospitals have been reluctant to hold cardiac surgeons accountable for suboptimal outcomes as long as the referrals have been steady. Making data on quality readily available to the public holds hospitals accountable for ensuring that their highest-risk procedures are being performed safely and according to best practices.

An additional argument for transparency can be made on ethical grounds. Patients advised to undergo heart surgery are put into a state of extreme vulnerability. Most are far less equipped than a physician to judge the quality of a daunting, unfamiliar surgical procedure with major risks. This difference in expertise between patients and their physicians, described by ethicists as information asymmetry, deters many patients from asking questions about their options. A public that routinely spends weeks researching options before purchasing a car or large appliance often puts minimal effort into independently researching a physician’s advice about the best place for heart surgery. Patients armed with even a little information about local programs start to feel more confident about conversing with their physicians. An initial conversation often spurs a more serious dialogue about the reasons that one program is more appropriate than another as a place to be referred for surgery.

Distinguishing the quality of one cardiac surgery program from another is a complex task, even for a physician. For patients to participate in a meaningful way, they need a readily accessible resource that provides relevant, easy-to-understand information. The most ambitious and best known health ratings resource available to the public is HealthGrades, a business that is publicly traded on NASDAQ. Its website assigns cardiac surgical programs star ratings of “best,” “as expected,” or “poor” on the basis of mortality and complication rates adjusted with proprietary risk-adjustment models. According to the founding CEO, HealthGrades was started with the “conviction that health care quality matters and a vision that patient access to meaningful healthcare information could improve the healthcare experience for both consumers and providers.”

Every lofty vision to improve healthcare experiences begins with a first step. For HealthGrades, this step was to gain access to data on surgical outcomes in order to grade heart surgeons and provide that information to prospective patients. This was likely the hardest step. Over 90% of cardiothoracic (CT) surgery programs collect quality assurance data that include outcomes, but only a minority volunteer to give the public access to these data, largely because they don’t have to. In a particularly egregious example, CNN investigators criticized a CT surgeon and hospital in Florida for not publicly reporting their program’s high death rate and for misrepresenting their outcomes to patients. In the report, published under the headline “Secret Deaths” (http://www.cnn.com/2015/06/01/health/st-marys-medical-center/index.html), several patients stated on the record that they would have gone elsewhere had the high death rate been made available to the public. Obviously, the business case against transparency at that Florida hospital was based on the fear that lucrative surgical cases would indeed go elsewhere. If CT surgical programs were publicly traded stocks seeking investors, the Securities and Exchange Commission would have overridden this conflict of interest and enforced public access to basic facts about the program’s outcomes so that investors would have a common pool of knowledge with which to judge investment value. However, CT programs seeking patients face no such regulations requiring release of the outcome data needed to make an informed judgment about the quality of one program vs. another.

In hopes of circumventing these transparency problems, HealthGrades exploited an important loophole. Hospitals perform heart surgery mainly on patients who are elderly and insured by Medicare, which reimburses hospitals after they submit codes that reflect all the diagnoses in the patient’s records during the hospital stay. These codes are put into a database called MedPAR and have been used to determine risks and outcomes for patients undergoing heart surgery, at least for those over age 65. The advantage of MedPAR is that Federal law requires regular release of these data to the public, meaning the wall of silence is now broken. The main disadvantage is that these data were generated not to study the quality of a surgeon but to produce a hospital bill as high as legally allowed.

With a surgeon’s reputation and a patient’s safety potentially at stake, data used for surgeon profiling should be at least as rigorous as those demanded of peer-reviewed clinical research. It is not appropriate research technique to define a patient’s underlying surgical risks and outcomes by simply accepting as fact all the diagnoses written in the medical chart. Instead, researchers apply specific and measurable criteria. For instance, a patient whose medical record suggests lung disease would have actual lung function confirmed by objective testing before being classified that way. The fundamental flaw of a database collected for financial reasons (called an administrative database) is that there was never any incentive for this necessary rigor. In fact, quite the opposite is true. The hospital has a financial conflict of interest because Medicare reimburses more when patients are considered high surgical risk. Staff who review the medical records to generate codes from a list called ICD-9 aren’t qualified (or incentivized) to second-guess the accuracy of a diagnosis documented in a chart by a physician. Their job is simply to generate as many codes as possible. Physicians themselves help coders do their job after receiving training called “clinical documentation improvement” (CDI). It is important and valuable for physicians to write notes in a way that optimizes reimbursement. However, this is a separate topic from quality assurance. In spite of its Orwellian title, being trained for this type of “improvement” isn’t going to lead to more rigorous data.

Two recently introduced technologies may help HealthGrades in its mission. Electronic health records have been universally adopted and will provide more uniformity in the coding process used by hospitals (and therefore improve the data). In addition, the current coding system used for billing (ICD-9) is being replaced by a revamped system (ICD-10), in part so the codes themselves provide information detailed enough to clarify the connection between a provider’s performance and the patient’s condition. But the confusion that surrounds the early adoption phase of most innovations often makes things worse before they get better. If you are hopeful that electronic records and ICD-10 are going to improve the quality of administrative data in the future, then you have to admit that the existing datasets are in need of improvement. Doesn’t it make sense to wait for the effects of these improvements over the next few years rather than go with inadequate ICD-9 codes collected using variable methods from frequently illegible paper records? The answer is clear considering the significant, unintended, and potentially deleterious consequences that can result if ratings based on inaccurate data do not effectively discriminate between CT surgical programs’ performance. It’s a problem that reminds us of the computer axiom, “garbage in, garbage out.”

HealthGrades recently gave a presentation to my hospital on how to improve our star rating for CT surgery. At that time, I raised my concerns about the precision of their data with their team. The answer they gave was that a growing percentage of patients go online to find HealthGrades data and use it to inform their choice of surgeon. Because these data exist and are used by the public despite their limitations, our hospital’s program might as well find ways to make its rating as good as it can be. Their advice was to improve our medical record documentation. Because our mortality rate was already lower than average, only raising the perceived risk of our cases (i.e., generating more ICD-10 codes) would give us the “credit” needed to readjust the mortality ranking published on their site. The hospital administrators were quite enthusiastic about this advice, given the well-known financial benefits of CDI. At face value, the argument seemed like a win-win. On digging deeper, however, it became clear that the loser was the patient choosing a CT surgery program on the basis of a HealthGrades ranking. I’m fairly certain that a patient and his or her family want to know which program has the best actual results and not just which one is the most effective at medical documentation. Eventually, it will become clear to patients that HealthGrades is not a reliable source of advice, like the neighbor who keeps recommending movies that you find horrible.
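To see why documentation alone can move a rating, consider a generic observed-to-expected (O/E) mortality ratio, one common form of risk adjustment. HealthGrades’ own models are proprietary, so what follows is only a sketch under that assumption, not their method; the function name and every number in it are hypothetical and chosen purely for illustration.

```python
def observed_to_expected(deaths, predicted_risks):
    """Generic O/E ratio: observed deaths divided by expected deaths.

    Expected deaths are the sum of each patient's predicted mortality risk.
    This is an illustrative stand-in, not HealthGrades' proprietary model.
    """
    expected = sum(predicted_risks)
    return deaths / expected


# Hypothetical program: the same 100 patients and the same 2 deaths,
# scored under two levels of chart documentation.
cases = 100
deaths = 2

# Assumption: sparser coding makes each patient look like a 2% risk,
# while more intensive coding makes the same patient look like a 4% risk.
sparse_coding = [0.02] * cases
intensive_coding = [0.04] * cases

oe_sparse = observed_to_expected(deaths, sparse_coding)
oe_intensive = observed_to_expected(deaths, intensive_coding)

print(f"O/E with sparse coding:    {oe_sparse:.2f}")
print(f"O/E with intensive coding: {oe_intensive:.2f}")
```

In this toy case the ratio falls from roughly 1.0 to roughly 0.5, which would read as a program performing twice as well as predicted, even though not a single patient outcome has changed.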

Fortunately, there is an existing database that passes muster without waiting for new technologies to improve its value. The Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database contains more than 5.5 million surgical records and represents more than 90% of CT surgery programs in the United States. These data are already research-grade because they are collected by clinicians who use rigid criteria for each data point entered. HealthGrades would advance its mission more effectively by collaborating with STS to improve the public’s awareness of these data rather than sticking with the inferior MedPAR data. At least in cardiac surgery, this makes sense. Other fields that lack the equivalent of the STS national database are a different matter.

Mark Twain has been quoted as saying: “Get your facts first, and then you can distort them as much as you please. Facts are stubborn, but statistics are more pliable.” When it comes to websites that rate various services, the public is comfortable with the idea of “buyer beware” and has learned to keep a critical eye on how the data are manipulated and presented. But if the debate is over the validity of the data themselves, this initiative to empower patients is over before it started.