Can we uncover medical miracles without violating patient privacy?
In Conversation: Create the right path for research, or entrepreneurs will find the wrong one
INTRODUCTION
Perhaps in no field do innovation and privacy collide as dramatically as in healthcare. Treatments, medications, therapies -- these might be the most intimate details of a person's life, secrets that need safekeeping. At the same time, the wonders of modern medicine and the power of artificial intelligence promise to unlock an avalanche of life-saving treatments. Better outcomes are within our grasp, but only if privacy concerns don't derail development. For years, technologists have been working on methods to de-identify patient information for research, promising the best of both worlds -- patient secrets kept and Mother Nature's secrets uncovered. But others have hacked away at such technologies, showing that re-identification is still all too possible, making healthcare gains all too fragile, and -- perhaps most important -- planting a seed of doubt in the minds of consumers.
Matt Swafford is the general counsel at NovaSignal, a medical device company that makes machines that detect strokes and other blood flow issues. During a recent talk at Duke University (watch here), he described some of the hidden treasures of medicine that could be found within the millions of data points gathered by NovaSignal technology and other medical devices -- and the potential privacy issues that might arise. So we invited him to join this edition of In Conversation to further that discussion. Also with us are Duke's David Hoffman, Intel's Mario Romao, Ciitizen's Deven McGraw, and Brenda Leong from the Future of Privacy Forum.
(If you are new to In Conversation, I am a visiting scholar at Duke University this year studying technology and ethics issues. These email dialogs are part of my research and are sponsored by the Duke University Sanford School of Public Policy and the Kenan Institute for Ethics at Duke University. See all the In Conversation dialogs at this link.)
FROM: Bob
TO: Matt, David, Deven, Mario and Brenda
This topic is so important, and I think it's critical to get the privacy part right, right now, so an entire generation of life-saving technology doesn't get lost or delayed. Peeking inside patients' brains now seems like the easy part; winning the hearts and minds of regulators and consumers could be the bigger challenge. What are your ideas for doing that?
FROM: Deven McGraw
TO: Matt, David, Bob, Mario and Brenda
It is not “privacy concerns” that will derail development of artificial intelligence and other promising health care innovations that are powered by health data. It is the failure to adequately address these concerns that creates the drag on progress.
Privacy protections are aimed at establishing trust (largely by data subjects) in the collection, use, and disclosure of data about them. We have customarily made identifiability of data a linchpin (if not the linchpin) of privacy protection. But - as you mentioned - the risks of re-identification of health data are rarely zero. HIPAA, which doesn’t cover all health data, has a legal standard for de-identification and identifies two methodologies for achieving “de-identification.” However, neither methodology is expected to achieve zero risk of re-identification. Nevertheless, when data qualify as de-identified under HIPAA, they are no longer regulated by HIPAA. Other laws in the US - such as the Federal Trade Commission Act - “regulate” the collection, use, and disclosure of identifiable data - but without a clear standard for achieving de-identification.
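To make the first of those HIPAA methodologies (the “Safe Harbor” method) concrete: it works by stripping or generalizing a fixed list of 18 identifier types. The rough Python sketch below shows the flavor of that field removal and generalization. It is an illustration only: the column names are hypothetical, and a compliant implementation would have to cover all 18 identifier categories and be validated against the real data.

    # Minimal sketch of Safe Harbor-style de-identification (illustrative only).
    # Column names are hypothetical; a real implementation must handle all 18
    # HIPAA identifier categories and be validated against the actual data.
    import pandas as pd

    DIRECT_IDENTIFIERS = ["name", "email", "phone", "ssn", "medical_record_number"]

    def deidentify(records: pd.DataFrame) -> pd.DataFrame:
        out = records.drop(columns=[c for c in DIRECT_IDENTIFIERS if c in records.columns])
        if "date_of_service" in out.columns:
            # Reduce dates to year only.
            out["date_of_service"] = pd.to_datetime(out["date_of_service"]).dt.year
        if "age" in out.columns:
            # Cap ages at 90, approximating the rule that ages over 89 be aggregated.
            out["age"] = out["age"].clip(upper=90)
        if "zip" in out.columns:
            # Keep only the first three digits of the ZIP code.
            out["zip"] = out["zip"].astype(str).str[:3]
        return out

Even a faithful version of this still carries some residual re-identification risk, which is exactly the point about "rarely zero" above.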
More recent privacy laws (such as GDPR, CCPA and CPRA) attempt to tighten up definitions of de-identification of personal data, but the impact of these efforts on protecting privacy and increasing consumer trust - as well as on assuring data, particularly health-related data, are available to improve individual and population health - is uncertain. The harder we work to make data less identifiable, the more we potentially risk impairing its utility.
Leveraging data that is less identifiable is more privacy-protective than leveraging fully identifiable (or easily identifiable) data. But addressing privacy concerns is not just about building a better de-identification mousetrap.
The public is becoming increasingly suspicious of commercial big data initiatives, even where the data are “de-identified” - and those suspicions are not just about perceived or real re-identification risk. When the Wall Street Journal broke the news last year that Google was acquiring all of the data of Ascension Health - supposedly de-identified (although the details of the actual arrangement were spotty) - the public reaction was swift and negative, notwithstanding the potential good that could come from the arrangement.
What will it take to build public trust in the use of data to power AI and other health care innovations? It will likely take a multiplicity of approaches (like a security “defense in depth” strategy), none of which by itself is likely sufficient. Greater transparency of data collection, use, and disclosure - even when it’s “de-identified.” (No more black boxes.) Assuring public benefit from uses and disclosures of health data. Return of value to data contributors. Publicly available privacy impact assessments. Leveraging data enclaves (where researchers come to the data vs. sending copies to meet each need). Independent data ethics review boards to assure objective review of data initiatives. Genuine accountability for individual and population harms that occur from data misuse or unauthorized disclosure.
We tend to want a single, easily deployable solution to build trust in health data uses and disclosures - but I think it will take a much more comprehensive approach.
FROM: Mario Romao
TO: Matt, David, Deven, Bob and Brenda
I also think that using a number of different safeguards, together with transparency in the conditions for re-use of health data, is probably a sensible way of moving forward. For instance, I see some of that about to happen in the EU, where policymakers have decided to create a European Health Data Space and have recently proposed legislation for the governance of the re-use of data (including private data). Alongside the GDPR, there will be a separate piece of legislation that establishes, among other things, the possibility of setting up protected computing environments to process non-anonymized sensitive data (under the vigilance of the regulator), along with de-identification provisions, contractual obligations and ethical reviews. There are a number of different ingredients to choose from, as the data and use case dictate. The recipe, however, is a work in progress.
FROM: Matt Swafford
TO: David, Bob, Mario, Deven and Brenda
I fundamentally agree that we need to address the issue of misuse in a layered, robust way. I would add that it is important that the regulation, governance and technical structures be designed for privacy, security and utility. This is essentially an “ounce of prevention/pound of cure” issue. Likewise, these systems need to be dynamic – the advance of technical capability and human creativity will mean that there will always be new capabilities to use and misuse data and data insights. There are no fire-and-forget solutions.
This is one of several reasons that a “one ring to rule them all” approach (e.g., GDPR) is problematic. It is slow, and tends to be both under- and over-inclusive. Also, it and other formulations have problematically over-relied on (and misconceived) the role of consent in data usage structures. To these points:
Speed. Data is valuable. The abilities to collect and derive valuable insights from it are advancing very, very quickly. Where researchers, governments and businesses see these opportunities, they WILL take them. Our best bet to minimize harmful use of these tools is to get those who would use them on board with the idea that it is in their interest to do so ethically (and what that looks like – systems that are transparent, designed for security and privacy, etc.). One tool for that is emphasizing the cycle of overreach and reaction that will result in slow, punitively regulated systems.
Consent. I think that we have fundamentally misunderstood the appropriate role of individual consent in privacy protection and data security. Requiring consent does not protect most people, who are acculturated and incentivized to simply “click through” and consent to whatever is put in front of them to get the benefit they are seeking. This is particularly true in the case of health care – if someone will consent to an unread ten-page user agreement to use a new social media app, can we expect them to read, consider the implications of and meaningfully consent to a data privacy/use agreement when it gates them or their loved one from needed health care? Likewise, why should an individual be entitled to deny the societally beneficial uses of data about them, but still obtain the benefits of the systems that generate and use that data (and were created using others’ data)? We should rather conceptualize consent at the social compact level. In order to participate in the systems that generate, use, and are improved and made more useful by data, it should be expected that our data will be collected and used in turn. In return, it should be expected that the data collection, storage and use will be under structures and principles that minimize or eliminate the possibility that the data or the insights derived from it will be used to harm those from whom it was collected.
The upshot of this is that we need to, and need to expect others to, move forward in collecting and using data (see, e.g., Project Nightingale). It essentially never works to hold up our hand and say “stop while we figure this out”. What we should do to help this be done well, and to avoid both negative unintended outcomes and attendant distrust and reactionary policies, is to demonstrate that a principled usage framework of privacy, security and utility by design is the most responsible, most effective, and FASTEST way to proceed. The most important principle of any such framework, particularly germane to health care, is that a system should be designed such that data collected from and about an individual cannot be used to harm that individual. Failing impossibility, by the way, which is a high bar, we should make it exceedingly hard, expensive, and therefore inefficient to misuse data in that way. Consider the common, highly harmful misuse cases and make them impracticable. If we really want those things not to happen, we should focus on making them hard, inefficient and unprofitable. Simply making them illegal is not enough.
Finally, I want to pick up and cheer Deven’s enumeration of some very important, effective ideas (my taglines in brackets): “Publicly available privacy impact assessments. [Transparency] Leveraging data enclaves (where researchers come to the data vs. sending copies to meet each need). [Designed for security and privacy] Independent data ethics review boards to assure objective review of data initiatives. Genuine accountability for individual and population harms that occur from data misuse or unauthorized disclosure. [Dynamic systems]” All of these are great points, and reflect the idea that we CAN make data useful quickly, safely, and ethically. (I could go on about data enclaves for a while – I believe this is an incredibly important solution that flows from these principles.)
FROM: David Hoffman
TO: Matt, Bob, Deven, Mario and Brenda
What a fantastic discussion. I want to extend the direction Matt, Deven and Mario have taken. The US has primarily pursued two mechanisms to protect the privacy of health data: 1. Consent, 2. De-identification. Both have tremendous value, but it is also clear we potentially give up significant value from the innovative use of the data when we only rely on those two methods.
For a conceptual framework of optimizing for the ethical and innovative use of data, I have always thought that the more we want to use data, the more privacy we should provide. Privacy expert Ken Mortensen used to describe this using a traditional scales of justice metaphor. If you are going to put more on one side of the scales by using data in new and innovative ways, you need then to place more privacy protections on the other side.
For some who view privacy protections as just limited to minimizing collection and consent, this approach is counter-intuitive. However, if we look to the Organization for Economic Cooperation and Development Privacy Guidelines (often described as the global common language of privacy) we see that there are several other methods we can use to provide more privacy. The recommendations included in this In Conversation fall under those OECD principles. Data enclaves can adopt Security Safeguards that can be more robust than those used in clinical health settings, as fewer individuals will need access to the data.
We can and should also enforce strict use limitations for the data, permitting only uses that can demonstrate substantial health benefits and that will not harm the individuals from whom the data was obtained (I am a big fan of this recommendation from Matt Swafford as a foundational rule). Those security safeguards and use limitations can be enforced by substantially higher accountability requirements, including commitments on staffing, training, periodic risk assessment and sanctions for non-compliance. These accountability measures can and should be overseen by independent review boards that include members who are qualified to represent the interests of historically under-represented groups, with a specific emphasis in the U.S. on the African-American and Black populations. These review boards should not just have the responsibility of approving the use of data up front, but should also review periodic accountability reports from the employees who oversee the use of the data. We can and should use health data to improve the lives of everyone on the planet. We should do so by increasing data use AND privacy.
FROM: Brenda Leong
TO: Matt, David, Deven, Mario and Bob
Much of what has been (very well) said here reflects the traditional arguments around informed consent, social needs, and technical protections like de-identification. But honestly, I see little that is different from the same circles around privacy and data protection we've all been treading in recent years: how to do good with necessary (and in this case highly sensitive) data, while respecting individual rights, securing information, and monitoring and enforcing standards (where there are any) for compliance or bad behavior. (This is not criticism - just acknowledgement that these are the same hard challenges, still being considered.)
I agree that a single, overarching rule is unlikely to be helpful beyond the broadest of safety floors - but it is almost certainly necessary for establishing at least a minimal basis for a trust framework. Consider the difference in perception of data protections in the EU compared to the US (outside CA) since the introduction of GDPR. Simply the act of addressing privacy protections, even before application and enforcement show much impact, has the effect of generating more trust. If there could then be additional standards and protections, either through new or revised siloed laws (like HIPAA for health data specifically) or through rules for types of data (like the higher protections for "sensitive data" in GDPR), that trust could be expanded.
As far as technical protections go, while de-identification will always be an important and necessary layer of protection, additional PETs such as homomorphic encryption and differential privacy will be needed as well. Likewise, advances in synthetic data could prove useful in health care as much as anywhere - especially when dealing with small patient populations with rare diseases. Finally, federated learning models and Edge AI systems offer systematic opportunities to improve accuracy and outcomes while limiting sharing and transfers of data to the minimum possible.
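As a concrete (and purely illustrative) example of one of those PETs, a differentially private count query adds calibrated noise to an aggregate result so that the presence or absence of any single record cannot be inferred from the answer. The sketch below assumes a simple over-65 count on a hypothetical cohort; the epsilon value and the data are assumptions, and a real deployment would use a vetted library and track a privacy budget across queries.

    # Minimal sketch of a differentially private count (illustrative only).
    import numpy as np

    def dp_count(values, predicate, epsilon=1.0):
        true_count = sum(1 for v in values if predicate(v))
        sensitivity = 1  # adding or removing one record changes the count by at most 1
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    # Hypothetical example: how many patients in a small cohort are over 65?
    ages = [72, 45, 68, 80, 59, 91, 33]
    print(dp_count(ages, lambda a: a > 65, epsilon=0.5))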
There have been many discussions, gathering intensity in recent months, about doing away entirely with consent-based (notice and choice) models for data collection and use. I understand and follow the logic of doing so, but have yet to see strong or well-thought-out alternatives, although I think progress is being made in the "food label" privacy models and some other minimal protection standards being considered.
As was pointed out above, for health data in particular, there is an argument to be made that while people deserve to be protected from exposure of their own health data in unfair or commercially exploitative ways, they may not be entitled to demand that their data be excluded from legitimate and necessary research that holds social value. Even so, line-drawing will become very contentious in such a model (what you and I think of as socially beneficial research may not be unanimously seen as such, and so on).
FROM: Deven McGraw
TO: Matt, David, Bob, Mario and Brenda
Seems like we are all aligned that it will take more than de-identification and consent to build trust in uses of health data. (BTW, I for one would not “do away” with notice and choice entirely - because that’s often what consumers say they want. I think we have sufficient proof that over-reliance on notice and choice to protect privacy does not work - but providing some choice, along with other protections and mechanisms of accountability, makes more sense to me.)
Agree with Brenda that there are PETs that are underexplored, underused, and undervalued in protecting privacy. I have often wondered whether the law provides sufficient incentives for use of these technologies. For example, under HIPAA, you get no credit for making data available in a data enclave (where you can put more controls on access) vs. making a copy of that data and sending it to the third party. Privacy law should provide incentives for use of technologies that minimize privacy risk. Is it the case that pseudonymization and de-identification are the only privacy risk reduction techniques for which there are incentives in the law?
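For readers who have not worked with one, the enclave model can be pictured as a very simple access pattern: the data holder keeps the raw records and runs approved queries internally, releasing only aggregate answers that pass a disclosure check. The rough sketch below is hypothetical - the class, field names and suppression threshold are illustrative assumptions, not any particular enclave product.

    # Minimal sketch of an enclave-style access pattern (illustrative only).
    # Raw records never leave the holding environment; researchers submit
    # queries and receive only aggregate results that pass a disclosure check.
    from statistics import mean

    MIN_CELL_SIZE = 10  # hypothetical small-cell suppression threshold

    class DataEnclave:
        def __init__(self, records):
            self._records = records  # raw data stays inside the enclave

        def run_query(self, filter_fn, value_fn):
            matched = [value_fn(r) for r in self._records if filter_fn(r)]
            if len(matched) < MIN_CELL_SIZE:
                raise ValueError("Result suppressed: cohort too small to release")
            return {"n": len(matched), "mean": mean(matched)}

    # The researcher "comes to the data": a query goes in, an aggregate comes out.
    enclave = DataEnclave([{"age": a, "systolic_bp": 110 + a % 30} for a in range(40, 90)])
    print(enclave.run_query(lambda r: r["age"] > 65, lambda r: r["systolic_bp"]))

Contrast that with shipping a copy of the dataset to each recipient: once the copy leaves, the holder has far fewer ways to control or even observe how it is used - which is why the lack of legal credit for the enclave approach seems like a missed incentive.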
I also want to +1 David Hoffman’s call for using health data to improve the lives of all of us AND protecting privacy. I recently co-authored a paper with Ken Mandl making the argument that health privacy laws spend too much time focusing on protecting the information and not enough time assuring it is available, responsibly, to improve individual and population health. Should we require collectors of health data (and data intended to be used for health purposes) to commit to making that data accessible for good? We also made the argument that notice and choice and de-identification are not sufficient, and that we need a more layered approach. https://rdcu.be/cc66j (open access for anyone interested in reading it - fair warning that it’s a bit long).
Seems that one could combine Brenda’s minimal, baseline approach (applicable in most or all contexts) and then layer on additional protections a la David’s suggestion that more protective measures are required the greater the uses of data. Curious what the triggers would be for application of greater measures.
FROM: Bob
TO: David, Deven, Mario, Matt, and Brenda
That really makes me think. One of my hats is writing essays for PeopleScience.com on applied behavioral science. And this discussion makes me think about why computer security needs to ENABLE people to do the work they have to do, not just put up roadblocks. Because people with work to do will always find a way. And when infosec is a pain, people just do risky things to get their jobs done. Deven makes me think: rather than a law that says medical researchers can't do this and can't do that... we'd do better to create pathways that they CAN (and must) use. Otherwise, enterprising firms will... find a way, as they do now - one we'll probably dislike.
FROM: Matt
TO: Bob, David, Deven, Brenda, and Mario
Absolutely agree. If there is one crucial takeaway from the discussion, that’s it. People find a way. Give them a safe, ethical, fast pathway to realize value and they’ll take it.