Personal data. We walk around emitting little streams of it, from our browsing history and smartphone apps to our credit card purchases. Most of the time we don't even think about it; it's become as natural and invisible a process as shedding skin cells. Meanwhile, various entities are out there hoovering up all our data streams and aggregating them for advertising, product development, and research. It may be irksome when we stop and think about it, but we ultimately allow most of it in exchange for the convenience of using various products and platforms.
Then something goes really wrong. Think Cambridge Analytica. We're repulsed and upset at the breach of trust, at how our personal data were used in a way beyond our expectation and perceived approval (setting aside whatever may be buried in pages-long terms and conditions and privacy policies). There is visceral repulsion, and damage to the relationship between data provider and data user that is difficult to repair.
As an academic researcher dealing with research participants’ clinical and genetic data, I’m very interested in understanding — and avoiding — the harms that can arise from various data practices. In particular, I want to parse that negative gut reaction we have upon learning that something unexpected has happened with our personal data. It’s useful to think of at least two categories of concerns, a dichotomization I first heard from bioethicist and medical geneticist Wylie Burke.
Risk-based concerns
The first category of concerns is risk-based and typically focuses on threats to privacy and anonymity. Here we're worried about our identity being revealed: first, simply that we are part of a given dataset, and second, that further information about us could be exposed. This concern exists regardless of what the data are being used for; it's more an issue of how the data are handled. With genetic information this is an omnipresent threat, since such data are inherently identifiable.
Respect-based concerns
The second category of concerns is respect-based and asks whether what is being done with the data is consistent with the data provider's expectations. That is, have those who collected and used the data shown adequate respect for the provider's wishes and expectations for how the data would be used? A violation of this expectation need not create any informational risk to the provider; their privacy and anonymity might not be compromised at all. But if the way the data are used violates their trust, a harm has occurred nonetheless.
Cambridge Analytica: a double whammy
Taking these two categories of concerns back to the extreme example of Cambridge Analytica, I expect that for most people the objections stemmed from both. First, in the risk-based category, people were likely concerned about threats to their privacy and anonymity upon learning that Facebook had shared their profile data with a third party of seemingly questionable integrity. Pile on top of that the respect-based concerns: that one's data had been used to create targeted political advertising that appears to have contributed to Trump's election and the Brexit vote. Your data could be in a steel vault hundreds of miles from the nearest internet router, but if it's used in a way that offends you or violates your understanding of how it was to be used, a respect-based harm, or dignitary harm, has occurred.
Risk and respect in the research world
So clearly it doesn't have to be either/or when it comes to risk-based and respect-based concerns about data usage. In the research world, we're certainly attuned to both. Institutions have to vouch for the security of their IT and computing systems before downloading research data from federal databases. Similarly, there are regulatory and procedural controls in place to make sure that the downstream uses of data are consistent with what research participants consented to when they signed up for a study. It can be tempting, however, to place a bit too much emphasis on mitigating risk-based concerns at the expense of respect. Why is this? It might be a little easier, or at least a little clearer, to design systems that protect privacy and anonymity. There are algorithms for this: encryption, differential privacy, and so on (a small illustration follows below). Building systems to address respect-based concerns is a little more difficult, perhaps a little squishier. It might take a human element to judge whether data uses align with participant consent. It might require attending not just to harms to individuals but also to harms to certain groups of people. As data stewards, researchers can and must attend to both.
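To make the risk-mitigation side a bit more concrete, here is a minimal Python sketch of the Laplace mechanism, the textbook building block of differential privacy. It is my own illustration rather than any particular study's pipeline, and the cohort count is hypothetical: the idea is that a statistic is released with calibrated noise so that the presence or absence of any single participant cannot be inferred from the output.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one individual changes a count by at most `sensitivity`,
    so noise drawn from Laplace(scale = sensitivity / epsilon) masks any single
    person's contribution to the released number.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: report roughly how many participants in a cohort carry
# a particular genetic variant, without the released figure revealing whether
# any one individual is among them.
true_carriers = 137
print(laplace_count(true_carriers, epsilon=0.5))
```

The privacy parameter epsilon is the dial: smaller values add more noise and give stronger guarantees at the cost of accuracy. Notice what the sketch does not do, though. It says nothing about whether counting variant carriers is a use the participants would have endorsed, which is exactly the respect-based question that no amount of noise can answer.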