Why your PC may mistake your genome for a mail contact

There is a file type used to store large-scale genetic data called a “vcf” file, short for “variant call format.” To a PC, however, a “.vcf” file extension means something completely different: it’s the “vCard” format used to send Microsoft Outlook contact information.

DNA helix is being fed into someone's computer screen but only confusion comes out the end.
Frustrations of personal genome analysis. (Image credit: source images from Pixabay and Wikimedia, compiled by me)

Therefore, if you click on a genetic “.vcf” file with a PC, it will likely suggest opening it with programs such as Microsoft Outlook, Windows Contacts, or the like. In addition to being kind of hilarious, and potentially frustrating to a layperson trying to examine their genetic data, this clash is a microcosm of a fascinating larger trend. Non-specialists are getting access to personal genetic data and wanting to do something with it. But who or what assists them in these endeavors? How should the scientific community respond to people banging on the door of their genetic expertise and skillsets? I can’t supply all the answers, but I can break down this amusing “vcf” conundrum in service of exploring these larger questions.
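For the curious, the two kinds of “.vcf” file are easy to tell apart if you peek inside instead of double clicking. A genetic variant call file opens with a header line beginning “##fileformat=VCF”, while a vCard opens with “BEGIN:VCARD”. Below is a minimal Python sketch of that check. The function name sniff_vcf_kind and its return labels are my own hypothetical choices, not part of any existing tool, and the sketch assumes an uncompressed text file (a gzipped “.vcf.gz” would need to be decompressed first).

```python
def sniff_vcf_kind(path):
    """Guess what kind of ".vcf" file this is from its first non-empty line.

    Hypothetical helper for illustration only; assumes a plain-text file.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Drop surrounding whitespace and a possible byte-order mark.
            first = line.strip().lstrip("\ufeff")
            if not first:
                continue  # skip leading blank lines
            if first.startswith("##fileformat=VCF"):
                return "genetic variants (Variant Call Format)"
            if first.upper().startswith("BEGIN:VCARD"):
                return "contact card (vCard)"
            return "unknown"
    return "unknown"  # empty file
```

Calling sniff_vcf_kind on a downloaded file returns one of the three labels above. It only inspects the first non-empty line, so it says nothing about whether the rest of the file is valid.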

Personal genetic data access

It is now easier than ever to get hold of your genetic data. Millions¹ of people have availed themselves of direct-to-consumer (DTC) genetic tests, most of which allow the customer to download his or her “raw” genetic data file. In addition to DTC testing, people might gain access to their genetic sequence by getting a clinical genetic test. HIPAA laws allow people to access the full lab reports from clinical testing, so for sequencing tests this would likely include “raw” sequence data (given current data standards, probably in a “vcf” file). A third way people might gain access to their genetic data is by joining a research study that makes such data available to participants. This has historically not been common practice for research studies, but early adopters such as the Personal Genome Project and now the nationwide Precision Medicine Initiative are allowing this.

You might argue that acquiring and wrangling with your genetic sequence is still a rather niche endeavor, and I think that’s probably true. (Though note this is an empirical question I’m trying to answer in my dissertation research — exactly who is doing this and why?) But even so, I expect that personal genetic data acquisition will become more mainstream in the future. You have only to look at the popularity of fitness trackers and other wearables to see our society’s obsession with amassing and tracing data about ourselves.

Redistricting expertise

Ok, so simply having a .vcf file of your genetic data doesn’t make you a genetics expert or even mean you can do the first thing with the data. But there are lots of middlemen out there in the form of third-party interpretation tools that will help you “do something” with your data. (Note not all work with .vcf files, in part because DTC companies don’t typically provide customers their data in .vcf format, but that’s a technicality.) This ecosystem of raw data access plus third-party interpretation leads to the situation where people are trying to gain access to scientific expertise in new ways. You could say it’s a sort of redistricting of who gets to look at genetic data and try to put it to some use. The playing field is far from even when you compare a genetics researcher with a layperson, but the general trend is there.

Double clicktivism

The frustration someone might have trying to open a “.vcf” file is not hypothetical. I have heard of cases where people downloaded a “.vcf” file of their genetic data, from one of the third-party interpretation tools mentioned earlier, and were really annoyed and even angered by their inability to open and understand the file. And their PCs were of no help – potentially even actively misleading them as to the appropriate way to open the file (I admit I don’t know what a Mac would try to make of this file).

Why were people so mad? One possibility: we expect our technology to be intuitive. Our Google searches autocomplete, our smartphone reminds us to breathe, and we can shout at Alexa across the room to play our favorite song. Learning how to work with and make sense of our genetic data is far more nuanced. Even for experts there is a lot of uncertainty about what certain genetic variants mean.

Ironically, this information age that is precipitating access to all this personal data may at the same time be conditioning us to expect instantaneous and even anticipated interpretation and utility from that data. If that’s true, it’s definitely a recipe for frustration when it comes to non-expert personal genetic data analysis.

Meanwhile, think before you double click.

1 – The three major DTC players are AncestryDNA, 23andMe, and FamilyTree DNA. AncestryDNA has over 3 million customers genotyped: https://blogs.ancestry.com/ancestry/2017/01/10/ancestrydna-surpasses-3-million-custom. 23andMe has over 1 million customers genotyped: https://mediacenter.23andme.com/fact-sheet/. I have been unable to find a count of genotyped FamilyTree DNA customers; let me know if you have one!

Do we “participate” in Facebook?

Lately I’ve been reading, discussing, and thinking about the concept of “participation.” It’s an idea that gets thrown around a lot but without too much examination or critique. Specifically, I am researching theories and frameworks of “participation” as it relates to my dissertation. The question I’m asking is: does giving people access to their own genetic data increase their participation or level of empowerment: in their health care, in their research participation, in their lives in general?

While my project is about consumer genomics, my literature searches on “participation” have rippled out into politics, economics, media studies, and social and information sciences. Understanding how and where ideas of “participation” are invoked, and with what consequences, draws on all these fields. The topic is particularly salient in this age of digital information, where the ubiquity of the Internet and social media offers us unprecedented platforms to create, consume, and interact.

Below I’ll throw out some of the ideas I’ve encountered in my literature search. This will be a bit of a “spaghetti on a wall” type of exercise, so feel free to “participate” as much or as little as you’d like.

Colorful raised hands on white background
Image credit: http://www.publicdomainpictures.net. Karen Arnold

Dimensions of participation

Information scientist Kelty and colleagues [1] tease out seven dimensions of participation and examine how different projects or communities stack up on these different dimensions. The dimensions get at things such as: whether participants have control over resources (tangible or informational) and to what extent they help to define goals and tasks of the project. Perhaps my favorite dimension is the affective experience of participation, delightfully described as “collective effervescence.” Do you feel like you are participating?

The authors give Facebook as one example of a participatory project that succeeds in some dimensions while falling far short in others. The ability of Facebook users to participate in decision-making or goal-setting is basically nonexistent. On the other hand, the collective effervescence is staggering. We skip along, posting, liking, registering one of six emotions (like, love, wow, haha, sad, angry), and all the while Facebook accrues an enormous, profitable, and powerful database of its 2 billion users [2].

Participation vs. engagement vs. involvement

In another paper, Woolley and friends examine the idea of “participation” as invoked in biomedical research [3]. It’s become very trendy to tap into ideas of “citizen science” and “participatory research” even in more centralized, national research strategies. But this article argues that we should distinguish between three things: participation, engagement, and involvement. People may “participate” in studies simply by signing a consent form and giving a biological sample. But are they really engaged? Probably not. There’s not typically an ongoing relationship between the participant and the researcher, and the participant is not really weighing in on any part of the research in a democratic sense (e.g., developing research questions, interpreting what the results mean, etc.).

I’m reminded of the different levels of participation in our political system. I might participate in our democracy simply by voting in major elections, but am I really engaged? Engagement seems to mean something more, maybe expending extra effort to stay on top of political news outside of major election cycles and to regularly call my representatives to voice my opinion on various proceedings.

Empowerment with a hint of coercion

Moving into consumer genomics, my dissertation area, direct-to-consumer (DTC) companies have arguably introduced a more participatory form of genetic research. People can opt into research studies on a study-by-study basis (versus an up-front, blanket consent to myriad possible uses of their data), and they receive access to information about their genome (not so in traditional research). This does seem more participatory. By the Kelty dimensions of participation, this comes in strong on resource control: people get to access their own data, not just contribute it to a larger effort; and on affective capacities: customers can join online discussion boards, some oriented towards specific genotypes, as well as connect with family members if they so choose. You can just hear the collective effervescence fizzing.

But there are tensions underlying this participatory model of DTC genetics, some of which have been articulated by anthropologist Sandra Soo-Jin Lee and media scholar Kate O’Riordan. Lee has described how consumer genomics is concerned with “biological potential” — the idea that personal genetic information has some future ability to help people. But there are questions of power: who is equipped and positioned to realize the biological potential of having their genetic information? My question then becomes to what extent are DTC companies able to realize the collective biological potential of their customer base (probably quite a lot) to the exclusion of individual customers being able to realize and harness that potential? It’s not that the companies are ill-meaning, but it’s just a fact that for the most part studying genomes in the aggregate (i.e., lots of people at once, as is done in the research context) is more valuable than studying individual genomes. This is partly a result of the current state of genomics knowledge, so this might shift in the future. But as it stands, for most people, studying their individual genome is unlikely to lead to great insights about their health and identity.

O’Riordan has written about how DTC genetics and the subsequent access of personal genetic information by lay persons has created a “new digital genomic public” capable of new “readings” of genomes, now circulated as digital texts [4]. (Without delving into the full article, let me just assure you these ideas are as cool as they first sound.)  Skipping ahead to one of her conclusions:

“The features of DTC genomics are contradictory but indicate the conditions of a contemporary collectivity that is at once embodied and informatic, empowered and coerced, personal and public.”

Empowered and coerced — exactly. People submit samples to get genetic testing so that they can receive information about themselves, perhaps leading to empowerment, but in so doing they are perhaps coerced to share this information with others. Note that DTC companies generally allow customers to opt in or out of research, so people aren’t exactly coerced into participating in research. But they do become part of the company’s database, which can be used more broadly than just for research.

The Janus-face of empowerment and coercion circles back to the titular question of this post:  do we participate in Facebook? Kelty’s dimensions of participation show us how Facebook is participatory in some ways but not in others. My own personal experience of Facebook is definitely one of empowerment with a hint of coercion. Despite my misgivings about how much data Facebook has on all of us, I continue to use it for the convenience of staying in touch with friends and family. To get what I want, I stay hooked into the system.

This is not to demonize Facebook, but rather to articulate the tension between participation, or being able to do what we want, and coercion into giving up some privacy or control over our personal (perhaps our genetic) data. We don’t need to quit using all these services, but at a minimum we should keep a critical eye on when and under what conditions we are being invited to participate.


[1] C. Kelty et al., “Seven dimensions of contemporary participation disentangled,” J. Assoc. Inf. Sci. Technol., 2015.

[2] “Is Facebook A Structural Threat To Free Society?,” TruthHawk (blog), 13-Mar-2017. [Online]. Available: http://www.truthhawk.com/is-facebook-a-structural-threat-to-free-society/. [Accessed: 27-Mar-2017].

[3] J. P. Woolley et al., “Citizen science or scientific citizenship? Disentangling the uses of public engagement rhetoric in national research initiatives,” BMC Med. Ethics, vol. 17, no. 1, p. 33, Jun. 2016.

[4] K. O’Riordan, “Biodigital Publics: Personal Genomes as Digital Media Artefacts,” Sci. Cult. (Lond)., vol. 22, no. 4, pp. 516–539, 2013.

I’m sorry, is that too personal? Why our sleep habits may feel more private than our DNA

Imagine a stranger approaches you on the street and demands to either (1) take a sample of your spit so they can sequence your DNA or (2) plug a device into your smartphone that will transfer over to them your last month of sleep and activity data. Which are you more likely to hand over? Which feels less personal, less intimate?

Until recently, I would have assumed that most people would be more reluctant to hand over their DNA sequence. But now I’m thinking it might be the opposite.

A screenshot of the iPhone "Health" modules, versus a person holding onto a DNA double helix
A smartphone health tracking section versus personal DNA sequence. Which feels more private?

I’ll share this but not that

Back in December I talked with someone who developed and runs a website where people can upload genetic and other data for public use. The idea is that making such data publicly available enables researchers and other citizen-scientist types to easily access it and pursue scientific questions such as which genetic variants are associated with which traits and diseases.

Importantly, unlike my hypothetical scenario above, on that website all such data submissions are entirely voluntary, and in fact the creators even try to actively dissuade people from contributing, so that no one does it and regrets it later. Genetic data was originally the focus of the tool, but more recently the developers considered adding the capability for people to upload their FitBit and other self-tracking device data to the site.

Their users and other commentators were generally not enthused about the idea of sharing that type of data. Why is that? Here are some of the developer’s thoughts:

“I think because still like the genotype data is pretty muddly, in terms of what you can learn from it, whereas it’s probably much more interesting how much sleep you are getting every night, how active you are over the day, things like this…people were like, yes, sharing your genome I can somehow see but then the sharing, like, your weight, how much you sleep and how much you move over the day, this people found less easy about, I would say.”

Genetic anti-exceptionalism

Wait, your step counter is more precious to you than your “muddly” DNA? This all runs counter to the common phenomenon of “genetic exceptionalism,” where genetic information is held up above other types of personal information as more potent, more powerful, and perhaps in need of more protection. While many have argued this is a misguided position to take, especially when it comes to policy making and personal privacy protections, it is still a pervasive idea. But clearly not so much with the users of the data sharing website discussed above. People who decide to submit their genetic data for all the world to see are reluctant to share so openly data about their sleep, exercise, and nutrition.

What makes data personal?

What’s going on here? What are the criteria by which some information intuitively feels more private to us than others? I think there are at least three contributing factors.

  • Visible

Is the data visible to us, or tangible in some way? Even though our genetic sequence is partly responsible for building and maintaining our very visible and tangible bodies, it is a rather abstract concept to most of us. We can’t see or feel our DNA, unless we’ve done that favorite science fair experiment where we mix spit with some dish soap and other household items and watch our snot-like strands of DNA precipitate out of solution.

Sleep and activity, on the other hand, are very tangible, very immediate. We can envision the physical processes of going to bed and going for a walk. There are also specific places we go each day to carry out these activities.

  • Relevant

Luckily, for most people, our DNA sequence doesn’t seem to directly impact how we feel or how we move through the world on a daily basis. (I’m thinking in contrast to people with genetic disorders that may affect their movement, diet, cognition, etc.).

For sleep, on the other hand, we can physically feel the results of excesses and deficits. It also has a cadence, a longitudinal pattern, that I think makes it feel a little more relevant, in contrast to our (mostly static) genome.

  • Identifiable

Now this one’s interesting. Because despite what anyone tells you about “de-identified” genetic data, genetics is inherently identifiable. Given two DNA samples from the same person, you can tell with a high degree of certainty it’s the same person (or their identical twin). Granted, I’ve thought more about the identifiability of genetic data than of sleep and activity profiles, but let’s consider those. With sleep patterns, you might not be able to say exactly who someone is. But maybe you could say what type of person they are based on sleep patterns. Things like a morning person vs. night owl would be relatively easy to tease out, as would perhaps parents with young children or someone who works a night shift.

Another potential factor here could be the “judginess” of certain data. With all our FitBits and default smartphone activity tracking, there’s certainly some societal pressure to get in your 10,000 steps a day and your 8 hours a night (though some would rather brag about their ability to thrive on only 4 or 5). Would we be similarly judgy about each other’s DNA? Films like GATTACA suggest we would. But if I’ve brought up GATTACA, then it’s clearly time to wrap up this post.

I’m curious to hear your thoughts about what types of personal data feel more private to you. Which would you be more or less likely to share?

Regulation Red Tape or Good Glue?

Poor Little Bill

Regulation is getting a pretty bad rap these days. At the end of January, the Trump Administration announced that for every new federal regulation, two existing regulations should be eliminated. Putting aside the immediate questions of logic and logistics of this order, the implication is that regulations are to be avoided or at least minimized. I think of that poor little bill from the Schoolhouse Rock video, sitting on Capitol Hill, waiting to become a law. Now he’s really screwed.

And admittedly “regulation” can often signal oppressive rules, red tape, and excessive bureaucracy — things we want to avoid. But what about positive connotations of regulation? Your body regulates things such as temperature, blood flow, and breathing, and that is ongoing regulation we should all appreciate. Moving from bodies to bodies of government, I’m also glad things such as drinking water and restaurants are regulated, as those regulations make me feel safer when I turn on the tap or go out to eat.  We were all mostly pretty happy about bank regulations imposed following the global financial crisis of 2007-2008. So why the love/hate relationship with regulation?

Roll of red tape
Red Tape (Wikimedia Commons)

The Parable of the Pregnancy Test

Lately I’ve been thinking about regulation in the context of genetic testing, in particular direct-to-consumer (DTC) genetics. I have been asking others what they think about it too: whether and how DTC genetic testing should be regulated and also how third parties, providing interpretation of genetic data independent of DTC companies, fit into this regulatory landscape. One person I talked to gave me a very telling example of why he didn’t think DTC testing should be so strictly regulated. His argument went something like this:

Think of how you can go into any drugstore, buy a pregnancy test, go home and use it, and interpret the results yourself – without any doctor involved. That’s a huge piece of information to seek out on your own! Now with genetic information, why should it be any different?

I’m paraphrasing, but the key point he was making is that pregnancy tests are accessible to the public, so why shouldn’t genetic information be? Next I gently pointed out that, in fact, over-the-counter pregnancy tests are indeed regulated, by the FDA. They’re regulated AND available. Genetics could be the same way.

What I’m thinking is that when we first started talking about regulation, his mind went towards restriction, i.e., if access to genetic information is regulated it means people just can’t get it. But au contraire: regulation could mean that people can directly access their genetic information, just that there is some regulatory oversight to that process. It’s an understandable confusion, because when the FDA “cracked down” on 23andMe for giving health information, 23andMe’s compliant response was to stop giving health reports. The FDA stepped in; access was restricted. But that’s not the inevitable outcome of regulation. If things had happened in a different order, 23andMe might have offered health information only after seeking and obtaining FDA approval. The rollout would have undoubtedly been slower, but there wouldn’t have been the same feeling of having something yanked away from us due to regulators.

So let’s remember in this climate of hostility towards regulation — it is not always the slap on the wrist, the removal of personal freedom and access. Regulations, when properly structured and carried out, can actually do good and lead to safer and better outcomes. There are many debates to have on to what extent, and through what mechanisms, access to personal genetic information should be regulated. But confusing regulation with complete removal of access is unproductive.

Getting paid for your DNA

There’s a new genetic sequencing company that wants you to get paid for donating your DNA data to science. Launched in December, Genos’s service costs around $500 and gives customers the ~3% of their genome sequence that codes for proteins, the “exome.” Additionally, Genos’s platform allows users to share their exome data with (academic and commercial) researchers…and get paid for it.

Let’s evaluate what is and isn’t novel about this proposition. First, people have been able to order direct-to-consumer genetic tests for over a decade. Exome sequencing is a little newer and shinier compared to the smaller-scale microarray technology used by major players such as 23andMe and AncestryDNA, but the general idea has plenty of precedent. Second, it’s not unusual for people to get paid for participating in research, genetic or otherwise. You take time to fill out a survey or participate in an interview, so why not get a token of appreciation (maybe $20 or $50) for your time and efforts? Note that researchers and regulators do want these “incentives” to stop short of being coercive, meaning that it shouldn’t be so much money that a person who otherwise wouldn’t want to participate feels compelled to because they want/need the money.

Ok, so DTC data: not new; getting money for research participation: not new. What does seem to be raising some eyebrows (and piquing some purse strings) is, I think, the directly transactional nature of a customer sharing their — already generated — genetic data with researchers and getting paid, potentially much more than a typical participation incentive. Perhaps over and over again, each time they give Genos permission to share their slice of the company database with another researcher.

Golden DNA helix with money bag at end.

It seems a bit… smarmy. It feels like over-commercializing and over-commodifying genetic data at the boundary of commerce and research, granted that research can be academic or commercial. Feeling conflicted about these developments myself, I was glad to see a recent Commentary on the topic in Nature Biotechnology. Written by legal experts, the Commentary reviews the strengths and weaknesses of a system where research participants are paid directly for their data. I’ll summarize the arguments here, in part because the full article is behind a paywall, and also because it allows me to weigh in on parts of the argument.

Strengths of paying participants directly:


1. Respect for persons. Research can be lucrative so the people contributing data should be more valued, i.e. via direct compensation.

2. Uniqueness of information. Paying for genetic data recognizes its high value and importance to people.

My note: This is a vote for genetic exceptionalism – the idea that genetic data is somehow special and sacrosanct compared to other types of personal data. To this point I would counter that maybe we shouldn’t be encouraging people to attach so much self-worth and interest to their DNA sequence.

3. Promotes fairness and equality. Researchers profit and benefit from people’s data, so people should similarly profit and benefit.

My note: Valid point, but this goes against legal precedents in the US that people do not hold a financial stake in their biological specimens, and extracted data, once donated to research. But law and ethics are not the same, so there is definitely something here.

4. Greater good. The Genos model may encourage more people to participate in research, which benefits all.

Weaknesses of paying participants directly:


1. Might decrease willingness to share. Studies have shown that when people would usually contribute out of altruism, offering them something in return actually decreases participation.

2. Undermines individual autonomy. Offering money may coerce people to participate, actually reducing their personal freedom.

My note: see earlier mention of researchers not wanting to offer so much money to would-be participants, for this very reason. Maybe a matter of “how much”, not “if”, when it comes to payout.

3. The problem of valuation. It may be difficult to assign value to an individual’s genome and thus what people are compensated may not reflect the true value.

My note: right, even giving someone $200 for their genome (the upper limit of the compensation range noted by Genos) could fall short of how much that individual’s genome actually benefits the researcher or company.

While the paper does present both sides, it comes off as saying “this is basically inevitable, so let’s strive for the best outcome possible.”

Genos is making people’s exome sequence available to them, as a personal resource, and also one they can “shop around” to different researchers. Part of this is in line with an argument I made in Nature last fall, that genetics researchers should offer to give data back to participants if the participants want it. The reasoning is that, otherwise, more people might turn away from traditional research in favor of consumer genomics, which does make the “raw” data available. I wrote about good things that happen when people become “stewards of their genetic data.” With Genos it goes one step further, in that people become their own data brokers.

This raises the question: what is the right way to reciprocate and engage with research participants? The most traditional approach is to rely on the individuals’ altruism to improve the greater good, promising them no individual benefit in return. I don’t think this is the right model anymore, especially when current information infrastructures make it easier to give something back. But is Genos the right model? What about research models that offer people back some bits of interpretation in exchange for donating their genetic data? Sites such as DNA.land or openSNP, for example. Is that an even and fair exchange? I do wonder, especially in our current state of knowledge where genetic data is arguably more useful in the aggregate, to answer research questions using thousands of people’s data at once, versus an individual with their own genome trying to squeeze some drops of meaning from wringing out that double helix. I think such tools might still be leaning more heavily on people’s interests in general altruism than personal gain.

It will be interesting to see how the Genos roll-out goes over the next couple of months. Perhaps they are bringing us closer to Professor George Church’s vision, quoted on the company home page: “It should be our birthright that everybody on the planet should have access to their own genome.” (Oh, but p.s. New Yorkers – it’s not your birthright yet; Genos can’t ship a test kit to you…)

“If I build it, I will come”: Early thoughts from my research interviews

For my dissertation, I have been interviewing developers of third-party interpretation tools for consumer genomic data. These are tools such as Promethease, openSNP, and DNA.land, among many others, where people who have their genetic data file from consumer testing can seek further analysis and/or contribute their genome to research. Even though I’m only a few interviews in, I’m already seeing some interesting themes. This isn’t a formal analysis here, just some of my initial impressions — “field notes,” you might call it.

A door sign saying "interview in progress"
Image credit: laura pasquini, https://www.flickr.com/photos/souvenirsofcanada/15518999970.

  • Theme 1. Tools are heterogeneous in terms of their creators/developers, functions, purposes, and processes.

Just like with most studies, I have inclusion criteria to identify eligible tools. (Also called “scoping,” this is an essential process for any successful…and completable…graduate student project! We just can’t get enough of the scoping…). My three criteria for tool eligibility are:

  • Must allow users to upload (or analyze locally) their genetic data file
  • Must return some type of information or interpretation to the user
  • Must be active at the time of my study (and there were some tools that failed on this – i.e., they are now defunct)

This is a very user- or customer-facing set of inclusion criteria, which I chose on purpose. That is, I thought of a user sitting at their computer, having just freshly downloaded their genetic data from 23andMe or AncestryDNA. What tools might they search for or stumble upon as they pursued further self-directed analysis of their genetics? Those were the tools I wanted to study.

At the outset, I was aware I might be grouping together a lot of very different tools, and indeed after months of closer study this is definitely the case. Some tools might not even want to be called “interpretation tools,” as they are more focused on users contributing data to research and just happen to give some tidbits of info in return. I also wasn’t expecting to include some DTC companies, but it turns out there are some that also offer analysis of existing genetic data, i.e., data from another testing company.

So I have a sort of motley crew of a dataset, but I’m optimistic it will make for an interesting and fruitful analytic substrate.

  • Theme 2. Several developers built the tool they wanted for themselves.

This isn’t across the board, but I was struck by how many of the tools were born out of the developers’ own needs and desires. They had their own genetic data in hand and wanted to do something with it that existing tools couldn’t do, or couldn’t do in exactly the way they wanted. So with sufficient programming and bioinformatic skills, they built the tool they wanted and then expanded it for broader use. It reminds me of the phrase “If you build it, they will come.” Except here we have the developer saying something like: “If I build it, I will come…and then I’ll let everyone else who’s interested come, too.”

  • Theme 3. The process of getting information is a source of information in and of itself.

I’ve been learning about tools by (1) studying the website and any associated papers or media coverage and (2) interviewing tool developers. The difficulty or ease of accessing these routes of information is sometimes telling about the tool itself. Some tools are run by companies where it’s very difficult to tease out any proprietary details of how they work (e.g., the analytical tools and resources used). Other tools are built on an idea of openness and are accordingly incredibly transparent about how they work, how many people are using them, etc. Reaching out to tool developers for interviews is similarly illuminating, but not always in the way I expected. Not surprisingly, several people are either ignoring or declining my inquiries, and it’s fair to say more companies fall into that category than not. But some companies are talking to me. And on the flip side, some academics are not. My goal is to get interviews with at least half of the tools I’m studying, which would be a ~50% “response rate.” And the goal with qualitative interviews such as these is not to get to a statistically significant number, per se, but rather to gain some depth and texture to my understanding of who made these tools and why. Stay tuned for more!

Do We Exist Outside of Our Data Flows?

In the opening scenes of Peter Pan, Peter has been separated from his shadow, and he breaks into the Darlings’ house in his efforts to get it back. Wendy is awoken by the sound of the chase and finds Peter, having gotten hold of his shadow, trying to reattach it with a bar of soap. Knowing this will not do, Wendy helps Peter to sew his shadow back on with needle and thread. The reunion of Peter with his shadow thrills and consoles him so much that he breaks into (at least in the musical version) the “I’ve Gotta Crow” number.

Data shadows

I was reminded of Peter Pan’s desperate search for his shadow a few weeks back after attending a lecture by Geoff Bowker, a prominent social scientist and scholar of Science and Technology Studies. During the talk, Bowker posited that we as individuals do not exist outside of our “data flows.” To unpack that a bit: our existence is bound up in our technology and our movements are incessantly tracked, analyzed, and commodified by the tech giants of the world: Apple, Google, Facebook, Amazon, etc. If you erase all the data points we leave scattered behind us like so many breadcrumbs as we move through our day, are we still there? Perhaps not, he ventured. Since the lecture, I have been thinking that because “big data” is such an integral part of us, such a constant feature of the traces we leave, perhaps it has become like our shadow: our data shadow. Like Peter, maybe we are lost without it.

Abstract streaks of blurred, colored light.
A visual representation of data flows. Photo Credit: D. Armstrong

Existence outside of data

This was a rather disturbing idea to me — and based on audience questions, to some others in the lecture hall as well.  My intuitive sense is that of course I exist outside of my data flows, detached from my data shadow. I’m signaling my age here, but I did not get an email address or start using the Internet until I was in high school. I didn’t join Facebook until about 10 years ago. Heck, I only joined Twitter in August. Did I not exist before I embedded myself into these webs of social media? Before I started trafficking in the Internet of Things? Not only do I intuitively feel like I exist to myself outside of data flows, I know I exist to my friends and family as well. So maybe it’s true that I don’t exist to Amazon outside of my online purchases or Prime Video streaming, or I don’t exist to Google when I’m not logged into Chrome – but they don’t define me to me or to those I interact with in the physical world.

Transparency without control

Back to the Bowker lecture…if we accept that we are at least in part defined by our data flows, what are we going to do about it? How do we fight back? (Supposing that, to my dismay, using Google “incognito” windows falls short.) Part of the way to fight our way out of our data prisons, Bowker argued, was through transparency. If the algorithms companies use to crunch our data and mine our data flows for precious ore can be made public, we will have taken one step further towards liberation, towards political action, he argued.

Ok, I get that: maybe transparency is a necessary first step. But it is far from sufficient. Seeing what controls and potentially manipulates you may be a bit empowering, but that bigger agent still has the data, still builds and runs the algorithms. You need skills, training, and access to be able to do anything with or about the algorithm. It’s akin to when you’re on a web page and you know you can browse the source code with a right click and “view source.” But if you don’t know HTML and CSS (cascading style sheets) and whatnot, you can’t really do much to change what you’re looking at or what it does. Right there, my not even being able to list all the things you’d need to know is an example of my ignorance and inability to take control just because something is made visible to me.

Visible genomes

Spoiler: here’s the point in my post where I draw in my dissertation research on consumer genomics, and in particular what people are doing with the “raw” or uninterpreted genetic data. I suspect that what motivates some people in this context is Bowker’s attractive idea that transparency alone is empowering. Just let me see my data – my sequence of A’s, C’s, G’s, T’s – and maybe that will bring me some insight and perhaps even some control over who I am, what diseases I may face later in life, etc.

But even to the extent that might happen, I think it’s much more likely that rendering the raw data visible, downloadable, and parse-able will not do much at the individual level. Now granted I work with genetic data in a research context rather than in a clinical or medical context, so that skews my perspective. But the types of data people can currently get from consumer genomic testing, at least, are arguably much more valuable in the aggregate than to the individual who just has his or her data file. It’s analyses that are run on large-scale genetic datasets that can teach us more about how human health and disease work, not so much about how DNA variants shape an individual’s trajectory.

I’m not saying transparency of big data, algorithms, or genomes is bad; as Bowker indicated, it may indeed be a necessary first step. But let’s not get distracted by the allure of just seeing behind the curtain, of peering into Narcissus’ pool of data about ourselves.

Is Genomics in a Bubble?

There has been a lot of election “post-mortem” talk about living in bubbles. Urban bubbles, academic elite bubbles, blue state bubbles: all out of touch with and perhaps at times dismissive of rural America, no college degree America, red state America. [For a concise articulation of the problem, see the November 8th New York Times editorial here.] I am likely guilty of all those bubble accusations: I live in Seattle, so blue state: check; urban center: check (though granted I grew up in Tennessee!). I have a Master’s degree and am pursuing a PhD. My graduate studies are in genomics, and in particular consumer genomic testing. Are my academic interests and pursuits elitist and out of touch with reality? Perhaps. But not entirely. Let me defend why.

Is genomics in a bubble? [clip art, Microsoft PowerPoint 2010].

Genomics in a bubble: the prosecution

First, why is genomics in general part of the bubble? An illustrative moment came during a talk I heard at the American Society of Human Genetics (ASHG) Annual Meeting back in October. I attend ASHG regularly and it is typically populated, as you would expect, with genomics cheerleaders (imagine the cognitive dissonance of devoting your career to something you don’t believe is important). But this particular presentation was about trying to implement genetic testing in low resource settings, specifically at Federally Qualified Health Centers.  In the study being presented, providers at FQHCs described the challenges of using genetic information to guide care of patients with so many more pressing needs and problems: job, food, and housing insecurity among them. Genetics — in particular mildly predictive genetics that only influences complex disease risk by a smidgen up or down — does and should take a back seat when people are in need of food, shelter, and employment.

Genetics in the clinic aside, direct-to-consumer (DTC) genetic testing may be in an even bubblier bubble. Indeed, empirical studies of DTC users have found them to be more white, more educated, and older than the average adult in the US. It is a luxury to have one or two hundred dollars kicking around that you can use to send off for your 23andMe or AncestryDNA spit kit. It’s sometimes called “recreational” genomics for a good reason: people are often doing it for fun, or maybe even because someone gave it to them as a gift, and so why not. It’s also a luxury to have the free time to sit down at your computer and pore over your results on the company website, and an even greater luxury to have the time to download your “raw” genotype data and poke around with it in third-party interpretation websites (the specific area of my dissertation research). Yes, this is probably not your average person.

Genomics in a bubble: the defense

So we’ve established that it is probably not your average person who spends their time and money on DTC genetic testing and subsequent self-directed analysis in third party interpretation systems. But who is this person? What types of people are doing this and why? What are they doing with the information they get back and with what consequences? I’m trying to understand just that. To characterize this pursuit, however bubble-wrapped these people and these experiences may be.

My defense is that I do think this type of endeavor extends beyond just consumer genomics, and by understanding it better we can apply those insights to other types of pursuits. People, regardless of location, income, education, political leanings, etc., are typically engaged in some type of search: for personal and/or familial identity, for meaning, for connection with others, for health and wellness. Not everyone turns to their genetics for this, but people do turn to or seek out something. I think the dynamics of consumer genomics may well apply to these other areas. What motivates people, how do they satisfy those motivations, and what do they do with the result? That’s why I’ve chosen this area of research, which I hope extends beyond the narrowly scoped instance of consumer genetics.

Genomics back in the bubble: precision medicine

Note I’ve been focusing on consumer genomics here, as that’s the subject of my dissertation research, but I do want to mention another hugely important, bubble-relevant application of genomics: genomics in clinical care, sometimes called “precision” or “personalized” medicine. As do many people, I fear that the integration of genomics in medicine may further widen existing disparities in access to prevention, care, and treatment in our health care system. That is, as with many other medical technologies and kinds of knowledge, people with more education, higher SES, and better insurance are more likely to have access. Many genetic tests are not even currently covered by health insurance (it varies widely based on context, type of test, and insurer), meaning those who benefit are those who can pay out of pocket. Indeed, given limited resources and more pressing needs, genetics takes a deserved back seat, as we were reminded by the ASHG presentation described above.

Carving out my corner of inquiry

So yes, I realize my doctoral work may be esoteric at times — narrow in focus at best and myopic at worst. But dissertation projects typically do (and must) carve out some narrow area of inquiry that may, on the face of it, be of concern to no one else but the student and their committee (and perhaps not even every member of the latter!). Part of that is just so we have a contained and sufficiently scoped project to complete in a few years. But we typically hope and strive to have broader relevance, to pierce the bubble and bring some goodness and understanding to others. And of course to take this time to listen and learn from those with different backgrounds, experiences, and interests. Including those who don’t buy that studying consumer genomics could ever teach us anything of real importance. Though I would hope to prove them wrong.


Thoughts on Translation Tools, Expertise...and Italy

I recently spent 12 days vacationing in Italy with my mother and two older sisters. While my body is still processing large quantities of delicious cheeses, pasta, and gelato, my mind is digesting the experience of touring a foreign country with different norms, cultural nuances, and, of course, a different language (though the diversity of head-scratching bathroom set-ups also bears mentioning). On this trip, translation was always on the brain: translating my thoughts to others and in turn trying to understand the information presented to me, whether on signs, at train platforms, or spoken by short-tempered wait staff. Because despite my half-hearted attempts at learning Italian with the DuoLingo language app, or my high school courses in the nearby Romance language French, I was nearly useless in speaking or understanding Italian.

The Need for Translation

My week-plus of translation needs in Italy got me thinking about the role of translation in biology and, in particular, in genetics. In both contexts, translation carries at least two main functions: (1) operationalizing and (2) meaning-making. Operationalizing means to make functional, or to make a thing do. Translation is a key term in the Central Dogma of Molecular Biology. DNA isn’t terribly useful just sitting all spaghetti’d up in our cells. Rather, DNA carries instructions on how to make proteins that build us and do most all the work in our body (this is DNA as the “instruction book” or “blueprint”). The Dogma states that DNA gets transcribed into RNA, a molecule very similar to DNA but more easily accessed by other cellular machinery. Then RNA gets translated into protein, going from a nucleotide code (the A’s, C’s, G’s, and T’s – actually U’s, for RNA) to a chain of amino acids that gets all folded up into a beautifully complex protein. Translation is the operationalizing of DNA, the process that makes it do.
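For readers who think in code, the two steps of the Dogma can be sketched in a few lines of Python. This is a toy illustration, not a bioinformatics tool: it assumes the input is the coding strand of DNA (so transcription reduces to swapping T for U), it uses the standard genetic code with one-letter amino acid abbreviations, and the function names and the nine-letter example sequence are made up for this post.

```python
from itertools import product

# Standard genetic code, written compactly: codons enumerated in U, C, A, G
# order, each mapped to a one-letter amino acid code ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Toy transcription: assumes the coding strand, so T simply becomes U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time until a stop codon or the end."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # stop codon: the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGA"            # hypothetical example sequence
mrna = transcribe(dna)       # "AUGGCCUGA"
protein = translate(mrna)    # "MA" (methionine, alanine, then stop)
print(dna, "->", mrna, "->", protein)
```

Real cells are far messier (template strands, splicing, alternative start sites), which is part of why the meaning-making side of translation is so much harder than the lookup table suggests.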

The Central Dogma is great and all but it’s a process scientists have understood for about half a century now, so not exactly breaking news. The challenge currently facing genetic researchers is truly understanding what different variations in genetic sequences actually mean for people’s health and well-being, and perhaps their identity. Here the challenge is translating knowledge of DNA sequence into actual meaning. Perhaps into meaning for an individual patient and their health care provider making a treatment decision. Or perhaps meaning for a large group of people by better understanding how a disease or other biological process works. The questions are more than just what changes in DNA do to proteins, which could take us back to that literal translation step of the Central Dogma. The questions spiral out: only about 1–2% of our DNA codes for proteins, but all that non-protein-coding DNA could affect other things like regulation or as yet undiscovered cellular processes. Also, our genetics interact with other things in our bodies and in our lives, further complicating the meaning-making part of the translation puzzle.

The Tools of Translation

My needs for translation in Italy were pretty much the same: to be able to do things and to make meaning. I am neither an expert traveler nor a linguist, but I did have some amateur tools at my disposal. First and foremost: on my smartphone, Google Translate (with Italian downloaded for offline use) and an Italian phrasebook app. Off the screen, my sisters and mother, who had also done some DuoLingo lessons, and my occasionally useful knowledge of French. Google Translate, which I used quite frequently, would often give me incomplete information: sometimes a word wouldn’t translate, or it would give me something I had no idea how to pronounce (and the audio pronunciation isn’t available offline). I knew some of the rules, for example: “ch” is a hard C, as in chianti, while “ci” is the “chuh” sound, as in ciabatta bread. But usually I was moving through the world with partial information, still enjoying myself and interacting meaningfully with my surroundings.

Cherub in Scuba Mask: street art I saw while in Florence, Italy.

I bring up the amateur aspect of my translation experiences in Italy because I see parallels with the phenomenon of consumer genetic testing. While scientists are still wrestling to make meaning of human genetic variation, consumer companies have gone ahead (some would say prematurely) to make interpretations of personal genetic data available directly to consumers. The majority of these consumer genomics customers are, like I was with Italian, not specially trained to interpret or filter genetic information. Yet if given some tools and some rules, they can probably navigate the unfamiliar territory with some degree of enjoyment and success. Sure, they might make a wrong turn or get caught in a tourist trap pizzeria (darn you, Piazza della Signoria in Florence!). But should they be denied access for their lack of expertise, or for being armed only with some amateur and partial tools of comprehension?

Of course in my Italy metaphor the answer is “No!”, but I recognize that consumer genomics is more complex — and newer, which makes it harder to identify and weigh potential risks and benefits. Should access to personal genetic data be limited to specialists? Should specialists make better tools to enable amateurs to pursue their own translational and meaning-making activities? Tourists have been bumbling around foreign countries since there was bumbling to be had: that’s just part of the human experience. Is bumbling around our own genomes also going to become part of the human experience?

Mapping Metaphor across Big Data, Biotechnology, and Genome Sequencing

Everyday metaphors

Before I was geeky about science I was geeky about words. For my 16th birthday, my best friend gave me the “Encyclopedia of Etymology” — a giant tome about the origins of words (not bugs, people! That’s entomology). So of course I get excited when science and language interact, which happens a lot with metaphor. I even did my Master’s research thesis about metaphor (more on that later). One of the most surprising things I learned early on in that project was that most metaphor is actually lurking beneath the surface of how we talk and think on a daily basis, rather than being mostly confined to speeches and fancy poems (e.g., “Shall I compare thee to a summer’s day?”).

An example of a quite basic metaphor is that up is good and down is bad. Would you rather have things “looking up” or be “feeling down”? Granted this metaphor may not hold across cultures, but in Western societies it is so ingrained as to almost be invisible. Note I did not discover all of this, but rather was introduced to these ideas in Lakoff and Johnson’s seminal 1980 book “Metaphors We Live By”. Think of Lakoff and Johnson like the Watson and Crick of modern metaphor studies. (If there is a Rosalind Franklin out there in this analogy, then my apologies in advance for the omission!)

Metaphors for “big data” – h/t to Sara Watson

Metaphor is subtly sprinkled throughout our daily speech, and it can have powerful effects on how we think and act. Which is why it’s so important to identify metaphor and understand its sway on us. So I was pleased to recently come across self-proclaimed “technology critic” Sara Watson’s article on dominant metaphors for big data. She does a lovely job of breaking down dominant industrial metaphors for big data and suggests that replacing them with embodied metaphors, those more tied to our lived experience — our physical bodies — might help people exert more control over data and its downstream uses.  Otherwise big data becomes this inevitable industrial, machine complex bearing down on us, so better hop on board or get out of the way.

Today’s society has a borderline morbid fascination with big data, which I’ve also written about previously in “Big Data, Big Deal?”, and you can see how the dominant metaphors perpetuate this fascination. A particularly problematic metaphor in my mind is that of data as a natural resource that should be mined, extracted, and purified. In this construct, data are commodified and spatialized. Just think of all the untapped reserves of “raw” data waiting for the boldest and most pioneering person to tap into: data logged daily by our smartphones, our Facebook profiles, and even our very bodies. In this metaphor, data become pre-factual and given, rather than contextual and imagined (whereas in actuality you have to conceive of something as a data point before you collect it. Aha, even there, I did it: “collect data” as if I were picking wild huckleberries on a mountainside…which I recently did, incidentally). But full circle back to etymology: the very word “data” is from the Latin verb for “to give”…so it’s not totally our fault that it’s easy to take data as “a given.” (More on other cool things you can learn about the word “data” in my earlier post.)

The need to tease out metaphorical concepts

Sara Watson’s article articulates metaphors as “metaphorical concepts”, or “X is Y”: e.g., “Data is a natural resource.” Formulating metaphor this way is helpful in understanding the consequences or “entailments” of the metaphor and to raise further questions. If data is a natural resource, is it a renewable one or something finite (e.g., fossil fuel) that we may run out of? If data is a natural resource, who is “mining” it and who is using or buying it?

Metaphorical concepts are rarely stated outright, but are identified through analyzing different expressions of the metaphor. You can see these expressions listed under the heading of the metaphorical concepts in Watson’s article: words like “raw,” or “trove”. Analysis of metaphor involves picking out those instances and then drawing out the underlying metaphorical concept.

Critique of a CRISPR metaphor analysis

Metaphor analysis that stops short of articulating metaphorical concepts is less useful. Last fall I wrote a piece along with two of my thesis committee members critiquing a metaphor analysis of the gene-editing system CRISPR that had this very problem. We argued that failing to articulate underlying metaphorical concepts resulted in a missed opportunity to understand who uses CRISPR to do what. Is CRISPR, as a technology, the subject of the metaphor, or is the scientist using CRISPR the subject? It’s an important question of who or what has the agency to act and make decisions about gene editing.

Also, because the authors didn’t identify metaphorical concepts, most of the metaphors they report were about the genome itself rather than about CRISPR. It would have been easier for them to draw robust conclusions about CRISPR metaphors if they’d been able to separate out genome metaphors (to separate the “text” from its “editor,” as we allude to in the title of our critique).

Metaphors about genome sequencing: my MPH thesis

Oh – and did I hear someone ask about my Master’s thesis? I’m going to assume that’s a “yes.” For my Master of Public Health degree in Public Health Genetics, which I completed in spring 2014, my thesis project was a metaphor analysis of research participants discussing whole genome sequencing. I was fortunate enough to have access to several transcripts from previously conducted interviews and focus groups where people were asked to discuss genome sequencing in the context of research and medicine. No one was asked about metaphors specifically, but because of the frequency of underlying metaphors in spontaneous speech, instances of them popped up often in the participants’ discussions.

One of the most common metaphorical concepts I identified was “Genetic information is a weapon.” In some cases, getting personal genetic information was seen as a weapon in the hands of the individual, something empowering them to act, to defend themselves against disease or other potentially negative experiences. For other people, the weapon metaphor was one where genetic information was used as a weapon against them, to knock them over or leave them “shell shocked.” So even the same metaphorical concept can have different consequences, here depending on what or who is in control of the information.

Full disclosure: initially I wasn’t forming my results as metaphorical concepts (“X is Y”) but more like keywords or domains (as we later critiqued in the CRISPR metaphor analysis). My committee member and resident metaphor expert, Leah Ceccarelli, strongly encouraged me to find the metaphorical concepts. My only real objection was “that sounds hard” (remember I’d never done formal metaphor analysis before), so once I realized that was lame I made myself do it – and ended up with a much stronger project for it.

You can read my whole thesis on ProQuest: search for the title “Mapping Metaphor: A qualitative analysis of metaphorical language in discussions of receiving exome and whole genome sequencing results” (or, if you don’t have access to ProQuest, I’m happy to email it!). I also had a peer-reviewed journal article published here. (Yes, it took an extra ~18 months to have that paper see the light of day – see my earlier discussion of the iterative and often trying nature of scientific publication here.) Meanwhile, here’s a table summarizing the main metaphorical concepts I identified.

Table of metaphorical concepts from my thesis research project, with one or two example quotes from focus group and interview participants.
Metaphorical concept | Example quote(s)
GENETIC INFORMATION IS A TOOL | [Getting genetic information] “might just be one additional piece of information to add to the toolbox”
GENETIC INFORMATION IS A WEAPON | [Receiving genetic results for a child] “could be a piece of information for them…to have in their arsenal for decisions that they’re going to make in their lives” / “So you don’t want too much information and, and with, I think with this, it’s so much. Genetic, there’s so much out there, you don’t want to be bombarded either.”
GENETIC INFORMATION IS LIGHT | [Receiving positive results, e.g., about athletic ability] “would be like hey there's a light in the end of the tunnel”
GENETIC INFORMATION IS DARKNESS | “To know that I would develop early onset Alzheimer’s or, or something like that, I think it would be a consistent cloud over my life”
GENETIC INFORMATION AS GOODS INSIDE A BOX | “I’m going to want to [get] results on all of them. I’m curious like that. But I’m…not very confident. Kind of like opening Pandora’s box, do you want to know what’s inside?” / [On choosing when to receive results] “I want to open that box that’s, that’s mine.”
GENETIC INFORMATION IS A PICTURE | “I don’t think I’m closed out to anything. I, I like the good and the bad because it all makes the whole picture.”
GENETIC INFORMATION IS A DOCUMENT | “If there was an architect going through the neighborhood and they were drawing plans, I want a copy of the plans of my house… I’m not going to build a house, I just want it.” / “…it would be nice to know, I guess I’m thinking of credit score like, here’s your credit score and here’s how you can improve it.”

Other recommended reading:

Ceccarelli, L. (2013). On the Frontier of Science: An American Rhetoric of Exploration and Exploitation. Michigan State University Press.

Condit, C. M. (1999). The Meanings of the Gene: Public Debates about Human Heredity. University of Wisconsin Press.

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.