Monthly Archives: April 2017

Why your PC may mistake your genome for a mail contact

There is a file type used to store large-scale genetic data called a “vcf” file, short for “variant call format.” To a PC, however, a “.vcf” file extension means something completely different: it’s the “vCard” format, used to exchange contact information in programs such as Microsoft Outlook.

DNA helix is being fed into someone's computer screen but only confusion comes out the end.
Frustrations of personal genome analysis. (Image credit: source images from Pixabay and Wikimedia, compiled by me)

Therefore, if you double-click a genetic “.vcf” file on a PC, it will likely suggest opening it with programs such as Microsoft Outlook, Windows Contacts, or the like. In addition to being kind of hilarious, and potentially frustrating to a layperson trying to examine their genetic data, this clash is a microcosm of a fascinating larger trend. Non-specialists are getting access to personal genetic data and wanting to do something with it. But who or what assists them in these endeavors? How should the scientific community respond to people banging on the door of their genetic expertise and skillsets? I can’t supply all the answers, but I can break down this amusing “vcf” conundrum in service of exploring these larger questions.
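For the curious, the two kinds of “.vcf” file are easy to tell apart by their contents, even though the extension is identical. A variant call format file begins with a `##fileformat=VCF...` header line, while a vCard begins with `BEGIN:VCARD`. Here is a minimal Python sketch of that check. The function name `sniff_vcf_kind` and its return labels are my own invention for illustration, not part of any standard tool, and the sketch assumes the file is either plain text or gzip-compressed.

```python
import gzip
from pathlib import Path


def sniff_vcf_kind(path):
    """Guess whether a .vcf file holds genetic variants or a contact card.

    Returns "variant-call-format", "vcard", or "unknown".
    (Hypothetical helper; the labels are illustrative only.)
    """
    path = Path(path)

    # Genetic VCF files are often gzip-compressed; gzip data starts with 1f 8b.
    with open(path, "rb") as raw:
        magic = raw.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open

    # Only the first line is needed to tell the two formats apart.
    with opener(path, "rb") as handle:
        first_line = handle.readline(200)

    # Drop a UTF-8 byte-order mark if present, then surrounding whitespace.
    if first_line.startswith(b"\xef\xbb\xbf"):
        first_line = first_line[3:]
    first_line = first_line.strip()

    if first_line.startswith(b"##fileformat=VCF"):
        return "variant-call-format"
    if first_line.upper().startswith(b"BEGIN:VCARD"):
        return "vcard"
    return "unknown"
```

Calling `sniff_vcf_kind("my_genome.vcf")` (a made-up filename) would report which of the two formats the file appears to be. It only identifies the file type; it does nothing to interpret the genetic data inside.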

Personal genetic data access

It is now easier than ever to get hold of your genetic data. Millions[1] of people have availed themselves of direct-to-consumer (DTC) genetic tests, most of which allow the customer to download his or her “raw” genetic data file. In addition to DTC testing, people might gain access to their genetic sequence by getting a clinical genetic test. HIPAA allows people to access the full lab reports from clinical testing, so for sequencing tests this would likely include “raw” sequence data (given current data standards, probably in a “vcf” file). A third way people might gain access to their genetic data is by joining a research study that makes such data available to participants. This has historically not been common practice for research studies, but early adopters such as the Personal Genome Project and now the nationwide Precision Medicine Initiative are allowing this.

You might argue that acquiring and wrangling your genetic sequence is still a rather niche endeavor, and I think that’s probably true. (Though note this is an empirical question I’m trying to answer in my dissertation research: exactly who is doing this, and why?) But even so, I expect that personal genetic data acquisition will become more mainstream in the future. You have only to look at the popularity of fitness trackers and other wearables to see our society’s obsession with amassing and tracking data about ourselves.

Redistricting expertise

OK, so simply having a .vcf file of your genetic data doesn’t make you a genetics expert or even mean you can do the first thing with the data. But there are lots of middlemen out there in the form of third-party interpretation tools that will help you “do something” with your data. (Note that not all of them work with .vcf files, in part because DTC companies don’t typically provide customers their data in .vcf format, but that’s a technicality.) This ecosystem of raw data access plus third-party interpretation leads to a situation where people are trying to gain access to scientific expertise in new ways. You could say it’s a sort of redistricting of who gets to look at genetic data and try to put it to some use. The playing field is far from even when you compare a genetics researcher with a layperson, but the general trend is there.

Double clicktivism

The frustration someone might have trying to open a “.vcf” file is not hypothetical. I have heard of cases where people downloaded a “.vcf” file of their genetic data from one of the third-party interpretation tools mentioned earlier, and were really annoyed and even angered by their inability to open and understand the file. And their PCs were of no help – potentially even actively misleading them as to the appropriate way to open the file (I admit I don’t know what macOS would make of this file).

Why were people so mad? One possibility: we expect our technology to be intuitive. Our Google searches autocomplete, our smartphones remind us to breathe, and we can shout at Alexa across the room to play our favorite song. Learning how to work with and understand our genetic data is far less straightforward. Even for experts there is a lot of uncertainty about what certain genetic variants mean.

Ironically, the information age that is precipitating access to all this personal data may at the same time be conditioning us to expect instantaneous, even anticipatory, interpretation of and utility from that data. If that’s true, it’s definitely a recipe for frustration when it comes to non-expert personal genetic data analysis.

Meanwhile, think before you double click.


1 – The three major DTC players are AncestryDNA, 23andMe, and FamilyTree DNA. AncestryDNA has over 3 million customers genotyped: https://blogs.ancestry.com/ancestry/2017/01/10/ancestrydna-surpasses-3-million-custom. 23andMe has over 1 million customers genotyped: https://mediacenter.23andme.com/fact-sheet/. I have been unable to find a count of genotyped FamilyTree DNA customers; let me know if you have one!