All posts by sarahcn

Thoughts on Translation Tools, Expertise...and Italy

I recently spent 12 days vacationing in Italy with my mother and two older sisters. While my body is still processing large quantities of delicious cheeses, pasta, and gelato, my mind is digesting the experience of touring a foreign country with different norms, cultural nuances, and of course — a different language (though the diversity of head-scratching bathroom set-ups also bears mentioning). On this trip, translation was always on the brain: translating my thoughts to others and in turn trying to understand the information presented to me, whether on signs, at train platforms, or spoken by short-tempered wait staff. Because despite my half-hearted attempts at learning Italian with the DuoLingo language app, and my high school courses in the related Romance language of French, I was nearly useless at speaking or understanding Italian.

The Need for Translation

My week-plus of translation needs in Italy got me thinking about the role of translation in biology and, in particular, in genetics. In both contexts, translation carries at least two main functions: (1) operationalizing and (2) meaning-making. Operationalizing means to make functional, or to make a thing do. Translation is a key term in the Central Dogma of Molecular Biology. DNA isn’t terribly useful just sitting all spaghetti’d up in our cells. Rather, DNA carries instructions on how to make the proteins that build us and do most of the work in our bodies (this is DNA as the “instruction book” or “blueprint”). The Dogma states that DNA gets transcribed into RNA, a molecule very similar to DNA but more easily accessed by other cellular machinery. Then RNA gets translated into protein, going from a nucleotide code (the A’s, C’s, G’s, and T’s – actually U’s instead of T’s, for RNA) to a chain of amino acids that gets folded up into a beautifully complex protein. Translation is the operationalizing of DNA, the process that makes it do.
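For the programming-inclined, the molecular sense of “translation” can be sketched in a few lines of toy Python. The codon table below is truncated to four of the 64 real codons, and the function name is my own invention, not any standard library’s:

```python
# Toy illustration of translation: RNA codons (3 bases each) -> amino acids.
# Only four of the 64 real codons are included here.
CODON_TABLE = {
    "AUG": "Met",   # methionine, the usual start codon
    "UUU": "Phe",   # phenylalanine
    "GGC": "Gly",   # glycine
    "UAA": "Stop",  # one of three stop codons
}

def translate(rna):
    """Read an RNA string three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGUUUGGCUAA"))  # ['Met', 'Phe', 'Gly']
```

The real cellular machinery (the ribosome and its helpers) is vastly more intricate, but the core mapping from three-letter codons to amino acids really is this table-driven.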

The Central Dogma is great and all, but it’s a process scientists have understood for about half a century now, so not exactly breaking news. The challenge currently facing genetic researchers is truly understanding what different variations in genetic sequences actually mean for people’s health and well-being — and perhaps their identity. Here the challenge is translating knowledge of DNA sequence into actual meaning. Perhaps meaning for an individual patient and their health care provider making a treatment decision. Or perhaps meaning for a large group of people, by better understanding how a disease or other biological process works. The questions are more than just what changes in DNA do to proteins, which would take us back to that literal translation step of the Central Dogma. The questions spiral out: only ~1–2% of our DNA codes for proteins, but all that non-protein-coding DNA can affect other things, like gene regulation or as-yet-undiscovered cellular processes. Also, our genetics interact with other things in our bodies and in our lives, further complicating the meaning-making part of the translation puzzle.

The Tools of Translation

My needs for translation in Italy were pretty much the same: to be able to do things and to make meaning. I am neither an expert traveler nor a linguist, but I did have some amateur tools at my disposal. First and foremost: on my smartphone, Google Translate (with Italian downloaded for offline use) and an Italian phrasebook app. Off the screen, my sisters and mother, who had also done some DuoLingo lessons, and my occasionally useful knowledge of French. Google Translate, which I used quite frequently, would often give me incomplete information — sometimes a word wouldn’t translate, or it would give me something I had no idea how to pronounce (and the audio pronunciation isn’t available offline). I knew some of the rules, for example: “ch” is a hard C, as in chianti, while “ci” is the “chuh” sound, as in ciabatta bread. But usually I was moving through the world with partial information, still enjoying myself and interacting meaningfully with my surroundings.

Cherub in Scuba Mask: street art I saw while in Florence, Italy.

I bring up the amateur aspect of my translation experiences in Italy because I see parallels with the phenomenon of consumer genetic testing. While scientists are still wrestling to make meaning of human genetic variation, consumer companies have gone ahead (some would say prematurely) to make interpretations of personal genetic data available directly to consumers. The majority of these consumer genomics customers are, like I was with Italian, not specially trained to interpret or filter genetic information. Yet given some tools and some rules, they can probably navigate the unfamiliar territory with some degree of enjoyment and success. Sure, they might make a wrong turn or get caught in a tourist-trap pizzeria (darn you, Piazza della Signoria in Florence!). But should they be denied access for their lack of expertise, or for being armed with only some amateur and partial tools of comprehension?

Of course in my Italy metaphor the answer is “No!”, but I recognize that consumer genomics is more complex — and newer, which makes it harder to identify and weigh potential risks and benefits. Should access to personal genetic data be limited to specialists? Should specialists make better tools to enable amateurs to pursue their own translational and meaning-making activities? Tourists have been bumbling around foreign countries since there was bumbling to be had: that’s just part of the human experience. Is bumbling around our own genomes also going to become part of the human experience?

Mapping Metaphor across Big Data, Biotechnology, and Genome Sequencing

Everyday metaphors

Before I was geeky about science I was geeky about words. For my 16th birthday, my best friend gave me the “Encyclopedia of Etymology” — a giant tome about the origins of words (not bugs, people! That’s entomology). So of course I get excited when science and language interact, which happens a lot with metaphor. I even did my Master’s research thesis about metaphor (more on that later). One of the most surprising things I learned early on in that project was that most metaphor is actually lurking beneath the surface of how we talk and think on a daily basis, rather than being mostly confined to speeches and fancy poems (e.g., “Shall I compare thee to a summer’s day?”).

An example of a quite basic metaphor is that up is good and down is bad. Would you rather have things “looking up” or be “feeling down”? Granted, this metaphor may not hold across cultures, but in Western societies it is so ingrained as to be almost invisible. Note I did not discover all of this, but rather was introduced to these ideas in Lakoff and Johnson’s seminal 1980 book “Metaphors We Live By”. Think of Lakoff and Johnson as the Watson and Crick of modern metaphor studies. (If there is a Rosalind Franklin out there in this analogy, then my apologies in advance for the omission!)

Metaphors for “big data” – h/t to Sara Watson

Metaphor is subtly sprinkled throughout our daily speech, and it can have powerful effects on how we think and act. Which is why it’s so important to identify metaphor and understand its sway on us. So I was pleased to recently come across self-proclaimed “technology critic” Sara Watson’s article on dominant metaphors for big data. She does a lovely job of breaking down the dominant industrial metaphors for big data and suggests that replacing them with embodied metaphors, those more tied to our lived experience — our physical bodies — might help people exert more control over data and its downstream uses. Otherwise big data becomes this inevitable industrial machine complex bearing down on us: better hop on board or get out of the way.

Today’s society has a borderline morbid fascination with big data, which I’ve also written about previously in “Big Data, Big Deal?”, and you can see how the dominant metaphors perpetuate this fascination. A particularly problematic metaphor in my mind is that of data as a natural resource that should be mined, extracted, and purified. In this construct, data are commodified and spatialized. Just think of all the untapped reserves of “raw” data waiting for the boldest and most pioneering person to tap into: data logged daily by our smartphones, our Facebook profiles, and even our very bodies. In this metaphor, data become pre-factual and given, rather than contextual and imagined (whereas in actuality you have to conceive of something as a data point before you collect it — aha, even there, I did it: “collect data” as if I were picking wild huckleberries on a mountainside…which I recently did, incidentally). But full circle back to etymology: the very word “data” is from the Latin verb for “to give”…so it’s not totally our fault that it’s easy to take data as “a given.” (More on other cool things you can learn about the word “data” in my earlier post.)

The need to tease out metaphorical concepts

Sara Watson’s article frames metaphors as “metaphorical concepts,” or “X is Y”: e.g., “Data is a natural resource.” Formulating metaphor this way is helpful for understanding the consequences or “entailments” of the metaphor and for raising further questions. If data is a natural resource, is it a renewable one or something finite (e.g., fossil fuel) that we may run out of? If data is a natural resource, who is “mining” it, and who is using or buying it?

Metaphorical concepts are rarely stated outright, but identified through analyzing different expressions of the metaphor. You can see these expressions listed under the heading of the metaphorical concepts in Watson’s article: words like “raw,” or “trove”. Analysis of metaphor involves picking out those instances and then drawing out the underlying metaphorical concept.
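As a very rough sketch of that workflow (the cue words, concept labels, and function name below are invented for illustration, not Watson’s actual coding scheme), a first automated pass at spotting surface expressions might look like:

```python
# Toy sketch of metaphor analysis: spot surface expressions of a metaphor
# in text, grouped under the underlying metaphorical concept ("X is Y").
# Cue words and concept labels here are purely illustrative.
CONCEPT_CUES = {
    "DATA IS A NATURAL RESOURCE": ["raw", "mine", "mining", "extract", "trove"],
    "DATA IS A FLUID": ["stream", "flow", "leak", "flood"],
}

def find_expressions(text):
    """Return {metaphorical concept: [cue words found]} for cues in the text."""
    words = text.lower().split()
    hits = {}
    for concept, cues in CONCEPT_CUES.items():
        found = [w for w in words if w.strip('.,"') in cues]
        if found:
            hits[concept] = found
    return hits

print(find_expressions("Companies mine a trove of raw data from the stream of clicks."))
```

In practice, of course, the hard part is the reverse direction: a human analyst reads the expressions first and only then induces the underlying concept; a keyword list like this could at best help tally expressions once the concepts are articulated.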

Critique of a CRISPR metaphor analysis

Metaphor analysis that stops short of articulating metaphorical concepts is less useful. Last fall I wrote a piece along with two of my thesis committee members critiquing a metaphor analysis of the gene-editing system CRISPR that had this very problem. We argued that failing to articulate underlying metaphorical concepts resulted in a missed opportunity to understand who uses CRISPR to do what. Is CRISPR, as a technology, the subject of the metaphor, or is the scientist using CRISPR the subject? It’s an important question of who or what has the agency to act and make decisions about gene editing.

Also, because the authors didn’t identify metaphorical concepts, most of the metaphors they report were about the genome itself rather than about CRISPR. It would have been easier for them to draw robust conclusions about CRISPR metaphors if they’d been able to separate out genome metaphors (to separate the “text” from its “editor,” as we allude to in the title of our critique).

Metaphors about genome sequencing: my MPH thesis

Oh – and did I hear someone ask about my Master’s thesis? I’m going to assume that’s a “yes.” For my Master’s in Public Health degree in Public Health Genetics, which I completed Spring 2014, my thesis project was a metaphor analysis of research participants discussing whole genome sequencing. I was fortunate enough to have access to several transcripts from previously conducted interviews and focus groups where people were asked to discuss genome sequencing in the context of research and medicine. No one was asked about metaphors specifically, but because of the frequency of underlying metaphors in spontaneous speech, instances of them popped up often in the participants’ discussions.

One of the most common metaphorical concepts I identified was “Genetic information is a weapon.” In some cases, getting personal genetic information was seen as a weapon in the hands of the individual, something empowering them to act, to defend themselves against disease or other potentially negative experiences. For other people, the weapon metaphor was one where genetic information was used as a weapon against them, to knock them over or leave them “shell-shocked.” So even the same metaphorical concept can have different consequences, here depending on who or what is in control of the information.

Full disclosure: initially I wasn’t framing my results as metaphorical concepts (“X is Y”) but more as keywords or domains (the very shortcoming we later critiqued in the CRISPR metaphor analysis). My committee member and resident metaphor expert, Leah Ceccarelli, strongly encouraged me to find the metaphorical concepts. My only real objection was “that sounds hard” (remember, I’d never done formal metaphor analysis before), so once I realized that was a lame excuse I made myself do it – and ended up with a much stronger project for it.

You can read my whole thesis on ProQuest: search for the title “Mapping Metaphor: A qualitative analysis of metaphorical language in discussions of receiving exome and whole genome sequencing results” (or, if you don’t have access to ProQuest, I’m happy to email it!). I also had a peer-reviewed journal article published here. (Yes, it took an extra ~18 months for that paper to see the light of day – see my earlier discussion of the iterative and often trying nature of scientific publication here.) Meanwhile, here’s a table summarizing the main metaphorical concepts I identified.

Table of metaphorical concepts from my thesis research project, with one or two example quotes from focus group and interview participants.
GENETIC INFORMATION IS A TOOL
[Getting genetic information] “might just be one additional piece of information to add to the toolbox”

GENETIC INFORMATION IS A WEAPON
[Receiving genetic results for a child] “could be a piece of information for them…to have in their arsenal for decisions that they’re going to make in their lives”
“So you don’t want too much information and, and with, I think with this, it’s so much. Genetic, there’s so much out there, you don’t want to be bombarded either.”

GENETIC INFORMATION IS LIGHT
[Receiving positive results, e.g., about athletic ability] “would be like hey there's a light in the end of the tunnel”

GENETIC INFORMATION IS DARKNESS
“To know that I would develop early onset Alzheimer’s or, or something like that, I think it would be a consistent cloud over my life”

GENETIC INFORMATION AS GOODS INSIDE A BOX
“I’m going to want to [get] results on all of them. I’m curious like that. But I’m…not very confident. Kind of like opening Pandora’s box, do you want to know what’s inside?”
[On choosing when to receive results] “I want to open that box that’s, that’s mine.”

GENETIC INFORMATION IS A PICTURE
“I don’t think I’m closed out to anything. I, I like the good and the bad because it all makes the whole picture.”

GENETIC INFORMATION IS A DOCUMENT
“If there was an architect going through the neighborhood and they were drawing plans, I want a copy of the plans of my house… I’m not going to build a house, I just want it.”
“…it would be nice to know, I guess I’m thinking of credit score like, here’s your credit score and here’s how you can improve it.”

Other recommended reading:

Ceccarelli, L. (2013). On the Frontier of Science: An American Rhetoric of Exploration and Exploitation. Michigan State University Press.

Condit, C. M. (1999). The meanings of the gene: public debates about human heredity. Madison: University of Wisconsin Press.

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.

The Contradictions of Consumer Genomics

Through numerous conversations I’ve had with scientists, ethicists, and health care providers over the past few years, I’ve picked up on an odd and seemingly contradictory view of consumer genomic testing: it is both meaningless and dangerous. Not to paint a picture of complete professional consensus, as there is none, but from what I hear it’s these two threads that keep intersecting: danger…and irrelevance. To my mind, to be dangerous means to have power or at least be of some import, which implies having some meaning. This leaves me scratching my head and wondering: “Can consumer genetics really be both?” So I’ve been thinking of different scenarios that could explain these seemingly contradictory stances, which I explore below.

Recap of Consumer Genomics

First, as a reminder: consumer genomics, or “over-the-counter genetics,” as I called it in a previous post, refers to companies offering genetic testing direct to the consumer (DTC), versus through a health care provider. These companies may return a range of reports, including on features such as genetic ancestry, who else in the company databases you might be related to, and risk for certain diseases. In addition, and of particular interest to my dissertation research, most DTC companies also offer customers their “raw” (or un-interpreted) data for download.

You can see several ways both danger and irrelevance could stem from these sorts of results. With genetic ancestry, you can learn roughly what proportion of your genome derives from which geographic populations.

  • Risk: the information is imperfect, because the reference populations are contemporary proxies, not the actual ancestral populations, and not all populations are represented.
  • Irrelevance: people often already have a good sense of this (was I really surprised to learn my genome is 98.2% Northern European?), and even if they didn’t, is it really going to change most people’s conception of their racial, ethnic, and/or cultural identity?

With disease risk it’s hard to generalize given the range of diseases and the relative importance of inherited genetic variation to each one. But let’s focus on common, complex diseases such as type 2 diabetes or heart disease.

  • Risk: people will not understand the limitations of the test results, which are about susceptibility and NOT diagnosis or deterministic prediction, and will either over-engage in or fail to engage in healthy preventive behaviors or screening tests. Or just generally freak out (“psychosocial distress”).
  • Irrelevance: for many diseases, genetics plays such an infinitesimal role compared to factors in our environments that in the base case (i.e., barring some overwhelming family history) it’s usually pointless to even talk about inherited genetic factors.

Before I get into two scenarios that could explain the meaningless/dangerous tensions, for comedic relief here is one of my favorite XKCD comics on the topic. You can easily see the danger of misinformation and misinterpretation:

XKCD web comic of two people discussing DTC ancestry testing.

One should never avoid chocolate without solid evidence.

Scenario 1: Dangerous because it’s meaningless

The first scenario I want to explore is that consumer genomics is dangerous because it’s meaningless. This could be dangerous in two ways. First, it could be just a waste of time and a distraction from more important things, especially when it comes to health. Indeed, in the early days of DTC genetics many experts worried that DTC customers would glut the health care system, making unnecessary appointments with their doctors to follow up on meaningless results and sucking up scarce time and resources. Customers (who are also potential patients, in this narrative) think a result is important and take it to their doctor, who in turn doesn’t think it’s important (and maybe rightly so).

And years later there appears to be some merit to that concern, as some studies found that DTC customers did increase the number of screening tests. (For a good review of empirical research on DTC customers and what they do and don’t do, see Roberts and Ostergren, 2013).

A second way meaningless could be dangerous is just the larger issue of people wasting their time and money. This isn’t a problem for the health care system, exactly, but a broader societal issue. Though to this I would argue that there are numerous other areas where we are not particularly protected against misdirected (or misspent) attentions. (I’m looking at you, Netflix!)

Scenario 2: Dangerous because of future meaningfulness

An alternate scenario is that consumer genomics is dangerous because it’s relatively meaningless now but hopefully won’t be in the future. That is, given our current knowledge of how genetic variation contributes to health and disease, there’s not much useful to be learned from the types of tests DTC companies offer. But, as research continues and we get better at integrating genomics with other types of information, the tests may improve. The whole idea of precision medicine is banking on this process getting better.

So there’s a bit of a reputation issue at stake. If people get exposed to genomics through DTC testing, where the results seem dubious and hand-wavy, they may not take it seriously 10 years down the road when their doctor wants to run a whole genome sequencing test on them. If genetics is currently recreational for most people, it may be difficult to recast it in a more serious light in the future. I’m reminded of a comment I heard at a genetics conference a few years ago, during a session surveying the current and future state of the field of genetic research and medicine. The attendee made a comparison to a historical split between astronomy and astrology — at one point in time, everyone was just studying the stars; then the real science split off from the pseudoscience. He posed to the audience: do we (the professional genetics community) want genetics to go the way of astronomy (presumably what real researchers are doing) or of astrology (presumably the speculative activities of consumer genomics)?

It’s an interesting comparison, but also one that highlights the subjectivity of meaningfulness. Some people read their horoscope every day and find it quite meaningful; others would think that a superstitious waste of time. At the very least, I do think we need to keep an eye on the increasing commercialization of genetics and other types of health data, lest the potential gravity and utility of those data for research and medicine become obscured. And, if you do decide to do a DTC test, make sure it’s your spit and not your dog’s that gets sent in. 😉


Acronymity: When concepts hide behind acronyms and jargon

I was recently sitting across from my financial adviser, at his desk on the floor of a busy bank in Seattle. I panicked as I realized that, through a slippery stream of acronyms and jargon, I had lost track of the conversation. ETFs, A-shares, C-shares, rights of accrual…I had even studied up on mutual fund terminology for this meeting, and yet I had still gotten lost. It was distressing, as I had always been a good student and was trying to be a good adult. Then I had two saving thoughts: (1) I am still a smart person, just not well-informed in this particular area and (2) I might not actually care.

Acronym alphabet soup/word scramble.

This got me thinking about the role of acronyms, terminology, and general jargon in other areas, including science. Genetics is a great one for jargon: DNA, WGS, SNP*. As genetic information becomes more widely available to non-genetics experts, the barriers to understanding put up by terminology become even more problematic. Or do they?


Why jargon?

Disciplines and professions use specialized terminology (and yes, acronyms!) presumably for one of two reasons. First, sheer convenience. It’s too much work to spell everything out or give long-winded definitions each time a concept is needed. Instead we use shortcuts. Within the given discipline or profession, this is generally unproblematic because the meaning is known and therefore the shortcut is sufficient to communicate the idea. (Side note: assuming shared meanings of terms and concepts presents a real challenge — and opportunity! — in interdisciplinary work.)

The second potential reason is for exclusivity. Terms and acronyms can be used to exclude non-experts from the conversation. The terminology allies someone with a discipline or profession, identifying them as a group member and as a practitioner/follower of a certain set of ideas, principles, and knowledge. It also carves out who doesn’t belong in that group. Just like when your older sibling used to speak in Pig Latin with her friends to keep you out of the conversation.

Need to know basis

My financial adviser is very kind, and I don’t believe he was intentionally excluding me from the conversation — but that was the ultimate effect. I could have stopped and asked him to break down and define each term, but that would have taken hours…and, remembering my earlier thought, I didn’t entirely care. I was content to be on a “need to know” basis about these transactions, and entrust him to make the best decisions on my behalf.

It was a wake-up call for me to realize that many people probably feel the same level of disinterest and “happy to defer” attitude about genetics as I do about my mutual fund investments. And probably this is a good thing, to have allocations of expertise so that we don’t all have to become experts in everything. Those who argue that genetic information should only be available through a physician, rather than direct-to-consumer, might subscribe to this idea. There’s no harm in trying to self-educate, but just because you have the internet at your disposal doesn’t mean you can study up enough to make fully informed and autonomous decisions about every aspect of your life. Maybe getting genetic information about yourself should involve an expert fluent in all the jargon, in particular because you might not actually care enough, or have adequate time, to study up.


I’m still not convinced, though, and think that personal genetic information needs to be made accessible to non-experts. Direct-to-consumer genetic testing companies do a lot of customer education, partly because no one is going to buy a product of which they have zero understanding. There is also a small but admirable set of genetic counselors out there who for decades have been working on educating patients and families to make well-informed decisions about getting and acting on personal genetic information.

For me, the breakdown with the mutual fund comparison is that my money, while personal, is far less personal than my actual body – my personhood. With intimacy comes the desire to stay informed, and to make autonomous (or at least partly autonomous) decisions. My health, well-being, and genetic information are much more intimate than financial investments, so I’m — well — much more invested in making those decisions.

*DNA = deoxyribonucleic acid, the molecule that carries genetic information in most organisms. WGS = whole genome sequencing, the process of determining the entire DNA sequence of an organism (e.g., a person!). SNP = single nucleotide polymorphism, a change in a single base pair of DNA. Pronounced “snip.”
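(For the programming-inclined, a SNP is easy to picture in code. This toy sketch, with made-up sequences and a function name of my own invention, reports the positions where two aligned sequences differ by a single base.)

```python
# A SNP is a position where DNA sequences differ by a single base.
# The two short "sequences" below are made up for illustration.
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each single-base difference."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned and equal length"
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

print(find_snps("ACGTACGT", "ACGAACGT"))  # [(3, 'T', 'A')]
```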

Science: Not as Smooth as Its Seams

I was recently introduced to the term “seamful design” which, in contrast to “seamless design,” refers to a way of making things that doesn’t cover up all the messy inner workings — doesn’t remove all signs of the makers and their processes. A seamful design is one that may be more transparent, perhaps making the designers/creators more accountable to their users and audiences.

While my impression is the term has been used primarily in computer science, I got to thinking about seamful design in the process of scientific research. One of the most important products in science is the peer-reviewed journal article. These publications are often how a researcher’s scientific merit is judged and are a big part of hiring, promotions, and reputation-building in general. The genre is basically a write-up of the scientific procedure behind any study or experiment: you review what’s known, describe your questions, describe your methods, and then describe what you found.

But articles are atrociously seamless when it comes to accurately portraying how science is done. Articles are neat, linear, and nicely packaged. Research is messy, iterative, and linked to many other ongoing projects. And articles take a friggin’ long time to get published. These realities are barely visible to the reader, and as a junior researcher I find this highly discouraging. After reading a paper I’m often left with the question: “But how did they really DO that?” Or, to paraphrase one of my professors: “What did they actually do on a Wednesday morning?”

Muppets Beaker and Bunsen “do science.”

The loooong road to a paper

In a nod to seamful design, and because it is somewhat cathartic for me, I will briefly walk you through how one paper I’m about to have published actually got done. Rewind to January 2015: I had just finished some analyses for my job and was presenting them to our group. Note the results were from analyses I’d started several months before, and we’d decided to tack on some extra checks to make sure everything was working well. One faculty member came up to me after the presentation and said, “Wow, those are really interesting findings — you should publish them!” (referring to the results of the extra checks).

I was flattered but didn’t pick back up on the idea until a few months later, as I had other more time-pressing work and school tasks to attend to. First we had to get our idea approved by a publications committee. That happened in March. In May I was finally able to sit down and start drafting the paper. By mid-summer I had a preliminary version to show my immediate supervisor. We went back and forth on it, and discussed who else should be a co-author. By autumn we had sent a draft around to the chosen co-authors, who had some minor suggestions and revisions. All along we’d been thinking about what journal to submit our paper to. There are considerations like prestige or “impact factor,” likelihood of acceptance, and turnaround time. We decided on a target journal and submitted in December 2015.

Two months later we got a set of comments from three anonymous reviewers of the article. To fully respond to their comments we had to run some additional analyses, some of which we didn’t really agree with but it’s the gesture that counts. The results didn’t tell us a whole lot, but since we’d already done the work and we thought it would appease the reviewers, we added the extra results in as supplementary material (like an appendix to the paper). The journal had originally asked for the revision back in 4 weeks, but we asked for an extension because I again was having to work on it amidst various other work and school tasks (note to self: next time avoid needing to ask for the extension! embarrassing…). We submitted our revision in April and then heard June 1 that our revision was accepted for publication. In a month or two the advance version of the paper will probably be available online through the journal website, but it will likely be several months more before it makes it into a print copy. So almost two years after I did the work, the paper will come out.

That account surely doesn’t show all the seams, and granted it was a much smoother process than it could have been. Often the first journal you submit to returns the paper without review. And other times your revisions get rejected so you’re back to the drawing board with finding another journal, and you’ve lost months trying to get into the first choice. I’ve also only been talking about the publication process, less about the stages of designing and conducting the research. But I just wanted to impress how little of the struggle and setbacks (and time!) the finished journal article can portray.

Showing the seams elsewhere in life

I’ve been talking about the genre of the scientific manuscript as an example of not-very-seamful design. But you can expand the argument out to how we interact with and perceive each other more broadly. For instance, on social media we see these selective, highly abridged versions of each other: a series of Facebook posts, Tweets, or Instagram pictures. It’s what we want to show and put out there, our best bits. Where are the seams? To be sure, we shouldn’t advertise every misstep and pain, illuminate every wrinkle and pore. But we’re definitely not getting the whole story, just like a scientific paper doesn’t give you a very accurate picture of what lies underneath. We’re still left asking, “But how did they DO that?”

My current dilemma as a research participant

The Background

In 2008 I enrolled in a study run by the National Institute of Environmental Health Sciences (NIEHS). The study team came to the human genetics research center at Duke University where I was working at the time, so it was easy to sign up (and good strategizing on their part to recruit other researchers!). The purpose of the study was broad: to understand how genetic variation, along with things in our environment such as pollution and diet, influences common, complex diseases (think diabetes, cancers, the whole gamut). I gave my consent to participate, answered a few very basic questions, and had my blood drawn. My blood sample went into a biobank, which is a repository for ongoing research. So I knew at the time that my biological material (DNA, blood, etc.) might be studied for many years to come.

DNA helix growing into tree, printed on person's forearm skin
Image Credit: Alan Hoofring, NIH Medical Arts Branch

I recall that a short time after enrolling, the NIEHS team gave me some blood chemistry results — cholesterol and glucose, if memory serves — as a sort of “thank you” for participating. But, as with most research, the motivation to participate has to stem from general altruism: you don’t expect to benefit as an individual, but to help society overall. I pretty much forgot about the whole thing.

The Dilemma

A few weeks ago, the study emailed me to ask for my consent to a new part of the study. They want to sequence the entire genome of participants in the biobank. (In some cases they might just sequence the ~1% of the genome that codes for proteins, but that’s still ~30 million DNA bases.) At first I read the email, thought “hmmm,” and marked it as unread. Five reminder emails later, I’m still not sure how to answer.

If I consent, my genome sequence data will be deposited into a federally maintained research database where other researchers can request permission to access it (stripped of most identifying information except, well, my whole genome). I understand this process very well — it’s all part of the research machinery around which my job as a research scientist revolves. Whole genome sequencing on a large scale is still a relatively new part of this machinery, but it’s already something I have experience with on the researcher side of things. So let’s be frank: people saying “yes” to the types of questions I’m being asked in this re-consent process is my bread and butter. Why am I not leaping at the chance?

A few things. First, my entire genome feels very personal (not to be essentialist, but it is a large part of my personhood). The earlier phases of the project were likely only looking at relatively small subsets of my genome. It feels like the difference between Google Earth collecting some aerial shots of your house versus a team of people coming inside and searching through all your cabinets and drawers. Second, my sequence would be potentially available to a large number of researchers who would put it to a variety of uses. That’s not necessarily bad, but the broadness of possibilities is a little unsettling. These repositories are controlled by committees that review access requests, so it’s not a free-for-all by any means. And I know that researchers (myself included!) are trying to do good. It’s just that it can be difficult to control all uses of the data, especially under a very broad consent such as “general research use” (a common way data is collected and, though I can’t recall exactly, likely what I signed up for in 2008).

Third, and this is the real kicker: I probably would’ve said “yes” right away if I were going to get my own sequence data back. On principle, if a bunch of other people can see and use my sequence data, why can’t I? Now, there’s no rare or undiagnosed condition that I or my family members suffer from, which is one scenario in which my sequence data might be immediately useful. But I still want it. Some scientists argue that research participants should have access to their sequence data as an act of reciprocity. And in my own master’s thesis research, I saw non-scientists saying that if their genome were being sequenced they would want the data back. One participant compared her genome to the architect’s plans for her house: even if she wasn’t planning to do anything with those plans, she would just want them on principle. Because it was about her, and it was hers.

The Implications

The implications of my current dilemma are both personal and societal. Personally, I feel hypocritical. The person I may not fully trust with my own whole genome sequence data is someone like me: a researcher at a university trying to answer important questions about human health and disease. On a broader level, I see my own inner conflict as representative of an emerging conflict with traditional ways of doing research. Historically, participants donate data or samples and don’t expect data or information back (or are at least told that they shouldn’t expect it). I’m not sure people are going to settle for that anymore. Consumer genomics, self-tracking, and mobile health research are changing people’s expectations — they may stop settling for one-way research. The “Gimme my damn data” movement is surging.

Another thing I can’t let go of is that in this case, NIEHS actually could contact me and ask specifically about sequencing. In many other studies (and believe me, I see this all the time in my job), it’s not feasible to contact participants and ask about new uses of their data. Perhaps the researchers don’t have adequate contact information, or perhaps there are just so many participants that re-contacting them would take years and extra research staff that there isn’t funding to support. In these situations, the decisions are usually left up to ethics review boards at the researchers’ institutions: the boards have to look at the original consent and decide whether it could cover new uses of the data. But this projection is not straightforward, particularly when the technologies in question (i.e., whole genome sequencing) didn’t even exist when the participant was consented.

I’m just lucky that NIEHS has a current, working email address for me. Just think: if I’d enrolled under an old email, they couldn’t even have reached me to ask, and I suspect that my original consent was broad enough to move forward with whole genome sequencing.

So what am I going to do? I’m going to call them up and talk about it. What was my original consent, exactly? What are the foreseen uses once my data go into the federal repository? I think they’ll either be thrilled or super annoyed to talk to someone like me. But it’s a conversation that needs to be had before I opt in to this next “sequence” of events.

Swipe right if you’re interested in research: Using your smartphone to share genetic data with scientists

Smartphones are pretty amazing, even my dinky little iPhone 5c with only 8 GB of storage. In many ways they are extensions of our physical, mental, and emotional selves. Because our phones are typically either on our person or within arm’s reach, they can track our movements and activity. This capability has made phones incredibly useful to researchers trying to understand human health and disease. For example, Seattle-based Sage Bionetworks recently completed a Parkinson’s disease study called “mPower” using a mobile phone app to track people’s symptoms. The mPower researchers were among the first to use Apple’s relatively new ResearchKit framework, which allows research groups to design their own apps for data collection in an ongoing, real-time, and potentially massive way.

Direct-to-Researcher Genetics

While Apple’s ResearchKit came out over a year ago, it was just last month that they announced a deal with direct-to-consumer (DTC) genetic testing company 23andMe to enable customers to share their genetic data with researchers in one swipe. This set-up greases the wheels for DTC customers to become research participants and for researchers to recruit DTC customers into their studies. Win-win, right? Perhaps, but my eyebrows did rise a bit when I learned that this development also means researchers can purchase 23andMe tests for their participants, essentially outsourcing the processes of informed consent, collection of biological samples, and generation of genetic data to 23andMe. These would have been expenses for the researcher regardless, but now 23andMe gets that revenue and the participant gets the 23andMe product.

Background: To return or not to return?

Relevant backstory: for a long time genetic researchers have struggled with whether and how genetic results should be returned to research participants. There are many nuanced pros and cons, but one very practical con is that there is usually no infrastructure (time, funding, people) to do it. But 23andMe has a pretty slick pipeline for giving people back their genetic information, one they’ve been working on for almost a decade now. So with 23andMe’s new “direct-to-researcher” testing model (my name, not theirs), the participant gets the data they likely want and the researcher has both data and a happy participant. And 23andMe builds its revenue and its own customer database.

Informed consent

This all sounds pretty great, but I do wonder whether people understand what they’re signing up for with research-oriented phone apps. All research involves an informed consent process, where the potential participant is told about the study and the possible risks and benefits of joining. Ideally the consent process would involve a leisurely, face-to-face conversation with the researcher, but more often it’s a long, arduous paper form to read through and sign (indeed, the consent does need to be documented) or sometimes an informational DVD to watch. Granted, the research community has recognized for a long time that the status quo is not ideal, and recent federal rule changes seek to simplify and streamline the process (e.g., by requiring shorter and simpler consent forms). Even so, if the traditional research model of informed consent is a “gold standard,” it’s a rather tarnished one.

Image of toggle screen in mPower app
Figure 1 from Wilbanks & Friend, Nature Biotech, 2016. Permission to reuse from Nature Publishing Group.

So there’s definitely room for improving informed consent, but is smartphone research the place to make that happen? Perhaps. In their write-up of the Parkinson’s mobile phone study, Sage Bionetworks touts their “e-consent” process as innovative and “visually engaging.” The Sage group required participants to correctly answer a few quiz questions before they could sign up, as a way to test their comprehension of the informed consent material. Maybe e-consent can be better than paper forms, though as someone who scrolls past/clicks through Apple’s terms of service every time my phone needs an iOS upgrade, I still have some reservations about smartphone consent processes.

Frictionless swipe

With all this Apple ResearchKit and smartphone-powered research stuff, I’m reminded of a TEDx talk I attended in Seattle a few years back where the speaker introduced the idea of a “frictionless like.” The archetypal example is “liking” a Facebook post: you don’t have to comment or contribute anything, just an easy little – frictionless – thumbs up. (Though the concept is a bit outdated given that Facebook’s “reaction” buttons now afford us the complete range of human emotions, right? Here I would put a sarcasm face.) Is sending your 23andMe genetic data to a distant researcher going to lead to “frictionless” research participation?

I can’t say whether participating in research via smartphones will make people more or less active and engaged in the whole process – that’s an empirical question that merits its own research. But I’m a little skeptical about who is really benefiting from smartphone-enabled science. Are participants empowered? Are researchers more successful? Do tech companies make a wad of cash? Probably a mix of the above. In any case, it’s worth keeping an eye on. Meanwhile my other eye will be scrolling through my Facebook news feed.

Infophile, Infofull, Infofool


Infophilia: love of information. I think we’re all involved in that romance a bit these days, given the abundance and 24/7 availability of information. Anything we want to know is just one Google search away, and we need only to reach for our smartphones to complete that search. It is seductive, this draw to know, and our ability to instantly satiate the cravings just increases our hunger.

What feeds this desire? I think it is at least partly driven by the sense that having information lets us control it. And this is not entirely irrational, as having information is probably a prerequisite to having control over it. But the having isn’t enough — it’s “necessary but not sufficient” — a part we often gloss over. The need to know is probably also driven by fear of the opposite: fear of being left in the dark. Everyone else can know, and does in an instant, so we have to keep up.

If information is our drug of choice, we are in no shortage of dealers. Information and communication technologies, mainly internet-enabled devices and wearables, save us from information withdrawal. Almost every aspect of our lives — banking, healthcare, grocery shopping — has some online, trackable component. The thought of not being able to access something online seems bizarre and unacceptable.

love of data
Image Credit: me + PowerPoint + clip art


A recent curtain-ordering experience got me thinking about dependency on information and the assumption that transparency of information equates to control of it. I ordered some thermal curtains from an online retailer that was going to ship my order first via UPS and then hand it off to USPS. I was given a long delivery estimate, 2 weeks or something (Amazon Prime has spoiled me). A week went by and I thought to myself, “Hmm, where are those curtains?” Armed with my two tracking numbers I took to the internet. On the UPS website, my tracking number was accepted, and I could see the transit points from the warehouse to the USPS facility. OK, next I went to the USPS site and put in that tracking number — no record yet. My curtains were in UPS/USPS purgatory. It stayed like that for days. I once tried calling USPS to ask about it, but was forced to leave a message. More purgatory. I had both tracking numbers and an order number, and it seemed certain that those pieces of information should empower me to know — to understand — where my curtains were. But they didn’t. It was a hole in the matrix. My information had failed me.

(Don’t worry, I eventually got the curtains, but I was left shaken.)


I suspect that information intoxication is also part of what draws people towards direct-to-consumer (DTC) genetic testing. We love information, we love ourselves, so what could be better than more information about ourselves? Whether or not it’s all that useful is secondary. Some argue that DTC genetic testing is at best a waste of money and at worst a path to people misdiagnosing themselves or otherwise unnecessarily freaking out. And I know that can happen and indeed has happened. But for the majority of customers, I think it’s more about the information rush of it. It’s information about more than just a set of curtains you ordered; it’s about the DNA in the nucleus of trillions of cells in your body, information about the information that was used to literally build your body. In that sense it’s information² – a super strong hit. You can argue about whether it’s reasonable, but I think the temptation of data is stronger than reason.

Love of information (infophilia) leads to lots of information (infofullness), which may indeed lead to information letting us down (being infofooled). But I don’t see many people turning down the opportunity for more information, or lining up to throw their smartphones off a cliff. The same holds for genetic information, in many respects — even if you made it illegal, people would still find a way to take a hit.


The Myth of “Raw” Data

Previously I wrote about the allure of big data. Now I turn to the question of “raw” data. Is there such a thing or is it a myth, an oxymoron — like “jumbo shrimp” or “just one episode on Netflix”?

Why do we cling to this notion of raw data if it doesn’t exist?

I recently read “Raw Data” is an Oxymoron (2013, edited by Lisa Gitelman), a fascinating book that turns “raw” data on its head just about every which way, looking both back in time and across disciplines. Just listen to this mind-blowing sentence from the introduction: “Indeed, the seemingly indispensable misperception that data are ever raw seems to be one way in which data are forever contextualized — that is, framed — according to a mythology of their own supposed decontextualization” (pg 6). Basically, by thinking the thought of “raw data” we are already framing and molding it to our preconceptions of rawness.

Data: from Latin to English

One of my favorite chapters in this edited volume was a historical and linguistic overview of the word “data.” When faced with any dauntingly broad topic, I always love homing in on the word itself to get some definitional and etymological clarity (remember, etymology = words, entomology = bugs). So I just ate up Daniel Rosenberg’s chapter, “Data Before the Fact”; the following sections are my summary of it.

Rosenberg traces the word “data” from Latin into English, a “naturalization” process that occurred during the 1700s. (I suppose that if people can be naturalized when they change citizenship, so too can words when they take up residence in a new language.) “Data” comes from the Latin verb “dare,” to give, so right off the bat we have this inclination to think of data as “a given.” The common Latin phrase “data desuper” means “given from above.”

Indeed, the early English language instances of the word “data” were primarily in the context of theology and mathematics. Data were either given from above and were therefore not questioned, or they were given as a set of assumptions before starting a mathematical proof. Either way, data were something you started with, something everyone mutually agreed were “beyond argument.”

Data: from given to gotten

By the 1800s, English language usage of the word “data” had begun to shift away from something given to something obtained. Specifically, data came to be thought of as something gained through empirical observation and experimentation. This latter connotation is closer to what we have today: even if we think of data as raw, we do tend to think of it as something you get or collect out in the world.

Rosenberg made these observations by searching a large collection of texts called Eighteenth Century Collections Online. He also discusses using Google Ngram for these types of queries, though it was not available at the time of his research. Just for fun, I tried a Google Ngram of the words “data,” “fact,” and “evidence” from 1800 to 2000. Here’s what that looks like:

I have to note the irony of the Google Ngram page footer: “Run your own experiment! Raw data is available for download here.”

Data: from plural to “mass” noun

Returning to the book’s introduction: it explains how data are inherently “aggregative” — i.e., we tend to think of data in herds rather than as solo animals. And here I was sadly robbed of what I thought was one of my solidly “smarty pants” moves. I used to pride myself on correctly treating “data” as a plural noun: i.e., “data are” versus “datum is.” But now I understand that it’s about equally common to say “data is” versus “data are,” and bright folks like Steven Pinker are telling us to wake up and smell the mass noun (pg 19). This “massness” of data is broader than just a grammatical issue, however. I tie it back to the concept of big data: data are (is? ack!) powerful in the aggregate. Data kind of presuppose a horde of like-minded data, such that we don’t pay much attention to an individual datum/data point.

Data: rawness is relative

My experience with the idea of “raw” data is that it’s all relative. In the genetics data coordinating center where I work, the raw data are the genetic data (the A’s, C’s, T’s, and G’s) that we get from the genotyping lab. When I’ve talked to people who work in genotyping labs, they say “Oh no, the raw data is what comes off the machine” (the “machine” here being a genotyping or sequencing machine). It seems data are raw when they first come into our possession – at least that’s a convenient way to think about it. It’s similar to the grocery store: the raw produce is in the bins; it’s what you take home to chop up and cook. Rawness may be relative in practice, but in absolute terms – Gitelman and the book’s contributing authors would remind us – it’s elusive!

My CV and a decade of evolving DNA genotyping technologies

Lately I’ve been describing myself as “having over a decade of experience in human genetics research,” which makes me feel rather old (I recognize that older people will scoff at this and younger people will smirk and nod). Nevertheless, it’s true: I started working in human genetics research at Duke University right after finishing my undergraduate degree, in the winter of 2005. In the summer of 2009 I moved to a genetics research group at the University of Washington, where I still work.

The developments I’ve witnessed in just this relatively short time demonstrate how quickly the field of genetic research has been changing. This is no coincidence, as one reason I was drawn to genetics was the promise that I wouldn’t be stuck doing the same thing my whole career. And it’s proven true thus far, because my job path has been partly shaped by different phases or waves of genotyping technology. Each is a way of looking at DNA – of telling which of the chemical bases A, C, T, or G exists at specific places in the human genome. I’ll walk you down memory lane, stopping at three signposts along the way to discuss these different technologies. But I’ll also start and end with a detour….

Detour 1: One summer during my undergraduate

One summer at UNC-Greensboro, where I was doing my undergrad, my genetics professor hired me to do a small summer project. It was only a few hours a week and a little bit of money, but I was thrilled to have something to supplement my job at Panera. The task was to use a software program to help design bits of DNA that can be used to genotype single genetic variants. These bits of DNA are called primers; they are typically ~20 to 30 DNA bases long and match sequence located near the variant of interest. The primers bind to the nearby DNA and are used to make many copies of it so that the variant is easier to measure. This whole process is called polymerase chain reaction, or PCR, and it basically launched modern biotechnology.

I was tasked with designing primers for a few dozen variants that my professor and his collaborators wanted to study in relation to ADHD. So I was going into genetic databases, finding the flanking sequences, and then plugging them into this primer design program to find the optimal bits of DNA to use. That was the whole summer project. Now, I haven’t done this type of work since, but my guess is that current bioinformatics tools would enable one person to do that whole project in an hour. Maybe even 30 minutes, with time left over to get a coffee.
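For flavor, here’s a toy sketch of the kind of logic that primer picking boils down to. Everything in it is illustrative: the flanking sequence is made up, and the GC-content and melting-temperature rules are drastically simplified. Real primer design software (Primer3 is a common example) also checks for things like hairpins, primer-dimers, and genome-wide specificity.

```python
# Toy single-primer picker: slide candidate windows over a flanking
# sequence, keep those with a plausible melting temperature, and choose
# the one closest to 50% GC content. Hypothetical data, simplified rules.

def gc_content(seq):
    """Fraction of G/C bases in a candidate primer."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def melting_temp(seq):
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C), a rough estimate for short primers."""
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) - gc) + 4 * gc

def best_primer(flank, min_len=20, max_len=30):
    """Try every window of min_len..max_len bases; among those with a
    typical Tm (55-65 C), return the one closest to 50% GC."""
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(len(flank) - length + 1):
            window = flank[start:start + length]
            if 55 <= melting_temp(window) <= 65:
                candidates.append((abs(gc_content(window) - 0.5), window))
    return min(candidates)[1] if candidates else None

# Hypothetical flanking sequence upstream of a variant of interest
flank = "ATGCGTACGTTAGCCGATCGATTACGCGTAGCTAGGCTAACGT"
primer = best_primer(flank)
print(primer, melting_temp(primer))
```

The idea is the same as what the real program did, just in miniature: score many candidate stretches of nearby DNA against a few physical criteria and pick the best one.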

Phase 1: Single variants (Duke)

When I started working at Duke, things were pretty far along. We had PCR, and we had the Human Genome Project and thus a database of the complete human genome sequence. The projects I worked on initially were genotyping single variants at a time, via something called a TaqMan assay (“Taq” is a special type of the enzyme polymerase – yes, the same polymerase of PCR fame!). A single person working in the lab could push through a dozen or so TaqMan assays in a day, if they were wearing their headphones (with no music) just so other people wouldn’t bother them on the lab floor (I know for a fact this is done). This single-variant approach was pretty standard at the time. Before I left Duke, however, this trickle of genetic data was starting to turn into a babbling stream.

Phase 2: Microarrays (Duke -> UW)

In the early 2000s, companies were starting to develop ways to multiplex these genotyping assays. The result was microarrays, or DNA “chips”: small surfaces on which you could array hundreds of thousands (now millions) of genotyping experiments at once. One of the Duke projects I worked on did a microarray experiment during my last year there. I remember that it was too much data to go through our normal database process, so our senior programmer had to manually force it in. All of a sudden there were 300,000 more variants than before. Of course, then there’s the data cleaning, which was now required on a much larger scale. And that’s what brought me to UW….

I came to UW to work on a new set of projects instigated by the National Institutes of Health to look at gene and environment interactions in a series of complex human diseases. Each of these projects was using microarray technology, so they needed a lot of manpower and brainpower (and Sarah power!) to help do quality control and assurance for all that microarray genotyping data.
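To give a flavor of what that quality control work involves, here’s a minimal, hypothetical sketch of one of the most basic checks: flagging variants whose missing call rate is too high. The variant names, genotype data, and 5% threshold are all made up for illustration; real QC pipelines check far more (duplicate-sample concordance, Hardy-Weinberg equilibrium, batch effects, sample identity, and so on).

```python
# Minimal sketch of one basic microarray QC step: drop variants with too
# many failed genotype calls. Genotypes are coded as 0/1/2 copies of the
# minor allele; None marks a failed (missing) call. Illustrative data.

def missing_call_rate(calls):
    """Fraction of samples with no genotype call at this variant."""
    return sum(1 for c in calls if c is None) / len(calls)

def filter_variants(genotypes, max_missing=0.05):
    """Keep variants whose missing call rate is at or below the threshold."""
    return {var: calls for var, calls in genotypes.items()
            if missing_call_rate(calls) <= max_missing}

# Toy data: 3 variants measured across 20 samples
genotypes = {
    "rs0001": [0, 1, 2, 1] * 5,                        # 0% missing -> keep
    "rs0002": [0, 1, None, 1] * 5,                     # 25% missing -> drop
    "rs0003": [None] + [0, 1, 2, 1, 0, 2, 1, 0, 1, 2,
                        0, 1, 2, 1, 0, 2, 1, 0, 1],    # 5% missing -> keep
}

passed = filter_variants(genotypes)
print(sorted(passed))
```

Scale that idea up to a million variants and thousands of samples, layer on dozens more checks, and you have a rough picture of the data cleaning those projects needed.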

Phase 3: Sequencing (UW -> now)

While my work at UW is still primarily with microarray datasets, our center is starting to work more and more with DNA sequencing data. Recall that microarray experiments involve looking at a million or so pre-defined places in the genome. DNA sequencing, on the other hand, goes base by base to look at almost every site. Even though sequencing has gotten much faster and cheaper in the past few years, it’s still too pricey to be the de facto approach for every research project. But give it a few years and it will likely have supplanted DNA microarrays.

DNA sequencing readout from an automated sequencer.
Image Credit: NHGRI Image Gallery

Detour 2: Doby-Croc

When I first started dating my husband a few years back, there was some lore generated about what I did for a living. During a conversation I was not present for, my now husband told his uncle that I worked in genetics and inevitably their conversation ended with the decision that I should make the “Doby-Croc.” Half Doberman Pinscher, half crocodile, slogan “the ultimate in homeland security” (don’t tell Trump!). Clearly they envisioned me tinkering away at a lab bench with a white coat and safety goggles, bioengineering the species mash-ups of tomorrow. (Had I been there I would have headed off this misconception at the pass by clarifying that I work at a computer, in what otherwise looks like your typical office job).

DNA technology isn’t quite there yet, though with CRISPR who knows – but that’s a story for another day!