Quick free association exercise: I say “big data” and you think…maybe, Google? NSA? The bane of my existence? The promise of tomorrow? I have a little bit of all of the above. But first I think of big genetic data, which is all the rage in biomedical research. Big data in genetics has come about due to advances both in computing technology (which underlies most of the “big data” we talk/hear about) and in laboratory equipment that measures genetic variation. A few decades ago it took a whole painstaking experiment to look at one single genetic variant. In the mid-2000s genotyping arrays came about, which enabled one experiment to measure hundreds of thousands of variants in many people all at once. Now there are arrays that measure up to 5 million variants in one go (~0.2% of the 3 billion sites in the genome).
But presently the poster child technology of big genetic data is next generation sequencing. It’s “next generation” compared to the earlier sequencing technology used during the Human Genome Project in the 1990s. Sequencing means that you’re looking at each letter in a stretch of DNA (perhaps the whole genome), rather than a priori selecting letters to look at as you do in a genotyping array experiment. It’s similar to strategically sampling a subset of people to survey (genotyping arrays) versus doing a census of the whole population (sequencing). Sequencing is now fast and relatively inexpensive – we’ve heard for several years about the imminent “$1,000 genome,” though my laboratory colleagues will have to tell me if that’s in fact a reality. But it does mean that sequencing whole genomes is becoming more commonplace in research and, in some limited ways, in medicine.
There’s lots to be said about these trends, but I want to focus on one question: what makes genetic data, especially “big data,” valuable? In market speak, where and when is the “value add”? Presumably just the bucket loads of A’s, C’s, G’s and T’s aren’t getting people up in the morning. A few years ago during a TEDx talk (and my apologies to the presenter, whose name I don’t have a record of) I was introduced to the “knowledge hierarchy,” also called the “knowledge pyramid” or the “data-information-knowledge-wisdom” (DIKW) framework.
It’s a relatively intuitive way to think of the relationship between data, information, knowledge, and wisdom: it’s a hierarchy in which each level builds on or assumes the previous one. The concept is usually traced back to a 1989 article by organizational theorist Russell Ackoff, published in the Journal of Applied Systems Analysis (sounds like a page turner).
I liked the framework so much I made a folder on my laptop called “DIKW” where I started to collect articles and jot down thoughts on DIKW issues in genetics. Now the folder is called “DIKW_dissertation” and it’s where I store everything related to my Public Health Genetics dissertation project.
With genetic data, we inevitably start at the bottom of the pyramid. Certain practices, such as variant annotation and interpretation, guided by bioinformatics tools and by research initiatives, allow us to move the data further up into information and perhaps knowledge. Wisdom? I’m not sure if and how that’s possible, but it’s another open question for genetics. I’m interested in how and why genetic data moves up the hierarchy not just as an abstract concept but because there are real controversies in the communities of genetics research and genetic medicine that tie into this framework.
More on all of that in future posts, but I want to encourage you to think about this DIKW framework when you encounter these discourses of “big data” in other arenas. My sense is that we sometimes get enamored of big data because we feel there is an inevitability to at least a partial trajectory up the knowledge hierarchy. But I’d be very interested to hear how you see DIKW around you.
The early days
I’m not sure I would be in the field of genetics if I didn’t have blue eyes. I remember my first introduction to genetics, in my 7th grade “life sciences” class, which is just biology for middle schoolers. We were learning about Mendel, his peas, and the laws of genetic inheritance. One way to depict the transmission of genes and traits from parents to offspring is with a Punnett Square.
In its simplest form, the square illustrates how the forms of one gene in the parents, call the two forms “A” and “a,” can be passed on to an offspring. Let’s just say “kid,” lest we sound like cold, hard scientists. Different combinations of forms, or genotypes, in the parents lead to different expected proportions of genotypes in the kids. If both parents are Aa, then you’d expect 25% of the kids to be AA, 50% to be Aa, and 25% to be aa. This is because each parent only passes one of their forms to the offspring: the father gives one copy in the sperm and the mother gives one copy in the egg. Each time an egg or sperm goes down the chute, it has an equal chance of carrying either of the parent’s two forms, meaning every kid faces the same odds: a 25% chance of being AA, 50% of being Aa, and 25% of being aa.
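For the code-inclined, the square can be sketched in a few lines of Python. This is only a toy illustration under the one-gene, two-form assumption described above; the function name punnett_square is made up for this example and does not come from any genetics library.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1, parent2):
    """Expected genotype proportions in the kids for a one-gene cross.

    Each parent is a two-letter string such as "Aa" (hypothetical
    notation for this sketch, matching the toy example in the text).
    """
    # Each parent passes on one of its two forms with equal chance,
    # so all four pairings of forms are equally likely.
    counts = Counter(
        "".join(sorted(pair))  # sort so "aA" and "Aa" count as one genotype
        for pair in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


print(punnett_square("Aa", "Aa"))
# {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
```

Swap in other parents to fill out other squares: punnett_square("AA", "aa"), for instance, gives kids who are all Aa.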
Don’t get hung up on the AA/Aa/aa stuff, especially since that can be a really non-intuitive way to think of genes (the DNA molecule is made of 4 chemicals nicknamed A, C, T, and G – all uppercase). More important for this story is that my teacher drew a Punnett Square for the trait of eye color. Eye color is used all the time to demonstrate genetic principles, which is ironic given that it’s actually a very complex trait and not well understood. (If you want to pass yourself off as a human genetics scientist, just say something like “we’re still trying to elucidate the genetic architecture of eye color,” and they’ll let you into all the parties, journals, grants, etc.) But in the toy example, “A” represents a form of the eye color gene for brown eyes and “a” for blue. Aa individuals might either have a blended form, such as hazel, or just have a really loud A form that drowns out the blue “a” form, resulting in brown eyes.
I have blue eyes, so what I saw in that little quadrant of the Punnett Square up on the chalkboard was an opportunity for uniqueness, for exclusivity. I am the youngest in a family of three daughters, my first name was arguably the most popular for girls born in 1983, I have brown hair and grew up middle class in suburban East Tennessee. Granted, many people have blue eyes, but no one in my immediate family does. Granny (my paternal grandmother) did. My mom has brown, my dad has hazel, and both my sisters have brown. So let’s remember that eye color is not as simple as one gene with two forms A and a, but something happened in the making of me that caused a previously hidden blue-eyed “gene” in my mom to combine with my dad’s partially observable blue-eyed “gene,” such that I have 100% blue eyes. Step aside loud brown A genes, and let that “aa” shine.
So fast forward almost two decades. Now I’m in something like 21st grade and pursuing a PhD in Public Health Genetics, an interdisciplinary field that studies the science of genetics, but also the ethical, legal, and social implications (“ELSI”, pronounced else-ee) of using genetic information – in research, in health care, and in everyday life. My research is particularly concerned with that last piece, the “everyday life” part. Many people, not just blue-eyed 7th graders, have noted that genetics (DNA, genes, genomes) has a captivating mystique. DNA is generally hidden to us yet integral to our existence and function as living beings. It is part of who we are, where we came from, and to some extent where we are going.* The first time we knew the full sequence of the human genome was in ~2001, after over 10 years and $3 billion devoted to the Human Genome Project (alongside some private ventures). The cost of sequencing has fallen so hard and so fast that sequencing whole genomes is becoming increasingly common in research and, in some currently limited ways, in medical practice. But for most of us, our own genome still remains hidden, though we carry it around with us all the time.
*I do not endorse these “isms”
Above I wrote that DNA is “part of who we are, where we came from, and to some extent where we are going.” A quick but important aside: I want to emphasize “part of” and “to some extent” in that sentence and clarify what I am not saying about genetics. I don’t think genetics is a crystal ball or sacred text or panacea for all our personal and communal ills on this planet. Many of the societal and health-related problems we face as a nation and as a planet have very little to do with genetics. In fact, efforts to understand genetics can often divert resources (attention, funding, etc.) from more pressing needs and problems. Social inequalities in health and access to health care are traceable to assaults much bigger and much further upstream than genetics. Basically, genetics isn’t deterministic in the sense that having certain DNA sequences translates 100% of the time into certain outcomes. Rather, there is a complex web of ecosystems, communities, families, and individuals that all interact to yield certain social and health outcomes. So genetic determinism is not something I subscribe to or espouse.
Another “ism” I do not intend to promote with my comments here is “genetic essentialism,” or the idea that we can be reduced down to, or “essentialized” as, our genetic make-up. We are way more complicated than that (maybe this is a good time to check your Facebook news feed and confirm that last statement).
Open Reading Frame
I’ll talk more about what I think about genetics and what I’m doing in my PhD research in future posts. I’m still working out what the right blog frequency is, both for me and for you, my esteemed reader. But my initial thought is to post at least every week or two. Sign up to get notified of new posts via RSS feed or stuff like Digg Reader and then you’ll automatically receive your next dose.
P.S. – Thanks to Ms. DeRoos, my 7th grade life sciences teacher, for teaching me about Punnett Squares, albeit through the intuitive but slightly misleading example of eye color.