Quick free association exercise: I say “big data” and you think…maybe, Google? NSA? The bane of my existence? The promise of tomorrow? I have a little bit of all of the above. But first I think of big genetic data, which is all the rage in biomedical research. Big data in genetics has come about due to advances in both computing technology (which underlies most of the “big data” we talk/hear about) and in laboratory equipment that measures genetic variation. A few decades ago it took a whole painstaking experiment to look at one single genetic variant. In the mid-2000s genotyping arrays came about, which enabled one experiment to measure hundreds of thousands of variants in many people all at once. Now there are arrays that measure up to 5 million variants in one go (still under 0.2% of the 3 billion sites in the genome).
But presently the poster child technology of big genetic data is next generation sequencing. It’s “next generation” compared to the earlier sequencing technology used during the Human Genome Project in the 1990s. Sequencing means that you’re looking at each letter in a stretch of DNA (perhaps the whole genome), rather than a priori selecting letters to look at as you do in a genotyping array experiment. It’s similar to strategically sampling a subset of people to survey (genotyping arrays) versus doing a census of the whole population (sequencing). Sequencing is now fast and relatively inexpensive – we’ve heard for several years about the imminent “$1,000 genome,” though my laboratory colleagues will have to tell me if that’s in fact a reality. Either way, sequencing whole genomes is becoming more commonplace in research and, in some limited ways, in medicine.
There’s lots to be said about these trends, but I want to focus on one question: what makes genetic data, especially “big data,” valuable? In market speak, where and when is the “value add”? Presumably the bucketloads of A’s, C’s, G’s and T’s alone aren’t what gets people up in the morning. A few years ago during a TEDx talk (and my apologies to the presenter, whose name I don’t have a record of) I was introduced to the “knowledge hierarchy,” also called the “knowledge pyramid” or the “data-information-knowledge-wisdom” (DIKW) framework.
It’s a relatively intuitive way to think of the relationship between data, information, knowledge, and wisdom: it’s a hierarchy in which each level builds on, or assumes, the one below it. The concept is usually traced back to a 1989 article by organizational theorist Russell Ackoff, published in the Journal of Applied Systems Analysis (sounds like a page turner).
I liked the framework so much I made a folder on my laptop called “DIKW” where I started to collect articles and jot down thoughts on DIKW issues in genetics. Now the folder is called “DIKW_dissertation” and it’s where I store everything related to my Public Health Genetics dissertation project.
With genetic data, we inevitably start at the bottom of the pyramid. Certain practices, such as variant annotation and interpretation, guided by bioinformatics tools and by research initiatives, allow us to move the data further up into information and perhaps knowledge. Wisdom? I’m not sure if and how that’s possible, but it’s another open question for genetics. I’m interested in how and why genetic data moves up the hierarchy not just as an abstract concept but because there are real controversies in the communities of genetics research and genetic medicine that tie into this framework.
More on all of that in future posts, but I want to encourage you to think about this DIKW framework when you encounter these discourses of “big data” in other arenas. My sense is that we sometimes get enamored of big data because we feel there is an inevitability to at least a partial trajectory up the knowledge hierarchy. But I’d be very interested to hear how you see DIKW around you.