Our Data, Ourselves

Dickinson Matters

by President Nancy A. Roseman

What a privilege it is to reach out to the Dickinson community through my first magazine column. While I look forward to addressing a wide variety of topics, how convenient that the focus of this issue is science! That said, you have my promise that Dickinson Magazine will not become a science journal.

Big data is a significant new idea with seemingly limitless reach. Information about us is gathered constantly, sometimes with our knowledge, but increasingly not. Our wired world tracks our movements, what we purchase, what Web pages we briefly linger on, the GPS coordinates of photos we take. Enormous amounts of information, and evolving technologies to organize it and wield it usefully, have spawned a new discipline: big data. 

Earlier this year, a research group publishing in the journal Science demonstrated that using anonymously provided DNA-sequence information, they could identify from whom the DNA came. They performed this feat using a relatively small amount of genetic information, combined with publicly available genetic databases that are used either for genealogy studies or biomedical research, and other publicly available information. Experts in the field, and the authors themselves, were surprised by how easy it was to pluck the needle out of the human haystack. In fact, the databases used in this study are all freely available on the Internet. The process they used was not itself particularly sophisticated, but like most breakthroughs, it required a novel and imaginative approach. 

This work has sparked much debate among biomedical researchers, bioethicists and policy makers about a host of issues given that our DNA serves as what some have called a future diary of our health—a diary that becomes easier to read as biomedical research advances. Already, there are well-defined markers for certain cancers and chronic diseases. The ability of insurers and employers to discriminate against individuals based on their genetic information is not so farfetched, and, as a result, there are those lobbying for more robust laws protecting against a potential new form of bias: genetic discrimination.

In reading about this leap from an anonymous piece of DNA sequence to the unique individual it belongs to, I reflected on touring, in the mid-’90s, one of the most sophisticated automated DNA-sequencing laboratories in the country, at MIT. Eric Lander, the extraordinary scientist who led the charge to sequence the human genome, told us of the thousands of DNA letters, or nucleotides, they could sequence in a day. That was actually the easy part. He then described the real problem being solved in a back room of the building, a fair distance from all of those robots busily going about the business of generating all those data. 

There, computer scientists, geneticists and mathematicians were furiously trying to develop software that would be able to organize and analyze all of those sequence data and make sense of the four-letter alphabet constituting our genetic code. At that time, the main challenge of the Human Genome Project wasn’t generating the sequence; it was having the tools to assemble and interpret it. There simply wasn’t any software that could handle the sheer volume of data those robots were generating. Having that four-letter alphabet laid out before us had little utility then, but with the advent of what we now call big data and the development of the necessary computational tools, biomedical researchers and others were finally able to get to work studying the human genome.

What is so remarkable, and a little frightening, is that in approximately 20 years, we have gone from that four-letter alphabet as disorganized white noise to identifying a single person out of the human family using relatively little genetic information. 

Big data is playing an increasingly significant, oftentimes invisible, role in our lives. Navigating all of its ramifications, as exemplified by the implications surrounding this latest manipulation of human genetic information, is exactly the kind of challenge a liberal-arts education prepares us for. Biologists who are worrying that they are unprepared for the ethical and other challenges posed by the very research they are doing would have benefitted from what the liberal arts have to offer. 

Dickinsonians are ready for this increasingly interconnected and complex world. Rather than being limited to narrow specialization and technical competence, especially in a world where competency is fleeting as technology speeds by, our graduates have the foundation to navigate and succeed in a world where intellectual flexibility and breadth across disciplines are more valuable than ever. 

Published July 24, 2013