Installation of First Stafford Chair
Friday, January 25, 2002

The John R. Stafford '59 and Inge Paul Stafford '58 Endowed Chair in Bioinformatics had its first professor installed in a ceremony held in the Stern Center. Professor Kirsten A. Guss is the first incumbent of this chair.

Photos by A. Pierce Bounds, Dickinson College

Dickinson and DNA: Engaging the World with Bioinformatics

Kirsten A. Guss
January 25, 2002

President Durden, Mr. and Mrs. Stafford, Dean and Provost Weissman, Trustees of the College, colleagues, family, friends, and students, it is a great honor to be the first recipient of the Stafford Endowed Chair in Bioinformatics.

The faculty position I am fortunate to hold was created by a $900,000 grant awarded to the College by the Howard Hughes Medical Institute to establish a new faculty position in biology.

This position will now be perpetuated through the generosity of the Staffords.

This is a tremendous gift the Staffords have given the College, and it resonates with our vision to continue to excel in the sciences and to integrate scientific knowledge of the most modern kind across the curriculum.

In the five months that I have been part of the Dickinson community, I have been asked a number of questions: "How do I like teaching? How do I like Carlisle?" But the most popular by far is "What IS bioinformatics, anyway?" My hope is that the first step in integrating bioinformatics into the college cognition can begin here today, with this audience. My hope is that everyone will leave this room with a good understanding of these tools and their applications and implications.

Bioinformatics is a term used to describe computer tools that are used to analyze biological information.

In fact, in any discipline where the computer can be used to organize and analyze information relevant to that discipline, and to model or extrapolate some process, the word "informatics" is frequently added as a suffix. In this case, the "bio" comes from biology, and the "informatics" comes from the computer.

In biology, frequently the information being subjected to computational analysis is the sequence of DNA in the genome.

The term "genome" refers to the sum of all the genetic information in an organism, and the term "gene" refers to a unit of genetic information.

Usually a gene contains information necessary to direct the production of a protein, which is a type of physical building block of an organism.

A gene also contains information that directs the production of that protein in the appropriate place within the organism, at the appropriate time during its life.

Genes are made of DNA. DNA, in turn, is composed of four nucleotides, or bases, abbreviated A, C, G, and T. So, the genome is basically a really long string of these four bases.
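To make the "long string of four bases" idea concrete, here is a minimal Python sketch that treats a stretch of DNA as ordinary text and tallies each base. The sequence and the function name `count_bases` are made up for illustration; they are not drawn from any real genome or from a particular bioinformatics package.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_bases(sequence: str) -> Counter:
    """Count how often each of the four bases occurs in a DNA string."""
    sequence = sequence.upper()
    unexpected = set(sequence) - VALID_BASES
    if unexpected:
        # Anything other than A, C, G, or T is rejected in this simple sketch.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return Counter(sequence)


# A made-up 14-base sequence, used only as an illustration.
toy_sequence = "GATTACAGATTACA"
counts = count_bases(toy_sequence)

for base in "ACGT":
    print(base, counts[base])
```

The same tally applied to a real genome would simply run over a much longer string, which is why the volume of data calls for a computer.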

In the case of the human, this string is 3 billion bases in length. That is a 3 with 9 zeros.

This is the textbook I had my students purchase for my introductory class, This is your life, an overview of the human life cycle. It's over a thousand pages long.

If these 3 billion bases were listed in order, without any punctuation of any kind, they would fill the equivalent of nearly 400 of these. And that's without all the pictures.
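The textbook comparison can be checked with a little arithmetic. The sketch below assumes a round figure of 1,000 pages per textbook (the talk says only "over a thousand") and takes the 3 billion bases and 400 textbooks at face value; the variable names are illustrative.

```python
GENOME_BASES = 3_000_000_000   # 3 billion bases, as stated in the talk
TEXTBOOKS = 400                # "nearly 400" textbooks, rounded up
PAGES_PER_TEXTBOOK = 1_000     # assumption: a round 1,000 pages per book

# How many letters each book, and each page, would have to hold.
bases_per_textbook = GENOME_BASES // TEXTBOOKS
bases_per_page = bases_per_textbook // PAGES_PER_TEXTBOOK

print(f"Bases per textbook: {bases_per_textbook:,}")
print(f"Bases per page:     {bases_per_page:,}")
```

Under these assumptions each book would carry 7.5 million letters, or about 7,500 per page, which is plausible only for pages packed with unbroken text and no pictures.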

The process of sequencing a genome refers to determining the order and arrangement of these four bases, because information is encoded in how these four bases are put together.

The cells in your body have the ability to read the language of your genome and discern the information there. This process is similar to our ability to read and understand a page of text, of a language in which we are fluent, and find information there.

We can discern meaning from how the letters are put together. For example, we know where a sentence starts by the presence of a capital letter, and we know where the sentence ends by a period, and we know when a new idea is starting based on a new paragraph. There is information encoded in the order and arrangement of the four bases in DNA, just as there is information encoded in the order and arrangement of the 26 letters of the English alphabet (usually).

DNA encodes the genetic information, the blueprint, for more than just humans. In fact, all the living diversity around you is encoded in the assembly of just these four bases. This is true whether it is the bacteria that make you ill, the yeast you use in your bread machine, the plants in your garden, your pet, the fruit flies buzzing loose in your lab, your family, or you.

Mother Nature is a very efficient communicator. The genetic information for every organism is encoded by the arrangement of just four bases. That's like saying that all the languages that are used for communication on the planet use the same 4 letters. We here at Dickinson know better than anyone that this is not the case, since we graduate proportionately more foreign language majors than any other college or university in the U.S.

We are just learning the language of the genome. We have the order of the letters, but we are just beginning to decipher the information encoded there. Using our language analogy, there are a few spaces, a few capital letters, and a few periods marking the beginnings and ends of a few known genes, but it's mostly just a string of 3 billion As, Cs, Gs, and Ts. We have the letters; now the question is, what are they saying?

This is where bioinformatics comes in. These are computer tools that help scientists manage, understand and read the language of the genome. These tools are necessary because of the volume of information (remember the 400 textbooks). The use of computers rapidly increases the speed with which we can analyze and discern the information encoded in a genome.

In fact, the determination and assembly of the sequence of the human genome in the first place would not have been possible without computers, and the people who know how to program them. Now we turn again to computers to expedite the process of discovery.

What kind of information is encoded in the genome for humans? First of all, as a result of having the genome sequenced, we know about how many genes we have; it's around 45,000.

What do these genes do? It's possible that every human characteristic and trait has some roots in the genome.

Your genes provided the blueprint that directed your development from one cell that can barely be seen with the naked eye to you and how you look. You are composed of thousands of cells, of thousands of types, all organized and patterned and put together appropriately.

Your genes influence how you act, the fact that you can speak, how fast you can run (swim and bike), what might make you sick, and how related you really are to that fruit fly buzzing loose in your lab. And those are just some of the ones we know about. These bioinformatic tools may also reveal the genetic basis of certain human conditions.

We are very interested in the genes that cause disease, those good genes that go bad. When your doctor enquires about your family history, that is because some diseases "run in families," which means they have a genetic basis. That means that there is a gene, or genes, that when altered or mutated, gives rise to some outcome that is harmful to human health.

How do these genetic mutations work? A mutation is a change in the sequence of a gene, a change in the way those four bases are arranged. Sometimes it can be the change of a single base, in a gene that is thousands of bases long.

Using our language analogy, let's look at the difference that one letter makes in the meaning of a sentence, and the resulting implications. Let's take the sentence "The talk is here", with here spelled h-e-r-e. Now, let's say the final word was misspelled as "h-e-a-r". It sounds the same, but on paper it doesn't make any sense. This is what we would call a nonsense mutation. Now let's go back to our original spelling of here, h-e-r-e. This time let's add a letter, a t. This makes the sentence "The talk is there." What if "here" means the Great Room in the Stern Center, and "there" means the cafeteria in the HUB? Two completely different outcomes would result. Imagine that such a mutation happened today. It might result in some very confused people hoping to learn about bioinformatics waiting in the dinner line.

Mutations in the genome work in similar ways. They change the meaning of the gene, which may change the protein that it encodes. The protein might no longer work, and as a result, cause harm to the individual carrying the mutation.
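A toy example shows how changing one base can change the protein. The sketch below is a simplification under stated assumptions: it reads a made-up nine-base "gene" three bases at a time directly from the DNA (skipping the RNA intermediate), and it includes only the four entries of the standard genetic code that this example needs. The names `translate` and `point_mutation` are illustrative.

```python
# A deliberately tiny slice of the standard genetic code: just the four
# three-base "words" (codons) used below, with the amino acids they specify.
CODON_TABLE = {
    "ATG": "Met",
    "GAG": "Glu",
    "GTG": "Val",
    "AAG": "Lys",
}


def translate(dna: str) -> list[str]:
    """Read a DNA string three bases at a time and look up each codon."""
    usable_length = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable_length, 3)]
    return [CODON_TABLE[codon] for codon in codons]


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with a single base replaced."""
    return dna[:position] + new_base + dna[position + 1:]


original_gene = "ATGGAGAAG"                           # made-up toy sequence
mutated_gene = point_mutation(original_gene, 4, "T")  # GAG becomes GTG

print(original_gene, translate(original_gene))
print(mutated_gene, translate(mutated_gene))
```

A single letter changes, and the middle building block of the toy protein changes from Glu to Val. Whether such a swap is harmless or harmful depends on the role that building block plays in the real protein.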

Some genes linked to human diseases were identified in what has been referred to as the pre-genomic era, before the completion of the human genome. These identifications took years of hard work, and the efforts were usually focused on a single gene, for a single disease. The identification of genes linked to human diseases represents only the first step in the long process of determining how a mutation in that gene causes the respective disease, and how it might be fixed.

Now, with the existence of the human genome and the appropriate computer tools, this process is expedited. First, the scale of a search can be broadened from a single gene to all 45,000 in our genome. These candidate genes can be identified much more easily, as can the mutations in those genes, so that the work of determining what goes wrong and how to fix it can begin.

The following represents just a sampling of human conditions that have been linked to a genetic basis:
Breast cancer
Cystic fibrosis
Nutrition issues
Psoriasis
Heart disease
Osteoarthritis
Autism
Neurodegenerative diseases

The sequence of chromosome 21, an extra copy of which causes Down's syndrome, has been determined.

The genomes of a number of organisms that cause human diseases have been sequenced. These include the microbes that cause tuberculosis, cholera, and plague.

Bioinformatic tools will revolutionize not only the identification of genes associated with disease, but also the development of very targeted cures. Some patients do not respond to a given treatment or therapy. That is because the disease may result from a different type of mutation. As was revealed in the example I just used, a gene can undergo mutation in a variety of ways. The application of bioinformatics may allow the development of drugs tailored to an individual's genomic makeup. The application of bioinformatics to generate individual-specific therapies is called pharmacogenomics.

The word genomics has also become a popular suffix. It means "to study the genome." The suffix refers to the study of the genome to learn something about the prefix, pharmacology in this case.

These tools are revolutionizing the process of biological discovery, and we will ensure that our students are right there, discovering. Familiarity with these types of tools and their application will become a fundamental aspect of the Dickinson undergraduate biology curriculum.

I'd like to read to you a quote from a News Feature from the issue of Nature (2001. 409: 758-760) that published the publicly funded draft of the human genome. The Feature is entitled "Are you ready for the revolution?" The tag line is "If biologists do not adapt to the powerful computational tools needed to exploit huge data sets, they could find themselves floundering in the wake of advances in genomics."

Not us, we know what that word "genomics" means.

The quote is the following: "In the long run, change will come through the emergence of a new breed of biologists who are steeped in computational biology as an integral part of their education. This means that the subject must be included as a core module in all undergraduate biology courses, rather than as a specialist option."

At Dickinson we are nurturing more than just biologists who are ready for the revolution. First, we are nurturing the leaders of the revolution, and second, not just biologists. Our vision is to cultivate students who are well versed in these tools, in their respective disciplines. This might be the psychology student, who is modeling neural networks in the brain, the chemistry student who is predicting the interaction of an enzyme with its substrate, and the computer science student who wrote the programs for both of them.

These tools also allow us to explore the very nature of what it means to be human. The sequencing of the genomes of many other organisms, the number is nearly 50, allows us to probe our molecular origins and relationships with our non-human neighbors. We feel pretty good about that 45,000 gene number until we learn that fruit flies have nearly a third as many. At least they don't have more.

And, we have the opportunity to explore who we are as individuals. Are our personalities and the wide range of emotions that we get to experience (love, worry, anger, pride, confidence, and happiness), those factors that make us "us," rooted in our DNA, and if so, how deeply? Or are they "emergent properties," those unique, unanticipated outcomes from our collections of genes? The nurture versus nature debate will take on a whole new spin.

It should be cautioned that with the power of these bioinformatic tools comes a responsibility. That is the ethical and moral use of the information produced by these tools, particularly with respect to the human genome. I feel that we at Dickinson, as purveyors of a useful education in the sciences and liberal arts, are uniquely poised to produce both leaders in the gathering of this data and responsible caretakers thereof.

Thank you.