Richard A. Gibbs, Ph.D.
The director of the Human Genome Sequencing Center at Baylor College of Medicine sat down with Texas Medical Center executive vice president and chief strategy and operating officer William F. McKeon to discuss the value of integrating genetic data, and a future where genomic sequencing can lead to personalized patient care and treatment.
Q | Tell us a little bit about your formative years.
A | I was born in rural, Southern Australia. I went to college in Melbourne and was immediately enamored with biology and biophysics. Serendipitously, I came to the United States to work here in the Texas Medical Center with Dr. Thomas Caskey, at a time when we knew almost nothing about the structure of DNA changes that lead to genetic disease. But it was at a time when the technology was beginning to evolve and the human genome project was just a ‘twinkle in the eye’ of some key leaders. But this was a wave of excitement that began at that time.
Q | Where were the epicenters of discovery at that time?
A | The early genomics groups were widely distributed at that time. There were many groups that were involved at the genesis of the human genome project. There was a period of discussion in the late 1980s and a realization internationally that this might be a good thing—even though we didn’t really quite know what the ‘it’ would be. It was similar to the Mars mission now. For a while we talked about the ‘moonshot of biology,’ but really because there are so many challenges, it is more like today’s view of the Mars journey.
That was the mood for the early thoughts of the human genome project. Not even having a clue about what the good way to get there was. So there were probably about 20 groups that were seriously engaged in the project internationally. And of course, there were model organisms that were also tackled because they were simpler problems. So there was a period then, towards the late ’80s, when things began to ferment. That was before the project really hit the ground in the very beginning of the ’90s. It began with trying to improve the technology, a long way from being able to sequence the first human genome. The methods were really very crude by today’s standards.
Then there was a collapsing down of the number of groups that were involved. This reduction in the number of participants was partly because of the focus of the funding agencies. But also there was a realization by the participants that completing a human genome was a full-time, long-term commitment. It wasn’t something one could do part time! You really had to want to get this task done. So the struggle to get the methods to high enough efficiency continued on through the mid ’90s. Then, just about the moment that we saw the light at the end of the tunnel, a fierce competition with a private group emerged. That really kicked the public groups into action, because all those involved in the human genome community were committed to the idea that these data should be freely available to all researchers. The idea there was that we didn’t want a private group to sequence the genome and exercise DNA patents, like a ‘land grab.’ We wanted to stimulate research, not hide information away and lock it up into early patents.
So we fought that battle and won. We got the data out there into the public domain by the early 2000s. That was the whole 13-year project, from 1990 to 2003-2004. It was originally supposed to be a 15-year project with a $3 billion budget. We did it a little faster and cheaper.
Q | I’ve heard people describe early genome sequencing as being like a New York City phonebook. There is a lot of information in there, and it’s hard to know what to do with it all.
A | Great analogy, except at that time we did not even know how many telephones were in New York or how to organize them! That would be the better analogy. So it wasn’t just taking existing phonebooks and stacking them in a pile. It was really figuring out what a phonebook might look like. There are very basic principles of biological research that were forged during that period. Those are some of the unrealized contributions of the human genome project, and more subtle transitions that occurred during that period, which people are really feeling the ramifications of now.
So what are those things? There is digitization. Biology was completely an analog science up until then. That’s really critical because with digitization you have a precision and an operability that you don’t have otherwise. Comprehensiveness was emphasized: the idea that you don’t just nibble at the side of a problem. Instead you slice it, you dissect it fully and then you completely describe it. That’s really a fundamental principle that’s practiced widely now.
Free and open data release was also a product of the project. Historically, scientists are very secretive about their data, right up until they publish it. We changed that principle. We developed a model where data can go straight from a machine, into the public view. That’s a huge contribution and one that is echoed in many projects now. In fact now, if you do a large project in biology, it’s very difficult to get traction and support unless you are supportive of free data release and free sharing. Those principles really have been transitional.
Another one worth mentioning is simply scale. Biologists historically have been small thinkers. I really mean focused thinkers. They look down and practice reductionism to figure out problems. But the human genome project challenged that. I am not sure that is recognized. Now, when big projects come along, like Google mapping the world, people will often say ‘Well we can map the entire planet and have a click on view of every room in the world.’ You think that’s pretty amazing, but biology really got that going. Let’s take something as vast as the human genome and have a comprehensive complete view of it. So those principles really came out of the human genome project.
Q | Talk about the cost of sequencing a genome today, as compared to the past.
A | The advances are astounding. It was three billion dollars for the first half genome. We have two copies in each of us, so we only sequenced one for the reference. It was also a mosaic of many people’s DNA, and that’s what you see in the reference database now. If you get a new sequence, you compare it to that reference. The reference is pretty refined, but it’s still only a half copy and it cost three billion or so dollars to produce. Now, we have tremendous advances in the DNA technologies. These are all technologies that were around at that time, but it took this long to do the engineering required to support them. But now, we are talking realistically about the $1,000 genome. Today it costs anywhere from $5,000 to $8,000 for a genome sequence, but we’re really heading to a point where a $1,000 genome is realistic. Now we’ve also got methods where we can look at just the interpretable part of the genome, the one percent that contains the genes. That cost today is now about $600, and we’re thinking we can get that down to under $100. So that’s very affordable. This is really key because when you think about general use, where a genome sequence becomes as accessible as an X-ray, that’s the order of cost that you need.
Q | With this technology coming faster and being less expensive than it has been in the past, what should we envision in the next five to ten years relative to genomic data?
A | I think it depends on how far you want to project, but in the five to ten year time frame, it’s almost certain that genome sequencing will be a routine part of your medical workup. That is, unless you have some personal objection. But from the medical point of view, there’s no rational reason to object. So because this is inexpensive and comprehensive, there are issues that can be discovered within that data that may be critical to you. It will be a standard of practice. That’s a reasonable prediction.
Q | How much of what you’re doing is starting to affect decisions on the therapeutic side?
A | Well, if you’re asking about what we can do today, it really does depend on your definition of intervention. It is growing, but we have some way to go. For example, we can ask today, if you are a cancer patient and you have your tumor genome sequenced, how often would that data impact your care? That’s an important question. The answer ranges from five percent to 25 percent. In five percent of the cases, you can find an answer that is truly directive for your therapy. This is Herceptin or other drugs that are targeted to specific genetic changes. There’s a clear direct link there between what you’ll find in the cancer profile and what can be chosen for therapy. Now, to go from five percent to 20 percent, the definition of intervention needs to be ‘useful information.’ Then that includes a patient being steered towards a new trial or towards another drug that may not have been in the frontline before. Then there may be a subtle shift in the therapeutic regimen, so I think we’ve got ourselves in a bit of a corner if we paint the picture that this is truly directive for all patients at this time. If you are in the group that is helped, this is already very important. In the next five to ten years, more and more will be in that category.
Q | Can you tell us about the team you have built within the Human Genome Sequencing Center?
A | We have an accumulation of expertise. If you look back a decade, DNA sequencing was an art form. Now, even though many aspects of the process are ‘plug and play,’ the process is still challenging. There are parts of the genome that are very hard to understand and physically difficult to decipher. And it turns out that there’s actually enrichment of those hard-to-decipher regions among the ones that cause disease, so the challenge is even harder than it might seem at first. So what you get from some of these new machines with the standard methods of analysis often needs to be improved by human experts.
Next, there is the general question of data analysis. Even if you have high quality data, how do you interpret it? That’s a whole industry on its own. There are complex challenges, including those that result from the databases of reference information not being perfect.
Q | If a patient came to you with a specific health issue of concern, what are the various techniques that could be employed?
A | So the cornerstone of this whole branch of science and medicine is genetics and genetic determinism. When you ask ‘when is something genetic’ and ‘what is the evidence that something is genetic,’ the best thing to do is to look and see if it runs in families. When you have an aggregation of a disorder with very similar pathology within individuals in a family, the immediate logical jump is that it likely has a genetic component. Now in some cases, that genetic component is very easy to track. If the disorder is very clear and its pattern of segregation in the family is unambiguous, then that is a simple genetic problem often caused by a single gene, and a single letter change in the genome. You can go work with a family like that and track the gene for a disease successfully.
Other disorders just have a loose tendency to run in families. Examples are many of the behavioral disorders and cancers. We know there are genetic contributors and we know that understanding those will help us understand the mechanism of the disease. But it is a much more difficult problem to track down the genes for these common conditions. This is the big challenge in genetics and we are making good progress.
One of the things we discovered in the last five years is how much natural variation there is between each of us. We have known for a long time that if we sequence any new person, there would be about three million differences between them and the reference database. But we’ve learned more recently that even in the gene regions (the one percent that really matters) there are hundreds of individual DNA changes that we will not likely see in anybody else on the planet. So we’re very different genetically from each other. Now that’s thinking just in terms of population genetics and the structure of life. But it is also a big practical problem when you try to figure out these genetic stories.
For example, we can examine different families with multiple siblings, some of whom have heart disorders, to find genes that cause adult heart problems. But when we do that, there are going to be a lot of parts of the DNA sequence that are unique to individual families, but that are not related to the disorder. So we have an issue of scale. We can’t learn things from three families, but perhaps we can learn something from 3,000 families. So if we want to solve these genetic problems and we want to apply the genetics tools we have developed, we really need to work with more families.
There is a big sea change in the ability to scale these activities. Historically, research and clinical care have been two activities that are well separated. For example, if you told your clinician about a family history of heart disease, I’m sure that you’d get that acknowledged but it would not have led to you becoming a research subject. If you heard about a heart disease study you might volunteer as a research participant, but not through your regular physician. Previously, the research investigator would have looked at enough families who volunteered in order to make a discovery about a particular gene. Then the researcher would declare that gene important in these families, and perhaps develop a test for it and then tell the physician ‘You should test for this gene in families like that.’ That’s the current cycle that we have.
What we envision in the future, when the act of sequencing entire genomes is more routine for all sorts of reasons, is that we can generate the data to complete that kind of research activity in a much faster and more direct way. So it would be as simple as this: an investigator could come to the clinical databases that have DNA sequence information and, with the right IRB approvals and with consent, mine that data and say ‘Hey, look at this. In all the families with that history of cardiac problems we find these genes have these mutations every time.’ Hence, the discovery would be catalyzed by better collection of clinical data together with DNA sequence. It will be a faster and more efficient cycle.
Q | You’ve played a really key role in leading the new TMC Genomics Institute design team. What would be your vision for a world-class clinical genomics program?
A | Consider some of the things I said earlier about how genomics has changed research. One of the changes is the definition of deliverables in research. The genome project showed that quantifiable deliverables can be a part of a dynamic and active and flexible research program. So in the case of the human genome, we said ‘We’re going to determine this many letters’ and we then did that. This basic concept has become a quality of many different projects in biology. Now we can say that with new patients in the medical center, we quantitate the number that we can advantage by providing their genome sequence data, in their medical record. We can be specific and quantitative about the classes of patients, the needs of different groups of patients and the speed with which we can deliver genomic data to them as an adjunct to their current care.
Q | So for families in the future, when sequencing may be as common as getting an X-ray, where do we begin to set that new standard of care?
A | So it’s all about risk and familiarization. I think the clearest place to start is in reproductive health and in the administration of carrier testing for couples planning to have children. Clearly you want to be aware of the possibility of some inborn error that is inherited. That’s the easiest scenario to conceptualize. But the impact of these technologies actually goes on from there. For example, you can monitor the fetal genome early in development and that can impact care as well. There are really three main fronts in this area. One is these very early predictors to improve child health. The second one is in cancer prediction. Of course, many of us carry cancer predisposition genes and the story of the breast cancer gene is the most dramatic because you have the clear impact of the locus of the gene and the clear clinical follow up that you need if you are at high risk. But there are many other genes that impact cancer and should be considered. The third category of risk is where we’re lagging behind the most. You are an essentially healthy adult, you’ve had all of your cancer screens and you don’t have cancer in your family, why would you want your genome sequenced? What we tell healthy people right now, with the current state of knowledge, is that your need to have your genome sequenced today is minimal.
But the fact is that there are many adults who are not healthy—or who have family risk they have not considered—or soon will have disease.
We can’t sequence everybody today. Even at $100 a person it’s still too much of a burden. But as we see these methods grow to a greater efficiency and an even greater interpretability, and as we build programs that are based upon new families and upon people with pre-malignancy, we are going to see the numbers increase.
Q | Tell me about your vision for a TMC Genomics Institute.
A | I think that the unique nature of this opportunity is a reflection of the wonders of the TMC. When we talk about the mission of enhancing discovery to drive better health care, then we’re talking about programs that require scale and integration of very different kinds of data and effort. In a sense, this new opportunity is a kind of reflex from the specialization that we see in the current institutional structures. We really want to integrate the range of data that come from newborns to adults who have cancer. That’s the whole range there. Right now, if you’re in one category or the other, you basically go to different institutions. This is a chance to network and integrate the data and really synergize the information to create new discoveries. These new discoveries will improve care.
Q | Do you see that exist anywhere in the world today?
A | No. I think that is what keeps many of us here in Houston and so excited every day. When you look out the window of our buildings, you see the vast, rich opportunities here. I don’t think there is anything quite like the TMC elsewhere on the planet. Certainly not from what I’ve seen. You see elsewhere the obstacles to achieving this kind of integration and comprehensive amalgamation of data.
Q | You were recently awarded the Companion of the Order of Australia, a prestigious honor issued by the Australian government. What did it mean to you?
A | It’s very nice to get awards. It’s a reminder of the special opportunities that you have—to be part of the human genome project, for example. To be here and to be part of the teams that work here. It is an amazing community. It’s a synergistic community and when you look around the rest of the world at different places and you see the failures of others to interact, you realize that we’re doing pretty well here. We really are. So to be part of the TMC, and be part of the process, is a privilege. To have your life’s work do something that is enriching and improving other people’s lives, that’s just a really special opportunity. The accolades are nice, but that’s nothing compared to that. But I do get to go to Australia and have all my family with me. We’ll have a nice dinner at the governor’s house. That’s going to be fun.
Q | Tell us about some of the mentors who have impacted your career.
A | Well, the medical center here has a tremendous asset and legacy in Dr. Tom Caskey. He virtually invented the field of human molecular medical genetics. He was an early force in the conceptualization of the human genome project. And, of course, Dr. Michael DeBakey. I only had a small number of interactions with him, and he was a no-nonsense person. His emphasis on excellence is so important. If you do something, do it totally! That very basic concept is very powerful and should influence any investigator or clinician. So those two mentors were pivotal.
Q | You’re young in your career, but have already accomplished so much. What are you most proud of?
A | Interesting question! I think maintaining the standard of intellectual excellence in genomics and being a proponent of that. For a long time, and even now to some extent, there’s been a division between what’s regarded as experimental science—which is supposedly more intellectual—versus technology-driven science. So the notion was that somehow if you’re doing something with demanding technology, then that means you are not bringing intellectual wisdom to your work. That is simply wrong. I always tell students that their colleagues initially called the great evolutionary biologists of our time ’bug collectors.’ So I think to be able to drive these technologies and the idea of high throughput biology and computational approaches to understanding biology into the practice of biological research is an achievement. There are individual inventions along the way, but this general achievement is probably the most important.