Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life

Monthly Archives: September 2012

The rise of “entertainment-science”

I wanted to write about the very interesting back-to-back Cell papers on c-myc (this and this). But at the last second, I changed my mind. I want to write, instead, about what people have been writing about all week… that is, the paper by a French scientist attacking the safety of genetically modified food. Genetically modified organisms are crops and animals that have been genetically engineered to improve their efficiency and productivity, from pest resistance to higher milk production. They have been around for some time now and, despite countless studies, there is very little evidence calling the safety of the whole category into question…

The controversy, however, is alive and kicking, and the Seralini et al. paper drops the hammer on the safety of GM maize. I don’t want to talk about the shortcomings of this paper; you can get that from other sources (e.g. here and here). Instead, I want to talk about a broader phenomenon: the rise of entertainment value in controversial science:

  1. I am sure there are many world-class scientists who focus on GM-food safety. But I don’t know any of them, do I? Instead, I and many like me (who read science news) know Seralini, mainly because his papers are always controversial and go against the consensus of the field. There is nothing inherently wrong with that… but I think extraordinary claims require extraordinary accompanying evidence. It is OK when papers show up that go against the norm: either the whole field is wrong (which sometimes happens) or there is something wrong with that study (which happens quite often; case in point: faster-than-light neutrinos). What I find weird, though, is that these studies get WAY more publicity than they should. And at the end of the day, it makes us scientists look bad. People hear about all these extraordinary findings that are then debunked in a couple of months… no wonder we have problems with the public perception of science.
  2. From linking vaccination to disease to discovering a new arsenic-based form of life, we have let science down again and again… not by making mistakes. Being wrong is fine; it is the first step towards getting it right. Rather, we have done it by pushing our results into the spotlight: holding press conferences, enforcing gag orders on collaborators and pulling out all the PR stops. I assume soon we’ll see trailers for upcoming publications on TV (“…starring POLR2A as RNA polymerase II”), with entertaining twists of course.

The majority of science still works the way it should, but that is not the part we read about in science news sections. I want to think scientists are better than this PR stuff… We were supposed to be skeptics, we were supposed to be above this… we were supposed to be living in our ivory towers. We are not supposed to care about the results of our experiments (positive or negative); we are supposed to be searching for truth… or so I thought.

Synthesizing a genome like a boss

Despite recent leaps in the artificial synthesis of long custom DNA sequences and the hype surrounding large-scale projects (e.g. the first synthetic genome), the process is far from mainstream. Synthesis of genes or even minigenes is still expensive, slow and tedious. As a lazy scientist who despises cloning and would prefer to synthesize the whole construct at the touch of a button (or the click of a mouse), I am all for methods that advance this area of biotech. So, I am very excited about this paper that recently showed up in Nature Methods (and I think it’s pretty smart).

The rationale here is based on the fact that short-DNA (i.e. oligonucleotide) synthesis, which forms the basis for longer molecules, is still very error-prone, and finding a molecule without mutations requires directly sequencing many instances of the final product (only a small fraction of the final products are mutation-free). What the authors have accomplished is to shift the sequencing step to the oligo level. Basically, they tag all the oligos to be synthesized with random barcodes, sequence the whole population using high-throughput sequencing, identify the correct oligos and use their barcodes to enrich them from the initial population using specific primers and PCR.
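
To make the selection step concrete, here is a minimal Python sketch of the logic. The sequences, barcodes and the helper name are entirely hypothetical; the real pipeline works on millions of sequencer reads and has to handle errors in the reads themselves:

```python
# Sketch of the barcode-based selection step described above (hypothetical data).
# Each synthesized oligo carries a random barcode; after sequencing the pool,
# we keep only the barcodes whose attached oligo matches the intended sequence.
# Those barcodes then serve as primer targets for PCR enrichment.

INTENDED = "ACGTACGTACGT"  # the sequence we ordered

# (barcode, observed oligo sequence) pairs recovered from sequencing
reads = [
    ("BC01", "ACGTACGTACGT"),  # error-free
    ("BC02", "ACGTACGAACGT"),  # substitution error
    ("BC03", "ACGTACGTACGT"),  # error-free
    ("BC04", "ACGTACGTACG"),   # deletion error
]

def error_free_barcodes(reads, intended):
    """Return barcodes tagging oligos that match the intended sequence."""
    return [bc for bc, seq in reads if seq == intended]

keep = error_free_barcodes(reads, INTENDED)
print(keep)  # ['BC01', 'BC03'] -> design PCR primers against these barcodes
```

The key design point is that sequencing happens once, on the cheap oligo pool, instead of repeatedly on assembled full-length products.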

I assume companies that synthesize custom DNA can use a single sequencing run to find the correct oligos for multiple independent orders, thus significantly reducing the cost of sequencing. However, high-throughput sequencing is still slow, so I assume this method doesn’t significantly cut the time requirement of the production phase… but I’m not too familiar with the industrial pipeline, and maybe it does help. I think we’ll know soon enough.

Of horses and men: the genetic makeup of racehorses

Horse locomotion and speed are among the most complex behaviors that people seem to be interested in (for obvious reasons). There is some correlation between how a horse runs and how fast it runs. In other words, it seems that there are successful styles of running, and these styles can be treated as phenotypes (or traits) and effectively studied through genetics. In general, genetic studies of dogs or horses have been significantly more successful than those of humans, partly due to tightly controlled mating across breeds and partly due to excellent record keeping by breeders over many generations. Now there is a paper out in Nature that looks at pacing in Icelandic horses, a trait with high heritability in this breed, and successfully maps the phenotype to a nonsense mutation in DMRT3.

The study includes an association analysis between 30 horses that don’t pace and 40 that do, which resulted in the discovery of a highly significant SNP (single-nucleotide polymorphism) on chromosome 23. Genome re-sequencing in this region identified a nonsense mutation in DMRT3 as the likely causal candidate.
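
For illustration, a case-control comparison of this size can be tested with a simple 2x2 exact test. The genotype counts below are made up for the sketch (the paper’s actual counts and statistical pipeline are not reproduced here):

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    the probability, under the hypergeometric null, of a table at least as
    extreme as the one observed (count `a` or larger)."""
    n = a + b + c + d
    row1 = a + b          # size of group 1 (e.g. pacers)
    col1 = a + c          # total carriers of the allele
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

# Hypothetical genotype counts (NOT the paper's actual numbers):
# 38 of 40 pacers carry the stop allele vs. 2 of 30 non-pacers.
p = fisher_exact_one_sided(38, 2, 2, 28)
print(p < 1e-6)  # True: with counts this skewed, the association is very strong
```

In practice a genome-wide scan applies a test like this (or a mixed model) at hundreds of thousands of SNPs and corrects for multiple testing, which is why a single hit has to be "highly significant" to survive.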

What distinguishes this study from similar ones I have read over the years is that the authors closely follow up on the functionality of DMRT3 and its mutated form. They make the case, using mouse models, that this protein functions in neural development. And this is the part that gets me excited… this study sets a bar for genetic projects. It’s not enough to just list mutations in a bunch of genes along with their contribution to the phenotype. We need more mechanistic and functional results that can actually augment our knowledge to a degree that a simple gene/mutation list cannot. I am sure this is not a perfect project either, and if we look closely there are things that could have been done differently or better. Nevertheless, it signals the arrival of a new kind of genetic study, one that is more function-oriented.

The role of the regulatory genome in human disease

A recent paper in Science perfectly captures the post-ENCODE mood of the community. It seems like we suddenly realized the coding genome is not actually that important. We have remarked over and over in the past couple of weeks that the majority of hits from genome-wide association studies (GWAS) actually map to non-coding DNA as opposed to coding sequences. And now, armed with the knowledge that the non-coding genome has far-reaching regulatory consequences, it is very likely that the genetic component of many complex human diseases is in fact driven by regulatory interactions. This Science paper very clearly portrays that idea. The authors use DNase I hypersensitivity data as a proxy for the parts of the genome that are bound by proteins in vivo. They then look at the overlap between DNase I hypersensitive sites (DHSs) and the available phenotypic and disease data. They show that many variants at these sites have regulatory consequences, and they make the case that the role of the regulatory genome in disease is ubiquitous and profound.

As I said, this study captures the consensus viewpoint of the community very well, and I think we’ll see an explosion in these types of studies, putting the regulatory genome front and center as opposed to the coding DNA. With the low-hanging fruit of genetic studies already picked, and the emergence of effective methods based on high-throughput sequencing, we are now poised to better understand regulatory networks in all their glory.

Decoding the ENCODed DNA: You get a function, YOU get a function, EVERYBODY gets A function

It has been almost half a century since we started drilling the concept of the “central dogma” (DNA->RNA->protein, which in some sense equals life) into the psyche of the scientific community and the human population as a whole. The idea was that everything which makes us human, or a chimp a chimp, is encoded in As, Gs, Cs and Ts, efficiently packaged into the nucleus of every cell. Every cell, it went, has the capacity to reproduce the complete organism. What seemed to be missing from our daily conversations (or was conveniently omitted) was how the cells in our body come to have such different fates if they start with the same information, which they hang on to for the entirety of their lifespan.

The answer came, miraculously enough, from Jacob and Monod and their work on the lac operon in E. coli: it is not the book, but how it is read, that defines the fate of every cell. Which parts of this genomic library are transcribed (into RNA) and expressed (via the protein products) is ultimately decided by the “regulatory” agents toiling away in the cell. These regulatory agents come in many forms; the first generation discovered were themselves proteins (first repressors and then activators). Then came microRNAs, small RNA molecules that locate specific target sequences on RNA molecules and affect their expression (for example by changing the lifespan of an RNA molecule). By now, we have identified an arsenal of these regulatory mechanisms: chromatin structure (how DNA is packaged and marked affects its accessibility), transcription factors, miRNAs, long non-coding RNAs and more.

At the end of the day, it seems that the complexity of an organism largely stems from the diversity and complexity of these regulatory agents rather than from the number of protein-coding genes in the genome. It’s like chemistry: the elements are there, but what you do with them and in what proportions you mix them gives you a functional and miraculous product.

Genome Project

The “Human Genome Project” was the product of the classic “central dogma”-oriented viewpoint. Don’t get me wrong… it was a vital project, and much of what we know now depends on it; however, the project was initially sold as the ultimate experiment. If we read the totality of human DNA, the reasoning went, we’ll know EVERYTHING about humans and what makes them tick. But obviously, that wasn’t the case. We realized that it is not the DNA sequence alone but the regulatory networks and interactions that matter (hence the birth and explosion of the whole genomics field).

The ENCODE project


The ENCODE project was born from this more modern, regulation-centric view of genomics. The recent Nature issue published a dozen papers from ENCODE, along with accompanying papers in other journals. This was truly an accomplishment for science this year, rivaled only by the discovery of the Higgs boson (if it is in fact the Higgs boson) and the Curiosity landing on Mars. At its core, what they have done in this massive project is simple: throw every method we have for mapping regulatory interactions at the problem, from DNase I footprints to chromatin structure and methylation. And what they report as their MAIN big finding is the claim that there is in fact no junk DNA in the genome, since for 80% of the genomic DNA they find at least one regulatory interaction, which they count as “functional”.
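
That “80%” style figure is, at its core, an interval-coverage calculation: pool all the reported regulatory regions, merge the overlaps, and ask what fraction of the genome they touch. A toy sketch of that computation (hypothetical coordinates on a 100 bp "genome", not ENCODE data):

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by at least one interval.
    Intervals are half-open (start, end) pairs; overlaps are merged
    so shared bases are not double-counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged) / genome_length

# Toy genome of 100 bp with three (partly overlapping) "regulatory" regions
regions = [(0, 40), (30, 60), (70, 80)]
print(covered_fraction(regions, 100))  # 0.7
```

The interpretive fight, of course, is not about this arithmetic but about whether "touched by at least one assay" deserves the label "functional".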

As I said, this was a great project and will be a very good resource for our community for many years to come. But there are some issues that I want to raise here:

  1. I think we’re over-hyping this. Not every observed interaction implies “functionality”. We already know from ChIP-seq datasets that, for example, transcription factors bind to regions other than their direct targets. Some of these sites are in fact neutral, and their interactions may very well be biochemical accidents. One might counter that if the number of transcription-factor molecules is limited, these non-functional sites may show some functionality by competing with actual sites and decreasing the effective concentration of the transcription factor in vivo.
  2. The take-home message from the ENCODE project seems to be debunking the existence of “junk DNA”. But to be honest, not many of us thought the genome had a significant amount of junk anyway. I am sure that ENCODE provided us with a great resource, but pointing to this as its major achievement does not seem logical. To be honest, I think a resource project like this doesn’t really yield an immediate, obvious groundbreaking discovery; however, policy makers want to see something when they fund these types of projects… and this is one way of giving it to them.
  3. Funding is another issue here. This was a very expensive endeavor (200 million dollars, was it?). Now, I am all for spending as much money on science as possible; however, that is not happening, and funding in the biosciences seems tight nowadays. We can legitimately ask whether this amount of money might have been better spent on 200 projects in different labs as opposed to one big project. A project, let me remind you, that would have been significantly cheaper to do in the near future thanks to plummeting sequencing costs. I’m not saying ENCODE was a waste of money; I just think we’re at a point where things like this should be debated across the community.

Nevertheless, the ENCODE consortium should be commended for performing one of the most well-coordinated projects in the history of the biosciences, with astounding quality. Compared to the Human Genome Project, this was a definite success. I have never seen the community this amped up, with everyone poring over the gorgeous interactive results, going through their favorite genes and making noise on Twitter. This is a proud moment to be a biologist… I think we have officially entered the post-“central dogma” age of biology.

Living an “organic” life: debunking the supremacy of the organic produce

A couple of years ago, I took a course titled “The use of science in public policy”, taught by Prof. Lee Silver at Princeton University. The goal of the course was to bring basic-science and policy students together and expose them to the challenges generally faced by each group. The take-home message for me, as a scientist, was the paucity of black-and-white issues and how complex even mundane policies become when they’re applied to an entire society. What shocked me the most, however, was how difficult it is to bring the scientific worldview, which is sometimes counter-intuitive, into policy making. And this is nowhere more obvious than in issues like “genetically modified food”, “homeopathic medicine” or even “organic food”. Organic produce, which has supposedly been grown chemical-free, is branded as “natural food” and has formed one of the most successful industries in the US, with growth of about 10,000% over 10 years. What distinguishes organic food from conventional products, however, is for the most part its sticker price. But consumers are buying organic products in droves, assuming that the health benefits far outweigh the cost. There have been studies looking at this claim, but a recently published study in Annals of Internal Medicine does a good job of bringing together all the available data from many years to perform an effective meta-analysis of the subject.


I think the main challenge in discussing organic produce is the laxity of standards in its definition. From what I understand, every farm has its own way of defining organic food. And without a national standard, it is more difficult (although not impossible) to study the health benefits of this category. Nevertheless, organic farms have been very successful in convincing consumers on this matter, so much so that other industries are taking notes. Case in point: the rise of “organic laundry” and “organic detergents”, which (spoiler alert) have nothing to do with “organic” in the sense we use in our everyday lives.

In general, industries seem to be very good at persuading the public about the health benefits of food (e.g. brand names like Vitaminwater). Also, as humans, we seem to find it difficult to reason about non-linear relationships; for example, if a little bit of vitamin E is essential for your health, adding as much as you can to your diet should surely be even more beneficial. Teaching the public that there isn’t a linear relationship between health impact and intake seems to be a daunting task. Labels that connote “nature” are very potent, which to me is very counter-intuitive. Nature is full of dangerous toxins… not every natural product is beneficial.

Another important challenge has to do with how well we can disentangle “organic food” consumption from other aspects of life. We can assume that people who care enough to pay 2-3 times more for organic produce, just because they think it’s healthier, are also more likely to go for their annual checkups, exercise and, in general, be concerned about their health. Statistically speaking, it would be very difficult to correct for these covariates without doing a true double-blind experiment. Double-blind experiments need sponsors, which leads me to ask whether the government may in fact have a role to play in this matter. At this point, the FDA doesn’t regulate anything that is “natural”, which puts organic food outside its jurisdiction.

Given these challenges, it is rather obvious why we needed several decades’ worth of data to be able to perform a decent meta-analysis. And to be honest, I still think this study could be much better and further removed from academic hype.


Despite these challenges, I think the researchers have done a decent job of analyzing the data. They find very little evidence in support of organic produce. Sure, they see marginally higher pesticide levels in conventional produce, but the levels are far below the risky threshold. Also, organic farms do use pesticides; they simply use natural ones instead of synthetic ones, and we don’t know whether natural pesticides are in any way safer. More importantly, because natural pesticides are less potent, higher quantities need to be used (adding more stuff to the soil and the environment).

There isn’t enough data to fully debunk the perceived value of organic food, but at this point, I’m pretty sure it’s not worth the significantly higher sticker price.

Reading an Ancient Genome

Recently, a paper appeared in Science magazine describing a multinational effort to sequence the genome of an archaic individual (an 80,000-year-old Denisovan girl). It actually created a fair bit of hype, with news snippets abound (e.g. this one) and a nice Wired blog post. Much of the hype, I think, was warranted, and this study offers a blueprint for how studies are shaped in the age of genomics and whole-genome sequencing. I will first talk about why I think this study tackles an important problem and then move on to the methodology and results.

Following the genetic trail: the tale of the third chimpanzee

Looking back, I think my introduction to human evolution was mainly through an outstanding book by Jared Diamond called “The Third Chimpanzee”. A lot has changed since then (although I think the book is still very relevant and a very good read). Many more fossils have been discovered around the world, from Lucy (Australopithecus afarensis), dated to about 3 million years ago, to Homo heidelbergensis and Homo erectus specimens dated to less than half a million years ago. These, at times partial, fossils tell a convincing, albeit incomplete, story of human evolution. However, it was the “Neanderthal Genome Project”, reporting the whole-genome sequence of a 38,000-year-old sample from the femur of a Neanderthal specimen, that turned a page in studying the genetics of human evolution. DNA is a vast collection of information, and comparing these collections between different species portrays a more vivid picture of their evolutionary trajectories in outstanding detail. This information goes significantly beyond what we can learn from the shape of the fossils, their dates and the circumstances of their discovery. It is like finding the black box of a fallen plane: rife with key information that truly shapes our understanding of the events. For example, the Neanderthal Genome Project showed that there had been relatively little interbreeding between humans and Neanderthals (contributing 1-4% of the genome), with insignificant effects on the evolutionary trajectory of the human genome. Our new-found ability to look into DNA has enabled us to reconstruct evolutionary trajectories with unprecedented resolution. Why is this important? Genetically speaking, I think it is by learning where we came from that we can tell where we are headed as a species. And we owe this knowledge to recent advances in high-throughput sequencing.


Sequencing old DNA

DNA is one of the key building blocks of life, and its use as genetic material stems, in part, from its surprising stability. Nevertheless, DNA is susceptible to erosion and degradation, which results in very poor DNA quality when extraction is attempted on fossils. Another important point is that conventional methods for preparing samples for high-throughput sequencing rely on double-stranded DNA, while degradation makes single-stranded DNA a significant portion of the population in fossils. Relying on double-stranded-DNA methods not only loses this sizable fraction of the DNA but also results in an enrichment of exogenous contaminant DNA from bacteria or even humans. For example, in the Neanderthal genome project, a significant correlation was observed between the length of a fragment and its similarity to modern humans, implying that large fragments (which come from higher-quality DNA) were in fact from contaminants. This is the exact problem that this study tackles. The authors developed a sequencing strategy that starts from single-stranded rather than double-stranded DNA. This approach better captures degraded samples, and it is thanks to this enhancement that they succeeded in producing a rather high-quality sequence of the ancient fossil. They achieved more than 20-fold coverage of the genome on average, meaning that each position in the genome was read about 20 times independently, which significantly increases the accuracy of the sequence. In comparison, the Neanderthal project achieved 1.5-fold coverage. This surprising jump in quality is a testament to the effectiveness of the proposed method for sequencing fossilized DNA.
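
To see why 20-fold coverage matters so much, consider a deliberately simplified model: each read miscalls a given base independently with some small probability, and the consensus takes a majority vote across reads. This ignores contamination, ancient-DNA damage patterns and non-independence between reads, so the numbers only illustrate the trend:

```python
from math import comb

def consensus_error(coverage, per_read_error):
    """Probability that a simple majority vote over `coverage` independent
    reads calls the wrong base. Assumes odd coverage and that every
    erroneous read agrees on the same wrong base (worst case for a vote)."""
    e = per_read_error
    return sum(comb(coverage, k) * e**k * (1 - e)**(coverage - k)
               for k in range(coverage // 2 + 1, coverage + 1))

# With a hypothetical 1% per-read error rate:
print(consensus_error(1, 0.01))   # 0.01 -- a single read is only as good as itself
print(consensus_error(21, 0.01))  # astronomically small at ~20x coverage
```

Under this toy model the consensus error rate drops super-exponentially with coverage, which is why the jump from 1.5x to 20x is qualitative, not incremental: at 1.5x many positions are covered by a single read or not at all, so no vote is even possible.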

Why does it matter?

This level of coverage and accuracy enables us to make key inferences both about the individual and about the population she came from. While the fossil was really only a partial finger bone and a couple of teeth, the researchers determined the color of her eyes, hair and skin. More importantly, with such accuracy, we can tell apart the maternal chromosomes (those coming from the mother) from the paternal ones (those coming from the father). What can we do with this information? For starters, we can determine whether the parents were close relatives (in this case, they weren’t). However, while the parents were not closely related, their genomes show portions of significant similarity. This observation implies that the population this girl belonged to had very low genetic diversity, which can be due to population size. Small populations are prone to an effect called “bottlenecking”, in which a few individuals shape the makeup of the whole population, resulting in very low diversity. Another important finding is that the Denisovans (of which this girl is a member) split from modern humans around 180,000 years ago.

What else?

We can gain important insights by comparing this ancient genome to those of modern humans. Looking at the two genomes side by side, the researchers observe that a significant fraction of the differences affect brain development and function. While this may not be a surprising observation (since we already associate modern humans with larger brain size and higher function), it underscores the potential role of brain morphology and development in the evolution of our species.

The plethora of information and knowledge that we gain from a single sequenced genome is outstanding. While we should be cautious about generalizing findings from a single individual to whole populations across Asia or even the world, there is in fact no other source for this knowledge. Similar studies can portray a detailed picture of our genomes and their evolution through the ages. An obvious low-hanging fruit is to apply this novel sequencing methodology to other available samples (including the Neanderthal ones).

The best book of all time

What is the best book of all time? That seems like a stupid question. It’s vague, subjective and quite meaningless. However, before I get replies like “Harry Potter”, “Twilight” or “Fifty Shades of Grey”, let me provide some context. As this is a scientific blog, first and foremost, the answer should be a scientific book… and no, I am not talking about “On the Origin of Species”. I am looking for something grander, something without which Darwin’s publication would not even have been possible. The answer to this question, I think resoundingly, is Novum Organum by Francis Bacon, the 17th-century philosopher-scientist. Why is that? Well, maybe because it formulated the scientific method in its most basic form. For centuries, thinkers, authors and philosophers had been expanding upon the ideas put forth by Aristotle and Plato. Francis Bacon, however, made the case that, for the most part, these were just ideas (what we now call hypotheses). He advocated a new setting in which philosophers could rise above these ideas and formulate new ones. But he also clearly indicated that the validity of these ideas needs to be tested through a rigorous “method” involving data collection, interpretation and even the design of new experiments.

Novum Organum

Now, why do I think Novum Organum is the best scientific book of all time? Because it made the case for the scientific method before it was cool. And most important of all, Bacon didn’t only talk the talk… he actually walked the walk. He contracted pneumonia while studying the effects of freezing on preservation. While I don’t condone working oneself to literal death, we should realize that the scientific process owes an undeniable debt of gratitude to giants who devoted their lives to science. Novum Organum is available on Amazon for $1 (the Kindle edition, that is); I don’t have to tell you that it is worth every penny. I don’t believe in the “Great man theory”. I am not saying that had he not published this body of work, we would still be following an Aristotelian method. If Edmund Hillary hadn’t climbed Everest, someone else would have. But that fact does not make his feat any more trivial or his adventure any less dangerous. The same holds for Francis Bacon and his seminal work, Novum Organum.

Francis Bacon


Genophoria is a portmanteau of “genomics” and “euphoria” and no, it’s not the opposite of genophobia (which is the fear of sex). However, I wish I could say that the pun was not intended. Now, what is my mission? Simply put, my goal is to make my field of study more accessible to the masses. Why is it that what we do is done the way we do it? What are the day-to-day consequences, or is there an overarching story behind a study? I do realize that these posts may just be me rambling about science, the scientific method and common sense. But where is the harm in that? I do believe that scientists and academics need to leave their ivory towers and do a much better job of informing the masses. There was a time when science was “cool”. After World War II, you could even imagine a scientist as president. But not anymore… those days are gone. As researchers shrugged off their newfound glory and retreated into the comfort of their academic bubble, we lost a golden opportunity. We lost our chance to be part of the policy-making process and to bring the rigorous scientific tools at our disposal into the realm of politics. Now, 50 years on… we have all but lost. Now, geeks are just that: GEEKS.

However, I refuse to believe that we cannot change this. I think we can still win the hearts and minds of the masses and show them that science is not just an everlasting pursuit of knowledge; it is so much more than that. It’s a way of life, a worldview, a way of doing things. And everyone should take part in it… everyone should rejoice in a new discovery not by watching it happen on TV but by being part of it. I might not possess the knowledge to send a rover to Mars, but I can be part of the culture that enabled it to happen. A culture that values curiosity, adventure and the power of knowledge.