It has been almost half a century… since we started drilling the concept of the “central dogma” (that DNA->RNA->protein, in some sense, equals life) into the psyche of the scientific community and the human population as a whole. The idea was that everything which makes us human, or a chimp a chimp, is encoded in As, Gs, Cs and Ts, efficiently packaged into the nucleus of every cell. Every cell, it went, has the capacity to reproduce the complete organism. What seemed to be missing in our daily conversations (or was conveniently omitted) was how the cells in our body arrive at such different fates if they all start with the same information, which they hang on to for the entirety of their lifespan.

The answer came, miraculously enough, from Jacob and Monod and their work on the lac operon in E. coli: it is not the book, but how it is read, that defines the fate of every cell. Which parts of this genomic library are transcribed (into RNA) and expressed (via the protein products) is ultimately decided by the “regulatory” agents toiling away in the cell. These regulatory agents come in many forms; the first generation were themselves proteins (first repressors and then activators). Then came micro-RNAs, small RNA molecules that can locate specific target sequences on RNA molecules and affect their expression (for example, by changing the lifespan of an RNA molecule). And now we have identified an arsenal of these regulatory mechanisms: chromatin structure (how DNA is packaged and marked affects its accessibility), transcription factors, miRNAs, long non-coding RNAs and…

At the end of the day, it seems that the complexity of an organism largely stems from the diversity and complexity of these regulatory agents rather than the number of protein-coding genes in the genome. It’s like chemistry: the elements are there, but what you do with them, how you mix them and in what proportions, gives you a functional and miraculous product.
The “Human Genome Project” was the product of the classic “central dogma”-oriented viewpoint. Don’t get me wrong… this was a vital project, and what we know now largely depends on it; however, it was initially sold as the ultimate experiment. If we read the totality of the human DNA, the reasoning went, we’ll know EVERYTHING about humans and what makes them tick. But obviously, that wasn’t the case. We realized that it is not the DNA but the regulatory networks and interactions that matter (hence the birth and explosion of the whole functional-genomics field).
The ENCODE project
The ENCODE project was born from this more modern, regulation-centric view of genomics, and the recent issue of Nature published a dozen papers from ENCODE along with accompanying papers in other journals. This was truly an accomplishment for science this year, rivaled only by the discovery of the Higgs boson (if it is in fact the Higgs boson) and the Curiosity landing on Mars. At its core, what they have done in this massive project is simple: throw whatever methods we have for mapping regulatory interactions at the problem, from DNase I footprints to chromatin structure and methylation. And what they report as their MAIN big finding is the claim that there is in fact no junk DNA in the genome, since for 80% of the genomic DNA they find at least one regulatory interaction, which they count as “functional”.
As I said, this was a great project and will be a very good resource for our community for many years to come. But there are some issues that I want to raise here:
- I think we’re over-hyping this. Not every observed interaction means “functionality”. We already know from ChIP-seq datasets that transcription factors, for example, bind to regions other than their direct targets. Some of these sites are in fact neutral, and their interactions may very well be biochemical accidents. Now, one might argue that if the number of transcription factor molecules is limited, these non-functional sites may show some functionality by competing with the actual sites and decreasing the effective concentration of the transcription factor in vivo.
- The take-home message from the ENCODE project seems to be debunking the existence of “junk DNA”. But to be honest, not many of us thought the genome had a significant amount of junk anyway. I am sure that ENCODE provided us with a great resource, but pointing to this as its major achievement does not seem logical. To be honest, I think a resource project like this doesn’t really have an immediate, obvious groundbreaking discovery; however, policymakers want to see something when they fund these types of projects… and this is one way of giving it to them.
- Funding is another issue here. This was a very expensive endeavor (200 million dollars, was it?). Now, I am all for spending as much money on science as possible; however, that is not happening, and funding in the biosciences seems to be tight nowadays. We can legitimately ask whether this amount of money might have been better spent on 200 projects in different labs as opposed to one big project. A project, let me remind you, that would have been significantly cheaper to do in the near future due to plummeting sequencing costs. I’m not saying ENCODE was a waste of money; I just think we’re at a point where things like this should be debated across the community.
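As an aside, the decoy-competition idea from the first point above can be made concrete with a toy equilibrium model. This is only a sketch with entirely hypothetical numbers (site counts, Kd values and concentrations are invented for illustration, not taken from ENCODE data); it shows how a large pool of weak, “non-functional” binding sites can titrate away a limited transcription factor and lower the occupancy of its true targets:

```python
# Toy equilibrium model (all numbers hypothetical): decoy binding sites
# competing with true target sites for a limited transcription factor (TF).

def free_tf(total_tf, site_counts, kds):
    """Solve for the free TF concentration by bisection.

    total_tf    : total TF (arbitrary concentration units)
    site_counts : concentrations of each site class (targets, decoys, ...)
    kds         : matching dissociation constants
    """
    lo, hi = 0.0, total_tf  # free TF is between 0 and total
    for _ in range(100):    # bisection; mass balance is monotone in free TF
        mid = (lo + hi) / 2
        bound = sum(s * mid / (mid + kd) for s, kd in zip(site_counts, kds))
        if mid + bound > total_tf:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def target_occupancy(total_tf, targets, target_kd, decoys, decoy_kd):
    """Fraction of true target sites bound, given competing decoy sites."""
    f = free_tf(total_tf, [targets, decoys], [target_kd, decoy_kd])
    return f / (f + target_kd)

# Hypothetical scenario: 100 units of TF, 10 strong target sites (Kd = 1),
# and either 0 or 5000 weak decoy sites (Kd = 50).
print(target_occupancy(100, 10, 1.0, 0, 50.0))     # no decoys: near-full occupancy
print(target_occupancy(100, 10, 1.0, 5000, 50.0))  # decoys soak up much of the TF
```

With these made-up numbers, the weak decoys roughly halve the occupancy of the true targets, which is the sense in which even “accidental” binding events could have an indirect regulatory effect.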
Nevertheless, the ENCODE consortium should be commended for performing one of the most well-coordinated projects in the history of the biosciences, with astounding quality. Compared to the Human Genome Project, I think this was a definite success. I have never seen the community this amped up, with everyone poring through the gorgeous interactive results, going over their favorite genes and making noise on Twitter. This is a proud moment to be a biologist… I think we have officially entered the post-“central dogma” age of biology.