Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life


Letters from the trenches of war on cancer (Part I)

As I get older, cancer ceases to be a mere scientific curiosity and morphs into a harsher reality. As our parents start to worry about every mole and lump, we accompany them through the ensuing emotional roller coaster. Working close to a hospital does not help either… while the tumor samples you see every day are assigned random numbers, it is quite impossible not to see the human suffering behind every biopsy. While I still firmly believe that it is basic research that can ultimately revolutionize health and medicine, I can also sense the urgency of now and the need to act on that front. It is this dichotomy that has shaped my research for the past few years, the fruits of which are finding their way into the annals of science.

It is not news to anyone that I study the biology and regulation of RNA (see the two previous posts on this very blog: here and here). I have specifically focused on developing computational and experimental frameworks that help reveal the identity of post-transcriptional regulatory programs and their underlying molecular mechanisms. Towards the end of my tenure as a graduate student, building upon the work of talented postdocs in the Tavazoie lab at Princeton University (namely Olivier and Noam, who published their work back in 2008) and with the help of my genius friend Hamed, we developed, benchmarked and validated a computational method named TEISER, which extends motif-finding algorithms into the world of RNA by taking into account the local secondary structure of RNA molecules as well as their sequence.
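At its heart, this family of methods scores a candidate motif by how much information its presence or absence carries about a whole-genome measurement (say, transcript stability). Here is a minimal sketch of that mutual-information calculation on toy data; the function names and data are mine, not the actual TEISER implementation:

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Discrete mutual information (in bits) between two equal-length label lists."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    pa = Counter(labels_a)
    pb = Counter(labels_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Toy example: motif presence (0/1) per transcript vs. its expression bin
motif_hits = [1, 1, 0, 0, 1, 0, 0, 1]
expr_bins  = ['hi', 'hi', 'lo', 'lo', 'hi', 'lo', 'mid', 'mid']
print(round(mutual_information(motif_hits, expr_bins), 3))  # → 0.75
```

A motif whose hits track the measurement carries high mutual information; one scattered independently of it scores near zero, which is the basis for ranking candidates.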

When I started out as a postdoc, my goal was to study post-transcriptional regulation using cancer metastasis as a model. In addition to its clinical impact, studying metastasis also has the added benefit of access to a large compendium of high-quality datasets as well as rigorous in vivo and in vitro models for downstream validation of interesting findings.

When it comes to tumorigenesis in general, there is a large body of work focusing on the role of transcriptional regulation, specifically transcription factors as suppressors and promoters of oncogenesis. However, other aspects of the RNA life cycle are substantially understudied. The success of our lab and many others in revealing novel and uncharacterized regulatory networks based on the action of various miRNAs in driving or suppressing metastasis highlights the possibility that heretofore uncharacterized post-transcriptional regulatory programs may play instrumental roles in tumorigenesis.

Given the success of miRNA regulation and my previous work on RNA stability, performing differential transcript stability measurements in highly metastatic cells relative to their poorly metastatic parental populations seemed like a logical step. Using thiouridine pulse-chase labeling and capture followed by high-throughput RNA-seq, we estimated decay rates for every detectable transcript (~13,000 transcripts in total). It was around this dataset that we built an ambitious study, pushing ourselves to dig deeper at every step. We generated, analyzed, and interpreted heaps of data of various kinds: in silico, in vitro, and in vivo. The result of this study was the discovery of a novel post-transcriptional regulatory program that promotes breast cancer metastasis. Our results were recently published in Nature; however, I also gained insights that could not be included in a 4-page paper. As such, in the upcoming posts, I’ll try and expand on various aspects of this study that I found fascinating. Stay tuned…
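For a transcript decaying with first-order kinetics, the labeled population during the chase falls as N(t) = N0·e^(−kt), so the decay rate can be read off a log-linear fit across the chase time points. A toy sketch of that per-transcript fit (function names and numbers are illustrative, not taken from our actual pipeline):

```python
import math

def decay_rate(times, abundances):
    """Least-squares fit of ln(abundance) = ln(N0) - k*t; returns (k, half-life)."""
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs)) / \
            sum((t - t_mean) ** 2 for t in times)
    k = -slope  # decay rate constant (per hour)
    return k, math.log(2) / k

# Toy transcript: abundance halves every 2 hours during the chase
times = [0, 2, 4, 6]          # hours after the labeling pulse
abund = [100, 50, 25, 12.5]   # normalized read counts
k, t_half = decay_rate(times, abund)
print(round(k, 3), round(t_half, 2))  # → 0.347 2.0
```

Running this fit independently for every detectable transcript in both cell populations is what turns raw pulse-chase counts into the differential-stability comparison described above.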

The rise of circular RNAs: a whole swath of circular Spongebobs

Recently, we’ve been bombarded by high-profile studies about a class of RNAs called circular RNAs. Resulting from non-canonical splicing events (see below), circRNAs seem to be more prevalent than previously thought. They have been identified in mammals, plants and even archaea.

The formation and identification of circRNAs


The recent papers in Nature (Memczak et al. and Hansen et al.) argue for a broad, even tissue-specific, functionality for this type of RNA. Memczak et al. report a comprehensive atlas of thousands of circRNAs in various organisms through a computational approach, to which they assign an impressive 75% sensitivity and a very low false-discovery rate.
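The computational signature such approaches look for is a read spanning a “scrambled” junction: the 3′ end of a downstream exon joined head-to-tail to the 5′ start of an upstream one, an order impossible in linear splicing. A toy illustration of just that junction logic (real pipelines work on split alignments against the genome; the sequences and parameters here are made up):

```python
def backsplice_junction(exons, donor_idx, acceptor_idx, flank=10):
    """Sequence spanning a head-to-tail (back-splice) junction: the end of a
    downstream exon joined to the start of an upstream exon."""
    donor = exons[donor_idx]        # downstream exon contributing its 3' end
    acceptor = exons[acceptor_idx]  # upstream exon contributing its 5' start
    return donor[-flank:] + acceptor[:flank]

def reads_supporting(reads, junction):
    """Reads that span the junction in the scrambled (circular) orientation."""
    return [r for r in reads if junction in r]

exons = ["ATGGCCTTAA", "GGCATTCGGA", "TTGACCGTAC"]  # toy exons 1..3
junc = backsplice_junction(exons, donor_idx=2, acceptor_idx=1, flank=5)
print(junc)  # → CGTACGGCAT

reads = ["TTCGTACGGCATT", "ATGGCCTTAAGGC"]
print(len(reads_supporting(reads, junc)))  # → 1
```

Reads matching the linear transcript are uninformative; only junction-spanning reads in the scrambled order count as evidence for a circle, which is why sensitivity depends heavily on read length and depth.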

circRNA statistics according to Memczak et al.


The notably high stability of these RNAs, according to these authors, puts them in a perfect position to function as post-transcriptional regulators by sponging other regulatory trans factors. They focused on miRNA sites to find circRNAs that show a higher than expected occurrence of these elements. And they in fact find circRNAs that can bind and trap miR-7-loaded RISC, results that are corroborated in other recent papers.
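The site-counting behind the sponging argument is simple: a canonical target site is, to first approximation, the reverse complement of miRNA positions 2–8 (the seed). A hedged sketch of that counting; the circRNA sequence is entirely made up, and the miRNA is a miR-7-like sequence written in the DNA alphabet for illustration:

```python
def revcomp(seq):
    comp = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(comp[b] for b in reversed(seq))

def seed_site(mirna):
    """7-mer target site complementary to miRNA positions 2-8 (DNA alphabet)."""
    return revcomp(mirna[1:8])

def count_sites(circ_seq, mirna):
    """Number of perfect seed matches in a (hypothetical) circRNA sequence."""
    site = seed_site(mirna)
    return sum(1 for i in range(len(circ_seq) - 6) if circ_seq[i:i + 7] == site)

mir7_like = "TGGAAGACTAGTGATTTTGTTGT"  # miR-7-like sequence, DNA alphabet
site = seed_site(mir7_like)
circ = site + "ACGT" + site + "GGCC" + site  # toy circRNA with 3 seed sites
print(site, count_sites(circ, mir7_like))  # → GTCTTCC 3
```

A circRNA is called a candidate sponge when this observed count is much higher than expected for a random sequence of the same length and composition, which is the enrichment test alluded to above.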

Personally, I find sponging a very low-complexity function… meaning such functions arise after the fact, with the cell taking advantage of non-coding RNAs that are already available. This means either that circRNAs first arose as aberrant splicing events, i.e. mistakes in donor-acceptor identification, or that they (or their splicing partners) play other, more complex roles that we should be able to identify soon.


Memczak S, et al. (2013). Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. doi:10.1038/nature11928

Hansen TB, et al. (2013). Natural RNA circles function as efficient microRNA sponges. Nature. doi:10.1038/nature11993

Synthesizing a genome like a boss

Despite recent leaps in the artificial synthesis of long custom DNA sequences and the hype surrounding large-scale projects (e.g. the first synthetic genome), the process is far from mainstream. Synthesis of genes or even minigenes is still expensive, slow and tedious. As a lazy scientist who despises cloning and prefers to synthesize the whole construct at the touch of a button (or a click of a mouse), I am all for methods that advance this area of biotech. So, I am very excited about this paper that recently showed up in Nature Methods (and I think it’s pretty smart).

The rationale here is based on the fact that short-DNA (i.e. oligonucleotide) synthesis, which forms the basis for longer molecules, is still very error prone; finding a molecule without mutations requires directly sequencing many instances of the final product (only a small fraction of the final products are mutation-free). What the authors have accomplished is to shift the sequencing step to the oligo level. Basically, they tag all the oligos to be synthesized with random barcodes, sequence the whole population using high-throughput sequencing, identify the correct oligos, and use their barcodes to enrich them from the initial population using specific primers and PCR.
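In other words, the sequencing run becomes a lookup table from barcode to observed sequence, and you only need one mutation-free barcode per designed oligo to retrieve it by barcode-specific PCR. A minimal sketch of that selection step (the data structures and names are my own, not the paper’s):

```python
def pick_perfect_barcodes(sequenced, designs):
    """For each intended oligo sequence, pick one barcode whose attached
    molecule came out mutation-free in the sequencing run."""
    chosen = {}
    for barcode, oligo in sequenced.items():
        if oligo in designs and oligo not in chosen:
            chosen[oligo] = barcode  # this barcode tags a correct copy
    return chosen

# Toy run: three barcoded copies of one design, one carries a point mutation
designs = {"ATGCGTAC"}
sequenced = {
    "BC01": "ATGCGTAC",  # perfect -> selected
    "BC02": "ATGGGTAC",  # point mutation -> skipped
    "BC03": "ATGCGTAC",  # also perfect, but a barcode is already chosen
}
print(pick_perfect_barcodes(sequenced, designs))  # → {'ATGCGTAC': 'BC01'}
```

The chosen barcodes then serve as primer-binding handles: amplifying with primers against BC01 pulls the verified molecule out of the error-laden pool without any further cloning or Sanger screening.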

I assume companies that synthesize custom DNA can take advantage of a sequencing run to find the correct oligos for multiple independent orders, thus significantly reducing the cost of sequencing. However, high-throughput sequencing is still slow. So, I assume this method doesn’t significantly cut the time requirement of the production phase… but I’m not too familiar with the industrial pipeline, and maybe it does help. I think we’ll know soon enough.

Decoding the ENCODed DNA: You get a function, YOU get a function, EVERYBODY gets A function

It has been almost half a century… since we started drilling the concept of the “central dogma” (DNA -> RNA -> protein, which in some sense equals life) into the psyche of the scientific community and the human population as a whole. The idea was that everything that makes us human, or a chimp a chimp, is encoded in As, Gs, Cs and Ts, efficiently packaged into the nucleus of every cell. Every cell, it went, has the capacity to reproduce the complete organism. What seemed to be missing from our daily conversations (or was conveniently omitted) was how the cells in our body come to have such different fates if they start with the same information, which they hang on to for the entirety of their lifespan. The answer came, miraculously enough, from Jacob and Monod and their work on the lac operon in E. coli: it is not the book, but how it is read, that defines the fate of every cell. Which parts of this genomic library are transcribed (into RNA) and expressed (via protein products) is ultimately decided by the “regulatory” agents toiling away in the cell. These regulatory agents come in many forms; the first generation were themselves proteins (first repressors and then activators). Then came microRNAs, small RNA molecules that can locate specific target sequences on RNA molecules and affect their expression (for example, by changing the lifespan of an RNA molecule). By now, we have identified an arsenal of these regulatory mechanisms: chromatin structure (how DNA is packaged and marked affects its accessibility), transcription factors, miRNAs, long non-coding RNAs and more. At the end of the day, it seems that the complexity of an organism largely stems from the diversity and complexity of these regulatory agents rather than the number of protein-coding genes in the genome. It’s like chemistry: the elements are there, but what you do with them and how you mix them in what proportions gives you a functional and miraculous product.

Genome Project

The “Human Genome Project” was the product of the classic “central dogma”-oriented viewpoint. Don’t get me wrong… this was a vital project, and what we know now largely depends on it; however, it was initially sold as the ultimate experiment. If we read the totality of human DNA, the reasoning went, we’ll know EVERYTHING about humans and what makes them tick. But obviously, that wasn’t the case. We realized that it is not the DNA alone but the regulatory networks and interactions that matter (hence the birth and explosion of the whole genomics field).

The ENCODE project


The ENCODE project was born from this more modern and regulation-centric view of genomics. The recent issue of Nature published a dozen papers from ENCODE, along with accompanying papers in other journals. This was truly an accomplishment for science this year, rivaled only by the discovery of the Higgs boson (if it is in fact the Higgs boson) and the Curiosity landing on Mars. At its core, what they have done in this massive project is simple: throw whatever we have in terms of methods for mapping regulatory interactions at the problem, from DNase I footprints to chromatin structure and methylation. And what they report as their MAIN big finding is the claim that there is in fact no junk DNA in the genome, since for 80% of the genomic DNA they find at least one regulatory interaction, which they label as “functional”.

As I said, this was a great project and will be a very good resource for our community for many years to come. But there are some issues that I want to raise here:

  1. I think we’re over-hyping this. Not every observed interaction means “functionality”. We already know from ChIP-seq datasets that, for example, transcription factors bind to regions other than their direct targets. Some of these sites are in fact neutral, and their interactions may very well be biochemical accidents. Now, one might claim that if the number of transcription factor molecules is limited, these non-functional sites may show some functionality by competing with actual sites to decrease the effective concentration of the transcription factor in vivo.
  2. The take-home message from the ENCODE project seems to be debunking the existence of “junk DNA”. But to be honest, not many of us thought the genome had a significant amount of junk anyway. I am sure that ENCODE provided us with a great resource, but pointing to this as its major achievement does not seem logical. To be honest, I think a resource project like this doesn’t really have an immediate, obvious groundbreaking discovery; however, policy makers want to see something when they fund these types of projects… and this is one way of giving it to them.
  3. Funding is another issue here. This was a very expensive endeavor (200 million dollars, was it?). Now, I am all for spending as much money on science as possible; however, that is not happening, and funding in the biosciences seems to be tight nowadays. We can legitimately ask whether this amount of money might have been better spent on 200 projects in different labs as opposed to one big project. A project, let me remind you, that would have been significantly cheaper to do in the near future due to plummeting sequencing costs. I’m not saying ENCODE was a waste of money; I just think we’re at a point where things like this should be debated across the community.

Nevertheless, the ENCODE consortium should be commended on performing one of the most well-coordinated projects in the history of biosciences with astounding quality. I think compared to the human genome project, this was a definite success. I have never seen the community this amped up, with everyone poring through the gorgeous interactive results, going over their favorite genes and making noise on twitter. This is a proud moment to be a biologist… I think we have officially entered the post-“central dogma” age of biology.