Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life

Category Archives: Post-transcriptional regulation

Functional impacts of RNA modifications

My good friend and all-around awesome scientist Claudio has been working on this very cool idea, the fruits of which are now available to all. I had the pleasure of being on his team and I’m personally fascinated by the problem. The broad question that Claudio is tackling is “what do RNA modifications do in the cell?” In particular, he’s been focused on the m6A modification and its role in miRNA processing.

But the real story is actually more fascinating. For the longest time, Claudio was looking for the molecular mechanism through which the miRNA mir-126 is down-regulated in highly metastatic cells. A possible solution emerged in the form of METTL3, a gene whose product methylates RNA. Knocking down METTL3 indeed reduced mir-126 levels. But while doing the necessary controls, Claudio noticed that the reduction was not limited to mir-126 but was actually a more global effect impacting a large fraction of miRNAs. It was from this initial observation that his grand hypothesis was formed: RNA methylation (m6A) has a direct role in miRNA biogenesis. And this is where I came in… a quick look at miRNA sequences showed that m6A sites were located close to but not on primary miRNA sequences. This suggested that m6A markings could serve as a beacon for recruiting the miRNA processing machinery (specifically the dsRNA-binding protein DGCR8). Claudio then used a series of focused experiments to prove this hypothesis (as much as anything can be proven in science). You can read all about this in his very nice paper in Nature.

However, as is usually the case in science, solving one problem leads to even more questions. As I mentioned earlier, m6A sites are not directly recognized by DGCR8, so there was a missing link between RNA modification and the recruitment of the processing machinery. To approach this problem, Claudio performed IP-mass spec on m6A-modified RNA and found a very good candidate in the form of the ubiquitous RNA-binding protein HNRNPA2B1. What was especially important was that the RGAC motif targeted by METTL3 actually resembles the HNRNPA2B1 binding sequence. In fact, the RGAC motif is very much enriched among the binding sites of HNRNPA2B1. The question then is whether methylating these sequences can impact HNRNPA2B1 binding. In other words, are there sites where modifying the A to m6A increases affinity for HNRNPA2B1? In a series of experiments (both high- and low-throughput) we showed that this is in fact the case. In general, we observed a broad functional entanglement between METTL3 and HNRNPA2B1. These results were recently published in Cell and I invite everyone to read this paper.
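To make the idea of motif enrichment concrete, here is a minimal sketch that counts RGAC occurrences (R is the IUPAC code for a purine, A or G) in a set of bound sites versus background sequences and reports a naive density ratio. The function names and the toy sequences are made up for illustration, and a simple density ratio is only an assumption here, not the statistic used in the paper.

```python
import re

# IUPAC "R" stands for a purine (A or G), so RGAC matches AGAC or GGAC.
RGAC = re.compile(r"[AG]GAC")


def motif_density(sequences):
    """Return RGAC occurrences per kilobase across a set of sequences."""
    total_hits = sum(len(RGAC.findall(seq.upper())) for seq in sequences)
    total_length = sum(len(seq) for seq in sequences)
    return 1000.0 * total_hits / total_length if total_length else 0.0


def fold_enrichment(bound, background):
    """Ratio of motif density in bound sites versus background sequences."""
    background_density = motif_density(background)
    if background_density == 0:
        return float("inf")
    return motif_density(bound) / background_density


# Toy sequences invented purely for illustration (not real binding sites).
bound_sites = ["UUGGACUAGACU", "AAGGACCC"]
background = ["UUUUCCCCUUUUGGACUUUU", "CCCCUUUUCCCCUUUUCCCC"]

print(motif_density(bound_sites), fold_enrichment(bound_sites, background))
```

A real analysis would also need a matched background (similar length and nucleotide composition) and a significance test, neither of which this sketch attempts.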

HNRNPA2B1 Is a Mediator of m6A-Dependent Nuclear RNA Processing Events

My thoughts? I think this is just the beginning. There are two points to consider: (i) HNRNPA2B1 is not the only reader of m6A and (ii) m6A is not the only RNA modification. Together, I think these studies and those of other groups on m6A (and RNA modification in general) suggest the birth of a new field of research with broad functional consequences for the regulation of gene expression.

Letters from the trenches of war on cancer (Part I)

As I get older, cancer stops being a scientific curiosity and morphs into a harsher reality. As our parents start to worry about every mole and lump, we accompany them through the ensuing emotional roller coaster. Working close to a hospital is not helping either… while the tumor samples you see every day are assigned random numbers, it is quite impossible not to see the human suffering behind every biopsy. While I still firmly and deeply believe that ultimately it is basic research that can revolutionize health and medicine, I can also sense the urgency of now and the need to act on that front. It is this dichotomy that has shaped my research for the past few years, the fruits of which are finding their way into the annals of science.

It is not news to anyone that I study the biology and regulation of RNA (see the two previous posts on this very blog: here and here). I have specifically focused on developing computational and experimental frameworks that help reveal the identity of post-transcriptional regulatory programs and their underlying molecular mechanisms. Towards the end of my tenure as a graduate student, building upon the work by talented postdocs in the Tavazoie lab at Princeton University (namely Olivier and Noam, who published their work back in 2008) and with the help of my genius friend Hamed, we developed, benchmarked and validated a computational method named TEISER that extends motif-finding algorithms into the world of RNA by taking into account the local secondary structure of RNA molecules as well as their sequence.

When I started out as a postdoc, my goal was to study post-transcriptional regulation using cancer metastasis as a model. In addition to its clinical impact, studying metastasis also has the added benefit of access to a large compendium of high-quality datasets as well as rigorous in vivo and in vitro models for downstream validation of interesting findings.

When it comes to tumorigenesis in general, there is a large body of work focusing on the role of transcriptional regulation, specifically transcription factors as suppressors and promoters of oncogenesis. However, other aspects of the RNA life cycle are substantially understudied. The success of our lab and many others in revealing novel regulatory networks based on the action of various miRNAs in driving or suppressing metastasis highlights the possibility that heretofore uncharacterized post-transcriptional regulatory programs may play instrumental roles in tumorigenesis.

Given the success of miRNA regulation and my previous work on RNA stability, measuring differential transcript stability between highly metastatic cells and their poorly metastatic parental populations seemed like a logical step. Using thiouridine pulse-chase labeling and capture followed by high-throughput RNA-seq, we estimated decay rates for every detectable transcript (~13,000 transcripts in total). It was around this dataset that we built an ambitious study, pushing ourselves to dig deeper at every step. We generated, analyzed, and interpreted heaps of data of various kinds: in silico, in vitro, and in vivo. The result of this study was the discovery of a novel post-transcriptional regulatory program that promotes breast cancer metastasis. Our results were recently published in Nature; however, I also gained insights that could not be included in a 4-page paper. As such, in the upcoming posts, I’ll try to expand on various aspects of this study that I found fascinating. Stay tuned…
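For readers wondering what "estimating a decay rate" from a pulse-chase time course can look like, here is a minimal sketch. It assumes simple first-order decay, N(t) = N0·exp(−kt), and fits k as the negative slope of ln(abundance) against time. The time points and abundances are hypothetical, and this is not necessarily the model or normalization used in the paper.

```python
import math


def decay_rate(times, abundances):
    """Least-squares slope of ln(abundance) vs. time, returned as k (> 0 for decay)."""
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx


def half_life(k):
    """Half-life implied by a first-order decay rate k."""
    return math.log(2) / k


# Hypothetical chase time points (hours) and normalized labeled-RNA abundance.
times = [0, 2, 4, 8]
abundance = [1.0, 0.5, 0.25, 0.0625]

k = decay_rate(times, abundance)
print(k, half_life(k))
```

Comparing the fitted k for the same transcript between two cell populations is then one simple way to ask whether its stability differs.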

RNA Structurome

The weekly or monthly updates that appear in my e-mail account from various journals that I have subscribed to serve as a reminder that every single day we are expanding our knowledge and adding to the repertoire of scientific conquest. Sometimes reading these papers, however, is a chore… Not every paper is well-structured, not every project deserves the attention that it receives, and not every study stands the test of time. Every now and then however, I read papers that leave a profound mark on how I view biological systems. These studies are not necessarily large-scale or even complex but the mere act of reading them changes my way of thinking. The transformation may be nuanced or not even noticeable, but the effects will remain… for a while. If pressed, each scientist may come up with a unique collection of such publications–what we find exciting is ultimately a subjective matter–but I think we all, to some extent, can appreciate the underlying attraction.

The late January issue of Nature carried a few papers of this type for me. Rouskin et al. and Ding et al. reported the use of DMS (dimethyl sulfate)-based modification of exposed ribonucleotide bases coupled with high-throughput sequencing to provide a snapshot of RNA structural preferences in vivo (in yeast, mammalian cells, and Arabidopsis). Despite the need to overcome certain technical hurdles, the methods themselves are logical extensions of methods published previously for low-throughput and in vitro RNA structure determination. What I found intriguing, however, was how Rouskin et al. turned their observations into an actionable hypothesis. Given the nature of the data they had gathered, this paper could have easily turned into a descriptive publication. But the authors took a step further and put forth a hypothesis that best explained the major trends in their data. I am confident it would have been easier for them not to do so… I am also confident that because of this hypothesis, they had a harder time convincing the reviewers than they would’ve otherwise. But they clearly didn’t shy away from going where the data had taken them, and they should be applauded for doing so. They put this hypothesis front and center; early on in their paper they state:

“Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermo-stable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells.”

For me, it all comes down to the phrase: “the importance of cellular processes in regulating RNA structure.” We have read about numerous examples where the structure of RNA acts as a cis-acting factor in RNA biology; however, thinking of RNA structure itself as an intermediate target of regulatory programs on a whole-transcriptome level is very intriguing. I always suspected as much, but reading this sentence just toggled a switch in my head–in a good way.


DMS signal in RPL33A mRNA shows a region that is unstructured in vivo but forms a stable structure in vitro (Rouskin et al, 2014).

Based on their own DMS-seq data, Ding et al. similarly report:

“…mRNAs of cold and metal ion stress-response genes folded significantly differently in vivo from their unconstrained in silico predictions (Fig. 4c, d and Extended Data Fig. 8a, b). Interestingly, these stresses are known to affect RNA structure and thermostability.”

This statement, despite being more descriptive, tells a similar story, and I think this is a very important hypothesis. Understanding RNA structure as a dynamic phenomenon in the cell with far-reaching regulatory consequences, and not just as a byproduct of thermodynamics encoded within the sequence, opens up a new field of research: studying, transcriptome-wide, the factors that affect RNA structure and their functional consequences.

I should also mention that in the same issue, a study by Howard Chang, Eran Segal and colleagues (Wan et al.) reported:

“Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS [–RNA secondary structure] information is encoded within RNA sequence.”

On the surface, this statement contradicts the findings reported by the Weissman lab. However, this latter study used deproteinized RNA and, as Rouskin et al. have clearly stated: “analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells.” So the observation made by Wan et al. is a consequence of the in vitro nature of their study. If it turns out that the differences between in vivo and in vitro RNA secondary structures are pervasive, as Rouskin et al. suggest, we need to rethink how much stock we’re willing to put into the descriptive studies that have reported on RNA structure using in vitro methods.


  1. Rouskin et al., 2014. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705.
  2. Ding et al., 2014. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700.

RNA rises

These are exciting times to be an RNA biologist. Next-generation sequencing revolutionized genetics, but now the RNA methodologies have caught up. For every DNA technique, we have developed an equivalent RNA method and then some. For example, there are CLIP-seq and PAR-CLIP replacing ChIP-seq in RNA studies, but then there are also recently developed high-throughput methods for probing the secondary structure of RNA in vivo (Rouskin et al., 2014, Nature). Last year, the first ever large-scale binding information for a compendium of RNA-binding proteins (RBPs) was published (Ray et al., 2013, Nature). The computational methods are also gaining ground, from SeqFold (Ouyang et al., 2013, Genome Res) to our TEISER (Goodarzi et al., 2012, Nature). Did I mention these are exciting times?!

It is in light of these advances that making sense of the underlying post-transcriptional regulatory networks that control different aspects of the RNA life cycle and behavior has become ever more important. Five years ago, we embarked on a path to catalog the sequences in RNA that play substantial regulatory roles by providing linear or structural information for trans factors to recognize and act on. Given the state of technology at the time, we were limited by the diversity of the library we could generate, so we decided to focus on 3′ UTR sequences that are conserved across vertebrates. We synthesized these sequences in short spans on a custom-designed Agilent array and cloned them downstream of mCherry under a bidirectional promoter that also drives the expression of GFP as an internal control. Our goal was to then use FACS to select the sub-populations that show higher or lower relative expression of mCherry. We could then amplify the cloning site in the selected populations and hybridize the products back to our Agilent array for quantification (Figure below).

It was all good on paper, but as is always the case, we ran into myriad technical problems, ranging from generating a library with enough independent cells (high coverage) to obtaining reproducible FACS measurements. By the time we were done troubleshooting these problems, a lot had changed in the field. For example, sequencing had really become the staple of RNA biology (so we decided to use it instead of array hybridization for quantification), Agilent had started to provide custom oligo libraries directly to consumers (which means that this approach can easily be implemented in any lab) and, more importantly, the Flp-In system (Invitrogen) had appeared, which significantly improved the reproducibility of our measurements (since all clones in the library are inserted at a single site in the genome).

As is always the case with method development, we needed to perform innumerable validation assays to evaluate the efficacy of our approach in finding known and novel regulatory elements. Our findings were published last week in Cell Reports (Oikonomou et al., 2014), which I encourage you to read. Interestingly, David Erle’s group also published a similar approach, which beat our paper by a few days (Zhao et al., 2014, Nature Biotechnology).
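The sort-and-sequence readout described above boils down to asking, for each library element, whether it is over- or under-represented in the high-mCherry population relative to the low-mCherry one. Here is a minimal sketch of that comparison, assuming you already have per-element read counts from each sorted bin. The element names, counts, and the pseudocount are all hypothetical, and this is not the scoring scheme from our paper.

```python
import math


def log2_enrichment(high_counts, low_counts, pseudocount=1.0):
    """Per-element log2 ratio of normalized read fractions (high bin vs. low bin)."""
    elements = set(high_counts) | set(low_counts)
    # Normalize by library size so that sequencing depth does not drive the ratio.
    high_total = sum(high_counts.values()) + pseudocount * len(elements)
    low_total = sum(low_counts.values()) + pseudocount * len(elements)
    scores = {}
    for element in elements:
        high_frac = (high_counts.get(element, 0) + pseudocount) / high_total
        low_frac = (low_counts.get(element, 0) + pseudocount) / low_total
        scores[element] = math.log2(high_frac / low_frac)
    return scores


# Hypothetical read counts for three library elements in the two sorted bins.
high_bin = {"elem_A": 99, "elem_B": 49, "elem_C": 49}
low_bin = {"elem_A": 24, "elem_B": 49, "elem_C": 124}

scores = log2_enrichment(high_bin, low_bin)
for name in sorted(scores):
    print(name, round(scores[name], 2))
```

A positive score points to an element associated with higher reporter expression and a negative score to a candidate repressive element. With real data you would want replicates and a proper statistical model on top of this.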

These reporter-based approaches insulate each element and study its effect in isolation; however, real transcripts carry many elements, and the fate of the RNA is decided as a cumulative consequence of all the interacting factors. Knowing the initial building blocks, however, enables us to then construct networks and modules of regulatory elements that likely interact and function in an overlapping space (which we tried to infer in our paper using our information-theoretic tools).


Systematic dissection of conserved 3′ UTR sequences in endogenous transcripts

In the end, I wanted to mention that the downside to all the current attention in the RNA field seems to be a fast-paced publication cycle, which results in mostly descriptive papers. There is nothing wrong with descriptive studies per se, but sometimes the downstream or underlying mechanisms are very much missing. I think we are also guilty of this to some extent. Our goal was really to identify novel trans factors that interact with the elements we identified using our approach. This is something we are still trying to do, and hopefully we will manage to better functionally annotate the cis elements and the molecular mechanisms through which they exert their regulatory roles.