Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life


Letters from the trenches of war on cancer (Part I)

As I get older, cancer has become more than a scientific curiosity and has morphed into a harsher reality. As our parents start to worry about every mole and lump, we accompany them through the ensuing emotional roller coaster. Working close to a hospital does not help either: while the tumor samples you see every day are assigned random numbers, it is impossible not to see the human suffering behind every biopsy. While I still firmly believe that it is ultimately basic research that can revolutionize health and medicine, I can also sense the urgency of now and the need to act on that front. This dichotomy has shaped my research for the past few years, the fruits of which are now finding their way into the annals of science.

It is not news to anyone that I study the biology and regulation of RNA (see the two previous posts on this very blog: here and here). Specifically, I have focused on developing computational and experimental frameworks that help reveal post-transcriptional regulatory programs and their underlying molecular mechanisms. Toward the end of my tenure as a graduate student, building on the work of talented postdocs in the Tavazoie lab at Princeton University (namely Olivier and Noam, who published their work back in 2008), and with the help of my genius friend Hamed, we developed, benchmarked, and validated a computational method named TEISER. TEISER extends motif-finding algorithms into the world of RNA by taking into account the local secondary structure of RNA molecules as well as their sequence.
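To make the idea of a "structural" RNA motif a bit more concrete, here is a minimal sketch that scans a sequence for stem-loops: a stem of a fixed number of base pairs (Watson-Crick or G-U wobble) closing a loop whose sequence must match a degenerate IUPAC-style pattern. This is not TEISER's implementation or its motif representation; the function names, the perfect-stem requirement, and the reduced IUPAC table are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Allowed base pairs in an RNA stem, including G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Reduced IUPAC degeneracy table (assumption: only the codes used here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "W": "AU", "S": "CG", "N": "ACGU",
}


def loop_matches(loop: str, pattern: str) -> bool:
    """True if every loop base is allowed by the degenerate pattern."""
    return len(loop) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(loop, pattern)
    )


def find_stem_loops(seq: str, stem_len: int, loop_pattern: str) -> List[Tuple[int, str]]:
    """Hypothetical scanner: return (start, hairpin) for every window that can
    form a perfect stem of `stem_len` pairs around a loop matching `loop_pattern`."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    loop_len = len(loop_pattern)
    width = 2 * stem_len + loop_len
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left = window[:stem_len]
        loop = window[stem_len:stem_len + loop_len]
        right = window[stem_len + loop_len:]
        # Base k of the 5' arm pairs with the mirror-image base of the 3' arm.
        paired = all((left[k], right[stem_len - 1 - k]) in PAIRS for k in range(stem_len))
        if paired and loop_matches(loop, loop_pattern):
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    # A GCGC stem closing a UUCG loop, flanked by unpaired A's.
    print(find_stem_loops("AAGCGCUUCGGCGCAA", 4, "UNCG"))  # → [(2, 'GCGCUUCGGCGC')]
```

The point of the sketch is that two sequences with different stem bases can still match the same motif, as long as they fold into the same local shape; a purely sequence-based motif finder would treat them as unrelated.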

When I started out as a postdoc, my goal was to study post-transcriptional regulation using cancer metastasis as a model. In addition to its clinical impact, metastasis offers access to a large compendium of high-quality datasets as well as rigorous in vivo and in vitro models for the downstream validation of interesting findings.

When it comes to tumorigenesis in general, a large body of work has focused on transcriptional regulation, specifically on transcription factors as suppressors and promoters of oncogenesis. Other aspects of the RNA life cycle, however, remain substantially understudied. The success of our lab and many others in revealing novel regulatory networks in which various miRNAs drive or suppress metastasis highlights the possibility that heretofore uncharacterized post-transcriptional regulatory programs may play instrumental roles in tumorigenesis.

Given the success of miRNA regulation studies and my previous work on RNA stability, measuring differential transcript stability between highly metastatic cells and their poorly metastatic parental populations seemed like a logical step. Using thiouridine pulse-chase labeling and capture followed by high-throughput RNA-seq, we estimated decay rates for every detectable transcript (~13,000 transcripts in total). Around this dataset we built an ambitious study, pushing ourselves to dig deeper at every step. We generated, analyzed, and interpreted heaps of data of various kinds: in silico, in vitro, and in vivo. The result was the discovery of a novel post-transcriptional regulatory program that promotes breast cancer metastasis. Our results were recently published in Nature; however, I also gained insights that could not be included in a 4-page paper. In the upcoming posts, I'll try to expand on various aspects of this study that I found fascinating. Stay tuned…
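For readers wondering how a decay rate comes out of a pulse-chase time course, here is a minimal sketch. It assumes first-order (exponential) decay of the labeled transcript pool, L(t) = L0 · exp(-k·t), and already-normalized, strictly positive measurements at each chase time point; it fits k by least squares on the log scale and reports the half-life ln(2)/k. The function names are hypothetical, and this is not meant to reproduce the analysis in the paper.

```python
import math
from typing import Sequence, Tuple


def fit_decay_rate(times: Sequence[float], labeled: Sequence[float]) -> Tuple[float, float]:
    """Fit L(t) = L0 * exp(-k * t) via ordinary least squares on ln L(t).

    Returns (k, half_life). Assumes first-order decay and strictly positive,
    normalized measurements (e.g. labeled reads scaled to a spike-in)."""
    if len(times) != len(labeled) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(x) for x in labeled]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    k = -sxy / sxx  # the slope of ln L(t) versus t is -k
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


def log2_stability_change(k_metastatic: float, k_parental: float) -> float:
    """Positive values mean the transcript decays faster in the metastatic line."""
    return math.log2(k_metastatic / k_parental)


if __name__ == "__main__":
    times = [0.0, 2.0, 4.0, 8.0]  # chase time points (hours, illustrative)
    parental = [100 * math.exp(-0.25 * t) for t in times]
    metastatic = [100 * math.exp(-0.5 * t) for t in times]
    k_par, _ = fit_decay_rate(times, parental)
    k_met, _ = fit_decay_rate(times, metastatic)
    print(round(log2_stability_change(k_met, k_par), 3))  # → 1.0
```

Fitting on the log scale is simple and closed-form, but it gives low-abundance late time points as much weight as early ones; with noisy data, a nonlinear fit or replicate-aware model may be the safer choice.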

RNA rises

These are exciting times to be an RNA biologist. Next-generation sequencing revolutionized genetics, and now RNA methodologies have caught up. For every DNA technique, we have developed an equivalent RNA method, and then some. For example, CLIP-seq and PAR-CLIP are the RNA counterparts of ChIP-seq, but there are also recently developed high-throughput methods for probing RNA secondary structure in vivo (Rouskin et al., 2013, Nature). Last year, the first large-scale binding data for a compendium of RNA-binding proteins (RBPs) was published (Ray et al., 2013, Nature). Computational methods are also gaining ground, from SeqFold (Ouyang et al., 2013, Genome Research) to our own TEISER (Goodarzi et al., 2012, Nature). Did I mention these are exciting times?!

It is in light of these advances that making sense of the post-transcriptional regulatory networks controlling different aspects of RNA life cycle and behavior has become ever more important. Five years ago, we embarked on a path to catalog the RNA sequences that play substantial regulatory roles by providing linear or structural information for trans factors to recognize and act on. Given the state of technology at the time, we were limited in the diversity of the library we could generate, so we decided to focus on 3′ UTR sequences that are conserved across vertebrates. We synthesized these sequences as short fragments on a custom-designed Agilent array and cloned them downstream of mCherry in a construct whose bidirectional promoter also drives GFP as an internal control. Our plan was to use FACS to sort the subpopulations showing higher or lower mCherry expression relative to GFP, amplify the cloning site in the sorted populations, and hybridize the products back to our Agilent array for quantification (Figure below).

It was all good on paper, but as is always the case, we ran into myriad technical problems, ranging from generating a library with enough independent cells (high coverage) to obtaining reproducible FACS measurements. By the time we were done troubleshooting, a lot had changed in the field. Sequencing had become the staple of RNA biology (we switched to it from array hybridization for quantification), Agilent had started to provide custom oligo libraries directly to customers (which means this approach can now be implemented in any lab), and, more importantly, the Flp-In system (Invitrogen) had appeared, which substantially improved the reproducibility of our measurements, since every clone in the library is inserted into the same single site in the genome.
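Once the sorted bins are sequenced, each library element needs a score for how strongly it associates with high versus low reporter expression. A minimal sketch, assuming just two bins (high and low mCherry/GFP) and raw read counts per element, is a pseudocount-smoothed log2 ratio of normalized abundances. The function name, the two-bin design, and the pseudocount value are assumptions for illustration; the published analysis is not claimed to work this way.

```python
import math
from typing import Dict


def bin_enrichment(high_counts: Dict[str, int], low_counts: Dict[str, int],
                   pseudocount: float = 1.0) -> Dict[str, float]:
    """Hypothetical scoring: log2 of each element's normalized abundance in the
    high-mCherry/GFP bin over the low bin. Positive scores suggest an element
    associated with higher reporter expression; negative scores, lower."""
    elements = set(high_counts) | set(low_counts)
    # Pseudocounts keep elements seen in only one bin from producing log(0).
    total_high = sum(high_counts.values()) + pseudocount * len(elements)
    total_low = sum(low_counts.values()) + pseudocount * len(elements)
    scores = {}
    for element in elements:
        f_high = (high_counts.get(element, 0) + pseudocount) / total_high
        f_low = (low_counts.get(element, 0) + pseudocount) / total_low
        scores[element] = math.log2(f_high / f_low)
    return scores


if __name__ == "__main__":
    high = {"elemA": 99, "elemB": 49}  # illustrative read counts
    low = {"elemA": 49, "elemB": 99}
    print(bin_enrichment(high, low))
```

Normalizing by each bin's total depth matters because the sorted populations are rarely sequenced to the same depth; without it, a deeper bin would make every element look enriched.
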
As is always the case with method development, we needed to perform innumerable validation assays to evaluate how well our approach finds known and novel regulatory elements. Our findings were published last week in Cell Reports (Oikonomou et al., 2014), which I encourage you to read. Interestingly, David Erle's group also published a similar approach, which beat our paper by a few days (Zhao et al., 2014, Nature Biotechnology).

These reporter-based approaches insulate each element and study its effect in isolation; real transcripts, however, carry many elements, and the fate of an RNA is decided by the cumulative action of all the interacting factors. Knowing the initial building blocks nonetheless enables us to construct networks and modules of regulatory elements that likely interact and function in an overlapping space (which we tried to infer in our paper using our information-theoretic tools).
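One basic information-theoretic quantity behind this kind of inference is the mutual information between two discrete profiles, for example the presence or absence of element A and of element B across a set of transcripts. High mutual information flags pairs whose occurrences are not independent and that might therefore belong to the same module. The sketch below computes it in bits; it is an illustration of the quantity only, not the tools used in the paper, and a real analysis would also need a significance test (for instance, comparing against shuffled profiles).

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information in bits between two equal-length discrete vectors,
    e.g. 0/1 occurrence of two regulatory elements across transcripts."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Sum of p(a,b) * log2[p(a,b) / (p(a) p(b))] over observed pairs.
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


if __name__ == "__main__":
    element_a = [1, 1, 0, 0, 1, 0]  # illustrative occurrence profiles
    element_b = [1, 1, 0, 0, 1, 0]
    print(mutual_information(element_a, element_b))
```

Identical profiles give the maximum possible value (the entropy of either profile), while independent profiles give zero; real co-occurrence data falls in between.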


Figure: Systematic dissection of conserved 3′ UTR sequences in endogenous transcripts

Finally, I want to mention that the downside of all the current attention on the RNA field seems to be a fast-paced publication cycle that results in mostly descriptive papers. There is nothing wrong with descriptive studies per se, but sometimes the downstream or underlying mechanisms are sorely missing. I think we are also guilty of this to some extent. Our real goal was to identify novel trans factors that interact with the elements we identified using our approach. We are still working on this, and hopefully we will manage to better annotate the function of these cis elements and the molecular mechanisms through which they exert their regulatory roles.