Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life

Greetings from UCSF

As of May 1st, I have started as an Assistant Professor at UCSF, Department of Biochemistry and Biophysics. I am also affiliated with the Department of Urology and the Helen Diller Family Comprehensive Cancer Center. The switch from postdoc to assistant professor has been daunting and at times overwhelming, but overall I am excited about starting this new chapter in my career. My home departments are filled with amazing scientists, some of whom have been my long-time heroes. I feel supported and fortunate to be where I am and to have the opportunities that I have. There is a lot on the scientific front that I want to write about that will show up in the near future on this blog. But for now, I wanted to keep this announcement short. Please visit our labpage at and feel free to contact us should the need arise. As we are building our group, we are always looking for smart students and postdocs who would like to be part of our team as we try to build a multidisciplinary group worthy of 21st century science.

Functional impacts of RNA modifications

My good friend and all-around awesome scientist Claudio has been working on this very cool idea, the fruits of which are now available to all. I had the pleasure of being on his team and I’m personally fascinated by the problem. The broad question that Claudio is tackling is “what do RNA modifications do in the cell?” In particular, he’s been focused on the m6A modification and its role in miRNA processing.

But the real story is actually more fascinating. For the longest time, Claudio was looking for the molecular mechanism through which the miRNA mir-126 is down-regulated in highly metastatic cells. A possible answer emerged in the form of the gene METTL3, which methylates RNA. Knocking down METTL3 indeed reduced mir-126 levels. But while doing the necessary controls, Claudio noticed that the reduction was not limited to mir-126 but was actually a more global effect impacting a large fraction of miRNAs. It was from this initial observation that his grand hypothesis was formed: RNA methylation (m6A) has a direct role in miRNA biogenesis. And this is where I came in… a quick look at miRNA sequences showed that m6A sites were located close to but not on primary miRNA sequences. This suggested that m6A markings could serve as a beacon for recruiting the miRNA processing machinery (specifically the dsRNA-binding protein DGCR8). Claudio then used a series of focused experiments to prove this hypothesis (as much as anything can be proven in science). You can read all about this in his very nice paper in Nature.

However, as is usually the case in science, solving one problem leads to even more questions. As I mentioned earlier, m6A sites are not directly recognized by DGCR8, so there was a missing link between RNA modification and the recruitment of the processing machinery. To approach this problem, Claudio did an IP-mass spec of m6A-modified RNA and found a very good candidate in the form of the ubiquitous RNA-binding protein HNRNPA2B1. What was especially important was that the RGAC motif targeted by METTL3 actually has similarities to the HNRNPA2B1 binding sequence. In fact, the RGAC motif is very much enriched among the binding sites of HNRNPA2B1. Now the question is whether methylating these sequences can impact HNRNPA2B1 binding. In other words, are there sites where modifying the A to m6A will increase affinity to HNRNPA2B1? In a series of experiments (both high- and low-throughput) we showed that this is in fact the case. In general, we observed a broad functional entanglement between METTL3 and HNRNPA2B1. These results were recently published in Cell, and I invite everyone to read this paper.

HNRNPA2B1 Is a Mediator of m6A-Dependent Nuclear RNA Processing Events
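For readers who want to play with the motif side of this story, here is a minimal sketch of how one might scan sequences for RGAC sites. It is not the analysis from the paper: the function names and the toy sequences are made up for illustration, and the only fact it relies on is that R is the IUPAC code for a purine (A or G), so RGAC expands to AGAC or GGAC.

```python
import re

# R is the IUPAC purine code (A or G), so RGAC means AGAC or GGAC.
RGAC = re.compile(r"[AG]GAC")


def rgac_positions(seq):
    """Return the 0-based start position of every RGAC site in a sequence."""
    return [m.start() for m in RGAC.finditer(seq.upper())]


def fraction_with_motif(seqs):
    """Fraction of sequences that carry at least one RGAC site."""
    hits = sum(1 for s in seqs if RGAC.search(s.upper()))
    return hits / len(seqs)


if __name__ == "__main__":
    # Toy sequences, invented for this example (not real binding sites).
    bound = ["AAGGACUU", "CAGACGGG", "UUGGACUAGACU"]
    background = ["UUUUUUUU", "GGGCCC", "AAGGACUU", "CCCCAAAA"]
    print("bound:", fraction_with_motif(bound))
    print("background:", fraction_with_motif(background))
```

Comparing the two fractions only hints at enrichment; a real analysis would need a properly matched background set and a statistical test.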


My thoughts? I think this is just the beginning. There are two points to consider: (i) HNRNPA2B1 is not the only reader of m6A and (ii) m6A is not the only RNA modification. Together, I think these studies and those of other groups on m6A (and RNA modification in general) suggest the birth of a new field of research with broad functional consequences for gene expression regulation.

How to run kallisto on NCBI SRA RNA-Seq data for differential expression using the mac terminal

Very useful post on Kallisto.

Andrew T McKenzie

Attention Conservation Notice: This post explains how to run the exceptionally fast RNA-seq k-mer aligner kallisto from the Pachter lab on data you download from NCBI’s Short Read Archive, and then analyze it for differential expression using voom/limma. As with everything in bioinformatics, this will likely be obsolete in months, if not weeks.

Kallisto is really fast and non-memory intensive, without sacrificing accuracy (at least, according to their paper), and therefore has the potential to make your life a lot easier when it comes to analyzing RNA-seq data.

As a test data set, I used the very useful SRA DNA Nexus to search the SRA database for a transcriptome study from human samples with 2-3 biological replicates in 2 groups, so that I could achieve more robust differential expression calls without having to download too much data.

I ended up using SRP006900, which is titled “RNA-seq of two…

View original post 513 more words
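The excerpt stops before any commands, so here is a rough sketch of the kind of pipeline it describes, written as a small Python helper that only builds the command lines (it does not run them). It is not taken from the original post. It assumes paired-end runs, that `fastq-dump` (SRA Toolkit) and `kallisto` are installed, and that you already have a transcriptome FASTA; the run accessions, file names and bootstrap count below are hypothetical placeholders.

```python
import shlex


def kallisto_commands(run_ids, transcriptome_fasta,
                      index_path="transcripts.idx", bootstraps=100):
    """Build (but do not execute) the commands for a paired-end kallisto run."""
    # Build the k-mer index once from the transcriptome FASTA.
    cmds = [["kallisto", "index", "-i", index_path, transcriptome_fasta]]
    for run in run_ids:
        # Download the run and split paired reads into <run>_1/_2.fastq.
        cmds.append(["fastq-dump", "--split-files", run])
        # Quantify transcript abundances into a per-run output directory.
        cmds.append(["kallisto", "quant", "-i", index_path,
                     "-o", f"{run}_kallisto", "-b", str(bootstraps),
                     f"{run}_1.fastq", f"{run}_2.fastq"])
    return cmds


if __name__ == "__main__":
    # Hypothetical accessions and FASTA name, for illustration only.
    for cmd in kallisto_commands(["SRR0000001", "SRR0000002"],
                                 "transcriptome.fa"):
        print(shlex.join(cmd))
```

Single-end runs need different `kallisto quant` options, and the differential expression step (voom/limma) happens afterwards in R, so neither is covered here.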

RNA Structurome (part II)

Recently, two great papers came out in Nature that I urge you to read:

1. The icSHAPE method, developed by Howard Chang’s group at Stanford, which I think is the gold standard for in vivo RNA structurome data.

2. The hiCLIP method that provides a principled approach for looking at interactions with dsRNA binding proteins. For example, this would have been very helpful for our work on the dsRBP TARBP2.

In general, I’m quite excited by the technologies that are now available for probing the structural component of RNA and for really gauging its importance in RNA-mediated interactions and RNA biology in general.

The two cultures of mathematics and biology

An excellent take on interdisciplinary research at the intersection of math and biology!

Bits of DNA

I’m a (50%) professor of mathematics and (50%) professor of molecular & cell biology at UC Berkeley. There have been plenty of days when I have spent the working hours with biologists and then gone off at night with some mathematicians. I mean that literally. I have had, of course, intimate friends among both biologists and mathematicians. I think it is through living among these groups and much more, I think, through moving regularly from one to the other and back again that I have become occupied with the problem that I’ve christened to myself as the ‘two cultures’. For constantly I feel that I am moving among two groups- comparable in intelligence, identical in race, not grossly different in social origin, earning about the same incomes, who have almost ceased to communicate at all, who in intellectual, moral and psychological climate have so little in common that instead of crossing the campus…

View original post 5,989 more words

Confessions of a bad writer

“One day I will find the right words, and they will be simple.” –Jack Kerouac

I am a bad writer! And a piece by Steven Pinker in The Chronicle of Higher Education made me feel awful about it. I recommend that you read it too… and I’m sure you will feel self-conscious about your writing as well. I even went back and checked some of my papers. The examples cited by Steven Pinker are aptly chosen, and I’ve come across plenty of such convoluted verbiage myself in my day-to-day reading. For example:

The methods section of an experimental paper explains, “Participants read assertions whose veracity was either affirmed or denied by the subsequent presentation of an assessment word.” After some detective work, I determined that it meant, “Participants read sentences, each followed by the word true or false.” The original academese was not as concise, accurate, or scientific as the plain English translation.

To be fair, I know scientists who write with an enviable clarity. My PhD advisor, Saeed Tavazoie, is one of those academics. The ease with which he finds the right adjectives and adverbs, even in the context of a casual conversation about his scientific ideas, has always baffled me. More importantly, I believe that the ability to deconstruct a complex notion into parts that can be simply conveyed is a measure of true understanding of scientific matter. Unfortunately, I’m not one of those scientists… at least not yet.


Your writing is bad, and you should feel bad… Dr. Zoidberg is judging you!

There is one point, however, that I think should be discussed. Scientists publish two kinds of scientific material: on the one hand, we write papers usually aimed at scientists in our own field (or even sub-field and sub-sub-field); on the other hand, we write blog posts, perspectives, insights, reviews, etc. The latter are meant for a broad audience and should be amply clear to everyone, with or without expertise. There are no arguments there. However, when it comes to papers, I’m not 100% sure that writing for a broad audience should be our main goal… I don’t think it should even be on the list at all. I’m not talking about clarity; rather, I’m talking about assuming a broad readership. For example, Steven Pinker writes:

“A considerate writer will also cultivate the habit of adding a few words of explanation to common technical terms, as in “Arabidopsis, a flowering mustard plant,” rather than the bare “Arabidopsis” (which I’ve seen in many science papers).”

My argument is that writing a paper for a broad audience may create a false sense of comprehension. For example, if someone doesn’t know what Arabidopsis is, they should not be reading the text in the first place. Words like significant, informative, association, sufficient, causal and … have specific scientific and mathematical meanings that are distinctly different from their everyday use. Not knowing the scientific context in which these words are used will clearly change our understanding of the text. While academics should absolutely engage in popular science writing, I don’t think scientific papers are the right medium for this endeavor. We as scientists may even want to use the language of the abstract to set a bar for who should be reading the paper in the first place. Public media is rife with examples where non-experts feel they have understood the subject matter in cases where they clearly have not. How many times have we discussed the misuse of “correlation” versus “causation” by the media originating from poor understanding of a published paper?

Again, I’m not talking about convoluted writing here, but rather technical writing. For example, as part of the qualification exams for my PhD candidacy, I was assigned a paper from Bernhard Palsson’s group. The abstract has phrases like “constraints-based in silico models have been used to calculate optimal growth rates” or “incorrect predictions of in silico models based on optimal performance criteria”. I was completely baffled by the text and I couldn’t even begin to understand the problem the authors were trying to address (you can try this yourself by reading the abstract). It was only after reading many MANY papers from this field that I began to understand the language and, with that, the science behind it. In other words, comprehending the language serves as a gauge of the ability to understand the science behind it with minimal risk of unintended misunderstandings. As JRR Tolkien wrote at the Doors of Durin, you should be able to speak “friend” to enter… scientific papers, akin to the Mines of Moria, are dangerous territories.

Letters from the trenches of war on cancer (Part I)

As I get older, cancer becomes more than a scientific curiosity and morphs into a harsher reality. As our parents start to get worried about every mole and lump, we also accompany them through the ensuing emotional roller coaster. Working close to a hospital does not help either… while the tumor samples you see every day are assigned random numbers, it is quite impossible not to see the human suffering behind every biopsy. While I still firmly and deeply believe that ultimately it is basic research that can revolutionize health and medicine, I can also sense the urgency of now and the need to act on that front. It is this dichotomy that has shaped my research for the past few years, the fruits of which are finding their way into the annals of science.

It is not news to anyone that I study the biology and regulation of RNA (see the two previous posts on this very blog: here and here). I have specifically focused on developing computational and experimental frameworks that help reveal the identity of post-transcriptional regulatory programs and their underlying molecular mechanisms. Towards the end of my tenure as a graduate student, building upon the work by talented postdocs in the Tavazoie lab at Princeton University (namely Olivier and Noam, who published their work back in 2008) and with the help of my genius friend Hamed, we developed, benchmarked and validated a computational method named TEISER that extends motif-finding algorithms into the world of RNA by taking into account the local secondary structure of RNA molecules as well as their sequence.

When I started out as a postdoc, my goal was to study post-transcriptional regulation using cancer metastasis as a model. In addition to its clinical impact, studying metastasis also has the added benefit of access to a large compendium of high-quality datasets as well as rigorous in vivo and in vitro models for downstream validation of interesting findings.

When it comes to tumorigenesis in general, there is a large body of work focusing on the role of transcriptional regulation, specifically transcription factors as suppressors and promoters of oncogenesis. However, other aspects of the RNA life cycle are substantially understudied. The success of our lab and many others in revealing novel and uncharacterized regulatory networks based on the action of various miRNAs in driving or suppressing metastasis highlights the possibility that heretofore uncharacterized post-transcriptional regulatory programs may play instrumental roles in tumorigenesis.

Given the success of miRNA regulation and my previous work on RNA stability, performing differential transcript stability measurements between highly metastatic cells and their poorly metastatic parental populations seemed like a logical step. Using thiouridine pulse-chase labeling and capture followed by high-throughput RNA-seq, we estimated decay rates for every detectable transcript (~13000 transcripts total). It was around this dataset that we built an ambitious study, pushing ourselves to dig deeper at every step. We generated, analyzed, and interpreted heaps of data of various kinds: in silico, in vitro, and in vivo. The result of this study was the discovery of a novel post-transcriptional regulatory program that promotes breast cancer metastasis. Our results were recently published in Nature; however, I also gained insights that could not be included in a 4-page paper. As such, in the upcoming posts, I’ll try and expand on various aspects of this study that I found fascinating. Stay tuned…
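As a small aside on what “estimating a decay rate” means in practice, here is a minimal sketch. It is not the pipeline used in the paper; it simply assumes first-order decay, N(t) = N0·exp(−k·t), fits a straight line to ln(abundance) versus time by ordinary least squares, and reports the half-life as ln(2)/k. The function name, time points and abundances are hypothetical.

```python
import math


def fit_decay_rate(times, abundances):
    """Fit ln(abundance) = ln(N0) - k*t by least squares.

    Returns (k, half_life), assuming simple first-order decay.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    # Slope of the regression line is -k.
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    k = -sxy / sxx
    return k, math.log(2) / k


if __name__ == "__main__":
    # Synthetic chase time points (hours) generated with k = 0.25 per hour.
    times = [0, 2, 4, 8]
    abundances = [100 * math.exp(-0.25 * t) for t in times]
    k, half_life = fit_decay_rate(times, abundances)
    print(f"k = {k:.3f} per hour, half-life = {half_life:.2f} hours")
```

With real data the points scatter around the line, so replicate measurements and a check of the goodness of fit matter far more than they do in this noise-free toy case.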

RNA Structurome

The weekly or monthly updates that appear in my e-mail account from various journals that I have subscribed to serve as a reminder that every single day we are expanding our knowledge and adding to the repertoire of scientific conquest. Sometimes reading these papers, however, is a chore… Not every paper is well-structured, not every project deserves the attention that it receives, and not every study stands the test of time. Every now and then however, I read papers that leave a profound mark on how I view biological systems. These studies are not necessarily large-scale or even complex but the mere act of reading them changes my way of thinking. The transformation may be nuanced or not even noticeable, but the effects will remain… for a while. If pressed, each scientist may come up with a unique collection of such publications–what we find exciting is ultimately a subjective matter–but I think we all, to some extent, can appreciate the underlying attraction.

The late January issue of Nature carried a few papers of this type for me. Rouskin et al. and Ding et al. reported the use of DMS (dimethyl sulfate)-based modification of exposed ribonucleotide bases coupled with high-throughput sequencing to provide a snapshot of RNA structural preferences in vivo (in yeast, mammalian cells, and Arabidopsis). Despite the need to overcome certain technical hurdles, the methods themselves are logical extensions of the methods that were published previously for low-throughput and in vitro RNA structure determination. What I found intriguing, however, was how Rouskin et al. turned their observations into an actionable hypothesis. Given the nature of the data they had gathered, this paper could have easily turned into a descriptive publication. But the authors took a step further and put forth a hypothesis that best explained the major trends in their data. I am confident it would have been easier for them not to do so… I am also confident that because of this hypothesis, they had a harder time convincing the reviewers than they would’ve otherwise. But they clearly didn’t shy away from going where the data had taken them and they should be applauded for doing so. They put this hypothesis front and center; early on in their paper they state:

“Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermo-stable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells.”

For me, it all comes down to the phrase: “the importance of cellular processes in regulating RNA structure.” We have read about numerous examples where RNA structures act as cis-acting factors in RNA biology; however, thinking of RNA structure itself as an intermediate target of regulatory programs on a whole-transcriptome level is very intriguing. I always suspected as much, but reading this sentence just toggled a switch in my head–in a good way.



DMS signal in RPL33A mRNA shows a region that is unstructured in vivo but forms a stable structure in vitro (Rouskin et al, 2014).

Based on their own DMS-seq data, Ding et al. similarly report:

“…mRNAs of cold and metal ion stress-response genes folded significantly differently in vivo from their unconstrained in silico predictions (Fig. 4c, d and Extended Data Fig. 8a, b). Interestingly, these stresses are known to affect RNA structure and thermostability.”

This statement, despite being more descriptive, tells a similar story. And I think this is a very important hypothesis. Understanding RNA structure as a dynamic phenomenon in the cell with far-reaching regulatory consequences, and not just a byproduct of thermodynamics coded within the sequence, opens up a new field of research on the factors that affect RNA structure and their transcriptome-wide functional consequences.

I should also mention that in the same issue, a study by Howard Chang, Eran Segal and colleagues reported:

“Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS [–RNA secondary structure] information is encoded within RNA sequence.”

On the surface this statement contradicts those reported by the Weissman lab (Rouskin et al.). However, this latter study (Wan et al.) was using de-proteinized RNA and, as Rouskin et al. have clearly stated: “analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells.” So the observation made by Wan et al. is a consequence of the in vitro nature of their study. If it turns out that the differences between in vivo and in vitro RNA secondary structures are pervasive, as Rouskin et al. suggest, we need to rethink how much stock we’re willing to put into the descriptive studies that have reported on RNA structure using in vitro methods.


  1. Rouskin et al., 2014. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705.
  2. Ding et al., 2014. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700.

RNA rises

These are exciting times to be an RNA biologist. Next generation sequencing revolutionized genetics, but now the RNA methodologies have caught up. For every DNA technique, we have developed an equivalent RNA method and then some. For example, there are CLIP-seq and PAR-CLIP replacing ChIP-seq in RNA studies, but then there are also recently developed high-throughput methods for probing the secondary structure of RNA in vivo (Rouskin et al., 2013, Nature). Last year the first ever large-scale binding information for a compendium of RNA-binding proteins (RBPs) was published (Ray et al., 2013, Nature). The computational methods are also gaining, from SeqFold (Ouyang et al., 2013, Genome Res) to our TEISER (Goodarzi et al., 2012, Nature). Did I mention these are exciting times?!

It is in light of these advances that making sense of the underlying post-transcriptional regulatory networks that control different aspects of RNA life cycle and behavior has become ever more important. Five years ago, we embarked on a path to catalog the sequences in RNA that play substantial regulatory roles by providing linear or structural information for trans factors to recognize and act on. Given the state of technology at the time, we were limited by the diversity of the library we could generate. So, we decided to focus on 3′ UTR sequences that are conserved across vertebrates. We synthesized these sequences in short spans on a custom-designed Agilent array and cloned them downstream of mCherry in a bidirectional promoter which also drives the expression of GFP as an endogenous control. Our goal was to then use FACS to choose the sub-populations that show higher/lower relative expression of mCherry. We could then amplify the cloning site in the selected populations and re-hybridize them back to our Agilent array for quantification (Figure below). It was all good on paper, but as is always the case, we ran into myriad technical problems, ranging from generating a library with enough independent cells (high coverage) to obtaining reproducible FACS measurements. By the time we were done troubleshooting these problems, a lot had changed in the field. For example, sequencing had really become the staple of RNA biology (which we decided to use instead of array hybridization for quantification purposes), Agilent had started to provide custom oligo libraries directly to consumers (which means that this approach can easily be implemented in every lab) and, more importantly, the Flp-In system (Invitrogen) appeared, which significantly improved the reproducibility of our measurements (since all clones in the library are inserted in a unique site in the genome).

As is always the case with method development, we needed to perform innumerable validation assays to evaluate the efficacy of our approach in finding known and novel regulatory elements. Our findings were published last week in Cell Reports (Oikonomou et al., 2014), which I encourage you to read. Interestingly, David Erle’s group also published a similar approach, which beat our paper by a few days (Zhao et al., 2014, Nature Biotech).

These reporter-based approaches insulate each element and study its effect in isolation; however, real transcripts carry many elements and the fate of the RNA is decided as a cumulative consequence of all the interacting factors. Knowing the initial building blocks, however, enables us to then construct networks and modules of regulatory elements that likely interact and function in an overlapping space (which we tried to infer in our paper using our information-theoretic tools).
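To give a flavor of what an information-theoretic measure looks like here, below is a minimal sketch that computes the mutual information (in bits) between the presence of an element and a discretized expression bin. This is an illustration only, not the code from our paper; mutual information is my choice of example measure, and the function name and toy labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) * p(y)) written in terms of raw counts.
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Toy data: 1 = transcript carries the element, 0 = it does not.
    presence = [1, 1, 0, 0]
    informative_bins = ["high", "high", "low", "low"]
    uninformative_bins = ["high", "low", "high", "low"]
    print(mutual_information(presence, informative_bins))
    print(mutual_information(presence, uninformative_bins))
```

In a real setting the raw value means little on its own; one would typically compare it against values from shuffled labels to judge whether an association is stronger than expected by chance.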


Systematic dissection of conserved 3′ UTR sequences in endogenous transcripts

In the end, I wanted to mention that the downside to all the current attention in the RNA field seems to be a fast-paced publication cycle which results in mostly descriptive papers. There is nothing wrong with descriptive studies per se, but sometimes the downstream or underlying mechanisms are very much missing. I think we are also guilty of this to some extent. Our goal was really to identify novel trans factors that interact with the elements we identified using our approach. This is something we are still trying to do, and hopefully we will manage to better functionally annotate the cis elements and the molecular mechanisms through which they exert their regulatory roles.

Should you do another postdoc?

A good read…

zinemin's random thoughts

Here is yet another article about the negative effect that having to do 2-3 postdocs before being able to apply for tenure-track positions has on people’s life, which has been widely shared and discussed on my facebook.

I left academia after 2 postdocs only 3 months ago, but already now I feel a greater clarity is coming over me regarding the topic. People are forever questioning themselves whether the postdoc lifestyle is still worth it for them or whether they should leave. I have asked myself the same question for 6 years almost every day.

Now I think the situation is much, much simpler than I thought.

In truth, there are only two reasons why you should not quit your postdoc tomorrow and find another job:

(i) You like living abroad and having to change country every 2-3 years. You think this lifestyle has clear advantages to living in one…

View original post 878 more words