Genomics euphoria: ramblings of a scientist on genetics, genomics and the meaning of life

War on cancer: Notes from the frontlines

Last month, I was fortunate enough to attend the tumor heterogeneity and plasticity symposium. The conference, which was jointly organized by CNIO and Nature, was held in Madrid. I thought it would be a good idea to write some notes on what I heard there.


  1. There were two keynote speakers, Kornelia Polyak from Dana-Farber and José Baselga from across the street from us at MSKCC. José’s talk was very much clinical, as he enumerated the MANY trials that they are conducting. Kornelia’s talk, on the other hand, was more basic-research-y (is that a word? if not, it should be…). I really cannot distill a whole keynote into a few sentences, but the bottom line was: (i) diversity is bad; (ii) we can develop rational experimental approaches to find cancer drivers that are sub-clonal; (iii) sub-clonality brings forth the idea of growth-promoters vs. competitors. All in all, a very good and complex talk… Maybe one of the few talks at the conference with significant mechanistic and functional observations.
  2. The field, given how young it is, is very descriptive. It largely involves researchers making cool observations… don’t get me wrong, I don’t mean it in a derogatory sense; what I am trying to say is that it would probably take years before we can make sense of many of the observations.
  3. On the sequencing side, Elaine Mardis talked about very deep whole-genome sequencing of matched primary and metastatic tumors, followed by rigorous validations of each genetic variation. She talked about tumor lineage and pressures exerted by therapeutic manipulations and how they shape the heterogeneity of the metastatic sites. One interesting thing that she mentioned was that very few single-nucleotide variations are actually expressed (I think she said something like 44%).
  4. Sean Morrison gave a very rigorous presentation on heterogeneity and metastasis. He had used genomic tools plus xenografting in NSG mice to study melanoma.
  5. Dana Pe’er talked about analyzing mass-cytometry (CyTOF) data using viSNE. CyTOF follows the same logic as FACS, with the main difference that instead of fluorophores, other elements with distinct and sharp mass-spec peaks are conjugated to antibodies. About 40 markers can be measured simultaneously for each cell (compared to 7-8 for FACS). However, making sense of a 40-dimensional dataset is not straightforward, which is where viSNE comes into play. This tool, however, can be used for other dimensionality-reduction problems as well. Think of it as non-linear PCA…
  6. Charles Swanton spoke about spatial heterogeneity in tumors where multiple biopsies from the same tumor were sequenced. The level of heterogeneity was scary… For example, we find driver mutations based on their clonality and we aim to target them therapeutically, but there were sub-populations in the tumor that had already lost the driver mutations even in the absence of therapeutic selection pressure.
  7. A number of speakers touched on this idea that the resistant population is already present in the primary tumor and does not necessarily arise after treatment.

All in all, I met some new people and listened to some amazing talks…

In situ genotyping

We have glimpsed the future of genetics and it is colorful

Two papers showed up in the last issue of Nature Methods that caught my attention: namely Levesque et al. of UPenn and Ke et al. from Stockholm University. I’m generally biased towards methods papers and I find them WAY more interesting than your run-of-the-mill biology papers. But why these two papers? I have a conviction… I say conviction because until now there was very little to back it up: the future of biology rests not just in understanding populations of cells and observing their “average” phenotypes; rather, a single-cell approach is required to understand all the complex dynamics upon which biological systems thrive. First came single-cell sequencing… then came single-cell barcoded RNA-seq… and now single-cell in situ genotyping/sequencing is here.

There is a magic to watching cells shine under a microscope… there is a feeling of vindication… of assurance. But it is not just that. A cell is already an “average”; an average of paternal and maternal genetic information. And FISH-based methods enable us to distinguish between them. More importantly, in addition to quantity (which to be fair is not its strongest suit), we get a feel for localization in different compartments.

Granted, I’ve never done a FISH experiment. So I really don’t know how much of this is real and how much of it is cherry-picking. But I’m sure if we can get there, it will be illuminating. And I don’t mean that we should drop the population for single cells… I think we’ll be in for a surprise. I think we’ll understand, at last, how stupid a single cell is. We’ll know that individual cells make a lot of mistakes and it is the averaging effect of a population that results in a coherent behavior. In other words, I’m not interested in studying single cells per se, but rather I want to know about them so that I can have smarter models of a population. The same way a thought forms in our brains despite many erratic firings of individual neurons, I want to know how stable phenotypes emerge in spite of, not because of, single-cell variations. Only then will we know which pathways are consistently crucial for a single cell and which ones are meaningful only in the context of a population.

Genophoria returns

As is probably obvious to the two people who follow my blog, I’ve been on a sort of hiatus. I’ve been nursing a torn TFCC with significant debilitating effects on my ability to work/write. Which reminds me… watch out for that pesky lid on your liquid N2 storage units (you think they’ll stay up, but sometimes they won’t). But as my range of motion and quality of life improve, I’m planning my way back to writing about things that I find interesting at the forefront of science.

A lot has happened since I was gone:

  1. CRISPRs: I wrote about the CRISPR/Cas system a while back and now they’re everywhere… with a vengeance! Multiplexed genome targeting of various genomes (including stem cells), guide-RNA mediated gene silencing, and other exciting tools. There are even companies starting to sell these products, as is evident from my targeted ads on various journal home-pages. I’m sure this is just the beginning… a lot more is coming.
  2. RNA elements: At last the first compendium of sequences bound by various RNA-binding proteins was published. It’s in-vitro and limited but still… it’s a huge deal! I’m even more excited because my dear friend Hamed was the fifth co-first author on that paper (yes… there is such a thing as a 5th co-first author).
  3. Cancer research: there is no shortage of papers (great or otherwise) published on the subject of cancer/cancer metastasis. But it’s not every day that our lab is involved in major steps forward. Well… it may be every day but people have to actually wait until they come out.
  4. And many other things: I had flagged many papers to write about but to be honest a lot of them have lost their “it” factor at this point. I’ll write about some of them in an up-coming post very soon.

Reviewing the “peers”

A while back, with some not too complex detective work on the part of USA Today, we got our hands on the reviewer comments from the infamous #arseniclife paper. Looking at the reviews, I agreed with the experts, including Leonid Kruglyak, who “reviewed the reviews”, that they generally looked normal… yes, they were quite positive at times but let’s face it, some scientists are nice and some aren’t and by chance alone, you might have a group of three that use encouraging words even if there are criticisms. But since then, I’ve seen opinions written out there that are not too kind to these reviewers… good thing they’re anonymous, or we would be looking for our pitchforks now! Even Ash Jogalekar over at “The Curious Wavefunction”, whose posts I often find refreshing, called the reviewers out on their subpar job.

But what do “I” think about this whole fiasco? I think hindsight is 20/20… It all comes down to taking the authors at their word that there was no phosphorus in the media they used… everything else follows… But as it turns out, that is very difficult to achieve. I didn’t know that, and if I had been a reviewer, I wouldn’t have brought it up either. I probably would’ve asked for specific experiments but they would be based on my background (and could be claimed to be “outside the scope”). And that is all it comes down to, isn’t it? Our backgrounds… and the reviewers are in fact chosen from different areas. For a paper like this, you will probably get a bacteriologist and maybe someone working on Archaea, while a chemist might not even be on the short list (let alone one with the relevant knowledge). At the end of the day, I don’t think that as researchers, we want reviewers to be too ambitious… we have all got bad reviews that we think are nonsensical. I actually think the reviewers should just make sure that, given the facts and their knowledge, the results are not seriously flawed. And we all know that a published paper is not exactly the word of God and it can be easily refuted/corrected/expanded upon.

So, if we all know these things, then what is the source of the outrage? I think there are a number of points:

  1. This is Science we are talking about here… we like to think that the impact factor of the journal has a strong correlation to the strength of its underlying science. We all toil away for years hoping to get a paper in Science and it breaks our heart to see a flawed paper like this appear on its pages. I agree that this paper, in retrospect, is outrageous but this is not the first time a bad paper has been published in Science or Nature or any other journal.
  2. This process shows everything that’s wrong in research nowadays… holding press conferences like a true salesman, relying on seemingly random reviews to accept a paper and the disproportionate value attached to papers in big-name journals.

Are there solutions to these problems? Of course there are… pretty old ones actually. Switching to the arXiv model instead of the journal model makes a lot of sense… in that model, studies are presented on equal footing and it is on their own merit that they gain traction. Can we ever do that? Not in the near future… biology is expensive and researchers need decent publications for every grant cycle. We simply don’t have the luxury of time to wait for our papers to climb the citation ladder (that is assuming citation is even a good measure of merit). The current journal structure enables us to “assign” a “value” to a paper based on a couple of samplings, even before the paper is published. Is it a flawed process? Of course. Is it at times unfair? Absolutely. But let’s realize what the problem is: just too many people, not enough money. Let’s not heap all that on the shoulders of three reviewers…

The rise of circular RNAs: a whole swath of circular Spongebobs

Recently, we’ve been bombarded by high-profile studies about a class of RNAs called circular RNAs (circRNAs). Resulting from non-canonical splicing events (see below), circRNAs seem to be more prevalent than previously thought. They have been identified in mammals, plants and even archaea.

The formation and identification of circRNAs


The recent papers in Nature (Memczak et al. and Hansen et al.) argue for a broad, even tissue-specific, functionality for this type of RNA. Memczak et al. report a comprehensive atlas of thousands of circRNAs in various organisms through a computational approach, to which they assign an impressive 75% sensitivity and a very low false-discovery rate.

circRNA statistics according to Memczak et al.


The high stability of these RNAs, according to these authors, puts them in a perfect position to function as post-transcriptional regulators by sponging other regulatory trans factors. They focused on miRNA sites to find circRNAs that show a higher than expected occurrence of these elements. And they in fact find circRNAs that can bind and trap miR-7-loaded RISC, results that are corroborated in other recent papers.
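To make the “higher than expected occurrence” idea concrete, here is a minimal sketch of how one might count miRNA seed sites in a circRNA and compare the count to a naive expectation. This is not the pipeline used in either paper: the function names, the 7-mer seed definition (miRNA positions 2 to 8), the uniform-base background and the toy sequences are all my own assumptions for illustration.

```python
def seed_site(mirna, start=2, end=8):
    """Target site (DNA alphabet) complementary to miRNA positions start..end (1-based).

    Assumption: a simple 7-mer seed match; real target prediction is more nuanced.
    """
    seed = mirna.upper().replace("U", "T")[start - 1:end]
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seed))


def count_sites(seq, site, circular=False):
    """Count (possibly overlapping) occurrences of `site` in `seq`.

    With circular=True the sequence is wrapped around, so that sites
    spanning the head-to-tail junction are counted as well.
    """
    seq = seq.upper().replace("U", "T")
    if circular:
        seq = seq + seq[:len(site) - 1]
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def expected_sites(seq_len, site_len):
    """Expected count if all four bases were equally likely and independent.

    Assumption: a deliberately crude background model, for a linear sequence.
    """
    return max(seq_len - site_len + 1, 0) / 4 ** site_len


# Toy example with a made-up miRNA and a made-up transcript (both hypothetical).
toy_mirna = "UACGGAUCCAUGGCAUUAGCUA"
toy_transcript = "AAGATCCGTAAGATCCGTAA"
site = seed_site(toy_mirna)
observed = count_sites(toy_transcript, site)
expected = expected_sites(len(toy_transcript), len(site))
print(site, observed, expected)
```

A circRNA that really acts as a sponge should show an observed count far above the expectation; a proper analysis would of course use a background that respects dinucleotide composition and correct for testing many miRNAs.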

Personally, I find sponging a very low-complexity function… meaning, it arises after the fact, with the cell taking advantage of non-coding RNAs that are already available. This means either that circRNAs first arose as aberrant splicing events (i.e. mistakes in donor-acceptor identification), or that they or their splicing partners play other, more complex roles that we should be able to identify soon.


Memczak S, et al. (2013). Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. doi:10.1038/nature11928

Hansen TB, et al. (2013). Natural RNA circles function as efficient microRNA sponges. Nature. doi:10.1038/nature11993

Genome editing via the CRISPR system: the triumph of basic research

I wanted to write a quick note about the use of the CRISPR/Cas system in gene editing. Several labs in parallel have developed a CRISPR system (Clustered Regularly Interspaced Short Palindromic Repeats) for gene editing purposes in a variety of organisms from zebrafish to humans. The CRISPR/Cas systems and their function as an immunity-type response in bacteria are very exciting in their own right and I encourage you to read about them, if you haven’t (e.g. see this paper in Science from 2010 or simply visit Wikipedia).

In short, this system records foreign DNA and uses an RNA intermediate (crRNA) to target other encounters of the same invasive DNA species via specific nucleases (i.e. Cas). But how unbelievably “cool” this system is aside, recently it has been adopted for gene editing. In this context, the targeting part of the crRNA is replaced with a sequence matching the target (the sequence that we want to alter in the genome). The activity of the Cas/crRNA complex, if properly expressed, then results in double-stranded breaks at the site of interest. The cell then uses the end-joining repair system to fix the break; however, the error-prone nature of this mechanism often results in small deletions (or insertions) at the site of action. Now if the target sequence was selected from an active gene, this mechanism would effectively mutate the gene into an inactive copy.
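For those who like to see things in code, here is a minimal sketch of how one could scan a DNA sequence for candidate target sites. It assumes the commonly used Cas9 from S. pyogenes, which needs an “NGG” motif (the PAM) immediately after a roughly 20-nt target; neither the PAM requirement nor the function names below come from the text above, so treat them as my assumptions. It also ignores everything that matters in practice, such as off-target matches elsewhere in the genome.

```python
import re


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_target_sites(seq, guide_len=20):
    """List candidate sites as (strand, start, target, pam).

    A site is `guide_len` unambiguous bases immediately followed by NGG.
    For the "-" strand, `start` is a 0-based offset into the reverse
    complement of `seq`, not into `seq` itself.
    """
    seq = seq.upper()
    # The lookahead makes overlapping candidates show up as separate hits.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


# Toy input (hypothetical sequence): 20 A's followed by a TGG PAM.
for site in find_target_sites("A" * 20 + "TGG"):
    print(site)
```

Each reported target is what you would put into the guide in place of the original crRNA spacer; picking among candidates is where the real design work starts.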

CAS system structure


Even better, by modifying the Cas enzymes, we can limit the nuclease activity to a single nick in the DNA, as opposed to double-stranded breaks. In this case, the cell employs homologous recombination to repair the nick, and if we provide a mutated homologous sequence in trans, the system may use it as a template to correct the nicked site and in effect transfer the mutation to the genome with surgical precision.

Not only is all of this very exciting, it is also a prime example of how important basic research is. Just imagine the first grant written for studying the CRISPRs, and I’m paraphrasing here, “ahem…, there are these repetitive sequences in bacteria and some obscure archaea… we have no idea what they do, but we kind of wanna know… so fund us maybe?”. Pursuing this simple curiosity, however, has resulted in a promising method for genome editing in humans and is poised to transform how we do genetics (mainly due to its low cost of implementation). This is a very good example of where targeted funding of “translational research” fails. Ultimately, leaps in life sciences (and science in general) come from systems that we don’t even know exist. The same goes for other amazing tools that have become mainstays of molecular biology. Similarly, I assume the proposal for studying fluorescent proteins went something like this: “well… we have this cool organism that glows in the dark. We kind of wanna know why. Will it cure cancer? probably not…”. But all kidding aside, these are all very good reminders of how important basic research is to our collective knowledge.

Sources: Cong et al. (2013). Multiplex Genome Engineering Using CRISPR/Cas Systems. Science. doi:10.1126/science.1231143, among others.

The triumph of mathematics (or how Nate Silver got drunk)

“Drunk Nate Silver stumbles through traffic on the Jersey Turnpike, screaming out what time each driver will get home.” @davelevitan

I know… I am late to the game… let’s chalk it up to a very busy schedule in the lab. But I want to write about the elections (cue eyes rolling).

I arrived in the US in 2006, so I was fortunate enough to witness the Obamania that swept this nation in 2008. I was quite fascinated with the dynamism of the elections and I was watching it VERY closely. That was the first time I came across a blog started by a sports statistician named Nate Silver. His simple yet elegant model correctly predicted the election outcome in 49 out of 50 states. Despite his rise, the one-sided 2008 election was not a very good test of his model. In 2012, however, everyone believed the race to be a very close one. While pundits called the race a virtual toss-up, Nate Silver (and other statisticians/bloggers like him, including Sam Wang of the Princeton Election Consortium) was assigning very high chances of winning to President Obama throughout the campaign season. This made Nate Silver a punching bag for the TV hosts, and the punditry in general, in the run-up to the elections… however, the accuracy of his statistical model proved to be quite impressive (it called all 50 states and all but one of the senate races). This made him the true winner of the elections… with his book becoming a best-seller and #drunknatesilver becoming a popular hashtag on Twitter (which is where I got the quote at the beginning of this post).

Nate Silver’s model is fairly simple and is very similar to the models used by other poll aggregators (who predicted pretty much the same outcomes). I think anyone with an adequate knowledge of statistics would have come up with a comparable model. I really don’t want to talk about the model or why it worked so well (which I don’t think is very surprising to any scientist). What caught my attention, however, was the extent to which people were shocked by the efficacy of these statistical models. This, I think, clearly indicates that people underestimate science and its ability to deliver. I think, as scientists, we should be worried about this. Why this is the case, I really don’t know… is it the successful war on science? Is it the botched PR dramas by fraudulent scientists? Is it going head-to-head with religion and losing? I don’t know… what I do know is that Nate Silver is not an extraordinary researcher/mathematician. He has a job and he does it well, but what he’s doing is not groundbreaking. Nevertheless, in this election, science squared off against ideology and won a decisive victory. And we should take this as an opportunity and build upon it. How? I am again not sure… I just know that this opportunity should not be wasted.
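Just to show how little machinery the basic idea needs, here is a minimal sketch of a poll aggregator for a single state. To be clear, this is NOT Nate Silver’s model (which I haven’t seen the inside of); it is the textbook version, and everything in it is my assumption: polls are pooled by sample size, the sampling error of a two-candidate margin is approximated as 100/sqrt(n) points, and an arbitrary extra term stands in for systematic polling bias. The poll numbers are made up.

```python
import math


def aggregate_polls(polls):
    """Pool polls given as (margin_in_points, sample_size) pairs.

    Returns (weighted_mean_margin, sampling_std_error), weighting each
    poll by its sample size. Assumes a near 50/50 two-candidate race,
    where the standard error of the margin is roughly 100 / sqrt(n).
    """
    total_n = sum(n for _, n in polls)
    mean_margin = sum(margin * n for margin, n in polls) / total_n
    return mean_margin, 100.0 / math.sqrt(total_n)


def win_probability(mean_margin, std_error, extra_sd=0.0):
    """P(true margin > 0) under a normal model.

    `extra_sd` is a hypothetical allowance (in points) for errors that
    all polls share, which pooling more polls does not shrink.
    """
    sd = math.sqrt(std_error ** 2 + extra_sd ** 2)
    z = mean_margin / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Made-up polls for one state: candidate leads by 2 and 4 points.
margin, se = aggregate_polls([(2.0, 1000), (4.0, 1000)])
print(margin, se, win_probability(margin, se))
```

Notice how a lead that looks like “a toss-up” in any single poll turns into a fairly confident call once a handful of polls are pooled. Going from states to the whole election would additionally require simulating all states together, with correlated errors, which this sketch leaves out.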

DrunkNateSilver from Gawker

Dave Levitan started a game on Twitter, #DrunkNateSilver: things Nate Silver might do/say when he’s drunk.

Shut up and take my money

Probably everyone knows that science funding is not doing ok in the US (or anywhere else for that matter). The grant application success rates have dipped below 15% or even 10% for larger grants. Scientists have been reduced to grant writers: a long and seemingly futile endeavor that is taking more and more away from research time. Basically, people are spending more and more time explaining what they want to do, and less and less time actually doing it. This is not the only problem… with low success rates, the funding process becomes conservative and less imaginative, and the word “feasible” transforms into an utterly subjective concept in the mind of the reviewer. Basically, as a young scientist, you need a proposal that is both conventional and innovative at the same time… which seems like a paradox. To be honest, scientists themselves are part of the problem… like any other faction, every scientist comes with biases, convictions and unfounded belief-systems that cloud his/her judgment. And as the number of grants per researcher shrinks, these biases become an important factor in ranking and scoring applications. The funding problem needs to be dealt with, and I think it will be dealt with in one form or another in the next 4-5 years (things simply cannot go on like this). But those who have the power to change anything have not felt the problem yet and, as in any other profession, the young and less-established investigators suffer the most.

Now Ethan Perlstein and his colleagues have come up with a short-term solution to fund their innovative ideas. They have started a project on RocketHub to crowdfund their research. I think this is a step in the right direction. At this point, they are halfway there (their goal is 25,000 dollars)… if you are reading this, head to their project, read their statement and consider fueling this study.

Shut up and take my money


Solving the directionality problem of RNA polymerase

Every now and then, a study appears that reminds us how little we know about some of the most basic subjects in molecular biology, while at the same time expanding the connotations associated with these seemingly simple mechanisms. A recent paper in Science by a multinational collaborative team was a perfect example of one such moment for me. The problem statement is relatively simple: how does RNA polymerase recognize the orientation of DNA; in other words, how does it know in which direction it should be heading? The answer, as I knew it, had two parts: (i) certain promoter elements are in and of themselves directional, meaning the transcription complex specifically recognizes one strand and not the other (the world-famous lac promoter is one such example); (ii) in cases where there is no directionality coded in the DNA or the epigenome, the polymerase in fact does go the wrong way, which produces the myriad anti-sense RNAs in the cell. Granted, there might be functionalities associated with these anti-sense RNAs; however, established examples are few and far between.

The more important observation, however, is the fact that there are genetic components to when the anti-sense RNA is transcribed and when it isn’t. The aforementioned study starts from one such mutant (ssu72) and goes on to dissect the mechanism through which Ssu72 establishes directionality of the RNA polymerase complex. The results are very simple and elegant: Ssu72 is a part of a bridging complex that demarcates the start and end of the gene, and consequently the correct direction for transcription (below you can see the figure from the main paper).

Ssu72-mediated loop formation


Now one might be wondering why not all promoters are directional at the sequence level. The short answer, I think, is “regulation”. There are a variety of molecular mechanisms through which promoter directionality can be used in gene regulation, both for the downstream gene as well as the upstream ones. For the immediate gene, losing half of the initiation complexes to the wrong direction ensures lower expression, a fraction that can very well be modulated (e.g. through regulating ssu72 in this example). And for the upstream genes (as well as the downstream one), the presence of anti-sense RNA spells some form of doom or desist.

Peri-translational regulation… yes, I just made that up

A few years back, both the ENCODE and FANTOM consortia reported a comprehensive polling of transcription start and termination sites of transcribed RNAs in both human and mouse. Our first realization was that most of the genome, it seemed, was being actively transcribed. A notable fraction of these transcripts, however, are not translated into proteins and are called non-coding RNAs (ncRNAs). There are further classifications (e.g. lncs and SINEs and…) but that is beside the point here. The discovery of ncRNAs started a race for a functional annotation of these molecules. After close to a decade, however, the published functional examples are few and far between. Now, either we over-estimated how relevant these ncRNAs are or they function in such a nuanced and complex manner that they evade our puny genetic and genomic methods.

Despite the scarcity of findings, one class of ncRNAs has proved very promising, namely the antisense RNAs. As their name implies, these molecules are usually transcribed from the opposite strand of known protein-coding RNAs. The complementarity between these antisense RNAs and their functional counterparts hints at a regulatory function mediated through direct interactions between these molecules. A paper published in Nature by an Italian group has functionally characterized an antisense RNA (anti-Uchl1) in impressive detail. Apparently, the export of anti-Uchl1 from the nucleus can be controlled effectively. Once in the cytoplasm, anti-Uchl1 then activates polysome formation and active translation. These data reveal another layer of gene expression control at the post-transcriptional level, which I hereby dub peri-translational regulation.

Interaction between anti-Uchl1 and Uchl1