I recently began working for a company that markets clinical tests and research assays based on next-generation sequencing (NGS). The company is big – a few thousand employees – and has extensive experience developing and marketing genetic tests. Just not NGS tests. So, they purchased an Illumina HiSeq 2000 and, a few months later, a MiSeq and an Ion Torrent PGM. And they hired me to help out on the bioinformatics side. Oddly enough, I didn’t have any experience with next-gen data, either.
This blog is meant to communicate a bit about what I’ve learned, the projects I’ve worked on, and some thoughts about the state of the industry. Since joining the company, a few things have become clear to me. First, next-gen sequencing (aka massively parallel sequencing) holds incredible power and promise to shape medicine, population genetics, and much more. Second, the field is full of misconceptions, mistakes, and bad choices that were made early on and are now very hard to undo. My hope is that by bringing some of these to light we can make better choices going forward.
A bit of background: I finished a PhD in biology in 2009, in which I studied mathematical population genetics. Most of my projects involved a mix of analytical modeling – the kind you can do with pencil and paper – and computer modeling, often agent-based simulation. After graduating I was lucky enough to land an NSF Postdoctoral Research Fellowship (in, cryptically, “informatics”) that sent me to the Felsenstein / Kuhner lab at the University of Washington. There I studied methods of inferring population history from genetic data, mostly Bayesian coalescent MCMC stuff. I spent a lot of time writing code and pondering evolutionary theory, and less time doing math. So when my postdoc ended and I got a “real job”, I knew a few things about probability and statistics, a bit of Java and Python, just one or two things about ODEs, and nothing about next-gen sequencing or human genetics. Funny how things work out sometimes.