The rapid growth in the number of sequenced genomes and available transcriptomics data demands new solutions that can scale to this magnitude and beyond. There is evidence that long noncoding RNAs outnumber protein-coding genes, but at the same time more and more presumed noncoding transcripts are found to harbor short open reading frames that confound current computational coding potential evaluation tools. Similarly, the analysis of gene expression data often focuses on the most differentially expressed genes, which may not identify subtle patterns. Although these questions have been well studied, current models are buckling under the weight of the growing annotation data. These challenges can be addressed through the development of high-order, multi-layered, deep learning algorithms. We present a recurrent neural network model for transcripts and for the detection of coding signals. We also present a Stacked Denoising Autoencoder for the analysis of gene expression data.
David Hendrix studied math and physics at Georgia Tech before getting a PhD in physics at UC Berkeley. There, he gained an interest in computational genomics and bioinformatics, and the interface of statistical physics and pattern discovery algorithms. He went on to do postdocs at Berkeley and MIT in the field of genomics and computational biology before becoming faculty here at Oregon State in the department of Biochemistry and Biophysics and in the School of Electrical Engineering and Computer Science.