NIH R15 grant for improved sequence database search

The Wheeler lab has been awarded an NIH R15 grant from the National Institute of General Medical Sciences for the project “Improved protein-DNA models for translated sequence search with profile Hidden Markov models”. The grant provides $426K over three years, beginning April 1, 2017.

Fast and sensitive sequence database search is fundamental to modern molecular biology. The funded research will improve the accuracy of protein-coding annotation in sequenced genomes and metagenomic datasets. It builds on established sequence database search software, HMMER, which uses probabilistic models called profile hidden Markov models (profile HMMs). These models increase sensitivity through greater statistical power and a better ability to capture the complexity of sequence families.

Dr. Wheeler’s group will develop new models that account for frameshifting mutations or errors that obscure the protein-coding nature of a sequence, and for splice sites that break genes or domains into distant fragments on the genome. Through a combination of new algorithms and existing approaches, these models will be fast enough for large-scale annotation, such as that performed by the EMBL European Bioinformatics Institute’s Metagenomics Portal.

(See the press release.)

RNAcentral now uses nhmmer for searches

RNAcentral is an open resource that provides a unified platform for accessing non-coding RNA sequences from a broad range of ncRNA databases. Anton Petrov has just announced its third release. Along with a slew of new features, RNAcentral is now powered by our nhmmer, which they’ve found to be more sensitive than Exonerate. They’re using the recently released HMMER3.1b2, a stable beta release that implements our new accelerated DNA search based on seed finding with the FM index data structure (I’ll describe that here once we get the paper submitted … soon). Exciting days.

A powerful HMMER for data mining

New paper describing recent advances in the HMMER web server (primarily driven by the endlessly talented Rob Finn and Jody Clements):

Finn RD, Clements J, Arndt W, et al. (2015) HMMER web server: 2015 update. Nucleic Acids Res. DOI: 10.1093/nar/gkv397

Also, a little love from the EMBL-EBI press machine: A powerful HMMER for data mining. (The last line reads like a call to arms: “The next step for the collaborators is to extend the software to accommodate DNA searches, which involves far larger datasets.”)

Building Momentum in Missoula

It’s been a good first year in the Computer Science Department at the University of Montana, in amazing Missoula. I’ve had the pleasure of teaching great students and have been fortunate to build a nice little research group of really talented people (a PhD student, a couple of Master’s students, and three undergraduates). We’ve started to make good progress on translated search in HMMER, on improving the application of fast text indexes (especially the FM index) to remote homology search, and on the annotation of transposable elements. I’ll write more about the students and their research projects in future posts.
