New $3.2M NIH Grant Supporting Dfam

Mammalian and most other eukaryotic genomes contain a large amount of repetitive sequence, mostly the remnants of ancient duplications of DNA segments called transposable elements (TEs). TEs have played a critical role in mammalian evolution, and their presence complicates genome sequence analysis in ways that demand high-quality methods for identifying and labeling them.

In 2012, we released Dfam, an open-access database of profile hidden Markov models (HMMs) and corresponding metadata for transposable elements in the human genome, and showed that the use of profile HMMs enabled annotation of an additional 5% of the human genome (>150 million nucleotides). We used the human TE families for this proof-of-principle project and shortly thereafter expanded the database to include TE families from four additional model organisms, demonstrating both the utility and the viability of the resource. The Dfam datasets have been used in a wide variety of research endeavors, and despite the small number of species represented in this proof-of-principle resource, the Dfam papers have been cited nearly 200 times [ 12 ]. In addition, we integrated Dfam with RepeatMasker, using our software nhmmer, making it possible to produce high-quality annotations of TE families in complete genomes.

The Dfam consortium has now been awarded a 5-year, $3.2M NIH resource grant to build a sustainable framework for the expansion and improvement of the Dfam resource, with ~$400K supporting work in the Wheeler lab at the University of Montana. With support from this grant, we will develop the Dfam infrastructure to expand to thousands of genomes and establish a self-sustaining TE Data Commons that enables community contribution of TE datasets with limited centralized curation. We will also improve the quality of repeat annotation by developing methods for more reliable alignment adjudication, expand approaches to visualizing this complex data type, and improve the modeling of TE subfamilies. By further developing this open-access database, we will provide a strong incentive to reverse the proliferation of unaffiliated, non-standard repeat datasets and ease the burden of data management for those developing TE libraries.

Work is already underway. Kaitlin Carey, a graduate student in the Wheeler lab, has made important progress in understanding the landscape of annotation confidence in this complex domain. Meanwhile, Jeb Rosen (a recent graduate of the University of Montana Computer Science program) has joined forces with Robert Hubley and Arian Smit, and the three are developing the infrastructure required to support future Dfam growth.
