Meet Dfam2.0

Dfam is growing up. This is the first major expansion of the database since its inception. We’ve added repeat families from four new organisms: mouse, zebrafish, fruit fly, and nematode. In total, this release includes 2,844 new families (4,150 total).

New organisms and coverage

In expanding Dfam to include families from multiple genomes, we chose the four species named above because of their status as model organisms covering a broad range of the animal kingdom. Using Dfam profile HMMs within RepeatMasker, we see a marked increase in the fraction of each genome annotated as repeats (relative to using RepBase consensus sequences): +5.1% for human, +5.5% for mouse, +4.4% for zebrafish, +0.7% for fruit fly, and +6.5% for nematode.

New data management challenges

Changes on the frontend of the website are modest, but extensive work was done on the backend to support this and future expansion. The addition of new organisms required that we deal with a number of new model properties.

  • Each family is associated with an NCBI taxonomy clade, which indicates where instances are found. Because of horizontal transfer, the family profile HMM may associate with multiple clades, via “model specificity” (MS) lines.
  • For a family found in multiple species, a separate score threshold has been calculated for each appropriate reference species.
  • We’ve used Dfam models to identify family members in each organism. To account for this large influx of new search data (and to prepare for the forthcoming flood when more organisms are added), the underlying database schema has been broken into a single central schema and multiple per-assembly schemas. Many of the backend scripts were refactored to better handle the large scale of these data.
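
To make the three properties above a bit more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration only, not the actual Dfam code or schema: the class `Family`, the function `schema_for`, the schema names `dfam_central` and `dfam_<assembly>`, and every value in the example are assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Family:
    """Hypothetical view of one repeat family (all field names are assumptions)."""
    accession: str
    name: str
    # Clades the model is associated with, e.g. taken from "model specificity" lines.
    clades: List[str]
    # One score threshold per reference species in which the family is found.
    thresholds: Dict[str, float] = field(default_factory=dict)

    def passes(self, species: str, score: float) -> bool:
        """True if a hit in `species` meets that species' own threshold."""
        threshold = self.thresholds.get(species)
        # No threshold calculated for this species: do not accept the hit.
        return threshold is not None and score >= threshold


def schema_for(assembly: Optional[str] = None) -> str:
    """Pick a schema name: central for family metadata, per-assembly for hits."""
    return "dfam_central" if assembly is None else f"dfam_{assembly}"


# Placeholder data, not real Dfam records or thresholds.
example = Family(
    accession="DF_PLACEHOLDER",
    name="ExampleFamily",
    clades=["clade_a", "clade_b"],
    thresholds={"species_a": 25.0, "species_b": 30.0},
)
```

With this layout, the same hit score can be accepted in one species and rejected in another, and hit data for each assembly lives in its own schema while the family record itself is stored once.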

New location

Shortly before migrating to 2.0, we moved the Dfam website from its pilot location at the HHMI Janelia Research Campus to its new home at the University of Montana. We’d like to give a big thank you to HHMI for funding the pilot project, to the University of Montana for funding the purchase of the new server, and to the IT departments at both HHMI and Montana for making the transition as smooth as possible.

New paper

Recent improvements to Dfam, including the 2.0 release, are described in a manuscript that’s just been accepted to the 2016 database issue of Nucleic Acids Research (NAR). Here’s a link to the preprint.

Source: Meet Dfam2.0
