Folks in the Wheeler lab maintain versioned software using git, with repos hosted on GitHub. I live in (minor) fear that something bad might happen. Maybe a repo gets accidentally deleted at exactly the same moment that the only computer it’s on bursts into flames. Or maybe a hacker deletes all the repos and demands my firstborn child in exchange for their safe return (possibly worth it, to save the effort of cobbling together all the projects currently strewn across more than a dozen computers).
That’s where Uhoh comes in. Uhoh lets you back up all your GitHub repos quickly and conveniently using, well, Git. Thanks to the incomparable George Lesica for that (see his original release post).
Uhoh queries the GitHub API for a list of repos (which can be filtered by owner and name), then checks its backup location for a clone of each one. If it finds a clone, it runs a git pull. If not, it runs a git clone. Either way, you end up with a backup copy of your repos. Run it in a nightly cron job, and you’ll have one less thing to worry about.
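Uhoh’s own code isn’t reproduced here. The following is a minimal Python sketch of the same pull-or-clone loop under several assumptions:

- repos are listed through GitHub’s REST endpoint GET /user/repos using a personal access token;
- the name filter is a simple glob;
- each backup lives at <root>/<owner>/<name>.

The function names (fetch_repos, filter_repos, plan_action, run_backup), the glob filter, and the directory layout are hypothetical illustrations, not Uhoh’s actual interface.

```python
import fnmatch
import json
import subprocess
import urllib.request
from pathlib import Path

# Assumption: GitHub's REST endpoint listing repos visible to the token.
API_URL = "https://api.github.com/user/repos"


def fetch_repos(token, per_page=100):
    """Return every repo visible to the token, following page-based pagination."""
    repos, page = [], 1
    while True:
        req = urllib.request.Request(
            f"{API_URL}?per_page={per_page}&page={page}",
            headers={"Authorization": f"token {token}",
                     "Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we've seen everything
            return repos
        repos.extend(batch)
        page += 1


def filter_repos(repos, owners=None, name_glob="*"):
    """Keep repos whose owner is in `owners` (if given) and whose name matches the glob.

    The glob-style name filter is a hypothetical stand-in for Uhoh's own filtering.
    """
    return [
        r for r in repos
        if (owners is None or r["owner"]["login"] in owners)
        and fnmatch.fnmatchcase(r["name"], name_glob)
    ]


def plan_action(backup_root, repo):
    """Return ("pull", path) if a clone already exists at the path, else ("clone", path)."""
    dest = Path(backup_root) / repo["owner"]["login"] / repo["name"]
    action = "pull" if (dest / ".git").is_dir() else "clone"
    return action, dest


def run_backup(token, backup_root, owners=None, name_glob="*"):
    """Pull existing clones and clone anything new into backup_root."""
    for repo in filter_repos(fetch_repos(token), owners, name_glob):
        action, dest = plan_action(backup_root, repo)
        if action == "pull":
            subprocess.run(["git", "-C", str(dest), "pull"], check=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(["git", "clone", repo["clone_url"], str(dest)], check=True)
```

A nightly cron job could invoke a small script that calls run_backup with a token read from the environment and a backup directory of your choice. Cloning private repos over HTTPS also requires git credentials to be configured on the backup machine. Splitting the decision (plan_action) from the git calls keeps the pull-or-clone logic easy to check without touching the network.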
We were originally planning to host the 7th International Conference on Algorithms for Computational Biology (AlCoB) in Missoula back in April of this year. Then 2020 decided it didn’t want conferences in April (also, there was this pandemic; maybe you heard about it?), so we put it on ice.
Well … it’s back. Working under the optimistic assumption that in-person conferences will make sense by June 2021, we’re all set to host a new-and-improved “7th-8th International Conference on Algorithms for Computational Biology”, which will merge the scheduled program for AlCoB 2020 with a new series of papers submitted for the current year. Find out more (and submit a paper) at https://irdta.eu/alcob2020-2021/.
We’ve just been awarded a $1.05M DOE grant, in collaboration with Jason McDermott’s group at PNNL, to develop machine learning approaches for integrating multi-omics data, with the goal of expanding microbiome annotation.
The project is motivated by the need to understand soil communities that play a key role in the plant-soil dynamic, with impacts on food- and fuel-crop production. To understand the roles of these microbial communities, it is vital to maximally annotate their genomic and functional capacity, yet the majority of data from newly acquired microbiomes remains unannotated.
This project will focus on developing a novel method for incorporating non-genomic information into the annotation of genomic sequence, along with two complementary strategies that build on recent advances in alignment-based and alignment-free labeling. In combination, these approaches are expected to substantially increase the completeness of labeling for difficult-to-annotate microbiome datasets.
If you’re reading this, and think “hey, that sounds like fun!”, get in touch!
The Wheeler lab has been awarded a $1.15M four-year grant (NIH R01) to develop machine learning approaches for improved accuracy and speed in sequence annotation.
Alignment of biological sequences is a key step in understanding their evolution, function, and patterns of activity. We will develop machine learning approaches to improve both the accuracy and the speed of highly sensitive sequence alignment. To improve accuracy, we will develop methods based on both hidden Markov models and artificial neural networks to reduce erroneous annotation caused by (1) the existence of low-complexity and repetitive sequence and (2) the overextension of alignments of true homologs into unrelated sequence. We will also address annotation speed by developing a custom deep learning architecture designed to very quickly filter away large portions of candidate sequence comparisons before the relatively slow sequence-alignment step.
If you’re reading this, maybe you’ve caught the big picture: we’ll be looking for people to help with these important and exciting projects. If they sound fun to you, get in touch!
Lab member Anna Marbut just presented a workshop on data management for granting organizations at the Space Grants Western Regional conference. Anna is pictured on the right in the photo below. Caitlin Stainken (of Submittable) and a NASA employee are to her left.
The Dfam group met up in Palm Springs this week to attend FASEB Mobile DNA 2019. As always, the conference was terrific. Travis Wheeler talked about “Sequence Methods for Increasing Sensitivity and Reducing Errors in TE Annotation”, while Wheeler lab member Kaitlin Carey presented her cool recent work in a poster “Annotation Confidence Estimates Improve Transposable Element Annotation with Subfamilies”.
Meanwhile, Dfam collaborator Jeb Rosen (with help from Robert Hubley and Arian Smit, not shown) presented the poster “Dfam 3: An open community resource for transposable element annotations, consensus sequences, and profile Hidden Markov Models”.
Several of us recently attended AlCoB 2019 in Berkeley. All six attending students presented both talks and posters (sampled in pictures below):

- Alex Nord discussed his work on splice-aware profile HMMs.
- Jack Roddy presented work on reducing the nasty problem of overextension of sequence alignments.
- Kaitlin Carey described her cool results on using sequence annotation confidence to improve annotation (including annotation of homologous recombination).
- Tim Anderson described his new FPGA accelerator for profile HMM search.
- Sarah Walling described progress in understanding surprising alternative splicing outcomes.
- Daniel Olson presented advances in annotating tandemly repetitive sequence regions with ULTRA.
We also got a chance to visit the Computer Research Division at LBNL (where Genevieve Krause will be spending a summer). Part of that visit included an introduction to a test FPGA system (thanks, Andrew and Farzad!).
Our collaborators in the Insel lab have been awarded an R15 grant from the NIH to study learning and neural coding of social expectations. The work will be performed mostly by folks in the Insel group, but we’re excited to help develop computational methods for classifying video and neural recordings.
Mammalian and most other eukaryotic genomes contain a large amount of repetitive sequence, mostly the remnants of ancient duplications of DNA segments called transposable elements (TEs). TEs have played a critical role in mammalian evolution, and their presence complicates genome sequence analysis in ways that demand high quality methods for identifying and labeling them.
In 2012, we released Dfam, an open-access database of profile hidden Markov models (HMMs) and corresponding metadata for transposable elements in the human genome. We showed that the use of profile HMMs enabled annotation of an additional 5% of the human genome (>150 million nucleotides). We used the human TE families for this proof-of-principle project and shortly thereafter expanded to include TE families from four additional model organisms, demonstrating both the utility and viability of this resource. The Dfam datasets have been used in a wide variety of research endeavors, and despite the small number of species represented in this proof-of-principle resource, the Dfam papers have been cited nearly 200 times [1, 2]. In addition, we integrated Dfam with RepeatMasker, using our software nhmmer, making it possible to produce high-quality annotations of TE families in complete genomes.
The Dfam consortium has now been awarded a 5-year, $3.2M NIH resource grant to build a sustainable framework for the expansion and improvement of the Dfam resource, with ~$400K supporting work in the Wheeler lab at the University of Montana. With support from this grant, we will develop the Dfam infrastructure to expand to thousands of genomes, and we will establish a self-sustaining TE Data Commons that enables community contribution of TE datasets with limited centralized curation. We will also improve the quality of repeat annotation by developing methods for more reliable alignment adjudication, expand approaches to visualizing this complex data type, and improve the modeling of TE subfamilies. By further developing this open-access database, we aim to reverse the proliferation of unaffiliated, non-standard repeat datasets and to ease the burden of data management for those developing TE libraries.
Progress is already underway. Kaitlin Carey, a graduate student in the Wheeler lab, has made important progress in understanding the landscape of annotation confidence in this complex domain. Meanwhile, Jeb Rosen (a recent graduate of the University of Montana Computer Science program) has joined forces with Robert Hubley and Arian Smit, and the three are developing the infrastructure required to support future Dfam growth.