Notable changes in Augur v25

The Nextstrain team

Over the past few months, the Nextstrain team has released several new versions of Augur, our bioinformatics toolkit for phylogenomic analysis. Here we highlight some of the most significant feature improvements in these releases.

# Excel and OpenOffice support

One of the features we are most excited to announce is that the augur curate command now accepts both Excel (.xlsx and .xls) and OpenOffice (.ods) files as metadata inputs. This makes it easy to convert Excel or OpenOffice files into the metadata TSV format expected by other Augur commands. This new feature is available in Augur v25.2 and later.
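As an illustration, the Python sketch below builds the command line for such a conversion. It assumes Augur v25.2 or later is on your PATH. It also assumes the `augur curate passthru` subcommand, which passes records through unchanged so that the only effect is the format conversion, together with the `--metadata` and `--output-metadata` options. The file names `samples.xlsx` and `metadata.tsv` are hypothetical. Check `augur curate passthru --help` for the authoritative options.

```python
import shutil
import subprocess
from pathlib import Path

# Spreadsheet extensions that augur curate accepts as of v25.2, per this post.
SPREADSHEET_SUFFIXES = {".xlsx", ".xls", ".ods"}


def build_conversion_cmd(spreadsheet: str, output_tsv: str) -> list[str]:
    """Return an argv list that converts a spreadsheet to metadata TSV.

    `passthru` applies no transformations, so the output is the same records
    written as TSV.
    """
    suffix = Path(spreadsheet).suffix.lower()
    if suffix not in SPREADSHEET_SUFFIXES:
        raise ValueError(f"unsupported spreadsheet type: {suffix!r}")
    return [
        "augur", "curate", "passthru",
        "--metadata", spreadsheet,
        "--output-metadata", output_tsv,
    ]


if __name__ == "__main__":
    cmd = build_conversion_cmd("samples.xlsx", "metadata.tsv")
    print(" ".join(cmd))
    # Run it only if augur is installed and the (hypothetical) input exists.
    if shutil.which("augur") and Path("samples.xlsx").exists():
        subprocess.run(cmd, check=True)
```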

# Merging multiple metadata files

We are also very pleased to announce a new command, augur merge, which performs generalized merging of two or more metadata tables on a common field. We hope this will make curating metadata inputs significantly easier. We also expect it to simplify starting from the curated data that existing pathogen repos make available via data.nextstrain.org and “spiking in” additional, non-public metadata for further analysis. We plan to extend this functionality to support merging of sequence files in a future Augur release. The merge feature is available in Augur v25.3 and later.
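To make the idea concrete, here is a minimal pure-Python sketch of merging TSV metadata tables on a shared ID column as an outer join. This is not Augur's implementation. The precedence rule is an assumption of this sketch: for a column that appears in several tables, a non-empty value from a later table replaces an earlier one, and empty cells are ignored. The default ID column `strain` is also illustrative.

```python
import csv
import io


def merge_tables(tables: list[str], id_column: str = "strain") -> list[dict[str, str]]:
    """Outer-join TSV tables (given as strings) on `id_column`.

    Assumed precedence: a non-empty value from a later table overwrites the
    earlier value for the same column; empty or missing cells are skipped.
    """
    merged: dict[str, dict[str, str]] = {}
    columns = [id_column]
    for text in tables:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Track the union of columns in first-seen order.
        columns += [c for c in reader.fieldnames or [] if c not in columns]
        for row in reader:
            record = merged.setdefault(row[id_column], {})
            for key, value in row.items():
                if value:  # None (short row) or "" never erases existing data
                    record[key] = value
    # Give every record every column, using "" where no table had a value.
    return [{c: rec.get(c, "") for c in columns} for rec in merged.values()]
```

In the "spike-in" scenario, the first table would be public metadata and the second your private additions. The real command handles details such as choosing ID columns and recording which input each row came from. See `augur merge --help` for its actual options.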

# Weighted sampling

We have extended augur filter with a new flag, --group-by-weights, that enables weighted sampling. This allows subsampling to follow quantities such as population size or case counts, for more representative analyses. It can also be used to sample a specific “focal” region more intensively than other “contextual” regions. Weighted sampling is described in detail in the updated Filtering and Subsampling guide and in the help text of the --group-by-weights flag. This new flag is available in Augur v25.3 and later.
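The core idea of weighted sampling is to split a total sampling budget across groups in proportion to their weights. The Python sketch below shows that idea using largest-remainder rounding, so the per-group targets sum exactly to the requested total. It is an illustration only and not Augur's algorithm. The group names and weights are hypothetical, and real subsampling must also cope with groups that have fewer sequences than their target. In Augur itself, weights are supplied in a file passed to --group-by-weights. See its help text for the expected format.

```python
import math


def allocate(weights: dict[str, float], total: int) -> dict[str, int]:
    """Split `total` samples across groups in proportion to `weights`."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {g: total * w / weight_sum for g, w in weights.items()}
    counts = {g: math.floor(x) for g, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts
```

For example, population-style weights of 50/30/20 with a budget of 10 yield targets of 5, 3 and 2. A focal weight of 3 against a contextual weight of 1 with a budget of 20 yields 15 focal and 5 contextual samples.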

# Generalized read/write commands

We have added two new I/O-related commands, augur read-file and augur write-file. By piping to and from these commands, external programs can do I/O “the Augur way”, including transparent handling of compressed files and newline handling consistent with the rest of Augur. We hope that exposing this functionality will make it easier to integrate external programs into Augur pipelines in a consistent, convenient way. These new commands are available in Augur v25.3 and later.
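In a pipeline this looks something like `augur read-file input.tsv.gz | your-tool | augur write-file output.tsv.xz`. To show what "transparent" compression handling means, here is a pure-Python sketch that picks a decompressor from the file extension and normalizes line endings. It is not Augur's code. It covers only the gzip, xz and bzip2 codecs in Python's standard library, while Augur may support others. The newline policy (universal newlines on read, LF on write) is also an assumption of this sketch.

```python
import bz2
import gzip
import lzma
import sys
from typing import IO

# Hypothetical mapping of file extensions to standard-library openers.
OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def open_text(path: str, mode: str = "rt") -> IO[str]:
    """Open `path` in text mode, (de)compressing based on its extension.

    Reading converts CRLF and CR line endings to LF; writing always emits LF.
    """
    newline = None if "r" in mode else "\n"
    opener = next((fn for ext, fn in OPENERS.items() if path.endswith(ext)), open)
    return opener(path, mode, encoding="utf-8", newline=newline)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Rough analogue of reading a file the Augur way: decompress to stdout.
    with open_text(sys.argv[1]) as handle:
        sys.stdout.write(handle.read())
```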

# Six new augur curate commands for transforming metadata

We added six new subcommands to augur curate. These commands ease the manipulation of various sorts of dataset metadata.

All of these new subcommands are available in Augur v25.0 and later.
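augur curate subcommands can be chained by piping NDJSON records (one JSON object per line) from one to the next. As a hedged illustration of the kind of per-record transformation they perform, here is a pure-Python filter that renames one metadata field in an NDJSON stream. The function name and field names are hypothetical, and this is not Augur code. Refusing to overwrite an existing field is a choice made for this sketch.

```python
import json
import sys
from typing import Iterable, Iterator


def rename_field(lines: Iterable[str], old: str, new: str) -> Iterator[str]:
    """Yield NDJSON records with field `old` renamed to `new`, keeping key order."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if old in record and new in record and old != new:
            raise ValueError(f"record already has a {new!r} field")
        yield json.dumps({(new if k == old else k): v for k, v in record.items()})


if __name__ == "__main__" and len(sys.argv) == 3:
    # One stage in a pipe: NDJSON in on stdin, NDJSON out on stdout.
    for out_line in rename_field(sys.stdin, sys.argv[1], sys.argv[2]):
        print(out_line)
```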

# Reduction in Auspice JSON sizes of around 30%

augur export v2 now limits the numerical precision of floats in the exported JSON file. This should not change how a dataset is displayed or interpreted in Auspice, but it reduces the gzipped and minimised JSON file size by around 30%, depending on the dataset. This will improve page load times in Auspice and on nextstrain.org. This change is present in Augur v25.2 and later.
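Why does limiting precision shrink files? Long float representations such as `0.0052345678901234` compress poorly, and most of their digits carry no meaningful information. The Python sketch below rounds every float in a JSON-like structure to a fixed number of significant figures and measures the minified, gzipped size. The choice of 3 significant figures is purely illustrative. The appropriate precision depends on the attribute (dates, for example, need more digits), and Augur's actual rules may differ.

```python
import gzip
import json
import math


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant figures."""
    if x == 0 or not math.isfinite(x):
        return x
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


def limit_precision(obj, digits: int = 3):
    """Recursively round every float in a JSON-like structure (ints untouched)."""
    if isinstance(obj, float):
        return round_sig(obj, digits)
    if isinstance(obj, dict):
        return {k: limit_precision(v, digits) for k, v in obj.items()}
    if isinstance(obj, list):
        return [limit_precision(v, digits) for v in obj]
    return obj


def gzipped_size(obj) -> int:
    """Size in bytes of the minified (no-whitespace) JSON after gzip."""
    text = json.dumps(obj, separators=(",", ":"))
    return len(gzip.compress(text.encode("utf-8")))
```

For instance, `round_sig(1234.5678, 3)` gives `1230.0` and `round_sig(0.123456789, 3)` gives `0.123`. Comparing `gzipped_size(data)` with `gzipped_size(limit_precision(data))` shows the effect on your own data.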

# Debug mode (verbose logging)

We added a new debugging mode to the entire Augur suite, which is enabled by setting the AUGUR_DEBUG environment variable to a non-empty value (e.g., export AUGUR_DEBUG=1).

Currently, setting this variable only causes Augur commands to print additional information about the specifics of handled (i.e., anticipated) errors. For example, stack traces and parent exceptions in an exception chain, which are normally not displayed, will be included when this variable is set.
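The Python sketch below shows how this kind of switch behaves. A non-empty AUGUR_DEBUG (even `0`, since that is non-empty) turns a one-line error message into a full report with the stack trace and chained causes. The function names and the `ERROR:` message format are hypothetical, and this is not Augur's source.

```python
import os
import traceback
from typing import Mapping, Optional


def debug_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when AUGUR_DEBUG is set to any non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("AUGUR_DEBUG"))


def report_error(err: BaseException, env: Optional[Mapping[str, str]] = None) -> str:
    """Format a handled error: one line normally, full chain when debugging."""
    if debug_enabled(env):
        # Includes the stack trace and any "direct cause" exceptions.
        return "".join(traceback.format_exception(type(err), err, err.__traceback__))
    return f"ERROR: {err}"
```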

In future Augur releases, we anticipate using this variable to conditionalize new debugging and troubleshooting features, such as verbose operation logging. This debugging mode is available in Augur v25.3 and later.

# Thanks

Thanks for reading our summary of new Augur features! We hope they prove to be useful in your Augur pipelines and workflows. We welcome feedback about these new features, and suggestions for additional ones, either via our discussion site, or by opening issues in the Augur GitHub repo.

All source code is freely available under the terms of the GNU Affero General Public License. Screenshots may be used under a CC-BY-4.0 license and attribution to nextstrain.org must be provided.

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, Josh Batson, David Blazes, Jesse Bloom, Peter Bogner, Anderson Brito, Matt Cotten, Ana Crisan, Tulio de Oliveira, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Nate Grubaugh, Becky Kondor, Dylan George, Ian Goodfellow, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Duncan MacCannell, Erick Matsen, Sebastian Maurer-Stroh, Placide Mbala, Danny Park, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Cecile Viboud for comments, suggestions and data sharing.
