The concept of a 'build'

Nextstrain’s focus on providing a real-time snapshot of evolving pathogen populations requires reproducible analyses that can be rerun whenever new sequences become available. The individual steps needed to repeat an analysis together make up a “build”.

Because no two datasets or pathogens are the same, we built augur to be flexible and suitable for different analyses. The individual augur commands are composable and can be mixed and matched with other scripts as needed, so a build is whatever combination of these steps a given analysis requires.

Example build

The Zika virus tutorial describes a build that contains the following steps:

  1. Prepare pathogen sequences and metadata
  2. Align sequences
  3. Construct a phylogeny from aligned sequences
  4. Annotate the phylogeny with inferred ancestral pathogen dates, sequences, and traits
  5. Export the annotated phylogeny and corresponding metadata into an auspice-readable format

Each of these steps can be run via a separate augur command.
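
As a rough illustration, the steps listed above can be written down as an ordered plan of augur subcommands. This is a minimal sketch, not the tutorial's actual build: the subcommand names are assumed to match those used in the Zika tutorial, the grouping of several subcommands under step 4 is an assumption, and all arguments are deliberately left out (see `augur <subcommand> --help` for the real options).

```python
# Hypothetical mapping from the build steps above to augur subcommands.
# Subcommand names are assumed from the Zika tutorial; arguments are
# omitted on purpose, so nothing here can be run against real data as-is.
BUILD_STEPS = [
    ("Prepare pathogen sequences and metadata", ["filter"]),
    ("Align sequences", ["align"]),
    ("Construct a phylogeny from aligned sequences", ["tree"]),
    ("Annotate the phylogeny", ["refine", "ancestral", "translate", "traits"]),
    ("Export for auspice", ["export"]),
]


def plan(steps=BUILD_STEPS):
    """Return the augur invocations, in order, as plain strings (a dry run)."""
    return [f"augur {sub}" for _, subs in steps for sub in subs]


if __name__ == "__main__":
    # Print the plan instead of executing anything.
    for command in plan():
        print(command)
```

Keeping the plan as data makes it easy to see that a build for a different pathogen is the same idea with a different list of steps.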

Snakemake

While it would be possible to run a build by running each step individually (they’re just self-contained commands, after all), we typically group them together in a make-type file. Snakemake is “a tool to create reproducible and scalable data analyses… via a human readable, Python based language.”
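
The idea behind a make-type file can be sketched in a few lines of Python: each rule names the file it produces and the files it needs, and the tool works out which rules to run, and in what order, to produce a requested target. The sketch below is a simplification under stated assumptions, not how Snakemake is implemented: the file names are hypothetical, and real tools also check whether existing outputs are already up to date.

```python
# Hypothetical rules: each output file maps to the input files it needs.
RULES = {
    "results/aligned.fasta": ["data/sequences.fasta"],
    "results/tree.nwk": ["results/aligned.fasta"],
    "auspice/zika.json": ["results/tree.nwk", "data/metadata.tsv"],
}


def rules_needed(target, rules=RULES, seen=None):
    """Return the outputs to build, dependencies first, to produce `target`."""
    seen = [] if seen is None else seen
    if target not in rules or target in seen:
        return seen  # a raw input file, or an output already scheduled
    for dependency in rules[target]:
        rules_needed(dependency, rules, seen)
    seen.append(target)  # scheduled only after everything it depends on
    return seen


if __name__ == "__main__":
    for output in rules_needed("auspice/zika.json"):
        print("build", output)
```

Asking for the final file is enough: the alignment and tree are scheduled first because the export depends on them.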

Snakemake is installed as part of both the conda environment and the Docker container. If a build contains a “Snakefile”, you can run it from the build directory by typing snakemake (in the conda environment) or nextstrain build . (with the Docker container).
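
The choice between the two commands can be expressed as a small helper. This is a hedged sketch: the function name `build_command` and its `use_nextstrain_cli` parameter are hypothetical, and only the two commands mentioned above are used. Depending on the Snakemake version installed, additional arguments (for example a core count) may be required.

```python
from pathlib import Path


def build_command(build_dir, use_nextstrain_cli=False):
    """Return the command that runs the build in `build_dir`.

    Raises FileNotFoundError if the directory has no Snakefile.
    """
    build_dir = Path(build_dir)
    if not (build_dir / "Snakefile").is_file():
        raise FileNotFoundError(f"no Snakefile in {build_dir}")
    if use_nextstrain_cli:
        # Container-based setup: the Nextstrain CLI runs the build for us.
        return ["nextstrain", "build", "."]
    # Conda-based setup: call Snakemake directly.
    return ["snakemake"]
```

The returned list could then be passed to `subprocess.run(..., cwd=build_dir, check=True)` so that the command executes inside the build directory.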

Next steps

  • Have a look at some of the tutorials (listed in the sidebar). Each one will use a slightly different combination of augur commands depending on the pathogen.

  • Read about how to customize a build

All source code is freely available under the terms of the GNU Affero General Public License. Screenshots etc may be used as long as a link to nextstrain.org is provided.

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, Allison Black, David Blazes, Peter Bogner, Matt Cotten, Ana Crisan, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Becky Garten, Dylan George, Ian Goodfellow, Nathan Grubaugh, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Sebastian Maurer-Stroh, Louise Moncla, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Nathan Yozwiak for comments, suggestions and data sharing.


© 2015-2019 Trevor Bedford and Richard Neher