Nextstrain: analysis and visualization of pathogen sequence data

Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. We provide a continually-updated view of publicly available data with powerful analytics and visualizations showing pathogen evolution and epidemic spread. Our goal is to aid epidemiological understanding and improve outbreak response. If you have any questions, or simply want to say hi, please give us a shout at hello@nextstrain.org or introduce yourself at discussion.nextstrain.org.


What is Nextstrain?

nextstrain.org aims to provide a real-time snapshot of evolving pathogen populations through interactive data visualizations for virologists, epidemiologists, public health officials, and community scientists. By letting users explore continually up-to-date datasets, we aim to provide a novel surveillance tool to the scientific and public health communities.

To that end, we have created a number of open-source tools which have allowed a growing community to produce similar analyses, and we want to promote this community through Nextstrain. Please see the docs for how to run your own analyses.

Broadly speaking, Nextstrain consists of

  • “Augur” — a series of composable, modular (Unix-like) bioinformatics tools. We use these to create recipes for different pathogens and different analyses, which are easy to reproduce when new data is available.
  • “Auspice” — a web-based visualization program, to present & interact with phylogenomic & phylogeographic data. This is what you see when, for example, you visit nextstrain.org/zika, but it can also run locally on your computer.

We use these tools to provide a continually updated view of publicly available data for certain important pathogens such as influenza, Ebola, and Zika viruses. The datasets are refreshed whenever new genomes are made available, thus providing the most up-to-date view possible.


Motivation

If pathogen genome sequences are going to inform public health interventions, then analyses have to be rapidly conducted and results widely disseminated. Current scientific publishing practices hinder the rapid dissemination of epidemiologically relevant results. We think an open, online system that implements robust bioinformatic pipelines to synthesize data from across research groups has the highest capacity to make epidemiologically actionable inferences. Additionally, we have open-sourced all the tools we use, and hope to create a community around Nextstrain which supports and promotes genomic analyses of various kinds.


Contact us

We are keen to keep expanding the scope of Nextstrain and empowering other researchers to better analyze and understand their data. Please get in touch with us if you have questions or comments, or create a post at discussion.nextstrain.org.


Publication

If you use nextstrain.org, Augur, or Auspice as part of your analysis, please cite 👇👇

All source code is freely available under the terms of the GNU Affero General Public License. Screenshots may be used under a CC-BY-4.0 license and attribution to nextstrain.org must be provided.

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, David Blazes, Peter Bogner, Matt Cotten, Ana Crisan, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Becky Kondor, Dylan George, Ian Goodfellow, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Sebastian Maurer-Stroh, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Nathan Yozwiak for comments, suggestions and data sharing.

© 2015-2020 Trevor Bedford and Richard Neher