Format of files used and created by augur

A full listing of the input and output data formats used in augur will soon be available here.

Exported JSON(s) for Auspice

We use JSON as the interchange file format between Augur (the bioinformatics tooling) and Auspice (the visualization app). JSON files are reasonably easy for humans to read, easy to parse in most languages, and easy to extend. Auspice can use any compatible JSONs, not just those produced by Augur. Augur produces these JSONs via the augur export command (see the augur docs for more information). We also define schemas for these JSONs (see below) and provide a validation tool, augur validate, to check JSONs against these schemas (see the augur docs for more information).

Auspice (version 1.x) currently requires two JSON files, with a third optional JSON.

Tree JSON (required)

The tree structure is encoded as a deeply nested JSON object, with traits (such as country, divergence, collection date, attributions, etc.) stored on each node. The presence of a children property indicates that a node is an internal node; the property holds that node’s child objects. A node without a children property is a tip.
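As a rough illustration of this nesting, the sketch below walks a toy tree and separates tips from internal nodes using only the presence of the children property. The toy data and the name and country keys are assumptions made for the example; consult the JSON schema for the real field names.

```python
import json

def iter_nodes(root):
    """Yield every node in the tree, each parent before its children."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(node.get("children", [])))

def is_internal(node):
    """A node is internal if (and only if) it has a children property."""
    return "children" in node

# Hypothetical toy tree; real tree JSONs carry many more traits per node.
toy_tree = json.loads("""
{
  "name": "root",
  "children": [
    {"name": "A", "country": "brazil"},
    {"name": "internal_1", "children": [
      {"name": "B", "country": "colombia"},
      {"name": "C", "country": "brazil"}
    ]}
  ]
}
""")

tips = [n["name"] for n in iter_nodes(toy_tree) if not is_internal(n)]
internal = [n["name"] for n in iter_nodes(toy_tree) if is_internal(n)]
print(tips)      # ['A', 'B', 'C']
print(internal)  # ['root', 'internal_1']
```

The traversal uses an explicit stack rather than recursion, so the walk itself is not bounded by Python's recursion limit on deeply nested trees.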

See the JSON schema for more details, or see the current live zika build’s tree JSON here.

The filename must end with _tree.json.

Metadata JSON (required)

Additional data to control and inform the visualization is stored via the metadata property (key) at the top level of the JSON.

See the JSON schema for more details, or see the current live zika build’s metadata JSON here.

The filename must end with _meta.json and have the same prefix as the tree JSON above.
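The naming convention can be checked mechanically. The following is a minimal sketch, not part of Augur or Auspice, that groups a list of filenames by shared prefix and reports which datasets have both required files. The function name pair_datasets and the example filenames are hypothetical.

```python
TREE_SUFFIX = "_tree.json"
META_SUFFIX = "_meta.json"

def pair_datasets(filenames):
    """Return (complete, incomplete).

    complete maps each prefix to its (tree, meta) filenames;
    incomplete lists prefixes that have only one of the two files.
    """
    trees = {f[:-len(TREE_SUFFIX)]: f for f in filenames if f.endswith(TREE_SUFFIX)}
    metas = {f[:-len(META_SUFFIX)]: f for f in filenames if f.endswith(META_SUFFIX)}
    complete = {p: (trees[p], metas[p]) for p in sorted(trees.keys() & metas.keys())}
    incomplete = sorted(trees.keys() ^ metas.keys())
    return complete, incomplete

# Hypothetical directory listing.
files = ["zika_tree.json", "zika_meta.json", "flu_tree.json", "notes.txt"]
complete, incomplete = pair_datasets(files)
print(complete)    # {'zika': ('zika_tree.json', 'zika_meta.json')}
print(incomplete)  # ['flu']
```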

Frequency JSON (optional)

Currently only the flu builds use this file, which generates the frequencies panel you can see at nextstrain.org/flu. Here is an example of this file.

All source code is freely available under the terms of the GNU Affero General Public License. Screenshots etc may be used as long as a link to nextstrain.org is provided.

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, Allison Black, David Blazes, Peter Bogner, Matt Cotten, Ana Crisan, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Becky Garten, Dylan George, Ian Goodfellow, Nathan Grubaugh, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Sebastian Maurer-Stroh, Louise Moncla, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Nathan Yozwiak for comments, suggestions and data sharing.


© 2015-2019 Trevor Bedford and Richard Neher