Trevor Bedford, James Hadfield, Emma Hodcoft, John Huddleston, Richard Neher, Thomas Sibley
Our goal with Nextstrain is to harness the scientific and public health potential of pathogen genome data, and we’ve built our software to be agnostic to the source of data. Genomic surveillance efforts, particularly in the US, UK, Germany, Switzerland, Australia, Bangladesh, Chile, Egypt, Ghana, India, Kenya, and Peru, have been submitting SARS-CoV-2 genome sequences to INSDC databases, which mirror data between GenBank, ENA and DDBJ. Data submitted to GenBank and other INSDC databases is made available under Open Data principles: “Open Data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike”. This allows us at Nextstrain to openly share sequence data and metadata from GenBank, as well as intermediate files, and to help facilitate data use by academic and public health bioinformaticians.
Please note that although data generators have generously shared data in an open fashion, this does not grant a free license to publish on these data. At Nextstrain we strongly believe in appropriate credit and recognise that “scooping” can make future data generation harder and data sharing less appealing, and can be particularly problematic for scientists from low- and middle-income countries. Data generators should be cited where possible, and collaboration should be sought where appropriate. To facilitate such attribution, the open metadata includes links to the original records as well as author information where available. Please avoid scooping someone else’s work, be mindful of plans others might have, and reach out if uncertain.
The scale of sequencing and sequence sharing during the pandemic has been remarkable. We thank data submitters from all over the world for generously sharing sequence data, and GISAID and NCBI for providing platforms that make sharing these data easy.
This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Kristian Andersen, Josh Batson, David Blazes, Jesse Bloom, Peter Bogner, Anderson Brito, Matt Cotten, Ana Crisan, Tulio de Oliveira, Gytis Dudas, Vivien Dugan, Karl Erlandson, Nuno Faria, Jennifer Gardy, Nate Grubaugh, Becky Kondor, Dylan George, Ian Goodfellow, Betz Halloran, Christian Happi, Jeff Joy, Paul Kellam, Philippe Lemey, Nick Loman, Duncan MacCannell, Erick Matsen, Sebastian Maurer-Stroh, Placide Mbala, Danny Park, Oliver Pybus, Andrew Rambaut, Colin Russell, Pardis Sabeti, Katherine Siddle, Kristof Theys, Dave Wentworth, Shirlee Wohl and Cecile Viboud for comments, suggestions and data sharing.