The COVID Tracking Project (CTP) was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States.
Every day from March 7, 2020, to March 7, 2021, the project collected data on COVID-19 testing and patient outcomes from all 50 states, 5 territories, and the District of Columbia. Its dataset was used by national and local news organizations across the United States and by research projects and agencies worldwide. More than 800 volunteers and project staff built a large online-only community of people from a broad set of backgrounds and developed a unique culture of work and community.
The UCSF Archives and Special Collections department worked with former CTP volunteers and staff to document the public work and internal community of the project. The collection includes public websites, white papers, and datasets that were essential to the national understanding of the pandemic at a critical time.
Online platforms such as Slack and Google Apps were critical to the daily operation of CTP, but such systems have never been archived at large scale for future researchers. The archival process included building open-source tools to retrieve data from hosted products and store it in a format that future researchers can access and understand.
Archival collection
The records of the CTP, which include data products, organizational documents, code repositories, and Slack exports, are preserved at and available through UCSF Archives and Special Collections.
Data explorer
The Data Explorer, developed by UCSF, offers a look into the history of CTP’s data. In addition to daily COVID-19 data, this tool provides the original data source, any updates made to the data, and Slack discussion regarding the numbers. A data dictionary, along with a history of its updates, is also available.
Data sources
Data sources used by the CTP were captured and preserved as part of their daily process. These sources will be available on GitHub soon.
Data journalism course-in-a-box
The data journalism course-in-a-box is a guide for undergraduate instructors interested in using COVID Tracking Project archive content to teach the conceptual foundations of data journalism. The open-source set of five modules contains lecture materials, class exercises, technical walkthroughs, pacing guides and other course content that can be taught from start to finish in a standalone course or integrated into an existing course.
Digital preservation tools
Scripts and tools used by UCSF to acquire and preserve social media and code repositories are available on GitHub.
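As a rough illustration of the kind of conversion such tools perform, the sketch below flattens messages shaped like a standard Slack export (JSON records with `user`, `text`, and `ts` fields) into a plain-text transcript. It is not taken from the UCSF scripts: the function name `render_transcript` and the sample records are hypothetical, and a real export would need handling for threads, attachments, and edits.

```python
import json
from datetime import datetime, timezone


def render_transcript(messages, users):
    """Turn Slack-export-style message dicts into readable text lines.

    Hypothetical helper for illustration only; not part of the UCSF tools.
    """
    lines = []
    # Slack timestamps are strings of epoch seconds, e.g. "1583539200.000200".
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        # Fall back to the raw user ID when no display name is known.
        name = users.get(msg.get("user", ""), msg.get("user", "unknown"))
        lines.append(f"[{when:%Y-%m-%d %H:%M:%S} UTC] {name}: {msg.get('text', '')}")
    return lines


# Hypothetical sample data, not taken from the CTP archive.
sample_users = {"U001": "volunteer_a"}
sample_messages = json.loads("""
[
  {"user": "U002", "text": "Confirmed against the state dashboard.", "ts": "1583539260.000100"},
  {"user": "U001", "text": "State totals updated.", "ts": "1583539200.000200"}
]
""")

for line in render_transcript(sample_messages, sample_users):
    print(line)
```

Writing timestamps in UTC and resolving user IDs to names where possible keeps the transcript readable without access to the original platform, which is the main point of converting hosted data to an open format.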
Open datasets
View the datasets already published by the COVID Tracking Project archive in Dryad:
- Daily United States COVID-19 Testing and Outcomes Data By State, March 7, 2020 to March 7, 2021
- Daily United States COVID-19 data for select cities and counties, May 29, 2020 to October 21, 2020
- Annotations on COVID-19 State Data Definitions as of March 7, 2021
- Weekly United States COVID-19 Long-term Care Data By State, May 28, 2020 to March 4, 2021
- Weekly United States COVID-19 Racial Data By State, April 12, 2020 to March 7, 2021
Oral histories
Starting in late 2020, the CTP Community team recorded oral histories of Project members. These interviews provide insight into CTP’s processes and organizational culture, the people who contributed to it, and their experiences during the COVID-19 pandemic. With the interviewees’ permission, these recordings are publicly available on Calisphere.
Funding for this project comes from the Alfred P. Sloan Foundation. The data journalism course-in-a-box project was also funded by the National Library of Medicine, National Institutes of Health, Department of Health and Human Services, under Cooperative Agreement Number UG4LM013725 with the University of Washington. View the full bibliography used to inform the development of this archive.