RAPID: Collecting Reliable COVID-19 Datasets in Crisis Conditions

  • Funded by National Science Foundation (NSF)
  • Total publications: 0

Grant number: 2029457

Key facts

  • Disease

    COVID-19
  • Start & end year

    2020
    2020
  • Known Financial Commitments (USD)

    $69,998
  • Funder

    National Science Foundation (NSF)
  • Principal Investigator

    Rastislav Bodik
  • Research Location

    United States of America
  • Lead Research Institution

    University of Washington
  • Research Priority Alignment

    N/A
  • Research Category

    Epidemiological studies

  • Research Subcategory

    N/A

  • Special Interest Tags

    Innovation

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Office of the Director - This RAPID project enables approaches to mitigate the negative impacts of COVID-19 on public health, society, and the economy by deploying technologies to enable collecting reliable COVID-19-related data sets under crisis conditions.

In the midst of a crisis, such as the COVID-19 pandemic, generators of critical new data, such as hospitals and critical health organizations, lack the time and resources to make this important data readily available for use by others. One cannot expect the already overburdened primary data providers to do the extra work needed to make the data more accessible for others to use. Even those who already publish data on their websites often do not have the time to edit or modify the data, for example to apply newly introduced tags, such as Schema.org's new tags related to coronavirus. Yet, these data are critical in a crisis in order to inform the public; improve emergency response; and aid the scientific community in its efforts to find solutions. Currently, the teams that are engaged in dataset collection are employing slow, tedious, and painstaking manual techniques. The interactive dataset collection tools to be developed by this project will provide an alternative approach, empowering a community of volunteers to help with data collection efforts. The data collection tools developed can be used with only an internet connection, a web browser, and brief training, thereby putting the effort well within reach of a large population of potential volunteers.

Existing automatic data extractors assume that (i) webpages in a single website are structured uniformly, because they were produced from the same template and (ii) relevant webpages originate from a single website. As a result, much of the prior work in the area of web data extraction and ingestion focuses on "syntactic" extraction. Currently, dedicated data collection teams are collecting data with a combination of expertise and time-consuming and painstaking manual effort. Other teams are hiring call centers to call hospitals in each state to collect their capacities. Such high-cost, high-effort approaches do not scale well to all the datasets that one would like to be able to access and analyze. Many COVID-19-related datasets are scattered over thousands of websites with similar information but no structural similarities; e.g., each hospital's website may look different but may contain very similar and related data. The technical challenge that this project will tackle will be to build a "semantic" data extractor that locates the information of interest despite divergent website structures. The software tools that will be created for data ingestion can be used by the many individuals who are keen to contribute their time and effort to help combat COVID-19, without compromising their physical distancing efforts.
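
To make the contrast between "syntactic" and "semantic" extraction concrete, the following is a minimal sketch in Python. It is not the project's tool or method, which the abstract does not describe; it uses a simple keyword-proximity heuristic as a stand-in. The field name `icu_beds`, the synonym list, and the two sample pages are all hypothetical. A syntactic extractor would depend on a fixed location in the page (for instance, a particular table cell), whereas this sketch ignores the markup and looks for a number that appears shortly after a phrase with the intended meaning.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Flattens an HTML page into a list of non-empty text fragments."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


# Hypothetical vocabulary: each field maps to phrases that mean the same thing.
LABELS = {
    "icu_beds": ("icu beds", "intensive care beds", "icu capacity"),
}


def extract(html, labels=LABELS):
    """Return {field: number} for each field whose label is followed by a number.

    The page structure is discarded on purpose: only the text is searched,
    so a table, a paragraph, or a list is handled the same way.
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " | ".join(collector.fragments).lower()

    results = {}
    for field, synonyms in labels.items():
        for phrase in synonyms:
            # Assumption: the value is the first number within 40 non-digit
            # characters after the label phrase.
            match = re.search(re.escape(phrase) + r"\D{0,40}?(\d+)", text)
            if match:
                results[field] = int(match.group(1))
                break
    return results


# Two hypothetical hospital pages with the same information, different markup.
page_a = "<table><tr><th>ICU beds</th><td>24</td></tr></table>"
page_b = "<div><p>Our intensive care beds available today: <b>17</b></p></div>"

print(extract(page_a))
print(extract(page_b))
```

Both sample pages yield a value for the same field even though one uses a table and the other a paragraph. A heuristic this simple would misfire on real pages (for example, when an unrelated number follows the label), which is part of why the abstract frames semantic extraction as an open technical challenge.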

This RAPID award is made by the Convergence Accelerator program in the Office of Integrative Activities and is associated with the Convergence Accelerator Track A: Open Knowledge Network.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.