RAPID: Collecting Reliable COVID-19 Datasets in Crisis Conditions
- Funded by National Science Foundation (NSF)
- Total publications: 0
Grant number: 2029457
Key facts
Disease
COVID-19
Start & end year
2020-2020
Known Financial Commitments (USD)
$69,998
Funder
National Science Foundation (NSF)
Principal Investigator
Rastislav Bodik
Research Location
United States of America
Lead Research Institution
University of Washington
Research Priority Alignment
N/A
Research Category
Epidemiological studies
Research Subcategory
N/A
Special Interest Tags
Innovation
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Office of the Director - This RAPID project enables approaches to mitigate the negative impacts of COVID-19 on public health, society, and the economy by deploying technologies to enable collecting reliable COVID-19-related data sets under crisis conditions.
In the midst of a crisis, such as the COVID-19 pandemic, generators of critical new data, such as hospitals and critical health organizations, lack the time and resources to make this important data readily available for use by others. One cannot expect the already overburdened primary data providers to do the extra work needed to make the data more accessible for others to use. Even those who already publish data on their websites often do not have the time to edit/modify the data, for example to apply newly introduced tags, such as Schema.org's new tags related to coronavirus. Yet, these data are critical in a crisis in order to inform the public; improve emergency response; and aid the scientific community in its efforts to find solutions. Currently, the teams that are engaged in dataset collection are employing slow, tedious, and painstaking manual techniques. The interactive dataset collection tools to be developed by this project will provide an alternative approach, empowering a community of volunteers to help with data collection efforts. The data collection tools developed can be used with only an internet connection, a web browser, and brief training, thereby putting the effort well within reach of a large population of potential volunteers.
Existing automatic data extractors assume that (i) webpages in a single website are structured uniformly, because they were produced from the same template and (ii) relevant webpages originate from a single website. As a result, much of the prior work in the area of web data extraction and ingestion focuses on "syntactic" extraction. Currently, dedicated data collection teams are collecting data with a combination of expertise and time-consuming and painstaking manual effort. Other teams are hiring call centers to call hospitals in each state to collect their capacities. Such high-cost, high-effort approaches do not scale well to all the datasets that one would like to be able to access and analyze. Many COVID-19-related datasets are scattered over thousands of websites with similar information but no structural similarities--e.g., each hospital's website may look different but may contain very similar and related data. The technical challenge that this project will tackle will be to build a "semantic" data extractor that locates the information of interest despite divergent website structures. The software tools that will be created for data ingestion can be used by the many individuals who are keen to contribute their time and effort to help combat COVID-19, without compromising their physical distancing efforts.
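To illustrate the distinction drawn above, the following is a minimal sketch and not the project's actual tool. It assumes a "semantic" extractor can be approximated by ignoring page structure and looking for a labelled concept in the visible text. The function name `find_icu_beds`, the label synonyms, and the sample pages are all hypothetical, chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes and discards the tag structure entirely.

    Simplification: text inside <script> or <style> is not filtered out.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical wordings for the concept "available ICU beds".
LABEL = (
    r"(?:(?:icu|intensive care)\s+beds?\s+(?:available|open)"
    r"|(?:available|open)\s+(?:icu|intensive care)\s+beds?)"
)
# A label followed, within a few non-digit characters, by a number.
PATTERN = re.compile(LABEL + r"\D{0,20}?(\d+)", re.IGNORECASE)


def find_icu_beds(html):
    """Return the number that follows an ICU-bed label, or None."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    flat_text = " ".join(parser.chunks)
    match = PATTERN.search(flat_text)
    return int(match.group(1)) if match else None


# Two hypothetical hospital pages with the same fact but different markup.
page_a = "<table><tr><td>ICU beds available</td><td>12</td></tr></table>"
page_b = "<div><p>Open ICU beds: <b>7</b></p></div>"

print(find_icu_beds(page_a))
print(find_icu_beds(page_b))
```

A template-based ("syntactic") extractor written for the table layout of the first page would fail on the second, whereas the label-driven lookup handles both. A real system would need far richer matching than a fixed list of synonyms.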
This RAPID award is made by the Convergence Accelerator program in the Office of Integrative Activities and is associated with the Convergence Accelerator Track A: Open Knowledge Network.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.