RAPID: Advanced Topic Modeling Methods to Analyze Text Responses in COVID-19 Survey Data

  • Funded by National Science Foundation (NSF)
  • Total publications: 1

Grant number: 2031736

Key facts

  • Disease

    COVID-19
  • Start & end year

    2020
    2021
  • Known Financial Commitments (USD)

    $176,785
  • Funder

    National Science Foundation (NSF)
  • Principal Investigator

    Philip Resnik
  • Research Location

    United States of America
  • Lead Research Institution

    University of Maryland College Park
  • Research Priority Alignment

    N/A
  • Research Category

    Policies for public health, disease control & community resilience

  • Research Subcategory

    Communication

  • Special Interest Tags

    Innovation

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Computer and Information Science and Engineering - As the COVID-19 pandemic continues, public and private organizations are deploying surveys to inform responses and policy choices. Survey designs using multiple-choice responses are by far the most common; "open-ended" questions, where survey participants provide a longer-form written response, are used far less. This is true even though allowing people to provide unconstrained spoken or text responses makes it possible to obtain richer, fine-grained information that clarifies the other responses, as well as useful "bottom-up" information that the survey designers did not know to ask for. A key problem is that analyzing the unstructured language in open-ended responses is labor-intensive, which creates obstacles to using them, especially when speedy analysis is needed and resources are limited. Computational methods can help, but they often fail to provide coherent, interpretable categories, or they fail to connect the text in the survey well with the closed-ended responses. This project will develop new computational methods for fast and effective analysis of survey data that includes text responses, and it will apply these methods to support organizations doing high-impact survey work related to COVID-19 response. This will improve these organizations' ability to understand and mitigate the impact of the COVID-19 pandemic.

This project's technical approach builds on recent techniques bringing together deep learning and Bayesian topic models. Several key technical innovations will be introduced that are specifically geared toward improving the quality of information available in surveys that include both closed- and open-ended responses. A common element in these approaches is the extension of methods commonly used in supervised learning settings, such as task-based fine-tuning of embeddings and knowledge distillation, to unsupervised topic modeling, with a specific focus on producing diverse, human-interpretable topic categories that are well aligned with discrete attributes such as demographic characteristics, closed-ended responses, and experimental condition. Project activities include assisting in the analysis of organizations' survey data, conducting independent surveys aligned with their needs to obtain additional relevant data, and publicly releasing a clean, easy-to-use computational toolkit to facilitate more widespread adoption of these new methods.
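
As a rough illustration of what it means for topics to be "aligned with discrete attributes", the sketch below averages per-response topic proportions within each closed-ended answer group. It is not the project's method or toolkit. The function name `mean_topic_share_by_group`, the two-topic proportions, and the yes/no answers are all hypothetical, and the proportions are assumed to come from some already fitted topic model.

```python
from collections import defaultdict


def mean_topic_share_by_group(doc_topics, groups):
    """Average each topic's share over the responses in each group.

    doc_topics: one list of topic proportions per open-ended response
                (assumed output of an already fitted topic model).
    groups:     the closed-ended answer given by the same respondent.
    """
    if len(doc_topics) != len(groups):
        raise ValueError("need exactly one group label per response")

    sums = {}                  # group -> running sum of topic shares
    counts = defaultdict(int)  # group -> number of responses seen
    for shares, group in zip(doc_topics, groups):
        if group not in sums:
            sums[group] = [0.0] * len(shares)
        for k, share in enumerate(shares):
            sums[group][k] += share
        counts[group] += 1

    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


# Hypothetical data: four responses, two topics, one yes/no question.
doc_topics = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.3, 0.7],
]
closed_answers = ["yes", "yes", "no", "no"]

profile = mean_topic_share_by_group(doc_topics, closed_answers)
for answer, shares in profile.items():
    print(answer, [round(s, 2) for s in shares])
```

A topic whose average share differs sharply between answer groups is one that tracks the closed-ended response. In this toy data the first topic dominates the "yes" group and the second dominates the "no" group.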

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Publications linked via Europe PMC

"Should I stay or should I go?" Nurses' perspectives about working during the Covid-19 pandemic's first wave in the United States: A summative content analysis combined with topic modeling.