RAPID: Advanced Topic Modeling Methods to Analyze Text Responses in COVID-19 Survey Data

Computer and Information Science and Engineering - As the COVID-19 pandemic continues, public and private organizations are deploying surveys to inform responses and policy choices. Survey designs using multiple choice responses are by far the most common -- "open ended" questions, where survey participants provide a longer-form written response, are used far less. This is true despite the fact that when you allow people to provide unconstrained spoken or text responses, it is possible to obtain richer, fine-grained information clarifying the other responses, as well as useful ?bottom up? information that the survey designers did not know to ask for. A key problem is that analyzing the unstructured language in open-ended responses is a labor-intensive process, creating obstacles to using them especially when speedy analysis is needed and resources are limited. Computational methods can help, but they often fail to provide coherent, interpretable categories, or they can fail to do a good job connecting the text in the survey with the closed-end responses. This project will develop new computational methods for fast and effective analysis of survey data that includes text responses, and it will apply these methods to support organizations doing high-impact survey work related to COVID-19 response. This will improve these organizations? ability to understand and mitigate the impact of the COVID-19 pandemic.

This project?s technical approach builds on recent techniques bringing together deep learning and Bayesian topic models. Several key technical innovations will be introduced that are specifically geared toward improving the quality of information available in surveys that include both closed- and open-ended responses. A common element in these approaches is the extension of methods commonly used in supervised learning settings, such as task-based fine-tuning of embeddings and knowledge distillation, to unsupervised topic modeling, with a specific focus on producing diverse, human-interpretable topic categories that are well aligned with discrete attributes such as demographic characteristics, closed-end responses, and experimental condition. Project activities include assisting in the analysis of organizations' survey data, conducting independent surveys aligned with their needs to obtain additional relevant data, and the public release of a clean, easy to use computational toolkit facilitating more widespread adoption of these new methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

RAPID: Advanced Topic Modeling Methods to Analyze Text Responses in COVID-19 Survey Data

Key facts

Abstract

Publicationslinked via Europe PMC

Using co-sharing to identify use of mainstream news for promoting potentially misleading narratives.

Authors

Publish Year

Journal

DOI

"Should I stay or should I go?" Nurses' perspectives about working during the Covid-19 pandemic's first wave in the United States: A summative content analysis combined with topic modeling.

Authors

Publish Year

Journal

DOI