CAREER: Discourse Processing and Content Generation for Document Simplification

  • Funded by National Science Foundation (NSF)
  • Total publications: 0

Grant number: 2145479

Key facts

  • Disease

    COVID-19
  • Start & end year

    2022
    2027
  • Known Financial Commitments (USD)

    $110,797
  • Funder

    National Science Foundation (NSF)
  • Principal Investigator

    Junyi Li
  • Research Location

    United States of America
  • Lead Research Institution

    University of Texas at Austin
  • Research Priority Alignment

    N/A
  • Research Category

    Secondary impacts of disease, response & control measures

  • Research Subcategory

    Other secondary impacts

  • Special Interest Tags

    Data Management and Data Sharing

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Unspecified

  • Vulnerable Population

    Unspecified

  • Occupations of Interest

    Other

Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Simplification is the process of making a text more accessible to a target audience (e.g., language learners, children, and individuals with language impairments) while preserving its meaning and content. The lack of accessible material can exacerbate social issues. For example, the complexity of language used in college admission and financial aid applications has contributed to lagging access to higher education among emergent bilingual students, and the WHO has recognized the urgency of accessible technical information given the rise of medical misinformation, especially in the wake of the COVID-19 pandemic. While there has been much work on sentence simplification, very few datasets are large enough to train supervised models. Simplifying a document also involves operations that differ from those at the sentence level, such as adding content and managing how sentences connect with one another. This project aims to develop new resources and data-driven approaches for document simplification, with the potential to address information transparency and fair access across a range of high-stakes domains. This project will also support the education and training of a diverse body of undergraduate and graduate students across disciplines.

To substantially advance document simplification, this CAREER project will tackle several key issues in existing simplification work, including corpora diversity, explanation generation, and document-level approaches. This will be achieved through the following research activities: (1) introducing new corpora that tackle the pressing challenge of data diversity in simplification research and enable new application scenarios, especially in the accessibility of technical and jargon-laden texts; (2) tackling content addition and elaboration during simplification, a previously little-explored challenge, and proposing a novel, linguistically informed framework that characterizes and generates elaborations; (3) developing models for document simplification that are informed by structures of discourse, using both coherence structure and entity salience. These new ways of integrating discourse target the larger challenge of getting models to take stretches of discourse into account.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.