CAREER: Discourse Processing and Content Generation for Document Simplification
- Funded by National Science Foundation (NSF)
- Total publications: 0
Grant number: 2145479
Key facts
Disease
COVID-19
Start & end year
2022-2027
Known Financial Commitments (USD)
$110,797
Funder
National Science Foundation (NSF)
Principal Investigator
Junyi Li
Research Location
United States of America
Lead Research Institution
University of Texas at Austin
Research Priority Alignment
N/A
Research Category
Secondary impacts of disease, response & control measures
Research Subcategory
Other secondary impacts
Special Interest Tags
Data Management and Data Sharing
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Unspecified
Vulnerable Population
Unspecified
Occupations of Interest
Other
Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).
Simplification is the process of making a text more accessible to a target audience (e.g., language learners, children, and individuals with language impairments) while preserving its meaning and content. The lack of accessible material can exacerbate social issues. For example, the complexity of language used in college admission and financial aid applications has contributed to lagging access to higher education among emergent bilingual students, and the WHO has recognized the urgency of accessible technical information, given the rise of medical misinformation, especially in the wake of the COVID-19 pandemic. While there has been much work on sentence simplification, very few datasets are large enough to train supervised models. Simplifying a document also involves operations different from those at the sentence level, such as content addition and handling how sentences connect with each other. This project aims to develop new resources and data-driven approaches for document simplification, with the potential to address information transparency and fair access across a range of high-stakes domains. This project will also support the education and training of a diverse body of undergraduate and graduate students across disciplines.
To substantially advance document simplification, this CAREER project will tackle several key issues in existing simplification work, including corpora diversity, explanation generation, and document-level approaches. This will be achieved through the following research activities: (1) introducing new corpora that address the pressing challenge of data diversity in simplification research and enable new application scenarios, especially in the accessibility of technical and jargon-laden texts; (2) tackling content addition and elaboration during simplification, a previously little-explored challenge, and proposing a novel, linguistically informed framework that characterizes and generates elaborations; and (3) developing models for document simplification that are informed by structures of discourse, using both coherence structure and entity salience. These ways of integrating discourse target a larger challenge: enabling models to take stretches of discourse into account.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.