III: Medium: Collaborative Research: Collaborative Machine-Learning-Centric Data Analytics at Scale
- Funded by National Science Foundation (NSF)
- Total publications: 0
Grant number: 2106859; 2107150
Key facts
Disease
COVID-19
Start & end year
2021-2024
Known Financial Commitments (USD)
$397,732
Funder
National Science Foundation (NSF)
Principal Investigator
Wei Wang, Chen Li
Research Location
United States of America
Lead Research Institution
University of California-Los Angeles, University of California-Irvine
Research Priority Alignment
N/A
Research Category
Secondary impacts of disease, response & control measures
Research Subcategory
N/A
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
In recent years, society has benefited greatly from online collaboration and sharing, as evidenced by popular cloud-based services such as Google Docs, Dropbox, GitHub, and Overleaf. These benefits have become even more attractive with the new norm of remote work brought on by the COVID-19 pandemic. In this award, the investigators ask the following question: is it possible to develop online systems that support cloud-based services for collaborative data analytics? This computing paradigm allows collaborators to jointly conduct an analysis job on a large amount of data. The investigator team is particularly interested in scenarios where collaborators come from multiple disciplines with different backgrounds and the analytics is machine-learning-centric, since such tasks are becoming increasingly common and important. While collaborators in data analytics want to focus on their research topics and make full use of their expertise and skills, they also face challenges arising from their complementary backgrounds and asynchronous working schedules. As a consequence, the collaboration faces both inter-disciplinary and intra-disciplinary obstacles. The goal of this award is to study these challenges and develop new techniques to support such online services for collaborative data analytics.
The investigator team identifies four research topics: 1) allowing collaborators to debug the training process of a machine learning model by pausing and resuming the process or setting conditional breakpoints, as these tasks tend to be computationally intensive; 2) enabling collaborative debugging of external user-defined functions, in order both to harness the popular data science libraries in Python and R and to achieve high performance using a parallel data-processing engine, which is often written in another language such as Java or Scala; 3) supporting collaborative instance labeling and machine learning training and deployment between domain scientists and machine learning experts; and 4) analyzing and mining the data workflows collected from collaborators to improve user productivity in formulating new data analytics tasks. The developed techniques will bring the success of many cloud-based collaboration services to the increasingly important space of scalable data analytics using machine learning techniques. The solutions will significantly lower the barriers to entry by enabling domain-specific analysts, as opposed to computer-science-trained Big Data experts, to gather large quantities of data in different domains and to analyze them efficiently, effectively, and interactively.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.