III: Medium: Collaborative Research: Collaborative Machine-Learning-Centric Data Analytics at Scale

In recent years, society has benefited greatly from online collaboration and sharing, as evidenced by popular cloud-based services such as Google Docs, Dropbox, GitHub, and Overleaf. These benefits have become even more attractive with the new norm of remote work brought on by the COVID-19 pandemic. In this award, the investigators seek to answer the following question: is it possible to develop online systems that support cloud-based services for collaborative data analytics? This computing paradigm allows collaborators to jointly conduct an analysis job on a large amount of data. The investigator team is particularly interested in scenarios where collaborators come from multiple disciplines with different backgrounds and the analytics is machine-learning-centric, since such tasks are becoming increasingly common and important. While collaborators in data analytics want to focus on their research topics and fully utilize their expertise and skills, they also face challenges arising from their complementary backgrounds and asynchronous working schedules. As a consequence, the collaboration faces both inter-disciplinary and intra-disciplinary obstacles. The goal of this award is to study these challenges and develop new techniques that enable such online services for collaborative data analytics.

The investigator team identifies four research topics: 1) Allowing collaborators to debug the training process of a machine learning model by pausing and resuming the process or setting conditional breakpoints, as these tasks tend to be computationally intensive; 2) Enabling collaborative debugging of external user-defined functions, in order to harness the popular data science libraries in Python and R while still achieving high performance on a parallel data-processing engine, which is often written in another language such as Java or Scala; 3) Supporting collaborative instance labeling, machine learning training, and deployment between domain scientists and machine learning experts; and 4) Analyzing and mining the data workflows collected from collaborators to improve user productivity in formulating new data analytics tasks. The developed techniques will bring the success of many cloud-based collaboration services to the increasingly important space of scalable data analytics using machine learning techniques. The solutions will significantly lower the barriers to entry by enabling domain-specific analysts (as opposed to computer-science-trained Big Data experts) to gather large quantities of data in different domains and to analyze them efficiently, effectively, and interactively.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
