Deep Mining With The Covid-19 Data Warehouse

  • Funded by Luxembourg National Research Fund
  • Total publications: 0

Grant number: unknown

Key facts

  • Disease

    COVID-19
  • Known Financial Commitments (USD)

    $22,788
  • Funder

    Luxembourg National Research Fund
  • Principal Investigator

    Christoph Schommer
  • Research Location

    Luxembourg
  • Lead Research Institution

    University of Luxembourg
  • Research Priority Alignment

    N/A
  • Research Category

    13

  • Research Subcategory

    N/A

  • Special Interest Tags

    Data Management and Data Sharing

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

At a time when COVID-19 is attracting worldwide attention, the quantity and variety of data are increasing dramatically. The result is data lakes, in which raw data appears in different formats and at varying levels of quality. In the case of COVID-19, the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) has compiled a number of data sources, including data from the World Health Organization and others. The published data is largely time-series data covering mortality rates, infected cases, and recovered cases of COVID-19 for more than 200 countries. The COVID-19 Open Research Dataset Challenge (CORD-19) is a resource of almost 60,000 scholarly articles, more than 75% of which are full-text articles. These are only two examples of publicly available data that aim to provide a comprehensible analysis of the entire development of the disease. The decisive problem, however, is that the heterogeneity, diversity, and (partially) unstructured nature of the data make deep analysis more difficult rather than easier. Against this background, DEEPHOUSE has two central goals. First, we consolidate the available text data and time-series data in a COVID-19 data warehouse, for example along multidimensional axes (time, place, and topic), by applying appropriate data integration techniques. Second, we build an extensible web-based platform that demonstrates the successful discovery of time-related sequences or time series, for example through visualization or the tracking of topics over time. Since data underpins the warehouse, the methodology of DEEPHOUSE is transferable to other diseases.
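
As a rough illustration of consolidating data along the time, place, and topic axes, the following minimal Python sketch stores each observation as a fact row and aggregates it over any chosen subset of axes. It is not the DEEPHOUSE implementation: the `Fact` type, the `roll_up` helper, the topic labels, and all values are hypothetical and made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One warehouse row keyed by the three axes named in the abstract."""
    month: str    # time axis, e.g. "2020-03"
    country: str  # place axis
    topic: str    # topic axis
    value: int    # hypothetical measure: a case count or an article count


def roll_up(facts, *axes):
    """Sum the measure over every axis that is not listed in `axes`."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(getattr(fact, axis) for axis in axes)
        totals[key] += fact.value
    return dict(totals)


# Made-up rows for illustration only; these are not real figures.
facts = [
    Fact("2020-03", "LU", "confirmed_cases", 10),
    Fact("2020-03", "DE", "confirmed_cases", 30),
    Fact("2020-04", "LU", "confirmed_cases", 5),
    Fact("2020-03", "LU", "articles:vaccines", 2),
    Fact("2020-04", "LU", "articles:vaccines", 7),
]

# Aggregate away the place axis to follow each topic over time.
by_month_topic = roll_up(facts, "month", "topic")
for (month, topic), total in sorted(by_month_topic.items()):
    print(month, topic, total)
```

Keeping time-series counts and article counts in one fact table, distinguished only by the topic axis, is one possible design choice; a real warehouse would more likely use separate fact tables that share the time and place dimensions.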